R Programming for Data Science: Certification by iTrain Malaysia

HRDF Claimable! | 4 Days (Beginner to Intermediate) + 3 Days (Advanced)

R Programming for Data Science: Course Overview

R is a programming language well known for its strength in statistical computing. Using R for Data Science helps companies extract insights from their data and use those insights to get ahead of their competitors. This course introduces the fundamentals of the R language, with a specific focus on how it is used in Data Science. You’ll learn how to gather data and work with it, from reading and writing through to manipulation and visualisation. Databases, where data is stored, and Structured Query Language (SQL), the language used to interact with a database, are covered as well. You’ll also be exposed to a wide range of topics, including Big Data and its real-world applications, supervised learning, predictive analytics, exploratory data analysis, basic statistics, logistic regression, and data mining.

Learning Outcomes

Upon completion of the Beginner to Intermediate course, you will be able to:

  • Understand R language fundamentals, including basic syntax, variables, and types.
  • Create functions and use control flow.
  • Read and write data in R.
  • Work with data in R, including cleansing, filtering, aggregating, and joining.
  • Create and customise visualisations using ggplot2.
  • Perform predictive analytics using R.


Who Should Attend & Prerequisites

This workshop is intended for individuals who are interested in learning data science or who want to begin a career as a data scientist. Participants should have basic knowledge of statistics and some programming experience; no specific programming language is required.

Course Outline


Beginner to Intermediate (4 Days)

Day 1

  • What Is Data
  • Why Data Collection Is Important
  • Types of Data
  • What Is Data Science
  • Activity

  • Introduction
  • Big Data Overview
  • Characteristics of Big Data - The Three V's of Big Data
  • Sources of Big Data
  • Big Data Analytics
  • Data Science vs. Big Data vs. Data Analytics
  • Skills of a Data Scientist vs. Big Data Professional vs. Data Analyst

  • Data Analytics Life Cycle Overview
  • Value of Using Data Analytics Life Cycle
  • Detailed Explanation on Data Analytics Life Cycle

  • Variables
  • Constants
  • Keywords
  • Comments
  • Syntax

  • What Is R
  • Install R and RStudio
  • Explore RStudio Interface (With Lab Exercises)

Day 2

  • Numbers
  • String
  • Vector
  • Matrix
  • Array
  • Data Frame
  • List
  • Factor (With Lab Exercises)

  • Conditional
  • Loop
  • Break & Next (With Lab Exercises)
  • Operators
  • Function Syntax
  • Default Arguments (With Lab Exercises)
  • Apply Functions (lapply, sapply, vapply)

  • Date Time Representation
  • Date Time Arithmetic
  • Date Time Comparison

Day 3

  • Reading Data from CSV File
  • Reading Data from JSON File
  • Reading Data from XML File
  • Reading Data from Web

  • Data Cleansing
  • Aggregation, Filtering, Sorting, Joining
  • Dealing with Missing Data
  • Selecting Columns and Rows
  • Data Wrangling
  • Summarise and Group By

  • What Is Visualisation
  • Importance of Visualisation
  • Types of Visualisation
  • How to Handle the Properties for Chart Creation
  • Activity

Day 4

  • Scatter Plot
  • Boxplot
  • Bar Chart
  • Pie Chart
  • Histogram
  • Activity: Apply Various Charts to the Dataset

  • Getting Started with ggplot2
  • Mapping Color, Shape and Size
  • Creating Attractive Color Schemes
  • Creating Bar Charts
  • Creating Box Plots (With Lab Exercises)

  • Analysing Mean, Median, Mode with the Chart
  • Measuring Normal Distribution
  • Measuring Binomial Distribution
  • Measuring Poisson Distribution
  • Measuring Covariance
  • Measuring Chi Square
  • Measuring Linear Regression
  • Measuring Multiple Regression
  • Measuring Logistic Regression
  • Measuring Decision Tree
  • Measuring Survival Analysis
  • Measuring Random Forest

Advanced (3 Days)

Day 1

  • What is Machine Learning?
  • Types of Machine Learning
  • Supervised and Unsupervised Learning
  • Sampling, Training Set, Testing Set

  • What is Prediction
  • Need for Prediction
  • Test Harness
  • Apply Different Models to the Dataset

  • K-Means Clustering
  • Exploring the Data
  • Clustering on the Iris Dataset
  • Lab Exercise
  • Association Rules
  • The Titanic Dataset
  • Association Rule Mining
  • Pruning Redundant Rules
  • Visualizing Association Rules
  • Lab Exercise

  • Linear Regression
  • Multiple Regression
  • Lab Exercise

  • Introduction to Logistic Regression
  • Logistic Regression in R
  • Bar Chart
  • Lab Exercise

Day 2

  • Introduction to the Naive Bayes Algorithm
  • Working Principles of the Naive Bayes Algorithm
  • Applications of the Naive Bayes Algorithm
  • Pros and Cons of Using Naive Bayes
  • Steps to Build a Basic Naive Bayes Model in R (Lab Session)

  • Introduction
  • Types of Decision Trees
  • Regression Trees vs. Classification Trees
  • Algorithm for Decision Trees
  • Advantages & Disadvantages
  • Tree-Based Models vs. Linear Models

  • Introduction
  • Reading Time Series Data
  • Plotting Time Series
  • Decomposing Time Series
  • Forecasts using Exponential Smoothing
  • ARIMA Models

Day 3

  • Introduction to Text Analysis
  • Reading the Data
  • Create a Corpus and Term Document Matrix
  • Some Basic Analyses of Term Frequencies
  • Cluster Analysis
  • Reading a Representative Subset of the File
  • Comparing Term Frequencies

  • Test the Accuracy of the Selected Model
  • Iterate the Process If Required

“Learned something new and discovered more than expected.”

Chuo Sing Bic, IT Manager, Redtone International Berhad

“Learned to manipulate R language that can build charts from zero, my objectives were met!”

Heng Jin Wei, Manager, Globalknox Sdn Bhd

“Great trainer as he explained the concept in detail and provided examples.”

Sharifah Fazlinda, Assistant Manager, Sime Darby Holdings Berhad

“Learned something new to apply to my job. Yes it is fulfilling!”

Ch’ng Ping Choon, Senior Engineering Specialist Manager, Measat Satellite Systems



Students will be given a Certificate of Attendance after successfully completing the course.

The Certification Body for this course is iTrain Asia Pte Ltd, the region’s top tech certifications provider, headquartered in Singapore with branch offices in Malaysia and Indonesia.

Upon completion of this course, you will be able to:

● Explain data science concepts and grasp the basics of the R language
● Apply the fundamentals of the R language to a range of practical data science problems

The Beginner to Intermediate level is a 4-day, instructor-led course held at a training centre.

Computers are provided for iTrain students. However, participants can also use their own computers as long as the necessary applications are installed.
