Certified R Programmer for Data Science: Certification by iTrain Malaysia
Certified R Programmer for Data Science


Certified R Programmer for Data Science: Course Overview

R is a programming language well known for its power in statistical computing. Using R in Data Science makes it possible to extract insights from data, and these insights help companies get ahead of their competitors. This course introduces the fundamentals of the R language, with a specific focus on how it can be used in Data Science.

You’ll learn how to gather data and what you can do with it, from reading and cleansing to manipulation and visualisation. You’ll also be exposed to a wide range of topics, including Big Data, the data analytics lifecycle, exploratory data analysis and the Shiny R package.
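
As a rough illustration of the read, cleanse and manipulate steps mentioned above, here is a minimal sketch in base R. The data, the column names (region, sales) and the variable names are made up for this example and are not taken from the course material.

```r
# Hypothetical in-memory CSV standing in for a real file on disk.
csv_text <- "region,sales
North,120
South,NA
North,80
East,100"

# Read: parse the CSV text into a data frame.
sales <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Cleanse: drop rows that contain missing values.
sales_clean <- na.omit(sales)

# Manipulate: total sales per region.
totals <- aggregate(sales ~ region, data = sales_clean, FUN = sum)

print(totals)
```

In practice the data would usually come from a file, for example read.csv("some_file.csv"), and the cleansing step would depend on how missing values should be treated for the analysis at hand.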

Learning Outcomes

Upon completion of this course, you will be able to:

  • Understand R language fundamentals, including basic syntax, variables, and types.
  • Create functions and use control flow.
  • Read and write data in R.
  • Work with data in R.
  • Create and customise visualisations using ggplot2.
  • Perform predictive analytics using R.


Who Should Attend & Prerequisites

This workshop is intended for individuals who are interested in learning Data Science or who want to begin a career as a data scientist. Participants should have basic statistical knowledge and some programming experience, but no specific programming language is required.

Course Outline


Beginner to Intermediate (5 Days)

Day 1

  • What is Data?
  • Why Data Collection is Important
  • Types of Data
  • What is Data Science?
  • Characteristics of Big Data - The Three V's of Big Data
  • Big Data Analytics and its Types

  • Data Analytics Lifecycle Overview
  • Detailed Explanation on Data Analytics Lifecycle

  • Variables
  • Constants
  • Keywords
  • Comments
  • Syntax

  • What is R?
  • Install R and RStudio
  • Explore RStudio Interface (With Lab Exercises)
  • Comments
  • Syntax

  • Numbers
  • String
  • Vector
  • Matrix
  • Arrays

Day 2

  • Data Frames
  • Lists
  • Factor (With Lab Exercises)

  • Conditional Statements
  • Looping Statements
  • Operators
  • Functions Syntax
  • Scoping Rules
  • Subsetting
  • Apply Functions (lapply, sapply, vapply)
  • Debugging Tools
  • Split Function

  • Date Time Representation
  • Date Time Arithmetic
  • Date Time Comparison

Day 3

  • Reading Data from CSV File
  • Reading Data from JSON File
  • Reading Data from XML File
  • Reading Data from Web

  • Extract, Transform and Load (ETL)
  • Data Cleansing
  • Aggregation, Filtering, Sorting, Joining
  • Dealing with Missing Data
  • Selecting Columns and Rows
  • Data Wrangling
  • Summarise and Group By

  • Random Sampling
  • Generate Random Numbers
  • R Profiler

  • What Is Visualisation?
  • Need of Visualisation
  • Types of Visualisation
  • How to Handle the Properties for Chart Creation
  • Activity

Day 4

  • Scatter Plots
  • Boxplots
  • Bar Charts
  • Pie Charts
  • Histograms

  • Getting Started with ggplot2
  • Mapping Color, Shape and Size
  • Creating Attractive Color Schemes
  • Creating Bar Charts
  • Creating Box Plots

  • Correlation
  • Deviation
  • Ranking
  • Distribution
  • Composition
  • Time Series Plots
  • Groups
  • Spatial

Day 5

  • Introduction
  • How to Build a Simple Shiny Module

Advanced (4 Days)

Day 1

  • Analysing Mean, Median, Mode
  • Variability
  • Distributions
  • Asymptotics
  • Confidence Intervals
  • Hypothesis Testing
  • P-Values
  • Bivariate Correlation
  • Autocorrelation Test - Durbin-Watson Test, Newey-West Estimator
  • Outlier Test - Cook’s Distance, Studentised Residuals
  • Normality Test - Kolmogorov-Smirnov, Cramér-von Mises, Anderson-Darling, Jarque-Bera Test
  • T-Test
  • Non-Parametric Testing - Wilcoxon, Mann-Whitney, Kruskal-Wallis
  • Chi-Squared Test
  • Stationarity Test - Augmented Dickey-Fuller Test, Seasonal Augmented Dickey-Fuller Test
  • Shapiro-Wilk, Mobility Matrix
  • Multicollinearity Test - Pearson’s Correlation, Variance Inflation Factor
  • Linearity Test
  • Heteroscedasticity Test - White Test, Breusch-Pagan Test
  • Regression Test - Out of Time Test

Day 2

  • What is Machine Learning?
  • Types of Machine Learning Algorithms

  • Understanding the Collected Data with Statistics
  • Understanding Data with Visualisation
  • Data Preparation

  • Linear Regression
  • Multiple Regression
  • Lab Exercise

  • Introduction to Logistic Regression
  • Logistic Regression in R
  • Lab Exercise

  • Support Vector Machines
  • Random Forest

  • Introduction to Naive Bayes Algorithm
  • Working Principles of the Naive Bayes Algorithm
  • Applications of Naive Bayes Algorithm
  • Pros and Cons of Using Naive Bayes
  • Steps to Build a Basic Naive Bayes Model

Day 3

  • Introduction
  • Types of Decision Tree
  • Regression Trees vs Classification Trees
  • Algorithm for Decision Tree
  • Advantages and Disadvantages
  • Tree-Based Models vs Linear Models

  • Introduction
  • Reading Time Series Data
  • Plotting Time Series
  • Decomposing Time Series
  • Forecasts Using Exponential Smoothing
  • ARIMA Models

  • Introduction to Text Analysis
  • Reading the Data
  • Create a Corpus and Term-Document Matrix
  • Sentiment Analysis
  • Wordclouds
  • N-gram Analysis
  • Network Analysis

  • Finding Nearest Neighbors
  • Performance Metrics
  • Automatic Workflows

Day 4

  • The Basics of Neural Networks
  • Fitting a Neural Network in R
  • Cross-Validation of a Neural Network

“Learned something new and discovered more than expected.”

Chuo Sing Bic, IT Manager, Redtone International Berhad

“Learned to manipulate R language that can build charts from zero, my objectives were met!”

Heng Jin Wei, Manager, Globalknox Sdn Bhd

“Great trainer as he explained the concept in detail and provided examples.”

Sharifah Fazlinda, Assistant Manager, Sime Darby Holdings Berhad

“Learned something new to apply to my job. Yes, it is fulfilling!”

Ch’ng Ping Choon, Senior Engineering Specialist Manager, Measat Satellite Systems



Students will be given a Certificate of Attendance after successfully completing the course.

Our Certification Body for this course is iTrain Asia Pte Ltd, the region’s top Certifications Tech Provider, headquartered in Singapore with branch offices in Malaysia and Indonesia.

Upon completion of this course, you will be able to:

● Explain data science concepts and grasp the basics of the R language
● Apply the fundamentals of the R language to a range of practical problems related to data science

The Beginner to Intermediate course runs for 5 days and the Advanced course runs for 4 days. Both are instructor-led and held at a training centre.

Computers are provided for iTrain students. However, participants may use their own computers, provided the necessary applications are installed.
