R for Data Science: Certification by iTrain Malaysia


4 Days (Beginner to Intermediate) + 3 Days (Advanced) | HRDF Claimable!

R for Data Science Course Overview

R is a programming language well known for its strength in statistical computing. In data science, R is used to extract insights from data, and those insights can help companies get ahead of their competitors. This course introduces the fundamentals of the R language, with a specific focus on how it is used in data science. You’ll learn how to gather data and how to work with it, from reading and writing through to manipulation and visualisation. Databases, where data is stored, and Structured Query Language (SQL), the language used to interact with a database, are covered as well. You’ll also be exposed to a wide range of topics, including Big Data and its real-world applications, supervised learning, predictive analytics, exploratory data analysis, basic statistics, logistic regression, and data mining.

Learning Outcomes

Upon completion of the Beginner to Intermediate course, you will be able to:

  • Understand R language fundamentals, including basic syntax, variables, and types.
  • Create functions and use control flow.
  • Read and write data in R.
  • Work with data in R.
  • Create and customize visualizations using ggplot2.
  • Perform predictive analytics using R.

Upon completion of the Advanced course, you will be able to:

  • Gain exposure to the applications of R.
  • Perform data manipulation using R.
  • Use various techniques to import data.
  • Understand exploratory data analysis.
  • Learn the basics of statistics and logistic regression.
  • Understand clustering techniques, regression, and classification in data mining.


Who Should Attend & Prerequisites

This workshop is intended for anyone who is interested in learning data science or who wants to begin a career as a data scientist. All participants should have basic programming knowledge in any language (Java, C, C++, Pascal, Fortran, JavaScript, PHP, Python, etc.).

Course Outline


Beginner to Intermediate (4 days)

  • What is Data?
  • Types of Data
  • What is Data Science?
  • Statistical Thinking
  • Knowledge Check
  • Lab Activity

  • Extract, Transform and Load (ETL)
  • Data Cleansing
  • Aggregation, Filtering, Sorting, Joining
  • Data Workflow
  • Knowledge Check
  • Lab Activity

  • Raw vs Tidy Data
  • Key Features of Data Quality
  • Maintenance of Data Quality
  • Data Profiling
  • Data Completeness and Consistency

  • Identify Problem
  • Define Question
  • Define Ideal Dataset
  • Obtain Data
  • Analyze Data
  • Interpret Results
  • Distribute Results
  • Knowledge Check

  • Types of Databases
  • Relational Databases
  • NoSQL
  • Hybrid Database
  • Knowledge Check
  • Lab Activity

  • Performing CRUD (Create, Retrieve, Update, Delete)
  • Designing a Real World Database
  • Normalizing a Table
  • Knowledge Check
  • Lab Activity

  • Obtain Data from Online Repositories
  • Import Data from Local File Formats (JSON, XML)
  • Import Data using Web API
  • Scrape Website for Data
  • Knowledge Check
  • Lab Activity

  • What is EDA?
  • Goals of EDA
  • The Role of Graphics
  • Handling Outliers
  • Dimension Reduction

  • Features of R
  • Vectors
  • Matrices and Arrays
  • Data Frame
  • Input/Output

  • Linear Models and Regression
  • Multivariate Regression
  • Logistic Regression
  • Knowledge Check
  • Lab Activity

  • What is Prediction?
  • Sampling, Training Set, Testing Set
  • Constructing a Decision Tree
  • Knowledge Check
  • Lab Activity

  • Choosing the Right Visualization
  • Plotting and Charting in R
  • Knowledge Check
  • Lab Activity

  • Using Markdown Language
  • Convert your Data into Slides
  • Data Presentation Techniques
  • The Pitfall of Data Analysis
  • Knowledge Check
  • Lab Activity
  • Group Presentation

  • What is Small Data?
  • What is Big Data?
  • Big Data Analytics vs Data Science
  • Key Elements in Big Data (3Vs)
  • Extracting Values from Big Data
  • Challenges in Big Data

  • Introducing Hadoop Ecosystem
  • Cloudera vs Hortonworks
  • Real World Big Data Applications
  • R/Hadoop
  • Knowledge Check
  • Group Discussion

Advanced (3 days)

  • Business Analytics, Data, Information
  • Understanding Business Analytics and R
  • Compare R with Other Software in Analytics
  • Install R
  • Perform Basic Operations in R using Command Line
  • Learn to Use the RStudio IDE
  • Use the 'R help' Feature in R

  • Variables in R
  • Scalars
  • Vectors
  • Matrices
  • Lists
  • Data Frames
  • Using the c(), cbind(), rbind(), attach() and detach() Functions in R
  • Factors

  • Data Sorting
  • Find and Remove Duplicate Records
  • Cleaning Data
  • Recoding Data
  • Slicing of Data
  • Merging Data
  • Apply Functions

  • Reading Data
  • Writing Data
  • Basic SQL Queries in R
  • Web Scraping

  • Box Plot
  • Histogram
  • Pareto Charts
  • Pie Graph
  • Line Chart
  • Scatter Plot
  • Developing Graphs

  • Basics of Statistics
  • Inferential Statistics
  • Probability
  • Hypothesis
  • Standard Deviation
  • Outliers
  • Correlation
  • Linear and Logistic Regression

  • Introduction to Data Mining
  • Understanding Machine Learning
  • Supervised and Unsupervised Machine Learning Algorithms
  • K-means Clustering

“Learned something new and discovered more than expected.”

Chuo Sing Bic, IT Manager, Redtone International Berhad

“Learned to manipulate R language that can build charts from zero, my objectives were met!”

Heng Jin Wei, Manager, Globalknox Sdn Bhd

“Great trainer as he explained the concept in detail and provided examples.”

Sharifah Fazlinda, Assistant Manager, Sime Darby Holdings Berhad

“Learned something new to apply to my job. Yes it is fulfilling!”

Ch’ng Ping Choon, Senior Engineering Specialist Manager, Measat Satellite Systems



Students will be given a Certificate of Attendance after successfully completing the course.

The Certification Body for this course is iTrain Asia Pte Ltd, the region’s top Certifications Tech Provider, headquartered in Singapore with branch offices in Malaysia and Indonesia.

Upon completion of this course, you will be able to:

  • Explain data science concepts and grasp the basics of the R language
  • Apply the fundamentals of the R language to a range of practical problems related to data science

The Beginner to Intermediate level is a 4-day course and the Advanced level is a 3-day course. Both are instructor-led and held at a training centre.

Computers are provided for iTrain students. However, participants may use their own computers as long as the necessary applications are installed.
