Day 1
Introduction to Data Science (2 hrs)
• What is Data?
• Types of Data
• What is Data Science?
• Knowledge Check
• Lab Activity
Data Science Workflow (2 hrs)
• Data Gathering
• Data Preparation & Cleansing
• Data Analysis – Descriptive, Predictive, and Prescriptive
• Data Visualization and Model Deployment
• Knowledge Check
Life of a Data Scientist (2 hrs)
• What is a Data Scientist?
• Data Scientist Roles
• What does a Data Scientist Look Like?
• T-Shaped Skillset
• Data Scientist Roadmap
• Data Scientist Education Framework
• Thinking like a Data Scientist
• Knowns and Unknowns
• Demand and Opportunity
• Labor Market
• Applications of Data Science
• Data Science Principles
• Data-Driven Organization
• Developing Data Products
• Knowledge Check
Data Gathering (2 hrs)
• Obtain data from online repositories
• Import data from local file formats (JSON, XML)
• Import data using a Web API
• Scrape websites for data
• Knowledge Check
Day 2
Data Science Prerequisites (2 hrs)
• Probability and Statistics
• Linear Algebra
• Calculus
• Combinatorics
Beginning Databases (1.5 hrs)
• Types of Databases
• Relational Databases
• NoSQL
• Hybrid Databases
• Knowledge Check
• Lab Activity
Structured Query Language (SQL) (2 hrs)
• Performing CRUD (Create, Retrieve, Update, Delete)
• Designing a real-world database
• Normalizing a table
• Knowledge Check
• Lab Activity
Introduction to Python (2 hrs)
• Basics of Python language
• Functions and packages
• Python lists
• Functional programming in Python
• NumPy and SciPy
• IPython
• Knowledge Check
• Lab Activity
• Lab: Exploring data using Python
Day 3
Data Preparation and Cleansing (2 hrs)
• Extract, Transform and Load (ETL)
– Pentaho, Talend, etc.
• Data Cleansing with OpenRefine
• Aggregation, Filtering, Sorting, Joining
• Knowledge Check
• Lab Activity
Introduction to R (2 hrs)
• Packages for data import, wrangling, and visualization
• Conditionals and Control Flow
• Loops and Functions
• Knowledge Check
• Lab Activity
• Lab: Exploring data using R
Exploratory Data Analysis (Descriptive) (2 hrs)
• What is EDA?
• Goals of EDA
• The role of graphics
• Handling outliers
• Dimension reduction
Data Quality (2 hrs)
• Raw vs Tidy Data
• Key Features of Data Quality
• Maintenance of Data Quality
• Data Profiling
• Data Completeness and Consistency
Day 4
Machine Learning (Predictive) (2 hrs)
• Bayes' Theorem
• Information Theory
• NLP
• Statistical Algorithms
• Stochastic Algorithms
Introduction to Text Mining (3.5 hrs)
• What is Text Mining?
• Natural Language Processing
• Pre-processing text data
• Extracting features from documents
• Using BeautifulSoup
• Measuring document similarity
• Knowledge Check
• Lab Activity
Supervised, Unsupervised, and Semi-supervised Learning (2.5 hrs)
• What is prediction?
• Sampling, training sets, and testing sets
• Constructing a decision tree
• Knowledge Check
• Lab Activity
Day 5
Data Visualization (2 hrs)
• Choosing the right visualization
• Plotting data using Python libraries
• Plotting data using R
• Using Jupyter Notebook to validate scripts
• Knowledge Check
• Lab Activity
Big Data Landscape (1.5 hrs)
• What is small data?
• What is big data?
• Big data analytics vs Data Science
• Key elements in Big Data (3Vs)
• Extracting value from big data
• Challenges in big data
Data Analysis Presentation (2 hrs)
• Using Markdown
• Convert your data into slides
• Data presentation techniques
• Pitfalls of data analysis
• Knowledge Check
• Lab Activity
• Group presentation
• Lab: Mini Project
Big Data Tools and Applications (2 hrs)
• Introducing the Hadoop Ecosystem
• Cloudera vs Hortonworks
• Real-world big data applications
• Knowledge Check
• Group discussion
What’s Next? (0.5 hrs)
• Preview of Data Science Specialist
• Showing advanced data analysis techniques
• Demo: Interactive visualizations