Course Curriculum
Module 1: Foundations of Data Science
Description: Data science is a multi-disciplinary field that uses scientific methods, processes,
algorithms and systems to extract knowledge and insights from structured and unstructured
data. In this first module we introduce the field of Data Science and how it relates to other
fields such as Artificial Intelligence, Machine Learning and Deep Learning.
• Introduction to Data Science
• High level view of Data Science, Artificial Intelligence & Machine Learning
• Subtle differences between Data Science, Machine Learning & Artificial Intelligence
• Approaches to Machine Learning
• Terms & Terminologies of Data Science
• Understanding an end-to-end Data Science pipeline and implementation cycle
Module 2: Math for Data Science, Machine Learning and Artificial Intelligence
Description: Mathematics is essential to data science: its concepts help identify patterns in
data and underpin the design of algorithms. An understanding of Statistics and Probability
Theory is key to implementing such algorithms in data science.
• Linear Algebra
• Matrices, Matrix Operations
• Eigenvalues, Eigenvectors
• Scalars, Vectors and Tensors
• Prior and Posterior Probability
• Conditional Probability
• Calculus
• Differentiation, Gradient and Cost Functions
• Graph Theory
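As an illustration of how prior, conditional and posterior probability fit together, here is a minimal sketch of Bayes' rule in Python. The function name `posterior` and the numbers (a 1% prior, 95% sensitivity, 5% false-positive rate) are hypothetical and chosen only for the example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H) and P(E | not H), using Bayes' rule."""
    # Total probability of observing the evidence E.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prior, 95% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # prints 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior is small.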
Module 3: Statistics for Data Science
Description: This module focuses on understanding statistical concepts required for Data
Science, Machine Learning and Deep Learning. In this module, you will be introduced to the
estimation of various statistical measures of a data set, simulating random distributions,
performing hypothesis testing, and building statistical models.
Descriptive Statistics
• Types of Data (Discrete vs Continuous)
• Types of Data (Nominal, Ordinal)
• Measures of Central Tendency (Mean, Median, Mode)
• Measures of Dispersion (Variance, Standard Deviation)
• Range, Quartiles, Interquartile Range
• Measures of Shape (Skewness and Kurtosis)
• Tests for Association (Correlation and Regression)
• Random Variables
• Probability Distributions
• Standard Normal Distribution
• Probability Density Function
• Probability Mass Function
• Cumulative Distribution Function
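The measures of central tendency and dispersion listed above can be sketched with Python's standard `statistics` module. The sample values are hypothetical, and the example uses the population variance and standard deviation; a course exercise might use the sample versions (`variance`, `stdev`) instead.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion (population versions)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)
value_range = max(data) - min(data)

# Quartiles and interquartile range (default "exclusive" method)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, median, mode, variance, std_dev, value_range)
```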
Inferential Statistics
• Statistical sampling & Inference
• Hypothesis Testing
• Null and Alternative Hypotheses
• Margin of Error
• Type I and Type II errors
• One Sided Hypothesis Test, Two-Sided Hypothesis Test
• Tests of Inference: Chi-Square, T-test, Analysis of Variance
• t-value and p-value
• Confidence Intervals
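The hypothesis-testing workflow (null hypothesis, test statistic, p-value, confidence interval) can be sketched as follows. To stay within the standard library this example uses a two-sided z-test with a known population standard deviation, not the t-test named in the list. The function name `z_test`, the sample, `mu0` and `sigma` are all hypothetical.

```python
import math
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    sample_mean = mean(sample)
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Margin of error and confidence interval at level 1 - alpha.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_critical * standard_error
    interval = (sample_mean - margin, sample_mean + margin)
    return z, p_value, interval, p_value < alpha


sample = [5.1, 4.9, 5.3, 5.2, 5.0, 5.4, 5.1, 5.2]  # hypothetical measurements
z, p_value, interval, reject_null = z_test(sample, mu0=5.0, sigma=0.2)
print(round(z, 2), round(p_value, 3), reject_null)
```

Rejecting a true null hypothesis is a Type I error, and its probability is bounded by `alpha`. Failing to reject a false one is a Type II error.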
Module 4: Python for Data Science
• NumPy
• Pandas
• Matplotlib & Seaborn
• Jupyter Notebook
NumPy
NumPy is a Python library for array-based scientific computing. Explore how to initialize
and load data into arrays and learn about basic array manipulation operations using NumPy.
• Loading data with NumPy
• Comparing NumPy with Traditional Lists
• NumPy Data Types
• Indexing and Slicing
• Copies and Views
• Numerical Operations with NumPy
• Matrix Operations on NumPy Arrays
• Aggregation functions
• Shape Manipulations
• Broadcasting
• Statistical operations using NumPy
• Resize, Reshape, Ravel
• Image Processing with NumPy
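A short sketch of several of the NumPy topics above: comparison with lists, indexing, copies versus views, aggregation, broadcasting and reshaping. The array contents and variable names are made up for illustration.

```python
import numpy as np

# Lists need an explicit loop; arrays apply the operation element-wise.
prices = [1.5, 2.0, 3.5]
doubled_list = [p * 2 for p in prices]
doubled_array = np.array(prices) * 2

a = np.arange(12).reshape(3, 4)  # 3x4 array holding 0..11

# Basic indexing returns a view: writing through it changes `a`.
row = a[1]
row[0] = 100

# An explicit copy is independent of the original array.
col = a[:, 2].copy()
col[0] = -1

# Aggregation along an axis, then broadcasting a (4,) array against (3, 4).
col_means = a.mean(axis=0)
centered = a - col_means

# ravel flattens to one dimension (as a view when possible).
flat = a.ravel()
print(a.shape, centered.shape, flat.shape)
```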
Pandas
Pandas is a Python library that provides utilities to deal with structured data stored in the form of
rows and columns. Discover how to work with series and tabular data, including initialization,
population, and manipulation of Pandas Series and DataFrames.
• Basics of Pandas
• Loading data with Pandas
• Series
• Operations on Series
• DataFrames and Operations of DataFrames
• Selection and Slicing of DataFrames
• Descriptive statistics with Pandas
• Map, Apply, Iterations on Pandas DataFrame
• Working with text data
• Multi Index in Pandas
• GroupBy Functions
• Merging, Joining and Concatenating DataFrames
• Visualization using Pandas
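The following sketch touches a few of the Pandas topics above: building DataFrames, boolean selection, `apply`, GroupBy and merging. The tables `sales` and `managers` and their contents are hypothetical.

```python
import pandas as pd

# Hypothetical tables used only for this example.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "product": ["a", "a", "b", "b"],
    "units": [10, 7, 3, 12],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Ana", "Raj"],
})

# Selection with a boolean mask.
large_orders = sales[sales["units"] > 5]

# Element-wise transformation of a Series with apply.
sales["units_doubled"] = sales["units"].apply(lambda u: u * 2)

# GroupBy followed by an aggregation.
units_by_region = sales.groupby("region")["units"].sum()

# Left join of the two tables on a shared key column.
merged = sales.merge(managers, on="region", how="left")

print(units_by_region)
print(merged.head())
```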
Data Visualization using Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.
• Anatomy of Matplotlib figure
• Plotting Line plots with labels and colors
• Adding markers to line plots
• Histogram plots
• Scatter plots
• Size, Color and Shape selection in Scatter plots
• Applying Legend to Scatter plots
• Displaying multiple plots using subplots
• Boxplots, scatter_matrix and Pair plots
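A minimal sketch of a Matplotlib figure with three subplots (a labelled line plot with markers, a scatter plot with per-point size and color, and a histogram). The data is invented, and the non-interactive `Agg` backend is an assumption made so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no GUI needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical data for the histogram

# One figure, three axes side by side.
fig, (ax_line, ax_scatter, ax_hist) = plt.subplots(1, 3, figsize=(10, 3))

# Line plot with color, markers, axis labels and a legend.
ax_line.plot(x, y, color="tab:blue", marker="o", label="y = x squared")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()

# Scatter plot with per-point size, color mapped to y, and a triangle marker.
ax_scatter.scatter(x, y, s=[20, 40, 60, 80, 100], c=y, marker="^", label="points")
ax_scatter.legend()

# Histogram with five bins.
ax_hist.hist(values, bins=5)
ax_hist.set_title("Histogram")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # write the PNG to memory instead of disk
```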
Data Visualization using Seaborn
Seaborn is a data visualization library that provides a high-level interface for drawing graphs. These graphs are able to convey a lot of information, while also being visually appealing.
• Basic Plotting using Seaborn
• Violin Plots
• Box Plots
• Cat Plots
• Facet Grid
• Swarm Plot
• Pair Plot
• Bar Plot
• LM Plot
• Variations in LM plot using hue, markers, row and col
Module 5: Exploratory Data Analysis
Exploratory Data Analysis (EDA) helps identify patterns in the data by using basic statistical
methods as well as visualization tools that display graphs and charts. With EDA we can
assess the distribution of the data and decide which models are suitable.
Pipeline ideas
• Exploratory Data Analysis
• Feature Creation
• Evaluation Measures
Data Analytics Cycle ideas
• Data Acquisition
• Data Preparation
o Data cleaning
o Data Visualization
o Plotting
• Model Planning & Model Building
Data Input
• Reading and writing data to text files
• Reading data from a csv
• Reading data from JSON
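The three input formats above can be read with the Python standard library alone, as in this sketch. The file names and contents are hypothetical and are written to a temporary directory first so the example is self-contained.

```python
import csv
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # Text file: write, then read back line by line.
    text_path = folder / "notes.txt"
    text_path.write_text("first line\nsecond line\n", encoding="utf-8")
    lines = text_path.read_text(encoding="utf-8").splitlines()

    # CSV file: DictReader maps each row to the header names.
    csv_path = folder / "people.csv"
    csv_path.write_text("name,age\nAda,36\nAlan,41\n", encoding="utf-8")
    with csv_path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))

    # JSON file: loads returns plain dicts and lists.
    json_path = folder / "person.json"
    json_path.write_text(json.dumps({"name": "Ada", "skills": ["math", "code"]}),
                         encoding="utf-8")
    record = json.loads(json_path.read_text(encoding="utf-8"))

print(lines, rows, record)
```

Note that `csv` returns every field as a string, so numeric columns such as `age` need an explicit conversion.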
Data preparation
• Selection and Removal of Columns
• Transform
• Rescale
• Standardize
• Normalize
• Binarize
• One-hot Encoding
• Imputing
• Train, Test Splitting
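The preparation steps above can be sketched with NumPy and Pandas. The small table is hypothetical. "Rescale" is taken here to mean min-max scaling to [0, 1] and "Normalize" to mean scaling a vector to unit length. These are common readings, but the course may define them differently.

```python
import numpy as np
import pandas as pd

# Hypothetical data set with a missing value and a categorical column.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [22.0, 35.0, np.nan, 58.0],
    "city": ["Pune", "Delhi", "Pune", "Goa"],
})

df = df.drop(columns=["id"])                      # remove an uninformative column
df["age"] = df["age"].fillna(df["age"].mean())    # impute missing values with the mean

age = df["age"].to_numpy()
rescaled = (age - age.min()) / (age.max() - age.min())  # min-max rescale to [0, 1]
standardized = (age - age.mean()) / age.std()           # zero mean, unit variance
normalized = age / np.linalg.norm(age)                  # unit Euclidean length
binarized = (age > 30).astype(int)                      # threshold at 30 (arbitrary)

one_hot = pd.get_dummies(df["city"], prefix="city")     # one column per category

# Shuffled 75% / 25% train-test split with a fixed seed.
rng = np.random.default_rng(seed=0)
order = rng.permutation(len(df))
cut = int(0.75 * len(df))
train, test = df.iloc[order[:cut]], df.iloc[order[cut:]]
```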
Module 6: Machine Learning
In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. This module is a deep dive into Supervised and Unsupervised learning and Gaussian / Naive Bayes methods. You will also be exposed to different
classification, clustering and regression methods.
• Introduction to Machine Learning
• Applications of Machine Learning
• Supervised Machine Learning
o Classification
o Regression
• Unsupervised Machine Learning
• Reinforcement Learning
• Latest advances in Machine Learning
• Model Representation
• Model Evaluation
• Hyperparameter tuning of Machine Learning Models
• Evaluation of ML Models
• Estimation and Prediction with Machine Learning Models
• Deployment strategy of ML Models
Module 7: Supervised Machine Learning – Classification
Supervised learning is one of the most popular techniques in machine learning. In this module, you will learn about more complicated supervised learning models and how to use them to solve problems.
Classification methods & respective evaluation
• K Nearest Neighbors
• Decision Trees
• Naive Bayes
• Stochastic Gradient Descent
• Support Vector Machines (SVM)
o Linear
o Non-linear
o Radial Basis Function
• Random Forest
• Gradient Boosting Machines
• XGBoost
Ensemble methods
• Combining models
• Bagging
• Boosting
• Voting
• Choosing best classification method
Model Tuning
• Train Test Splitting
• K-fold cross validation
• Bias-variance tradeoff
• L1 and L2 norms
• Overfitting and underfitting, illustrated with learning curves and bias-variance graphs
• Hyperparameter Tuning using Grid Search CV
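To make K-fold cross validation concrete, here is a minimal pure-Python sketch that only generates the index splits. The helper `k_fold_indices` is hypothetical, and the folds are contiguous with no shuffling. In practice a library routine would normally be used, and a grid search would repeat this loop once per hyperparameter setting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print(len(train), validation)
```

Each sample appears in exactly one validation fold, so every observation is used for both training and validation across the k rounds.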
Respective Performance measures
• Different Errors (MAE, MSE, RMSE)
• Accuracy, Confusion Matrix, Precision, Recall
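The classification measures above all derive from the four cells of a binary confusion matrix, as this sketch shows. The labels and predictions are hypothetical.

```python
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model predictions

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

confusion_matrix = [[tn, fp],
                    [fn, tp]]  # rows: actual class, columns: predicted class

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # of the predicted positives, how many were correct
recall = tp / (tp + fn)     # of the actual positives, how many were found

print(confusion_matrix, accuracy, precision, recall)
# prints [[3, 1], [1, 3]] 0.75 0.75 0.75
```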
Module 8: Supervised Machine Learning – Regression
Regression is a predictive modelling technique widely used to estimate the relationship between a dependent variable and one or more independent variables. It is used mostly in forecasting, time series modelling and finding cause-and-effect relationships between variables. This module discusses regression in detail, including its types,
usage and applicability.
Regression
• Linear Regression
• Variants of Regression
o Lasso
o Ridge
• Multiple Linear Regression
• Logistic Regression (in practice, a classification method)
• Regression Model Improvement
• Polynomial Regression
• Random Forest Regression
• Support Vector Regression
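As a minimal illustration of linear regression, the sketch below fits a straight line by ordinary least squares using the closed-form formulas for slope and intercept. The function name `fit_line` and the data points are hypothetical. The regularized variants (Lasso, Ridge) and the other models in the list are not covered by this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4]  # hypothetical independent variable
ys = [3, 5, 7, 9]  # hypothetical dependent variable (exactly y = 2x + 1)
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept
print(slope, intercept, prediction)  # prints 2.0 1.0 11.0
```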
Respective Performance measures
• Different Errors (MAE, MSE, RMSE)
• Mean Absolute Error
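The three error measures can be computed directly from their definitions, as in this sketch with hypothetical targets and predictions.

```python
import math

y_true = [3.0, 5.0, 2.0, 7.0]  # hypothetical observed values
y_pred = [2.0, 5.0, 4.0, 8.0]  # hypothetical model predictions

errors = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / len(errors)   # Mean Absolute Error
mse = sum(e ** 2 for e in errors) / len(errors)   # Mean Squared Error
rmse = math.sqrt(mse)                             # Root Mean Squared Error

print(mae, mse, round(rmse, 4))  # prints 1.0 1.5 1.2247
```

MSE and RMSE penalize large errors more heavily than MAE, and RMSE is expressed in the same units as the target.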
Module 9: Unsupervised Machine Learning
Unsupervised learning can provide powerful insights on data without the need to
annotate examples. In this module, you will learn several different techniques in
unsupervised machine learning.
Clustering
• K-means
• Hierarchical Clustering
• DBSCAN
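A compact NumPy sketch of the K-means idea: assign each point to its nearest center, move each center to the mean of its points, and repeat until the centers stop moving. The function `k_means`, its defaults and the two-blob data set are hypothetical. Production implementations add smarter initialization and multiple restarts.

```python
import numpy as np


def k_means(points, k, n_iter=20, seed=0):
    """Return (centers, labels) after at most n_iter K-means iterations."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels


# Two well-separated hypothetical blobs.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = k_means(points, k=2)
print(labels)
```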
Association Rule Mining
• Association Rule Mining
• Market Basket Analysis using Apriori Algorithm
• Dimensionality reduction using Principal Component Analysis (PCA)
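Market basket analysis rests on three rule metrics: support, confidence and lift. The sketch below computes them for one rule over a handful of hypothetical transactions. It is not the Apriori algorithm itself, which additionally prunes the search over candidate itemsets using minimum-support thresholds.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
print(rule_support, round(rule_confidence, 4), round(rule_lift, 4))
```

A lift below 1, as for bread -> milk in this toy data, suggests the two items co-occur slightly less often than independence would predict.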
Module 10: Natural Language Processing
Module 11: Advanced Analytics
Module 12: Reinforcement Learning
Module 13: Artificial Intelligence
Module 14: Deep Learning
Module 15: Cloud Computing for Data Science
Module 16: DevOps for Data Science