#### Course Curriculum

**Module 1: Foundations of Data Science**

Description: Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. This first module introduces the field of Data Science and how it relates to neighbouring fields such as Artificial Intelligence, Machine Learning and Deep Learning.

• Introduction to Data Science

• High-level view of Data Science, Artificial Intelligence & Machine Learning

• Subtle differences between Data Science, Machine Learning & Artificial Intelligence

• Approaches to Machine Learning

• Terms & Terminologies of Data Science

• Understanding an end-to-end Data Science Pipeline and Implementation Cycle

**Module 2: Math for Data Science, Machine Learning and Artificial Intelligence**

Description: Mathematics is central to data science: mathematical concepts help identify patterns in data and underpin the design of algorithms. An understanding of Statistics and Probability Theory is key to implementing such algorithms in data science.

• Linear Algebra

• Matrices, Matrix Operations

• Eigenvalues, Eigenvectors

• Scalars, Vectors and Tensors

• Prior and Posterior Probability

• Conditional Probability

• Calculus

• Differentiation, Gradient and Cost Functions

• Graph Theory
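
The link between differentiation, gradients and cost functions can be sketched in a few lines. The example below is a minimal illustration, assuming a one-parameter model `y = w * x` with a mean squared error cost; the function names, learning rate and toy data are illustrative choices, not part of the course material.

```python
def cost(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cost_gradient(w, xs, ys):
    """Derivative of the cost with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Repeatedly move w a small step against the gradient, starting at 0."""
    w = 0.0
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w, xs, ys)
    return w


# Toy data generated from y = 3x (illustrative values only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w_fit = gradient_descent(xs, ys)
print(w_fit, cost(w_fit, xs, ys))
```

The fitted `w_fit` should approach 3, the slope used to generate the data, while the cost approaches zero.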

**Module 3: Statistics for Data Science**

Description: This module focuses on understanding statistical concepts required for Data Science, Machine Learning and Deep Learning. In this module, you will be introduced to the estimation of various statistical measures of a data set, simulating random distributions, performing hypothesis testing, and building statistical models.

**Descriptive Statistics**

• Types of Data (Discrete vs Continuous)

• Types of Data (Nominal, Ordinal)

• Measures of Central Tendency (Mean, Median, Mode)

• Measures of Dispersion (Variance, Standard Deviation)

• Range, Quartiles, Interquartile Range

• Measures of Shape (Skewness and Kurtosis)

• Tests for Association (Correlation and Regression)

• Random Variables

• Probability Distributions

• Standard Normal Distribution

• Probability Density Function

• Probability Mass Function

• Cumulative Distribution Function
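
Several of the descriptive measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on a small made-up data set; it uses the population variants of variance and standard deviation, and the default quartile method of `statistics.quantiles`, both of which are choices made for this illustration.

```python
import statistics

# Small illustrative data set (values are made up).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(data)        # 5
median = statistics.median(data)    # 4.5
mode = statistics.mode(data)        # 4

# Measures of dispersion (population versions).
pop_variance = statistics.pvariance(data)   # 4
pop_std = statistics.pstdev(data)           # 2.0
value_range = max(data) - min(data)         # 7

# Quartiles and interquartile range.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, median, mode, pop_variance, pop_std, value_range, iqr)
```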

**Inferential Statistics**

• Statistical sampling & Inference

• Hypothesis Testing

• Null and Alternative Hypotheses

• Margin of Error

• Type I and Type II errors

• One-Sided Hypothesis Test, Two-Sided Hypothesis Test

• Tests of Inference: Chi-Square, T-test, Analysis of Variance

• t-value and p-value

• Confidence Intervals
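
The ideas of a null hypothesis, p-value, margin of error and confidence interval can be illustrated with a two-sided z-test. The sketch below is a simplification: it assumes the population standard deviation is known, whereas the t-test named above estimates it from the sample and uses the t distribution. The function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean


def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: population mean == mu0, assuming a known sigma."""
    standard_error = sigma / math.sqrt(len(sample))
    sample_mean = mean(sample)
    z = (sample_mean - mu0) / standard_error
    # Probability of a result at least this extreme in either tail under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Margin of error and confidence interval at level 1 - alpha.
    margin = NormalDist().inv_cdf(1 - alpha / 2) * standard_error
    interval = (sample_mean - margin, sample_mean + margin)
    return z, p_value, interval, p_value < alpha


# Illustrative measurements; H0 claims the true mean is 5.0.
sample = [5.1, 4.9, 5.6, 5.8, 5.4, 5.2, 5.7, 5.5, 5.3]
z, p_value, interval, reject_h0 = two_sided_z_test(sample, mu0=5.0, sigma=0.3)
print(z, p_value, interval, reject_h0)
```

Rejecting H0 when it is actually true would be a Type I error; `alpha` is the probability of that error the test is willing to accept.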

**Module 4: Python for Data Science**

Python for Data Science

• NumPy

• Pandas

• Matplotlib & Seaborn

• Jupyter Notebook

**NumPy**

NumPy is a Python library for array-based scientific computing. Explore how to initialize and load data into arrays and learn about basic array manipulation operations using NumPy.

• Loading data with NumPy

• Comparing NumPy with Traditional Lists

• NumPy Data Types

• Indexing and Slicing

• Copies and Views

• Numerical Operations with NumPy

• Matrix Operations on NumPy Arrays

• Aggregation functions

• Shape Manipulations

• Broadcasting

• Statistical operations using NumPy

• Resize, Reshape, Ravel

• Image Processing with NumPy
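
A few of the NumPy topics above (reshaping, slicing, views versus copies, broadcasting and aggregation) fit in one short example. This is a minimal sketch with made-up values, assuming NumPy is installed and imported as `np`.

```python
import numpy as np

a = np.arange(12)        # 1-D array holding 0, 1, ..., 11
m = a.reshape(3, 4)      # 3x4 array that shares memory with `a`

# Indexing and slicing: second row and last column.
second_row = m[1]
last_col = m[:, -1]

# Views vs copies: a basic slice is a view, .copy() is independent.
view = m[0, :2]
copy = m[0, :2].copy()
view[0] = 100            # also changes m[0, 0] and a[0]; `copy` is unaffected

# Broadcasting: the length-4 vector of column means is applied to every row.
centered = m - m.mean(axis=0)

# Aggregation and flattening.
row_sums = m.sum(axis=1)
flat = m.ravel()

print(m, centered, row_sums, sep="\n")
```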

**Pandas**

Pandas is a Python library that provides utilities to deal with structured data stored in the form of rows and columns. Discover how to work with series and tabular data, including initialization, population, and manipulation of Pandas Series and DataFrames.

• Basics of Pandas

• Loading data with Pandas

• Series

• Operations on Series

• DataFrames and Operations on DataFrames

• Selection and Slicing of DataFrames

• Descriptive statistics with Pandas

• Map, Apply, Iterations on Pandas DataFrame

• Working with text data

• Multi Index in Pandas

• GroupBy Functions

• Merging, Joining and Concatenating DataFrames

• Visualization using Pandas
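
Selection, `apply`, GroupBy and merging from the list above can be sketched on two small DataFrames. The table names, column names and values here are hypothetical, and the example assumes pandas is installed and imported as `pd`.

```python
import pandas as pd

# Hypothetical order and customer tables.
orders = pd.DataFrame({
    "customer": ["ana", "ben", "ana", "cy", "ben"],
    "amount": [20.0, 35.0, 15.0, 50.0, 5.0],
})
customers = pd.DataFrame({
    "customer": ["ana", "ben", "cy"],
    "city": ["Lima", "Oslo", "Pune"],
})

# Selection: a boolean mask keeps rows with amount above 10.
large = orders[orders["amount"] > 10]

# Apply: derive a new column from an existing one.
orders["amount_with_tax"] = orders["amount"].apply(lambda x: round(x * 1.1, 2))

# GroupBy: total and mean amount per customer.
per_customer = orders.groupby("customer")["amount"].agg(["sum", "mean"])

# Merge: attach each customer's city to every order (left join on "customer").
merged = orders.merge(customers, on="customer", how="left")

print(per_customer)
print(merged)
```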

**Data Visualization using Matplotlib**

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+.

• Anatomy of Matplotlib figure

• Plotting Line plots with labels and colors

• Adding markers to line plots

• Histogram plots

• Scatter plots

• Size, Color and Shape selection in Scatter plots

• Applying Legend to Scatter plots

• Displaying multiple plots using subplots

• Boxplots, scatter_matrix and Pair plots
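
The object-oriented API and several of the plot types above can be combined in one figure. This is a minimal sketch, assuming Matplotlib is installed; it selects the non-interactive `Agg` backend and renders to an in-memory buffer so that it runs without a display, and the data are made up.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

# One figure with three subplots side by side.
fig, (ax_line, ax_scatter, ax_hist) = plt.subplots(1, 3, figsize=(12, 3.5))

# Line plot with color, markers, axis labels and a legend.
ax_line.plot(x, y, color="tab:blue", marker="o", label="y = x squared")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.legend()

# Scatter plot with per-point size, color taken from y, and a marker shape.
ax_scatter.scatter(x, y, s=[20 + 10 * v for v in x], c=y, marker="^", label="samples")
ax_scatter.legend()

# Histogram of the y values.
ax_hist.hist(y, bins=5)
ax_hist.set_title("Histogram of y")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png")  # render to memory instead of a file
```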

**Data Visualization using Seaborn**

Seaborn is a data visualization library that provides a high-level interface for drawing graphs. These graphs are able to convey a lot of information, while also being visually appealing.

• Basic Plotting using Seaborn

• Violin Plots

• Box Plots

• Cat Plots

• Facet Grid

• Swarm Plot

• Pair Plot

• Bar Plot

• LM Plot

• Variations in LM plot using hue, markers, row and col

**Module 5: Exploratory Data Analysis**

Exploratory Data Analysis (EDA) helps identify patterns in the data using basic statistical methods and visualization tools that display graphs and charts. With EDA we can assess the distribution of the data and decide which models to use.

**Pipeline ideas**

• Exploratory Data Analysis

• Feature Creation

• Evaluation Measures

**Data Analytics Cycle ideas**

• Data Acquisition

• Data Preparation

o Data cleaning

o Data Visualization

o Plotting

• Model Planning & Model Building

**Data Inputting**

• Reading and writing data to text files

• Reading data from a csv

• Reading data from JSON

**Data preparation**

• Selection and Removal of Columns

• Transform

• Rescale

• Standardize

• Normalize

• Binarize

• One-Hot Encoding

• Imputing

• Train, Test Splitting
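
The preparation steps above can each be written as a line or two of NumPy. This is a minimal sketch under stated assumptions: "rescale" is taken to mean min-max scaling per column, "normalize" to mean scaling each row to unit length, and "binarize" to mean thresholding at the column mean. Other definitions are in use, and the data are made up.

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

# Rescale: map each column to the range [0, 1].
X_rescaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardize: zero mean and unit standard deviation per column.
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Normalize: scale each row to unit Euclidean length.
X_normalized = X / np.linalg.norm(X, axis=1, keepdims=True)

# Binarize: 1 where a value exceeds its column mean, else 0.
X_binary = (X > X.mean(axis=0)).astype(int)

# One-hot encoding: one indicator column per distinct category.
colors = np.array(["red", "green", "red", "blue"])
categories, codes = np.unique(colors, return_inverse=True)
one_hot = np.eye(len(categories), dtype=int)[codes]

# Imputing: replace missing values (NaN) with the mean of the observed ones.
with_gaps = np.array([1.0, np.nan, 3.0])
imputed = np.where(np.isnan(with_gaps), np.nanmean(with_gaps), with_gaps)

# Train/test split: shuffle the row indices, then hold out 25% of the rows.
rng = np.random.default_rng(seed=0)
indices = rng.permutation(len(X))
n_test = len(X) // 4
X_test, X_train = X[indices[:n_test]], X[indices[n_test:]]
```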

**Module 6: Machine Learning**

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. This module is a deep dive into supervised and unsupervised learning and Gaussian / Naive Bayes methods. You will also be exposed to different classification, clustering and regression methods.

• Introduction to Machine Learning

• Applications of Machine Learning

• Supervised Machine Learning

o Classification

o Regression

• Unsupervised Machine Learning

• Reinforcement Learning

• Latest advances in Machine Learning

• Model Representation

• Model Evaluation

• Hyperparameter tuning of Machine Learning Models

• Evaluation of ML Models

• Estimation and Prediction with Machine Learning Models

• Deployment strategy of ML Models

**Module 7: Supervised Machine Learning – Classification**

Supervised learning is one of the most popular techniques in machine learning. In this module, you will learn about more complicated supervised learning models and how to use them to solve problems.

**Classification methods & respective evaluation**

• K Nearest Neighbors

• Decision Trees

• Naive Bayes

• Stochastic Gradient Descent

• SVM

o Linear

o Non-linear

o Radial Basis Function

• Random Forest

• Gradient Boosting Machines

• XGBoost

**Ensemble methods**

• Combining models

• Bagging

• Boosting

• Voting

• Choosing best classification method

**Model Tuning**

• Train Test Splitting

• K-fold cross validation

• Bias-variance tradeoff

• L1 and L2 norm

• Overfitting and underfitting, with learning curves and bias-variance sensitivity shown using graphs

• Hyper Parameter Tuning using Grid Search CV
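
K-fold cross validation, as listed above, splits the data into k parts and uses each part once for validation while training on the rest. The sketch below only generates the index splits; it is a minimal illustration with a hypothetical helper name, and it does not shuffle the data, which a practical setup usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model would be fitted on each `train` split and scored on the matching `validation` split, and the k scores averaged.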

**Respective Performance measures**

• Different Errors (MAE, MSE, RMSE)

• Accuracy, Confusion Matrix, Precision, Recall
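
Accuracy, precision and recall all derive from the four cells of a binary confusion matrix. This is a minimal sketch for the two-class case, with hypothetical function names and made-up labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Illustrative true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are correct
recall = tp / (tp + fn)                      # share of actual positives that are found
print(accuracy, precision, recall)
```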

**Module 8: Supervised Machine Learning – Regression**

Regression is a predictive modelling technique that is heavily used to derive the relationship between dependent and independent variables. It is used mostly in forecasting, time series modelling and finding cause-and-effect relationships between variables. This module discusses regression in detail, including its types, usage and applicability.

**Regression**

• Linear Regression

• Variants of Regression

o Lasso

o Ridge

• Multiple Linear Regression

• Logistic Regression (effectively, classification only)

• Regression Model Improvement

• Polynomial Regression

• Random Forest Regression

• Support Vector Regression

**Respective Performance measures**

• Different Errors (MAE, MSE, RMSE)

• Mean Absolute Error
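
Simple linear regression and the error measures above can be sketched without any library. The example fits a line by ordinary least squares using the closed-form solution for one input variable, then computes MAE, MSE and RMSE on the same points; the helper names and data are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired true and predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return mae, mse, math.sqrt(mse)


# Illustrative data lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 3.9, 6.1, 8.0, 9.8]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
mae, mse, rmse = regression_errors(ys, predictions)
print(slope, intercept, mae, mse, rmse)
```

RMSE is in the same units as the target and penalizes large residuals more heavily than MAE, so RMSE is never smaller than MAE.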

**Module 9: Unsupervised Machine Learning**

Unsupervised learning can provide powerful insights into data without the need to annotate examples. In this module, you will learn several different techniques in unsupervised machine learning.

**Clustering**

• K-means

• Hierarchical Clustering

• DBSCAN
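
K-means, the first clustering method listed, alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. This is a minimal NumPy sketch of that loop, assuming random initialization from the data and a fixed number of iterations; the function name, parameters and data are illustrative.

```python
import numpy as np


def k_means(points, k, n_iter=20, seed=0):
    """Cluster `points` (n x d array) into k groups; return centroids and labels."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points (fancy indexing returns a copy).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels


# Two well-separated groups of illustrative 2-D points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```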

**Association Rule Mining**

• Association Rule Mining

• Market Basket Analysis using Apriori Algorithm

• Dimensionality reduction using Principal Component Analysis (PCA)
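
Market basket analysis rests on two measures, support and confidence, which the Apriori algorithm uses to keep or prune candidate itemsets. The sketch below computes only these measures on a made-up set of transactions; it is not a full Apriori implementation, and the function names are hypothetical.

```python
# Illustrative shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of `consequent` given `antecedent` is in the basket."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule "diapers implies beer" appears in 3 of 5 baskets, and 3 of the 4 baskets containing diapers also contain beer.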

**Module 10: Natural Language Processing**

**Module 11: Advanced Analytics**

**Module 12: Reinforcement Learning**

**Module 13: Artificial Intelligence**

**Module 14: Deep Learning**

**Module 15: Cloud Computing for Data Science**

**Module 16: DevOps for Data Science**