Understanding the Mathematics behind Data Science Engineering

Data science engineering combines science, mathematics, large data sets, and the programming skills needed to work with them, with the goal of building a solution that runs efficiently on particular hardware or cloud infrastructure.

Data science problems consume large amounts of computing resources and can take considerable time to produce a result, even on a supercomputer. In a cloud application, this means that even a small machine learning or data analytics plugin can be enough to occupy every compute core and all available memory, lock up the application for a long time, and leave you with a large bill at the end of the month. Understanding the mathematics under the hood matters because it lets you tune a mathematical formulation to fit the problem at hand and arrive at the most efficient solution possible.

In mathematics, there are numerous ways to solve a problem, and it is up to the mathematician to find the solution that works best for all problems that share the same pattern.

As mentioned earlier, data science engineering combines several technologies and sciences, so efficiency is spread across layers of technology. The most important layer, however, is the design of the solution itself, which comes from mathematical modeling.

The following is a list of mathematical theories, techniques, and methods widely used in data science, along with their application scenarios.

| Topic | Commonly Used Theories | Applications |
| --- | --- | --- |
| Functions, Variables, Equations, and Graphs | Linear regression; cost functions | Data plotting; pattern identification; forecasting |
| Statistics | Probability; statistical inference; validation; estimates of error; confidence intervals | Prediction; pattern recognition |
| Linear Algebra | Matrices; vectors; tensors; dimensionality reduction techniques | Hugely useful for compact representation of linear transformations on data; dimensionality reduction techniques |
| Calculus | Differentials; integrals; partial derivatives; partial integrals; multivariate calculus; integral transforms; functions of a single variable, limit, continuity, and differentiability; mean value theorems, indeterminate forms, and L’Hôpital’s rule; maxima and minima; product and chain rule; Taylor’s series, infinite series summation/integration concepts; fundamental and mean value theorems of integral calculus, evaluation of definite and improper integrals; Beta and Gamma functions; functions of multiple variables, limit, continuity, partial derivatives; basics of ordinary and partial differential equations (not too advanced) | Anywhere there is a rate of change of data, curves, or signals |
| Discrete Math | Sets, subsets, power sets; counting functions, combinatorics, countability; basic proof techniques: induction, proof by contradiction; basics of inductive, deductive, and propositional logic; basic data structures: stacks, queues, graphs, arrays, hash tables, trees; graph properties: connected components, degree, maximum flow/minimum cut concepts, graph coloring; recurrence relations and equations; growth of functions and Big-O notation | Classification problems; data predictions; data structure design; cryptography; encryption; decryption |
| Numerical Optimization and Operations Research | Maxima, minima, convex function, global solution; linear programming, simplex algorithm; integer programming; constraint programming, knapsack problem | Logistics; supply chain management; scaling; optimization |
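
Several of the topics listed above meet in even the simplest model. As a rough illustration only, the sketch below fits a straight line by minimizing a mean squared error cost function with gradient descent: the data lives in a matrix and a vector (linear algebra), and each update follows the partial derivatives of the cost (calculus). The function names `mse_cost` and `gradient_descent`, the learning rate, the step count, and the toy data are all assumptions made for this example, not something prescribed by the list.

```python
import numpy as np


def mse_cost(X, y, w):
    """Mean squared error cost J(w) = (1 / 2m) * ||Xw - y||^2."""
    residual = X @ w - y
    return float(residual @ residual) / (2 * len(y))


def gradient_descent(X, y, lr=0.1, steps=2000):
    """Minimize the MSE cost by repeatedly stepping against its gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Vector of partial derivatives dJ/dw_j, computed for all j at once.
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])  # bias column plus feature column

w = gradient_descent(X, y)
print("intercept, slope:", w)
print("final cost:", mse_cost(X, y, w))
```

The learning rate here was chosen by hand for this small data set; a rate that is too large makes the updates diverge, which is one place where knowing the underlying mathematics pays off.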

 

Fortunately for data science engineering, almost all of these mathematical formulations have already been implemented in libraries such as NumPy, SciPy, and scikit-learn. Knowing the mathematics helps in selecting the right library and making the best use of it.
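
As a small, hedged illustration of that point: fitting a line by least squares is a linear algebra problem, so it can be handed to a direct solver. The sketch below, which assumes made-up noisy data around the line y = 1 + 2x, solves the same problem two ways with NumPy: through the normal equations with `np.linalg.solve`, and through the library routine `np.linalg.lstsq`. The variable names are chosen only for this example.

```python
import numpy as np

# Hypothetical noisy measurements around the line y = 1 + 2x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
X = np.column_stack([np.ones_like(x), x])  # bias column plus feature column

# Route 1: solve the normal equations (X^T X) w = X^T y directly.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Route 2: the library least-squares solver, which works on X itself
# and is generally the safer choice when X is poorly conditioned.
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", w_normal)
print("np.linalg.lstsq: ", w_lstsq)
```

On well-behaved data like this the two routes agree closely. Knowing that forming X^T X squares the condition number of the problem is the kind of mathematical detail that guides the choice between them.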

In our articles, we will cover the use of various mathematical techniques and concepts in real-world data science scenarios. Although it is easiest to experiment with these concepts in software like Octave and MATLAB, we will take a more realistic approach and code the examples by hand using Python and its libraries.
