Data science engineering combines science, mathematics, and programming skills for working with large data sets. Its goal is to develop solutions that run efficiently on particular hardware or cloud infrastructure.
Data science problems can consume large amounts of computing resources and can take considerable time to produce a result, even on a supercomputer. So a cloud application with only a small machine learning or data analytics plugin can still tie up an entire compute core and its memory. That can block the application for a long time and produce a huge bill at the end of the month. Understanding the mathematics under the hood matters because it lets you fine-tune a mathematical formulation to fit the problem at hand and arrive at the most efficient solution possible.
In mathematics there are numerous ways to solve a problem. It is up to the mathematician to find the best one, ideally a solution that also works for other problems with the same pattern.
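As a small illustration of how the choice of formulation affects cost, the sketch below sums the integers 1 through n in two ways. The first is a loop whose running time grows linearly with n (O(n)). The second is Gauss's closed-form formula n(n + 1)/2, which runs in constant time. The function names and the value of n are arbitrary choices for this example, and the measured times will vary by machine.

```python
import time


def sum_loop(n: int) -> int:
    """O(n): add the integers 1..n one at a time."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_closed_form(n: int) -> int:
    """O(1): Gauss's formula n(n + 1) / 2 gives the same result."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    n = 1_000_000  # arbitrary problem size for the timing comparison

    start = time.perf_counter()
    a = sum_loop(n)
    loop_seconds = time.perf_counter() - start

    start = time.perf_counter()
    b = sum_closed_form(n)
    closed_seconds = time.perf_counter() - start

    assert a == b  # both formulations agree
    print(f"loop: {loop_seconds:.4f}s, closed form: {closed_seconds:.6f}s")
```

Both functions return the same number. The closed form costs the same no matter how large n gets, while the loop does not.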
As mentioned earlier, data science engineering combines several technologies with science. Efficiency is therefore spread across layers of technology. What matters most, however, is the design of the solution itself, and that comes from mathematical modeling.
The following is a list of mathematical theories, techniques, and methods widely used in data science, together with their application scenarios.

| Topic | Commonly Used Theories | Applications |
|---|---|---|
| Functions, Variables, Equations, and Graphs | Linear regression; estimates of error; dimensionality reduction techniques | Hugely useful for compact representation of linear transformations on data (dimensionality reduction techniques) |
| Calculus | Functions of a single variable: limits, continuity, and differentiability; mean value theorems, indeterminate forms, and L'Hôpital's rule; maxima and minima; product and chain rules; Taylor series, infinite series summation/integration concepts; fundamental and mean value theorems of integral calculus, evaluation of definite and improper integrals; Beta and Gamma functions; functions of multiple variables: limits, continuity, partial derivatives; basics of ordinary and partial differential equations (not too advanced) | Anywhere there is a rate of change in data, curves, or signals |
| Discrete Math | Sets, subsets, power sets; counting functions, combinatorics, countability; basic proof techniques: induction, proof by contradiction; basics of inductive, deductive, and propositional logic; basic data structures: stacks, queues, graphs, arrays, hash tables, trees; graph properties: connected components, degree, maximum flow/minimum cut, graph coloring; recurrence relations and equations; growth of functions and big-O notation | Classification problems, data predictions, data structure design, cryptography, encryption, decryption |
| Numerical Optimization and Operations Research | Maxima, minima, convex functions, global solutions; linear programming, simplex algorithm; constraint programming, knapsack problem | Supply chain management |

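To make one of the table entries concrete, here is a minimal sketch of the 0/1 knapsack problem solved with dynamic programming. It also shows a recurrence relation from the discrete-math row at work. Dynamic programming is only one way to solve the knapsack problem; the constraint programming named in the table is another. The function name `knapsack_01` and the item data are hypothetical, chosen for illustration.

```python
def knapsack_01(values, weights, capacity):
    """Best total value of items that fit in `capacity`, each item used at most once.

    best[c] holds the best value achievable with total weight <= c using the
    items processed so far. For each item (v, w) the recurrence is
        best[c] = max(best[c], best[c - w] + v)
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Walk capacities downward so each item is counted at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


if __name__ == "__main__":
    # Hypothetical example data: three items and a knapsack of capacity 50.
    values = [60, 100, 120]
    weights = [10, 20, 30]
    print(knapsack_01(values, weights, 50))  # → 220
```

The table uses O(capacity) memory and the running time is O(number of items × capacity). This is efficient for modest integer capacities but not for very large ones.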
Fortunately for data science engineering, most of these mathematical techniques are already implemented in libraries such as NumPy, SciPy, and scikit-learn. Knowing the mathematics helps you select the right library and make the best use of it.
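Knowing that linear regression is a least-squares problem, for example, tells you which NumPy routine to reach for. The minimal sketch below fits a straight line with `numpy.linalg.lstsq`. The data are made-up, noise-free points on y = 2x + 1, chosen only so that the expected result is easy to check.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones, so the model is y = slope * x + intercept.
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution of A @ [slope, intercept] = y.
(slope, intercept), residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)
print(slope, intercept)  # prints values very close to 2.0 and 1.0
```

With real, noisy data the same call returns the line that minimizes the sum of squared errors.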
In these articles we will cover how various mathematical techniques and concepts are used in real-world data science scenarios. It is easiest to experiment with the concepts in software like Octave and MATLAB, but we will take a more realistic approach and hand-code them in Python with its libraries.
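In the spirit of hand-coding, here is a minimal sketch that evaluates e^x from its Taylor (Maclaurin) series, one of the calculus topics in the table. It compares the result with Python's built-in `math.exp`. The function name `exp_taylor` and the default of 30 terms are illustrative choices, not a prescribed method. For large |x|, a production implementation would use range reduction rather than simply adding more terms.

```python
import math


def exp_taylor(x: float, terms: int = 30) -> float:
    """Approximate e**x by summing the first `terms` terms of sum_k x**k / k!."""
    total = 0.0
    term = 1.0  # the k = 0 term, x**0 / 0!
    for k in range(terms):
        total += term
        # Turn x**k / k! into x**(k+1) / (k+1)! without computing factorials.
        term *= x / (k + 1)
    return total


if __name__ == "__main__":
    for x in (0.0, 1.0, -2.5, 5.0):
        approx = exp_taylor(x)
        print(f"x={x:5.1f}  taylor={approx:.12f}  math.exp={math.exp(x):.12f}")
```

Updating each term from the previous one avoids computing large powers and factorials separately, which keeps the loop cheap and numerically stable for moderate x.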