How to be a Data Scientist - Full ROADMAP

 



So you chose CSE in college, and after a few years it is time to choose a specialization. While picking a specialization sounds exciting, you probably dread making the wrong decision: should you choose web development, data science, game design, or something else like cyber security?


You have heard everywhere that "data is the new oil" or that "the 21st century runs on a data-driven economy". The buzzwords you hear are ML, AI, and deep learning, which have brought exciting fields and technological advancements such as image recognition.


But you don't know much about data science, and that is understandable; after all, it isn't talked about in much depth. What does it mean to be a data scientist? What would you actually do? Although that is a topic for another post, we shall briefly address these doubts and then look at how to become one.


 -------What does it mean to be a data scientist?

Being a data scientist means being able to gather data, clean and format it, and apply statistics, mathematics, and algorithms to extract sharp, actionable insights. Those insights then have to be presented through data visualization so that people outside the field can easily understand them.

Now, I know: this definition is kind of a buzzkill!
Don't worry. In practice, it means a data scientist spends most of their time gathering data, cleaning it, analyzing it, and presenting what it shows.




Skills required to be a data scientist include:

Mathematics-
Any aspiring data scientist must have a strong understanding of mathematics, which is a prerequisite for most engineering fields anyway.
ML techniques like regression and classification are built on mathematics, so topics such as linear algebra and calculus are also required.
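To see where linear algebra and calculus show up, here is a minimal sketch of linear regression solved as a least-squares problem with NumPy. The data is a hypothetical toy example. Setting the derivative (calculus) of the squared error to zero gives the normal equations, and `np.linalg.lstsq` solves that system (linear algebra) for us.

```python
import numpy as np

# Hypothetical toy data: hours studied (x) vs. exam score (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Design matrix with a column of ones so the model also learns an intercept.
X = np.column_stack([np.ones_like(x), x])

# Solve min ||X w - y||^2. lstsq is numerically safer than inverting X^T X.
w, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = w
print(f"score ~ {intercept:.2f} + {slope:.2f} * hours")  # score ~ 47.70 + 4.10 * hours
```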

Statistics & Probability-
This branch of mathematics is vast enough to be considered a major subject on its own. A basic understanding of the statistics taught in high school is enough to get started.

Understanding statistics is crucial because it is used heavily in data analytics. Probability is also needed, and probability distributions (both discrete and continuous) are especially important.
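A small sketch using only Python's standard library shows descriptive statistics plus one discrete and one continuous distribution. The sign-up numbers are hypothetical, and the helper `binomial_pmf` is written here for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample: daily website sign-ups over one week.
signups = [12, 15, 11, 18, 14, 16, 12]
print("mean:", mean(signups))
print("sample std dev:", round(stdev(signups), 2))

# Discrete distribution: binomial probability of exactly k successes in n trials.
def binomial_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance of exactly 3 heads in 5 fair coin flips.
print("P(3 heads in 5 flips):", binomial_pmf(3, 5, 0.5))  # 0.3125

# Continuous distribution: a normal distribution fitted to the sample.
dist = NormalDist(mu=mean(signups), sigma=stdev(signups))
print("P(signups <= 10):", round(dist.cdf(10), 3))
```

Whether a normal distribution is a good fit for count data like this is itself a statistical question; it is used here only to show a continuous distribution in action.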

Data Cleansing & Formatting-
Raw data usually has to be cleaned and formatted before it can be worked on and yield useful information.
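A minimal pandas sketch of common cleaning steps, assuming a small hypothetical dataset with typical problems: inconsistent text, numbers stored as strings, missing values, and duplicates. Note that normalizing the text first is what turns "delhi " into a detectable duplicate of "Delhi".

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems.
raw = pd.DataFrame({
    "city": ["Delhi", "delhi ", "Mumbai", "Mumbai", None],
    "temp_c": ["31", "31", "29.5", "29.5", "27"],
    "humidity": [40, 40, np.nan, np.nan, 70],
})

clean = raw.copy()
# Normalize text: strip whitespace and use consistent capitalization.
clean["city"] = clean["city"].str.strip().str.title()
# Convert numeric strings to floats; unparseable entries would become NaN.
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")
# Drop rows missing the key column, remove exact duplicates, renumber rows.
clean = clean.dropna(subset=["city"]).drop_duplicates().reset_index(drop=True)
# Fill remaining gaps in humidity with the column median.
clean["humidity"] = clean["humidity"].fillna(clean["humidity"].median())
print(clean)
```

Filling with the median is just one choice; depending on the data, dropping rows or using domain-specific defaults may be more appropriate.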

Data Visualization tools like Tableau, Excel VBA & Power BI-
Any analysis by a data scientist, whether experienced or a rookie, has to be shown in a format that is easy to understand for someone who isn't well versed in data science, so that it can be pitched to senior management.
For this, visualization tools like Tableau, Power BI, and Excel (with VBA) provide the necessary features, so a basic understanding and know-how of them is a prerequisite.

Python Fundamentals-
Python goes hand in hand with data science. It is the industry standard and the top choice of most data scientists because it is easy to use and has regularly updated, well-maintained libraries, some of which are built specifically for data science.
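Before reaching for libraries, it helps to be comfortable with core Python. This sketch reads CSV data and totals sales per region using only the standard library; the sales data is hypothetical and kept in memory with `io.StringIO` so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sales data; in practice this might come from open("sales.csv").
data = io.StringIO(
    "region,amount\n"
    "North,120\n"
    "South,80\n"
    "North,60\n"
)

# Core Python only: read rows, convert types, and aggregate with a dict.
totals = defaultdict(float)
for row in csv.DictReader(data):
    totals[row["region"]] += float(row["amount"])

# Sort regions by total sales, highest first.
ranking = sorted(totals.items(), key=lambda item: item[1], reverse=True)
print(ranking)  # [('North', 180.0), ('South', 80.0)]
```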

Python Libraries including NumPy, Pandas, and Matplotlib-
Premade libraries like these are part of what makes Python so alluring. These libraries, and others like them, are used everywhere in the field because they make work much easier: NumPy arrays save time with fast numerical operations, pandas is great for building and manipulating datasets, and Matplotlib is used for plotting graphs.
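Here is a rough sketch of the three libraries working together on hypothetical revenue figures: NumPy computes values for a whole array at once, pandas puts them into a labeled table, and Matplotlib draws a chart. The file name `revenue.png` is an arbitrary choice for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: vectorized math on a whole array, no explicit Python loop.
months = np.arange(1, 7)
revenue = 100 * 1.1 ** months  # hypothetical 10% month-on-month growth

# pandas: a labeled, tabular dataset built from those arrays.
df = pd.DataFrame({"month": months, "revenue": revenue.round(1)})
df["change_pct"] = df["revenue"].pct_change() * 100
print(df)

# Matplotlib: a simple line chart of the same data, saved to disk.
fig, ax = plt.subplots()
ax.plot(df["month"], df["revenue"], marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
ax.set_title("Monthly revenue (toy data)")
fig.savefig("revenue.png")
plt.close(fig)
```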

Scikit-learn-
Scikit-learn provides algorithms for supervised, semi-supervised, and unsupervised learning, along with tools for model selection, evaluation, dimensionality reduction, model inspection (such as partial dependence), and data pre-processing. It is an industry standard and a must for ML practitioners.
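A minimal scikit-learn sketch that touches several of these areas at once: a train/test split (model selection), scaling (pre-processing), a supervised classifier, and an accuracy score (evaluation). It uses the iris dataset bundled with scikit-learn; the choice of logistic regression is just one reasonable option.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The iris dataset ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows to evaluate on data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Pre-processing (scaling) and a supervised classifier chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and model in a pipeline means the scaling is learned only from the training split, which avoids leaking test data into pre-processing.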

Domain Knowledge-
This is the knowledge of the field that you are going to apply data analysis in.
For example, if you work at ONGC, you will need a basic understanding of oil and gas to make actionable assessments in data analytics and quality assessment.

Communication, Soft and interpersonal Skills-
Although this is a broad term, the idea behind communication, soft, and interpersonal skills is simple: without them, whatever insights you have will go unconsidered by senior management at your company.

It also includes being able to simplify your findings and reduce their complexity, so that laymen and others are drawn to your assertions and assumptions.

Database Management(DBMS)-
Any data work, whether cleaning, formatting, or analyzing already-clean data, can only start after the data has been gathered in a productive, working format. The format that has become the industry standard in data science is the database. With the advent of big data and complex analysis using ML, AI, and DL, large amounts of data must be kept in an orderly manner, which has made databases pretty much a necessity for big corporations and MNCs.
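One way a database keeps data orderly is by enforcing a schema. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `customers` table and its columns are hypothetical. The `UNIQUE` and `NOT NULL` constraints make the database itself reject bad rows.

```python
import sqlite3

# An in-memory SQLite database; large organizations typically use a server DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        country TEXT NOT NULL
    )
    """
)
conn.execute(
    "INSERT INTO customers (email, country) VALUES (?, ?)", ("a@example.com", "IN")
)

# The schema rejects a duplicate email, keeping the stored data consistent.
try:
    conn.execute(
        "INSERT INTO customers (email, country) VALUES (?, ?)", ("a@example.com", "US")
    )
except sqlite3.IntegrityError as err:
    print("rejected:", err)

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("rows stored:", count)  # rows stored: 1
```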

SQL(Structured Query Language)-
SQL is a language used to create, delete, and manipulate data in a database. Most top-level MNCs and corporations that have built fast, efficient database management systems generally use SQL in them.
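The core SQL operations mentioned above (create, insert, update, delete, and query) can be tried without installing anything, again via Python's built-in `sqlite3`. The `orders` table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table.
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Insert: add rows (hypothetical sample data).
cur.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0), ("East", 0.0)],
)

# Update and delete: manipulate existing rows.
cur.execute("UPDATE orders SET amount = amount * 1.1 WHERE region = 'South'")
cur.execute("DELETE FROM orders WHERE amount = 0")

# Query: total per region, largest total first.
rows = cur.execute(
    "SELECT region, SUM(amount) AS total FROM orders "
    "GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)
conn.close()
```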

Machine Learning-
Machine learning is the study of algorithms that improve automatically by taking in data, so that they can make assertions, predictions, or assumptions based on a data model without being explicitly programmed by a human.
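To make "improves automatically by taking in data" concrete, here is a sketch of gradient descent learning a straight line from noisy, synthetic data. Nobody writes the rule `y = 3x + 2` into the model; it starts at zero and nudges its parameters to reduce the error on each pass. The learning rate and step count are illustrative values chosen for this toy data.

```python
import numpy as np

# Synthetic data generated from y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 0.5, size=100)

# Start with a model that knows nothing: weight and bias at zero.
w, b = 0.0, 0.0
learning_rate = 0.01

for step in range(2000):
    y_pred = w * x + b
    error = y_pred - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Move the parameters a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned: y ~ {w:.2f}x + {b:.2f}")
```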

Deep Learning-
A subfield of machine learning, deep learning is an advanced area that uses neural networks and is a great asset for an ML and AI career.
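In practice, deep learning is done with frameworks such as PyTorch or TensorFlow, but a tiny network written in plain NumPy shows what a neural network actually does. This sketch trains a network with one hidden layer on XOR, a problem a single linear model cannot solve, using backpropagation. The layer size, learning rate, and epoch count are illustrative choices, not tuned values.

```python
import numpy as np

# XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1 = rng.normal(0, 1, size=(2, 8))  # input -> hidden weights
b1 = np.zeros((1, 8))
W2 = rng.normal(0, 1, size=(8, 1))  # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

learning_rate = 1.0
losses = []
for epoch in range(5000):
    # Forward pass: tanh hidden layer, sigmoid output probability.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))

    # Backward pass (backpropagation) for binary cross-entropy.
    dz2 = (p - y) / len(X)       # gradient w.r.t. output logits
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - h**2)  # chain rule through tanh
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent update.
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

predictions = (p > 0.5).astype(int).ravel()
print("predictions:", predictions)  # ideally [0 1 1 0]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```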

