Nowadays, mailboxes, ads and messages are flooded with “Become a Data Scientist in 3 months, 6 months” offers. Are they really producing data scientists in 3 or 6 months? What magic wand do they have that turns people into data scientists in 3 months? My answer: it is very difficult to become a data scientist in 3 months; however, one can understand analytics at a very high level in that time.
My Journey
I started my journey in data analytics 6-7 years back. I did not start it out of interest in the field but to survive in the software industry (I was a hard-core mainframe/iSeries guy who had worked on the black screen for more than 12 years). There were very few opportunities in my technology, which forced me to look for different avenues. So I chose Data Analytics and Machine Learning. The hype that social media created around it made my decision easier.
I too thought that within 3 months I would learn data analytics and could consider myself a veteran in analytics. I picked a well-known book to understand the concepts behind machine learning algorithms. I completed 100 pages in 2 days; however, the problems started when the words turned into mathematics, algorithms and statistics. My pace slowed down and I could hardly read 5 pages a day. I became disheartened and unsure whether I had taken the right decision in pursuing analytics. I kept reading a single algorithm for weeks to understand it but could not make any progress.
So I introspected and realized that analytics was not a 20-20 cricket match but a 5-day test match. I re-planned my strategy and started from scratch, though I had already lost a precious 1-2 months out of the 3 months in which I was planning to become a data scientist.
The Real Journey Started
Now I realized that analytics needs a lot of effort, patience and time. I started looking into the various aspects required to become a data scientist and became aware of them. I understood that I needed to know the following:
- Visualization concepts
- Algorithms
- Probability
- Discrete mathematics
- Statistics and database concepts
- I was an outlier for all these concepts. I took up statistics and mathematics books and started understanding them (I thought that if I had understood those concepts in school, I would probably not be struggling). I completed my school books in 1 month.
- Next, I started studying “An Introduction to Statistical Learning: With Applications in R”. Ah! A very nicely written book (though I did not understand the R part), and also a very thick one, more than 800 pages. I took my time and completed it over the next 3-4 months.
I had planned to complete the whole of data analytics or data science in 3 months, yet it took me that long to complete just one book, which is only a preliminary step for data science. By now, I had started liking analytics.
- Now that I was clear on statistical concepts, I had to understand vectors, matrices, calculus and probability. I read about these concepts online, and one of my friends was doing a Masters in Mathematics, so I learned the mathematical concepts from him. It took almost 2-3 months.
By now, I was able to understand and comprehend what data scientists say about analytics. However, there were many terms I could not understand, so I decided to look up their details.
- I got hold of the book “Data Mining: Concepts and Techniques”. I read it over the next 3 months and understood the concepts I was lacking.
Now I was able to understand what data scientists were talking about. But there was still a gap: I needed to write code and implement algorithms. That brought another BIG challenge: what to choose, R or Python or something else?
I read about both languages and, after understanding R and Python, finally decided to go for Python. There is nothing wrong with R; I chose Python as my language for analytics out of sheer interest.
- By now, I had completed a year in data science but hadn’t created a single program or algorithm. So I took up a Python book to understand the concepts of the language. My first book was “Head First Python”. I read it in the next 1 month and then started reading “Fluent Python”. This book took me 5-6 months to complete.
So I had spent almost 1.5 years in analytics only studying and creating small programs.
So I finally decided to create a machine learning program in Python. It took 15 days to create a simple logistic regression program. Then I realized that there were many packages and libraries that could be used for data manipulation, so I started understanding them too.
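As an illustration of what a simple logistic regression program can look like, here is a minimal sketch in plain Python with no libraries. It is not the original program from this journey. The toy data (hours studied vs. pass/fail) is made up, and the function names `standardize`, `sigmoid`, `train_logistic_regression` and `predict` are hypothetical names chosen for this example.

```python
import math


def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1.

    Assumes no column is constant (a constant column would divide by zero).
    """
    n = len(X)
    n_features = len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(n_features)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
        for j in range(n_features)
    ]
    scaled = [
        [(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in X
    ]
    return scaled, means, stds


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=1000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)
            error = p - yi  # derivative of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= threshold else 0
        for xi in X
    ]


# Made-up toy data: hours studied -> passed (1) or failed (0).
hours = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

X_scaled, means, stds = standardize(hours)
weights, bias = train_logistic_regression(X_scaled, passed)
predictions = predict(X_scaled, weights, bias)
print(predictions)
```

Standardizing the feature first is a design choice in this sketch: it keeps gradient descent well behaved with a fixed learning rate. New inputs would have to be scaled with the same `means` and `stds` before calling `predict`.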
- I studied NumPy, pandas, Matplotlib, scikit-learn, SciPy and other libraries. Understanding these libraries took me around 1-2 months.
After that, I could create a simple machine learning program within a few hours.
Still, something was missing from the whole picture. After some introspection, I realized I needed to do something about visualization.
- I started learning Tableau (I chose it randomly; one can choose other tools too). It took 1 month to learn.
After 2-2.5 years of learning, I thought I knew everything in data science. However, my dream was shattered when people started talking about deep learning and reinforcement learning. At that point, I realized I had to study daily and keep understanding new concepts.
- I got the book on Deep Learning by Ian Goodfellow. Wow! Such a great book; it has to be on the bookshelf. I read it over the next 3-4 months.
Tasks Completed | Time Taken |
School books (statistics and mathematics) | 1 month |
An Introduction to Statistical Learning: With Applications in R | 3-4 months |
Vectors, matrices, calculus and probability | 2-3 months |
Data Mining: Concepts and Techniques | 3 months |
Head First Python | 2-3 months |
Fluent Python | 3-4 months |
NumPy, pandas, Matplotlib, scikit-learn, SciPy and other libraries | 2 months |
Tableau | 1 month |
Deep Learning by Ian Goodfellow | 3-4 months |
Along the journey, I also started my Masters course in Business Analytics.
So, when I started my journey, I knew iSeries. Now I know Python, Machine Learning, Tableau, Deep Learning, Salesforce, AWS, Statistics, RapidMiner, etc.
So, Data Science is a journey. I am still learning, and I feel that I know only 5% of it. Long way to go!