What could be a possible timeline for improving my data science skills in 1-2 years?
-
I am just graduating and will be working as a data scientist at a startup. I don't think I'll have more than 15-20 hrs/week to spare for career development. I want to improve my data science skills over the next 1 or 2 years. My current skills are listed below. What skills should I be learning in this time?

Programming and Software
- Advanced knowledge: C/C++, Java
- Good knowledge: Python, Shell
- Statistical packages (good knowledge): R, Octave/MATLAB
- Big data (basic, but would improve on the job): Hadoop, HBase, Hive and Pig
- Cloud (basic, but would improve on the job): Amazon AWS
- Databases (good knowledge): MySQL
- Operating systems (good knowledge): Linux

Maths/Stats/Algo/ML/IR
- Algorithms: basic and advanced algorithms taught in the CS undergraduate curriculum
- Stats and OR: basic understanding
- ML: basic concepts, plus an understanding of NN, SVM, BN and HMM
- IR/NLP: basics (n-grams, TF-IDF, etc.)
- Maths: linear algebra, including concepts like matrix decomposition, and optimization methods like gradient descent and ALS
-
Answer:
This question has been modified, and the following answer was only intended for the original question. Sorry, but it will be impossible to become an expert in 1 year. After 1 year you will probably still be a beginner, and after 2 years you might be at the intermediate level. The skills you listed are a good start, but that is all they are. Having graduated myself in 2012 and now working as a data scientist, I can say that being skilled in CS and math is the bare minimum to even be qualified for a data scientist position. Here are some reasons why advancement in data science comes slowly:

- Data science/Big data is extremely new. If you aren't working at a company with a well-established tech/dev team, chances are your company does not have the infrastructure necessary to do any advanced data analysis beyond simple things like group-by-and-count. You'll find that things that should be logged, and that are needed for data analysis, are not being logged. Consequently, it'll take time to build the datasets you want for analysis.
- Like software dev, data science projects require lots and lots of iterations. Unlike software dev, these iterations will not happen quickly. You'll find mistakes in your analysis, or find that your conclusions don't apply to certain types of people. These problems are easy to fix, as they are just changes in your algorithms. The hard ones to fix are when you realize your data is bad, or that you need to log more data. In these situations you are at the mercy of time until a new dataset is constructed.
- Data science is a lot like research, and while it uses lots of CS, it is not like engineering. In engineering, you plan out how things are supposed to work, you build them according to your designs, you break them and you redesign them. Throughout this process you always have an idea of how to move forward: A/B/C need to have a common interface, X/Y/Z have these bugs, etc. In data science you are not operating on your own plans; you are trying to identify other people's patterns, and I guarantee that there are big chunks of people who behave in ways you would totally not expect. You will start with an exploratory phase where you are just trying to understand the data and some high-level patterns in how others behave, and only when these patterns are identified will you be able to come up with a strategy for a real analysis. The exploratory phase might be repeated multiple times as you pull data from different sources (some of which will be helpful and some won't), which just adds more time before the real analysis gets started.
- Sometimes you will study a problem for a very long time and then conclude that it can't be solved (like those problems you worked on in grad school :( ). In some problems, the best you can do might literally be the group-by-and-count analysis. The algorithms coming out of research might not apply to you; you'll tinker with them to see if you can rederive them for your situation, stumble, screw up some math and then give up.

All of this takes time to process. A single analysis can easily take 3-6 months or more. Having just graduated, you will probably be like me: your first project takes 6 months, and maybe your second project takes only 3. In 1 year you might get a total of 3 projects done. Since data science is a unique interdisciplinary field, you will only have scratched the surface of what data science is with those 3 projects, and hence I would say you're still going to be a beginner.
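For readers new to the term, a "group-by-and-count" analysis just groups records by some key and counts how many fall into each group. A minimal Python sketch follows; the event log, its field layout, and the helper name `group_by_and_count` are hypothetical and invented purely for illustration:

```python
from collections import Counter

# Hypothetical event log: (user_id, event_type) pairs, invented for illustration.
events = [
    ("u1", "signup"),
    ("u2", "signup"),
    ("u1", "purchase"),
    ("u3", "signup"),
    ("u2", "purchase"),
    ("u1", "purchase"),
]


def group_by_and_count(records, key_index):
    """Hypothetical helper: group records by the field at key_index and count each group."""
    return Counter(record[key_index] for record in records)


# How many times did each kind of event happen?
counts_by_event = group_by_and_count(events, key_index=1)
# How active was each user?
counts_by_user = group_by_and_count(events, key_index=0)

for event_type, n in counts_by_event.most_common():
    print(f"{event_type}: {n}")
```

The same idea is `SELECT event_type, COUNT(*) FROM events GROUP BY event_type` in SQL, or `df.groupby("event_type").size()` in pandas.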
Jeffrey Wong at Quora
Other answers
Read this; it's the best post around on this topic.
Ferris Jumah
This is a great foundation. I think the most important thing is to apply your knowledge to real problems. Try to find opportunities to do that in your new role.
Raphael Cendrillon
An expert is almost by definition someone who's spent more than 1-2 years working at something. Keep striving to be better, but plan on spending the next 8-10 years at it. Then you'll be an expert. Edit: The original question asked how to become an expert in data science within 1-2 years. Since the question was edited after I answered it, my answer is no longer relevant, but I'm leaving it as is.
Justin Rising
You mentioned you'd have 15 to 20 hours per week for career development. You're fortunate; that is a wealth of time if used properly. I would recommend several things to accelerate your skills over the next one to two years, listed in order of priority but done in parallel:

1) Go to the UCI Machine Learning Repository (Google that title) and try lots of the problems. Set a goal to produce reasonable results on 100 data sets in two years. At first you'll take longer than the requisite one week per data set, but you'll get faster as you learn and as you develop your own tools to automate some tasks. Start with regressions in Excel, both as a baseline and to learn the limitations most people face, then focus on R and any add-ons you'd like to try.
2) Get a mentor who'll sit with you for lunch two or three times a month. This mentor should be a data scientist, a statistician, or an engineer with significant machine learning experience. Talk about problems and what it takes to solve them. Show your mentor your UCI project results and your methods, and get feedback.
3) Read like hell: academic machine learning papers, Wikipedia pages on concepts and phrases you haven't heard, and books on everything reasonably related to the theory and practice of data science.
4) Take online courses in related subjects.

Move fast; don't get caught up in making tiny incremental improvements or trying to beat published results. Just do this large number of projects on UCI's relatively small data sets so that you're exposed to a wide variety of problems. Forget the "big" in "big data" until later. Right now you need to build your intuitive feel for solving problems, and for how the various machine learning and statistical methods differ in both approach and results. Don't get caught up in academic theory, religious adherence to a single method, or the latest shiny new thing. Just solve problems and pay attention to how different measures of success have implications for the practical use of the results.
For the next two years, minimize the number of algorithms and approaches that you develop yourself; the only development you should do is writing scripts for pre-processing data and automating repetitive tasks. After the two years, you'll know whether you should invent and develop something. Be patient; build your foundation first. And always write yourself a little problem statement at the beginning of each project, and a statement at the end about how the result could benefit someone. Then calculate the magnitude of the benefit, and how many people worldwide would experience that benefit, versus standard regression. As a mental exercise, ask yourself how much you could charge for that benefit if people bought your model. This process will help you stay focused on the practical aspects of your skills. It'll also help you learn to define success metrics properly for any data set. Do this for two years and you will advance your data science skills faster than 99% of new practitioners, and you will have a killer body of work as an addendum to your resume. You've made a great career choice. Now outrun your peers. Good luck.
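The advice to start with a regression baseline, and to compare results against "standard regression" under different measures of success, can be sketched in a few lines. This is a minimal illustration, not the answerer's method: it uses Python instead of Excel or R, an invented toy data set in place of a real UCI one, a naive "predict the training mean" model as the weakest baseline, and RMSE and MAE as two example success metrics. All data values and function names here are hypothetical.

```python
import math

# Hypothetical toy data (x, y), invented for illustration. In practice this
# would be one of the UCI data sets, split into training and test rows.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
test = [(6.0, 12.2), (7.0, 13.8)]


def fit_simple_linear_regression(rows):
    """Ordinary least squares for y = intercept + slope * x with a single feature."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: treats every unit of error equally."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


intercept, slope = fit_simple_linear_regression(train)
train_mean = sum(y for _, y in train) / len(train)

actual = [y for _, y in test]
regression_pred = [intercept + slope * x for x, _ in test]
mean_pred = [train_mean] * len(test)  # naive "always predict the average" baseline

for name, pred in [("mean baseline", mean_pred), ("linear regression", regression_pred)]:
    print(f"{name}: RMSE={rmse(actual, pred):.3f} MAE={mae(actual, pred):.3f}")
```

Reporting more than one metric side by side is the point of the exercise: a fancier model only earns its keep if it beats this kind of baseline on the metric that actually matters for the people using the results.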
Patrick Lilley
A little shell programming is all you need to get started. Get involved in an open source data science project, like https://github.com/CalculatedContent/tsvm. What we are trying to do is set up a group of open source, collaborative data science / machine learning research projects that correspond to the work described on my blog, http://charlesmartin14.wordpress.com/. These projects are designed as bona fide scientific research projects that require collecting our own data and designing experiments that systematically test specific theoretical ideas. Unlike Kaggle, which is isolated, competitive, and predefined, these projects are meant to be shared and collaborative, and they require scientific thinking to frame the problem. Each project involves:

1. mathematical understanding and advancement
2. data science experiments
3. developing some code

We are very open to having collaborators interested in doing novel data science research.
Charles H Martin
I answered a relevant question on Quora.
Murthy Kolluru
I've got this link on my to-do list: https://github.com/datasciencemasters/go. It seems to be a very up-to-date, community-run curriculum for data science. I would suggest you scope it out, as it may help redefine your approach to what is possible in the given time frame. The people behind it are very friendly and engaging.
Alex Wilkes