What does it take for a mathematics and computer science student to become a data scientist?
-
I am an undergrad student majoring in computer science and math. I have completed the core undergrad math and CS courses, passed 4 actuarial exams and interned as a software engineer. What else should I learn in each of the following categories to become a data scientist?
- Theoretical math (e.g. probability theory, measure theory)
- Applied math/statistics (e.g. PDEs)
- Theoretical computer science (e.g. theory of computation, programming language theory)
- Applied CS and software engineering (e.g. design patterns, distributed systems)
- Other (anything that does not fall into the above four categories)
-
Answer:
I studied mathematics and computer science and I'm currently working as a data scientist here at Quora. Some of the most useful classes I took in preparation for this role were statistics classes, specifically ones that take you beyond the textbook modeling scenarios of first year (i.e. linear models). If there's a good class in non-parametric statistics and a solid course in Bayesian statistics, I think these will cover some useful material. Similarly, if there's a machine learning course, it will be invaluable. Time series analysis is a really hot topic right now: most data analyses have a temporal component (i.e. the data form a time series), and if you can incorporate this into your experiment or model then you'll have an edge in the job market.

On the more pragmatic engineering side, I think a class in large-scale databases would frequently come in useful, or really any database class you can take, provided it's not taught by an awful professor. I never took one and frequently regret it.

With regard to measure theory, your chance of using it in a professional setting is incredibly slim. However, it will help you understand probability, and thus statistics, at a deeper and more rigorous level. Algorithms classes translate straight into industry and I think they are really useful; a fairly large chunk of CS theory (I'm thinking of the contents of Sipser's Introduction to the Theory of Computation) is less useful on a day-to-day basis.

Having said this, I'm quite a big believer in letting your interests shape what you study. If you think a topic is interesting and you're passionate about it, you will probably absorb a lot more of the content. I took a course in information theory; the concepts that Shannon thought up over 60 years ago pop up from time to time, and although it's hard to quantify their value in a professional setting, taking the course seems like a no-brainer in retrospect. If I hadn't found it fascinating, however, it probably would have been a waste.
Similar sentiments apply to classes like network analysis, graph theory, quantum information theory, measure-theoretic probability, group theory and number theory, which have captured my imagination and enthusiasm on many occasions. You chose well with maths/compsci; it opens up so many doors. Good luck!
Jack Rae at Quora
Other answers
Aside from taking more subjects, you might consider joining a lab or taking an internship with a company that will give you a chance to do data work. This allows you to gain real experience with your newfound academic knowledge. For example, a research group in the biology department is running experiments on the effects of substance X on a large number of plants. This produces a large amount of data that could benefit from better visualization and presentation. It also needs to be cleaned and pre-processed, and a lot of statistical tools are used. Here you can flex what you just learned, starting, of course, with the easier tasks. In the end, you get experience and the bio folk are happy.
William Emmanuel Yu
The best way to understand something is to just jump in and do it. Get involved in an open source data science project, like https://github.com/CalculatedContent/tsvm

What we are trying to do is set up a group of open source, collaborative data science / machine learning research projects that correspond to the work described on my blog, http://charlesmartin14.wordpress.com/

These projects are designed to be bona fide scientific research projects that require collecting our own data and designing experiments that systematically test specific theoretical ideas. Unlike Kaggle, which is isolated, competitive, and predefined, these projects are meant to be shared and collaborative, and they require scientific thinking to frame the problem. Each project involves:
1. mathematical understanding and advancement
2. data science experiments
3. developing some code

We are very open to having collaborators interested in doing novel data science research.
Charles H Martin
We created an infographic that summarizes the steps you can undertake to become a data scientist (all online resources). http://blog.datacamp.com/how-to-become-a-data-scientist-in-8-easy-steps-the-infographic/
Martijn Theuwissen