How do I get started with the basics of data analysis and visualization?
-
Please excuse the long description. I'd really appreciate it if you read it and helped me out. Thanks!

Background: I'm currently pursuing a master's in AI, and in one of the courses the assignment involved analyzing data, visualizing it, and reporting what interesting things could be interpreted or concluded from it. Having no prior experience in statistics or data analysis with R/Matlab/Excel, I really struggled with the assignment. These basic skills were expected of the students, so there was no room to take time out and learn them first. At the final presentation session, I noticed that other students had pre-processed the data and then plotted it in a variety of nice formats, using a combination of Python and RStudio libraries. I felt pathetic for not being able to do any of those things.

The question: I want a rough idea of what kinds of analysis I could do with a given dataset. If I understand that, I should be able to visualize the data for each analysis with an appropriate chart or plot, validate my ideas, and draw conclusions from the result. This would involve reformatting or pre-processing the data to fit different charts.

Please give me some pointers on getting started with the basics of data analysis and graph/chart visualization. For example, how do I work out which graph is suitable for plotting a given dataset? In which scenarios are scatter plots, heat maps, and other maps or charts useful? What can be interpreted from each of those graphs? I'm completely lost and not sure where to begin.

PS: I'm not sure I phrased this question well enough to express what I want to learn. If not, please leave a comment and I will revise it.
-
Answer:
You'll need to read… a lot. To specifically answer one concrete component of your question, this blog post by Andrew Abela (http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html) has a great "chart choice" flow chart. Other sites like FlowingData or VisualisingData (the "s" is important there) are good, modern pointers, but you should also hit up the classics by Tufte.

This is a shameless self-link, but @jayjacobs and I did a few talks this year on the basics of data analysis and visualization, and this blog post (http://rud.is/b/2013/05/15/secure360/) has the most up-to-date pointers, resources and slides from them. There's a bitly bundle in that post of data analysis & viz resources that @dseverski and I maintain.

It's not going to be an overnight journey, but it will definitely be a worthwhile one. Again, tons of reading and trial-and-error are in your future if you go down this path…
Bob Rudis at Quora
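As a rough illustration of the kind of decision a "chart choice" flow chart encodes, here is a minimal Python sketch. The function name `suggest_chart`, the four goal categories, and the specific chart names are simplifying assumptions made for this example; they are a common rule of thumb, not a transcription of Abela's chart.

```python
def suggest_chart(goal, n_variables=1, over_time=False):
    """Suggest a chart type for an analysis goal (simplified rule of thumb).

    goal: one of "comparison", "distribution", "composition", "relationship"
    n_variables: how many variables you want to look at together
    over_time: whether one axis is time
    """
    if goal == "comparison":
        # Trends over time read well as lines; categories as bars.
        return "line chart" if over_time else "bar chart"
    if goal == "distribution":
        # One variable: how values spread. Two: how they spread jointly.
        return "histogram" if n_variables == 1 else "scatter plot"
    if goal == "composition":
        # Parts of a whole, either at one moment or as they change.
        return "stacked area chart" if over_time else "stacked bar chart"
    if goal == "relationship":
        if n_variables <= 2:
            return "scatter plot"
        if n_variables == 3:
            return "bubble chart"  # third variable shown as point size
        return "heat map of pairwise correlations"
    raise ValueError(f"unknown goal: {goal!r}")


# Hypothetical questions you might ask of a dataset.
examples = [
    ("comparison", {"over_time": True}),
    ("distribution", {"n_variables": 1}),
    ("relationship", {"n_variables": 2}),
    ("relationship", {"n_variables": 6}),
]
for goal, kwargs in examples:
    print(goal, kwargs, "->", suggest_chart(goal, **kwargs))
```

The useful habit is the order of the questions: decide what you want to show (a comparison, a distribution, a composition, or a relationship) before picking a chart, then check the result against the data.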
Other answers
Well, I don't know what kind of answer you are expecting here. The only way to learn something is by doing it. I feel bad saying that because it's obviously not the answer you want to hear. I have been there. I'll try to give you a step-by-step answer, with the fair warning that I am nowhere close to being an expert here. So what I am going to do is continually improve this answer. Feel free to comment below to get the conversation going. I might be going in the wrong direction and making this too trivial for you, since you mentioned that you've already had a class on this topic.

1. Start with a sheet of paper. Make up your own data. Imagine what the scatter plot should look like, and then use Excel/R/Matlab to plot it. Look up the documentation. I know documentation can be annoying to read and it would be much easier to find a Stack Overflow answer, but trust me, it'll make life much easier. Start small: however silly it feels, it will quickly and effectively help your understanding.
2. Look up open datasets on Google. Maybe even add the type of visualization you intend to do to the search, so that you get specific types of datasets. Work on this data. Load it into Excel if it is a simple dataset that is already clean. Use R for larger datasets (not because Excel cannot handle them).
3. Learn to manipulate the data first. Try filtering it, and joining other datasets with it to give you more information. Play with it.
4. Start with graphs in Excel. Change every attribute to see what it does. Brush up on some basic statistics here (which you obviously already know).
5. Talking about favorite packages, look up and download ggplot2 for R. Or, if you are a Windows user, also get Tableau. It's amazing!
6. ...
7. Profit?

PS: I'll make changes to this if you feel it is silly and you already know how to do these things.
Rohit Sivaprasad
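The "make up your own data, then filter, join and plot it" routine described above can be sketched in a few lines. This is only an illustration, written in Python with the standard library so it runs anywhere. The dataset (hours studied vs. exam score), the group labels, and the helper names `pearson` and `text_scatter` are all invented for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the made-up data is reproducible

# Step 1: make up data. Hypothetical students whose score loosely
# depends on hours studied, plus some random noise.
students = []
for student_id in range(1, 21):
    hours = random.uniform(0, 10)
    score = 40 + 5 * hours + random.gauss(0, 5)
    students.append({"id": student_id, "hours": hours, "score": score})

# A second made-up table, to practise joining: each student's group.
groups = {sid: ("A" if sid % 2 else "B") for sid in range(1, 21)}

# Step 2: join the two tables on the student id.
joined = [{**row, "group": groups[row["id"]]} for row in students]

# Step 3: filter to one group.
group_a = [row for row in joined if row["group"] == "A"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def text_scatter(xs, ys, width=40, height=10):
    """Return a crude character-grid scatter plot (x rightward, y upward)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = int((x - x_min) / (x_max - x_min) * (width - 1))
        row = int((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Step 4: basic statistics, then look at the shape of the data.
hours = [row["hours"] for row in joined]
scores = [row["score"] for row in joined]
r = pearson(hours, scores)
plot = text_scatter(hours, scores)

print(f"mean score, group A: {statistics.mean(s['score'] for s in group_a):.1f}")
print(f"correlation between hours and score: {r:.2f}")
print(plot)
```

Because the data was generated with a known upward trend, you can predict the picture before plotting it (points rising from lower left to upper right) and check that the plot agrees. The same two lists can then be handed to a real plotting tool such as Excel, ggplot2, or matplotlib.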