How to import Excel data into Hadoop for ad hoc querying?

I have a workbook with data spanning 1999–2011 in separate sheets, each sheet holding data for 12 months, broken down by day. I would like to do ad hoc querying on this data. How do I import all of it into Hadoop for analysis?

  • I pretty much understand that we can do some data cleansing and convert it into dimensions and aggregates for analysis. I'm curious what the approach would be if this data is considered unstructured, since the goal is ad hoc querying, e.g., querying the data for a specific year, month, or date.

  • Answer:

    Is this data in an Excel file? It sounds like it is. I would further guess that you have fewer than 65,536 rows and 256 columns (I believe those were the limits in older file versions). If so, Hadoop is massive overkill. Create a simple little SQL table for it, or just use Excel formulas or VBA. That should be a lot easier.

Andrew Comstock at Quora
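The SQL-table route above can be sketched with Python's built-in `sqlite3` module. This assumes the sheets have already been exported to rows of `(date, value)`; the table name, column names, and sample rows here are illustrative, not from the question.

```python
# Minimal sketch: load spreadsheet rows into SQLite for ad hoc date queries.
# Table/column names and the sample rows are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (day TEXT, value REAL)")
rows = [("1999-01-01", 10.0), ("1999-01-02", 12.5), ("2011-12-31", 7.2)]
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Ad hoc query: everything from a specific month of a specific year.
jan_1999 = conn.execute(
    "SELECT day, value FROM measurements "
    "WHERE day BETWEEN '1999-01-01' AND '1999-01-31'"
).fetchall()
print(jan_1999)
```

Storing dates as ISO-8601 strings keeps them sortable, so year/month/day filters reduce to simple `BETWEEN` or `LIKE '1999-01-%'` predicates.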


Other answers

  • Put this data as it is into Hadoop.

  • Use Apache Pig to clean the data and store it back into Hadoop in a specific folder structure, like years/months/days/.

  • Create an external table in Hive with partitions on years/months/days.

  • Using Hive or Impala, you can now query this data.

Amar Parkash
