How to do grouping of analytics data?

What are the best technologies for big data analytics on streaming data?

  • Much of the analytics world is focused on data warehouses. However, there is a whole world of machine-generated data: network elements such as routers, switches and probes, or M2M data from smart meters, phones, truck engines, retail registers or stock transactions. Doing analytics on these massive volumes of data without storing them may be the next generation, and the holy grail, of analytics.

  • Answer:

    Performing complex analytics in real time on massive amounts of data in motion is what IBM's InfoSphere Streams product was specifically designed to do. It was developed over many years in cooperation with the US government and is now available as a mature and well-supported product.

Jim Sharpe at Quora

Other answers

Acunu takes the approach that it's valuable to store the data as well as provide rapid answers. It is building analytics tools that combine historical data with freshly gathered data at sub-second latency. It's a Big Data database which will support a set of queries similar to those of CEP/streaming systems. Preview release here: http://www.acunu.com/blogs/andy-twigg/acunu-analytics-preview/ You might also like to check out the various CEP databases, and distributed processing frameworks like Storm. (Disclaimer: I work at Acunu.)

Tim Moreton

There are three pioneering technologies in stream processing: Storm (backed by Twitter), Shark (backed by AMPLab) and Samza (backed by LinkedIn). We have worked with several customers to develop solutions around Apache Spark/Shark technology and found it extremely stable when processing huge volumes of streams. Each technology has its own shortcomings, but Spark seems to have struck a good balance between reliability and throughput.

Mayur Rustagi

Realtime analytics, or what people call realtime analytics, has two flavors. Realtime streaming analytics: static queries are given once and do not change; they process data as it comes in, without storing it. CEP engines (e.g. WSO2 CEP), Apache Storm, Apache Samza, etc. are examples of this. Realtime interactive/ad-hoc analytics: users issue ad-hoc dynamic queries and the system responds. Druid, SAP HANA, VoltDB, MemSQL and Apache Drill are examples of this. See http://www.kdnuggets.com/2015/03/sql-query-language-realtime-streaming-analytics.html for more details.

Srinath Perera
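To illustrate the first flavor described in the answer above (a static query given once and applied to each event as it arrives), here is a minimal Python sketch. It is not tied to any of the products named; the function name `high_reading_alerts`, the event fields `device` and `value`, and the threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def high_reading_alerts(
    events: Iterable[Dict], threshold: float
) -> Iterator[Tuple[str, int]]:
    """Standing query: fixed once, applied to each event as it arrives.

    Only a per-device counter is kept; the events themselves are not stored.
    """
    alerts_per_device = defaultdict(int)
    for event in events:
        if event["value"] > threshold:
            alerts_per_device[event["device"]] += 1
            yield event["device"], alerts_per_device[event["device"]]


# Hypothetical smart-meter readings standing in for a live stream.
readings = [
    {"device": "meter-1", "value": 3.0},
    {"device": "meter-2", "value": 12.5},
    {"device": "meter-1", "value": 15.0},
    {"device": "meter-2", "value": 11.0},
]
alerts = list(high_reading_alerts(readings, threshold=10.0))
```

Because the function is a generator, it would work the same way over an unbounded source; an ad-hoc system, by contrast, has to keep the data around so that queries not known in advance can be answered later.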

I use Sclera, which supports streaming analytics. Sclera can ingest M2M stream data and evaluate SQL on it, without storing the data wherever possible (storing is unavoidable, for example, when you need to sort the data). In addition, you can do pattern matching (event detection), parsing and machine learning on the streaming data. Details at http://www.scleradb.com

Radha Kulkarni
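The point in the answer above, that some queries can be evaluated without storing the stream while sorting cannot, can be sketched in plain Python. This is not Sclera's implementation; the class `RunningStats` is a hypothetical illustration of aggregates comparable to COUNT, AVG and MAX computed in constant memory.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Aggregates that need only a fixed amount of state per stream."""

    count: int = 0
    mean: float = 0.0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Incremental mean: no past values are needed, only count and mean.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        self.maximum = max(self.maximum, value)


stats = RunningStats()
for reading in (4.0, 8.0, 6.0):  # hypothetical meter readings
    stats.update(reading)
```

A sort has no equivalent constant-size state: the smallest element could be the last one to arrive, so every value has to be buffered before the first output row is known.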

If you're looking to set up your own platform, then technologies already mentioned, such as Storm and Shark, are the typical path. However, there's an increasing trend to offload that work to the cloud, and the cloud vendors are growing an impressive suite of services to handle stream ingestion, processing and analytics. In fact, there's been a fierce fight over these offerings among the major vendors, primarily Google, Amazon and Microsoft. Google offers Pub/Sub, Dataflow, etc.; Amazon has Kinesis, Lambda and Elastic MapReduce; Microsoft has Event Hubs and the recently GAed Azure Stream Analytics. These offer a shorter path to building and scaling a solution in an economical way. * https://horovits.wordpress.com/2015/04/16/google-amazon-fight-big-data-in-cloud-heating-up/ * https://horovits.wordpress.com/2015/04/20/microsoft-launch-big-data-cloud-service-stream-analytics/

Dotan Horovits

It's hard to say which is best, but I guess Twitter's Summingbird is one of the more advanced technologies out there. For more, check out: https://github.com/onurakpolat/awesome-bigdata

Onur Akpolat

Check out StreamLab (http://www.sqlstream.com/blaze/streamlab/, video: http://vimeo.com/107917677). From our research, no one else is doing anything even close to this. StreamLab enables business analysts and data scientists to explore and visualize machine data streams in real time. It offers a graphical stream browser for the interactive exploration of machine data (http://www.sqlstream.com/products/what-is-machine-data/), with built-in real-time dashboards for visualizing streaming data and analytics. No SQL or Java coding is required; all streaming data interactions are supported through the GUI.

Andrew Bare

I like Esper: http://esper.codehaus.org/ Especially its capability to handle time based sliding windows...

Sajal Kayan
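The time-based sliding windows mentioned in the Esper answer above can be sketched as follows. This is plain Python rather than Esper's own query language, and the class `SlidingTimeWindow` is a hypothetical name; the sketch assumes events arrive in non-decreasing timestamp order.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingTimeWindow:
    """Running sum over the events of the last `length_seconds` seconds."""

    def __init__(self, length_seconds: float) -> None:
        self.length = length_seconds
        self.events: Deque[Tuple[float, float]] = deque()
        self.total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Add an event and return the sum of values in the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.length:
            _, expired_value = self.events.popleft()
            self.total -= expired_value
        return self.total


window = SlidingTimeWindow(length_seconds=10)
sums = [window.add(t, v) for t, v in [(0, 1.0), (4, 2.0), (9, 3.0), (12, 4.0)]]
```

Only the events inside the window are held in memory, so the state is bounded by the event rate times the window length rather than by the total size of the stream.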
