How to apply media query in table?

Cassandra (database): Materialized Views or Index CF, which way is better to handle 20 single column indexes to support timeline query(s) in one table?

  • Howdy, Can I ask a Cassandra data model question here about time series data and timeline query? Materialized Views or Index CF, which way is better to handle 20 single column indexes in one table? We have a book table with 20 columns, 300 million rows, average row size is 1500 bytes. create table book(  book_id,  isbn,  price,  author,  titile,  ...  col_n1,  col_n2,  ...  col_nm ); Data usage: We need to query data by each column, do pagination as below, typical timeline query, select * from book where isbn   < "XYZ" order by ISBN   descending limit 30; select * from book where price  < 992   order by price  descending limit 30; select * from book where col_n1 < 789   order by col_n1 descending limit 30; select * from book where col_n2 < "MUJ" order by col_n2 descending limit 30; ... select * from book where col_nm < 978 order by col_nm descending limit 30; Write: 100 million updates a day. Read : 16  million queries a day. 200 queries per second, one query returns 30 rows. *** Materialized Views approach {"ISBN_01",book_object1},{"ISBN_02",book_object2},...,{"ISBN_N",book_objectN} ... We will end up with 20 timelines. *** Index approach - create 2nd Column Family as Index 'ISBN_01': {'book_id_a01','book_id_a02',...,'book_id_aN'} 'ISBN_02': {'book_id_b01','book_id_b02',...,'book_id_bN'} ... 'ISBN_0m': {'book_id_m01','book_id_m02',...,'book_id_mN'} This way, we will create 20 index Column Family(s). --- If we choose Materialized Views approach, we have to update all 20 Materialized View column family(s), for each base row update. Also, Materialized Views approach will use 20 times more storage space, increase from 500GB base table size to 10TB. Will the Cassandra write performance acceptable? Redis recommend building an index for the query on each column, that is your 1st strategy - create 2nd index CF: http://redis.io/topics/data-types-intro (see section [ Pushing IDs instead of the actual data in Redis lists ] Should we just normalize the data, create base book table with book_id as primary key, and then build 20 index column family(s), use wide row column slicing approach, with index column data value as column name and book_id as value? This way, we only need to update fewer affected column family that column value changed, but not all 20 Materialized Views CF(s). Another option would be using Redis to store master book data, using Cassandra Column Family to maintain 2nd index. What will you recommend? Thanks, Charlie | DBA developer p.s. Gist from datastax dev blog ( http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra ) "  If the same event is tracked in multiple timelines,  it’s okay to denormalize and store all of the event data in each of those timelines.  One of the main principles that Cassandra was built on is that disk space is very cheap resource;  minimizing disk seeks at the cost of higher space consumption is a good tradeoff.  Unless the data for each event is ^very large^, I always prefer this strategy over the index strategy. " Will 1500 bytes row size be large or small for Cassandra from your understanding? A: 500MB is the limit. "performance degradation starts at 500MB rows, its very slow if you hit this limit." Answer from

  • Answer:

    Short answer - Use IndexCF. You will have to hit cassandra twice to get full object but thats a tradeoff between using 20 times more storage and two hits. Also a tradeof between write performance and read performance. 20 times storage space and 20 times more size of data for write is far bigger compromise with respect to saving just one hit for read.

Sarang Anajwala at Quora Visit the source

Was this solution helpful to you?

Just Added Q & A:

Find solution

For every problem there is a solution! Proved by Solucija.

  • Got an issue and looking for advice?

  • Ask Solucija to search every corner of the Web for help.

  • Get workable solutions and helpful tips in a moment.

Just ask Solucija about an issue you face and immediately get a list of ready solutions, answers and tips from other Internet users. We always provide the most suitable and complete answer to your question at the top, along with a few good alternatives below.