Cassandra (database): Materialized Views or Index CF, which way is better to handle 20 single column indexes to support timeline query(s) in one table?
-
Howdy, can I ask a Cassandra data model question here about time series data and timeline queries? Materialized views or index CFs: which is the better way to handle 20 single-column indexes on one table?

We have a book table with 20 columns and 300 million rows; the average row size is 1500 bytes.

    create table book(
        book_id, isbn, price, author, title, ...
        col_n1, col_n2, ... col_nm
    );

Data usage: we need to query the data by each column and paginate as below (a typical timeline query):

    select * from book where isbn < "XYZ" order by ISBN descending limit 30;
    select * from book where price < 992 order by price descending limit 30;
    select * from book where col_n1 < 789 order by col_n1 descending limit 30;
    select * from book where col_n2 < "MUJ" order by col_n2 descending limit 30;
    ...
    select * from book where col_nm < 978 order by col_nm descending limit 30;

Write: 100 million updates a day.
Read: 16 million queries a day, about 200 queries per second; one query returns 30 rows.

*** Materialized Views approach

    {"ISBN_01",book_object1},{"ISBN_02",book_object2},...,{"ISBN_N",book_objectN}
    ...

We will end up with 20 timelines.

*** Index approach - create a 2nd Column Family as the index

    'ISBN_01': {'book_id_a01','book_id_a02',...,'book_id_aN'}
    'ISBN_02': {'book_id_b01','book_id_b02',...,'book_id_bN'}
    ...
    'ISBN_0m': {'book_id_m01','book_id_m02',...,'book_id_mN'}

This way, we will create 20 index Column Families.

---

If we choose the Materialized Views approach, we have to update all 20 Materialized View column families for each base row update. The Materialized Views approach will also use 20 times more storage space, an increase from the 500GB base table size to 10TB. Will Cassandra's write performance be acceptable?
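To put numbers on the storage and write-load concern above, here is a rough back-of-envelope calculation in Python. It uses only the figures quoted in the question; it assumes every update rewrites all 20 views (the worst case) and ignores replication factor, compression and per-column overhead, so treat the results as orders of magnitude rather than a capacity plan. All variable names are made up for this sketch.

```python
# Back-of-envelope sizing from the figures quoted in the question.
# Assumptions: no replication, no compression, no per-column overhead.
ROWS = 300_000_000             # rows in the book table
ROW_BYTES = 1_500              # average row size in bytes
INDEXED_COLUMNS = 20           # one timeline (view) per indexed column
UPDATES_PER_DAY = 100_000_000
QUERIES_PER_DAY = 16_000_000
SECONDS_PER_DAY = 24 * 60 * 60

base_bytes = ROWS * ROW_BYTES                   # raw size of the base table
views_bytes = base_bytes * INDEXED_COLUMNS      # 20 full copies of every row

updates_per_sec = UPDATES_PER_DAY / SECONDS_PER_DAY
# Worst case for the views approach: every update rewrites all 20 copies.
view_writes_per_sec = updates_per_sec * INDEXED_COLUMNS
queries_per_sec = QUERIES_PER_DAY / SECONDS_PER_DAY

print(f"base table:        {base_bytes / 1e9:.0f} GB")
print(f"20 full views:     {views_bytes / 1e12:.1f} TB")
print(f"updates per sec:   {updates_per_sec:.0f}")
print(f"view writes/sec:   {view_writes_per_sec:.0f}")
print(f"queries per sec:   {queries_per_sec:.0f}")
```

The raw arithmetic gives 450 GB and 9 TB, which matches the rounded 500GB and 10TB figures in the question, and an average of a bit under 200 queries per second.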
Redis recommends building an index for the query on each column, which is your 1st strategy (create a 2nd, index CF): http://redis.io/topics/data-types-intro (see the section "Pushing IDs instead of the actual data in Redis lists").

Should we just normalize the data: create a base book table with book_id as the primary key, and then build 20 index column families using the wide-row column-slicing approach, with the indexed column's value as the column name and book_id as the value? This way, we only need to update the few column families whose column value changed, not all 20 Materialized View CFs.

Another option would be to use Redis to store the master book data and a Cassandra Column Family to maintain the 2nd index.

What would you recommend?

Thanks,
Charlie | DBA developer

p.s. Gist from the DataStax dev blog (http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra):

"If the same event is tracked in multiple timelines, it's okay to denormalize and store all of the event data in each of those timelines. One of the main principles that Cassandra was built on is that disk space is a very cheap resource; minimizing disk seeks at the cost of higher space consumption is a good tradeoff. Unless the data for each event is ^very large^, I always prefer this strategy over the index strategy."

Is a 1500-byte row large or small for Cassandra, from your understanding?

A: 500MB is the limit. "performance degradation starts at 500MB rows, it's very slow if you hit this limit." Answer from
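The claim that the index approach only touches the column families whose value changed can be illustrated with a toy in-memory model. This is not Cassandra client code: the `BookStore` class, its `upsert` method and the `index_writes` counter are hypothetical names, and each index CF is stood in for by a sorted Python list of `(value, book_id)` pairs.

```python
import bisect


class BookStore:
    """Toy stand-in for a base table plus one index CF per indexed column."""

    def __init__(self, indexed_columns):
        self.base = {}                                    # book_id -> column values
        self.indexes = {c: [] for c in indexed_columns}   # column -> sorted [(value, book_id)]
        self.index_writes = 0                             # number of index mutations so far

    def upsert(self, book_id, **changes):
        old = self.base.get(book_id, {})
        for column, new_value in changes.items():
            if column not in self.indexes:
                continue                                  # column is not indexed
            old_value = old.get(column)
            if old_value == new_value:
                continue                                  # value unchanged: index untouched
            entries = self.indexes[column]
            if old_value is not None:
                # Remove the stale (old_value, book_id) entry from this index.
                i = bisect.bisect_left(entries, (old_value, book_id))
                if i < len(entries) and entries[i] == (old_value, book_id):
                    entries.pop(i)
                    self.index_writes += 1
            # Insert the new entry, keeping the index sorted by value.
            bisect.insort(entries, (new_value, book_id))
            self.index_writes += 1
        self.base[book_id] = {**old, **changes}


store = BookStore(["isbn", "price"])
store.upsert("b1", isbn="ISBN_01", price=500)   # initial insert: one write per index
store.upsert("b1", price=450)                   # price change: only the price index is touched
print(store.indexes, store.index_writes)
```

In this model a price change costs one delete and one insert in the price index and nothing in the others, whereas a full-copy view per column would have to rewrite the whole book object in every view.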
-
Answer:
Short answer: use the index CF. You will have to hit Cassandra twice to get the full object, but that is the trade-off between using 20 times more storage and making two hits. It is also a trade-off between write performance and read performance: 20 times the storage space and 20 times the data written is a far bigger compromise than the one read hit it saves.
Sarang Anajwala at Quora
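As a rough illustration of the "two hits" read path described in the answer, the sketch below pages through an index and then fetches the full rows by id. It is a plain-Python model, not Cassandra client code: `page_before` and `fetch_books` are hypothetical helper names, the index is a sorted list of `(value, book_id)` pairs, and the sample data is invented. In a real cluster the first step would correspond to a column slice on the index row and the second to a multi-get on the base column family.

```python
import bisect


def page_before(index_entries, upper_bound, limit):
    """Hit 1: ids of entries with value < upper_bound, in descending value order."""
    # (upper_bound,) sorts before any (upper_bound, book_id), so this finds
    # the position of the first entry whose value is >= upper_bound.
    end = bisect.bisect_left(index_entries, (upper_bound,))
    start = max(0, end - limit)
    return [book_id for _, book_id in reversed(index_entries[start:end])]


def fetch_books(base, book_ids):
    """Hit 2: look up the full rows in the base table by primary key."""
    return [base[book_id] for book_id in book_ids]


# Invented sample data for the sketch.
base = {
    "b1": {"book_id": "b1", "price": 120},
    "b2": {"book_id": "b2", "price": 450},
    "b3": {"book_id": "b3", "price": 992},
    "b4": {"book_id": "b4", "price": 990},
}
price_index = sorted((row["price"], book_id) for book_id, row in base.items())

# Equivalent in spirit to: where price < 992 order by price descending limit 2
ids = page_before(price_index, 992, 2)
page = fetch_books(base, ids)
print(ids, [row["price"] for row in page])
```

To fetch the next page, the caller would pass the smallest value from the current page as the new `upper_bound`, which mirrors the `where col < X ... limit 30` pattern in the question.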