How to present large dataset from a SQL Server query?

SQL Server (product): Why are columnstore indexes better than ordinary rowstore indexes for big data queries?

  • Columnstore indexes can be created in SQL Server 2012, and their query performance is better than that of an ordinary rowstore index. Could you explain why they are so efficient?

  • Answer:

    The performance of a columnstore index is not necessarily better than that of a regular index. It's different. There are workloads where a columnstore index performs dramatically better than a regular index, and others where it performs no better, or even worse. Columnstore indexes do not replace regular indexes.

    Columnstore indexes work best when you are processing large sets of data, usually in some type of aggregation or grouping. If you're looking for individual rows, the regular index is going to perform much better. The principal aim of the columnstore index is to support data warehousing and data mart needs, where you're running reports against your data and need it grouped in interesting ways that cut across the grain of a standard relational storage schema.

    Microsoft documentation provides a good introduction here: http://msdn.microsoft.com/en-us/library/gg492088(v=SQL.110).aspx
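
The contrast between aggregating and fetching individual rows can be sketched in a few lines of Python. This is only a toy model, not how SQL Server stores data: the table, its column names, and the "values touched" counters are all hypothetical, and real engines work with pages, compression, and B-trees rather than Python lists.

```python
# Hypothetical four-column sales table; every name and value here is made up.
COLUMNS = ["order_id", "customer", "region", "amount"]
ROWS = [
    (1, "alice", "EU", 120.0),
    (2, "bob", "US", 75.5),
    (3, "carol", "EU", 210.0),
    (4, "dave", "APAC", 42.0),
]

# Row layout: all fields of one record are stored together.
row_store = list(ROWS)

# Column layout: all values of one column are stored together.
column_store = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}


def sum_amount_row_store():
    """SUM(amount) over the row layout; returns (total, values touched)."""
    idx = COLUMNS.index("amount")
    total, touched = 0.0, 0
    for row in row_store:
        touched += len(row)  # the whole record is read to reach one field
        total += row[idx]
    return total, touched


def sum_amount_column_store():
    """SUM(amount) over the column layout; only one column is read."""
    amounts = column_store["amount"]
    return sum(amounts), len(amounts)


def fetch_row_row_store(position):
    """Fetch one full record; returns (record, separate locations visited)."""
    return row_store[position], 1


def fetch_row_column_store(position):
    """Rebuild one full record by visiting every column separately."""
    record = tuple(column_store[name][position] for name in COLUMNS)
    return record, len(COLUMNS)


print(sum_amount_row_store(), sum_amount_column_store())
print(fetch_row_row_store(2), fetch_row_column_store(2))
```

In this toy model the aggregate touches 16 values in the row layout but only 4 in the column layout, while fetching a single full record takes one access in the row layout and four in the column layout.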

Grant Fritchey at Quora

Other answers

As Grant mentioned in his answer, columnstore indexes are not better than rowstore indexes in all cases. They should be implemented when you need to improve read speeds over large data sets. They are typically used in data warehousing environments where star schemas are implemented and aggregations are performed regularly. Column-based storage is faster for reads because it reduces total I/O by reducing the amount of data that needs to be read. This is achieved in two ways. Firstly, most queries do not need to return all columns. Row-based storage has to read every column in a row to return the required ones, whereas column-based storage only has to read the required columns. Secondly, the data in each column is ordered and compressed, reducing the data footprint. The general rule is that the lower the cardinality of a column (the fewer distinct values it holds), the more compression you get. This is because sequences of duplicated values are replaced by one value and a count of the occurrences of that value. All of this comes at a cost though. In SQL Server 2012, columnstore indexes are read-only. In other column-based relational databases, update performance is generally slower than in row-based databases.
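
A minimal Python sketch of the idea of replacing a sequence of duplicated values with one value and a count (run-length encoding). It is an illustration only, not SQL Server's actual compression algorithm, and the `region` and `order_id` columns are hypothetical sample data.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical low-cardinality column: 12 rows, only 3 distinct values.
region = ["EU", "US", "EU", "APAC", "US", "EU",
          "US", "EU", "APAC", "EU", "US", "EU"]

# Hypothetical high-cardinality column: 12 rows, every value distinct.
order_id = list(range(1000, 1012))

unsorted_runs = run_length_encode(region)        # no adjacent duplicates here
sorted_runs = run_length_encode(sorted(region))  # ordering creates long runs
order_id_runs = run_length_encode(order_id)      # nothing to collapse

print(len(unsorted_runs), len(sorted_runs), len(order_id_runs))
print(sorted_runs)
```

Once ordered, the three-value column collapses from 12 entries to 3 (value, count) pairs, while the column of distinct values still needs 12. Repetition within a column is what run-length encoding exploits.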

Jaco Els

Adding to Grant's answer: you are not currently able to update a columnstore index. The documentation Grant linked to provides more details on that, but it is certainly a factor to consider when thinking of deploying columnstore indexes.

Thomas LaRock

Previous answers essentially state that column store indexes _only_ reduce disk I/O. I was previously under this impression myself. However, while that is certainly the case, there is another benefit that should not be overlooked. Much more of the highly compressed column index can fit in memory. So you are effectively reducing I/O twice - first you retrieve much less data, second you have to go to disk for the data much less often. This effect is exactly why Microsoft has started referring to xVelocity (as these indexes are now called) as an "in memory" technology. See this link for example: "Introducing xVelocity in-memory technologies in SQL Server 2012" http://blogs.technet.com/b/dataplatforminsider/archive/2012/03/08/introducing-xvelocity-in-memory-technologies-in-sql-server-2012-for-10-100x-performance.aspx

Joe Harris

A column store database stores table data as sections of columns rather than as rows. All the values for one column are stored together, and the sets of values for different columns may be stored in different places. The column values that belong to the same row are linked internally by pointers.

When a query runs against a row store database, it fetches the data from all columns of the rows that match the query conditions. When the same query runs against a column store database, only the columns the query requires are read. So if a table has 30 columns and your query needs only five of them (a query often needs less than 15% of the columns in a typical fact table, according to a Microsoft whitepaper), the column store reads about one sixth of the data, for roughly a 6x reduction in I/O before compression is even considered (provided the columns are of similar size). Because disk access is still the slowest operation a computer performs, this is significant: far fewer pages have to be read from disk. The columns that the query does not use are never loaded into memory at all, which helps avoid frequent drops in page life expectancy (PLE) when dealing with large amounts of data.

Most column store databases also compress the data they store heavily. The very high compression ratios result from the redundancy usually found within a column. The values in a column are also invariably of the same type, unlike the values in a row, which allows even more compression. Most column store systems use more than one compression scheme, chosen by the type of values stored in the column. For instance, Vertica uses run-length encoding for columns whose values are highly repetitive. These encoding schemes let the database compress each column separately based on its underlying values, which a row store cannot do because each row holds values of different types.

So if column store databases are so good, why do we use row store databases at all? And why do row store databases rule the market if they are so inept by comparison? The point is that row store databases have their own advantages. A column store database is a poor choice for a small table with hundreds of OLTP transactions per minute. A row store will be much faster than a column store if you want all the columns of the table in your result set, because in a column store the columns are stored at different places on disk and require multiple disk seeks.

In a nutshell, there are specific use cases for column store databases. Their rise has a lot to do with the rise of BI and big data. Generally, use a column store where a traditional row-based database fails to deliver satisfactory performance. Usually this is when you run analytic queries on very large data sets, or when you need only a few columns of a wide table (a fact table in a star schema fits the description perfectly). Row store databases struggle there because they were not designed for that purpose.
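
The arithmetic behind the 30-column, five-column example can be written out as a short Python sketch. The row count and the fixed 8-byte value width are assumptions chosen for illustration, and the model deliberately ignores compression and the extra seeks a column store needs to reassemble full rows.

```python
# All sizes are illustrative assumptions, not measurements of any real system.
NUM_ROWS = 1_000_000
NUM_COLUMNS = 30
BYTES_PER_VALUE = 8  # assume every column has the same fixed width


def row_store_bytes_scanned(columns_needed: int) -> int:
    """A row store scans whole rows, so columns_needed does not change the cost."""
    return NUM_ROWS * NUM_COLUMNS * BYTES_PER_VALUE


def column_store_bytes_scanned(columns_needed: int) -> int:
    """A column store scans only the columns the query asks for."""
    return NUM_ROWS * columns_needed * BYTES_PER_VALUE


def scan_ratio(columns_needed: int) -> float:
    """How many times more bytes the row store scans than the column store."""
    return row_store_bytes_scanned(columns_needed) / column_store_bytes_scanned(columns_needed)


print(scan_ratio(5))   # query needs 5 of the 30 columns
print(scan_ratio(30))  # query needs every column
```

With five of 30 columns the row store scans six times as many bytes. When every column is needed the ratio falls to 1, and since this model leaves out the per-column seeks, a real column store would likely do worse than the row store in that case.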

Kautuk Pandey

An update to some of the older answers: the SQL Server 2014 release has columnstore indexes that can be updated: http://msdn.microsoft.com/en-us/library/gg492088.aspx

Victor Di Leo

Check this Channel9 Deep Dive video on ColumnStore Indexes by Sunil Agarwal, PM, SQL Server. http://channel9.msdn.com/events/TechEd/NorthAmerica/2014/DBI-B411

Manoj Pandey
