What is the most efficient tree structure for creating an index for an in-memory store?
I know B-Trees are good for IO intensive indexes, but what about in-memory trees? My specific need is to create an index of "String keys" which point to specific objects in memory. This isn't language specific, but if there are recommendations for specific C++ implementations, I'd love to hear about them.
Answer:
Assuming you want a tree-like structure (i.e., you need to solve the "extended" dictionary problem), the structures available to you are balanced trees and skip lists.
A red-black tree is probably the most efficient general-purpose structure for solving the extended dictionary problem. Its invariants, enforced by rotations and recoloring, guarantee O(log n) insertion, lookup and deletion, independent of the order in which data is inserted into the tree. I believe several databases provide the option of an in-memory red-black tree index.
Skip lists are the easiest to implement and reason about. Doug Lea also built an excellent concurrent implementation of the skip list, which comes as a standard part of Java (java.util.concurrent.ConcurrentSkipListMap). They're a randomized data structure: the worst-case search time is O(n), but the expected search, insertion and deletion time is O(log n). The other drawback of skip lists is pointer overhead: every node carries a tower of forward pointers. With the usual promotion probability of 1/2 the tower averages about two pointers per node, but an implementation that preallocates a fixed-height tower pays for the maximum height on every entry. For example, 8 levels of 8-byte pointers on a 64-bit machine is an extra 64 bytes per entry. Memtables (used in BigTable, leveldb, HBase and Cassandra for in-memory indices) use skip lists.
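To make the skip list description concrete, here is a minimal single-threaded sketch in C++. It is not taken from any of the systems named above: the class name `SkipList`, the `int` values, the maximum height of 16, the fixed random seed and the promotion probability of 1/2 are all assumptions chosen for illustration. Each node allocates only as many forward pointers as its randomly chosen height.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical minimal skip list mapping string keys to int values.
// Not thread-safe; no deletion; sizes and probabilities are illustrative.
class SkipList {
public:
    SkipList() : head_(new Node{std::string(), 0, std::vector<Node*>(kMaxLevel, nullptr)}) {}
    ~SkipList() {
        for (Node* n = head_; n != nullptr;) {
            Node* next = n->next[0];
            delete n;
            n = next;
        }
    }
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;

    // Inserts a key, or overwrites the value if the key already exists.
    void insert(const std::string& key, int value) {
        // update[lvl] is the last node before `key` on level lvl.
        std::vector<Node*> update(kMaxLevel, head_);
        Node* cur = head_;
        for (int lvl = level_ - 1; lvl >= 0; --lvl) {
            while (cur->next[lvl] && cur->next[lvl]->key < key) cur = cur->next[lvl];
            update[lvl] = cur;
        }
        Node* hit = cur->next[0];
        if (hit && hit->key == key) { hit->value = value; return; }

        int height = randomHeight();
        if (height > level_) level_ = height;  // update[] already holds head_ for new levels
        Node* node = new Node{key, value, std::vector<Node*>(height, nullptr)};
        for (int lvl = 0; lvl < height; ++lvl) {
            node->next[lvl] = update[lvl]->next[lvl];
            update[lvl]->next[lvl] = node;
        }
        ++size_;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        const Node* hit = lastBefore(key)->next[0];
        return (hit && hit->key == key) ? &hit->value : nullptr;
    }

    // Ordered scan: all keys k with lo <= k < hi, in sorted order.
    std::vector<std::string> keysInRange(const std::string& lo, const std::string& hi) const {
        std::vector<std::string> out;
        for (const Node* n = lastBefore(lo)->next[0]; n && n->key < hi; n = n->next[0])
            out.push_back(n->key);
        return out;
    }

    std::size_t size() const { return size_; }

private:
    struct Node {
        std::string key;
        int value;
        std::vector<Node*> next;  // one forward pointer per level of this node's tower
    };
    static constexpr int kMaxLevel = 16;

    // Descends from the top level and returns the last node whose key is < key.
    const Node* lastBefore(const std::string& key) const {
        const Node* cur = head_;
        for (int lvl = level_ - 1; lvl >= 0; --lvl)
            while (cur->next[lvl] && cur->next[lvl]->key < key) cur = cur->next[lvl];
        return cur;
    }

    // Flips a fair coin until it comes up tails or the height cap is reached.
    int randomHeight() {
        int h = 1;
        while (h < kMaxLevel && coin_(rng_)) ++h;
        return h;
    }

    Node* head_;
    int level_ = 1;
    std::size_t size_ = 0;
    std::mt19937 rng_{42};
    std::bernoulli_distribution coin_{0.5};
};
```

The ordered scan in `keysInRange` is the kind of operation that separates this family of structures from a plain hash table; a balanced tree offers the same operation with a worst-case rather than expected bound.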
Anonymous at Quora
Other answers
Re-answering this question as I missed part of the question earlier.
I solved similar problems while working for eBay search. We needed CPU and memory efficient structures. We wrote in C, did not use STL, rarely used pointers, and implemented all data structures ourselves, counting every byte.
One solution to your problem is the following:
1. If you want O(1) access on key lookup and will always look up the entire key using an exact match (e.g. key="foo"), then consider implementing an open-addressed hashtable for your index. This hashtable does not use a chained linked list per bucket. It stores the data in the bucket array and hence avoids wasting memory on the 64-bit address pointers (if you are running on a 64-bit OS) used in linked lists.
2. Additionally, consider storing all of your objects in a large piece of contiguous memory that you control. This way, the index value will just be an unsigned integer or long representing an offset to the start of your object in your slab. Essentially, you will be implementing your own memory allocator.
If you need fancier types of lookup, like range scans or fuzzy string matching, then you would need a word graph or tree. There are a few options for those as well. For example, you can use a directed acyclic word graph. This is essentially a very memory compact prefix+suffix graph, stored as an array. It is possible to implement scans and edit distance queries against it.
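The two numbered suggestions can be combined in a small sketch. This is an illustration under stated assumptions, not the eBay implementation: the names `Slab` and `OffsetIndex` are hypothetical, the "objects" are plain strings, probing is linear, the table never resizes, and nothing is ever deleted or reclaimed. Each slot holds two 32-bit offsets into the slab instead of pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical slab: one contiguous buffer, addressed by 32-bit offsets.
class Slab {
public:
    // Appends a NUL-terminated copy of s and returns the offset of its first byte.
    std::uint32_t store(const std::string& s) {
        std::uint32_t off = static_cast<std::uint32_t>(bytes_.size());
        bytes_.insert(bytes_.end(), s.begin(), s.end());
        bytes_.push_back('\0');
        return off;
    }
    // The returned pointer is only valid until the next store() call.
    const char* at(std::uint32_t off) const { return bytes_.data() + off; }

private:
    std::vector<char> bytes_;
};

// Hypothetical open-addressed index with linear probing and a fixed capacity.
class OffsetIndex {
public:
    explicit OffsetIndex(std::size_t capacityPow2)
        : slots_(capacityPow2, Slot{kEmpty, kEmpty}) {
        // A power-of-two capacity lets the probe sequence wrap with a bit mask.
        assert(capacityPow2 != 0 && (capacityPow2 & (capacityPow2 - 1)) == 0);
    }

    // Inserts or overwrites. Returns false when the table is full.
    // Overwriting leaves the old value bytes unused in the slab.
    bool put(const std::string& key, const std::string& value) {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = std::hash<std::string>{}(key) & mask;
        for (std::size_t probes = 0; probes < slots_.size(); ++probes, i = (i + 1) & mask) {
            Slot& s = slots_[i];
            if (s.keyOff == kEmpty) {
                s.keyOff = slab_.store(key);
                s.valOff = slab_.store(value);
                ++count_;
                return true;
            }
            if (key == slab_.at(s.keyOff)) {
                s.valOff = slab_.store(value);
                return true;
            }
        }
        return false;
    }

    // Exact-match lookup. Returns nullptr if the key is absent.
    const char* get(const std::string& key) const {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = std::hash<std::string>{}(key) & mask;
        for (std::size_t probes = 0; probes < slots_.size(); ++probes, i = (i + 1) & mask) {
            const Slot& s = slots_[i];
            if (s.keyOff == kEmpty) return nullptr;  // an empty slot ends the probe chain
            if (key == slab_.at(s.keyOff)) return slab_.at(s.valOff);
        }
        return nullptr;
    }

    std::size_t size() const { return count_; }

private:
    static constexpr std::uint32_t kEmpty = 0xFFFFFFFFu;
    struct Slot {
        std::uint32_t keyOff;  // offset of the key bytes in the slab
        std::uint32_t valOff;  // offset of the object bytes in the slab
    };
    std::vector<Slot> slots_;
    Slab slab_;
    std::size_t count_ = 0;
};
```

In this sketch a slot costs 8 bytes, compared with at least one 8-byte pointer per entry plus a separately allocated node in a chained table. A production version would also need resizing, deletion (for example with tombstones) and a load-factor limit.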
Siddharth Anand
The usual two candidates are AVL trees and red-black trees. AVL trees are more strictly balanced, so they usually have faster lookups and slower updates.
Jan Hidders
You should use the standard library. std::map is typically implemented as a red-black tree and is designed for exactly this kind of problem. There is also the Boost Graph Library, which you could use if you feel fancy.
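As a rough sketch of what the std::map approach looks like for string keys pointing at in-memory objects: the `Record` type and the helper names `addRecord` and `keysWithPrefix` below are hypothetical, and the index does not own the objects it points to. Only standard std::map calls (`operator[]`, `find`, `lower_bound`) are used.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical object type; in a real store this would be your own class.
struct Record {
    int id;
};

// String keys mapped to non-owning pointers to objects that live elsewhere.
using Index = std::map<std::string, const Record*>;

// Registers a record under a key, replacing any previous entry for that key.
void addRecord(Index& index, const std::string& key, const Record* record) {
    assert(record != nullptr);
    index[key] = record;
}

// Returns every key that starts with `prefix`, in sorted order.
// Keys sharing a prefix are contiguous in an ordered map, so the scan starts
// at lower_bound(prefix) and stops at the first key that no longer matches.
std::vector<std::string> keysWithPrefix(const Index& index, const std::string& prefix) {
    std::vector<std::string> out;
    for (auto it = index.lower_bound(prefix);
         it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        out.push_back(it->first);
    }
    return out;
}
```

If only exact-match lookups are needed, std::unordered_map is the standard hash-based alternative; the ordered map earns its keep when prefix or range scans like the one above matter.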
Fredrik Eckardt
If the keys are strings, then use a trie: http://en.wikipedia.org/wiki/Trie
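A minimal trie sketch in C++, for illustration only: the class name `Trie`, the `int` values and the use of std::map for child links are assumptions made to keep the example short, and a memory-conscious implementation would use a more compact node layout.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical byte-wise trie mapping string keys to int values.
class Trie {
public:
    // Inserts a key, or overwrites the value if the key already exists.
    void insert(const std::string& key, int value) {
        Node* n = &root_;
        for (char c : key) {
            std::unique_ptr<Node>& child = n->children[c];
            if (!child) child.reset(new Node());
            n = child.get();
        }
        n->hasValue = true;
        n->value = value;
    }

    // Exact-match lookup. Returns nullptr if the key was never inserted.
    const int* find(const std::string& key) const {
        const Node* n = walk(key);
        return (n && n->hasValue) ? &n->value : nullptr;
    }

    // True if at least one stored key starts with `prefix`.
    // Note: an empty prefix always returns true because it matches the root.
    bool hasPrefix(const std::string& prefix) const { return walk(prefix) != nullptr; }

private:
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;  // one edge per next character
        bool hasValue = false;                            // marks the end of a stored key
        int value = 0;
    };

    // Follows s character by character; returns nullptr if the path breaks.
    const Node* walk(const std::string& s) const {
        const Node* n = &root_;
        for (char c : s) {
            auto it = n->children.find(c);
            if (it == n->children.end()) return nullptr;
            n = it->second.get();
        }
        return n;
    }

    Node root_;
};
```

Lookup cost here depends on the key length rather than on the number of stored keys, and keys with a common prefix share nodes.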
Prathab Kali
Not an expert here, but I recall that Judy trees (Judy arrays) excel here: http://judy.sourceforge.net/doc/10minutes.htm
Charles H Martin