What is the algorithmic complexity of this digital search tree?
-
It is not just as simple as O(k) ... Overview: http://digitalsearchtree.com/ Detailed description: http://www.linkedin.com/redirect?url=http%3A%2F%2Fwww.digitalsearchtree.com%2FDigital%2520Search%2520Tree%2520Overview.pdf&urlhash=a4eZ&_t=tracking_anet. I developed this search tree approach several years ago and achieved quite phenomenal test results for it as compared to a leading implementation of the balanced binary search tree, for simulations of a number of real search tree scenarios. I have not however been able to satisfactorily analyze its time complexity. Considerations: The worst-case, average-case, and best-case performance of the insert or search method, (using an example for 8-bit character string keys), is a function of: 1) k * c, where k = number of characters in the string, and c is a constant that is derived from the number of bits in the bit mask. E.g. Given a bit mask of 4, c = (8 / 4) = 2. For 16 byte character strings, the result would be: 32. I believe from this it can be concluded that the worst-case performance is O(k * c) = O(k). The worst case could occur if a) the tree is "full", i.e. n = U, where U is the universe of keys; or b) the tree is not full, but every previous key (along a downward path) that was inserted, collided with an existing key at each level going downward in the tree. 2) n, the number of keys/items in the tree (it seems that n only affects the best-case and average-case complexity) 3) The degree of randomness of the keys that exist in the tree. The best-case and average case time complexity are each a function of both k and n. What is the formula for this, and how can the "degree of randomness" of the key set be factored in (should it be?). In practice this makes a huge difference in its performance, based on my testing. A test key set with a lot of patterns in the data (for instance, URLs where "www" is repeated for each key) has an impact on the degree to which the tree gets "saturated" at each level. I am also interested in feedback regarding whether or not this concept is original. There are a number of methods that look very similar, but in each case they differ in some important detail.
-
Answer:
The worst case time complexity for each operation on a digital search tree is not actually O(k). In your analysis, you forget that it takes O(k) time to compare two strings, where k is their length. As a result, each operation is actually at worst O(k \cdot min(n, k)), because the height is at worst O(min(n,k)), where n is the size. This worst-case scenario occurs when all keys have a long common prefix. If you assume the input is completely random and only store pointers to the keys, the average case for each operation is actually O(\log n). This is because the expected height of a randomly built tree is O(\log n). Once you assume the input is random, though, the runtime of most string algorithms becomes meaningless. For example, the naive string searching algorithm on random input is expected time O(n), but it should never be used in practice because the worst case is O(n^2). In this case, real world data often has matching prefixes, hurting the performance. For example, if I have 32-bit integers but only insert integers below ten million, all the keys will have several leading zeroes in common. Thus, if the keys are long, you should use a trie instead of a digital search tree. Tries have depth of at worst O(k), except only perform one comparison instead of O(h) comparisons per operation. And yes, this particular field is fairly well known and has been very well studied since the 1970s. Here's a somewhat interesting paper that compares the performance of a digital search tree against standard balanced binary trees on integer keys: http://www.eecg.toronto.edu/~brown/papers/digital-franjo.pdf. A similar analysis could be done on tries vs. digital search trees, though the former should prevail in most real-world situations where the keys are nontrivially long strings.
Johnny Ho at Quora Visit the source
Related Q & A:
- What are the advantages and disadvantages of digital voltmeters over the analogue voltmeters?Best solution by Yahoo! Answers
- What is the difference between analog and digital communication?Best solution by Yahoo! Answers
- What is a Good DVR card for digital surveillance recording?Best solution by Yahoo! Answers
- Does anybody know what channel would ABC come on Comcast digital cable?Best solution by xfinity.com
- What is a creative way to display a family tree?Best solution by geni.com
Just Added Q & A:
- How many active mobile subscribers are there in China?Best solution by Quora
- How to find the right vacation?Best solution by bookit.com
- How To Make Your Own Primer?Best solution by thekrazycouponlady.com
- How do you get the domain & range?Best solution by ChaCha
- How do you open pop up blockers?Best solution by Yahoo! Answers
For every problem there is a solution! Proved by Solucija.
-
Got an issue and looking for advice?
-
Ask Solucija to search every corner of the Web for help.
-
Get workable solutions and helpful tips in a moment.
Just ask Solucija about an issue you face and immediately get a list of ready solutions, answers and tips from other Internet users. We always provide the most suitable and complete answer to your question at the top, along with a few good alternatives below.