How can I quickly sort an array of elements that is already sorted except for a small number of elements (say, up to 1/4 of the total) whose positions are known?
-
Example: [1, 2, 3, 4, 8, 6, 7, 8, 2, 10, 11, 3, 13, 14, 15, 16]. The array is almost sorted except for the elements at positions 4, 8, and 11 (counting from 0). Given this array and the knowledge that these elements may be out of order, what is a time-efficient way to sort the array that is much faster than quicksort? Would the problem be easier if the out-of-order elements were all out of order in the same direction, e.g. all too early in the sequence?

In practice, I expect 20k-200k elements. The practical application is this: I have an array in which the values of a fraction of the elements occasionally change, and I have to keep the array sorted. The time during which the array is unsorted is my downtime, and I want to minimize it.

(PS: I don't really care that the elements are in an array; I just need to be able to iterate over the elements in order quickly. However, I think an array may have much better cache behavior in practice than, say, a tree, whatever the theoretical big-O characteristics.)
-
Answer:
Sort the $k$ out-of-order elements using your favorite $O(k \log k)$ algorithm, then do one $O(n)$ merge operation to combine these $k$ newly sorted elements with the other $n - k$ already-sorted elements into one big sorted array. This runtime, $O(n + k \log k)$, is provably optimal in the comparison model, even under your hypothetical direction constraint.

(From your comment on the question it sounds like you already know this algorithm, but all the other answers seemed to be suggesting insertion sort, which has strictly worse runtime $O(nk + k^2)$. Come on guys, CS 101: you should all know that quadratic sorting algorithms suck. I can't believe that developers are still making this mistake, but they are: https://bugs.launchpad.net/ubuntu/+source/sreadahead/+bug/421116.)
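A minimal Python sketch of this sort-then-merge approach, assuming (as in the question) that the indices of the possibly misplaced elements are known and that every other element is already in non-decreasing order. The function names are illustrative, not from the answer.

```python
from typing import Iterable, List, Sequence


def merge_sorted(xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Linear-time merge of two non-decreasing sequences."""
    out: List[int] = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    # At most one of these tails is non-empty.
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out


def sort_with_known_misplaced(a: Sequence[int], misplaced: Iterable[int]) -> List[int]:
    """Sort `a` in O(n + k log k), given the indices of the k possibly misplaced elements.

    Assumes every element whose index is NOT in `misplaced` is already in sorted order.
    """
    bad = set(misplaced)
    # The n - k untouched elements form an already-sorted subsequence: O(n).
    rest = [x for i, x in enumerate(a) if i not in bad]
    # Sort only the k flagged elements: O(k log k).
    moved = sorted(a[i] for i in bad)
    # One linear merge combines the two sorted lists: O(n).
    return merge_sorted(rest, moved)


example = [1, 2, 3, 4, 8, 6, 7, 8, 2, 10, 11, 3, 13, 14, 15, 16]
print(sort_with_known_misplaced(example, [4, 8, 11]))
# -> [1, 2, 2, 3, 3, 4, 6, 7, 8, 8, 10, 11, 13, 14, 15, 16]
```

When k is small relative to n, the whole update is a couple of sequential passes over the array plus a small sort, which also keeps memory access patterns simple.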
Anders Kaseorg at Quora
Other answers
None of these answers actually addressed how to identify a small number of out-of-order elements given just the array. Suppose there are K elements out of order in our array of length N, and K is much smaller than N. Then there are a couple of ways to do this.

The most obvious way is to use the standard longest increasing subsequence algorithm, which runs in O(N log N). The elements not in this subsequence are the out-of-order elements, and we always end up with no more than K of them (and perhaps fewer!).

However, there is actually another approach: we can build an increasing subsequence of at least N - 2K elements in O(N) time. Do this by scanning over the elements in order while maintaining an increasing subsequence. If an element is no smaller than the end of your subsequence, append it to the subsequence. Otherwise, delete the end of the subsequence (and don't append the current element either). Each deletion discards a pair of elements that are out of order relative to each other, so at least one of the two must be one of the K out-of-order elements; hence there are at most K deletions, and we end up with at least N - 2K elements in our subsequence. Once we have identified these (at most) 2K candidate out-of-order elements, we can do as the answer above says: sort them by themselves and then merge. This gives the overall algorithm a complexity of O(N + K log K).

In the setting your problem describes, if only a really small number of elements change, you might benefit from a tree data structure that lets you update the position of a single element in O(log N) time, so that the entire update operation runs in O(K log N) time, throwing out the linear term in N entirely.
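A Python sketch of the O(N) scan described above; `split_out_of_order` and `sort_almost_sorted` are illustrative names, not from the answer. The scan keeps a non-decreasing subsequence and, whenever the next element is smaller than the subsequence's last element, discards both. It does not need to be told which positions are misplaced.

```python
from typing import List, Sequence, Tuple


def split_out_of_order(a: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (kept, removed). `kept` is non-decreasing; if `a` has a sorted
    subsequence of length N - K, then len(removed) <= 2K."""
    kept: List[int] = []
    removed: List[int] = []
    for x in a:
        if not kept or x >= kept[-1]:
            kept.append(x)
        else:
            # kept[-1] > x: this pair is out of order, so at least one of the
            # two is a misplaced element. Discard both.
            removed.append(kept.pop())
            removed.append(x)
    return kept, removed


def merge_sorted(xs: Sequence[int], ys: Sequence[int]) -> List[int]:
    """Linear-time merge of two non-decreasing sequences."""
    out: List[int] = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out


def sort_almost_sorted(a: Sequence[int]) -> List[int]:
    kept, removed = split_out_of_order(a)  # O(N)
    removed.sort()                         # O(K log K)
    return merge_sorted(kept, removed)     # O(N)


example = [1, 2, 3, 4, 8, 6, 7, 8, 2, 10, 11, 3, 13, 14, 15, 16]
print(split_out_of_order(example))
print(sort_almost_sorted(example))
```

On the question's example the scan discards six elements (8, 6, 8, 2, 11, 3), exactly the 2K bound for K = 3: each of the three misplaced values takes one correctly placed neighbor with it.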
Mark Gordon
Use the method described at http://dl.acm.org/citation.cfm?id=1920952: it preserves the existing sorted order when the data consists of sorted batches, whether they run forward or backward, even if a few unsorted elements remain. This work was done at http://www.dama.upc.edu.
Josep Lluis Larriba Pey
Dhruv Matani
Here are a few more thoughts.

1. Is your data dynamic? Does it arrive as a stream? If so, insertion sort works best because it is an online algorithm. As Prateek Sharma pointed out, insertion sort works well on almost-sorted arrays. Insertion sort and quicksort are often combined for better results: when the quicksort recursion reaches a subarray smaller than a predefined size M, we switch to insertion sort, as it is more efficient on small inputs. On average, insertion sort performs about $N^2/4$ comparisons, i.e. $O(N^2)$.

2. An approach based on binary search and insertion sort:
A) Scan the array from left to right to find the first element that is greater than its successor, A[x] > A[x+1].
B) Scan the array from right to left to find the first element that is smaller than its predecessor, A[y] < A[y-1].
C) Find the minimum and maximum elements in the subarray A[x ... y].
D) Find the ceiling of 'min' in the range A[0 ... x] (binary search); say its location is i.
E) Find the floor of 'max' in the range A[y ... r] (binary search, where r is the right end); say its location is j.
F) Sort the subarray A[i ... j] (with insertion sort, or any linear-time arithmetic sort, depending on its length); this sorts the whole array.

3. Since your data is moderately large, Shell sort can also be considered.
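A Python sketch of steps A-F of approach 2, using the standard `bisect` module for the binary searches in D and E. For brevity it sorts the window with `sorted()` rather than insertion sort. The function name is illustrative; it sorts in place and returns the window bounds (i, j), or None if the array was already sorted.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional, Tuple


def sort_unsorted_window(a: List[int]) -> Optional[Tuple[int, int]]:
    n = len(a)
    # A) First x (from the left) with a[x] > a[x+1]; a[0..x] is non-decreasing.
    x = next((k for k in range(n - 1) if a[k] > a[k + 1]), None)
    if x is None:
        return None  # already sorted
    # B) First y (from the right) with a[y] < a[y-1]; a[y..n-1] is non-decreasing.
    y = next(k for k in range(n - 1, 0, -1) if a[k] < a[k - 1])
    # C) Extremes of the disordered middle part.
    lo = min(a[x:y + 1])
    hi = max(a[x:y + 1])
    # D) Ceiling of lo in the sorted prefix: first index whose value exceeds lo.
    i = bisect_right(a, lo, 0, x + 1)
    # E) Floor of hi in the sorted suffix: last index whose value is below hi.
    j = bisect_left(a, hi, y, n) - 1
    # F) Sorting only a[i..j] leaves the whole array sorted.
    a[i:j + 1] = sorted(a[i:j + 1])
    return i, j


example = [1, 2, 3, 4, 8, 6, 7, 8, 2, 10, 11, 3, 13, 14, 15, 16]
data = list(example)
print(sort_unsorted_window(data), data)
```

On the question's example the window is A[2 ... 11], 10 of the 16 elements, because the misplaced elements are far apart; this approach pays off mainly when the disorder is confined to one region.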
Venkateswara Rao Sanaka
Insertion sort is the most efficient algorithm for sorting partially sorted arrays. It has very little overhead, sorts an already sorted array in a single sweep, and is easy to code. You may refer to: http://stackoverflow.com/questions/220044/which-sort-algorithm-works-best-on-mostly-sorted-data
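A plain insertion sort in Python that also counts how many element shifts it performs (the counter is an illustrative addition). Each shift removes exactly one inversion, so the running time is O(n + I) for I inversions: close to linear when elements are only slightly out of place, but I can be on the order of nk when k elements are far from their final positions, which is the concern raised in the first answer.

```python
from typing import List


def insertion_sort(a: List[int]) -> int:
    """Sort `a` in place and return the number of shifts (= number of inversions)."""
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements one slot right until key's position is found.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = key
    return shifts


example = [1, 2, 3, 4, 8, 6, 7, 8, 2, 10, 11, 3, 13, 14, 15, 16]
data = list(example)
print(insertion_sort(data), data)
```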
Prateek Sharma
1. If you insist on using an array rather than a tree, then use the merge-based approach Anders describes. Otherwise, don't keep the elements in an array: keep them directly in a tree and insert the changed elements directly into it. It won't get very imbalanced. If you used a (balanced) tree with N elements, with average depth D = log2(N) and maximum depth Dmax = ⌈log2(N)⌉, and there were M out-of-place elements to insert, you would expect O(MD) work for the insertions, i.e. O(M log N), which is O(N log N) when M is a fixed fraction of N. (This is a first-order approximation, neglecting that insertions make the tree less balanced and the average depth slightly greater; on the other hand, some insertions don't need to go all the way to depth D.) A tree beats a heap in this case because, since you know you already have 3/4 of the structure built, you don't want to bubble much during the insertions.

2. Would the problem be easier if the out-of-order elements were all out of order in the same direction, e.g. all too early in the sequence?

For the tree case, it wouldn't make much sense to tweak the algorithm: if they were spread randomly, you would expect to save only about one level of depth per insertion. That's O(M) savings.

> The practical application is this: I have an array in which occasionally, the values of a fraction of the elements change.

Do you have any a priori knowledge about which elements are likely to change, and by how much their values may change? If you did, we could come up with a more specialized data structure.
Stephen McInerney