How is it that the abstract is talking about "Java version" and "Python version" when discussing computational complexity? Aren't algorithms algorithms, independent of the language you're implementing them in?
> there are actually not one, but two main versions of TimSort. The first version of the algorithm contained a flaw, which was spotted in [5]: while the input was correctly sorted, the algorithm did not behave as announced (because of a broken invariant). This was discovered by De Gouw and his co-authors while trying to prove formally the correctness of TimSort. They proposed a simple way to patch the algorithm, which was quickly adopted in Python, leading to what we consider to be the real TimSort. This is the one we analyze in Sections 3 and 4. On the contrary, Java developers chose to stick with the first version of TimSort, and adjusted some tuning values (which depend on the broken invariant; this is explained in Sections 2 and 5) to prevent the bug exposed by [5]. Motivated by its use in Java, we explain in Section 5 how, at the expense of very complicated technical details, the elegant proofs of the Python version can be twisted to prove the same results for this older version.
[5] Stijn De Gouw, Jurriaan Rot, Frank S. de Boer, Richard Bubel, and Reiner Hähnle. OpenJDK’s Java.utils.Collection.sort() is broken: The good, the bad and the worst case. In International Conference on Computer Aided Verification, pages 273–289. Springer, 2015.
You can make the average case run in O(n²) by always pivoting on the smallest element in the array. Nobody would do _that_, but it shows that pivot choice can affect complexity.
Ergo, computer scientists researching the algorithm mathematically must consider the effect of choice of pivot.
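A toy sketch of that point: a quicksort with a pluggable pivot rule, counting comparisons so the effect of the pivot choice is visible. (The function names and the comparison-counting scheme here are my own illustration, not from any real implementation.)

```python
import random

def quicksort(xs, choose_pivot, stats):
    """Out-of-place quicksort with a pluggable pivot strategy.
    Counts one comparison per element partitioned (a rough proxy)."""
    if len(xs) <= 1:
        return list(xs)
    pivot = choose_pivot(xs)
    stats["comparisons"] += len(xs)  # every element is compared to the pivot
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return (quicksort(less, choose_pivot, stats) + equal +
            quicksort(greater, choose_pivot, stats))

data = list(range(200))
random.shuffle(data)

# Deliberately bad pivot: always the smallest element -> ~n^2/2 comparisons.
worst = {"comparisons": 0}
quicksort(data, min, worst)

# Random pivot: ~n log n comparisons in expectation.
good = {"comparisons": 0}
quicksort(data, random.choice, good)

print(worst["comparisons"], good["comparisons"])
```

Both runs sort correctly; only the comparison counts differ, which is exactly why the analysis has to fix a pivot strategy before stating a complexity.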
No, the worst-case complexity of any practical quicksort is always O(n²), regardless of the pivot-choice strategy (even with a randomized pivot, although you’d have to get very unlucky to hit that worst case). You can make the average case better or worse, though.
The only way of making quicksort’s worst-case runtime O(n log n) is by limiting recursion depth, as done e.g. in introsort. But that’s no longer quicksort.
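The depth-limit idea can be sketched in a few lines: recurse like quicksort, but once the depth budget is spent, fall back to an O(n log n) sort (here heapsort via `heapq`). This is a simplified out-of-place version for illustration; real introsort implementations partition in place.

```python
import heapq
import random

def introsort(xs, depth_limit=None):
    """Quicksort that switches to heapsort when recursion gets too deep,
    capping the worst case at O(n log n) -- the idea behind introsort."""
    if depth_limit is None:
        depth_limit = 2 * max(1, len(xs)).bit_length()  # ~2*log2(n)
    if len(xs) <= 1:
        return list(xs)
    if depth_limit == 0:
        # Fallback: heapsort, O(n log n) worst case.
        heap = list(xs)
        heapq.heapify(heap)
        return [heapq.heappop(heap) for _ in range(len(heap))]
    pivot = random.choice(xs)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return (introsort(less, depth_limit - 1) + equal +
            introsort(greater, depth_limit - 1))
```

Even if every pivot choice is maximally unlucky, the heapsort fallback kicks in after O(log n) levels, so no input can force quadratic behavior.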
Isn't there a linear-time median selection algorithm that lets you always pick the median as the pivot and split the array into two equal halves? That produces a worst-case O(n log n) quicksort, which is no longer quick due to the large constant hidden in the O notation.
"quickselect" is a selection algorithm that uses a partial quicksort in order to do a select. You're essentially saying to write quicksort using quicksort.
Quickselect requires a pivot choosing strategy; the problem is not only the same as quicksort's, it is the problem from quicksort.
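For concreteness, here is a minimal quickselect sketch; note that the pivot line is exactly quicksort's pivot problem, just recursing into one side instead of both. (With random pivots it runs in expected O(n) but worst-case O(n²).)

```python
import random

def quickselect(xs, k):
    """Return the k-th smallest element (0-indexed) of xs.
    Random pivots give expected O(n) time; worst case is O(n^2)."""
    pivot = random.choice(xs)  # the same pivot-choice problem as quicksort
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    if k < len(less):
        return quickselect(less, k)
    if k < len(less) + len(equal):
        return pivot
    greater = [x for x in xs if x > pivot]
    return quickselect(greater, k - len(less) - len(equal))
```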
According to Wikipedia, in the worst case, it is O(n²).[1] But that's not strictly correct, IMO. Regardless, it doesn't answer the OP's question of "is there a selection algorithm that operates in worst case O(n)"
> A variant of quickselect, the median of medians algorithm, chooses pivots more carefully, ensuring that the pivots are near the middle of the data (between the 30th and 70th percentiles), and thus has guaranteed linear time – O(n). This same pivot strategy can be used to construct a variant of quicksort (median of medians quicksort) with O(n log n) time. However, the overhead of choosing the pivot is significant, so this is generally not used in practice.
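The median-of-medians trick described above can be sketched as follows: group into fives, take each group's median, recursively find the median of those medians, and use it as the pivot. That pivot is guaranteed to land between the 30th and 70th percentiles, which is what makes the recursion linear-time. (This is an illustrative sketch, not an optimized implementation.)

```python
def median_of_medians(xs, k):
    """Select the k-th smallest element (0-indexed) of xs in guaranteed
    O(n) time, choosing the pivot as the median of group-of-5 medians."""
    if len(xs) <= 5:
        return sorted(xs)[k]
    # Median of each group of (at most) 5 elements.
    groups = [sorted(xs[i:i + 5]) for i in range(0, len(xs), 5)]
    medians = [g[len(g) // 2] for g in groups]
    # Recursively pick the median of the medians as the pivot:
    # it is guaranteed to fall between the 30th and 70th percentiles.
    pivot = median_of_medians(medians, len(medians) // 2)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    if k < len(less):
        return median_of_medians(less, k)
    if k < len(less) + len(equal):
        return pivot
    greater = [x for x in xs if x > pivot]
    return median_of_medians(greater, k - len(less) - len(equal))
```

Plugging this in as quicksort's pivot rule yields the O(n log n) worst-case variant the quote mentions, at the cost of a constant factor large enough that nobody uses it in practice.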