How is it that the abstract is talking about "Java version" and "Python version"...

masklinn · on Aug 31, 2018

From TFA's introduction:

> there are actually not one, but two main versions of TimSort. The first version of the algorithm contained a flaw, which was spotted in [5]: while the input was correctly sorted, the algorithm did not behave as announced (because of a broken invariant). This was discovered by De Gouw and his co-authors while trying to prove formally the correctness of TimSort. They proposed a simple way to patch the algorithm, which was quickly adopted in Python, leading to what we consider to be the real TimSort. This is the one we analyze in Sections 3 and 4. On the contrary, Java developers chose to stick with the first version of TimSort, and adjusted some tuning values (which depend on the broken invariant; this is explained in Sections 2 and 5) to prevent the bug exposed by [5]. Motivated by its use in Java, we explain in Section 5 how, at the expense of very complicated technical details, the elegant proofs of the Python version can be twisted to prove the same results for this older version.

[5] Stijn De Gouw, Jurriaan Rot, Frank S de Boer, Richard Bubel, and Reiner Hähnle. Open- JDK’s Java.utils.Collection.sort() is broken: The good, the bad and the worst case. In International Conference on Computer Aided Verification, pages 273–289. Springer, 2015.

LeoPanthera · on Aug 31, 2018

It’s taking about the implementations in the respective standard libraries, which are apparently different.

growtofill · on Aug 31, 2018

From the article:

> In fact, there are two slightly different versions of TimSort that are currently implemented in Python and in Java respectively.

grillorafael · on Aug 31, 2018

Yes. If the java have a different complexity it is a different algorithm.

To the writers defense, they have to algorithm in pseudo code in the article

brazzy · on Aug 31, 2018

> If the java have a different complexity it is a different algorithm.

It doesn't seem wrong to me to talk about different versions of the same algoritm when there are only minor differences.

ehsankia · on Aug 31, 2018

Right, like how Quicksort can be pretty different depending on how you choose the pivot. It's still Quicksort, but there's different variants.

grillorafael · on Aug 31, 2018

yes but they will share the same complexity

Someone · on Aug 31, 2018

You can make the average-case perform in O(n^2) by always pivoting on the smallest number in the array. Nobody would do _that_, but it shows that pivot choice can affect complexity.

Ergo, computer scientists researching the algorithm mathematically must consider the effect of choice of pivot.

FreeFull · on Aug 31, 2018

Depending on how you choose the pivot, the worst-case of complexity of Quicksort can be O(n^2) or O(n log n)

klmr · on Aug 31, 2018

No, worst-case complexity of real quicksort is always O(n^2), regardless of pivot choice strategy (even with stochastic pivot choice, although you’d have to get very unlucky to hit that worst case). You can make the average case better or worse though.

The only way of making quicksort’s worst-case runtime O(n log n) is by limiting recursion depth, as done e.g. in introsort. But that’s no longer quicksort.

PL_kolek · on Aug 31, 2018

Isn't there a linear time median selection algorithm, which allows you to always select a pivot in the middle of the sorted part and create two equal halves? This produces a worst-case O(n log n) quick sort, which is no longer quick due to the big constant hidden in O notation.

nh2 · on Aug 31, 2018

Correct, Quicksort with Quickselect for pivot choice.

deathanatos · on Aug 31, 2018

"quickselect" is a selection algorithm that uses a partial quicksort in order to do a select. You're essentially saying to write quicksort using quicksort.

Quickselect requires a pivot choosing strategy; the problem is not only the same as quicksort's, it is the problem from quicksort.

According to Wikipedia, in the worst case, it is O(n²).[1] But that's not strictly correct, IMO. Regardless, it doesn't answer the OP's question of "is there a selection algorithm that operates in worst case O(n)"

[1]: https://en.wikipedia.org/wiki/Quickselect

[2]: https://news.ycombinator.com/item?id=17888755 and the parent comment; specifically, the median-of-medians algorithm is a worst-case O(n) selection algorithm.

nh2 · on Aug 31, 2018

This is wrong.

See https://en.m.wikipedia.org/wiki/Quicksort, section "Selection-based pivoting".

deathanatos · on Aug 31, 2018

Specifically,

> A variant of quickselect, the median of medians algorithm, chooses pivots more carefully, ensuring that the pivots are near the middle of the data (between the 30th and 70th percentiles), and thus has guaranteed linear time – O(n). This same pivot strategy can be used to construct a variant of quicksort (median of medians quicksort) with O(n log n) time. However, the overhead of choosing the pivot is significant, so this is generally not used in practice.

jwilk · on Aug 31, 2018

Non-mobile link:

https://en.wikipedia.org/wiki/Quicksort#Selection-based_pivo...