Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

How is it that the abstract is talking about "Java version" and "Python version" when discussing computational complexity? Aren't algorithms algorithms, independent of the language you're implementing them in?


From TFA's introduction:

> there are actually not one, but two main versions of TimSort. The first version of the algorithm contained a flaw, which was spotted in [5]: while the input was correctly sorted, the algorithm did not behave as announced (because of a broken invariant). This was discovered by De Gouw and his co-authors while trying to prove formally the correctness of TimSort. They proposed a simple way to patch the algorithm, which was quickly adopted in Python, leading to what we consider to be the real TimSort. This is the one we analyze in Sections 3 and 4. On the contrary, Java developers chose to stick with the first version of TimSort, and adjusted some tuning values (which depend on the broken invariant; this is explained in Sections 2 and 5) to prevent the bug exposed by [5]. Motivated by its use in Java, we explain in Section 5 how, at the expense of very complicated technical details, the elegant proofs of the Python version can be twisted to prove the same results for this older version.

[5] Stijn De Gouw, Jurriaan Rot, Frank S de Boer, Richard Bubel, and Reiner Hähnle. Open- JDK’s Java.utils.Collection.sort() is broken: The good, the bad and the worst case. In International Conference on Computer Aided Verification, pages 273–289. Springer, 2015.


It’s taking about the implementations in the respective standard libraries, which are apparently different.


From the article:

> In fact, there are two slightly different versions of TimSort that are currently implemented in Python and in Java respectively.


Yes. If the java have a different complexity it is a different algorithm.

To the writers defense, they have to algorithm in pseudo code in the article


> If the java have a different complexity it is a different algorithm.

It doesn't seem wrong to me to talk about different versions of the same algoritm when there are only minor differences.


Right, like how Quicksort can be pretty different depending on how you choose the pivot. It's still Quicksort, but there's different variants.


yes but they will share the same complexity


You can make the average-case perform in O(n^2) by always pivoting on the smallest number in the array. Nobody would do _that_, but it shows that pivot choice can affect complexity.

Ergo, computer scientists researching the algorithm mathematically must consider the effect of choice of pivot.


Depending on how you choose the pivot, the worst-case of complexity of Quicksort can be O(n^2) or O(n log n)


No, worst-case complexity of real quicksort is always O(n^2), regardless of pivot choice strategy (even with stochastic pivot choice, although you’d have to get very unlucky to hit that worst case). You can make the average case better or worse though.

The only way of making quicksort’s worst-case runtime O(n log n) is by limiting recursion depth, as done e.g. in introsort. But that’s no longer quicksort.


Isn't there a linear time median selection algorithm, which allows you to always select a pivot in the middle of the sorted part and create two equal halves? This produces a worst-case O(n log n) quick sort, which is no longer quick due to the big constant hidden in O notation.


Correct, Quicksort with Quickselect for pivot choice.


"quickselect" is a selection algorithm that uses a partial quicksort in order to do a select. You're essentially saying to write quicksort using quicksort.

Quickselect requires a pivot choosing strategy; the problem is not only the same as quicksort's, it is the problem from quicksort.

According to Wikipedia, in the worst case, it is O(n²).[1] But that's not strictly correct, IMO. Regardless, it doesn't answer the OP's question of "is there a selection algorithm that operates in worst case O(n)"

[1]: https://en.wikipedia.org/wiki/Quickselect

[2]: https://news.ycombinator.com/item?id=17888755 and the parent comment; specifically, the median-of-medians algorithm is a worst-case O(n) selection algorithm.


This is wrong.

See https://en.m.wikipedia.org/wiki/Quicksort, section "Selection-based pivoting".


Specifically,

> A variant of quickselect, the median of medians algorithm, chooses pivots more carefully, ensuring that the pivots are near the middle of the data (between the 30th and 70th percentiles), and thus has guaranteed linear time – O(n). This same pivot strategy can be used to construct a variant of quicksort (median of medians quicksort) with O(n log n) time. However, the overhead of choosing the pivot is significant, so this is generally not used in practice.





Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: