I’ve heard it said that one shouldn’t attempt to sort a linked list because of the time complexity involved. So, I just wanted to write down a few quick thoughts on the different ways a linked list could be sorted. To keep things simple for this post, only singly-linked lists will be considered, and at this time I’m not going to spend time discussing the “traditional” sorts (including a direct mergesort); I want to discuss, at a high level, some possibly new sorts (well, new to me; others may have implemented them already, I haven’t looked).
So, imagine a linked list with random keys, where we don’t know anything about the keys in advance.
The first idea I had (after thinking through the “traditional” sort algorithms) was to do a pass through the list and build a new list on the side that was monotonically increasing: nodes would be removed from the original list and appended to the new sorted list. Each pass would extract a minimum of one node and a maximum of all of them. Continuing the idea, there would be repeated passes through the original list until every node had been moved into one of these monotonically increasing lists. Then, mergesort those lists together; each level of pairwise merging is linear in the total number of nodes, so merging k runs costs O(n log k). Overall, the pass phase is big-omega of linear time (already-sorted input, one pass) and big-O of quadratic time (one node extracted per pass).
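As a sketch of this first idea (the `Node` class and list helpers are my own minimal assumptions, since no representation was given): each pass greedily pulls a not-necessarily-contiguous nondecreasing subsequence out of the remaining nodes, and the resulting runs are then merged pairwise.

```python
class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt

def from_pylist(xs):
    """Build a singly-linked list from a Python list (example helper)."""
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def to_pylist(head):
    """Flatten a linked list back into a Python list (example helper)."""
    out = []
    while head:
        out.append(head.key)
        head = head.next
    return out

def extract_run(head):
    """One pass: move a (not necessarily contiguous) nondecreasing
    subsequence of nodes into a run; return (run_head, remaining_head)."""
    run_head = run_tail = None
    rest_head = rest_tail = None
    node = head
    while node:
        nxt = node.next
        node.next = None
        if run_tail is None or node.key >= run_tail.key:
            if run_tail is None:
                run_head = node
            else:
                run_tail.next = node
            run_tail = node
        else:
            if rest_tail is None:
                rest_head = node
            else:
                rest_tail.next = node
            rest_tail = node
        node = nxt
    return run_head, rest_head

def merge(a, b):
    """Standard sorted-list merge, linear in the combined length."""
    dummy = tail = Node(None)
    while a and b:
        if a.key <= b.key:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b
    return dummy.next

def run_extraction_sort(head):
    runs = []
    while head:  # one extraction pass per run; 1..n passes total
        run, head = extract_run(head)
        runs.append(run)
    while len(runs) > 1:  # pairwise merging of the sorted runs
        merged = []
        for i in range(0, len(runs), 2):
            if i + 1 < len(runs):
                merged.append(merge(runs[i], runs[i + 1]))
            else:
                merged.append(runs[i])
        runs = merged
    return runs[0] if runs else None
```

On an already-sorted input, `extract_run` consumes everything in a single pass (the Ω(n) case); on a strictly decreasing input, each pass peels off only one node (the O(n²) case).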
In order to “beat” the edge case of a monotonically decreasing original list, instead of always going front to back, we could alternate: front-to-back, then back-to-front, and repeat. Going both ways, we would still be extracting monotonically increasing sequences (a back-to-front pass over a decreasing list finds one immediately). Without thinking about it too hard, there seem to be more cases where alternating would be very beneficial, without much of a performance drawback in any other edge case.
Now, for the third iteration of this “walkthrough + extract” sorting algorithm: there is a lot of waste where our pointer walks over nodes without doing anything with them. So, let’s do something about that! Instead of skipping each node that doesn’t fit onto the currently generated monotonically increasing list, we could create a new list on the spot. We would then have multiple lists being generated at the same time, which allows our initial pointer/runner to walk through the original list only once. The list of generated lists would keep a pointer to the end of each one, so that an entire generated list never has to be checked, only its last node. Walking through the original list just once is nice, but it introduces a problem for nearly monotonically decreasing lists: there would be up to N generated lists to walk through for each node added (in a naive implementation).
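The single-pass variant can be sketched like this (again assuming a hypothetical minimal `Node` class; the first-fit policy below is just one naive choice): each node is dropped onto the first run whose tail it extends, or starts a new run, and only tails are ever compared.

```python
class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt

def from_pylist(xs):
    """Build a singly-linked list from a Python list (example helper)."""
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def run_keys(run_head):
    """Collect a run's keys into a Python list (example helper)."""
    out = []
    while run_head:
        out.append(run_head.key)
        run_head = run_head.next
    return out

def one_pass_runs(head):
    """Single pass over the original list. Each node is appended to the
    first generated run whose tail is <= its key; otherwise it starts a
    new run on the spot. Keeping (head, tail) pairs means only each
    run's last node is ever compared, never the whole run."""
    runs = []  # list of [run_head, run_tail]
    node = head
    while node:
        nxt = node.next
        node.next = None
        for run in runs:
            if run[1].key <= node.key:
                run[1].next = node  # extend this run's tail
                run[1] = node
                break
        else:
            runs.append([node, node])  # no run fits: start a new one
        node = nxt
    return [r[0] for r in runs]
```

On a nearly decreasing input, almost every node fails to fit the early runs, so the inner loop scans close to all N runs per node; that is the quadratic worst case mentioned above.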
So, there are multiple ways to fix the problem above:
- Keep the list of generated lists itself sorted.
- Somehow always compare against the generated list with the lowest tail, rather than always starting from the first one.
- Maybe just make whichever list we last added to the first one tried for the next node, rather than always going back to the first list for each new node from the original list.
- The sorted-generated-lists idea gets me thinking about just loading a heap. At roughly 2n log(n) operations (two O(log n) heap operations per node), i.e. O(n log n), this is our time to beat.
- What if, instead of comparing only against the highest node in each generated list, we compared against both the highest and the lowest, as in a double-ended queue? This could roughly halve the number of generated lists. At first it may seem like the same number of checks, but closer examination shows there would also be fewer checks required, because more nodes would land in the middle of the generated lists instead of all at the end. Then again, it depends on the input: this method would be defeated by a list like [100, 0, 99, 1, 98, 2, 97, 3, 96, 4, 95, 5, ...], whereas the implementation from the paragraph above would handle that better.
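For reference, the heap baseline called the “time to beat” above is easy to sketch with Python’s `heapq` (the `Node` class and helpers are again my own assumptions): gather the keys, heapify, then write them back through the nodes in order.

```python
import heapq

class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt

def from_pylist(xs):
    """Build a singly-linked list from a Python list (example helper)."""
    head = None
    for x in reversed(xs):
        head = Node(x, head)
    return head

def to_pylist(head):
    """Flatten a linked list back into a Python list (example helper)."""
    out = []
    while head:
        out.append(head.key)
        head = head.next
    return out

def heap_sort_list(head):
    """O(n log n) baseline: collect the keys, heapify (O(n)), then pop
    the minimum back into each node in order (one O(log n) pop per
    node). The node structure is reused; only the keys move."""
    keys = []
    node = head
    while node:
        keys.append(node.key)
        node = node.next
    heapq.heapify(keys)
    node = head
    while node:
        node.key = heapq.heappop(keys)
        node = node.next
    return head
```

Note this sorts keys rather than relinking nodes; if node identity mattered (extra payload per node), the heap would need to hold (key, index, node) tuples instead.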
At this point, it would be easy to say just use the heap idea, because it is easy to see how it would get done and there are already many great implementations of heaps (or PriorityQueue) available. For quick and easy maintainability and productivity, this would likely be the way to go. Then, only after determining that this area really is a performance bottleneck, would we spend more time thinking about this. Until that time comes, this information will still be available for quick and easy reference, so we can see where we left off in the discussion without having to do it all over again.
Having said that, maybe one quick thing to test would be comparing the use of a heap versus a sorted tree; which one to use could depend on what we are doing with the information after it is “sorted” (enough).
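A quick stand-in for that experiment (Python’s stdlib has no balanced-tree container, so `bisect.insort` on a sorted list plays the “sorted tree” role here, with the caveat that its list insertion shifts elements in O(n), where a real tree would be O(log n)): the heap is cheap to load and pays O(log n) only per minimum actually consumed, while the sorted structure pays per insert but keeps the full order available at all times.

```python
import bisect
import heapq

keys = [3, 1, 4, 1, 5, 9, 2, 6]

# Heap: O(n) to build; each minimum costs O(log n) only when asked for.
heap = list(keys)
heapq.heapify(heap)
first_three = [heapq.heappop(heap) for _ in range(3)]

# "Sorted tree" stand-in: every key is placed in order as it arrives,
# so the complete sorted sequence is available after loading.
tree = []
for k in keys:
    bisect.insort(tree, k)
```

If we only ever need the next-smallest element (sorted “enough”), the heap wins; if we need ordered traversal or range queries afterwards, the sorted structure does.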