This week we’re switching gears a little bit. For the bulk of the course we’ve talked about different abstract data types and the data structures we use to implement them. Now, we’re going to talk about one very specific data-processing operation, which is one of the most fundamental in computer science: sorting. Just as we saw multiple data structures that could be used to represent the same ADT, we’ll look at a few different ways to implement sorting.

You’ve studied sorting before in CSC108; all of the basic sorting algorithms you probably saw (bubblesort, selection sort, insertion sort) were iterative, meaning they involved multiple loops through the list. You may wish to review these CSC108 videos. You probably also talked about how their running time was quadratic in the size of the list, so each of these algorithms sorts a list of size n in O(n²) steps. (Why? Briefly, each involves n different loops, where each loop has between 1 and n iterations, and 1 + 2 + 3 + ⋯ + n = n(n + 1)/2.)

In this lecture, we’re going to use recursion to develop two faster sorting algorithms, mergesort and quicksort. These are both recursive divide-and-conquer algorithms, which is a general class of algorithms that use the following steps:

1. Split up the input into two or more parts.
2. Recurse on each part separately.
3. Combine the results of the previous step into a single result.

Where these two algorithms differ is in the splitting and combining: mergesort does the “hard” (algorithmically complex) work in the combine step, and quicksort does it in the divide step.

Mergesort

The first algorithm we’ll study is called mergesort, and it takes the “divide-and-conquer” philosophy very literally. The basic idea of this algorithm is that it divides its input list into two halves, recursively sorts each half, and then merges the two sorted halves into the final sorted list.
The merge operation

While this code looks very straightforward, we’ve hidden the main complexity in the helper function _merge, which needs to take two lists and combine them into one sorted list. For two arbitrary lists, there isn’t an “efficient” way of combining them. (We’ll discuss what we mean by “efficient” here in the next reading.) For example, the first element of the returned list should be the minimum value in either list, and to find this value we’d need to iterate through every element in both lists.

But if we assume that the two lists are sorted, this changes things dramatically. For example, to find the minimum item of lst1 and lst2 when both lists are sorted, we only need to compare lst1[0] and lst2[0], since the minimum must be one of these two values. We can generalize this idea so that after every comparison we make, we can add a new element to the sorted list. This is the key insight that makes the _merge operation efficient, and which gives the algorithm mergesort its name.

Quicksort

While quicksort also uses a divide-and-conquer approach, it takes a different philosophy for dividing up its input list. Here’s some intuition for this approach: suppose we’re sorting a group of people alphabetically by their surname. We do this by first dividing up the people into two groups: those whose surnames start with A-L, and those whose surnames start with M-Z. This can be seen as an “approximate sort”: even though each smaller group is not sorted, we do know that everyone in the A-L group should come before everyone in the M-Z group. Then after sorting each group separately, we’re done: we can simply concatenate the two groups to obtain a fully sorted list.

The formal quicksort algorithm uses exactly this idea:

1. First, it picks some element in its input list and calls it the pivot. It then splits up the list into two parts: the elements less than or equal to the pivot, and those greater than the pivot. This is traditionally called the partitioning step.
2. Next, it sorts each part recursively.
3. Finally, it concatenates the two sorted parts, putting the pivot in between them.

The implementation we’ll show below always chooses the first element in the list to be the pivot. This has some significant drawbacks, which we’ll discuss in lecture and the next reading.