Sorting Algorithms: Runtime Analysis

      You know the training montage in the heartwarming underdog movie where the hero goes from struggling to perform a basic task to becoming the best at the sport? Some algorithms are like the pre-montage hero and others are like the post-montage hero, and we need to figure out which is which.

      How? By knowing how well an algorithm does in the best case, the average case, and the worst case. In some algorithms, all three numbers are the same, which means we'll know exactly how well an algorithm runs: whether it can do a five-minute mile or it'll pass out after three yards.

      On the other hand, some have vastly different times for each, and—just like in life—we need to be prepared for the worst.

      Usually the algorithms with huge differences between their best and worst cases (we're looking at you, Monkey Sort) are the ones we want to avoid. Sure, they could run really well sometimes, but they could just as easily crap out. It'd be like watching our underdog revert back to his pre-montage state during the showdown against the sport's current champion.

      Oof.

      Let's meet the different types of cases before we start analyzing algorithms.

      Best Case

      An eternal optimist, Best Case loves telling you how long it'll take when everything goes right. If everything's almost sorted or—even better—already sorted, Best Case would be more than glad to give you the deets.

      If you see this:

      Ω(n)

      You'll know that Best Case has given its prediction for the runtime.

      In its spare time, Best Case likes thinking about how great summer will be and reading historical romance novels.

      Worst Case

      Some people mock it for its over-dramatic reactions and "silly tin-foil hat," but at least Worst Case comes prepared. If everything's sorted in the wrong direction, you're going to want to listen to Worst Case's runtime predictions.

      If you see:

      O(n)

      You'll know that Worst Case has given its prediction for runtime.

      In its spare time, Worst Case likes guessing the endings of horror films and reinterpreting Nostradamus.

      Average Case

      Average Case likes to mediate between Best Case and Worst Case. Otherwise, family reunions would get pretty tense. It takes the average of different inputs (like how ordered or unordered a list may be) and tells how well it'll run on average. Ask for Average Case's advice whenever you want to run an algorithm a bunch of times with a bunch of different inputs.

      If you see:

      Θ(n)

      You'll know that Average Case has mediated between Best Case and Worst Case.

      In its spare time, Average Case likes baking oatmeal cookies and watching non-competitive reality TV.

      How to Find Runtime

      In general, you want to look for

      • Procedure (a.k.a. the number of lines of code)
      • Loops
      • Recursion

      Disclaimer: The style of analysis we're talking about isn't precise; it's a good guess at what a runtime will look like. So if you're taking a class dedicated to algorithms, this isn't going to cover anywhere near the depth you'll need for finding runtimes. If you're in an introductory course, though, this is all you'll need.

      Procedure

      If an algorithm has a set number of steps whether you're running it on a list of size zero or ten zillion, its runtime is constant.

      All the lines that perform arithmetic, make comparisons, declare variables, assign values, or call a function (the call itself, not what the function does) have order one complexity, so the runtime will look like this:

      Ω(n)    Θ(n)    O(n)
      1       1       1

      It doesn't matter how many lines there actually are; they're still considered order one. Once our input is in the gazillions, it won't matter that we have thousands of lines of procedural code because each line only runs once.
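      To see why, here's a minimal Python sketch of a purely procedural chunk (the function name and playlist are made up for illustration). Each line runs exactly once no matter how long the list is:

```python
# A purely procedural "algorithm": arithmetic, comparisons, and assignments.
# Each line runs once whether the list has 3 songs or 3 billion, so the
# runtime is constant: order one in the best, average, and worst case.
def first_and_last_match(playlist):
    first = playlist[0]        # assignment: one step
    last = playlist[-1]        # assignment: one step
    same = (first == last)     # comparison: one step
    return same

print(first_and_last_match(["Holy Diver", "Rainbow in the Dark", "Holy Diver"]))
```

      Indexing the first and last elements doesn't get slower as the list grows, which is exactly why we can wave away the number of procedural lines.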

      Loops

      As soon as we pull order one code out of a procedural list and pop it into a loop, we've added complexity. If a loop runs a pre-set number of times, then it still counts as order one.

      If we make it depend on something that isn't set? Not so much.

      If you run those lines of code on an unsorted list of songs, for example, all of a sudden the number of times each line runs depends on (you guessed it) the number of songs:

      FOR i in range 0 to Length(fantasy_rock_playlist):

      Now the for loop's controlled by the number of songs in the fantasy_rock_playlist. If we have 1,000 songs (wow, really?), the loop runs 1,000 times. If we have 1,000,000,000, it's going to run a billion times.

      Makes those thousand lines of code look a lot smaller, doesn't it?

      For that reason, the loop's runtime will be

      O(length of fantasy rock playlist)

      but, for convention's sake, we're going to call it

      O(n)

      where n stands for the number of fantasy rock songs.
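      As a quick Python sketch (the playlist contents are placeholders we made up), the loop body runs once per song, so the work grows linearly with n:

```python
# The loop body runs exactly once per song, so total work is O(n),
# where n = len(fantasy_rock_playlist).
fantasy_rock_playlist = ["Stargazer", "Battle Hymns", "Hall of the Mountain King"]

lines_run = 0
for i in range(len(fantasy_rock_playlist)):
    lines_run += 1  # stands in for whatever work happens per song

print(lines_run)  # 3: one pass per song
```

      Double the playlist and the loop runs twice as long; that proportionality is all O(n) is saying.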

      If we nest loops, then we're adding even more complexity. In fact, for every nested loop, we add a power to n in our Worst Case scenario.

      Example One

      Nested loops act a lot like clocks. For every minute the minute hand ticks forward, the second hand needs to travel all the way around the clock, ticking 60 times. At the end of an hour, that second hand

      1. has ticked 3,600 times.
      2. is exhausted (probably).

      Because it had to tick 60 times for each of the 60 minutes, its total number of ticks is 60 squared. That's just like what happens in nested loops.

      Take a loop that looks like this:

      FOR each song_a in fantasy_rock_songs:
      	FOR each song_b in fantasy_rock_songs:
      		FOR each song_c in fantasy_rock_songs:
      			INCREMENT a counter

      At the end of this loop, we've gone through the elements n * n * n times, which equals n³. If the length of the list is four, the counter's value will be 64.

      That adds up. Fast.
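      To see those numbers in action, here's the same triple loop as a runnable Python sketch (the song titles are placeholders):

```python
# Three nested loops over the same n-song list: the innermost line
# runs n * n * n = n³ times. With n = 4, that's 64.
fantasy_rock_songs = ["song1", "song2", "song3", "song4"]  # n = 4

counter = 0
for song_a in fantasy_rock_songs:
    for song_b in fantasy_rock_songs:
        for song_c in fantasy_rock_songs:
            counter += 1

print(counter)  # 64, i.e., 4 cubed
```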

      Again, all the Cases give the same runtime.

      Ω(n)    Θ(n)    O(n)
      n³      n³      n³

      Even though we can run some sneaky manipulations to make an algorithm run faster according to Best Case, Worst Case will always be n³. Worst Case is much more popular than Best Case, and for good reason: it doesn't lie to us. Even so, some algorithms are so good at cutting corners that we'll need to check the Average Case.

      Then again, some algorithms are so good at getting around doing anything that sorting becomes pointless.

      Example Two

      What if, instead of running through the entire list in every loop, we only want to look at a section of the list? For example:

      FOR i in range 0 to Length(Scottish_pirate_metal_playlist):
      	FOR j in range 0 to i:
      		PRINT "The value of j is: "+j

      Unlike the last nested loop, this inner loop relies on the value of the variable in the outer loop. If we converted this into an actual language and ran it at n = 4, we'd get this:

      The value of j is: 0 
      The value of j is: 0 
      The value of j is: 1 
      The value of j is: 0 
      The value of j is: 1 
      The value of j is: 2

      which is 6 (significantly less than 4², or 16).
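      If you'd rather count than squint at output, here's the same pattern in Python (playlist contents made up), tallying how many times the inner line runs:

```python
# The inner loop only runs up to i, so the total count is
# 0 + 1 + 2 + ... + (n - 1) = n(n - 1)/2, which grows like n²/2.
scottish_pirate_metal_playlist = ["s1", "s2", "s3", "s4"]  # n = 4

prints = 0
for i in range(len(scottish_pirate_metal_playlist)):
    for j in range(i):
        prints += 1  # stands in for the PRINT line

print(prints)  # 6 when n = 4
```

      Swap in a longer playlist and the count tracks n(n - 1)/2 exactly.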

      However, as the number of songs in the playlist increases from 4 to 12 to 100 to infinity, we'll

      • feel amazed at the sheer number of Scottish pirate metal songs there are (seriously: who knew?).
      • see the number of times j prints out grow proportionally to n².

      Don't believe us? Read the table and weep.

      n      Number of Iterations    n²
      4      6                       16
      12     66                      144
      50     1,225                   2,500
      100    4,950                   10,000
      200    19,900                  40,000
      400    79,800                  160,000
      700    244,650                 490,000

      You could also write it in terms of our buddy Worst Case:

      O(n²/2)

      but when n moves towards infinity, we don't actually care about what it's multiplied by (in this case, one half); it's still going to be a huge number. So we can shorten it to:

      O(n²)

      TL;DR: By and large, any time we nest a loop inside another loop, we multiply the runtime by the number of times the new loop needs to run. But that only matters if the loops run based on the size of a changing input like a song list.

      Recursion

      This part is the stickiest to trace in an algorithm because it isn't spelled out the way a loop is. Because a recursive function calls itself to find an answer, sometimes it makes the runtime linear; sometimes logarithmic; sometimes even exponential. Who can tell in this crazy world?

      You, actually. Here's how.

      Recursion is created in code by having a function call itself, like so:

      recurringProblem(n):
      	IF n > 0:
      		DO recurringProblem(n - 1)

      The runtime analysis depends on the number of times the function calls itself, i.e., the number of times recurringProblem continues to be a recurringProblem. Try to figure out how long Worst Case would tell us this will run.

      …Didya come up with O(n)? Because it subtracts one value from n at every recursive call, we're going to call recurringProblem() n times.
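      Here's a hypothetical Python version that counts its own calls (the calls parameter is our addition, just for counting):

```python
# Each call knocks one off n, so for a starting value of n we make
# n + 1 calls in total (counting the final n = 0 call): linear, O(n).
def recurring_problem(n, calls=0):
    calls += 1  # count this call
    if n > 0:
        return recurring_problem(n - 1, calls)
    return calls

print(recurring_problem(5))  # 6 calls: n = 5, 4, 3, 2, 1, 0
```

      The extra "+ 1" call doesn't change anything asymptotically; O(n + 1) is still O(n).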

      How about this one?

      slightlyLessRecurringProblem(n):
      	IF n > 0:
      		slightlyLessRecurringProblem(n - 2)

      Yup, the recurring problem's still linear because it's subtracting from n every time. Okay, let's shake it up a bit.

      iCantBelieveItsNotLooping(n):
      	IF (n > 0):
      		iCantBelieveItsNotLooping(n ÷ 2)

      If you don't know how many times a recursive algorithm like this will run, try out small numbers for n and see how many times it runs.

      n      Calls to iCantBelieveItsNotLooping()
      1      2
      10     5
      20     6
      40     7
      100    8

      The number of calls it takes iCantBelieveItsNotLooping() to finish only goes up by one every time the value of n doubles, making the relationship logarithmic.

      Don't go assuming any log base for this example will work, though; this recursion's base two, which is the assumed log in computer science. Since computers represent numbers in binary (zeros and ones), a lot of the things we do are based around the number two.
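      You can reproduce the table above with a Python sketch, assuming the division rounds down (Python's // operator) and counting every call:

```python
# Halve n on every call (integer division) and count the calls.
# The count only goes up by one when n doubles: logarithmic, O(log n).
def i_cant_believe_its_not_looping(n, calls=0):
    calls += 1  # count this call
    if n > 0:
        return i_cant_believe_its_not_looping(n // 2, calls)
    return calls

for n in (1, 10, 20, 40, 100):
    print(n, i_cant_believe_its_not_looping(n))  # matches the table: 2, 5, 6, 7, 8
```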

      If you want to make this algorithm base three, though (so that you add one value every time the size triples), you'd change the algorithm to look like this:

      iCantBelieveItsNotLooping(n):
      	IF (n > 0):
      		iCantBelieveItsNotLooping(n ÷ 3)

      You can write logarithmic code with any base by dividing n by that base, but we usually stick with two or three. Otherwise function calls can get out of hand. The more recursive calls we ask the computer to make, the bigger the risk that we cause a stack overflow, where we simply don't have the memory space to store the calls.