# Week 2: Algorithms 2

#### Summaries

• Week 2: Algorithms 2 > 2a Graphs > 2a Video
• Week 2: Algorithms 2 > 2b Some Ideas Behind Map Searches 1 > 2b Video
• Week 2: Algorithms 2 > 2c Some Ideas Behind Map Searches 2 > 2c Video
• Week 2: Algorithms 2 > 2d Application of Algorithms: Stable Marriages Example > 2d Video
• Week 2: Algorithms 2 > 2e Dictionaries and Hashing > 2e Video
• Week 2: Algorithms 2 > 2f Search Trees > 2f Video
• Week 2: Algorithms 2 > 2g Dynamic Programming 1 > 2g Video
• Week 2: Algorithms 2 > 2h Dynamic Programming 2 > 2h Video

#### Week 2: Algorithms 2 > 2a Graphs > 2a Video

• Graphs can be directed or undirected.
• Graphs model any situation where you have objects and you have pairwise relationships between the objects.
• We’ll use V to denote the number of vertices and E to denote the number of edges in the graph.
• You’ve probably seen pictures of graphs in the media by now.
• The density is the ratio of the edges to the vertices in the graph.
• We have an example of a sparse graph, where the number of edges is on the same order of magnitude as the number of vertices.
• On the graph on the left, it’s one less than the number of vertices.
• In a dense graph, you have many, many more edges than the number of vertices.
• To do a few back-of-the-envelope calculations, think about a million-node graph, which is a very common size if you start to talk about road networks for a state or a small country.
• A million node sparse graph maybe has three million edges.
• A million node dense graph can have up to 10 to the 12th edges.
• So the difference between being in a reasonably sized data regime and in a large data regime is often the difference between sparse graphs and dense graphs.
• As a modeler, someone who is looking at data and trying to come up with algorithms, if you can model your problem as a sparse graph, you’re much better off.
• You should always be conscious of how dense is my graph, and how’s it going to affect the computations I’m going to do.
• On average, the internet tends to be a sparse graph.
• Billion node sparse graphs are something we can still deal with.
• Billion node dense graphs, you start to get up to 10 to the 18th edges.
• So let’s begin by looking at a couple of simple things, primitives that you can do on a graph.
• One of the first things you often want to do when you have a graph is to be able to search it.
• We want to be able to do this in time that’s linear in the size of the graph.
• Again, graphs are big and so anything that can be linear is a big win.
• We can search a graph, visit all the nodes in an efficient manner.
• In breadth first search, we search a graph in distance order.
• Again, it’s important that we are processing the graph in an organized, orderly fashion.
• So let’s do an example of breadth first search on this graph.
• We always store the graph so that, for any node, we can efficiently look at its neighbors.
• We have a tree here, which gives us a nice compact way of dealing with the graph in other algorithms.
• One can actually implement this in time linear in the graph.
• So just by being a little careful with the data structure we use, we’re processing this in time proportional to the number of nodes and the number of edges in the graph.
• So we get a linear time algorithm for processing an undirected graph.
• So we have a simple way to begin processing a graph.
• So we would have discovered that our graph is not completely connected.
• We could then restart the search from an unvisited node; in any case, this gives us a way to tell if our graph is connected or not.
• Breadth first search could equally be applied to directed graphs with similar details as could depth first search.
• Let’s look at a slightly different algorithm that’s very basic for directed graphs.
• Now what I’ll do is take a and eliminate it from the graph.
• Now, in the remaining graph with those edges removed, we look for a node that has no predecessors.
• What else has no incoming edges? d has none coming in, so I might take d and eliminate its edges.
• Then I could take h, and then I can take k. So I get this order here- a, c, e, f, d, g, i, j, h, k- and this gives me a logically consistent order in which to process the graph.
• You don’t want to scan the whole graph at each step, because that would lead to quadratic time.
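The two primitives above, breadth first search and processing a directed graph by repeatedly removing a node with no predecessors (a topological order), can be sketched in Python. This is a minimal illustration, not the lecture's own code; the adjacency-list dictionaries and node names are assumptions.

```python
from collections import deque

def bfs_order(adj, source):
    """Visit the nodes reachable from source in distance order.

    adj: dict mapping each node to a list of its neighbors.
    Runs in time linear in the size of the graph, O(V + E).
    """
    dist = {source: 0}
    order = []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in dist:          # first time we discover v
                dist[v] = dist[u] + 1
                queue.append(v)
    return order, dist

def topological_order(adj):
    """Order the nodes of a directed acyclic graph so every edge goes
    forward, by repeatedly removing a node with no predecessors."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    ready = deque(u for u in adj if indegree[u] == 0)  # no incoming edges
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:               # "eliminate" u's outgoing edges
            indegree[v] -= 1
            if indegree[v] == 0:       # v just lost its last predecessor
                ready.append(v)
    return order
```

`bfs_order` also tells us whether the graph is connected: it is connected exactly when every node receives a distance label. `topological_order` stays linear rather than quadratic because, instead of rescanning the whole graph each time, it keeps a queue of nodes whose in-degree has just dropped to zero.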

#### Week 2: Algorithms 2 > 2b Some Ideas Behind Map Searches 1 > 2b Video

• We’ll look both at some of the underlying ideas in shortest paths and at what it takes to make an application like this run in real time, as it’s needed.
• OK? So much of the calculation must be done in real time.
• So let’s begin with the basics of how we compute shortest paths in graphs.
• We’re now going to have weights on the edges, that is we’re going to have some number on the edges in addition to just having edges representing relationships.
• OK? So the weight of a path is just the sum of the weights of its edges.
• The shortest path weight is just the sum of the weights of the edges on the path that’s shortest.
• OK. And what we typically do- and we’ll come back this- is when we say we’re computing shortest paths, what we’re actually doing is we’re taking an origin, which we’re going to denote by S, and compute the shortest paths to all the other vertices in the graph.
• So we have some vertex, S. And we want to compute the shortest path from S to all the other vertices.
• There are multiple paths between S and F. So you could go S to A to F. Or you could go S to B to F. Or you could go S to A to B to F. Or you could go S to A to C to F. So there’s a number of different paths.
• In general on a graph, you may have millions and millions of paths.
• Among these paths, we want to choose the shortest one.
• OK? That is shortest in terms of the sum of the weights of the edges.
• OK? So in this example, the path S to A to F is 3 plus 7, or 10.
• Over all paths, you want to take the shortest.
• Now we don’t want to explicitly enumerate all paths and take the shortest each time.
• So in this graph here, in the same graph, what I’ve done is I’ve written above each node in red the distance, the shortest path distance, to that node from S. So in other words, from S to A, the shortest path is of length 3.
• It’s that edge SA. From S to F, as we spoke about before, the shortest path is S to A to C to F. And that has length 6, 3 plus 2 plus 1.
• What I’ve also done is colored in red, edges that are in the shortest paths.
• If I take a source S and I compute the shortest paths to all other vertices, I get this nice tree that contains the edges that are in the shortest paths.
• OK? And what that means is, if I compute the shortest path from S to F, say like I did there, I know that it goes through A and C, so I’ve also computed the shortest paths to A and to C.
• As you’re computing shortest paths to one node, you are, by necessity, computing shortest paths to other nodes.
• It also means that to compute shortest paths to one point, we are essentially computing shortest paths to many other points at the same time.
• OK? So just to formalize that slightly for computing shortest paths, there’s some real shortest path distance, which I’m going to use delta to denote.
• Then I’m going to have an algorithm, which is computing shortest-path estimates, which I’ll denote D. And the goal is to compute a D that’s actually equal to delta, the real shortest-path distance.
• We don’t know delta; we’re just using it to talk about our algorithms while we compute some value D. OK? And there’s one key property, which I alluded to in that picture.
• That is sub-paths of shortest paths are shortest paths.
• What does this mean? That means if I’m computing the shortest path from S to T, and I know it goes through a vertex C, then it consists of the shortest path from S to C, followed by the shortest path from C to T. OK? So the individual pieces of shortest paths are themselves shortest paths.
• This underlies a lot of the calculations of shortest paths.
• OK. So now suppose we actually want to compute shortest paths in a graph where the edges have non-negative weights.
• There are other variants of shortest paths in which you can allow edges to have negative weights.
• So we know we can get to A by a path of length five.
• That’s the basic operation of shortest paths: constantly discovering shorter paths because you’ve labeled other nodes.
• So this step of finding, looking at a neighbor, looking at its label, plus an edge, and updating our distance is what underlies all shortest path algorithms and is going to underlie Dijkstra’s algorithm.
• Previously I was thinking about getting to them via a path of length infinity; in other words, I hadn’t seen a path to them yet.
• What is the smallest label here? It’s A. OK? So I’m going to pull A out, delete it, and permanently label A at three.
• I can do that because I know the shortest path to A, basically the path that goes directly from S to A. And now I can continue.
• I’m keeping track for each vertex of the shortest way I know to get there, extending the frontier that I’ve already labeled.
• I keep discovering shorter paths to F. And this is very common.
• As I explore more and more the graph and permanently label it, I discover shorter and shorter paths to different nodes.

#### Week 2: Algorithms 2 > 2c Some Ideas Behind Map Searches 2 > 2c Video

• Let’s explore some of the kinds of ideas that are used in getting this kind of a search down to be really efficient.
• Let’s look at a few techniques that we can use that will greatly speed up our search in map directions.
• When we think about how Dijkstra’s algorithm works, if you remember, it explores nodes in increasing order of distance.
• So you can think about that as, it’s going to start searching, and it actually starts out in all directions, because it doesn’t know exactly where the destination is, which direction is the best one to go first.
• So it’s going to start exploring, and if you think about it, it’s roughly exploring in order of distance.
• So you can think about, like, semicircles- like, these are all the points that are 100 miles away, and these are all the points that are 400 miles away.
• So instead of just starting at our source and exploring outward until we reach our destination, we can also run a second exploration backward from the destination.
• When the two explorations meet, at that point we can stop.
• So if you think about that pictorially again, you’ll start exploring here, and you’ll explore maybe the points that are 100 miles, and eventually you’d get out to here.
• So you’ll be exploring like this, and at some point, these searches will meet.
• At the point they meet, you’ll stop, and then you can reconstruct the right path from the information you computed in this search, to the information you computed in this search.
• You can see, pictorially, that you’re exploring a smaller fraction of the graph- if you remember the last picture, we were exploring a much larger piece.
• Let’s see an example of how we might use this idea of landmarks and the triangle inequality to help us prune our search.
• Suppose we were searching for this example, where we’re trying to go from this point on the east coast to this point on the west coast.
• As you recall we mentioned, you pre-compute the distances to landmarks.
• So we may already know that this distance here is 2,500, and we may already know that this distance here is 200.
• Now, suppose we start doing our search from this point, and we’re searching.
• Suppose we get to this point in our search, and we know that we’ve traveled 1000.
• So we know that the shortest path from this point on the east coast to this point in Minnesota is 1000.
• Now what we’re trying to do though, is we really want to get to this point in California here.
• So what does that mean? That means that we’ve traveled 1,000 to get here, and the remaining distance is at least 2,300.
• Now, we’re doing lots of computations here, so we may have been searching in other directions.
• Because we’ve pre-computed that, no matter what we do, the best we can do from this point forward is going to take at least another 2,300, we can get a lower bound on any route going through a particular midpoint.
• So we can prune the search here, and not continue exploring in this direction.
• We probably could have pruned the search much earlier, using the many landmarks which we placed around the map.
• If we actually took a map of the United States, and placed some reasonable number of landmarks around it, and did a search from the east coast to the west coast, what would happen? You would see, actually, it would be pretty much what you would want.
• By just using a few simple ideas, we’re able to bring it down to exploring a region which is what you’d think you would want to explore, a thin band going between the source and destination.
• Just to quantify what we just did on the map, what you can say is, again, to explore a distance of r with Dijkstra’s algorithm, you basically explore an area that’s proportional to r squared, if you just think about drawing a circle.
• So to search out to distance r, we need to search a graph whose size is proportional to r squared, if you think about the points being distributed throughout the area.
• So if we do two searches that meet at about r over 2, each one is exploring an area proportional to r over 2 squared, which is r squared over 4, and there are two of them.
• So just by doing this bi-directional search, we’ve saved half the work.
• Another idea we can use is based on what’s called the triangle inequality, which basically says that the shortest distance between two points is a straight line.
• So if we look at the distance from x to t, along the bottom, that has to be less than or equal to the distance from x to v plus v to t, in any triangle.
• Technically, it holds in any triangle over an underlying metric space, but we’re talking about real distances here, so it’s true for any triangle that we’re going to see.
• Among these points, we’re going to pre-compute the distances to these landmark points.
• Recall, we can’t compute all pairwise distances among 10 million points, but we could compute and store, for each of the 10 million points, the distance to 100 different landmark points.
• So we can know about distances to and from landmark points, and between them, but we can’t know all pairs of distances.
• So assume we were trying to find a path from s to t, and we got into a vertex v with some distance.
• What we see is that we get a lower bound on the distance between v and t. We can use the few distances we’ve pre-computed to lower-bound the distance from v to t; that is, we can know, without doing any further calculation, a lower bound on how far v is from t. And if we know that v is far from t, we can use that to stop the search from s to t when it gets to v. So that’s the basic idea of how we use the triangle inequality and landmarks to prune searches.
• So we’ve seen some examples of shortest-path algorithms, and of some additional algorithmic ideas that were developed in response to the fact that we have to do this searching in real time.
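The landmark bound follows directly from the triangle inequality: for any landmark L, dist(v, L) <= dist(v, t) + dist(t, L), so dist(v, t) >= dist(v, L) - dist(t, L). A minimal sketch; the per-node table of landmark distances is an assumption about how the pre-computation might be stored.

```python
def landmark_lower_bound(dist_to_landmarks, v, t):
    """Lower bound on the true distance from v to t.

    dist_to_landmarks: dict mapping each node to a list of pre-computed
    distances, one entry per landmark.  Each landmark gives the bound
    dist(v, L) - dist(t, L); taking the max gives the tightest one.
    """
    return max(dv - dt for dv, dt in
               zip(dist_to_landmarks[v], dist_to_landmarks[t]))

def can_prune(traveled_to_v, bound_v_to_t, best_route_so_far):
    """Prune the search at v if even the optimistic total through v
    cannot beat the best complete route already found."""
    return traveled_to_v + bound_v_to_t >= best_route_so_far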

#### Week 2: Algorithms 2 > 2d Application of Algorithms: Stable Marriages Example > 2d Video

• You have each man rank the women- one, two, three, four, five, et cetera- by his preferences.
• Each woman ranks the men, again, from preferences from one being the favorite to n being the least favorite, OK? So we’ll have these variables, which we’ll denote by r, for various ranks.
• What’s fair, and what’s stable? What do we think is going to allow marriages to last, in this case, or in general, for an assignment to be fair, OK? And so given a matching, given that we’ve made some assignment, we say that there’s something that’s unstable if a man and woman each prefer each other to their current matched partner.
• So we’re not going to leave our current spouses for each other.
• If something changed in the world, and Angelina Jolie decided she preferred me to Brad Pitt, we’d all of a sudden be unstable, because now we’d realize that we could both leave our spouses for each other.
• In each round, any unmatched man proposes to his highest-ranked woman who has not yet rejected him.
• So each man looks at who’s not married, finds the best woman who hasn’t already rejected him, and proposes.
• Now the women, at each step, they get some set of proposals, and they just accept the best one.
• A woman who’s already married and receives a better proposal will drop her current spouse and accept the new proposal.
• I have four men and four women, and they have each ranked the other side.
• In the first round, each man proposes to his most-preferred woman who has not rejected him.
• Well, nothing’s happened yet, so they each propose to their favorite.
• Carl proposes to Alice, and David proposes to Cathy, OK? So Alice and Cathy each receive two proposals, so they each have to choose which one to accept.
• So Alice has received proposals from Adam and from Carl, and Alice prefers Carl to Adam, because she has Carl ranked two and Adam ranked four, so Alice is going to choose Carl.
• Cathy has received two proposals, from Bob and from David.
• Adam and David each propose to the highest-ranked woman who has not yet rejected them.
• She looks at them, and Adam is number two for her, and David is number three, so she’ll accept the proposal from Adam.
• So Debbie over here is going to accept the proposal from Adam.
• So David now proposes to the highest-ranked woman who has not yet rejected him.
• Now, Alice is already married in the algorithm, but our rule for women here is that if the woman receives a proposal from someone better than the person she’s currently with, she accepts the new proposal, and she drops her current partner.
• So in this case, what will Alice do? Alice will notice that she receives a proposal from David.
• She’s already paired with Carl, but she prefers David to Carl, so she will drop Carl, and she will accept the proposal from David, OK? And now David is married.
• Now Carl says, who’s his favorite person who has not yet rejected him? Well, Alice has rejected him.
• So he proposes now to Debbie, OK? And Debbie, if we look, receives a proposal from Carl, OK? Debbie’s already paired with Adam, and Debbie actually prefers Adam to Carl, two to three, so Debbie rejects that proposal.
• Cathy actually is already paired with Bob, and she prefers Bob to Carl, so she rejects him.
• The men repeatedly propose to the favorite woman who has not yet rejected them.
• The women just accept the best proposal they’ve seen so far.
• If you think about the algorithm, once a woman accepts a proposal, she stays matched from then on.
• The woman’s partner only improves over time, because she only trades up to better and better proposals.
• Once a woman rejects a man, she would always reject him in the future, so there’s no reason for the man to ever propose again to a woman who’s rejected him.
• So when you put all those together, you can conclude that every woman is eventually matched.
• If every woman is eventually matched, then every man must eventually be matched.
• Each man is matched to the highest-ranked woman he could match to in any stable marriage.
• It seems like the woman is in charge here, because she gets to keep rejecting and accepting.
• If you look at it, the man is matched to the highest-ranked woman he could match to, and the woman is matched to the lowest-ranked man she could match to in any stable marriage.
• The matching of residents to hospitals- these are doctors who finish medical school and now go to do what’s called a residency in the United States.
• How are they matched? Through a stable-marriage algorithm.
• There’s a matching of rabbis to congregations that’s used.
• Like for the residents, what if two residents are married to each other and need to go to the same hospital? So things like that complicate it slightly, but this underlying algorithm actually works for these applications and many others.
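The propose-and-reject procedure above (the Gale-Shapley algorithm) can be sketched in a few lines. A minimal sketch; the lecture's full preference lists aren't given, so the data layout and the small instance used below are hypothetical.

```python
def stable_marriage(men_prefs, women_prefs):
    """Men propose; each woman keeps the best proposal she has seen.

    men_prefs[m]   : list of women, most preferred first.
    women_prefs[w] : list of men, most preferred first.
    Returns a dict mapping each woman to her final partner.
    """
    # rank[w][m] = position of m in w's list (lower is better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # next woman m will try
    husband = {}                             # woman -> current partner
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]     # best who hasn't rejected m
        next_choice[m] += 1
        if w not in husband:
            husband[w] = m                   # w accepts her first proposal
        elif rank[w][m] < rank[w][husband[w]]:
            free_men.append(husband[w])      # w trades up, drops partner
            husband[w] = m
        else:
            free_men.append(m)               # w rejects m; he tries again
    return husband
```

Since every man moves one step down his list per rejection and no man revisits a woman who rejected him, the loop terminates, and the result is stable for the reasons given above.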

#### Week 2: Algorithms 2 > 2e Dictionaries and Hashing > 2e Video

• Given a set, we want to search whether an element belongs to that set.
• The set may grow by inserting new elements or shrink by deleting elements.
• Usually the elements have associated information with them.
• When we search for an element, when we do a lookup operation, we want to retrieve the information associated with that element.
• One application is a spelling dictionary, where the set is the set of all words in a language.
• Another application is a phone book, where the element is the name of a person and we look up the phone number.
• Or we may have the reverse, where the element is the phone number and we want to look up the name of the person.
• There are many applications in computer science: file systems, web services. In a domain name server, for example, the elements are the URLs we want to reach.
• In genome databases, the elements we want to look up are DNA strings or markers.
• So what’s a trivial approach? Trivial approach is we have an array which is indexed by the elements of the universe.
• For each location in the array, we put a one if the element is in the set and a zero if it is not.
• So if we want to look up an element x, we just go to its position and see whether there is a one or a zero there.
• Another solution, which takes space proportional to the size of the set we want to record (and we do not want to spend much more space than that), is to keep a list of all the elements that are in S. In that case, inserting a new element is not a problem.
• We can just attach the element at the end of the list or at the beginning of the list.
• But if we want to look for an element, say 25, we have to start at the beginning of the list and work down the list.
• A general, very simple approach that is often used is hashing, which essentially uses the array idea but without indexing the array by the whole universe of potential elements.
• What we do is map the large universe, the whole set U of all possible elements, into a very small set, say the first M integers, where M will be roughly comparable in size to the size of the set we expect to record.
• Then we map our set S. The subset S will be mapped to the subset of the little set M. So we’ll have this hash function h. And now we have transferred our set S to a smaller set h of S in a smaller universe.
• Ideally, we would like the hash function to be one to one, so that when we want to check whether an element x was recorded, we look up position h of x and see whether there is a one or a zero.
• The problem is, of course, that we cannot pick a hash function h that will work for all possible set S. No matter which function h we pick, there will be a subset S- in fact, lots and lots of subsets which may even all gets mapped to the same elements just because we’re mapping a very big set into a small set.
• So there will be lots of collisions: elements mapping to the same location.
• Why a random function? Because a random function will take any subset S which we don’t know what it’s going to be and distribute the elements more or less uniformly in the target domain.
• Then we can map this big set of integers into the smaller set of integers from one through M by, for example, taking a modular operation or doing maybe something more complicated.
• This method is called chaining. When we map our set S to the smaller domain one through M, where you can think of M as comparable to the size of S, invariably some elements will hash to the same location.
• So in that case, we take all the elements that hash to the same location and form a list out of them.
• So in this example, there’s two elements that map to location one, no element mapped to location two, three of them map to location three.
• When we add a new element, we compute its hash function and then append it to the list at the beginning of the list for its hash value.
• If we want to search for an element, we have to compute its hash value.
• Of course, in the worst case, no matter which hash function you pick, there will be some bad sets where every element hashes to the same location and the list will be of length n. But that’s a pathological case.
• That is, if any element is equally likely to end up at any location, independently of where the other elements get mapped, then the expected length of the chain at any slot will be what’s called the load factor: the ratio of n, the size of the set we’re hashing, to M, the size of the target domain.
• So on average, as long as the load factor stays constant, it will take just constant time to search for an element, constant time to insert an element, and constant time to delete an element.
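A dictionary with chaining can be sketched as below. A minimal sketch, using Python's built-in `hash` followed by a modular operation as the hash function; the class and method names are assumptions.

```python
class ChainedHashTable:
    """Dictionary via hashing with chaining: each of M slots holds a
    list (chain) of the elements that hash to it.  With a good hash
    function the expected chain length is the load factor n/M, so
    search, insert, and delete take constant expected time."""

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        # Map the huge universe of keys into 0..m-1 by taking a
        # modular operation on the built-in hash value.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: update it
                chain[i] = (key, value)
                return
        chain.append((key, value))       # else attach to the chain

    def lookup(self, key):
        for k, v in self._chain(key):    # walk down the (short) chain
            if k == key:
                return v
        return None                      # not in the set

    def delete(self, key):
        chain = self._chain(key)
        chain[:] = [(k, v) for k, v in chain if k != key]
```

A phone-book style usage: insert name/number pairs, look up by name, delete on removal; each operation touches only one chain.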

#### Week 2: Algorithms 2 > 2f Search Trees > 2f Video

• The elements come from an ordered universe, so there’s an order among the elements.
• We want to ask additional queries that have to do with the order among the elements.
• If we don’t find an element in the set, we would like to know, what’s the closest element that belongs to the set? Let’s say, the next higher element that’s the successor, or the next lower element, the predecessor.
• We would like to know: what’s the minimum element in the set? What’s the maximum? What’s the median? What’s a given quantile, say the 10th percentile element? And if we are given an element, say 357, what is its rank in the whole set; is it, say, the 57th smallest element? I would like to have a data structure that supports all these operations, and every operation is going to be done quickly.
• We’ll just do binary search, and say, if I want to find if an element belongs to the set, I probe the middle element of the array- so, seven, in this case.
• So the minimum is the very first element of the array.
• The problem is, if you want to update the set- for example, if you want to insert a new element into the array.
• Once I insert 3 here, I have to push all the other elements one position to the right.
• So I have to take each one of the other elements and copy them one position on the right.
• Deletion- I have to move elements to the left- it’s going to take linear time.
• It has the property that, for every node v, the nodes in the left subtree hold elements that are smaller than the element sitting at v, and the nodes in the right subtree hold bigger elements.
• The left subtree contains the smaller elements, 2, 4, and 5.
• If I give you any set, and I sort it, it’s very easy to construct a binary search tree like that, that contains all the elements in the set.
• The left child will take the middle element of the smaller elements.
• The right child will take the middle element of the bigger elements, and so forth.
• If I want to search for an element- for example, I want to search for 8, we start at the root, 7, and then, compare 7 with our element x, 8.
• Again, compare our element, 8, that we’re searching for, with the element at the node, 12.
• I continue down the tree until either I find the element I’m looking for- for example, 8 here- and I return it.
• If I have information associated with elements, the information will be sitting here, so I retrieve that information.
• That’s the worst case for how long it takes until I either locate my element or decide that it’s not in the set.
• If I don’t find the element, and I want to know what’s the closest element in the set, that’s very easy to do also.
• For 9, as I walk down the path, I keep track of what are the nearest elements closer to 9, from below and from above.
• When I come down here, I see that the closest element in the path to 9 from below is 8.
• These are the two closest elements to 9 in the whole set.
• If the element is not in the set, we can find the closest element in the set in the same amount of time.
• If I want to insert a new element, it’s, again, very easy.
• So I put a new leaf there, I hang a right child from 8, and I put my element 9 there.
• Because if you have a full binary tree, as you’re going down the tree, the number of elements doubles, doubles, doubles.
• The problem is that, if I keep inserting elements or deleting elements, and I don’t pay any attention, my tree is going to get very unbalanced.
• If I want to find the minimum, the smallest element has to be on the left side, then again, on the left side.
• So I walk down the left branch of the tree until I can’t walk anymore, and that’s the minimum element.
• If I want to find other ordered statistics- for example, find the median element or the element that has rank 127, then I need to attach a little more information into the nodes of the tree so that I know how to navigate in the tree.
• When we delete elements, we decrement these numbers.
• If we have these numbers, it’s easy to find an element of any rank that we want.
• If I want to find the median- this set has nine elements- the median will be the element of rank five, the fifth smallest element.
• In the right subtree, I look for the element with rank one, because I have already counted four elements: the three elements in the left subtree and one at the root, 4.
• So I’m looking for the element of rank one in the right subtree.
• Rank one, of course, means the smallest element, and I’ll go walk all the way down.
• Then if I want to find what is the rank of element 10, for example, I can find element 10, and then walk up the tree and count how many nodes are to the left of that path- or, nodes on that path that are smaller than 10.
• So it’s very easy to find the median element or find the element of any rank, compute the rank of any elements, and answer all kinds of queries that have to do with order, again, in logarithmic time.
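The rank queries above rely on storing subtree sizes in the nodes. A minimal sketch of the two pieces described in the text: building a balanced search tree from a sorted list (middle element at the root), and finding the element of a given rank by comparing the rank against the left subtree's size. The nine-element test set is hypothetical, chosen to match the examples mentioned (a median at rank five, elements like 8 and 10).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Size of this subtree; used to navigate by rank.
        self.size = (1 + (left.size if left else 0)
                       + (right.size if right else 0))

def build(sorted_keys):
    """Balanced search tree from a sorted list: the root takes the
    middle element, the left child the middle of the smaller half,
    the right child the middle of the bigger half, and so on."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid],
                build(sorted_keys[:mid]), build(sorted_keys[mid + 1:]))

def select(node, k):
    """Element of rank k (k = 1 is the smallest), in time proportional
    to the height of the tree, i.e. logarithmic when balanced."""
    left_size = node.left.size if node.left else 0
    if k == left_size + 1:
        return node.key                        # the root itself
    if k <= left_size:
        return select(node.left, k)            # rank lies in left subtree
    return select(node.right, k - left_size - 1)  # skip left subtree + root
```

Deleting or inserting would decrement or increment the size fields along the search path, as the text notes.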

#### Week 2: Algorithms 2 > 2g Dynamic Programming 1 > 2g Video

• For optimization problems, in order to use dynamic programming, the problem has to satisfy what’s called the principle of optimality, or sometimes the optimal substructure property, which is that the optimal solution for the whole problem should use optimal solutions for the sub-problems.
• The easiest way to do that is to work bottom up: solve the smallest possible sub-problems, tabulate the results, then go to the next level and solve the next larger size of sub-problems.
• Longest common subsequence means a sequence which appears in both sequences x and y. Doesn’t have to appear contiguously.
• It appears as a subsequence, so the characters appear in the same order in both sequences.
• For this particular sequence x and y, the common subsequence is BCBA.
• One way to see that this occurs in both subsequences is to line up the two sequences so that the common subsequences are lined up on top of each other.
• We want this common subsequence to be as long as possible.
• In this case, for example, we have a subsequence of length four.
• So we want to minimize the number of operations, insertions, and deletions that we do in going from x to y, that at the same time, what is left, the characters for which we do not have to do anything, are the matched characters which form the subsequence.
• So minimizing the number of operations maximizes what’s left, which is the common subsequence.
• The algorithms look exactly the same, finding longest common subsequence and finding the distance with a more general repertoire of operations.
• So we’ll just talk about the longest common subsequence problem.
• So you want to line up the two files, the two documents, so that they match in as many places as possible; that matching is the longest common subsequence between the two. These utilities essentially line up the two files in the best possible way and tell you the differences, what changed from one file to the other.
• Here is the sequence x and the sequence y. Consider every possible subsequence of x. So let’s say we’ll take these elements, these characters.
• See if this subsequence occurs in y. If I fix a subsequence of x, I can check whether that subsequence occurs as a subsequence of y in linear time.
• It’s not entirely trivial, but it’s not very difficult to do, actually, to check if a particular sequence appears as a subsequence of y. This is not hard, takes linear time.
• The problem is that there are so many subsequences of x. If x has length m, there are 2 to the m subsequences of x, so trying all possible subsequences takes 2 to the m times m time. It’s an exponential-time algorithm.
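The linear-time subsequence check mentioned above can be sketched as follows (a minimal illustration; the function name is my own):

```python
def is_subsequence(s, y):
    # Greedy two-pointer scan: for each character of s, advance through y
    # until it is found. One pass over y, so the check takes linear time.
    i = 0
    for ch in y:
        if i < len(s) and ch == s[i]:
            i += 1
    return i == len(s)
```

The brute-force algorithm would run this check for each of the 2^m subsequences of x, which is what makes it exponential overall.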
• Here, we have the sequences x and y. I want to find the longest common subsequence of x and y; call it s. So we ask ourselves: does s include the last elements, the m-th element of x and the n-th element of y? We have three cases.
• If xm is not included, our subsequence has to come from the prefix of x excluding the last element, while y may contribute everything.
• So the longest common subsequence will be a common subsequence of that prefix of x and of y. Which common subsequence should I pick? The longest possible, because that is what makes the whole solution longest.
• I want to pick the longest subsequence between the prefix of x and y. So that’s case one, xm, the last element of x, is not included.
• Symmetrically, we have the case when the last element of y is not included in the common subsequence.
• Then I want to pick the longest common subsequence between the prefix of y, excluding that last element, and the whole of x. So you see the principle of optimality at work.
• If I use these sub-problems to form the solution, I want to pick the best solution for each sub-problem.
• The third case is where both of these are part of the subsequence.
• So in that case, my common subsequence will consist of the last elements and the best subsequence of the prefix of x, excluding the last element, and the prefix of y, excluding the last element.
• So the best subsequence will be one of these three cases.
• A priori, we don’t know which of the three cases will turn out best, because in order to decide that, we would need to know the longest common subsequence between the appropriate prefix of x and y, or prefix of y. So we have to solve the sub-problems in order to tell which of the three cases is best.
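The three cases translate directly into a recursive definition. Here is a minimal top-down sketch with memoization (function names are my own, and the example strings in the usage note are an assumption consistent with the BCBA example above, not taken from the lecture itself):

```python
from functools import lru_cache

def lcs_length(x, y):
    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j): length of the longest common subsequence of x[:i] and y[:j]
        if i == 0 or j == 0:                   # an empty prefix matches nothing
            return 0
        best = max(c(i - 1, j),                # case 1: last element of x not used
                   c(i, j - 1))                # case 2: last element of y not used
        if x[i - 1] == y[j - 1]:               # case 3: the last elements match
            best = max(best, c(i - 1, j - 1) + 1)
        return best

    return c(len(x), len(y))
```

For example, `lcs_length("ABCBDAB", "BDCABA")` returns 4, matching a common subsequence BCBA of length four.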

#### Week 2: Algorithms 2 > 2h Dynamic Programming 2 > 2h Video

• We write the recurrence for the general case, where it applies not only to the whole sequences but to any pair of prefixes.
• So let’s say this is position i and this is position j. We want to compute c(i, j), the length of the best common subsequence between the prefix of x up to position i and the prefix of y up to position j. By the same logic as before, we can write a recurrence that expresses c(i, j), the length of the best match between the prefixes up to i and up to j, in terms of smaller prefixes.
• So what we have here is that, in the case where x_i is not equal to y_j, we have only cases 1 and 2.
• So c(i, j) will be the maximum of c(i-1, j), matching the prefix of x of length i-1 against the whole prefix of y up to j, and c(i, j-1), matching the whole prefix of x up to i against y up to j-1, excluding its last element.
• If x_i equals y_j, it turns out that the third case, matching the two last symbols, is always at least as good, so c(i, j) is c(i-1, j-1) plus 1.
• Now, this reduces the computation of c of ij to c of a pair, where one of the indices is smaller.
• At the end, we’ll hit the bottom, where one of the indices will be 0.
• So if we have c(i, j) where i is 0, it means I consider the prefix of x that has no elements.
• So no matter what prefix of y I take, the common subsequence has length 0, because the prefix of x has no elements.
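Putting the recurrence and the base case together, the definition described above can be written as:

```latex
c(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c(i-1,\, j-1) + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c(i-1,\, j),\ c(i,\, j-1)\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```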
• Then we can evaluate the recurrence starting from the base case, that’s the bottom.
• The recurrence says that, when I want to compute the element (i, j), it depends on the element (i, j-1), which is the element immediately to the left.
• It depends on the element (i-1, j), which is the element immediately above.
• If the i-th element of x and the j-th element of y are equal, it will also depend on the diagonal element, (i-1, j-1).
• In this position, we have A in both x and y. So what we put here will be the diagonal element plus 1.
• Say we are at a position where the B’s match in both x and y. From the left we have 2, from above we have 2, and diagonally we have 2, so the entry is the diagonal value plus 1, which is 3.
• When we get to the bottom-right corner, that’s the length of the longest common subsequence between the whole of x and the whole of y. So in this case, the length of the longest common subsequence is 4.
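The bottom-up table fill just described can be sketched as follows (a minimal version; names are my own):

```python
def lcs_table(x, y):
    """Fill the dynamic-programming table bottom up.
    c[i][j] is the LCS length of x[:i] and y[:j]; row 0 and
    column 0 are the base case, where one prefix is empty."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1      # diagonal value plus 1 on a match
            else:
                c[i][j] = max(c[i - 1][j],          # value from above
                              c[i][j - 1])          # value from the left
    return c
```

The entry in the bottom-right corner, `c[m][n]`, is the length of the longest common subsequence of the whole strings.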
• By the way, I have colored here the elements of the table where I have a match between the two strings.
• Those are the positions where I can actually take the diagonal value and increase it by 1.
• We have to remember, for every element, where we got this number: which of the three cases gave it. So there are three cases.
• In case we don’t have equality, there are two possible cases: the element above and the element to the left.
• We say, where did this 4 come from? Well, in this case, it could have come either from the element above it or from the one to the left.
• So the little arrow on this 4 remembers that it came from the element immediately to the left.
• So we have a little arrow for each element that points to the previous element that gave it its value.
• This is an element of the longest common subsequence.
• So in this example, the longest common subsequence is formed by the red elements we pass through as we trace the arrows back.
• These include the A that is common to both strings.
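The arrow-following traceback can be sketched like this (the tie-breaking rule when the values above and to the left are equal is my own choice; a different rule may return a different, equally long subsequence):

```python
def lcs_string(x, y):
    # First fill the table exactly as before.
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Trace back from the bottom-right corner; a diagonal step
    # (a match) emits one character of the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:   # arrow points up
            i -= 1
        else:                               # arrow points left
            j -= 1
    return "".join(reversed(out))
```

With this tie-breaking, `lcs_string("ABCBDAB", "BDCABA")` recovers the subsequence BCBA.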
• So what were the main ingredients again? First, we took the big problem and reduced it to smaller sub-problems.
• Then we evaluate this recurrence bottom up, starting from the base case and tabulating the results as we go along until we get to the whole problem that tells us what is the cost or the value for the whole problem.
• We compute two or three numbers and compare which one of them is the best.
• We also keep track, with a little arrow, of where this number came from.
• So if I only want to compute the value, the length of the longest common subsequence, and not the sequence itself, I could just go row by row and remember only the two last rows.
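The row-by-row space optimization mentioned above, keeping only the previous row, can be sketched as:

```python
def lcs_length_two_rows(x, y):
    # Same recurrence as the full table, but only the previous row is
    # kept, so the extra space is proportional to len(y), not len(x)*len(y).
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(y)]
```

This computes only the length; to reconstruct the subsequence itself, the arrows of the full table are needed.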
• In particular, suppose the number of matching pairs of symbols of x and y is small; by matching pairs, I mean the number of red entries here.
• It is actually possible to examine only these entries: with the right data structures, using the balanced search trees that we saw before, the whole processing can be done working just on these entries, spending time proportional to their number.
• In this case, the running time will be m plus n, the lengths of the two strings, plus P, the number of matching pairs of symbols, times logarithmic factors.
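One way to get a running time driven by the number of matching pairs, in the spirit described above, is the Hunt–Szymanski approach: for each position of x, visit its matching positions in y in decreasing order, and maintain the best chains with binary search. This is a sketch under my own assumptions (a sorted list with `bisect` stands in for the balanced search trees mentioned in the lecture), not the lecture’s exact algorithm:

```python
from bisect import bisect_left
from collections import defaultdict

def lcs_length_matches(x, y):
    # Record the positions of each symbol in y.
    pos = defaultdict(list)
    for j, ch in enumerate(y):
        pos[ch].append(j)
    # tails[k] = smallest y-position ending a common subsequence of length k+1.
    tails = []
    for ch in x:
        # Visit this row's matches right-to-left so a single position of x
        # cannot contribute twice to the same chain.
        for j in reversed(pos.get(ch, [])):
            k = bisect_left(tails, j)
            if k == len(tails):
                tails.append(j)
            else:
                tails[k] = j
    return len(tails)
```

Each matching pair is processed once with a logarithmic-time search, giving roughly (m + n + P) times logarithmic factors, as in the bound above.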
• For some applications, the number of matching pairs of symbols is not quadratic.
• So the number of matching symbols is going to be, typically, linear.
• Even in cases with lots of matches, like DNA strings, where the small alphabet means there will be many matches, there is a way, using hashing and so forth, to look at longer segments and reduce the number of matching pairs, so that the algorithm gets closer to linear time.
• That’s the general method; one of the things to think about is how to reduce a problem to smaller problems which you can solve optimally.
• The main ingredient is that the total number of sub-problems generated, if you keep doing that, has to be relatively small, polynomial, let’s say, in the input size, so that you can tabulate them all.