Enumeration methods 5
Inhaltsverzeichnis
Theory
Introduction
Enumeration methods are used to solve combinatorial optimization problems. Combinatorial optimization problems are problems where decision variables are binary, expressing that an object (e.g. graph, edge, node) is chosen or is not chosen. This leads to a lot of possible solutions with the difficulty of selecting and finding solutions that are more optimal than others. Real world examples of such problems include the scheduling of machines in production planning, aircraft rotation/crew scheduli in airline businesses and transport routing/scheduling in logistics. When modelled in a formal way, these problems are subproblems of either allocation problems, permutation problems (where the "Travelling salesman problem" belongs to -> example), selection problems (like "Knapsack problem") or clustering problems.
The number of existing possible alternative solutions of a combinatorial optimization problem can be very large. It is determined by the computational complexity. For example, for an allocation or sequence problem, there are n(n-1)(n-2)...1 = n! alternative solutions, meaning it has a computational complexity of "n factorial". So even with today's computing power, computing all possible solutions to a not-too-big TSP can be impossible. Therefore the goal of most enumeration methods is to cut down the complexity with certain algorithms.
Types of enumeration methods
Explicit complete enumeration
Full enumeration of all possible alternatives and comparision of all of them to pick the best solution. Can be very expensive or even impossible for more complicated problems (->see above).
Implicit complete enumeration
Parts of the solution space that are definitely sub-optimal are excluded. This reduces complexity because only the most promising solutions have to be considered. For implicit complete enumeration, methods like Branch & Bound, limited enumeration and dynamic optimization can be used.
Incomplete enumeration
Selecting alternatives by only looking at parts of the solution space by applying certain heuristics. This provides approximate solutions, but not necessarily optimal ones.
Example: Traveling Salesman Problem
The Traveling Salesman Problem is one of the most intensively studied problems in optimization. Every time a new optimization algorithm is developed, it will be tested using the TSP (among others). This is because at a certain point, finding an optimal solution is nearly impossible, so you can clearly differentiate between the three enumeration methods. The problem is as followed: A salesman has to visit a certain amount of cities and return to his hometown afterwards. Given the distance between the cities, which route is the shortest? As mentioned before, basically all optimization methods have / can be used to solve the problem, so to keep it simple, we focus on only one method per category.
Mathematical model:
Fehler beim Parsen (http://mathoid.testme.wmflabs.org Serverantwort ist ungültiges JSON.): min D = \sum \sum d_{ij} x_{ij}
, where
with the following constraints:
- Fehler beim Parsen (http://mathoid.testme.wmflabs.org Serverantwort ist ungültiges JSON.): \sum \limits_{i}^{} x_{ij} = 1 \text{ for } j=1, ..., n
(If we come to city j, we only come there from exactly one city) and Fehler beim Parsen (http://mathoid.testme.wmflabs.org Serverantwort ist ungültiges JSON.): \sum \limits_{j}^{} x_{ij} = 1 \text{ for } i=1, ..., n
(If we go away from city i, we only go to exactly one city)
- Fehler beim Parsen (http://mathoid.testme.wmflabs.org Serverantwort ist ungültiges JSON.): x_{ij}+x_{ji} \leq 1 \text{ for all } i, j
We need the last two terms so that there won’t be any short cycles, for example look at the distance of the four cities A to B below. If it weren’t for those equations, going from A to B would immediately result in going from B to A again, because it is obviously the shortest tour. Mathematically, and would be 1, so + = 2
As we can see above, the value must be 1 or 0, which in other words means that short cycles of lengths 1 and 2 are not possible with these equations. We don’t need a third (fourth) equation for short cycles with three (four) nodes in our problem with five cities, because having a short cycle of lenghth s automatically implies there is another short cycle of length (n-s), which in our case would be 5-3 = 2 or 5-4 = 1, both of which are already forbidden.
Given is the following map:
A-E represent the five cities we have to visit, the numbers show the distance between each of the cities. To clear things up, we create a table containing the same information:
Detailed solution process
Incomplete enumeration
Here we use the nearest neighbor method to get a feasible solution:
Starting Point B: to A=10, to C = 10, to D = 5, to E = 6
Therefore we start with B -> D
Next up: Point D: to A=9, to C=8, to E=6
Our route until this point is B -> D -> E
Next up: Point E: to A=7, to C=9
Our final route therefore is B -> D -> E -> A -> C -> B with a total length of 5+6+7+8+10=36 units
This is a feasible solution, but (as we later find out) not optimal.
Implicit complete enumeration
Here we use the branch and bound method to get an optimized solution. This leads to a decision tree where each branch represents one possible way to continue the route from the "current" city (node). We evaluate the branches by looking at the lower bounds of each current route (see next paragraph), then we continue with the branch that has the lowest bound. The algorithm stops when we have found a valid solution and no other node in the decision tree has a lower bound than the solution we found.
We start by taking a look at the lower bound, which is the minimum distance the salesman has to go to leave each city and go to the nearest one. If he wants to leave city A, he has to walk at least 7 units, in which case he would walk to city E. If he wants to leave city B, he has to walk at least 5 units, which would lead him to city D. The same goes for C (8 units), D (5 units) and E (6 units). If we add up all these values, we know that the optimal solution is (7+5+8+5+6)=31 units AT LEAST. This does not mean this is the optimal solution, just that the lowest possible solution is guaranteed to be equal or greater than 31. We start our decision tree with this lower bound.
Next, we calculate the lower bounds if we fix one tour, in this case we want to take a look at the connections going out of city A:
To calculate the lower bounds of each connection, we can simplify the given table by striking off the column of the target city and the row of the city we came from. That is because we can’t go back to a city once we visited it and we can’t leave a city twice.
= 1
-> LB = 10 (distance from A to B) + 5 + 8 + 6 + 6 = 35
= 1
-> LB = 8 (distance from A to C) + 5 + 8 + 5 + 6 = 32
= 1 -> LB = 9 + 6 + 8 + 5 + 6 = 34
= 1 -> LB = 7 + 5 + 8 + 5 + 6 = 31
After this, our graph looks like this:
We continue our route through the path which has the shortest LB, in this case -A-E-. It doesn’t matter if A is the starting point or if the route goes over A then E or E then A, since the distance remains the same.
To find further short connections, we branch our known connection again:
As you can see, we can reduce the number of branches further each level, in this case to n-2=3. is forbidden (We can’t have short cycles in our final solution) as well as (We just went to city E and we can only go there once).
Using the given table (even further simplified) we continue to calculate the LB:
= 1
LB = 7 (A->E) + 10 (B->A) + 8 + 5 + 6 = 36
We strike off rows A and B because we already came from these two cities and also colums A and E because we already went to those cities. Then we strike off AB, because if we take route BA then we’re not allowed to go directly back (creating a short cycle).
= 1
LB = 7 + 5 + 8 + 8 + 6 = 34
Here not forgetting to chance D-B to blank actually changes the LB and falsifies the final result.
= 1 -> LB = 7 + 10 + 8 + 5 + 6 =36
Now our decision tree should look like this:
We continue the same method for all branches of connection -A-C-, because it has the lowest LB at the moment. As it’s just the same procedure as before, we skip directly to how the tree looks like after branching.
In the next level we don’t calculate LBs, but the real distance through all cities and back to the start since all required connections have been calculated.
Let’s branch connection -B-D- first:
Two options remain: We can go from C to B or to E
= 1 -> Route = A-C-B-D-E-A; D = 8+10+5+6+7 = 36
= 1 -> Route = A-C-E-B-D-A; D = 8+9+6+5+9 = 37
Next we branch connection -B-E-
The two options: Go from C to B or to D
= 1 -> Route = A-C-B-E-D-A; D = 8+10+6+6+9 = 39
= 1 -> Route = A-C-D-B-E-A; D = 8+8+5+6+7 = 34
Now we draw the tree again:
As we can see, the solution with a length of 34 is optimal, because the lower bounds of all other remaining connections are higher than the actual length of this connection, only connections AE+BD or AD could theoretically be of equal length, but since we’re already found the optimal solution, that’s not of interest.
Explicit complete enumeration
Using these methods requires a lot of computing power but guarantees the optimal solution, because every possible outcome is calculated.
Think of the TSP with only 2 cities:
The salesman could start in A, go to B and come back. Alternatively he could start in B, go to A and come back. The total number of possibilities would be n!=2, but since -A-B-A-B-… repeats itself over and over again, we have (n-1)! possibilies, in this case just 1.
Let’s expand the TSP to 3 cities:
Now he could take the routes A-B-C-A, A-C-B-A, B-A-C-B, B-C-A-B, C-A-B-C and C-B-A-C. The total number of possibilities is (n-1)! = (3-1)! = 2. Even though there are 6 solutions presented, they all come down to going clockwise or counter clockwise.
If we take a look at the TSP with four cities, it starts to get more complicated.
We have (4-1)! = 6 solutions, A-B-C-D-A, A-B-D-C-A, A-C-D-B-A, A-C-B-D-A, A-D-B-C-A and A-D-C-B-A.
As one can clearly see, the number of possibilities rises exponentially, so very rapidly. In our example with five cities there would be (5-1)! = 24 possibilities, which doesn’t seem that much, but take a TSP with just 10 cities, it’s going to take a while to calculate all 360,000 possible solutions.
This is the exact reason why explicit complete enumeration is the most reliable method of calculating the optimal solution, but also by far the slowest.
Sources
W.Domschke, A.Drexl, Einführung in Operations Research, 6th edition, 2005
G. Srinivasan, Operations Research: Principles and Applications