Suffix Arrays in JavaScript

Suffix arrays are a memory efficient data structure for storing the sorted order of the suffixes of a string. There is really nothing fancy; they are literally just an array of indexes into the string, where each index represents the start of a suffix.

I recently wrote an implementation (JavaScript) of Kärkkäinen’s and Sanders’s linear time algorithm to construct a suffix array as part of larger project and have made it available.

Using it is straightforward. If you have a string, then it works much as expected.

var s = "... put some stuff in here ...",
    a = suffixArray(s);

If you have anything more complicated, then you can instead pass a function that takes an index (integer >= 0) and returns a symbol (an integer) along with its length. For example,

var s = [ 0xBEEFCAB, ..., 0xFEEDF00B ],
    a = suffixArray(function(i) {
        return (s[i >> 2] >> ((i & 3) << 3)) & 255;
    }, s.length * 4);

Grab the code and use it, please.


Super Simple JavaScript Queue

One of the things I like about JavaScript is that it rewards funky coding techniques. Good JavaScript is short and sweet. Simple to use and fast to download. The search for simplicity lead me to create this simple implementation of a queue in Javascript.

function q() {
    var head, tail, undefined, f;
    return f = function(x) {
        if (x != undefined) {
            tail = tail ? tail.n = {v: x} : head = {v: x};
            x = f;
        } else {
            x = head ? head.v : undefined;
            head = head == tail ? (tail = undefined) : head.n;
        }
        return x;
    };
}

Calling the function q() gives you a function. Called without parameters and it dequeues and item from the queue. Call it with a single parameter and that is enqueued. When the queue is empty it returns undefined. For example,

function fail(s) { throw new Error(s) }
var a = q();
a() == undefined || fail();
a(1); a(2); a(3);
a() == 1 && a() == 2 && a() == 3 || fail();
a() == undefined || fail();
// Or...
a = q()(1)(2)(3);
a() == 1 && a() == 2 && a() == 3 && a() == undefined || fail();

Of course, being simple means it has a few problems. There is no way to peek at the value and no way to know the size of the queue. It also can’t store undefined as a value.

Here is a minified version (using UglifyJS):

function q(){var a,b,c,d;return d=function(e){e!=c?(b=b?b.n={v:e}:a={v:e},e=d):(e=a?a.v:c,a=a==b?b=c:a.n);return e}}

In case anyone mentions it, Array’s push() and shift() is not a replacement for a proper a queue, since shift() is an O(n) operation.


2D Convex Hulls: Chan’s Algorithm

This is the last in a series of 3 posts on 2D convex hull algorithms. Chan’s algorithm is interesting. It runs in O(n log h) time, where h is the number of points on the hull. It was not the first optimal output-sensitive convex hull algorithm, but it is the simplest. The algorithm uses the idea of the Jarvis March, but in a novel way, even mixing in the Graham Scan. You can grab the source code from github.

Chan’s Algorithm in a Nutshell

Chan’s paper is very clearly written and most students/enthusiasts shouldn’t have any problems with it. However, I will reiterate the basic idea of the algorithm here.

Essentially, the idea is this. You know there are m points on the hull, so you partition your point set into n/m chunks, each with approximately ~m points. You find the convex hull of each of these chunks in time O((n/m) · (m log m)) = O(n log m) time. We now find the convex hull of these convex hulls using an approach much like the Jarvis March. We start with some initial extreme point. Then, to find the next point on the convex hull (the furthest right point), we first find the point of tangency (on the “right” side) from this extreme point to each of the convex hulls separately. The next point is simply the furthest right point amongst the O(n/m) points of tangency. Finding a single tangent requires only O(log m) time (as it is just a bimodal search). Finding all of the tangents requires O((n/m) · (log m)) time and since we are only doing this for the m points on the hull, the total running time is O(n log m).

Of course, the obvious problem here is that we don’t know m ahead of time!

Finding m

It turns out we can find m rather quickly; enough that we don’t ruin our time-bound of O(n log m). The main idea is that if we assume the hull has, at most, m points, then we only need to try to wrap the hull until we reach m points. If we haven’t ended up where we started by this point, then we can just give up and guess higher while only wasting O(n log m) time. To keep the total number of guesses low enough, we set a parameter t = 1 and set m= 2^(2^t). After each guess that we are wrong, we increment t by 1, and recalculate m (eg. {m} = 4, 16, 256, 65536, …). Yes, m grows VERY fast. However, if you analyze the runing time, you can see the reason why. We can define the time complexity with the following recurrence,

T(n,t) = O(n log 2^(2^t)) + T(n, t-1).

Since n log 2^(2^(t-k)) = n 2^(t-k) log 2 = O(n (log m) / 2^k),

T(n,t) = O(n log 2^(2^t)) + T(n, t-1) = O(n log 2^(2^t)) + O(n log 2^(2^(t-1))) + … + O(n log 2^(2^(t-t))) = (1 + 1/2 + 1/4 + 1/8 + … + 1/2^t) * O(n log m) < 2 * O(n log m) = O(n log m).

The Implementation

The main loop is pretty straightforward. We simply have to keep updating our guess for the hull size (m). We then split the points into chunks of (roughly) size m and find the convex hull of these chunks using the Graham Scan.

def convex_hull(pts):
    """Returns the points on the convex hull of pts in CCW order."""
    for m in (2 ** (2 ** t) for t in xrange(len(pts))):
        hulls = [_graham_scan(pts[i:i + m]) for i in xrange(0, len(pts), m)]
        # more code goes here...

The Graham Scan is just a copy from my previous post.

TURN_LEFT, TURN_RIGHT, TURN_NONE = (1, -1, 0)

def turn(p, q, r):
    """Returns -1, 0, 1 if p,q,r forms a right, straight, or left turn."""
    return cmp((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]), 0)

def _keep_left(hull, r):
    while len(hull) > 1 and turn(hull[-2], hull[-1], r) != TURN_LEFT:
            hull.pop()
    return (not len(hull) or hull[-1] != r) and hull.append(r) or hull

def _graham_scan(points):
    """Returns points on convex hull of an array of points in CCW order."""
    points.sort()
    lh = reduce(_keep_left, points, [])
    uh = reduce(_keep_left, reversed(points), [])
    return lh.extend(uh[i] for i in xrange(1, len(uh) - 1)) or lh

The next major step is finding an extreme point. We can’t just use min(pts), since, while this will give us an extreme point, we need to know which of the chunks this point is in. So, we create a function to return the hull, along with the point (actually, we return the indices instead). This is pretty much as you’d expect.

def _min_hull_pt_pair(hulls):
    """Returns the hull, point index pair that is minimal."""
    h, p = 0, 0
    for i in xrange(len(hulls)):
        j = min(xrange(len(hulls[i])), key=lambda j: hulls[i][j])
        if hulls[i][j] < hulls[h][p]:
            h, p = i, j
    return (h, p)

It takes a list of lists of points as a single parameter and returns a tuple of 2 indices (ints) into the list of lists of points.

After this we need a way to find the next point in the hull. This requires us to find the tangents from the extreme point to each of the n/m hulls. We’ll delay discussion of the implementation of the tangent finding algorithm for now, and instead use it as a blackbox in another function that simply returns the next point on the hull.

def _rtangent(hull, p):
    """Return the index of the point in hull that the right tangent line from p
    to hull touches.
    """
    # Code goes here...

def _next_hull_pt_pair(hulls, pair):
    """
    Returns the (hull, point) index pair of the next point in the convex
    hull.
    """
    p = hulls[pair[0]][pair[1]]
    # Finding the next point for current hull is a little easier.
    next = (pair[0], (pair[1] + 1) % len(hulls[pair[0]]))
    for h in (i for i in xrange(len(hulls)) if i != pair[0]):
        # Find the right tangent to hull h from the point p.
        s = _rtangent(hulls[h], p)
        # Now figure out if this is further right then our previous guess (next).
        q, r = hulls[next[0]][next[1]], hulls[h][s]
        t = turn(p, q, r)
        if t == TURN_RIGHT or t == TURN_NONE and _dist(p, r) > _dist(p, q):
            next = (h, s)
            # Notice that we are a little more careful with collinear points.
    return next

This function takes a list of lists of points and a pair of indices to the previous hull point. It returns another pair of indices of the next hull point. Now, with these, we can turn back to our original convex hull function and fill it out.

def convex_hull(pts):
    """Returns the points on the convex hull of pts in CCW order."""
    for m in (2 ** (2 ** t) for t in xrange(len(pts))):
        hulls = [_graham_scan(pts[i:i + m]) for i in xrange(0, len(pts), m)]
        # Here we find the extreme point and initialize our hull with it.
        hull = [_min_hull_pt_pair(hulls)]
        # We must ensure we stop after no more than m iterations.
        for _ in xrange(m):
            p = _next_hull_pt_pair(hulls, hull[-1])
            if p == hull[0]:
                return [hulls[h][i] for h, i in hull]
            hull.append(p)

Finding Tangents

Now, of course we aren’t quite done yet. We skipped over the implementation of the O(log n) tangent finding algorithm (n is the size hull). This is actually the most difficult part of the implementation. It is skipped in the paper, but is a good exercise to try on your own first.

Note that this implementation relies heavily on the fact that we are working with a convex hull that is wound counterclockwise.

The general structure is similar to binary search on the convex hull. The difficulty is deciding which half contains the right tangent after we split it. So, start by breaking the problem down into cases. We have a sub-list of the original hull and we set c to the median point of this list. Either the tangent is the c, or it is in the left half, or it is in the right half.

Checking if it is c is easy, as both the points before and after c will be to the left of the line defined by p and c (p is the point we wish to find the tangent from).

Checking if the tangent is in the the left side of the chain is a little tougher. However, there are really only 3 cases we need to check for. I’ve diddled this little drawing that helps illustrate these.

3 cases where the tangent is on the left chain.

The 3 cases where the tangent is on the left chain.

Putting this in code form isn’t too hard. Most importantly, we need to find the orientation of points before and after c (l), relative to p, c (p, l).

def _rtangent(hull, p):
    """Return the index of the point in hull that the right tangent line from p
    to hull touches.
    """
    l, r = 0, len(hull)
    l_prev = turn(p, hull[0], hull[-1])
    l_next = turn(p, hull[0], hull[(l + 1) % r])
    while l < r:
        c = (l + r) / 2
        c_prev = turn(p, hull1, hull[(c - 1) % len(hull)])
        c_next = turn(p, hull1, hull[(c + 1) % len(hull)])
        c_side = turn(p, hull[l], hull1)
        if c_prev != TURN_RIGHT and c_next != TURN_RIGHT:
            return c
        elif c_side == TURN_LEFT and (l_next == TURN_RIGHT or
                                      l_prev == l_next) or \
                c_side == TURN_RIGHT and c_prev == TURN_RIGHT:
            r = c               # Tangent touches left chain
        else:
            l = c + 1           # Tangent touches right chain
            l_prev = -c_next    # Switch sides
            l_next = turn(p, hull[l], hull[(l + 1) % len(hull)])
    return l

And that’s basically it. You can get the source code at github (as a gist).


Stretch: A jQuery Plugin

I created a jQuery plugin to resize text so that it fills up the full width of its container. It does this in a 2 step process:

  1. It finds the largest font size (aligned on a pixel) that can be contained by the container without overflow;
  2. It finds the maximum amount of word-spacing that can be added without overflow.

There is a similar plugin (TextFill), however it doesn’t allow the text to be taller than the initial height, requires a maximum font size to be given and is non-optimal (performs a O(n) search, n is the font-size of the final chosen size). My plug-in remedies these problems. It does not consider the height during the resizing, always fills to the maximum width (font will be whatever size gets to this width), and runs in O(log n) time.

Using it is simple. For example, to stretch the heading of this blog to the full width of its container, we can simply run:

$("#header h1").contents().stretch();

I also use it to stretch the dates beside the blog posts so the year, month, and day all have equal width.

$(".post-date").find(".year, .month, .day").stretch();

The key here is that whatever you are stretching should be an inline element or text. You can grab the stretch plugin from its GitHub repository or its project page on jQuery.com.


Easy Multilingual Theming with XDV

I use Plone at work (and love it). Lately I’ve started getting into using XDV to theme my sites (3 so far). It’s fantastic. Our latest site was put up in a week (from start to finish) using Plone, collective.xdv and some team work. Anyways, we only had the English version up initially, while we waited on the translations. When they finally came through, I first plugged it all into Plone (using LinguaPlone), but now I had to update the theme to support the French site. Support for the :lang() CSS pseudo selector is lacking in some browsers, so I contemplated for a few minutes, then added 2 rules to my rules.xml file:

<drop theme="//*[@lang='fr']" if-content="/html[@lang!='fr']" />;
<drop theme="//*[@lang='en']" if-content="/html[@lang!='en']" />;

Problem solved! To handle language-specific CSS, I just created 2 new css files, one for English and one for French and added a lang attribute in the link element.

<link lang="en" href="style-en.css" rel="stylesheet" type="text/css" />
<link lang="fr" href="style-fr.css" rel="stylesheet" type="text/css" />

I did the same thing for static text and images as well.


2D Convex Hulls: Graham Scan

This is the 2nd post in a series of 3 on 2D convex hull algorithms. The first covered the Jarvis March and here I’ll be covering the Graham Scan. The implementation of the Graham Scan is short, but sweet. It handles degenerate cases very well. The next post will cover Chan’s algorithm.

Graham Scan

Although it may not look it at first glance, the Graham Scan is similar to the Jarvis March. We start with an extreme point and then find the edges of the hull iteratively, one at a time. The main difference is that by sorting the points first, we remove the need to spend O(n) time searching for the next hull point (though at the cost of making some “mistakes”). Instead, we spend our effort on updating the hull at each iteration, fixing previous mistakes. It turns out that fixing the mistakes is pretty easy and we can find the entire hull in only O(n) comparisons (rather then the O(nh) of the Jarvis March). The time complexity of the algorithm actually comes from having to first sort the points in O(n log n) time.

The original incantation of the Graham Scan sorted the points in angular order relative to some initial extreme point (eg. sorted in CCW order around the furthest left point). The convex hull is then constructed iteratively by going through the sorted list of points, one by one, each time adding the point to the previous hull. However, adding a point to the previous hull in the Graham Scan is not as simple as it was in the Jarvis March. In the Jarvis March, we knew our previous hull points were a subset of our final hull points, so to “update” the previous hull, we simply add the point found at each iteration to the end of our list of hull points. However, in the Graham Scan, things are not so simple.

Updating the Hull

The hull we have at the start of iteration i is actually the complete convex hull of the first i-1 points in our sorted list. Updating the hull requires us to find the tangents between the previous hull and point pi, and replacing all hull edges between these 2 tangents with the tangents themselves. This may seem hard at first, but because the points were sorted in CCW order around the first point, p1, it is actually quite simple.

We know the point pi lies to the left of p1pi-1 and all other points pk, 1 < k < i-1, lie to the right of p1pi-1, so one of the tangents will have to be the edge pip1. Knowing this, we can find the other point of tangency by starting at pi-1 (the last point in the list of hull points) and working backwards. The point of tangency is the first point pj, j<i, where pj-1, pj, pi form a left (CCW) turn. After finding the point pj, we simply remove all points between this and the other point of tangency, p1 (ie. we simply remove all points after pj in our hull list). The point pi is then appended to the end of our list of hull points.

In Python, we can implement a function to perform this step (adding a point r to the hull) fairly simply: we just keep popping the last item off our list while hull[-2], hull[-1], r don’t form a left turn.

TURN_LEFT, TURN_RIGHT, TURN_NONE = (1, -1, 0)

def turn(p, q, r):
    return cmp((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]), 0)

def _keep_left(hull, r):
    while len(hull) > 1 and turn(hull[-2], hull[-1], r) != TURN_LEFT:
        hull.pop()
    hull.append(r)
    return hull

This function is incredibly simple. Also, since we insist on left turns only, it also handles the degenerate case where you have collinear points (ie. TURN_NONE). However, we run into problems if r already exists in the hull. If you insist on using sets only, then each point can only appear once and you can safely ignore this case, but handling multisets is incredibly easy. Since the points will be added in sorted order, if a point is already in the hull, then it will be at the very end. To ensure we don’t add a point to the hull twice, we simply check the point at the end.

def _keep_left(hull, r):
    while len(hull) > 1 and turn(hull[-2], hull[-1], r) != TURN_LEFT:
        hull.pop()
    # We check that hull[-1] != r to handle degenerate cases (ie. multisets)
    if not hull or hull[-1] != r:
    hull.append(r)
    return hull

Sorting the Points

Putting it together, in Python, we could simply sort the list by angular order, using turn (wrapped by functools.partial()) as the cmp function, then iterate through this list adding points one at a time to the hull with _keep_left. However, the Graham Scan can actually get away with sorting by lexicographical order only, instead of angular order. This is not only easier to implement, but it is also faster in a practical sense (though not asymptotic). Pretend we added a point at (0, ∞) to the point set. Then sorting by angular order about this point is equivalent to simply sorting by the x-coordinate. The convex hull of this modified point set is actually just the lower hull of our original point set. We can use this idea to split the algorithm into 3 parts. First we sort the points lexicographically (ie. first by x-coordiante, then by the y-coordiante) and find the lower hull. Then we reverse this sorted list and find the upper hull. Finally, we merge the upper and lower hull together. Doing this in Python, using the _keep_left function, is actually really straightforward. We simply have to keep in mind that the first point of the lower hull is also the last point of the upper hull and vice versa. To deal with this, we remove the first and last point from the upper hull before we merge the 2.

def convex_hull(points):
    """Returns points on convex hull of an array of points in CCW order."""
    points = sorted(points)
    l = reduce(_keep_left, points, [])
    u = reduce(_keep_left, reversed(points), [])
    # We don't include the first or last point when extending l.
    l.extend(u[i] for i in xrange(1, len(u) - 1))
    return l

What’s nice about this is that in each iteration we only backtrack through the list, starting at the last point, by as many points as we will remove +1 more point pj (which is where we stop). After we remove these points, they will never be looked at again, since they are no longer in the hull. This means, if we account for the +1 point we don’t remove, we can never look at more than 2n hull points during the entire run of the Graham Scan to find the tangents. Since we only go through the sorted list once, we require only O(n) time to find the convex hull, but O(n log n) time to sort the list first.

I posted the full source for the Graham Scan in Python (all 20 lines) as a Gist.


2D Convex Hulls: Jarvis March

I’ve found myself coding convex hull algorithms on a few occasions now, so I decided to implement a few and talk about them here, in case someone new to the subject wants to get the quick ‘n’ dirty. The algorithms I will talk about are the Jarvis March (code), the Graham Scan (code) and Chan’s algorithm (code). I feel they are all different enough that each is worth looking at, but similar enough that they are worth looking at together. I will try and focus more on the actual implementation of the algorithms (all in Python), looking at potential pit falls as well as the niceties, rather then just rehashing what can be found on Wikipedia. This first post looks at the Jarvis March.

Jarvis March

The Jarvis March is probably the simplest convex hull algorithm conceptually. You start with an extreme point p (a vertex of the convex hull) of a point set. You then find the next point on the convex hull of the point set in CCW order. This is done by finding the “furthest right” point, relative to p. The furthest right point q, relative to p, is the point such all other points in the point set lie to the left of the directed line through p, q (this line lies on the hull boundary). We can then update p to q and repeat the process until we end up with the point we started at.

p, q, r forming a left turn

The heart of the algorithm really lies in finding the furthest right point q, relative to an extreme point p. Say you have 3 points p, q and r. We say p, q, r form a left (respectively right) turn if r lies to the left (right) of the directed line through p and q. We can write a simple function to determine the turn of 3 points:

TURN_LEFT, TURN_RIGHT, TURN_NONE = (1, -1, 0)

def turn(p, q, r):
    """Returns -1, 0, 1 if p,q,r forms a right, straight, or left turn."""
    return cmp((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]), 0)

Finding the furthest right point relative to p just reduces to simply finding the minimum point q using turn with a fixed parameter p as our comparison function. This can is done in O(n) time:

def _next_hull_pt(points, p):
    """Returns the next point on the convex hull in CCW from p."""
    q = points[0] != p and points[0] or points[1]
    for r in (x for x in points if x != p):
        if turn(p, q, r) == TURN_RIGHT:
            q = r
    return q

p in between q & r

The above function is simple, but it assumes the points are in general position; that there are no 3 collinear points. The first problem arises if there are 2 furthest right points, then we could chose the one that is closer to p, which is not an extreme point (ie. it lies on the hull boundary, but is not a vertex). This becomes a bigger problem if our first comparison relative to this “mid-point” is with the 2 vertices on either side of it. We could possibly “skip” over the furthest right point, since it is not to the right of the furthest left, but collinear and would end up with the incorrect furthest right point. Luckily, handling this is rather simple; we insist that if there is more than 1 furthest right point, we choose the one furthest from p. We can then rewrite our function, handling this case:

def _dist(p, q):
    """Returns the squared Euclidean distance between p and q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    return dx * dx + dy * dy

def _next_hull_pt(points, p):
    """Returns the next point on the convex hull in CCW from p."""
    q = p
    for r in points:
        t = turn(p, q, r)
        if t == TURN_RIGHT or t == TURN_NONE and _dist(p, r) > _dist(p, q):
            q = r
    return q

The only problem now is finding the first extreme point. This can be done rather simply by choosing the minimum point in lexicographical order. We can then put this together and simply loop until we end up where we started:

def convex_hull(points):
    """Returns the points on the convex hull of points in CCW order."""
    hull = [min(points)]
    for p in hull:
        q = _next_hull_pt(points, p)
        if q != hull[0]:
            hull.append(q)
    return hull

We will end up going through h iterations of the loop, where h is the number of points on the convex hull. Each iteration takes O(n) time to find the furthest right point, so the total time required is O(nh), which is suboptimal. However, it is important to remember that suboptimal does not mean it is useless. The Jarvis March can easily handle very large datasets in memory constrained environments. The only information you need to maintain between iterations is the first and last point found. Finding the next point scans through all points only once; order does not matter. These points could come from a database cursor, for instance. In this case, your function would return an iterator of the hull edges (or points) that generates them on demand, each time next() is called.

You can also download the complete version of the code. Next post I’ll cover the Graham Scan.


Permutations and Combinations

Quite often while programming I find my self going through all possible k-combinations of a list. That is, all possible unordered sets of size k made up of elements of a list. A few times I have needed to iterate through all possible permutations of a list. Getting tired of writing nested for-loops, I wrote a small (~40 SLOC) Python module to handle combinations (and permutations) of lists. It defines two simple iterators, combinations and permutations to iterate through k-combinations and permutations of lists.

Combinations:

If you’ve taken an intro to programming course, you’ve probably coded something similar to this:

def sort(lst, cmp=cmp):
    """Sort the list lst."""
    for i in xrange(len(lst)):
        for j in xrange(i + 1, len(lst)):
            if lst[j] < lst[i]:
                lst[j], lst[i] = lst[i], lst[j]

This will sort, in-place, the list lst in $latex T(n) = \binom{n}{2} = \frac{n(n-1)}{2} = O(n^2)&s=-2$ comparisons (ie. not very efficient). However, it illustrates the concepts of combinations. We are actually iterating over all 2-combinations of the set {0,1,..,n-1} (albeit, in a particular order). In this case, we are using the combinations as indices into the list. Now, obviously there are better ways to sort a list, but, though I hate to admit it, I will sometimes brute-force my way to a solution using similar methods (mostly for prototyping or testing more efficient methods for accuracy). For 2-combinations you can just use a loop as above, but what happens when you are going 3 or 4 nested loops deep? Hiding ugly code is never a good thing, but if you gotta do it, you might as well not waste several levels of indentation.

Using the module, we can rewrite the above sorting function as a single for-loop (obviously this still does not change the $latex O(n^2)&s=-2$ time complexity).

from permute import combinations
def sort(lst, cmp=cmp):
    """Sort list lst."""
    for i,j in combinations(range(len(lst)), 2):
        if cmp(a[j], a[i]) < 0:
            a[i], a[j] = a[j], a[i]

The function has a couple guarantees. First, the type returned by the iterator’s next function is the same as the type of the list. Second, the function combinations guarantees the order passed to the type constructor will be the same as if you had written k nested for-loops as above… In other words:

x = range(k)
t = type(lst)
for x[0] in xrange(n):
    for x[1] in xrange(x[0] + 1, n):
        ...
        for x[k-1] in xrange(x[k-2] + 1, n):
            v = t(lst[x[i]] for i in xrange(k))
            ...

Is equivalent to:

from permute import combinations
for v in combinations(lst, k):
    ...

Permutations:

The permutations iterator, as you may have guessed, iterates over all permutations of a list (in lexicographical order). Using this, we can solve the Travelling Salesman Problem:

from permute import *

def solve_tsp(weights):
    nodes = list(reduce(lambda s, e: s.update(e) or s, weights.keys(), set()))
    for i,j in combinations(nodes, 2):
        if (i,j) not in weights and (j,i) not in weights:
            weights[(i,j)] = -1
    path, min = [], -1
    for cand in permutations(nodes):
        dist = 0
        for e in ((cand[i-1], cand[i]) for i in xrange(1, len(nodes))):
            d = e in weights and weights[e] or weights[(e[1], e[0])]
            if d < 0:
                dist = -1
                break
            else:
                dist += d
            if dist >= 0 and (min < 0 or dist < min):
                path, min = cand, dist
    return path, min

solve_tsp takes 1 argument, a map of edges to their corresponding weights and returns a 2-tuple of the shortest path and its total distance. The “graph” (the weight map) is assumed to be undirected. For example,

print(solve_tsp({(1,2): 1, (1,3): 2, (1,4): 3, (2,3): 3, (3,4):1}))

Gives us the shortest path and its total distance: ([2, 1, 3, 4], 4). Going through all permutations ($latex n!&s=-2$) isn’t very practical. Once $latex n&s=-2$ gets larger then you can count on your hands it is really too slow to be of use.

You can grab the permute module from its github repository.