Palindromes in The Gettysburg Address and Shakespeare's collected works — A programming puzzle

By Christian Stigen Larsen
Posted 03 Mar 2011 — updated 23 Feb 2016

Some years ago, I solved some programming puzzles posted by Greplin (later Cue, then acquired by Apple).

The task was to find the longest palindrome—words or sentences that read the same forwards and backwards—in Abraham Lincoln's Gettysburg Address.

Of course, the code had to be as fast and elegant as possible. (Correct submissions lead to recruitment talks. I did talk with them, but as a foreigner I did not have any US worker's visa, and they were too small to have any time and money to spend on getting me one.)

Anyway, there are many algorithms for doing this. Mine is just based on a brute force scan of the text. I see I claim that the running time is linear on the average input and quadratic in the worst case. Update: Today I would have used the word in practice instead of on average, because there really is nothing fancy about this code (except that it's a bit cute).

/*
 * Find the longest palindrome in the text.
 *
 * This is Greplin's first challenge, and I originally solved it in Python.
 *
 * This algorithm is linear on the average input, but has a quadratic
 * worst case running time.  There exists an even better algorithm, but
 * this should do.
 *
 * There might also be a small error below, but you get the general idea.
 *
 * Christian Stigen Larsen
 */

#include <stdio.h>

static const char text[] =
  "Fourscoreandsevenyearsagoourfaathersbroughtforthonthiscontainentanewnati"
  "onconceivedinzLibertyanddedicatedtothepropositionthatallmenarecreatedequ"
  "alNowweareengagedinagreahtcivilwartestingwhetherthatnaptionoranynartions"
  "oconceivedandsodedicatedcanlongendureWeareqmetonagreatbattlefiemldoftzha"
  "twarWehavecometodedicpateaportionofthatfieldasafinalrestingplaceforthose"
  "whoheregavetheirlivesthatthatnationmightliveItisaltogetherfangandpropert"
  "hatweshoulddothisButinalargersensewecannotdedicatewecannotconsecrateweca"
  "nnothallowthisgroundThebravelmenlivinganddeadwhostruggledherehaveconsecr"
  "ateditfaraboveourpoorponwertoaddordetractTgheworldadswfilllittlenotlenor"
  "longrememberwhatwesayherebutitcanneverforgetwhattheydidhereItisforusthel"
  "ivingrathertobededicatedheretotheulnfinishedworkwhichtheywhofoughthereha"
  "vethusfarsonoblyadvancedItisratherforustobeherededicatedtothegreattdafsk"
  "remainingbeforeusthatfromthesehonoreddeadwetakeincreaseddevotiontothatca"
  "useforwhichtheygavethelastpfullmeasureofdevotionthatweherehighlyresolvet"
  "hatthesedeadshallnothavediedinvainthatthisnationunsderGodshallhaveanewbi"
  "rthoffreedomandthatgovernmentofthepeoplebythepeopleforthepeopleshallnotp"
  "erishfromtheearth";

void print_best(const char* l, const char* r)
{
  static int best = 2;

  if ( r-l > best )
    printf("%.*s\n", (best=r-l), l);
}

int main()
{
  const char *l, *r;

  for ( const char *p = text + 1; *p; ++p ) {
    l = p; r = p - 1;
    while ( *--l == *++r );

    print_best(l+1, r);

    l = r = p;
    while ( *--l == *++r );

    print_best(l+1, r);
  }

  return 0;
}

The output is

$ gcc -O3 -W -Wall gettysburg.c -ogetty && time ./getty
eve
ranynar

real  0m0.004s
user  0m0.001s
sys   0m0.002s

So the longest palindrome in The Gettysburg Address is ranynar.

The above code is quite fast as well. It runs through the complete works of Shakespeare in about 0.03 seconds of wall clock time, or a processing rate of 128 Mb/second, and grows linearly with the input. It has a worst case running time which is quadratic, though; if you feed it a string of same characters, for instance.

To find the palindromes in Shakespeare's collected works, I first modified the code to load the text from disk. I changed print_best to print all palindromes equal to or longer than the current best, so we get a longer list of palindromes. I also had to prepare Shakespeare's collected works into a format suitable for processing: First I removed the name of which character is speaking (done by sed), then I converted all text to lowercase and deleted all non-alphanumeric characters.

Since Shakespeare wrote his plays more than 300 years before copyright law was invented, you can download and use it freely. I got mine off a site with all the stuff in different directories. Here's what I did to prepare them, all in one jolly line of UNIX goodness:

$ find shakespeare/ -type f \        # for all files
    | xargs sed 's/^[A-Za-z]*//g' \  # only keep alphabetical chars
    | tr A-Z a-z \                   # translate to lowercase
    | tr -dC a-z \                   # remove anything BUT alphabetical chars
    > shakespeare.txt                # output to one big file

I compiled my code and ran it on the file:

$ llvm-g++ -O4 -flto pali.cpp -o pali
$ cat shakespeare.txt | time ./pali
ll
ll
ll
ama
lonanol
tomymot
withtiw
iwerewi
erewere
tarorat
rownwor
sieveis
tomymot
imadami
madammadam
ereherehere
reherehereher
hereherehereh
illitmadamtilli
madammadammadam
madammadammadam

real    0m0.026s
user    0m0.022s
sys     0m0.004s

The longest palindromes appear in the lines:

GLOUCESTER

So will it madam till I lie with you.

from the play Richard III. The preparsing of the collected works above includes the character's name. Without it, the output is a bit different:

$ find shakespeare/ -type f \
    | xargs cat \
    | sed -e 's/^[A-Za-z]*//g' \
    | tr A-Z a-z \
    | tr -dC a-z > collected-works-no-names.txt

$ cat collected-works-no-names.txt | ./pali
ama
lonanol
nymedemyn
ereherehere
reherehereher
illitmadamtilli

The second last character-by-character palindrome comes from the tragedy Troilus and Cressida:

PANDARUS

Hark! they are coming from the field: shall we stand up here, and see them as they pass toward Ilium? good niece, do, sweet niece Cressida.

CRESSIDA

At your pleasure.

PANDARUS

Here, here, here's an excellent place; here we may see most bravely: I'll tell you them all by their names as they pass by; but mark Troilus above the rest.

You can see it ignores a punctuation mark, which I think is fine.

There are probably other, much better algorithms for finding palindromes. If you're interested in generating palindromes, you should check out Peter Norvig's supposed world record palindrome.