jumble – awk portability

Programming

jumble – awk portability

An exercise was posted over at ProgrammingPraxis called ‘Jumble’.

The aim is to re-arrange a collection of jumbled letters into a real word. A few solutions were posted on the blog (including the original solution in Scheme which is quite nice). I included a solution in gawk in the comments, this came in at a pleasing 18LOC, but relied on the gawk only function asort.

$ cat jumble.gawk
#! /usr/bin/gawk -f
# jumble.gawk - solves a jumbled word puzzle
# usage: jumble [-v puzzle="XXX"] [path/to/dictionary]

BEGIN {
	if( puzzle == "" ) puzzle = "LTEADE"
	split(puzzle, puzzle_arr, "")
	puzzle_len = asort(puzzle_arr)
}

{
	if( length($0) != puzzle_len ) next
	split($0, test_arr, "")
	asort(test_arr)
	for( i = 1 ; i <= puzzle_len ; i++ )
		if( tolower(test_arr[i]) != tolower(puzzle_arr[i]) ) next
	print FILENAME ": " $0
}

$ ./jumble.gawk /usr/share/dict/*
/usr/share/dict/american-english: elated
/usr/share/dict/british: elated
/usr/share/dict/british-english: elated
/usr/share/dict/catala: delate
/usr/share/dict/catalan: delate
/usr/share/dict/german: adelte
/usr/share/dict/german: dealte
/usr/share/dict/german: tadele
/usr/share/dict/ngerman: adelte
/usr/share/dict/ngerman: dealte
/usr/share/dict/ngerman: tadele
/usr/share/dict/usa: elated
/usr/share/dict/words: elated

What is the LOC penalty for making this more portable? My goal was to run this with mawk as I’ve got it installed (and hopefully there’ll be some performance gains). Because mawk doesn’t have a built in array sorting function, we’ll need to bring our own. Handily, The AWK Programming Language (Aho, Kerninghan, Weinberger) defines a quicksort algorithm in Chapter 7.

The AWK Programming Language

The below is the program re-written to be more portable. Note; I’ve given the gawk only program the .gawk extension, the portable solution gets .awk.

$ cat jumble.awk
#! /usr/bin/awk -f
# jumble - solves a jumbled word puzzle
# usage: jumble [-v puzzle="XXX"] [path/to/dictionary]

BEGIN {
	if( puzzle == "" ) puzzle = "LTEADE"
	puzzle_len = split(puzzle, puzzle_arr, "")
	quicksort(puzzle_arr, 1, puzzle_len)
}

{
	if( length($0) != puzzle_len ) next
	test_len = split($0, test_arr, "")
	quicksort(test_arr, 1, test_len)
	for( i = 1 ; i <= puzzle_len ; i++ )
		if( tolower(test_arr[i]) != tolower(puzzle_arr[i]) ) next
	print FILENAME ": " $0
}

function quicksort( A, left, right,	i, last ){
	if( left >= right ) return
	swap(A, left, left + int((right - left + 1) * rand()) )
	last = left
	for( i = left + 1; i <= right; i++ )
		if( A[i] < A[left] )
			swap(A, ++last, i)
	swap(A, left, last)
	quicksort(A, left, last - 1)
	quicksort(A, last + 1, right)
}

function swap(A, i, j,	t){
	t = A[i]; A[i] = A[j]; A[j] = t
}

15 additional lines allows us to use mawk. Any performance gains?

$ time mawk -f jumble.awk -v word="LTAEDE" /usr/share/dict/*
-- SNIP --
real	0m0.862s
user	0m0.847s
sys	0m0.014s

$ time gawk -f jumble.gawk -v word="LTAEDE" /usr/share/dict/*
-- SNIP -- 
real	0m0.963s
user	0m0.958s
sys	0m0.004s

Nice one.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.