survex to therion conversion script

Wookey wookey at aleph1.co.uk
Wed Jan 12 03:54:53 CET 2005


A couple of people asked about this, so here it is under a sensible title so
people can find it in the archives.

Below is a Perl script, written by Olly, which does a reasonable job of
converting .svx files to .th files.

I have noticed a few problems, which I might as well document here, as it is
as good a place as any. (I was going to tidy them up and send them to Ol, as
I was given the script for 'testing'.)

*calibrate declination comes out wrong. It should be:
*calibrate declination n ->  declination n degrees
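
A minimal sketch of that rewrite as a standalone Perl helper. The name
convert_declination is hypothetical and is not part of svx2th. It assumes the
value is given in degrees, and it does not check whether Survex and Therion
use the same sign convention for declination, so treat the output as something
to review rather than trust.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of svx2th): rewrite a Survex
# "*calibrate declination <n>" line as "declination <n> degrees".
# Assumes the value is in degrees; the sign is copied through unchanged.
# Any other line is returned untouched.
sub convert_declination {
    my ($line) = @_;
    $line =~ s/^(\s*)\*\s*calibrate\s+declination\s+([-+]?(?:\d+\.?\d*|\.\d+))/${1}declination $2 degrees/i;
    return $line;
}

print convert_declination("*calibrate declination 2.5\n");
```

In svx2th itself this substitution would need to run before the final
's/^(\s*)\*/$1/' step, which otherwise just strips the leading '*'.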

Therion doesn't understand 'ignoreall', so it needs commenting out.

It doesn't try to deal with the different 'team' syntax, but it could be fixed
to get most of them right (lower-case them all, 'pics'->'pictures',
'disto'->'length #disto', etc.).
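
A rough sketch of how the 'team' lines might be handled. The helper name
convert_team is hypothetical, and the line layout it accepts
(*team "<name>" <role> [<role> ...]) is an assumption: the Survex format is
unspecified, so real datasets will contain lines this does not match, and those
are returned unchanged. Only the two role mappings mentioned above are
included.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of svx2th). Assumes the layout
#   *team "<name>" <role> [<role> ...]
# Roles are lower-cased; 'pics' becomes 'pictures' and 'disto' becomes
# 'length' with the original word kept as a trailing '#disto' comment.
sub convert_team {
    my ($line) = @_;
    return $line
        unless $line =~ /^(\s*)\*\s*team\s+("[^"]*")\s*(.*?)\s*$/i;
    my ($indent, $name, $roles) = ($1, $2, $3);
    my (@out, @notes);
    for my $role (split ' ', lc $roles) {
        if ($role eq 'pics') {
            push @out, 'pictures';
        } elsif ($role eq 'disto') {
            push @out, 'length';
            push @notes, 'disto';    # remember the original role name
        } else {
            push @out, $role;        # passed through; Therion may still reject it
        }
    }
    my $result = join ' ', "${indent}team", $name, @out;
    $result .= ' #' . join(' ', @notes) if @notes;
    return "$result\n";
}

print convert_team(qq{*team "A Caver" Compass Pics Disto\n});
```

Lines that do not match the assumed layout would still need commenting out,
as the script does at present.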

The only significant problem was with 'overview' files, which include a number
of others:
* All the equates need enclosing in 'centreline'/'endcentreline'.
* The converter reads in any 'included' files and inserts them. It stopped
after doing two of these and truncated the file. It should probably just
convert '*include foo' to 'input foo.th'.
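
The '*include foo' -> 'input foo.th' idea could look something like this. The
helper name convert_include is hypothetical. Like the script's own regex, it
does not cope with filenames containing spaces. It also drops anything after
the filename on the line, and it does not reproduce the implicit *begin that a
Survex *include carries.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of svx2th): turn
#   *include foo        or   *include "sub\bar.svx"
# into
#   input foo.th        or   input sub/bar.th
# instead of reading the included file and inserting its contents.
sub convert_include {
    my ($line) = @_;
    if ($line =~ /^(\s*)\*\s*include\s+"?([^"\s]+)"?/i) {
        my ($indent, $file) = ($1, $2);
        $file =~ s!\\!/!g;       # Unix path separators, as svx2th already does
        $file =~ s/\.svx$//i;    # drop a .svx extension if one was given
        return "${indent}input $file.th\n";
    }
    return $line;                # not an *include line: leave it alone
}

print convert_include("*include foo\n");
```

Each included file then has to be converted separately, for example with a
directory loop.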

It once inserted a top-level 'dummy' survey where one wasn't needed. This is
a feature of converting files individually rather than as a coherent dataset,
I suspect. It could probably do a better job if it 'spidered' the dataset
from a top-level overview file, but that's a lot more work  :-)

I found one case where the ';' remained as the comment char instead of
getting converted to '#'. I need to look at that or send Ol the offending
file.

Then you can do this to convert a whole directory (assuming the script is
saved as an executable called svx2th on your PATH):
for FILE in *.svx; do
    FILE="${FILE%.svx}"
    echo "Converting $FILE.svx"
    svx2th "$FILE.svx" > "$FILE.th"
done

#!/usr/bin/perl -w
use strict;
# svx2th v0.1
# Copyright (C) Olly Betts 2004

sub convert_file($);

my $in_survey = 0;
my $had_fix = 0;
print "encoding iso8859-1\n";
for my $filename (@ARGV) {
    convert_file($filename);
}
if ($in_survey) {
    print "endsurvey dummy\n";
}

sub convert_file($) {
    my $filename = shift;
    open F, "<", $filename or die "$filename: $!\n";
    my @lines = <F>;
    close F;
    my $in_centre_line = 0;
    my $lineno = 0;
    my $dummy_survey = -1;
    foreach $_ (@lines) {
	++$lineno;
	# Replace ; with # as comment separator.
	# FIXME won't cope with ; in a filename or *title
	s/;/#/;

	# Comment out "*export" and "*entrance" as there seems to be no
	# equivalent of either.
	if (/^\s*\*\s*(?:export|entrance)\b/i) {
	    print "#$_";
	    next;
	}

	if ($in_centre_line) {
	    if (/^\s*\*/ && !/^\s*\*\s*(?:date|calibrate|fix|equate|data|instrument|units|sd|infer|flags|team)\b/i) {
		print "endcentreline\n";
		$in_centre_line = 0;
	    }
	} else {
	    if (/^\s*[^\s*#]/ || /^\s*\*\s*(?:date|calibrate|fix|equate|data|instrument|units|sd|infer|flags|team)\b/i) {
		if (!$in_survey) {
		    # Therion can't handle these outside a centreline which
		    # must be inside a survey, so we have to add a dummy
		    # top-level survey.
		    print "survey dummy -title \"Therion is crap\"\n";
		    ++$in_survey;
		}
		print "centreline\n";
		$in_centre_line = 1;
	    }
	}

	# *begin <survey> -> survey <survey>
	if (s/^(\s*)\*(\s*)begin\b(\s*)(\S+)/$1$2survey$3$4 -title "$4" /i) {
	    ++$in_survey;
	    print $_;
	    next;
	}
	# *begin -> <nothing>
	# The *begin will cause an endcentreline / centreline pair to be
	# output which hopefully prevents settings from escaping.  However
	# this doesn't restore the old settings, just undoes any new ones.
	# FIXME: Need to address this somehow...
	if (/^(\s*)\*(\s*)begin\b[ \t]*$/i) {
	    ++$in_survey;
	    # $dummy_survey is -1 when no unnamed *begin is currently open.
	    if ($dummy_survey != -1) {
		# FIXME: doesn't cope with nested *begin with no arguments...
		die "This convertor doesn't currently handle nested *begin with no arguments\n";
	    }
	    $dummy_survey = $in_survey;
            print "#$_";
	    next;
	}
	# *end [<survey>] -> endsurvey [<survey>]
	if (s/^(\s*)\*(\s*)end\b/$1$2endsurvey/i) {
	    if ($dummy_survey == $in_survey) {
		$_ = "#$_";
		$dummy_survey = -1;
	    }
	    --$in_survey;
	    if ($in_centre_line) {
		print "endcentreline\n";
		$in_centre_line = 0;
	    }
	    print $_;
	    next;
	}
	# *title <title> -> # -title <title>
	# FIXME: just comment out for now - should really convert to -title on
	# the "survey" line.
	if (s/^(\s*)\*\s*title\b\s*/$1# -title /i) {
	    print $_;
	    next;
	}
	# *team and *instrument format is unspecified in Survex, and they're
	# just informational so comment them out for now...
	# NB therion seems to be case sensitive so "Compass" isn't a valid role
	# ("compass" is)...
	# NB in *team pics -> pictures
	if (s/^(\s*)\*(\s*(?:team|instrument))\b/#$1$2/i) {
	    print $_;
	    next;
	}
	# *include -> literal text inclusion.
	# Note that the *include means an implicit *begin, but the output
	# may not reflect this correctly (since we can't handle *begin
	# with no survey name anyway...)
	if (/^\s*\*\s*include\s*"?([^"\s]*)/i) {
	    # Use Unix path separators (/ not \) - Survex understands either on
	    # either platform.
	    my $filename = $1;
	    $filename =~ s!\\!/!g;
	    $filename .= '.svx' unless $filename =~ /\.svx$/i;
	    convert_file($filename);
	    next;
	}
	# survey.subsurvey.12 -> 12 at subsurvey.survey
	if (s/^(\s*)\*(\s*equate)\b/$1$2/i) {
	    # Ensure that a comment separator doesn't get eaten by station name.
	    s/(\S)#/$1 #/;
	    s/(\S+)\.(\S+)/"$2\@".join(".",reverse split m!\.!, $1)/ge;
	    print $_;
	    next;
	}
	if (/^\s*\*\s*fix\b/i) {
	    $had_fix = 1;
	}
	s/^(\s*)\*/$1/;
	print;
    }
}

Wookey
--
Aleph One Ltd, Bottisham, CAMBRIDGE, CB5 9BA, UK Tel +44 (0) 1223 811679
work: http://www.aleph1.co.uk/ play: http://www.chaos.org.uk/~wookey/




