[Therion] Dataset structure including survex files

Wookey wookey at aleph1.co.uk
Tue Jun 27 03:59:51 CEST 2006

I have been trying to rationalise the Mulu dataset I first created in 2003,                              
to be able to process individual caves as well as the whole dataset.                                     
I have also been trying to make the 2003 data compatible with the 2005 data                              
which Andrew Atkinson created.                                                                           
Andrew has greatly improved the layout of the dataset over my initial efforts,                             
and has a hierarchical setup that works. However his dataset has used some                               
slightly different conventions to mine. By using his ideas I too have a                                  
hierarchical dataset that works (in part; see below), but merging data from
the two schemes doesn't work properly.                                                                   
I have been trying to understand exactly what is going on, and with a lot of                             
testing I have got some very peculiar results which make it clear to me that                             
I do not understand what Therion is doing, and why some things are                                       
happening, nor what the correct solution to the problem is.                                              
I'm afraid this is a very long post but I want to explain what I have done                               
so far and why, in what I hope is a clear way; not least because we have had                             
parts of this conversation before on the list and there has been some                                    

The dataset is available from svn://wookware.org/mulusurvey                                              
A tarball with the scans removed to keep it small is here:                                               
http://wookware.org/files/benarat.tgz (4MB)

I am using therion 0.3.10                                                                                 
Before I get into too much detail let me explain what we want to achieve:                                
 * centreline data stored in survex form. Processed to top-level .3d file                                
 * drawings stored in therion format                                                                     
 * ability to process the whole dataset to produce maps of everything                                    
 * ability to process individual caves to produce maps of them
 * ideally mixing scraps with therion-style and survex-style station naming syntaxes

Andrew's 2005 layout looks like this:

(example taken from api/Whiterock/)

There is one thconfig file for each cave:
top-level thconfig has:
 source Whiterock_only.th
Whiterock_only has:
 survey whiterock
  import Whiterock.3d -surveys use -filter whiterock.the_ashes_series
  input Whiterock.th

Whiterock.svx has:
 *begin Whiterock
  *include Api_Birthday.svx
 *end Whiterock

Whiterock.th has:
 input Api_Birthday.th
 input Api_Chamber.th
 <joins between caves>

Api_Birthday.th has:
 input Api_Birthday.th2
 map Api_BirthdayScPl1
 map Api_BirthdayScPl2
 joins if needed

stations notation is: 
 station -name api_birthday.16

thconfig in Api_Birthday dir has:
 source Api_Birthday_only.th
 exports ...
 layout benarat
  local changes to layout

Api_Birthday_only.th has:
 survey Api_Birthday
  import Api_Birthday.3d -surveys use
  input Api_Birthday.th

Api_Birthday.svx has:
 *begin Api_Birthday
 *end Api_Birthday

This scheme works nicely, but has the disadvantage that you have to generate a 3d file for each 
processable subdirectory - you cannot use the top-level 3d file that contains all the info.

The layout I have used is slightly different - the main differences being the station syntax 
and the import syntax.

Wook scheme:

top-level thconfig has:
 source benarat_only.th

benarat_only.th has:
 survey benarat
  import benarat.3d -surveys use
  input benarat.th
 endsurvey benarat

benarat.th has:
 input terikan/terikan.th
 input davids/davids.th
 input menagerie/menagerie.th
 <joins between caves>
 input davids.th2
  <davids scraps>
thconfig in davids dir:
 survey davids
  import ../benarat.3d -surveys use 
  input davids.th

stations (in davids.th2) are:
station -name davids.a15

thconfig in terikan subdir has:
 survey terikan
  import ../benarat.3d
  input terikan.th

stations are:
station -name 23@terikan


This scheme also works (or at least it did, but I now seem to have broken it
and have become hopelessly confused about how things work). It has the
advantage of allowing you to use one consistent top-level .3d data file, and
it seems to be possible to mix the different station notations that the 2003
(wook) and 2005 (andy) datasets use.

The thing I do not understand is how the survey hierarchy in 'import'ed .3d
files relates to the survey hierarchy in the therion files, and the
significance of the station@survey syntax versus the survey.station syntax.
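
To pin down the two notations I mean, here they are side by side as they would
appear inside a scrap in a .th2 file. This is only a sketch: the coordinates are
made up, and the station names are the ones quoted above.

```
# survex-style: survey prefix, a dot, then the station name
point 100.0 200.0 station -name davids.a15

# therion-style: station name, then @ and the survey it belongs to
point 150.0 220.0 station -name 23@terikan
```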

I started by reading the thbook entry:

Description: Reads survey data in different formats (currently processed
centreline in *.3d, *.plt, *.xyz formats). Survey stations may be referenced in scraps
etc. When importing Survex' 3D file, stations are inserted in survey hierarchy, if there
exists identical hierarchy to that in 3D file.
Syntax: import <file-name> [OPTIONS]
Context: survey / all (only with .3d files where survey structure is specified)
* filter   <prefix> -> if specified, only stations with given prefix and shots
   between them will be imported. Prefix will be removed from station names.
* surveys (create)/use/ignore -> specifies how to import survey structure
    (works only with .3d files).
   create -> split stations into subsurveys, if subsurveys do not exist, create them
   use    -> split stations into existing subsurveys
   ignore -> do not split stations into sub-surveys

This didn't really enlighten me enough to understand why some of my data
works and some doesn't. 

What does 'split stations into sub-surveys' actually mean (surveys in the
.th files or in the .3d file, or both?). Examples are needed here to make
this clear. Does the top level of the .3d file have to match the current
survey in the therion hierarchy or the next level down? How can you tell
whether the stations have 'fitted' into existing subsurveys or not?

And what happens if you don't specify any of create/use/ignore. Do you get a
fourth behaviour, or one of the above? This needs specifying.
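
To make the question concrete, these are the three forms as I read the thbook
syntax, using the top-level file from my layout. I am only illustrating the
syntax here; what each one actually does to the hierarchy is precisely what I
am unsure about.

```
# inside a survey block in a .th file; one of these at a time
import benarat.3d -surveys create
import benarat.3d -surveys use
import benarat.3d -surveys ignore
```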

In an attempt to work it out for myself I tried some things and found this:

If we have davids.a15 style names then we need -surveys use on the import for it to work.
If we have a15@davids style names then it works with or without -surveys use,
so we can mix these two styles with -surveys use. (confirmed)

Why is this? I don't understand what's going on.

I also noticed (see davids.th2) that you can do "-station-names davids." in
the scrap line, then station -name a15 to get davids.a15 style names. Is
there equivalent syntax for a15@davids style?
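
For reference, the prefix form I am describing looks roughly like this. The
scrap id and the coordinates are invented for illustration; only the
-station-names option and the station name come from davids.th2.

```
scrap davids_s1 -station-names davids.
  # referenced as davids.a15 because of the prefix given on the scrap line
  point 100.0 200.0 station -name a15
endscrap
```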

The big problem here is that I don't want to have to rename hundreds of
stations in existing datasets from therion-style <station>@<survey> syntax
to survex-style survey.station syntax unless I really have to. And not
mentioning the survey-name in the .th2 scrap definitions is not always an
option, because scraps often cross survey joins.

It seems to be possible to mix syntaxes, but it is tricky to get right, and
when it doesn't work it seems to be difficult to work out what you should do
to fix it. 

I have caves with various layouts. Some work and some don't:
Bluemoon, Moon_cave, davids and terikan
 * terikan is quite big with a load of subsurveys. Stations are therion-style.
 * bluemoon is all one survey 'fake.bluemoon'. Stations are just "-name 10"
   (no survey specified).
 * Moon_Cave is a 2005 survey, so it has survex-style names, except mainline_side,
   which I have changed to therion-style station names.
 * davids is a simple one-survey cave, with survex-style names, but using
   "-station-names davids." in the scrap.

Trying to process these individually I have found:

* terikan: 

1) with import ../benarat.3d, and import command outside survey terikan
works, with these warnings:
 therion: warning -- unable to import 25 stations outside survey
 therion: warning -- unable to import 43 shots outside survey

2) with import ../benarat.3d, and import command outside survey terikan _and_ inside survey terikan
works, no warnings. (this is a very odd result!). Having two instances of survey terikan is obviously 'wrong'
but it works - in fact it works better than above. 

3) with import ../benarat -survey use
 therion: warning -- unable to import 3895 stations outside survey
 therion: error -- faketerikan.th2 [95] -- survey does not exist -- fake -- invalid station reference -- 1@fake

4) with import ../benarat -survey create
 therion: error -- faketerikan.th2 [95] -- survey does not exist -- fake -- invalid station reference -- 1@fake

5) with import ../benarat -filter benarat, and import command outside survey terikan
works, with these warnings:
therion: warning -- unable to import 25 stations outside survey
therion: warning -- unable to import 43 shots outside survey

6) with import ../benarat -filter terikan, and import command outside survey terikan
 therion: warning -- unable to import 25 stations outside survey
 therion: warning -- unable to import 71 shots outside survey
 therion: error -- faketerikan.th2 [95] -- survey does not exist -- fake -- invalid station reference -- 1@fake

7) with import ../benarat -filter terikan, and import command inside survey terikan
works, no warnings.

From this lot I infer that 7) is the 'right' way to do it, but why does having two
levels of survey terikan (2) seem to work? And why doesn't 5) work - it seems to me that it should?
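
Written out in full, variant 7) is something like the following. The shorthand
in the list above leaves off the file extension, so the .3d here is my
assumption, and the endsurvey line is added to close the block.

```
survey terikan
  # only stations under the terikan prefix are taken from the .3d file
  import ../benarat.3d -filter terikan
  input terikan.th
endsurvey terikan
```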

And I have previously determined above that we need -surveys use to allow mixing of station styles. But I
can't get it to work with -surveys use. I tried
8)  import ../benarat -surveys use -filter terikan, and import command inside survey terikan
therion: error -- faketerikan.th2 [95] -- survey does not exist -- fake -- invalid station reference -- 1@fake

But I thought it would work. Why doesn't it?

* bluemoon 

I can only get this to work with import ../benarat.3d -surveys use -filter bluemoon, or 
import ../benarat.3d -filter bluemoon, 
Everything else fails. I don't understand why I get different results from those with terikan - it
may be something to do with the fact that the scraps have no survey names specified?

* davids

Works fine with -surveys use, doesn't work without it.

* Moon_Cave

I have never got this to work - whatever I do I always get:
therion: error -- Benerat_Mainline/Benerat_Mainline.th2 [396] -- survey does not exist -- mainline_side --
invalid station reference -- 52@mainline_side
therion: error -- Darkside/Darkside.th2 [158] -- invalid station reference -- darkside.13

* Overall benarat survey

The other aspect of this is processing the whole benarat dataset. I did have this working with both terikan
and davids (mixed station name syntaxes, which needed -surveys use), but I can't get it to do it now
(after a huge amount of chopping and changing things to write this mail). <aaargghh>

But I have never managed to get Moon_Cave to be included as well. 

Now I can get terikan and bluemoon (and menagerie) to work together - i.e. caves that have therion-style
names,  _or_ I can get davids to process (needs -surveys use), but not both together.

I tried looking at therion.logs to try and better understand what is going on. 
The logs look rather odd.

You can have survey terikan in terikan_only, _and_ survey terikan in the terikan.th
that it includes, and it still processes fine! therion.log shows an extra .terikan
on top of everything:
63@snailchamber.terikan.terikan
40@flatline.moon_cave.terikan
faketerikan_s13@terikan.terikan

If you remove the survey terikan in the .th files then it fails to process!
(either from the terikan dir (using terikan_only.th), or from the benarat dir above,
where benarat.th has survey terikan/endsurvey round the input
terikan/terikan.th line). This makes no sense!
The error is
 processing references ...
 therion: error -- faketerikan.th2 [95] -- survey does not exist -- fake -- invalid station reference -- 1@fake
where 1@fake is the first station referenced in terikan.th2 - via faketerikan.th2.
therion.log still has
63@snailchamber.terikan.terikan
but no scrap references, as it barfs before then.

The log files seem to show the stations that were not used, which can be a
clue, but we can't see the stations that _were_ used, which might actually
be more useful. 

Given how hard it has been for me to make sense of this, having spent some
10 hours or so getting to my current state of confusion, I think we need a
better way of debugging these sorts of errors, as well as a very clear
description of how it is supposed to work.

Apologies for the length of this - but I hope you can understand both what
we are trying to do, and the problems we are having in achieving it. 

Aleph One Ltd, Bottisham, CAMBRIDGE, CB5 9BA, UK  Tel +44 (0) 1223 811679
work: http://www.aleph1.co.uk/                 play: http://wookware.org/
