Parsing incoming Word documents
Joel Palmius
2008-02-12 12:04:06 UTC
A large part of my work with surveys currently consists of converting a
mailed Word document into a survey file, which is both boring and
time-consuming. To avoid copying it by hand, I use the program "antiword"
to convert the document into a text file, and then I add the survey
syntax around the text.

I have now grown tired of this, so I've written a quick hack that converts
the typical layout of an antiword-converted Word document into a survey
file. For example, the following file:

I like pie
true ... false

Pie should contain fruit
* apple
* banana
* strawberry

And this ingredient
- Sugar
- Cinnamon
- Rye

I'll call the pie recipe
_____

gets converted into:

<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>
<!DOCTYPE SURVEY PUBLIC "-//Joel Palmius//DTD Survey markup definition//EN" "http://www.modsurvey.org/mod_survey/survey-3.2.5.dtd">
<SURVEY TITLE="Autogenerated">
<LICKERT NAME="var1" CAPTION="I like pie" LEFTTAG="true" RIGHTTAG="false" STEPS="3" />
<CHOICE NAME="var2" CAPTION="Pie should contain fruit">
<CHOICEELEMENT VALUE="1" CAPTION="apple" />
<CHOICEELEMENT VALUE="2" CAPTION="banana" />
<CHOICEELEMENT VALUE="3" CAPTION="strawberry" />
</CHOICE>
<LIST NAME="var3" CAPTION="And this ingredient">
<LISTELEMENT CAPTION="Sugar" />
<LISTELEMENT CAPTION="Cinnamon" />
<LISTELEMENT CAPTION="Rye" />
</LIST>
<TEXT NAME="var4" CAPTION="I'll call the pie recipe" />
</SURVEY>

Lines that the script cannot interpret are included as COMMENT tags. I
hope this will significantly shorten my pipeline from mailed Word document
to finished survey file.
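To make the layout rules in the example concrete, here is a minimal Python
sketch of such a converter. It is NOT docinterpreter.pl (a Perl script whose
source isn't reproduced here); the rules are inferred only from the example
above. The function names (split_blocks, convert_block, convert), the fixed
STEPS="3" for a "left ... right" line, and the <COMMENT CAPTION="..." /> form
of the fallback tag are all assumptions made for illustration.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical sketch: rules inferred from the example above, not taken
# from docinterpreter.pl itself.

HEADER = [
    '<?xml version="1.0" encoding="iso-8859-1" standalone="no"?>',
    '<!DOCTYPE SURVEY PUBLIC "-//Joel Palmius//DTD Survey markup definition//EN" '
    '"http://www.modsurvey.org/mod_survey/survey-3.2.5.dtd">',
    '<SURVEY TITLE="Autogenerated">',
]


def attr(value):
    """Quote a string as a double-quoted XML attribute value."""
    return '"' + escape(value, {'"': "&quot;"}) + '"'


def split_blocks(text):
    """Split text into blocks of non-empty, stripped lines separated by blank lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def convert_block(lines, name):
    """Return the XML for one question block, or None if no rule matches."""
    caption, rest = lines[0], lines[1:]
    head = f"NAME={attr(name)} CAPTION={attr(caption)}"
    if len(rest) == 1:
        scale = re.fullmatch(r"(.+?)\s*\.\.\.\s*(.+)", rest[0])
        if scale:
            # Assumption: a "left ... right" line always maps to a 3-step scale.
            return (f'<LICKERT {head} LEFTTAG={attr(scale.group(1))} '
                    f'RIGHTTAG={attr(scale.group(2))} STEPS="3" />')
        if re.fullmatch(r"_+", rest[0]):
            # A line of underscores marks a free-text answer.
            return f"<TEXT {head} />"
    if rest and all(line.startswith("* ") for line in rest):
        # "* " items become numbered choice alternatives.
        items = [f'<CHOICEELEMENT VALUE="{i}" CAPTION={attr(line[2:])} />'
                 for i, line in enumerate(rest, start=1)]
        return "\n".join([f"<CHOICE {head}>", *items, "</CHOICE>"])
    if rest and all(line.startswith("- ") for line in rest):
        # "- " items become list elements.
        items = [f"<LISTELEMENT CAPTION={attr(line[2:])} />" for line in rest]
        return "\n".join([f"<LIST {head}>", *items, "</LIST>"])
    return None


def convert(text):
    """Convert antiword-style text into a survey XML string."""
    out = list(HEADER)
    count = 0
    for block in split_blocks(text):
        xml = convert_block(block, f"var{count + 1}")
        if xml is None:
            # Assumption: unmatched lines are kept as COMMENT tags with a CAPTION.
            out.extend(f"<COMMENT CAPTION={attr(line)} />" for line in block)
        else:
            count += 1
            out.append(xml)
    out.append("</SURVEY>")
    return "\n".join(out)
```

The design treats each blank-line-separated block as one question: the first
line becomes the CAPTION and the shape of the remaining lines picks the tag.
To run it as a stdin-to-stdout filter the way the Perl script is used, import
sys and call sys.stdout.write(convert(sys.stdin.read())).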

The script is currently in svn with the name "docinterpreter.pl". (You can
thus download it at http://www.modsurvey.org/svn/branches/32x/mod_survey/docinterpreter.pl )

To use it, run the following:

cat [crude text file] | perl docinterpreter.pl > [survey file]

// Joel
Sent by Joel Palmius <***@miun.se>
to survey-discussion
