XLWrap is a spreadsheet-to-RDF wrapper which is capable of transforming spreadsheets to arbitrary RDF graphs based on a mapping specification. It supports Microsoft Excel and OpenDocument spreadsheets such as comma- (and tab-) separated value (CSV) files and it can load local files or download remote files via HTTP.
Consider, we would like to make data from this spreadsheet available via a SPARQL endpoint. The spreadsheet contains revenues from a company represented in a cross table by country, year, and products. We want to provide the generated RDF data according to the SCOVO vocabulary.
These are the only three steps we need to do:
[xlwrap-folder]/mappings
$ bin/server
(with -p
for a port other than :8900) That's it!
You can use Snorql (known from D2R-Server)
to browse the wrapped dataset or use the SPARQL endpoint at http://localhost:8900/sparql.
Here is the complete target graph for the given example (alternatively in RDF/XML syntax).
Additional examples are provided as part of the Mapping Design Patterns collection.
A mapping is provided as part of an RDF file in TriG syntax. TriG is very similar to N3 and allows the representation of multiple (named) RDF graphs in a single file. An XLWrap mapping file must contain exactly one graph that contains exactly one RDF instance of xl:Mapping (typically as part of the default graph). Other graphs in the same file will be used to denote template graphs.
Example:
{ # default graph in TriG file [] a xl:Mapping ; xl:template [ xl:fileName "file:example.xls" ; xl:sheetName "foo" ; xl:templateGraph :Foo ; xl:transform [ ... ] ] . } :Foo { rdfs:label "A2 & ' ' & B2"^^xl:Expr ; ... }
XLWrap mappings are based on template graphs, which may contain XLWrap expressions including cell references similar to a typical spreadsheet application such as Microsoft Excel or OpenOffice Calc. Template graphs are repeatedly applied on a work sheet (or other work sheets) in order to produce the target graph. Depending on the representation of the information stored in the spreadsheet, which may be flat tables or cross tables over multiple sheets and files, each template graph is moved across sheets and subsequently applied for different combinations of cells. How template graphs are moved is specified by transform operations.
The XLWrap mapping vocabulary can be found here: http://purl.org/NET/xlwrap#.
An xl:Mapping consists of one or more map templates, which are denoted by the property xl:template. Example:
{ # default graph (mapping specification) [] a xl:Mapping ; xl:template [ ... ] ; xl:template [ ... ] ; ... }
Each map template (value of xl:template
) consists of the following components:
file:
xl:fileName "http://example.org/path/foo.xls" xl:fileName "file:foo.ods" xl:fileName "foo.ods" xl:fileName "../path/file.csv"
#1
must be specified.
Examples:xl:sheetName "Sheet 1" xl:sheetNumber "4"^^xsd:int xl:sheetNumber "4" # also allowed as plain literal
{ [] a xl:Mapping ; ... xl:constantGraph :ANamedGraph ... } :ANamedGraph { ex:dataset dc:creator "Michael Jackson" ; dc:date "NOW()"^^xl:Expr ; rdfs:seeAlso "FILENAME(A1)"^^xl:Expr ; }
{ [] a xl:Mapping ; ... xl:templateGraph :ANamedGraph ... } :ANamedGraph { [ xl:uri "'http://example.org/' & URLENCODE(A2 & B2)"^^xl:Expr ] a foaf:Person ; foaf:name "A2 & ' ' & B2"^^xl:Expr ; foaf:mbox_sha1sum "SHA(C2)"^^xl:Expr ; }
xl:transform [ a rdf:Seq ; rdf:_1 [ a xl:RowShift ; xl:breakCondition "ALLEMPTY(*.*)" ; ] ; rdf:_2 [ a xl:SheetShift ; xl:repeat "3" ; ]
The following transform operations (sub-classes of xl:Transformation) are currently available:
[ a xl:RowShift ; xl:steps "2" ; xl:repeat "2000" ; xl:restriction "Sheet1.*" ; xl:breakCondition "ALLEMPTY(*.*)" ; ]
[ a xl:ColShift ; xl:steps "5" ; xl:repeat "1000" ; xl:restriction "A3:N20;A1;'Sheet 2'.*" ; xl:breakCondition "A1 == 'foo'" ; ]
[ a xl:SheetShift ; xl:steps "1" ; xl:repeat "10" ; xl:restriction "A1;A4:A6;B9" ; xl:breakCondition "INT(LEFT(SHEETNAME(A1), 4)) <= 2007" ; ]
[ a xl:SheetRepeat ; xl:restriction "*.*" ; xl:breakCondition "A1 == 'foo'" ; xl:sheetNames "Sheet 1, Sheet 2, 'Sheet,, 3', Sheet 4" ; ]
[ a xl:FileRepeat ; xl:restriction "*.*" ; xl:breakCondition "A1 <= 10" ; xl:fileNames "http://example.com/foo.xls, file:localfile.csv" ; ]
For each of these operations, an optional range restriction (property xl:restriction) can be specified,
which restricts the operation to a multi-range, i.e. only range references within the restriction are transformed (default is any range: *.*
).
Furthermore, an optional break condition can be specified as an XLWrap Expression with the xl:breakCondition property (default condition expression: false, i.e. "never break until end of file or specified times repeated"). The condition is evaluated after a template graph is transformed and if it evaluates to true, the transform operation is skipped and XLWrap continues with the next stage of the following transform operation.
For shift operations it is possible to specify the steps (in terms of columns/rows/sheets
to shift) with the property xl:steps (default is 1
) and the number of
times the operation should be repeated with the property xl:repeat
(default is the maximum integer value of the runtime system).
In case of the repeat operations it is required to specify a list of sheet names with the property xl:sheetNames or file names, respectively, with xl:fileNames. They are specified as a comma-separated list in a plain literal.
XLWrap expressions are used:
A4 == 'foo'
)The basic elements of XLWrap expressions are:
A3
, Sheet1.A3:Sheet3.B9
, etc.)+
and -
for negative numbers: e.g. -3
or -(3+9)
%
to denote percentage of numbers: e.g. 40-20%
or SUM(A3:A9)%
!
(logical not) for boolean values or sub expressions: e.g. !(A3 == 'foo')
&
(e.g. 'foo' & A4
)+
, -
, *
, /
, ^
(e.g. 5*4 + 2*(9+3)
<
, <=
, ==
or =
, !=
or <>
, >=
, >
(e.g. A3 > 5*4
)&&
, ||
(A1 + 4) * (A3 + 10)
The complete grammar for XLWrap expressions is available as part of the distribution.
Expressions may contain range references in order to obtain values from spreadsheets. The common syntax for range references is:
( ( [filename] "#$" )? [sheet1] "." )? [column1] [row1] ( ":" ( [sheet2] "." )? [column2] [row2] )?
File and sheet names have to be quoted if they contain spaces or special characters. Please note the optional parts inside of the brackets. There are actually different kinds of ranges references:
( ( [filename] "#$" )? [sheet1] "." )? [column1] [row1]
A4 file:foo.xls#$Sheet1.A4 'Sheet 1'.Z9
":" ( [sheet2] "." )? [column2] [row2]
A4:C9 file:foo.xls#$Sheet1.A4:Sheet3.B6
( [filename] "#$" )? [sheet] ".*"
Sheet1.* http://example.org/file.xls#$'sheet x'.*
*.*
A3; Sheet1.A3:B20; file:foo.xls#$Sheet1.*
Furthermore, it is possible to specify sheets by numbers instead of names with a #
-prefix: e.g. #1.A3
refers to A3 on the first sheet of the current workbook (the current workbook
is usually the base workbook specified for the map template, however, as a consequence of transform operations this can also be another workbook file).
Depending on the situation, specific kinds of ranges may be valid or not. For instance, specifying a box range as a single expression is not
valid: A3:A9
is invalid. However, it is possible to specify a box range as an argument of the function SUM()
:
SUM(A3:A9; A10; 34; 10)
.
In order to construct URI nodes, a blank node is used with the special property xl:uri
, specifying the URI of the node. For example:
[ xl:uri "'http://example.org/' & URLENCODE(A2 & B2)"^^xl:Expr ] a foaf:Person .
will create a statement like <http://example.org/Tim+Smith> a foaf:Person
at runtime.
Similarly, it is possible to link anonymous resources (blank nodes):
[] a ex:Revenue ; scv:dimension [ xl:id "A4"^^xl:Expr ; a ex:Product ; rdfs:label "A4"^^xl:Expr ] .
The XLWrap processor will ensure, that all resources with equal xl:id
values will get equal blank node identifiers in the target graph. This
feature is very important for more sophisticated mappings. It is also used for the SCOVO example above.
The function library of XLWrap will be continuously extended. We will provide detailed documentation of functions in future. For the meanwhile, please have a look at the source code of the core functions.
It is also possible to add custom functions. Each function is implemented by a Java class. The name of the
class must adhere to the following pattern: E_Func[NAME]
where NAME
is the name of the function
as it is used as part of expressions (e.g. NOW()
is implemented by the class E_FuncNOW
). The location on the classpath is not important, you can
place it anywhere.
E_FuncYOURFUNCTION
by extending the abstract class at.jku.xlwrap.map.expr.func.XLExprFunction
.
Apart from defining a default constructor (without arguments), the only method you need to implement is this one from the
interface at.jku.xlwrap.map.expr.XLExpr
:public XLExprValue<?> eval(ExecutionContext context) throws XLWrapException, XLWrapEOFException;
getArg(int i)
..eval(context)
for any argument you wish to process. For example eval()
of E_FuncTRIM
looks like this:@Override public XLExprValue<?> eval(ExecutionContext context) throws XLWrapException, XLWrapEOFException { XLExprValue<?> v1 = getArg(0).eval(context); if (v1 == null) return null; else return new E_String(TypeCast.toString(v1).trim()); }
FunctionRegistry.registerPackage(String pkg)
somewhere in your code, where pkg
is the package name where E_FuncYOURFUNCTION
is located.
In the next release it will be possible to dynamically load function libraries with the property xl:functionLib specified
for the xl:Mapping instance.XLWrap can be used either in-process via the Jena API or as a SPARQL endpoint via the bundled XLWrap-Server.
The relevant classes to look for are:
MapParser
, which can be used to parse TriG files into instances of XLWrapMapping
, which is the representation used internally by XLWrap.XLWrapMaterializer
, which can be used to dump spreadsheets to RDF models.Example:
XLWrapMapping map = MappingParser.parse("mappings/iswc09-example-scovo.trig"); XLWrapMaterializer mat = new XLWrapMaterializer(); Model m = mat.generateModel(map); m.write(new FileOutputStream("docs/website/example/revenues-scovo.n3"), "N3");
The materializer returns an ordinary Jena model. If you want to provide your own Jena model (possibly an inference model), you can explicitly provide a target model:
Model m = mat.generateModel(map, targetModel);
The materializer instance can be reused multiple times for different processes. It will keep any workbooks in memory for faster processing of multiple mappings with common spreadsheet files (especially when they have been downloaded from the Web).
Just start the server from the XLWrap directory: $ bin/server
and place mapping files into the folder mappings
. The command takes the
following arguments:
-?,--help help -B,--dataset-base <uri-prefix> Dataset base URI prefix for Pubby (see http://www4.wiwiss.fu-berlin.de/pubby/) -c,--config <port> Joseki config file -D,--desc <text> Dataset description -d,--data <port> RDF data (.n3, .ttl, .rdf, .xml) -H,--homepage <url> Homepage -h,--host <hostname> hostname (if it cannot be correctly determined by the JVM) -p,--port <port> port (default: 8900) -T,--title <text> Dataset title
If no argument is specified, the shell script automatically adds "-c joseki-xlwrap.ttl" to the call. The options -c
and -p
are probably the most important ones. Some meta data displayed at the web frontend and used by
Pubby can be specified also.
Now open your browser and go to http://localhost:8900 (please adjust the port if needed).
Useful Functions to provide Linked Data
XLWrap provides useful functions for mappings: HOSTNAME()
and PORT()
that can be used to obtain the current hostname and port
the server is running on. Another important function is MAKEURI()
, which can be used to create local URIs in order to make the resources accessible
via the integrated Linked Data Browser Pubby. Call MAKEURI()
with either one or two String
arguments. In the first case, the argument is URL-encoded, in the second case, the first argument is appended as-is and the second String argument is URL-encoded.
Additionally, we will provide functions in future to help users creating external links to other Linked Data sources such as DBpedia. We will integrate several record linkage algorithms and geo-mapping functions to create links to DBpedia, geonames.org, etc.
Sources: https://github.com/theandyl/xlwrap
Downloads: http://sourceforge.net/projects/xlwrap/files/xlwrap/
XLWrap has been developed because there previously was no spreadsheet-to-RDF wrapper that supported cross tables. The following related projects are all based on a row-by-row wrapping process and plain text CSV files only:
Andreas Langegger, Wolfram Wöß (2009): XLWrap – Querying and Integrating Arbitrary Spreadsheets with SPARQL. In Proceedings of the 8th International Semantic Web Conference (ISWC2009), Washington D.C. LNCS 5823, Springer, 2009.
Andreas Langegger, Wolfram Wöß (2009): Querying and Semantically Integrating Spreadsheet Collections with XLWrap-Server – Use Cases and Mapping Design Patterns. Poster & Demo Session at the 8th International Semantic Web Conference (ISWC2009), Washington D.C., 2009.
Please use the XLWrap Users mailing list to get support.
If you need professional support in your project, please contact me directly.
Contact: aka AndyL
Many thanks to Richard Cyganiak for contributing bug fixes and very good ideas for improvements and new features.
XLWrap has been developed as another wrapper for the Semantic Web Integrator and Query Engine (SemWIQ).
Thanks to SourceForge.net for providing the infrastructure.