Reification in the Virtuoso Sponger

Note: Some of the underlying implementation of reification is still in flux.

What is Reification?

Reification is a useful level of abstraction in which individual triples are modeled as resources in their own right, so that the triples themselves can be described and annotated.
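In the standard RDF reification vocabulary, a triple is represented by a resource of type rdf:Statement that points at the triple's parts through rdf:subject, rdf:predicate and rdf:object. The minimal Python sketch below assumes only that vocabulary and reifies one triple from the sample input into N-Triples. The statement IRI `http://example.org/stmt/1` is a hypothetical placeholder; the Sponger's own naming of statement nodes may differ.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def iri(value: str) -> str:
    """Wrap a bare IRI in angle brackets for N-Triples output."""
    return f"<{value}>"


def reify(stmt: str, s: str, p: str, o: str) -> list[str]:
    """Return N-Triples lines describing the triple (s, p, o) as resource `stmt`.

    All four arguments are IRIs; literal objects are not handled in this sketch.
    """
    node = iri(stmt)
    return [
        f"{node} {iri(RDF + 'type')} {iri(RDF + 'Statement')} .",
        f"{node} {iri(RDF + 'subject')} {iri(s)} .",
        f"{node} {iri(RDF + 'predicate')} {iri(p)} .",
        f"{node} {iri(RDF + 'object')} {iri(o)} .",
    ]


# Hypothetical statement IRI; the triple itself comes from the sample input.
for line in reify(
    "http://example.org/stmt/1",
    "http://example.org/person/Mark_Twain",
    "http://example.org/relation/author",
    "http://example.org/books/Huckleberry_Finn",
):
    print(line)
```

The original triple is not asserted by these four lines; they only describe it, which is what makes it possible to attach further annotations to the statement node.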

A typical use is provenance: when sponging a given resource, many components of the Virtuoso Sponger (its cartridges) can contribute triples, so it is useful to be able to trace which cartridge is responsible for each one.
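Once each triple is a resource, its origin can be recorded alongside it. The following Python sketch models reified statements as records with a hypothetical `cartridge` field and answers "which cartridge contributed this triple?". The record layout, the function names and the second cartridge name are illustrative assumptions, not the Sponger's internal data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A reified triple plus the (hypothetical) cartridge that produced it."""
    subject: str
    predicate: str
    obj: str
    cartridge: str


def contributors(statements, triple):
    """Return the sorted names of every cartridge that produced `triple`."""
    return sorted({st.cartridge for st in statements
                   if (st.subject, st.predicate, st.obj) == triple})


def by_cartridge(statements):
    """Group the plain (s, p, o) triples by the cartridge responsible for them."""
    groups = defaultdict(list)
    for st in statements:
        groups[st.cartridge].append((st.subject, st.predicate, st.obj))
    return dict(groups)


# Illustrative data: the same title triple reported by two cartridges.
stmts = [
    Statement("doc", "dc:title", "Turtle test", "HTML+Variants"),
    Statement("doc", "dc:title", "Turtle test", "other-cartridge"),
    Statement("ex:Mark_Twain", "ex:author", "ex:Huckleberry_Finn", "Turtle Meta-cartridge"),
]
print(contributors(stmts, ("doc", "dc:title", "Turtle test")))
print(by_cartridge(stmts))
```

Keeping the cartridge on the statement rather than on the plain triple is what lets two cartridges report the same triple without their contributions being merged.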

Data Islands

In addition to the datasource-specific cartridges, the HTML+Variants extractor cartridge identifies several ways of embedding RDF data in HTML, which we term data islands.
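To make the idea of a data island concrete, here is a minimal Python sketch that scans HTML with the standard-library `html.parser` and lists, in document order, the elements that look like islands: elements carrying RDFa attributes, and script elements with an RDF media type. The attribute set, the media types and the island names are illustrative assumptions, not the HTML+Variants cartridge's actual detection logic.

```python
from html.parser import HTMLParser

# Illustrative choices only; the real cartridge may recognize other islands.
RDFA_ATTRS = {"property", "typeof", "about", "resource"}
SCRIPT_TYPES = {"text/turtle": "TTL-script", "application/ld+json": "JSON-LD-script"}


class IslandFinder(HTMLParser):
    """Record, in document order, each start tag that looks like a data island."""

    def __init__(self):
        super().__init__()
        self.islands = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # A valueless type attribute arrives as None, hence the `or ""`.
            media_type = (dict(attrs).get("type") or "").lower()
            kind = SCRIPT_TYPES.get(media_type)
            if kind:
                self.islands.append(kind)
        elif {name for name, _ in attrs} & RDFA_ATTRS:
            self.islands.append("RDFa")


def find_islands(html: str) -> list[str]:
    """Return the island kinds found in `html`, in document order."""
    finder = IslandFinder()
    finder.feed(html)
    finder.close()
    return finder.islands


print(find_islands('<title property="dc:title" content="x">t</title>'
                   '<script type="text/turtle"><a> <b> <c> .</script>'))
```

Script bodies are passed through as raw text by `html.parser`, so Turtle inside a script never confuses the tag scan.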

Additionally, if it is installed, the Turtle Meta-cartridge detects Turtle embedded in the text of any "content" triple, such as titles, descriptions, and social media post bodies.
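How the Turtle Meta-cartridge decides that a text value contains Turtle is not described here. As a stand-in, the following Python sketch uses a crude, hypothetical heuristic: it looks for at least one statement of the form `<iri> <iri> <iri-or-"literal"> .` in the text. A real detector would more likely attempt a proper Turtle parse; prefixed names, blank nodes and escaped literals are ignored here.

```python
import re

# Hypothetical heuristic: one full-IRI subject and predicate, an IRI or
# simple quoted literal as object, then a '.' or ';' terminator.
TRIPLE_RE = re.compile(
    r'<[^<>\s]+>\s+<[^<>\s]+>\s+(?:<[^<>\s]+>|"[^"\\]*")\s*[.;]'
)


def looks_like_turtle(text: str) -> bool:
    """Return True if `text` appears to embed at least one Turtle triple."""
    return TRIPLE_RE.search(text) is not None


post = ('Reading tonight: <http://example.org/person/Mark_Twain> '
        '<http://example.org/relation/author> '
        '<http://example.org/books/Huckleberry_Finn> .')
print(looks_like_turtle(post))
print(looks_like_turtle("Just a plain social media post."))
```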

Configuration

The HTML+Variants extractor cartridge takes a handful of options that configure which data islands contribute triples:

Sample Input

Let us assume a very simple input HTML document, as follows:


<html 
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <title property="dc:title" content="Turtle test">Turtle-in-script test</title>
    <script type="text/turtle">
    <![CDATA[
    <http://example.org/person/Mark_Twain>
      <http://example.org/relation/author>
      <http://example.org/books/Huckleberry_Finn> ;
      <http://xmlns.com/foaf/0.1/name> "Mark Twain" .
    ]]>
    </script>
  </head>
  <body>
    <h1>Testing Turtle in scripts</h1>
    Stuff
    <hr />
  </body>
</html>

As we can see, this contains one RDFa statement (the dc:title attribute on the <title> element) and two Turtle triples in a <script type="text/turtle"> element.
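The Turtle block uses the `;` shorthand, so a single subject yields two triples, which is consistent with the two TTL-script statements in the sponged output. The minimal Python sketch below expands only the tiny subset of Turtle used in this sample (full IRIs, simple string literals, `;` and `.`); it is an illustration under that assumption, not a general Turtle parser and not the cartridge's implementation.

```python
import re

# Tokens for the sample's Turtle subset: full IRIs, plain string literals
# (no escapes), and the ';' and '.' separators.
TOKEN = re.compile(r'<[^>]*>|"[^"]*"|[;.]')


def parse_simple_turtle(text: str) -> list[tuple[str, str, str]]:
    """Expand the sample's Turtle into individual (s, p, o) triples.

    ';' keeps the current subject for the next predicate/object pair;
    '.' ends the subject. Prefixes, blank nodes and ',' lists are not
    supported, and malformed input raises ValueError from the unpacking.
    """
    triples = []
    subject = None
    pending = []
    for tok in TOKEN.findall(text):
        if tok in (";", "."):
            if subject is None:
                subject, predicate, obj = pending
            else:
                predicate, obj = pending
            triples.append((subject, predicate, obj))
            pending = []
            if tok == ".":
                subject = None
        else:
            pending.append(tok)
    return triples


SAMPLE = """
<http://example.org/person/Mark_Twain>
    <http://example.org/relation/author>
    <http://example.org/books/Huckleberry_Finn> ;
    <http://xmlns.com/foaf/0.1/name> "Mark Twain" .
"""

for n, triple in enumerate(parse_simple_turtle(SAMPLE), start=1):
    print(n, triple)
```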

Sample Output

When the document is sponged with the HTML+Variants extractor cartridge enabled and using its default settings, we see:

type          Document
sameAs        #this
container of  Embedded RDFa Statement 1
              Embedded TTL-script Statement 1
              Embedded TTL-script Statement 2
Title         Turtle-in-script test
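The statement labels in this listing suggest one counter per island kind. The Python sketch below reproduces that numbering from an ordered list of island kinds; the scheme is inferred from this single sample output and is not taken from the Sponger's source, and the function name is hypothetical.

```python
from collections import Counter


def statement_labels(island_kinds: list[str]) -> list[str]:
    """Label each embedded statement 'Embedded <kind> Statement <n>',
    numbering separately for each island kind (inferred scheme)."""
    seen = Counter()
    labels = []
    for kind in island_kinds:
        seen[kind] += 1
        labels.append(f"Embedded {kind} Statement {seen[kind]}")
    return labels


# One RDFa triple and two Turtle triples, as in the sample input.
for label in statement_labels(["RDFa", "TTL-script", "TTL-script"]):
    print(label)
```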

Expanding Embedded RDFa Statement 1, we see:

type          Statement
label         Embedded RDFa Statement 1
described by  Turtle test
              <>
subject       Turtle test
predicate     Title
object        Turtle test
Sponge Time   2014-06-11 14:42:40.200348 (xsd:date)
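Besides its subject, predicate and object links, the reified statement carries a label and a sponge timestamp. The following Python sketch emits such annotations as N-Triples. rdfs:label is the standard RDF Schema property; the sponge-time predicate `http://example.org/vocab#spongeTime`, the statement IRI and the function name are hypothetical placeholders. Because the recorded value includes a time of day, the sketch types it as xsd:dateTime.

```python
from datetime import datetime

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
SPONGE_TIME = "http://example.org/vocab#spongeTime"  # hypothetical predicate


def annotate(stmt: str, label: str, when: datetime) -> list[str]:
    """Return N-Triples lines giving a reified statement a label and a timestamp."""
    # Escape backslashes first, then double quotes, for an N-Triples literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f'<{stmt}> <{RDFS_LABEL}> "{escaped}" .',
        f'<{stmt}> <{SPONGE_TIME}> "{when.isoformat()}"^^<{XSD_DATETIME}> .',
    ]


stamp = datetime(2014, 6, 11, 14, 42, 40, 200348)
for line in annotate("http://example.org/stmt/rdfa-1", "Embedded RDFa Statement 1", stamp):
    print(line)
```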