---+ Reification in the Virtuoso Sponger
%TOC%
*Note*: Some of the underlying implementation of reification is in flux.
---++ What is Reification?
Reification is a layer of abstraction in which raw triples are modeled as
resources in their own right, so that the triples themselves can be described
and annotated.
A typical use is provenance: given a particular resource to sponge, the
Virtuoso Sponger has many components that can contribute triples, so it can be
useful to trace which cartridge is responsible.
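To make the idea concrete, the following is a minimal sketch of reifying one triple with the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and attaching a provenance annotation to it. It is not the Sponger's own implementation: the =reify= function, the example.org IRIs and the =extractedBy= property are hypothetical names chosen for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for the provenance annotation (an assumption).
PROV_NS = "http://example.org/provenance#"


def reify(statement_iri, subject, predicate, obj, cartridge):
    """Return N-Triples lines describing one triple as an rdf:Statement.

    `obj` must already be in N-Triples form (an <IRI> or a "literal").
    """
    s = f"<{statement_iri}>"
    return [
        f"{s} <{RDF}type> <{RDF}Statement> .",
        f"{s} <{RDF}subject> <{subject}> .",
        f"{s} <{RDF}predicate> <{predicate}> .",
        f"{s} <{RDF}object> {obj} .",
        # Annotation on the statement itself: which extractor produced it.
        f'{s} <{PROV_NS}extractedBy> "{cartridge}" .',
    ]


lines = reify(
    "http://example.org/doc#stmt1",
    "http://example.org/doc",
    "http://purl.org/dc/terms/title",
    '"Turtle-in-script test"',
    "RDFa",
)
print("\n".join(lines))
```

Because the statement has its own IRI, any number of further annotations (sponge time, source document, and so on) can be attached to it without touching the original triple.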
---+++ Data Islands
In addition to the datasource-specific cartridges, the HTML+Variants extractor
cartridge identifies several ways of embedding RDF data in HTML, which we term
_data islands_.
* HTML5 Microdata (=itemscope=, =itemtype=, =itemprop= attributes)
* RDFa (=about=, =property=, =resource= attributes)
* JSON-LD using <script type="application/ld+json"> ... </script>
* Turtle and N3 using <script type="text/turtle"> ... </script>
* GRDDL (hRecipe, hCard, hCalendar, hProduct, xFolk, eRDF, etc.)
Additionally, if installed, the Turtle Meta-cartridge identifies Turtle in any "content" triple, e.g. titles, descriptions, social media post bodies, etc.
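As a rough illustration of what identifying data islands involves, the sketch below scans an HTML string for the attribute and script-type markers listed above, using only Python's standard =html.parser= module. It is an assumption-laden approximation, not the cartridge's actual detection logic: the =IslandDetector= class and =detect_islands= function are hypothetical, GRDDL profiles are not covered, and the sample markup is invented for the example.

```python
from html.parser import HTMLParser

# Markers taken from the list of data islands above.
SCRIPT_TYPES = {
    "application/ld+json": "JSON-LD",
    "text/turtle": "Turtle",
}
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"about", "property", "resource"}


class IslandDetector(HTMLParser):
    """Collects the kinds of embedded RDF that appear in a document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.found.add("HTML5 Microdata")
        if names & RDFA_ATTRS:
            self.found.add("RDFa")
        if tag == "script":
            # A valueless attribute is reported as None, hence the `or ""`.
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type in SCRIPT_TYPES:
                self.found.add(SCRIPT_TYPES[script_type])


def detect_islands(html):
    detector = IslandDetector()
    detector.feed(html)
    detector.close()
    return detector.found


# Hypothetical markup, written for this example only.
sample = """<html><head>
<title property="dc:title">Turtle-in-script test</title>
<script type="text/turtle">
@prefix ex: <http://example.org/> .
ex:a ex:b ex:c .
</script>
</head><body><h1>Testing Turtle in scripts</h1><p>Stuff</p></body></html>"""

print(sorted(detect_islands(sample)))
```

A real extractor also has to parse each island once it is found; this sketch only reports which kinds are present.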
---++ Configuration
The HTML+Variants extractor cartridge takes a handful of options by which one
can configure which data islands contribute and which are reified:
* rdfa=yes - controls whether the RDFa extractor runs
* reify_rdfa=1 - determines whether extracted RDFa is reified
* reify_html5md=1 - determines whether extracted HTML5 Microdata is reified
* reify_jsonld=1 - determines whether extracted JSON-LD is reified
* reify_all_grddl=0 - determines whether all other GRDDL data is reified
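The options above are simple key=value pairs. As a sketch of how such a block might be read, the following parses one pair per line and reports which reification switches are on. This is illustrative only: the =parse_options= and =is_enabled= helpers are hypothetical, the one-pair-per-line layout is an assumption, and treating both =1= and =yes= as "enabled" is also an assumption rather than documented cartridge behavior.

```python
def parse_options(text):
    """Parse 'key=value' lines into a dict, skipping blank or malformed lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def is_enabled(options, key):
    # Assumption: "1" and "yes" both mean enabled; anything else is disabled.
    return options.get(key, "").lower() in {"1", "yes"}


# The option values exactly as listed above.
example = """
rdfa=yes
reify_rdfa=1
reify_html5md=1
reify_jsonld=1
reify_all_grddl=0
"""

opts = parse_options(example)
reified = [k for k in opts if k.startswith("reify_") and is_enabled(opts, k)]
print(reified)
```

With the values shown in the list, RDFa, HTML5 Microdata and JSON-LD are reified, while the remaining GRDDL data is not.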
---++ Sample Input
Let us assume a very simple input HTML document, as follows:
Turtle-in-script test
Testing Turtle in scripts
Stuff
As we can see, this contains one RDFa statement in the <title> element and a
small pool of Turtle data in a script element.
---++ Sample Output
When sponging with the HTML+Variants extractor cartridge enabled and left at
its default settings, we see:
| type | Document |
| sameAs | #this |
| container of | Embedded RDFa Statement 1 |
| | Embedded TTL-script Statement 1 |
| | Embedded TTL-script Statement 2 |
| Title | Turtle-in-script test |
Expanding the Embedded RDFa Statement 1, we see:
| type | Statement |
| label | Embedded RDFa Statement 1 |
| described by | Turtle test |
| | <> |
| subject | Turtle test |
| predicate | Title |
| object | Turtle test |
| Sponge Time | 2014-06-11 14:42:40.200348 (xsd:date) |