Note: Some aspects of the underlying implementation of reification are in flux.
Reification is a useful level of abstraction in which raw triples are modeled as resources in their own right, so that the triples themselves can be described and annotated.
A typical use is provenance: when sponging a given resource, many components of the Virtuoso Sponger can contribute triples, so it can be useful to trace which cartridge produced each one.
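The general idea can be sketched with the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). This is a minimal illustration, not the Sponger's implementation: the provenance predicate EXTRACTED_BY is hypothetical, the statement identifier "_:stmt1" is arbitrary, and terms are plain strings with no distinction between IRIs and literals.

```python
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical provenance predicate; the Sponger's real vocabulary may differ.
EXTRACTED_BY = "http://example.org/vocab#extractedBy"

Triple = Tuple[str, str, str]


def reify(triple: Triple, stmt: str, cartridge: str) -> List[Triple]:
    """Describe `triple` as the resource `stmt` using the standard RDF
    reification vocabulary, then record which cartridge produced it.
    Terms are plain strings; IRIs and literals are not distinguished."""
    s, p, o = triple
    return [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, EXTRACTED_BY, cartridge),  # the provenance annotation
    ]


if __name__ == "__main__":
    author = ("http://example.org/person/Mark_Twain",
              "http://example.org/relation/author",
              "http://example.org/books/Huckleberry_Finn")
    for t in reify(author, "_:stmt1", "HTML+Variants"):
        print(t)
```

Because the statement is now a resource, any number of further annotations (extraction time, source island, and so on) can be attached to it without touching the original triple.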
In addition to the datasource-specific cartridges, the HTML+Variants extractor cartridge recognizes several ways of embedding RDF data in HTML, which we term data islands:
- HTML5 Microdata (itemscope, itemtype, itemprop attributes)
- RDFa (about, property, resource attributes)
- JSON-LD in <script type="application/ld+json"> ... </script>
- Turtle in <script type="text/turtle"> ... </script>
Additionally, if installed, the Turtle Meta-cartridge identifies Turtle embedded in any "content" triple, such as titles, descriptions, or the bodies of social media posts.
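To make the four HTML-level island types concrete, here is a rough detector built on Python's standard html.parser. It is an illustrative sketch, not how the HTML+Variants cartridge works: it only reports which island kinds are present (it does not extract triples), and the kind names ("microdata", "rdfa", "json-ld", "turtle") are invented for this example. The Turtle Meta-cartridge's scanning of literal content is not covered.

```python
from html.parser import HTMLParser

MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"about", "property", "resource"}
SCRIPT_TYPES = {"application/ld+json": "json-ld", "text/turtle": "turtle"}


class IslandFinder(HTMLParser):
    """Records which kinds of data island appear in an HTML page."""

    def __init__(self):
        super().__init__()
        self.kinds = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.kinds.add("microdata")
        if names & RDFA_ATTRS:
            self.kinds.add("rdfa")
        if tag == "script":
            # Boolean attributes have a value of None, hence the `or ""`.
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type in SCRIPT_TYPES:
                self.kinds.add(SCRIPT_TYPES[script_type])


def find_islands(html: str) -> set:
    finder = IslandFinder()
    finder.feed(html)
    finder.close()
    return finder.kinds


page = ('<title property="dc:title" content="Turtle test">Turtle-in-script test</title>'
        '<script type="text/turtle"><a> <b> <c> .</script>')
print(sorted(find_islands(page)))  # ['rdfa', 'turtle']
```

The script body is treated as raw text by html.parser, so Turtle or JSON-LD inside it never confuses the tag scan.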
The HTML+Variants extractor cartridge accepts a handful of options that configure which data islands contribute triples and whether those triples are reified:
- rdfa=yes: controls whether the RDFa extractor runs
- reify_rdfa=1: determines whether extracted RDFa is reified
- reify_html5md=1: determines whether extracted HTML5 Microdata is reified
- reify_jsonld=1: determines whether extracted JSON-LD is reified
- reify_all_grddl=0: determines whether all other GRDDL data is reified

Let us assume a very simple input HTML document, as follows:
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head>
    <title property="dc:title" content="Turtle test">Turtle-in-script test</title>
    <script type="text/turtle">
      <![CDATA[
        <http://example.org/person/Mark_Twain> <http://example.org/relation/author> <http://example.org/books/Huckleberry_Finn> ;
            <http://xmlns.com/foaf/0.1/#name> "Mark Twain" .
      ]]>
    </script>
  </head>
  <body>
    <h1>Testing Turtle in scripts</h1>
    Stuff
    <hr />
  </body>
</html>
As we can see, this contains one RDFa statement in the <title> element and a small block of Turtle data (two triples) in a <script> element.
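How the option list applies to this sample can be sketched as a small decision function. This is a hypothetical model, not Virtuoso code: the dict representation of options, the island kind names, and the rule that only "yes" and "1" count as enabled are all assumptions. Note that Turtle in a <script> element is not named by any of the listed options, so the sketch deliberately refuses to decide for it.

```python
# Option values as given in the list above. How Virtuoso stores cartridge
# options is not modeled here; a plain dict of strings is an assumption.
LISTED_OPTIONS = {
    "rdfa": "yes",
    "reify_rdfa": "1",
    "reify_html5md": "1",
    "reify_jsonld": "1",
    "reify_all_grddl": "0",
}

# Island kind -> option that decides whether its triples are reified.
# Turtle in <script> is deliberately absent: no listed option names it.
REIFY_FLAG = {
    "rdfa": "reify_rdfa",
    "microdata": "reify_html5md",
    "json-ld": "reify_jsonld",
    "other-grddl": "reify_all_grddl",
}


def enabled(value: str) -> bool:
    # Assumption: "yes" and "1" (the values shown above) mean "on".
    return value.strip().lower() in {"yes", "1"}


def plan(kind: str, overrides: dict) -> str:
    """Return 'skip', 'plain' or 'reify' for one island kind."""
    opts = {**LISTED_OPTIONS, **overrides}
    if kind == "rdfa" and not enabled(opts["rdfa"]):
        return "skip"  # the RDFa extractor does not run at all
    flag = REIFY_FLAG.get(kind)
    if flag is None:
        raise ValueError(f"no listed option covers island kind {kind!r}")
    return "reify" if enabled(opts[flag]) else "plain"


print(plan("rdfa", {}))                           # reify
print(plan("rdfa", {"rdfa": "no"}))               # skip
print(plan("microdata", {"reify_html5md": "0"}))  # plain
```

With the listed values, the sample's single RDFa statement is reified, which matches the "Embedded RDFa Statement 1" entry in the output that follows.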
When sponging with the HTML+Variants extractor cartridge enabled and its default settings in place, we see:
Attribute    | Value                           |
type         | Document                        |
sameAs       | #this                           |
container of | Embedded RDFa Statement 1       |
             | Embedded TTL-script Statement 1 |
             | Embedded TTL-script Statement 2 |
Title        | Turtle-in-script test           |
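The labels in this output appear to be numbered separately per island (RDFa 1, then TTL-script 1 and 2), with the document acting as a container for each statement. The sketch below reproduces that bookkeeping under those assumptions; the per-island numbering is inferred from the output rather than documented, the order of the two Turtle triples is assumed, and the document IRI is a hypothetical placeholder.

```python
from collections import defaultdict

# Hypothetical IRI standing in for the sponged sample document.
DOC = "http://example.org/sample.html"

# Triples extracted from the sample, tagged with the island they came from.
EXTRACTED = [
    ("RDFa", (DOC, "http://purl.org/dc/elements/1.1/title", "Turtle test")),
    ("TTL-script", ("http://example.org/person/Mark_Twain",
                    "http://example.org/relation/author",
                    "http://example.org/books/Huckleberry_Finn")),
    ("TTL-script", ("http://example.org/person/Mark_Twain",
                    "http://xmlns.com/foaf/0.1/#name",
                    "Mark Twain")),
]


def label_statements(extracted):
    """Number statements per island kind, mimicking the labels above."""
    counters = defaultdict(int)
    labels = []
    for kind, _triple in extracted:
        counters[kind] += 1
        labels.append(f"Embedded {kind} Statement {counters[kind]}")
    return labels


for label in label_statements(EXTRACTED):
    print(DOC, "container of", label)
```

A single global counter would instead produce "Embedded TTL-script Statement 2" and "3", which is why per-island numbering is the better fit for the observed labels.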
Expanding Embedded RDFa Statement 1, we see:
Attribute    | Value                                 |
type         | Statement                             |
label        | Embedded RDFa Statement 1             |
described by | Turtle test                           |
             | <>                                    |
subject      | Turtle test                           |
predicate    | Title                                 |
object       | Turtle test                           |
Sponge Time  | 2014-06-11 14:42:40.200348 (xsd:date) |