Virtuoso Programmer's Guide - RDF Middleware ("Sponger") (Part 3)
Contents (Part 3)
- Sponger Queue API
- Functions
- REST Web service
- Useful Virtuoso Functions
- String Functions
- sprintf_inverse
- split_and_decode
- Retrieving URLs
- http_get
- http_request_header
- Handling Non-XML Response Content
- json_parse
- Writing Arbitrarily Long Text
- http
- string_output
- string_output_string
- XML & XSLT
- xtree_doc
- xpath_eval
- DB.DBA.RDF_MAPPER_XSLT
- Character Set Conversion
- serialize_to_UTF8_xml
- Loading Data Into the Quad Store
- DB.DBA.RDF_LOAD_RDFXML
- DB.DBA.TTLP
- Debug Output
- dbg_obj_print
- String Functions
- References
- Appendix A: PingTheSemanticWeb? RDF Notification Service
- Appendix B: Main Namespaces used by OpenLink Cartridges
- Appendix C: Freebase Cartridge & Stylesheet
Sponger Queue API
Functions
- DB.DBA.RDF_SPONGER_QUEUE_ADD: This function is available when the cartridges vad is installed.
DB.DBA.RDF_SPONGER_QUEUE_ADD (url, options);
- url: the Network Resource URI to be fetched;
-
options: an array usually typical sponger pragmas, for ex:
vector ('get:soft', 'soft', 'refresh_free_text', 1);
REST Web service
The Sponger REST Web service has the following characteristics:
-
endpoint:
http://cname/about/service
-
parameters:
-
op=add
: type of operation, for now addition to the queue is supported -
uris=[json array]:
an array of URIs to be added to the sponger queue, the format is JSON array, for example:
{ "uris":["http://www.amazon.co.uk/Hama-Stylus-Input-Apple-iPad/dp/B003O0OM0C", "http://www.amazon.co.uk/Krusell-GAIA-Case-Apple-iPad/dp/B003QHXWWC" ] }
-
The service will return a json encoded result of the number of items added, for example:
{ "result":2 }
In case of error a JSON with error text will be returned and http status 500.
cURL example
- Assume file.txt which contains URL encoded JSON string:
uris=%7B%20%22uris%22%3A%5B%22http%3A%2F%2Fwww.amazon.co.uk%2FHama-Stylus-Input-Apple-iPad%2Fdp%2FB003O0OM0C%22%2C%20%22http%3A%2F%2Fwww.amazon.co.uk%2FKrusell-GAIA-Case-Apple-iPad%2Fdp%2FB003QHXWWC%22%20%5D%20%7D
- Execute the following command:
curl -i -d@file.txt http://cname/about/service?op=add HTTP/1.1 200 OK Server: Virtuoso/06.02.3129 (Darwin) i686-apple-darwin10.0.0 VDB Connection: Keep-Alive Date: Thu, 05 May 2011 12:06:24 GMT Accept-Ranges: bytes Content-Type: applcation/json; charset="UTF-8" Content-Length: 14 { "result":2 }
Useful Virtuoso Functions
String Functions
sprintf_inverse
sprintf_inverse takes a string to parse and a format string. If the first argument matches the format string then it returns vector of the values matching the placeholders in the format string. sprintf_inverse comes in useful for extracting fields from URLs.Full description: Virtuoso Functions Guide entry
Example
tmp := sprintf_inverse (new_origin_uri, 'http://farm%s.static.flickr.com/%s/%s_%s.%s', 0); img_id := tmp[2];
split_and_decode
Converts escaped key=value pairs in an input string into a vector of strings.Full description: Virtuoso Functions Guide entry
Example
To split the HTTP request and response headers passed to the cartridge hook function.request_hdr := headers[0]; response_hdr := headers[1]; host := http_request_header (request, 'Host'); tmp := split_and_decode (request_hdr[0], 0, '\0\0 '); http_method := tmp[0]; url := tmp[1]; protocol_version := substring (tmp[2], 6, 8); tmp := rtrim (response_hdr[0], '\r\n'); tmp := split_and_decode (response_hdr[0], 0, '\0\0 ');
Retrieving URLs
http_get
Returns a string containing the body of the requested URL and optionally the request headers.Full description: Virtuoso Functions Guide entry
Example
url := sprintf('http://api.flickr.com/services/rest/?? method=flickr.photos.getInfo&photo_id=%s&api_key=%s', img_id, api_key); tmp := http_get (url, hdr); if (hdr[0] not like 'HTTP/1._ 200 %') signal ('22023', trim(hdr[0], '\r\n'), 'RDFXX'); xd := xtree_doc (tmp);
DB.DBA.RDF_HTTP_URL_GET
A wrapper around http_get. Retrieves a URL using the specified HTTP method (defaults to GET). The function can handle proxies, redirects (up to fifteen) and HTTPS.Example
uri := sprintf ('http://musicbrainz.org/ws/1/%s/%s?type=xml&inc=%U', kind, id, inc); cnt := RDF_HTTP_URL_GET (uri, '', hdr, 'GET', 'Accept: */*'); xt := xtree_doc (cnt); xd := DB.DBA.RDF_MAPPER_XSLT (registry_get ('_cartridges_path_') || 'xslt/main/mbz2rdf.xsl', xt, vector ('baseUri', new_origin_uri));
http_request_header
Returns an array containing the HTTP request header lines.Full description: Virtuoso Functions Guide entry
Example
content := RDF_HTTP_URL_GET (rdf_url, new_origin_uri, hdr, 'GET', 'Accept: application/rdf+xml, text/rdf+n3, */*'); ret_content_type := http_request_header (hdr, 'Content-Type', null, null);
Handling Non-XML Response Content
json_parse
Parses JSON content into a tree.Example
url := sprintf ('http://www.freebase.com/api/service/mqlread?queries=%U', qr); content := http_get (url, hdr); tree := json_parse (content); tree := get_keyword ('ROOT', tree); tree := get_keyword ('result', tree);
Writing Arbitrarily Long Text
http
Writes to an http client or a string output stream.Full description: Virtuoso Functions Guide entry
Example
Writing N3 to a string output stream using function http(), parsing the N3 into a graph, then loading the graph into the quad store.ses := string_output (); http ('@prefix opl: <http://www.openlinksw.com/schema/attribution#> .\n', ses); http ('@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n', ses); ... DB.DBA.TTLP (ses, base, graph); DB.DBA.RDF_LOAD_RDFXML (strg, base, graph);
string_output
Makes a string output stream, a special object that may be used to buffer arbitrarily long streams of data. The HTTP output functions optionally take a string output stream handle as a parameter and then output to the string stream instead of the HTTP client.Full description: Virtuoso Functions Guide entry
Example
ses := string_output (); cnt := http_get (sprintf ('http://download.finance.yahoo.com/d/quotes.csv?s=%U&f=nsbavophg&e=.csv', symbol)); arr := rdfm_yq_parse_csv (cnt); http ('<quote stock="NASDAQ">', ses); foreach (any q in arr) do { http_value (q[0], 'company', ses); http_value (q[1], 'symbol', ses); ... } http ('</quote>', ses); content := string_output_string (ses); xt := xtree_doc (content);
string_output_string
Produces a string out of a string output stream.Full description: Virtuoso Functions Guide entry
Example
See string_output above.XML & XSLT
xtree_doc
Parses the input string, which is expected to be a well formed XML fragment and returns a parse tree as a special memory-resident object.Full description: Virtuoso Functions Guide entry
Example
content := RDF_HTTP_URL_GET (uri, '', hdr, 'GET', 'Accept: */*'); xt := xtree_doc (content);
xpath_eval
Applies an XPATH expression to a context node and returns the result(s).Full description: Virtuoso Functions Guide entry
Example
profile := cast (xpath_eval ('/html/head/@profile', xt) as varchar);
DB.DBA.RDF_MAPPER_XSLT
A simple wrapper around the Virtuoso function xslt which sets the current user to dba before returning an XML document transformed by an XSLT stylesheet.Full description: Virtuoso Functions Guide entry
Example
tmp := http_get (url); xd := xtree_doc (tmp); xt := DB.DBA.RDF_MAPPER_XSLT ( registry_get ('_cartridges_path_') || 'xslt/main/atom2rdf.xsl', xd, vector ('baseUri', coalesce (dest, graph_iri)));
Character Set Conversion
serialize_to_UTF8_xml
Converts a value of arbitrary type to a UTF-8 string representation.Full description: Virtuoso Functions Guide entry
Example
xt := DB.DBA.RDF_MAPPER_XSLT ( registry_get ('_cartridges_path_') || 'xslt/main/crunchbase2rdf.xsl', xt, vector ('baseUri', coalesce (dest, graph_iri), 'base', base, 'suffix', suffix)); xd := serialize_to_UTF8_xml (xt); DB.DBA.RM_RDF_LOAD_RDFXML (xd, new_origin_uri, coalesce (dest, graph_iri));
Loading Data Into the Quad Store
DB.DBA.RDF_LOAD_RDFXML
Parses the content of RDF/XML text into a sequence of separate triples and loads the triples into the specified graph in the Virtuoso Quad Store.
Full description: Virtuoso Functions Guide entry
Example
content := RDF_HTTP_URL_GET (uri, '', hdr, 'GET', 'Accept: */*'); xt := xtree_doc (content); xd := DB.DBA.RDF_MAPPER_XSLT ( registry_get ('_cartridges_path_') || 'xslt/main/mbz2rdf.xsl', xt, vector ('baseUri', new_origin_uri)); xd := serialize_to_UTF8_xml (xd); DB.DBA.RM_RDF_LOAD_RDFXML (xd, new_origin_uri, coalesce (dest, graph_iri));
DB.DBA.TTLP
Parses the content of TTL (TURTLE or N3) text and loads the triples into the specified graph in the Virtuoso Quad Store.
Full description: Virtuoso Functions Guide entry
Example
sess := string_output (); ... http (sprintf ('<http://dbpedia.org/resource/%s> <http://xbrlontology.com/ontology/finance/stock_market#hasCompetitor> <http://dbpedia.org/resource/%s> .\n', symbol, x), sess); http (sprintf ('<http://dbpedia.org/resource/%s> <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> <http://finance.yahoo.com/q?s=%s> .\n', x, x), sess); content := string_output_string (sess); DB.DBA.TTLP (content, new_origin_uri, coalesce (dest, graph_iri));
Debug Output
dbg_obj_print
Prints to the Virtuoso system console.Full description: Virtuoso Functions Guide entry
Example
dbg_obj_print ('try all grddl mappings here');
References
- RDF Primer: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
- RDF/XML Syntax Specification: http://www.w3.org/TR/rdf-syntax-grammar/
- GRDDL Primer: http://www.w3.org/TR/grddl-primer/
Appendix A: PingTheSemanticWeb? RDF Notification Service
PingtheSemanticWeb (PTSW) is a repository for RDF documents. The PTSW web service archives the location of recently created or updated RDF documents on the Web. It is intended for use by crawlers or other types of software agents which need to know when and where the latest updated RDF documents can be found. They can request a list of recently updated documents as a starting location to crawl the Semantic Web.
You may find this service useful for publicizing your own RDF content. Content authors can notify PTSW that an RDF document has been created or updated by pinging the service with the URL of the document. The Sponger supports this facility through the async_queue and ping_service parameters of the cartridge hook function, where the ping_service parameter contains the ping service URL as configured in the SPARQL section of the virtuoso.ini file:
[SPARQL] ... PingService = http://rpc.pingthesemanticweb.com/ ...
The configured ping service can be called using an asynchronous request and the RDF_SW_PING procedure as illustrated below.
create procedure DB.DBA.RDF_LOAD_HTML_RESPONSE ( in graph_iri varchar, in new_origin_uri varchar, in dest varchar, inout ret_body any, inout async_queue any, inout ping_service any, inout _key any, inout opts any ) { ... if ( ... and async_queue is not null) aq_request (async_queue, 'DB.DBA.RDF_SW_PING', vector (ping_service, new_origin_uri));
For more details, please refer to the section Asynchronous Execution and Multithreading in Virtuoso/PL in the Virtuoso reference documentation.
Appendix B: Main Namespaces used by OpenLink Cartridges
A list of the main namespaces / ontologies used by OpenLink-provided Sponger cartridges is given below. Some of these ontologies may prove useful when creating your own cartridges.
- - http://www.openlinksw.com/virtuoso/xslt/
- - http://www.openlinksw.com/schemas/XHTML#
- rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
- rdfs: http://www.w3.org/2000/01/rdf-schema#
- dc: http://purl.org/dc/elements/1.1/
- dcterms: http://purl.org/dc/terms/
- foaf: http://xmlns.com/foaf/0.1/
- sioc: http://rdfs.org/sioc/ns#
- sioct: http://rdfs.org/sioc/types#
- skos: http://www.w3.org/2004/02/skos/core#
- bibo: http://purl.org/ontology/bibo/
Appendix C: Freebase Cartridge & Stylesheet
Snapshots of the Freebase cartridge and stylesheet compatible with the meta-cartridge example presented earlier in this document can be found below.
DB.DBA.RDF_LOAD_MQL:
--no_c_escapes- create procedure DB.DBA.RDF_LOAD_MQL (in graph_iri varchar, in new_origin_uri varchar, in dest varchar, inout _ret_body any, inout aq any, inout ps any, inout _key any, inout opts any) { declare qr, path, hdr any; declare tree, xt, xd, types any; declare k, cnt, url, sa varchar; hdr := null; sa := ''; declare exit handler for sqlstate '*' { --dbg_printf ('%s', __SQL_MESSAGE); return 0; }; path := split_and_decode (new_origin_uri, 0, '%\0/'); if (length (path) < 1) return 0; k := path [length(path) - 1]; if (path [length(path) - 2] = 'guid') k := sprintf ('"id":"/guid/%s"', k); else { if (k like '#%') k := sprintf ('"id":"%s"', k); else { sa := DB.DBA.RDF_MQL_GET_WIKI_URI (k); k := sprintf ('"key":"%s"', k); } } qr := sprintf ('{"ROOT":{"query":[{%s, "type":[]}]}}', k); url := sprintf ('http://www.freebase.com/api/service/mqlread?queries=%U', qr); cnt := http_get (url, hdr); tree := json_parse (cnt); xt := get_keyword ('ROOT', tree); if (not isarray (xt)) return 0; xt := get_keyword ('result', xt); types := vector (); foreach (any tp in xt) do { declare tmp any; tmp := get_keyword ('type', tp); types := vector_concat (types, tmp); } --types := get_keyword ('type', xt); delete from DB.DBA.RDF_QUAD where g = iri_to_id(new_origin_uri); foreach (any tp in types) do { qr := sprintf ('{"ROOT":{"query":{%s, "type":"%s", "*":[]}}}', k, tp); url := sprintf ('http://www.freebase.com/api/service/mqlread?queries=%U', qr); cnt := http_get (url, hdr); --dbg_printf ('%s', cnt); tree := json_parse (cnt); xt := get_keyword ('ROOT', tree); xt := DB.DBA.MQL_TREE_TO_XML (tree); --dbg_obj_print (xt); xt := DB.DBA.RDF_MAPPER_XSLT (registry_get ('_cartridges_path_') || 'xslt/main/mql2rdf.xsl', xt, vector ('baseUri', coalesce (dest, graph_iri), 'wpUri', sa)); sa := ''; xd := serialize_to_UTF8_xml (xt); -- dbg_printf ('%s', xd); DB.DBA.RM_RDF_LOAD_RDFXML (xd, new_origin_uri, coalesce (dest, graph_iri)); } return 1; }
mql2rdf.xsl:
<?xml version="1.0" encoding="UTF-8"?> <!-- - - $Id: mql2rdf.xsl,v 1.10 2008/09/04 09:14:12 source Exp $ - - This file is part of the OpenLink Software Virtuoso Open-Source (VOS) - project. - - Copyright (C) 1998-2008 OpenLink Software - - This project is free software; you can redistribute it and/or modify it - under the terms of the GNU General Public License as published by the - Free Software Foundation; only version 2 of the License, dated June 1991. - - This program is distributed in the hope that it will be useful, but - WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - General Public License for more details. - - You should have received a copy of the GNU General Public License along - with this program; if not, write to the Free Software Foundation, Inc., - 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA --> <!DOCTYPE xsl:stylesheet [ <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <!ENTITY bibo "http://purl.org/ontology/bibo/"> <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#"> <!ENTITY foaf "http://xmlns.com/foaf/0.1/"> <!ENTITY sioc "http://rdfs.org/sioc/ns#"> ]> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:vi="http://www.openlinksw.com/virtuoso/xslt/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:sioc="" xmlns:bibo="" xmlns:foaf="" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:dcterms= "http://purl.org/dc/terms/" xmlns:mql="http://www.freebase.com/"> <xsl:output method="xml" indent="yes" /> <xsl:param name="baseUri" /> <xsl:param name="wpUri" /> <xsl:variable name="ns">http://www.freebase.com/</xsl:variable> <xsl:template match="/"> <rdf:RDF> <xsl:if test="/results/ROOT/result/*"> <rdf:Description rdf:about="{$baseUri}"> <rdf:type rdf:resource="Document"/> <rdf:type rdf:resource="Document"/> <rdf:type rdf:resource="Container"/> <sioc:container_of rdf:resource="{vi:proxyIRI($baseUri)}"/> <foaf:primaryTopic rdf:resource="{vi:proxyIRI($baseUri)}"/> <dcterms:subject rdf:resource="{vi:proxyIRI($baseUri)}"/> </rdf:Description> <rdf:Description rdf:about="{vi:proxyIRI($baseUri)}"> <rdf:type rdf:resource="Item"/> <sioc:has_container rdf:resource="{$baseUri}"/> <xsl:apply-templates select="/results/ROOT/result/*"/> <xsl:if test="$wpUri != ''"> <rdfs:seeAlso rdf:resource="{$wpUri}"/> </xsl:if> </rdf:Description> </xsl:if> </rdf:RDF> </xsl:template> <xsl:template match="*[starts-with(.,'http://') or starts-with(.,'urn:')]"> <xsl:element namespace="{$ns}" name="{name()}"> <xsl:attribute name="rdf:resource"> <xsl:value-of select="vi:proxyIRI (.)"/> </xsl:attribute> </xsl:element> </xsl:template> <xsl:template match="*[starts-with(.,'/')]"> <xsl:if test="local-name () = 'type' and . like '%/person'"> <rdf:type rdf:resource="Person"/> </xsl:if> <xsl:if test="local-name () = 'type'"> <sioc:topic> <skos:Concept rdf:about="{vi:proxyIRI (concat ($ns, 'view', .))}"/> </sioc:topic> </xsl:if> <xsl:element namespace="{$ns}" name="{name()}"> <xsl:attribute name="rdf:resource"> <xsl:value-of select="vi:proxyIRI(concat ($ns, 'view', .))"/> </xsl:attribute> </xsl:element> </xsl:template> <xsl:template match="*[* and ../../*]"> <xsl:element namespace="{$ns}" name="{name()}"> <xsl:attribute name="rdf:parseType">Resource</xsl:attribute> <xsl:apply-templates select="@*|node()"/> </xsl:element> </xsl:template> <xsl:template match="*"> <xsl:if test="* or . != ''"> <xsl:choose> <xsl:when test="name()='image'"> <foaf:depiction rdf:resource="{vi:mql-image-by-name (.)}"/> </xsl:when> <xsl:otherwise> <xsl:element namespace="{$ns}" name="{name()}"> <xsl:if test="name() like 'date_%'"> <xsl:attribute name="rdf:datatype">dateTime</xsl:attribute> </xsl:if> <xsl:apply-templates select="@*|node()"/> </xsl:element> </xsl:otherwise> </xsl:choose> </xsl:if> </xsl:template> </xsl:stylesheet>