Virtuoso Programmer's Guide - RDF Middleware ("Sponger") (Part 3)

Contents (Part 3)

  Sponger Queue API Functions
  REST Web service
  Useful Virtuoso Functions
    String Functions
      sprintf_inverse
      split_and_decode
    Retrieving URLs
      http_get
      http_request_header
    Handling Non-XML Response Content
      json_parse
    Writing Arbitrarily Long Text
      http
      string_output
      string_output_string
    XML & XSLT
      xtree_doc
      xpath_eval
      DB.DBA.RDF_MAPPER_XSLT
    Character Set Conversion
      serialize_to_UTF8_xml
    Loading Data Into the Quad Store
      DB.DBA.RDF_LOAD_RDFXML
      DB.DBA.TTLP
    Debug Output
      dbg_obj_print
  References
  Appendix A: PingTheSemanticWeb RDF Notification Service
  Appendix B: Main Namespaces used by OpenLink Cartridges
  Appendix C: Freebase Cartridge & Stylesheet

Sponger Queue API Functions

DB.DBA.RDF_SPONGER_QUEUE_ADD

This function is available when the cartridges VAD is installed.

    DB.DBA.RDF_SPONGER_QUEUE_ADD (url, options);

- url: the Network Resource URI to be fetched;
- options: an array, usually of typical Sponger pragmas, for example:

    vector ('get:soft', 'soft', 'refresh_free_text', 1);

REST Web service

The Sponger REST Web service has the following characteristics:

- endpoint: http://cname/about/service
- parameters:
  - op=add: the type of operation; currently only addition to the queue is supported
  - uris=[json array]: an array of URIs to be added to the Sponger queue, in JSON format, for example:

    { "uris":["http://www.amazon.co.uk/Hama-Stylus-Input-Apple-iPad/dp/B003O0OM0C", "http://www.amazon.co.uk/Krusell-GAIA-Case-Apple-iPad/dp/B003QHXWWC" ] }

The service returns a JSON-encoded result giving the number of items added, for example:

    { "result":2 }

In case of error, the service returns HTTP status 500 and a JSON document containing the error text.
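The uris parameter has to be URL-encoded before it is posted to the service. The following is a minimal client-side sketch in Python of how such a request body could be produced, assuming the parameter is sent as a form field named uris, as described above. The helper name build_queue_payload is hypothetical and not part of Virtuoso or the Sponger.

```python
import json
from urllib.parse import quote, unquote


def build_queue_payload(uris):
    """Return a URL-encoded 'uris=' form field for the Sponger queue service.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Wrap the URIs in the {"uris": [...]} document shown above.
    document = json.dumps({"uris": list(uris)})
    # safe="" makes quote() percent-encode "/" as well as ":" and quotes.
    return "uris=" + quote(document, safe="")


payload = build_queue_payload([
    "http://www.amazon.co.uk/Hama-Stylus-Input-Apple-iPad/dp/B003O0OM0C",
    "http://www.amazon.co.uk/Krusell-GAIA-Case-Apple-iPad/dp/B003QHXWWC",
])
print(payload)

# Round-trip check: decoding the field value yields the original document.
decoded = json.loads(unquote(payload[len("uris="):]))
print(decoded["uris"])
```

The printed payload can be saved to a file and posted to the endpoint. Whitespace inside the JSON document may differ from hand-written examples; this does not change the decoded content.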
cURL example

Assume a file, file.txt, that contains the URL-encoded JSON string:

    uris=%7B%20%22uris%22%3A%5B%22http%3A%2F%2Fwww.amazon.co.uk%2FHama-Stylus-Input-Apple-iPad%2Fdp%2FB003O0OM0C%22%2C%20%22http%3A%2F%2Fwww.amazon.co.uk%2FKrusell-GAIA-Case-Apple-iPad%2Fdp%2FB003QHXWWC%22%20%5D%20%7D

Execute the following command:

    curl -i -d@file.txt http://cname/about/service?op=add

    HTTP/1.1 200 OK
    Server: Virtuoso/06.02.3129 (Darwin) i686-apple-darwin10.0.0 VDB
    Connection: Keep-Alive
    Date: Thu, 05 May 2011 12:06:24 GMT
    Accept-Ranges: bytes
    Content-Type: applcation/json; charset="UTF-8"
    Content-Length: 14

    { "result":2 }

Useful Virtuoso Functions

String Functions

sprintf_inverse

sprintf_inverse takes a string to parse and a format string. If the first argument matches the format string, it returns a vector of the values matching the placeholders in the format string. sprintf_inverse is useful for extracting fields from URLs.

Full description: Virtuoso Functions Guide entry

Example

    tmp := sprintf_inverse (new_origin_uri, 'http://farm%s.static.flickr.com/%s/%s_%s.%s', 0);
    img_id := tmp[2];

split_and_decode

Converts escaped key=value pairs in an input string into a vector of strings.

Full description: Virtuoso Functions Guide entry

Example

Splitting the HTTP request and response headers passed to the cartridge hook function.

    request_hdr := headers[0];
    response_hdr := headers[1];
    host := http_request_header (request, 'Host');
    tmp := split_and_decode (request_hdr[0], 0, '\0\0 ');
    http_method := tmp[0];
    url := tmp[1];
    protocol_version := substring (tmp[2], 6, 8);
    tmp := rtrim (response_hdr[0], '\r\n');
    tmp := split_and_decode (response_hdr[0], 0, '\0\0 ');

Retrieving URLs

http_get

Returns a string containing the body of the requested URL and, optionally, the response headers.

Full description: Virtuoso Functions Guide entry

Example

    url := sprintf('http://api.flickr.com/services/rest/??method=flickr.photos.getInfo&photo_id=%s&api_key=%s', img_id, api_key);
    tmp := http_get (url, hdr);
    if (hdr[0] not like 'HTTP/1._ 200 %')
      signal ('22023', trim(hdr[0], '\r\n'), 'RDFXX');
    xd := xtree_doc (tmp);

DB.DBA.RDF_HTTP_URL_GET

A wrapper around http_get. Retrieves a URL using the specified HTTP method (defaults to GET). The function can handle proxies, redirects (up to fifteen) and HTTPS.

Example

    uri := sprintf ('http://musicbrainz.org/ws/1/%s/%s?type=xml&inc=%U', kind, id, inc);
    cnt := RDF_HTTP_URL_GET (uri, '', hdr, 'GET', 'Accept: */*');
    xt := xtree_doc (cnt);
    xd := DB.DBA.RDF_MAPPER_XSLT (registry_get ('_cartridges_path_') || 'xslt/main/mbz2rdf.xsl', xt, vector ('baseUri', new_origin_uri));

http_request_header

Returns an array containing the HTTP request header lines. In the example below it is instead passed a header array and a field name, and is used to pick out the value of the Content-Type field.

Full description: Virtuoso Functions Guide entry

Example

    content := RDF_HTTP_URL_GET (rdf_url, new_origin_uri, hdr, 'GET', 'Accept: application/rdf+xml, text/rdf+n3, */*');
    ret_content_type := http_request_header (hdr, 'Content-Type', null, null);

Handling Non-XML Response Content

json_parse

Parses JSON content into a tree.

Example

    url := sprintf ('http://www.freebase.com/api/service/mqlread?queries=%U', qr);
    content := http_get (url, hdr);
    tree := json_parse (content);
    tree := get_keyword ('ROOT', tree);
    tree := get_keyword ('result', tree);

Writing Arbitrarily Long Text

http

Writes to an HTTP client or a string output stream.

Full description: Virtuoso Functions Guide entry

Example

Writing N3 to a string output stream using function http(), parsing the N3 into a graph, then loading the graph into the quad store.

    ses := string_output ();
    http ('@prefix opl: <http://www.openlinksw.com/schema/attribution#> .\n', ses);
    http ('@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n', ses);
    ...
    DB.DBA.TTLP (ses, base, graph);
    DB.DBA.RDF_LOAD_RDFXML (strg, base, graph);

string_output

Makes a string output stream, a special object that may be used to buffer arbitrarily long streams of data.
The HTTP output functions optionally take a string output stream handle as a parameter and then output to the string stream instead of the HTTP client.

Full description: Virtuoso Functions Guide entry

Example

    ses := string_output ();
    cnt := http_get (sprintf ('http://download.finance.yahoo.com/d/quotes.csv?s=%U&f=nsbavophg&e=.csv', symbol));
    arr := rdfm_yq_parse_csv (cnt);
    http ('<quote stock="NASDAQ">', ses);
    foreach (any q in arr) do
      {
        http_value (q[0], 'company', ses);
        http_value (q[1], 'symbol', ses);
        ...
      }
    http ('</quote>', ses);
    content := string_output_string (ses);
    xt := xtree_doc (content);

string_output_string

Produces a string out of a string output stream.

Full description: Virtuoso Functions Guide entry

Example

See string_output above.

XML & XSLT

xtree_doc

Parses the input string, which is expected to be a well formed XML fragment and returns a parse tree as a special memory-resident object.

Full description: Virtuoso Functions Guide entry

Example

    content := RDF_HTTP_URL_GET (uri, '', hdr, 'GET', 'Accept: */*');
    xt := xtree_doc (content);

xpath_eval

Applies an XPATH expression to a context node and returns the result(s).

Full description: Virtuoso Functions Guide entry

Example

    profile := cast (xpath_eval ('/html/head/@profile', xt) as varchar);

DB.DBA.RDF_MAPPER_XSLT

A simple wrapper around the Virtuoso function xslt which sets the current user to dba before returning an XML document transformed by an XSLT stylesheet.

Full description: Virtuoso Functions Guide entry

Example

    tmp := http_get (url);
    xd := xtree_doc (tmp);
    xt := DB.DBA.RDF_MAPPER_XSLT (
      registry_get ('_cartridges_path_') || 'xslt/main/atom2rdf.xsl',
      xd,
      vector ('baseUri', coalesce (dest, graph_iri)));

Character Set Conversion

serialize_to_UTF8_xml

Converts a value of arbitrary type to a UTF-8 string representation.

Full description: Virtuoso Functions Guide entry

Example

    xt := DB.DBA.RDF_MAPPER_XSLT (
      registry_get ('_cartridges_path_') || 'xslt/main/crunchbase2rdf.xsl',
      xt,
      vector ('baseUri', coalesce (dest, graph_iri), 'base', base, 'suffix', suffix));
    xd := serialize_to_UTF8_xml (xt);
    DB.DBA.RM_RDF_LOAD_RDFXML (xd, new_origin_uri, coalesce (dest, graph_iri));

Loading Data Into the Quad Store

DB.DBA.RDF_LOAD_RDFXML

Parses the content of RDF/XML text into a sequence of separate triples and loads the triples into the specified graph in the Virtuoso Quad Store.

Full description: Virtuoso Functions Guide entry

Example

    content := RDF_HTTP_URL_GET (uri, '', hdr, 'GET', 'Accept: */*');
    xt := xtree_doc (content);
    xd := DB.DBA.RDF_MAPPER_XSLT (
      registry_get ('_cartridges_path_') || 'xslt/main/mbz2rdf.xsl',
      xt,
      vector ('baseUri', new_origin_uri));
    xd := serialize_to_UTF8_xml (xd);
    DB.DBA.RM_RDF_LOAD_RDFXML (xd, new_origin_uri, coalesce (dest, graph_iri));

DB.DBA.TTLP

Parses the content of TTL (TURTLE or N3) text and loads the triples into the specified graph in the Virtuoso Quad Store.

Full description: Virtuoso Functions Guide entry

Example

    sess := string_output ();
    ...
    http (sprintf ('<http://dbpedia.org/resource/%s> <http://xbrlontology.com/ontology/finance/stock_market#hasCompetitor> <http://dbpedia.org/resource/%s> .\n',
      symbol, x), sess);
    http (sprintf ('<http://dbpedia.org/resource/%s> <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> <http://finance.yahoo.com/q?s=%s> .\n',
      x, x), sess);
    content := string_output_string (sess);
    DB.DBA.TTLP (content, new_origin_uri, coalesce (dest, graph_iri));

Debug Output

dbg_obj_print

Prints to the Virtuoso system console.

Full description: Virtuoso Functions Guide entry

Example

    dbg_obj_print ('try all grddl mappings here');

References

- RDF Primer: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
- RDF/XML Syntax Specification: http://www.w3.org/TR/rdf-syntax-grammar/
- GRDDL Primer: http://www.w3.org/TR/grddl-primer/

Appendix A: PingTheSemanticWeb RDF Notification Service

PingtheSemanticWeb (PTSW) is a repository for RDF documents.
The PTSW web service archives the location of recently created or updated RDF documents on the Web. It is intended for use by crawlers or other types of software agents which need to know when and where the latest updated RDF documents can be found. They can request a list of recently updated documents as a starting location to crawl the Semantic Web.

You may find this service useful for publicizing your own RDF content. Content authors can notify PTSW that an RDF document has been created or updated by pinging the service with the URL of the document. The Sponger supports this facility through the async_queue and ping_service parameters of the cartridge hook function, where the ping_service parameter contains the ping service URL as configured in the SPARQL section of the virtuoso.ini file:

    [SPARQL]
    ...
    PingService = http://rpc.pingthesemanticweb.com/
    ...

The configured ping service can be called using an asynchronous request and the RDF_SW_PING procedure as illustrated below.

    create procedure DB.DBA.RDF_LOAD_HTML_RESPONSE (
      in graph_iri varchar,
      in new_origin_uri varchar,
      in dest varchar,
      inout ret_body any,
      inout async_queue any,
      inout ping_service any,
      inout _key any,
      inout opts any )
    {
      ...
      if ( ... and async_queue is not null)
        aq_request (async_queue, 'DB.DBA.RDF_SW_PING', vector (ping_service, new_origin_uri));

For more details, please refer to the section Asynchronous Execution and Multithreading in Virtuoso/PL in the Virtuoso reference documentation.

Appendix B: Main Namespaces used by OpenLink Cartridges

A list of the main namespaces / ontologies used by OpenLink-provided Sponger cartridges is given below. Some of these ontologies may prove useful when creating your own cartridges.

    - http://www.openlinksw.com/virtuoso/xslt/
    - http://www.openlinksw.com/schemas/XHTML#
    rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs: http://www.w3.org/2000/01/rdf-schema#
    dc: http://purl.org/dc/elements/1.1/
    dcterms: http://purl.org/dc/terms/
    foaf: http://xmlns.com/foaf/0.1/
    sioc: http://rdfs.org/sioc/ns#
    sioct: http://rdfs.org/sioc/types#
    skos: http://www.w3.org/2004/02/skos/core#
    bibo: http://purl.org/ontology/bibo/

Appendix C: Freebase Cartridge & Stylesheet

Snapshots of the Freebase cartridge and stylesheet compatible with the meta-cartridge example presented earlier in this document can be found below.

DB.DBA.RDF_LOAD_MQL:

    --no_c_escapes-
    create procedure DB.DBA.RDF_LOAD_MQL (in graph_iri varchar, in new_origin_uri varchar, in dest varchar,
        inout _ret_body any, inout aq any, inout ps any, inout _key any, inout opts any)
    {
      declare qr, path, hdr any;
      declare tree, xt, xd, types any;
      declare k, cnt, url, sa varchar;

      hdr := null;
      sa := '';
      declare exit handler for sqlstate '*'
        {
          --dbg_printf ('%s', __SQL_MESSAGE);
          return 0;
        };

      path := split_and_decode (new_origin_uri, 0, '%\0/');
      if (length (path) < 1)
        return 0;
      k := path [length(path) - 1];
      if (path [length(path) - 2] = 'guid')
        k := sprintf ('"id":"/guid/%s"', k);
      else
        {
          if (k like '#%')
            k := sprintf ('"id":"%s"', k);
          else
            {
              sa := DB.DBA.RDF_MQL_GET_WIKI_URI (k);
              k := sprintf ('"key":"%s"', k);
            }
        }
      qr := sprintf ('{"ROOT":{"query":[{%s, "type":[]}]}}', k);
      url := sprintf ('http://www.freebase.com/api/service/mqlread?queries=%U', qr);
      cnt := http_get (url, hdr);
      tree := json_parse (cnt);
      xt := get_keyword ('ROOT', tree);
      if (not isarray (xt))
        return 0;
      xt := get_keyword ('result', xt);
      types := vector ();
      foreach (any tp in xt) do
        {
          declare tmp any;
          tmp := get_keyword ('type', tp);
          types := vector_concat (types, tmp);
        }
      --types := get_keyword ('type', xt);
      delete from DB.DBA.RDF_QUAD where g = iri_to_id(new_origin_uri);
      foreach (any tp in types) do
        {
          qr := sprintf ('{"ROOT":{"query":{%s, "type":"%s", "*":[]}}}', k, tp);
          url := sprintf ('http://www.freebase.com/api/service/mqlread?queries=%U', qr);
          cnt := http_get (url, hdr);
          --dbg_printf ('%s', cnt);
          tree := json_parse (cnt);
          xt := get_keyword ('ROOT', tree);
          xt := DB.DBA.MQL_TREE_TO_XML (tree);
          --dbg_obj_print (xt);
          xt := DB.DBA.RDF_MAPPER_XSLT (registry_get ('_cartridges_path_') || 'xslt/main/mql2rdf.xsl', xt,
                vector ('baseUri', coalesce (dest, graph_iri), 'wpUri', sa));
          sa := '';
          xd := serialize_to_UTF8_xml (xt);
          -- dbg_printf ('%s', xd);
          DB.DBA.RM_RDF_LOAD_RDFXML (xd, new_origin_uri, coalesce (dest, graph_iri));
        }
      return 1;
    }

mql2rdf.xsl:

    <?xml version="1.0" encoding="UTF-8"?>
    <!--
     -
     -  $Id: mql2rdf.xsl,v 1.10 2008/09/04 09:14:12 source Exp $
     -
     -  This file is part of the OpenLink Software Virtuoso Open-Source (VOS)
     -  project.
     -
     -  Copyright (C) 1998-2008 OpenLink Software
     -
     -  This project is free software; you can redistribute it and/or modify it
     -  under the terms of the GNU General Public License as published by the
     -  Free Software Foundation; only version 2 of the License, dated June 1991.
     -
     -  This program is distributed in the hope that it will be useful, but
     -  WITHOUT ANY WARRANTY; without even the implied warranty of
     -  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
     -  General Public License for more details.
     -
     -  You should have received a copy of the GNU General Public License along
     -  with this program; if not, write to the Free Software Foundation, Inc.,
     -  51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
    -->
    <!DOCTYPE xsl:stylesheet [
    <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <!ENTITY bibo "http://purl.org/ontology/bibo/">
    <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
    <!ENTITY foaf "http://xmlns.com/foaf/0.1/">
    <!ENTITY sioc "http://rdfs.org/sioc/ns#">
    ]>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:vi="http://www.openlinksw.com/virtuoso/xslt/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:sioc="&sioc;"
        xmlns:bibo="&bibo;"
        xmlns:foaf="&foaf;"
        xmlns:skos="http://www.w3.org/2004/02/skos/core#"
        xmlns:dcterms="http://purl.org/dc/terms/"
        xmlns:mql="http://www.freebase.com/">

      <xsl:output method="xml" indent="yes" />

      <xsl:param name="baseUri" />
      <xsl:param name="wpUri" />

      <xsl:variable name="ns">http://www.freebase.com/</xsl:variable>

      <xsl:template match="/">
        <rdf:RDF>
          <xsl:if test="/results/ROOT/result/*">
            <rdf:Description rdf:about="{$baseUri}">
              <rdf:type rdf:resource="&foaf;Document"/>
              <rdf:type rdf:resource="&bibo;Document"/>
              <rdf:type rdf:resource="&sioc;Container"/>
              <sioc:container_of rdf:resource="{vi:proxyIRI($baseUri)}"/>
              <foaf:primaryTopic rdf:resource="{vi:proxyIRI($baseUri)}"/>
              <dcterms:subject rdf:resource="{vi:proxyIRI($baseUri)}"/>
            </rdf:Description>
            <rdf:Description rdf:about="{vi:proxyIRI($baseUri)}">
              <rdf:type rdf:resource="&sioc;Item"/>
              <sioc:has_container rdf:resource="{$baseUri}"/>
              <xsl:apply-templates select="/results/ROOT/result/*"/>
              <xsl:if test="$wpUri != ''">
                <rdfs:seeAlso rdf:resource="{$wpUri}"/>
              </xsl:if>
            </rdf:Description>
          </xsl:if>
        </rdf:RDF>
      </xsl:template>

      <xsl:template match="*[starts-with(.,'http://') or starts-with(.,'urn:')]">
        <xsl:element namespace="{$ns}" name="{name()}">
          <xsl:attribute name="rdf:resource">
            <xsl:value-of select="vi:proxyIRI (.)"/>
          </xsl:attribute>
        </xsl:element>
      </xsl:template>

      <xsl:template match="*[starts-with(.,'/')]">
        <xsl:if test="local-name () = 'type' and . like '%/person'">
          <rdf:type rdf:resource="&foaf;Person"/>
        </xsl:if>
        <xsl:if test="local-name () = 'type'">
          <sioc:topic>
            <skos:Concept rdf:about="{vi:proxyIRI (concat ($ns, 'view', .))}"/>
          </sioc:topic>
        </xsl:if>
        <xsl:element namespace="{$ns}" name="{name()}">
          <xsl:attribute name="rdf:resource">
            <xsl:value-of select="vi:proxyIRI(concat ($ns, 'view', .))"/>
          </xsl:attribute>
        </xsl:element>
      </xsl:template>

      <xsl:template match="*[* and ../../*]">
        <xsl:element namespace="{$ns}" name="{name()}">
          <xsl:attribute name="rdf:parseType">Resource</xsl:attribute>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
      </xsl:template>

      <xsl:template match="*">
        <xsl:if test="* or . != ''">
          <xsl:choose>
            <xsl:when test="name()='image'">
              <foaf:depiction rdf:resource="{vi:mql-image-by-name (.)}"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:element namespace="{$ns}" name="{name()}">
                <xsl:if test="name() like 'date_%'">
                  <xsl:attribute name="rdf:datatype">&xsd;dateTime</xsl:attribute>
                </xsl:if>
                <xsl:apply-templates select="@*|node()"/>
              </xsl:element>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:if>
      </xsl:template>

    </xsl:stylesheet>