    VOS.VirtSpongerLinkedDataHooksIntoSPARQL (Last) -- owiki, 2018-04-13 12:08:21

    Enhancements the Virtuoso Sponger brings to SPARQL


    In the world of Linked Data, the Web is treated as a global data space where every data object has an identifier (URI) that serves as a key to its entity-attribute-value (3-tuple, or triple) based description. To make these "keys" work, data object URIs have to be dereferenceable; i.e., they must resolve to actual object content, a capability commonly delivered by URI specializations (or subtypes) that both locate and retrieve data objects, such as URLs.


    Virtuoso's Sponger is a sophisticated piece of middleware that provides full Linked Data fidelity for pre-existing data objects or resources. This Linked Data is then accessible via HTTP-based Web Services, and SPARQL is enhanced with Sponger pragmas (or directives) and some optional additions to the FROM clause.



    Sponger pragmas control various aspects of functionality —

    1. Identifier Dereference: handled by INPUT pragmas.
    2. Actual Data Retrieval: handled by GET pragmas.
    3. SQL Code Generation: handled by SQL pragmas.
    4. Output Format Adjustments: handled by OUTPUT pragmas.

    Pragmas are qualified at usage time using the following pattern:

    <pragma-type>:<actual-method> ["<method-modifier>"]
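As a minimal sketch of this pattern, the Python helper below renders one pragma as a `define` line, the form quoted elsewhere on this page (for example `define input:storage ""`). The function name `render_pragma` and its quote check are hypothetical and exist only for illustration; they are not part of any Virtuoso client API.

```python
# The four pragma types named on this page.
VALID_TYPES = {"input", "get", "sql", "output"}


def render_pragma(pragma_type: str, method: str, modifier: str = "") -> str:
    """Render <pragma-type>:<actual-method> "<method-modifier>" as a define line."""
    if pragma_type not in VALID_TYPES:
        raise ValueError(f"unknown pragma type: {pragma_type!r}")
    if '"' in modifier or "\n" in modifier:
        # Hypothetical safety rule: refuse values that would break out of the quotes.
        raise ValueError("modifier must not contain quotes or newlines")
    return f'define {pragma_type}:{method} "{modifier}"'


print(render_pragma("input", "storage"))        # define input:storage ""
print(render_pragma("get", "soft", "replace"))  # define get:soft "replace"
```

The rendered lines are meant to be placed in the preamble of the query text, before the query itself.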


    INPUT Pragmas

    INPUT pragmas enable you to control the dereference behavior applied to a SPARQL query. The net effect is fine-grained control over how variables and explicit data object identifiers are dereferenced en route to creating the base data from which SPARQL query solutions are derived.
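To make this concrete, here is a small Python sketch that builds a preamble from three INPUT pragmas described on this page (input:grab-var, input:grab-depth, input:grab-limit) and places it in front of a query. The helper name `sponge_preamble`, the example.org IRI, and the query itself are made up for illustration; quoting the numeric values follows the "<number>" modifier form shown in the table, and the range check mirrors the "non-negative integers" rule stated there.

```python
def sponge_preamble(grab_var: str, depth: int, limit: int) -> str:
    """Build define lines asking the Sponger to dereference values of one variable."""
    for name, value in (("depth", depth), ("limit", limit)):
        if value < 0:
            # The table gives the acceptable range as non-negative integers.
            raise ValueError(f"grab-{name} must be a non-negative integer")
    return "\n".join([
        f'define input:grab-var "?{grab_var}"',
        f'define input:grab-depth "{depth}"',
        f'define input:grab-limit "{limit}"',
    ])


# Hypothetical query: the IRI and vocabulary choice are illustrative only.
QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE { <http://example.org/people/alice#me> foaf:knows ?friend .
        ?friend foaf:name ?name }"""

# The pragma preamble goes in front of the query text.
print(sponge_preamble("friend", depth=2, limit=50) + "\n" + QUERY)
```

Keeping the depth and limit small is a deliberate choice in this sketch, since the table warns that broad dereferencing (for example input:grab-all) can perform very badly.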

    Methods and method-modifiers associated with this pragma type include:

    Method Modifier(s) Description Usage Example
    input:default-graph-exclude "<IRI>" Works like "NOT FROM" clause Example
    input:default-graph-uri "<IRI>" Works like "FROM" clause Example
    input:freeze Blocks further changes in the list of source graphs. The web service endpoint (or similar non-web application) can edit an incoming query by placing a list of pragmas ending with input:freeze in front of the query text. If an intruder tries to place some graph names, they will get a compilation error, not access to the data. input:freeze disables all input:grab-... pragmas as well. Example?
    input:grab-all "yes" Instructs the SPARQL processor to dereference everything related to the query. All variables and literal IRIs in the query become values for input:grab-var and input:grab-iri. The resulting performance may be very bad. Example
    input:grab-base "<IRI>" Specifies the base IRI to use when converting relative IRIs to absolute. (Default: empty string.) Example
    input:grab-depth "0" Sets the maximum 'degrees of separation' or links (predicates) between nodes in the target graph. Acceptable range is non-negative integers. 0 means unlimited. Example
    input:grab-destination "<IRI>" Overrides the default IRI dereferencing and Local Graph IRI designation. Retrieved content (triples) is stored in a graph IRI designated by the modifier value. Example
    input:grab-follow-predicate "<IRI>" Specifies a predicate IRI to be used when traversing a graph. (This pragma may be included multiple times). Synonym of input:grab-seealso. Example
    input:grab-iri "<IRI>" Specifies an IRI that should be retrieved before executing the rest of the query, if it is not in the quad store already. (This pragma can be included multiple times). Example
    input:grab-limit "<number>" Sets the maximum number of resources (triple subject or object IRIs) to be de-referenced. Acceptable range is non-negative integers. 0 means unlimited. Example
    input:grab-loader "<procedure-name>" Identifies the procedure used to retrieve, parse, and store content. (Default: DB.DBA.RDF_SPONGE_UP) Example?
    input:grab-resolver "<procedure-name>" Identifies the procedure that handles IRI dereference and actual content retrieval via a specific data access protocol (e.g., HTTP). (Default: DB.DBA.RDF_GRAB_RESOLVER_DEFAULT.) Example?
    input:grab-seealso "<IRI>" Synonym of input:grab-follow-predicate. Example
    input:grab-var "?<var-name>" Specifies the name of the SPARQL variable whose values should be used as IRIs of resources that should be downloaded. Example
    input:grab-group-destination "<IRI>" Resembles input:grab-destination, but the Sponger still creates an individual graph for each Network Resource Fetch result and, in addition to this common routine, adds a copy of each result to the graph specified by the value of input:grab-group-destination. In short, input:grab-destination redirects loadings; input:grab-group-destination duplicates them. Example?
    input:grab-intermediate "<IRI>" Extends the set of IRIs to sponge, useful in combination with input:grab-seealso. If present, then, for a given subject, Network Resource Fetch will retrieve not only values of see-also predicates for that subject, but also the subject itself. The define value is not used in current implementation. Example?
    input:ifp "<keyword>" Adds IFP keyword in OPTION (QUIETCAST, ...) clause in the generated SQL. The value of this define is not used yet; an empty string is safe for future extensions. Example?
    input:inference "<IRI>" Specifies the name of an inference rule to provide context for backward-chained reasoner. Example
    input:named-graph-exclude "<IRI>" Works like "NOT FROM NAMED" clause Example
    input:named-graph-uri "<IRI>" Works like "FROM NAMED" clause Example
    input:param "<variable-name>"

    Declares a variable name to be used as a custom SPARQL protocol parameter.

    The SPARQL query uses this custom parameter via the special "?::{variable}" syntax (excluding quotation marks).

    If query text is generated by a query builder that does not understand Virtuoso's SPARQL-BI extensions, then the generated query text may contain a conventional query variable as long as it uses the define input:param "X" pragma in its preamble.

    Note: This will not work for positional parameters; i.e., you cannot replace a SPARQL-BI reference like ?::3 with ?3 combined with a define input:param "3" pragma.

    input:params "<variable-name>" Synonym of input:param Example?
    input:same-as "yes" Sets inference context for owl:sameAs (entity equivalence by name) reasoning and union expansion. Example
    input:storage "<IRI>" Sets dataset (quads) storage scope. The value is a storage identifier (IRI); the default is virtrdf:DefaultQuadStorage. If the value is an empty string, then only quads from RDF_QUAD are used. This is a good choice for low-level admin procedures, for two reasons: they will not interfere with any changes in virtrdf:DefaultQuadStorage; and they will continue to work even if all of the compiler's metadata is corrupted, including the description of virtrdf:DefaultQuadStorage. (define input:storage "" switches the SPARQL compiler to a small set of metadata that is built into 'C' code and is thus very hard for end-users to corrupt.) Example
    input:target-fallback-graph-uri "<IRI>" Tells the compiler to use the given IRI as the target graph for SPARQL 1.1 INSERT and DELETE operations if no other graph is specified in the query. Example
    input:with-fallback-graph-uri "<IRI>" Tells the compiler to use the given IRI both as the target graph for SPARQL 1.1 operations if no other graph is specified, and as the default graph IRI if no other source graphs are named in the query. Example
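The input:param entry in the table above describes a preamble that lets a conventional variable act as a custom protocol parameter. A minimal Python sketch of that step follows. The helper name `declare_params`, the variable name `label`, and the query are hypothetical; how the parameter value is supplied at execution time is outside the scope of this sketch.

```python
def declare_params(query: str, names: list[str]) -> str:
    """Prepend one define input:param line per named parameter."""
    for name in names:
        if name.isdigit():
            # Per the note in the table: positional references such as ?::3
            # cannot be replaced by ?3 plus define input:param "3".
            raise ValueError("positional parameters cannot be declared with input:param")
    defines = [f'define input:param "{name}"' for name in names]
    return "\n".join(defines + [query])


# Query text as a generic query builder might emit it: ?label is an ordinary variable.
builder_query = (
    "SELECT ?s WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }"
)

print(declare_params(builder_query, ["label"]))
```

With the define in place, the conventional `?label` plays the role that `?::label` would play in a query written directly in SPARQL-BI.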

    GET Pragmas

    GET pragmas enable you to control the actual data-object content-retrieval behavior applied to a SPARQL query. The net effect is fine-grained control over data-access-oriented matters such as:
    • Data object content format, via content negotiation
    • Cache invalidation
    • Proxy handling

    This pragma type is also usable as a comma-separated list of SPARQL ... FROM <options>. Its methods and method-modifiers include —

    Method Modifier(s) Description Usage Example
    get:accept "application/xml"
    get:accept is most commonly used to access a web service that returns HTML by default but can also return RDF if forced to do so. The default value is
    "application/rdf+xml; q=1.0, text/rdf+n3; q=0.9,
    application/rdf+turtle; q=0.5, application/x-turtle; q=0.6,
    application/turtle; q=0.5, text/turtle; q=1.0,
    application/xml; q=0.2, */*; q=0.1"
    get:cartridge "extractor"
    Designates the use of Sponger "meta" or "extractor" cartridges in the query being executed. Example
    get:method "GET"
    • "GET" loads the resource itself.
    • "MGET" loads metadata about the resource.
    get:private ""
    When used for sponging graph X, it adjusts the graph-level security of graph X (and of graph_group_IRI, if specified) so that X becomes a privately accessible graph of the user who sponges X. If graph_group_IRI is specified, X becomes accessible to users who can access graph_group_IRI, with the same permissions they have on graph_group_IRI.

    The exact rules are —
    • If graph is virtrdf:, an error is signaled.
    • If graph name is an IRI of handshaked web service endpoint or "public IRI" of a handshaked web service endpoint, an error is signaled.
    • If access is public by default, even for private graphs, an error is signaled and sponging is not tried.
    • If default is "no access" but someone (other than current user) has specifically granted read access to the graph in question AND current user is not dba AND current user has no bit 32 permission on this graph, an error is signaled.
    • If read access is public by default for world and disabled for private graphs, then the graph to be sponged is added to the group of private graphs.
    • If current user is not DBA, current user is granted read+write+sponge+admin access to the graph to be sponged. In addition, current user gets special permission bit 32, indicating that the graph is made by private sponge of this specific user.
    • If the value of get:private is an IRI, then —
      • the IRI is supposed to be an IRI of "plain" graph group. An error is signaled in case of non-existing graph group, group of private graphs, or group of graphs to be replicated.
      • the graph is added to that group.
      • each non-dba user that can get list of files of the group will get permissions for the loaded graph equal to permissions they have on graph group minus "list" permission.
    get:proxy "<host[:port]>" Similar to setting up a Web browser to work with a proxy-style HTTP server, this identifies the CNAME (URL host:port or authority component) to target if direct retrieval from the URL in the FROM clause or handling of a data object's dereferenceable identifier is not possible. Example
    get:refresh "<seconds>" Limits the lifetime of a local cached copy of the source. The value is in seconds. Example
    get:query Example?
    get:soft "soft"
    • "soft" applies cache-invalidation to the sponged resource en route to replacing content or doing nothing.
    • "replace" replaces triples stored in named graphs.
    • "add" simply adds triples to existing named graphs.
    get:uri "<IRI>" Identifies a specific URI to be de-referenced, distinct from the document URL in the FROM clause of a SPARQL query. Typically, this would be used to dereference a specific subject or object of a relation in the data retrieved from the document URL in the FROM clause. Example
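The GET section notes that these pragmas can also be given as a comma-separated list of options on a FROM clause. The Python sketch below renders such a clause. It assumes the options are written as `OPTION (...)` directly after the source IRI; that placement, the helper name `from_with_options`, and the example.org URL are assumptions made for illustration and should be checked against the Virtuoso version in use.

```python
def from_with_options(source_iri: str, options: dict[str, str], named: bool = False) -> str:
    """Render a FROM (or FROM NAMED) clause carrying GET pragmas as options."""
    # Comma-separated list of method "modifier" pairs, in insertion order.
    rendered = ", ".join(f'{name} "{value}"' for name, value in options.items())
    keyword = "FROM NAMED" if named else "FROM"
    return f"{keyword} <{source_iri}> OPTION ({rendered})"


clause = from_with_options(
    "http://example.org/data/doc.rdf",          # hypothetical source document
    {"get:soft": "soft", "get:method": "GET"},  # methods taken from the table above
)
print(f"SELECT ?s ?p ?o\n{clause}\nWHERE {{ ?s ?p ?o }}")
```

Attaching the options to one FROM clause scopes them to that source, whereas a `define` line in the preamble applies to the query as a whole.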

    SQL Pragmas

    Pragmas to control code generation:

    Method Modifier(s) Description Usage Example
    sql:assert-user "<username>" Defines the user who is supposed to be the single "proper" user for the query. If the compiler is launched by any other user, an error is signaled. The typical use is define sql:assert-user "dba". This is too weak to be a security measure, but may help in debugging security issues. Example?
    sql:big-data-const Example
    sql:describe-mode ""
    See detailed description here. Example
    sql:globals-mode "XSLT"
    Tells how to print names of global variables. Supported values are
    • "XSLT" — print colon before name of global variable
    • "SQL" — print as usual
    sql:gs-app-callback Application-specific callback, returns permission bits of a given graph. Example
    sql:gs-app-uid Application-specific user-id to use in callback. Example
    sql:log-enable Value that will be passed to SPARUL procedures, where it will be passed to log_enable() BIF. define sql:log-enable N will result in log_enable(N, 1) at the beginning of the operation; another log_enable() call will restore previous mode of transaction log at exit from the procedure including any error signaled from it. For example, set to 2 to disable logging to avoid a huge transaction after-image when sponging is deep and wide. Example
    sql:param "<variable-name>" Synonym of input:param Example?
    sql:params "<variable-name>" Synonym of input:param Example?
    sql:select-option Value will be added as a global OPTION() clause of the generated SQL SELECT. This clause is always printed; it is always at least OPTION (QUIETCAST, ...). The most popular use case is define sql:select-option "ORDER", which tells the SQL compiler to execute JOINs in the order of their use in the query; this can make query compilation much faster, but the compilation result can be terrible if you do not know precisely what you're doing and do not inspect the execution plan of the generated SQL query. Example
    sql:signal-void-variables When set to 1, this forces the SPARQL compiler to signal an error if it can prove that some variable can never be bound, for instance because of a misspelled name or an attempt to make a join across disjoint domains. Usually this means an error in the query, like a typo in an IRI or a totally wrong triple pattern. These diagnostics are especially important when the query is long, and this is the most useful debugging pragma if Linked Data Views are in use. Example
    sql:table-option Value will be added as an option to each triple in the query, and later it will be printed in TABLE OPTION (...) clause of source table clause. This works only for SQL code for plain triples from RDF_QUAD; fragments of queries related to RDF Views will remain unchanged. Example

    OUTPUT Pragmas

    Pragmas to control the type of the result.

    Method Modifier(s) Description Usage Example
    output:dict-format "<format-specifier>" Tells the compiler that the query should produce a string output with the serialization of the result, not a result set. Only CONSTRUCT and DESCRIBE queries are affected by the value of output:dict-format. Use output:scalar-format and/or output:format for ASK queries. Example?
    output:format "<format-specifier>" Tells the compiler that the query should produce a string output with the serialization of the result, not a result set. The value of output:format is primarily used for SELECT and data manipulation queries. It will also be used for CONSTRUCT, DESCRIBE, and ASK queries, if output:dict-format or output:scalar-format are not used. Example
    output:scalar-format "<format-specifier>" Tells the compiler that the query should produce a string output with the serialization of the result, not a result set. Only ASK queries are affected by the value of output:scalar-format. Use output:dict-format and/or output:format for CONSTRUCT or DESCRIBE queries. Example?
    output:valmode "SQLVAL"
    Tells the compiler which SQL datatypes should be used for output values.
    • "SQLVAL", the default, is appropriate for ODBC clients and the like which know nothing about RDF and expect plain SQL values.
    • "LONG" tells the compiler to preserve RDF boxes as is and to return IRI IDs instead of IRI string values. This is useful when a Virtuoso/PL procedure is RDF-aware and keeps results to be passed on to other SPARQL queries or to low-level RDF routines.
    • "AUTO" is for developers who do not want any conversion of any sort at the output: they read the SQL output of the SPARQL front-end, find the format of each column, and add the needed conversions later.
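The three format pragmas above overlap, so it can help to spell out which one applies to which query form. The Python sketch below encodes the precedence as described in the table: output:dict-format for CONSTRUCT and DESCRIBE, output:scalar-format for ASK, and output:format as the fallback for everything. The function name `effective_format` and the placeholder specifiers "FMT-A" and "FMT-B" are hypothetical; they are not real Virtuoso format names.

```python
from typing import Optional


def effective_format(query_form: str, defines: dict[str, str]) -> Optional[str]:
    """Return the format specifier that applies to a query form, or None."""
    form = query_form.upper()
    if form in ("CONSTRUCT", "DESCRIBE"):
        specific = defines.get("output:dict-format")
    elif form == "ASK":
        specific = defines.get("output:scalar-format")
    else:
        # SELECT and data manipulation queries only look at output:format.
        specific = None
    # output:format is used when no form-specific pragma was given.
    return specific if specific is not None else defines.get("output:format")


# Placeholder specifiers, not real format names.
defines = {"output:format": "FMT-A", "output:dict-format": "FMT-B"}

print(effective_format("SELECT", defines))     # FMT-A
print(effective_format("CONSTRUCT", defines))  # FMT-B
print(effective_format("ASK", defines))        # FMT-A
```

A return value of `None` in this sketch corresponds to the case where no format pragma applies and the query yields an ordinary result set rather than a serialized string.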

    Sponger Usage Examples