Enhancements the Virtuoso Sponger brings to SPARQL


Virtuoso's Sponger is middleware that provides full Linked Data fidelity for pre-existing data objects (resources). The resulting Linked Data is accessible via HTTP-based Web services, and Virtuoso extends SPARQL with Sponger pragmas and some optional additions to the FROM clause.


In the world of Linked Data, the Web is treated as a global data space in which every data object has an identifier (a URI) that serves as the key to its description, expressed as entity-attribute-value 3-tuples (triples). For these "keys" to work, data object URIs must be dereferenceable: they must resolve to actual object content. This is commonly achieved through URI specializations (subtypes) that both locate and retrieve data objects, such as URLs.
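In practice, dereferencing an HTTP URI is an HTTP GET that uses the Accept header to ask for a machine-readable description of the object. The Python sketch below only builds such a request with the standard library's urllib.request; the helper name dereference_request, the example URI, and the media-type preferences in RDF_ACCEPT are illustrative assumptions, not Sponger defaults.

```python
import urllib.request

# Illustrative preferences for an RDF-aware client (an assumption, not a Sponger default).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, */*;q=0.1"


def dereference_request(uri: str, accept: str = RDF_ACCEPT) -> urllib.request.Request:
    """Build, without sending, an HTTP GET that asks a URI for its object's description."""
    if not uri.startswith(("http://", "https://")):
        raise ValueError("this sketch only dereferences HTTP(S) URIs")
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


req = dereference_request("http://example.com/id/alice")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it, e.g. with urllib.request.urlopen(req), requires network access.
```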



Sponger pragmas control four aspects of functionality:

  1. Identifier Dereference: handled by INPUT pragmas.
  2. Actual Data Retrieval: handled by GET pragmas.
  3. SQL Code Generation: handled by SQL pragmas.
  4. Output Format Adjustments: handled by OUTPUT pragmas.

Each pragma is written using the following pattern; in a query it is introduced by the keyword define (for example, define input:grab-all "yes"):

<pragma-type>:<actual-method> ["<method-modifier>"]
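As a sketch of how this pattern turns into query text, the Python helpers below (render_define and with_pragmas are hypothetical names) render each pragma as a define line and prepend the lines to a query. They assume every modifier is a quoted string, as in the pattern above; pragmas that take no modifier or an unquoted value are out of scope.

```python
def render_define(qualifier: str, value: str) -> str:
    """Render one pragma, e.g. ("input:grab-all", "yes"), as a define line."""
    pragma_type, sep, method = qualifier.partition(":")
    if not sep or not pragma_type or not method:
        raise ValueError(f"expected <pragma-type>:<actual-method>, got {qualifier!r}")
    # Escape backslashes and double quotes so the modifier stays a valid string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'define {qualifier} "{escaped}"'


def with_pragmas(query: str, pragmas: list) -> str:
    """Prepend (qualifier, value) pairs to a SPARQL query, one define per line."""
    lines = [render_define(qualifier, value) for qualifier, value in pragmas]
    return "\n".join(lines + [query])


query = with_pragmas(
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o }",
    [("input:grab-all", "yes"), ("input:grab-base", "http://example.com/")],
)
print(query)
```

Keeping the rendering in one function means quoting and escaping are handled in a single place, regardless of which pragma type is being emitted.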


INPUT Pragmas

INPUT pragmas let you control the dereferencing behavior applied to a SPARQL query. The net effect is fine-grained control over how variables and explicit data object identifiers are dereferenced en route to creating the base data from which SPARQL query solutions are derived.
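To make the dereferencing process concrete, here is a toy Python model of how seed IRIs (as supplied by input:grab-iri or input:grab-var) might be expanded by following predicates (input:grab-seealso / input:grab-follow-predicate) under input:grab-depth and input:grab-limit, and where the results land (input:grab-destination, input:grab-group-destination); these pragmas are described in the table below. It is not the Sponger's implementation: the function sponge is hypothetical, fetch is a caller-supplied stand-in for retrieval, and reading grab-depth as "links away from a seed" is an assumption.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def sponge(
    seeds: Iterable[str],
    fetch: Callable[[str], Iterable[Triple]],
    follow: Set[str],
    depth: int = 0,
    limit: int = 0,
    destination: Optional[str] = None,
    group_destination: Optional[str] = None,
) -> Dict[str, List[Triple]]:
    """Toy model of iterative dereferencing; depth and limit of 0 mean unlimited."""
    graphs: Dict[str, List[Triple]] = defaultdict(list)
    seen: Set[str] = set()
    queue = deque((iri, 0) for iri in seeds)  # (IRI, links away from a seed)
    while queue:
        iri, distance = queue.popleft()
        if iri in seen:
            continue
        if limit and len(seen) >= limit:
            break  # like input:grab-limit: stop after `limit` resources
        seen.add(iri)
        triples = list(fetch(iri))
        # Default: one graph per fetched resource; a destination redirects all loads.
        graphs[destination or iri].extend(triples)
        if group_destination:
            graphs[group_destination].extend(triples)  # duplicate, not redirect
        if depth and distance >= depth:
            continue  # like input:grab-depth: follow no further links from here
        for _, predicate, obj in triples:
            if predicate in follow and obj not in seen:
                queue.append((obj, distance + 1))
    return dict(graphs)


# Usage with an in-memory stand-in for the Web (hypothetical IRIs).
WEB = {
    "ex:a": [("ex:a", "rdfs:seeAlso", "ex:b"), ("ex:a", "ex:name", "A")],
    "ex:b": [("ex:b", "rdfs:seeAlso", "ex:c")],
    "ex:c": [("ex:c", "rdfs:seeAlso", "ex:a")],
}
graphs = sponge(["ex:a"], lambda iri: WEB.get(iri, []), {"rdfs:seeAlso"}, depth=1)
print(sorted(graphs))  # ['ex:a', 'ex:b']
```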

Methods and method-modifiers associated with this pragma type include:

Method | Modifier(s) | Description
input:grab-all | "yes" | Instructs the SPARQL processor to dereference everything related to the query: all variables and literal IRIs in the query become values for input:grab-var and input:grab-iri. The resulting performance may be very poor.
input:grab-base | "<IRI>" | Specifies the base IRI to use when converting relative IRIs to absolute ones. (Default: empty string.)
input:grab-depth | "<number>" | Sets the maximum 'degrees of separation', i.e. the number of links (predicates) between nodes in the target graph. A value of 0 means unlimited.
input:grab-destination | "<IRI>" | Overrides the default IRI dereferencing and Local Graph IRI designation: retrieved content (triples) is stored in the graph whose IRI is given as the modifier value.
input:grab-follow-predicate | "<IRI>" | Specifies a predicate IRI to follow when traversing a graph. (This pragma can be included multiple times.) Synonym of input:grab-seealso.
input:grab-iri | "<IRI>" | Specifies an IRI that should be retrieved before the rest of the query is executed, if it is not already in the quad store. (This pragma can be included multiple times.)
input:grab-limit | "<number>" | Sets the maximum number of resources (IRIs of triple subjects or objects) to be dereferenced. A value of 0 means unlimited.
input:grab-loader | "<procedure-name>" | Identifies the procedure used to retrieve, parse, and store content. (Default: DB.DBA.RDF_SPONGE_UP.)
input:grab-resolver | "<procedure-name>" | Identifies the procedure that handles IRI dereferencing and actual content retrieval via a specific data access protocol (e.g., HTTP). (Default: DB.DBA.RDF_GRAB_RESOLVER_DEFAULT.)
input:grab-seealso | "<IRI>" | Synonym of input:grab-follow-predicate.
input:grab-var | "?<var-name>" | Specifies the name of the SPARQL variable whose values should be used as IRIs of resources to download.
input:grab-group-destination | "<IRI>" | Resembles input:grab-destination, but the Sponger still creates an individual graph for each Network Resource Fetch result and, in addition, adds a copy of each result to the graph specified by the value of input:grab-group-destination. In short, input:grab-destination redirects loads, while input:grab-group-destination duplicates them.
input:grab-intermediate | "<IRI>" | Extends the set of IRIs to sponge; useful in combination with input:grab-seealso. If present, then for a given subject, Network Resource Fetch retrieves not only the values of see-also predicates for that subject but also the subject itself. The value of this define is not used in the current implementation.
input:same-as | "yes" | Sets the inference context for owl:sameAs (entity equivalence by name) reasoning and union expansion.
input:storage | "<IRI>" | Sets the dataset (quad) storage scope. The value is a storage identifier (IRI); the default is virtrdf:DefaultQuadStorage. If the value is an empty string, only quads associated with Linked Data Views are used. This is a good choice for low-level admin procedures, for two reasons: they will not interfere with any changes to virtrdf:DefaultQuadStorage, and they will continue to work even if all of the compiler's metadata is corrupted, including the description of virtrdf:DefaultQuadStorage. (define input:storage "" switches the SPARQL compiler to a small set of metadata that is built into the C code and is therefore very hard for end users to corrupt.)
input:ifp | "<keyword>" | Adds the IFP keyword to the OPTION (QUIETCAST, ...) clause of the generated SQL. The value of this define is not used yet; an empty string is safe for future extensions.
input:inference | "<IRI>" | Specifies the name of the inference rule set that provides context for the backward-chained reasoner.
input:param | "<variable-name>" | Declares a variable name as a protocol parameter. A SPARQL query can refer to protocol parameter X via a variable with the special syntax ?::X. If the query text is produced by a query builder that does not understand SPARQL-BI extensions, the query text may instead contain the variable ?X together with define input:param "X". This does not work for positional parameters: a reference to ?::3 cannot be replaced with ?3 and define input:param "3".
input:params | "<variable-name>" | Synonym of input:param.
input:default-graph-uri | "<IRI>" | Works like the FROM clause.
input:named-graph-uri | "<IRI>" | Works like the FROM NAMED clause.
input:default-graph-exclude | "<IRI>" | Works like the NOT FROM clause.
input:named-graph-exclude | "<IRI>" | Works like the NOT FROM NAMED clause.
input:freeze | | Blocks further changes to the list of source graphs. A web service endpoint (or a similar non-web application) can edit an incoming query by placing a list of pragmas ending with input:freeze in front of the query text. Even if an intruder tries to specify graph names, they get a compilation error, not access to the data. input:freeze also disables all input:grab-... pragmas.
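The equivalence described for input:param can be illustrated with a naive Python rewriter (the name to_plain_params is hypothetical) that turns ?::X references into plain ?X variables plus define input:param "X" lines, the form a query builder without SPARQL-BI support could emit. It is a sketch only: it does not skip string literals or IRIs that happen to contain ?::, and it refuses positional parameters such as ?::3, which cannot be rewritten this way.

```python
import re

# "?::" followed by a name (?::subject) or a positional number (?::3).
PROTOCOL_PARAM = re.compile(r"\?::([A-Za-z_][A-Za-z0-9_]*|[0-9]+)")


def to_plain_params(query: str) -> str:
    """Rewrite ?::X references as ?X and declare each X with define input:param."""
    names = []

    def replace(match):
        name = match.group(1)
        if name.isdigit():
            raise ValueError(f"positional parameter ?::{name} cannot be rewritten")
        if name not in names:
            names.append(name)  # keep first-seen order for the define lines
        return "?" + name

    body = PROTOCOL_PARAM.sub(replace, query)
    defines = [f'define input:param "{name}"' for name in names]
    return "\n".join(defines + [body])


print(to_plain_params("SELECT ?p WHERE { ?::subject ?p ?::object }"))
```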

GET Pragmas

GET pragmas let you control how data object content is actually retrieved for a SPARQL query. The net effect is fine-grained control over data access matters such as:

  1. Data object content format, via content negotiation;
  2. Cache invalidation; and
  3. Proxy handling.
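Content negotiation works by ranking media types by their q-values. The Python sketch below (parse_accept and negotiate are hypothetical names) shows a simplified version of the choice a server makes when it receives an Accept header; the ACCEPT string is a shortened excerpt of the get:accept default listed in the table below. Partial wildcards such as text/* and media-type parameters other than q are deliberately ignored.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media type, q) pairs, highest q first."""
    ranked = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranked.append((fields[0], q))
    # sort() is stable, so equal q-values keep their header order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def negotiate(header: str, available: Sequence[str]) -> Optional[str]:
    """Pick the first available representation the client accepts."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None


ACCEPT = "application/rdf+xml; q=1.0, text/turtle; q=1.0, application/xml; q=0.2, */*; q=0.1"
print(negotiate(ACCEPT, ["text/html", "text/turtle"]))  # text/turtle
print(negotiate(ACCEPT, ["text/html"]))  # text/html, via */*
```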

GET pragmas can also be supplied as a comma-separated list of options in a SPARQL FROM clause (SPARQL ... FROM <options>). Their methods and method-modifiers include:

Method | Modifier(s) | Description
get:proxy | "<host[:port]>" | Much like setting up a Web browser to work with an HTTP proxy server, this identifies the proxy host (the "host:port", or "authority", component of a URL) to target when direct retrieval from the URL in the FROM clause, or dereferencing of a data object's identifier, is not possible.
get:soft | "soft", "replace", or "add" | "soft" and "replace" are synonyms; both replace the triples stored in the named graph. "add", on the other hand, simply adds triples to the existing named graph. All modes are subject to the overarching cache invalidation scheme applied to a given DBMS instance.
get:accept | "<MIME type(s)>", e.g. "application/xml" | The most common use of get:accept is accessing a web service that returns HTML by default but can also return RDF if forced to do so. The default value is "application/rdf+xml; q=1.0, text/rdf+n3; q=0.9, application/rdf+turtle; q=0.5, application/x-turtle; q=0.6, application/turtle; q=0.5, text/turtle; q=1.0, application/xml; q=0.2, */*; q=0.1".
get:uri | "<IRI>" | Specifies the object identifier to use for content retrieval when the data source in question differs from the data object content URL used in the FROM clause of a SPARQL query.
get:refresh | "<seconds>" | Limits the lifetime of a locally cached copy of the source; the value is in seconds.
get:method | "GET" or "MGET" | "GET" loads the resource itself; "MGET" loads metadata about the resource.
get:cartridge | |
get:query | |
get:private | "" or <graph_group_IRI> | When used for sponging graph X, adjusts the graph-level security of X (and of graph_group_IRI, if specified) so that X becomes a privately accessible graph of the user who sponges it. If graph_group_IRI is specified, X also becomes accessible to users who can access graph_group_IRI, with permissions like the ones they have on graph_group_IRI.
The exact rules are as follows:
   * If the graph is virtrdf:, an error is signaled.
   * If the graph name is the IRI, or the "public IRI", of a handshaked web service endpoint, an error is signaled.
   * If access is public by default even for private graphs, an error is signaled and sponging is not attempted.
   * If the default is "no access" but someone other than the current user has specifically granted read access to the graph in question, AND the current user is not DBA, AND the current user has no bit 32 permission on this graph, an error is signaled.
   * If read access is public by default for the world and disabled for private graphs, the graph to be sponged is added to the group of private graphs.
   * If the current user is not DBA, the current user is granted read+write+sponge+admin access to the graph to be sponged. In addition, the current user gets the special permission bit 32, indicating that the graph was created by a private sponge of this specific user.
   * If the value of get:private is an IRI, then:
      * the IRI must be the IRI of a "plain" graph group; an error is signaled if the graph group does not exist, or if it is the group of private graphs or the group of graphs to be replicated.
      * the graph is added to that group.
      * each non-DBA user who can get the list of members of the group gets permissions on the loaded graph equal to the permissions they have on the graph group, minus the "list" permission.
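The interplay of get:soft and get:refresh can be sketched as a small cache model, following the table's description: "soft" and "replace" replace a graph's triples, "add" merges new triples into them, and get:refresh bounds the age of the cached copy. The function sponge_graph and its cache layout are hypothetical; the Sponger's actual cache logic, including exactly how these two pragmas interact, lives inside Virtuoso and may differ.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# graph IRI -> (triples, time the copy was loaded, in seconds)
Cache = Dict[str, Tuple[Set[Triple], float]]


def sponge_graph(
    cache: Cache,
    graph: str,
    fetch: Callable[[], Iterable[Triple]],
    soft: str = "soft",
    refresh: Optional[float] = None,
    now: float = 0.0,
) -> bool:
    """Toy model of get:soft and get:refresh; returns True if the source was fetched."""
    if soft not in ("soft", "replace", "add"):
        raise ValueError(f"unsupported get:soft value {soft!r}")
    entry = cache.get(graph)
    if entry is not None and refresh is not None and now - entry[1] < refresh:
        return False  # cached copy is younger than get:refresh seconds: reuse it
    fetched = set(fetch())
    if soft == "add" and entry is not None:
        fetched |= entry[0]  # "add": keep the existing triples as well
    cache[graph] = (fetched, now)  # for "soft"/"replace", the new triples replace the old
    return True


cache: Cache = {}
sponge_graph(cache, "http://example.com/doc", lambda: {("ex:s", "ex:p", "1")}, now=0)
sponge_graph(cache, "http://example.com/doc", lambda: {("ex:s", "ex:p", "2")}, soft="add", now=10)
print(sorted(o for _, _, o in cache["http://example.com/doc"][0]))  # ['1', '2']
```

Passing fetch as a callable keeps the model honest about cost: a fresh cached copy is reused without contacting the source at all.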

SQL Pragmas

Pragmas to control code generation:

Method | Modifier(s) | Description
sql:signal-void-variables | 1 | When set to 1, forces the SPARQL compiler to signal an error if it can prove that some variable can never be bound, e.g. because of a misspelled name or an attempt to join across disjoint domains. Such an error usually means a mistake in the query, like a typo in an IRI or a totally wrong triple pattern. These diagnostics are especially important when the query is long, and this is the most useful debugging setting when Linked Data Views are in use.
sql:big-data-const | |
sql:describe-mode | | Controls how DESCRIBE queries are processed; the supported modes are described in the Virtuoso documentation.
sql:log-enable | N | The value is passed to SPARUL procedures, which pass it on to the log_enable() BIF. Thus define sql:log-enable N results in log_enable(N, 1) at the beginning of the operation, and another log_enable() call restores the previous transaction-log mode when the procedure exits or signals an error. For example, set it to 2 to disable logging and avoid a huge transaction after-image when sponging is deep and wide.
sql:globals-mode | "XSLT" or "SQL" | Tells the compiler how to print names of global variables: "XSLT" prints a colon before the name of a global variable, and "SQL" prints the name as usual.
sql:table-option | | The value is added as an option to each triple in the query and later printed in the TABLE OPTION (...) clause of the source table clause. This works only for SQL code generated for plain triples from RDF_QUAD; fragments of queries related to RDF Views remain unchanged.
sql:select-option | | The value is added to the global OPTION () clause of the generated SQL SELECT. This clause is always printed; it is always at least OPTION (QUIETCAST, ...). The most popular use case is define sql:select-option "ORDER", which tells the SQL compiler to execute joins in the order of their use in the query. This can make query compilation much faster, but the result can be terrible unless you know precisely what you are doing and have inspected the execution plan of the generated SQL query.
sql:assert-user | | Defines the user who is supposed to be the single "proper" user of the query. If the compiler is launched by another user, an error is signaled. The typical use is define sql:assert-user "dba". This is too weak to be a security measure, but it may help in debugging security issues.
sql:gs-app-callback | | Application-specific callback that returns the permission bits of a given graph.
sql:gs-app-uid | | Application-specific user ID to use in the callback.
sql:param | "<variable-name>" | Synonym of input:param.
sql:params | "<variable-name>" | Synonym of input:param.
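The save-and-restore behavior described for sql:log-enable is the classic acquire/release pattern. The Python sketch below models it with a context manager; TransactionLog and log_mode are hypothetical stand-ins, not Virtuoso APIs, and the model's log_enable method simply returns the previous mode so that it can be restored on both normal exit and error.

```python
from contextlib import contextmanager
from typing import Iterator


class TransactionLog:
    """Hypothetical stand-in for the server's transaction-log mode."""

    def __init__(self, mode: int = 1) -> None:
        self.mode = mode

    def log_enable(self, mode: int) -> int:
        """Switch to `mode` and return the previous mode."""
        previous, self.mode = self.mode, mode
        return previous


@contextmanager
def log_mode(log: TransactionLog, mode: int) -> Iterator[TransactionLog]:
    """Apply `mode` for the duration of the block, then restore the old mode."""
    previous = log.log_enable(mode)  # like log_enable(N, 1) at the start
    try:
        yield log
    finally:
        log.log_enable(previous)  # runs on normal exit and on errors alike


log = TransactionLog(mode=1)
with log_mode(log, 2):
    mode_inside = log.mode  # 2 while "sponging"
try:
    with log_mode(log, 2):
        raise RuntimeError("simulated failure during sponging")
except RuntimeError:
    pass
mode_after_error = log.mode  # back to 1
```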

OUTPUT Pragmas

Pragmas to control the type of the result.

Method | Modifier(s) | Description
output:valmode | "SQLVAL", "LONG", or "AUTO" | Tells the compiler which SQL datatypes should be used for output values. ODBC clients and the like know nothing about RDF and expect plain SQL values, so the appropriate value for them is "SQLVAL", which is the default. When a Virtuoso/PL procedure is RDF-aware and keeps results for further passing to other SPARQL queries or to low-level RDF routines, the value "LONG" tells the compiler to preserve RDF boxes as is and to return IRI IDs instead of IRI string values. The third possible value, "AUTO", is for hackers who want no conversion of any sort at the output: they read the SQL output of the SPARQL front end, find the format of each column, and add the needed conversions later.
output:format | | Tells the compiler that the query should produce a string containing a serialization of the result, not a result set. There are three such pragmas (output:format, output:scalar-format, and output:dict-format) because the caller, such as a SPARQL web service endpoint, may not know the type of the query to be executed. The value of output:format is used for SELECT and data manipulation queries, if specified; it is also used for CONSTRUCT, DESCRIBE, or ASK queries if it is specified but the related output:dict-format or output:scalar-format is not.
output:scalar-format | | Like output:format, tells the compiler to produce a serialized string instead of a result set. Its value is used for ASK queries only, if specified.
output:dict-format | | Like output:format, tells the compiler to produce a serialized string instead of a result set. Its value is used for CONSTRUCT and DESCRIBE queries only, if specified.
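The precedence rules among the three format pragmas can be summarized as a small decision function. This Python sketch (effective_output_format is a hypothetical name) encodes only what the table states; the format strings in the usage line are placeholders, not real Virtuoso format names, and None stands for "return an ordinary result set".

```python
from typing import Optional


def effective_output_format(
    query_type: str,
    output_format: Optional[str] = None,
    scalar_format: Optional[str] = None,
    dict_format: Optional[str] = None,
) -> Optional[str]:
    """Return the serialization a query would use, or None for a plain result set."""
    kind = query_type.upper()
    if kind == "ASK" and scalar_format is not None:
        return scalar_format  # output:scalar-format wins for ASK
    if kind in ("CONSTRUCT", "DESCRIBE") and dict_format is not None:
        return dict_format  # output:dict-format wins for CONSTRUCT/DESCRIBE
    # SELECT and data manipulation use output:format; so do ASK, CONSTRUCT,
    # and DESCRIBE when their specific pragma is absent.
    return output_format


# An endpoint that does not know the query type can set all three up front.
print(effective_output_format("describe", "FMT_ROWS", "FMT_BOOL", "FMT_GRAPH"))  # FMT_GRAPH
```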

Sponger Usage Examples