&timeout={time in milliseconds}
) as part of Virtuoso's SPARQL Protocol implementation.The following scenario demonstrates how to manage anytime SPARQL query execution using the timeout
parameter at a Virtuoso SPARQL Endpoint.
Suppose a simple query:
SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 100
Suppose the query execution fails against Virtuoso endpoint due to running out of execution time as set in virtuoso.ini
configuration file.
This is basically a table scan, since it has to go through each quad and create a HASH of unique P it sees.
After that is done, it will return the first 100 values of P from this hash.
Next we will use SQL translation and explain()
of the query:
SQL> SET sparql_translate ON; SQL> SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 100; SPARQL_TO_SQL_TEXT VARCHAR _______________________________________________________________________________ SELECT __ro2sq ("s_2_0_rbc"."p") AS "p" FROM (SELECT DISTINCT TOP 100 __id2i ( "s_2_1-t0"."P" ) AS "p" FROM DB.DBA.RDF_QUAD AS "s_2_1-t0" OPTION (QUIETCAST)) AS "s_2_0_rbc" 1 Rows. -- 1 msec. SQL> set sparql_translate off; SQL> set explain on; SQL> SELECT __ro2sq ("s_2_0_rbc"."p") AS "p" FROM (SELECT DISTINCT TOP 100 __id2i ( "s_2_1-t0"."P" ) AS "p" FROM DB.DBA.RDF_QUAD AS "s_2_1-t0" OPTION (QUIETCAST)) AS "s_2_0_rbc" SQL> SELECT __ro2sq ("s_2_0_rbc"."p") AS "p" FROM (SELECT DISTINCT TOP 100 __id2i ( "s_2_1-t0"."P" ) AS "p" FROM DB.DBA.RDF_QUAD AS "s_2_1-t0" OPTION (QUIETCAST)) AS "s_2_0_rbc" ; REPORT VARCHAR _______________________________________________________________________________ { local save: ($22 "set_no", $23 "set_no_save") Subquery 20 { from DB.DBA.RDF_QUAD by RDF_QUAD_POGS 8.9e+07 rows Key RDF_QUAD_POGS ASC ($25 "s_2_1-t0.P") After code: 0: $28 "__id2i" := Call __id2i ($25 "s_2_1-t0.P") 5: BReturn 0 Distinct (HASH) ($25 "s_2_1-t0.P") After code: 0: $21 "p" := := artm $28 "__id2i" 4: BReturn 0 Subquery Select(TOP 100 ) ($21 "p", <$27 "<DB.DBA.RDF_QUAD s_2_1-t0>" spec 5>) } After code: 0: $44 "p" := Call __ro2sq ($21 "p") 5: BReturn 0 Select ($44 "p") } 26 Rows. -- 1 msec. SQL> set explain off;
As we can see, it is going to use the RDF_QUAD_POGS
index which offers the best query plan construction basis.
It would be marginally faster if there was an index that started with GP, but the current 2+3 index scheme does not have such.
So it is almost a full table scan, which might not complete on most of our systems due to limits set in the INI file for Web accessible instances
Set the Execution timeout
on the /sparql
form to enable the Anytime paradigm:
SELECT distinct ?p WHERE { ?s ?p ?o } LIMIT 100
60000
.
timeout=60000
), that presents the found results:
http://dbpedia.org/sparql/?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fp%0D%0AWHERE%0D%0A{%0D%0A+%3Fs+%3Fp+%3Fo%0D%0A}%0D%0ALIMIT+100%0D%0A&format=text%2Fhtml&timeout=60000&debug=on
10000
(10 sec), you get only one unique ?p
value back.
Using a timeout value of 60000
(60 sec) would still only show a few different values for ?p
, which is logical considering that the index is sorted by Predicate (the P slot in the RDF triple pattern).
This means that once it gets the first unique ?p
, it has to skip all the triples that have the same value of ?p
, to get to the next one.
$ curl -F "query=SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 100" http://dbpedia.org/sparql Virtuoso S1T00 Error SR171: Transaction timed out SPARQL query: define sql:big-data-const 0 SELECT DISTINCT ?p WHERE { ?s ?p ?o} LIMIT 100
$ curl "http://dbpedia.org/sparql/?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+DISTINCT+%3Fp%0D%0AWHERE%0D%0A%7B%0D%0A+%3Fs+%3Fp+%3Fo%0D%0A%7D%0D%0ALIMIT+100%0D%0A&format=text%2Fhtml&timeout=60000&debug=on" <table class="sparql" border="1"> <tr> <th>p</th> </tr> <tr> <td>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</td> </tr> <tr> <td>http://www.w3.org/2002/07/owl#equivalentClass</td> </tr> <tr> <td>http://www.w3.org/2002/07/owl#sameAs</td> </tr> <tr> <td>http://www.w3.org/2002/07/owl#equivalentProperty</td> </tr> <tr> <td>http://www.w3.org/2000/01/rdf-schema#subClassOf</td> </tr> <tr> <td>http://www.w3.org/2004/02/skos/core#broader</td> </tr> <tr> <td>http://www.w3.org/2000/01/rdf-schema#comment</td> </tr> <tr> <td>http://www.w3.org/2000/01/rdf-schema#label</td> </tr> <tr> <td>http://xmlns.com/foaf/0.1/name</td> </tr> <tr> <td>http://xmlns.com/foaf/0.1/nick</td> </tr> <tr> <td>http://www.w3.org/2004/02/skos/core#prefLabel</td> </tr> <tr> <td>http://www.w3.org/2003/01/geo/wgs84_pos#lat</td> </tr> <tr> <td>http://www.w3.org/2003/01/geo/wgs84_pos#long</td> </tr> <tr> <td>http://www.w3.org/2000/01/rdf-schema#domain</td> </tr> <tr> <td>http://www.w3.org/2000/01/rdf-schema#range</td> </tr> <tr> <td>http://www.w3.org/2002/07/owl#versionInfo</td> </tr> <tr> <td>http://dbpedia.org/ontology/purpose</td> </tr> <tr> <td>http://dbpedia.org/ontology/supplementalDraftRound</td> </tr> <tr> <td>http://dbpedia.org/ontology/podiums</td> </tr> <tr> <td>http://dbpedia.org/ontology/buildingStartDate</td> </tr> </table>
In this example, the Response
header contains X-SPARQL-MaxRows?: 10000
, which is used to indicate the maximum result set size of the instance, configured via [SPARQL]
INI section entry.
For Anytime Query
, it means that determining timeout that triggers response with additional headers (due to partial results) is based on the combination of MaxRows
and QueryTimeout
; i.e., only when Virtuoso cannot produce the ResultSet
that matches MaxRows
within QueryTimeout
setting, will there be a timeout condition accompanied by addition response headers.
curl -I "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+*+WHERE+%7B+%3Fs+%3Fp+%3Fo+.+%7D&format=text%2Fhtml&timeout=2000&debug=on" HTTP/1.1 200 OK Date: Fri, 12 Dec 2014 20:51:29 GMT Content-Type: text/html; charset=UTF-8 Content-Length: 3804866 Connection: keep-alive Vary: Accept-Encoding Server: Virtuoso/07.10.3211 (Linux) x86_64-redhat-linux-gnu VDB X-SPARQL-default-graph: http://dbpedia.org *X-SPARQL-MaxRows: 10000* Expires: Fri, 19 Dec 2014 20:51:29 GMT Cache-Control: max-age=604800 Accept-Ranges: bytes