SPARQL Support in Virtuoso SQL

Data Types

SPARQL needs to differentiate between names of nodes and any other data. To natively facilitate this, Virtuoso introduces a distinct SQL IRI data type. This is the unique id of a resource IRI, not to be confused with numbers, strings or other data. By making the IRI a first class citizen of SQL, Virtuoso avoids many of the problems traditionally found when mapping RDF to SQL, where special additional type flags are often used. Since Virtuoso SQL has always been typed at run time and has always featured an ANY data type, mapping operations on heterogeneous data from SPARQL to SQL is made easier and the inefficiency of extra type checking in SQL is avoided.

Furthermore, Virtuoso has a SPARQL mode where automatic type casting is modified so as not to generate errors when regular SQL semantics require a cast failure to be signaled but SPARQL expects a silent failure.

SQL Compiler

RDF querying is specially join intensive. Virtuoso recognizes joins between S and S and S and O where a merging AND of indices can be used instead of a loop join. This frequently converts random access to sequential access, for example when there is an AND of patterns where the S is a shared unknown and the P and O of the triples are given.

Similar join order and loop invariant considerations apply to SPARQL as to SQL, thus the SPARQL implementation directly benefits from all the work done on SQL. There are additional plans to adjust the SQL cost model to specially sample the RDF data at compile time to better sense the cardinality of triple patterns, as standard SQL compiler statistics do not provide the requisite level of detail.

Additionally, many specialized SQL constructs will be introduced for speeding up bulk load and such operations.

CategoryWebSite CategoryVirtuoso CategoryOpenSource