The Virtuoso Science Library
This is a compilation of the best scientific material on Virtuoso with a paragraph of introduction on each. Some of these are project deliverables from projects under the EU FP7 programme; some are peer-reviewed publications.
European Project Deliverables
-
GeoKnow D2.6.1: Graph Analytics in the DBMS (2015-01-05)
This introduces the idea of unbundling basic cluster DBMS functionality, such as cross-partition joins and partitioned GROUP BY, to form a graph processing framework collocated with the data.
-
GeoKnow D2.4.1: Geospatial Clustering and Characteristic Sets (2015-01-06)
This presents experimental results of structure-aware RDF applied to geospatial data. The regularly structured part of the data goes in tables; the rest is stored as triples/quads. Furthermore, for the first time in the RDF space, physical storage location is correlated with properties of entities, in this case geolocation, so that geospatially adjacent items are also likely to be adjacent in the physical data representation.
-
LOD2 D2.1.5: 500 billion triple BSBM (2014-08-18)
This presents experimental results on lookup and BI workloads on a Virtuoso cluster with 12 nodes, for a total of 3 TB of RAM and 192 cores. It also discusses bulk load, at up to 6M triples/s, and specifics of query optimization in scale-out settings.
-
LOD2 D2.6: Parallel Programming in SQL (2012-08-12)
This discusses ways of making SQL procedures partitioning-aware, so that one can, map-reduce style, send parallel chunks of computation to each partition of the data.
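As a rough illustration of the map-reduce style described in the two deliverables on partitioned computation above, the sketch below runs a grouped count in Python rather than in Virtuoso SQL. It is not taken from the deliverables: the function names, the modulo-based partitioning, and the use of threads to stand in for cluster partitions are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_count(rows):
    """Map step: compute a partial aggregate over one partition's rows."""
    return Counter(group for _key, group in rows)


def partitioned_group_count(rows, n_partitions=4):
    """Count rows per group by aggregating each partition separately.

    `rows` is an iterable of (integer_key, group) pairs. Partitioning by
    `key % n_partitions` is a hypothetical stand-in for a real cluster's
    partitioning function.
    """
    # Distribute rows to partitions by key.
    partitions = [[] for _ in range(n_partitions)]
    for key, group in rows:
        partitions[key % n_partitions].append((key, group))

    # Send one chunk of computation to each partition in parallel.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(local_count, partitions))

    # Reduce step: merge the partial aggregates into the final result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)
```

Calling `partitioned_group_count` on ten rows keyed 0 to 9 and grouped by parity yields a count of five for each group, whatever the number of partitions. Only the small partial aggregates travel to the coordinator, not the rows themselves.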
Publications
2015
-
Orri Erling (OpenLink Software); Alex Averbuch (Neo Technology); Josep Larriba-Pey (Sparsity Technologies); Hassan Chafi (Oracle Labs); Andrey Gubichev (TU Munich); Arnau Prat-Pérez (Universitat Politècnica de Catalunya); Minh-Duc Pham (VU University Amsterdam); Peter Boncz (CWI): The LDBC Social Network Benchmark: Interactive Workload. Proceedings of SIGMOD 2015, Melbourne.
This paper is an overview of the challenges posed in the LDBC social network benchmark, from data generation to the interactive workload.
-
Mihai Capotă (Delft University of Technology), Tim Hegeman (Delft University of Technology), Alexandru Iosup (Delft University of Technology), Arnau Prat-Pérez (Universitat Politècnica de Catalunya), Orri Erling (OpenLink Software), Peter Boncz (CWI): Graphalytics: A Big Data Benchmark for Graph-Processing Platforms. SIGMOD GRADES 2015.
This paper discusses the future evolution of the LDBC Social Network Benchmark and gives a preview of Virtuoso graph traversal performance.
-
Minh-Duc Pham, Linnea Passing, Orri Erling, Peter A. Boncz: Deriving an Emergent Relational Schema from RDF Data.
WWW 2015
This paper shows how RDF is in fact structured and how this structure can be reconstructed. This reconstruction then serves to create a physical schema, reintroducing all the benefits of physical design to the schema-last world. Experiments with Virtuoso show marked gains in query speed and data compactness.
2014
-
Peter A. Boncz, Orri Erling, Minh-Duc Pham: Experiences with Virtuoso Cluster RDF Column Store.
Linked Data Management 2014: 239-259.
This book chapter gives an in-depth look at the performance dynamics of Virtuoso scale out.
2013
-
P. A. Boncz, T. Neumann, and O. Erling. TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark.
Proceedings of the TPC Technology Conference on Performance Evaluation & Benchmarking TPCTC, 2013.
This paper summarizes the factors that determine analytics performance, written by people with first-hand experience of them. The Virtuoso TPC-H blog series develops and comments further on the same lessons.
2012
-
Orri Erling: Virtuoso, a Hybrid RDBMS/Graph Column Store.
IEEE Data Eng. Bull. (DEBU) 35(1):3-8 (2012)
This paper introduces the Virtuoso column store architecture and design choices. One design is made to serve both random updates and lookups as well as the big scans where column stores traditionally excel. Examples are given from both TPC-H and the schema-less RDF world.
-
Minh-Duc Pham, Peter A. Boncz, Orri Erling: S3G2: A Scalable Structure-Correlated Social Graph Generator.
TPCTC 2012:156-172
This paper presents the basis of the social network benchmarking technology later used in the LDBC benchmarks.
2011
-
Christian Bizer, Peter A. Boncz, Michael L. Brodie, Orri Erling: "The Meaningful Use of Big Data: Four Perspectives - Four Challenges".
SIGMOD Record (SIGMOD) 40(4):56-60 (2011)
This is an anthology of views by industry thought leaders on what semantics could or ought to contribute to the practice of data management.
2009
-
Orri Erling, Ivan Mikhailov: Faceted Views over Large-Scale Linked Data.
LDOW 2009
This paper introduces anytime query answering as an enabling technology for open-ended querying of large data sets on public service endpoints. While not every query can be run to completion, partial results can most often be returned within a constrained time window.
-
Orri Erling, Ivan Mikhailov: Virtuoso: RDF Support in a Native RDBMS.
Semantic Web Information Management 2009:501-519
This is a general presentation of how a SQL engine needs to be adapted to serve a run-time typed and schema-less workload.
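The anytime query answering idea from the Faceted Views entry above can be sketched in a few lines of Python. This is an illustrative toy, not Virtuoso's implementation: `anytime_scan`, its parameters, and the injectable `clock` are hypothetical names chosen for the example. The scan returns whatever matches it has found when the time budget runs out, together with a flag saying whether the result is complete.

```python
import time


def anytime_scan(rows, predicate, budget_seconds, clock=time.monotonic):
    """Scan `rows` under a time budget, returning partial results on timeout.

    Returns (matches, complete). `complete` is False when the deadline was
    reached before every row had been examined.
    """
    deadline = clock() + budget_seconds
    matches = []
    for row in rows:
        # Check the budget before doing more work on the next row.
        if clock() >= deadline:
            return matches, False
        if predicate(row):
            matches.append(row)
    return matches, True
```

A caller on a public endpoint could show the partial matches right away and mark them as incomplete when the flag is False. Passing a fake `clock` makes the timeout behaviour deterministic in tests.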
2008
-
Orri Erling, Ivan Mikhailov: Integrating Open Sources and Relational Data with SPARQL.
ESWC 2008:838-842
This paper introduces the still-challenging RDF-H benchmark, an RDF translation of the classic TPC-H. It also considers running the benchmark over a SPARQL-to-SQL mapping.
2007
-
Orri Erling, Ivan Mikhailov: RDF Support in the Virtuoso DBMS.
CSSW 2007:59-68
This is an initial discussion of RDF support in Virtuoso. Most specifics have since changed, but the paper gives a historical perspective.