%VOSWARNING%
---+ LUBM and Virtuoso
---++ Introduction
This article discusses Virtuoso's performance on the Lehigh University
Benchmark (LUBM) at different scales and in different configurations.
We analyze the performance of Virtuoso in both single-server and clustered
configurations for loading and querying a derivative of the LUBM data set.
The goal of this article is to give a general understanding of Virtuoso's
triple-store performance and governing factors.
---++ The Data Set
We use the unmodified LUBM synthetic data set at different scales. The
basic query timing is done with the qualification data set of about 100K
triples. Tests with concurrent queries are performed at the scales of 800,
8000 and 160,000 universities, corresponding roughly to 100M, 1G and 20G
triples.
We have adapted the original LUBM benchmark queries to Virtuoso and
verified that when applied to the qualification data set the correct
answers are produced.
Besides storage and querying, the LUBM benchmark involves simple inference
on RDF data. Some of this inference can be done either at load time or at
query time. The rest must be done after loading, because Virtuoso does not
support it at query time.
* We always materialize the transitive suborganization relation, so that
a suborganization triple is always present between all direct and indirect
super/suborganization pairs.
* Inverse relation inference is not tested. The queries are rewritten to
avoid reliance on inverse relation inference.
* Subclass and subproperty inference may be done either at query or load
time. We experiment with both.
* The benchmark does not involve owl:sameAs, so Virtuoso's query-time
support for owl:sameAs is not used.
---++ The Queries and Metric
The queries exist in three variants:
1 Open Coded Inference. All combinations of subclass and subproperty
relations are expressed as unions.
1 Inference using Virtuoso's query-time support of subclass and
subproperty.
1 Materialized data where all triples implied by subclass and
subproperty relations are physically present.
The original LUBM involves a composite metric of speed and completeness of
inference. We produce complete results in all cases but vary the time and
mode of inference.
We measure load rates as Kt/s, with 1Kt/s being 1000 triples per second of
real time.
* We give times for single user query execution against the one
university qualification database. These times are in milliseconds.
* For concurrent query load, we have defined a query mix consisting of
the 14 LUBM queries, modified when necessary so as not to return excessive
volumes of data. The metric is queries per second at scale, where scale is
the number of universities. Each query belonging to a completed query mix
is counted as one query. Only queries from query mixes completed during the
measurement interval are counted.
All queries are modified so that each one is scoped to a single university.
Queries that read through the whole database would run far longer than any
of the others, so if they were included in the mix the benchmark would
measure little else. When the longest and shortest queries in a mix differ
by orders of magnitude, the shorter queries might as well be left out.
---++ Query Mix
The adapted query mix is shown below. We only show the version that runs
against the materialized data. The other mixes are similarly modified. The
mixes run against the qualification database are listed in all three
variants in the appendix.
/* Q1 */
select * from };
/* Q2 */
select * from . ?x ub:undergraduateDegreeFrom <%s> }
/* Q3 */
select * from }
/* Q4 */
select * from . ?x ub:name ?y1 . ?x ub:emailAddress ?y2 . ?x ub:telephone ?y3 . };
/* Q5 */
select * from };
/* Q6 */
select * from };
/* Q7 */
select * from ub:teacherOf ?y . ?x ub:takesCourse ?y . };
/* Q8 */
select * from . ?x ub:emailAddress ?z };
/* Q9 */
select * from . }
/* Q10 */
select * from . };
/* Q11 */
select * from . };
/* Q12 */
select * from . };
/* Q13 */
select count (*) from . }
/* Q14 */
select * from . };
Each %s is replaced with a randomly selected IRI of the appropriate type.
Q13 is modified to return a count because it returns up to 48K rows, and
would otherwise take up nearly all of the benchmark's run time.
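The next sketch shows how a university-valued placeholder, such as the one in Q2 (`ub:undergraduateDegreeFrom <%s>`), could be filled in. The university is drawn from the first /n/ universities, which is how the concurrent runs described later scope the working set. The IRI pattern `http://www.University<i>.edu` is an assumption about the generated LUBM data, and `instantiate` is a hypothetical helper. Placeholders of other types, such as courses or departments, would need their own IRI generators.

```python
import random

# Assumed IRI pattern for LUBM universities; adjust to match the loaded data.
UNIVERSITY_IRI = "http://www.University%d.edu"

def instantiate(template: str, n_universities: int, rng: random.Random) -> str:
    """Fill the %s placeholder with a university drawn from the first n.

    str.replace is used instead of the % operator so that any other
    percent signs in the query text are left alone.
    """
    iri = UNIVERSITY_IRI % rng.randrange(n_universities)
    return template.replace("%s", iri)

# Q2-style fragment: the placeholder sits inside the IRI brackets.
q2 = "select * where { ?x ub:undergraduateDegreeFrom <%s> }"
print(instantiate(q2, 100, random.Random(42)))
```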
---++ Database Layout
The tests are run against Virtuoso with default triple storage layout. All
the data except for schema data is loaded in a single graph. The quads are
indexed as GSPO and OGPS, where the latter is a bitmap index with all
values of S for a given OGP combination represented as a bitmap. All URI
and object ids are 32-bit. An O that is an IRI or short scalar is stored
inline in the O column of the quad table. Long string-valued Os are
assigned an id and referenced using this id from the O of the quad table.
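This sketch models the two index orders on small integer ids. GSPO is a sorted list of quads. OGPS maps each (O, G, P) key to a bitmap of subject ids. A Python int stands in for the bitmap, purely for illustration. Virtuoso's actual on-disk bitmap format, which is compressed, is not shown. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Quad = Tuple[int, int, int, int]  # (G, S, P, O), each an integer id

def build_indexes(quads: List[Quad]):
    # GSPO: one entry per quad, kept in key order.
    gspo = sorted(quads)
    # OGPS: for each (O, G, P) key, every matching S is one set bit.
    ogps: Dict[Tuple[int, int, int], int] = defaultdict(int)
    for g, s, p, o in quads:
        ogps[(o, g, p)] |= 1 << s
    return gspo, dict(ogps)

def subjects(ogps: Dict[Tuple[int, int, int], int], o: int, g: int, p: int) -> List[int]:
    """Decode the bitmap for one (O, G, P) key into a sorted list of S ids."""
    bits = ogps.get((o, g, p), 0)
    return [i for i in range(bits.bit_length()) if bits >> i & 1]

# With P = rdf:type and O = a class id, this yields "all subjects of a type".
gspo, ogps = build_indexes([(1, 5, 10, 20), (1, 7, 10, 20), (1, 5, 11, 21)])
print(subjects(ogps, 20, 1, 10))
```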
---++ Loading
We have experimented with different ways of loading RDF data using
different multithreading schemes.
1 Loading on a single thread, with the same thread parsing the input,
translating URIs to URI IDs and inserting the resulting quads into the quad table
1 One thread parsing a file and feeding a queue from which worker
threads pick triples. The worker threads then translate the URIs to IDs and
insert the triples.
1 Cluster loading with optimization on message passing.
All loading is done without locking, without the possibility of
transaction rollback, and without roll-forward logging. This is reasonable
for a bulk load. The loaded data is made durable by a database checkpoint.
We have found that with a single server process, the best performance is
obtained by running one single-threaded load function per core, i.e. 4
concurrent load streams at all times on a 4-core machine. With 8 cores, the
optimum is around 6 streams.
If only a single load stream is available, some performance gain is
obtained by having up to 3 worker threads for processing the output of a
single parser thread.
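The one-parser, several-workers arrangement can be sketched with a bounded queue. Here the "parser" is simply a loop over already-parsed triples. A dictionary guarded by a lock stands in for the IRI-to-id table, and a list stands in for the quad table. The function `load` and all other names are hypothetical. Because of Python's global interpreter lock, the sketch shows the structure of the pipeline rather than its speedup.

```python
import queue
import threading
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
_STOP = None  # sentinel that tells a worker to exit

def load(triples: Iterable[Triple], n_workers: int = 3) -> List[Tuple[int, ...]]:
    q: "queue.Queue" = queue.Queue(maxsize=10000)  # parser -> workers
    ids: Dict[str, int] = {}
    ids_lock = threading.Lock()
    out: List[Tuple[int, ...]] = []
    out_lock = threading.Lock()

    def to_id(term: str) -> int:
        with ids_lock:  # shared table: translate a term to its integer id
            return ids.setdefault(term, len(ids))

    def worker() -> None:
        while True:
            t = q.get()
            if t is _STOP:
                break
            row = tuple(to_id(x) for x in t)
            with out_lock:  # stand-in for the quad-table insert
                out.append(row)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for t in triples:  # the single "parser" thread feeds the queue
        q.put(t)
    for _ in threads:
        q.put(_STOP)
    for th in threads:
        th.join()
    return out

print(len(load([("a", "p", "b"), ("b", "p", "c"), ("a", "q", "c")])))
```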
In all cases, we avoid having threads repeatedly hit the same last page of
data by giving each thread a small pool of URI ids to allocate. Thus two
threads do not generally try to write the same page at the same time.
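The per-thread id pools could look like the sketch below. Each thread claims a block of consecutive ids under a short lock and then hands out ids from that block without further coordination. Ids allocated by different threads therefore fall in different ranges, and so on different pages. The class `IdAllocator` and its block size are hypothetical, not Virtuoso internals.

```python
import itertools
import threading

class IdAllocator:
    """Hand out ids in per-thread blocks so threads rarely share a last page."""

    def __init__(self, block_size: int = 100):
        self._next_block = itertools.count()
        self._lock = threading.Lock()
        self._block_size = block_size
        self._local = threading.local()

    def new_id(self) -> int:
        pool = getattr(self._local, "pool", None)
        if not pool:  # first call in this thread, or block used up
            with self._lock:  # the only shared step: claim a fresh block
                start = next(self._next_block) * self._block_size
            # Stored in descending order so pop() returns ascending ids.
            pool = list(range(start + self._block_size - 1, start - 1, -1))
            self._local.pool = pool
        return pool.pop()
```

A usage example: with `block_size=4`, the first thread receives ids 0, 1, 2, 3, and a second thread starts at 4.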
---++ Tuning
On all systems, the number of database cache buffers was chosen so that the
Virtuoso process, once its memory consumption had stabilized, took about
three-quarters of the available physical memory. Measured this way, each
database buffer (an 8 KB page plus bookkeeping) accounts for about 9.5 KB
of process memory.
When separate disks were available, a single server process had its
database striped across all of them. When multiple server processes ran in
cluster mode, each had its own disk where possible.
Beyond this, no special configuration measures were taken.
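The sizing rule above reduces to simple arithmetic, sketched below using the 9.5 KB-per-buffer figure. `buffers_for` gives an upper bound, because it assumes the buffer pool alone fills the three-quarters budget. The 550,000 buffers configured for the 8G machine in Appendix B come to about 5.35 GB, which leaves room for the rest of the process. Both function names are hypothetical.

```python
BYTES_PER_BUFFER = int(9.5 * 1024)  # ~9.5 KB of process memory per buffer

def buffers_for(ram_bytes: int, num: int = 3, den: int = 4) -> int:
    """Upper bound on the buffer count if buffers used num/den of RAM."""
    return ram_bytes * num // den // BYTES_PER_BUFFER

def pool_bytes(n_buffers: int) -> int:
    """Approximate process memory taken by n_buffers database buffers."""
    return n_buffers * BYTES_PER_BUFFER

print(buffers_for(8 * 2**30))   # 8 GB machine
print(pool_bytes(550_000))      # the setting used in Appendix B
```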
---++ Systems Tested
The systems tested were:
* System A: 2 x Xeon 5130 2GHz, 8G RAM, 6 x 160G SATA2 disks
* System B: 2 x Xeon 5330 2GHz, 6 x 250G SATA2 disks
---+++ Large Load Rate
8000 universities:
* System A: 29.7 Kt/s, single server, 4 streams
* System B: 36.9 Kt/s
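To put these rates in perspective, the sketch below converts Kt/s into wall-clock load time. It uses the rough figure of 1G triples for 8000 universities given in the data set description. The results are estimates derived from that approximation, not measured load times.

```python
def load_hours(triples: float, rate_kts: float) -> float:
    """Wall-clock hours to load `triples` at `rate_kts` thousand triples/s."""
    return triples / (rate_kts * 1000) / 3600

# ~1G triples for 8000 universities (approximate)
print(round(load_hours(1e9, 29.7), 1))  # System A, about 9.4 hours
print(round(load_hours(1e9, 36.9), 1))  # System B, about 7.5 hours
```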
---+++ Qualification Database
We ran all versions of the queries against the qualification database of
1 university (about 100K triples) to verify that all query versions produce
the same results. The total run times of the queries on system A, one
stream at a time with a warm cache, are:
1 Unions: 1917 ms
1 Inference: 1029 ms
1 Materialized: 724 ms
In the materialized run, the set of queries performs 77,600 single-row
retrievals, a rate of 98,000 rows per second. This count excludes lookups
that find no rows. When no other query logic is involved, for example when
joining one index of the quad table to another index on full equality
(i.e. checking that the two indices have the same content), the single-row
lookup rate is about 300,000 per second.
All three modes access the same amount of data on a warm cache. The
differences are due only to the different lengths of the execution paths in
the SQL run time.
---+++ Concurrent Query Rate
We filled a database with 8000 universities and ran different numbers of
clients against different fractions of it. Each query is scoped to a single
university picked at random from the first /n/ universities of the
8000-university database. This keeps the total data volume the same while
varying the working set.
The queries used here rely on the built-in subclass and subproperty
inference. The only materialized inference is the suborganization
property.
The timing results were taken once the server had reached a steady state
for the selected number of universities. Steady state is defined here as
either (1) less than 1% of real time spent in disk I/O, or (2) all database
cache buffers filled after starting with an empty cache.
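The results are reported per query, using a sample of the test driver's
output. Each line has the form of this example:
-- Q1 2 / 40 / 299 3451 0% 85 times
| *Field* | *Example* | *Meaning* |
| Query | Q1 | query identifier |
| shortest / average / longest | 2 / 40 / 299 | execution time, msec |
| total | 3451 | total msec |
| percentage | 0% | percentage of total run time spent in this query |
| count | 85 times | number of times the query was run in the reported interval |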
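To post-process these samples, a line can be split into the fields listed above. The regular expression below is inferred from the example lines, and `parse` and `QueryStat` are hypothetical helpers that are not part of the test driver. One useful sanity check is that total divided by count is close to the average: 3451 / 85 gives about 40 msec.

```python
import re
from typing import NamedTuple

class QueryStat(NamedTuple):
    query: str
    min_ms: int
    avg_ms: int
    max_ms: int
    total_ms: int
    pct: int
    count: int

# Layout assumed from the sample lines: Qn min / avg / max total pct% count times
_LINE = re.compile(
    r"--\s*(Q\d+)\s+(\d+)\s*/\s*(\d+)\s*/\s*(\d+)\s+(\d+)\s+(\d+)%\s+(\d+)\s+times")

def parse(line: str) -> QueryStat:
    m = _LINE.search(line)
    if m is None:
        raise ValueError("not a driver summary line: %r" % line)
    q, *nums = m.groups()
    return QueryStat(q, *map(int, nums))

print(parse("-- Q1 2 / 40 / 299 3451 0% 85 times"))
```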
---+++ 100 Universities
The queries were applied against the 100 first universities of a database
of 8000. This measures memory-based performance.
* 1 client: 11 qps
* 4 clients: 31 qps
* 8 clients: 33.1 qps
CPU utilization was about 360% of 400%, with fewer than 0.04 threads waiting for disk on average.
Sample output from run with 4 clients:
-- Q1 1 / 2 / 12 29 0% 10 times
-- Q2 7 / 9 / 11 92 0% 10 times
-- Q3 2 / 10 / 24 104 0% 10 times
-- Q4 8 / 47 / 153 476 2% 10 times
-- Q5 0 / 6 / 16 62 0% 10 times
-- Q6 7 / 18 / 28 185 1% 10 times
-- Q7 6 / 12 / 31 120 0% 10 times
-- Q8 299 / 431 / 546 4311 23% 10 times
-- Q9 70 / 88 / 123 888 4% 10 times
-- Q10 2 / 4 / 11 44 0% 10 times
-- Q11 5 / 6 / 8 67 0% 10 times
-- Q12 3 / 8 / 13 82 0% 10 times
-- Q13 822 / 911 / 1023 9110 50% 10 times
-- Q14 83 / 170 / 275 1700 9% 10 times
---+++ 1000 Universities
The measurement was done with 8 concurrent clients running the query mix
against the first 1000 universities of the 8000-university set.
Result: 6.7 qps. CPU utilization was 89% of 400%; on average 6.9 of the 8 threads were waiting for disk.
Sample:
-- Q1 1 / 18 / 33 186 0% 10 times
-- Q2 8 / 115 / 640 1150 0% 10 times
-- Q3 7 / 75 / 278 756 0% 10 times
-- Q4 39 / 117 / 182 1175 0% 10 times
-- Q5 4 / 7 / 25 78 0% 10 times
-- Q6 9 / 28 / 98 280 0% 10 times
-- Q7 5 / 72 / 326 724 0% 10 times
-- Q8 347 / 13939 / 30077 139397 83% 10 times
-- Q9 170 / 364 / 763 3648 2% 10 times
-- Q10 3 / 18 / 45 186 0% 10 times
-- Q11 6 / 105 / 406 1058 0% 10 times
-- Q12 6 / 102 / 721 1025 0% 10 times
-- Q13 691 / 930 / 1260 9302 5% 10 times
-- Q14 101 / 589 / 1690 5898 3% 10 times
---+++ 8000 Universities
Result: 4.8 qps. CPU utilization was 20% of 400%; on average 7.7 threads were waiting for disk.
Sample:
-- Q1 20 / 71 / 219 710 0% 10 times
-- Q2 28 / 110 / 484 1106 0% 10 times
-- Q3 48 / 83 / 172 830 0% 10 times
-- Q4 121 / 205 / 364 2056 1% 10 times
-- Q5 4 / 40 / 133 403 0% 10 times
-- Q6 73 / 129 / 224 1298 0% 10 times
-- Q7 77 / 170 / 323 1706 0% 10 times
-- Q8 8169 / 15293 / 25299 152930 75% 10 times
-- Q9 234 / 629 / 988 6294 3% 10 times
-- Q10 12 / 36 / 69 363 0% 10 times
-- Q11 255 / 411 / 617 4116 2% 10 times
-- Q12 7 / 600 / 1027 6007 2% 10 times
-- Q13 15 / 303 / 1154 3035 1% 10 times
-- Q14 958 / 1897 / 2706 18979 9% 10 times
Comments: Q13 times are low because some universities in the generated set
in fact have no graduates, i.e. nobody holds a degree from them.
Even though performance is entirely I/O-bound, every index in the database
has a cache hit rate of over 99%. This means fewer than 1 disk read per 100
successfully retrieved rows.
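The apparent paradox of a 99% hit rate combined with I/O-bound performance follows from the relative costs of the two cases. The sketch below takes the in-memory cost from the roughly 300,000 single-row lookups per second reported for the qualification runs, which is about 3.3 microseconds per lookup. It assumes 8 ms per random page read on a SATA disk. That latency is an illustrative assumption, not a measurement from these tests.

```python
def disk_share(hit_rate: float, hit_us: float, miss_us: float) -> float:
    """Fraction of average per-row lookup time spent on cache misses."""
    miss_cost = (1 - hit_rate) * miss_us
    return miss_cost / (hit_rate * hit_us + miss_cost)

# ~3.3 us per in-memory lookup (from 300,000/s); 8 ms per miss is assumed.
share = disk_share(0.99, 1e6 / 300_000, 8_000)
print(f"{share:.0%} of lookup time goes to disk")
```

Under these assumptions, the 1% of lookups that miss the cache account for about 96% of the time, which matches the low CPU utilization observed at 8000 universities.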
---++ Analysis
We see that database size has little effect on query times as long as the
working set fits in memory. The single-stream query rate is 14 qps at 100K
triples (14 queries in 1029 ms with query-time inference) and 11 qps at 1G
triples. We also note that, while the working set stays in memory,
contention between processor cores does not severely affect performance:
throughput goes from 11 qps with 1 stream to 33 qps with 8 streams on 4
cores.
As expected, performance drops severely once the working set no longer fits
in memory. This emphasizes the need for a memory-efficient storage format,
which Virtuoso 6 addresses by storing twice as many triples in the same
space.
All disk access is done on demand, one page at a time. The workload offers
few opportunities for exploiting sequential disk access.
The starting point of a navigational query is typically a bitmap, such as
the bitmap of all subjects of a given type or of all suborganizations of a
university. These bitmaps often fit on a single page, but for larger ones
read-ahead is beneficial and should be used.
Bitmap intersections are frequent. For example, Q13 intersects all
subclasses of Person with all kinds of degree holders from a given
university. An outer loop iterates over the types of persons, a nested loop
iterates over the types of degrees, and a bitmap intersection then counts
how many persons of that type appear in the bitmap of graduates holding
that type of degree from the university.
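The nested-loop-plus-intersection shape of Q13 can be sketched with Python ints as bitmaps, where bit /s/ is set when subject /s/ is in the set. `type_bitmaps` plays the role of the rdf:type bitmaps, and `degree_bitmaps` plays the role of the OGPS bitmaps for (university, degree property). Each (type, degree) combination is counted separately, as in the union form of the query. All names are hypothetical.

```python
from typing import Dict, List

def popcount(x: int) -> int:
    return bin(x).count("1")

def count_q13(type_bitmaps: Dict[str, int], degree_bitmaps: Dict[str, int],
              person_types: List[str], degree_props: List[str]) -> int:
    total = 0
    for t in person_types:          # outer loop: subclasses of Person
        for d in degree_props:      # inner loop: degree sub-properties
            # AND the two subject bitmaps and count the common members
            total += popcount(type_bitmaps.get(t, 0) & degree_bitmaps.get(d, 0))
    return total

types = {"FullProfessor": 0b0110, "GraduateStudent": 0b1000}
degrees = {"doctoralDegreeFrom": 0b0010, "undergraduateDegreeFrom": 0b1100}
print(count_q13(types, degrees, list(types), list(degrees)))
```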
The bitmap intersection is about twice as efficient as the equivalent loop
join, even in the worst case, i.e. a short bitmap (all doctoral graduates
of university 1) with a large bitmap (all associate professors in the
database). If the bitmaps are about the same size the gain from a bitmap
merge join is still greater.
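The two access methods being compared can be contrasted on sorted lists of subject ids. The sorted lists are a stand-in for Virtuoso's bitmaps. A merge join advances through both inputs in a single linear pass. A loop join performs one index lookup, here a binary search, into the larger input for each row of the smaller one. When the inputs are of similar size, the single pass saves most of the per-row lookups, which is consistent with the larger gain reported for bitmaps of comparable size.

```python
from bisect import bisect_left
from typing import List

def merge_intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge join: step through both sorted id lists together."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def loop_join(small: List[int], large: List[int]) -> List[int]:
    """Loop join: one lookup into `large` for each row of `small`."""
    out = []
    for x in small:
        k = bisect_left(large, x)
        if k < len(large) and large[k] == x:
            out.append(x)
    return out

print(merge_intersect([1, 3, 5, 7, 9], [2, 3, 4, 9, 10]))
```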
Otherwise the access method is loop joining, most often using the OGPS
bitmap index. This is preferred because it is only about 1/3 the size of
the GSPO index with the same data. Loop joins with random access offer
little opportunity for optimizing disk-access.
Hash joins do not occur in the execution plans, which is just as well: the
cases of joining a small set to a large one on equality of a key are
covered by bitmap intersections.
---++ Conclusions and Future Work
We have here presented intermediate results following a review of the LUBM
query workload and some consequent optimizations. All results are measured
on Virtuoso 5.0.4, as of February 1, 2008.
We thank the authors of the LUBM benchmark for their work in defining the
test data and the workload. A point-by-point review of the workload and the
issues it raised led to an improvement of over 30% in our performance on
it. This demonstrates that benchmarks are always useful.
As we have stated many times before, RDF benchmarking needs to evolve to
more varied workloads, specifically analytics with aggregation and
grouping. This is where a lot of the action in the relational space is and
where RDF also may find uses as a data integration medium.
A query performance metric should have the right mix of frequent and
infrequent queries. Also, the queries likely to run in an interactive
application and those run in batch mode should be differently weighted and
with different frequencies or should have their own benchmark and metric.
For this reason, the queries-per-second metric presented here is not
representative of any specific type of application.
While LUBM has served us well indeed, it is time to define a new benchmark
with a metric for concurrent performance and a more complex and varied
workload.
As future work, we intend to define a new RDF database benchmark drawing on
the social web as a use-case and featuring a more varied workload with a
well-defined metric for concurrent query and update.
At the time of writing, we are running the same tests on Virtuoso 6.0 in
single-machine and cluster configurations and hope to publish results in
due course. As a preview, we can say that performance there is higher due
to improved storage density.
---++ Appendix A Query Text
This appendix contains the text of the queries adapted to Virtuoso. Three
variants are presented: one with unions, one with run-time subclass and
subproperty inference and one with all entailed triples materialized. The
script text can be run with the Virtuoso /isql/ utility.
---+++ Entailment
For all scripts, the transitive closure of the ub:subOrganizationOf
property was materialized. The statement for this is:
sparql prefix ub:
insert into graph { ?x ub:subOrganizationOf ?z } from where { ?x ub:subOrganizationOf ?y . ?y ub:subOrganizationOf ?z . };
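The statement above adds one hop: for every chain x to y to z it inserts x to z. This is enough only if organizations are nested at most two levels deep, which the sketch assumes for LUBM (research group, department, university). Deeper hierarchies need the step to be repeated until no new triples appear. In the sketch below, `one_pass` mirrors the insert and `closure` iterates it to a fixpoint. Both are hypothetical in-memory helpers.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (sub-organization, super-organization)

def one_pass(edges: Set[Edge]) -> Set[Edge]:
    """Like the insert above: add (x, z) for every x -> y -> z chain."""
    return edges | {(x, z) for (x, y) in edges for (y2, z) in edges if y == y2}

def closure(edges: Set[Edge]) -> Set[Edge]:
    """Repeat the pass until nothing new is added (a fixpoint)."""
    while True:
        nxt = one_pass(edges)
        if nxt == edges:
            return edges
        edges = nxt

lubm = {("rg0", "dept0"), ("dept0", "univ0")}
print(sorted(one_pass(lubm)))
```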
Additionally, to get correct results with the materialized script, the
following statements were also run:
sparql prefix ub:
insert into graph { ?x a ub:Professor }
where {
{ ?x a ub:AssistantProfessor } union
{ ?x a ub:AssociateProfessor } union
{ ?x a ub:FullProfessor } union
{ ?x a ub:VisitingProfessor }
};
sparql prefix ub:
insert into graph { ?x a ub:Faculty }
where {
{ ?x a ub:Professor } union
{ ?x a ub:PostDoc } union
{ ?x a ub:Lecturer }
};
sparql prefix ub:
insert into graph { ?x a ub:Student }
where {
{ ?x a ub:UndergraduateStudent } union
{ ?x a ub:GraduateStudent } union
{ ?x a ub:ResearchAssistant }
};
sparql prefix ub:
insert into graph { ?x a ub:AdministrativeStaff }
where {
{ ?x a ub:ClericalStaff } union
{ ?x a ub:SystemsStaff }
};
sparql prefix ub:
insert into graph { ?x a ub:Employee }
where {
{ ?x a ub:Faculty } union
{ ?x a ub:AdministrativeStaff }
};
sparql prefix ub:
insert into graph { ?x a ub:Person }
where {
{ ?x a ub:Chair } union
{ ?x a ub:Dean } union
{ ?x a ub:Director } union
{ ?x a ub:Employee } union
{ ?x a ub:Student } union
{ ?x a ub:TeachingAssistant }
};
sparql prefix ub:
insert into graph { ?x a ub:Course }
where {
{ ?x a ub:GraduateCourse }
};
sparql prefix ub:
insert into graph { ?x ub:worksFor ?z }
where {
{ ?x ub:headOf ?z }
};
sparql prefix ub:
insert into graph { ?x ub:memberOf ?z }
where {
{ ?x ub:worksFor ?z }
};
sparql prefix ub:
insert into graph { ?x ub:degreeFrom ?z }
where {
{ ?x ub:doctoralDegreeFrom ?z } union
{ ?x ub:mastersDegreeFrom ?z } union
{ ?x ub:undergraduateDegreeFrom ?z }
};
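The class statements above depend on each other. For example, ub:Faculty reads ub:Professor, which an earlier statement fills in, and ub:Person reads ub:Employee. The sketch applies the same class rules once, in the listed order, to a toy mapping from individuals to classes. It shows that a single ordered pass reaches the full entailment for this hierarchy. The property statements (headOf, worksFor, memberOf and the degree properties) follow the same pattern and are omitted. The rule table is copied from the statements, and `materialize` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple

# (derived class, source classes), in the same order as the statements above.
CLASS_RULES: List[Tuple[str, List[str]]] = [
    ("Professor", ["AssistantProfessor", "AssociateProfessor",
                   "FullProfessor", "VisitingProfessor"]),
    ("Faculty", ["Professor", "PostDoc", "Lecturer"]),
    ("Student", ["UndergraduateStudent", "GraduateStudent", "ResearchAssistant"]),
    ("AdministrativeStaff", ["ClericalStaff", "SystemsStaff"]),
    ("Employee", ["Faculty", "AdministrativeStaff"]),
    ("Person", ["Chair", "Dean", "Director", "Employee", "Student",
                "TeachingAssistant"]),
    ("Course", ["GraduateCourse"]),
]

def materialize(types: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each individual to its asserted plus entailed classes."""
    out = {x: set(cs) for x, cs in types.items()}
    for derived, sources in CLASS_RULES:  # one pass, in dependency order
        src = set(sources)
        for cs in out.values():
            if cs & src:
                cs.add(derived)
    return out

print(sorted(materialize({"p1": {"FullProfessor"}})["p1"]))
```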
---+++ Query Text with Unions
set autocommit on;
-- Q1
sparql prefix ub:
select * from
where { ?x rdf:type ub:GraduateStudent . ?x ub:takesCourse };
-- Q2
sparql prefix ub:
select * from
where { ?x a ub:GraduateStudent . ?y a ub:University . ?z a ub:Department . ?x ub:memberOf ?z . ?z ub:subOrganizationOf ?y . ?x ub:undergraduateDegreeFrom ?y };
-- Q3
sparql prefix ub:
select * from
where { ?x a ub:Publication . ?x ub:publicationAuthor };
-- Q4
sparql prefix ub:
select * from
where
{
{ ?x a ub:AssociateProfessor . ?x ub:worksFor . ?x ub:name ?y1 . ?x ub:emailAddress ?y2 . ?x ub:telephone ?y3 . }
union
{ ?x a ub:AssistantProfessor . ?x ub:worksFor . ?x ub:name ?y1 . ?x ub:emailAddress ?y2 . ?x ub:telephone ?y3 . }
union
{ ?x a ub:FullProfessor . ?x ub:worksFor . ?x ub:name ?y1 . ?x ub:emailAddress ?y2 . ?x ub:telephone ?y3 . }
};
-- Q5
sparql prefix ub:
select distinct * from
where
{
{ ?x a ub:AssociateProfessor . ?x ub:memberOf } union
{ ?x a ub:FullProfessor . ?x ub:memberOf } union
{ ?x a ub:AssistantProfessor . ?x ub:memberOf } union
{ ?x a ub:Lecturer . ?x ub:memberOf } union
{ ?x a ub:UndergraduateStudent . ?x ub:memberOf } union
{ ?x a ub:GraduateStudent . ?x ub:memberOf } union
{ ?x a ub:TeachingAssistant . ?x ub:memberOf } union
{ ?x a ub:ResearchAssistant . ?x ub:memberOf } union
{ ?x a ub:AssociateProfessor . ?x ub:worksFor } union
{ ?x a ub:FullProfessor . ?x ub:worksFor } union
{ ?x a ub:AssistantProfessor . ?x ub:worksFor } union
{ ?x a ub:Lecturer . ?x ub:worksFor } union
{ ?x a ub:UndergraduateStudent . ?x ub:worksFor } union
{ ?x a ub:GraduateStudent . ?x ub:worksFor } union
{ ?x a ub:TeachingAssistant . ?x ub:worksFor } union
{ ?x a ub:ResearchAssistant . ?x ub:worksFor } union
{ ?x a ub:AssociateProfessor . ?x ub:headOf } union
{ ?x a ub:FullProfessor . ?x ub:headOf } union
{ ?x a ub:AssistantProfessor . ?x ub:headOf } union
{ ?x a ub:Lecturer . ?x ub:headOf } union
{ ?x a ub:UndergraduateStudent . ?x ub:headOf } union
{ ?x a ub:GraduateStudent . ?x ub:headOf } union
{ ?x a ub:TeachingAssistant . ?x ub:headOf } union
{ ?x a ub:ResearchAssistant . ?x ub:headOf }
};
-- Q6
sparql prefix ub:
select distinct * from where {
{ ?x a ub:UndergraduateStudent . }
union
{ ?x a ub:ResearchAssistant . }
union
{ ?x a ub:GraduateStudent . }
};
-- Q7
sparql prefix ub:
select distinct * from
where
{
{ ?x a ub:UndergraduateStudent . ?y a ub:Course . ub:teacherOf ?y . ?x ub:takesCourse ?y . }
union
{ ?x a ub:UndergraduateStudent . ?y a ub:GraduateCourse . ub:teacherOf ?y . ?x ub:takesCourse ?y . }
union
{ ?x a ub:ResearchAssistant . ?y a ub:Course . ub:teacherOf ?y . ?x ub:takesCourse ?y . }
union
{ ?x a ub:ResearchAssistant . ?y a ub:GraduateCourse . ub:teacherOf ?y . ?x ub:takesCourse ?y . }
union
{ ?x a ub:GraduateStudent . ?y a ub:Course . ub:teacherOf ?y . ?x ub:takesCourse ?y . }
union
{ ?x a ub:GraduateStudent . ?y a ub:GraduateCourse . ub:teacherOf ?y . ?x ub:takesCourse ?y . }
}
;
-- Q8
sparql prefix ub:
select distinct * from
where
{
{ ?x a ub:UndergraduateStudent . ?y a ub:Department . ?x ub:memberOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:UndergraduateStudent . ?y a ub:Department . ?x ub:worksFor ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:UndergraduateStudent . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:ResearchAssistant . ?y a ub:Department . ?x ub:memberOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:ResearchAssistant . ?y a ub:Department . ?x ub:worksFor ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:ResearchAssistant . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:GraduateStudent . ?y a ub:Department . ?x ub:memberOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:GraduateStudent . ?y a ub:Department . ?x ub:worksFor ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
union
{ ?x a ub:GraduateStudent . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z }
}
;
-- Q9
sparql prefix ub:
select distinct * from
where
{
{ ?x a ub:ResearchAssistant . ?y a ub:Lecturer . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:PostDoc . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:VisitingProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:AssistantProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:AssociateProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:FullProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:Lecturer . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:PostDoc . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:VisitingProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:AssistantProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:AssociateProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:ResearchAssistant . ?y a ub:FullProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:Lecturer . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:PostDoc . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:VisitingProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:AssistantProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:AssociateProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:FullProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:Lecturer . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:PostDoc . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:VisitingProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:AssistantProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:AssociateProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:UndergraduateStudent . ?y a ub:FullProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:Lecturer . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:PostDoc . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:VisitingProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:AssistantProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:AssociateProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:FullProfessor . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:Lecturer . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:PostDoc . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:VisitingProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:AssistantProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:AssociateProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . } union
{ ?x a ub:GraduateStudent . ?y a ub:FullProfessor . ?z a ub:GraduateCourse . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . }
};
-- Q10
sparql prefix ub:
select * from
where
{
{ ?x a ub:ResearchAssistant . ?x ub:takesCourse . }
union
{ ?x a ub:UndergraduateStudent . ?x ub:takesCourse . }
union
{ ?x a ub:GraduateStudent . ?x ub:takesCourse . }
};
-- Q11
sparql prefix ub: select * from where { ?x a ub:ResearchGroup . ?x ub:subOrganizationOf . };
-- Q12
sparql prefix ub: select * from where
{
{ ?x a ub:FullProfessor . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . }
union
{ ?x a ub:AssistantProfessor . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . }
union
{ ?x a ub:AssociateProfessor . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . }
};
-- Q13
sparql prefix ub: select * from where
{
{ ?x a ub:AssociateProfessor . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:FullProfessor . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:AssistantProfessor . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:Lecturer . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:UndergraduateStudent . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:GraduateStudent . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:TeachingAssistant . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:ResearchAssistant . ?x ub:doctoralDegreeFrom . }
union
{ ?x a ub:AssociateProfessor . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:FullProfessor . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:AssistantProfessor . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:Lecturer . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:UndergraduateStudent . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:GraduateStudent . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:TeachingAssistant . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:ResearchAssistant . ?x ub:mastersDegreeFrom . }
union
{ ?x a ub:AssociateProfessor . ?x ub:undergraduateDegreeFrom . }
union
{ ?x a ub:FullProfessor . ?x ub:undergraduateDegreeFrom . }
union
{ ?x a ub:AssistantProfessor . ?x ub:undergraduateDegreeFrom . }
union
{ ?x a ub:Lecturer . ?x ub:undergraduateDegreeFrom . }
union
{ ?x a ub:UndergraduateStudent . ?x ub:undergraduateDegreeFrom . }
union
{ ?x a ub:GraduateStudent . ?x ub:undergraduateDegreeFrom . }
union
{ ?x a ub:TeachingAssistant . ?x ub:undergraduateDegreeFrom . }
union
{ ?x a ub:ResearchAssistant . ?x ub:undergraduateDegreeFrom . }
}
;
-- Q14
sparql prefix ub: select * from where { ?x a ub:UndergraduateStudent . };
---+++ Query Text With Inference Options
set autocommit on;
-- Q1
sparql
define input:inference 'inft'
prefix ub:
select * from
where { ?x rdf:type ub:GraduateStudent . ?x ub:takesCourse };
-- Q2
sparql
define input:inference 'inft'
prefix ub:
select * from
where { ?x a ub:GraduateStudent . ?y a ub:University . ?z a ub:Department . ?x ub:memberOf ?z . ?z ub:subOrganizationOf ?y . ?x ub:undergraduateDegreeFrom ?y };
-- Q3
sparql
define input:inference 'inft'
prefix ub:
select * from
where { ?x a ub:Publication . ?x ub:publicationAuthor };
-- Q4
sparql
define input:inference 'inft'
prefix ub:
select distinct * from
where { ?x a ub:Professor . ?x ub:worksFor . ?x ub:name ?y1 . ?x ub:emailAddress ?y2 . ?x ub:telephone ?y3 . };
-- Q5
sparql
define input:inference 'inft'
prefix ub:
select distinct * from
where { ?x a ub:Person . ?x ub:memberOf };
-- Q6
sparql
define input:inference 'inft'
prefix ub:
select distinct * from where { ?x a ub:Student . };
-- Q7
sparql
define input:inference 'inft'
prefix ub:
select distinct * from
where { ?x a ub:Student . ?y a ub:Course . ub:teacherOf ?y . ?x ub:takesCourse ?y . };
-- Q8: XXX
sparql
define input:inference 'inft'
prefix ub:
select distinct * from
where { ?x a ub:Student . ?y a ub:Department . ?x ub:memberOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z };
-- Q9: XXX
sparql
define input:inference 'inft'
prefix ub:
select distinct * from
where { ?x a ub:Student . ?y a ub:Faculty . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . };
-- Q10
sparql
define input:inference 'inft'
prefix ub:
select * from
where { ?x a ub:Student . ?x ub:takesCourse . };
-- Q11
sparql
define input:inference 'inft'
prefix ub: select * from where { ?x a ub:ResearchGroup . ?x ub:subOrganizationOf . };
-- Q12
sparql
define input:inference 'inft'
prefix ub: select * from where { ?x a ub:Professor . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . };
-- Q13
sparql
define input:inference 'inft'
prefix ub: select * from where { ?x a ub:Person . ?x ub:degreeFrom . };
-- Q14
sparql
define input:inference 'inft'
prefix ub: select * from where { ?x a ub:UndergraduateStudent . };
---+++ Query Text With Materialized Entailed Triples
set autocommit on;
-- Q1
sparql prefix ub:
select * from
where { ?x rdf:type ub:GraduateStudent . ?x ub:takesCourse };
-- Q2
sparql prefix ub:
select * from
where { ?x a ub:GraduateStudent . ?y a ub:University . ?z a ub:Department . ?x ub:memberOf ?z . ?z ub:subOrganizationOf ?y . ?x ub:undergraduateDegreeFrom ?y };
-- Q3
sparql prefix ub:
select * from
where { ?x a ub:Publication . ?x ub:publicationAuthor };
-- Q4
sparql prefix ub:
select * from
where { ?x a ub:Professor . ?x ub:worksFor . ?x ub:name ?y1 . ?x ub:emailAddress ?y2 . ?x ub:telephone ?y3 . };
-- Q5
sparql prefix ub:
select * from
where { ?x a ub:Person . ?x ub:memberOf };
-- Q6
sparql prefix ub:
select * from where { ?x a ub:Student . };
-- Q7
sparql prefix ub:
select * from
where { ?x a ub:Student . ?y a ub:Course . ub:teacherOf ?y . ?x ub:takesCourse ?y . };
-- Q8
sparql prefix ub:
select * from
where { ?x a ub:Student . ?y a ub:Department . ?x ub:memberOf ?y . ?y ub:subOrganizationOf . ?x ub:emailAddress ?z };
-- Q9
sparql prefix ub:
select * from
where { ?x a ub:Student . ?y a ub:Faculty . ?z a ub:Course . ?x ub:advisor ?y . ?x ub:takesCourse ?z . ?y ub:teacherOf ?z . };
-- Q10
sparql prefix ub:
select * from
where { ?x a ub:Student . ?x ub:takesCourse . };
-- Q11
sparql prefix ub: select * from where { ?x a ub:ResearchGroup . ?x ub:subOrganizationOf . };
-- Q12
sparql prefix ub: select * from where { ?x a ub:Professor . ?y a ub:Department . ?x ub:headOf ?y . ?y ub:subOrganizationOf . };
-- Q13
sparql prefix ub: select * from where { ?x a ub:Person . ?x ub:degreeFrom . };
-- Q14
sparql prefix ub: select * from where { ?x a ub:UndergraduateStudent . };
---++ Appendix B Configuration
Single process, 8G RAM.
The following settings were changed from the default virtuoso.ini file:
[Database]
MaxCheckpointRemap = 2000000
Striping = 1
[Parameters]
NumberOfBuffers = 550000
[Striping]
; One file per disk, each with its own I/O queue
Segment1 = 100G, /disk1/db1-1.db = q1, /disk2/db1-2.db = q2
; ...and so on, one file per remaining disk