To determine the time taken to load the triples into a RDF graph when it is comprised of multiple datasets using the Virtuoso RDF Bulk Loader functions.
If loading datasets from multiple files at the same time and/or running multiple rdf_loader_run()
functions for faster parallel loading of datasets on a multi-core machine.
The Virtuoso rdf_loader_run()
is used for loading RDF datasets, and records the results of loading these datasets in the load_list
, including the load times for each physical files and the graph name it is loaded into with the ll_start
and ll_done
columns.
A SQL query of the following form can be used to determine the time taken to load the datasets for a given graph:
select min(ll_started) as start, max(ll_done) as finish, datediff('second', min(ll_started), max(ll_done)) as delta from load_list where ll_graph like '<graph-name>';
where graph-name
is the name of the graph the load time is required for, with the delta
return value being the time taken in seconds.
For example, the time taken to load the Freebase dataset into the Virtuoso LOD Cloud Cluster instance was 6135 seconds i.e. about 1.7hrs:
SQL> select min(ll_started) as start, max(ll_done) as finish, datediff('second', min(ll_started), max(ll_done)) as delta from load_list where ll_graph like 'http://commondatastorage.googleapis.com/freebase-public/rdf/freebase-rdf-2013-11-17-00-00.gz'; start finish delta TIMESTAMP TIMESTAMP INTEGER _______________________________________________________________________________ 2013.12.2 22:34.9 0 2013.12.3 0:16.24 0 6135 1 Rows. -- 74 msec. SQL>