• Topic
  • Discussion
  • VOS.VirtRDFDatasetDump(1.2) -- DAVWikiAdmin? , 2019-12-22 09:13:40 Edit WebDAV System Administrator 2019-12-22 09:13:40

    Producing RDF dumps from the Virtuoso Triple-/Quad-Store

    What?

    How to export RDF model data from Virtuoso's Quad Store.

    Why?

    Every DBMS needs to offer a mechanism for bulk export and import of data.

    Virtuoso supports dumping and reloading graph model data (e.g., RDF), as well as relational data (e.g., SQL) (discussed elsewhere).

    How?

    We have created stored procedures for the task. The dump procedures leverage SPARQL to facilitate selective data dump(s) from one or more Named Graphs, each denoted by an IRI.

    Dump One Graph

    The procedure dump_one_graph can be used to dump any single Named Graph.

    Parameters

    • IN srcgraph VARCHAR -- source graph
    • IN out_file VARCHAR -- output file
    • IN file_length_limit INTEGER -- maximum length of dump files

    Procedure source


    CREATE PROCEDURE dump_one_graph 
      ( IN  srcgraph           VARCHAR
      , IN  out_file           VARCHAR
      , IN  file_length_limit  INTEGER  := 1000000000
      )
      {
        DECLARE  file_name     VARCHAR;
        DECLARE  env,  ses           ANY;
        DECLARE  ses_len
              ,  max_ses_len
              ,  file_len
              ,  file_idx      INTEGER;
       SET ISOLATION = 'uncommitted';
       max_ses_len  := 10000000;
       file_len     := 0;
       file_idx     := 1;
       file_name    := sprintf ('%s%06d.ttl', out_file, file_idx);
       string_to_file ( file_name || '.graph', 
                         srcgraph, 
                         -2
                       );
        string_to_file ( file_name, 
                         sprintf ( '# Dump of graph <%s>, as of %s\n@base <> .\n', 
                                   srcgraph, 
                                   CAST (NOW() AS VARCHAR)
                                 ), 
                         -2
                       );
       env := vector (dict_new (16000), 0, '', '', '', 0, 0, 0, 0, 0);
       ses := string_output ();
       FOR (SELECT * FROM ( SPARQL DEFINE input:storage "" 
                             SELECT ?s ?p ?o { GRAPH `iri(?:srcgraph)` { ?s ?p ?o } } 
                           ) AS sub OPTION (LOOP)) DO
          {
            http_ttl_triple (env, "s", "p", "o", ses);
            ses_len := length (ses);
            IF (ses_len > max_ses_len)
              {
                file_len := file_len + ses_len;
                IF (file_len > file_length_limit)
                  {
                    http (' .\n', ses);
                    string_to_file (file_name, ses, -1);
    		gz_compress_file (file_name, file_name||'.gz');
    		file_delete (file_name);
                    file_len := 0;
                    file_idx := file_idx + 1;
                    file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
                    string_to_file ( file_name, 
                                     sprintf ( '# Dump of graph <%s>, as of %s (part %d)\n@base <> .\n', 
                                               srcgraph, 
                                               CAST (NOW() AS VARCHAR), 
                                               file_idx), 
                                     -2
                                   );
                     env := VECTOR (dict_new (16000), 0, '', '', '', 0, 0, 0, 0, 0);
                  }
                ELSE
                  string_to_file (file_name, ses, -1);
                ses := string_output ();
              }
          }
        IF (LENGTH (ses))
          {
            http (' .\n', ses);
            string_to_file (file_name, ses, -1);
    	gz_compress_file (file_name, file_name||'.gz');
    	file_delete (file_name);
          }
      }
    ;
    

    To load the procedure into Virtuoso, it can simply be copied and pasted into the Virtuoso "isql" command line tool or Interactive SQL UI of the Conductor and execute.

    Example

    1. Call the dump_one_graph procedure with appropriate arguments:

      $ pwd /Applications/OpenLink Virtuoso/Virtuoso 6.1/database $ grep DirsAllowed virtuoso.ini DirsAllowed = ., ../vad, $ /opt/virtuoso/bin/isql 1111 Connected to OpenLink Virtuoso Driver: 06.01.3127 OpenLink Virtuoso ODBC Driver OpenLink Interactive SQL (Virtuoso), version 0.9849b. Type HELP; for help and EXIT; to exit. SQL> dump_one_graph ('http://daas.openlinksw.com/data#', './data_', 1000000000); Done. -- 1438 msec.

    2. As a result, a dump of the graph <http://daas.openlinksw.com/data#> will be found in the files data_XX (located in your Virtuoso db folder):

      $ ls data_000001.ttl data_000002.ttl .... data_000001.ttl.graph

    Dump Multiple Graphs

    See the following documentation on dumping Virtuoso RDF Quad store hosted data into NQuad datasets.

    Related