How to Attach a CSV document to Virtuoso Database Server Instance?

What?

Configuring Virtuoso's Virtual Database (VDB) engine to treat selected CSV (comma-separated variable) files as DBMS TABLEs.

Why?

Attachment of CSV files broadens the range of external data sources which may be used to construct virtual views of heterogeneous data sources, which are ultimately available to the Federated SQL and Linked Data Views services of a given Virtuoso instance.

How?

Using a Table as a Placeholder for CSV Attachment

In this example a SQL TABLE is created and then associated with an external CSV file. This file is situated in a host operating system location that's accessible to the Virtuoso instance using relative (to instance .db file) or full path file name reference (constrained by DirsAllowed entry in Virtuoso INI).

CSV filename: contacts.csv (which can also be referred to as file:contacts.csv) .

File Content:


Id,Fname,Sname,Age
0,John,Smith,48 
1,Anna,Clarks,62 
2,Rojer,Danrette,27 
3,Kate,Sigton,56 
4,Tim,Craft,41 

CSV File to SQL Table association steps:

  1. CREATE an empty TABLE , using the SQL command:

    CREATE TABLE csv.tutorials.contacts ( Id INTEGER NOT NULL, Fname VARCHAR(20), Sname VARCHAR(20), Age INTEGER, CONSTRAINT demo_table_pk PRIMARY KEY (Id) );

    • Notes:
      • There is no special object type for File Tables corresponding to the tabular data structure represented by the CSV document content;
      • To create a File Table, a user must have DBA group privileges. In addition, access to the referenced file is subject the same file system access functions (e.g., file_to_string and DirsAllowed INI file settings) that govern the whole Virtuoso instance.
  2. Use Virtuoso ft_set_file procedure to associate the above newly created internal TABLE (first procedure argument) with the external CSV file (second procedure argument, in the form of a file name relative to the Virtuoso instance); optional arguments may be used to specify delimiter, newline, and escape characters, respectively:

    ft_set_file ('csv.tutorials.contacts', 'contacts.csv', ',', 1) ;

    • Notes:
      • The text in each field is parsed according to the data type declared for the column whose position in the CREATE TABLE statement corresponds to that field's position on the line. The parsing is as by the SQL CAST function from a VARCHAR value. If the CAST fails, the line is silently ignored.
      • The newline and escape characters must be single character strings. A newline or escape character following the escape character will be added to the parsed input as a literal character, without its special interpretation.
      • Each column in the CSV file is expected to end with the delimiter character. A field of zero length is considered a SQL NULL value; i.e., if two delimiters are adjacent, the field is considered NULL. Likewise, if a line begins with the delimiter, the first field is considered NULL.
  3. Check the inserted data in the csv.tutorials.contacts table:

    SQL> SELECT * FROM csv.tutorials.contacts; Id Fname Sname Age INTEGER NOT NULL VARCHAR VARCHAR INTEGER _________________________________________________ 0 John Smith 48 1 Anna Clarks 62 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 5 Rows. -- 15 msec.

Simplified CSV File Attachment Variant I

This example demonstrates how to directly attach external CSV file to Virtuoso that functions like any other internal TABLE. Unlike the prior example, you aren't required to create a SQL Table that functions as a placeholder, the entire process is automated:

  1. Create CSV file named contacts1.csv:

    $ cat > contacts1.csv Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41

  2. Place the created contacts1.csv file within scope of DirsAllowed INI setting.
  3. Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ). Note: As the pkey_columns parameter value is not specified, by default the virtual table will have no Primary Key:

    SQL> attach_from_csv ('csv.tutorials.contacts1', 'file:contacts1.csv', ',', '\n', null, 1); Done. -- 109 msec.

  4. Check the inserted data in the csv.tutorials.contacts1 table:

    SQL> SELECT * FROM csv.tutorials.contacts1 ; Id Fname Sname Age INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________________________ 0 John Smith 48 1 Anna Clarks 62 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 5 Rows. -- 78 msec.

  5. Check the Primary Key columns of the csv.tutorials.contacts1 table:

    SQL> primarykeys csv.tutorials.contacts1; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts1', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR __________________________________________________________________________________________________________________ 0 Rows. -- 47 msec.

Simplified CSV File Attachment Variant II

This example demonstrates how to directly attach external CSV file to Virtuoso by skipping the first 3 rows of the the CSV file:

  1. Create CSV file named contacts1.csv:

    $ cat > contacts1.csv Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41

  2. Place the created contacts1.csv file within scope of DirsAllowed INI setting.
  3. Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ):

    SQL> attach_from_csv ('csv.tutorials.contacts1', 'file:contacts1.csv', ',', '\n', null, 3); Done. -- 109 msec.

  4. Check the inserted data in the csv.tutorials.contacts1 table:

    SQL> SELECT * FROM csv.tutorials.contacts1 ; Id Fname Sname Age INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________________________ 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 3 Rows. -- 78 msec.

Simplified CSV File Attachment with Compound Key Generation

In this example a CSV File System is not only attached to Virtuoso, but as part of the process a Primary Key is generated using the first and second ordinal columns from the CSV file content:

  1. Create CSV file named contacts2.csv :

    $ cat > contacts2.csv Fname,Sname,Age John,Smith,48 Anna,Clarks,62 Rojer,Danrette,27 Kate,Sigton,56 Tim,Craft,41

  2. Place the created contacts2.csv file within scope of DirsAllowed INI setting.
  3. Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ):

    SQL> attach_from_csv ('csv.tutorials.contacts2', 'file:contacts2.csv', ',', '\n', null, 1,vector(1,2)); Done. -- 109 msec.

  4. Check the inserted data in the csv.tutorials.contacts2 table:

    SQL> SELECT * FROM csv.tutorials.contacts2 ; Fname Sname Age VARCHAR VARCHAR INTEGER ___________________________________________________________________________ John Smith 48 Anna Clarks 62 Rojer Danrette 27 Kate Sigton 56 Tim Craft 41 5 Rows. -- 78 msec.

  5. Check the Primary Key columns of the csv.tutorials.contacts2 table:

    SQL> primarykeys csv.tutorials.contacts2; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts2', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR ____________________________________________________________________________________________________________________________________________ DB DBA csv.tutorials.contacts2 Fname 1 csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 Sname 2 csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 2 Rows. -- 32 msec.

Simplified CSV File Attachment with Composite Key Generation

In this example a CSV File System is not only attached to Virtuoso, but as part of the process a Primary Key is generated using the second (CompanyID), third (Fname) and forth (Sname) ordinal columns. In this example the CompanyID column (which uniquely identifies a company) is combined with the Fname and Sname columns to create a primary key for each contact:

  1. Create CSV file named contacts3.csv:

    $ cat > contacts3.csv CompanyName,CompanyID,Fname,Sname,Age MLogistic,12,John,Smith,48 ZiAirLines,13,Anna,Clarks,62 MLogistic,12,Rojer,Danrette,27 MLogistic,12,Kate,Sigton,56 ZiAirLines,13,Tim,Craft,41

  2. Place the created contacts3.csv file within scope of DirsAllowed INI setting.
  3. Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ):

    SQL> attach_from_csv ('csv.tutorials.contacts3', 'file:contacts3.csv', ',', '\n', null, 1, vector (2,3,4)); Done. -- 109 msec.

  4. Check the inserted data in the csv.tutorials.contacts3 table:

    SQL> SELECT * FROM csv.tutorials.contacts3 ; CompanyName CompanyID Fname Sname Age VARCHAR INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________ MLogistic 12 John Smith 48 ZiAirLines 13 Anna Clarks 62 MLogistic 12 Rojer Danrette 27 MLogistic 12 Kate Sigton 56 ZiAirLines 13 Tim Craft 41

  5. Check the Primary Key columns of the csv.tutorials.contacts3 table:

    SQL> primarykeys csv.tutorials.contacts3; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts3', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR ___________________________________________________________________________________________________________________________________ DB DBA csv.tutorials.contacts3 CompanyID 1 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 Fname 2 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 Sname 3 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 3 Rows. -- 47 msec.

Related