Configuring Virtuoso's Virtual Database (VDB) engine to treat selected CSV (comma-separated variable) files as DBMS TABLEs
.
Attachment of CSV files broadens the range of external data sources which may be used to construct virtual views of heterogeneous data sources, which are ultimately available to the Federated SQL and Linked Data Views services of a given Virtuoso instance.
In this example a SQL TABLE
is created and then associated with an external CSV file.
This file is situated in a host operating system location that's accessible to the Virtuoso instance using relative (to instance .db file) or full path file name reference (constrained by DirsAllowed
entry in Virtuoso INI).
CSV filename: contacts.csv
(which can also be referred to as file:contacts.csv) .
File Content:
Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41
CSV File to SQL Table association steps:
CREATE
an empty TABLE
, using the SQL command:
CREATE TABLE csv.tutorials.contacts ( Id INTEGER NOT NULL, Fname VARCHAR(20), Sname VARCHAR(20), Age INTEGER, CONSTRAINT demo_table_pk PRIMARY KEY (Id) );
DirsAllowed
INI file settings) that govern the whole Virtuoso instance.
TABLE
(first procedure argument) with the external CSV file (second procedure argument, in the form of a file name relative to the Virtuoso instance); optional arguments may be used to specify delimiter, newline, and escape characters, respectively:
ft_set_file ('csv.tutorials.contacts', 'contacts.csv', ',', 1) ;
CREATE TABLE
statement corresponds to that field's position on the line.
The parsing is as by the SQL CAST
function from a VARCHAR
value.
If the CAST
fails, the line is silently ignored.
NULL
value; i.e., if two delimiters are adjacent, the field is considered NULL
.
Likewise, if a line begins with the delimiter, the first field is considered NULL
.
csv.tutorials.contacts
table:
SQL> SELECT * FROM csv.tutorials.contacts; Id Fname Sname Age INTEGER NOT NULL VARCHAR VARCHAR INTEGER _________________________________________________ 0 John Smith 48 1 Anna Clarks 62 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 5 Rows. -- 15 msec.
This example demonstrates how to directly attach external CSV file to Virtuoso that functions like any other internal TABLE
.
Unlike the prior example, you aren't required to create a SQL Table that functions as a placeholder, the entire process is automated:
contacts1.csv
:
$ cat > contacts1.csv Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41
contacts1.csv
file within scope of DirsAllowed INI
setting.
pkey_columns
parameter value is not specified, by default the virtual table will have no Primary Key:
SQL> attach_from_csv ('csv.tutorials.contacts1', 'file:contacts1.csv', ',', '\n', null, 1); Done. -- 109 msec.
csv.tutorials.contacts1
table:
SQL> SELECT * FROM csv.tutorials.contacts1 ; Id Fname Sname Age INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________________________ 0 John Smith 48 1 Anna Clarks 62 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 5 Rows. -- 78 msec.
csv.tutorials.contacts1
table:
SQL> primarykeys csv.tutorials.contacts1; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts1', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR __________________________________________________________________________________________________________________ 0 Rows. -- 47 msec.
This example demonstrates how to directly attach external CSV file to Virtuoso by skipping the first 3 rows of the the CSV file:
contacts1.csv
:
$ cat > contacts1.csv Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41
contacts1.csv
file within scope of DirsAllowed INI
setting.
SQL> attach_from_csv ('csv.tutorials.contacts1', 'file:contacts1.csv', ',', '\n', null, 3); Done. -- 109 msec.
csv.tutorials.contacts1
table:
SQL> SELECT * FROM csv.tutorials.contacts1 ; Id Fname Sname Age INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________________________ 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 3 Rows. -- 78 msec.
In this example a CSV File System is not only attached to Virtuoso, but as part of the process a Primary Key is generated using the first and second ordinal columns from the CSV file content:
contacts2.csv
:
$ cat > contacts2.csv Fname,Sname,Age John,Smith,48 Anna,Clarks,62 Rojer,Danrette,27 Kate,Sigton,56 Tim,Craft,41
contacts2.csv
file within scope of DirsAllowed INI
setting.
SQL> attach_from_csv ('csv.tutorials.contacts2', 'file:contacts2.csv', ',', '\n', null, 1,vector(1,2)); Done. -- 109 msec.
csv.tutorials.contacts2
table:
SQL> SELECT * FROM csv.tutorials.contacts2 ; Fname Sname Age VARCHAR VARCHAR INTEGER ___________________________________________________________________________ John Smith 48 Anna Clarks 62 Rojer Danrette 27 Kate Sigton 56 Tim Craft 41 5 Rows. -- 78 msec.
csv.tutorials.contacts2
table:
SQL> primarykeys csv.tutorials.contacts2; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts2', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR ____________________________________________________________________________________________________________________________________________ DB DBA csv.tutorials.contacts2 Fname 1 csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 Sname 2 csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 2 Rows. -- 32 msec.
In this example a CSV File System is not only attached to Virtuoso, but as part of the process a Primary Key is generated using the second (CompanyID
), third (Fname
) and forth (Sname
) ordinal columns.
In this example the CompanyID
column (which uniquely identifies a company) is combined with the Fname
and Sname
columns to create a primary key for each contact:
contacts3.csv
:
$ cat > contacts3.csv CompanyName,CompanyID,Fname,Sname,Age MLogistic,12,John,Smith,48 ZiAirLines,13,Anna,Clarks,62 MLogistic,12,Rojer,Danrette,27 MLogistic,12,Kate,Sigton,56 ZiAirLines,13,Tim,Craft,41
contacts3.csv
file within scope of DirsAllowed INI
setting.
SQL> attach_from_csv ('csv.tutorials.contacts3', 'file:contacts3.csv', ',', '\n', null, 1, vector (2,3,4)); Done. -- 109 msec.
csv.tutorials.contacts3
table:
SQL> SELECT * FROM csv.tutorials.contacts3 ; CompanyName CompanyID Fname Sname Age VARCHAR INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________ MLogistic 12 John Smith 48 ZiAirLines 13 Anna Clarks 62 MLogistic 12 Rojer Danrette 27 MLogistic 12 Kate Sigton 56 ZiAirLines 13 Tim Craft 41
csv.tutorials.contacts3
table:
SQL> primarykeys csv.tutorials.contacts3; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts3', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR ___________________________________________________________________________________________________________________________________ DB DBA csv.tutorials.contacts3 CompanyID 1 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 Fname 2 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 Sname 3 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 3 Rows. -- 47 msec.