How to Attach a CSV document to Virtuoso Database Server Instance?
What?
Configuring Virtuoso's Virtual Database (VDB) engine to treat selected CSV (comma-separated variable) files as DBMS TABLEs
.
Why?
Attachment of CSV files broadens the range of external data sources which may be used to construct virtual views of heterogeneous data sources, which are ultimately available to the Federated SQL and Linked Data Views services of a given Virtuoso instance.
How?
Using a Table as a Placeholder for CSV Attachment
In this example a SQL TABLE
is created and then associated with an external CSV file.
This file is situated in a host operating system location that's accessible to the Virtuoso instance using relative (to instance .db file) or full path file name reference (constrained by DirsAllowed
entry in Virtuoso INI).
CSV filename: contacts.csv
(which can also be referred to as file:contacts.csv) .
File Content:
Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41
CSV File to SQL Table association steps:
-
CREATE
an emptyTABLE
, using the SQL command:
CREATE TABLE csv.tutorials.contacts ( Id INTEGER NOT NULL, Fname VARCHAR(20), Sname VARCHAR(20), Age INTEGER, CONSTRAINT demo_table_pk PRIMARY KEY (Id) );
- Notes:
- There is no special object type for File Tables corresponding to the tabular data structure represented by the CSV document content;
- To create a File Table, a user must have DBA group privileges.
In addition, access to the referenced file is subject the same file system access functions (e.g., file_to_string and
DirsAllowed
INI file settings) that govern the whole Virtuoso instance.
- Notes:
- Use Virtuoso ft_set_file procedure to associate the above newly created internal
TABLE
(first procedure argument) with the external CSV file (second procedure argument, in the form of a file name relative to the Virtuoso instance); optional arguments may be used to specify delimiter, newline, and escape characters, respectively:
ft_set_file ('csv.tutorials.contacts', 'contacts.csv', ',', 1) ;
- Notes:
- The text in each field is parsed according to the data type declared for the column whose position in the
CREATE TABLE
statement corresponds to that field's position on the line. The parsing is as by the SQLCAST
function from aVARCHAR
value. If theCAST
fails, the line is silently ignored. - The newline and escape characters must be single character strings. A newline or escape character following the escape character will be added to the parsed input as a literal character, without its special interpretation.
- Each column in the CSV file is expected to end with the delimiter character.
A field of zero length is considered a SQL
NULL
value; i.e., if two delimiters are adjacent, the field is consideredNULL
. Likewise, if a line begins with the delimiter, the first field is consideredNULL
.
- The text in each field is parsed according to the data type declared for the column whose position in the
- Notes:
- Check the inserted data in the
csv.tutorials.contacts
table:
SQL> SELECT * FROM csv.tutorials.contacts; Id Fname Sname Age INTEGER NOT NULL VARCHAR VARCHAR INTEGER _________________________________________________ 0 John Smith 48 1 Anna Clarks 62 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 5 Rows. -- 15 msec.
Simplified CSV File Attachment Variant I
This example demonstrates how to directly attach external CSV file to Virtuoso that functions like any other internal TABLE
.
Unlike the prior example, you aren't required to create a SQL Table that functions as a placeholder, the entire process is automated:
- Create CSV file named
contacts1.csv
:
$ cat > contacts1.csv Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41
- Place the created
contacts1.csv
file within scope ofDirsAllowed INI
setting. - Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ).
Note: As the
pkey_columns
parameter value is not specified, by default the virtual table will have no Primary Key:
SQL> attach_from_csv ('csv.tutorials.contacts1', 'file:contacts1.csv', ',', '\n', null, 1); Done. -- 109 msec.
- Check the inserted data in the
csv.tutorials.contacts1
table:
SQL> SELECT * FROM csv.tutorials.contacts1 ; Id Fname Sname Age INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________________________ 0 John Smith 48 1 Anna Clarks 62 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 5 Rows. -- 78 msec.
- Check the Primary Key columns of the
csv.tutorials.contacts1
table:
SQL> primarykeys csv.tutorials.contacts1; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts1', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR __________________________________________________________________________________________________________________ 0 Rows. -- 47 msec.
Simplified CSV File Attachment Variant II
This example demonstrates how to directly attach external CSV file to Virtuoso by skipping the first 3 rows of the the CSV file:
- Create CSV file named
contacts1.csv
:
$ cat > contacts1.csv Id,Fname,Sname,Age 0,John,Smith,48 1,Anna,Clarks,62 2,Rojer,Danrette,27 3,Kate,Sigton,56 4,Tim,Craft,41
- Place the created
contacts1.csv
file within scope ofDirsAllowed INI
setting. - Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ):
SQL> attach_from_csv ('csv.tutorials.contacts1', 'file:contacts1.csv', ',', '\n', null, 3); Done. -- 109 msec.
- Check the inserted data in the
csv.tutorials.contacts1
table:
SQL> SELECT * FROM csv.tutorials.contacts1 ; Id Fname Sname Age INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________________________ 2 Rojer Danrette 27 3 Kate Sigton 56 4 Tim Craft 41 3 Rows. -- 78 msec.
Simplified CSV File Attachment with Compound Key Generation
In this example a CSV File System is not only attached to Virtuoso, but as part of the process a Primary Key is generated using the first and second ordinal columns from the CSV file content:
- Create CSV file named
contacts2.csv
:
$ cat > contacts2.csv Fname,Sname,Age John,Smith,48 Anna,Clarks,62 Rojer,Danrette,27 Kate,Sigton,56 Tim,Craft,41
- Place the created
contacts2.csv
file within scope ofDirsAllowed INI
setting. - Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ):
SQL> attach_from_csv ('csv.tutorials.contacts2', 'file:contacts2.csv', ',', '\n', null, 1,vector(1,2)); Done. -- 109 msec.
- Check the inserted data in the
csv.tutorials.contacts2
table:
SQL> SELECT * FROM csv.tutorials.contacts2 ; Fname Sname Age VARCHAR VARCHAR INTEGER ___________________________________________________________________________ John Smith 48 Anna Clarks 62 Rojer Danrette 27 Kate Sigton 56 Tim Craft 41 5 Rows. -- 78 msec.
- Check the Primary Key columns of the
csv.tutorials.contacts2
table:
SQL> primarykeys csv.tutorials.contacts2; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts2', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR ____________________________________________________________________________________________________________________________________________ DB DBA csv.tutorials.contacts2 Fname 1 csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 Sname 2 csv.tutorials.contacts2 DB DBA csv.tutorials.contacts2 2 Rows. -- 32 msec.
Simplified CSV File Attachment with Composite Key Generation
In this example a CSV File System is not only attached to Virtuoso, but as part of the process a Primary Key is generated using the second (CompanyID
), third (Fname
) and forth (Sname
) ordinal columns.
In this example the CompanyID
column (which uniquely identifies a company) is combined with the Fname
and Sname
columns to create a primary key for each contact:
- Create CSV file named
contacts3.csv
:
$ cat > contacts3.csv CompanyName,CompanyID,Fname,Sname,Age MLogistic,12,John,Smith,48 ZiAirLines,13,Anna,Clarks,62 MLogistic,12,Rojer,Danrette,27 MLogistic,12,Kate,Sigton,56 ZiAirLines,13,Tim,Craft,41
- Place the created
contacts3.csv
file within scope ofDirsAllowed INI
setting. - Call the attach_from_csv procedure from iSQL or Virtuoso Conductor UI ( http://host:port/conductor ):
SQL> attach_from_csv ('csv.tutorials.contacts3', 'file:contacts3.csv', ',', '\n', null, 1, vector (2,3,4)); Done. -- 109 msec.
- Check the inserted data in the
csv.tutorials.contacts3
table:
SQL> SELECT * FROM csv.tutorials.contacts3 ; CompanyName CompanyID Fname Sname Age VARCHAR INTEGER VARCHAR VARCHAR INTEGER ___________________________________________________________ MLogistic 12 John Smith 48 ZiAirLines 13 Anna Clarks 62 MLogistic 12 Rojer Danrette 27 MLogistic 12 Kate Sigton 56 ZiAirLines 13 Tim Craft 41
- Check the Primary Key columns of the
csv.tutorials.contacts3
table:
SQL> primarykeys csv.tutorials.contacts3; Showing SQLPrimaryKeys of tables like 'csv.tutorials.contacts3', tabletype/colname like 'NULL' TABLE_QUALIFIER TABLE_OWNER TABLE_NAME COLUMN_NAME KEY_SEQ PK_NAME ROOT_QUALIFIER ROOT_OWNER ROOT_NAME VARCHAR VARCHAR VARCHAR VARCHAR SMALLINT VARCHAR VARCHAR VARCHAR VARCHAR ___________________________________________________________________________________________________________________________________ DB DBA csv.tutorials.contacts3 CompanyID 1 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 Fname 2 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 Sname 3 csv.tutorials.contacts3 DB DBA csv.tutorials.contacts3 3 Rows. -- 47 msec.
Related
- Virtuoso Tips and Tricks Collection
- Perform Bulk Loading of CSV Files
- Virtuoso File Tables Documentation
- Virtuoso CSV File Bulk Loader
- attach_from_csv
- csv_load_file
- csv_parse
- csv_cols_def
- csv_table_def
- ft_set_file
- file_to_string
- RFC 4180 (Informational): Common Format and MIME Type for Comma-Separated Values (CSV) Files