Deploying Linked Data Guide - Part 2: Deploying Linked Data Using Virtuoso

Deploying Linked Data using Virtuoso

The preceding sections described a generic approach to deploying Linked Data into the existing Web. We now turn our attention to Virtuoso, to describe its solution for Linked Data deployment. In fact, Virtuoso's solution is to implement the generic approach outlined in the prior sections, using the twin pillars of content negotiation and URL rewriting.

The Virtuoso Rule-Based URL Rewriter

Virtuoso provides a URL rewriter that can be enabled for URLs matching specified patterns. Coupled with customizable HTTP response headers and response codes, it lets Linked Data Web server administrators configure highly flexible rules for driving content negotiation and URL rewriting. The key elements of the URL rewriter are:

  • Rewrite rule
    • Each rule describes how to parse a single source URL, and how to compose the URL of the page ultimately returned in the "Location:" response header.
    • Every rewrite rule is uniquely identified internally (using IRIs).
    • Two types of rule are supported, based on the syntax used to describe the source URL pattern matching: sprintf-based and regex-based.
  • Rewrite rule list
    • A named, ordered list of rewrite rules or rule lists. The rules in a list are processed from top to bottom, or in line with processing pipeline precedence instructions.
  • Configuration API
    • Defines functions for creating, dropping, and enumerating rules and rule lists.
  • Virtual hosts and virtual paths
    • URL rewriting is enabled by associating a rewrite rule list with a virtual directory.

Each of these elements is described in more detail below, although complete descriptions of the features or functions in question are not given. The intention here is to provide an overview of Virtuoso's URL rewriting capabilities and their application to deploying Linked Data. Please refer to the Virtuoso Reference Documentation for full details.

Conductor UI for the URL Rewriter

Virtuoso is a full-blown HTTP server in its own right. The HTTP server functionality co-exists with the product core (i.e. DBMS Engine, Web Services Platform, WebDAV filesystem, and other components of the Universal Server). As a result, it has the ability to multi-home Web domains within a single instance across a variety of domain name and port combinations. In addition, it also enables the creation of multiple virtual directories per domain.

In addition to the basic functionality described above, Virtuoso lets you associate URL rewrite rules with the virtual directories associated with a particular hosted Web domain.

In all cases, Virtuoso enables you to configure virtual domains, virtual directories and URL rewrite rules for one or more virtual directories, via the (X)HTML-based Conductor Admin User Interface or a collection of Virtuoso Stored Procedure Language (PL)-based APIs.

Virtual Domains (Hosts) & Directories

A Virtuoso virtual directory maps a logical path to a physical directory in your file system or WebDAV repository. This mechanism allows physical locations to be hidden or simply reorganised. Virtual directory definitions are held in the system table DB.DBA.HTTP_PATH. Virtual directories can be administered in three basic ways:
  • Using the Visual Administration Interface via a Web browser;
  • Using the functions vhost_define() and vhost_remove(); and
  • Using SQL statements to directly update the HTTP_PATH system table.

"Nice" URLs vs. "Long" URLs

Although we are approaching the URL Rewriter from the perspective of deploying Linked Data, the rewriter was developed with additional objectives in mind. These in turn have influenced some of the formal argument names in the Configuration API function prototypes. In the following sections, "long" URLs are those containing a query string with named parameters; "nice" (also known as "source") URLs have data encoded in some other format. The primary goal of the Rewriter was to accept a nice URL from an application and convert this into a long URL, which then identifies the page that should actually be retrieved.

Rule Processing Mechanics

When an HTTP request is accepted by the Virtuoso HTTP server, the received nice URL is passed to an internal path translation function. This function takes the nice URL and, if the current virtual directory has a url_rewrite option set to an existing rule list name, tries to match the corresponding rule lists and rules; that is, the function performs a recursive traversal of any rule list associated with the virtual directory. For every rule in the rule list, the same logic is applied (only the logic for regex-based rules is described; that for sprintf-based rules is very similar):

  • The input for the rule is the resource URL as received from the HTTP header, i.e., the portion of the URL from the first '/' after the host:port fields to the end of the URL.
  • The input is normalized.
  • The input is matched against the rule's regex. If the match fails, the rule is not applied and the next rule is tried. If the match succeeds, the result is a vector of values.
  • If the URL contains a query string, the names and values of the parameters are decoded by split_and_decode().
  • The names and values of any parameters in the request body are also decoded.
  • The destination URL is composed. The value of each parameter in the destination URL is taken from (in order of priority):
    • the value of a parameter in the match result;
    • the value of a named parameter in the query string of the input nice URL; or
    • if the original request was submitted by the POST method, the value of a named parameter in the body of the POST request.
  • If a parameter value cannot be derived from one of these sources, the rule is not applied and the next rule is tried.

The path translation function described above is internal to the Web server, so its signature is not appropriate for Virtuoso/PL calls and thus is not published. Virtuoso/PL developers can harness the same functionality using the DB.DBA.URLREWRITE_APPLY API call.
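
The parameter-resolution steps listed above can be sketched in Python. This is only an illustration of the described priority order, not Virtuoso's internal function: the name resolve_params, the sample pattern and the sample URLs are hypothetical, Python's re and urllib.parse stand in for Virtuoso's regex engine and split_and_decode(), and normalization is omitted.

```python
import re
from urllib.parse import parse_qs, urlsplit


def resolve_params(rule_regex, match_names, wanted, resource_url, post_body=None):
    """Return a value for each name in `wanted`, or None if the rule does not apply."""
    match = re.match(rule_regex, resource_url)
    if match is None:
        return None  # match failed: the next rule would be tried

    # Three candidate sources, listed in order of priority.
    from_match = dict(zip(match_names, match.groups()))
    from_query = {k: v[0] for k, v in parse_qs(urlsplit(resource_url).query).items()}
    from_body = {k: v[0] for k, v in parse_qs(post_body or "").items()}

    values = []
    for name in wanted:
        for source in (from_match, from_query, from_body):
            if name in source:
                values.append(source[name])
                break
        else:
            return None  # no source supplies this value: rule is not applied
    return values


# Hypothetical rule: capture an item id from the path.
print(resolve_params(r"/item/([^/?#]+)", ["id"], ["id", "lang"], "/item/42?lang=en"))
```

A value captured by the regex wins over a query-string parameter of the same name, which in turn wins over a POST body parameter.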

Enabling URL Rewriting via the Virtuoso Conductor UI

The URL rewriting examples which follow are taken from the Virtuoso Northwind demonstration database, which is included in the Demo VAD (Virtuoso Application Distribution) archive.

To check which version of the Demo VAD is installed, or to upgrade it, refer to the Conductor's 'VAD Packages' screen, reachable through the 'System Admin' > 'Packages' menu items.

The latest VADs for the closed source releases of Virtuoso can be downloaded from the downloads area of the OpenLink website. Select either the 'DBMS (WebDAV) Hosted' or 'File System Hosted' product format from the 'Distributed Collaborative Applications' section, depending on whether you want the Virtuoso application to be run from WebDAV or native filesystem storage. VADs for Virtuoso Open Source edition (VOS) are available for download from the VOS Wiki.

Northwind Demonstration Database

The Virtuoso Northwind database (contained in the "Demo" catalog) is very similar to the Northwind example database available for SQL Server. Its schema comprises commonly understood SQL tables that include: Customers, Orders, Employees, Products, Product Categories, Shippers, Countries, Provinces etc.

Northwind is installed with a preconfigured Linked Data View and a set of preconfigured URL rewrite rules that collectively expose RDF based entity graphs and URLs of (X)HTML web pages that describe the back-end relational data.

A Linked Data View over relational data is a named collection (graph) of RDF records (triples) derived from an RDBMS-to-RDF source data map exposed via a Virtuoso Quad Store. The process of declaring Linked Data Views over RDBMS data using the Virtuoso Meta-schema Language is described in detail in our Linked Data Views of SQL white paper.

To view the Northwind entity graph in RDF format, starting with the entity "ALFKI", simply place the following document URL into the OpenLink Data Explorer:

Alternatively, you can view an (X)HTML based description of the entity "ALFKI" by pointing your Web browser to the same URL. (The details of these URLs will be explained shortly; for now they are presented purely as pointers to illustrate example data available from Northwind.)

Configuring Rewrite Rules using Conductor

The steps for configuring URL Rewrite rules via the Virtuoso Conductor are as follows:

  1. Navigate to the "Web Application Server" > "Virtual Domains & Directories" tabs.
    figure 2
    Conductor's Hosted Domains and Virtual Directories screen
  2. Pick the domain that contains the virtual directories to which the rules are to be applied (in this case the default was taken).
    figure 3
    Accessing the URL rewrite rules for the Northwind demo database
  3. Click on the "URL-rewrite" link to create, delete, or edit a rule as shown below.
  4. Create a rule for HTML based representations of resource description requests.
    figure 4
    Northwind URL rewrite rule for HTML requests
  5. Create a rule for N3 or RDF/XML based representations of resource descriptions.
    figure 5
    Northwind URL rewrite rule for RDF requests
  6. Save your rules, exit the Conductor, and test your rules with "cURL" or any other HTTP-based user agent.

Dissection of Northwind Rewrite Rules Configured using Conductor

The screenshots above show the default Northwind rewrite rules. Let's analyze what they are doing.

Regex Rule for RDF Requests

The regex rule for handling RDF/XML or N3 representation requests specifies a 'Request Path Pattern' of (/[^#]*). Recall that the input path is the portion of the input URL from the first '/' after the host:port fields to the end of the normalized URL. So, given a request for Northwind customer ALFKI, the request path pattern would match /Northwind/Customer/ALFKI.

Parentheses in the pattern collect the results of the pattern matching into parameters. Each successive pair of parentheses denotes a parameter, referred to elsewhere in the rewrite rule as $U1, $U2, $U3, ..., or $s1, $s2, $s3, ... These parameters can then be used to substitute the matched part of the input path into the new URL being composed. The parameter markers $U1 and $s1 (likewise $U2 and $s2, etc.) identify the same pattern segment in the request path pattern; they differ only in how the matched text is encoded when it is inserted into the new URL. The 's' format specifier inserts the matched text as is, whereas the 'U' format specifier causes the inserted text to be URL-encoded.
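
The difference between the 's' and 'U' markers can be illustrated with a short Python sketch. The function name expand_markers and the sample template are hypothetical, and Python's quote() merely stands in for Virtuoso's URL encoding, so the exact set of characters escaped by Virtuoso may differ.

```python
import re
from urllib.parse import quote

REQUEST_PATH_PATTERN = r"(/[^#]*)"  # the pattern used by the Northwind rule


def expand_markers(template, input_path):
    """Substitute $sN (raw) and $UN (URL-encoded) markers from the regex groups."""
    match = re.match(REQUEST_PATH_PATTERN, input_path)
    if match is None:
        return None

    def substitute(marker):
        text = match.group(int(marker.group(2)))
        # 's' inserts the matched text as is; 'U' inserts it URL-encoded.
        return text if marker.group(1) == "s" else quote(text, safe="")

    return re.sub(r"\$([sU])(\d+)", substitute, template)


print(expand_markers("raw=$s1 encoded=$U1", "/Northwind/Customer/ALFKI#this"))
```

Both markers refer to the same captured segment (everything up to the '#'); only the encoding applied on insertion changes.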

Content types specified in the request's Accept header and matched by the 'Accept Header Request Pattern' are available for substitution into the rewritten URL through the $accept variable.

Rather than hardcoding host names and ports, the rules are made more generic by using the convenience macro URIQADefaultHost. Every occurrence of ^{URIQADefaultHost}^ will be substituted with the value of the DefaultHost parameter defined in the URIQA section of the Virtuoso configuration file, virtuoso.ini. "DefaultHost" is the "canonical" server name that is used to identify the service. It should be either a server host name including domain (i.e. an FQDN), or an IP address in standard notation. If Virtuoso's default HTTP port is not equal to 80 then the port should also be included, e.g. "".
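
As a rough illustration of this macro substitution, the Python sketch below reads DefaultHost from a sample [URIQA] section and expands the macro in a template. The host value example.com:8890 and the function name expand_default_host are placeholders, and Python's configparser is used only to parse this small sample fragment, not as a general virtuoso.ini parser.

```python
import configparser

# Sample configuration fragment; the host value is a placeholder.
SAMPLE_INI = """
[URIQA]
DefaultHost = example.com:8890
"""


def expand_default_host(template, ini_text):
    """Replace every ^{URIQADefaultHost}^ with the configured DefaultHost."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    host = config["URIQA"]["DefaultHost"]
    return template.replace("^{URIQADefaultHost}^", host)


print(expand_default_host("http://^{URIQADefaultHost}^/Northwind", SAMPLE_INI))
```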

Constructing the Destination Path Format

The parameter markers, variables and macros just described provide the building blocks for constructing the 'Destination Path Format' which serves as a template for the rewritten URL. It must be stressed that it is not necessary to URL-encode the Destination Path Format by hand. You need only write the underlying CONSTRUCT or DESCRIBE SPARQL query. When defining a new Destination Path Format, click on the SPARQL button to enable a text box (shown below) into which you can enter the base SPARQL query which will describe the entity being dereferenced. On clicking the 'Format' button to return, the SPARQL query will be expanded into a full query string, including a result-set format-specifier, and URL-encoded automatically. For example, the base query:

DESCRIBE <http://^{URIQADefaultHost}^$U1#this> <http://^{URIQADefaultHost}^$U1> FROM <http://^{URIQADefaultHost}^/Northwind>
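
The kind of expansion described above can be approximated in Python. This sketch is an assumption about the general shape of the result rather than a reproduction of Conductor's output: the /sparql endpoint path, the query parameter name, and the choice to leave the marker characters '$', '^', '{' and '}' unencoded are all assumptions made for illustration.

```python
from urllib.parse import quote_plus

# Characters left unencoded so rewriter markers and macros survive intact.
MARKER_CHARS = "$^{}"


def to_destination_path(sparql, format_marker="$accept"):
    """URL-encode a base SPARQL query and append a format specifier."""
    encoded = quote_plus(sparql, safe=MARKER_CHARS)
    return f"/sparql?query={encoded}&format={format_marker}"


print(to_destination_path("DESCRIBE <http://^{URIQADefaultHost}^$U1#this>"))
```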



The pre-configured DESCRIBE query for Northwind describes two entities:

http://^{URIQADefaultHost}^/Northwind/Customer/ALFKI identifies a document (an entity of type foaf:Document) that has the entity http://^{URIQADefaultHost}^/Northwind/Customer/ALFKI#this as its foaf:primaryTopic property value. This relationship is the key to using the description of the document (a report) about "ALFKI" to expose the deeper entity graph that describes the entity "ALFKI#this".

figure 6
Defining the SPARQL query for the Northwind RDF requests

Data Flow in Conductor-Defined Northwind RDF Regex Rule

The process of rewriting a request for an RDF representation of Northwind customer ALFKI, through the corresponding regex rule, is depicted below as a data flow diagram. The arcs connecting similarly-colored items attempt to illustrate how portions of the input request are matched and substituted into the rewritten request.

figure 7
Breakdown of the URL rewriting process for Northwind RDF requests

Regex Rule for HTML Requests

The Northwind regex rule for HTML requests functions in a similar way to the regex rule for RDF requests. That is, the mechanisms for pattern matching and parameter substitution are the same. The only differences are the content types matched and the target URL.

In this case, the destination path format is: /about/html/http://^{URIQADefaultHost}^$s1

Here, the path /about/html/ redirects the client to the Virtuoso Sponger proxy interface. The Sponger itself is a highly customizable RDFizer. Virtuoso reserves two paths for the proxy service, '/about/rdf/' and '/about/html/'. (Note: These proxy paths have since been augmented to support a richer slash URI scheme for identifying format variants. Please refer to Appendix B for more details.) The web service takes the target URL following the proxy path and either returns the content "as is" or tries to transform it to RDF. The RDF graph derived from the sponging process is then rendered in one of the RDF serialization formats (RDF/XML or N3) or HTML depending on whether the request specified /about/rdf/ or /about/html/. Thus, the proxy service can be used as middleware for enabling RDF based exploration of non-RDF sources using dedicated RDF browsers or standard (X)HTML browsers.

The mechanism through which Virtuoso composes an HTML rendering of RDF data (whether this be a native RDF description, or one extracted by the Sponger) is via the "description.vsp" rendering template, a specialized Virtuoso Server Page specifically aimed at RDF-model-based resource description. The "description.vsp" template is described in more detail in Appendix A. A usage example covering the description of the entity <> is shown below.

description.vsp HTML rendering of Customer entity ALFKI

Enabling URL Rewriting via Virtuoso PL

While the Conductor UI provides the easiest way to set up URL rewriting, on occasion it may be preferable to configure URL rewriting programmatically using Virtuoso PL.

Exporting Rewrite Rules from Conductor

The Conductor lets you export configured rules as Virtuoso PL, making it easier to use them on another system, for instance. The exported script recreates the rewrite rules using Virtuoso's URL Rewriting Configuration API.

Conductor's 'Export' button for exporting URL rewrite rules

The code listing below shows the exported Northwind rules. Describing the Configuration API and this exported rules file forms the focus of this section.

opts=>vector ('url_rewrite', 'demo_nw_rule_list1'),
'demo_nw_rule_list1', 1,
vector ('demo_nw_rule1', 'demo_nw_rule2'));
'demo_nw_rule1', 1,
vector ('path'),
vector ('path'),
0, 303, NULL );
'demo_nw_rule2', 1,
vector ('path'),
vector ('path', 'path', '*accept*'),
0, NULL, NULL );

Exporting Rewrite Rules from a Script

Use the function DB.DBA.URLREWRITE_DUMP_RULELIST_SQL to export rule lists programmatically. For example, from isql, you can generate the listing shown above by executing:

Defining Virtual Hosts in Virtuoso PL

As can be seen above, the vhost_define() API call is used to define virtual hosts and virtual paths hosted by the Virtuoso HTTP server. URL rewriting is enabled through this function's opts parameter. opts is of type ANY, e.g. a vector of field-value pairs. Numerous fields are recognized for controlling different options. The field name url_rewrite controls URL rewriting; the corresponding field value is the IRI of the rule list to apply.

URL Rewriting Configuration API

Virtuoso includes the following functions for managing URL rewrite rules and rule lists. The names are self-explanatory.

  • DB.DBA.URLREWRITE_DROP_RULE - Deletes a rewrite rule.
  • DB.DBA.URLREWRITE_CREATE_SPRINTF_RULE - Creates a rewrite rule which uses sprintf-based pattern matching.
  • DB.DBA.URLREWRITE_CREATE_REGEX_RULE - Creates a rewrite rule which uses regular expression (regex)-based pattern matching.
  • DB.DBA.URLREWRITE_DROP_RULELIST - Deletes a rewrite rule list.
  • DB.DBA.URLREWRITE_CREATE_RULELIST - Creates a rewrite rule list.
  • DB.DBA.URLREWRITE_ENUMERATE_RULES - Lists all the rules whose IRIs match the specified 'SQL like' pattern.
  • DB.DBA.URLREWRITE_ENUMERATE_RULELISTS - Lists all the rule lists whose IRIs match the specified 'SQL like' pattern.

Creating Rewrite Rules

Rewrite rules take two forms: sprintf-based or regex-based. When used for nice URL to long URL conversion, the only difference between them is the syntax of format strings. The reverse long to nice conversion works only for sprintf-based rules, whereas regex-based rules are unidirectional. For the purpose of describing how to make dereferenceable URIs for Linked Data, we will focus on regex-based rules.

Regex rules are created using the URLREWRITE_CREATE_REGEX_RULE() function.


Function Prototype:

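URLREWRITE_CREATE_REGEX_RULE (
 rule_iri,
 allow_update,
 nice_match,
 nice_params,
 nice_min_params,
 target_compose,
 target_params,
 target_expn := null,
 accept_pattern := null,
 do_not_continue := 0,
 http_redirect_code := null,
 http_headers := null
);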


rule_iri : VARCHAR
  • The rule's name / identifier
allow_update : INTEGER
  • Indicates whether the rule can be updated. 1 indicates yes; 0 indicates no. The update is subject to the following rules:
    • If the given rule_iri is already in use as a rule list identifier, an error is signalled.
    • If the given rule_iri is already in use as a rule identifier and allow_update for the existing rule is zero, an error is signalled.
    • If the given rule_iri is already in use as a rule identifier and allow_update for the existing rule is non-zero, the existing rule is updated.
nice_match : VARCHAR
  • A regex match expression to parse the URL into a vector of occurrences.
nice_params : ANY
  • A vector of the names of the parsed parameters. The length of the vector should be equal to the number of '(...)' specifiers in the format string.
nice_min_params : INTEGER
  • Used to specify the minimum number of sprintf format patterns to be matched in order to trigger the given rule. It only affects sprintf rules and has no effect for regex rules.
target_compose : VARCHAR
  • A regex compose expression for the URL of the destination page.
target_params : ANY
  • A vector of names of parameters that should be passed to the compose expression (target_compose) as $1, $2 and so on.
target_expn : VARCHAR
  • Optional SQL text that should be executed instead of a regex compose call.
accept_pattern : VARCHAR
  • A regex expression to match the HTTP Accept header
do_not_continue : INTEGER
  • If the given rule satisfies the match conditions, 1 signifies do not try the next rule from same rule list, and 0 signifies try the next rule.
http_redirect_code : INTEGER
  • NULL or one of the integer values 301, 302, 303, or 406 is currently allowed. If a 3xx redirect code is given, an HTTP redirect response will be sent back to the client. If NULL is specified, the server will process the redirect internally.
http_headers : VARCHAR
  • HTTP headers to supply with the rewritten request.

Dissection of Northwind Rewrite Rules Configured using Virtuoso PL

Having briefly outlined the URL Rewriting API, we return now to the Northwind rule configuration script listed earlier.

At the start of the script, we define a virtual directory in order to turn on URL rewriting through vhost_define(). We first remove any existing definition for logical path /Northwind on the virtual host defined by vhost, before redefining the logical path. vhost specifies the host name sent to a user-agent in an HTTP response. This must be a valid fully-qualified host name or alias and port separated by ':'. This parameter accepts the special value '*ini*' which will be replaced with the hostname and port configured in the virtuoso.ini file.

The /Northwind virtual directory is mapped to a DAV folder (indicated by is_dav being non-zero) whose physical path is /DAV/home/demo. The machine hosting the virtual directory listens on the IP address and port specified by lhost (i.e. listen host). Like vhost, this accepts the special value '*ini*'. Any VSP pages contained in the virtual directory will run as user 'dba'.

URL rewriting is enabled through the url_rewrite field in the opts vector; the URL rewriter will use the rule list named demo_nw_rule_list1. The latter is defined by the URLREWRITE_CREATE_RULELIST function call which follows. The rule list contains two regex-based rules, demo_nw_rule1 and demo_nw_rule2, each defined by calls to function URLREWRITE_CREATE_REGEX_RULE.

Consider first rule demo_nw_rule2. In this rule, the regular expression '(/[^#]*)' specified for nice_match matches the input IRI up to the fragment delimiter (#). The corresponding occurrence is named 'path' in the nice_params vector. The client must be requesting the return data as RDF serialized as N3 or RDF/XML in order for the rule to apply.

Argument target_compose specifies a URL-encoded template for the rewritten destination URL. Spaces are encoded as '+' or '%20', the reserved character '#' is percent-encoded as '%23', and a literal '%' character is itself escaped by doubling it ('%%').

Removing the URL encoding and the final format specifier ('&format=%U'), the SPARQL DESCRIBE query being built takes the form: DESCRIBE <http://^{URIQADefaultHost}^%U#this> <http://^{URIQADefaultHost}^%U> FROM <http://^{URIQADefaultHost}^/Northwind>

Unsurprisingly this is almost identical to the SPARQL query displayed by Conductor, when the same rewrite rules are viewed through the Conductor UI. The only difference lies in the slightly different syntax used for parameter markers ( %U or %s , as opposed to $U1, $U2, ... or $s1, $s2, ... in Conductor). Here, the two sprintf-like format characters %U are placeholders which receive the first two entries in the target_params vector, i.e. the value of 'path'. In our example, the value of 'path' would be '/Northwind/Customer/ALFKI'.
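
The way positional placeholders draw their values from the target_params vector can be sketched in Python. The function compose_target and the shortened template are hypothetical, and quote() only approximates Virtuoso's '%U' encoding; the point is that each '%U' or '%s' consumes the next name in target_params, while a doubled '%' yields a literal '%'.

```python
import re
from urllib.parse import quote


def compose_target(target_compose, target_params, values):
    """Fill sprintf-like %s / %U placeholders, in order, from named values."""
    names = iter(target_params)

    def fill(marker):
        kind = marker.group(1)
        if kind == "%":
            return "%"  # a doubled percent sign is a literal '%'
        text = values[next(names)]
        return text if kind == "s" else quote(text, safe="")

    return re.sub(r"%([sU%])", fill, target_compose)


template = "q=%%3C%U%%23this%%3E+%%3C%U%%3E&format=%U"  # shortened, hypothetical
values = {"path": "/Northwind/Customer/ALFKI", "*accept*": "application/rdf+xml"}
print(compose_target(template, ["path", "path", "*accept*"], values))
```

Listing 'path' twice in target_params is what lets the same matched text fill both of the first two placeholders.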

The query response format is controlled by the format query parameter. In the format specifier ('&format=%U') at the end of the constructed query string, the third placeholder '%U' receives the value of the third entry in the target_params vector, '*accept*'. The '*accept*' parameter is used to pass the part of Accept header matched against accept_pattern, e.g. if the Accept header specified MIME types of 'application/rdf+xml, application/xml' and the accept_pattern is '(text/rdf.n3)|(application/rdf.xml)', then the '*accept*' parameter will have the value of 'application/rdf+xml'.
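
The Accept header example above can be checked with Python's re module, used here as a stand-in for Virtuoso's regex engine. The function name matched_accept is hypothetical.

```python
import re

ACCEPT_PATTERN = r"(text/rdf.n3)|(application/rdf.xml)"


def matched_accept(accept_header, accept_pattern=ACCEPT_PATTERN):
    """Return the part of the Accept header matched by the pattern, or None."""
    match = re.search(accept_pattern, accept_header)
    return match.group(0) if match else None


print(matched_accept("application/rdf+xml, application/xml"))
```

Note that the '.' in the pattern matches any single character, which is how 'application/rdf.xml' matches 'application/rdf+xml'.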

The other rule, demo_nw_rule1, is essentially similar, but targeted at HTML browsers rather than RDF browsers. Rather than the internal redirect used by demo_nw_rule2, this rule returns HTTP redirect code 303 to the client when the rewrite rule is applied.

Internal Rewrites vs External Redirects

External redirect : Tells the client to ask for the requested content again using a new URL and HTTP request. An external redirect is indicated by one of the HTTP response codes:
301 - Moved permanently (for permanent redirection)
302 - Found (the most common way of performing a redirection)
303 - See Other (the correct manner in which to redirect web applications to a new URI)

Internal rewrite/redirect : Gets the content for the requested URL from a different server file path than implied by the requested URL.
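
The distinction can be made concrete with a small Python sketch of how a server might act on a rule's http_redirect_code. The function respond and its fetch callback are hypothetical and do not reflect Virtuoso's implementation; the 406 code is deliberately not modelled.

```python
def respond(target_url, http_redirect_code, fetch):
    """Return (status, headers, body) for a request rewritten to target_url."""
    if http_redirect_code is None:
        # Internal rewrite: content is fetched from the rewritten location
        # and returned directly; the client never sees target_url.
        return 200, {}, fetch(target_url)
    if http_redirect_code in (301, 302, 303):
        # External redirect: the client is told to issue a second request.
        return http_redirect_code, {"Location": target_url}, b""
    raise ValueError(f"redirect code not modelled here: {http_redirect_code}")


def fake_fetch(url):
    """Placeholder for whatever produces the content at the rewritten URL."""
    return b"<rdf/>"


print(respond("/about/html/example", 303, fake_fetch))
print(respond("/sparql?query=example", None, fake_fetch))
```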

As described earlier when examining the Conductor-configured rules, HTML requests are redirected to description.vsp via the Sponger proxy interface.

System Tables Supporting URL Rewriting

If you need to check your rewrite rule definitions, an alternative to inspecting them using Conductor is to query Virtuoso's system tables directly. The relevant system tables for URL rewriting are DB.DBA.URL_REWRITE_RULE_LIST and DB.DBA.URL_REWRITE_RULE. For example, the configured rule lists can be seen by executing 'SELECT URRL_LIST FROM DB.DBA.URL_REWRITE_RULE_LIST'

Data Flow in Virtuoso/PL-Defined Northwind RDF Regex Rule

Earlier we presented a data flow diagram showing the process of rewriting a request for an RDF representation of Northwind customer ALFKI, through a regex rule defined in the Conductor. Below is a similar diagram, depicting the same request rewrite, this time using the Virtuoso PL definition of the same rule. As before, the arcs connecting similarly coloured items illustrate how portions of the input request are matched and substituted into the rewritten request.

Breakdown of the URL rewriting process for Northwind RDF requests

Northwind URL Rewriting Verification Using cURL

As illustrated earlier, the curl utility provides a useful tool for verifying HTTP server responses and rewrite rules. The first two curl exchanges below show the default Northwind URL rewrite rules being applied.

Example 1:
$ curl -I -H "Accept: text/html"
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Fri, 06 Feb 2009 11:11:01 GMT
Accept-Ranges: bytes
Content-Length: 0

Example 2:

$ curl -I -H "Accept: application/rdf+xml"
HTTP/1.1 200 OK
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: Keep-Alive
Date: Fri, 06 Feb 2009 11:14:49 GMT
Accept-Ranges: bytes
Content-Type: application/rdf+xml; charset=UTF-8
Content-Length: 9488

Example 3:

$ curl -I -H "Accept: application/rdf+xml"
HTTP/1.1 303 See Other
Server: Virtuoso/05.09.3037 (Solaris) x86_64-sun-solaris2.10-64 VDB
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 12 Feb 2009 11:23:31 GMT
Accept-Ranges: bytes
Content-Length: 0

The third example shows the response generated when the default rule for RDF requests is changed to return an HTTP response code of 303, rather than use an internal redirect. Making this temporary change allows the generated SPARQL query to be viewed and checked with curl.
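
The same kind of check can be scripted. The Python sketch below builds a HEAD request with a chosen Accept header and, when run against a live server, reports the status code and any Location header without following redirects. The host example.com:8890 is a placeholder, not a value taken from this guide, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def build_check(url, accept):
    """Build a HEAD request carrying the given Accept header."""
    return urllib.request.Request(url, method="HEAD", headers={"Accept": accept})


def run_check(request):
    """Send the request; return (status code, Location header or None)."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        # Unfollowed redirects and error statuses arrive here.
        return err.code, err.headers.get("Location")


# Placeholder host; substitute your own server before calling run_check().
request = build_check("http://example.com:8890/Northwind/Customer/ALFKI", "text/html")
print(request.get_method(), request.get_header("Accept"))
```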
