This HTML5 document contains 37 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Prefix   Namespace IRI
dcterms  http://purl.org/dc/terms/
atom     http://atomowl.org/ontologies/atomrdf#
n18      http://test.example.
foaf     http://xmlns.com/foaf/0.1/
n7       http://vos.openlinksw.com/dataspace/services/wiki/
opl      http://www.openlinksw.com/schema/attribution#
n2       http://vos.openlinksw.com/dataspace/owiki/wiki/VOS/
dc       http://purl.org/dc/elements/1.1/
n9       http://vos.openlinksw.com:80/wiki/main/VOS/VirtRdfReplScenarios/topo-scenario.
n13      http://vos.openlinksw.com/dataspace/dav#
rdfs     http://www.w3.org/2000/01/rdf-schema#
n8       http://rdfs.org/sioc/services#
virtrdf  http://www.openlinksw.com/schemas/virtrdf#
sioct    http://rdfs.org/sioc/types#
n11      http://vos.openlinksw.com/dataspace/person/dav#
n5       http://vos.openlinksw.com/dataspace/owiki/wiki/
n23      http://docs.openlinksw.com/virtuoso/functions.html#
n20      http://docs.openlinksw.com/virtuoso/fn_rdf_repl_graph_ins/
n22      http://vos.openlinksw.com/dataspace/owiki/wiki/VOS/VirtRdfReplScenarios/sioc.
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
n17      http://vos.openlinksw.com/dataspace/owiki#
xsdh     http://www.w3.org/2001/XMLSchema#
n16      http://vos.openlinksw.com/dataspace/person/owiki#
sioc     http://rdfs.org/sioc/ns#
Subject Item
n11:this
foaf:made
n2:VirtRdfReplScenarios
Subject Item
n13:this
sioc:creator_of
n2:VirtRdfReplScenarios
Subject Item
n7:item
n8:services_of
n2:VirtRdfReplScenarios
Subject Item
n17:this
sioc:creator_of
n2:VirtRdfReplScenarios
Subject Item
n5:VOS
sioc:container_of
n2:VirtRdfReplScenarios
atom:entry
n2:VirtRdfReplScenarios
atom:contains
n2:VirtRdfReplScenarios
Subject Item
n2:VOSIndex
sioc:links_to
n2:VirtRdfReplScenarios
Subject Item
n2:VirtRdfReplScenarios
rdf:type
sioct:Comment atom:Entry
dcterms:created
2017-06-13T05:49:29.654270
dcterms:modified
2020-01-19T14:39:44.581198
rdfs:label
VirtRdfReplScenarios
foaf:maker
n11:this n16:this
dc:title
VirtRdfReplScenarios
opl:isDescribedUsing
n22:rdf
sioc:has_creator
n13:this n17:this
sioc:attachment
n9:png
sioc:content
---+ Virtuoso RDF Replication scenarios

%TOC%

---++ Introduction

<table class="image" align="center" border="0">
<caption align="bottom">Network Topology</caption>
<tr><td>
<img src="%ATTACHURLPATH%/topo-scenario.png" style="wikiautogen" alt="[Graphical representation of the Network]"/>
</td></tr>
</table>

In this document we examine a proposed setup in which a back-end server called MASTER publishes a number of graphs to a set of front-end machines called FARM-1 .. FARM-n, and we discuss a couple of common scenarios, such as adding an extra machine to the farm or replacing a broken instance of MASTER.

In this example we assume that each Virtuoso instance runs on its own machine, so all instances can use the same port numbers for both the main server (default 1111) and the HTTP port (default 8890), as each machine has a unique IP address.

In the examples we use <tt>MASTER-IP</tt> and <tt>FARM-x-IP</tt>, which should be replaced by either the real IP address or the DNS name of the machine in question.

Since there will be a reverse-proxy service in front of the farm, all Virtuoso instances should have the URIQA DefaultHost set to the outside name of this service. In this example we use http://test.example.com as the web service we are trying to set up.

---++ Setup

---+++ Installing Virtuoso

All machines in this setup should be installed with similar installation paths, such as:

   * <tt>/opt/virtuoso</tt>
   * <tt>/dbs/virtuoso</tt>
   * <tt>/virtuoso</tt>
   * ...

The partition should be big enough to hold the Virtuoso binaries and libraries, the transaction logs, the backups and, if you do not want to use the striping feature of Virtuoso, the main database files as well.

Here are the quick installation steps:

   1. Log in as root
   1. Create a local user called virtuoso, using the chosen installation path as its home directory
   1. Log in as virtuoso
   1. Extract <tt>virtuoso-universal-server-6.1.tar</tt> in the home directory
   1. 
Run <code>sh install.sh</code> to install Virtuoso
   1. Remove the files <tt>install.sh</tt>, <tt>virtuoso-universal-server-6.1.tar</tt> and <tt>virtuoso-server.taz</tt> if they are not otherwise needed
   1. Run <code>bin/virtuoso-stop.sh</code> to shut down this Virtuoso instance
   1. Install <tt>virtuoso.lic</tt> for this system in the <tt>$HOME/bin</tt> directory

As the replication process needs to make an ODBC connection to the MASTER machine, all machines should have the following information in <tt>$HOME/bin/odbc.ini</tt>:

<verbatim>
[ODBC Data Sources]
..
MASTER_DSN = OpenLink Virtuoso
..

[MASTER_DSN]
Driver = OpenLink Virtuoso
Address = MASTER-IP:1111
</verbatim>

---+++ Setting up MASTER

The MASTER machine is the back-end server machine. Various applications feed SPARQL data into this machine, and it publishes a set of graphs using RDF replication.

The MASTER machine should ideally be equipped with multiple redundant disks in *RAID-1* or *RAID-6* mode to minimize the risk that a single bad disk takes down the system.

From a Virtuoso point of view, we use online backups combined with the checkpoint audit trail to back up the content of the database in a safe way. The online backups, the checkpoint audit trail and the replication logs can also be copied to secondary storage using the <tt>rsync</tt> command, which can easily be scripted as a cron job.

Changes to database/virtuoso.ini:

<verbatim>
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 1 ; enable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = MASTER
ServerEnable = 1
QueueMax = 5000000
...
</verbatim>

Once the MASTER is started using the <tt>bin/virtuoso-start.sh</tt> script, we must enable RDF replication before we start adding data to the graphs we wish to replicate, so that every record is accounted for by the replication process.
If there is existing data in the graphs to be published, this data needs to be added to each subscriber manually, since the replication process only carries the delta of changes made after publishing was enabled.

To enable publishing of the graph, we use the <tt>isql</tt> program to connect to the MASTER instance:

<verbatim>
$ isql MASTER-IP:1111

-- and run the following commands:

-- enable this instance as a publisher
rdf_repl_start();

-- add graphs to replication list
rdf_repl_graph_ins('http://test.example.com');
</verbatim>

*Note*: You can pass a comma-delimited list of graphs to be published to the [[http://docs.openlinksw.com/virtuoso/fn_rdf_repl_graph_ins/][rdf_repl_graph_ins()]] function, or you can specify the special http://www.openlinksw.com/schemas/virtrdf#rdf_repl_all graph IRI to publish *ALL* graphs for replication to subscribers, as detailed in the [[http://docs.openlinksw.com/virtuoso/fn_rdf_repl_graph_ins/][rdf_repl_graph_ins()]] function reference guide.

Next we create a backup directory inside the database directory and set up the online backup, again using the <tt>isql</tt> program:

<verbatim>
$ cd database
$ mkdir backup
$ isql MASTER-IP:1111

-- and run the following commands:

-- clear any previous context
backup_context_clear();

-- start the backup
backup_online ('bkup-#', 1000000, 0, vector ('backup'));
</verbatim>

The following files can now be backed up to another machine using rsync or a similar tool:

|*Files*|*Description*|
|database/backup/*.bp|the incremental backup files|
|database/virtuoso.trx|the main transaction log containing the most recent updates to the database that have not been checkpointed into the database|
|database/virtuosoTIMESTAMP.trx|all the previous transaction logs, which can be used to reconstruct the database|
|database/<verbatim>__</verbatim>rdf_repl*.log|all the replication logs containing the changes to the published graph|

NOTE: Since the database is constantly modified during operation, it is of NO use to back up the
virtuoso.db using an rsync script unless the Virtuoso instance was shut down beforehand, or certain extra precautions are taken, which we explain later on.

---+++ Setup SPARE master

The SPARE machine is a replica of the MASTER machine. It subscribes to the publication of the MASTER to keep an exact copy of the RDF graphs, and it also publishes this data itself, initially without any subscribers.

The SPARE machine should ideally be equipped similarly to the MASTER machine, with multiple redundant disks in *RAID-1* or *RAID-6* mode to minimize the risk that a single bad disk takes down the system.

From a Virtuoso point of view, we use online backups combined with the checkpoint audit trail to back up the content of the database in a safe way. The online backups, the checkpoint audit trail and the replication logs can also be copied to secondary storage using the <tt>rsync</tt> command, which can easily be scripted as a cron job.

Changes to database/virtuoso.ini:

<verbatim>
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 1 ; enable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = SPARE
ServerEnable = 1
QueueMax = 5000000
...
</verbatim>

We must enable RDF replication before we start adding data to the graphs we wish to replicate, so that every record is accounted for by the replication process. If there is existing data in the graphs to be published, this data needs to be added to each subscriber manually, since the replication process only carries the delta of changes made after publishing was enabled.
To enable publishing of the graph, as well as subscribing to the MASTER, we first start up this Virtuoso instance with <tt>bin/virtuoso-start.sh</tt> and then use the <tt>isql</tt> program to connect to the SPARE instance:

<verbatim>
$ bin/virtuoso-start.sh
$ isql SPARE-IP:1111

-- and run the following commands:

-- enable this instance as a publisher
rdf_repl_start();

-- add graphs to replication list
rdf_repl_graph_ins('http://test.example.com');

-- connect to master
repl_server ('MASTER', 'MASTER_DSN');

-- start subscribing to __rdf_repl
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');

-- start initial replication
repl_sync_all ();

-- add subscription to scheduler
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1);
</verbatim>

Next we create a backup directory inside the database directory and set up the online backup, again using the <tt>isql</tt> program:

<verbatim>
$ cd database
$ mkdir backup
$ isql SPARE-IP:1111

-- and run the following commands:

-- clear any previous context
backup_context_clear();

-- start the backup
backup_online ('bkup-#', 1000000, 0, vector ('backup'));
</verbatim>

The following files can now be backed up to another machine using rsync or a similar tool:

|*Files*|*Description*|
|database/backup/*.bp|the incremental backup files|
|database/virtuoso.trx|the main transaction log containing the most recent updates to the database that have not been checkpointed into the database|
|database/virtuosoTIMESTAMP.trx|all the previous transaction logs, which can be used to reconstruct the database|
|database/<verbatim>__</verbatim>rdf_repl*.log|all the replication logs containing the changes to the published graph|

Note: Since the database is constantly modified during operation, it is of NO use to back up the virtuoso.db using an rsync script unless the Virtuoso instance was shut down beforehand, or certain extra precautions are taken, which we explain later on.
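
The file table above can be turned into a source list for a cron-driven copy. The following is only a minimal sketch, assuming the database directory is laid out exactly as in the table; the function name <tt>list_backup_files</tt> is hypothetical and not part of Virtuoso. It prints the files that may be copied while the instance is running and deliberately leaves out virtuoso.db.

```shell
#!/bin/sh
# list_backup_files DBDIR  (hypothetical helper, not a Virtuoso tool)
# Prints, one per line, the files the table above lists as safe to copy
# while Virtuoso is running. virtuoso.db is deliberately never listed.
list_backup_files() {
    dbdir=$1
    # virtuoso*.trx covers both virtuoso.trx and virtuosoTIMESTAMP.trx
    for f in "$dbdir"/backup/*.bp \
             "$dbdir"/virtuoso*.trx \
             "$dbdir"/__rdf_repl*.log
    do
        # an unmatched glob stays literal, so keep only paths that exist
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}
```

Run from the installation directory, the output could be handed to rsync, for example <code>list_backup_files database | rsync -av --files-from=- . virtuoso@BACKUP-HOST:/path/to/backup/</code>, where <tt>BACKUP-HOST</tt> and the target path are placeholders for your own secondary storage.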
---+++ Setup FARM-1

The FARM-1 machine is the first front-end server machine. It subscribes to the publication of the MASTER instance to keep up to date.

The FARM-1 machine can run on simpler hardware than the MASTER instance. It does not require the same level of redundancy in terms of hard disks etc., as there are a number of these machines running in parallel, each capable of returning results to the proxy.

If one FARM machine dies, it can simply be taken off the reverse-proxy list, then repaired or replaced with a fresh machine before it is added back to the list of servers in the reverse proxy. As such it does not need to be backed up separately, although we could make a backup of this installation to quickly install the rest of the identical FARM boxes.

Change the database/virtuoso.ini file:

<verbatim>
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 0 ; disable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = FARM-1 ; each FARM machine needs to have a unique replication name
ServerEnable = 1
QueueMax = 5000000
...
</verbatim>

Next we start up the Virtuoso instance using the <tt>bin/virtuoso-start.sh</tt> command and use the <tt>isql</tt> program to subscribe to the MASTER:

<verbatim>
$ bin/virtuoso-start.sh
$ isql FARM-1-IP:1111

-- connect to master
repl_server ('MASTER', 'MASTER_DSN');

-- start subscribing to __rdf_repl
repl_subscribe ('MASTER', '__rdf_repl', 'dav', 'dav', 'dba', 'dba');

-- start initial replication
repl_sync_all ();

-- add subscription to scheduler
DB.DBA.SUB_SCHEDULE ('MASTER', '__rdf_repl', 1);
</verbatim>

At this point we can shut down this Virtuoso instance using the <tt>bin/virtuoso-stop.sh</tt> command and make a copy of the whole Virtuoso installation as a blueprint to copy to another FARM-x machine.
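
Because every machine needs its own replication name, the <tt>[Replication]</tt> section is the one part of virtuoso.ini that differs between otherwise identical FARM boxes. As a rough sketch, the section could be generated per machine. The function name <tt>replication_stanza</tt> is hypothetical, and the <tt>ServerEnable</tt> and <tt>QueueMax</tt> values are simply copied from the FARM-1 example above.

```shell
#!/bin/sh
# replication_stanza NAME  (hypothetical helper, not a Virtuoso tool)
# Prints the [Replication] section for one machine. Only ServerName varies;
# the other two values are copied from the FARM-1 example above.
replication_stanza() {
    name=$1
    if [ -z "$name" ]; then
        echo "usage: replication_stanza NAME" >&2
        return 1
    fi
    cat <<EOF
[Replication]
ServerName = $name
ServerEnable = 1
QueueMax = 5000000
EOF
}
```

For example, <code>replication_stanza FARM-2</code> prints the section for a second farm machine. The output still has to be merged into that machine's virtuoso.ini by hand, since this sketch does not edit the file in place.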
---+++ Setup FARM-2 from scratch

We can repeat the same steps we did for the FARM-1 machine, making sure we use FARM-2 as the replication name in the database/virtuoso.ini file and FARM-2-IP:1111 as the argument to the <tt>isql</tt> program.

Change database/virtuoso.ini:

<verbatim>
[Replication]
ServerName = FARM-2
</verbatim>

---+++ Setup FARM-3 using blueprint from FARM-1 installation

Extract the tarred/zipped copy of the installation made at the end of the setup of FARM-1. Before starting up the instance, we only need to give it a unique name for replication.

Change database/virtuoso.ini:

<verbatim>
[Replication]
ServerName = FARM-3
</verbatim>

Next we start up the Virtuoso instance using the <tt>bin/virtuoso-start.sh</tt> command. Since the subscription and its schedule were already set up on FARM-1 before the blueprint was made, we just use the <tt>isql</tt> program to rename the server and perform a sync against the MASTER:

<verbatim>
$ bin/virtuoso-start.sh
$ isql FARM-3-IP:1111

-- change replication name
DB.DBA.REPL_SERVER_RENAME ('FARM-1', 'FARM-3');

-- sync against master
repl_sync_all();
</verbatim>

---+++ Setup FARM-4 using clone of FARM-1

If the system has been running for some time, it may not be practical to replicate from the start, so there is an alternative way to set up a new FARM-4 machine.

We can either restore the blueprint backup we made at the end of the FARM-1 installation, or do a fresh installation of Virtuoso on the FARM-4 machine. In both cases we shut down the Virtuoso instance and remove the database, as we are going to replace it.

<verbatim>
$ bin/virtuoso-stop.sh
$ cd database
$ rm virtuoso.db virtuoso.trx virtuoso.log virtuoso.pxa
</verbatim>

Change the database/virtuoso.ini file:

<verbatim>
...
[Parameters]
SchedulerInterval = 1 ; run the internal scheduler every minute
CheckpointAuditTrail = 0 ; disable audit trail on transaction logs
CheckpointInterval = 60 ; perform an automated checkpoint every 60 minutes
...
[URIQA]
DefaultHost = test.example.com
...
[Replication]
ServerName = FARM-4 ; each FARM machine needs to have a unique replication name
ServerEnable = 1
QueueMax = 5000000
...
</verbatim>

Next we temporarily disable checkpointing on the FARM-1 machine so we can copy its database without risking corruption:

<verbatim>
$ isql FARM-1-IP:1111

-- disable automatic checkpointing
checkpoint_interval (-1);

-- and do an explicit checkpoint
checkpoint;
</verbatim>

It is now safe to copy the database across using the rsync command:

<verbatim>
$ rsync -avz virtuoso@FARM-1-IP:/path/to/virtuoso/database/virtuoso.db database/virtuoso.db
</verbatim>

Next we re-enable the checkpoint interval on FARM-1:

<verbatim>
$ isql FARM-1-IP:1111

-- re-enable checkpointing
checkpoint_interval(60);
</verbatim>

The last step is to start the database on FARM-4, rename the server and sync against the MASTER:

<verbatim>
$ bin/virtuoso-start.sh
$ isql FARM-4-IP:1111

-- change replication name
DB.DBA.REPL_SERVER_RENAME ('FARM-1', 'FARM-4');

-- sync against master
repl_sync_all();
</verbatim>

---++ Related

   * [[VirtGraphReplication][Virtuoso RDF Graph Replication Guide]]
   * [[VirtGraphReplicationPSQL][Howto setup RDF GRAPH Replication using the Virtuoso API]]
   * [[http://docs.openlinksw.com/virtuoso/functions.html#repl][Virtuoso API for replication]]
sioc:id
497ca8f72836b2781d4b84e68dbb9bfa
sioc:link
n2:VirtRdfReplScenarios
sioc:has_container
n5:VOS
n8:has_services
n7:item
atom:title
VirtRdfReplScenarios
sioc:links_to
virtrdf:rdf_repl_all n2:VirtGraphReplication n18:com n20: n23:repl n2:VirtGraphReplicationPSQL
atom:source
n5:VOS
atom:author
n11:this
atom:published
2017-06-13T05:49:29Z
atom:updated
2020-01-19T14:39:44Z
sioc:topic
n5:VOS