<docbook><section><title>VirtSetCrawlerJobsGuide</title><para> </para>
<title> Quad Store Data Loading via Virtuoso&#39;s In-built Content Crawler</title>
<para>This guide covers the use of Virtuoso&#39;s in-built content crawler as a mechanism for scheduled or one-off data loading operations for its native quad store.</para>
<bridgehead class="http://www.w3.org/1999/xhtml:h2"> Why is this important?</bridgehead>
<para>Transforming external data sources into Linked Data &quot;on the fly&quot; (e.g., via the &#39;Sponger&#39;) is sufficient for many use cases, but sometimes the volume or nature of a data source makes batch-loading necessary.
 For example, Freebase offers RDF representations of its data, but it doesn&#39;t publish RDF dumps; even if it did, such dumps would usually be outdated by the time they were loaded.
 Thus, a scheduled crawl of that resource collection offers a viable alternative.</para>
<bridgehead class="http://www.w3.org/1999/xhtml:h2"> How to Set Up the Content Crawler for Linked Data Generation and Import</bridgehead>
<para>The Virtuoso Conductor can be used to set up various Content Crawler Jobs:</para>
<itemizedlist mark="bullet" spacing="compact"><listitem><ulink url="http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler">Setting up a Content Crawler Job to Import Linked Data into the Virtuoso Quad Store</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSitemaps">Setting up a Content Crawler Job to Retrieve Sitemaps</ulink> (when the source includes RDFa) </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps">Setting up a Content Crawler Job to Retrieve Semantic Sitemaps</ulink> (a variation of the standard sitemap) </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideDirectories">Setting up a Content Crawler Job to Retrieve Content from Specific Directories</ulink> </listitem>
<listitem><ulink url="VirtCrawlerGuideAtom">Setting up a Content Crawler Job to Retrieve Content from an Atom Feed</ulink> </listitem>
<listitem><ulink url="VirtCrawlerSPARQLEndpoints">Setting up a Content Crawler Job to Retrieve Content from a SPARQL Endpoint</ulink> </listitem>
</itemizedlist></section></docbook>