<docbook><section><title>VirtCrawlerGuideAtom</title><para> </para>
<title>Virtuoso Crawler Guide for populating Virtuoso Quad Store using ATOM feed</title>Virtuoso Crawler Guide for populating Virtuoso Quad Store using ATOM feed
<bridgehead class="http://www.w3.org/1999/xhtml:h2">What?</bridgehead>
<para>This Guide demonstrates populating the Virtuoso Quad Store using ATOM feed.</para>
<bridgehead class="http://www.w3.org/1999/xhtml:h2">Why?</bridgehead>
<para>Populating the Virtuoso Quad Store can be done in different ways Virtuoso supports.
 The Conductor -&gt; Content Import UI offers plenty of options, one of which is the XPath expression for crawling RDF resources URLs and this feature is a powerful and easy-to-use for managing the Quad Store.</para>
<bridgehead class="http://www.w3.org/1999/xhtml:h2">How?</bridgehead>
 To populate the Virtuoso Quad Store, in this Guide we will use a XPAth expression for the URLs of the RDF resources references in a given ATOM feed.
 For ex.
 <ulink url="http://data.libris.kb.se/nationalbibliography/feed/">this one</ulink> of the &quot;National Bibliography&quot; Store.<bridgehead class="http://www.w3.org/1999/xhtml:h3">Sample Scenario</bridgehead>
<orderedlist spacing="compact"><listitem>Go to <ulink url="http://cname/conductor">http://cname/conductor</ulink> </listitem>
<listitem>Enter dba credentials </listitem>
<listitem>Go to Web Application Server -&gt; Content Management -&gt; Content Imports: <ulink url="VirtCrawlerGuideAtom/cra1.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra1.png" /></figure></ulink> </listitem>
<listitem>Click &quot;New Target&quot;: <ulink url="VirtCrawlerGuideAtom/cra2.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra2.png" /></figure></ulink> </listitem>
<listitem>In the presented form specify respectively: <itemizedlist mark="bullet" spacing="compact"><listitem><emphasis>Crawl Job Name</emphasis>: for ex.
 National Bibliography ; </listitem>
<listitem><emphasis>Data Source Address (URL)</emphasis>: for ex.
 <ulink url="http://data.libris.kb.se/nationalbibliography/feed/">http://data.libris.kb.se/nationalbibliography/feed/</ulink> ; <itemizedlist mark="bullet" spacing="compact"><listitem><emphasis>Note</emphasis>: the entered URL will be the graph URI for storing the imported RDF data.
 You can also set it explicitly by entering another graph URI in the &quot;If Graph IRI is unassigned use this Data Source URL:&quot; option.
</listitem>
</itemizedlist></listitem>
<listitem><emphasis>Local <ulink url="WebDAV">WebDAV</ulink> Identifier </emphasis>: for ex.
<programlisting>/DAV/temp/nbio/ 
</programlisting></listitem>
<listitem><emphasis>XPath expression for links extraction</emphasis>: <programlisting>//entry/link/@href
</programlisting></listitem>
<listitem><emphasis>Update Interval (minutes)</emphasis>: for ex.
 10 ; </listitem>
<listitem><emphasis>Run Sponger</emphasis>: hatch this check-box ; </listitem>
<listitem><emphasis>Accept RDF</emphasis>: hatch this check-box ; </listitem>
<listitem><emphasis>Store metadata</emphasis>: hatch this check-box ; </listitem>
<listitem><emphasis>RDF Cartridge</emphasis>: hatch this check-box and specify what cartridges will be used: <ulink url="VirtCrawlerGuideAtom/cra3.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra3.png" /></figure></ulink> <ulink url="VirtCrawlerGuideAtom/cra4.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra4.png" /></figure></ulink> <ulink url="VirtCrawlerGuideAtom/cra5.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra5.png" /></figure></ulink> </listitem>
</itemizedlist></listitem>
<listitem>Click &quot;Create&quot;: </listitem>
<listitem>The new created target should be displayed in the list of available Targets: <ulink url="VirtCrawlerGuideAtom/cra7.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra7.png" /></figure></ulink> </listitem>
<listitem>Click &quot;Import Queues&quot;: <ulink url="VirtCrawlerGuideAtom/cra8.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra8.png" /></figure></ulink> </listitem>
<listitem>Click for &quot;National Bibliography&quot; target the &quot;Run&quot; link from the very-right &quot;Action&quot; column: </listitem>
<listitem>Should be presented list of Top pending URLs: <ulink url="VirtCrawlerGuideAtom/cra9.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra9.png" /></figure></ulink> </listitem>
<listitem>Finally when the import is finished, should be shown the total URLs that were processed: <ulink url="VirtCrawlerGuideAtom/cra10.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra10.png" /></figure></ulink> </listitem>
<listitem>Click &quot;Back&quot; <ulink url="VirtCrawlerGuideAtom/cra11.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra11.png" /></figure></ulink> </listitem>
<listitem>Click &quot;Retrieved Sites&quot;.
<ulink url="VirtCrawlerGuideAtom/cra12.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra12.png" /></figure></ulink> </listitem>
<listitem>Out target should be presented in the list of available retrieved sites.
 From here you could manage the retrieved URLs by editing the imported URLs or exporting to External/Internal <ulink url="WebDAV">WebDAV</ulink> destination.
 Click for ex.
 the &quot;Edit&quot; link of the very-right &quot;Action&quot; column for our retrieved site.
</listitem>
<listitem>Should be presented all downloaded URLs of RDF resources referenced in our initial <emphasis>ATOM</emphasis> feed.
<ulink url="VirtCrawlerGuideAtom/cra13.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra13.png" /></figure></ulink> </listitem>
<listitem>To view the imported RDF data, go to <ulink url="http://cname/sparql">http://cname/sparql</ulink> and enter a simple query for ex.: <programlisting>SELECT * 
FROM &lt;http://data.libris.kb.se/nationalbibliography/feed/&gt;
WHERE 
  {
    ?s ?p ?o
  }
</programlisting><ulink url="VirtCrawlerGuideAtom/cra14.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra14.png" /></figure></ulink> </listitem>
<listitem>Click &quot;Run Query&quot;.
</listitem>
<listitem>The imported RDF data triples should be shown: <ulink url="VirtCrawlerGuideAtom/cra15.png"><figure><graphic fileref="VirtCrawlerGuideAtom/cra15.png" /></figure></ulink></listitem>
</orderedlist><para> </para>
<bridgehead class="http://www.w3.org/1999/xhtml:h2">Related</bridgehead>
<para> </para>
<itemizedlist mark="bullet" spacing="compact"><listitem><ulink url="http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler">Setting up a Content Crawler Job to Add RDF Data to the Quad Store</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSitemaps">Setting up a Content Crawler Job to Retrieve Sitemaps</ulink> (when the source includes RDFa) </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps">Setting up a Content Crawler Job to Retrieve Semantic Sitemaps</ulink> (a variation of the standard sitemap) </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideDirectories">Setting up a Content Crawler Job to Retrieve Content from Specific Directories</ulink> </listitem>
<listitem><ulink url="VirtCrawlerSPARQLEndpoints">Setting up a Content Crawler Job to Retrieve Content from SPARQL endpoint</ulink> </listitem>
<listitem><ulink url="http://docs.openlinksw.com/virtuoso/xmlservices.html#xpath_sql">Virtuoso XPATH Implementation and SQL</ulink> </listitem>
<listitem><ulink url="http://librisbloggen.kb.se/2011/09/21/swedish-national-bibliography-and-authority-data-released-with-open-license/">Collection examples of live ATOM and OAI-PMH feeds.</ulink> </listitem>
</itemizedlist></section></docbook>