<docbook><section><title>VirtSetCrawlerJobsGuideSemanticSitemaps</title><para> </para>
<title>Setting up a Content Crawler Job to retrieve Semantic Sitemaps</title>Setting up a Content Crawler Job to retrieve Semantic Sitemaps
<para> The following guide describes how to set up crawler job for getting Semantic Sitemap&#39;s content -- a variation of standard sitemap:</para>
<orderedlist spacing="compact"><listitem>Go to Conductor UI.
 For ex.
 at <ulink url="http://localhost:8890/conductor">http://localhost:8890/conductor</ulink> . </listitem>
<listitem>Enter dba credentials.
</listitem>
<listitem>Go to &quot;Web Application Server&quot;.
<ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr1.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr1.png" /></figure></ulink> </listitem>
<listitem>Go to &quot;Content Imports&quot;.
<ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr2.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr2.png" /></figure></ulink> </listitem>
<listitem>Click &quot;New Target&quot;.
<ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr3.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr3.png" /></figure></ulink> </listitem>
<listitem>In the shown form: <itemizedlist mark="bullet" spacing="compact"><listitem>Enter for &quot;Crawl Job Name&quot;: <programlisting>Semantic Web Sitemap Example 
</programlisting></listitem>
<listitem>Enter for &quot;Data Source Address (URL)&quot;: <programlisting>http://www.connexfilter.com/sitemap_en.xml
</programlisting></listitem>
<listitem>Enter the location in the Virtuoso <ulink url="WebDAV">WebDAV</ulink> repository the crawled should stored in the &quot;Local <ulink url="WebDAV">WebDAV</ulink> Identifier &quot; text-box, for example, if user demo is available, then: <programlisting>/DAV/home/demo/semantic_sitemap/
</programlisting></listitem>
<listitem>Choose the &quot;Local resources owner&quot; for the collection from the list box available, for ex: user demo.
</listitem>
<listitem>Hatch &quot;Semantic Web Crawling&quot;: <itemizedlist mark="bullet" spacing="compact"><listitem>Note: when you select this option, you can either: <orderedlist spacing="compact"><listitem>Leave the Store Function and Extract Function empty - in this case the system Store and Extract functions will be used for the Semantic Web Crawling Process, or: </listitem>
<listitem>You can select your own Store and Extract Functions.
 <ulink url="VirtSetCrawlerJobsGuideSemanticSitemapsFuncExample">View an example of these functions</ulink>.
</listitem>
</orderedlist></listitem>
</itemizedlist></listitem>
<listitem>Hatch &quot;Accept RDF&quot; <ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr16.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr16.png" /></figure></ulink> <ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr17.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr17.png" /></figure></ulink> </listitem>
<listitem>Optionally you can hatch &quot;Store metadata *&quot; and specify which RDF Cartridges to be included from the Sponger: <ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr17a.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr17a.png" /></figure></ulink> </listitem>
</itemizedlist></listitem>
<listitem>Click the button &quot;Create&quot;.
<ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr18.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr18.png" /></figure></ulink> </listitem>
<listitem>Click &quot;Import Queues&quot;.
<ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr19.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr19.png" /></figure></ulink> </listitem>
<listitem>For &quot;Robot target&quot; with label &quot;Semantic Web Sitemap Example&quot; click &quot;Run&quot;.
</listitem>
<listitem>As result should be shown the number of the pages retrieved.
<ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr20.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr20.png" /></figure></ulink> </listitem>
<listitem>Check the retrieved RDF data from your Virtuoso instance SPARQL endpoint <ulink url="http://cname:port/sparql">http://cname:port/sparql</ulink> with the following query selecting all the retrieved graphs for ex: <programlisting>SELECT ?g 
FROM &lt;http://localhost:8890/&gt;
WHERE 
  { 
    graph ?g { ?s ?p ?o } . 
    FILTER ( ?g LIKE &lt;http://www.connexfilter.com/%&gt; ) 
  }
</programlisting><ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps/cr21.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSemanticSitemaps/cr21.png" /></figure></ulink></listitem>
</orderedlist><para> </para>
<bridgehead class="http://www.w3.org/1999/xhtml:h2">Related</bridgehead>
<itemizedlist mark="bullet" spacing="compact"><listitem><ulink url="VirtSetCrawlerJobsGuide">Setting up Crawler Jobs Guide using Conductor</ulink> </listitem>
<listitem><ulink url="http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler">Setting up a Content Crawler Job to Add RDF Data to the Quad Store</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSitemaps">Setting up a Content Crawler Job to Retrieve Sitemaps (where the source includes RDFa)</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideDirectories">Setting up a Content Crawler Job to Retrieve Content from Specific Directories</ulink> </listitem>
<listitem><ulink url="VirtCrawlerSPARQLEndpoints">Setting up a Content Crawler Job to Retrieve Content from SPARQL endpoint</ulink> </listitem>
</itemizedlist></section></docbook>