<docbook><section><title>VirtSetCrawlerJobsGuideSitemaps</title><para> </para>
<title>Setting up a Content Crawler Job to retrieve Sitemaps</title>Setting up a Content Crawler Job to retrieve Sitemaps
<para>The following guide describes how to set up a crawler job for getting content of a basic Sitemap where the source includes RDFa.</para>
<orderedlist spacing="compact"><listitem>From the Virtuoso Conductor User Interface i.e.
 <ulink url="http://cname:port/conductor,">http://cname:port/conductor,</ulink> login as the &quot;dba&quot; user.
</listitem>
<listitem>Go to &quot;Web Application Server&quot; tab.
<ulink url="VirtSetCrawlerJobsGuideSitemaps/cr1.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr1.png" /></figure></ulink> </listitem>
<listitem>Go to the &quot;Content Imports&quot; tab.
<ulink url="VirtSetCrawlerJobsGuideSitemaps/cr2.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr2.png" /></figure></ulink> </listitem>
<listitem>Click on the &quot;New Target&quot; button.
<ulink url="VirtSetCrawlerJobsGuideSitemaps/cr3.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr3.png" /></figure></ulink> </listitem>
<listitem>In the form displayed: <itemizedlist mark="bullet" spacing="compact"><listitem>Enter a name of choice in the &quot;Crawl Job Name&quot; text-box: <programlisting>Basic Sitemap Crawling Example 
</programlisting></listitem>
<listitem>Enter the URL of the site to be crawled in the &quot;Data Source Address (URL)&quot; text-box: <programlisting>http://psclife.pscdog.com/catalog/seo_sitemap/product/&amp;nbsp
</programlisting></listitem>
<listitem>Enter the location in the Virtuoso <ulink url="WebDAV">WebDAV</ulink> repository the crawled should stored in the &quot;Local <ulink url="WebDAV">WebDAV</ulink> Identifier&quot; text-box, for example, if user demo is available, then: <programlisting>/DAV/home/demo/basic_sitemap/
</programlisting></listitem>
<listitem>Choose the  &quot;Local resources owner&quot; for the collection from the list-box available, for ex: user demo.
</listitem>
<listitem>Select the &quot;Accept RDF&quot; check-box.
<ulink url="VirtSetCrawlerJobsGuideSitemaps/cr11.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr11.png" /></figure></ulink><ulink url="VirtSetCrawlerJobsGuideSitemaps/cr11a.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr11a.png" /></figure></ulink> </listitem>
</itemizedlist></listitem>
<listitem>Click the &quot;Create&quot; button to create the import: <ulink url="VirtSetCrawlerJobsGuideSitemaps/cr12.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr12.png" /></figure></ulink> </listitem>
<listitem>Click the &quot;Import Queues&quot; button.
</listitem>
<listitem>For the &quot;Robot targets&quot; with label &quot;Basic Sitemap Crawling Example &quot; click the &quot;Run&quot; button.
</listitem>
<listitem>This will result in the Target site being crawled and the retrieved pages stored locally in DAV and any sponged triples in the RDF Quad store.
<ulink url="VirtSetCrawlerJobsGuideSitemaps/cr13.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr13.png" /></figure></ulink> </listitem>
<listitem>Go to the &quot;Web Application Server&quot; -&gt; &quot;Content Management&quot; tab.
<ulink url="VirtSetCrawlerJobsGuideSitemaps/cr14.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr14.png" /></figure></ulink> </listitem>
<listitem>Navigate to the location of newly created DAV collection: <programlisting>/DAV/home/demo/basic_sitemap/
</programlisting></listitem>
<listitem>The retrieved content will be available in this location.
<ulink url="VirtSetCrawlerJobsGuideSitemaps/cr15.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideSitemaps/cr15.png" /></figure></ulink></listitem>
</orderedlist><para> </para>
<bridgehead class="http://www.w3.org/1999/xhtml:h2">Related</bridgehead>
<itemizedlist mark="bullet" spacing="compact"><listitem><ulink url="VirtSetCrawlerJobsGuide">Setting up Crawler Jobs Guide using Conductor</ulink> </listitem>
<listitem><ulink url="http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler">Setting up a Content Crawler Job to Add RDF Data to the Quad Store</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps">Setting up a Content Crawler Job to Retrieve Semantic Sitemaps (a variation of the standard sitemap)</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideDirectories">Setting up a Content Crawler Job to Retrieve Content from Specific Directories</ulink> </listitem>
<listitem><ulink url="VirtCrawlerSPARQLEndpoints">Setting up a Content Crawler Job to Retrieve Content from SPARQL endpoint</ulink> </listitem>
</itemizedlist></section></docbook>