<docbook><section><title>VirtSetCrawlerJobsGuideDirectories</title><para> </para>
<title>Setting up a Content Crawler Job to Retrieve Content from Specific Directories</title>
<para>This guide describes how to use Conductor to set up a crawler job that retrieves content from specific directories.</para>
<para> </para>
<orderedlist spacing="compact"><listitem>Go to the Conductor UI, for example at <ulink url="http://localhost:8890/conductor">http://localhost:8890/conductor</ulink>.</listitem>
<listitem>Enter the dba account credentials.
</listitem>
<listitem>Go to &quot;Web Application Server&quot;.
<ulink url="VirtSetCrawlerJobsGuideDirectories/cr1.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/cr1.png" /></figure></ulink> </listitem>
<listitem>Go to &quot;Content Imports&quot;.
<ulink url="VirtSetCrawlerJobsGuideDirectories/cr2.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/cr2.png" /></figure></ulink> </listitem>
<listitem>Click &quot;New Target&quot;.
<ulink url="VirtSetCrawlerJobsGuideDirectories/cr3.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/cr3.png" /></figure></ulink> </listitem>
<listitem>In the form that appears, set the following: <itemizedlist mark="bullet" spacing="compact"><listitem>&quot;Crawl Job Name&quot;: <programlisting>Gov.UK data
</programlisting></listitem>
<listitem>&quot;Data Source Address (URL)&quot;: <programlisting>http://source.data.gov.uk/data/
</programlisting></listitem>
<listitem>&quot;Local <ulink url="WebDAV">WebDAV</ulink> Identifier&quot; for an available user, for example demo: <programlisting>/DAV/home/demo/gov.uk/
</programlisting></listitem>
<listitem>From the &quot;Local resources owner&quot; list, choose a user, for example demo. <ulink url="VirtSetCrawlerJobsGuideDirectories/d1.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/d1.png" /></figure></ulink> </listitem>
<listitem>Click the button &quot;Create&quot;.
</listitem>
</itemizedlist></listitem>
<listitem>As a result, the Robot target will be created: <ulink url="VirtSetCrawlerJobsGuideDirectories/d2.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/d2.png" /></figure></ulink> </listitem>
<listitem>Click &quot;Import Queues&quot;.
<ulink url="VirtSetCrawlerJobsGuideDirectories/d3.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/d3.png" /></figure></ulink> </listitem>
<listitem>For the &quot;Robot target&quot; labeled &quot;Gov.UK data&quot;, click &quot;Run&quot;.
</listitem>
<listitem>As a result, the status of the pages will be shown: retrieved, pending, or waiting.
<ulink url="VirtSetCrawlerJobsGuideDirectories/d4.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/d4.png" /></figure></ulink> </listitem>
<listitem>Click &quot;Retrieved Sites&quot;.</listitem>
<listitem>As a result, the total number of pages retrieved should be shown.
<ulink url="VirtSetCrawlerJobsGuideDirectories/d5.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/d5.png" /></figure></ulink> </listitem>
<listitem>Go to &quot;Web Application Server&quot; -&gt; &quot;Content Management&quot;.</listitem>
<listitem>Enter path: <programlisting>DAV/home/demo/gov.uk
</programlisting><ulink url="VirtSetCrawlerJobsGuideDirectories/d6.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/d6.png" /></figure></ulink> </listitem>
<listitem>Go to path: <programlisting>DAV/home/demo/gov.uk/data
</programlisting>As a result, the retrieved content will be shown.
<ulink url="VirtSetCrawlerJobsGuideDirectories/d7.png"><figure><graphic fileref="VirtSetCrawlerJobsGuideDirectories/d7.png" /></figure></ulink></listitem>
</orderedlist><para> </para>
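<para>The final verification steps above can also be done from a script instead of the Conductor UI. The following is a minimal Python sketch, not part of the original guide. It assumes that the crawler stores content under the local WebDAV collection with the path of the source URL appended (as the steps above suggest, since content for <ulink url="http://source.data.gov.uk/data/">http://source.data.gov.uk/data/</ulink> appears under DAV/home/demo/gov.uk/data), that the server exposes the DAV path over HTTP on the same host and port as Conductor, and that the demo user can authenticate with HTTP Basic authentication. The helper names and the password placeholder are hypothetical.</para>
<programlisting>
```python
from urllib.parse import urljoin, urlsplit
import urllib.request
import xml.etree.ElementTree as ET

# Values taken from the guide; adjust them for your own installation.
SERVER = "http://localhost:8890"
SOURCE_URL = "http://source.data.gov.uk/data/"
LOCAL_COLLECTION = "/DAV/home/demo/gov.uk/"


def local_dav_path(source_url, local_collection):
    """Assumed mapping: append the path of the source URL to the local collection."""
    source_path = urlsplit(source_url).path.lstrip("/")
    return local_collection.rstrip("/") + "/" + source_path


def dav_url(server, dav_path):
    """Build the HTTP URL under which the server is assumed to expose a DAV path."""
    return urljoin(server, dav_path)


def hrefs_from_multistatus(xml_text):
    """Return every DAV:href value found in a WebDAV multistatus document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("{DAV:}href")]


def list_collection(url, user, password):
    """Send a depth-1 PROPFIND and return the hrefs reported by the server.

    Assumes the server accepts HTTP Basic authentication for this user.
    """
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(manager))
    request = urllib.request.Request(url, method="PROPFIND", headers={"Depth": "1"})
    with opener.open(request) as response:
        return hrefs_from_multistatus(response.read())


if __name__ == "__main__":
    path = local_dav_path(SOURCE_URL, LOCAL_COLLECTION)
    print(dav_url(SERVER, path))
    # To query a running server, call for example:
    # list_collection(dav_url(SERVER, path), "demo", "your-password")
```
</programlisting>
<para>Run as a script, the sketch only prints the URL it would query. The call to list_collection is left commented out because it needs a running server and valid credentials.</para>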
<bridgehead class="http://www.w3.org/1999/xhtml:h2">Related</bridgehead>
<itemizedlist mark="bullet" spacing="compact"><listitem><ulink url="VirtSetCrawlerJobsGuide">Setting up Crawler Jobs Guide using Conductor</ulink> </listitem>
<listitem><ulink url="http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler">Setting up a Content Crawler Job to Add RDF Data to the Quad Store</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSitemaps">Setting up a Content Crawler Job to Retrieve Sitemaps (where the source includes RDFa)</ulink> </listitem>
<listitem><ulink url="VirtSetCrawlerJobsGuideSemanticSitemaps">Setting up a Content Crawler Job to Retrieve Semantic Sitemaps (a variation of the standard sitemap)</ulink> </listitem>
<listitem><ulink url="VirtCrawlerSPARQLEndpoints">Setting up a Content Crawler Job to Retrieve Content from a SPARQL endpoint</ulink> </listitem>
</itemizedlist></section></docbook>