%META:TOPICPARENT{name="VirtSetCrawlerJobsGuide"}%
---+Setting up a Content Crawler Job to retrieve Semantic Sitemaps
The following guide describes how to set up a crawler job to retrieve a Semantic Sitemap's content -- a variation of the standard sitemap:
1 Go to the Conductor UI, e.g. at http://localhost:8890/conductor .
1 Enter the dba credentials.
1 Go to "Web Application Server".
1 Go to "Content Imports".
1 Click "New Target".
1 In the shown form:
* Enter for "Crawl Job Name":
Semantic Web Sitemap Example
* Enter for "Data Source Address (URL)":
http://www.connexfilter.com/sitemap_en.xml
* Enter in the "Local WebDAV Identifier" text box the location in the Virtuoso WebDAV repository where the crawled content should be stored; for example, if the user demo is available:
/DAV/home/demo/semantic_sitemap/
* Choose the "Local resources owner" for the collection from the list box available, e.g. user demo.
* Check "Semantic Web Crawling":
* Note: when you select this option, you can either:
1 Leave the Store Function and Extract Function empty - in this case the system Store and Extract functions will be used for the Semantic Web Crawling Process, or:
1 You can select your own Store and Extract Functions. [[VirtSetCrawlerJobsGuideSemanticSitemapsFuncExample][View an example of these functions]].
* Check "Accept RDF"
* Optionally, check "Store metadata *" and specify which of the Sponger's RDF Cartridges should be included.
1 Click the button "Create".
1 Click "Import Queues".
1 For the "Robot target" with label "Semantic Web Sitemap Example", click "Run".
1 The number of retrieved pages will be shown as a result.
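Once the job has run, the crawled resources live under the WebDAV collection entered in the form above. As a minimal sketch of verifying this over HTTP -- assuming the default Virtuoso port 8890, the demo user's collection, and the demo database's default demo/demo credentials (all of which you should adjust for your own instance) -- you can fetch the collection with an authenticated request:

```python
import base64
from urllib.request import Request, urlopen

# Assumptions: default Virtuoso HTTP port (8890), the demo user's WebDAV
# collection from the crawler form, and the demo database's default
# demo/demo credentials -- adjust all of these for your own setup.
DAV_URL = "http://localhost:8890/DAV/home/demo/semantic_sitemap/"
USER, PASSWORD = "demo", "demo"

def authorized_request(url, user, password):
    # WebDAV collections are access-controlled, so attach a basic-auth
    # Authorization header to the request.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})

req = authorized_request(DAV_URL, USER, PASSWORD)
# listing = urlopen(req).read()  # uncomment against a live instance
```

A plain GET on the collection URL returns Virtuoso's directory listing; a WebDAV client could issue PROPFIND against the same URL instead.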
1 Check the retrieved RDF data from your Virtuoso instance's SPARQL endpoint, http://cname:port/sparql , with a query selecting all the retrieved graphs, for example:
SELECT ?g
WHERE
  {
    graph ?g { ?s ?p ?o } .
    FILTER ( ?g LIKE <http://www.connexfilter.com/%> )
  }
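The same check can be scripted against the endpoint. The sketch below builds the HTTP GET request the SPARQL Protocol uses, passing the query and desired result format as ordinary URL parameters; the endpoint URL is an assumption (a local instance on the default port) and should be replaced with your own http://cname:port/sparql :

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Virtuoso instance on the default port; replace
# with your own http://cname:port/sparql endpoint.
ENDPOINT = "http://localhost:8890/sparql"

# List all graphs holding at least one triple.
QUERY = """
SELECT ?g
WHERE
  {
    graph ?g { ?s ?p ?o } .
  }
"""

def build_request_url(endpoint, query):
    # The SPARQL Protocol passes the query text and the desired result
    # serialization as plain HTTP GET parameters.
    params = urlencode({"query": query,
                        "format": "application/sparql-results+json"})
    return endpoint + "?" + params

url = build_request_url(ENDPOINT, QUERY)
# results = urlopen(url).read()  # uncomment to query a live instance
```

Requesting JSON results makes the response easy to post-process; any format the endpoint supports can be substituted.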
---++Related
* [[VirtSetCrawlerJobsGuide][Setting up Crawler Jobs Guide using Conductor]]
* [[http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler][Setting up a Content Crawler Job to Add RDF Data to the Quad Store]]
* [[VirtSetCrawlerJobsGuideSitemaps][Setting up a Content Crawler Job to Retrieve Sitemaps (where the source includes RDFa)]]
* [[VirtSetCrawlerJobsGuideDirectories][Setting up a Content Crawler Job to Retrieve Content from Specific Directories]]
* [[VirtCrawlerSPARQLEndpoints][Setting up a Content Crawler Job to Retrieve Content from SPARQL endpoint]]