%META:TOPICPARENT{name="VirtSetCrawlerJobsGuide"}%
---+Setting up a Content Crawler Job to retrieve Semantic Sitemaps
The following guide describes how to set up a crawler job to retrieve a Semantic Sitemap's content -- a variation of the standard sitemap:
1 Go to the Conductor UI, e.g. at http://localhost:8890/conductor .
1 Enter the dba credentials.
1 Go to "Web Application Server".
1 Go to "Content Imports".
1 Click "New Target".
1 In the shown form:
* Enter for "Crawl Job Name":
Semantic Web Sitemap Example
* Enter for "Data Source Address (URL)":
http://www.connexfilter.com/sitemap_en.xml
* Enter in the "Local WebDAV Identifier" text box the location in the Virtuoso WebDAV repository where the crawled content should be stored; for example, if the user demo is available:
/DAV/home/demo/semantic_sitemap/
* Choose the "Local resources owner" for the collection from the list box available, e.g. user demo.
* Check "Semantic Web Crawling":
* Note: when you select this option, you can either:
1 Leave the Store Function and Extract Function empty - in this case the system Store and Extract functions will be used for the Semantic Web Crawling Process, or:
1 You can select your own Store and Extract Functions. [[VirtSetCrawlerJobsGuideSemanticSitemapsFuncExample][View an example of these functions]].
* Check "Accept RDF"
* Optionally, check "Store metadata *" and specify which of the Sponger's RDF Cartridges should be included.
1 Click the button "Create".
1 Click "Import Queues".
1 For the "Robot target" with label "Semantic Web Sitemap Example", click "Run".
1 The number of retrieved pages will be shown as a result.
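Once the job has run, the crawled resources live under the WebDAV collection entered in the form above. As a minimal sketch of verifying this over HTTP -- assuming the default Virtuoso port 8890, the demo user's collection, and the demo database's default demo/demo credentials (all of which you should adjust for your own instance) -- you can fetch the collection with an authenticated request:

```python
import base64
from urllib.request import Request, urlopen

# Assumptions: default Virtuoso HTTP port (8890), the demo user's WebDAV
# collection from the crawler form, and the demo database's default
# demo/demo credentials -- adjust all of these for your own setup.
DAV_URL = "http://localhost:8890/DAV/home/demo/semantic_sitemap/"
USER, PASSWORD = "demo", "demo"

def authorized_request(url, user, password):
    # WebDAV collections are access-controlled, so attach a basic-auth
    # Authorization header to the request.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})

req = authorized_request(DAV_URL, USER, PASSWORD)
# listing = urlopen(req).read()  # uncomment against a live instance
```

A plain GET on the collection URL returns Virtuoso's directory listing; a WebDAV client could issue PROPFIND against the same URL instead.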
1 Check the retrieved RDF data from your Virtuoso instance's SPARQL endpoint, http://cname:port/sparql , with a query selecting all the retrieved graphs, for example:
SELECT ?g
WHERE
  {
    graph ?g { ?s ?p ?o } .
    FILTER ( ?g LIKE <http://www.connexfilter.com/%> )
  }
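The same check can be scripted against the endpoint. The sketch below builds the HTTP GET request the SPARQL Protocol uses, passing the query and desired result format as ordinary URL parameters; the endpoint URL is an assumption (a local instance on the default port) and should be replaced with your own http://cname:port/sparql :

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Virtuoso instance on the default port; replace
# with your own http://cname:port/sparql endpoint.
ENDPOINT = "http://localhost:8890/sparql"

# List all graphs holding at least one triple.
QUERY = """
SELECT ?g
WHERE
  {
    graph ?g { ?s ?p ?o } .
  }
"""

def build_request_url(endpoint, query):
    # The SPARQL Protocol passes the query text and the desired result
    # serialization as plain HTTP GET parameters.
    params = urlencode({"query": query,
                        "format": "application/sparql-results+json"})
    return endpoint + "?" + params

url = build_request_url(ENDPOINT, QUERY)
# results = urlopen(url).read()  # uncomment to query a live instance
```

Requesting JSON results makes the response easy to post-process; any format the endpoint supports can be substituted.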
---++Related
* [[VirtSetCrawlerJobsGuide][Setting up Crawler Jobs Guide using Conductor]]
* [[http://docs.openlinksw.com/virtuoso/rdfinsertmethods.html#rdfinsertmethodvirtuosocrawler][Setting up a Content Crawler Job to Add RDF Data to the Quad Store]]
* [[VirtSetCrawlerJobsGuideSitemaps][Setting up a Content Crawler Job to Retrieve Sitemaps (where the source includes RDFa)]]
* [[VirtSetCrawlerJobsGuideDirectories][Setting up a Content Crawler Job to Retrieve Content from Specific Directories]]
* [[VirtCrawlerSPARQLEndpoints][Setting up a Content Crawler Job to Retrieve Content from SPARQL endpoint]]