VirtSetCrawlerJobsGuideSemanticSitemaps

VirtSetCrawlerJobsGuideSemanticSitemaps Setting up a Content Crawler Job to retrieve Semantic SitemapsSetting up a Content Crawler Job to retrieve Semantic Sitemaps The following guide describes how to set up crawler job for getting Semantic Sitemap's content -- a variation of standard sitemap: Go to Conductor UI. For ex. at http://localhost:8890/conductor . Enter dba credentials. Go to "Web Application Server".

Go to "Content Imports".

Click "New Target".

In the shown form: Enter for "Crawl Job Name": Semantic Web Sitemap Example Enter for "Data Source Address (URL)": http://www.connexfilter.com/sitemap_en.xml Enter the location in the Virtuoso WebDAV repository the crawled should stored in the "Local WebDAV Identifier " text-box, for example, if user demo is available, then: /DAV/home/demo/semantic_sitemap/ Choose the "Local resources owner" for the collection from the list box available, for ex: user demo. Hatch "Semantic Web Crawling": Note: when you select this option, you can either: Leave the Store Function and Extract Function empty - in this case the system Store and Extract functions will be used for the Semantic Web Crawling Process, or: You can select your own Store and Extract Functions. View an example of these functions. Hatch "Accept RDF"

Optionally you can hatch "Store metadata *" and specify which RDF Cartridges to be included from the Sponger:

Click the button "Create".

Click "Import Queues".

For "Robot target" with label "Semantic Web Sitemap Example" click "Run". As result should be shown the number of the pages retrieved.

Check the retrieved RDF data from your Virtuoso instance SPARQL endpoint http://cname:port/sparql with the following query selecting all the retrieved graphs for ex: SELECT ?g FROM <http://localhost:8890/> WHERE { graph ?g { ?s ?p ?o } . FILTER ( ?g LIKE <http://www.connexfilter.com/%> ) }

Related Setting up Crawler Jobs Guide using Conductor Setting up a Content Crawler Job to Add RDF Data to the Quad Store Setting up a Content Crawler Job to Retrieve Sitemaps (where the source includes RDFa) Setting up a Content Crawler Job to Retrieve Content from Specific Directories Setting up a Content Crawler Job to Retrieve Content from SPARQL endpoint