Guide for Setting up Crawler Jobs for Directories
The following guide describes how to set up a crawler job for retrieving directories using the Virtuoso Conductor.
- Go to the Conductor UI, for example at http://localhost:8890/conductor.
- Log in with dba credentials.
- Go to "Web Application Server".
- Go to "Content Imports".
- Click "New Target".
- In the shown form:
- For "Target description", enter:
Gov.UK data
- For "Target URL", enter:
http://source.data.gov.uk/data/
- For "Copy to local DAV collection", enter the DAV collection of an available user, for example
demo:
/DAV/home/demo/gov.uk/
- From the "Local resources owner" list, choose a user, for example demo.
- Click the button "Create".
- As a result, the Robot target will be created.
- Click "Import Queues".
- For the "Robot target" with label "Gov.UK data", click "Run".
- As a result, the status of the pages will be shown: retrieved, pending, or waiting.
- Click "Retrieved Sites".
- As a result, the total number of retrieved pages will be shown.
- Go to "Web Application Server" -> "Content Management".
- Enter path:
DAV/home/demo/gov.uk
- Go to path:
DAV/home/demo/gov.uk/data
- As a result, the retrieved content will be shown.
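The result of the crawl can also be checked outside the Conductor UI. The sketch below is a minimal example, assuming a standard Virtuoso installation where DAV resources are stored in the system table WS.WS.SYS_DAV_RES (the table and column names, and the demo user's path, are assumptions based on a default setup), run from the isql command-line client:

```sql
-- Connect first with the command-line client, e.g.: isql 1111 dba <password>
-- Count the resources the crawler copied into the demo user's collection.
-- Table, column, and path names assume a default Virtuoso installation.
SELECT COUNT(*)
  FROM WS.WS.SYS_DAV_RES
 WHERE RES_FULL_PATH LIKE '/DAV/home/demo/gov.uk/data/%';
```

If the count matches the total shown under "Retrieved Sites", the import completed as expected.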
Related
- Setting up Crawler Jobs Guide using Conductor
- Setting up Crawler Job for inserting RDF data
- Setting up Crawler Job for retrieving Sitemaps (basic where the source has RDFa)
- Setting up Crawler Job for retrieving Semantic Sitemaps -- a variation of the standard sitemap