Guide for Setting up Crawler Jobs for Directories

This guide describes how to set up a crawler job that retrieves directories using Conductor.

  1. Go to the Conductor UI, for example at http://localhost:8890/conductor.
  2. Enter the dba credentials.
  3. Go to "Web Application Server".



  4. Go to "Content Imports".



  5. Click "New Target".



  6. In the form that appears:
    • For "Target description", enter:

      Gov.UK data

    • For "Target URL", enter:

      http://source.data.gov.uk/data/

    • For "Copy to local DAV collection", enter a collection path belonging to an available user, for example demo:

      /DAV/home/demo/gov.uk/

    • From the "Local resources owner" list, choose a user, for example demo.



    • Click "Create".
  7. As a result, the Robot target is created.



  8. Click "Import Queues".



  9. For the "Robot target" labeled "Gov.UK data", click "Run".
  10. As a result, the status of the pages is shown: retrieved, pending, or waiting.



  11. Click "Retrieved Sites".
  12. As a result, the total number of pages retrieved should be shown.



  13. Go to "Web Application Server" -> "Content Management".
  14. Enter the path:

    DAV/home/demo/gov.uk





  15. Go to the path:

    DAV/home/demo/gov.uk/data

    As a result, the retrieved content is shown.
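
The check in steps 13 to 15 could also be done outside the Conductor UI. The following is a minimal sketch in Python, not part of the original procedure. It assumes that the DAV collection is reachable over HTTP on the same host and port as Conductor, that the server accepts HTTP Basic authentication for the demo user, and that the password shown is a placeholder. The helper names parse_multistatus and list_collection are hypothetical.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Assumptions (not stated in the guide): the DAV collection is served over
# HTTP by the same server as Conductor, and Basic authentication is accepted.
BASE_URL = "http://localhost:8890"
COLLECTION = "/DAV/home/demo/gov.uk/data/"


def parse_multistatus(xml_text):
    """Return the href of every resource in a WebDAV multistatus response."""
    root = ET.fromstring(xml_text)
    # WebDAV elements live in the "DAV:" XML namespace.
    return [el.text.strip() for el in root.iter("{DAV:}href") if el.text]


def list_collection(base_url, path, user, password):
    """Send a Depth: 1 PROPFIND request and return the hrefs it reports."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url + path,
        method="PROPFIND",
        headers={"Depth": "1", "Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(request) as response:
        return parse_multistatus(response.read().decode("utf-8"))


# Example call (needs a running server; the password is a placeholder):
# for href in list_collection(BASE_URL, COLLECTION, "demo", "demo-password"):
#     print(href)
```

If the crawl succeeded, the returned list should contain the collection itself followed by the resources copied into it. Parsing is kept separate from the network call so that it can be checked against a saved response.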


