Distributed Web Spider - Demo for the Terracotta WorkManager

About

This sample is part of the Terracotta Framework Library (tclib). More information is available in the Terracotta Framework Library readme.html.

This application implements a distributed web spider and is a demo application for the Terracotta WorkManager. Note that this demo can retry work automatically.

Build Instructions

This sample application requires Maven 2 and Java 5. First download and install Maven 2. Then perform these steps:

  1. Step up into the parent tclib directory: cd ../..
  2. Invoke mvn install to install the tclib library.
  3. Step back into the spider sample directory.
  4. Invoke mvn package to build the application.
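
The four steps above can be sketched as a single shell function, run from the spider sample directory. This is only a convenience sketch: the function name build_spider and the MVN variable are hypothetical (they are not part of tclib), and MVN exists only so the sequence can be previewed without running Maven.

```shell
# Sketch of build steps 1-4. Call it from the spider sample directory.
# build_spider and MVN are hypothetical names, not part of tclib.
build_spider() (
    # The body runs in a subshell, so the caller's directory is left
    # untouched even if a step fails.
    mvn_cmd="${MVN:-mvn}"    # e.g. MVN="echo mvn" previews the commands

    cd ../.. &&              # 1. step up into the parent tclib directory
    $mvn_cmd install &&      # 2. install the tclib library
    cd - >/dev/null &&       # 3. step back into the spider sample directory
    $mvn_cmd package         # 4. build the application
)
```

Calling build_spider runs the real build. Calling it as MVN="echo mvn" build_spider only prints the two Maven invocations in order.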

By default, this simple spider will crawl the www.google.com site.

Simple Run Instructions

By default, the tc:run command starts all of the processes listed in the process section of the pom.xml. This sample lists two processes: a Master and a Worker.

To quickly see the spider in action, start the server and then use the tc:run command to start the Master and the Worker:

    $ mvn tc:start

    $ mvn tc:run

Detailed Run Instructions

You can also start each process individually:

  1. Start a TC Server:

    $ mvn tc:start

  2. Start a Master:

    $ mvn -DactiveNodes=master tc:run

  3. Start one or more Workers:

    $ mvn -DactiveNodes=worker tc:run
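
The three steps above can also be driven from one terminal. The following is a minimal sketch, not part of the sample: the function name start_cluster, its worker-count argument, and the MVN variable are hypothetical. It assumes that mvn tc:start returns once the server is up (as the Simple Run Instructions suggest) and that several tc:run invocations can share the same project directory. Output from the background processes will be interleaved.

```shell
# Sketch: start a TC server, one Master and N Workers from one terminal.
# start_cluster and MVN are hypothetical names, not part of tclib.
start_cluster() (
    workers="${1:-2}"        # number of Workers to start (default: 2)
    mvn_cmd="${MVN:-mvn}"    # e.g. MVN="echo mvn" previews the commands

    # 1. Start a TC Server (assumed to return once the server is running).
    $mvn_cmd tc:start || exit 1

    # 2. Start a Master in the background.
    $mvn_cmd -DactiveNodes=master tc:run &

    # 3. Start the requested number of Workers in the background.
    i=0
    while [ "$i" -lt "$workers" ]; do
        $mvn_cmd -DactiveNodes=worker tc:run &
        i=$((i + 1))
    done

    # Block until the Master and all Workers have exited.
    wait
)
```

For example, start_cluster 3 starts the server, one Master and three Workers. Running each command in its own terminal, as in the numbered steps, keeps the output of each process separate and is easier to follow.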

This Master/Worker implementation uses an automatic retry mechanism by default. To see it in action, kill the worker process halfway through the crawl, then start another worker and watch the work resume on the second process.