MadStore configuration is based on a simple XML file.
Here is a detailed description of its structure, defined with an XPath-like notation:
More than one namespace element can be defined.
There can be one or more task elements, one for each task to execute.
Right now, MadStore provides the following two tasks: crawlerTask and cleanRepositoryHistoryTask.
Here is a simple example:
<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns="http://www.springframework.org/schema/beans"
       xmlns:mds="http://www.pronetics.com/schema/madstore"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
         http://www.pronetics.com/schema/madstore http://www.pronetics.com/schema/madstore/madstore.xsd">
  <mds:config>
    <mds:crawler>
      <mds:grid-enabled localAddress="192.168.1.1"/>
      <mds:targetSite>
        <mds:hostName>http://127.0.0.1:8080</mds:hostName>
        <mds:startLink>/index.html</mds:startLink>
        <mds:maxConcurrentDownloads>3</mds:maxConcurrentDownloads>
        <mds:maxVisitedLinks>100</mds:maxVisitedLinks>
      </mds:targetSite>
    </mds:crawler>
    <mds:repository>
      <mds:maxHistory>100</mds:maxHistory>
      <mds:index>
        <mds:indexedPropertiesNamespaces>
          <mds:namespace prefix="atom" url="http://www.w3.org/2005/Atom" />
        </mds:indexedPropertiesNamespaces>
        <mds:indexedProperties>
          <mds:property name="title">
            <mds:xpath>//atom:entry/atom:title</mds:xpath>
            <mds:boost>1</mds:boost>
          </mds:property>
          <mds:property name="summary">
            <mds:xpath>//atom:entry/atom:summary</mds:xpath>
            <mds:boost>1</mds:boost>
          </mds:property>
          <mds:property name="category">
            <mds:xpath>//atom:entry/atom:category/@term</mds:xpath>
            <mds:boost>1</mds:boost>
          </mds:property>
        </mds:indexedProperties>
      </mds:index>
    </mds:repository>
    <mds:server>
      <mds:httpCache-enabled max-age="10"/>
      <mds:atomPub>
        <mds:workspace>MadStore</mds:workspace>
      </mds:atomPub>
      <mds:openSearch>
        <mds:shortName>MadStore Search</mds:shortName>
        <mds:description>My MadStore Search</mds:description>
      </mds:openSearch>
    </mds:server>
    <mds:tasks>
      <mds:task name="crawlerTask">
        <mds:simpleTrigger>
          <mds:startDelay>0</mds:startDelay>
          <mds:repeatInterval>30</mds:repeatInterval>
        </mds:simpleTrigger>
      </mds:task>
      <mds:task name="cleanRepositoryHistoryTask">
        <mds:simpleTrigger>
          <mds:startDelay>60</mds:startDelay>
          <mds:repeatInterval>60</mds:repeatInterval>
        </mds:simpleTrigger>
      </mds:task>
    </mds:tasks>
  </mds:config>
</beans>
Let's take a closer look at the sample configuration.
<mds:crawler>
  <mds:grid-enabled localAddress="192.168.1.1"/>
  <mds:targetSite>
    <mds:hostName>http://127.0.0.1:8080</mds:hostName>
    <mds:startLink>/index.html</mds:startLink>
    <mds:maxConcurrentDownloads>3</mds:maxConcurrentDownloads>
    <mds:maxVisitedLinks>100</mds:maxVisitedLinks>
  </mds:targetSite>
</mds:crawler>
Here, we are telling the MadStore crawler to crawl the http://127.0.0.1:8080 host, starting from the /index.html page, with at most 3 concurrent downloads and at most 100 visited links (pages).
We're using the distributed grid crawler, bound to the 192.168.1.1 local address.
<mds:repository>
  <mds:maxHistory>100</mds:maxHistory>
  <mds:index>
    <mds:indexedPropertiesNamespaces>
      <mds:namespace prefix="atom" url="http://www.w3.org/2005/Atom" />
    </mds:indexedPropertiesNamespaces>
    <mds:indexedProperties>
      <mds:property name="title">
        <mds:xpath>//atom:entry/atom:title</mds:xpath>
        <mds:boost>1</mds:boost>
      </mds:property>
      <mds:property name="summary">
        <mds:xpath>//atom:entry/atom:summary</mds:xpath>
        <mds:boost>1</mds:boost>
      </mds:property>
      <mds:property name="category">
        <mds:xpath>//atom:entry/atom:category/@term</mds:xpath>
        <mds:boost>1</mds:boost>
      </mds:property>
    </mds:indexedProperties>
  </mds:index>
</mds:repository>
The snippet above configures the MadStore repository and its index. Note, in particular, the indexed properties configuration: here, we are indexing the title, summary and category term of each Atom entry, all referenced through simple XPath expressions that use the declared atom namespace prefix.
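To make the effect of the indexed property expressions concrete, here is a minimal sketch that applies equivalent lookups to a small Atom feed using Python's standard library. It is not MadStore code and says nothing about how MadStore evaluates the expressions internally; the feed content and the `extract_properties` helper are invented for illustration. Python's ElementTree supports only a subset of XPath, so the `/@term` attribute step is emulated by reading the attribute from each matched element.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URL mapping, mirroring the <mds:namespace> declaration in the configuration.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical Atom feed, invented for this example.
FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Hello MadStore</title>
    <summary>A first entry</summary>
    <category term="news"/>
    <category term="demo"/>
  </entry>
</feed>"""


def extract_properties(feed_xml):
    """Return the values selected by the three configured expressions."""
    root = ET.fromstring(feed_xml)
    titles = [e.text for e in root.findall(".//atom:entry/atom:title", NS)]
    summaries = [e.text for e in root.findall(".//atom:entry/atom:summary", NS)]
    # ElementTree has no "/@term" step, so read the attribute from each category element.
    categories = [e.get("term") for e in root.findall(".//atom:entry/atom:category", NS)]
    return {"title": titles, "summary": summaries, "category": categories}


properties = extract_properties(FEED)
print(properties)
```

Each key in the result corresponds to one `mds:property` name, and each list holds the values that the matching expression selects from the feed.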
<mds:server>
  <mds:httpCache-enabled max-age="10"/>
  <mds:atomPub>
    <mds:workspace>MadStore</mds:workspace>
  </mds:atomPub>
  <mds:openSearch>
    <mds:shortName>MadStore Search</mds:shortName>
    <mds:description>My MadStore Search</mds:description>
  </mds:openSearch>
</mds:server>
The snippet above is dead simple: just a bunch of configuration strings for the Atom Publishing Protocol service document and Open Search description document.
The most important thing there is the (optional) httpCache-enabled element, which enables explicit client-side HTTP caching.
<mds:tasks>
  <mds:task name="crawlerTask">
    <mds:simpleTrigger>
      <mds:startDelay>0</mds:startDelay>
      <mds:repeatInterval>30</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
  <mds:task name="cleanRepositoryHistoryTask">
    <mds:simpleTrigger>
      <mds:startDelay>60</mds:startDelay>
      <mds:repeatInterval>60</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
</mds:tasks>
Finally, the snippet above configures the triggering of two MadStore tasks: the crawlerTask, starting immediately and repeating every 30 minutes, and the cleanRepositoryHistoryTask, starting after 60 minutes and repeating every 60 minutes.
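When checking a configuration by hand, it can help to list the trigger values actually present in the file. The sketch below does that for the tasks snippet with Python's standard library. It is an inspection aid only, not part of MadStore; the `read_triggers` helper is hypothetical, and the `xmlns:mds` declaration is added to the root element so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# MadStore schema namespace, as declared on the <beans> root of the sample configuration.
NS = {"mds": "http://www.pronetics.com/schema/madstore"}

# The tasks fragment from the sample, with the namespace declared locally.
TASKS = """\
<mds:tasks xmlns:mds="http://www.pronetics.com/schema/madstore">
  <mds:task name="crawlerTask">
    <mds:simpleTrigger>
      <mds:startDelay>0</mds:startDelay>
      <mds:repeatInterval>30</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
  <mds:task name="cleanRepositoryHistoryTask">
    <mds:simpleTrigger>
      <mds:startDelay>60</mds:startDelay>
      <mds:repeatInterval>60</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
</mds:tasks>"""


def read_triggers(tasks_xml):
    """Map each task name to its (startDelay, repeatInterval) pair."""
    root = ET.fromstring(tasks_xml)
    triggers = {}
    for task in root.findall("mds:task", NS):
        trigger = task.find("mds:simpleTrigger", NS)
        triggers[task.get("name")] = (
            int(trigger.findtext("mds:startDelay", namespaces=NS)),
            int(trigger.findtext("mds:repeatInterval", namespaces=NS)),
        )
    return triggers


triggers = read_triggers(TASKS)
for name, (start_delay, repeat_interval) in triggers.items():
    print(f"{name}: startDelay={start_delay}, repeatInterval={repeat_interval}")
```

Comparing this listing against the prose description is a quick way to catch a mismatch between the documented and the configured intervals.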
That's all!