MadStore configuration is based on a simple XML file.
Here is a detailed description of its structure, defined with an XPath-like notation:
There can be more than one namespace element definition.
There can be one or more task elements, one for each task to execute.
Right now, MadStore provides the following two tasks: crawlerTask and cleanRepositoryHistoryTask.
Here is a simple example:
<beans
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:mds="http://www.pronetics.com/schema/madstore"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.pronetics.com/schema/madstore
        http://www.pronetics.com/schema/madstore/madstore.xsd">
  <mds:config>
    <mds:crawler>
      <mds:grid-enabled localAddress="192.168.1.1"/>
      <mds:targetSite>
        <mds:hostName>http://127.0.0.1:8080</mds:hostName>
        <mds:startLink>/index.html</mds:startLink>
        <mds:maxConcurrentDownloads>3</mds:maxConcurrentDownloads>
        <mds:maxVisitedLinks>100</mds:maxVisitedLinks>
      </mds:targetSite>
    </mds:crawler>
    <mds:repository>
      <mds:maxHistory>100</mds:maxHistory>
      <mds:index>
        <mds:indexedPropertiesNamespaces>
          <mds:namespace prefix="atom" url="http://www.w3.org/2005/Atom" />
        </mds:indexedPropertiesNamespaces>
        <mds:indexedProperties>
          <mds:property name="title">
            <mds:xpath>//atom:entry/atom:title</mds:xpath>
            <mds:boost>1</mds:boost>
          </mds:property>
          <mds:property name="summary">
            <mds:xpath>//atom:entry/atom:summary</mds:xpath>
            <mds:boost>1</mds:boost>
          </mds:property>
          <mds:property name="category">
            <mds:xpath>//atom:entry/atom:category/@term</mds:xpath>
            <mds:boost>1</mds:boost>
          </mds:property>
        </mds:indexedProperties>
      </mds:index>
    </mds:repository>
    <mds:server>
      <mds:httpCache-enabled max-age="10"/>
      <mds:atomPub>
        <mds:workspace>MadStore</mds:workspace>
      </mds:atomPub>
      <mds:openSearch>
        <mds:shortName>MadStore Search</mds:shortName>
        <mds:description>My MadStore Search</mds:description>
      </mds:openSearch>
    </mds:server>
    <mds:tasks>
      <mds:task name="crawlerTask">
        <mds:simpleTrigger>
          <mds:startDelay>0</mds:startDelay>
          <mds:repeatInterval>30</mds:repeatInterval>
        </mds:simpleTrigger>
      </mds:task>
      <mds:task name="cleanRepositoryHistoryTask">
        <mds:simpleTrigger>
          <mds:startDelay>60</mds:startDelay>
          <mds:repeatInterval>60</mds:repeatInterval>
        </mds:simpleTrigger>
      </mds:task>
    </mds:tasks>
  </mds:config>
</beans>
Let's take a more detailed look at the sample configuration.
<mds:crawler>
  <mds:grid-enabled localAddress="192.168.1.1"/>
  <mds:targetSite>
    <mds:hostName>http://127.0.0.1:8080</mds:hostName>
    <mds:startLink>/index.html</mds:startLink>
    <mds:maxConcurrentDownloads>3</mds:maxConcurrentDownloads>
    <mds:maxVisitedLinks>100</mds:maxVisitedLinks>
  </mds:targetSite>
</mds:crawler>
Here, we are telling the MadStore crawler to crawl the http://127.0.0.1:8080 host, starting from the /index.html page, with a maximum of 3 concurrent downloads and a maximum of 100 visited links (pages).
We're using the distributed grid crawler, bound to the 192.168.1.1 local address.
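As a rough illustration of how these settings can be varied, the sketch below points the crawler at a different site. The host name, start link and limits are hypothetical values chosen for the example; only the element and attribute names come from the sample configuration, and the fragment is meant to sit inside the mds:config element, where the mds prefix is declared.

```xml
<!-- Hypothetical values: crawl www.example.org from its blog index,
     with up to 5 concurrent downloads and at most 500 visited links. -->
<mds:crawler>
  <mds:grid-enabled localAddress="192.168.1.1"/>
  <mds:targetSite>
    <mds:hostName>http://www.example.org</mds:hostName>
    <mds:startLink>/blog/index.html</mds:startLink>
    <mds:maxConcurrentDownloads>5</mds:maxConcurrentDownloads>
    <mds:maxVisitedLinks>500</mds:maxVisitedLinks>
  </mds:targetSite>
</mds:crawler>
```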
<mds:repository>
  <mds:maxHistory>100</mds:maxHistory>
  <mds:index>
    <mds:indexedPropertiesNamespaces>
      <mds:namespace prefix="atom" url="http://www.w3.org/2005/Atom" />
    </mds:indexedPropertiesNamespaces>
    <mds:indexedProperties>
      <mds:property name="title">
        <mds:xpath>//atom:entry/atom:title</mds:xpath>
        <mds:boost>1</mds:boost>
      </mds:property>
      <mds:property name="summary">
        <mds:xpath>//atom:entry/atom:summary</mds:xpath>
        <mds:boost>1</mds:boost>
      </mds:property>
      <mds:property name="category">
        <mds:xpath>//atom:entry/atom:category/@term</mds:xpath>
        <mds:boost>1</mds:boost>
      </mds:property>
    </mds:indexedProperties>
  </mds:index>
</mds:repository>
The snippet above refers instead to the MadStore repository index. Please note, in particular, the indexed properties configuration: here, we are indexing each Atom entry's title, summary and category term, all referenced through simple XPath expressions.
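Since more than one namespace element can be defined, an index section that also reads properties from a second vocabulary might look like the sketch below. The dc prefix (Dublin Core) and the creator property are assumptions added for illustration; they are not part of the sample configuration, and whether your feeds actually carry dc:creator depends on the crawled site.

```xml
<mds:index>
  <mds:indexedPropertiesNamespaces>
    <mds:namespace prefix="atom" url="http://www.w3.org/2005/Atom" />
    <!-- Assumption: a second namespace is declared the same way as the first -->
    <mds:namespace prefix="dc" url="http://purl.org/dc/elements/1.1/" />
  </mds:indexedPropertiesNamespaces>
  <mds:indexedProperties>
    <mds:property name="title">
      <mds:xpath>//atom:entry/atom:title</mds:xpath>
      <mds:boost>1</mds:boost>
    </mds:property>
    <!-- Hypothetical property: index the dc:creator of each Atom entry -->
    <mds:property name="creator">
      <mds:xpath>//atom:entry/dc:creator</mds:xpath>
      <mds:boost>1</mds:boost>
    </mds:property>
  </mds:indexedProperties>
</mds:index>
```

Each prefix used in an xpath element has to match a prefix declared under indexedPropertiesNamespaces, just as atom does in the sample.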
<mds:server>
  <mds:httpCache-enabled max-age="10"/>
  <mds:atomPub>
    <mds:workspace>MadStore</mds:workspace>
  </mds:atomPub>
  <mds:openSearch>
    <mds:shortName>MadStore Search</mds:shortName>
    <mds:description>My MadStore Search</mds:description>
  </mds:openSearch>
</mds:server>
The snippet above is dead simple: just a bunch of configuration strings for the Atom Publishing Protocol service document and Open Search description document.
The most important thing there is the (optional) httpCache-enabled element, which enables explicit client-side HTTP caching.
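Because the httpCache-enabled element is optional, a server section without it might look like the sketch below. The element names follow the full sample at the top of this page; how MadStore behaves when the element is absent is not described here, so treat the effect as something to verify rather than a documented default.

```xml
<!-- Sketch: same server settings as the sample, with the optional
     httpCache-enabled element left out. -->
<mds:server>
  <mds:atomPub>
    <mds:workspace>MadStore</mds:workspace>
  </mds:atomPub>
  <mds:openSearch>
    <mds:shortName>MadStore Search</mds:shortName>
    <mds:description>My MadStore Search</mds:description>
  </mds:openSearch>
</mds:server>
```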
<mds:tasks>
  <mds:task name="crawlerTask">
    <mds:simpleTrigger>
      <mds:startDelay>0</mds:startDelay>
      <mds:repeatInterval>30</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
  <mds:task name="cleanRepositoryHistoryTask">
    <mds:simpleTrigger>
      <mds:startDelay>60</mds:startDelay>
      <mds:repeatInterval>60</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
</mds:tasks>
Finally, the snippet above configures the triggering of two MadStore tasks: the crawlerTask, starting immediately and repeating every 30 minutes, and the cleanRepositoryHistoryTask, starting after 60 minutes and repeating every 60 minutes.
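To adapt the schedule, only the two trigger values need to change. The sketch below assumes, as the description above does, that startDelay and repeatInterval are expressed in minutes; the specific numbers are hypothetical.

```xml
<mds:tasks>
  <!-- Hypothetical schedule: first crawl after 5 minutes, then every 2 hours -->
  <mds:task name="crawlerTask">
    <mds:simpleTrigger>
      <mds:startDelay>5</mds:startDelay>
      <mds:repeatInterval>120</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
  <!-- Hypothetical schedule: clean the repository history once a day (1440 minutes) -->
  <mds:task name="cleanRepositoryHistoryTask">
    <mds:simpleTrigger>
      <mds:startDelay>1440</mds:startDelay>
      <mds:repeatInterval>1440</mds:repeatInterval>
    </mds:simpleTrigger>
  </mds:task>
</mds:tasks>
```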
That's all!