





Computer users often visit the same resource several times within a short time span. Examples of resources a user might check often are:
· news webpage
· server status
· job board
· auction or shopping site
· network file
· message board
Moreover, it is often critical to discover updates as soon as they occur. RSS feeds can notify users of updates on sites that support RSS, and third-party websites can even forward such updates by email, but RSS has limitations. Many resources do not support RSS, and RSS reader applications frequently do not support alternative notification protocols, such as SMS. RSS also fails when the desired resource requires navigation through an authentication page. A further obstacle is configuration inflexibility. A full solution would let users specify which feeds are most important and should be checked most frequently, and which feeds can be checked less often. Similarly, different notification protocols should be assignable to different feeds or classes of feeds. While an RSS reader application that provides all this functionality may become available in the future, the primary obstacle remains: RSS cannot be used for all resources. It works only over HTTP, and the resource's provider must offer it.
There are software packages that allow users to monitor resources that do not provide RSS feeds, but these also have limitations.
· Powerful packages are not free.
· Most are platform-specific.
· They support a limited number of resource and notification protocols.
· They offer limited or no functional extensibility.
· Most advanced tools are GUI-based, with unnecessary clutter and complexity.
· External monitoring services cannot monitor local files without insecure exposure. Such services also require users to store their credentials in plain text in order to check secured resources for updates.
Because existing solutions are insufficient, users who need to discover updates to a variety of resources must perform many of the required tasks manually.
The most feature-rich solution on the market is WebSite-Watcher (http://www.aignes.com), a retail, Windows-only, GUI-based application. It has the following features:
· Monitor web pages
· Monitor password protected pages
· Monitor forums
· Monitor RSS-Feeds
· Monitor Newsgroups
· Monitor binary files
· Monitor local files
· Powerful yet simple filter system
· Highlight changes
· Monitor pages for specified words
· Monitor whole sites instead of single pages
· Additional actions when updates are detected
· Work with checked pages (Searches, Reports, etc.)
· Archive pages permanently
· Synchronize bookmark files
· Backup and Restore
Limitations that we found in the implementation are:
· When checking for updates, server synchronization issues can generate false positives. When a new version of a resource is published to a single server, and the other servers that offer the resource are only later synchronized with the updated version, the application interprets the previously-current version as a new update. The result is up to 50 update notifications for a single update.
· Creating a simple monitoring task is unnecessarily complex and time-consuming. Even the simplest tasks take as much time to create as the most complex.
· No extensibility to support new resource and notification protocols, or content extraction approaches.
· Minimum monitoring interval is 1 minute
An example of a simple scenario that monitors craigslist for new job postings is shown below.
Internally, WebSite-Watcher has a scheduler thread that wakes up every minute to check all enabled tasks. For each task, if the current system time less the time the task last ran is equal to or greater than the monitoring interval specified in the settings, the scheduler adds the task to a queue of waiting tasks. It then spawns a new thread for each task that needs to run. In each thread, WebSite-Watcher requests the resource, receives the response, and writes the response to a local file for later use. If this is the first time the task has run, processing is complete for this task. If the user has chosen to ignore updates that contain certain keywords and the response contains such keywords, the update is ignored. Similarly, if the user has chosen to restrict valid updates to those containing certain keywords and the response does not contain them, the update is ignored.
WebSite-Watcher also applies content filtering as specified by the user. It applies the filter to both the previously saved version of the resource and the updated content, then compares the results. If they do not match, it highlights the changes in the new content and notifies the user as specified in the task settings.
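The flow described above can be summarized in a few lines of Python. This is only a sketch of our reading of the observed behavior, not WebSite-Watcher's code; the function names are hypothetical, and treating a required-keyword rule as "at least one keyword present" is an assumption.

```python
import time

def is_due(last_run, interval_seconds, now=None):
    """Return True when the task's monitoring interval has elapsed."""
    now = time.time() if now is None else now
    return now - last_run >= interval_seconds

def passes_keyword_rules(text, ignore_keywords=(), require_keywords=()):
    """Apply the two keyword rules to a response body."""
    if any(word in text for word in ignore_keywords):
        return False  # the update contains an ignored keyword
    if require_keywords and not any(word in text for word in require_keywords):
        return False  # the update contains none of the required keywords (assumed rule)
    return True

def is_update(previous, current, content_filter=lambda s: s):
    """Compare the filtered previous and current versions of a resource."""
    if previous is None:
        return False  # first run: nothing saved yet to compare against
    return content_filter(previous) != content_filter(current)
```

Note that the comparison is against the single previously saved version, which is why a stale mirror serving the older version again is reported as a fresh update.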
Our solution is a domain-specific language: MUNDane (Monitor for Updates, Deliver Notifications - relieves you of those mundane tasks). The domain of MUNDane is retrieval of updates for local or network resources. The first version is a framework that supports retrieval through HTTP. It is designed to:
· Run on multiple platforms
· Depend only on freely available tools/libraries/languages
· Monitor multiple resources with different refresh periods
· Support email notification
· Provide error logging
· Support content extraction plugins
· Support notification plugins
· Support resource retrieval plugins
· Support monitoring plugins
· Catch cyclical false positives.
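One way to catch cyclical false positives is to remember digests of several recent versions instead of only the last one. The sketch below illustrates that idea under stated assumptions: the class name, the history size, and the use of SHA-256 are all hypothetical and are not taken from the MUNDane implementation.

```python
import hashlib
from collections import deque

class UpdateDetector:
    """Report content as new only if it was not seen among recent versions."""

    def __init__(self, history_size=10):
        # Bounded history: the oldest digest is dropped once the deque is full.
        self._recent = deque(maxlen=history_size)

    def is_new(self, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self._recent:
            return False  # a version we already reported, e.g. from a stale mirror
        self._recent.append(digest)
        return True
```

With this approach, a sequence of responses v1, v2, v1, v2, v3 yields notifications for v1, v2, and v3 only, because the repeated v1 and v2 are recognized from the history.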
MUNDane has the following major objects:
· Configuration Blocks – optional structures, which contain variable definitions. These blocks can be passed as arguments to actions, as an alternative to literal values.
· Variables – defined in configuration blocks, these contain data used by an action. Variable names must match parameter names from the intended action, but can appear in a configuration block in any order.
· Task Blocks – contain actions, which collectively define a task. Task blocks may also contain variable redefinitions by referencing a configuration block and variable by their names. Redefinitions can use the '+' operator for concatenation of values.
· Actions – are defined by plugins, which are of four types: navigation, processing, monitoring, and notification. Actions take arguments: literal values or configuration blocks.
· Literals (numeric, strings)
· Regular expressions
While a MUNDane program may be written without any configuration blocks, using them allows data to be reused without retyping it. Moving this data out of a task block reduces the task block's complexity and aids readability. Variable redefinition in a task block extends this functionality: if data from a configuration block is used frequently and must be altered slightly for a given task, this can be done without writing an entire additional configuration block. Value concatenation in a redefinition lets a user treat configuration blocks as templates and extend them on a per-task basis.
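To make the template idea concrete, configuration blocks can be pictured as dictionaries, with a redefinition producing a task-local copy. This is a minimal sketch, assuming that a redefinition affects only the task in which it appears; the helper name redefine is hypothetical.

```python
import copy

# Configuration blocks modeled as nested dictionaries (block name -> variables).
config_blocks = {
    "EmailTemplate": {"to": "@gmail.com", "server": "smtp.gmail.com", "port": 587},
}

def redefine(blocks, block_name, variable, value):
    """Return a copy of the blocks with one variable overridden for a single task."""
    task_blocks = copy.deepcopy(blocks)
    task_blocks[block_name][variable] = value
    return task_blocks

# Equivalent of: EmailTemplate.to = "monitordemo" + EmailTemplate.to
task_env = redefine(
    config_blocks, "EmailTemplate", "to",
    "monitordemo" + config_blocks["EmailTemplate"]["to"],
)
```

The shared template keeps its incomplete "to" value, so other tasks can extend it with a different prefix.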
Internally, actions are implemented as plugins, which are dynamically resolved Python functions; complex logic is abstracted away from the user program. This structure allows the language to be extended easily by adding new plugins.
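The following sketch shows one way action names could be resolved to Python functions at run time. It uses an in-memory registry for brevity, which is an assumption rather than the project's mechanism; the plugins trim and upper and the helper run_actions are hypothetical.

```python
PLUGINS = {}  # action name -> Python function

def plugin(func):
    """Register a function under its own name so tasks can call it as an action."""
    PLUGINS[func.__name__] = func
    return func

@plugin
def trim(text):
    return text.strip()

@plugin
def upper(text):
    return text.upper()

def run_actions(actions, text):
    """Resolve each action name when it is called and pipe the text through it."""
    for name, args in actions:
        try:
            func = PLUGINS[name]
        except KeyError:
            raise NameError(f"unknown action: {name}") from None
        text = func(text, *args)
    return text
```

Adding a new action then amounts to defining one more decorated function; nothing in run_actions has to change.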
Examples:
PROGRAM 1
Here is a MUNDane program to monitor craigslist for software internship postings. First, a configuration block is defined to hold the data needed to send notification of the results of this monitoring task. The variables defined in this block are used by the emailNotify() plugin. Another block is defined for the URL data, which in another task might also contain a user ID and password. For this task only a URL is necessary.
Finally a task is defined and begun using a get plugin. The httpGet() plugin retrieves text from a web address using the data in its argument block, which is passed to the next plugin called. removeHTMLtags() strips away HTML tags, leaving text for processing by the next plugin. keepAfter() passes on the text which comes after the first occurrence of its argument string (optionally using regular expressions), and removeFrom() strips from the text its argument string and everything after it.
The above plugin calls completely define what is to be done in the task, and their result is passed to the monitor() plugin, which defines how often this task is to be performed. The emailNotify() plugin comes last and performs the needed notification of results.
[Config:StdEmail]
to = "afclay@gmail.com"
server = "smtp.gmail.com"
port = 587
userid = "monitordemo@gmail.com"
password = ********
encryption = "tls"
[Config:CraigURL]
url = "http://sfbay.craigslist.org/search/egr?query=internship&catAbbreviation=sof"
[Task:CraigslistInternship]
httpGet(CraigURL)
removeHTMLtags()
keepAfter("Found: [0-9]* Displaying: [0-9]*[ \-0-9]*\s*")
removeFrom("Sort by: most recent best match")
monitor(1)
emailNotify(StdEmail)
Whenever the text result of this task is different from the previous result of the task, the user will be notified.
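The text-processing steps of this task can be approximated in plain Python. The functions below mirror the plugin names but are only a sketch of the described behavior, not the project's plugins; leaving the text unchanged when a pattern does not match is an assumption, and the sample page is invented for illustration.

```python
import re

def removeHTMLtags(text):
    """Strip anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def keepAfter(text, pattern):
    """Keep only the text after the first match of the pattern."""
    match = re.search(pattern, text)
    return text[match.end():] if match else text

def removeFrom(text, pattern):
    """Drop the first match of the pattern and everything after it."""
    match = re.search(pattern, text)
    return text[:match.start()] if match else text

# Invented sample page, shaped like the listing the task expects.
page = ("<p>Found: 12 Displaying: 1 - 12 </p><a>Intern A</a> <a>Intern B</a> "
        "Sort by: most recent best match")
text = removeHTMLtags(page)
text = keepAfter(text, r"Found: [0-9]* Displaying: [0-9]*[ \-0-9]*\s*")
text = removeFrom(text, "Sort by: most recent best match")
```

After these steps only the listing text between the two markers remains, which is what the monitor step compares against the previous run.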
Note the asterisks used as the value of the password variable in the StdEmail configuration block. The interpreter responds to this special value by prompting the user for the actual password. That password is encrypted with the DES algorithm, and the program file is updated with the encrypted password; see Part 4: Implementation for more details.
PROGRAM 2
This program similarly uses a configuration block for notification, but the 'to' value is incomplete. The block can serve as a template and be extended with concatenation as needed for each task. The task here is a simple execution of the Unix time command, in which the 'to' value from the configuration block is redefined.
[Config:EmailTemplate]
to = "@gmail.com"
server = "smtp.gmail.com"
port = 587
userid = "monitordemo@gmail.com"
password = ********
encryption = "tls"
[Task:Time]
execCmd("time")
monitor(1)
EmailTemplate.to = "monitordemo" + EmailTemplate.to
emailNotify(EmailTemplate)
PROGRAM 3
This program gets a locally stored file. It retains as a result every line in the file containing "error", and notifies the user of new errors.
[Config:StdEmail]
to = "monitordemo@gmail.com"
server = "smtp.gmail.com"
port = 587
userid = "monitordemo@gmail.com"
password = ********
encryption = "tls"
[Task:LogErrors]
getFile("firewall.log")
keepEachLineWith("error")
monitor(1)
emailNotify(StdEmail)
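The line filter used in this task is simple to express in Python. The function below is a sketch of the described behavior under the assumption that matching is a plain, case-sensitive substring test; it is not the project's plugin.

```python
def keepEachLineWith(text, needle):
    """Keep only the lines that contain the given substring."""
    return "\n".join(line for line in text.splitlines() if needle in line)
```

Because the monitored result contains only the matching lines, changes elsewhere in the log file do not trigger a notification.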
The domain of our language is retrieval of updates for local or Internet resources. The first version will support HTTP and is intended to:
· Run on multiple platforms
· Depend only on freely available tools/libraries/languages
· Monitor one or more resources with different refresh periods
· Support email notification
· Provide error logging and error email reporting
· Support regex extraction of content
· Support content extraction plugins
· Support notification plugins
· Support resource retrieval plugins
· Catch cyclical false positives.
Our language is loosely object-oriented, and has the following major objects:
· Tasks
· Actions – such as get, monitor, and notify in the example below
· Configuration blocks – data for a specific method, such as email settings (address, subject, etc.) for a notify method
· Variables
· Literals (numeric, strings)
· Regular expressions
Tasks are created by chaining actions. Actions take configuration blocks, variables, and constants as possible arguments. Internally, actions are implemented as dynamically resolved Python functions, abstracting away complex logic. New plugins are just additional actions, which are, at base, Python functions.
The demo will monitor craigslist job postings and send an email notification when changes are detected.
Additional features possible for future resources:
Sample program to monitor 2 resources:
In the craigslist task below, the job listings are contained between two Found: blocks; the extract and remove actions are used to isolate them.
[EmailSettings1]
to = cs164@cs164.com
get("http://sfbay.craigslist.org/search/jjj?query=cool+jobs&catAbbreviation=jjj").removeTags().extract(".*Found: [0-9]* Displaying: [0-9]*").remove(".*Found: [0-9]* Displaying: [0-9]*").monitor(5).notify(EmailSettings1)
get(\logs\firewall.log).extractAllLinesWith("error").monitor(0.1).notify(EmailSettings1)
Two possible approaches for implementation:
With the above approaches the following aspects will be implemented as follows:
· Frontend
With the first approach, a parser will construct an AST that is passed to the interpreter. With the second approach, an eval loop will interpret each action and incrementally construct the two lambdas (monitor, chain call).
· The core language
Python will be our core language, in which the scheduling thread and utility functions will be written. Actions will be implemented as separate Python files, with the convention that the file and the function inside it have the same name as the action in the task. For instance, a get(\localhost\logs.txt) call would expect a file named “get.py” to exist in the current directory and to contain a function get with one parameter. In addition, plugins will have to implement the following functions to help with syntax and runtime error checking: boolean isNotifier(), boolean isNavigator(), boolean isContentProcessor(), boolean isMonitor(), … .
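The naming convention above maps naturally onto Python's import machinery. The sketch below loads an action from a file of the same name and queries its capability functions; the helper load_action and the demonstration plugin shout are hypothetical, and the throwaway plugin file is written to a temporary directory only so that the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def load_action(name, plugin_dir):
    """Import <plugin_dir>/<name>.py and return the module and its function <name>."""
    if plugin_dir not in sys.path:
        sys.path.insert(0, plugin_dir)
    module = importlib.import_module(name)
    action = getattr(module, name, None)
    if not callable(action):
        raise AttributeError(f"{name}.py must define a function named {name}")
    return module, action

# Demonstration with a throwaway plugin file (hypothetical action name).
plugin_dir = tempfile.mkdtemp()
Path(plugin_dir, "shout.py").write_text(
    "def shout(text):\n"
    "    return text.upper()\n"
    "def isContentProcessor():\n"
    "    return True\n"
    "def isNotifier():\n"
    "    return False\n"
)
importlib.invalidate_caches()  # make sure the freshly written file is found
module, shout = load_action("shout", plugin_dir)
```

The capability functions let the driver reject a malformed task, such as one that ends in a content processor instead of a notifier, before monitoring starts.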
· Internal representation
The first approach will generate an AST as the internal representation of a program. The second approach will construct a list of tuples holding lambdas for the monitor and chain action calls. In both approaches, variables and configuration blocks will be stored in the environment.
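A minimal sketch of the second representation follows. It assumes that the monitor argument is a refresh period in minutes and that each chained action is a one-argument function; the helper names make_chain and make_monitor are hypothetical, and a fixed string stands in for the result of a real get call.

```python
from functools import reduce

# Environment: configuration blocks and variables, keyed by name.
environment = {"EmailSettings1": {"to": "cs164@cs164.com"}}

def make_chain(actions):
    """Fold a list of one-argument functions into a single zero-argument lambda."""
    return lambda: reduce(lambda value, action: action(value), actions, None)

def make_monitor(minutes):
    """Build a lambda that reports whether the refresh period has elapsed."""
    return lambda last_run, now: now - last_run >= minutes * 60

# One (monitor, chain) tuple per task.
program = [
    (make_monitor(5),
     make_chain([
         lambda _: "a\nerror: x\nb",  # stand-in for get(...)
         lambda text: "\n".join(l for l in text.splitlines() if "error" in l),
     ])),
]
```

The driver can then loop over program, call each monitor lambda with the task's last run time, and invoke the chain lambda only for tasks that are due.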
· Interpreter/Compiler
In both approaches, the program will not be compiled; it will be interpreted by a driver written in Python.
· Debugging
The interpreter and parser will provide error details when a program has a problem. Syntax errors will be detected before the program starts monitoring. Plugin errors will be detected during a test execution of the task; such an error invalidates the task for further execution, is reported once by email, and is written to the error log, but does not stop the program. Runtime errors in data extraction or resource retrieval will be written to the error log but will neither invalidate the task nor terminate the program.
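The error policy described above could be sketched as follows. The PluginError class, the task dictionary layout, and the run_once helper are assumptions made for illustration; the actual driver may distinguish plugin errors from runtime errors differently.

```python
import logging

log = logging.getLogger("mundane")

class PluginError(Exception):
    """Hypothetical marker for a fault in the plugin itself."""

def run_once(task, invalid_tasks, notify_error):
    """Run one task, applying the error policy; return its result or None."""
    name = task["name"]
    if name in invalid_tasks:
        return None  # invalidated earlier: never run again
    try:
        return task["chain"]()
    except PluginError as exc:
        invalid_tasks.add(name)                 # invalidate the task
        log.error("plugin error in %s: %s", name, exc)
        notify_error(f"{name}: {exc}")          # reported once, since the task will not rerun
    except Exception as exc:
        # Retrieval or extraction failure: log it, keep the task alive.
        log.error("runtime error in %s: %s", name, exc)
    return None
```

Because an invalidated task is skipped on every later pass, the error email is sent only once, while transient retrieval failures simply wait for the next monitoring cycle.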