WebSite-Watcher - Save Time, Stay Informed!

Regular Expressions

Regular expressions (regex) can be used in various features, for example to define ignore filters, watch filters, or keywords.

 

With regular expressions you can define complex search and filter expressions. All regular expressions are case-insensitive by default.

Regex functions

Regular Expressions must be placed into one of the following functions:

 

regex( ... )
Filters every occurrence of the given regular expression
For example: regex(\d+ downloads)
 
FirstRegex( ... )
Filters only the first occurrence of the defined regular expression
For example: FirstRegex(\d+ downloads)
 
StartToRegex( ... )
Filters everything from the page beginning to the first occurrence of the given Regular Expression
For example: StartToRegex(\d+ visitors)
 
RegexToRegex( ... , ... )
Filters everything between two Regular Expressions
For example: RegexToRegex(Downloads\: \d+,License\:)
 
RegexToEnd( ... )
Filters everything from the last occurrence of the given Regular Expression to the end of the page
For example: RegexToEnd(\d+ users online)
 
RegexCmp( ... )
Finds a defined regular expression, extracts all digits from the result and compares them with a pre-defined number. This can be used, for example, to extract and compare prices, e.g. to only find a match when a certain price is higher than 1000.
For example: RegexCmp(\d+([,\.]\d+)* Euro;,; > 1000)
The regexcmp function can be used in the Keywords functionality, Ignore filters and Watch filters. A detailed description can be found below.
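
The behavior of the range functions above can be approximated outside the program. The following is a minimal sketch in Python, not WebSite-Watcher's own implementation: the helper names are hypothetical, Python's `re` engine may differ in details from the one used by the program, and whether the matched text itself is included in the filtered range is an assumption.

```python
import re

# The article states that expressions are case-insensitive by default.
FLAGS = re.IGNORECASE


def first_regex(pattern, text):
    """Text of the first match only, or None."""
    m = re.search(pattern, text, FLAGS)
    return m.group(0) if m else None


def start_to_regex(pattern, text):
    """Everything from the page start through the first match (assumed inclusive)."""
    m = re.search(pattern, text, FLAGS)
    return text[:m.end()] if m else None


def regex_to_regex(start_pattern, end_pattern, text):
    """Everything from the first start match through the next end match."""
    m1 = re.search(start_pattern, text, FLAGS)
    if not m1:
        return None
    m2 = re.compile(end_pattern, FLAGS).search(text, m1.end())
    return text[m1.start():m2.end()] if m2 else None


def regex_to_end(pattern, text):
    """Everything from the last match to the end of the page."""
    last = None
    for last in re.finditer(pattern, text, FLAGS):
        pass  # keep iterating; `last` ends up holding the final match
    return text[last.start():] if last else None


page = ("Intro 12 visitors | Downloads: 340 | License: free | "
        "3 users online, 7 users online now")
print(start_to_regex(r"\d+ visitors", page))
print(regex_to_regex(r"Downloads\: \d+", r"License\:", page))
print(regex_to_end(r"\d+ users online", page))
```

Each helper returns the text that the corresponding filter would cover, which makes it easy to check an expression against a saved copy of a page before entering it in the program.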

Tokens of Regular Expressions

Below you can find a list of useful Regular Expression tokens.

 

\

The backslash escapes any character and can therefore be used to force characters to be matched as literals instead of being treated as characters with special meaning. For example, '\[' matches '[' and '\\' matches '\'.

.

A dot matches any single character. For example, 'go.d' matches 'gold' and 'good'.

{ }

{n} ... Match exactly n times

{n,} ... Match at least n times

{n,m} ... Match at least n but not more than m times

[ ]

A string enclosed in square brackets matches any single character in that string, but no others. For example, '[xyz]' matches only 'x', 'y', or 'z'. A range of characters may be specified by two characters separated by '-'. Note that '[a-z]' matches alphabetic characters, while '[z-a]' never matches.

[-]

A hyphen within the brackets signifies a range of characters. For example, [b-o] matches any character from b through o.

|

A vertical bar matches either expression on either side of the vertical bar. For example, bar|car will match either bar or car.

*

An asterisk after a character or expression matches any number of occurrences of it, including zero. For example, bo* matches b, bo, boo and booo.

+

A plus sign after a character or expression matches one or more occurrences of it, but not zero. For example, bo+ matches bo, boo and booo, but not b or be.

\d+

matches all numbers with one or more digits

\d*

matches zero or more digits (so it also matches where no digit is present)

\w+

matches all words with one or more characters containing a-z, A-Z and 0-9. \w+ will find title, border, width etc. Please note that \w matches only numbers and characters (a-z, A-Z, 0-9) lower than ordinal value 128.

\s

matches a whitespace character (space, tab and carriage return/line feed)

.*?

finds as few characters as possible.

a.*?b means: find "a", followed by as few characters as possible, followed by "b".

[a-zA-Z\xA1-\xFF]+

matches all words with one or more characters containing a-z, A-Z and characters with an ordinal value of 161 or higher (e.g. ä or Ü). If you want to find words with numbers, then add 0-9 to the expression: [0-9a-zA-Z\xA1-\xFF]+
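
The tokens above can be tried out with any regex engine. The sketch below uses Python's `re` module, which is an assumption made for illustration: WebSite-Watcher's engine may differ in details (for example, Python rejects a reversed range such as '[z-a]' with an error instead of never matching). The helper `matches` is hypothetical.

```python
import re


def matches(pattern, text, flags=re.IGNORECASE):
    """True if the whole text matches the pattern (case-insensitive by default)."""
    return re.fullmatch(pattern, text, flags) is not None


checks = {
    "go.d / gold": matches(r"go.d", "gold"),
    "bo* / b": matches(r"bo*", "b"),      # zero occurrences of "o" are allowed
    "bo+ / b": matches(r"bo+", "b"),      # at least one "o" is required
    "bo+ / bo": matches(r"bo+", "bo"),
    "bar|car / car": matches(r"bar|car", "car"),
    "[b-o] / k": matches(r"[b-o]", "k"),
    "x{2,3} / xxxx": matches(r"x{2,3}", "xxxx"),
    # \w restricted to ASCII does not cover umlauts, the explicit range does
    r"\w+ / Übung": matches(r"\w+", "Übung", re.IGNORECASE | re.ASCII),
    r"[a-zA-Z\xA1-\xFF]+ / Übung": matches(r"[a-zA-Z\xA1-\xFF]+", "Übung"),
}

greedy = re.search(r"a.*b", "a1b2b").group(0)   # as many characters as possible
lazy = re.search(r"a.*?b", "a1b2b").group(0)    # as few characters as possible

for label, result in checks.items():
    print(f"{label}: {result}")
print(greedy, lazy)
```

The last two searches show the difference between `.*` and `.*?` on the same input: the greedy form runs to the last "b", the lazy form stops at the first one.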

RegexCmp(...)

The RegexCmp function finds a defined regular expression, extracts all digits from the result and compares them with a pre-defined number. If the comparison returns true, the match will be accepted.

This function requires 3 parameters (separated by the ; character). The exact syntax is:

 

   regexcmp(regular expression; decimal point character; operator number)

 

Parameters:
 

regular expression
This regular expression extracts defined numbers from a page. The result can contain characters and numbers, for example a regular expression that finds "Price: 49,00 Euro". The regexcmp function will then extract all digits from the found result and compare the extracted number.
 
decimal point character
Defines whether a dot or a comma is used as decimal point character in the page. Valid parameter characters are "." and "," (without quotes).
 
operator
valid operators:
= ... equal
< ... less than
<= ... less than or equal to
> ... greater than
>= ... greater than or equal to
<> ... not equal
 
number
The number defines the number for the comparison and can optionally contain a decimal point character, for example 49,95 or 49.95. Thousands separators are not allowed.

 

Example:

  regexcmp(\d+([,\.]\d+)* Euro;,; > 49.95)

 

The first parameter searches for the regular expression "\d+([,\.]\d+)* Euro" and extracts all digits from the found result (incl. the decimal point character), for example 1449,95.
The second parameter defines which character is used as decimal point character; in this example it is the character ",".
The third parameter checks whether the price is higher than 49.95.
If the extracted price is lower than or equal to 49.95, the found match is omitted. If the extracted price is higher than 49.95, the found match is accepted.
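
The three steps described above (find, extract digits, compare) can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: the function name `regex_cmp` and its argument layout are hypothetical, and skipping results that do not yield a parsable number is an assumed behavior.

```python
import operator
import re
from decimal import Decimal, InvalidOperation

# Operators listed in the article, mapped to Python comparisons.
OPERATORS = {
    "=": operator.eq, "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge, "<>": operator.ne,
}


def regex_cmp(pattern, decimal_char, op, number, text):
    """Return the matches whose extracted number satisfies the comparison."""
    # The comparison number may be written as 49,95 or 49.95.
    threshold = Decimal(number.strip().replace(",", "."))
    accepted = []
    for m in re.finditer(pattern, text, re.IGNORECASE):
        found = m.group(0)
        # Keep only digits and the decimal point character, so a result
        # like "1.449,95 Euro" becomes "1449,95".
        kept = "".join(ch for ch in found
                       if ch in "0123456789" or ch == decimal_char)
        try:
            value = Decimal(kept.replace(decimal_char, "."))
        except InvalidOperation:
            continue  # assumption: results without a usable number are skipped
        if OPERATORS[op](value, threshold):
            accepted.append(found)
    return accepted


page = "Mouse: 19,99 Euro. Laptop: 1.449,95 Euro. Cable: 49,95 Euro."
pattern = r"\d+([,\.]\d+)* Euro"
print(regex_cmp(pattern, ",", ">", "49.95", page))
```

`Decimal` is used instead of `float` so that a price such as 49,95 compares exactly against the threshold 49.95.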

Typical examples

 

regex(bo*)
will find "b", "bo", "boo", "booooo"
 
regex(bx+)
will find "bxxxxxxxx", "bxx", "bx" but not "b"
 
regex(\d+)
will find all numbers
 
regex(\d+ visitors)
will find "3 visitors" or "243234 visitors" or "2763816 visitors"
 
regex(\d+ of \d+ messages)
will find "2 of 1200 messages" or "1 of 10 messages"
 
RegexToEnd(\d+ of \d+ messages)
will filter everything from the last occurrence of a match (such as "2 of 1200 messages" or "1 of 10 messages") to the end of the page
 
regex(MyText.{0,20})
will find "MyText" and up to 20 characters after "MyText"
 
regex(\d\d.\d\d.\d\d\d\d)
will find date-strings with format 99.99.9999 or 99-99-9999 (the dot in the regex matches any character)
 
regex(\d\d\.\d\d\.\d\d\d\d)
will find date-strings with format 99.99.9999
 
regex(([_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+))
will find e-mail addresses in the common name@domain format
 
regexcmp(\d+([,\.]\d+)* Euro;,; > 49.95)
will find all prices with format "9.999,99 Euro" and only accept results with prices higher than 49,95 Euro
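
Some of the expressions above can be checked against sample text before they are used in a filter. The following sketch uses Python's `re` module as a stand-in engine, which is an assumption; the helper `find_all` and the sample text are made up for illustration.

```python
import re

DATE_ANY_SEPARATOR = r"\d\d.\d\d.\d\d\d\d"    # unescaped dot: any separator
DATE_DOTS_ONLY = r"\d\d\.\d\d\.\d\d\d\d"      # escaped dot: literal "." only
EMAIL = r"([_a-zA-Z\d\-\.]+@[_a-zA-Z\d\-]+(\.[_a-zA-Z\d\-]+)+)"
MYTEXT = r"MyText.{0,20}"


def find_all(pattern, text):
    """All non-overlapping matches as strings, case-insensitive."""
    return [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]


sample = "Released 24.12.2017, updated 03-01-2018. Contact: support@example.com"

print(find_all(DATE_ANY_SEPARATOR, sample))
print(find_all(DATE_DOTS_ONLY, sample))
print(find_all(EMAIL, sample))
print(find_all(MYTEXT, "MyText abc"))
```

The two date expressions differ only in the escaped dot, and the last call shows that `.{0,20}` also matches when fewer than 20 characters follow.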






Copyright © 2000-2018 Aignesberger Software GmbH
www.aignes.com