You can use Adelia Studio's Find/Replace functions to search using regular expressions.
Regular expressions can also be used to define the rules of a Quality manager user plugin.
To do this, Adelia Studio uses the PCRE library, which is compatible with Perl regular expressions.
A regular expression is a mask that is applied to a subject string from left to right. Most characters in a mask simply match themselves.
You can use the regular expression search feature of the Adelia Studio editor to select sections of text whose structure matches the expression. To some extent, you can also process that text to produce a replacement value.
The syntax of the expressions you can use is that of this Perl-compatible library.
Metacharacters
The power of regular expressions comes from their ability to express alternatives and repetitions in the mask. These are written using metacharacters, which do not match themselves but are interpreted in a particular way.
There are two types of metacharacter:
Metacharacters that are recognized anywhere within a mask except between square brackets,
Metacharacters that are recognized between square brackets (within a character class definition).
The following metacharacters are used outside square brackets:
| Character | Name | Meaning |
|---|---|---|
| \ | Backslash | Escape character |
| ^ | Circumflex accent | Start of a string |
| $ | Dollar | End of a string |
| . | Dot | Any character |
| [ | Opening square bracket | Start of a character class definition |
| ] | Closing square bracket | End of a character class definition |
| \| | Vertical bar | Definition of an alternative (or) |
| ( | Opening parenthesis (round bracket) | Start of a submask |
| ) | Closing parenthesis (round bracket) | End of a submask |
| ? | Question mark | Equivalent to the quantifier {0,1} (zero or one) |
| * | Asterisk | Equivalent to the quantifier {0,} (zero or more) |
| + | Plus sign | Equivalent to the quantifier {1,} (one or more) |
| { | Opening brace | Start of a quantifier |
| } | Closing brace | End of a quantifier |
Mask sections enclosed between square brackets are known as "character classes".
Only the following characters have a special meaning inside character classes:
| Character | Name | Meaning |
|---|---|---|
| \ | Backslash | Escape character |
| ^ | Circumflex accent | Class negation (must be the first character) |
| - | Minus sign | Used to express character ranges |
| ] | Closing square bracket | End of a character class definition |
Using metacharacters
Backslash \
This section only covers the most common uses of the "\" character.
If it is followed by a non-alphanumeric character, it acts as an escape character for that character. This lets you match a metacharacter literally, without its special meaning.
For example, to search for a string containing the "*" character, you must specify "\*"; otherwise, "*" is interpreted as a quantifier.
The backslash character can also be used to specify generic character types:
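The difference between an escaped and an unescaped "*" can be seen in a minimal sketch using Python's `re` module. This runs outside Adelia Studio and is only an illustration; it assumes that Python's syntax for these simple masks behaves like PCRE's, which holds for the constructs shown.

```python
import re

# Unescaped, "*" is a quantifier: "a*b" means "zero or more a, then b".
quantified = re.search(r"a*b", "aaab")

# Escaped, "\*" loses its special meaning and matches a literal asterisk.
literal = re.search(r"a\*b", "total = a*b")

print(quantified.group(0))  # aaab
print(literal.group(0))     # a*b
```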
| Expression | Meaning |
|---|---|
| \d | Any decimal digit |
| \D | Any character other than a decimal digit |
| \s | Any whitespace (blank) character |
| \S | Any character other than a whitespace character |
| \w | Any "word" character (letter, digit or underscore) |
| \W | Any character other than a "word" character |
These sequences can be used either inside or outside character classes. Each one matches a single character of the relevant type.
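A short sketch of the generic character types, again using Python's `re` module as a stand-in for the PCRE engine (an assumption for illustration; the sample strings are invented). It also shows `\d` used inside a character class.

```python
import re

# \d: one decimal digit, so "\d\d" only finds two-digit runs.
two_digits = re.findall(r"\d\d", "Lot 42, ref 7")
# \w+: runs of letters, digits and underscores.
words = re.findall(r"\w+", "num_bin_2 = x1;")
# Inside a class: [\d,] means "a digit or a comma".
numbers = re.findall(r"[\d,]+", "1,000 and 25")
# \s+: any run of blank characters (spaces, tabs) replaced by one space.
collapsed = re.sub(r"\s+", " ", "a \t  b")

print(two_digits)  # ['42']
print(words)       # ['num_bin_2', 'x1']
print(numbers)     # ['1,000', '25']
print(collapsed)   # a b
```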
Circumflex accent ^ and Dollar $
When used outside a character class, the "^" character anchors the search: the matched text must begin at the start of the line. It must be the first character of the regular expression, or of each of its top-level alternatives.
Inside a character class, a leading "^" negates the class: the class then matches any character that does not belong to it.
The "$" character anchors the search at the other end: the matched text must finish at the end of the line. It must be the last character of the regular expression, or of each of its top-level alternatives.
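The following sketch illustrates both anchors and class negation with Python's `re` module. It assumes, for illustration only, that each editor line is searched as a separate subject string; the sample lines are hypothetical.

```python
import re

lines = ["abc", "xabc", "abcx"]

# Each line is treated as its own subject string (an assumption for this sketch).
starts_with_abc = [s for s in lines if re.search(r"^abc", s)]
ends_with_abc = [s for s in lines if re.search(r"abc$", s)]

# Inside a class, a leading "^" negates it: runs of non-digit characters.
non_digits = re.findall(r"[^0-9]+", "ab12cd")

print(starts_with_abc)  # ['abc', 'abcx']
print(ends_with_abc)    # ['abc', 'xabc']
print(non_digits)       # ['ab', 'cd']
```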
Dot .
Outside a character class, a dot matches any single character.
Inside a character class, the dot has no special meaning and matches only a literal dot.
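A minimal comparison of the dot outside and inside a character class, using Python's `re` module as a stand-in for PCRE (illustrative only):

```python
import re

text = "bat bit b.t"
# Outside a class, "." matches any single character.
any_char = re.findall(r"b.t", text)
# Inside a class, "." is an ordinary character: only a literal dot matches.
dot_only = re.findall(r"b[.]t", text)

print(any_char)  # ['bat', 'bit', 'b.t']
print(dot_only)  # ['b.t']
```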
Square brackets [ ]
An opening square bracket "[" introduces a character class and a closing square bracket "]" ends it. If a closing bracket is required inside a character class, it must be preceded with the escape character "\".
A character class matches a single character of the subject string. If the first character of the class is the negation character "^", the subject character must instead not belong to the class. If a literal "^" is required in the class, it must be preceded with the escape character "\".
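The sketch below shows a range, a negated class, and the escaped "]" and "^" characters inside a class. It uses Python's `re` module to illustrate the PCRE syntax described above; it is not the Adelia Studio engine itself.

```python
import re

ranged = re.findall(r"[a-c]", "abcdef")    # one character from a to c
negated = re.findall(r"[^a-c]", "abcdef")  # one character NOT from a to c
bracket = re.findall(r"[\]x]", "a]x")      # "]" escaped inside the class
caret = re.findall(r"[\^x]", "2^x")        # "^" escaped, so it is literal

print(ranged)   # ['a', 'b', 'c']
print(negated)  # ['d', 'e', 'f']
print(bracket)  # [']', 'x']
print(caret)    # ['^', 'x']
```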
Vertical bar |
The vertical bar "|" is used to separate alternatives. It behaves as an "or" logic operator. The alternatives are evaluated from left to right, and the first alternative that matches is used.
For example, "num_bin_2|num_bin_4" would detect occurrences of either "num_bin_2" or "num_bin_4".
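The alternation example above, plus the left-to-right evaluation rule, can be checked with Python's `re` module (used here as a stand-in for PCRE; the sample text is invented):

```python
import re

text = "num_bin_2 num_bin_3 num_bin_4"
found = re.findall(r"num_bin_2|num_bin_4", text)

# At a given position the alternatives are tried left to right, and the first
# one that matches is kept, even if a later one would match more text.
first = re.match(r"num|num_bin", "num_bin_2").group(0)

print(found)  # ['num_bin_2', 'num_bin_4']
print(first)  # num
```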
Submasks
Submasks are enclosed between parentheses, and can be nested.
Submasks can be added in order to:
Mark out alternatives. For example, the mask "num_bin_(2|4)" accepts the words "num_bin_2" and "num_bin_4".
Capture "subexpressions". When a string matches the complete mask, the substrings matched by the submasks are returned to the caller in a vector. Submasks are numbered by counting their opening parentheses from left to right, starting from 1.
In some situations, in particular when you are working with alternatives, you may not want to capture the submasks. In such cases, you should use (?:submask) instead of (submask).
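Capturing, non-capturing and nested submasks can be sketched with Python's `re` module, whose group numbering follows the same opening-parenthesis rule as PCRE. This is an illustration under that assumption, not the editor's own API.

```python
import re

m = re.search(r"num_bin_(2|4)", "x = num_bin_4;")
whole = m.group(0)  # the whole match
alt = m.group(1)    # submask 1

# (?:...) groups the alternatives without capturing them.
no_groups = re.search(r"num_bin_(?:2|4)", "x = num_bin_4;").groups()

# Numbering follows the opening parentheses, from left to right.
nested = re.search(r"((a)(b))c", "abc").groups()

print(whole, alt)  # num_bin_4 4
print(no_groups)   # ()
print(nested)      # ('ab', 'a', 'b')
```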
Repetitions
Repetitions are specified using quantifiers that can be placed after a single character, a character class or a submask.
Quantifiers specify a minimum and maximum number of repetitions, represented by two numbers in braces, separated by a comma. These numbers must be less than 65,536, and the first number must be less than or equal to the second.
If the second number is omitted, but the comma is included, the quantifier is interpreted as indicating no upper limit. If the second number and the comma are missing, the quantifier represents the exact number of expected repetitions.
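The three forms of brace quantifier can be sketched as follows. The helper `matches` is hypothetical and uses Python's `re.fullmatch`, which requires the whole string to match (equivalent to surrounding the mask with "^" and "$"); Python stands in for PCRE here.

```python
import re

def matches(mask: str, s: str) -> bool:
    """Hypothetical helper: True if the whole string s matches mask."""
    return re.fullmatch(mask, s) is not None

exact = matches(r"\d{4}", "2024")       # exactly 4 digits
too_short = matches(r"\d{4}", "202")    # only 3 digits
open_ended = matches(r"a{2,}", "aaaa")  # 2 or more, no upper limit
below_min = matches(r"a{2,}", "a")      # fewer than 2
too_many = matches(r"a{1,3}", "aaaa")   # more than 3

print(exact, too_short, open_ended, below_min, too_many)  # True False True False False
```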
The following shortcuts can be used for certain special quantifiers:
| Character | Name | Meaning |
|---|---|---|
| ? | Question mark | Equivalent to the quantifier {0,1} (zero or one) |
| * | Asterisk | Equivalent to the quantifier {0,} (zero or more) |
| + | Plus sign | Equivalent to the quantifier {1,} (one or more) |
By default, quantifiers are greedy: they match as many repetitions as possible while still allowing the rest of the mask to match.
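Greedy matching can be observed with Python's `re` module, whose default quantifiers are greedy like PCRE's (an illustration with invented strings):

```python
import re

# ".*" first consumes as much as it can, so the match runs to the LAST quote.
greedy = re.search(r"'.*'", "a = 'x' + 'y'").group(0)

# Greedy does not make the match fail: \d+ gives back one digit so that the
# final (\d) can still match.
split = re.fullmatch(r"(\d+)(\d)", "1234").groups()

print(greedy)  # 'x' + 'y'
print(split)   # ('123', '4')
```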
Using metacharacters in the find and replace function
The Replace function lets you reference the values captured by subexpressions, so the replacement can be built from parts of the matched text.
Captured subexpressions are referenced by $<number>, where <number> is the subexpression number. $0 refers to the whole matched string.
To specify a literal "$" sign in a Replace expression that uses regular expressions, you must double it (i.e. "$$").
For example, the expression '(\d\d\d\d)/(\d\d)/(\d\d)' combined with the replacement '$3/$2/$1' converts a date from year-first format (YYYY/MM/DD) to the French format (DD/MM/YYYY).
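Python's `re.sub` uses `\1` rather than `$1` in replacement strings, so the sketch below defines a hypothetical helper, `dollar_replace`, that emulates the $<number> and "$$" rules described above. It is an illustration of those rules, not Adelia Studio's implementation, and it assumes that the single quotes in the date example are literal characters of the mask.

```python
import re

def dollar_replace(pattern: str, template: str, text: str) -> str:
    """Hypothetical helper: replace every match of pattern in text, expanding
    $<number> to the captured subexpression and $$ to a literal "$"."""
    def expand(m: re.Match) -> str:
        def token(t: re.Match) -> str:
            if t.group(0) == "$$":
                return "$"                       # doubled dollar: literal "$"
            return m.group(int(t.group(1))) or ""  # $0 = whole match
        # "$$" is listed first so that "$$1" yields "$1", not "$" + group 1.
        return re.sub(r"\$\$|\$(\d+)", token, template)
    return re.sub(pattern, expand, text)

print(dollar_replace(r"'(\d\d\d\d)/(\d\d)/(\d\d)'", "'$3/$2/$1'", "d = '2024/03/15'"))
# d = '15/03/2024'
print(dollar_replace(r"(\d+)", "$$$1", "cost 5"))  # cost $5
print(dollar_replace(r"\d+", "[$0]", "a1b22"))      # a[1]b[22]
```

Referencing a subexpression number that does not exist in the mask raises an `IndexError` in this sketch.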
For more information about regular expressions, refer to the complete Perl documentation:
Perl documentation: http://perldoc.perl.org/perlre.html