Regular Expressions

You can use Adelia Studio's Find/Replace functions to search for regular expressions.

Regular expressions can also be used to define rules of a Quality manager user plugin.

Adelia Studio implements regular expressions using the PCRE library, which is compatible with PERL regular expressions.

A regular expression is a mask applied to a subject string, running from left to right. Most characters represent themselves.

You can use the Adelia Studio editor's regular expression search feature to select sections of text with the structure specified by the expression, and to an extent, perform processing on this text in order to produce a substitute value.

Adelia Studio uses a library compatible with the PERL regular expression management system. The syntax for the usable expressions derives from this library.

Metacharacters

The power of regular expressions derives from their ability to allow alternatives and repetition quantifiers in the mask. They are coded in the mask using metacharacters, which do not represent themselves, but are interpreted in a particular way.

There are two types of metacharacter:

Metacharacters that are recognized anywhere within a mask except between square brackets,
Metacharacters that are recognized between square brackets (within a character class definition).

The following metacharacters are used outside square brackets:

	Character	Meaning
\	Backslash	Escape character
^	Circumflex accent	Start of a string
$	Dollar	End of a string
.	Dot	Any character
[	Opening square bracket	Start of a character class definition
]	Closing square bracket	End of a character class definition
|	Vertical bar	Definition of an alternative (or)
(	Opening parenthesis (round bracket)	Start of a submask
)	Closing parenthesis (round bracket)	End of a submask
?	Question mark	Equivalent to the quantifier {0,1} (zero or one)
*	Asterisk	Equivalent to the quantifier {0,} (zero or more)
+	Plus sign	Equivalent to the quantifier {1,} (one or more)
{	Opening brace	Start of a quantifier
}	Closing brace	End of a quantifier

Mask sections enclosed between square brackets are known as "character classes".

Only the following metacharacters can be used in character classes:

	Character	Meaning
\	Backslash	Escape character
^	Circumflex accent	Class negation (must be the first character)
-	Minus sign	Used to express character ranges
]	Closing square bracket	End of a character class definition

Using metacharacters

Backslash \

This section only covers the most common uses of the "\" character.

If it is followed by a non-alphanumeric character, it acts as an escape character for that character. This lets you specify a metacharacter without giving it any special meaning.

For example, to search for a string containing the "*" character, you must specify "\*"; otherwise, "*" is interpreted as a quantifier.
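The escaping rule can be tried out with a short sketch. It uses Python's standard `re` module as a stand-in for the PCRE engine, which is an assumption made for illustration; the sample text is hypothetical.

```python
import re

# Hypothetical sample text containing a literal asterisk.
text = "total = price * quantity"

# "\*" matches a literal asterisk instead of acting as a quantifier.
literal_star = re.compile(r"\*")
star_positions = [m.start() for m in literal_star.finditer(text)]

# An unescaped "*" with nothing before it is rejected as an invalid pattern.
try:
    re.compile("*")
    bare_star_is_valid = True
except re.error:
    bare_star_is_valid = False

# re.escape builds the escaped form of an arbitrary literal string.
escaped = re.escape("a*b")
```

Here `star_positions` holds the offset of the single asterisk, and `escaped` is the pattern `a\*b`, which matches the three characters "a*b" only.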

The backslash character can also be used to specify generic value types:

Expression	Meaning
\d	Any decimal digit
\D	Any character other than a decimal digit
\s	Any whitespace character
\S	Any character other than a whitespace character
\w	Any "word" character (i.e. letter, digit or underscore)
\W	Any character other than a "word" character

These character sequences can be used either inside or outside character classes. They act as substitutes for a character of the relevant type.
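As an illustration of these generic types, the following sketch applies them to a hypothetical line of text. Python's `re` module is assumed here as a stand-in for PCRE; for plain ASCII input the two behave the same way on these sequences.

```python
import re

# Hypothetical sample line.
line = "count_1 = 42"

digits = re.findall(r"\d", line)         # single decimal digits
words = re.findall(r"\w+", line)         # runs of "word" characters
non_blank = re.findall(r"\S+", line)     # runs of non-whitespace characters
in_class = re.findall(r"[\d_]+", line)   # "\d" used inside a character class
```

The results are `['1', '4', '2']`, `['count_1', '42']`, `['count_1', '=', '42']` and `['_1', '42']` respectively.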

Circumflex accent ^ and Dollar $

When used outside a character class, the "^" character anchors the search expression to the start of a line: the match must begin there. This character must be the first character of the regular expression, or of each of its top-level alternatives.

Used inside a character class, "^" expresses a negation of the class (i.e. any character that does not belong to the class).

The "$" character anchors the search expression to the end of a line: the match must finish there. This character must be the last character of the regular expression, or of each of its top-level alternatives.

Example 1

The expression "^num_\w+" detects any word beginning with "num_" placed at the start of a line (in Adelia Studio, this expression can be used to detect all declarations of numerical variables).

Example 2

The expression "^num_bin_2|^num_bin_4" can be used to detect all declarations of binary numerical variables.

Example 3

The expression "^\s*\*\s.*$" can be used to detect remark lines and select the whole line (start of line + optional spaces + '*' character + space + any number of any character + end of line).
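The anchored patterns from these examples can be checked with the sketch below. It assumes Python's `re` module in place of PCRE, and the source lines are hypothetical. The `re.MULTILINE` flag is needed in Python so that "^" and "$" apply to every line, which is how an editor search over lines is assumed to behave.

```python
import re

# Hypothetical source text, one statement per line.
source = "\n".join([
    "num_bin_2 counter",
    "* a remark line",
    "  * an indented remark",
    "alpha num_bin_4",
    "num_bin_4 total",
])

# Words beginning with "num_" at the start of a line.
declarations = re.findall(r"^num_\w+", source, flags=re.MULTILINE)

# Either alternative, each anchored to the start of a line.
binaries = re.findall(r"^num_bin_2|^num_bin_4", source, flags=re.MULTILINE)

# Whole remark lines: optional spaces, "*", one space, rest of the line.
remarks = re.findall(r"^\s*\*\s.*$", source, flags=re.MULTILINE)
```

The line "alpha num_bin_4" is not reported by the first two searches because "num_bin_4" does not start the line. `remarks` contains the two remark lines in full, including the leading spaces of the indented one.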

Dot .

A dot can be used outside a character class to replace any single character.

The dot does not behave in a special way when used in a character class.

Square brackets [ ]

An opening square bracket "[" introduces a character class and a closing square bracket "]" ends it. If a closing bracket is required inside a character class, it must be preceded by the escape character "\".

A character class matches a single character in the subject string. That character must belong to the class, unless the first character in the class is the negation character "^", in which case the character must not belong to the class. If a literal "^" is required in the class, it must be preceded by the escape character "\".

Example 1

The expression "[A-Z]" matches a single unaccented uppercase letter (or a letter of either case if the "Match case" option is not checked).

Example 2

The expression "[^aeiouAEIOU]" can be used to select any non-vowel character.
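A minimal sketch of these character classes follows, assuming Python's `re` module as a stand-in for PCRE. The `re.IGNORECASE` flag is used as an approximation of unchecking the "Match case" option; that mapping is an assumption, not a documented equivalence.

```python
import re

# Range class: unaccented uppercase letters only.
uppercase = re.findall(r"[A-Z]", "Regular Expressions")

# Same class with case ignored.
any_case = re.findall(r"[A-Z]", "aBc", flags=re.IGNORECASE)

# Negated class: any character that is not a vowel.
non_vowels = re.findall(r"[^aeiouAEIOU]", "Adelia")

# Escaped "]" and "^" are taken literally inside a class.
literals = re.findall(r"[\]\^]", "a]b^c")
```

The four results are `['R', 'E']`, `['a', 'B', 'c']`, `['d', 'l']` and `[']', '^']`.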

Vertical bar |

The vertical bar "|" is used to separate alternatives. It behaves as a logical "or" operator. The alternatives are evaluated from left to right, and the first one that matches determines the result.

For example, "num_bin_2|num_bin_4" would detect occurrences of either "num_bin_2" or "num_bin_4".
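The left-to-right evaluation of alternatives can be observed with the sketch below, which assumes Python's `re` module in place of PCRE (both try alternatives in order). The sample strings are hypothetical.

```python
import re

# Either alternative is accepted.
pattern = re.compile(r"num_bin_2|num_bin_4")
found = pattern.findall("num_bin_2 a, num_bin_4 b, num_bin_8 c")

# The first alternative that matches wins, even when a later
# alternative would match a longer piece of text.
short_first = re.search(r"num|num_bin", "num_bin_2").group()
long_first = re.search(r"num_bin|num", "num_bin_2").group()
```

`found` is `['num_bin_2', 'num_bin_4']`, `short_first` is `'num'` and `long_first` is `'num_bin'`. When one alternative is a prefix of another, placing the longer one first is usually what is wanted.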

Submasks

Submasks are enclosed between parentheses, and can be nested.

Submasks can be added in order to:

Mark out alternatives.
For example, the mask "num_bin_(2|4)" accepts the words "num_bin_2" and "num_bin_4".
Capture "subexpressions".
Where a string is accepted by the complete mask, the text matched by each submask is returned to the caller in a submask vector. Submasks are numbered by counting their opening parentheses from left to right, starting from 1.

If the string "the sun king" is used with the mask "the ((sun|charming) (king|prince))", the following submasks would be captured:

"sun king", "sun", and "king", numbered 1, 2, and 3, respectively.

This capture can then be used to refer to expressions in the substitute string.

In this example, replacing the located value with "the $3 of the $2" would yield the substitute string "the king of the sun".

In some situations, in particular when you are working with alternatives, you may not want to capture the submasks. In such cases, you should use (?:submask) instead of (submask).
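The numbering of submasks can be verified with a short sketch. It assumes Python's `re` module as a stand-in for PCRE, and uses a mask whose alternatives are ordered so that it accepts "the sun king". Note that Python writes group references in the substitute string as `\1`, `\2`, and so on, where the text above uses `$1`, `$2`.

```python
import re

mask = re.compile(r"the ((sun|charming) (king|prince))")

# Groups are numbered by their opening parenthesis, left to right.
captured = mask.search("the sun king").groups()

# "\3" and "\2" play the role of "$3" and "$2" in the substitute string.
replaced = mask.sub(r"the \3 of the \2", "the sun king")

# A capturing submask creates a numbered group ...
capturing = re.search(r"num_bin_(2|4)", "num_bin_4").groups()

# ... whereas (?:...) groups the alternatives without capturing anything.
non_capturing = re.search(r"num_bin_(?:2|4)", "num_bin_4").groups()
```

`captured` is `('sun king', 'sun', 'king')` and `replaced` is `'the king of the sun'`. `capturing` is `('4',)` while `non_capturing` is the empty tuple.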

Repetitions

Repetitions are specified using quantifiers that can be placed after a single character, a character class or a submask.

Quantifiers specify a minimum and maximum number of repetitions, represented by two numbers in braces, separated by a comma. These numbers must be less than 65,536, and the first number must be less than or equal to the second.

If the second number is omitted, but the comma is included, the quantifier is interpreted as indicating no upper limit. If the second number and the comma are missing, the quantifier represents the exact number of expected repetitions.

The following shortcuts can be used for certain special quantifiers:

	Character	Meaning
?	Question mark	Equivalent to the quantifier {0,1} (zero or one)
*	Asterisk	Equivalent to the quantifier {0,} (zero or more)
+	Plus sign	Equivalent to the quantifier {1,} (one or more)

By default, the system searches for the longest possible expressions permitted by the quantifiers.

Example 1

The expression "[aeiou]{3,}" accepts any series of 3 or more consecutive lowercase vowels.

Example 2

The expression "\d{8}" only accepts exactly 8 digits.
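The behavior of quantifiers, including the default preference for the longest match, can be sketched as follows. Python's `re` module is assumed as a stand-in for PCRE, and the sample strings are hypothetical.

```python
import re

# {3,}: three or more vowels; the longest possible run is taken.
vowel_runs = re.findall(r"[aeiou]{3,}", "queue beautiful sea")

# {8}: exactly eight digits when the whole string must match.
eight_ok = re.fullmatch(r"\d{8}", "20240131") is not None
seven_ok = re.fullmatch(r"\d{8}", "2024013") is not None

# {2,4}: between two and four; the maximum is preferred by default.
greedy = re.search(r"a{2,4}", "aaaaaa").group()

# "?" is shorthand for {0,1}: the "u" is optional.
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour")]
```

`vowel_runs` is `['ueue', 'eau']` (the two vowels of "sea" are too few), `eight_ok` is true, `seven_ok` is false, `greedy` is `'aaaa'`, and both spellings in `optional` are accepted.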

Using metacharacters in the find and replace function

The Replace function lets you reference the values captured in subexpressions, allowing the use of complex processes for the replacement.

Captured subexpressions are referenced by $<number>, where <number> is the subexpression number. The expression with the number zero ($0) is the whole found string.

To specify a "$" sign in a Replace expression with regular expressions, you must double it up (i.e. "$$").

For example, the expression " '(\d\d\d\d)/(\d\d)/(\d\d)' " combined with the replacement " '$3/$2/$1' " converts a date from the ISO format (YYYY/MM/DD) to the French format (DD/MM/YYYY).
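The date conversion can be reproduced with the sketch below. It assumes Python's `re` module in place of PCRE, so the replacement syntax differs from the one described above: `\1` stands for `$1`, `\g<0>` stands for `$0`, and a literal "$" needs no doubling in Python. The function name `year_first_to_french` and the sample strings are hypothetical.

```python
import re

DATE = re.compile(r"'(\d\d\d\d)/(\d\d)/(\d\d)'")

def year_first_to_french(text):
    """Rewrite 'YYYY/MM/DD' literals as 'DD/MM/YYYY'."""
    # \3, \2, \1 correspond to $3, $2, $1 in the Replace field.
    return DATE.sub(r"'\3/\2/\1'", text)

converted = year_first_to_french("start = '2024/01/31'")

# \g<0> is the whole found string (the counterpart of $0).
bracketed = re.sub(r"\d+", r"<\g<0>>", "id 42")

# In Python a "$" in the substitute string is already literal.
with_dollar = re.sub(r"(\d+)", r"$\1", "price 42")
```

`converted` is `"start = '31/01/2024'"`, `bracketed` is `'id <42>'` and `with_dollar` is `'price $42'`.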

1. Searching for complete lines containing just two separate words

The regular expression "^\s*(\w+)\s+(\w+)\s*$" can be used to find all lines that contain exactly two "words" separated by spaces (ignoring any spaces at the start and end of the line).

Explanation:

Start of line marker "^", followed by an unspecified number of spaces "\s*", followed by one or more word characters "(\w+)", which are stored (expression 1), followed by one or more spaces (\s+), followed by one or more word characters "(\w+)" which are stored (expression 2), followed by an unspecified number of spaces "\s*" and an end of line marker "$".

Using this expression in conjunction with the Replace expression "$2 $1", you can swap the two words on all the lines.
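A minimal sketch of the word swap, assuming Python's `re` module as a stand-in for PCRE (with `\2 \1` in place of `$2 $1`). The helper names `swap_words` and `swap_all` are hypothetical. Each line is processed separately so that "\s" cannot run across a line break.

```python
import re

TWO_WORDS = re.compile(r"^\s*(\w+)\s+(\w+)\s*$")

def swap_words(line):
    """Swap the two words of a line; other lines are returned unchanged."""
    return TWO_WORDS.sub(r"\2 \1", line)

def swap_all(text):
    """Apply swap_words to every line of a multi-line text."""
    return "\n".join(swap_words(line) for line in text.splitlines())

swapped = swap_all("alpha beta\none two three\n  x  y  ")
```

`swapped` is `"beta alpha\none two three\ny x"`: the three-word line is left alone, and the surrounding spaces of the last line are dropped because they are part of the match but not of the captured words.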

2. Searching for numerical constants

The expression "[+-]?\d+(?:\.\d+)?" can be used to find all the numerical constants in a text.

Explanation: an optional sign "[+-]?", followed by one or more digits "\d+", optionally followed "(?:...)?" by a dot "\." and one or more digits (the decimals).

Here, any decimals are expressed as a subexpression that is not stored.
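The sketch below applies this expression to a hypothetical line of text, again assuming Python's `re` module in place of PCRE. It also shows one practical effect of the non-stored subexpression in Python: `findall` returns the whole match only when the pattern has no capturing group.

```python
import re

NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")

# Hypothetical sample text.
text = "x = -3.14 + 42 - +0.5"

# No capturing group: each result is the complete numerical constant.
constants = NUMBER.findall(text)

# With a capturing group, findall reports only the captured decimals.
decimals_only = re.findall(r"[+-]?\d+(\.\d+)?", text)
```

`constants` is `['-3.14', '42', '+0.5']`, whereas `decimals_only` is `['.14', '', '.5']`. The isolated "+" and "-" operators are not matched because no digit follows them directly.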

For more information about regular expressions, refer to the complete documentation, available on the following site:

PERL documentation: http://perldoc.perl.org/perlre.html
