Regular Expressions

FAR uses Java regular expressions. This section only summarises the most important points; for full details, see the Javadoc of the class java.util.regex.Pattern.

Basics

The following characters have special meaning and must be escaped with the backslash (\) character to be matched literally:

"." (dot), "*" (asterisk), "?" (question mark), "+" (plus), "|" (vertical bar), "\" (backslash), "[" and "]", "{" and "}", "(" and ")", "^", "$" and "-" (hyphen, inside character classes).

Non-alphabetic characters may always be escaped, even if they do not have special meaning, e.g. "\#".
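The effect of escaping can be checked with the standard java.util.regex API. The sketch below is illustrative only: the class EscapingDemo and its helpers are hypothetical names, and the code exercises plain Java, not FAR itself. Note that in Java source code every regex backslash has to be doubled, so the pattern \. is written "\\." there, whereas in FAR's search field you type a single backslash. The last checks show that a backslash before a letter with no defined meaning, such as \y, is rejected by Java, and that Pattern.quote escapes a whole literal string at once.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class EscapingDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex compiles; false if Java rejects its syntax.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In Java source every regex backslash is doubled: the pattern \. is "\\.".
        System.out.println(matches("a.c", "abc"));     // true  (dot = any character)
        System.out.println(matches("a\\.c", "abc"));   // false (escaped dot = literal ".")
        System.out.println(matches("a\\.c", "a.c"));   // true
        System.out.println(matches("\\#1", "#1"));     // true  (escaping "#" is allowed)
        System.out.println(isValid("\\y"));            // false (undefined letter escape)
        // Pattern.quote escapes a whole literal string at once.
        System.out.println(matches(Pattern.quote("1+1=2?"), "1+1=2?")); // true
    }
}
```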

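Alphabetic characters must not be escaped unless the escape denotes a defined construct: Java rejects any other backslash-letter combination (such as "\y") as an error, because these are reserved for future extensions. The following alphabetic characters gain special meaning when preceded by a backslash: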

\t The tab character ('\u0009')
\n The newline (line feed) character ('\u000A')
\r The carriage-return character ('\u000D')
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
\b A word boundary
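The predefined classes can be seen at work by collecting all matches in a sample string. This is a minimal sketch using plain java.util.regex; the class PredefinedClassesDemo and its findAll helper are hypothetical names chosen for illustration. The last line shows that \b restricts "in" to whole words, so the "in" inside "shipping" is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PredefinedClassesDemo {
    // Collects every non-overlapping match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Order 66 shipped\tin 2 boxes";
        System.out.println(findAll("\\d+", text));        // [66, 2]
        System.out.println(findAll("\\w+", text));        // [Order, 66, shipped, in, 2, boxes]
        System.out.println(findAll("\\t", text).size());  // 1 (one tab character)
        // \b: "in" only as a whole word, not inside "shipping"
        System.out.println(findAll("\\bin\\b", "in shipping in")); // [in, in]
    }
}
```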

The dot (.) matches any single character, and "x|y" matches either x or y. For any expression x, the following quantifiers specify how often x must occur:

x	x, exactly once
x*	x, zero or more times
x?	x, once or not at all
x+	x, one or more times
x{n}	x, exactly n times
x{n,}	x, at least n times
x{n,m}	x, at least n but not more than m times

Thus ".*" means any character, any number of times: this expression matches any text, even the empty string.
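The quantifiers and the "|" operator can be tried out with Pattern.matches, which succeeds only if the whole input matches the pattern. The class QuantifierDemo is a hypothetical name; the calls are the standard java.util.regex API rather than FAR's own code.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab?c", "ac"));      // true:  b once or not at all
        System.out.println(matches("ab+c", "ac"));      // false: b at least once
        System.out.println(matches("ab*c", "abbbc"));   // true:  b zero or more times
        System.out.println(matches("a{3}", "aaa"));     // true:  exactly 3 times
        System.out.println(matches("a{2,}", "a"));      // false: at least 2 times
        System.out.println(matches("a{1,3}", "aaaa"));  // false: at most 3 times
        System.out.println(matches(".*", ""));          // true:  .* matches the empty string
        System.out.println(matches("cat|dog", "dog"));  // true:  either alternative
    }
}
```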

"^" denotes the beginning of a line, "$" denotes the end of a line. Patterns may span multiple lines, and line break patterns will automatically be included (this is a special FAR feature).

Character Classes

Generic Character Classes

[abc]	a, b, or c (simple class)
[^abc]	Any character except a, b, or c (negation)
[a-zA-Z]	a through z or A through Z, inclusive (range)
[a-d[m-p]]	a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]	d, e, or f (intersection)
[a-z&&[^bc]]	a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]	a through z, and not m through p: [a-lq-z] (subtraction)
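Each row of the table can be verified by filtering a sample string one character at a time. This is an illustrative sketch; CharClassDemo and filter are hypothetical names, and the character-class syntax is exactly the one from the table, compiled with plain java.util.regex.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Keeps only the characters of input that the single-character class matches.
    static String filter(String charClass, String input) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefmnopqz";
        System.out.println(filter("[abc]", letters));          // abc
        System.out.println(filter("[^abc]", letters));         // defmnopqz
        System.out.println(filter("[a-d[m-p]]", letters));     // abcdmnop   (union)
        System.out.println(filter("[a-z&&[def]]", letters));   // def        (intersection)
        System.out.println(filter("[a-z&&[^bc]]", letters));   // adefmnopqz (subtraction)
        System.out.println(filter("[a-z&&[^m-p]]", letters));  // abcdefqz   (subtraction)
    }
}
```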

POSIX Character Classes (US-ASCII only)

\p{Lower}	A lower-case ASCII character: [a-z]
\p{Upper}	An upper-case ASCII character: [A-Z]
\p{ASCII}	All ASCII characters: [\x00-\x7F]
\p{Alpha}	An alphabetic ASCII character: [\p{Lower}\p{Upper}]
\p{Digit}	A decimal digit: [0-9]
\p{Alnum}	An alphanumeric ASCII character: [\p{Alpha}\p{Digit}]
\p{Punct}	Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}	A visible character: [\p{Alnum}\p{Punct}]
\p{Print}	A printable character: [\p{Graph}\x20]
\p{Blank}	A space or a tab: [ \t]
\p{Cntrl}	A control character: [\x00-\x1F\x7F]
\p{XDigit}	A hexadecimal digit: [0-9a-fA-F]
\p{Space}	A whitespace character: [ \t\n\x0B\f\r]
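The POSIX classes combine with quantifiers like any other class. The sketch below is illustrative (PosixClassDemo is a hypothetical name, and the code uses plain java.util.regex); the last check shows that, by default, these classes cover ASCII only, so an accented letter such as "é" is not matched by \p{Alpha}.

```java
import java.util.regex.Pattern;

public class PosixClassDemo {
    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A six-digit hexadecimal colour code such as #1A2b3C
        System.out.println(matches("#\\p{XDigit}{6}", "#1A2b3C"));      // true
        System.out.println(matches("#\\p{XDigit}{6}", "#12345G"));      // false
        // A capitalised ASCII word
        System.out.println(matches("\\p{Upper}\\p{Lower}*", "Banana")); // true
        // A run of punctuation characters
        System.out.println(matches("\\p{Punct}+", "!?#"));              // true
        // ASCII-only by default: an accented e is not \p{Alpha}
        System.out.println(matches("\\p{Alpha}+", "caf\u00e9"));        // false
    }
}
```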

Capturing Groups

To reuse parts of the matched text in a replacement, you need "capturing groups". Each pair of ordinary parentheses () in a regex pattern defines a capturing group. For example, the following regular expression

my (\w*) ((\w*) is (\w*))

defines four capturing groups, numbered 1 to 4 in the order of their opening parentheses from left to right. Group 0 always denotes the entire match. Applied to the sentence

One of my favourite fruits is banana

the expression will yield the following groups:

0: "my favourite fruits is banana"
1: "favourite"
2: "fruits is banana"
3: "fruits"
4: "banana"
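The group table above can be reproduced with Matcher.group(int) from the standard java.util.regex API. This is a sketch for illustration only; GroupDemo and its groups helper are hypothetical names, and FAR performs this step internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns groups 0..groupCount() of the first match, or an empty array if there is none.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("my (\\w*) ((\\w*) is (\\w*))",
                            "One of my favourite fruits is banana");
        // Prints the same five lines as the table above, e.g. 2: "fruits is banana"
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": \"" + g[i] + "\"");
        }
    }
}
```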

A replacement string may refer to these capturing groups with a backslash (\) followed by the group number. (The Java API itself uses the dollar sign ($) as the group reference indicator; you can change the indicator character in the configuration settings.) Continuing our example, we could use the replacement string

her \1 drinks is \4-shake

which will transform the sentence as follows:

One of her favourite drinks is banana-shake
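For comparison, the same replacement in plain Java uses "$1" and "$4", because the Java API's group reference indicator is the dollar sign rather than FAR's default backslash. The sketch below is illustrative (ReplaceDemo is a hypothetical name); it also shows Matcher.quoteReplacement, which escapes a literal "$" or backslash so that Java does not treat it as a group reference.

```java
import java.util.regex.Matcher;

public class ReplaceDemo {
    // Replaces every match of regex; $n in the replacement inserts capturing group n.
    static String replace(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String sentence = "One of my favourite fruits is banana";
        // Plain Java syntax: $1 and $4 instead of FAR's default \1 and \4.
        System.out.println(replace(sentence,
                "my (\\w*) ((\\w*) is (\\w*))",
                "her $1 drinks is $4-shake"));
        // prints: One of her favourite drinks is banana-shake

        // To insert a literal "$", quote the replacement text first.
        System.out.println(replace("price: 5", "\\d+",
                Matcher.quoteReplacement("$5")));
        // prints: price: $5
    }
}
```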

Replacement strings may also span multiple lines. Just type the replacement text into the text area exactly as you want it to appear, including line breaks and capturing-group references. Note that "\n" is not recognised as a line break in a replacement string; type the line break itself instead.
