RegEx and Wildcard #
Wildcards #
Wildcards are characters that stand in for any number of characters, or for specific character sets, within a string. They're commonly used in search patterns for file names and text.
For instance, in UNIX-like operating systems, the '*' and '?' characters are popular wildcards. Here's a brief explanation of these two wildcard characters:
Asterisk (*): Represents zero or more characters, often used for matching multiple files or text patterns.
Question mark (?): Represents exactly one character, typically used when you want to match a pattern with a variable single character.
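As a rough illustration of the two wildcards, the sketch below uses Python's standard `fnmatch` module, which implements shell-style wildcard matching. The file names are hypothetical and exist only for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ["report.txt", "report1.txt", "report12.txt", "notes.md"]

# '*' stands for zero or more characters.
star_matches = [n for n in names if fnmatchcase(n, "report*.txt")]

# '?' stands for exactly one character.
question_matches = [n for n in names if fnmatchcase(n, "report?.txt")]

print(star_matches)      # ['report.txt', 'report1.txt', 'report12.txt']
print(question_matches)  # ['report1.txt']
```

`fnmatchcase` is used rather than `fnmatch` so that the comparison is case-sensitive on every platform.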
RegEx #
Regular expressions are a powerful tool for pattern matching and searching within text. They allow you to define a search pattern and apply it to a string to find, extract, or modify the desired text. RegEx is available in most programming languages, such as Python, JavaScript, Java, and Perl.
Basic RegEx Concept #
Literal characters: Match the exact character(s) in the string.
Example: The RegEx pattern "abc" would match the substring "abc" in the string "abcdef".
Metacharacters: Special characters that have a specific meaning in the context of a regular expression.
Example: The dot (.) is a metacharacter that matches any single character except a newline.
Character classes: Enclosed in square brackets [], they define a set of characters to match.
Example: The RegEx pattern "[aeiou]" would match any single vowel in the string "hello world".
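The three concepts above can be tried out with Python's built-in `re` module. This is a minimal sketch: the sample string used for the dot is an assumption chosen to show that the dot does not match a newline.

```python
import re

# Literal characters: "abc" is found at the start of "abcdef".
literal = re.search(r"abc", "abcdef")
literal_span = literal.span()
print(literal_span)  # (0, 3)

# Metacharacter: '.' matches any single character except a newline,
# so "a\nc" in the sample text is not matched.
dot_matches = re.findall(r"a.c", "abc a-c a\nc")
print(dot_matches)  # ['abc', 'a-c']

# Character class: each vowel in "hello world" is matched individually.
vowels = re.findall(r"[aeiou]", "hello world")
print(vowels)  # ['e', 'o', 'o']
```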
RegEx Quantifiers #
Zero or one: A question mark (?) indicates that the preceding character or group may appear zero or one time.
Example: The RegEx pattern "colou?r" matches both "color" and "colour".
Zero or more: An asterisk (*) indicates that the preceding character or group may appear zero or more times.
Example: The RegEx pattern "ab*c" matches "ac", "abc", "abbc", and so on.
One or more: A plus sign (+) indicates that the preceding character or group must appear at least once.
Example: The RegEx pattern "ab+c" matches "abc", "abbc", but not "ac".
Specific number: Curly braces {} contain a number to indicate how many times the preceding character or group should appear.
Example: The RegEx pattern "ab{3}c" matches "abbbc", but not "abc" or "abbc".
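The quantifier examples above can be checked with a short Python sketch. The helper `matching` is a hypothetical name introduced here; it uses `re.fullmatch` so that a candidate counts only if the whole string fits the pattern.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# '?': the 'u' may appear zero or one time.
optional = matching(r"colou?r", ["color", "colour", "colouur"])
print(optional)  # ['color', 'colour']

# '*': the 'b' may appear zero or more times.
zero_or_more = matching(r"ab*c", ["ac", "abc", "abbc"])
print(zero_or_more)  # ['ac', 'abc', 'abbc']

# '+': the 'b' must appear at least once.
one_or_more = matching(r"ab+c", ["ac", "abc", "abbc"])
print(one_or_more)  # ['abc', 'abbc']

# '{3}': the 'b' must appear exactly three times.
exactly_three = matching(r"ab{3}c", ["abc", "abbc", "abbbc"])
print(exactly_three)  # ['abbbc']
```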
Anchors #
Start of the string: The caret (^) symbol indicates that the pattern must start at the beginning of the string.
Example: The RegEx pattern "^abc" matches "abcde", but not "xabcde".
Note: Inside [], on the other hand, ^ means "not the following" ONLY when it is the first character after the opening bracket.
To clarify, therefore:
- [^abc] » not a, b or c
- [ab^cd] » a, b, ^ (character), c or d
- \^ » an actual ^ character
End of the string: The dollar sign ($) indicates that the pattern must appear at the end of the string.
Example: The RegEx pattern "abc$" matches "xabc", but not "xabcx".
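The anchor rules, and the three different roles of the caret, can be checked with Python's `re` module. This is a small sketch; the sample strings for the bracket cases are assumptions made for the example.

```python
import re

# '^' anchors the pattern to the start of the string.
starts_ok = re.search(r"^abc", "abcde") is not None
starts_bad = re.search(r"^abc", "xabcde") is not None
print(starts_ok, starts_bad)  # True False

# '$' anchors the pattern to the end of the string.
ends_ok = re.search(r"abc$", "xabc") is not None
ends_bad = re.search(r"abc$", "xabcx") is not None
print(ends_ok, ends_bad)  # True False

# '^' as the first character inside [] negates the class.
negated = re.findall(r"[^abc]", "abcd^")
print(negated)  # ['d', '^']

# '^' elsewhere inside [] is an ordinary character.
literal_in_class = re.findall(r"[ab^cd]", "a^e")
print(literal_in_class)  # ['a', '^']

# An escaped '\^' is a literal caret outside a class.
escaped = re.findall(r"\^", "2^10")
print(escaped)  # ['^']
```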
Grouping and Capturing #
Vertical bar (|): Represents an OR operator, allowing a match for either the expression on the left or the right.
Example: The RegEx pattern "abc|xyz" matches either "abc" or "xyz".
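A brief Python sketch of alternation follows. The second part goes slightly beyond the text above: it assumes the usual behavior of parentheses, which limit how far the | reaches and capture the alternative that matched. The word "grey" is just an illustrative input.

```python
import re

# Alternation: either side of '|' may match.
either = re.findall(r"abc|xyz", "abc 123 xyz")
print(either)  # ['abc', 'xyz']

# Parentheses restrict the '|' to the part inside the group and
# capture whichever alternative matched.
m = re.search(r"gr(a|e)y", "grey")
whole_match = m.group(0)
captured = m.group(1)
print(whole_match, captured)  # grey e
```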
Predefined character classes #
\d: Matches any digit (0-9).
\D: Matches any non-digit character.
\s: Matches any whitespace character (space, tab, newline, etc.).
\S: Matches any non-whitespace character.
\w: Matches any word character (alphanumeric, including underscore).
\W: Matches any non-word character.
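Example: ^(\d{3}-\d{2}-\d{4}|\d{9})$ » strings in Social Security number format (either 123-45-6789 style or nine consecutive digits)
Note: Both alternatives sit inside one group so that ^ and $ apply to each of them. Written as ^(\d{3}-\d{2}-\d{4})|(\d{9})$, the ^ would bind only to the first alternative and the $ only to the second, so a string such as "123-45-6789 extra" would still match.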
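The predefined classes and a Social Security number format check can be sketched in Python as follows. The names `SSN_FORMAT`, `UNGROUPED`, and `looks_like_ssn` are hypothetical, and the check covers the string's format only, not whether the number was ever issued. The alternation is wrapped in a single group so that ^ and $ apply to both alternatives; the ungrouped form is included to show what goes wrong otherwise.

```python
import re

digits = re.findall(r"\d", "Room 42B")
print(digits)  # ['4', '2']

chunks = re.findall(r"\S+", "a b\tc")
print(chunks)  # ['a', 'b', 'c']

words = re.findall(r"\w+", "snake_case, kebab-case")
print(words)  # ['snake_case', 'kebab', 'case']

# Grouped alternation: both alternatives are anchored at both ends.
SSN_FORMAT = re.compile(r"^(\d{3}-\d{2}-\d{4}|\d{9})$")

# Ungrouped alternation: '^' binds only to the left alternative
# and '$' only to the right one.
UNGROUPED = re.compile(r"^(\d{3}-\d{2}-\d{4})|(\d{9})$")

def looks_like_ssn(text):
    """Format check only; says nothing about whether the number exists."""
    return SSN_FORMAT.search(text) is not None

print(looks_like_ssn("123-45-6789"))        # True
print(looks_like_ssn("123456789"))          # True
print(looks_like_ssn("123-45-6789 extra"))  # False

# The ungrouped pattern wrongly accepts trailing text.
ungrouped_accepts = UNGROUPED.search("123-45-6789 extra") is not None
print(ungrouped_accepts)  # True
```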