Pattern Matching
Pattern matching in TinyMUX determines how user input is compared against templates defined in softcode. Two matching systems are available: wildcard (glob) patterns, which are the default, and regular expressions (PCRE), which offer more precise control. Understanding when to use each is key to writing effective softcode.
Wildcard (Glob) Patterns
Wildcard patterns are the default matching system in TinyMUX. They are used by $-commands, @listen, and functions like match() and strmatch(). Wildcard matching is always case-insensitive.
Two wildcard characters are available:
- * matches zero or more characters of any kind.
- ? matches exactly one character of any kind.
Wildcards in $-Commands
When a $-command fires, the text matched by each wildcard is captured into the substitution variables %0 through %9, numbered left to right by wildcard position:
&CMD_GIVE obj=$give * to *:@pemit %#=You give %0 to %1.
If a player types give sword to knight, then %0 is sword and %1 is knight.
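The left-to-right numbering can be modelled outside the server. The following is a minimal Python sketch, not TinyMUX's matcher: the helper name wildcard_captures is hypothetical, and it assumes that every * and ? fills one slot and that matching ignores case.

```python
import re

def wildcard_captures(pattern, text):
    """Model of wildcard capture: return the %0..%9 values, or None on no match.

    Assumption: every * and ? fills one slot, numbered left to right.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("(.*)")          # zero or more characters
        elif ch == "?":
            parts.append("(.)")           # exactly one character
        else:
            parts.append(re.escape(ch))   # everything else is literal
    m = re.fullmatch("".join(parts), text, re.IGNORECASE | re.DOTALL)
    return list(m.groups()[:10]) if m else None

print(wildcard_captures("give * to *", "give sword to knight"))  # ['sword', 'knight']
```

Index 0 of the returned list plays the role of %0, index 1 the role of %1, and so on.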
The same capture mechanism applies to ^-listen patterns and to the pattern set via @listen. For @listen, the matched substrings are available in the triggered AHEAR, AMHEAR, or AAHEAR attributes:
@listen camera=* has arrived.
@ahear camera=@va me=%va %0
Wildcard Functions
- match() – Tests each word in a list against a wildcard pattern, returning the 1-based index of the first match (0 if none). Useful for finding a word in a list: match(red green blue, gr*) returns 2.
- matchall() – Like match(), but returns the indices of all matching words.
- strmatch() – Matches a wildcard pattern against an entire string (not word-by-word). Returns 1 or 0. Example: strmatch(This is a test, *is*) returns 1.
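The difference between word-by-word and whole-string matching can be sketched in Python. This is an illustrative model only: it borrows Python's fnmatch for the * and ? semantics (fnmatch also treats [ ] as a character class, which the wildcards described here do not), and it returns a Python list from matchall() where the softcode function returns a list of indices as text.

```python
from fnmatch import fnmatchcase

def _wild(text, pattern):
    # Lowercase both sides to model case-insensitive wildcard matching.
    return fnmatchcase(text.lower(), pattern.lower())

def match(words, pattern):
    """1-based index of the first word matching the pattern, or 0."""
    for i, word in enumerate(words.split(), start=1):
        if _wild(word, pattern):
            return i
    return 0

def matchall(words, pattern):
    """1-based indices of every word matching the pattern."""
    return [i for i, w in enumerate(words.split(), start=1) if _wild(w, pattern)]

def strmatch(text, pattern):
    """1 if the pattern matches the whole string, else 0."""
    return 1 if _wild(text, pattern) else 0

print(match("red green blue", "gr*"))          # 2
print(matchall("red green blue grey", "gr*"))  # [2, 4]
print(strmatch("This is a test", "*is*"))      # 1
```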
Regular Expressions (PCRE)
TinyMUX uses the PCRE (Perl Compatible Regular Expressions) library, specifically PCRE2 as of recent versions. Regular expressions provide character classes, quantifiers, alternation, anchoring, lookahead/lookbehind, and other features far beyond what wildcards offer.
Enabling Regex on Attributes
By default, $-command and ^-listen patterns use wildcard matching. To switch an attribute to regex matching, set the regexp attribute flag:
&DO_NUM obj=$\+setnum (.+)=([0-9]+):@pemit %#=Setting %1 to %2.
@set obj/DO_NUM=regexp
In regex $-commands, %0 holds the entire matched substring, and %1 through %9 hold the parenthesized capture groups.
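The numbering can be checked against any regex engine that treats group 0 as the whole match. A small sketch using Python's re module (not PCRE2, though the two agree for this pattern) on a hypothetical input line:

```python
import re

pattern = r"\+setnum (.+)=([0-9]+)"
m = re.search(pattern, "+setnum hp=30")

# Index 0 models %0 (the whole match); indices 1 and up model %1, %2, ...
subs = [m.group(0), *m.groups()]
print(subs)  # ['+setnum hp=30', 'hp', '30']
```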
Colon Escaping in Patterns
Because the first unescaped : separates the pattern from the action in a $-command, any literal colon within the pattern must be doubled (::) to prevent it from being treated as the delimiter. This is particularly relevant for regex non-capturing groups:
&CMD obj=$(?::red|blue) ball:@pemit %#=You see a colorful ball.
After parsing, every :: in the pattern collapses to a single :. This convention applies to ^-listen patterns as well.
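The splitting rule can be sketched as a small parser. This Python model follows only the rule as stated in this section (first undoubled colon ends the pattern, :: collapses to :); the function name split_command and the sample attribute value are hypothetical, and the real server parser may handle edge cases differently.

```python
def split_command(attr_value):
    """Split '$pattern:action' at the first colon that is not doubled.

    Returns (pattern, action). Assumes the leading '$' or '^' is present
    and that '::' always stands for one literal colon in the pattern.
    """
    body = attr_value[1:]              # drop the leading $ or ^
    pattern = []
    i = 0
    while i < len(body):
        if body[i] == ":":
            if body[i + 1:i + 2] == ":":
                pattern.append(":")    # '::' collapses to ':'
                i += 2
                continue
            # A single colon: everything after it is the action, verbatim.
            return "".join(pattern), body[i + 1:]
        pattern.append(body[i])
        i += 1
    raise ValueError("no pattern/action delimiter found")

print(split_command("$(?::north|south) door:@pemit %#=Locked."))
# ('(?:north|south) door', '@pemit %#=Locked.')
```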
Regex Functions
TinyMUX provides a family of functions for regex operations. Most come in case-sensitive and case-insensitive variants (the latter suffixed with i):
- regmatch() / regmatchi() – Tests whether a regex matches a string. Returns 1 or 0. Optionally stores capture groups in specified registers: regmatch(cookies=30, (.+)=([0-9]+), 0 1 2) sets %q0 to cookies=30, %q1 to cookies, and %q2 to 30. Use -1 in the register list to discard a capture.
- regrab() / regrabi() – Returns the first element of a list matching a regex.
- regraball() / regraballi() – Returns all elements of a list matching a regex.
- regedit() / regediti() – Regex find-and-replace on a string; replaces the first match. Multiple pattern/replacement pairs can be chained.
- regeditall() / regeditalli() – Like regedit(), but replaces all matches.
- reglattr() / reglattri() – Lists attribute names on an object whose names match a regex.
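The register list of regmatch() pairs positions with captures: the first entry receives the whole match, the second receives group 1, and so on. A rough Python model of that pairing follows. It assumes an unanchored search and returns the registers as a dictionary; both choices are simplifications for illustration, not a description of the server's internals.

```python
import re

def regmatch(text, pattern, registers=""):
    """Return (1 or 0, registers) for a regex test with optional captures.

    Assumption: the n-th entry in `registers` receives capture n
    (0 = whole match), and an entry of -1 discards that capture.
    """
    m = re.search(pattern, text)
    if m is None:
        return 0, {}
    q = {}
    for n, name in enumerate(registers.split()):
        if name != "-1" and n <= m.re.groups:
            q[name] = m.group(n) or ""   # unmatched groups become empty
    return 1, q

print(regmatch("cookies=30", r"(.+)=([0-9]+)", "0 1 2"))
# (1, {'0': 'cookies=30', '1': 'cookies', '2': '30'})
print(regmatch("cookies=30", r"(.+)=([0-9]+)", "-1 1 2"))
# (1, {'1': 'cookies', '2': '30'})
```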
In regedit() replacement strings, $0 through $99 refer to numbered captures, and $<name> refers to named captures:
say regedit(The quick brown fox, (?P<first>\w+)\s+(?P<second>\w+), $<second> $<first>)
You say, "quick The brown fox"
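The first-match versus all-matches distinction between regedit() and regeditall() can be reproduced with Python's re.sub. This is an analogy only: Python writes named references as \g<name> where the replacement strings above use $<name>, and the count argument stands in for the choice of function.

```python
import re

pattern = r"(?P<first>\w+)\s+(?P<second>\w+)"
text = "The quick brown fox"

# count=1 swaps only the first pair of words, like regedit().
first_only = re.sub(pattern, r"\g<second> \g<first>", text, count=1)
# No count swaps every non-overlapping pair, like regeditall().
every = re.sub(pattern, r"\g<second> \g<first>", text)

print(first_only)  # quick The brown fox
print(every)       # quick The fox brown
```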
PCRE Syntax Highlights
Since TinyMUX uses PCRE2, the full Perl-compatible regex syntax is available:
- Character classes: [A-Za-z], [0-9], \d (digit), \w (word character), \s (whitespace).
- Quantifiers: * (zero or more), + (one or more), ? (zero or one), {n,m} (between n and m).
- Non-greedy quantifiers: *?, +?, ?? – match as little as possible.
- Alternation: cat|dog matches either word.
- Anchors: ^ (start of string), $ (end of string), \b (word boundary).
- Grouping: (...) for capture, (?:...) for non-capturing groups.
- Named captures: (?P<name>...) or (?<name>...).
- Lookahead/lookbehind: (?=...), (?!...), (?<=...), (?<!...).
- Inline flags: (?i) for case-insensitive mode within the pattern.
- Unicode properties: \p{L} (any letter), \p{N} (any number), \p{Lu} (uppercase letter), and other Unicode property classes, depending on how PCRE2 was compiled.
Case Sensitivity
Wildcard matching is always case-insensitive. Regular expressions are case-sensitive by default. There are three ways to get case-insensitive regex matching:
- Use the i-suffixed function variant: regmatchi(), regrabi(), regediti(), regeditalli(), etc.
- Set the case attribute flag alongside regexp on a $-command attribute: @set obj/CMD=case.
- Embed (?i) at the start of the regex pattern itself.
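The effect of these options can be seen in any Perl-style engine. A short sketch with Python's re module, where the IGNORECASE flag plays the role that an i-suffixed function or the case attribute flag plays in softcode (an analogy, not the same mechanism):

```python
import re

text = "Hello World"

case_sensitive = re.search(r"hello", text)               # default: no match
flag_variant = re.search(r"hello", text, re.IGNORECASE)  # flag supplied from outside the pattern
inline_variant = re.search(r"(?i)hello", text)           # flag embedded in the pattern

print(case_sensitive is None)   # True
print(flag_variant.group(0))    # Hello
print(inline_variant.group(0))  # Hello
```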
Performance Considerations
Wildcard matching is significantly faster than regular expression matching. Wildcards use a simple linear scan, while regex requires compilation and a more complex matching engine. For $-commands that will be checked on every command entered in a room, this difference matters.
- Use wildcards for simple prefix matching: $+who * is cheaper than a regex equivalent.
- Reserve regex for cases where you need character-class validation, anchoring, or alternation that wildcards cannot express.
- Set the NO_COMMAND flag on objects that do not carry $-commands to avoid unnecessary matching overhead.
Common Patterns for $-Commands
| Goal | Wildcard | Regex (with regexp flag) |
|---|---|---|
| Simple command with argument | $+cmd * | $\+cmd (.+) |
| Command with two arguments | $+cmd *=* | $\+cmd (.+)=(.+) |
| Optional argument | $+cmd* | $\+cmd\s*(.*) |
| Numeric argument only | (validate in action) | $\+cmd ([0-9]+) |
| Command starting with special char | $+test * | $\+test (.+) |
Note that + is a regex metacharacter and must be escaped with \ in regex patterns. In wildcard patterns, + is literal.
Pitfalls
- Greedy matching with wildcards: The * wildcard is greedy. In $cmd *=*, the input cmd a=b=c assigns a=b to %0 and c to %1, because the first * consumes as much as possible before yielding to the second. Use regex with non-greedy quantifiers ((.+?)=(.+)) if you need the first = to be the split point.
- Greedy regex quantifiers: .+ and .* are greedy by default. Use .+? and .*? for non-greedy behavior.
- Regex special characters: Characters like ., +, *, ?, (, ), [, ], {, }, \, ^, $, and | have special meaning in regex. Escape them with \ when matching literally.
- Capture group numbering: In regex $-commands, %0 is the entire match, not the first capture group. The first parenthesized group is %1. This differs from wildcard $-commands, where %0 is the first wildcard.
- Colon in patterns: A bare : terminates the pattern in $-commands and ^-listen attributes. Always double it (::) inside patterns. Forgetting this is a common source of silently broken commands.
- Nested captures: Only %1 through %9 are available in $-command actions. Complex regex with more than nine capture groups will lose the excess. Use regmatch() with named registers when you need more captures.
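Two of the regex pitfalls above, greedy quantifiers and unescaped metacharacters, are easy to reproduce. The sketch below uses Python's re module as a stand-in for PCRE2; the two engines behave the same way for these particular patterns.

```python
import re

line = "a=b=c"

# Greedy: the first group grabs as much as it can, so the LAST '=' splits.
greedy = re.fullmatch(r"(.+)=(.+)", line).groups()
# Non-greedy: the first group grabs as little as it can, so the FIRST '=' splits.
lazy = re.fullmatch(r"(.+?)=(.+)", line).groups()

print(greedy)  # ('a=b', 'c')
print(lazy)    # ('a', 'b=c')

# re.escape adds the backslashes needed to match metacharacters literally.
literal = re.escape("+cmd")
print(literal)  # \+cmd
```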
See Also
- Arbitrary Commands – defining $-commands
- REGEXPS – regex reference from help
- Wild card – wildcard basics
- strmatch(), match() – wildcard functions
- @listen – listen patterns