How to parse groups using pgfparser? - TeX


joulev

## Problem

The `parser` module parses token by token and does not respect groups. In other words, if `parser` parses `Foo {bar} baz`, I believe it proceeds like this (correct me if I am wrong):

```none
F-o-o-␣-{-b-a-r-}-␣-b-a-z
```

I want it to parse like this (it is guaranteed that there is no nesting like `{{ba}r}`):

```none
F-o-o-␣-{bar}-␣-b-a-z
```

Note that `{` can be preceded by any character (not just the space character; most frequently it is preceded by a control sequence), so things like `\pgfparserdef{foo}{initial}{blank space}[m]` unfortunately won't help.

## Approach

My idea is to define an action that runs when `{` is parsed. That action then executes another parser that does nothing and ends at `}`.

So although the problem does not explicitly ask for a special action at `{`, all of the following attempts try to define one.
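For reference, a minimal sketch of this idea that replaces the "another parser" by a second *state* of the same parser (a swapped-in technique, not what the attempts below do). It borrows the `{\meaning\bgroup}` and `\egroup` spellings from the top answer, relies on the no-nesting guarantee, and the state name `ingroup` is a hypothetical choice:

```tex
% arara: pdflatex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
% "." ends parsing by switching to the special state "final"
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
% "{" switches to a second state of the same parser ("ingroup" is our name)
\pgfparserdef{foo}{initial}{\meaning\bgroup}%
  {\typeout{Entering a group}\pgfparserswitch{ingroup}}
% in "ingroup" only "}" has a rule; with silent=true every other token is skipped
\pgfparserdef{foo}{ingroup}\egroup%
  {\typeout{Leaving the group}\pgfparserswitch{initial}}
\pgfparserset{foo/silent=true}
\begin{document}
\pgfparserparse{foo}This is an \emph{emphasized} word.
\end{document}
```

Because a rule only exists in the state it was defined for, a `.` inside the braces would not end parsing here.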

## Attempt 1

Using the character directly obviously doesn't work:

```tex
% arara: pdflatex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
\pgfparserdef{foo}{initial}{{\typeout{Got a \char`\{}}
\pgfparserdeffinal{foo}{}
\pgfparserset{silent=true}
\begin{document}
\pgfparserparse{foo}This is an \emph{emphasized} word.
\end{document}

% Error: Runaway argument?
```

## Attempt 2

`\meaning{` gives me `begin-group character {`, but using that phrase doesn't work either.

```tex
% arara: pdflatex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
\pgfparserdef{foo}{initial}{begin-group character {}{\typeout{Got a \char`\{}}
\pgfparserdeffinal{foo}{}
\pgfparserset{silent=true}
\begin{document}
\pgfparserparse{foo}This is an \emph{emphasized} word.
\end{document}

% Error: Runaway argument?
```

### Attempt 2.1

Using `begin-group character` also doesn't help.

```tex
% arara: pdflatex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
\pgfparserdef{foo}{initial}{begin-group character}{\typeout{Got a \char`\{}}
\pgfparserdeffinal{foo}{}
\pgfparserset{silent=true}
\begin{document}
\pgfparserparse{foo}This is an \emph{emphasized} word.
\end{document}
```

There are no errors, but because `begin-group character` is not the meaning of any token, `\typeout` is never executed.

## Attempt 3

Changing category codes is my last resort. It produces no error, but once again, `\typeout` is not executed.

```tex
% arara: pdflatex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
\begingroup
  \catcode`\{12\relax
  \catcode`\}12\relax
  \catcode`\(1\relax
  \catcode`\)2\relax
  \pgfparserdef(foo)(initial){(\typeout(Got a \char`\{))
\endgroup
\pgfparserdeffinal{foo}{}
\pgfparserset{silent=true}
\begin{document}
\pgfparserparse{foo}This is an \emph{emphasized} word.
\end{document}
```

## Question

So how can I parse `{` using `parser`? Or is there a better approach to the underlying problem?

Note that while answers using other tools, e.g. `expl3`, are welcome, I am afraid they will hardly be useful to me, as my colleagues won't understand them :)

Top Answer

Skillmon

# Gobbling the Tokens using `parser`

The easiest way to define a rule for `{` or `}` is to use the `\meaning` of `\bgroup` and `\egroup`, since those are `\let` to `{` and `}`. You obviously can't use `\pgfparserdef{foo}{initial}\bgroup{stuff}`, because the code couldn't distinguish this from an actual opening brace: it uses `\futurelet` (pretty much the equivalent of `\@ifnextchar`) to look for the opening brace.

Also, while silencing all parsers at once is of course possible, I'd advise you to silence parsers individually so that you don't accidentally break other code using the module.

Your code should look like this:

```tex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
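% "." ends parsing by switching to the special state "final"
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
% "{" cannot be given directly, so give the \meaning of \bgroup in braces
\pgfparserdef{foo}{initial}{\meaning\bgroup}{\typeout{Got a \char`\{}}
% "}" can be given by any token with the same meaning, e.g. \egroup
\pgfparserdef{foo}{initial}\egroup{\typeout{Got a \char`\}}}
% silence only the parser "foo" (no warnings for tokens without a rule)
\pgfparserset{foo/silent=true}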
\begin{document}
\pgfparserparse{foo}This is an \emph{emphasized} word.
\end{document}
```
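To see which meaning texts the `{\meaning\bgroup}` spelling refers to, a test that needs only the LaTeX kernel is enough; `\typeout` writes the `\meaning` of both tokens to the terminal and the log:

```tex
\documentclass{article}
\begin{document}
% \bgroup and \egroup are \let to { and } by the LaTeX kernel,
% so their meanings are those of explicit brace tokens
\typeout{\meaning\bgroup}% writes: begin-group character {
\typeout{\meaning\egroup}% writes: end-group character }
\end{document}
```

The first line is exactly the text from Attempt 2, which is why it cannot simply be typed inside braces: the `{` at its end would be unbalanced.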

----

# Grabbing the tokens as an argument

You could actually also grab the argument in braces instead of trashing it with `parser`'s ability to ignore everything, but this requires a bit of extra code, a small trick to reinsert an unbalanced opening brace, and an undocumented `parser` internal:

```tex
\documentclass{article}

\usepackage{pgf}
\usepgfmodule{parser}

\makeatletter
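% "." ends parsing
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
% on "{", hand over to \foogroupremover instead of only printing a message
\pgfparserdef{foo}{initial}{\meaning\bgroup}{\foogroupremover}
% step 1: #1 is the parser internal that would fetch the next token; drop it
% and leave an unbalanced "{" in the input via the \iffalse}\fi trick
\newcommand*\foogroupremover[1]
  {%
    \expandafter\foogroupremoverAUX\expandafter{\iffalse}\fi
  }
% step 2: grab the contents up to the matching "}" from the input, then
% reinsert the parser internal to give control back to the parser
\newcommand\foogroupremoverAUX[1]
  {%
    \typeout{There was a group containing `#1'}%
    \pgfparser@getnexttoken
  }
\pgfparserset{foo/silent=true}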
\makeatother

\begin{document}
\pgfparserparse{foo}This is an \emph{emphasized} word.
\end{document}
```

The above uses two steps to grab the braced content. The first step removes a `parser` internal which would parse the next token (that's the argument grabbed by `\foogroupremover`) and inserts an unbalanced opening brace. The next step (`\foogroupremoverAUX`) grabs the braced contents and reinserts the `parser` internal to give control back to `parser`.
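The unbalanced-brace trick can be tried in isolation. In this sketch, `\demoremover` and `\demoremoverAUX` are hypothetical names, and `\relax` merely stands in for the `parser` internal that the first step throws away:

```tex
\documentclass{article}
% step 1: discard one token (#1) and leave an unbalanced "{" in the input:
% the second \expandafter expands \iffalse, which skips "}" and removes \fi
\newcommand*\demoremover[1]
  {\expandafter\demoremoverAUX\expandafter{\iffalse}\fi}
% step 2: the "{" from step 1 and the "}" from the source delimit #1
\newcommand\demoremoverAUX[1]{\typeout{Grabbed `#1'}}
\begin{document}
% the "}" after "emphasized" deliberately has no matching "{" in the source
\demoremover\relax emphasized} word.
\end{document}
```

The log should show ``Grabbed `emphasized'``; only "word." reaches the page. In the real answer, step 2 additionally reinserts `\pgfparser@getnexttoken`, which this sketch has no need for.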

I like "LaTeX benchmarks are like air conditioners. They don't work as soon as you open Windows". ;-) Reminds me of "The first day at which Microsoft is producing something that doesn't suck is when they start manufacturing vacuum cleaners." ;-)

I don't know why the difference is so big (and flipped) on your system, but these benchmark results (especially those with `l3benchmark`) are pretty stable for me, with `\egroup` being faster on every run. I know that benchmarking TeX on Windows is close to non-functional (or, as Paulo once put it, "LaTeX benchmarks are like air conditioners. They don't work as soon as you open Windows"), but I've never heard that macOS is unstable here as well (which isn't necessarily the case here; it could be some machine-code weirdness on macOS compared to Linux). My benchmarking setup is well tested on my machine, though, and gives consistent results (almost) every time (see the many benchmarks I've done on sorting algorithms in TeX and key=value parsers).

Looking at the code, the `\meaning\egroup` route has one macro definition and one `\ifx` more than the `\egroup` route. This is what happens when you use `\meaning\egroup`:

```tex
{%
  \def\pgfparserdef@arg{#1}%
  \ifx\pgfparser@blankspace\pgfparserdef@arg
    \pgfparser@fi@BTb
  \fi
  \pgfutil@secondoftwo
    {\expandafter\pgfparserdef@d\pgfparserdef@twoargs{blank space \space}}
    {\expandafter\pgfparserdef@d\pgfparserdef@twoargs{#1}}%
}
```

And this is what happens if you use `\egroup` directly:

```tex
{%
  \expandafter\pgfparserdef@d\pgfparserdef@twoargs
    {\meaning\pgfparserdef@arg}%
}%
```

(in the definition of `\pgfparserdef@c`, lines 216 onwards in the file `pgfmoduleparser.code.tex`)

`time` results on my machine with the example code of my first answer:

```none
real 0m0.353s
user 0m0.225s
sys  0m0.127s
```

Using the same script but with `{\meaning\egroup}`:

```none
real 0m0.362s
user 0m0.250s
sys  0m0.110s
```

Comparing using `l3benchmark`, three repetitions, target time 1 sec each, then changing the order of the benchmarks and doing the same:

```none
\meaning\egroup / \egroup   1.007 1.007 1.007
\egroup / \meaning\egroup   0.985 0.985 0.985
```

If the input is passed directly to `\pgfparserparse` without a wrapping macro, you don't have to care about matching braces either. The parser is agnostic to all of this (this is why I used it to remove blocks of code in https://topanswers.xyz/tex?q=715).
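A sketch of that claim, reusing the rule spellings from the first answer but with plain-text messages. The braces in the parsed text are deliberately unbalanced; since the parser removes each brace as a single token, TeX should never open or close a group for them:

```tex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
\pgfparserdef{foo}{initial}{\meaning\bgroup}{\typeout{opening brace}}
\pgfparserdef{foo}{initial}\egroup{\typeout{closing brace}}
\pgfparserset{foo/silent=true}
\begin{document}
% unbalanced on purpose: a "}" with no "{" before it, and a "{" that is
% never closed; the parser consumes both before TeX can act on them
\pgfparserparse{foo}a } b { c.
\end{document}
```

The log should list "closing brace" before "opening brace". This only holds when the text is read straight from the input; as a macro argument, the braces would have to be balanced already.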

No: if the meaning of a single token is not `blank space ` or `begin-group character {`, one can shortcut the definition by just putting a token there that has the meaning of the token for which you want to define a rule. So `\pgfparserdef{foo}{initial}\egroup{<stuff>}` is perfectly fine.

@JouleV Stupid question: why don't you just use ordinary parentheses to delimit groups in the parsed expressions? Or something like `<` and `>`? Then you will have fewer headaches if a user forgets to match the `{` and `}` signs.

Parsing (or ignoring) a balanced set of braces could easily be done by incrementing a counter on opening and decrementing it on closing braces, only switching the inner parser to `final` if the counter is 0. Just in case someone was wondering.
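A minimal sketch of that counter idea (not code from the thread). The count `\groupdepth` and the state name `skip` are hypothetical names; the assignments are `\global` so the sketch does not depend on whether `parser` runs actions inside a group, and it assumes `\pgfparserswitch` may be called inside an `\ifnum` branch:

```tex
\documentclass{article}
\usepackage{pgf}
\usepgfmodule{parser}
\newcount\groupdepth % hypothetical nesting counter
\pgfparserdef{foo}{initial}.{\pgfparserswitch{final}}
% an outermost "{" starts skipping at depth 1
\pgfparserdef{foo}{initial}{\meaning\bgroup}%
  {\global\groupdepth=1 \pgfparserswitch{skip}}
% inside: every further "{" goes one level deeper
\pgfparserdef{foo}{skip}{\meaning\bgroup}{\global\advance\groupdepth by 1 }
% every "}" goes one level up; leave "skip" only at depth 0
\pgfparserdef{foo}{skip}\egroup%
  {%
    \global\advance\groupdepth by -1
    \ifnum\groupdepth=0
      \typeout{Skipped one outermost group}%
      \pgfparserswitch{initial}%
    \fi
  }
\pgfparserset{foo/silent=true}
\begin{document}
\pgfparserparse{foo}Keep {drop {nested} this} and {that}.
\end{document}
```

With this input, the message should appear twice, once per outermost group; the inner `{nested}` only moves the counter up and down.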

I don't really know how to use `parser`, but from what I saw of it you need the `\meaning` of the token you're looking for. If that's the case, try changing your attempt 2 to ``\expanded{\noexpand\pgfparserdef{foo}{initial}{\meaning{}}{\typeout{Got a \char`\{}}``
