Expandable test for an empty token list—methods, performance, and robustness - TeX

Phelype Oleinik (imported from SE)

With ε-TeX, the go-to method for testing if a `<token-list>` is empty is the following test:

    \if\relax\detokenize{<token-list>}\relax
      % empty
    \else
      % not empty
    \fi

The method is foolproof as long as the `<token-list>` can be safely `\detokenize`d, which is the case when it is grabbed as an argument by some other macro that does the testing.

Looking at the `expl3` sources, however, I found that the test there is actually (modulo `_` and `:`):

    \expandafter\ifx\expandafter\qnil\detokenize{#1}\qnil
      % empty
    \else
      % not empty
    \fi

where `\qnil` is a “quark” defined with `\def\qnil{\qnil}`. This means that `\ifx\qnil<token>` is only true if `<token>` is `\qnil`, which is the case _only if_ `#1` is empty; otherwise `<token>` is some other token (of catcode 10 or 12), which makes the test return false.

But the same holds for the first test: `\if\relax<token>` is only true if `<token>` is another control sequence, which can never be the case if there's _anything_ inside the `\detokenize`, because `\detokenize` produces only character tokens.

### Or is it?

Is there a reason for the second method being preferred over the first? Is there an edge-case in which one of them would fail?

As far as I can tell, both methods apply the same treatment to the input token list, and both are robust against weird arguments such as `\iftrue\else\fi` (which would otherwise be a problem), because in either case the `<token-list>` is `\detokenize`d, so the argument can be virtually anything.
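
To check that the two tests really agree, here is a minimal sketch in LaTeX2e syntax (not `expl3`) that wraps both conditionals in macros and prints their verdicts for a few tricky arguments. The names `\IfEmptyIf`, `\IfEmptyIfx` and `\CompareEmpty` are hypothetical, chosen only for this illustration; `\qnil` is defined as in the question. The results appear on the terminal and in the log.

```latex
\makeatletter
% First method: compare \relax with the first token produced by
% \detokenize (if #1 is empty there is none, so \relax meets \relax)
\long\def\IfEmptyIf#1{%
  \if\relax\detokenize{#1}\relax
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
% Second method (expl3 style): compare a quark with that first token
\def\qnil{\qnil}
\long\def\IfEmptyIfx#1{%
  \expandafter\ifx\expandafter\qnil\detokenize{#1}\qnil
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
\makeatother
% Hypothetical helper: \typeout expands its argument fully, so this
% prints the verdicts of both tests side by side
\long\def\CompareEmpty#1{%
  \typeout{\IfEmptyIf{#1}{empty}{not empty} /
    \IfEmptyIfx{#1}{empty}{not empty}}}
\CompareEmpty{}                 % empty / empty
\CompareEmpty{X}                % not empty / not empty
\CompareEmpty{ }                % not empty / not empty
\CompareEmpty{\iftrue\else\fi}  % not empty / not empty
\CompareEmpty{\relax}           % not empty / not empty
\CompareEmpty{\qnil}            % not empty / not empty
\stop
```

Since everything happens inside `\typeout`, this also shows that both tests work by expansion only.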

---

### Motivation:

I’m working on some code in which this test will be executed a few hundred times per function call, so performance is important. According to my tests, the first method is slightly (very, _very_ slightly) faster than the second:
```latex
\RequirePackage{l3benchmark}
\ExplSyntaxOn
\prg_new_conditional:Npnn \pho_tl_if_empty:n #1 { TF }
  {
    \if:w \scan_stop: \tl_to_str:n {#1} \scan_stop:
      \prg_return_true:
    \else:
      \prg_return_false:
    \fi:
  }
\cs_new:Npn \pho_test:N #1
  {
    \benchmark_tic:
    \int_step_inline:nn { 999999 }
      {
        #1 { } { } { } % Empty
        #1 { X } { } { } % non-empty
        #1 { \iftrue \else \fi } { } { } % just in case
      }
    \benchmark_toc:
  }
\pho_test:N \pho_tl_if_empty:nTF
\pho_test:N \tl_if_empty:nTF
\stop
```
output:
```none
(l3benchmark) + TIC
(l3benchmark) + TOC: 2.17 s
(l3benchmark) + TIC
(l3benchmark) + TOC: 2.32 s
```
... Yes, those are 15 hundredths of a second in one million repetitions :-)

Thus, the motivation here is to know whether I can use the (in)significantly faster method without sacrificing robustness. The _real_ motivation is to know in what way this type of choice may come to bite me in the future.

Top Answer

Skillmon (imported from SE)

# General

There are a few considerations when it comes to the performance of TeX code:

1. Argument grabbing costs time, so don't grab arguments unnecessarily.
1. `\expandafter` is slow; if you can work around it with the same number of expansions, the code is faster. So instead of
    ```latex
    \if...
      \expandafter\@firstoftwo
    \else
      \expandafter\@secondoftwo
    \fi
    ```
    we'd use the following (this also uses an aspect of the first point, namely that in the false case only the contents of the true branch are gobbled):
    ```latex
    \long\def\my@fi@firstoftwo\fi#1#2#3{\fi#2}
    \if...
      \my@fi@firstoftwo
    \fi
    \@secondoftwo
    ```
1. Gobbling tokens by using them as explicit delimiters in a macro's parameter text is faster than grabbing them as arguments, so the above example can be optimized further:
    ```latex
    \long\def\my@fi@firstoftwo\fi\@secondoftwo#1#2{\fi#1}
    \if...
      \my@fi@firstoftwo
    \fi
    \@secondoftwo
    ```
    But be aware that code written this way is less readable, less reusable, and less maintainable, so the small performance gain comes at a cost.
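
Putting points 2 and 3 together, a complete expandable conditional could look like the following sketch. `\my@ifx@TF` is a hypothetical wrapper around `\ifx`, introduced only for this example; `\my@fi@firstoftwo` is the optimized definition from point 3.

```latex
\makeatletter
% Point 3: \fi and \@secondoftwo are matched as delimiters, not grabbed
\long\def\my@fi@firstoftwo\fi\@secondoftwo#1#2{\fi#1}
% Hypothetical wrapper: \ifx test with true/false branches, no \expandafter
\long\def\my@ifx@TF#1#2{%
  \ifx#1#2%
    \my@fi@firstoftwo
  \fi
  \@secondoftwo}
% \typeout expands fully, so the result is printed to the terminal/log
\typeout{\my@ifx@TF\relax\relax{same}{different}} % same
\typeout{\my@ifx@TF ab{same}{different}}          % different
\makeatother
\stop
```

In the true case `\my@fi@firstoftwo` consumes `\fi\@secondoftwo` and both branches in one step and puts `\fi` back in front of the true branch; in the false case TeX skips to `\fi` and `\@secondoftwo` picks the false branch.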

Here `\if...` stands for any TeX conditional that is closed by `\fi`, such as `\ifx AB`, `\iftrue`, etc.

Also, `\if` tests can be slow (depending on the test used), and so is `\detokenize`; if we can get around those, we should. Another thing to consider is that `\if` tests are not robust if their arguments contain other `\if` tests, `\else`, or `\fi`. To overcome this, the standard test for an empty argument `\detokenize`s the argument:

```latex
\long\def\ifemptyStandard#1%
  {%
    \if\relax\detokenize{#1}\relax
      \expandafter\@firstoftwo
    \else
      \expandafter\@secondoftwo
    \fi
  }
```

This yields unbeatable robustness: the only possible argument that might fail this test is unbalanced input, which has to be created deliberately, such as `\expandafter\ifemptyStandard\expandafter{\iffalse{\fi}}{true}{false}` (but who would do that anyway?).

Of all the if tests built into TeX, `\ifx` is probably the fastest, so a naive test `\ifx <some-token>#1<some-token>` would be pretty fast. Unfortunately, it would not be robust: it fails if `\if...`, `\else`, or `\fi` are part of the argument, or if `#1` starts with `<some-token>` (though we can choose a `<some-token>` that is pretty unlikely).

# Fast `\ifempty`

The following is a fast test that takes some of the above aspects into account. We don't use any `\if...` test; instead, the branching is done by TeX's argument grabbing logic:

```latex
\long\def\ifempty@true\ifempty@A\ifempty@B\@secondoftwo#1#2{#1}
\long\def\ifempty@#1\ifempty@A\ifempty@B{}
\long\def\ifempty#1%
  {%
    \ifempty@\ifempty@A#1\ifempty@B\ifempty@true
      \ifempty@A\ifempty@B\@secondoftwo
  }
```

So if `#1` is empty, `\ifempty@` gobbles only the first `\ifempty@A` and `\ifempty@B`, and `\ifempty@true` is executed, gobbling the following `\ifempty@A\ifempty@B\@secondoftwo` and the false branch. On the other hand, if `#1` is not empty, everything up to (but not including) `\@secondoftwo` is gobbled, and `\@secondoftwo` executes the false branch.
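
A quick way to see this branching at work, even in an expansion-only context, is to feed a few arguments to `\ifempty` inside `\typeout`. This sketch uses the definitions above unchanged; the numbered labels are only there to make the log easier to read.

```latex
\makeatletter
% Fast \ifempty, unchanged from above
\long\def\ifempty@true\ifempty@A\ifempty@B\@secondoftwo#1#2{#1}
\long\def\ifempty@#1\ifempty@A\ifempty@B{}
\long\def\ifempty#1%
  {%
    \ifempty@\ifempty@A#1\ifempty@B\ifempty@true
      \ifempty@A\ifempty@B\@secondoftwo
  }
\makeatother
% \typeout expands its argument fully, so \ifempty must work by expansion only
\typeout{1: \ifempty{}{empty}{not empty}}                 % 1: empty
\typeout{2: \ifempty{X}{empty}{not empty}}                % 2: not empty
\typeout{3: \ifempty{ }{empty}{not empty}}                % 3: not empty
\typeout{4: \ifempty{\iftrue\else\fi}{empty}{not empty}}  % 4: not empty
\stop
```

In case 4 the conditionals end up inside the argument of `\ifempty@` and are discarded without ever being expanded, which is why no `\detokenize` is needed.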

This way we get a fast test macro (taking about 70% of the time of the `\if\relax\detokenize{#1}\relax` test in my benchmarks) that is fairly robust: only input that contains `\ifempty@A\ifempty@B`, starts with `\ifempty@B`, or ends with `\ifempty@A` will fail the test, and that should be rare.
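
To reproduce such a comparison, one could reuse the `l3benchmark` harness from the question. This is a sketch, not the benchmark behind the 70% figure; `\pho_test:N` is taken from the question, and timings depend on engine and machine, so no numbers are given here.

```latex
\RequirePackage{l3benchmark}
\makeatletter
% The two tests being compared, copied from above
\long\def\ifemptyStandard#1%
  {%
    \if\relax\detokenize{#1}\relax
      \expandafter\@firstoftwo
    \else
      \expandafter\@secondoftwo
    \fi
  }
\long\def\ifempty@true\ifempty@A\ifempty@B\@secondoftwo#1#2{#1}
\long\def\ifempty@#1\ifempty@A\ifempty@B{}
\long\def\ifempty#1%
  {%
    \ifempty@\ifempty@A#1\ifempty@B\ifempty@true
      \ifempty@A\ifempty@B\@secondoftwo
  }
\makeatother
\ExplSyntaxOn
% Harness from the question: roughly one million rounds of an empty,
% a non-empty, and a conditional-containing argument
\cs_new:Npn \pho_test:N #1
  {
    \benchmark_tic:
    \int_step_inline:nn { 999999 }
      {
        #1 { } { } { }
        #1 { X } { } { }
        #1 { \iftrue \else \fi } { } { }
      }
    \benchmark_toc:
  }
\pho_test:N \ifemptyStandard
\pho_test:N \ifempty
\ExplSyntaxOff
\stop
```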

And of course, we can use tokens that are even more unlikely than `\ifempty@A` and `\ifempty@B`. For example, why not use the `<DEL>` character for both, but with different category codes (that should be very, very unlikely to ever be part of a valid argument):

```latex
\begingroup
\lccode`\&=127
\lccode`\$=127
\catcode`\&=12
\catcode`\$=11
\lowercase{\endgroup
\long\def\ifempty@true&$\@secondoftwo#1#2{#1}
\long\def\ifempty@#1&${}
\long\def\ifempty#1{\ifempty@&#1$\ifempty@true&$\@secondoftwo}
}
```

# Fast `\ifblank`

As a small addition, we can also create a fast `\ifblank` test based on the same ideas. The standard `\ifblank` looks something like this:

```latex
\long\def\ifblankStandard#1%
  {%
    \if\relax\detokenize\expandafter{\@gobble #1.}\relax
      \expandafter\@firstoftwo
    \else
      \expandafter\@secondoftwo
    \fi
  }
```

So it is essentially the same as `\ifemptyStandard`, but with an `\expandafter` and a `\@gobble #1.` added. We can do the same for our fast `\ifempty` test with just a few small additions (I'll add this only to the slightly obfuscated variant using the `<DEL>` tokens). Since we don't want to use `\expandafter`s (remember, they are slow), we use `\ifblank@` to grab one argument (skipping leading spaces, just like `\@gobble`) and insert the start of the `\ifempty` test in its place. The `.` guarantees that there is always something to grab: if `#1` is blank, `\ifblank@` grabs the `.` and the remaining test sees an empty argument.

```latex
\begingroup
\lccode`\&=127
\lccode`\$=127
\catcode`\&=12
\catcode`\$=11
\lowercase{\endgroup
\long\def\ifempty@true&$\@secondoftwo#1#2{#1}
\long\def\ifempty@#1&${}
\long\def\ifempty#1{\ifempty@&#1$\ifempty@true&$\@secondoftwo}
\long\def\ifblank@#1{\ifempty@&}
\long\def\ifblank#1{\ifblank@#1.$\ifempty@true&$\@secondoftwo}
}
```
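
Again, a few `\typeout` calls illustrate the behaviour. This sketch simply repeats the definitions above; note in particular that a space-only argument counts as blank but not as empty.

```latex
\makeatletter
% <DEL>-delimited \ifempty and \ifblank, unchanged from above
\begingroup
\lccode`\&=127
\lccode`\$=127
\catcode`\&=12
\catcode`\$=11
\lowercase{\endgroup
\long\def\ifempty@true&$\@secondoftwo#1#2{#1}
\long\def\ifempty@#1&${}
\long\def\ifempty#1{\ifempty@&#1$\ifempty@true&$\@secondoftwo}
\long\def\ifblank@#1{\ifempty@&}
\long\def\ifblank#1{\ifblank@#1.$\ifempty@true&$\@secondoftwo}
}
\makeatother
\typeout{1: \ifblank{}{blank}{not blank}}     % 1: blank
\typeout{2: \ifblank{ }{blank}{not blank}}    % 2: blank
\typeout{3: \ifblank{X}{blank}{not blank}}    % 3: not blank
\typeout{4: \ifblank{ X }{blank}{not blank}}  % 4: not blank
\typeout{5: \ifempty{ }{empty}{not empty}}    % 5: not empty
\stop
```

In cases 1 and 2 `\ifblank@` grabs the `.` (skipping the space in case 2), so the remaining `\ifempty@` sees nothing before the `<DEL>` delimiters; in case 4 it grabs `X`, and the trailing space and `.` make the test non-empty.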
