Let's have a look at two files:

---
#### Key-value pair:

```
\RequirePackage{l3benchmark} % provides \benchmark:n
\ExplSyntaxOn
\keys_define:nn { test } {
  test
  .code:n                = {
    \cs_set_protected:Npn \__test_cmd_create: { #1 }
  }
}

\benchmark:n { \keys_set:nn { test } { test = foo } }
\ExplSyntaxOff
\stop
```

---
#### Commands:

```
\RequirePackage{l3benchmark} % provides \benchmark:n
\ExplSyntaxOn
\cs_new_protected:Npn \__test_cmd:n #1 {
  \cs_set_protected:Npn \__test_cmd_create: { #1 }
}

\benchmark:n { \__test_cmd:n { foo } }
\ExplSyntaxOff
\stop
```

I get the following on a GNU/Linux machine with a fully updated TL 2026 (using `lualatex-dev`):

```txt
Command: 1.76e-7 seconds (0.849 ops)
Key-val: 3.5e-5 seconds (172 ops)
```

That's a huge difference. But okay, I then ran a benchmarking tool called `hyperfine` on the two files (obviously with `\benchmark:n` removed first). I created a hello-world file and asked `hyperfine` to compile it three times with `lualatex-dev` as a warm-up. After the warm-up, I compiled each of the two test files 100 times. I asked Claude to write a basic Python script that analyzes the data given by `hyperfine` (a rough sketch of what such a script might look like is shown after the output below), and the script returned the following:

```txt
============================================================
  Benchmark Results
============================================================
  keyval.tex                          591.4 ms  ± 12.0 ms
  commands.tex                        589.8 ms  ± 6.2 ms
------------------------------------------------------------
  Faster file : commands.tex
  Slower file : keyval.tex
  Speed ratio : 1.00x faster
  Difference  : 0.3% faster

  Warning: The difference is within noise.
  Result may not be statistically significant.
============================================================
```
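
For reference, the script was roughly along the following lines. This is a sketch rather than the exact script; the file names `keyval.json`/`commands.json` and the use of `hyperfine --export-json` are assumptions here:

```python
# Sketch of an analysis of two hyperfine runs, assuming each run was exported
# with `hyperfine --export-json keyval.json` / `--export-json commands.json`.
import json

def load(path):
    # hyperfine's JSON export stores one entry per benchmarked command.
    with open(path) as f:
        r = json.load(f)["results"][0]
    return r["mean"], r["stddev"]  # both in seconds

runs = {name: load(f"{name}.json") for name in ("keyval", "commands")}

for name, (mean, std) in runs.items():
    print(f"{name + '.tex':<15} {mean * 1000:7.1f} ms ± {std * 1000:4.1f} ms")

(fast, (fast_mean, _)), (slow, (slow_mean, _)) = sorted(runs.items(), key=lambda kv: kv[1][0])
print(f"Faster file : {fast}.tex")
print(f"Speed ratio : {slow_mean / fast_mean:.2f}x faster")

# Treat the result as noise when the gap between the means is smaller than
# the combined standard deviations of the two measurements.
if abs(runs["keyval"][0] - runs["commands"][0]) < runs["keyval"][1] + runs["commands"][1]:
    print("Warning: the difference is within noise.")
```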

I then asked Claude how such a drastic difference (the one reported by `\benchmark:n`) could simply be insignificant, and this was its response:

---

This is a classic cold vs. warm measurement problem, and your two measurement methods are capturing completely different things.

What the benchmark script measures:

The script times entire `lualatex-dev` process invocations — from the OS forking the process, through Lua startup, format loading, font cache loading, all the way to `\stop`. That fixed overhead dominates completely at ~590 ms. The actual execution time of `\__test_cmd:n` vs `\keys_set:nn` is in the range of microseconds to tens of microseconds, which is 4–5 orders of magnitude smaller than the 590 ms wall time. It simply drowns in the noise floor, hence the 0.3% difference and the noise warning.

What the terminal measurement captures:

When you profile the call directly inside a running TeX session (using something like `\benchmark:n`), the format is already loaded, Lua is warm, fonts are cached, and you're measuring only the instruction execution itself in isolation. There, the ~200× difference between `1.76e-7 s` and `3.5e-5 s` is real and reflects the actual cost of the key–value dispatch machinery in `\keys_set:nn` versus a direct command call.

What to conclude:

Both results are correct — they're just answering different questions:

The benchmark script answers: "Does choosing between these two approaches have any meaningful effect on document compile time?" The answer is no — at one call each, the difference is unmeasurable at the process level. Even at thousands of calls, you'd need to be in a very hot loop before it would show up in wall-clock compile time.

The `\benchmark:n` measurement answers: "Is there an inherent performance difference between the two mechanisms?" The answer is yes, a large one — key–value parsing is genuinely much more expensive per call than a direct command dispatch.

So if you're asking whether this architectural choice matters for a document author calling it a handful of times, the process-level benchmark gives you the correct answer: it doesn't. If you're writing a package where this code runs tens of thousands of times per document, the micro-benchmark gives you the correct answer: it matters a lot, and you'd want to avoid `\keys_set:nn` in the hot path.

---
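
Just to put the two sets of numbers side by side for myself, here is a quick back-of-envelope calculation using only the figures quoted above (the call counts at the end are made up for illustration):

```python
# Scale check: how much the per-call difference measured by \benchmark:n
# could contribute to the wall-clock compile time measured by hyperfine.
per_call_keyval  = 3.5e-5   # seconds per \keys_set:nn call
per_call_command = 1.76e-7  # seconds per direct command call
compile_time     = 0.59     # seconds for a full lualatex-dev run
noise            = 0.012    # seconds, hyperfine's standard deviation

diff = per_call_keyval - per_call_command

# A single call is a vanishing fraction of one compile run.
print(f"1 call       : {diff:.2e} s ({diff / compile_time:.4%} of a compile)")

# Roughly how many calls it would take for the difference to reach the noise floor.
print(f"noise floor  : ~{noise / diff:.0f} calls")

# A hot loop makes the difference clearly visible at the document level.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>6} calls : adds {n * diff:.3f} s to the compile")
```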

I am curious to know whether others think this is the right interpretation of the numbers. What should a developer consider while writing their own package? What would be the _best practice_ here?
