How can I Obtain all the Match Positions of a Regex over a String? - C++

Jonathan Mee

I've written a toy example to try to help me understand this. Given the following string:

    const char input[] = "if (KnR)\n"
                         "\tfoo();\n"
                         "if (spaces) {\n"
                         "    foo();\n"
                         "}\n"
                         "if (allman)\n"
                         "{\n"
                         "\tfoo();\n"
                         "}\n"
                         "if (horstmann)\n"
                         "{\tfoo();\n"
                         "}\n"
                         "if (pico)\n"
                         "{\tfoo(); }\n"
                         "if (whitesmiths)\n"
                         "\t{\n"
                         "\tfoo();\n"
                         "\t}\n";

If I'm using the following regex: `const regex r("(.+?)\\s*\\{?\\s*(.+?;)\\s*\\}?\\s*")`, how can I find the begin and end positions of every occurrence of the first capture in `input`, and likewise of every occurrence of the second capture?

So for example I expect capture 1 to have the following ranges:

 1. 0 to 8
 1. 17 to 28
 1. 44 to 55
 1. 68 to 82
 1. 94 to 103
 1. 115 to 131

And I expect capture 2 to have the following ranges:

 1. 10 to 16
 1. 35 to 41
 1. 59 to 65
 1. 85 to 91
 1. 106 to 112
 1. 136 to 142

Top Answer

Jonathan Mee

We should start by talking about [`std::match_results`](https://en.cppreference.com/w/cpp/regex/match_results), which is where `std::regex_match` and `std::regex_search` store their results. Provided the function generating the `std::match_results` succeeded, it will contain [`std::match_results::size`](https://en.cppreference.com/w/cpp/regex/match_results/size) [`std::sub_match`](https://en.cppreference.com/w/cpp/regex/sub_match)s: one for the entire match plus one for each capture in the regex, so every capture maps 1-to-1 onto a `std::sub_match`. When indexing a `std::match_results`' `std::sub_match`s:

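 * There are no `std::sub_match`s at indices less than 0, because the index is unsigned; the portion of the string which precedes the first matched character of the entire regex is returned by [`std::match_results::prefix`](https://en.cppreference.com/w/cpp/regex/match_results/prefix) instead
 * The `std::sub_match` at index 0 contains the portion of the string matched by the entire regex
 * The `std::sub_match`s at indices greater than 0 and less than `std::match_results::size` contain the portion of the string matched by the regex's corresponding 1-based capture
 * Indices greater than or equal to `std::match_results::size` yield an unmatched `std::sub_match` (its `matched` member is `false`); the portion of the string which follows the last matched character of the entire regex is returned by [`std::match_results::suffix`](https://en.cppreference.com/w/cpp/regex/match_results/suffix) instead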
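As a small illustration of what a single `std::match_results` holds, here is a minimal sketch. `MatchParts` and `describe` are hypothetical names made up for this example, and the helper assumes the regex has at least two captures. It reads the whole match and the captures through `operator[]`, and the text around the match through `prefix()` and `suffix()`.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical summary type, introduced only for this illustration.
struct MatchParts {
    std::size_t count = 0;          // m.size(): whole match plus one entry per capture
    std::string whole;              // m[0], text matched by the entire regex
    std::string first;              // m[1], text matched by capture 1
    std::string second;             // m[2], text matched by capture 2
    std::string before;             // m.prefix(), text preceding the match
    std::string after;              // m.suffix(), text following the match
    bool past_end_matched = false;  // m[m.size()].matched
};

// Hypothetical helper: searches `text` once and copies out the pieces of the match.
// Assumes `re` has at least two captures.
MatchParts describe(const std::string& text, const std::regex& re) {
    MatchParts parts;
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        parts.count = m.size();
        parts.whole = m[0].str();
        parts.first = m[1].str();
        parts.second = m[2].str();
        parts.before = m.prefix().str();
        parts.after = m.suffix().str();
        // An index at or past size() yields an unmatched sub_match.
        parts.past_end_matched = m[m.size()].matched;
    }
    return parts;
}
```

For example, `describe("abc 123 def", std::regex("(\\d)(\\d+)"))` should report three `std::sub_match`s (`123`, `1` and `23`), with `abc ` as the prefix and ` def` as the suffix.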

To visit every match in `input` we can use a [`std::regex_iterator`](https://en.cppreference.com/w/cpp/regex/regex_iterator). To collect the `std::match_results` of all the matches, each of which holds both captures, we could do:

     // std::cend(input) - 1 keeps the terminating '\0' of the array out of the searched range
     const std::vector<std::cmatch> output = { std::cregex_iterator(std::cbegin(input), std::cend(input) - 1, r), std::cregex_iterator() };

To obtain the matched range from these `std::cmatch`s, use the `position` method to find the offset and the `length` method to find the size of the match. Both methods take the index of the desired capture.

So for example the 1^st^ capture's offset in the 1^st^ match could be found by doing: `output.front().position(1)`

The length of this capture could be found by doing: `output.front().length(1)`

Adding the two together gives the end of the range, which is one past the last matched character.
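Putting the pieces together, a minimal sketch of collecting the ranges for one capture across every match might look like the following. `capture_ranges` is a hypothetical helper name introduced for this example; the input and regex are the ones from the question, and the searched range deliberately stops before the array's terminating `'\0'`.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <utility>
#include <vector>

// Toy input and regex taken from the question.
const char input[] = "if (KnR)\n"
                     "\tfoo();\n"
                     "if (spaces) {\n"
                     "    foo();\n"
                     "}\n"
                     "if (allman)\n"
                     "{\n"
                     "\tfoo();\n"
                     "}\n"
                     "if (horstmann)\n"
                     "{\tfoo();\n"
                     "}\n"
                     "if (pico)\n"
                     "{\tfoo(); }\n"
                     "if (whitesmiths)\n"
                     "\t{\n"
                     "\tfoo();\n"
                     "\t}\n";
const std::regex r("(.+?)\\s*\\{?\\s*(.+?;)\\s*\\}?\\s*");

// Hypothetical helper: half-open [begin, end) offsets of the given capture in every match.
std::vector<std::pair<std::size_t, std::size_t>> capture_ranges(std::size_t capture) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::cregex_iterator last;  // a default-constructed iterator marks the end
    // std::cend(input) - 1 excludes the terminating '\0' of the array.
    for (std::cregex_iterator it(std::cbegin(input), std::cend(input) - 1, r); it != last; ++it) {
        const auto begin = static_cast<std::size_t>(it->position(capture));
        const auto end = begin + static_cast<std::size_t>(it->length(capture));
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

With the toy input, `capture_ranges(1)` should produce the six ranges the question expects for capture 1 (starting with 0 to 8), and `capture_ranges(2)` those for capture 2 (starting with 10 to 16).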

[**Live Example**](https://ideone.com/x0Ekkt)
