Jonathan Mee
I've written a toy example to try to help me understand this. Given the following string:
const char input[] = "if (KnR)\n"
"\tfoo();\n"
"if (spaces) {\n"
"    foo();\n"
"}\n"
"if (allman)\n"
"{\n"
"\tfoo();\n"
"}\n"
"if (horstmann)\n"
"{\tfoo();\n"
"}\n"
"if (pico)\n"
"{\tfoo(); }\n"
"if (whitesmiths)\n"
"\t{\n"
"\tfoo();\n"
"\t}\n";
If I'm using the following regex: `const regex r("(.+?)\\s*\\{?\\s*(.+?;)\\s*\\}?\\s*")`, how can I find the begin and end positions of every first capture in `input` and of every second capture in `input`?
So for example I expect capture 1 to have the following ranges:
1. 0 to 8
1. 17 to 28
1. 44 to 55
1. 68 to 82
1. 94 to 103
1. 115 to 131
And I expect capture 2 to have the following ranges:
1. 10 to 16
1. 35 to 41
1. 59 to 65
1. 85 to 91
1. 106 to 112
1. 136 to 142
Top Answer
Jonathan Mee
We should start by talking about [`std::match_results`](https://en.cppreference.com/w/cpp/regex/match_results), which is where `std::regex_match` or `std::regex_search` stores its results. Provided the function succeeded, it will contain [`std::match_results::size`](https://en.cppreference.com/w/cpp/regex/match_results/size) [`std::sub_match`](https://en.cppreference.com/w/cpp/regex/sub_match)es: one for the entire match, followed by one for each capture in the regex, in order. When indexing a `std::match_results`' `std::sub_match`es:
* There are no indices less than 0 (the index type is unsigned); the portion of the string which precedes the first matched character of the entire regex is returned by `prefix()` instead
* The `std::sub_match` at index 0 contains the portion of the string matched by the entire regex
* The `std::sub_match`es at indices greater than 0 and less than `std::match_results::size` contain the portion of the string matched by the regex's corresponding 1-based capture
* Indices greater than or equal to `std::match_results::size` return an unmatched `std::sub_match` (its `matched` member is `false`); the portion of the string which follows the last matched character of the entire regex is returned by `suffix()` instead
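The indexing rules above can be checked with a small sketch. The `MatchPieces` struct and the `describe` helper are hypothetical names made up for this illustration, and the helper assumes a regex with exactly two captures that matches somewhere in the text:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical record of the pieces exposed by one std::smatch.
struct MatchPieces {
    std::string prefix;        // text before the whole match: m.prefix()
    std::string whole;         // m[0], the entire match
    std::string first;         // m[1], capture 1
    std::string second;        // m[2], capture 2
    std::string suffix;        // text after the whole match: m.suffix()
    bool past_the_end_matched; // m[m.size()].matched, expected to be false
};

// Hypothetical helper: runs one search and copies each piece out of the result.
// Assumes `re` has exactly two captures and matches somewhere in `text`.
MatchPieces describe(const std::string& text, const std::regex& re) {
    std::smatch m;
    const bool found = std::regex_search(text, m, re);
    assert(found && m.size() == 3);  // whole match + 2 captures
    (void)found;                     // silence "unused" when asserts are disabled
    return { m.prefix().str(), m[0].str(), m[1].str(), m[2].str(),
             m.suffix().str(), m[m.size()].matched };
}
```

For instance, `describe("set x=42 now", std::regex(R"((\w+)=(\d+))"))` should give the prefix `"set "`, the whole match `"x=42"`, the captures `"x"` and `"42"`, and the suffix `" now"`, with the out-of-range index reporting an unmatched `std::sub_match`.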
We can use a [`std::regex_iterator`](https://en.cppreference.com/w/cpp/regex/regex_iterator) to visit every match in `input`. To collect the `std::match_results` of each match (every one of which holds both captures) we could do:
const std::vector<std::cmatch> output = { std::cregex_iterator(std::cbegin(input), std::cend(input), r), std::cregex_iterator() };
To obtain the matched range from one of these `std::cmatch`es, use the `position` method to find the offset and the `length` method to find the size of the match. Both methods take the index of the desired capture.
So, for example, the first capture's offset in the first match could be found by doing: `output.front().position(1)`
The length of that capture could be found by doing: `output.front().length(1)`
These could be added together to find the end of the range (one past the last matched character).
[**Live Example**](https://ideone.com/x0Ekkt)
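Putting `position` and `length` together for every match, a minimal sketch of collecting all ranges for one capture might look like the following. The `Range` alias and the `capture_ranges` function are hypothetical names introduced here, not part of the standard library, and the sketch takes a `std::string` rather than the `char` array used in the question:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: a half-open range [begin, end) of offsets into the text.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: the range of capture `group` in every match of `re`.
// group 0 is the whole match; 1..mark_count() are the regex's captures.
std::vector<Range> capture_ranges(const std::string& text,
                                  const std::regex& re,
                                  std::size_t group) {
    assert(group <= re.mark_count());
    std::vector<Range> ranges;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        // A capture that did not take part in this match has no useful range.
        if (!(*it)[group].matched) continue;
        const auto begin = static_cast<std::size_t>(it->position(group));
        const auto len = static_cast<std::size_t>(it->length(group));
        ranges.emplace_back(begin, begin + len);  // end is one past the last char
    }
    return ranges;
}
```

With the question's `input` copied into a `std::string` and the question's `r`, `capture_ranges(input, r, 1)` should start with the range 0 to 8 and `capture_ranges(input, r, 2)` with 10 to 16. Returning half-open ranges keeps the result consistent with the "offset plus length" arithmetic described above.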