Moreover, std::minmax_element is slower than calling std::min_element+std::max_element if there's a lot of data and it does not fit into the CPU cache: http://quick-bench.com/tlgxCx9CUMZgOfYbwhFaEI0WNOg
When I visited that Quickbench link, I noticed that it was set to GCC and to libstdc++. When I changed it to clang and libc++, the standard library implementation was faster than the hand-written version. Did you mean to report this against libstdc++?
Sorry, wrong link. Here's the right one http://quick-bench.com/RQrBIzB8sBYS932z90b7DTEyEks There's a good comment from Marc Glisse with description of close behaviour in GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89120#c1 However, it looks like there's still a room for improvement. If std::minmax_element is called with standard comparators and for fundamental type, then using a simple loop (with more comparisons) will improve performance (for CPUs that have branch predictors and non costly comparison instructions for builtins). Although I'm not sure that this could/should be solved on the library level - too many CPU specific knowledge in libc++ does not seem right.
(In reply to Antony Polukhin from comment #2) > Sorry, wrong link. Here's the right one > http://quick-bench.com/RQrBIzB8sBYS932z90b7DTEyEks That's better; but you are testing different things. Your "hand written code" returns (in pointers) two values. The "standard one" returns a pair of iterators. Also, despite being faster, you're doing more comparisons than the one in the standard: You: 2N comparisons. Std: 3N/2 comparisons. > There's a good comment from Marc Glisse with description of close behaviour > in GCC https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89120#c1 And now I see I've made the same points that he has.
Closing as "Not a bug"