The test robot highlighted this massive improvement in its "will-it-scale.per_process_ops" scalability test case, run on an Intel Xeon Platinum (Cooper Lake) test server.
According to Phoronix, the commit responsible for this performance uplift is titled "mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes". Its patch message confirms that it resolves earlier performance regressions and delivers substantial improvements in certain specialised cases.
"Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") a mmap() of anonymous memory without a specific address hint and of at least PMD_SIZE will be aligned to PMD so that it can benefit from a THP backing page. However this change has been shown to regress some workloads significantly. [1] reports regressions in various spec benchmarks, with up to 600 per cent slowdown of the cactusBSSN benchmark on some platforms," the patch message stated.
The issue stemmed from the fact that certain benchmarks, such as cactusBSSN, create many mappings of 4632kB that previously would have merged into one large THP-backed area.
After commit efa7df3e3bb5, these mappings were instead fragmented into multiple areas, each aligned to a PMD boundary with gaps in between. According to the patch message, the regression appears to be caused mainly by the benchmark's memory access pattern suffering from TLB or cache aliasing at these aligned boundaries.
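To make the fragmentation concrete, here is a minimal Python sketch under stated assumptions: a PMD_SIZE of 2 MiB (typical for x86-64 with 4 KiB base pages) and a deliberately simplified, hypothetical top-down allocator that places each new mapping directly below the previous one. The helpers `place_top_down` and `gaps` are illustrative only; they are not kernel functions and ignore everything else the real allocator considers.

```python
# Simplified model (assumption): x86-64 with 4 KiB base pages, so PMD_SIZE = 2 MiB.
KIB = 1024
PMD_SIZE = 2048 * KIB


def align_down(addr: int, align: int) -> int:
    """Round addr down to a multiple of align."""
    return addr - (addr % align)


def place_top_down(sizes, top, pmd_align):
    """Hypothetical allocator: each mapping goes directly below the previous
    one; if pmd_align is set, its start is rounded down to a PMD boundary."""
    placed = []
    cursor = top
    for size in sizes:
        start = cursor - size
        if pmd_align:
            start = align_down(start, PMD_SIZE)
        placed.append((start, start + size))
        cursor = start
    return placed


def gaps(placed):
    """Holes (in kB) between each mapping and the one placed just below it."""
    return [
        (upper_start - lower_end) // KIB
        for (upper_start, _), (_, lower_end) in zip(placed, placed[1:])
    ]


sizes = [4632 * KIB] * 4   # four cactusBSSN-style 4632kB mappings
top = 64 * PMD_SIZE        # an arbitrary PMD-aligned starting point

print("unaligned gaps (kB):", gaps(place_top_down(sizes, top, pmd_align=False)))
print("PMD-aligned gaps (kB):", gaps(place_top_down(sizes, top, pmd_align=True)))
```

In this model the unaligned mappings abut with no holes, so they could merge into a single area. With PMD alignment, each 4632kB mapping is effectively given a 6144kB (three-PMD) slot, leaving a 1512kB hole between every adjacent pair.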
Another regression was identified in the darktable application, which the new patch reportedly also addresses. To remedy the regressions while still benefiting from THP-friendly alignment of anonymous mappings, the patch now requires the mapping size to be a multiple of PMD_SIZE, rather than merely at least PMD_SIZE. Odd-sized mappings like the 4632kB ones above are therefore no longer padded out to PMD boundaries. They can merge naturally again, which avoids the performance issues previously encountered.
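The change in policy can be sketched as two Python predicates. This is a simplified model based on the patch description, not the kernel code itself. It assumes a 2 MiB PMD_SIZE and reduces the kernel's conditions to three inputs: anonymous mapping, no address hint, and length. The function names are hypothetical.

```python
# Simplified model of the THP-alignment decision for anonymous mmap() calls.
# Assumption: x86-64 with 4 KiB base pages, so PMD_SIZE = 2 MiB.
KIB = 1024
MIB = 1024 * KIB
PMD_SIZE = 2 * MIB


def thp_align_old(length: int, is_anon: bool, has_hint: bool) -> bool:
    """Rule as described for commit efa7df3e3bb5: align any anonymous,
    hint-less mapping that is at least PMD_SIZE long."""
    return is_anon and not has_hint and length >= PMD_SIZE


def thp_align_new(length: int, is_anon: bool, has_hint: bool) -> bool:
    """Rule as described by the fix: align only when the length is an exact
    multiple of PMD_SIZE (length > 0, since mmap() rejects zero lengths)."""
    return is_anon and not has_hint and length > 0 and length % PMD_SIZE == 0


for length in (1 * MIB, 4 * MIB, 4632 * KIB):
    print(f"{length // KIB}kB: old={thp_align_old(length, True, False)}, "
          f"new={thp_align_new(length, True, False)}")
```

Under this model, a 4 MiB mapping is still aligned, so it can still be fully backed by two huge pages. A 4632kB mapping, which the old rule aligned, is no longer aligned by the new rule.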
The mmap patch merged last week changes just one line of code, yet it has had a profound impact. The memory management commit that introduced the regressions, efa7df3e3bb5, has been in the mainline Linux kernel since December 2023.
Further benchmarks will be conducted to assess real-world workloads beyond these synthetic test cases, and to measure any additional performance shifts in this latest Linux kernel code.