Back in my early Microsoft days on the FlightSim team we were always looking for better perf. If nothing else, we needed to give as much time to the renderer as possible because our customers always wanted to run the graphics with everything set to 11. And since this was somewhere between C and C++ (homemade cross-module vtables and dynamic linking, but that's another story), memory management was a thing. And Intel's VTune told us that allocating memory was taking a measurable amount of time.
Our solution was to build free lists for each type of object we wanted to make faster. The initial implementation mostly replaced free/delete with putting the object into a linked list, and had malloc/new check that list first and only allocate when it was empty. This worked, but we needed to be careful with reuse and re-initialization, and it didn't give us much protection from stale pointers to existing objects. Slowly we added more safety and protection. But it was always work, and it made things more cumbersome.
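For anyone who hasn't built one, here's a minimal sketch of that kind of per-type free list in C++. This is not the FlightSim code. The `FreeList` name and its `create`/`destroy` methods are made up for illustration, and it assumes `T` doesn't need more alignment than plain `operator new` provides.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical per-type free list: released blocks are chained together and
// handed back out before falling through to the general-purpose allocator.
template <typename T>
class FreeList {
    // A released block is reused to store the link to the next free block.
    union Node {
        Node* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };
    Node* head_ = nullptr;

public:
    FreeList() = default;
    FreeList(const FreeList&) = delete;
    FreeList& operator=(const FreeList&) = delete;

    // Blocks still on the list go back to the system allocator.
    ~FreeList() {
        while (head_) {
            Node* n = head_;
            head_ = n->next;
            ::operator delete(n);
        }
    }

    // Check the list first; only allocate when it is empty.
    template <typename... Args>
    T* create(Args&&... args) {
        void* raw;
        if (head_) {
            raw = head_;
            head_ = head_->next;
        } else {
            raw = ::operator new(sizeof(Node));
        }
        // Placement new re-runs the constructor, so a recycled block never
        // carries state over from its previous life.
        return ::new (raw) T(std::forward<Args>(args)...);
    }

    // Run the destructor, then push the block onto the list instead of freeing it.
    void destroy(T* p) {
        assert(p != nullptr && "destroy() expects a live object");
        p->~T();
        Node* n = ::new (static_cast<void*>(p)) Node;
        n->next = head_;
        head_ = n;
    }
};
```

Calling `create`, then `destroy`, then `create` again on the same list hands back the same address. That is where both the speed and the stale pointer hazard come from: an old pointer to the first object now quietly points at the second.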
One of the nice things about working for Microsoft was access to Microsoft Research (MSR), a whole division of folks whose job was to think about big problems in computer science and figure out solutions. They were always looking for new problems, and for ways to get their solutions into shipping products. We got all sorts of interesting tools and ideas from them. One of them was a memory management system called Rockall.
Rockall was based on individual heaps, basically a heap per type of object, and gave us a couple of advantages over simple new/malloc. The biggest was that it was a drop-in replacement. We didn't need to do anything but set it up at the beginning, and then it did its magic. Don't ask me how that part worked. I just used it :) We would pre-allocate the space for however many of each of the different types we wanted, then getting a new one was blazingly fast since the memory was already allocated and waiting. Initialization magically happened. It was also contiguous, so we got better memory packing. Just that level of usage gave us a lot of benefits.
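I can't show you Rockall's internals, but the general shape of a pre-allocated heap per type can be sketched in C++ with a class-level `operator new`/`operator delete`. Everything here is an assumption for illustration, not Rockall's API: the `TypedHeap` class, the `Aircraft` type, and the capacity of 256.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-capacity heap for objects of one size.
class TypedHeap {
    std::size_t slot_size_;
    std::size_t capacity_;
    unsigned char* block_;     // one contiguous, pre-allocated region
    std::vector<void*> free_;  // stack of slots that are ready to hand out

    // Round each slot up so every slot is suitably aligned.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

public:
    TypedHeap(std::size_t object_size, std::size_t capacity)
        : slot_size_(round_up(object_size)),
          capacity_(capacity),
          block_(static_cast<unsigned char*>(::operator new(slot_size_ * capacity_))) {
        free_.reserve(capacity_);
        // Push in reverse so slots are handed out in address order.
        for (std::size_t i = capacity_; i > 0; --i)
            free_.push_back(block_ + (i - 1) * slot_size_);
    }
    TypedHeap(const TypedHeap&) = delete;
    TypedHeap& operator=(const TypedHeap&) = delete;
    ~TypedHeap() { ::operator delete(block_); }

    std::size_t slot_size() const { return slot_size_; }

    // True if p is the start of a slot inside this heap's block.
    bool owns(const void* p) const {
        const unsigned char* q = static_cast<const unsigned char*>(p);
        return q >= block_ && q < block_ + slot_size_ * capacity_ &&
               static_cast<std::size_t>(q - block_) % slot_size_ == 0;
    }

    // No system call and no searching: just pop a slot.
    void* allocate() {
        if (free_.empty()) throw std::bad_alloc();
        void* p = free_.back();
        free_.pop_back();
        return p;
    }

    void deallocate(void* p) {
        assert(owns(p));
        free_.push_back(p);
    }
};

// Hypothetical game object. Call sites keep using plain new and delete.
struct Aircraft {
    double position[3];
    double velocity[3];

    static TypedHeap& heap() {
        static TypedHeap h(sizeof(Aircraft), 256);  // capacity is an arbitrary example
        return h;
    }
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Aircraft));
        return heap().allocate();
    }
    static void operator delete(void* p) noexcept {
        if (p) heap().deallocate(p);
    }
};
```

With this in place, `new Aircraft()` and `delete` look the same at the call site as before, but the memory comes out of one contiguous block that was paid for up front.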
But it also had a bunch of debug features which helped out. If you had enough memory, you could tell it to put each object at the end of a page and mark the next page as unreadable. Any memory overrun then showed up right away. You could also set it to mark the page as unreadable on delete, and again, any use after delete quickly became apparent.
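The guard page trick is easy to sketch with plain OS calls. This version assumes a POSIX system (`mmap`, `mprotect`, `sysconf`). We were on Windows, where the equivalent calls have different names. `GuardedBlock`, `guarded_alloc`, and `guarded_free` are hypothetical names, and this is not how Rockall itself was written.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical bookkeeping for one guarded allocation.
struct GuardedBlock {
    void* base = nullptr;    // start of the whole mapping
    std::size_t length = 0;  // mapped bytes, guard page included
    void* user = nullptr;    // pointer handed to the caller
};

inline std::size_t page_size() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Place the object so that its last byte is the last byte of a page,
// and make the page right after it inaccessible.
inline GuardedBlock guarded_alloc(std::size_t size) {
    GuardedBlock block;
    if (size == 0) return block;

    const std::size_t page = page_size();
    assert(page > 0);
    const std::size_t data_pages = (size + page - 1) / page;
    const std::size_t length = (data_pages + 1) * page;  // +1 for the guard page

    void* base = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return block;

    unsigned char* guard = static_cast<unsigned char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, length);
        return block;
    }

    block.base = base;
    block.length = length;
    // Note: if size is not a multiple of the type's alignment, this pointer
    // is misaligned. A real tool has to trade alignment against exactness.
    block.user = guard - size;
    return block;
}

// "Free" by making the whole mapping inaccessible instead of unmapping it,
// so any later use of a stale pointer faults. The address space is
// deliberately never reused in this sketch.
inline bool guarded_free(const GuardedBlock& block) {
    return block.base != nullptr &&
           mprotect(block.base, block.length, PROT_NONE) == 0;
}
```

Writing one byte past the end of a block from `guarded_alloc` lands on the guard page and crashes on the spot, and so does touching the block after `guarded_free`. The cost is at least two pages per object, which is why this is a debug-only setting.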
It was also faster on cleanup. Transitioning out of flying and into the UI just required a few calls to delete each heap instead of making sure each and every object was individually deleted, in the right order, so we didn't leak anything. At debug time it could also tell us which things hadn't been deallocated manually if we wanted, but since we used it on the way out we didn't care.
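The bulk cleanup idea can be sketched as a small arena in C++. The `Arena` class, its method names, and the 64 KB chunk size are all made up for this example. It also assumes the objects stored in it don't need their destructors run, since dropping the heap skips them.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical arena: individual frees are optional, and releasing the
// arena gives everything back in one go.
class Arena {
    static constexpr std::size_t kChunkSize = 64 * 1024;  // arbitrary example size
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
    std::size_t used_ = kChunkSize;  // "full", so the first allocation adds a chunk
    std::size_t live_ = 0;           // allocations not yet marked as freed

public:
    void* allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        size = (size + align - 1) / align * align;
        if (size > kChunkSize) throw std::bad_alloc();
        if (used_ + size > kChunkSize) {
            chunks_.push_back(std::make_unique<unsigned char[]>(kChunkSize));
            used_ = 0;
        }
        void* p = chunks_.back().get() + used_;
        used_ += size;
        ++live_;
        return p;
    }

    // Individual frees only update the bookkeeping. The memory itself
    // stays in the arena until release_all().
    void mark_freed() {
        assert(live_ > 0);
        --live_;
    }

    // Debug aid: how many allocations were never freed one by one?
    std::size_t live_count() const { return live_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // The cleanup path: drop every chunk, whatever is still live.
    void release_all() {
        chunks_.clear();
        used_ = kChunkSize;
        live_ = 0;
    }
};
```

In this sketch, leaving a mode comes down to one `release_all()` call per arena. If you want a leak report first, check `live_count()` before releasing.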
That kind of thing can help out in GC-based code too. Memory that's always in use never gets collected, you don't end up with things moving around in memory during compaction, and your allocation times go down. That was really important when we were doing 3D rendering in the browser for VirtualEarth. We didn't have Rockall, so we had to use a manual free_list, but we still got most of the benefits.
Something to think about.