diff options
Diffstat (limited to 'lib/libmalloc/NOTE')
-rw-r--r-- | lib/libmalloc/NOTE | 151 |
1 files changed, 151 insertions, 0 deletions
diff --git a/lib/libmalloc/NOTE b/lib/libmalloc/NOTE new file mode 100644 index 000000000000..dede8664d38e --- /dev/null +++ b/lib/libmalloc/NOTE @@ -0,0 +1,151 @@ +Miscellaneous notes on the malloc internals. + +First fit malloc with boundary tags for fast freeing - based on the +techniques described in 'The Art of Computer Programming' by Donald +Knuth. The essential features of the boundary tag are described in +Algorithm C, Section 2.5. The actual algorithms here incorporate the +improvements suggested in Exercises 12 and and 15 (I think), and +adaptations to the C/Un*x environment, plus some performance +improvements. By keeping the list circular, we don't have an AVAIL at +all - just use ROVER as the pointer into the list. It also tries to +"preserve the wilderness" -- i.e. try to keep the block nearest the +data break free as far as possible, but it does go overboard about it. +(i.e. it is a simple first fit -- many wilderness preservation schemes +tend to keep the wilderness block out of the free list, and search the +rest of the free list first. I find that degrades performance +somewhat, even if it keeps memory slightly less fragmented) + +This has a lot of debugging and tracing support built in - compiling +with -DDEBUG provides fairly comprehensive ASSERT protection to detect +heap corruption. (It aborts on the first sign of trouble). I hope that +every pointer reference is protected with an assertion, so the malloc +should never dump core or behave weirdly because of user program +misbehaviour -- it should blow an assertion instead. If the debugging +version ever gets a segmentation fault, bus error and dumps core +instead of blowing an assertion and then dumping core on an IO trap or +Illegal Instruction (depending on how your system does an abort()), I +want to know... + +For the non-debugging malloc: + +A minimum size for an allocated block is 4 units - two for the +leading and trailing size word, and two for the next and previous +pointers. Therefore the minimum size that we can allocate is 2 +units, since allocated and freed blocks both have the leading and +trailing size blocks. Only freed blocks have the next, prev +pointers. The size block is actually a signed - the sign bit +indicates whether it is a free or allocated block. i.e the tag. + +For the debugging malloc: + +The minimum size for an allocated block is still 4 units - two for the +leading and trailing size word, and two for the next and previous +pointers. But the minimum size that we can allocate is 1 units, since +allocated and freed blocks both have the leading and trailing size +blocks. Only freed blocks have the next, prev pointers. Only allocated +blocks have the realsize word, which indicates the size of the block +requested by the user in bytes. The next pointer and the realsize word +share the same header word. So the overhead for an allocated block is +3 words when debugging, as opposed to 2 words when not debugging. +(even though the minimum block size is still 4 words) + +The size block is actually a signed - the sign bit indicates whether +it is a free or allocated block. i.e the tag. Since the size is in +words, this loss of a bit is no big deal. The realsize block is a full +unsigned word, since it stores the size in bytes. + +For the leak tracing malloc + +The _*.c files all have the extra macros that record the blocks as +they are allocated, and deleted. The extra tracing macro also prints +the file name and line number before the usual tracing information +about size etc. This permits you to find out where block xxx was +freed, so that you can do an address to name mapping more easily. + +Note that to use this stuff, you HAVE to include malloc.h and define +MALLOC_TRACE when compiling. It still works correctly when you don't +include malloc.h, but you won't get all the information. + +Performance + +The fastest malloc I know of is the BSD4.3 malloc, which uses a number +of bins with different sized blocks, sized in powers of two. All it +has to do to find a free block is an iterated shift to compute the +bin, if the bin is empty then split a block from one of the higher +bins. Unfortunately, it can waste spectacular amounts of space for +many programs. + +The most space efficient malloc I know is Sun's Cartesian tree "better +fit" malloc (It has a hidden overhead of 8K of static data, though). +It is also unpleasantly slow. + +My malloc is close to the BSD4.3 malloc in performance - the +difference starts to become obvious only as the number of blocks in +the free list climbs towards the 100000 to million range at which +point the 4.3 malloc starts to pull ahead in user time, but quite +often, its VM behaviour deteriorates and makes its elapsed times very +large. My malloc is close to Sun's malloc in space efficiency. + +The key to the improved performance over typical first fit allocators +is the roving pointer, as well as the coalescing of freed blocks, +which keeps memory fragmentation and the size of the free list down. +The wilderness preservation heuristic also helps a lot. + +Guide to porting it: + +Should port without trouble to machines where any ugliness or +weirdness in pointers or segmentation is hidden from the programmer. I +haven't tried porting it to the 286 and below under DOS, nor do I +intend to at present. + +You may need to check align.h if your machine has non-standard +alignment requirements. (I interpret standard alignment to be +alignenment on one of char, int, char *, int *, char **, int **, +whichever requires the strictest alignment) If not, define ALIGN and +NALIGN. + +PLEASE LEAVE THE MAIN CODE FREE FROM #ifdefs. I'm interested in fixes +for porting it to new machines, only if they do not damage the code. +I'd prefer to find a least common denominator, or hide the difference +in a macro. + +People interested in reading and understanding this code should start +in defs.h (after this file, of course) -- it contains all the macros. +It'll help if you've read the following references. After that, study +globals.c, malloc.c, verify.c and dumpheap.c. Running testmalloc and +following the development of the heap is recommended. + +After you've ported it, you should be able to run testmalloc without +trouble, as well as the few runs of simumalloc listed in regress, +compiled with the debugging version of malloc. (-DDEBUG defined) + +If you uncover a bug, please enhance testmalloc.c with a case that +triggers the bug, if you can. That way, I can make sure future changes +to malloc don't re-introduce that bug. + +References: + +%A Donald E. Knuth +%T The Art of Computer Programming (Fundamental Algorithms) +%V 1 +%I Addison Wesley +%C Reading, Mass. +%D 1973 + +%A David G. Korn +%A Kiem-Phong Vo +%T In search of a better malloc +%J Proceedings of the USENIX 1985 Summer Conference +%C Portland, Oregon +%D June 1985 +%P 489-506 + +%A C.J. Stephenson +%T Fast fits: new methods for dynamic storage allocation +%J Proceedings of the Ninth ACM Symposium on Operating Systems Principles +%C Bretton Woods, New Hampshire +%D October 1983 +%P 30-32 +%O abstract only - paper to be published in TOCS +%O published as Operating Systems Review 17:5 +%O also appeared in Scientific American somewhere -- reference in Korn and Vo |