author Joerg Wunsch <joerg@FreeBSD.org> 1995-01-02 12:01:59 +0000
committer Joerg Wunsch <joerg@FreeBSD.org> 1995-01-02 12:01:59 +0000
commit e02a6ccb4ee3b16483fabd00bcc0daf14daf6212
tree e8678c332a0b384a59af000ac6d8829ce6bbfa03 /share/FAQ
parent 5cf1ec501a4e57f62d00c4f50154122cc1264684
Diffstat (limited to 'share/FAQ')
-rw-r--r-- share/FAQ/kernel-debug.FAQ 424
1 file changed, 404 insertions, 20 deletions
diff --git a/share/FAQ/kernel-debug.FAQ b/share/FAQ/kernel-debug.FAQ
index 0d098c3714bb..3221f82a8c11 100644
--- a/share/FAQ/kernel-debug.FAQ
+++ b/share/FAQ/kernel-debug.FAQ
@@ -1,33 +1,417 @@
- Kernel debugging FAQ
- for FreeBSD 1.1.5.1 and below
+ Kernel debugging FAQ for FreeBSD
-Last modified: $Id: kernel-debug.FAQ,v 1.1 1994/09/11 10:56:06 jkh Exp $
+Last modified: $Id: kernel-debug.FAQ,v 1.2 1994/10/03 03:19:41 gclarkii Exp $
-Here are some instructions for getting kernel debugging working on
-a crash dump, it assumes that you have enough swap space for a crash
-dump.
-*** Start ***
+*** Debugging a kernel crash dump with kgdb ***
-Config you're kernel using config -g
+ Here are some instructions for getting kernel debugging working on a
+ crash dump. They assume that you have enough swap space for a crash
+ dump. If you happen to have multiple swap partitions with the first
+ one being too small to hold the dump, you can configure your kernel to
+ use an alternate dump device (in the ``kernel'' line). Dumps to
+ non-swap devices (e.g. tapes) are currently not supported.
-Remove ${STRIP} -x $@; from the Makefile for the kernel so it doesn't
-get stripped.
+ Config your kernel using config -g
-When the kernel's been built make a copy of it, say 386BSD.debug, and
-then run strip -x on the original. Install the original as normal.
+ Remember that you need to specify ``options DODUMP'' in your config
+ file in order to get kernel core dumps.
-Now, after a crash dump, go to /sys/compile/WHATEVER and run kgdb. From kgdb
-do:
+ When the kernel's been built make a copy of it, say kernel.debug, and
+ then run strip -x on the original. Install the original as normal.
+ You may also install the unstripped kernel, but symtab lookup time
+ for some programs might drastically increase.
-symbol-file 386BSD.debug
-exec-file /var/crash/system.0
-core-file /var/crash/ram.0
+ If you are testing a new kernel (e.g. by typing the new kernel's
+ name at the boot prompt), but need to boot a different one in order
+ to get your system up and running again, boot it only into
+ single-user mode (the -s flag at the boot prompt), and then perform
+ the following steps:
-and viola, you can debug the crash dump using the kernel sources just like
-you can for any other program.
+ fsck -p
+ mount -a -t ufs # so your file system for /var/crash is writable
+ savecore -N /kernel.panicked /var/crash
+ exit # ...to multi-user
+ This instructs savecore to use another kernel for symbol name
+ extraction; it would default to the currently running kernel
+ otherwise.
+ Now, after a crash dump, go to /sys/compile/WHATEVER and run
+ kgdb. From kgdb do:
- Paul Richards, FreeBSD core team member.
+ symbol-file kernel.debug
+ exec-file /var/crash/system.0
+ core-file /var/crash/ram.0
+
+ and voila, you can debug the crash dump using the kernel sources
+ just like you can for any other program.
+
+ If your kernel panicked due to a trap (perhaps the most common case
+ for getting a core dump), the following trick might help you. Examine
+ the stack (`where') and look for the stack frame in the function
+ trap(). Go `up' to that frame, and then type:
+
+ frame frame->tf_ebp frame->tf_eip
+
+ This will tell kgdb to go to the stack frame explicitly named by a
+ frame pointer and instruction pointer, which is the location where
+ the trap occurred. There are still some bugs in kgdb (you can go
+ `up' from there, but not `down'; the stack trace will still remain
+ as it was before going to here), but generally this method will lead
+ you much closer to the failing piece of code.
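Editor's sketch, not part of the committed FAQ text: kgdb prints the trap frame fields as signed decimal integers, which makes them hard to compare with the hexadecimal addresses in the stack trace. The helper below (the name `to_kernel_address` is made up for illustration) reinterprets such a value as an unsigned 32-bit address, assuming an i386 kernel as in this FAQ. The sample value is the `tf_eip` printed in the session log that follows.

```python
def to_kernel_address(value: int) -> str:
    """Render a signed 32-bit decimal, as printed by kgdb, as an unsigned hex address."""
    # Masking with 0xFFFFFFFF maps negative values onto their two's-complement form.
    return f"0x{value & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # tf_eip from the trap frame in the session log
    print(to_kernel_address(-266672343))  # prints 0xf01ae729
```

The result matches the address kgdb reports for frame #0 after the `frame frame->tf_ebp frame->tf_eip` command in the log.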
+
+ Here's a script log of a kgdb session illustrating the above. Long
+ lines have been folded to improve readability, and the lines are
+ numbered for reference. Apart from that, it's a real-world error
+ trace taken during the development of the pcvt console driver.
+
+ 1:Script started on Fri Dec 30 23:15:22 1994
+ 2:uriah # cd /sys/compile/URIAH
+ 3:uriah # kgdb kernel /var/crash/vmcore.1
+ 4:Reading symbol data from /usr/src/sys/compile/URIAH/kernel...done.
+ 5:IdlePTD 1f3000
+ 6:panic: because you said to!
+ 7:current pcb at 1e3f70
+ 8:Reading in symbols for ../../i386/i386/machdep.c...done.
+ 9:(kgdb) where
+ 10:#0 boot (arghowto=256) (../../i386/i386/machdep.c line 767)
+ 11:#1 0xf0115159 in panic ()
+ 12:#2 0xf01955bd in diediedie () (../../i386/i386/machdep.c line 698)
+ 13:#3 0xf010185e in db_fncall ()
+ 14:#4 0xf0101586 in db_command (-266509132, -266509516, -267381073)
+ 15:#5 0xf0101711 in db_command_loop ()
+ 16:#6 0xf01040a0 in db_trap ()
+ 17:#7 0xf0192976 in kdb_trap (12, 0, -272630436, -266743723)
+ 18:#8 0xf019d2eb in trap_fatal (...)
+ 19:#9 0xf019ce60 in trap_pfault (...)
+ 20:#10 0xf019cb2f in trap (...)
+ 21:#11 0xf01932a1 in exception:calltrap ()
+ 22:#12 0xf0191503 in cnopen (...)
+ 23:#13 0xf0132c34 in spec_open ()
+ 24:#14 0xf012d014 in vn_open ()
+ 25:#15 0xf012a183 in open ()
+ 26:#16 0xf019d4eb in syscall (...)
+ 27:(kgdb) up 10
+ 28:Reading in symbols for ../../i386/i386/trap.c...done.
+ 29:#10 0xf019cb2f in trap (frame={tf_es = -260440048, tf_ds = 16, tf_\
+ 30:edi = 3072, tf_esi = -266445372, tf_ebp = -272630356, tf_isp = -27\
+ 31:2630396, tf_ebx = -266427884, tf_edx = 12, tf_ecx = -266427884, tf\
+ 32:_eax = 64772224, tf_trapno = 12, tf_err = -272695296, tf_eip = -26\
+ 33:6672343, tf_cs = -266469368, tf_eflags = 66066, tf_esp = 3072, tf_\
+ 34:ss = -266427884}) (../../i386/i386/trap.c line 283)
+ 35:283 (void) trap_pfault(&frame, FALSE);
+ 36:(kgdb) frame frame->tf_ebp frame->tf_eip
+ 37:Reading in symbols for ../../i386/isa/pcvt/pcvt_drv.c...done.
+ 38:#0 0xf01ae729 in pcopen (dev=3072, flag=3, mode=8192, p=(struct p\
+ 39:roc *) 0xf07c0c00) (../../i386/isa/pcvt/pcvt_drv.c line 403)
+ 40:403 return ((*linesw[tp->t_line].l_open)(dev, tp));
+ 41:(kgdb) list
+ 42:398
+ 43:399 tp->t_state |= TS_CARR_ON;
+ 44:400 tp->t_cflag |= CLOCAL; /* cannot be a modem (:-) */
+ 45:401
+ 46:402 #if PCVT_NETBSD || (PCVT_FREEBSD >= 200)
+ 47:403 return ((*linesw[tp->t_line].l_open)(dev, tp));
+ 48:404 #else
+ 49:405 return ((*linesw[tp->t_line].l_open)(dev, tp, flag));
+ 50:406 #endif /* PCVT_NETBSD || (PCVT_FREEBSD >= 200) */
+ 51:407 }
+ 52:(kgdb) print tp
+ 53:Reading in symbols for ../../i386/i386/cons.c...done.
+ 54:$1 = (struct tty *) 0x1bae
+ 55:(kgdb) print tp->t_line
+ 56:$2 = 1767990816
+ 57:(kgdb) up
+ 58:#1 0xf0191503 in cnopen (dev=0x00000000, flag=3, mode=8192, p=(st\
+ 59:ruct proc *) 0xf07c0c00) (../../i386/i386/cons.c line 126)
+ 60: return ((*cdevsw[major(dev)].d_open)(dev, flag, mode, p));
+ 61:(kgdb) up
+ 62:#2 0xf0132c34 in spec_open ()
+ 63:(kgdb) up
+ 64:#3 0xf012d014 in vn_open ()
+ 65:(kgdb) up
+ 66:#4 0xf012a183 in open ()
+ 67:(kgdb) up
+ 68:#5 0xf019d4eb in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi =\
+ 69: 2158592, tf_esi = 0, tf_ebp = -272638436, tf_isp = -272629788, tf\
+ 70:_ebx = 7086, tf_edx = 1, tf_ecx = 0, tf_eax = 5, tf_trapno = 582, \
+ 71:tf_err = 582, tf_eip = 75749, tf_cs = 31, tf_eflags = 582, tf_esp \
+ 72:= -272638456, tf_ss = 39}) (../../i386/i386/trap.c line 673)
+ 73:673 error = (*callp->sy_call)(p, args, rval);
+ 74:(kgdb) up
+ 75:Initial frame selected; you cannot go up.
+ 76:(kgdb) quit
+ 77:uriah # exit
+ 78:exit
+ 79:
+ 80:Script done on Fri Dec 30 23:18:04 1994
+
+ Comments to the above script:
+
+ line 6: this is a dump taken from within DDB (see below), hence the
+ panic comment ``because you said to!'', and a rather long
+ stack trace; the initial reason for going into DDB has been
+ a page fault trap though
+
+ line 20: the location of function ``trap()'' in the stack trace
+
+ line 36: force usage of a new stack frame, kgdb responds and displays
+ the source line where the trap happened; from looking at the
+ code, there's a high probability that either the pointer
+ access for ``tp'' was messed up, or the array access was
+ out of bounds
+
+ line 52: the pointer looks suspicious, but happens to be a valid
+ address...
+
+ line 56: ... but obviously points to garbage, so we have found our
+ error, sigh! [For those unfamiliar with that particular piece
+ of code: tp->t_line refers to the line discipline of the
+ console device here, which must be a rather small integer
+ number.]
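Editor's sketch, not part of the committed FAQ text: one quick way to judge whether a value such as the `tp->t_line` printed at line 56 is garbage is to look at its bytes in memory order. The helper name `as_ascii_bytes` is hypothetical, and little-endian byte order is assumed because the dump comes from an i386 machine.

```python
import struct


def as_ascii_bytes(value: int) -> str:
    """Show the four bytes of a 32-bit value in little-endian memory order."""
    raw = struct.pack("<I", value & 0xFFFFFFFF)
    # Non-printable bytes are shown as '.' so the result is always readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    t_line = 1767990816  # value printed at line 56 of the session log
    print(hex(t_line), repr(as_ascii_bytes(t_line)))  # prints 0x69616620 ' fai'
```

All four bytes are printable ASCII, which is consistent with the FAQ's conclusion that the pointer does not refer to a real tty structure. This is only a hint; it does not by itself show where the bad pointer came from.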
+
+
+
+*** Post-mortem analysis of a dump ***
+
+ What to do if a kernel dumped core but you didn't expect it, and it's
+ therefore not compiled using config -g?
+
+ Not everything is lost here. Don't panic. :-)
+
+ Of course, you still need to configure all your kernels with the
+ DODUMP option being set, otherwise you won't get a core dump at all.
+ (This is for safety reasons in the default kernels, to avoid them
+ trying to dump e.g. during system installation where there's no
+ FreeBSD partition at all and valuable data on the disk could be
+ destroyed.)
+
+ Go to your kernel compile directory, and edit the line containing
+ COPTFLAGS?=-O. Add the `-g' option there (but DON'T change anything
+ on the level of optimization). If you do already know roughly the
+ probable location of the failing piece of code (e.g., the `pcvt'
+ driver in the example above), remove all the object files for this
+ code. Rebuild the kernel. Due to the time stamp change on the
+ Makefile, there will be some other object files rebuilt, e.g.
+ trap.o. With a bit of luck, the added -g option won't change
+ anything for the generated code, so you'll finally get a new kernel
+ with similar code to the faulting one but some debugging symbols.
+ You should at least verify the old and new sizes with the `size'
+ command; if they mismatch, you probably need to give up here.
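Editor's sketch, not part of the committed FAQ text: the `size` comparison can be scripted. The functions below assume that `size` prints a header line followed by one line whose first three fields are the text, data and bss sizes; check that against the output on your system. The function names and the sample numbers in the usage part are invented for illustration.

```python
def parse_size_output(output: str):
    """Return (text, data, bss) taken from the last line of `size` output."""
    fields = output.strip().splitlines()[-1].split()
    text, data, bss = (int(field) for field in fields[:3])
    return text, data, bss


def same_layout(old_output: str, new_output: str) -> bool:
    """True if both kernels report identical text, data and bss sizes."""
    return parse_size_output(old_output) == parse_size_output(new_output)


if __name__ == "__main__":
    # Made-up sample output, not taken from a real kernel.
    old = "text\tdata\tbss\tdec\thex\n1000\t200\t300\t1500\t5dc\n"
    new = "text\tdata\tbss\tdec\thex\n1000\t200\t300\t1500\t5dc\n"
    print("sizes match" if same_layout(old, new) else "sizes differ")
```

Matching sizes are a necessary check, not proof that the generated code is identical.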
+
+ Go and examine the dump as described above. The debugging symbols
+ might be incomplete for some places (as can be seen in the stack trace
+ in the example above: some functions are displayed without line
+ numbers and argument lists). If you need more debugging symbols,
+ remove the appropriate object files and repeat the kgdb session until
+ you know enough.
+
+ All this is not guaranteed to work, but most likely it will work fine.
+
+
+
+*** On-line kernel debugging using DDB ***
+
+ While kgdb as an offline debugger provides a very high-level user
+ interface (e.g. it can look up source files, display C structures
+ etc.), there are some things it cannot do. The most important ones
+ are setting breakpoints and single-stepping kernel code.
+
+ If you need to do low-level debugging on your kernel, there's an
+ on-line debugger available called DDB. It allows you to set
+ breakpoints, single-step kernel functions, examine and change kernel
+ variables etc. However, it cannot access kernel source files, and it
+ only has access to the global and static symbols, not to the full
+ debug information (including type and line number information) that
+ kgdb uses.
+
+ To configure your kernel to include DDB, add the option lines
+
+ options DDB
+ options "SYMTAB_SPACE=XXXX"
+
+ to your config file, and rebuild. XXXX is the amount of space to be
+ reserved in a global array that DDB examines to find its symbols at
+ run time. It must be large enough to hold all symbols, but not so
+ large that it wastes space. 100000 bytes is a good first guess for
+ a ``normal'' kernel. The link stage will tell you about the usage
+ of the symtab space; you'll see something like:
+
+ dbsym: need 98765; avail 100000
+
+ If the amount of allocated space has been too small, the above
+ message is accompanied by the following error message:
+
+ not enough room in db_symtab array
+
+ and the link stage fails. You then need to increase the number,
+ reconfig and recompile. If your config(8) has been compiled to not
+ remove the old compile directory before continuing (this is a
+ compile-time option [CONFIG_DONT_CLOBBER]), you need to remove
+ db_aout.o prior to recompilation; this is the only file being
+ affected by the SYMTAB_SPACE option.
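Editor's sketch, not part of the committed FAQ text: when the link stage fails, the `dbsym` message already contains the number you need. The helper below parses a message of the form shown above and proposes a new SYMTAB_SPACE value. The 10 percent headroom and the rounding step of 1000 are arbitrary assumptions, and the function name is hypothetical.

```python
import re


def suggest_symtab_space(message: str, headroom_percent: int = 10, step: int = 1000) -> int:
    """Propose a SYMTAB_SPACE value from a 'dbsym: need N; avail M' message."""
    match = re.search(r"need\s+(\d+);\s*avail\s+(\d+)", message)
    if match is None:
        raise ValueError("not a dbsym usage message: %r" % message)
    need = int(match.group(1))
    padded = need + need * headroom_percent // 100
    # Round up to the next multiple of `step` to get a tidy config value.
    return -(-padded // step) * step


if __name__ == "__main__":
    print(suggest_symtab_space("dbsym: need 98765; avail 100000"))  # prints 109000
```

The headroom leaves some room for symbols added by later changes, so the value does not have to be adjusted after every small edit.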
+
+
+ Once your DDB kernel is running, there are several ways to enter
+ DDB. The first (and earliest) way is to set the boot flag `-d'
+ (right at the boot prompt). The kernel will start up in debug mode
+ and enter DDB prior to any device probing. Hence you are able to
+ even debug the device probe/attach functions.
+
+ The second scenario is a hot-key on the keyboard, usually Ctrl-Alt-
+ ESC. (For syscons, this can be remapped, and some of the
+ distributed maps do this, so watch out.) There are patches
+ available for a COMCONSOLE kernel, ask me (joerg@FreeBSD.org) for
+ them.
+
+ The third way is that any panic condition will branch to DDB if the
+ kernel is configured to use it. (Thus it is not wise to configure a
+ kernel with DDB for a machine running unattended.)
+
+
+ The DDB commands roughly resemble some gdb commands. The first you
+ probably need is to set a breakpoint:
+
+ b function-name
+ b address
+
+ Numbers are taken as hexadecimal by default, but to make them distinct
+ from symbol names, hex numbers starting with the letters `a' - `f'
+ need to be preceded with `0x' (for other numbers, this is optional).
+ Simple expressions are allowed, e.g. ``function-name + 0x103''.
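Editor's sketch, not part of the committed FAQ text: the number rule above is easy to get wrong when addresses are pasted from other tools. The helper below (a hypothetical name) formats an integer the way the paragraph describes: plain hexadecimal digits, with a `0x' prefix only when the first digit is a letter and could otherwise be read as a symbol name.

```python
def ddb_number(value: int) -> str:
    """Format a non-negative integer as a number DDB will not mistake for a symbol."""
    if value < 0:
        raise ValueError("expected a non-negative value")
    digits = format(value, "x")
    # Hex numbers starting with a-f need the prefix; for others it is optional.
    return "0x" + digits if digits[0] in "abcdef" else digits


if __name__ == "__main__":
    print(ddb_number(0xF0133FE0))  # prints 0xf0133fe0
    print(ddb_number(0x103))       # prints 103
```

Always adding the prefix would also be accepted according to the rule above; the sketch only shows where it is required.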
+
+ To continue the operation of an interrupted kernel, simply type
+
+ c
+
+ To get a stack trace, use
+
+ trace
+
+ Note that when entering DDB via a hot-key, the kernel is currently
+ servicing an interrupt, so the stack trace might not be of much use
+ to you.
+
+ If you want to remove a breakpoint, use
+
+ del
+ del address-expression
+
+ The first form will be accepted immediately after a breakpoint hit,
+ and deletes the current breakpoint. The second form can remove any
+ breakpoint, but you need to specify the exact address, as it can be
+ obtained from
+
+ show b
+
+ To single-step the kernel, try
+
+ s
+
+ This will step into functions, but you can make DDB trace them until
+ the matching return statement is reached by
+
+ n
+
+ NOTE: this is different from gdb's ``next'' statement; it's like
+ gdb's ``finish''.
+
+ To examine data from memory, use e.g.
+
+ x/wx 0xf0133fe0,40
+ x/hd db_symtab_space
+ x/bc termbuf,10
+ x/s stringbuf
+
+ for word/halfword/byte access, and hexadecimal/decimal/character/
+ string display. The number after the comma is the object count.
+ To display the next 0x10 items, simply use
+
+ x ,10
+
+ Similarly, use
+
+ x/ia foofunc,10
+
+ to disassemble the first 0x10 instructions of foofunc, and display
+ them along with their offset from the beginning of foofunc.
+
+ To modify the memory, use the write command:
+
+ w/b termbuf 0xa 0xb 0
+ w/w 0xf0010030 0 0
+
+ The command modifier (b/h/w) specifies the size of the data to be
+ written, the first following expression is the address to write to,
+ the remainder is interpreted as data to write to successive memory
+ locations.
+
+ If you need to know the current registers, use
+
+ show reg
+
+ Alternatively, you can display a single register value by e.g.
+
+ print $eax
+
+ and modify it by
+
+ set $eax new-value
+
+
+ Should you need to call some kernel functions from DDB, simply
+ say
+
+ call func(arg1, arg2, ...)
+
+ The return value will be printed.
+
+ For a ps-style summary of all running processes, use
+
+ ps
+
+
+
+ Well, you've now examined why your kernel failed, and you wish to
+ reboot. Remember that, depending on the severity of previous
+ malfunctioning, not all parts of the kernel might still be working
+ as expected. Perform one of the following actions to shut down and
+ reboot your system:
+
+
+ call diediedie()
+
+ (must usually be followed by another ``c[ontinue]'' statement),
+ will cause your kernel to dump core and reboot, so you can later
+ analyze the core on a higher level with kgdb.
+
+
+ call boot(0)
+
+ might be a good way to cleanly shut down the running system, sync()
+ all disks, and finally reboot. As long as the disk and file system
+ interfaces of the kernel are not damaged, this might be a good way
+ for an almost clean shutdown.
+
+
+ call cpu_reset()
+
+ ...is the final way out of the disaster, almost similar to hitting
+ the Big Red Button.
+
+
+
+*** What to do if I want to debug a console driver? ***
+
+ Since you need a console driver to run DDB on, things are more
+ complicated if the console driver itself is flaky. You might
+ remember the ``options COMCONSOLE'' line, and hook up a standard
+ terminal onto your first serial port. DDB works on any configured
+ console driver, so of course it also works on a COMCONSOLE.
+
+
+
+ Paul Richards, FreeBSD core team member. (paul@FreeBSD.org)
+ J"org Wunsch (joerg@FreeBSD.org)