Published: 06.08.2012 | Level: specialist | Access: paid
Lecture 33:

Custom kernels

Finding out what really happened

In general, you start analyzing a panic dump in the stack frame that called panic. With a fatal trap like the one we have here, however, the most important stack frame is the one below trap, in this case frame 5: that's where things went wrong. Select it with the frame command (abbreviated f) and list the code with list (or l):

(kgdb) f 5
#5 0xc01c434b in malloc (size=1024, type=0xc03c3c60, flags=0)
    at ../../kern/kern_malloc.c:233
233             va = kbp->kb_next;
(kgdb) l
228                }
229                freep->next = savedlist;
230                if(kbp->kb_last == NULL)
231                        kbp->kb_last = (caddr_t)freep;
232                }
233             va = kbp->kb_next;
234             kbp->kb_next = ((struct freelist *)va)->next;
235    #ifdef INVARIANTS
236             freep = (struct freelist *)va;
237             savedtype = (const char *) freep->type->ks_shortdesc;
(kgdb)

You might want to look at the local (automatic) variables. Use info local, which you can abbreviate to i loc:

(kgdb) i loc
type = (struct malloc_type *) 0xc03c3c60
kbp = (struct kmembuckets *) 0xc03ebc68
kup = (struct kmemusage *) 0x0
freep = (struct freelist *) 0x0
indx = 10
npg = -1071714292
allocsize = -1069794208
s = 6864992
va = 0xffffffff <Address 0xffffffff out of bounds>
cp = 0x0
savedlist = 0x0
ksp = (struct malloc_type *) 0xffffffff
(kgdb)

The line where the problem occurs is 233:

233             va = kbp->kb_next;

Look at the structure that kbp points to:

(kgdb) p *kbp
$2 = {
  kb_next = 0xffffffff <Address 0xffffffff out of bounds>,
  kb_last = 0xc1a31000 "",
  kb_calls = 83299,
  kb_total = 1164,
  kb_elmpercl = 4,
  kb_totalfree = 178,
  kb_highwat = 20,
  kb_couldfree = 3812
}

The problem here is that the pointer kb_next is set to 0xffffffff. It should contain a valid address, but as gdb observes, this one isn't valid.
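Lines 233 and 234 take the first block off a bucket's free list. One simplified reading of the listing is as follows, and it is an interpretation rather than the FreeBSD source. kb_next holds the address of the first free block, and each free block stores the address of the next one in its own first bytes. The C sketch below models that pop in user space. The `freeblk` and `bucket` types and the `push` and `pop` functions are hypothetical; only the name `kb_next` comes from the dump.

```c
#include <stddef.h>

/* Hypothetical, simplified free block: while a block is free, its first
 * bytes hold the address of the next free block in the same bucket. */
struct freeblk {
    struct freeblk *next;
};

/* Hypothetical bucket header; only the name kb_next is taken from the dump. */
struct bucket {
    struct freeblk *kb_next;   /* first free block, NULL when empty */
};

/* Put a block back at the head of the bucket's free list. */
static void push(struct bucket *kbp, void *p)
{
    struct freeblk *blk = p;
    blk->next = kbp->kb_next;
    kbp->kb_next = blk;
}

/* Take the first free block, mirroring lines 233-234:
 *     va = kbp->kb_next;
 *     kbp->kb_next = ((struct freelist *)va)->next;
 * The second step reads memory through va, so if kb_next holds a garbage
 * value such as 0xffffffff, that read is where the fault happens. */
static void *pop(struct bucket *kbp)
{
    struct freeblk *va = kbp->kb_next;
    if (va == NULL)
        return NULL;           /* the real allocator would refill the bucket instead */
    kbp->kb_next = va->next;
    return va;
}
```

Because `pop` reads the next link through `va`, a bad `kb_next` goes unnoticed until that read. The read then touches an address that, as gdb reports, is out of bounds. In this model, a bucket filled with `push(&b, x)` and then `push(&b, y)` hands out `y` first, then `x`.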

So far we have found that the crash is in malloc, and that it's caused by an invalid pointer in one of malloc's internal data structures. malloc is called many times a second on every running system, so it's unlikely that the bug is in malloc itself. The most likely cause is that some function that used memory allocated by malloc wrote beyond the end of its allocation and overwrote malloc's data structures.
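To make that failure mode concrete, here is a user-space sketch under simplifying assumptions. A hypothetical `arena` holds two adjacent 1024-byte blocks (the size shown in frame 5), with block 0 in use and block 1 free. The free block keeps the address of its successor in its first bytes, stored here as a `uintptr_t` so that a damaged value can be inspected rather than followed.

The names `arena`, `get_link`, `set_link`, `overrun` and `alloc_block` are illustrative only, `kb_next` merely reuses the name from the dump, and the real FreeBSD malloc differs in detail. The point is the delay between cause and crash:

1. The overrun silently corrupts the link in the free block.
2. The next allocation copies that garbage into the list head.
3. Only the allocation after that dereferences it and faults.

```c
#include <stdint.h>
#include <string.h>

#define BLKSIZE 1024  /* matches size=1024 in frame 5 */

/* Hypothetical arena: block 0 (in use) is immediately followed by block 1 (free). */
static unsigned char arena[2 * BLKSIZE];

/* Simplified list head: address of the first free block, 0 when empty. */
static uintptr_t kb_next;

/* Read and write the link stored in the first bytes of a free block.
 * memcpy sidesteps alignment and aliasing rules for the byte array. */
static uintptr_t get_link(const unsigned char *blk)
{
    uintptr_t link;
    memcpy(&link, blk, sizeof link);
    return link;
}

static void set_link(unsigned char *blk, uintptr_t link)
{
    memcpy(blk, &link, sizeof link);
}

/* A buggy user of block 0: writes BLKSIZE + extra bytes of 0xff, so the
 * last `extra` bytes land in block 1. extra must not exceed BLKSIZE. */
static void overrun(size_t extra)
{
    memset(arena, 0xff, BLKSIZE + extra);
}

/* Take the first free block, in the spirit of lines 233-234. It follows
 * kb_next without checking it, so calling it while kb_next is garbage
 * would fault, just as the kernel did. */
static uintptr_t alloc_block(void)
{
    uintptr_t va = kb_next;
    kb_next = get_link((const unsigned char *)(void *)va);
    return va;
}
```

For example, put block 1 on the list with `set_link(arena + BLKSIZE, 0)` and point `kb_next` at it. Calling `overrun(sizeof(uintptr_t))` then sets its link to all ones. `alloc_block()` still returns block 1 normally, but it leaves `kb_next` equal to `UINTPTR_MAX`, which is `0xffffffff` on a 32-bit system like the one in this dump. A second call would dereference that value.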

What do we do now? To quote fortune:

The seven eyes of Ningauble the Wizard floated back to his hood as he reported to Fafhrd: "I have seen much, yet cannot explain all. The Gray Mouser is exactly twenty-five feet below the deepest cellar in the palace of Gilpkerio Kistomerces. Even though twenty-four parts in twenty-five of him are dead, he is alive.

"Now about Lankhmar. She's been invaded, her walls breached everywhere and desperate fighting is going on in the streets, by a fierce host which out-numbers Lankhmar's inhabitants by fifty to one -- and equipped with all modern weapons. Yet you can save the city."

"How?" demanded Fafhrd.

Ningauble shrugged. "You're a hero. You should know."

-- Fritz Leiber, from "The Swords of Lankhmar"

From here on, you're on your own. If you get this far, the FreeBSD-hackers mailing list may be interested in giving suggestions.
