Discussion:
gdbstub initial code, v12
Oleg Nesterov
2010-09-30 18:23:20 UTC
Permalink
Changes: fully implement g/G/p/P. Well, allmost.

I didn't expect this is so hairy. And, unfortunately, all
this code is x86 specific. I tried to avoid this very much,
but failed.

Also, it doesn't really support FPU. gdb has its own idea
about FPU regs and uses struct i387_fxsave. I failed to
understand i387_fxsave_to_cache/i387_cache_to_fxsave,
probably I'll simply copy-and-paste this code later.
Right now $G doesn't change FPU registers. $P does, but
mostly incorrectly.

Why gdb doesn't use $p after $P? It always fetches all
registers via $G, I don't even know how can I test $p.
Perhaps because gdbserver doesn't implement p/P.

Btw, I didn't veryfy this, but gdbserver looks obviously
wrong wrt 64/32 bit support. "struct reg *reg_defs" is global,
it is initalized by ->arch_setup() when we attach the first
inferior (or a new one after detach). But every inferior
needs its own reg_defs, I think. Otherwize even the simplest
x86_fill_gregset/x86_store_gregset can be confused.

Next: watchpoints.

Oleg.
Oleg Nesterov
2010-10-04 18:10:53 UTC
Permalink
Post by Oleg Nesterov
Right now $G doesn't change FPU registers.
It turns out, $G can't even set the REGSET_GENERAL reg correctly.
I noticed this only because the tracee sometimes dies during the
single-stepping inside the dynamic linker. Indeed, $G passes the
wrong fs_index==0 to genregs_set().
Post by Oleg Nesterov
Next: watchpoints.
While trying to understand what does this mean, I hit another bug
in ugdb. A multithreaded tracee can "hang" if gdb simulates
watchpoints with single-steps + mem-fetch/compare. Still can't
understand why this happens, this should be resolved. This never
happens if the tracee is single-threaded.

Oleg.
Oleg Nesterov
2010-10-05 17:27:29 UTC
Permalink
Post by Oleg Nesterov
While trying to understand what does this mean, I hit another bug
in ugdb.
No, /usr/bin/gdb is buggy.
Post by Oleg Nesterov
A multithreaded tracee can "hang" if gdb simulates
watchpoints with single-steps + mem-fetch/compare.
It doesn't, but gdb "forgets" about the pending "Stop:" notification,
see below.
Post by Oleg Nesterov
This never
happens if the tracee is single-threaded.
This is clear. And this doesn't happen with the real gdbbserver.
Probably timing issues, with ugdb the tracee can report itself
as stopped much faster (not because ugdb is faster, but because
it sends the notification before the tracee actually stops).


Consider this simplified artifitial test-case:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

struct {
char x[128];
} VAR;

void *tfunc(void *arg)
{
for (;;)
;
}

int main(void)
{
pthread_t thr;

printf("pid: %d\n", getpid());

pthread_create(&thr, NULL, tfunc, NULL);

tfunc(NULL);

return 0;
}

It just creates 2 threads, each threads spins in the trivial endless loop.
VAR is the dummy variable for "watch", it is never changed. It should be big
enough to ensure can_use_hardware_watchpoint->default_region_ok_for_hw_watchpoint()
returns 0 and gdb falls back to simulating.

(gdb) set target-async on
(gdb) set non-stop
(gdb) target extended-remote :2000
(gdb) attach 1404
(gdb) watch VAR
(gdb) c -a

After that, sooner or later the tracee "hangs". gdb sleeps in poll(), both
traced threads sleep after reporting SIGTRAP (single-step).

The more or less "typical" transcript is:

[... snip ...]
=> s
<= OK
<= %Stop:T05thread:p57c.57d;
=> vStopped
<= T05thread:p57c.57c;
=> vStopped
<= OK
=> g
<= 50994242000000000000000000000000761f0a16677f000000000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> g
<= 50994242000000000000000000000000761f0a16677f000000000000000000...
=> s
<= OK
<= %Stop:T05thread:p57c.57d;
=> Hgp57c.57c
<= OK
=> g
<= 000000000000000030064000000000008162e115677f000000000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> Hcp57c.57c
<= OK
=> s#73$vStopped
<= OK#9a$OK
<= %Stop:T05thread:p57c.57c;
=> Hgp57c.57d#ec$g
<= OK#9a$50994242000000000000000000000000761f0a16677f000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> g
<= 50994242000000000000000000000000761f0a16677f000000000000000000...
=> Hcp57c.57d
<= OK
=> s#73$vStopped
<= OK#9a$OK
<= %Stop:T05thread:p57c.57d;
=> Hgp57c.57c
<= OK
=> g
<= 000000000000000030064000000000008162e115677f000000000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> Hcp57c.57c
<= OK
=> s#73$vStopped
<= OK#9a$OK

Everything is fine so far. And this is how it works with the real gdbserver:
gdb always issues vStopped until it gets the final "OK". Like it should,
I guess.

So. At this point gdb knows that 57d is stopped, and it sends '$s' to 57c.

<= %Stop:T05thread:p57c.57c;

57c reports the next stop, but gdb temporary "ignores" this notification
and plays with 57d. This is fine.

=> Hgp57c.57d#ec$g
<= OK#9a$50994242000000000000000000000000761f0a16677f000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> g
<= 50994242000000000000000000000000761f0a16677f000000000000000000...
=> Hcp57c.57d
<= OK
=> s#73$vStopped

Finally it does vStopped. But, by this time 57d is stopped again (note
that gdb sends '$s' before vStopped).

<= OK#9a$T05thread:p57c.57d;

ugdb reports 57d as stopped.

_If_ gdb sent vStopped right now, he would get "OK" in return. But it
doesn't. Again, this is not the bug yet.

=> Hgp57c.57c
<= OK
=> g
<= 000000000000000030064000000000008162e115677f000000000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> Hcp57c.57c
<= OK
=> s
<= OK

Now, 57c stops. But it can't report this stop. We are "inside" the
"vStopped" sequence, ugdb waits for gdb which should complete it
by sending the next vStopped packet which should find that 57c is
stopped.
If the stub receives a `vStopped' packet and there are no additional
stop events to report, the stub shall return an `OK' response. At this
point, if further stop events occur, the stub shall send a new stop
reply notification,

IOW, 57c should not send the notification and it doesn't. But this confuses
gdb.

In short: gdb forgets that the last 'vStopped' didn't return 'OK' and
thus we probably have more stopped tracees but waits for the next
notification. ugdb doesn't send the notification because it shouldn't
do this until gdb sends the "final" vStopped according to the doc
above.

Another similar transcript:

[... snip ...]
=> Hcp589.589
<= OK
=> s
<= OK#9a%Stop:T05thread:p589.589;
=> vStopped
<= OK
=> g
<= 0000000000000000300640000000000081621fd01f7f000000000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> s
<= OK#9a%Stop:T05thread:p589.589;
=> vStopped
<= OK
=> g
<= 0000000000000000300640000000000081621fd01f7f000000000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> s#73$vStopped
<= OK#9a$T05thread:p589.589;
=> Hgp589.58a#c1$g
<= OK#9a$50990141000000000000000000000000761f48d01f7f000000000000...
=> m600a60,80
<= 00000000000000000000000000000000000000000000000000000000000000...
=> g
<= 50990141000000000000000000000000761f48d01f7f000000000000000000...
=> Hcp589.58a
<= OK
=> s
<= OK

Almost the same.


Once again, whatever I did I failed to reproduce this hang with the
real gdbserver.

So I hacked gdb to send the similar sequence of packets instead, and
result is the same:

getpkt ("Hcp589.589"); [no ack sent]
putpkt ("$OK#9a"); [noack mode]
getpkt ("s"); [no ack sent]
putpkt ("$OK#9a"); [noack mode]
pending stop replies: 1
putpkt ("%Stop:T0506:f089f2deff7f0* ;07:f089f2deff7f0* ;10:d40540*';thread:p589.589;core:1;#94"); [notif]

getpkt ("Hcp589.58a"); [no ack sent]
putpkt ("$OK#9a"); [noack mode]

getpkt ("s"); [no ack sent]
putpkt ("$OK#9a"); [noack mode]
pending stop replies: 2

getpkt ("vStopped"); [no ack sent]
vStopped: acking LWP 1417.1417
putpkt ("$T0506:309101410*"00;07:309101410*"00;10:d40540*';thread:p589.58a;core:1;#24"); [noack mode]
getpkt ("Hcp589.589"); [no ack sent]
putpkt ("$OK#9a"); [noack mode]

getpkt ("s"); [no ack sent]
putpkt ("$OK#9a"); [noack mode]
pending stop replies: 2

Like ugdb, gdbserver does NOT send the new notification, but waits for gdb
which should continue and complete the vStopped sequence.

So, I strongly believe gdb is buggy and should be fixed.

Oleg.
Pedro Alves
2010-10-05 18:30:38 UTC
Permalink
Post by Oleg Nesterov
[... snip ...]
=> s
This is already wrong.

"The stub must support @samp{vCont} if it reports support for
multiprocess extensions (@pxref{multiprocess extensions})."

The stub must also support vCont for non-stop, though I'll give you
that it doesn't appear to be mentioned in the manual, and that
gdb could be more noisy about this.

Look at remote.c:remote_resume, and you'll see that gdb does not
wait for the "OK" after 'c'/'s'/'S'/'C' in non-stop mode.
You're getting "lucky", because when gdb processes the OK you're
sending it, gdb is interpreting that as an 'O' stop reply, but
since 'K' is an uneven number of characters, remote_console_output
happens to just bail out silently.

Now, given this, I won't be surprised if you're seeing races
with ->s, <-OK, ->vCont sequences, as GDB may well be thinking
that the "OK" is a reply to the vCont.
Post by Oleg Nesterov
So, I strongly believe gdb is buggy and should be fixed.
Fix your stub to implement vCont;s/c(/S/C).
--
Pedro Alves
Pedro Alves
2010-10-05 18:35:11 UTC
Permalink
Post by Pedro Alves
Now, given this, I won't be surprised if you're seeing races
with ->s, <-OK, ->vCont sequences, as GDB may well be thinking
that the "OK" is a reply to the vCont.
I meant ->s, <-OK, ->vStopped sequences.
--
Pedro Alves
Oleg Nesterov
2010-10-06 17:19:53 UTC
Permalink
On 10/05, Pedro Alves wrote:
(reordered)
Post by Pedro Alves
Post by Oleg Nesterov
So, I strongly believe gdb is buggy and should be fixed.
Fix your stub to implement vCont;s/c(/S/C).
First of all, I confirm that when I added the (incomplete right
now) support for vCont;s the problem goes away, gdb never forgets
to send the necessary vStopped to consume all stop-reply packets.

Thanks Pedro.
Post by Pedro Alves
Post by Oleg Nesterov
[... snip ...]
=> s
This is already wrong.
Cough. Previously I was told here (on ***@sourceware.org) that
Hc + s/c is enough and I shouldn't worry about vCont;s/c ;)

Currently ugdb only supports vCont;t because otherwise there is
obviously no way to stop all threads.
Post by Pedro Alves
The stub must also support vCont for non-stop, though I'll give you
that it doesn't appear to be mentioned in the manual,
Yes, the manual doesn't explain this. Quite contrary, the decsription
of 'vCont?' definitely looks as if the stub is not obliged to implement
all vCont commands.

And, if the stub must support vCont for non-stop, then why gdb
doesn't complain after 'vCont?' but falls back to '$s' ?
Post by Pedro Alves
Look at remote.c:remote_resume, and you'll see that gdb does not
wait for the "OK" after 'c'/'s'/'S'/'C' in non-stop mode.
Then gdbserver should be fixed? It does send "OK" in response to '$s',
that is why ugdb does this.

And what should be replied back to '$s', nothing? Very strange.
But seems to work... And yes, this explains the hang, gdb thinks
that this "OK" is connected to vStopped.

Again, the documentation is very confusing. Looking at
remote_resume()->remote_vcont_resume()->getpkt() I think that
vCont;s needs "OK". Looking at "D.3 Stop Reply Packets" in
gdb.info I do not see any difference between `s' and `vCont'.

So. I still belive that something is wrong with gdb/gdbserever
but I don't care.


In any case ugdb should fully support vCont, hopefully I'll finish
this tomorrow. Could you answer a couple of questions?

1. Say, $vCont;s or $vCont;s:p-1.-1

I assume, this should ignore the running threads, correct?
IOW, iiuc this 's' applies to all threads which we already
reported as stopped.

2. Say, $vCont;c:pPID.TID;s:p-1.-1

Can I assume that gdb can never send this request as

$vCont;s:p-1.-1;c:pPID.TID ?

If yes, then the implementation will be much simpler, I can
add something like gencounters to ugdb_thread/process. Otherwise
this needs more complications to figure out what should be done
with each tracee.

Thanks,

Oleg.
Pedro Alves
2010-10-06 18:32:31 UTC
Permalink
Post by Oleg Nesterov
Hc + s/c is enough and I shouldn't worry about vCont;s/c ;)
vCont was introduced because with only 'Hc', 's' and 'c', there's
no way to distinguish "step a thread and resume all others" vs "step
a thread and leave others stopped" (scheduler-locking, in gdb lingo).
This was added way before non-stop was added, back in 2002/2003,
I believe. vCont;t was added much later, when non-stop was
introduced.
Post by Oleg Nesterov
Post by Pedro Alves
The stub must also support vCont for non-stop, though I'll give you
that it doesn't appear to be mentioned in the manual,
Yes, the manual doesn't explain this. Quite contrary, the decsription
of 'vCont?' definitely looks as if the stub is not obliged to implement
all vCont commands.
And, if the stub must support vCont for non-stop, then why gdb
doesn't complain after 'vCont?' but falls back to '$s' ?
Because nobody took the trouble to made it complain. As I said,
I'll give you that gdb could be noisier about that...
Post by Oleg Nesterov
Post by Pedro Alves
Look at remote.c:remote_resume, and you'll see that gdb does not
wait for the "OK" after 'c'/'s'/'S'/'C' in non-stop mode.
Then gdbserver should be fixed? It does send "OK" in response to '$s',
that is why ugdb does this.
Think of it as "undefined behavior". It could be made to
error out instead, if somebody cared. Not sure how you got gdb to
send gdbserver 's' or 'c' (well, unless you used
"set remote verbose-resume-packet off", or started gdbserver
with --disable-packet=vCont).
Post by Oleg Nesterov
Again, the documentation is very confusing. Looking at
remote_resume()->remote_vcont_resume()->getpkt() I think that
vCont;s needs "OK". Looking at "D.3 Stop Reply Packets" in
gdb.info I do not see any difference between `s' and `vCont'.
Yeah. It's the problem that those that are very familiar with
the thing get to write docs for it, so may have missed spelling
out things that were obvious to them.

It goes without saying, but ... patches to improve the docs are
always welcome.
Post by Oleg Nesterov
In any case ugdb should fully support vCont, hopefully I'll finish
this tomorrow. Could you answer a couple of questions?
1. Say, $vCont;s or $vCont;s:p-1.-1
I assume, this should ignore the running threads, correct?
IOW, iiuc this 's' applies to all threads which we already
reported as stopped.
Yes.
Post by Oleg Nesterov
2. Say, $vCont;c:pPID.TID;s:p-1.-1
This would be effectively

$vCont;c:pPID.TID;s
Post by Oleg Nesterov
Can I assume that gdb can never send this request as
$vCont;s:p-1.-1;c:pPID.TID ?
If yes, then the implementation will be much simpler, I can
add something like gencounters to ugdb_thread/process. Otherwise
this needs more complications to figure out what should be done
with each tracee.
All GDB currently sends is in gdb/remote.c:remote_vcont_resume.
All vCont packets GDB sends today have the actions ordered
from more specific to less specific --- the most complicated
is something like "vCont;s:pPID.TID;c" (step PID.TID, continue
all others). It will probably make sense to maintain that
ordering, if we ever make a single vCont contain more
actions.
--
Pedro Alves
Oleg Nesterov
2010-10-07 22:59:22 UTC
Permalink
Post by Pedro Alves
Post by Oleg Nesterov
Hc + s/c is enough and I shouldn't worry about vCont;s/c ;)
vCont was introduced because with only 'Hc', 's' and 'c', there's
no way to distinguish "step a thread and resume all others" vs "step
a thread and leave others stopped" (scheduler-locking, in gdb lingo).
Hmm. Not sure I understand this... gdb could issue a series of Hc+c
after s to do "step a thread and resume all others".

But this doesn't matter. Obviously vCont is better and more handy.
Post by Pedro Alves
Think of it as "undefined behavior". It could be made to
error out instead, if somebody cared. Not sure how you got gdb to
send gdbserver 's' or 'c'
I did $ gdb `which gdb` `pidof gdb` to change its behaviour ;)
Post by Pedro Alves
(well, unless you used
"set remote verbose-resume-packet off", or started gdbserver
with --disable-packet=vCont).
Ah, I'd wish I knew this before. Damn, I recall I saw these
disable_packet_xxx code in gdbserver sources, but forgot.
Post by Pedro Alves
Post by Oleg Nesterov
1. Say, $vCont;s or $vCont;s:p-1.-1
I assume, this should ignore the running threads, correct?
IOW, iiuc this 's' applies to all threads which we already
reported as stopped.
Yes.
Post by Oleg Nesterov
2. Say, $vCont;c:pPID.TID;s:p-1.-1
This would be effectively
$vCont;c:pPID.TID;s
Post by Oleg Nesterov
Can I assume that gdb can never send this request as
$vCont;s:p-1.-1;c:pPID.TID ?
If yes, then the implementation will be much simpler, I can
add something like gencounters to ugdb_thread/process. Otherwise
this needs more complications to figure out what should be done
with each tracee.
All GDB currently sends is in gdb/remote.c:remote_vcont_resume.
All vCont packets GDB sends today have the actions ordered
from more specific to less specific
Great.

Pedro, thanks a lot.

Oleg.
Pedro Alves
2010-10-08 00:03:43 UTC
Permalink
Post by Oleg Nesterov
Hmm. Not sure I understand this... gdb could issue a series of Hc+c
after s to do "step a thread and resume all others".
But this doesn't matter. Obviously vCont is better and more handy.
Not in all-stop mode. GDB can not send any other command to the
stub until the stub returns a stop reply to the first 's'. Remember,
there's no vStopped+notifications in the all-stop mode protocol.
--
Pedro Alves
Oleg Nesterov
2010-10-08 00:18:58 UTC
Permalink
Post by Pedro Alves
Post by Oleg Nesterov
Hmm. Not sure I understand this... gdb could issue a series of Hc+c
after s to do "step a thread and resume all others".
But this doesn't matter. Obviously vCont is better and more handy.
Not in all-stop mode.
Ah indeed. I missed you mentioned "This was added way before non-stop"
later.

Oleg.

Loading...