Discussion:
Q: multithreaded tracees && clone/exit
Oleg Nesterov
2010-07-16 20:51:47 UTC
Hello.

In case this matters, I used gdb-7.1 for testing.

Q1: if gdbstub reported that the tracee has exited (say, we sent
'$X09#..' to gdb), can gdbstub assume it can forget about this thread?
I mean, can it assume that gdb won't send something like 'D;EXITED_PID'?
Or should it keep the info connected to the exited thread until the
explicit detach request? Or maybe we should keep some info just to
report this exit code again to the next '$?#3f' request?

Looking at gdb sources/behaviour, I think the answer is yes, it can
forget. But I'd like to have the confirmation.
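
For readers following the packet strings quoted in this thread: a remote-protocol packet is '$', the payload, '#', and a two-digit hex checksum equal to the sum of the payload bytes modulo 256. The following is only a minimal sketch of that framing; rsp_frame is a made-up helper name, not a function from gdb or gdbserver, and escaping of special bytes inside the payload is deliberately left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame a payload as "$<payload>#<checksum>". The checksum is the
 * sum of the payload bytes modulo 256, printed as two lowercase hex
 * digits. rsp_frame is a hypothetical helper name.
 * Returns the frame length, or -1 if out is too small.
 */
int rsp_frame(const char *payload, char *out, size_t outsz)
{
    size_t len = strlen(payload);
    unsigned char sum = 0;

    if (outsz < len + 5)        /* '$' + payload + '#' + 2 hex + NUL */
        return -1;

    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)payload[i];   /* wraps modulo 256 */

    return snprintf(out, outsz, "$%s#%02x", payload, (unsigned)sum);
}
```

Framing the payload "?" this way gives "$?#3f", the request quoted above.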


And. I'd like to let you know that gdb is buggy ;) But it is not very
easy to explain the bug because I don't know the terminology. Let's
consider this particular example:

(gdb) target extended-remote whatever
(gdb) attach PID
...
(gdb) c

Now gdb sleeps in sys_read/sys_recvfrom.

The user presses ^C, gdb sends 3 and waits for reply. Suppose that
gdbstub doesn't reply immediately.

The user presses ^C again and acks the "Give up (and stop debugging it)?"
question.

gdb does remote_close()->discard_all_inferiors()->...->exit_inferior_1().
Surprisingly, exit_inferior_1() does not remove this thread from
inferior_list but clears inf->pid.

This means that the subsequent find_inferior_pid() fails and returns NULL,
and gdb segfaults if the user does, say,

(gdb) detach

after that.

I noticed this bug when I found another problem, gdb+gdbserver doesn't
work correctly if the main thread exits. But let's forget about this
problem for now.


The main question is, I do not understand how gdbstub should handle the
multithreaded targets.

Trivial testcase:

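#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Sub-thread: block until input arrives on stdin, then exit. */
void *tfunc(void *arg)
{
    getchar();
    return NULL;
}

int main(int argc, const char *argv[])
{
    pthread_t thr;

    printf("PID=%d\n", getpid());

    pthread_create(&thr, NULL, tfunc, NULL);

    /* Main thread: sleep forever so the process outlives the sub-thread. */
    for (;;)
        pause();

    return 0;
}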

Gdbserver:

gdbserver --multi :2000

gdb:

(gdb) file test1
(gdb) target extended-remote :2000
(gdb) attach 16927
Attached to process 16927
...
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb)

OK. gdbserver ptraces both threads. But afaics gdb doesn't know this
program is multithreaded, and strace shows that gdb doesn't send
a qfThreadInfo request.

Q2: Shouldn't gdbstub let debugger know about sub-threads somehow?

Let's continue:

(gdb) c
Continuing.

gdbserver resumes both threads. Press enter, the sub-thread exits.

And nothing happens! gdbserver sends nothing to gdb, it just reaps
the tracee silently:

rt_sigsuspend([]) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigreturn(0x11) = -1 EINTR (Interrupted system call)
wait4(-1, 0x7fff2c719fbc, WNOHANG, NULL) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|__WCLONE, NULL) = 16928
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
wait4(-1, 0x7fff2c719fbc, WNOHANG, NULL) = 0
wait4(-1, 0x7fff2c719fbc, WNOHANG|__WCLONE, NULL) = -1 ECHILD (No child processes)
rt_sigsuspend([]

Q3: is it correct? shouldn't we inform the debugger?


So. Afaics, gdb can only find the new thread if the user does
"info threads", or if this thread reports somehow about itself
(say, it gets a signal and gdbserver sends "$T..." with its tid).

Also. gdb can't know the sub-thread has exited unless the user
does "info threads" again, or something like "$TpTGID.PID" gets
"E01" in reply.

Correct?
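
The liveness query mentioned here can be sketched from the stub's side. Assuming the stub keeps a table of the threads it still traces (struct live_thread and handle_T below are hypothetical names, not gdbserver code), a 'T' packet in multiprocess form, "Tp<pid>.<tid>" with hex ids, could be answered with "OK" for a thread that is still known and "E01" for anything else:

```c
#include <stdio.h>
#include <stddef.h>

/* One traced thread known to the stub (hypothetical bookkeeping). */
struct live_thread {
    unsigned pid;   /* thread-group id */
    unsigned tid;   /* thread id */
};

/*
 * Answer a "Tp<pid>.<tid>" query (ids in hex) against the table of
 * live threads. Returns "OK" if the thread is still known and "E01"
 * otherwise, including for packets that do not parse.
 */
const char *handle_T(const char *pkt,
                     const struct live_thread *tab, size_t n)
{
    unsigned pid, tid;

    if (sscanf(pkt, "Tp%x.%x", &pid, &tid) != 2)
        return "E01";

    for (size_t i = 0; i < n; i++)
        if (tab[i].pid == pid && tab[i].tid == tid)
            return "OK";

    return "E01";
}
```

With only the main thread 16927 left in the table, the query "Tp421f.4220" (the reaped sub-thread 16928) gets "E01", while "Tp421f.421f" gets "OK".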

Q4: is this what we want to implement?


I am asking because I thought that gdb+gdbserver should
try to work the same way as it works without gdbserver, and
thus it should see clone/exit.

However, gdbserver sends nothing to gdb if the tracee does
pthread_create() or pthread_exit().

Oleg.
Roland McGrath
2010-07-16 21:39:50 UTC
[I trimmed the CC because everybody concerned should be on this list.]
Post by Oleg Nesterov
In case this matters, I used gdb-7.1 for testing.
I'm not the gdb expert. So I'll just say what I think is reasonable,
and then we'll figure out where we need to change our thinking and
where we need to get gdb changed.
Post by Oleg Nesterov
Q1: if gdbstub reported that the tracee has exited (say, we sent
'$X09#..' to gdb), can gdbstub assume it can forget about this thread?
I think so. In the ptrace-based implementation of gdbserver, it sends
X or W reports after it has done wait and gotten a death status. In
Linux, that means the zombie is reaped and its PID is available for reuse.
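
As a sketch of the sequence described here (wait returns a death status, then the stub reports W or X): the helper below turns a wait status into the payload of a 'W' (exited) or 'X' (killed by signal) report with a two-digit hex code, as in the '$X09' example. exit_reply is a hypothetical name and this is not how gdbserver itself is written; it only illustrates the mapping.

```c
#include <stdio.h>
#include <sys/wait.h>

/*
 * Build the payload of an exit report from a wait status:
 * "Wxx" if the process exited normally (xx = exit code in hex),
 * "Xxx" if it was killed by a signal (xx = signal number in hex).
 * exit_reply is a hypothetical helper name.
 * Returns 0 on success, -1 if the status is not a death status.
 */
int exit_reply(int status, char *out, size_t outsz)
{
    if (WIFEXITED(status))
        snprintf(out, outsz, "W%02x", (unsigned)WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(out, outsz, "X%02x", (unsigned)WTERMSIG(status));
    else
        return -1;      /* stopped or continued: not a death */
    return 0;
}
```

The status passed in would be the one filled in by wait4() or waitpid() for the reaped task.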
Post by Oleg Nesterov
The main question is, I do not understand how gdbstub should handle the
multithreaded targets.
It's not really clear to me when gdb decides to ask for the thread
list. It looks like it only does it at extended-remote attach time if
you have set non-stop mode.
Post by Oleg Nesterov
Q2: Shouldn't gdbstub let debugger know about sub-threads somehow?
That's what I would expect in the abstract. But I know that gdb
didn't used to get new-thread notifications from ptrace either. It
looks like the linux-nat code does track PTRACE_EVENT_CLONE now.
But it may be that the gdbserver code and remote protocol were made
to match how things were when the native ptrace case didn't do that.

gdb also uses higher-level knowledge read from user memory
(libthread_db) for some aspects of thread tracking. I really don't
know how the kernel-thread layer and the thread_db layer fit together
in gdb at this point. But certainly historically it had means to get
by without kernel facilities either for enumerating the live threads
or for notifying of thread creation and death. libthread_db instructs
gdb based on libpthread details read from the user memory, to extract
the thread list and TIDs, and to set breakpoints in libpthread to tell
it about thread creation and death. All that should be superfluous
when proper low-level thread (LWP in gdbspeak) tracking is being done.
But it's there historically, so in practice gdb could get by without
whichever low-level layer (ptrace or remote or whatever) helping it.
Post by Oleg Nesterov
I am asking because I thought that gdb+gdbserver should
try to work the same way as it works without gdbserver, and
thus it should see clone/exit.
I agree that's how it seems it should be.


Thanks,
Roland
Oleg Nesterov
2010-07-18 17:48:51 UTC
Post by Roland McGrath
Post by Oleg Nesterov
Q1: if gdbstub reported that the tracee has exited (say, we sent
'$X09#..' to gdb), can gdbstub assume it can forget about this thread?
I think so. In the ptrace-based implementation of gdbserver, it sends
X or W reports after it has done wait and gotten a death status. In
Linux, that means the zombie is reaped and its PID is available for reuse.
Yes, but this doesn't necessarily mean gdbserver can forget its exit
code (or some internal state), I do not see anything about this in docs.

But yes, I think it can too.
Post by Roland McGrath
Post by Oleg Nesterov
The main question is, I do not understand how gdbstub should handle the
multithreaded targets.
It's not really clear to me when gdb decides to ask for the thread
list.
Never in my (limited) testing.
Post by Roland McGrath
It looks like it only does it at extended-remote attach time if
you have set non-stop mode.
OK, I'll check this. But this doesn't really matter.
Post by Roland McGrath
Post by Oleg Nesterov
Q2: Shouldn't gdbstub let debugger know about sub-threads somehow?
That's what I would expect in the abstract. But I know that gdb
didn't used to get new-thread notifications from ptrace either. It
looks like the linux-nat code does track PTRACE_EVENT_CLONE now.
But it may be that the gdbserver code and remote protocol were made
to match how things were when the native ptrace case didn't do that.
gdbserver tracks PTRACE_EVENT_CLONE, yes. But it doesn't inform gdb.
Post by Roland McGrath
gdb also uses higher-level knowledge read from user memory
(libthread_db) for some aspects of thread tracking.
Well, yes and no (if I understood your message correctly).

I have already looked at this code in horror. I really hope this magic
is not needed for our purposes.

It is gdbserver, not gdb, who uses libthread_db to find sub-threads and
do other things.

gdbserver asks gdb what is the symbol's address (say, _thread_db_list_t_next)
via 'qSymbol'.
Post by Roland McGrath
Post by Oleg Nesterov
I am asking because I thought that gdb+gdbserver should
try to work the same way as it works without gdbserver, and
thus it should see clone/exit.
I agree that's how it seems it should be.
OK, so far it is not clear to me what should we do. If nothing else,
I can replicate the gdbserver's behaviour. But imho it makes sense to
do something more clever.

However, there is the complication I already mentioned. If the main
thread exits, this confuses gdbserver at least. It sends the "$T05"
packets to gdb, then eventually gdb does vCont;c:pTGID.-1 and gdbserver
doesn't work. It doesn't resume sub-threads, doesn't react to ^C, etc.

I guess, gdbserver shouldn't send '$W' packet in this case, this can
confuse gdb (but I didn't verify this yet). OTOH, it is not clear if
gdbserver can delay this notification until all threads exit. Say,
what should gdbserver do if gdb sends a private signal to the exited
main thread? Or do something else which assumes it alive.

Let's see what other experts think...

Oleg.
Oleg Nesterov
2010-07-18 18:01:57 UTC
Post by Oleg Nesterov
I guess, gdbserver shouldn't send '$W' packet in this case, this can
confuse gdb (but I didn't verify this yet).
I meant, I didn't verify what happens if gdb already knows about
sub-threads.

Otherwise it will be confused for sure.
Post by Oleg Nesterov
OTOH, it is not clear if
gdbserver can delay this notification until all threads exit. Say,
what should gdbserver do if gdb sends a private signal to the exited
main thread? Or do something else which assumes it alive.
Yes.

Oleg.
Roland McGrath
2010-07-18 20:21:51 UTC
Post by Oleg Nesterov
Yes, but this doesn't necessarily mean gdbserver can forget its exit
code (or some internal state), I do not see anything about this in docs.
It means that any protocol requirement about this would almost certainly be
broken, if there were one. It couldn't be implemented robustly.
Post by Oleg Nesterov
Post by Roland McGrath
It's not really clear to me when gdb decides to ask for the thread
list.
Never in my (limited) testing.
It clearly does have paths to do it in the code.
So we need gdb folks to clarify how those are reached.
Post by Oleg Nesterov
Post by Roland McGrath
Post by Oleg Nesterov
Q2: Shouldn't gdbstub let debugger know about sub-threads somehow?
That's what I would expect in the abstract. But I know that gdb
didn't used to get new-thread notifications from ptrace either. It
looks like the linux-nat code does track PTRACE_EVENT_CLONE now.
But it may be that the gdbserver code and remote protocol were made
to match how things were when the native ptrace case didn't do that.
gdbserver tracks PTRACE_EVENT_CLONE, yes. But it doesn't inform gdb.
I was talking about the non-remote gdb code, not gdbserver.
gdbserver does attach new threads implicitly, but indeed that is
only noticed by gdb if a new thread happens to hit a signal
(breakpoint or whatever).
Post by Oleg Nesterov
I have already looked at this code in horror. I really hope this magic
is not needed for our purposes.
It is gdbserver, not gdb, who uses libthread_db to find sub-threads and
do other things.
Again, I was talking about what gdb does in the non-remote case.
AFAIK, it does the same stuff on top of the remote layer too, but
I'm not sure about that.
Post by Oleg Nesterov
OK, so far it is not clear to me what should we do. If nothing else,
I can replicate the gdbserver's behaviour. But imho it makes sense to
do something more clever.
We need some more feedback from the gdb folks.
Post by Oleg Nesterov
However, there is the complication I already mentioned. If the main
thread exits, this confuses gdbserver at least. It sends the "$T05"
packets to gdb, then eventually gdb does vCont;c:pTGID.-1 and gdbserver
doesn't work. It doesn't resume sub-threads, doesn't react to ^C, etc.
I guess, gdbserver shouldn't send '$W' packet in this case, this can
confuse gdb (but I didn't verify this yet). OTOH, it is not clear if
gdbserver can delay this notification until all threads exit. Say,
what should gdbserver do if gdb sends a private signal to the exited
main thread? Or do something else which assumes it alive.
Yes, it's not clear what is intended or would be right here.
The X/W packets are documented as talking about "the process".
Perhaps some new flavors of notification packets are needed to
distinguish thread-granularity events from process-granularity.


Thanks,
Roland
Jan Kratochvil
2010-07-19 16:01:27 UTC
Post by Oleg Nesterov
In case this matters, I used gdb-7.1 for testing.
FSF GDB (not Fedora/RHEL GDB) probably.
Post by Oleg Nesterov
Q1: if gdbstub reported that the tracee has exited (say, we sent
'$X09#..' to gdb), can gdbstub assume it can forget about this thread?
`X' is about processes, not threads ('W'=TARGET_WAITKIND_EXITED,
'X'=TARGET_WAITKIND_SIGNALLED).

Threads death is handled by GDB-driven 'T' packet (remote_thread_alive).

(I have mostly just read the GDB sources; I have not touched the remote GDB stuff.)
Post by Oleg Nesterov
I mean, can it assume that gdb won't send something like 'D;EXITED_PID'?
TARGET_WAITKIND_EXITED and TARGET_WAITKIND_SIGNALLED in
handle_inferior_event() call target_mourn_inferior(), this is very terminal.
Post by Oleg Nesterov
Looking at gdb sources/behaviour, I think the answer is yes, it can
forget. But I'd like to have the confirmation.
Yes, I also think so. I cannot give the confirmation.
Post by Oleg Nesterov
And. I'd like to let you know that gdb is buggy ;)
Please file those bugs while discussing them here:
http://sourceware.org/bugzilla/enter_bug.cgi?product=gdb
Post by Oleg Nesterov
The user presses ^C, gdb sends 3 and waits for reply. Suppose that
gdbstub doesn't reply immediately.
IMHO this remote GDB protocol and non-stop mode are primarily tested with
Eclipse-over-MI. Bugs faced by GDB CLI are going to be very common.
Post by Oleg Nesterov
I noticed this bug when I found another problem, gdb+gdbserver doesn't
work correctly if the main thread exits. But let's forget about this
problem for now.
This issue does not work well even with linux-nat.c (local GDB), in the
current development stage of ugdb I believe we do not have to solve it before
linux-nat.c gets fixed first:
GDB hangs with simple multi-threaded program on linux
http://sourceware.org/ml/gdb/2010-07/msg00045.html
Post by Oleg Nesterov
The main question is, I do not understand how gdbstub should handle the
multithreaded targets.
[...]
Post by Oleg Nesterov
(gdb) file test1
(gdb) target extended-remote :2000
(gdb) attach 16927
Attached to process 16927
...
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb)
OK. gdbserver ptraces both threads. But afaics gdb doesn't know this
program is multithreaded,
Q2: Shouldn't gdbstub let debugger know about sub-threads somehow?
gdb did not ask for it, so why should gdbserver tell gdb about it?

(gdb) info threads
[New Thread 14739.14740] <-- GDB has noticed it now.
2 Thread 14739.14740 0x000000349e8a6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 14739.14739 0x000000349f007fbd in pthread_join (threadid=140515741927184, thread_return=0x0) at pthread_join.c:89

Eclipse apparently does `info threads' over MI so it is not a problem.
Also as you state in non-stop mode gdb asks for the thread list anyway.
Post by Oleg Nesterov
gdbserver resumes both threads. Press enter, the sub-thread exits.
And nothing happens! gdbserver sends nothing to gdb, it just reaps
...
Post by Oleg Nesterov
Q3: is it correct? shouldn't we inform the debugger?
GDB will sooner or later use the 'T' packet (remote_thread_alive) to reclaim
dead threads.

With libthread_db (linux-thread-db.c) it just sets
thread_info->private->dying = 1; on TD_DEATH anyway and continues tracking the
thread until its kernel task finally dies.
Post by Oleg Nesterov
So. Afaics, gdb can only find the new thread if the user does
"info threads", or if this thread reports somehow about itself
(say, it gets a signal and gdbserver sends "$T..." with its tid).
Yes, GDB is master of the remote protocol communication. Not the gdbserver.
Post by Oleg Nesterov
Also. gdb can't know the sub-thread has exited unless the user
does "info threads" again, or something like "$TpTGID.PID" gets
"E01" in reply.
Correct?
Q4: is this what we want to implement?
IMO yes, we should first get ugdb a bit on-par with linux-nat.c, shouldn't we?
Post by Oleg Nesterov
I am asking because I thought that gdb+gdbserver should
try to work the same way as it works without gdbserver, and
thus it should see clone/exit.
GDB has two lists of "threads":

Real libpthread / libthread_db / linux-thread-db.c / struct thread_info *
which is primarily used. Name is displayed by thread_db_pid_to_str().

Then there are also kernel tasks / linux-nat.c / struct lwp_info * which are
provided when libthread_db is not available. This second category IIRC does
not work so well as linux-thread-db.c is more commonly in use (I do not have
a fail testcase offhand, it may work). Name is displayed by
linux_nat_pid_to_str().

Both types of "threads" are displayed by GDB CLI `info threads'. Their name
format differs a bit, according to the display name function (to_pid_to_str).
Post by Oleg Nesterov
However, gdbserver sends nothing to gdb if the tracee does
pthread_create() or pthread_exit().
yes
Post by Oleg Nesterov
gdbserver tracks PTRACE_EVENT_CLONE, yes. But it doesn't inform gdb.
IMO we can tune the non-libpthread mode later, AFAIK it does not work well
with linux-nat.c anyway.
Post by Oleg Nesterov
Post by Roland McGrath
gdb also uses higher-level knowledge read from user memory
(libthread_db) for some aspects of thread tracking.
Well, yes and no (if I understood your message correctly).
I have already looked at this code in horror. I really hope this magic
is not needed for our purposes.
It is gdbserver, not gdb, who uses libthread_db to find sub-threads and
do other things.
gdbserver asks gdb what is the symbol's address (say, _thread_db_list_t_next)
via 'qSymbol'.
I see this can be a problem for ugdb. Guessing we will need to change GDB to
support a new variant of proc-service.c working over the GDB protocol wire.
Post by Oleg Nesterov
However, there is the complication I already mentioned. If the main
thread exits, this confuses gdbserver at least.
Replied above, this is a GDB bug even with linux-nat.c first, it was fixed in
Fedora GDB before but for some cases it apparently still does not work.


Thanks,
Jan
Roland McGrath
2010-07-19 22:57:04 UTC
Post by Jan Kratochvil
This issue does not work well even with linux-nat.c (local GDB), in the
current development stage of ugdb I believe we do not have to solve it before
linux-nat.c gets fixed first:
GDB hangs with simple multi-threaded program on linux
http://sourceware.org/ml/gdb/2010-07/msg00045.html
On the contrary, this is something where not being ptrace makes an
important difference. The ptrace behavior that makes this difficult to
deal with in linux-nat.c is not at all an inherent issue with the scenario,
it's just a quirk of the ptrace interface. The fact that linux-nat.c
doesn't handle it well means that it's not a big priority to handle it
better than that. But it's certainly not the case that linux-nat.c
improving its support would have anything to do with how ugdb would do it.
Post by Jan Kratochvil
GDB will sooner or later use the 'T' packet (remote_thread_alive) to reclaim
dead threads.
For this to be correct, it must be using multiprocess mode so that its
packet says Tn.m instead of just Tm. In the latter case, the TID m may
have been reused for an unrelated new thread.
Post by Jan Kratochvil
IMO yes, we should first get ugdb a bit on-par with linux-nat.c, shouldn't we?
Sure. But we should carefully note all the ways in which that standard of
comparison is less than ideal.
Post by Jan Kratochvil
Post by Oleg Nesterov
gdbserver asks gdb what is the symbol's address (say, _thread_db_list_t_next)
via 'qSymbol'.
I see this can be a problem for ugdb. Guessing we will need to change GDB to
support a new variant of proc-service.c working over the GDB protocol wire.
proc-service.c is already written in terms of the gdb target backend.
I had presumed that all this thread_db layer of concern would happen
above the LWP layer, and the remote protocol supplies the LWP layer.


Thanks,
Roland
Oleg Nesterov
2010-07-20 13:16:15 UTC
Oleg Nesterov
2010-07-20 14:01:36 UTC
Post by Oleg Nesterov
Using the same test-case, I did
Another oddity. Again, the same test-case but I guess any mt program
can be used.

(gdb) set non-stop
(gdb) file test1
(gdb) target extended-remote :2000
(gdb) attach 18245
(gdb) info threads
2 Thread 18245.18246 0x000000375fad65cb in read () from /lib64/libc.so.6
* 1 Thread 18245.18245 0x00000033af60e57d in pause () from /lib64/libpthread.so.0

So far everything is OK. But,

(gdb) help c
...
To continue all stopped threads in non-stop mode, use the -a option.

(gdb) c -a
Continuing.

this sends "vCont;c:p4745.4746" but not "vCont;c:p4745.-1" as I'd expect.
More, it resumes thread 2 while thread 1 is selected.
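
To make the two forms being compared concrete: in multiprocess mode a thread-id is written "p<pid>.<tid>" with both numbers in hex, and "-1" stands for "all". So "p4745.4746" names the single thread 18245.18246, while "p4745.-1" would name every thread of process 18245. A minimal parser sketch follows; parse_ptid, parse_field and struct ptid are illustrative names chosen here, not gdb's own.

```c
#include <stdlib.h>

/* Parsed thread-id; -1 in either field means "all". */
struct ptid {
    long pid;
    long tid;
};

/* Parse one hex field or the literal "-1"; advance *s past it. */
static int parse_field(const char **s, long *out)
{
    char *end;

    if ((*s)[0] == '-' && (*s)[1] == '1') {
        *out = -1;
        *s += 2;
        return 0;
    }
    *out = strtol(*s, &end, 16);
    if (end == *s || *out < 0)
        return -1;
    *s = end;
    return 0;
}

/*
 * Parse "p<pid>.<tid>". Returns 0 on success, -1 on malformed input.
 */
int parse_ptid(const char *s, struct ptid *id)
{
    if (*s++ != 'p')
        return -1;
    if (parse_field(&s, &id->pid) != 0 || *s++ != '.')
        return -1;
    if (parse_field(&s, &id->tid) != 0 || *s != '\0')
        return -1;
    return 0;
}
```

A stub that receives "vCont;c:p4745.4746" would therefore resume only task 18246, which matches what gdbserver does in the session above.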

gdbserver resumes 18246 (sub-thread) correctly. Let's press ^C,

^CQuit
(gdb)

gdb sends "Hgp4745.4745", this selects the main thread 1. It was already
stopped, the sub-thread continues to run.

(gdb) c -a
Continuing.

Now it sends "vCont;c:p4745.4745" and resumes the main thread. But
now gdb doesn't react to ^C and "hangs". (no, SIGINT is not blocked).

Bug?

Oleg.
Jan Kratochvil
2010-07-20 14:11:55 UTC
Post by Oleg Nesterov
Post by Oleg Nesterov
Using the same test-case, I did
Another oddity. Again, the same test-case but I guess any mt program
can be used.
(gdb) set non-stop
non-stop mode should be generally used together with:
(gdb) set target-async on
Post by Oleg Nesterov
this sends "vCont;c:p4745.4746" but not "vCont;c:p4745.-1" as I'd expect.
Why is this a problem to list all the threads instead of -1?
Post by Oleg Nesterov
Now it sends "vCont;c:p4745.4745" and resumes the main thread. But
now gdb doesn't react to ^C and "hangs". (no, SIGINT is not blocked).
CTRL-C is not useful in target-async mode.


Regards,
Jan
Oleg Nesterov
2010-07-20 14:47:38 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Post by Oleg Nesterov
Using the same test-case, I did
Another oddity. Again, the same test-case but I guess any mt program
can be used.
(gdb) set non-stop
(gdb) set target-async on
Hmm, thanks... I'll read the docs about target-async.
Post by Jan Kratochvil
Post by Oleg Nesterov
this sends "vCont;c:p4745.4746" but not "vCont;c:p4745.-1" as I'd expect.
Why is this a problem to list all the threads instead of -1?
But it doesn't list all threads? p4745.4746 is the single thread?

And, in any case, gdbserver resumes only this thread.
Post by Jan Kratochvil
Post by Oleg Nesterov
Now it sends "vCont;c:p4745.4745" and resumes the main thread. But
now gdb doesn't react to ^C and "hangs". (no, SIGINT is not blocked).
CTRL-C is not useful in target-async mode.
OK, thanks.

So, it is not possible to enter the CLI mode again after "c -a" ?
(unless the target hits the bp or something, of course).

Oleg.
Jan Kratochvil
2010-07-20 15:08:03 UTC
Post by Oleg Nesterov
Post by Jan Kratochvil
Post by Oleg Nesterov
this sends "vCont;c:p4745.4746" but not "vCont;c:p4745.-1" as I'd expect.
Why is this a problem to list all the threads instead of -1?
But it doesn't list all threads? p4745.4746 is the single thread?
OK, sorry, I am not used to the protocol.

So yes, but it is immediately followed by the next vCont for the other thread:
Sending packet: $vCont;c:p4a8f.4a90#b1...Packet received: OK
[...]
Sending packet: $vCont;c:p4a8f.4a8f#e6...Packet received: OK
Post by Oleg Nesterov
And, in any case, gdbserver resumes only this thread.
It seems to resume both in my case. We should provide better reproducers

killall -9 threadit gdbserver;~/t/threadit&p=$!;./gdbserver/gdbserver --multi :1234&sleep 0.5;./gdb -nx -ex 'set non-stop on' -ex 'set target-async on' -ex 'set debug remote 1' -ex 'file ~/t/threadit' -ex 'target extended-remote :1234' -ex "attach $p"
(gdb) c-a

sends vCont twice.
Post by Oleg Nesterov
So, it is not possible to enter the CLI mode again after "c -a" ?
(unless the target hits the bp or something, of course).
CTRL-C BTW works for me, testing FSF GDB HEAD; IMO it should work for you too,
now that I think about it. Retested it even on FSF GDB 7.1.


Regards,
Jan
Oleg Nesterov
2010-07-20 15:26:20 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Post by Jan Kratochvil
Post by Oleg Nesterov
this sends "vCont;c:p4745.4746" but not "vCont;c:p4745.-1" as I'd expect.
Why is this a problem to list all the threads instead of -1?
But it doesn't list all threads? p4745.4746 is the single thread?
OK, sorry, I am not used to the protocol.
Sending packet: $vCont;c:p4a8f.4a90#b1...Packet received: OK
[...]
Sending packet: $vCont;c:p4a8f.4a8f#e6...Packet received: OK
Post by Oleg Nesterov
And, in any case, gdbserver resumes only this thread.
It seems to resume both in my case. We should provide better reproducers
killall -9 threadit gdbserver;~/t/threadit&p=$!;./gdbserver/gdbserver --multi :1234&sleep 0.5;./gdb -nx -ex 'set non-stop on' -ex 'set target-async on' -ex 'set debug remote 1' -ex 'file ~/t/threadit' -ex 'target extended-remote :1234' -ex "attach $p"
(gdb) c-a
sends vCont twice.
Yes. When I added "set target-async on" as you suggested, gdb sends
vCont twice.

And,
Post by Jan Kratochvil
Post by Oleg Nesterov
So, it is not possible to enter the CLI mode again after "c -a" ?
(unless the target hits the bp or something, of course).
CTRL-C BTW works for me;
^C works too.

Thanks!


So, I'll assume "set target-async on" must be always used if I want
to test something in !all-stop mode. Afaics, this has nothing to do
with gdbserver, it only affects the behaviour of gdb itself.

Oleg.
Roland McGrath
2010-07-20 19:43:32 UTC
Post by Jan Kratochvil
(gdb) set target-async on
IMHO if gdb confuses itself without these settings being in lock-step,
there should only be one setting.
Post by Jan Kratochvil
Post by Oleg Nesterov
Now it sends "vCont;c:p4745.4745" and resumes the main thread. But
now gdb doesn't react to ^C and "hangs". (no, SIGINT is not blocked).
CTRL-C is not useful in target-async mode.
Now that's just a bug. Even if known and intended in the code,
it's clearly a misfeature from any sensible user perspective.


Thanks,
Roland
Oleg Nesterov
2010-07-21 07:56:44 UTC
Post by Roland McGrath
Post by Jan Kratochvil
(gdb) set target-async on
IMHO if gdb confuses itself without these settings being in lock-step,
there should only be one setting.
Post by Jan Kratochvil
Post by Oleg Nesterov
Now it sends "vCont;c:p4745.4745" and resumes the main thread. But
now gdb doesn't react to ^C and "hangs". (no, SIGINT is not blocked).
CTRL-C is not useful in target-async mode.
Now that's just a bug. Even if known and intended in the code,
it's clearly a misfeature from any sensible user perspective.
To avoid the confusion, CTRL-C works with target-async && non-stop.
It didn't work in my testing because I didn't know about
"set target-async on"

Oleg.
Jan Kratochvil
2010-07-21 08:09:58 UTC
Post by Oleg Nesterov
Post by Roland McGrath
Post by Jan Kratochvil
CTRL-C is not useful in target-async mode.
Now that's just a bug. Even if known and intended in the code,
it's clearly a misfeature from any sensible user perspective.
To avoid the confusion, CTRL-C works with target-async && non-stop.
It didn't work in my testing because I didn't know about
"set target-async on"
I was incorrect: it is not that CTRL-C is useless in target-async mode, but
that one should probably use `(gdb) interrupt [-a]' to benefit from the
target-async mode at all.
info '(gdb)Background Execution'


Regards,
Jan
Oleg Nesterov
2010-07-21 11:10:20 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Post by Roland McGrath
Post by Jan Kratochvil
CTRL-C is not useful in target-async mode.
Now that's just a bug. Even if known and intended in the code,
it's clearly a misfeature from any sensible user perspective.
To avoid the confusion, CTRL-C works with target-async && non-stop.
It didn't work in my testing because I didn't know about
"set target-async on"
I was incorrect: it is not that CTRL-C is useless in target-async mode, but
that one should probably use `(gdb) interrupt [-a]' to benefit from the
target-async mode at all.
info '(gdb)Background Execution'
Yes. But, just in case, CTRL-C works too and does the same.

Oleg.
Jan Kratochvil
2010-07-20 14:21:29 UTC
Post by Oleg Nesterov
Just curious, what is Eclipse and MI ?
IDE written in Java, see package eclipse-cdt which uses GDB as its debugger
backend. MI is the protocol Eclipse<->GDB - not the (gdb)-prompt interface
but a different one, see `gdb -i=mi', testsuite/gdb.mi/ and: info '(gdb)GDB/MI'


Regards,
Jan
Oleg Nesterov
2010-07-20 15:07:01 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Just curious, what is Eclipse and MI ?
IDE written in Java, see package eclipse-cdt which uses GDB as its debugger
backend. MI is the protocol Eclipse<->GDB - not the (gdb)-prompt interface
but a different one, see `gdb -i=mi', testsuite/gdb.mi/ and: info '(gdb)GDB/MI'
Thanks Jan.

Oleg.
Roland McGrath
2010-07-20 19:41:19 UTC
Oleg Nesterov
2010-07-21 08:30:28 UTC
Post by Roland McGrath
Post by Oleg Nesterov
Probably this is fine for gdb. But ugdb was started to prototype the
new general purpose API. Say, vAttach attaches the whole thread group,
there is no way to debug a single thread. Not good in general. The same
for D command and for W/X notifications from gdbserver.
It seems fine and normal for whole process to be the granularity of
attaching. You need to be able to control the individual threads, of
course. But it doesn't really make a lot of sense to "debug" one thread
and not another in the same process.
I disagree. But currently this is off-topic.
Post by Roland McGrath
Post by Oleg Nesterov
However, when this thread exits, gdbserver sends nothing and gdb
continues to wait. For what? Another (main) thread is TASK_TRACED,
it can do nothing unless it is SIGKILLED.
Yes, it seems like gdb is confusing itself here.
Perhaps it is not confused that way when in non-stop mode.
No, I did this testing in non-stop mode. With or without target-async.

Just in case, more info. So, gdb hangs when the sub-thread exits
(to remind, gdbserver sends nothing).

If I press ^C, gdb sends "vCont;t:pTGID.PID" and gdbserver replies
"OK". Now this looks like a bug in gdbserver. This thread no longer
exists, it was already reaped.

So, gdb hangs again after ^C waiting for gdbserver which does nothing.


This is what gdbserver does when the sub-thread exits:

select(5, [3 4], [], [3 4], NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---

(the tracee exits)

read(3, 0x7fffc13431bf, 1) = -1 EAGAIN (Resource temporarily unavailable)
write(5, "+", 1) = 1
rt_sigreturn(0x5) = -1 EINTR (Interrupted system call)
select(5, [3 4], [], [3 4], NULL) = 1 (in [3])
read(3, "+", 1) = 1
read(3, 0x7fffc13434bf, 1) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG|__WCLONE, NULL) = 6538

(this means release_task(), this thread doesn't exist any longer)

rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
wait4(-1, 0x7fffc134356c, WNOHANG, NULL) = 0
wait4(-1, 0x7fffc134356c, WNOHANG|__WCLONE, NULL) = -1 ECHILD (No child processes)
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(5, [3 4], [], [3 4], NULL <unfinished ...>

So, it sends nothing to gdb. When I press ^C, gdb sends vCont and:

select(5, [3 4], [], [3 4], NULL) = 1 (in [4])
--- SIGIO (I/O possible) @ 0 (0) ---
read(4, "$vCont;t:p1989.198a#6f", 8192) = 22
write(4, "$OK#9a", 6) = 6
select(5, [3 4], [], [3 4], NULL <unfinished ...>

gdbserver sends the bogus "OK".
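
For reference, the packets in the trace above follow the usual remote protocol framing: '$', the payload, '#', and a two-digit hex checksum which is the modulo-256 sum of the payload bytes. A minimal sketch of that framing follows; the names rsp_checksum() and rsp_frame() are made up for illustration and are not taken from gdbserver.

```c
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes: the two hex digits after '#'. */
unsigned char rsp_checksum(const char *payload)
{
    unsigned char sum = 0;

    for (; *payload; payload++)
        sum += (unsigned char)*payload;
    return sum;
}

/*
 * Frame a payload as "$<payload>#<xx>".
 * Returns the number of bytes written (excluding the NUL),
 * or -1 if buf is too small.
 */
int rsp_frame(char *buf, size_t size, const char *payload)
{
    int n = snprintf(buf, size, "$%s#%02x", payload, rsp_checksum(payload));

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With this, the payload "OK" frames to the 6 bytes written in the trace, and "vCont;t:p1989.198a" gets the checksum gdb sent.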


The bug is not "fatal", if I press ^C again gdb sends T, gets the
correct "E01", and detects the fact it has exited. Still this looks
like an obvious bug.

Oleg.
Oleg Nesterov
2010-07-19 13:41:35 UTC
Hi.

I am trying to change ugdb.c to multiprocess mode, and now
I hit another bug in gdb.

(gdb) target extended-remote /proc/ugdb
(gdb) attach 980
(gdb) info threads

results in

thread.c:880: internal-error: switch_to_thread: Assertion `inf != NULL' failed.

Once again, I didn't see this problem in !multiprocess mode.

The immediate reason is clear, add_thread_silent() calls
switch_to_thread(minus_one_ptid), and find_inferior_pid(-1)
obviously fails. I guess add_thread_silent() is buggy and
should be fixed in any case.

But it is not clear to me what provokes this bug, gdb works
with gdbserver but not with /proc/ugdb.

Still investigating...

Oleg.
Oleg Nesterov
2010-07-19 15:34:00 UTC
Post by Oleg Nesterov
I am trying to change ugdb.c to multiprocess mode, and now
I hit another bug in gdb.
(gdb) target extended-remote /proc/ugdb
(gdb) attach 980
(gdb) info threads
results in
thread.c:880: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
Once again, I didn't see this problem in !multiprocess mode.
The immediate reason is clear, add_thread_silent() calls
switch_to_thread(minus_one_ptid), and find_inferior_pid(-1)
obviously fails. I guess add_thread_silent() is buggy and
should be fixed in any case.
Yes, this looks obviously wrong.
Post by Oleg Nesterov
But it is not clear to me what provokes this bug, gdb works
with gdbserver but not with /proc/ugdb.
Still investigating...
Not sure my understanding is correct, and definitely it is not complete.

But, afaics. The bug is triggered because, unlike gdbserver, ugdb.ko
doesn't implement "$T..." command, it just replies "$#00" according to

For any COMMAND not supported by the stub, an empty response
(`$#00') should be returned.

from the docs.

However, gdb can't handle this case in multiprocess mode correctly.
update_thread_list()->prune_threads() treats this reply as not-alive
and calls delete_thread() which marks this THREAD_EXITED.

Later then update_thread_list() calls remote_notice_new_inferior()
and hits the "in_thread_list() && is_exited()" case, this implies
remote_add_thread()->add_thread_silent().

I still do not understand why this all works in !multiprocess mode,
but I give up. I added support for 'T' packet, the problem has gone
away.
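
To illustrate the two behaviours discussed here, below is a rough sketch of a stub-side dispatcher that answers the 'T' (thread alive) packet and returns the empty payload for everything it does not implement. This is not the ugdb code: handle_packet(), tid_is_alive() and the live_tids[] table are hypothetical names, and a real stub would ask the kernel instead of consulting a static table.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of attached thread ids (980 is the pid from the session above). */
static const long live_tids[] = { 980 };

int tid_is_alive(long tid)
{
    size_t i;

    for (i = 0; i < sizeof(live_tids) / sizeof(live_tids[0]); i++)
        if (live_tids[i] == tid)
            return 1;
    return 0;
}

/* Map one request payload to a reply payload (both without $...#xx framing). */
const char *handle_packet(const char *req)
{
    if (req[0] == 'T') {
        /* "T THREAD-ID": ids are hex, the multiprocess form is "pPID.TID". */
        const char *p = req + 1;
        const char *dot = strchr(p, '.');
        char *end;
        long tid;

        if (*p == 'p' && dot)
            p = dot + 1;
        tid = strtol(p, &end, 16);
        if (end == p || *end != '\0')
            return "E01";
        return tid_is_alive(tid) ? "OK" : "E01";
    }

    /* Anything not implemented: the empty reply, framed as "$#00". */
    return "";
}
```

The point of the sketch is only that "not supported" (empty reply) and "thread is dead" (E01) are different answers, and per the analysis above gdb in multiprocess mode treats the empty reply to 'T' as not-alive.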

Oleg.
Oleg Nesterov
2010-07-20 14:41:19 UTC
To simplify, let's consider the simple case. gdb attaches to the
single-threaded process which can never exit. To simplify again, it
can be either RUNNING (gdb sent '$c') or STOPPED (say, gdb sends ^C).

Who maintains this state? gdb, server, both? To illustrate, suppose
that ugdb receives

$c <--- resumes the tracee
$g <--- asks general registers

What should ugdb do?

- reply E01 because it is not stopped?

- stop it, read the regs, reply ?

- stop, read the regs, reply, then resume it again ?


This is connected to another question. From gdb.info:

`H C THREAD-ID'
Set thread for subsequent operations (`m', `M', `g', `G', et.al.).
...

What does this "Set" above actually mean? Does it mean "stop this
thread" as well?



Unrelated question. From gdb.info:

`qC'
Return the current thread ID.

Reply:
`QC THREAD-ID'

Cough. I simply can't understand what is the "current thread" in
general. Perhaps, "current thread" == "the last thread selected
by Hg, or the last thread which reported something like T05:thread" ?

And,

`(anything else)'
Any other reply implies the old thread ID.

This looks even more confusing. What is the "old thread" ?


To clarify, I am mostly asking about the protocol in general,
not about the current implementation in gdb.

Oleg.
Oleg Nesterov
2010-07-21 10:18:26 UTC
Post by Oleg Nesterov
[...]
Who maintains this state? gdb, server, both? To illustrate, suppose
that ugdb receives
$c <--- resumes the tracee
$g <--- asks general registers
What should ugdb do?
Even if the protocol documentation theoretically permits this,
if actual gdb does not send it, and actual gdbserver does not
specifically handle it, you don't have to worry about it.
I'd like to know how can I check that gdb can't do this.

OK, it seems to be true, so I'll assume ugdb should return E01.
Post by Oleg Nesterov
- reply E01 because it is not stopped?
- stop it, read the regs, reply ?
- stop, read the regs, reply, then resume it again ?
I'm pretty sure that even if this packet is legitimate in this
context, gdbserver will not stop the target thread just to pull out a
sample of its registers. The values would become invalid the moment
the thread resumed anyway.
Oh. Sure, but I used this scenario only to illustrate the question.
Post by Oleg Nesterov
`H C THREAD-ID'
Set thread for subsequent operations (`m', `M', `g', `G', et.al.).
...
What does this "Set" above actually mean? [...]
(Whatever gdbserver uses it to mean.)
Nice. OK, unless I misread the (non-trivial) sources, gdbserver doesn't
stop the tracee. It merely sets general_thread, set_desired_inferior()
just sets current_inferior in this case.
Post by Oleg Nesterov
To clarify, I am mostly asking about the protocol in general, not
about the current implementation in gdb.
The abstract protocol is not precise enough or well-documented enough
to answer these questions.
Thanks, I knew this from the very beginning ;)
Reference to actual practice is usually
required to implement the normal cases.
IOW, RTFS. Understand.

Yes, I tried to avoid the questions. Until I hit the limitations and
bugs (threading).

As for '$g' in particular... I didn't verify this, but looking at the
sources I strongly believe gdbserver is buggy.

It doesn't check the state of the tracee at all. Usually this doesn't
matter, because (I think) gdb will never send '$g' unless the tracee is
stopped. And if it is stopped, registers should be already cached in
thread_info->regcache_data, gdbserver does fetch_inferior_registers()
in advance when the tracee stops, before gdb actually asks.

However, if gdb sends '$g' when the tracee is not stopped, or if
get_thread_regcache() races with SIGKILL we have the (minor) problems.

regsets_fetch_inferior_registers() will notice that ptrace() fails
and it will complain via perror(), but that is all. It will send the
wrong regcache->registers content to gdb instead of E01.


OK. To answer my question, I think that ugdb can assume that the tracee
must be always stopped by debugger. Unless gdb asks to stop it, of
course.
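
A minimal sketch of that conclusion, assuming the stub itself keeps a RUNNING/STOPPED flag per tracee: requests that need a stopped tracee ('g', 'G', 'm', 'M') get E01 while it runs. The type and function names here are invented for the example, they are not from ugdb or gdbserver.

```c
#include <stddef.h>

enum tracee_state { TRACEE_RUNNING, TRACEE_STOPPED };

struct tracee {
    enum tracee_state state;
};

/* Called when the stub resumes the tracee ('c', vCont;c, ...). */
void tracee_on_resume(struct tracee *t)
{
    t->state = TRACEE_RUNNING;
}

/* Called when the stub reports a stop to gdb. */
void tracee_on_stop(struct tracee *t)
{
    t->state = TRACEE_STOPPED;
}

/*
 * Returns NULL if the command may be served now, or the error
 * payload to send instead.
 */
const char *tracee_check_cmd(const struct tracee *t, char cmd)
{
    switch (cmd) {
    case 'g': case 'G':     /* read/write registers */
    case 'm': case 'M':     /* read/write memory */
        return t->state == TRACEE_STOPPED ? NULL : "E01";
    default:
        return NULL;
    }
}
```

This deliberately never stops the tracee on its own; it only refuses, which is the first of the three options listed in the question.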

Oleg.
Oleg Nesterov
2010-07-21 10:48:49 UTC
Post by Oleg Nesterov
Post by Oleg Nesterov
[...]
Who maintains this state? gdb, server, both? To illustrate, suppose
that ugdb receives
$c <--- resumes the tracee
$g <--- asks general registers
What should ugdb do?
Even if the protocol documentation theoretically permits this,
if actual gdb does not send it, and actual gdbserver does not
specifically handle it, you don't have to worry about it.
I'd like to know how can I check that gdb can't do this.
It can. It doesn't send exactly '$g' because it has its own cache,
but it can send, say, '$m' when the tracee is running. gdbserver
returns E01 in this case.
Post by Oleg Nesterov
so I'll assume ugdb should return E01.
Yes.


Hmm. Speaking of '$m', I looked in the sources and noticed by
accident that read_inferior_memory() "hides" the breakpoints
inserted by gdbserver.

Of course, this is not documented in gdb.info too.

Oleg.
Oleg Nesterov
2010-07-21 17:04:00 UTC
Probably the last question before I'll try to add the threading
support.

I am trying to understand what ugdb should know about the multiple
inferiors. Looks like, I shouldn't worry at all. They do not exist
from gdbserver's pov. It should handle the multiple attach/detach,
of course, but that is all. Say, qsThreadInfo should list all
attached threads in random order. IOW, there is no command
which can refer to any particular inferior somehow, explicitly or
implicitly. Only THREAD-ID can matter.

Correct?

-------------------------------------------------------------------------------
But there is one oddity I can't understand in all-stop mode.

(gdb) info inferiors
Num Description Executable
* 3 process 22988 /bin/sleep
2 process 22823 /bin/sleep
1 process 22822 /bin/sleep
(gdb) info threads
* 3 Thread 22988.22988 0x000000375faa6390 in __nanosleep_nocancel () from /lib64/libc.so.6
2 Thread 22823.22823 0x000000375faa6390 in __nanosleep_nocancel () from /lib64/libc.so.6
1 Thread 22822.22822 0x000000375faa6390 in __nanosleep_nocancel () from /lib64/libc.so.6
(gdb) c
Continuing.

The last 'c' correctly resumes all processes/threads. But, gdb
sends the single vCont packet, and this packet is

vCont;c:p59cc.-1

iow, it asks to resume only the 3rd 22988 process.
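
To make the THREAD-ID syntax concrete: in the multiprocess form it is "pPID.TID", both in hex, where -1 means "all" and a bare "pPID" is equivalent to "pPID.-1". A small parser sketch follows; struct rsp_ptid and rsp_parse_thread_id() are illustrative names only, and using pid 0 for "no process given" is a convention of this example.

```c
#include <stdlib.h>

struct rsp_ptid {
    long pid;   /* 0 when the packet did not name a process */
    long tid;   /* -1 means "all threads" */
};

/* Parse one hex id (strtol also accepts "-1") and advance *s past it. */
static int parse_hex_id(const char **s, long *out)
{
    char *end;

    *out = strtol(*s, &end, 16);
    if (end == *s)
        return -1;
    *s = end;
    return 0;
}

/* Accepts "TID", "pPID" and "pPID.TID"; returns 0 on success, -1 on error. */
int rsp_parse_thread_id(const char *s, struct rsp_ptid *out)
{
    out->pid = 0;
    out->tid = -1;

    if (*s == 'p') {
        s++;
        if (parse_hex_id(&s, &out->pid) < 0)
            return -1;
        if (*s == '\0')
            return 0;       /* "pPID" is the same as "pPID.-1" */
        if (*s != '.')
            return -1;
        s++;
    }
    if (parse_hex_id(&s, &out->tid) < 0)
        return -1;
    return *s == '\0' ? 0 : -1;
}
```

Parsing "p59cc.-1" this way gives pid 22988 (0x59cc) and tid -1, i.e. every thread of the third process and nothing else.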

Does this mean that gdbserver should always ignore the THREAD-ID part
of this command in all-stop mode?

Oh. grep, grep, grep. Looks like this is true, but the code is nontrivial.
Not sure I understand it correctly.

It seems to me, handle_v_cont() first resumes the process it was
asked explicitly, then it does mywait(minus_one_ptid) and eventually
this ptid is passed to linux_resume() to wake up other threads.

I must admit, this looks a bit strange. Why gdb doesn't send
"vCont;c:-1.-1" which looks obviously more logical ?

I am afraid I missed something I should know here, I'll appreciate
any info.


-------------------------------------------------------------------
Well. And perhaps there is another bug....

If I sent SIGCONT to one of the processes, then gdbserver reports
T13 and stops all 3 processes, this looks correct.

gdb looks good too:

Program received signal SIGCONT, Continued.
[Switching to Thread 22822.22822]

But, the next continue

(gdb) c
Continuing.

resumes only one process, the one that received the signal. Other
processes are not resumed. Hmm, and CTRL-C does not work after that.

I can't understand this. Bug? Or something I should learn?

Just in case, I tried other sig_kernel_ignore() signals instead of
SIGCONT, the same.

-------------------------------------------------------------------
Let's play with SIGKILL.

(gdb) info inferiors
Num Description Executable
3 process 29449 /bin/sleep
2 process 28409 /bin/sleep
* 1 process 28408 /bin/sleep
(gdb) c
Continuing.

kill -9 29449. gdbserver correctly reports "X9;process:7309", gdb
informs the user:

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb) info inferiors
Num Description Executable
* 3 <null> /bin/sleep
2 process 28409 /bin/sleep
1 process 28408 /bin/sleep
(gdb) info threads
2 Thread 28409.28409 0x00000000000000b0 in ?? ()
1 Thread 28408.28408 0x3030303030303030 in ?? ()

No selected thread. See `help thread'.
(gdb)

But. In this case gdbserver does not stop other inferiors/threads,
they are running.

Isn't this wrong? This is all-stop mode. Or I misread the docs, this
is very possible.

But afaics, this can't be right.

(gdb) inferior 1
[Switching to inferior 1 [process 28408] (/bin/sleep)]
[Switching to thread 1 (Thread 28408.28408)]
#0 0x3030303030303030 in ?? ()

Indeed, gdb blindly sends m/g requests and gets E01 or the bogus
register data from gdbserver.

(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x3030303030303030 in ?? ()

this finally stops 2 other processes,

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x3030303030303030 in ?? ()
(gdb)

Hmm. Yes, /bin/sleep is killed by SIGSEGV. Nice. Well, starting from
here nothing really works.

----------------------------------------------------------------------
So. Reference to actual practice doesn't help, I suspect it is buggy.
Post by Oleg Nesterov
From gdb.info
`vCont[;ACTION[:THREAD-ID]]...'
Resume the inferior, specifying different actions for each thread.
If an action is specified with no THREAD-ID, then it is applied to
any threads that don't have a specific action specified; if no
default action is specified then other threads should remain
stopped in all-stop mode

I can't really parse this... but let's look at gdb's vCont packet again,

vCont;c:p59cc.-1

To me, it looks as "no default action is specified". Doesn't this mean
gdbserver should not have resumed other processes?
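
The "default action" wording can be read mechanically: an action without ":THREAD-ID" is the default, so a packet in which every action carries a thread-id has none. A sketch of that check, under that reading of the docs; vcont_has_default_action() is a name invented here.

```c
#include <string.h>

/*
 * Returns 1 if some action in the vCont packet has no ":THREAD-ID"
 * (a default action), 0 if every action names a thread-id,
 * and -1 if this is not a well-formed "vCont;ACTION..." payload.
 */
int vcont_has_default_action(const char *pkt)
{
    const char *p;

    if (strncmp(pkt, "vCont;", 6) != 0)
        return -1;
    p = pkt + 6;
    if (*p == '\0')
        return -1;

    while (*p) {
        size_t len = strcspn(p, ";");   /* length of this action */

        if (len == 0)
            return -1;                  /* empty action, e.g. ";;" */
        if (memchr(p, ':', len) == NULL)
            return 1;                   /* no thread-id: default action */
        p += len;
        if (*p == ';')
            p++;
    }
    return 0;
}
```

For "vCont;c:p59cc.-1" this returns 0, so by the letter of the documentation the threads outside process 0x59cc have no action and should remain stopped in all-stop mode.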

Confused.

Oleg.
Roland McGrath
2010-07-21 20:42:03 UTC
Post by Oleg Nesterov
I am trying to understand what ugdb should know about the multiple
inferiors. Looks like, I shouldn't worry at all. They do not exist
from gdbserver's pov. It should handle the multiple attach/detach,
of course, but that is all. Say, qsThreadInfo should list all
attached threads in random order. IOW, there is no command
which can refer to any particular inferior somehow, explicitly or
implicitly. Only THREAD-ID can matter.
I'm not sure what the story on gdb's multi-process support is and how that
relates to remote protocol details. It's fine to start out by only
worrying about being attached to a single process with all its threads.
Certainly we want to be handling multiple processes well at some point, so
don't structure things so it would be especially difficult. But it is
probably the case that representing that notion on the remote protocol is
all a bit of a muddle, and you shouldn't worry about resolving that before
moving ahead.
Post by Oleg Nesterov
So. Reference to actual practice doesn't help, I suspect it is buggy.
Well, actual practice of rarely-used features, anyway.
Multi-process is newfangled and apparently quite unreliable.
Single-process, multi-thread is what is used a lot.


Thanks,
Roland
Oleg Nesterov
2010-07-23 17:31:34 UTC
Post by Roland McGrath
Post by Oleg Nesterov
I am trying to understand what ugdb should know about the multiple
inferiors. Looks like, I shouldn't worry at all. They do not exist
from gdbserver's pov. It should handle the multiple attach/detach,
of course, but that is all. Say, qsThreadInfo should list all
attached threads in random order. IOW, there is no command
which can refer to any particular inferior somehow, explicitly or
implicitly. Only THREAD-ID can matter.
I'm not sure what the story on gdb's multi-process support is and how that
relates to remote protocol details. It's fine to start out by only
worrying about being attached to a single process with all its threads.
Certainly we want to be handling multiple processes well at some point, so
don't structure things so it would be especially difficult. But it is
probably the case that representing that notion on the remote protocol is
all a bit of a muddle, and you shouldn't worry about resolving that before
moving ahead.
Post by Oleg Nesterov
So. Reference to actual practice doesn't help, I suspect it is buggy.
Well, actual practice of rarely-used features, anyway.
Multi-process is newfangled and apparently quite unreliable.
Single-process, multi-thread is what is used a lot.
OK. I'll try to send something more or less working on Monday.

I guess, non-stop mode is more interesting/important than all-stop.

Oleg.
Oleg Nesterov
2010-07-26 14:27:59 UTC
Post by Oleg Nesterov
Post by Roland McGrath
Well, actual practice of rarely-used features, anyway.
Multi-process is newfangled and apparently quite unreliable.
Single-process, multi-thread is what is used a lot.
OK. I'll try to send something more or less working on Monday.
No, I need more time. Hopefully one more day, may be two, I am
not sure.

The problem is not the coding itself. The problem is that almost
every step needs a lot of experiments with gdb/gdbserver/ugdb.

For example, I spent several hours trying to understand why gdb
ignores '%Stop:' notification and never sends '$vStopped', but
it does send vStopped to the real gdbserver with the same batch
file. The reason was partly my misunderstanding, but also another
bug in gdb and the timing issues.

Or vAttach in the multithreaded case. I'd say that gdbserver is just
wrong here, even if this works in practice. The first qfThreadInfo
after vAttach reports only the main thread. After the first
vCont;t:PID.-1 only the main thread is reported again. Somehow it
provokes gdb to send more 'vCont;t:PID.-1's packets, only then it
reports the new threads via Stop/vStopped.

At first I tried to mimic this behaviour, I was already totally
confused because I also had other problems with gdb - it constantly
crashed. But finally I have found that the simple approach seems to
work too.

Right now I am trying to understand why gdb doesn't use 'vCont:c'
but sends 'c' instead. And yes, I report 'vCont;c;t' to 'vCont?'.

There are some other issues with vCont which I don't understand...

And I'd say gdb crashes too often.


So. I am working. Everything goes slower than I expected. When
I have the code which more or less works - I'll send it.

I decided to take a bit different approach, we will see if it
makes sense in the longer term.

Oleg.
Oleg Nesterov
2010-07-26 16:03:58 UTC
Post by Oleg Nesterov
Right now I am trying to understand why gdb doesn't use 'vCont:c'
but sends 'c' instead. And yes, I report 'vCont;c;t' to 'vCont?'.
OK. After I read remote_vcont_resume()->remote_vcont_probe() path
I understand why 'vCont;c;t' doesn't work. Contrary to what the
documentation says it is all or nothing. Except 't' has the separate
remote_state->support_vCont_t flag. Very strange.
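
A sketch of the "all or nothing" check as described above, written from the description rather than copied from gdb: vCont is treated as usable only if the reply to 'vCont?' lists all of c, C, s and S, while 't' is tracked separately. struct vcont_support and vcont_probe() are names made up for this example.

```c
#include <string.h>

struct vcont_support {
    int usable;     /* all of c, C, s, S were listed */
    int has_t;      /* 't' was listed (tracked separately) */
};

/* Interpret the reply to "vCont?", e.g. "vCont;c;C;s;S;t". */
struct vcont_support vcont_probe(const char *reply)
{
    struct vcont_support r = { 0, 0 };
    int has_c = 0, has_C = 0, has_s = 0, has_S = 0;
    const char *p;

    if (strncmp(reply, "vCont", 5) != 0)
        return r;
    p = reply + 5;

    while (*p == ';') {
        size_t len;

        p++;
        len = strcspn(p, ";");
        if (len == 1) {             /* only single-letter actions count */
            switch (*p) {
            case 'c': has_c = 1; break;
            case 'C': has_C = 1; break;
            case 's': has_s = 1; break;
            case 'S': has_S = 1; break;
            case 't': r.has_t = 1; break;
            }
        }
        p += len;
    }

    r.usable = has_c && has_C && has_s && has_S;
    return r;
}
```

With the reply 'vCont;c;t' this yields usable == 0, which matches gdb falling back to plain 'c'.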

Oleg.
Oleg Nesterov
2010-07-28 18:17:02 UTC
Post by Oleg Nesterov
I decided to take a bit different approach, we will see if it
makes sense in the longer term.
Please see the attached files,

- ugdb.c

The kernel module which implements the basic
user-space API on top of utrace. Of course,
this API should be discussed.

- gdbstub

The simple user-space gdbserver written in
perl which works with ugdb API.

Limitations:

- this is just initial code, again.

- doesn't work in all-stop mode (should be simple to
implement).

- currently it only supports attach, stop, cont, detach
and exit.

- the testing was very limited. I played with it about
an hour and didn't find any problems, but that is all.

However, please note that this time the code is clearly opened
for improvements.

I strongly believe this is the only sane approach. Even for
prototyping. No, _especially_ for prototyping!

Btw, gdb crashes very often right after

(gdb) set target-async on
(gdb) set non-stop
(gdb) file mt-program
(gdb) target extended-remote :port
(gdb) attach its_pid

I didn't even try to investigate (this doesn't happen when
it works with the real gdbserver). Just retry, gdb is buggy.

What do you think?

Oleg.
Frank Ch. Eigler
2010-07-29 21:38:03 UTC
Post by Oleg Nesterov
[...]
- ugdb.c
The kernel module which implements the basic
user-space API on top of utrace. Of course,
this API should be discussed.
- gdbstub
The simple user-space gdbserver written in
perl which works with ugdb API.
[...]
To the extent that the problems with an in-kernel gdbstub are
weaknesses in the protocol - or gdb's implementation thereof - how
would this split improve that situation?

- FChE
Oleg Nesterov
2010-07-30 12:57:55 UTC
Post by Frank Ch. Eigler
Post by Oleg Nesterov
[...]
- ugdb.c
The kernel module which implements the basic
user-space API on top of utrace. Of course,
this API should be discussed.
- gdbstub
The simple user-space gdbserver written in
perl which works with ugdb API.
[...]
To the extent that the problems with an in-kernel gdbstub are
weaknesses in the protocol - or gdb's implementation thereof - how
would this split improve that situation?
I have the strong desire to ask you by turn why do you think
that in-kernel gdbstub can help in any way ;)

Yes, I never liked the idea of in-kernel gdbstub. Apart from being
too high-level and vague, it is also a bit limited. And some things,
say, register renumbering, don't belong in the kernel. Or vRun.
Many other things. The only advantage is that we already have
the great tool which works with this protocol - gdb.

Ok, it is easy to criticize, and my opinion doesn't really matter.
We can put it in kernel later, when we have something more than
just the proof of concept.

But I do not see how in-kernel gdbstub can help even to prototype
things. In my opinion it only complicates this. If nothing else,
it is not easy to test even the simple things. Just imagine the
simple tests like ptrace-tests rewritten to work via remote
protocol.

IIUC, the main goal is to prototype the new generic API, while the
remote protocol (in my opinion) obviously can't be considered
as such. With this split it is possible to try to add some API
and test it with or without gdb. Also, it is much easier to
play with the protocol extensions (which I believe it needs)
this way. It would be (I think) much easier to teach the real
gdbserver and/or gdb to use this new API if we already had the
userspace application which actually works using this API.

OTOH, with this split we still have the same advantage: we can
use gdb to prove that this code can do something useful.

Oleg.
Frank Ch. Eigler
2010-07-30 13:16:27 UTC
Hi, Oleg -
Post by Oleg Nesterov
[...]
But I do not see how in-kernel gdbstub can help even to prototype
things. In my opinion it only complicates this. If nothing else,
it is not easy to test even the simple things. Just imagine the
simple tests like ptrace-tests rewritten to work via remote
protocol.
(One could use a new user-space library. There is not that much
complexity difference between a write/read syscall pair and a complex
ioctl.)
Post by Oleg Nesterov
IIUC, the main goal is to prototype the new generic API [...] It would
be (I think) much easier to teach the real gdbserver and/or gdb to
use this new API if we already had the userspace application which
actually works using this API.
To an extent, it's all a SMOP. But the key is the level of
abstraction provided by any new API. ptrace(2) is low, the
gdb-wire-protocol is high, and both are pretty well established. A
brand new API aiming into some new middle point will be harder to
validate.
Post by Oleg Nesterov
OTOH, with this split we still have the same advantage: we can
use gdb to prove that this code can do something useful.
Not if you run into the exact same multithreading protocol glitches,
but this time with three separate interacting bodies of code instead
of two.


- FChE
Oleg Nesterov
2010-07-30 14:58:51 UTC
Post by Frank Ch. Eigler
Hi, Oleg -
Post by Oleg Nesterov
[...]
But I do not see how in-kernel gdbstub can help even to prototype
things. In my opinion it only complicates this. If nothing else,
it is not easy to test even the simple things. Just imagine the
simple tests like ptrace-tests rewritten to work via remote
protocol.
(One could use a new user-space library. There is not that much
complexity difference between a write/read syscall pair and a complex
ioctl.)
Oh. I do not think so. First of all, I do not know what this library
can actually do, except it can provide the helpers for get/put
packet plus some parsing. But in any case, we don't have this lib
right now.

And write/read pair is not only inconvenient. Imho, it is really
bad because read() is used for the asynchronous events as well.
Post by Frank Ch. Eigler
Post by Oleg Nesterov
IIUC, the main goal is to prototype the new generic API [...] It would
be (I think) much easier to teach the real gdbserver and/or gdb to
use this new API if we already had the userspace application which
actually works using this API.
To an extent, it's all a SMOP. But the key is the level of
abstraction provided by any new API. ptrace(2) is low, the
gdb-wire-protocol is high, and both are pretty well established. A
brand new API aiming into some new middle point will be harder to
validate.
Yes, agreed. But I hope that the user-space gdbserver which actually
works on top of this API can be considered as validation. IOW, if
this API is simple and good enough to write the reasonable gdbserver,
then it probably makes sense.
Post by Frank Ch. Eigler
Post by Oleg Nesterov
OTOH, with this split we still have the same advantage: we can
use gdb to prove that this code can do something useful.
Not if you run into the exact same multithreading protocol glitches,
but this time with three separate interacting bodies of code instead
of two.
I don't understand this part. We already have some problems here,
with the existing protocol, yes. If we want to fix them, I do not
understand how the fact that some code runs in user-space can
complicate things.

Oleg.
Jan Kratochvil
2010-07-30 13:25:37 UTC
Post by Oleg Nesterov
IIUC, the main goal is to prototype the new generic API,
As I thought there is an agreement the ptrace API has to stay.

ptrace as an API is really ugly but it works. GDB internally already has an
abstraction on top of it (linux-nat.c as a target).

We definitely need some serialized protocol as we need remote debugging with
multiple inferiors for cloud.

The gdb remote protocol is already very thin to just provide some
"ptrace-like" functionality serialized over the wire.

I already tried (test only, not intended for production) once to "replace
ptrace" with disagreement on the design:
Re: Proof-of-concept on fd-connected linux-nat.c server
http://sourceware.org/ml/archer/2009-q2/msg00082.html

I do not see why to create a new layer (your `new generic API') between kernel
and gdbserver-in-userland.
Post by Oleg Nesterov
while the remote protocol (in my opinion) obviously can't be considered
as such. With this split it is possible to try to add some API and test it
with or without gdb. Also, it is much easier to play with the
protocol extensions (which I believe it needs) this way.
If it is only a development tool for the in-kernel server then OK.
Post by Oleg Nesterov
It would be (I think) much easier to teach the real
gdbserver and/or gdb to use this new API
gdb linux-nat.c (=local gdb) should be deprecated. There is definitely a need
for remote target and actively maintaining two modes is not effective, we can
run gdbserver even during single-host debugging.

We can port gdbserver to anything but I do not see the point. We should
probably move the threading support from gdbserver to gdb but there isn't much
left to do in userland gdbserver with properly designed kernel API.


Thanks,
Jan
Oleg Nesterov
2010-07-30 14:41:24 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
IIUC, the main goal is to prototype the new generic API,
As I thought there is an agreement the ptrace API has to stay.
Of course, ptrace can't go away.
Post by Jan Kratochvil
We definitely need some serialized protocol as we need remote debugging with
multiple inferiors for cloud.
Nobody argues.
Post by Jan Kratochvil
I do not see why to create a new layer (your `new generic API') between kernel
and gdbserver-in-userland.
Because gdb is not alone? I agree, it is probably most important.
Post by Jan Kratochvil
Post by Oleg Nesterov
while the remote protocol (in my opinion) obviously can't be considered
as such. With this split it is possible to try to add some API and test it
with or without gdb. Also, it is much easier to play with the
protocol extensions (which I believe it needs) this way.
If it is only a development tool for the in-kernel server then OK.
Right now I do not know.
Post by Jan Kratochvil
Post by Oleg Nesterov
It would be (I think) much easier to teach the real
gdbserver and/or gdb to use this new API
gdb linux-nat.c (=local gdb) should be deprecated. There is definitely a need
for remote target and actively maintaining two modes is not effective, we can
run gdbserver even during single-host debugging.
OK,
Post by Jan Kratochvil
We can port gdbserver to anything but I do not see the point. We should
probably move the threading support from gdbserver to gdb but there isn't much
left to do in userland gdbserver with properly designed kernel API.
IOW, you think that it is better to shift gdbserver into kernel-space
than port the existing one to the new API or write the new one in user
space ?

Oleg.
Jan Kratochvil
2010-07-30 15:20:25 UTC
IOW, you think that it is better to shift gdbserver into kernel-space than
port the existing one to the new API or write the new one in user space ?
So far I just assumed kernel-space ugdb is the plan. As I wrote before I do
not know gdbserver too much.

If you check gdb/gdbserver/linux-low.c it is just one big ptrace, wait and /proc
interface. I would guess it could be more simple with the utrace API at hand.

Catching up with systemtap's 200x higher software-watchpoint performance over
current (local) gdb (described in "[debug-list] Utrace Discussion Notes" off
this list) could be easier with in-kernel gdb I thought.


Thanks,
Jan
Oleg Nesterov
2010-08-02 12:51:22 UTC
Post by Jan Kratochvil
IOW, you think that it is better to shift gdbserver into kernel-space than
port the existing one to the new API or write the new one in user space ?
So far I just assumed kernel-space ugdb is the plan. As I wrote before I do
not know gdbserver too much.
I am not sure, but I do not really know.

Jan, all, let me explain again what I think.

Yes, as I said I personally do not believe in in-kernel gdbstub too much.
If nothing else, I bet it will be never merged upstream. Unless at least
this code will also have the more "traditional" user-space API which is
immediately clear to the reviewers on lkml.

And how we can implement, say, vRun in kernel? I am not saying this is
technically impossible, but this is against common sense, imho.

Or remote debugging via tcp. We need the user-space helper anyway. Again,
of course it is technically possible to create the socket and the kernel
thread which serves the requests, but I don't think we should do this.

Or two modes, all-stop and non-stop. Imho, the kernel shouldn't even
know about this. Or register renumbering.


However. I have also said that my opinion doesn't matter. And I meant
this! I do not understand the user-space needs, I do not understand the
problems from the gdb's pov. So, we can put this code in kernel later,
in the same module or another one if this is really needed.

At least the prototyping is much easier in user-space. And I hope very
much this helps to separate the utrace problems and the protocol problems.

I may be wrong, but the most complex "conceptual" part is the thread
management. I mean the very basic things: attach, detach, exit, clone.
But, from the remote protocol pov these things do not exist, gdbserver
hides these details. This is good for gdb, but complicates the testing
and surely this is not enough in general. Just think about /bin/strace.

Or. Currently I am not sure gdbstub does exactly the same as the real
gdbserver when the main thread exits. But I do not care at all, it would
be trivial to change this user-level code if needed without changing
the implementation details in kernel.
Post by Jan Kratochvil
Catching up with systemtap's 200x higher software-watchpoint performance over
current (local) gdb (described in "[debug-list] Utrace Discussion Notes" off
this list) could be easier with in-kernel gdb I thought.
Perhaps, I can't comment because I do not understand the problem space.

Oleg.
Jan Kratochvil
2010-08-03 13:54:51 UTC
Post by Oleg Nesterov
Yes, as I said I personally do not believe in in-kernel gdbstub too much.
If nothing else, I bet it will be never merged upstream. Unless at least
this code will also have the more "traditional" user-space API which is
immediately clear to the reviewers on lkml.
I find knfsd is a precedent, isn't it? It contains some compatibility-kludges
(such as the SUNRPC layer used only for nfs) and still the filesystem
operations are AFAIK fully kernel-side.

NFS is a well-established protocol such as the gdbserver one and both need
high performance of their server-side execution.
Post by Oleg Nesterov
Or remote debugging via tcp. We need the user-space helper anyway.
There is rpc.nfsd as the userland wrapper, I do not find a problem if such
program would exist for ugdb.
Post by Oleg Nesterov
Or two modes, all-stop and non-stop. Imho, the kernel shouldn't even
know about this. Or register renumbering.
The NFS protocol also isn't perfect.


Thanks,
Jan
Tom Tromey
2010-07-30 17:59:08 UTC
Jan> gdb linux-nat.c (=local gdb) should be deprecated. There is
Jan> definitely a need for remote target and actively maintaining two
Jan> modes is not effective, we can run gdbserver even during
Jan> single-host debugging.

I think we should differentiate a bit between Oleg's project and
projects internal to gdb.

I agree that the current approach of writing all linux-nat code twice --
once for gdb and once for gdbserver -- is no good.

Also, I think Oleg's recent questions and investigations have shown that
perhaps gdbserver is currently a bit lacking for local debugging.

But, a lot of this is a problem specific to gdb. We could, for example,
remove linux-nat.c and move to only allowing gdbserver. Or, we could
have gdb and gdbserver share code. But either of these would be
independent of whatever interface the kernel provides.

Tom
Oleg Nesterov
2010-08-02 18:22:45 UTC
Post by Oleg Nesterov
- currently it only supports attach, stop, cont, detach
and exit.
OK, I am a bit stuck.

I am trying to implement attach-to-the-thread-group, and I'd like
to invent something simple without O(n**2) and semaphores in
->report_clone(). There are other problems with process-wide ops
which should be addressed somehow.

Will continue tomorrow...

Oleg.
Jan Kratochvil
2010-08-02 23:53:58 UTC
Post by Oleg Nesterov
- the testing was very limited. I played with it about
an hour and didn't find any problems, but that is all.
[...]
Post by Oleg Nesterov
Btw, gdb crashes very often right after
(gdb) set target-async on
(gdb) set non-stop
(gdb) file mt-program
(gdb) target extended-remote :port
(gdb) attach its_pid
I didn't even try to investigate (this doesn't happen when
it works with the real gdbserver). Just retry, gdb is buggy.
Trying it with both /bin/sleep and a threaded testcase and I never got a crash
(kernel-2.6.33.6-147.fc13.x86_64 as both host and KVM guest OS).

$ killall gdbstub;~/redhat/threadit&p=$!;~/redhat/gdbstub &>~/redhat/out&sleep 0.1;./gdb -nx -ex 'set target-async on' -ex 'set non-stop' -ex "file $HOME/redhat/threadit" -ex 'target extended-remote :2000' -ex "attach $p" -ex 'set confirm no';kill $p;
gdbstub: no process killed
[6] 22822
[7] 22823
GNU gdb (GDB) 7.2.50.20100802-cvs
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Reading symbols from /home/jkratoch/redhat/threadit...done.
Remote debugging using :2000
Attached to process 22822
[New Thread 22822.22822]
[New Thread 22822.22825]
Reading symbols from /lib64/libpthread.so.0...Reading symbols from /usr/lib/debug/lib64/libpthread-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007fead8db6fbd in pthread_join (threadid=140646633633552, thread_return=0x0) at pthread_join.c:89
89 lll_wait_tid (pd->tid);
(gdb)
[Thread 22822.22825] #2 stopped.
0x00007fead8ad6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Current language: auto
The current source language is "auto; currently asm".
info threads
2 Thread 22822.22825 0x00007fead8ad6a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 22822.22822 0x00007fead8db6fbd in pthread_join (threadid=140646633633552, thread_return=0x0) at pthread_join.c:89
(gdb) q
[7]+ Done ~/redhat/gdbstub &>~/redhat/out
[6]+ Terminated ~/redhat/threadit


Thanks,
Jan
Oleg Nesterov
2010-08-03 12:24:34 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Btw, gdb crashes very often right after
(gdb) set target-async on
(gdb) set non-stop
(gdb) file mt-program
(gdb) target extended-remote :port
(gdb) attach its_pid
I didn't even try to investigate (this doesn't happen when
it works with the real gdbserver). Just retry, gdb is buggy.
^^^^^^^^^^^^
Yes, I still think gdb is wrong, but please correct me.
Post by Jan Kratochvil
Trying it with both /bin/sleep and a threaded testcase and I never got a crash
(kernel-2.6.33.6-147.fc13.x86_64 as both host and KVM guest OS).
To clarify, let me repeat: I never saw such a crash with the real
gdbserver, but this often happens in my testing.


I think I understand what happens. And this leads to the question
about the %Stop notifications which I was going to delay, see below.

I just reproduced the crash. I entered the following commands via
CLI interface:

(gdb) set target-async on
(gdb) set non-stop
(gdb) target extended-remote :2000
(gdb) file mt

Everything is OK so far. "mt" is not interesting, just the simple
application with 4 sleeping threads.

Then gdb crashes during attach:

(gdb) attach 24291
Attached to process 24291
[New Thread 24291.24291]
[New Thread 24291.24292]
[New Thread 24291.24293]
[New Thread 24291.24294]
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000375faf21ce in __lll_lock_wait_private () from /lib64/libc.so.6
(gdb)
[Thread 24291.24293] #3 stopped.
0x000000375faf21ce in __lll_lock_wait_private () from /lib64/libc.so.6

[Thread 24291.24292] #2 stopped.
0x000000375fad65cb in read () from /lib64/libc.so.6

[Thread 24291.24291] #1 stopped.
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
inline-frame.c:335: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.
A problem internal to GDB has been detected,

And I think this is because of %Stop issues.
Post by Jan Kratochvil
From gdb.info
Because the
notification mechanism is unreliable, the stub is permitted to resend a
stop reply notification if it believes GDB may not have received it.
GDB ignores additional stop reply notifications received before it has
finished processing a previous notification and the stub has completed
sending any queued stop events.

So I assumed it is always safe to resend the notification unless gdb already
sent vStopped. Since it is not clear to me when it makes sense to resend it,
currently gdbstub does re-send every time /proc/ugdb reports the new event
(T00 in this case). I agree this is not optimal, but this looks correct to me.

However, gdb.info also states:

Only one stop reply notification at a time may be pending; if
additional stop events occur before GDB has acknowledged the previous
notification, they must be queued by the stub for later synchronous
transmission in response to `vStopped' packets from GDB.

That is why gdbstub re-sends the same notification, until it gets vStopped.
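
A minimal sketch, in C, of the queueing rule quoted above, for a stub that sends each notification once and never resends it. The names (stop_queue, sq_report_stop, sq_vstopped) are made up for illustration and are not taken from gdbstub or gdbserver; the fixed-size array and the drop-on-overflow policy are simplifying assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define SQ_MAX 16

/* Hypothetical stub-side state: stop replies GDB has not acknowledged yet. */
struct stop_queue {
    const char *ev[SQ_MAX]; /* e.g. "T00thread:p5ee3.5ee4;" */
    size_t head, tail;      /* head: oldest unacked event, tail: next free slot */
    bool notified;          /* a %Stop was sent and the queue is not drained yet */
};

/* A thread stopped.  Returns the event to send as "%Stop:<event>" when no
   notification is pending, or NULL when the event is only queued (or, in
   this sketch, dropped because the array is full). */
const char *sq_report_stop(struct stop_queue *q, const char *event)
{
    if (q->tail == SQ_MAX)
        return NULL;
    q->ev[q->tail++] = event;
    if (q->notified)
        return NULL;
    q->notified = true;
    return event;
}

/* GDB sent vStopped.  This acknowledges the event reported last and returns
   the next queued stop reply, or "OK" when nothing is left to report. */
const char *sq_vstopped(struct stop_queue *q)
{
    if (q->notified && q->head < q->tail)
        q->head++;                 /* previous report is now acknowledged */
    if (q->head < q->tail)
        return q->ev[q->head];     /* acked by the following vStopped */
    q->notified = false;
    q->head = q->tail = 0;
    return "OK";
}
```

With this bookkeeping a vStopped that arrives when nothing is pending simply gets "OK", which is the situation discussed further down.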

Now let's look into the log:

=> vAttach;5ee3
<= OK
=> qfThreadInfo
<= mp5ee3.5ee3,p5ee3.5ee4,p5ee3.5ee5,p5ee3.5ee6
=> qsThreadInfo
<= l
=> Hgp5ee3.5ee3
<= OK
=> vCont?
<= vCont;t;c;C;s;S
=> vCont;t:p5ee3.-1
<= OK

Note: gdbstub reports OK before any thread actually stops. I believe
this is correct from the remote protocol pov, and this is what we want.
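
For readers following the log: the thread-ids use the multiprocess syntax pPID.TID with hexadecimal fields, so p5ee3.5ee4 is process 24291, thread 24292, and -1 stands for "all". A small parsing sketch, assuming only this syntax; the struct and function names are hypothetical, and strtol() makes the parser more lenient than a real stub would probably want.

```c
#include <stdlib.h>

/* Parsed thread-id; -1 in either field means "all", as in vCont;t:p5ee3.-1. */
struct thread_id {
    long pid, tid;
};

/* Parse one hexadecimal field or the literal "-1", advancing *s past it.
   Note: strtol() also accepts leading blanks and a "0x" prefix. */
static int parse_field(const char **s, long *out)
{
    char *end;

    if ((*s)[0] == '-' && (*s)[1] == '1') {
        *out = -1;
        *s += 2;
        return 0;
    }
    *out = strtol(*s, &end, 16);
    if (end == *s || *out < 0)
        return -1;
    *s = end;
    return 0;
}

/* Parse "pPID.TID" or "pPID"; the id must end at NUL, ';' or ','.
   Returns 0 on success and -1 on malformed input. */
int parse_thread_id(const char *s, struct thread_id *id)
{
    if (*s++ != 'p')
        return -1;
    if (parse_field(&s, &id->pid))
        return -1;
    if (*s == '.') {
        s++;
        if (parse_field(&s, &id->tid))
            return -1;
    } else {
        id->tid = -1; /* "pPID" alone is equivalent to "pPID.-1" */
    }
    return (*s == '\0' || *s == ';' || *s == ',') ? 0 : -1;
}
```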

<= Stop:T00thread:p5ee3.5ee4;

Some thread actually stops, we sent the notification.

<= Stop:T00thread:p5ee3.5ee4;

Another thread stops, gdbstub resends the same notification according
to the docs above (or according to my understanding).

Note: this doesn't happen _every time_. In the more likely case all
threads are already stopped when ->poll() succeeds. But sometimes
some thread stops a little bit later.

Once again, please note that both notifications are the same thing,
but I guess gdb doesn't understand this, see below.

Then,

=> vStopped
<= T00thread:p5ee3.5ee3;
=> vStopped
<= T00thread:p5ee3.5ee6;
=> vStopped
<= T00thread:p5ee3.5ee5;
=> vStopped
<= OK
=> Hgp5ee3.5ee4
<= OK
=> g
<= 00feffffffffffffa066d65f37000000ffffffffffffffff00040000000...

[...snip a lot of $m packets ]

everything is fine so far. Then,

=> vCont;t:p5ee3.-1
<= OK

Well. I hope this 'OK' without the subsequent notifications matches
the documentation:

vCont[;ACTION[:THREAD-ID]]...

...
The `t' action is only relevant in non-stop mode
...
A stop reply should be generated for any affected thread
not already stopped.

IIUC, "already stopped" means "already reported as stopped to gdb".
So gdbstub replies 'OK' and doesn't send any %Stop packets, but gdb
seems to expect the new STOP-REPLY packets:

=> m375fad65cb,1
<= 48
=> m375fad65cb,1
<= 48
=> vStopped

And what should I do in this case??? Probably, this vStopped pairs with the
_second_ notification above. But gdbstub has already acked this notification
during the previous vStopped sequence. E01? This seems to confuse gdb.
Post by Jan Kratochvil
From gdb.info
`vStopped'
In non-stop mode (*note Remote Non-Stop::), acknowledge a previous
stop reply and prompt for the stub to report another one.

Reply:
`Any stop packet'
if there is another unreported stop event (*note Stop Reply
Packets::)

`OK'
if there are no unreported stop events

So I am sending 'OK' because there are no unreported stop events. But this
seems to confuse gdb, it thinks this 'OK' acks the second notification,

<= OK
=> Hgp5ee3.5ee3
<= OK
=> g
<= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
=> m33af60e57d,1
<= 48
=> m33af60e57d,1
<= 48
=> Hgp5ee3.5ee6
<= OK
=> g
<= 00feffffffffffffa066d65f37000000ffffffffffffffff02000000000...
=> m375faf21ce,1
<= 89
=> m375faf21ce,1
<= 89
=> Hgp5ee3.5ee3
<= OK
=> g
<= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
=> m33af60e57d,1
<= 48
=> m33af60e57d,1
<= 48
=> Hgp5ee3.5ee5
<= OK
=> g
<= 00feffffffffffffa066d65f37000000ffffffffffffffff02000000000...
=> m375faf21ce,1
<= 89
=> m375faf21ce,1
<= 89
=> Hgp5ee3.5ee3
<= OK
=> g
<= fefdffffffffffff0000000000000000ffffffffffffffff02000000000...
=> m33af60e57d,1
<= 48
=> m33af60e57d,1
<= 48
=> Hgp5ee3.5ee4
<= OK

Note: _this_ thread was reported twice via %Stop.

=> g
<= 00feffffffffffffa066d65f37000000ffffffffffffffff00040000000...

Amen, gdb crashes. Indeed, it has already looked at this thread (see
another Hgp5ee3.5ee4 above).

Jan, I am not sure but _IIRC_ I observed other scenarios when gdb
crashes during the attach, but can't reproduce right now.

==========================================================================
Now, let's talk about %Stop.

I must admit, I believe the idea behind %Stop in its current state
is not very good. First of all, it is not clear how this all can
be implemented correctly. Forget about the multithreading, consider
the simplest case: gdb traces a single thread, this thread stops,
gdbserver sends '%Stop:T00thread:pPID.PID;'.
After receiving a stop reply notification, GDB shall acknowledge it
by sending a `vStopped' packet (*note vStopped packet::) as a regular,
synchronous request to the stub. Such acknowledgment is not required
to happen immediately, as GDB is permitted to send other, unrelated
packets to the stub first, which the stub should process normally.

Very nice. Suppose that, before sending vStopped, gdb sends 'D;PID'.
Then it sends vStopped. How should gdbstub reply?

- OK seems incorrect, it acks the previous T00 but this
thread/process is already detached.

- E01? probably, but this is not documented and surely
it is not right if we have other events to reply
(say, multiple inferiors).

- But, any other reply (especially if we have other stop
events to reply) acks the previous T00 which is no longer
true!

Or, instead of detach from gdb, suppose that the tracee changes
its state by the time gdb sends vStopped in reply to %Stop. Say, it
is SIGKILL'ed. There is no way to let gdb know its state was already
changed. We can only ack the state which was reported previously.
And there is no way to inform gdb there is nothing new and nothing
to ack because the previous notification was already acked (like
it happens during the crash).

And probably this crash (if my understanding is correct) at least
proves that the current scheme is not very convenient.


I do not suggest to discuss this right now, but perhaps we can have
a stateless notification? Say, just '%Stop#..' which informs gdb
it has some events to get via 'vStopped'. In this case any reply
to vStopped does not ack the history, but reports the new event or
'OK' if no more events. This is at least very understandable and
clear. And simpler.
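
To illustrate the proposal (this is not part of the current remote protocol), here is a rough C sketch of a stub where the notification is only a payload-free hint and every vStopped reply just pops the next event. All names are invented for the example. Because nothing is ever "acked", events of a detached or killed process can be dropped before gdb fetches them.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_MAX 16

struct stop_event {
    long pid;
    const char *reply; /* e.g. "T00thread:p5ee3.5ee4;" */
};

struct event_fifo {
    struct stop_event ev[FIFO_MAX];
    size_t n;
    bool hinted; /* a bare "%Stop" hint was sent, gdb has not drained us yet */
};

/* Queue an event.  Returns true if a bare "%Stop" hint should be sent now.
   In this sketch a full queue silently drops the event and returns false. */
bool fifo_push(struct event_fifo *f, long pid, const char *reply)
{
    if (f->n == FIFO_MAX)
        return false;
    f->ev[f->n].pid = pid;
    f->ev[f->n].reply = reply;
    f->n++;
    if (f->hinted)
        return false;
    f->hinted = true;
    return true;
}

/* The process went away (detach, SIGKILL): forget its unreported events. */
void fifo_drop_pid(struct event_fifo *f, long pid)
{
    size_t i, kept = 0;

    for (i = 0; i < f->n; i++)
        if (f->ev[i].pid != pid)
            f->ev[kept++] = f->ev[i];
    f->n = kept;
}

/* Reply to vStopped: the next event, or "OK" when there are no more. */
const char *fifo_vstopped(struct event_fifo *f)
{
    const char *reply;
    size_t i;

    if (f->n == 0) {
        f->hinted = false; /* the next event triggers a new hint */
        return "OK";
    }
    reply = f->ev[0].reply;
    for (i = 1; i < f->n; i++)
        f->ev[i - 1] = f->ev[i];
    f->n--;
    return reply;
}
```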

Oleg.
Oleg Nesterov
2010-08-03 13:14:36 UTC
Forgot to mention,
Post by Oleg Nesterov
So I assumed it is always safe to resend the notification unless gdb already
sent vStopped. Since it is not clear to me when it makes sense to resend it,
currently gdbstub does re-send every time /proc/ugdb reports the new event
(T00 in this case). I agree this is not optimal, but this looks correct to me.
I'll change gdbstub to never resend the notification to avoid the problem.
But probably gdb should be fixed anyway.

And, now I recalled why I added resend into the initial code. This is
because I hit another minor problem which I misinterpreted as if gdb
can miss the notification.

To avoid the unnecessary details, consider the oversimplified example,

$ sleep 10000&
[1] 2923

$ cat > SLEEP
set target-async on
set non-stop
target extended-remote :2000
file /bin/sleep
attach 2923
info registers
detach
^D

$ gdb <SLEEP
GNU gdb (GDB) 7.1
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) (gdb) (gdb) Remote debugging using :2000
(gdb) Reading symbols from /bin/sleep...(no debugging symbols found)...done.
(gdb) Attached to process 2923
[New Thread 2923.2923]
Target is executing.
(gdb) Detached from remote process 2923.
(gdb) quit

And yes, gdb ignores %Stop and just detaches. But this is because
of another issue (which looks like a minor gdb bug to me), note the

"Target is executing."

above. This is the reply to "info registers". Why? OK, yes, it is
executing. Then send vCont:t ? "attach PID" means attach and stop
it, no? And note, the same commands work as expected in CLI mode.

I also tried to add "interrupt" before "info registers", this doesn't
help although in this case gdb does send vStopped.


Can't resist, I spent a lot of time trying to understand what is
wrong. Because at first I played with the real gdbserver via CLI
to ensure everything works as I expect, then I tried to achieve the
same results with /proc/ugdb doing "$ gdb < BATCH_FILE" with the
same commands.

Oleg.
Jan Kratochvil
2010-08-03 13:36:27 UTC
Post by Oleg Nesterov
Post by Jan Kratochvil
Trying it with both /bin/sleep and a threaded testcase and I never got a crash
(kernel-2.6.33.6-147.fc13.x86_64 as both host and KVM guest OS).
To clarify, let me repeat: I never saw such a crash with the real
gdbserver, but this often happens in my testing.
I am also testing it with your ugdb.c and gdbstub.
Post by Oleg Nesterov
Post by Jan Kratochvil
So I assumed it is always safe to resend the notification unless gdb already
sent vStopped. Since it is not clear to me when it makes sense to resend it,
currently gdbstub does re-send every time /proc/ugdb reports the new event
(T00 in this case). I agree this is not optimal, but this looks correct to me.
I'll change gdbstub to never resend the notification to avoid the problem.
Yes, I was just writing you such a reply.
Post by Oleg Nesterov
But probably gdb should be fixed anyway.
There are so many serious bugs in GDB affecting regular GDB usage...
Post by Oleg Nesterov
To avoid the unnecessary details, consider the oversimplified example,
$ sleep 10000&
[1] 2923
$ cat > SLEEP
set target-async on
set non-stop
target extended-remote :2000
file /bin/sleep
attach 2923
info registers
detach
^D
$ gdb <SLEEP
GNU gdb (GDB) 7.1
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) (gdb) (gdb) Remote debugging using :2000
(gdb) Reading symbols from /bin/sleep...(no debugging symbols found)...done.
(gdb) Attached to process 2923
[New Thread 2923.2923]
Target is executing.
(gdb) Detached from remote process 2923.
(gdb) quit
And yes, gdb ignores %Stop and just detaches. But this is because
of another issue (which looks like a minor gdb bug to me), note the
"Target is executing."
above. This is the reply to "info registers". Why? OK, yes, it is
executing.
Yes.
Post by Oleg Nesterov
Then send vCont:t ? "attach PID" means attach and stop it, no?
But it is not yet stopped that time.
Post by Oleg Nesterov
Can't resist, I spent a lot of time trying to understand what is wrong.
Nothing, you should wait till GDB reports the inferior has stopped. It is
easy/normal in the GDB testsuite and by FE (Front Ends). I understand it is
not convenient from the -ex or -x arguments. There could probably be some
async-only command besides `interrupt', also some `wait-till-stopped'.
Post by Oleg Nesterov
I tried to achieve the same results with /proc/ugdb doing
"$ gdb < BATCH_FILE" with the same commands.
Maybe you can write a new *.exp testcase for such testing.


Thanks,
Jan
Oleg Nesterov
2010-08-03 15:06:56 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Post by Oleg Nesterov
So I assumed it is always safe to resend the notification unless gdb already
sent vStopped. Since it is not clear to me when it makes sense to resend it,
currently gdbstub does re-send every time /proc/ugdb reports the new event
(T00 in this case). I agree this is not optimal, but this looks correct to me.
I'll change gdbstub to never resend the notification to avoid the problem.
Yes, I was just writing you such a reply.
Sure, will do.
Post by Jan Kratochvil
Post by Oleg Nesterov
But probably gdb should be fixed anyway.
There are so many serious bugs in GDB affecting regular GDB usage...
OK, so I assume that the current behaviour of gdbstub is correct, even
if stupid.
Post by Jan Kratochvil
Post by Oleg Nesterov
To avoid the unnecessary details, consider the oversimplified example,
$ sleep 10000&
[1] 2923
$ cat > SLEEP
set target-async on
set non-stop
target extended-remote :2000
file /bin/sleep
attach 2923
info registers
detach
^D
$ gdb <SLEEP
GNU gdb (GDB) 7.1
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) (gdb) (gdb) Remote debugging using :2000
(gdb) Reading symbols from /bin/sleep...(no debugging symbols found)...done.
(gdb) Attached to process 2923
[New Thread 2923.2923]
Target is executing.
(gdb) Detached from remote process 2923.
(gdb) quit
And yes, gdb ignores %Stop and just detaches. But this is because
of another issue (which looks like a minor gdb bug to me), note the
"Target is executing."
above. This is the reply to "info registers". Why? OK, yes, it is
executing.
Yes.
Post by Oleg Nesterov
Then send vCont:t ? "attach PID" means attach and stop it, no?
But it is not yet stopped that time.
Well. And how can I stop it?

Once again, this all works in CLI mode. And this looks very natural

(gdb) attach PID
(gdb) info registers

As a newbie user of gdb, I expected it is gdb who should take care
and stop the tracee after "attach". And please remember, "interrupt"
doesn't help.

OK, please ignore. Now that I know I can't trust 'gdb < BATCH' I do
not use this.
Post by Jan Kratochvil
Post by Oleg Nesterov
Can't resist, I spent a lot of time trying to understand what is wrong.
Nothing, you should wait till GDB reports the inferior has stopped.
Yes, yes, now I understand this. Once again, I was greatly confused
because I didn't know that CLI mode makes the difference. Even if I
enter the commands via copy-and-paste, gdb always "completes" this
attach before it reacts to "info registers".

And there were other issues which I didn't understand when I tried
to solve this problem...
Post by Jan Kratochvil
It is
easy/normal in the GDB testsuite
Hmm. How? probably the tests in testsuite wait for something which
looks like "[Thread 5683.5683] #1 stopped." from gdb?
Post by Jan Kratochvil
Post by Oleg Nesterov
I tried to achieve the same results with /proc/ugdb doing
"$ gdb < BATCH_FILE" with the same commands.
Maybe you can write a new *.exp testcase for such testing.
I guess you want me to learn /usr/bin/expect ;)

Oleg.
Jan Kratochvil
2010-08-03 15:55:21 UTC
Post by Oleg Nesterov
However, I do not really understand how this can work reliably in the
terms of remote protocol. Somehow this scheme relies on the fact that
gdb will send another vCont;t:pTGID.-1 _once again_ after the previous
vCont;t:pTGID.-1, and gdbserver can report the other threads via
Stop/vStopped. OK, I hope this doesn't matter.
attach_command_post_wait:

  /* At least the current thread is already stopped.  */

  /* In all-stop, by definition, all threads have to be already
     stopped at this point.  In non-stop, however, although the
     selected thread is stopped, others may still be executing.
     Be sure to explicitly stop all threads of the process.  This
     should have no effect on already stopped threads.  */
  if (non_stop)
    target_stop (pid_to_ptid (inferior->pid));
Post by Oleg Nesterov
Yes. And please note that at least in-kernel gdbstub can not use
libthread_db. But, I also hope that we can avoid it even if gdbstub
runs in user-space ?
As far as I understand it now in-kernel gdbserver does not need libthread_db
at all. The communication is based only on PID/TID anyway and on Linux we do
not need libthread_db to enumerate the TIDs of a PID.
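
As an illustration of enumerating TIDs without libthread_db, a user-space program on Linux can list the entries of /proc/PID/task. This is only a sketch: list_tids is a hypothetical helper, and the listing is racy (threads can be created or exit during the scan), so a real attach would probably need to rescan.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Store up to max thread IDs of pid in tids[], read from /proc/PID/task.
   Returns the number of threads seen (which may exceed max), or -1 if the
   directory cannot be opened. */
int list_tids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    struct dirent *de;
    DIR *dir;
    int n = 0;

    snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (!dir)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(de->d_name, &end, 10);

        if (end == de->d_name || *end != '\0')
            continue; /* skip "." and ".." */
        if (n < max)
            tids[n] = (pid_t)tid;
        n++;
    }
    closedir(dir);
    return n;
}
```

An in-kernel stub would not go through /proc at all; it can walk the thread group directly, which is the same information.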
Post by Oleg Nesterov
Post by Jan Kratochvil
2 Thread 23487.23490 0x00007fb25c983a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 23487.23487 0x00007fb25cc63fbd in pthread_join (threadid=140404033701648, thread_return=0x0) at pthread_join.c:89
as provided by local linux-nat.c / linux-thread-db.c.
2 Thread 0x7ffff7842710 (LWP 23503) 0x00007ffff78e9a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7ffff7ff3700 (LWP 23500) 0x00007ffff7bc9fbd in pthread_join (threadid=140737346021136, thread_return=0x0) at pthread_join.c:89
The LWP -> thread_t conversion could be done later from the client side only
libthread_db (td_ta_map_lwp2thr(), td_thr_get_info(),
typeof (td_thrinfo_t->ti_tid) = thread_t)
Cough. This is black magic to me ;)
This was more just a comment on how to bring FSF gdbserver (or ugdb) on par with
local linux-nat.c. This is a future extension; so far I believe we should
rather get ugdb on par with FSF gdbserver.

FSF gdbserver itself already does not support displaying pthread_t IIUC.
Post by Oleg Nesterov
I probably understand what pthread_t is, but I do not know how/if this
is important for gdb.
As all the pthread_* functions in inferior use pthread_t it is generally
useful to be able to associate pthread_t inferior values with the live threads
being debugged.
Post by Oleg Nesterov
Should I worry about this issue right now?
The pthread_t / libthread_db stuff should be probably only the GDB client
problem. You will probably never face it.
Post by Oleg Nesterov
And, I hope that "client side only using: libthread_db" means gdb, not
gdbserver ?
yes, I have meant it that way.
Post by Oleg Nesterov
Post by Jan Kratochvil
why do you want from GDB to use specifically `vCont:c'?
Because, first of all, I wanted to understand why gdb doesn't send
vCont:c to me, but uses this command when it works with the real
gdbserver.
Anyway I see GDB uses `vCont;c' even with gdbstub so this problem is not
reproducible for me.

=> $vCont;c:p658.658
<= $OK
Post by Oleg Nesterov
Post by Jan Kratochvil
Post by Oleg Nesterov
But probably gdb should be fixed anyway.
There are so many serious bugs in GDB affecting regular GDB usage...
OK, so I assume that the current behaviour of gdbstub is correct, even
if stupid.
Yes, so far I do not see problems on the gdbstub side.
Post by Oleg Nesterov
Post by Jan Kratochvil
Post by Oleg Nesterov
Then send vCont:t ? "attach PID" means attach and stop it, no?
But it is not yet stopped that time.
Well. And how can I stop it?
Once again, this all works in CLI mode. And this looks very natural
What do you call CLI mode? We use only CLI mode so far all the time. The
other mode is MI (`gdb -i=mi').

We can talk about local (=linux-nat.c, default without `target *remote') mode
or remote (=remote.c, `target extended-remote') mode.


With CLI remote mode with FSF gdbserver it also errors on:
killall gdbserver gdbstub;~/redhat/threadit&p=$!;./gdbserver/gdbserver --remote-debug --multi :2000 &>~/redhat/out&sleep 0.1;./gdb -nx -ex 'set target-async on' -ex 'set non-stop' -ex "file $HOME/redhat/threadit" -ex 'target extended-remote :2000' -ex "attach $p" -ex 'set confirm no' -ex bt;kill $p;
...
Target is executing.

With CLI remote mode with gdbstub it also errors on:
killall -9 gdbserver gdbstub;sleep 0.5;~/redhat/threadit&p=$!;~/redhat/gdbstub &>~/redhat/out&sleep 0.1;./gdb -nx -ex 'set target-async on' -ex 'set non-stop' -ex "file $HOME/redhat/threadit" -ex 'target extended-remote :2000' -ex "attach $p" -ex 'set confirm no' -ex bt;kill $p
...
Target is executing.

With CLI local mode it prints a bogus backtrace:
killall -9 gdbserver gdbstub;sleep 0.5;~/redhat/threadit&p=$!;sleep 0.1;./gdb -nx -ex 'set target-async on' -ex 'set non-stop' -ex "file $HOME/redhat/threadit" -ex "attach $p" -ex 'set confirm no' -ex bt;kill $p
...
#0 0x00007fbc1989cfbd in ?? ()
#1 0x00007fff7f643110 in ?? ()
#2 0x00007fbc19ac5795 in ?? ()
#3 0x00007fbc1989ce90 in ?? ()
#4 0x00007fbc19515d38 in ?? ()
#5 0x00007fbc177129e0 in ?? ()
#6 0x0000000000000000 in ?? ()
later:
#0 0x00007fbc1989cfbd in pthread_join (threadid=140445855340304, thread_return=0x0) at pthread_join.c:89
#1 0x000000000040074c in main () at threadit.c:30
Post by Oleg Nesterov
As a newbie user of gdb, I expected it is gdb who should take care
and stop the tracee after "attach".
This was this way in the sync/all-stop mode. In async/non-stop mode it is
intentionally asynchronous:

attach_command:
if (target_can_async_p ())
add_inferior_continuation (attach_command_continuation, a,
attach_command_continuation_free_args);
return;

It may take some time before this inferior reports us back the stop and we may
want to debug other inferiors in the meantime.
Post by Oleg Nesterov
And please remember, "interrupt" doesn't help.
You need "wait-till-inferior-completes-interrupt" (which does not exist),
not "interrupt" which means "start-asynchronous-inferior-interruption".
Post by Oleg Nesterov
Yes, yes, now I understand this. Once again, I was greatly confused
because I didn't know that CLI mode makes the difference. Even if I
^^^ local (=non-remote) probably
Post by Oleg Nesterov
enter the commands via copy-and-paste, gdb always "completes" this
attach before it reacts to "info registers".
It does not work before the target stops anyway in the local mode. Trying to
access thread data while the thread is running can give stale data without
printing an error. It is probably expected FE (Front End - such as Eclipse)
takes care of it anyway.
Post by Oleg Nesterov
Post by Jan Kratochvil
It is easy/normal in the GDB testsuite
Hmm. How? probably the tests in testsuite wait for something which
looks like "[Thread 5683.5683] #1 stopped." from gdb?
Yes, see gdb/testsuite/gdb.mi/mi-nonstop.exp for all its mi_expect_stop calls.
But that is the MI mode, not CLI mode, with commands like:
-break-insert -t main


Thanks,
Jan
Oleg Nesterov
2010-08-03 16:53:59 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
However, I do not really understand how this can work reliably in the
terms of remote protocol. Somehow this scheme relies on the fact that
gdb will send another vCont;t:pTGID.-1 _once again_ after the previous
vCont;t:pTGID.-1, and gdbserver can report the other threads via
Stop/vStopped. OK, I hope this doesn't matter.
/* At least the current thread is already stopped. */
/* In all-stop, by definition, all threads have to be already
stopped at this point. In non-stop, however, although the
selected thread is stopped, others may still be executing.
Be sure to explicitly stop all threads of the process. This
should have no effect on already stopped threads. */
if (non_stop)
target_stop (pid_to_ptid (inferior->pid));
This just reflects the current situation with the current implementation.
gdb already did

vAttach;PID
vCont;t:pPID.-1

I do not see anything in the _documentation_ which could explain that
only the main thread can be stopped despite the fact "-1" means all
threads.

Once again, I already understand why gdb + gdbserver work this way,
I meant remote protocol "in general".

And in fact, I do not think your explanation is correct. Yes, this
attach_command_post_wait() is called during attach. But even after that
gdbserver reports only the main thread. This happens before qSymbol
stage.

Only when gdb issues yet _another_ vCont;t:pPID.-1 after that, gdbserver
reports other threads, and this has nothing to do (I think) with
attach_command_post_wait(). In fact, to me this all works _contrary_
to the comment above.

Be sure to explicitly stop all threads of the process.

This doesn't happen.

But, it is very possible I missed something. And again, I think (I hope ;)
we can forget this because the simple method works too.
Post by Jan Kratochvil
Post by Oleg Nesterov
Yes. And please note that at least in-kernel gdbstub can not use
libthread_db. But, I also hope that we can avoid it even if gdbstub
runs in user-space ?
As far as I understand it now in-kernel gdbserver does not need libthread_db
at all. The communication is based only on PID/TID anyway and on Linux we do
not need libthread_db to enumerate the TIDs of a PID.
Yes, thanks. This is what I meant.

I was afraid there is some other reason why we can't avoid libthread_db.
Post by Jan Kratochvil
Post by Oleg Nesterov
Should I worry about this issue right now?
The pthread_t / libthread_db stuff should probably be only a GDB client
problem. You will probably never face it.
Post by Oleg Nesterov
And, I hope that "client side only using: libthread_db" means gdb, not
gdbserver ?
yes, I have meant it that way.
Great, thanks.
Post by Jan Kratochvil
Post by Oleg Nesterov
Post by Jan Kratochvil
why do you want GDB to use specifically `vCont:c'?
Because, first of all, I wanted to understand why gdb doesn't send
vCont:c to me, but uses this command when it works with the real
gdbserver.
Anyway I see GDB uses `vCont;c' even with gdbstub so this problem is not
reproducible for me.
=> $vCont;c:p658.658
<= $OK
Yes. From another email from me:

OK. After I read remote_vcont_resume()->remote_vcont_probe() path
I understand why 'vCont;c;t' doesn't work. Contrary to what the
documentation says it is all or nothing. Except 't' has the separate
remote_state->support_vCont_t flag. Very strange.

That is why gdbstub reports 'vCont;t;c;C;s;S' to 'vCont?', despite
the fact it doesn't implement C/s/S yet.
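
The "all or nothing" behaviour described above can be modelled roughly like this. It is a sketch of the check as I describe it here, not a copy of remote_vcont_probe(); the struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the client-side check; names are illustrative,
   not gdb's.  REPLY is the stub's answer to "vCont?", for example
   "vCont;t;c;C;s;S". */
struct vcont_support {
    bool usable;    /* client is willing to send vCont;c, vCont;s, ... */
    bool has_t;     /* 't' (stop) is tracked by a separate flag */
};

struct vcont_support probe_vcont(const char *reply)
{
    struct vcont_support r = { false, false };
    bool c = false, C = false, s = false, S = false;
    const char *p;

    if (strncmp(reply, "vCont", 5) != 0)
        return r;

    /* Each supported action is a single letter following a ';'. */
    for (p = strchr(reply + 5, ';'); p != NULL; p = strchr(p + 1, ';')) {
        /* Ignore anything that is not exactly one letter long. */
        if (p[1] == '\0' || (p[2] != ';' && p[2] != '\0'))
            continue;
        switch (p[1]) {
        case 'c': c = true; break;
        case 'C': C = true; break;
        case 's': s = true; break;
        case 'S': S = true; break;
        case 't': r.has_t = true; break;
        }
    }
    /* All or nothing: a missing c, C, s or S disables vCont entirely. */
    r.usable = c && C && s && S;
    return r;
}
```

With this model 'vCont;c;t' gives usable == false, which matches what I saw, while 'vCont;t;c;C;s;S' enables everything.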
Post by Jan Kratochvil
Post by Oleg Nesterov
Post by Jan Kratochvil
But it is not yet stopped that time.
Well. And how can I stop it?
Once again, this all works in CLI mode. And this looks very natural
What do you call CLI mode? We use only CLI mode so far all the time. The
other mode is MI (`gdb -i=mi').
Sorry for confusion, I do not know how to name it correctly...

I can start gdb and then enter these commands by hand, this is what
I called CLI (command line interface ;). And in this case everything
works as I expected.

Or I can put these commands into the file FILE, and then do "$ gdb < FILE".
In this case "attach" + "info registers" doesn't work.

I didn't try "gdb -nx -ex ..." method yet.

Otherwise, I always use remote mode.
Post by Jan Kratochvil
This was this way in the sync/all-stop mode. In async/non-stop mode it is
if (target_can_async_p ())
add_inferior_continuation (attach_command_continuation, a,
attach_command_continuation_free_args);
return;
It may take some time before this inferior reports us back the stop and we may
want to debug other inferiors in the meantime.
Yes, I see,
Post by Jan Kratochvil
Post by Oleg Nesterov
Yes, yes, now I understand this. Once again, I was greatly confused
because I didn't know that CLI mode makes the difference. Even if I
^^^ local (=non-remote) probably
no, please see above. And sorry for confusion again.


So. To summarise, I never claimed this is a bug. OTOH, I think this
difference can confuse a newbie like me.

Yes, I do understand vAttach issues, but I thought that "attach"
command should always hide these details. From the documentation:

attach PROCESS-ID

...
The first thing GDB does after arranging to debug the specified
process is to stop it. You can examine and modify an attached process
with all the GDB commands that are ordinarily available when you start
processes with `run'. You can insert breakpoints; you can step and
continue; you can modify storage. If you would rather the process
continue running, you may use the `continue' command after attaching
GDB to the process.


Oleg.
Jan Kratochvil
2010-08-03 18:36:40 UTC
Post by Oleg Nesterov
Post by Jan Kratochvil
Post by Oleg Nesterov
However, I do not really understand how this can work reliably in the
terms of remote protocol. Somehow this scheme relies on the fact that
gdb will send another vCont;t:pTGID.-1 _once again_ after the previous
vCont;t:pTGID.-1, and gdbserver can report the other threads via
Stop/vStopped. OK, I hope this doesn't matter.
/* At least the current thread is already stopped. */
/* In all-stop, by definition, all threads have to be already
stopped at this point. In non-stop, however, although the
selected thread is stopped, others may still be executing.
Be sure to explicitly stop all threads of the process. This
should have no effect on already stopped threads. */
if (non_stop)
target_stop (pid_to_ptid (inferior->pid));
This just reflects the current situation with the current implementation.
gdb already did
vAttach;PID
vCont;t:pPID.-1
I do not see anything in the _documentation_ which could explain that
only the main thread can be stopped despite the fact "-1" means all
threads.
"-1" really means all threads - all those gdbserver knows about that time.

Anyway this double-stop issue is gdbserver/libthread_db specific and offtopic
for ugdb.
Post by Oleg Nesterov
Once again, I already understand why gdb + gdbserver work this way,
I meant remote protocol "in general".
In remote protocol - and even internally in gdbserver - "-1" really always
means all the (currently known) threads.
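
A minimal sketch of how a stub might parse the "pPID.TID" syntax seen in these packets, with -1 kept as the "all threads" marker. The type and function names are hypothetical; which threads "all" expands to is then up to the stub (the ones it currently knows about).

```c
#include <stdlib.h>

/* Illustrative type, not gdb's.  A tid of -1 stands for "all threads". */
struct rsp_ptid {
    long pid;
    long tid;
};

/* Parse one hex id or the literal "-1"; returns a pointer just past it,
   or NULL if nothing valid was found. */
static const char *parse_id(const char *s, long *val)
{
    char *end;

    if (s[0] == '-' && s[1] == '1') {   /* the only negative form */
        *val = -1;
        return s + 2;
    }
    if (s[0] == '-')
        return NULL;
    *val = strtol(s, &end, 16);
    return end == s ? NULL : end;
}

/* Parse "pPID.TID" or "pPID".  Returns 0 on success, -1 on bad input. */
int parse_ptid(const char *s, struct rsp_ptid *out)
{
    if (*s++ != 'p')
        return -1;
    s = parse_id(s, &out->pid);
    if (s == NULL)
        return -1;
    if (*s == '\0') {           /* "pPID" alone is the same as "pPID.-1" */
        out->tid = -1;
        return 0;
    }
    if (*s++ != '.')
        return -1;
    s = parse_id(s, &out->tid);
    return (s != NULL && *s == '\0') ? 0 : -1;
}
```

For "p517.-1" this yields pid 0x517 and tid -1.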
Post by Oleg Nesterov
And in fact, I do not think your explanation is correct. Yes, this
attach_command_post_wait() is called during attach. But even after that
gdbserver reports only the main thread. This happens before qSymbol
stage.
This attach_command_post_wait code is executed after the qSymbol command.

The first single-thread vCont:

#0 putpkt (buf=0x1f348b0 "vCont;t:p517.-1") at remote.c:6730
#1 in remote_stop_ns (ptid=...) at remote.c:4709
#2 in remote_stop (ptid=...) at remote.c:4747
#3 in target_stop (ptid=...) at target.c:3031
#4 in attach_command (args=0x7fffffffd861 "1303", from_tty=1) at infcmd.c:2436
#5 in do_cfunc (c=0x1db8bf0, args=0x7fffffffd861 "1303", from_tty=1) at ./cli/cli-decode.c:67
#6 in cmd_func (cmd=0x1db8bf0, args=0x7fffffffd861 "1303", from_tty=1) at ./cli/cli-decode.c:1771
#7 in execute_command (p=0x7fffffffd864 "3", from_tty=1) at top.c:422
#8 in catch_command_errors (command=0x48a3e3 <execute_command>, arg=0x7fffffffd85a "attach 1303", from_tty=1, mask=6) at exceptions.c:534
#9 in captured_main (data=0x7fffffffd360) at ./main.c:887

The second all-threads vCont:

#0 putpkt (buf=0x1f4ecb0 "vCont;t:p517.-1") at remote.c:6730
#1 in remote_stop_ns (ptid=...) at remote.c:4709
#2 in remote_stop (ptid=...) at remote.c:4747
#3 in target_stop (ptid=...) at target.c:3031
#4 in attach_command_post_wait (args=0x1f3b6f0 "1303", from_tty=1, async_exec=0) at infcmd.c:2334
#5 in attach_command_continuation (args=0x1f3b6a0) at infcmd.c:2355
#6 in do_my_cleanups (pmy_chain=0x7fffffffcd08, old_chain=0x0) at utils.c:421
#7 in do_all_inferior_continuations () at utils.c:692
#8 in inferior_event_handler (event_type=INF_EXEC_COMPLETE, client_data=0x0) at inf-loop.c:96
#9 in fetch_inferior_event (client_data=0x0) at infrun.c:2649
#10 in fetch_inferior_event_wrapper (client_data=0x0) at inf-loop.c:169
#11 in catch_errors (func=0x6b4287 <fetch_inferior_event_wrapper>, func_args=0x0, errstring=0xe378dd "", mask=6) at exceptions.c:518
#12 in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at inf-loop.c:65
#13 in remote_async_serial_handler (scb=0x1f30b00, context=0x0) at remote.c:10317
#14 in push_event (context=0x1f30b00) at ser-base.c:176
#15 in handle_timer_event (dummy=...) at event-loop.c:1306
#16 in process_event () at event-loop.c:399
#17 in gdb_do_one_event (data=0x0) at event-loop.c:452
#18 in catch_errors (func=0x6b0d2a <gdb_do_one_event>, func_args=0x0, errstring=0xe07943 "", mask=6) at exceptions.c:518
#19 in tui_command_loop (data=0x0) at ./tui/tui-interp.c:171
#20 in current_interp_command_loop () at interps.c:291
#21 in captured_command_loop (data=0x0) at ./main.c:227
#22 in catch_errors (func=0x47ff66 <captured_command_loop>, func_args=0x0, errstring=0xdc6967 "", mask=6) at exceptions.c:518
#23 in captured_main (data=0x7fffffffd360) at ./main.c:910
Post by Oleg Nesterov
But, it is very possible I missed something. And again, I think (I hope ;)
we can forget this because the simple method works too.
This discussion is really offtopic for ugdb.
Post by Oleg Nesterov
I was afraid there is some other reason why we can't avoid libthread_db.
Roland has correctly pointed out the TLS support. But that will come later.
Post by Oleg Nesterov
Yes, I do understand vAttach issues, but I thought that "attach"
attach PROCESS-ID
...
The first thing GDB does after arranging to debug the specified
process is to stop it. You can examine and modify an attached process
with all the GDB commands that are ordinarily available when you start
processes with `run'. You can insert breakpoints; you can step and
continue; you can modify storage. If you would rather the process
continue running, you may use the `continue' command after attaching
GDB to the process.
It is true that "attach" stops the inferior. It is just that the stop
completes only after the "attach" command returns.


Regards,
Jan
Kevin Buettner
2010-08-03 19:56:51 UTC
On Tue, 3 Aug 2010 15:14:36 +0200
Post by Oleg Nesterov
To avoid the unnecessary details, consider the oversimplified example,
$ sleep 10000&
[1] 2923
$ cat > SLEEP
set target-async on
set non-stop
target extended-remote :2000
file /bin/sleep
attach 2923
info registers
detach
^D
I'd be curious to know if the behavior improves when you omit
"set target-async on" and "set non-stop".

Kevin
Oleg Nesterov
2010-08-04 19:39:35 UTC
Post by Kevin Buettner
On Tue, 3 Aug 2010 15:14:36 +0200
Post by Oleg Nesterov
To avoid the unnecessary details, consider the oversimplified example,
$ sleep 10000&
[1] 2923
$ cat > SLEEP
set target-async on
set non-stop
target extended-remote :2000
file /bin/sleep
attach 2923
info registers
detach
^D
I'd be curious to know if the behavior improves when you omit
"set target-async on" and "set non-stop".
Yes, it works without target-async ;)

Oleg.
Kevin Buettner
2010-08-04 23:32:05 UTC
On Wed, 4 Aug 2010 21:39:35 +0200
Post by Oleg Nesterov
Post by Kevin Buettner
I'd be curious to know if the behavior improves when you omit
"set target-async on" and "set non-stop".
Yes, it works without target-async ;)
I think that using "set target-async on" is probably very buggy as it
is not used very often. (Well, at any rate, I never use it.) I
recommend that you avoid it too unless you have some very compelling
reason for turning it on.

"set non-stop" is probably less buggy in spite of the fact that it's a
much more recent addition. (The async stuff has been in GDB since
1999.) Still, I would think that using non-stop will turn up more bugs
in gdb than not using it. So, again, I'd suggest avoiding it at least
for the early phases of your prototype. At some point, of course,
you'll want to turn it on.

Kevin
Oleg Nesterov
2010-08-05 18:21:32 UTC
Kevin, I am sorry for the delays.

I try to avoid reading emails to make the new (heh, initial again ;)
all-in-kernel version asap.
Post by Kevin Buettner
On Wed, 4 Aug 2010 21:39:35 +0200
Post by Oleg Nesterov
Post by Kevin Buettner
I'd be curious to know if the behavior improves when you omit
"set target-async on" and "set non-stop".
Yes, it works without target-async ;)
I think that using "set target-async on" is probably very buggy as it
is not used very often. (Well, at any rate, I never use it.) I
recommend that you avoid it too unless you have some very compelling
reason for turning it on.
"set non-stop" is probably less buggy in spite of the fact that it's a
much more recent addition. (The async stuff has been in GDB since
1999.) Still, I would think that using non-stop will turn up more bugs
in gdb than not using it. So, again, I'd suggest avoiding it at least
for the early phases of your prototype. At some point, of course,
you'll want to turn it on.
Thanks Kevin.

I'll try to play with "set non-stop" when I have the new code working.

Oleg.
Jan Kratochvil
2010-08-03 12:39:40 UTC
Post by Oleg Nesterov
For example, I spent several hours trying to understand why gdb
ignores '%Stop:' notification and never sends '$vStopped', but
it does send vStopped to the real gdbserver with the same batch
file. The reason was partly my misunderstanding, but also another
bug in gdb and the timing issues.
Which bug in gdb do you mean? I do not have the problem reproducible, the
logs look OK.
=> $vCont;t:p5b84.-1
<= $OK
<= %Stop:T00thread:p5b84.5b84;
=> $vStopped
<= $T00thread:p5b84.5b87;
=> $vStopped
<= $OK

Are you aware of the comment before remote_get_pending_stop_replies()?
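
The exchange in the log above (one %Stop notification, then vStopped until the stub answers OK) could be modelled on the stub side roughly as follows. This is only a sketch with hypothetical names: it leaves out the '%'/'$' framing and checksums, and it does not handle resending an unacknowledged notification.

```c
#include <stdio.h>

#define MAX_PENDING 16

/* Illustrative stub-side state; names are hypothetical, not gdbserver's. */
struct stop_queue {
    char replies[MAX_PENDING][64];  /* e.g. "T00thread:p5b84.5b84;" */
    int head, count;
};

/* Queue one stop reply.  Returns -1 if the queue is full. */
int stop_queue_push(struct stop_queue *q, const char *reply)
{
    if (q->count == MAX_PENDING)
        return -1;
    snprintf(q->replies[(q->head + q->count) % MAX_PENDING],
             sizeof(q->replies[0]), "%s", reply);
    q->count++;
    return 0;
}

/* Remove and return the oldest pending reply, or NULL if there is none. */
static const char *stop_queue_pop(struct stop_queue *q)
{
    const char *r;

    if (q->count == 0)
        return NULL;
    r = q->replies[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    return r;
}

/* Body of the asynchronous notification for the first pending stop.
   Returns 0 (and writes nothing) if there is nothing to report. */
int stop_notification(struct stop_queue *q, char *out, size_t len)
{
    const char *r = stop_queue_pop(q);

    if (r == NULL)
        return 0;
    snprintf(out, len, "Stop:%s", r);
    return 1;
}

/* Answer to "vStopped": the next pending stop, or "OK" once drained. */
void answer_vstopped(struct stop_queue *q, char *out, size_t len)
{
    const char *r = stop_queue_pop(q);

    snprintf(out, len, "%s", r != NULL ? r : "OK");
}
```

Pushing the two stop replies from the log and then calling stop_notification() once and answer_vstopped() twice reproduces the three stub-side lines of that log.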
Post by Oleg Nesterov
Or vAttach in the multithreaded case. I'd say that gdbserver is just
wrong here, even if this works in practice. The first qfThreadInfo
after vAttach reports only the main thread.
Yes...

OK:
getpkt ("qXfer:threads:read::0,fff"); [no ack sent]
putpkt ("$l<threads>
<thread id="p5baa.5baa" core="2"/>
</threads>
#de"); [noack mode]
Post by Oleg Nesterov
After the first
vCont;t:PID.-1 only the main thread is reported again. Somehow it
provokes gdb to send more 'vCont;t:PID.-1's packets, only then it
reports the new threads via Stop/vStopped.
Only after GDB issues:

->server: getpkt ("qSymbol::"); [no ack sent]
[...]
<-server: putpkt ("$qSymbol:5f7468726561645f64625f6c6973745f745f6e657874#a0"); [noack mode]
= _thread_db_list_t_next
->server: getpkt ("qSymbol:7f4c9a0ab0bc:5f7468726561645f64625f6c6973745f745f6e657874"); [no ack sent]
= _thread_db_list_t_next

gdbserver can ask GDB for some symbols the gdbserver needs to provide to
libthread_db to be able to find out all the inferior threads using
libthread_db calling back gdbserver's ps_pglobal_lookup.

`qSymbol::'
Notify the target that GDB is prepared to serve symbol lookup
requests. Accept requests from the target for the values of
symbols.

Immediately at the moment of vAttach GDB may not be able to answer those
questions so it is delayed a bit.
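
To make the qSymbol lines above easier to read: the symbol name travels hex-encoded, and the trailing "#a0" is the packet checksum (sum of the payload bytes modulo 256). A small sketch of both, with helper names that are my own and not taken from gdb:

```c
#include <stddef.h>
#include <stdio.h>

/* Remote-protocol checksum: sum of the payload bytes modulo 256. */
unsigned char rsp_checksum(const char *payload)
{
    unsigned char sum = 0;

    while (*payload)
        sum += (unsigned char)*payload++;
    return sum;
}

/* Frame PAYLOAD as "$payload#xx".  Returns 0, or -1 if OUT is too small. */
int rsp_frame(const char *payload, char *out, size_t len)
{
    int n = snprintf(out, len, "$%s#%02x", payload,
                     (unsigned int)rsp_checksum(payload));

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex-encoded name such as the symbol in a qSymbol packet.
   Returns the number of decoded bytes, or -1 on bad input or overflow. */
int hex_decode(const char *hex, char *out, size_t len)
{
    size_t n = 0;

    if (len == 0)
        return -1;
    for (; hex[0] != '\0' && hex[1] != '\0'; hex += 2) {
        int hi = hex_val(hex[0]), lo = hex_val(hex[1]);

        if (hi < 0 || lo < 0 || n + 1 >= len)
            return -1;
        out[n++] = (char)(hi * 16 + lo);
    }
    if (hex[0] != '\0')         /* odd number of hex digits */
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

Decoding the hex string from the log gives _thread_db_list_t_next, and the checksum of that packet's payload comes out as 0xa0, matching the "#a0" shown.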
Post by Oleg Nesterov
At first I tried to mimic this behaviour, I was already totally
confused because I also had other problems with gdb - it constantly
crashed. But finally I have found that the simple approach seems to
work too.
Yes, I see:

gdb=> $vAttach;5bb7
gdb<= $OK
gdb=> $qfThreadInfo
gdb<= $mp5bb7.5bb7,p5bb7.5bba
gdb=> $qsThreadInfo
gdb<= $l
[...]
gdb=> $qSymbol::
gdb<= $
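
The qfThreadInfo reply in this simple approach is just the known threads in "pPID.TID" form, in hex, after a leading 'm'. A sketch of formatting it, assuming a hypothetical helper name and a list short enough for a single packet:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format the reply to qfThreadInfo for process PID
   with N thread ids in TIDS, as "mpPID.TID,pPID.TID,...".  All numbers
   are hex.  Returns 0, or -1 if OUT is too small.  A real stub may have
   to split a long list over several qsThreadInfo replies; this sketch
   does not. */
int format_thread_list(pid_t pid, const pid_t *tids, int n,
                       char *out, size_t len)
{
    size_t used = 0;
    int i;

    for (i = 0; i < n; i++) {
        int w = snprintf(out + used, len - used, "%sp%lx.%lx",
                         i == 0 ? "m" : ",",
                         (unsigned long)pid, (unsigned long)tids[i]);

        if (w < 0 || (size_t)w >= len - used)
            return -1;
        used += (size_t)w;
    }
    /* An empty list is reported with the end-of-list marker alone. */
    if (n == 0 && (len < 2 || snprintf(out, len, "l") != 1))
        return -1;
    return 0;
}
```

The following qsThreadInfo is then answered with "l" to end the list, as in the log above.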


Even FSF gdbserver does not seem to use/provide pthread_t identifiers:
2 Thread 23487.23490 0x00007fb25c983a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 23487.23487 0x00007fb25cc63fbd in pthread_join (threadid=140404033701648, thread_return=0x0) at pthread_join.c:89

as provided by local linux-nat.c / linux-thread-db.c.
2 Thread 0x7ffff7842710 (LWP 23503) 0x00007ffff78e9a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7ffff7ff3700 (LWP 23500) 0x00007ffff7bc9fbd in pthread_join (threadid=140737346021136, thread_return=0x0) at pthread_join.c:89

The LWP -> thread_t conversion could be done later from the client side only
using:
libthread_db (td_ta_map_lwp2thr(), td_thr_get_info(),
typeof (td_thrinfo_t->ti_tid) = thread_t)
Post by Oleg Nesterov
Right now I am trying to understand why gdb doesn't use 'vCont:c'
but sends 'c' instead. And yes, I report 'vCont;c;t' to 'vCont?'.
GDB can use both, why do you want GDB to use specifically `vCont:c'?
`c' uses the inferior-kind specified by `Hc' (server's cont_thread).


Thanks,
Jan
Oleg Nesterov
2010-08-03 14:30:04 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
For example, I spent several hours trying to understand why gdb
ignores '%Stop:' notification and never sends '$vStopped', but
it does send vStopped to the real gdbserver with the same batch
file. The reason was partly my misunderstanding, but also another
bug in gdb and the timing issues.
Which bug in gdb do you mean?
Heh. Funny that, I just sent another email which explains how I was
confused (in reply to my first "Q: %Stop && gdb crash" message).
Post by Jan Kratochvil
I do not have the problem reproducible, the
logs look OK.
=> $vCont;t:p5b84.-1
<= $OK
<= %Stop:T00thread:p5b84.5b84;
=> $vStopped
<= $T00thread:p5b84.5b87;
=> $vStopped
<= $OK
Yes, this is OK. And this is how this all works when I enter the
commands via CLI (probably -ex works too).
Post by Jan Kratochvil
Are you aware of the comment before remote_get_pending_stop_replies()?
Well, I read it before, but I was never able to parse it up to the end ;)
Post by Jan Kratochvil
Post by Oleg Nesterov
Or vAttach in the multithreaded case. I'd say that gdbserver is just
wrong here, even if this works in practice. The first qfThreadInfo
after vAttach reports only the main thread.
Yes...
getpkt ("qXfer:threads:read::0,fff"); [no ack sent]
putpkt ("$l<threads>
<thread id="p5baa.5baa" core="2"/>
</threads>
#de"); [noack mode]
Post by Oleg Nesterov
After the first
vCont;t:PID.-1 only the main thread is reported again. Somehow it
provokes gdb to send more 'vCont;t:PID.-1's packets, only then it
reports the new threads via Stop/vStopped.
->server: getpkt ("qSymbol::"); [no ack sent]
[...]
<-server: putpkt ("$qSymbol:5f7468726561645f64625f6c6973745f745f6e657874#a0"); [noack mode]
= _thread_db_list_t_next
->server: getpkt ("qSymbol:7f4c9a0ab0bc:5f7468726561645f64625f6c6973745f745f6e657874"); [no ack sent]
= _thread_db_list_t_next
gdbserver can ask GDB for some symbols the gdbserver needs to provide to
libthread_db to be able to find out all the inferior threads using
libthread_db calling back gdbserver's ps_pglobal_lookup.
Yes, I know.
Post by Jan Kratochvil
Immediately at the moment of vAttach GDB may not be able to answer those
questions so it is delayed a bit.
Yes, I have to take my "gdbserver is just wrong" back.

However, I do not really understand how this can work reliably in the
terms of remote protocol. Somehow this scheme relies on the fact that
gdb will send another vCont;t:pTGID.-1 _once again_ after the previous
vCont;t:pTGID.-1, and gdbserver can report the other threads via
Stop/vStopped. OK, I hope this doesn't matter.
Post by Jan Kratochvil
Post by Oleg Nesterov
At first I tried to mimic this behaviour, I was already totally
confused because I also had other problems with gdb - it constantly
crashed. But finally I have found that the simple approach seems to
work too.
gdb=> $vAttach;5bb7
gdb<= $OK
gdb=> $qfThreadInfo
gdb<= $mp5bb7.5bb7,p5bb7.5bba
gdb=> $qsThreadInfo
gdb<= $l
[...]
gdb<= $
Yes. And please note that at least in-kernel gdbstub can not use
libthread_db. But, I also hope that we can avoid it even if gdbstub
runs in user-space ?
Post by Jan Kratochvil
2 Thread 23487.23490 0x00007fb25c983a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 23487.23487 0x00007fb25cc63fbd in pthread_join (threadid=140404033701648, thread_return=0x0) at pthread_join.c:89
as provided by local linux-nat.c / linux-thread-db.c.
2 Thread 0x7ffff7842710 (LWP 23503) 0x00007ffff78e9a6d in nanosleep () at ../sysdeps/unix/syscall-template.S:82
* 1 Thread 0x7ffff7ff3700 (LWP 23500) 0x00007ffff7bc9fbd in pthread_join (threadid=140737346021136, thread_return=0x0) at pthread_join.c:89
The LWP -> thread_t conversion could be done later from the client side only
libthread_db (td_ta_map_lwp2thr(), td_thr_get_info(),
typeof (td_thrinfo_t->ti_tid) = thread_t)
Cough. This is black magic to me ;)

I probably understand what pthread_t is, but I do not know how/if this
is important for gdb.

Should I worry about this issue right now?

And, I hope that "client side only using: libthread_db" means gdb, not
gdbserver ?
Post by Jan Kratochvil
Post by Oleg Nesterov
Right now I am trying to understand why gdb doesn't use 'vCont:c'
but sends 'c' instead. And yes, I report 'vCont;c;t' to 'vCont?'.
GDB can use both,
Yes, I know.
Post by Jan Kratochvil
why do you want GDB to use specifically `vCont:c'?
Because, first of all, I wanted to understand why gdb doesn't send
vCont:c to me, but uses this command when it works with the real
gdbserver.

Also, I am trying to identify the "essential" commands subset.
Afaics, vCont:c is preferred, and gdb never uses 'c' if it knows
it can use 'vCont:c'.

And. How could I verify that I handle vCont:c correctly if I wasn't
able to provoke gdb to send it?

Oleg.
Jan Kratochvil
2010-08-18 17:07:16 UTC
Post by Jan Kratochvil
The LWP -> thread_t conversion could be done later from the client side only
^^^^^^^^^^^^^^^^^^^^
Post by Jan Kratochvil
libthread_db (td_ta_map_lwp2thr(), td_thr_get_info(),
typeof (td_thrinfo_t->ti_tid) = thread_t)
Why libthread_db is loaded by gdbserver and not on the gdb host:

http://sourceware.org/bugzilla/show_bug.cgi?id=8210#c7
------- Additional Comment #7 From Pedro Alves 2010-08-18 12:37 -------
# Oh, dear me, I almost forgot: continuing on the comment above, supporting
# thread_db for cross debugging would also mean that we'd need to have available
# a libthread_db.so that is built for the host architecture so that GDB can load
# it, but that understands the _target_ architecture's glibc. (This is also the
# reason why we have gdbserver itself load libthread_db, not gdb, when remote
# debugging). So, I'm definitely closing this harder. :-)


Regards,
Jan
Roland McGrath
2010-08-18 19:21:55 UTC
Post by Jan Kratochvil
http://sourceware.org/bugzilla/show_bug.cgi?id=8210#c7
------- Additional Comment #7 From Pedro Alves 2010-08-18 12:37 -------
# Oh, dear me, I almost forgot: continuing on the comment above, supporting
# thread_db for cross debugging would also mean that we'd need to have available
# a libthread_db.so that is built for the host architecture so that GDB can load
# it, but that understands the _target_ architecture's glibc. (This is also the
# reason why we have gdbserver itself load libthread_db, not gdb, when remote
# debugging). So, I'm definitely closing this harder. :-)
Note that in the big libthread_db rewrite that made it biarch-friendly, I
also made the code ready to support cross targets. IIRC, if the glibc
source versions match, a libthread_db should be cross-friendly already
today. All that's really required for more properly full
cross-friendliness is some build magic to make it easy to build just
libthread_db from the libc sources. Then you could install that in some
place that gdb knows to dlopen it from.


Thanks,
Roland
