gdbstub initial code, v16

Discussion:

Oleg Nesterov

2010-11-15 19:05:37 UTC

The only change is hardware watchpoints.

Well. I can't say this change is good. Because ugdb uses (unexported)
arch_ptrace() to set debugregs in a very much x86-specific way. However,
I do not see what else I can do.

2.6.32 doesn't have the generic hardware breakpoint handler interface.
And I can't play with thread.debugregX by hand, otherwise ugdb can't
be compiled with the fresh kernels.

arch_ptrace() method should work (I hope) with any kernel. But, this
obviously means more multitracing problems. Perhaps ugdb needs the
"can_use_hw_watchpoints" parameter.

Also, currently the usage of debugregs is far from optimal, hopefully
it is simple to improve.

And. I think it makes sense to change gdb somehow. Even if it works
with gdbserver, it falls back to stepping if the size of wp is too
big for hw (default_region_ok_for_hw_watchpoint). This means that
ugdb can't be faster in this case although it obviously could.
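
To make the size limit concrete: x86 has four debug address registers, and each can watch an aligned 1-, 2-, 4- or (on 64-bit) 8-byte range. A minimal sketch, assuming that model, of how many such slots a region would need. The helper name slots_needed() is hypothetical and this is not gdb's or ugdb's code:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from gdb or ugdb): count how many aligned
 * 1/2/4/8-byte slots are needed to cover [addr, addr + len).
 * Assumes x86-64 style debug registers, where each slot must be
 * naturally aligned to its own size.
 */
static unsigned slots_needed(uint64_t addr, uint64_t len)
{
	unsigned n = 0;

	while (len > 0) {
		uint64_t size = 8;

		/* Largest power-of-two size that is aligned at addr and fits in len. */
		while (size > 1 && ((addr & (size - 1)) != 0 || size > len))
			size >>= 1;

		addr += size;
		len -= size;
		n++;
	}
	return n;
}
```

Under these assumptions an aligned 32-byte region needs all four slots, and an aligned 40-byte region needs five, so only the latter is truly out of reach for the hardware.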

What should I do next? (apart from internal changes, of course)

Say, should I implement vRun? From the very beginning, I hate the idea
to exec the target from kernel space, but otoh I'm afraid this is
important for gdb users.

Oleg.

Roland McGrath

2010-11-16 20:02:26 UTC

Post by Oleg Nesterov
Well. I can't say this change is good. Because ugdb uses (unexported)
arch_ptrace() to set debugregs in a very much x86-specific way. However,
I do not see what else I can do.

That kludge is unworkable in several ways. It is just not worth pursuing.
Just use hw_breakpoint on kernels that have it, and don't try to support
the feature on kernels that don't.
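
A version gate of this kind could look roughly like the sketch below. In a real module the test would be a preprocessor check of LINUX_VERSION_CODE against KERNEL_VERSION() from <linux/version.h>; here the same encoding is mirrored in a user-space macro, KVER(), so the logic can be compiled standalone. The 2.6.33 cutoff is an assumption (the thread only says that 2.6.32 lacks the interface), and have_hw_breakpoint() is a hypothetical name:

```c
/* Mirrors the classic KERNEL_VERSION() encoding from <linux/version.h>. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical check: does this kernel version have the generic
 * hw_breakpoint interface?  Assumption: it first appeared in 2.6.33.
 * Distro kernels with backports may not follow the version number.
 */
static int have_hw_breakpoint(unsigned int kernel_code)
{
	return kernel_code >= KVER(2, 6, 33);
}
```

On kernels where the check fails, the module would simply report hardware watchpoints as unsupported rather than fall back to arch_ptrace().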

Post by Oleg Nesterov
Say, should I implement vRun? From the very beginning, I hate the idea
to exec the target from kernel space, but otoh I'm afraid this is
important for gdb users.

We've also vaguely discussed doing some hybrid solution where gdb does
something different for "run". If you can do a kludge to implement vRun
fairly quickly and it works OK--including that it remains possible for
ugdb to be a module--then go ahead. If that is too ungainly, or is just
plain infeasible (which I think it might be), then don't bend over
backwards for it. We may have reached the end of what it's possible to
get done at all sensibly without more active involvement from the GDB team.

Thanks,
Roland

Jan Kratochvil

2011-02-23 15:51:35 UTC

Hi Oleg,

notice: Moved thread to the Archer list.

I can confirm this problem exists.

AFAIK on recent kernels this whole "trick" (if-stopped then tkill(SIGSTOP) and
PTRACE_CONT(0)) is not needed as it now works even for `eaten-out SIGSTOP
notifications'.
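
For reference, the "if-stopped" half of the trick boils down to reading the State: line of /proc/PID/status, as the strace output quoted further down shows. A minimal sketch of such a check on an already-read buffer; status_says_stopped() is a hypothetical name and this is not gdb's pid_is_stopped() implementation:

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if the text of /proc/PID/status
 * reports "T (stopped)" on its State: line, 0 otherwise.
 * "T (tracing stop)" deliberately does not match.
 */
static int status_says_stopped(const char *status)
{
	const char *p = status;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, "State:", 6) == 0) {
			p += 6;
			/* Skip the separator between the key and the value. */
			while (*p == ' ' || *p == '\t')
				p++;
			return strncmp(p, "T (stopped)", 11) == 0;
		}
		/* Advance to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (p != NULL)
			p++;
	}
	return 0;
}
```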

But to be compatible with the older kernels (despite having this race there)
what do you suggest? Checking /proc/version seems too fragile to me.
GDB could do another ptrace test (like linux_test_for_tracesysgood etc.).

Thanks,
Jan

On Tue, 22 Feb 2011 21:38:34 +0100, Oleg Nesterov wrote:
[...]

Btw. Jan, linux_nat_post_attach_wait() doesn't look right. It assumes
that the first signal reported by the tracee should be SIGSTOP. This is
not true.

This is what happens if gdb tries to attach to the 'T (stopped)' task,
but the tracee gets SIGCONT after gdb does kill_lwp(pid, SIGSTOP).

	ptrace(PTRACE_ATTACH, 21462, 0, 0) = 0
	open("/proc/21462/status", O_RDONLY) = 5
	read(5, "Name:\tsleep\nState:\tT (stopped)\nTg"..., 1024) = 753

pid_is_stopped()

	tkill(21462, SIGSTOP) = 0

kill_lwp(pid, SIGSTOP) in case we don't have exit code

--- Suppose that SIGCONT comes here ---

	ptrace(PTRACE_CONT, 21462, 0, SIG_0) = 0
	wait4(21462, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGCONT}], 0, NULL) = 21462
	ptrace(PTRACE_CONT, 21462, 0x1, SIG_0) = 0
	^^^^^^^

this makes the tracee run, and

	wait4(21462,

gdb hangs until it reports something else.

Oleg.
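
The distinction at stake in the trace above can be sketched as a small classifier over the wait4() status: a stop with SIGSTOP is the one the attach code is waiting for, while a stop with any other signal (SIGCONT here) is not. This only illustrates the check, not a proposed fix; the enum and classify_attach_status() are hypothetical names, not gdb code:

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical result codes for the first stops seen after PTRACE_ATTACH. */
enum attach_event {
	ATTACH_GOT_SIGSTOP,	/* the stop the attach code expects */
	ATTACH_OTHER_SIGNAL,	/* stopped, but by some other signal */
	ATTACH_NOT_STOPPED	/* exited, killed, or otherwise not a stop */
};

/* Classify a status returned by wait4()/waitpid() for the tracee. */
static enum attach_event classify_attach_status(int status)
{
	if (!WIFSTOPPED(status))
		return ATTACH_NOT_STOPPED;
	if (WSTOPSIG(status) == SIGSTOP)
		return ATTACH_GOT_SIGSTOP;
	return ATTACH_OTHER_SIGNAL;
}
```

In the trace, the first wait4() would yield ATTACH_OTHER_SIGNAL, so treating that stop as the expected SIGSTOP and resuming the tracee is what leaves gdb blocked in the second wait4().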

Oleg Nesterov

2011-02-23 17:16:10 UTC

Post by Jan Kratochvil
notice: Moved thread to the Archer list.
I can confirm this problem exists.
AFAIK on recent kernels this whole "trick" (if-stopped then tkill(SIGSTOP) and
PTRACE_CONT(0)) is not needed as it now works even for `eaten-out SIGSTOP
notifications'.

It is still needed, but the reason is quite different. See the test-case
in http://marc.info/?l=linux-kernel&m=129676623323195

The previous reason for this bug was fixed long ago. IOW, the trick is
still needed, but only in an unlikely case.

But this is easy to fix (although the simple fix is not clean), and then
this trick is not needed.

Post by Jan Kratochvil
But to be compatible with the older kernels (despite having this race there)
what do you suggest? Checking /proc/version seems too fragile to me.
GDB could do another ptrace test (like linux_test_for_tracesysgood etc.).

Oh, I do not know what would be the best check. But anyway this is
"easy", I mean we can do this somehow.

The problem is, I do not see how we can modify the kernel without
breaking the unmodified gdb.

Oh. You know, gdb looks completely broken when it comes to jctl signals ;)
Like the kernel. At least in all-stop mode.

This is because... I don't know how to explain, please see the example.

Absolutely trivial test-case:

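#include <pthread.h>
#include <unistd.h>

/* Thread body: sleep in pause() forever, waking only to handle signals. */
void *tf(void *arg)
{
	for (;;)
		pause();
}

int main(void)
{
	pthread_t pt;

	/* One sub-thread plus the main thread, both blocked in pause(). */
	pthread_create(&pt, NULL, tf, NULL);

	tf(NULL);
	return 0;
}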

Now,
GNU gdb (GDB) 7.1
...

(gdb) attach 29412
Attaching to program: /tmp/0/mt, process 29412
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x41b54950 (LWP 29413)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00000033af60e57d in pause () from /lib64/libpthread.so.0

(gdb) c
Continuing.

Let's send SIGSTOP to 29067: $ kill -STOP 29067

Program received signal SIGSTOP, Stopped (signal).
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb)

Very nice, but what does gdb do?

--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7ffffab89b4c, WNOHANG|__WCLONE, NULL) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG, NULL) = 29412
tkill(29412, SIG_0) = 0
tkill(29413, SIGSTOP) = 0
wait4(29413, 0x7ffffab898b4, 0, NULL) = -1 ECHILD (No child processes)
wait4(29413, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WCLONE, NULL) = 29413

Note this tkill(SIGSTOP) to sub-thread!

Now,

(gdb) c
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb) c
Continuing.

Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread 0x41b54950 (LWP 29413)]
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb) c
Continuing.

Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread 0x7f00007be6f0 (LWP 29412)]
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb) c
Continuing.

Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread 0x41b54950 (LWP 29413)]
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb)

And so on, forever. Every time, it does

ptrace(PTRACE_CONT, 29413, 0x1, SIG_0) = 0
ptrace(PTRACE_CONT, 29412, 0x1, SIGSTOP) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG|__WCLONE, NULL) = 29413
tkill(29413, SIG_0) = 0
tkill(29412, SIGSTOP) = 0
wait4(29412, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 29412

with the obvious result.

"signal SIGSTOP" (instead of "c") does not work either, for the same reason.

Oleg.