Discussion:
gdbstub initial code, v9
Oleg Nesterov
2010-09-08 19:18:38 UTC
Changes:

- partly fix the multitracing problems.

ugdb still can't work with ptrace, ptrace-utrace.c needs
changes. But at least multiple ugdb tracers can coexist,
more or less.

But of course they can confuse each other anyway, no matter
what ugdb does.

- implement memory writes ($M).

- refactor memory reads to avoid the "Too late to report the
error" case.

But, Jan. Implementing the memory writes does not mean breakpoints
automatically start to work!

Yes, gdb writes cc, and yes the tracee reports SIGTRAP. But after
that "continue" does nothing except "$c", and the tracee naturally
gets SIGILL. I expected that, since ugdb doesn't even know the code
was changed, gdb should write the original byte back before continue,
but this doesn't happen.

Tried to understand how this works with gdbserver, but failed so far.
Will continue, but any hint is very much appreciated ;)

Oleg.
Jan Kratochvil
2010-09-09 10:28:52 UTC
Hi Oleg,

kernel-devel-2.6.34.6-54.fc13.x86_64 (real F13) says:

ugdb.c:1988: error: implicit declaration of function ‘hex_to_bin’


Jan
Oleg Nesterov
2010-09-09 15:02:58 UTC
Post by Jan Kratochvil
Hi Oleg,
ugdb.c:1988: error: implicit declaration of function ‘hex_to_bin’
OOPS. It was added by 903788892ea0fc7fcaf7e8e5fac9a77379fc215b

Please see the attachment with the copy-and-pasted hex_to_bin().

Oleg.
Frank Ch. Eigler
2010-09-09 12:39:23 UTC
Post by Oleg Nesterov
[...]
But, Jan. Implementing the memory writes does not mean breakpoints
automatically start to work!
It approximately should though.
Post by Oleg Nesterov
Yes, gdb writes cc, and yes the tracee reports SIGTRAP. But after
that "continue" does nothing except "$c", and the tracee naturally
gets SIGILL. I expected that, since ugdb doesn't even know the code
was changed, gdb should write the original byte back before continue,
but this doesn't happen.
In normal all-stop mode, gdb does normally replace the old
instruction, in order to single-step over it with the 's' packet.
Perhaps you're testing some buggy non-stop aspect that only works
with 'Z' breakpoint management packets? A fuller packet trace
would help explain.

- FChE
Oleg Nesterov
2010-09-09 15:29:37 UTC
Post by Frank Ch. Eigler
Post by Oleg Nesterov
[...]
But, Jan. Implementing the memory writes does not mean breakpoints
automatically start to work!
It approximately should though.
Post by Oleg Nesterov
Yes, gdb writes cc, and yes the tracee reports SIGTRAP. But after
that "continue" does nothing except "$c", and the tracee naturally
gets SIGILL. I expected that, since ugdb doesn't even know the code
was changed, gdb should write the original byte back before continue,
but this doesn't happen.
In normal all-stop mode,
Currently ugdb only supports non-stop
Post by Frank Ch. Eigler
gdb does normally replace the old
instruction, in order to single-step over it with the 's' packet.
Yes, probably single-stepping is needed... I am still trying to
understand how this works with gdbserver, but I see vCont:s packets.
Post by Frank Ch. Eigler
Perhaps you're testing some buggy non-stop aspect that only works
with 'Z' breakpoint management packets?
No. Just a trivial test-case which printfs in a loop.
Post by Frank Ch. Eigler
A fuller packet trace
would help explain.
Please see below. But the only important part is:

$M4005ba,1:cc <------- set bp
$c <------- resume

of course, this can't work.
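
For reference, the `M` payload above splits into a hex address, a hex length, and hex-encoded data. A minimal sketch of that parsing in Python (this is not the ugdb.c code; `parse_write_packet` is a hypothetical helper name used only for illustration):

```python
def parse_write_packet(payload):
    """Split an RSP 'M addr,length:XX...' payload into (addr, length, data)."""
    if not payload.startswith("M"):
        raise ValueError("not an M packet")
    header, _, hexdata = payload[1:].partition(":")
    addr_text, _, len_text = header.partition(",")
    addr = int(addr_text, 16)      # address is hex, without a 0x prefix
    length = int(len_text, 16)     # length is hex as well
    data = bytes.fromhex(hexdata)  # two hex digits per byte, high nibble first
    if len(data) != length:
        raise ValueError("length field does not match data")
    return addr, length, data


addr, length, data = parse_write_packet("M4005ba,1:cc")
print(hex(addr), length, data.hex())
```

The stub only sees "write one byte 0xcc at 0x4005ba"; nothing in the packet says the byte is a breakpoint.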

Full trace:

=> qSupported:multiprocess+
<= PacketSize=400;QStartNoAckMode+;QNonStop+;multiprocess+;QPassS...
=> QStartNoAckMode
<= OK
=> !
<= OK
=> Hgp0.0
<= E01
=> QNonStop:1
<= OK
=> qfThreadInfo
<= E01
=> ?
<= OK
=> qSymbol::
<=
=> vAttach;95b
<= OK
=> qfThreadInfo
<= mp95b.95b
=> qsThreadInfo
<= l
=> Hgp95b.95b
<= OK
=> vCont?
<= vCont;t
=> vCont;t:p95b.-1
<= OK
<= %Stop:T00thread:p95b.95b;
=> vStopped
<= OK
=> g
<= fcfdffffffffffff90ad5329ff7f0000ffffffffffffffff00000000000000...
=> m600880,8
<= 403c6d7d007f0000
=> m7f007d6d3c48,8
<= 00106d7d007f0000
=> m7f007d6d1000,28
<= 0000000000000000f6e04c7d007f0000e80760000000000080156d7d007f00...
=> m7f007d6d1580,28
<= 00f0ef29ff7f0000f6e04c7d007f000050f45f29ff7f000000c06c7d007f00...
=> m7f007d4ce0f4,4
<= 090a0069
=> m7f007d6cc000,28
<= 0030167d007f0000781f6d7d007f0000400b4b7d007f0000e8346d7d007f00...
=> m7f007d6d1f78,4
<= 2f6c6962
=> m7f007d6d1f7c,4
<= 2f6c6962
=> m7f007d6d1f80,4
<= 632e736f
=> m7f007d6d1f84,4
<= 2e360000
=> m7f007d6d34e8,28
<= 00704b7d007f00000002400000000000082e6d7d007f000000000000000000...
=> m400200,4
<= 2f6c6962
=> m400204,4
<= 2f6c642d
=> m400208,4
<= 6c696e75
=> m40020c,4
<= 782d7838
=> m400210,4
<= 362d3634
=> m400214,4
<= 2e736f2e
=> m400218,4
<= 32000000
=> m7f007d6d3c40,4
<= 01000000
=> m7f007d6d3c48,8
<= 00106d7d007f0000
=> m7f007d6d3c50,8
<= c04e4c7d007f0000
=> Z0,7f007d4c4ec0,1
<=
=> m7f007d4c4ec0,1
<= f3
=> X7f007d4c4ec0,0:
<=
=> M7f007d4c4ec0,1:cc
<= OK
=> m600880,8
<= 403c6d7d007f0000
=> m7f007d6d3c48,8
<= 00106d7d007f0000
=> m7f007d6d1000,28
<= 0000000000000000f6e04c7d007f0000e80760000000000080156d7d007f00...
=> m7f007d6d1580,28
<= 00f0ef29ff7f0000f6e04c7d007f000050f45f29ff7f000000c06c7d007f00...
=> m7f007d4ce0f4,4
<= 090a0069
=> m7f007d6cc000,28
<= 0030167d007f0000781f6d7d007f0000400b4b7d007f0000e8346d7d007f00...
=> m7f007d6d1f78,4
<= 2f6c6962
=> m7f007d6d1f7c,4
<= 2f6c6962
=> m7f007d6d1f80,4
<= 632e736f
=> m7f007d6d1f84,4
<= 2e360000
=> m7f007d6d34e8,28
<= 00704b7d007f00000002400000000000082e6d7d007f000000000000000000...
=> m400200,4
<= 2f6c6962
=> m400204,4
<= 2f6c642d
=> m400208,4
<= 6c696e75
=> m40020c,4
<= 782d7838
=> m400210,4
<= 362d3634
=> m400214,4
<= 2e736f2e
=> m400218,4
<= 32000000
=> m7f007d6d3c40,4
<= 01000000
=> vCont;t:p95b.-1
<= OK
=> m7f007d201f40,1
<= 48
=> m7f007d201f40,1
<= 48
=> g
<= fcfdffffffffffff90ad5329ff7f0000ffffffffffffffff00000000000000...
=> m7f007d201f40,1
<= 48
=> m7f007d201f40,1
<= 48
=> m40056c,12
<= 554889e5e8e3feffff89c6ba07000000bfdc
=> m40056c,1
<= 55
=> m40056d,3
<= 4889e5
=> m40056c,12
<= 554889e5e8e3feffff89c6ba07000000bfdc
=> m40056c,1
<= 55
=> m40056d,3
<= 4889e5
=> m4005ba,1
<= e8
=> m4005ba,1
<= e8
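
The `m` replies in the trace are plain hex dumps of tracee memory. As a small illustration (a sketch, not part of ugdb), the seven 4-byte reads starting at 0x400200 can be joined and decoded; they appear to be the dynamic loader path gdb reads from the executable:

```python
# Replies to m400200,4 ... m400218,4, copied from the trace above.
replies = [
    "2f6c6962", "2f6c642d", "6c696e75", "782d7838",
    "362d3634", "2e736f2e", "32000000",
]

raw = b"".join(bytes.fromhex(r) for r in replies)
# The string is NUL-terminated; keep only the part before the first NUL.
path = raw.split(b"\0", 1)[0].decode("ascii")
print(path)  # /lib/ld-linux-x86-64.so.2
```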

(gdb) b BP.c:13
Breakpoint 1 at 0x4005ba: file BP.c, line 13.

=> M4005ba,1:cc
<= OK

gdb writes "int 3".

(gdb) c
Continuing.

=> QPassSignals:e;10;14;17;1a;1b;1c;21;24;25;4c;
<= OK
=> Hcp95b.95b
<= OK
=> c
<= OK
<= %Stop:T05thread:p95b.95b;

the tracee hits this bp and reports SIGTRAP

=> vStopped
<= OK
=> g
<= 00000000000000000006400000000000401f207d007f000000000000000000...
=> P10=ba05400000000000
<=
=> G00000000000000000006400000000000401f207d007f00000000000000000...
<=
=> m4005ba,1
<= cc
=> m4005ba,1
<= cc
=> g
<= 00000000000000000006400000000000401f207d007f000000000000000000...
=> m4005bb,1
<= 99
=> m4005bb,1
<= 99

Breakpoint 1, main () at BP.c:13
13 printf("THREE %d %d\n\n", getpid(), __LINE__);
(gdb) c
Continuing.

=> c
<= OK

gdb just resumes the tracee,

<= %Stop:T04thread:p95b.95b;

and of course it gets SIGILL after "int 3"
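
The two `%Stop` notifications differ only in the two hex digits after `T`. A rough Python sketch of how such a notification can be picked apart (the helper names and the small signal table are illustrative assumptions; only signals 4 and 5 from this trace are listed):

```python
# Only the two signal numbers that occur in the trace; on Linux 4 is SIGILL
# and 5 is SIGTRAP.
SIGNAL_NAMES = {4: "SIGILL", 5: "SIGTRAP"}


def parse_stop(notification):
    """Parse '%Stop:TAAn1:r1;n2:r2;...' into (signal number, {name: value})."""
    prefix = "%Stop:T"
    if not notification.startswith(prefix):
        raise ValueError("not a T stop notification")
    body = notification[len(prefix):]
    signo = int(body[:2], 16)          # two hex digits of signal number
    fields = {}
    for pair in body[2:].split(";"):
        if pair:
            name, _, value = pair.partition(":")
            fields[name] = value
    return signo, fields


def parse_thread_id(text):
    """Parse the multiprocess thread-id syntax 'pPID.TID' (both hex)."""
    pid_text, _, tid_text = text[1:].partition(".")
    return int(pid_text, 16), int(tid_text, 16)


for line in ("%Stop:T05thread:p95b.95b;", "%Stop:T04thread:p95b.95b;"):
    signo, fields = parse_stop(line)
    print(SIGNAL_NAMES[signo], parse_thread_id(fields["thread"]))
```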

=> vStopped
<= OK
=> g
<= 00000000000000000006400000000000401f207d007f000000000000000000...
=> m4005bc,1
<= fe
=> m4005bc,1
<= fe
=> g
<= 00000000000000000006400000000000401f207d007f000000000000000000...
=> m4005bc,1
<= fe
=> m4005bc,1
<= fe
=> qTStatus
<= T0
=> M4005ba,1:e8
<= OK
=> M7f007d4c4ec0,1:f3
<= OK
=> D;95b
<= OK
=> qTStatus
<= T0
Oleg Nesterov
2010-09-09 15:50:47 UTC
Post by Oleg Nesterov
the tracee hits this bp and reports SIGTRAP
=> vStopped
<= OK
=> g
<= 00000000000000000006400000000000401f207d007f000000000000000000...
=> P10=ba05400000000000
<=
=> G00000000000000000006400000000000401f207d007f00000000000000000...
<=
Maybe this explains it...

Probably I need to implement G/P first, otherwise gdb can't change ip.

Still, I'd appreciate if someone can explain me what gdb needs/expects
to handle breakpoints before I start to read the sources.

Oleg.
Frank Ch. Eigler
2010-09-09 16:07:00 UTC
Hi -
Post by Oleg Nesterov
Probably I need to implement G/P first, otherwise gdb can't change ip.
Perhaps.
Post by Oleg Nesterov
Still, I'd appreciate if someone can explain me what gdb needs/expects
to handle breakpoints before I start to read the sources.
It'd be simpler if the normal all-stop mode was what you first focused
on. That mode works fine with Z/z and with M/m based breakpoint
insertion/removal and c/s continue/singlestep. (This stuff was all
working in the earlier gdbstub code.)

Re. non-stop mode, see
http://www.codesourcery.com/publications/non_stop_multi_threaded_debugging_in_gdb.pdf

- FChE
Oleg Nesterov
2010-09-09 16:34:00 UTC
Post by Frank Ch. Eigler
Hi -
Post by Oleg Nesterov
Probably I need to implement G/P first, otherwise gdb can't change ip.
Perhaps.
Post by Oleg Nesterov
Still, I'd appreciate if someone can explain me what gdb needs/expects
to handle breakpoints before I start to read the sources.
It'd be simpler if the normal all-stop mode was what you first focused
on. That mode works fine with Z/z and with M/m based breakpoint
insertion/removal and c/s continue/singlestep. (This stuff was all
working in the earlier gdbstub code.)
Re. non-stop mode, see
http://www.codesourcery.com/publications/non_stop_multi_threaded_debugging_in_gdb.pdf
Thanks a lot, downloaded.

Oleg.
Jan Kratochvil
2010-09-09 16:04:23 UTC
Post by Oleg Nesterov
Probably I need to implement G/P first, otherwise gdb can't change ip.
Still, I'd appreciate if someone can explain me what gdb needs/expects
to handle breakpoints before I start to read the sources.
Yes, GDB tries to set PC to PC-1 to step back after hitting the 0xCC.
Since it fails to fix up PC, PC is wrong, trap_expected is then 0 (it
should be 1), and everything fails.

BTW ugdb needs one simple fix. :-)
(gdb) p/x done=0x5a
$1 = 0xa5


Thanks,
Jan
Oleg Nesterov
2010-09-09 16:30:31 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Probably I need to implement G/P first, otherwise gdb can't change ip.
Still, I'd appreciate if someone can explain me what gdb needs/expects
to handle breakpoints before I start to read the sources.
Yes, GDB tries to set PC to PC-1 to skip back after the hit of 0xCC.
As it fails to fix-up PC then PC is wrong and trap_expected is then 0 (it
should be 1) and everything fails.
OK. I'll implement G or P, then we will see.

But probably not right now, I have to switch to a bug report I have.
Post by Jan Kratochvil
BTW ugdb needs one simple fix. :-)
(gdb) p/x done=0x5a
$1 = 0xa5
OOPS! indeed, unhex() confuses lo and hi. I am attaching ugdb.c with
the trivial fix.
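
The unhex() source is not shown in this thread, so the following is only a guess at the shape of the bug: a Python sketch where the swapped version treats the first hex digit as the low nibble. The function names are made up for the illustration.

```python
def hex_digit(c):
    """Value of one hex digit character."""
    return int(c, 16)


def unhex_swapped(two_digits):
    """Buggy order: treats the first digit as the low nibble."""
    return (hex_digit(two_digits[1]) << 4) | hex_digit(two_digits[0])


def unhex(two_digits):
    """Correct order: the first digit is the high nibble."""
    return (hex_digit(two_digits[0]) << 4) | hex_digit(two_digits[1])


print(hex(unhex_swapped("5a")), hex(unhex("5a")))  # 0xa5 0x5a
print(hex(unhex_swapped("cc")), hex(unhex("cc")))  # 0xcc 0xcc
```

A byte whose two nibbles are equal, such as 0xcc, comes out the same either way, which is why a bug like this would not show up when gdb writes a breakpoint.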

Thanks!

Cough... could you tell me how can I change the variable "done"
without printing it?

Oleg.
Jan Kratochvil
2010-09-09 16:36:35 UTC
Post by Oleg Nesterov
OOPS! indeed, unhex() confuses lo and hi.
It works for 0xcc, though.
Post by Oleg Nesterov
Cough... could you tell me how can I change the variable "done"
without printing it?
(gdb) help set variable
Evaluate expression EXP and assign result to variable VAR, using assignment
syntax appropriate for the current language (VAR = EXP or VAR := EXP for
example). VAR may be a debugger "convenience" variable (names starting
with $), a register (a few standard names starting with $), or an actual
variable in the program being debugged. EXP is any valid expression.
This may usually be abbreviated to simply "set".


Regards,
Jan
Oleg Nesterov
2010-09-09 16:45:00 UTC
Post by Jan Kratochvil
Post by Oleg Nesterov
Cough... could you tell me how can I change the variable "done"
without printing it?
(gdb) help set variable
Evaluate expression EXP and assign result to variable VAR, using assignment
syntax appropriate for the current language (VAR = EXP or VAR := EXP for
example). VAR may be a debugger "convenience" variable (names starting
with $), a register (a few standard names starting with $), or an actual
variable in the program being debugged. EXP is any valid expression.
This may usually be abbreviated to simply "set".
Thanks... but I tried this when I tested the fix.

(gdb) p/x var
$1 = 0x1234
(gdb) set var
Argument required (expression to compute).
(gdb) set var 0
(gdb) p/x var
$2 = 0x1234

strange ;)

Oleg.
Oleg Nesterov
2010-09-09 16:47:13 UTC
Post by Oleg Nesterov
Thanks... but I tried this when I tested the fix.
(gdb) p/x var
$1 = 0x1234
(gdb) set var
Argument required (expression to compute).
(gdb) set var 0
(gdb) p/x var
$2 = 0x1234
strange ;)
Aah.

(gdb) set variable var=0x4321
(gdb) p/x var
$7 = 0x4321

Thanks ;)

Oleg.
Tom Tromey
2010-09-09 16:51:27 UTC
Oleg> (gdb) set var 0

You need: set variable var = 0
The "variable" can be abbreviated.

FWIW, "print", "set variable", and "call" are basically aliases.
"print" just happens to print the result of the expression.

Tom
Roland McGrath
2010-09-10 10:12:42 UTC
Post by Tom Tromey
Oleg> (gdb) set var 0
You need: set variable var = 0
The "variable" can be abbreviated.
I've always just used:

(gdb) set var=0

Thanks,
Roland
Oleg Nesterov
2010-09-10 18:11:00 UTC
Post by Roland McGrath
Post by Tom Tromey
Oleg> (gdb) set var 0
You need: set variable var = 0
The "variable" can be abbreviated.
(gdb) set var=0
No, I tried this too, doesn't work.

(gdb) set var=0
A syntax error in expression, near `=0'.

But, it turns out I chose a bad name for the variable when
I tested the fix in unhex().

(gdb) set xxx=0

This works.

(gdb) set var var=0

This works too. I guess, when gdb sees "set var" it expects the
full "set variable ..." command.

Oleg.
Tom Tromey
2010-09-10 19:42:54 UTC
Post by Roland McGrath
(gdb) set var=0
Oleg> No, I tried this too, doesn't work.
Oleg> (gdb) set var=0
Oleg> A syntax error in expression, near `=0'.

Yeah, it is ambiguous if the actual variable name conflicts with any gdb
"set" subcommand.

I typically just use call or print.

Tom
Oleg Nesterov
2010-09-10 19:46:02 UTC
Post by Tom Tromey
Post by Roland McGrath
(gdb) set var=0
Oleg> No, I tried this too, doesn't work.
Oleg> (gdb) set var=0
Oleg> A syntax error in expression, near `=0'.
Yeah, it is ambiguous if the actual variable name conflicts with any gdb
"set" subcommand.
Ah, good to know. gdb has a lot of them ;)

Oleg.
Kevin Buettner
2010-09-13 05:53:50 UTC
On Thu, 9 Sep 2010 17:29:37 +0200
Post by Oleg Nesterov
Currently ugdb only supports non-stop
Could someone explain to me why non-stop (rather than all-stop) is
being focused upon first?


I skimmed the v9 code. It's good to see that support has been added
for the 'm', 'M', and 'g' commands. It appears to me that 'G' and 's'
are still missing though. As discussed by Jan, x86 will need G (or P,
but G is required by the spec whereas P is not), due to the need to
adjust the PC backwards after a break. (In gdb sources, this is
known as decr_pc_after_break.)
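
To make the PC adjustment concrete, here is a small Python sketch that reproduces the `P` packet seen in Oleg's trace. It assumes rip is register 0x10 in gdb's x86-64 numbering and that the breakpoint is the one-byte `int3`; `encode_register_write` is a hypothetical helper, not a gdb or ugdb function.

```python
def encode_register_write(regnum, value, size=8):
    """Build a 'P n=r' payload; the value bytes are hex-encoded little-endian."""
    return "P%x=%s" % (regnum, value.to_bytes(size, "little").hex())


RIP_REGNUM = 0x10        # assumption: rip is register 16 in gdb's x86-64 layout
DECR_PC_AFTER_BREAK = 1  # int3 (0xcc) is one byte long

reported_pc = 0x4005bb   # pc after the trap: one past the breakpoint address
fixed_pc = reported_pc - DECR_PC_AFTER_BREAK
print(encode_register_write(RIP_REGNUM, fixed_pc))  # P10=ba05400000000000
```

With neither `P` nor `G` accepted by the stub, this write is dropped and the tracee resumes in the middle of the original instruction.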

Something else that came to mind while reading this thread is the
issue of cache synchronization. When writing either breakpoints or
original instructions to memory, it may be necessary to flush the data
cache to memory and invalidate some lines in the instruction cache so
as to not have the CPU execute the instruction that was there prior to
the modification - i.e. you want the CPU to execute the instruction
that was just written to memory, not necessarily whatever it has lying
about in its cache. It occurred to me that this could be one reason
for breakpoints not working after implementation of the 'M' command.
(If the kernel calls that you're using already automagically do this, then
ignore this comment...)

Kevin
Roland McGrath
2010-09-14 02:20:03 UTC
Post by Kevin Buettner
Could someone explain to me why non-stop (rather than all-stop) is
being focused upon first?
We've tried to encourage Oleg to do whatever is easiest first.
It's not clear to me why all-stop isn't what's easiest, all in all.
Post by Kevin Buettner
Something else that came to mind while reading this thread is the
issue of cache synchronization. [...]
(If the kernel calls that you're using already automagically do this, then
ignore this comment...)
They do. Not to worry. (It's the same internal path that you get to when
gdb or gdbserver uses a ptrace syscall or a write syscall on /proc/pid/mem.)


Thanks,
Roland
Oleg Nesterov
2010-09-14 16:07:00 UTC
Post by Roland McGrath
Post by Kevin Buettner
Could someone explain to me why non-stop (rather than all-stop) is
being focused upon first?
We've tried to encourage Oleg to do whatever is easiest first.
It's not clear to me why all-stop isn't what's easiest, all in all.
Hmm. I thought that all-stop is simpler, and the very first version
worked in this mode.

I guess, I misunderstood the subsequent discussion as if non-stop is
more important or preferred (I do not know how people use gdb). That
is why I switched to non-stop.

So, in the long term, which mode is more useful? And, I suppose that
(unfortunately ;) ugdb should support both ?

Oleg.
Jan Kratochvil
2010-09-14 16:30:01 UTC
Post by Oleg Nesterov
I guess, I misunderstood the subsequent discussion as if non-stop is
more important or preferred (I do not know how people use gdb). That
is why I switched to non-stop.
I would find non-stop as a more general solution. In non-stop mode one still
can stop each and all the threads manually. But in all-stop mode one cannot
keep the (other) threads running.
Post by Oleg Nesterov
So, in the long term, which mode is more useful? And, I suppose that
(unfortunately ;) ugdb should support both ?
It is just that people are more used to the all-stop mode, and the testsuite
cases do not expect non-stop, so for compatibility reasons (both with humans
and with the testsuite) all-stop mode in ugdb probably makes sense.

All-stop mode could probably also be emulated from the GDB client side.
That would still not suit some specially crafted testsuite cases, but those may
not be applicable to ugdb for various reasons anyway.



Thanks,
Jan
Tom Tromey
2010-09-14 21:29:53 UTC
Jan> I would find non-stop as a more general solution. In non-stop mode
Jan> one still can stop each and all the threads manually. But in
Jan> all-stop mode one cannot keep the (other) threads running.
[...]
Jan> all-stop mode could be also emulated probably from the GDB client side.

I thought Pedro talked about this once, but I can't find the message.

This seems like a reasonable future project for us -- it seems to me
that if ugdb does the thread-specific stop filtering, then it would make
sense to notify it of thread-specific breakpoints, and then always use
non-stop.

Tom
Kevin Buettner
2010-09-15 19:43:03 UTC
On Tue, 14 Sep 2010 18:07:00 +0200
Post by Oleg Nesterov
So, in the long term, which mode is more useful?
I don't know about long term, but the most common use case is
currently all-stop. Certainly for the short term, I think it
makes more sense to implement all-stop first. (I think you'll
run into fewer GDB bugs with all-stop...)
Post by Oleg Nesterov
And, I suppose that (unfortunately ;) ugdb should support both ?
It depends upon the goals of the ugdb project.

If the goal is to demonstrate utrace, then support for only one mode
might be sufficient. (I'll let someone else make that determination.)

If the goal is to become a full-fledged debug agent ala gdbserver,
then ugdb will need to support both.

Kevin

Oleg Nesterov
2010-09-10 22:00:07 UTC
Post by Frank Ch. Eigler
Post by Oleg Nesterov
[...]
But, Jan. Implementing the memory writes does not mean breakpoints
automatically start to work!
It approximately should though.
No.

Frank, I guess I made a mistake, I should have read the pdf you sent
me first. I'll read it anyway later, but I think that I already
understand how this works.

gdb replaces the original insn with "int 3".

the tracee reports SIGTRAP.

Now, to continue the tracee, gdb does not restore the
original instruction. Instead, it

- writes this insn into _start code

- changes regs->ip to point to this insn

- does single-step to execute this insn

- changes regs->ip again
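
The four steps above can be sketched as a sequence of packet payloads. This is only an illustration in Python under several assumptions: the instruction is position-independent (so no fix-up beyond moving ip is needed), rip is register 0x10, the scratch address is a made-up stand-in for `_start`, and the stub accepts `P` and `vCont;s`, which ugdb v9 does not yet. The instruction bytes `4889e5` at 0x40056d are taken from the trace earlier in the thread.

```python
def le_hex(value, size=8):
    """Hex-encode value as size bytes, least significant byte first."""
    return value.to_bytes(size, "little").hex()


def displaced_step_plan(bp_addr, scratch_addr, insn, thread="p95b.95b"):
    """Packet payloads for one displaced step of a position-independent insn."""
    return [
        "M%x,%x:%s" % (scratch_addr, len(insn), insn.hex()),  # copy insn to scratch
        "P10=" + le_hex(scratch_addr),                        # ip -> scratch copy
        "vCont;s:" + thread,                                  # single-step the copy
        "P10=" + le_hex(bp_addr + len(insn)),                 # ip -> insn after the bp
    ]


SCRATCH = 0x400450  # hypothetical scratch address standing in for _start
plan = displaced_step_plan(0x40056d, SCRATCH, bytes.fromhex("4889e5"))
for payload in plan:
    print(payload)
```

The breakpoint byte itself stays in place the whole time, so other threads can keep running and still hit it. An instruction such as the `call` at 0x4005ba would need extra fix-up that this sketch does not attempt.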

So. Let's forget about breakpoints temporarily. ugdb needs single-stepping
first.

Damn. I spent much more time than I'd wish trying to understand this.
I misunderstood the "target byte order" part in the documentation of
P packet.
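
For anyone else tripping over the byte order: the value in a `P` packet is the register's bytes as they sit in target memory, so on x86-64 the least significant byte comes first. A short Python sketch decoding the packet from the trace (`decode_register_write` is a hypothetical name):

```python
def decode_register_write(payload):
    """Split 'P n=r' into (register number, value), value in little-endian."""
    if not payload.startswith("P"):
        raise ValueError("not a P packet")
    regnum_text, _, value_hex = payload[1:].partition("=")
    regnum = int(regnum_text, 16)
    value = int.from_bytes(bytes.fromhex(value_hex), "little")
    return regnum, value


regnum, value = decode_register_write("P10=ba05400000000000")
print(regnum, hex(value))                # 16 0x4005ba

# Reading the same digits as one big hex number gives a nonsense address.
print(hex(int("ba05400000000000", 16)))  # 0xba05400000000000
```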

Oleg.
Tom Tromey
2010-09-10 22:12:12 UTC
Oleg> Now, to continue the tracee, gdb does not restore the
Oleg> original instruction. Instead, it
Oleg> - writes this insn into _start code
Oleg> - changes regs->ip to point to this insn
Oleg> - does single-step to execute this insn
Oleg> - changes regs->ip again

This is what is done for non-stop.
I believe it is called "displaced stepping" in gdb.

I think eventually we would like it if uprobes did this work, instead of
gdb doing it. Presumably that would yield better performance. E.g., if
we have a thread-specific breakpoint, then other threads hitting that
breakpoint could simply do the displaced stepping via uprobes, and not
report a breakpoint hit to gdb at all.

For all-stop, breakpoints are handled differently, though I don't
remember how offhand.

Tom
Roland McGrath
2010-09-14 19:38:04 UTC
The traditional method is to restore the original instruction replaced by
the breakpoint in text, single-step over that instruction, then restore the
breakpoint in text, then continue. That method requires all-stop so that
while you are stepping the thread that just hit the breakpoint, you can't
have another thread run past that instruction and miss the breakpoint.

Both this traditional in-place method, and the instruction-copying method,
depend on using single-step. So "stepi" has to work before "break" can work.
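
As a rough illustration of the traditional in-place method, here is the order of packet payloads it implies, sketched in Python. It assumes a one-byte 0xcc breakpoint, that the PC has already been moved back to the breakpoint address, and that the stub supports `s`; the function name is made up for the example.

```python
def step_over_in_place(addr, original_byte):
    """Packet payloads for stepping past a breakpoint at addr, then resuming.

    Illustrative only: every thread must stay stopped (all-stop) while the
    original byte is back in place, or another thread could run past it.
    """
    return [
        "M%x,1:%02x" % (addr, original_byte),  # restore the original instruction byte
        "s",                                   # single-step over that instruction
        "M%x,1:cc" % addr,                     # re-insert the breakpoint
        "c",                                   # continue
    ]


for payload in step_over_in_place(0x4005ba, 0xe8):
    print(payload)
```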


Thanks,
Roland
Oleg Nesterov
2010-09-14 19:43:40 UTC
Post by Roland McGrath
Both this traditional in-place method, and the instruction-copying method,
depend on using single-step. So "stepi" has to work before "break" can work.
Yes, thanks, I already got it.

Oleg.