Discussion:
gdb side of improved linker-debugger interface
Gary Benson
2011-06-07 09:48:08 UTC
Permalink
Hi all,

A week or so ago I mailed this list a description of a new linker-
debugger interface based on SystemTap probes, along with a patch
containing the glibc side:

http://www.cygwin.com/ml/archer/2011-q2/msg00000.html

In the meantime I've been working on the gdb side of this, which is
available in the archer-gbenson-stap-rtld branch. I've also attached
a patch (which applies to archer-sergiodj-stap-patch-split) in case
you just want to have a look.

The deal is basically that gdb would normally set one solib event
breakpoint, on _dl_debug_state. This patch makes it look for the two
SystemTap probes described in my previous mail, and if it finds them,
set a pair of breakpoints on those instead. When stop-on-solib-events
is set, gdb will stop on both breakpoints, mimicing the old behaviour.
When stop-on-solib-events is not set, gdb will only stop on the post-
modification breakpoint. This allows solib mapping and unmapping to
be tracked as before while almost halving the time taken (I did some
basic tests, eg loading 1000 simple libraries took 14s as opposed to
26s). This helps https://bugzilla.redhat.com/show_bug.cgi?id=698001.
The post-modification breakpoint is called later than previously, to
match where Solaris libc calls _dl_debug_state, which addresses
https://bugzilla.redhat.com/show_bug.cgi?id=658851.

I've not implemented anything to do with dlmopen, but the information
required for gdb to discover the libraries' namespaces can be obtained
from the SystemTap probes. I've also not done anything special about
STT_GNU_IFUNC functions, since if one of these is causing problems you
could simply set gdb to break on its relocator without needing any
special support in stop-on-solib-events.

I've seen mention that the RT_CONSISTENT called too early bug was the
cause of another issue where libthread_db would not be loaded if
libpthread was loaded with dlopen rather than being linked in, but I
haven't been able to discover if this really was ever the case. If
anybody knows (or has a testcase) could you fill me in on it please.

I'm still waiting for a reply to my mail about in-process debuggers,
so I haven't changed anything on that regard yet.

Cheers,
Gary
--
http://gbenson.net/
Tom Tromey
2011-06-07 16:29:10 UTC
Permalink
Gary> A week or so ago I mailed this list a description of a new linker-
Gary> debugger interface based on SystemTap probes, along with a patch
Gary> containing the glibc side:
Gary> http://www.cygwin.com/ml/archer/2011-q2/msg00000.html

If this is ready then the next step is to try to get it into glibc.

I don't think it can go into glibc mainline, but perhaps we can get it
onto Roland's branch.

One thing that would be useful is a write-up explaining the issues and
why this approach is correct. What I recall from some bug (I don't
recall which one) on the topic is that Ulrich was of the opinion that
the debug hook was in the right spot and that GDB was wrong. It would
be good to have a rebuttal for this position in particular.

I'm not sure what the protocol is here -- email to glibc-alpha, or just
to Roland. glibc-alpha will likely get you flamed, don your protective
gear.

Gary> I've seen mention that the RT_CONSISTENT called too early bug was the
Gary> cause of another issue where libthread_db would not be loaded if
Gary> libpthread was loaded with dlopen rather than being linked in, but I
Gary> haven't been able to discover if this really was ever the case. If
Gary> anybody knows (or has a testcase) could you fill me in on it please.

I failed in my attempt to make a simple reproducer.
I know David Malcolm had this reliably fail with his Python plugin for
GCC; so you might try that, though it seems rather large for a test case.

Gary> + /* SystemTap probes. */
Gary> + const struct stap_probe *pre_mod_probe;
Gary> + const struct stap_probe *post_mod_probe;

You are going to have to merge or rebase and update this a bit.
I changed find_probe_in_objfile on the branch, to account for probes
with multiple locations.

Once the patch is ready I think it should go on archer-sergiodj-stap and
then on origin/archer-sergiodj-stap-patch-split as the last patch in the
series.

Gary> +static int
Gary> +svr4_update_solib_event_breakpoint (struct breakpoint *b, void *arg)
Gary> +{
Gary> + struct svr4_info *info = get_svr4_info ();
Gary> + struct bp_location *loc;
Gary> +
Gary> + for (loc = b->loc; loc; loc = loc->next)
Gary> + {
Gary> + if (loc->pspace == current_program_space
Gary> + && loc->address == info->pre_mod_probe->address)
Gary> + {
Gary> + b->enable_state = stop_on_solib_events;

This should have a constant from `enum enable_state' on the RHS.

Gary> - create_solib_event_breakpoint (target_gdbarch, sym_addr);
Gary> + svr4_create_solib_event_breakpoints (target_gdbarch, sym_addr);

I don't know this code very well, but it seems to me that it is looking
at different ways to set the solib breakpoints. Your current code runs
at the very end -- GDB still tries the existing ways to find the
breakpoint locations, and then, if one is found, instead inserts at the
probe point if that is found.

Assuming this reading is correct, it seems like it would be better to
just check for SystemTap probe points early in enable_break, and then
prefer those to the other methods.

Tom
Tom Tromey
2011-06-08 16:27:24 UTC
Permalink
Gary> I did it that way to ensure we're looking for the probes in the
Gary> correct object. It uses the original code to locate _dl_debug_state,
Gary> then assumes that the object containing _dl_debug_state is the linker
Gary> and looks for the probes in that. I don't know if this is the right
Gary> solution though.

Ok, I see. I think it would be fine the other way, but this is ok with
me too.

Tom
Gary Benson
2011-06-09 09:15:58 UTC
Permalink
Post by Tom Tromey
Gary> I did it that way to ensure we're looking for the probes in
Gary> the correct object. It uses the original code to locate
Gary> _dl_debug_state, then assumes that the object containing
Gary> _dl_debug_state is the linker and looks for the probes in
Gary> that. I don't know if this is the right solution though.
Ok, I see. I think it would be fine the other way, but this is
ok with me too.
What would the other way be? I'm aware the way I've done it is
convoluted, and obviously it would be neater and simpler to just
check at the top of enable_break, but I don't know how you would
locate the objfile (unless there another way of locating probes
that doesn't require that?) I guess my thinking was that there
is all this code in enable_break for dealing with all kinds of
different platforms---and that obviously works---and I wanted to
take advantage of that.

Cheers,
Gary
--
http://gbenson.net/
Tom Tromey
2011-06-09 14:42:38 UTC
Permalink
Post by Tom Tromey
Ok, I see. I think it would be fine the other way, but this is
ok with me too.
Gary> What would the other way be?

Oh, nothing, since I was not thinking clearly about what this code is
doing.

Tom

Gary Benson
2011-06-08 14:12:39 UTC
Permalink
Post by Tom Tromey
Gary> A week or so ago I mailed this list a description of a new
Gary> linker- debugger interface based on SystemTap probes, along
Gary> http://www.cygwin.com/ml/archer/2011-q2/msg00000.html
If this is ready then the next step is to try to get it into glibc.
I don't think it can go into glibc mainline, but perhaps we can get
it onto Roland's branch.
One thing that would be useful is a write-up explaining the issues
and why this approach is correct. What I recall from some bug (I
don't recall which one) on the topic is that Ulrich was of the
opinion that the debug hook was in the right spot and that GDB was
wrong. It would be good to have a rebuttal for this position in
particular.
I was thinking about this last night. What I'm considering is adding
at least one more probe point, so there's one at the original location
(where Uli thinks is right) and one where we actually want to break.
I also want to look into the thing on
https://bugzilla.redhat.com/show_bug.cgi?id=698001#c1 where they
mention calling their function (_dl_debug_fast_state) *only* when the
link map has actually changed as I didn't realise there were
situations where it wouldn't.
Post by Tom Tromey
I'm not sure what the protocol is here -- email to glibc-alpha, or
just to Roland. glibc-alpha will likely get you flamed, don your
protective gear.
I emailed Andreas privately a while back and he said to discuss it on
glibc-alpha, so I guess that's the place for this.
Post by Tom Tromey
Gary> I've seen mention that the RT_CONSISTENT called too early bug
Gary> was the cause of another issue where libthread_db would not be
Gary> loaded if libpthread was loaded with dlopen rather than being
Gary> linked in, but I haven't been able to discover if this really
Gary> was ever the case. If anybody knows (or has a testcase) could
Gary> you fill me in on it please.
I failed in my attempt to make a simple reproducer.
I know David Malcolm had this reliably fail with his Python plugin
for GCC; so you might try that, though it seems rather large for a
test case.
I have this now: http://sourceware.org/ml/archer/2011-q2/msg00022.html
Post by Tom Tromey
Gary> + /* SystemTap probes. */
Gary> + const struct stap_probe *pre_mod_probe;
Gary> + const struct stap_probe *post_mod_probe;
You are going to have to merge or rebase and update this a bit. I
changed find_probe_in_objfile on the branch, to account for probes
with multiple locations.
Ok, thanks for the heads up.
Post by Tom Tromey
Gary> +static int
Gary> +svr4_update_solib_event_breakpoint (struct breakpoint *b, void *arg)
Gary> +{
Gary> + struct svr4_info *info = get_svr4_info ();
Gary> + struct bp_location *loc;
Gary> +
Gary> + for (loc = b->loc; loc; loc = loc->next)
Gary> + {
Gary> + if (loc->pspace == current_program_space
Gary> + && loc->address == info->pre_mod_probe->address)
Gary> + {
Gary> + b->enable_state = stop_on_solib_events;
This should have a constant from `enum enable_state' on the RHS.
Ah, ok, I'll change it.
Post by Tom Tromey
Gary> - create_solib_event_breakpoint (target_gdbarch, sym_addr);
Gary> + svr4_create_solib_event_breakpoints (target_gdbarch, sym_addr);
I don't know this code very well, but it seems to me that it is
looking at different ways to set the solib breakpoints. Your
current code runs at the very end -- GDB still tries the existing
ways to find the breakpoint locations, and then, if one is found,
instead inserts at the probe point if that is found.
Assuming this reading is correct, it seems like it would be better
to just check for SystemTap probe points early in enable_break, and
then prefer those to the other methods.
I did it that way to ensure we're looking for the probes in the
correct object. It uses the original code to locate _dl_debug_state,
then assumes that the object containing _dl_debug_state is the linker
and looks for the probes in that. I don't know if this is the right
solution though.

Cheers,
Gary
--
http://gbenson.net/
Jan Kratochvil
2011-06-07 17:19:45 UTC
Permalink
Post by Gary Benson
I've seen mention that the RT_CONSISTENT called too early bug was the
cause of another issue where libthread_db would not be loaded if
libpthread was loaded with dlopen rather than being linked in, but I
haven't been able to discover if this really was ever the case. If
anybody knows (or has a testcase) could you fill me in on it please.
the testcase from:
https://bugzilla.redhat.com/show_bug.cgi?id=179072#c13

works for me - that is it does not work. I haven't tried it with your
glibc+gdb, I guess it works now?


Thanks,
Jan
Gary Benson
2011-06-08 11:00:44 UTC
Permalink
Post by Jan Kratochvil
Post by Gary Benson
I've seen mention that the RT_CONSISTENT called too early bug was
the cause of another issue where libthread_db would not be loaded
if libpthread was loaded with dlopen rather than being linked in,
but I haven't been able to discover if this really was ever the
case. If anybody knows (or has a testcase) could you fill me in
on it please.
https://bugzilla.redhat.com/show_bug.cgi?id=179072#c13
works for me - that is it does not work. I haven't tried it with
your glibc+gdb, I guess it works now?
It actually doesn't fail on standard F14 gdb:

$ gdb ./testload
GNU gdb (GDB) Fedora (7.2-51.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/gary/work/archer/rh179072-tests/testload...done.
(gdb) r
Starting program: /home/gary/work/archer/rh179072-tests/testload
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7dd5700 (LWP 5584)]
[Thread 0x7ffff7dd5700 (LWP 5584) exited]

Program exited normally.

Not sure what's happening there...

Cheers,
Gary
--
http://gbenson.net/
Jan Kratochvil
2011-06-08 11:48:16 UTC
Permalink
Post by Jan Kratochvil
https://bugzilla.redhat.com/show_bug.cgi?id=179072#c13
[...]
Please do:
# prelink -u /lib64/libpthread-2.13.so

$ gdb ./testload
GNU gdb (GDB) Fedora (7.2-51.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/jkratoch/t/rh179072-tests/testload...done.
(gdb) r
Starting program: /home/jkratoch/t/rh179072-tests/testload
[Thread debugging using libthread_db enabled]
Cannot find new threads: generic error
(gdb) _


Thanks,
Jan
Gary Benson
2011-06-08 12:28:35 UTC
Permalink
Post by Jan Kratochvil
Post by Jan Kratochvil
https://bugzilla.redhat.com/show_bug.cgi?id=179072#c13
[...]
# prelink -u /lib64/libpthread-2.13.so
$ gdb ./testload
GNU gdb (GDB) Fedora (7.2-51.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/jkratoch/t/rh179072-tests/testload...done.
(gdb) r
Starting program: /home/jkratoch/t/rh179072-tests/testload
[Thread debugging using libthread_db enabled]
Cannot find new threads: generic error
(gdb) _
Ah, after unprelinking I get that failure with F14 gdb. But, it works
nicely with the new stuff:

$ ../build/gdb/gdb -args ../../glibc/install/lib/ld-2.13.90.so ./testload
GNU gdb (GDB) 7.3.50.20110510-cvs
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/gary/work/glibc/install/lib/ld-2.13.90.so...done.
(gdb) r
Starting program: /home/gary/work/glibc/install/lib/ld-2.13.90.so ./testload
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff762a700 (LWP 6969)]
[Thread 0x7ffff762a700 (LWP 6969) exited]
[Inferior 1 (process 6966) exited normally]
(gdb) _

I'm glad I have this reproduced now, it's IMO a much more compelling
argument for the fix than the somewhat arbitrary testcase the bug was
originally filed with.

Thanks!

Cheers,
Gary
--
http://gbenson.net/
Loading...