Discussion:
dwarf name canonicalization
Tom Tromey
2010-02-23 20:52:21 UTC
Keith, do you happen to know offhand what things in gdb rely on
canonicalizing C++ names in the dwarf reader?

I was discussing this canonicalization with a user who prefers to remain
anonymous. His experience is that this canonicalization greatly slows
down gdb startup (he said 15%), and in his experience isn't needed for
his use case, which is running gdb as part of an IDE.

I'm wondering whether it would make sense to somehow disable this, maybe
via some special mode for IDEs to use. I thought maybe you'd know what
would break...

My understanding is that typical IDE use cases are much more
restricted than those of CLI users. E.g., in the IDE case, most
breakpoints are set by "file:line", and expression evaluation is less
important.

Tom
Keith Seitz
2010-02-23 22:06:50 UTC
Post by Tom Tromey
Keith, do you happen to know offhand what things in gdb rely on
canonicalizing C++ names in the dwarf reader?
I have a pretty good idea. :-)
Post by Tom Tromey
I was discussing this canonicalization with a user who prefers to remain
anonymous. His experience is that this canonicalization greatly slows
down gdb startup (he said 15%), and in his experience isn't needed for
his use case, which is running gdb as part of an IDE.
[OT: I would love a test case. I *pleaded* for specific test cases.]

Anonymous obviously has evidence that his IDE can work around the
problems of generic input which we must deal with in the console. I see
no reason why anonymous shouldn't submit a patch to disable
canonicalization (and related). He'll probably want to also disable
dwarf2_physname and bring back DW_AT_MIPS_linkage_name (assuming that
doesn't disappear altogether).
Post by Tom Tromey
I'm wondering whether it would make sense to somehow disable this, maybe
via some special mode for IDEs to use. I thought maybe you'd know what
would break...
Unless the IDE provided a console that accepted generic input (like
"normal" gdb), I don't think that much would break, if anything. IDEs
mostly rely on linespecs, no? As long as you're
not sending input to gdb that looks like a function name, you should be
safe. But I cannot guarantee it. I have had no first-hand experience with
IDEs in many years.
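To make the distinction concrete: a breakpoint location of the form "file:line" (or just "line") can be resolved from the line table alone, while anything else is treated as a function name and needs symbol lookup. The C++ sketch below classifies a location string that way. The helper name is_file_line_linespec is hypothetical and the rule is deliberately simplified; gdb's real linespec parser accepts many more forms.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (simplified rule, not gdb's linespec parser):
// returns true when a location is a plain "FILE:LINE" or "LINE" linespec,
// i.e. one that needs no C++ symbol-name lookup or canonicalization.
bool is_file_line_linespec(const std::string &spec)
{
    // Look at the text after the last ':' (or the whole string if none).
    std::string::size_type colon = spec.rfind(':');
    std::string tail =
        (colon == std::string::npos) ? spec : spec.substr(colon + 1);
    if (tail.empty())
        return false;
    // A "::" scope operator means this is a qualified function name.
    if (colon != std::string::npos && colon > 0 && spec[colon - 1] == ':')
        return false;
    // The line part must consist of digits only.
    for (char c : tail)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            return false;
    return true;
}
```

With this rule, is_file_line_linespec("lib1.cpp:104") returns true, while is_file_line_linespec("ns::inner::Foo1::sum") returns false and would be the kind of input that depends on name canonicalization.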

I would much rather address (fix?) the speed problem first. The idea of
multiple paths through the code for the "same" task would seem a high
bit rot risk.

Keith
André Pönitz
2010-02-25 11:16:16 UTC
Post by Keith Seitz
[OT: I would love a test case. I *pleaded* for specific test cases.]
Yes, I remember that. Sorry. It was still on my TODO list; I just had not
found the time to create something that is easily reproducible without
too many external dependencies.

Looks like it is time to act now.

All my "real" use cases would require Qt, which is probably not acceptable
here, so let me have a shot at a contrived example that I'd consider
structurally not too far off from reality: a ~1000-function "project",
structured like this:

----------------------- lib1.h --------------------
#ifndef LIB1_H
#define LIB1_H

#include <string>
#include <vector>

#include <map>

namespace ns {
namespace inner {

struct Foo1
{
    int foo0(std::map<std::string, std::vector<std::string> > &map,
             const std::string &index, const std::string &x);
    // [...]
    int foo25(std::map<std::string, std::vector<std::string> > &map,
              const std::string &index, const std::string &x);
    int sum();
};
[...]

----------------------- lib1.cpp --------------------
[...]
int Foo1::foo25(std::map<std::string, std::vector<std::string> > &map,
                const std::string &index, const std::string &x)
{
    return map[index].size() < x.size();
}

int Foo1::sum()
{
    int t = 0;
    std::map<std::string, std::vector<std::string> > m;
    m["key 0"].push_back("value 0");
    t += foo0(m, "key 1", "xxx");
    [...]
    return t;
}

----------------------- main.cpp --------------------
#include "lib1.h"
[...]

using namespace ns::inner;

int main()
{
    int s = 0;
    s += Foo0().sum();
    s += Foo1().sum();
    [...]
    return s;
}


I'll attach a perl script generating the code. Don't look at the actual code
too closely; it really does not matter. A quick test also indicates that neither
the number of files nor the number of functions makes a difference to the time ratio.

With 7.0.90, gdb spends 15.48% of its instructions in dwarf2_canonicalize_name
and the functions called from there; with 7.0.1 it is only 0.04%.

Total instruction count is 429,137,527 vs 516,590,964.
Both versions of gdb are compiled with -O2 -g using gcc 4.4.1.

I certainly understand that instruction count does not necessarily mean
much, but it is fairly reproducible, and in this case it does indeed correlate
with wall-clock times.
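As a rough consistency check of these figures, assuming (the post does not say so explicitly) that the larger total belongs to 7.0.90: removing the 15.48% canonicalization share from 516,590,964 leaves about 436.6 million instructions, close to the 429.1 million of 7.0.1, and 7.0.90 executes roughly 20% more instructions overall. A minimal C++ version of that arithmetic, with made-up constant and function names:

```cpp
// Figures quoted above. Assumption: the larger total is 7.0.90's,
// the release that performs the extra canonicalization work.
constexpr double kInsns_7_0_1 = 429137527.0;
constexpr double kInsns_7_0_90 = 516590964.0;
// Fraction of 7.0.90's instructions spent in dwarf2_canonicalize_name
// and its callees.
constexpr double kCanonShare = 0.1548;

// Overall instruction-count ratio of 7.0.90 to 7.0.1 (about 1.20).
double slowdown_ratio() { return kInsns_7_0_90 / kInsns_7_0_1; }

// 7.0.90's total with the canonicalization share removed (about 436.6e6).
double insns_without_canon() { return kInsns_7_0_90 * (1.0 - kCanonShare); }
```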

Note that the number will get _much_ worse when it comes to "modern"
C++ like code using template expressions or even
Post by Keith Seitz
[...] Unless the IDE provided a console that accepted generic input (like
"normal" gdb), I don't think that much would break, if anything. IDEs
mostly rely on linespecs, no? As long as you're
not sending input to gdb that looks like a function name, you should be
safe. But I cannot guarantee it. I have had no first-hand experience with
IDEs in many years.
From my point of view, it is a safe assumption that most if not all IDE users
would prefer a 15% startup-time gain over improved parsing of function
names, especially since they are very unlikely ever to use it anyway.

However, it looks like it does not even have to be an either-or here. If
canonicalization were made optional using, say, some 'maint set'
switch, a user could make his own choice, and an IDE could even apply
some "cleverness", like switching canonicalization off at the beginning
and reloading with canonicalization as soon as the user triggers an operation
that needs it. Or it could even retrieve a list of uncanonicalized
symbols and match user input against that before bothering gdb with it.
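The "off at startup, on when needed" idea amounts to deferring canonicalization until the first lookup by name. The C++ sketch below shows that deferral pattern under stated assumptions: LazySymbolTable and toy_canonicalize are hypothetical names, and toy_canonicalize is a toy whitespace normalizer standing in for a real C++ name canonicalizer; it is not how dwarf2_canonicalize_name works.

```cpp
#include <cctype>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy normalizer (NOT gdb's algorithm): drops whitespace except between
// two identifier characters, and writes exactly one space after each comma.
std::string toy_canonicalize(const std::string &name)
{
    auto is_ident = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    std::string out;
    bool pending_space = false;
    for (char c : name) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            pending_space = true;  // remember whitespace, decide later
            continue;
        }
        // Keep one space only where dropping it would merge two words,
        // e.g. "unsigned int" or "const std::string".
        if (pending_space && !out.empty() && is_ident(out.back()) && is_ident(c))
            out += ' ';
        pending_space = false;
        out += c;
        if (c == ',')
            out += ' ';
    }
    return out;
}

// Hypothetical symbol table: raw names are recorded cheaply at load time,
// and the canonical-name index is built only on the first lookup by name.
class LazySymbolTable {
public:
    void add(std::string raw_name)
    {
        if (index_built_)  // keep the index current once it exists
            index_[toy_canonicalize(raw_name)].push_back(raw_name);
        raw_.push_back(std::move(raw_name));
    }

    // Returns the raw names whose canonical form matches the query's.
    std::vector<std::string> lookup(const std::string &query)
    {
        if (!index_built_)
            build_index();  // the expensive step, paid only on demand
        auto it = index_.find(toy_canonicalize(query));
        return it == index_.end() ? std::vector<std::string>{} : it->second;
    }

    bool index_built() const { return index_built_; }

private:
    void build_index()
    {
        for (const std::string &raw : raw_)
            index_[toy_canonicalize(raw)].push_back(raw);
        index_built_ = true;
    }

    std::vector<std::string> raw_;
    std::unordered_map<std::string, std::vector<std::string>> index_;
    bool index_built_ = false;
};
```

A session that only ever sets "file:line" breakpoints never calls lookup(), so it never pays for build_index(); the first name-based query pays the whole cost once.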
Post by Keith Seitz
I would much rather address (fix?) the speed problem first. The idea of
multiple paths through the code for the "same" task would seem a high
bit rot risk.
I am not sure this will solve the problem. Even if you were able to speed up
canonicalization by, say, 30%, it would still impact startup times by 10%,
unconditionally, whether or not the result is ever needed. And 10% is
highly visible when the total time is in the "several dozen seconds" range.
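The 10% estimate can be checked with a little arithmetic, assuming only the canonicalization code gets faster and all other startup work stays the same: the remaining canonicalization cost is 0.7 × 15.48% ≈ 10.8% of the old total, which is about 11.4% of the new, smaller total. A small C++ sketch (the function name remaining_share is made up for illustration):

```cpp
// Fraction of total startup work still spent canonicalizing after that
// code becomes `speedup` cheaper (0.30 means 30% cheaper), assuming all
// other work is unchanged. `share` is the measured fraction before the
// speedup (0.1548 for 7.0.90 in the measurement above).
double remaining_share(double share, double speedup)
{
    double canon = share * (1.0 - speedup);  // canonicalization cost after the speedup
    double rest = 1.0 - share;               // all other startup work
    return canon / (canon + rest);           // share of the new, smaller total
}
```

For example, remaining_share(0.1548, 0.30) is about 0.114, so a 30% faster canonicalizer would still account for roughly a ninth of startup work.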

Andre'
Tom Tromey
2010-02-25 19:37:51 UTC
André> All my "real" use cases would require Qt which is probably not
André> acceptable here

I wanted to reply to this quickly, so I haven't actually read the rest
of your note yet :)

Anything in Fedora is fine as a test case for this kind of thing. It is
simple for us to install the needed packages and debuginfo.

Tom
