GDB: How to dump Unicode Strings (i.e. provide custom data display)
Problem
The problem you are typically facing is that you have strings of 16-bit
UCS-2/UTF-16 Unicode characters, and you would like to print them inside gdb
as you can with normal 8-bit C-style strings. The same problem applies to any
custom data type that gdb knows nothing about, but strings cause the most
trouble.
Alternative Solutions
You can try some gdb scripting, e.g. http://ooo.ximian.com/hackers-guide.html#section-5.7.2, http://www.mozilla.org/unix/debugging-faq.html#pruichar and http://david.fries.net/thoughts/printqstring.php. But these macros are
generally quite slow or display your strings in an ugly, unnatural fashion, and
they generally require the user to know the type of the value being printed.
That becomes a problem when there are many different types for which you need
a custom display to get human-readable output. For example, OpenOffice.org has
two Unicode string classes and two non-Unicode string classes. Having a
separate macro for each would be tedious, so a more flexible and extensible
approach would be nice to have.
Solution
The hook that makes this possible is that gdb can directly call functions in
your executable. You can take advantage of this by providing functions in your
executable which return a view of the data in a type for which gdb has built-in
support, i.e. a C-style string, or alternatively functions which pretty-print
the data directly to the screen. Either way, you can then just ask gdb to call
these functions when you want to inspect a value.
When your language is C++ this has the huge advantage that gdb can emulate the
C++ resolution of overloaded function names. So if you provide a set of C++
functions which share the same name, but are overloaded for each type you wish
to have a custom debug view of, then gdb can call the correct overloaded
function for you depending on the function signature. So regardless of the type
you have just the one "custom dump" function name to use.
Example
OpenOffice.org, for example, has the two non-Unicode string classes ByteString
and rtl::OString, and the two Unicode string classes UniString and
rtl::OUString. We can provide an overloaded dbg_dump function for each type in
the appropriate OpenOffice.org libraries, e.g.
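const sal_Char *dbg_dump(const ByteString &rStr)
{
    // Static copy, so the returned pointer stays valid after we return
    static ByteString aStr;
    aStr = rStr;
    aStr.Append(static_cast< char >(0));
    return aStr.GetBuffer();
}

const sal_Char *dbg_dump(const UniString &rStr)
{
    // Convert to UTF-8, then reuse the ByteString overload
    return dbg_dump(ByteString(rStr, RTL_TEXTENCODING_UTF8));
}

const sal_Char *dbg_dump(const rtl::OString &rStr)
{
    // Static copy, so the returned pointer stays valid after we return
    static rtl::OStringBuffer aStr;
    aStr = rtl::OStringBuffer(rStr);
    aStr.append(static_cast< char >(0));
    return aStr.getStr();
}

const sal_Char *dbg_dump(const rtl::OUString &rStr)
{
    // Convert to UTF-8, then reuse the rtl::OString overload
    return dbg_dump(OUStringToOString(rStr, RTL_TEXTENCODING_UTF8));
}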
Now from gdb we can use
(gdb) print dbg_dump(sWhatever)
and if sWhatever is a Unicode type then we get a UTF-8 string; otherwise we
get a straightforward dump of a copy of the non-Unicode data. Here I preferred
getting my data returned as a char pointer, so I return a pointer to a static
buffer. Simply printing to stderr/stdout is an obvious alternative.
In principle it should be possible to extend the overloading for an arbitrary
number of types for which custom printing would be useful.
dbx
This should also work from Solaris dbx with (I think)...
(dbx) print ``dbg_dump(sWhatever)
if not, check the dbx documentation to get the correct syntax and let me know.
Quirks
In practice the pseudo-overloading seems to work consistently in dbx and gdb
only when the overloaded functions are compiled with debugging symbols enabled;
otherwise things get a little flaky. For OpenOffice.org, due to size and time
constraints, it is common to enable debugging symbols only for the subset of
code you are debugging, rather than globally, which is the usual practice. So
for OpenOffice.org I place the debugging functions inside source files which
are forced to always be built with debugging enabled.
Caolan McNamara (2004) <email@example.com>