
GDB: How to dump Unicode Strings (i.e. provide custom data display)

Problem

The problem is that you have strings of 16-bit UCS-2/UTF-16 Unicode characters, and you would like to print them inside gdb as easily as ordinary 8-bit C-style strings. The same problem applies to any custom data type that gdb knows nothing about, but strings are where it causes the most trouble.

Alternative Solutions

You can try some gdb scripting, e.g. http://ooo.ximian.com/hackers-guide.html#section-5.7.2, http://www.mozilla.org/unix/debugging-faq.html#pruichar and http://david.fries.net/thoughts/printqstring.php. But these macros are generally quite slow or display your strings in an ugly, unnatural fashion, and they generally require the user to know the type of the value being printed. That becomes a problem when several different types each need a custom display to give human-readable output. For example, OpenOffice.org has two Unicode string classes and two non-Unicode string classes. Having a separate macro for each would be tedious, so a more flexible and extensible approach would be nice to have.

Solution

The hook is that gdb can directly call functions in your executable. You can take advantage of this by providing functions in your executable that return a view of the data in a type for which gdb has built-in support, i.e. a C-style string, or alternatively functions that pretty-print the data directly to the screen. Either way, you then just ask gdb to call these functions when you want to inspect a value.

When your language is C++, this has the huge advantage that gdb can emulate C++ overload resolution. If you provide a set of C++ functions that share the same name but are overloaded for each type you want a custom debug view of, gdb can pick the correct overload for you based on the type of the argument. So regardless of the type, you have just the one "custom dump" function name to use.

Example

For example, OpenOffice.org has the two non-Unicode string classes ByteString and rtl::OString, and the two Unicode string classes UniString and rtl::OUString. We can provide an overloaded dbg_dump function for each type in the appropriate OpenOffice.org libraries, e.g.

 //libtools
 const sal_Char *dbg_dump(const ByteString &rStr)
 {
     // Static copy, so the returned pointer outlives this call
     static ByteString aStr;
     aStr = rStr;
     // Explicitly append a terminating NUL
     aStr.Append(static_cast< char >(0));
     return aStr.GetBuffer();
 }

 //libtools
 const sal_Char *dbg_dump(const UniString &rStr)
 {
     return dbg_dump(ByteString(rStr, RTL_TEXTENCODING_UTF8));
 }

 //libsal
 const sal_Char *dbg_dump(const rtl::OString &rStr)
 {
     // Static copy, so the returned pointer outlives this call
     static rtl::OStringBuffer aStr;
     aStr = rtl::OStringBuffer(rStr);
     // Explicitly append a terminating NUL
     aStr.append(static_cast< char >(0));
     return aStr.getStr();
 }

 //libsal
 const sal_Char *dbg_dump(const rtl::OUString &rStr)
 {
     return dbg_dump(OUStringToOString(rStr, RTL_TEXTENCODING_UTF8));
 }

Now from gdb we can use

 (gdb) print dbg_dump(sWhatever) 

and if sWhatever is a Unicode type we get a UTF-8 string; otherwise we get a straightforward dump of a copy of the non-Unicode data. Here I preferred getting my data returned as a char pointer, so I return a pointer to a static buffer, which means each call overwrites the previous result. Simply printing to stderr/stdout is an obvious alternative.

In principle it should be possible to extend the overloading for an arbitrary number of types for which custom printing would be useful.

dbx

This should also work from Solaris dbx with (I think)...

 (dbx) print ``dbg_dump(sWhatever)

if not, check the dbx documentation to get the correct syntax and let me know.

Quirks

In practice the pseudo-overloading seems to work consistently in dbx and gdb only when the overloaded functions are compiled with debugging symbols enabled; otherwise things get a little flaky. For OpenOffice.org, due to size and time constraints, it is common to enable debugging symbols only for the subset of code you are debugging, rather than globally, which is the norm elsewhere. So for OpenOffice.org I place the debugging functions inside source files which are forced to always be built with debugging enabled.

Links

Caolan McNamara (2004) <caolan@skynet.ie>

Last generated at Sat Nov 2 12:10:06 2013 Caolán McNamara <caolan@skynet.ie> Created with WebMake/0.5