2010/05/14

C++ RAII adapter for Xerces

Posted in Code, CodeProject at 13:22 by Orjan

Xerces is a powerful validating XML parser that supports both DOM and SAX. It’s written in a simple subset of C++, and designed to be portable across the greatest possible number of platforms. For a number of reasons, the strings used in Xerces are zero-terminated 16-bit integer arrays, and data tends to be passed around by pointers. The responsibility for managing the lifetime of the DOM data passed around is usually Xerces’, but not always. Some types must always be released explicitly, while for others, this is optional.

In other words, this is a job for the RAII idiom. Alas, we can’t reach for our boost::shared_ptr[1] or std::auto_ptr, since Xerces has its own memory manager, and when Xerces creates an object for you, it is not guaranteed to be safe to simply call delete. Instead, you must call the object’s release() function.

Something like this would probably do the job:

class auto_xerces_ptr
{
    DOMNode* item_;
public:
    auto_xerces_ptr(DOMNode* i)
    : item_(i)
    {}
    ~auto_xerces_ptr()
    {
        item_->release();
    }
    DOMNode* get()
    {
        return item_;
    }
};

// Set up a parser
...
// Use wrapper for types that must be released
auto_xerces_ptr domDocument(parser->adoptDocument());
// Use wrapped object
domDocument.get()->getDocumentElement();
...
// We don't need to remember to call release - it's automatic


However, while DOMNode is the base class for all the classes that need to be released, most of the classes derived from it do not need to be released explicitly. (See the Xerces documentation for the full list.) While they usually can be released without ill effects, it’s probably safest to avoid releasing objects that are already looked after elsewhere. Basically, if the object has an owner, we should leave it alone. So let’s amend that destructor a bit, and add some extra safety and helpfulness.

    ~auto_xerces_ptr()
    {
        xerces_release();
    }
    void xerces_release()
    {
        if ((0 != item_) && (0 == item_->getOwnerDocument()))
        {
            item_->release();
            item_ = 0;
        }
    }
    DOMNode* yield()
    {
        DOMNode* temp = item_;
        item_ = 0;
        return temp;
    }


As you see, I’ve made a function to explicitly release, should you wish to do so, with some sanity checking, and a function to give up the held pointer. Because nomenclature can never be simple and common, I’ve chosen to call the releasing function xerces_release() rather than simply release(), because the std::auto_ptr, which is a quite well known RAII utility class, also has a function called release(). In that case, however, it doesn’t release the memory safely, like Xerces does, but its hold of the data, like my function yield() above. Without looking at the actual implementation, someone seeing an auto_xerces_ptr::release() function being called in the code might think it does a Xerces DOMNode::release(), or that it does the equivalent of std::auto_ptr::release(). Rather than risk that sort of confusion, I’ve opted for the verbose.
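
To make the difference between the two operations concrete, here is a minimal sketch. FakeNode and auto_fake_ptr are hypothetical stand-ins, not Xerces types: FakeNode only counts calls to release(), so the snippet builds without Xerces, and the wrapper body is the non-template version shown above.

```cpp
#include <cassert>

// Hypothetical stand-in for a Xerces DOMNode: it only counts release() calls.
struct FakeNode
{
    FakeNode* owner_;     // what getOwnerDocument() will report
    int release_calls;    // number of times release() was called

    explicit FakeNode(FakeNode* owner = 0) : owner_(owner), release_calls(0) {}
    FakeNode* getOwnerDocument() const { return owner_; }
    void release() { ++release_calls; }
};

// The non-template wrapper from the text, holding a FakeNode instead of a DOMNode.
class auto_fake_ptr
{
    FakeNode* item_;
    auto_fake_ptr(const auto_fake_ptr&);            // not copyable
    auto_fake_ptr& operator=(const auto_fake_ptr&);
public:
    explicit auto_fake_ptr(FakeNode* i) : item_(i) {}
    ~auto_fake_ptr() { xerces_release(); }

    void xerces_release()
    {
        if ((0 != item_) && (0 == item_->getOwnerDocument()))
        {
            item_->release();
            item_ = 0;
        }
    }
    FakeNode* yield()
    {
        FakeNode* temp = item_;
        item_ = 0;
        return temp;
    }
    FakeNode* get() { return item_; }
};

// Guard goes out of scope while holding an ownerless node: released once.
int releases_when_guarded()
{
    FakeNode node;
    {
        auto_fake_ptr guard(&node);
    }
    return node.release_calls;
}

// yield() hands the pointer back, so the destructor has nothing to release.
int releases_after_yield()
{
    FakeNode node;
    {
        auto_fake_ptr guard(&node);
        FakeNode* raw = guard.yield();
        assert(raw == &node && guard.get() == 0);
    }
    return node.release_calls;
}

// A node that reports an owner document is left alone.
int releases_when_owned()
{
    FakeNode document;
    FakeNode child(&document);
    {
        auto_fake_ptr guard(&child);
    }
    return child.release_calls;
}
```

After yield(), whoever received the raw pointer is responsible for it again, which is exactly what std::auto_ptr::release() means and why the two names are kept apart.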

Now, that’s all fine and dandy, but doesn’t help with the biggest Xerces memory leaker – the strings. The Xerces type XMLCh is a UTF-16 character, and there is a helpful class – XMLString – to help you convert between XMLCh* and other formats, particularly char*, and copy these strings. We don’t have to worry about any strings we have given to a Xerces object, since these are well managed internally. However, we must be wary when making copies, with the XMLString::replicate and XMLString::transcode functions, as they create strings we are responsible for, and which we must release with a call to the XMLString::release function.

// We have an XML element like this: <bob>an apple</bob>
...
// We don't need to worry about this, it's owned by the node
const XMLCh* s1 = pNode1->getNodeValue(); // s1 = "an apple" 
// But it points to the actual node value, so we make our own copy
XMLCh* s2 = XMLString::replicate(s1);
// Do things with our copy
...
// Must remember to release the copied string when done with it
// (release takes the address of the pointer and sets it to null)
XMLString::release(&s2);

// Convert into a format the rest of the system can deal with
char* s3 = XMLString::transcode(s1);
// Do things with our transcoded copy
...
// Must remember to release the copied string when done with it
XMLString::release(&s3);

// It's easy to forget, though, and to write concise code...
std::string s4 = XMLString::transcode(s1);
// That memory is instantly leaked!


Takes you back, doesn’t it? Just like the olden days, before std::string (and TString, and CString and …) when strings were pure C like K&R intended. [shudder]

So, that’s just another couple of classes to write, right? One to manage XMLCh* and one to manage char*. Let’s call them auto_xerces_XMLCH_ptr and auto_xerces_char_ptr… No, scrap that, that’s bad design. Instead, let’s extend the auto_xerces_ptr to handle multiple types. In other words, let’s make it a template class:

template <typename T>
class auto_xerces_ptr
{
    T* item_;
public:
    auto_xerces_ptr(T* i)
    : item_(i)
    {}
    ~auto_xerces_ptr()
    {
        item_->release();
...


Hang on, that won’t work; there’s no release() member function for char. If the data type is an XMLCh or char, we must call XMLString::release, otherwise we should call the data object’s member function. Can we have an internal releasing function – let’s call it do_release – and overload it? Well, not quite:

template <typename T>
class auto_xerces_ptr
{
    void do_release(T* i)
...
    void do_release(char* i) // Possible compilation error!


Here, the compiler will complain that for an auto_xerces_ptr<char> there are two definitions of void do_release(char* i). However, you can achieve the desired functionality through template specialisation, where you tell the compiler that for a certain template type, it should use a specialised function (or class, in the case of class templates) rather than the generic one.

template <typename T>
class auto_xerces_ptr
{
    // Hide copy constructor and assignment operator
    auto_xerces_ptr(const auto_xerces_ptr&);
    auto_xerces_ptr& operator=(const auto_xerces_ptr&);

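    // Function to release Xerces data type. The member template
    // needs its own parameter name; reusing T would shadow the
    // class template's T.
    template <typename U>
    static void do_release(U*& item)
    {
        // Only release this if it has no parent (otherwise
        // parent will release it)
        if (0 == item->getOwnerDocument())
            item->release();
    }

    // Specializations for character types, which need to be
    // released by XMLString::release. Note: explicit specialization
    // inside the class body is accepted by Visual C++ but rejected
    // by some other compilers, GCC among them.
    template <>
    static void do_release(char*& item)
    {
        XMLString::release(&item);
    }

    template <>
    static void do_release(XMLCh*& item)
    {
        XMLString::release(&item);
    }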
    // The actual data we're holding
    T* item_;

public:
    auto_xerces_ptr()
        : item_(0)
    {}

    explicit auto_xerces_ptr(T* i)
        : item_(i)
    {}

    ~auto_xerces_ptr()
    {
        xerces_release();
    }

    // Assignment of data to guard (not chainable)
    void operator=(T* i)
    {
        assign(i);
    }

    // Release held data (i.e. delete/free it)
    void xerces_release()
    {
        if (!is_released())
        {
            // Use type-specific release mechanism
            do_release(item_);
            item_ = 0;
        }
    }

    // Give up held data (i.e. return data without releasing)
    T* yield()
    {
        T* tempItem = item_;
        item_ = 0;
        return tempItem;
    }

    // Release currently held data, if any, to hold another
    void assign(T* i)
    {
        xerces_release();
        item_ = i;
    }

    // Get pointer to the currently held data, if any
    T* get()
    {
        return item_;
    }

    // Return true if no data is held
    bool is_released() const
    {
        return (0 == item_);
    }
};

// Use wrapper for types that must be released
auto_xerces_ptr<DOMDocument> domDocument(parser->adoptDocument());
...

const XMLCh* s1 = pNode1->getNodeValue(); // s1 = "an apple" 
// Make our own copy
auto_xerces_ptr<XMLCh> s2(XMLString::replicate(s1));
// Do things with our copy, without worrying about releasing
...

// Convert into a format the rest of the system can deal with
auto_xerces_ptr<char> s3(XMLString::transcode(s1));
// Do things with our transcoded copy, no worries
...

std::string s4 = auto_xerces_ptr<char>(XMLString::transcode(s1)).get();
// That memory is now released at the end of the statement, once the std::string has made its own copy
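
The class above writes its explicit specialisations inside the class body, which not every compiler accepts. One possible portable alternative, sketched here and not tested against Xerces itself, is to give the generic do_release its own template parameter and add plain overloads for the string types; overload resolution prefers a non-template function over a template when both match equally well. FakeNode, FakeXMLCh and FakeXMLString are hypothetical stand-ins so that the snippet builds without Xerces; with the real library they would be a DOMNode-derived type, XMLCh and XMLString.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// --- Hypothetical stand-ins, NOT Xerces types ---
typedef unsigned short FakeXMLCh;          // plays the part of XMLCh

struct FakeNode                            // plays the part of a DOM node
{
    int release_calls;
    FakeNode() : release_calls(0) {}
    FakeNode* getOwnerDocument() const { return 0; }   // reports no owner
    void release() { ++release_calls; }
};

struct FakeXMLString                       // plays the part of XMLString
{
    static int& live()                     // copies handed out, not yet released
    {
        static int count = 0;
        return count;
    }
    static char* replicate(const char* text)
    {
        char* copy = new char[std::strlen(text) + 1];
        std::strcpy(copy, text);
        ++live();
        return copy;
    }
    static void release(char** buf)
    {
        assert(buf != 0 && *buf != 0);
        delete[] *buf; *buf = 0; --live();
    }
    static void release(FakeXMLCh** buf)
    {
        assert(buf != 0 && *buf != 0);
        delete[] *buf; *buf = 0; --live();
    }
};

// --- Portable variant of the wrapper: overloads, no specialisations ---
template <typename T>
class auto_xerces_ptr
{
    auto_xerces_ptr(const auto_xerces_ptr&);            // not copyable
    auto_xerces_ptr& operator=(const auto_xerces_ptr&);

    // Generic case: anything with getOwnerDocument() and release()
    template <typename U>
    static void do_release(U*& item)
    {
        if (0 == item->getOwnerDocument())
            item->release();
    }
    // String cases: ordinary overloads, chosen ahead of the template
    static void do_release(char*& item)      { FakeXMLString::release(&item); }
    static void do_release(FakeXMLCh*& item) { FakeXMLString::release(&item); }

    T* item_;
public:
    explicit auto_xerces_ptr(T* i = 0) : item_(i) {}
    ~auto_xerces_ptr() { xerces_release(); }

    void xerces_release()
    {
        if (0 != item_)
        {
            do_release(item_);
            item_ = 0;
        }
    }
    T* get() { return item_; }
};

// The temporary guard releases the copy once the std::string has been built.
std::string guarded_copy(const char* text)
{
    return std::string(auto_xerces_ptr<char>(FakeXMLString::replicate(text)).get());
}

// A guarded, ownerless node is released exactly once.
int guarded_node_releases()
{
    FakeNode node;
    {
        auto_xerces_ptr<FakeNode> guard(&node);
    }
    return node.release_calls;
}
```

Because the generic function is only instantiated for the types it is actually selected for, auto_xerces_ptr<char> never tries to compile a call to char's non-existent release() member.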


There it is, code completed. We don’t even have to worry about accidentally using it to wrap a string that is pointing to element data, since those are given as const XMLCh*, and the compiler will complain that there is no constructor for auto_xerces_ptr that takes a const pointer. Take it for a spin and see if it’s useful for you, and let me know what you think.

[1] Now also available as tr1::shared_ptr, and soon (at the time of writing) as std::shared_ptr.


4 Comments »

  1. Hm, useful knowledge about Xerces. But you can just wrap a boost::shared_ptr instead of inventing things from scratch (this also gives you sharing). If you want to do that, simply specify a custom deleter to the boost::shared_ptr; that’s what it’s for.

    However, this currently precludes using boost::make_shared, which doesn’t support a custom deleter. I.e., it means an extra technically needless allocation internally in the boost::shared_ptr, the same as before boost::make_shared was introduced. On the other hand, people were generally prepared to pay that price for the convenience of boost::shared_ptr, before boost::make_shared.

    I can think of some workarounds but as Fermat wrote, this margin is too small… :-)

    Cheers,

    – Alf

    • Orjan said,

      Thanks, I had completely forgot about the custom deleter facility. Unfortunately, the need for two deleters (one for DOMNode, one for string types) complicates matters somewhat, but I’ll have a little play around to try to put together a “shared_xerces_ptr”. I haven’t had a need for sharing, but I can see where it would be very handy.

      Since Xerces handles all allocation of the types it supports, I’m afraid make_shared wouldn’t be useful anyway.

      • Yeah, my comments about make_shared were pretty, uh, stupid. I tried to find this again to add comment to the comment, like “disregard the last half”, but couldn’t find it. But now that you commented I got a mail about that with the URL. :-)

        Anyway, I’ve downloaded Xerces and started playing with it.

        Cheers,

        – Alf

  2. [...] in Code tagged C++, template, XML at 11:15 by Orjan In my previous entry, C++ RAII adapter for Xerces, I presented a simple memory management wrapper for Xerces types. Because of the way Xerces manages [...]

