<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Cool Cow Studio</title>
	<atom:link href="http://coolcowstudio.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://coolcowstudio.wordpress.com</link>
	<description>Working code from a working coder</description>
	<lastBuildDate>Tue, 26 Feb 2013 11:28:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='coolcowstudio.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Cool Cow Studio</title>
		<link>http://coolcowstudio.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://coolcowstudio.wordpress.com/osd.xml" title="Cool Cow Studio" />
	<atom:link rel='hub' href='http://coolcowstudio.wordpress.com/?pushpress=hub'/>
		<item>
		<title>A synchronous observer of asynchronous events</title>
		<link>http://coolcowstudio.wordpress.com/2012/11/09/a-synchronous-observer-of-asynchronous-events/</link>
		<comments>http://coolcowstudio.wordpress.com/2012/11/09/a-synchronous-observer-of-asynchronous-events/#comments</comments>
		<pubDate>Fri, 09 Nov 2012 15:50:59 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Multithreading]]></category>
		<category><![CDATA[Win32]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/2012/11/09/a-synchronous-observer-of-asynchronous-events/</guid>
		<description><![CDATA[Introduction In the Observer design pattern, a subject holds a list of interested parties &#8211; the observers &#8211; which it will notify about changes in status. Simply put, it&#8217;s a form of subscription, and this design comes up in all sorts of places (which is one of the definitions of the term &#8216;design pattern&#8216;). It&#8217;s [...]]]></description>
				<content:encoded><![CDATA[<h2>Introduction</h2>
<p>In the <a href="http://en.wikipedia.org/wiki/Observer_pattern">Observer design pattern</a>, a subject holds a list of interested parties &#8211; the observers &#8211; which it will notify about changes in status. Simply put, it&#8217;s a form of subscription, and this design comes up in all sorts of places (which is one of the definitions of the term &#8216;<a href="http://en.wikipedia.org/wiki/Design_pattern_%28computer_science%29">design pattern</a>&#8216;). It&#8217;s well suited for handling asynchronous events, like user interaction in a GUI, sensor information, and so on.</p>
<p>There is, however, often a need to re-synchronise asynchronous events. For instance, you might keep the latest status update until it&#8217;s actually needed for display, storage or some calculation. By doing this, you disregard the asynchronous nature of its source, and treat it as just another variable, as if it had been read from the subject right then. In other words, you synchronise a status from the past with the present. Sometimes, though, you don&#8217;t want the <i>last</i> value, but the <i>next</i>, which is a bit more complex, as it requires you to wait for the future to happen before you can treat it as the present.</p>
<p>In this article, we will write a simple multi-threaded example implementation of the Observer pattern, and show how to re-synchronise a past event to look current. Then we&#8217;ll demonstrate a technique to treat future events like they&#8217;re current, too.</p>
<p><span id="more-142"></span></p>
<h2>Thread safety</h2>
<p>If, as is often the case, the observed subject and the part of the system that manages the observers run in different threads, we need to make sure that they can co-exist in a friendly manner. Specifically, we must make sure that adding or removing an observer &#8211; activities that take place outside the observed thread &#8211; does not interfere with the reporting of events.</p>
<p>In other words, accessing the list of observers is a <i><a href="http://en.wikipedia.org/wiki/Critical_section">critical section</a></i> of the code, which must not be executed by more than one thread at a time. Since this is quite a common situation, operating systems that support multi-threading also provide tools to handle it. On Windows, this is done with a <code>CRITICAL_SECTION</code> object.</p>
<p>(If you use MFC, there is an <a href="http://msdn.microsoft.com/en-us/library/h5zew56b%28v=vs.80%29.aspx">eponymous wrapper</a> for it. However, the implementations of the MFC synchronisation objects have been badly <a href="http://www.flounder.com/avoid_mfc_syncrhonization.htm">flawed in the past</a>, and I believe they still are.)</p>
<p>Checking the <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms682530%28v=vs.85%29.aspx">documentation</a>, we see that there are functions available to create and destroy a <code>CRITICAL_SECTION</code>, and to enter and leave a locked state. We also see that it can be entered and left recursively, as long as it&#8217;s by the same thread. Knowing all this, we can write a C++ class to manage it.</p>
<p>Why not use the Windows API directly? For the same reason that almost all C++ wrappers of OS objects exist &#8211; lifetime management. Instead of having to remember to call <code>DeleteCriticalSection</code> everywhere it might be needed, we can do that in the destructor, as per the <a href="http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization">Resource Acquisition Is Initialization</a> idiom.</p>
<pre class="brush: cpp;">
// Need CRITICAL_SECTION declaration
#include &lt;Windows.h&gt;

// Class wrapping Win32 CRITICAL_SECTION
class CriticalSection
{
public:
  // Constructor
  CriticalSection()
  { 
    ::InitializeCriticalSection(&amp;cs_); 
  }

  // Destructor
  ~CriticalSection()
  { 
    ::DeleteCriticalSection(&amp;cs_); 
  }
  
  // Enter critical section
  void Enter()
  { 
    ::EnterCriticalSection(&amp;cs_); 
  }

  // Leave critical section
  void Leave()
  { 
    ::LeaveCriticalSection(&amp;cs_);
  }

private:
  // Hide copy operations
  CriticalSection(const CriticalSection&amp;);
  CriticalSection&amp; operator=(const CriticalSection&amp;);
  
  // Data member
  CRITICAL_SECTION cs_;
};
</pre>
<p>Quite simple, really, with little overhead. And because every <code>Enter</code> must be matched by a <code>Leave</code>, the sensible thing to do is to write a RAII wrapper for that, too. If we don&#8217;t, odds are that at some point we&#8217;ll alter the code using the <code>CriticalSection</code> and introduce a new exit point, via a function return or an exception, that skips the call to <code>Leave</code>. A RAII wrapper makes the code more robust.</p>
<pre class="brush: cpp;">
// RAII Critical section lock
class CSLock
{
public:
  // Constructor
  CSLock(CriticalSection&amp; section)
  : section_(section) 
  { 
    section_.Enter(); 
  }
  
  // Destructor
  ~CSLock()
  { 
    section_.Leave(); 
  }
    
private:
  // Hide copy operations
  CSLock(const CSLock&amp;);
  CSLock&amp; operator=(const CSLock&amp;);
    
  // Data member
  CriticalSection&amp; section_;
};
</pre>
<p>This automates the locking: we only need to declare a <code>CSLock</code> at the beginning of the scope we wish to protect, and the section is unlocked automatically when we leave the scope and the destructor is called. We&#8217;ll see examples of how these are used below.</p>
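<p>If a C++11 compiler is available, the standard library provides a comparable pair of tools: <code>std::recursive_mutex</code>, which like <code>CRITICAL_SECTION</code> can be re-entered by the thread that already holds it, and <code>std::lock_guard</code>, which plays the role of <code>CSLock</code>. The following is only a sketch of the same idea under that assumption; the <code>Readings</code> class and its member names are made up for illustration and aren&#8217;t used elsewhere in this article.</p>
<pre class="brush: cpp;">
#include &lt;cstddef&gt;
#include &lt;mutex&gt;
#include &lt;stdexcept&gt;
#include &lt;vector&gt;

// Hypothetical example class: a list of readings shared between threads
class Readings
{
  // Recursive, so the owning thread may lock it again (like CRITICAL_SECTION)
  mutable std::recursive_mutex mutex_;
  std::vector&lt;int&gt; values_;

public:
  // Add a reading; negative values are rejected with an exception
  void Add(int value)
  {
    std::lock_guard&lt;std::recursive_mutex&gt; lock(mutex_);
    if (value &lt; 0)
    {
      // Early exit: the guard's destructor still releases the lock
      throw std::invalid_argument("negative reading");
    }
    values_.push_back(value);
  }

  // Add two readings without another thread getting in between;
  // Add() re-enters the lock this thread already holds
  void AddPair(int first, int second)
  {
    std::lock_guard&lt;std::recursive_mutex&gt; lock(mutex_);
    Add(first);
    Add(second);
  }

  // Number of readings stored so far
  std::size_t Count() const
  {
    std::lock_guard&lt;std::recursive_mutex&gt; lock(mutex_);
    return values_.size();
  }
};
</pre>
<p>Note that the early exit through the exception in <code>Add</code> needs no special handling: the guard&#8217;s destructor releases the lock on every path out of the function, which is exactly what <code>CSLock</code> does for <code>CriticalSection</code>.</p>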
<h2>Something to see</h2>
<p>Now that we have the tools to support a multi-threaded application, let&#8217;s write a simple Observer system. This requires something to observe, and something to do the observing. For this example, we&#8217;ll make a <code>Subject</code> class, which declares an internal, abstract <code>Subject::Observer</code> class, from which we&#8217;ll derive our observers. The <code>Subject</code> notification in this example will send an integer to the observers.</p>
<pre class="brush: cpp;">
#include &lt;set&gt;
#include "CSLock.h"

// Subject to be observed
class Subject
{
public:
  // Abstract observer
  class Observer
  {
    // The subject we're observing
    Subject* subject_;
    // The subject is a friend, so it can help manage our relationship
    friend class Subject;

    // Assign a subject we've been registered with
    void SetSubject(Subject* subject)
    {
      // Is it the one we have already?
      if (subject != subject_)
      {
        // Unregister from current subject, if any
        if (0 != subject_)
        {
          subject_-&gt;Unregister(*this);
        }
        // Remember new subject
        subject_ = subject;
      }
    }

    // Kicked out by the subject
    void ReleaseFromSubject(Subject* subject)
    {
      if (subject == subject_)
      {
        subject_ = 0;
      }
    }
  protected:
    // Constructor only available to derived classes
    Observer()
    : subject_(0)
    {}

    // Derived classes decide what to do with notifications
    // (Still available to Subject class, as it's a friend)
    virtual void Notify(int) = 0;

  public:
    // Base classes need virtual destructor
    virtual ~Observer()
    {
      SetSubject(0);
    }
  };
  // End of internal class definition

  // Destructor
  ~Subject()
  {
    // Lock, as we're using the set of observers
    CSLock lock(criticalsection_);
    // Release all observers
    for (std::set&lt;Observer*&gt;::iterator i = observers_.begin(); 
         i != observers_.end(); ++i)
    {
      (*i)-&gt;ReleaseFromSubject(this);
    }
  }

  // Add observer
  void Register(Observer&amp; observer)
  {
    // Lock, as we're manipulating the set of observers
    CSLock lock(criticalsection_);
    // Add to the set of observers
    observers_.insert(&amp;observer);
    // Let it know we've accepted the registration
    observer.SetSubject(this);
  }
  
  // Remove observer
  void Unregister(Observer&amp; observer)
  {
    // Lock, as we're manipulating the set of observers
    CSLock lock(criticalsection_);
    // Remove from the set of observers
    observers_.erase(&amp;observer);
    // Let it know we've accepted the unregistration
    observer.ReleaseFromSubject(this);
  }

  // Notify observers about new data
  void Notify(int val) const
  {
    // Lock, as we're using the set of observers
    CSLock lock(criticalsection_);
    // Notify all
    for (std::set&lt;Observer*&gt;::const_iterator i = observers_.begin(); 
         i != observers_.end(); ++i)
    {
      (*i)-&gt;Notify(val);
    }
  }

  // Check how many observers we have
  size_t ObserverCount() const
  {
    // Lock, as another thread may be changing the set of observers
    CSLock lock(criticalsection_);
    return observers_.size();
  }

private:
  // The registered observers
  std::set&lt;Observer*&gt; observers_;
  // A critical section to guard the observers
  mutable CriticalSection criticalsection_;
};
</pre>
<p>The first thing to note here is that the <code>Subject</code> and <code>Observer</code> are tightly coupled, which somewhat paradoxically is to help de-couple the derived observers from the <code>Subject</code>. The logic and responsibility of maintaining the relationship is kept private, thanks to the friendship between the <code>Subject</code> and <code>Observer</code>, so that derived classes can&#8217;t affect it. This tight coupling is also the reason for making the <code>Observer</code> an internal class, to emphasise this is not any old observer, but one for this particular <code>Subject</code>.</p>
<p>Another thing worth noting is that just as the <code>Subject::Observer</code> class leaves the actual handling of a <code>Notify</code> call to a derived class, the <code>Subject</code> class here isn&#8217;t concerned with the generation of values to notify observers with. That&#8217;s for someone else; this <code>Subject</code> only handles its observers and getting notifications out to them. (Indeed, it would be a relatively trivial task to make the notification type (<code>int</code> in this example) a template type, and make this a generic and re-usable Observer pattern implementation. To do so is left as an exercise to the reader. Just mind whether you notify by value, reference, or pointer.)</p>
<p>A final point worth making is that the <code>CriticalSection</code> is declared to be <code>mutable</code>. The reason for this is that it&#8217;s only altered during a function call, by the <code>CSLock</code>, but at the end of the function call it will have been restored to its previous state. By indicating it&#8217;s mutable, we can make the <code>Notify</code> function <code>const</code>.</p>
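<p>To give an idea of the templated variant mentioned above, here is one possible starting point. It is a deliberately simplified sketch, not a finished solution to the exercise: the names <code>BasicSubject</code> and <code>LatestValue</code> are invented here, the observer doesn&#8217;t keep a back-pointer to its subject, and a C++11 <code>std::recursive_mutex</code> is assumed in place of the <code>CriticalSection</code> class so the snippet stands on its own.</p>
<pre class="brush: cpp;">
#include &lt;mutex&gt;
#include &lt;set&gt;

// Simplified generic subject; T is the notification type
template &lt;typename T&gt;
class BasicSubject
{
public:
  // Abstract observer of values of type T
  class Observer
  {
  public:
    virtual ~Observer() {}
    // Notified by const reference, so a large T isn't copied per observer
    virtual void Notify(const T&amp; value) = 0;
  };

  // Add observer
  void Register(Observer&amp; observer)
  {
    std::lock_guard&lt;std::recursive_mutex&gt; lock(mutex_);
    observers_.insert(&amp;observer);
  }

  // Remove observer
  void Unregister(Observer&amp; observer)
  {
    std::lock_guard&lt;std::recursive_mutex&gt; lock(mutex_);
    observers_.erase(&amp;observer);
  }

  // Notify observers about new data
  void Notify(const T&amp; value) const
  {
    std::lock_guard&lt;std::recursive_mutex&gt; lock(mutex_);
    for (typename std::set&lt;Observer*&gt;::const_iterator i = observers_.begin();
         i != observers_.end(); ++i)
    {
      (*i)-&gt;Notify(value);
    }
  }

private:
  std::set&lt;Observer*&gt; observers_;
  mutable std::recursive_mutex mutex_;
};

// Example observer that remembers the most recent value
template &lt;typename T&gt;
class LatestValue : public BasicSubject&lt;T&gt;::Observer
{
  T value_;

public:
  LatestValue() : value_() {}
  virtual void Notify(const T&amp; value) { value_ = value; }
  const T&amp; Get() const { return value_; }
};
</pre>
<p>Because this sketch drops the relationship management of the original <code>Subject::Observer</code>, an observer must be unregistered by hand before it is destroyed.</p>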
<p>So, let&#8217;s put it all together, with a custom observer that saves the latest value, a function to produce values, a thread, and a complete program.</p>
<pre class="brush: cpp;">
#include &lt;cstdlib&gt;  // srand, rand
#include &lt;ctime&gt;    // time
#include &lt;iostream&gt;

class PastObserver : public Subject::Observer
{
  // Last observed value
  int value_;
  // A critical section to guard the value
  mutable CriticalSection criticalsection_;

public:
  // Constructor, 
  PastObserver()
  : value_(0)
  {}

  // Access last value
  int GetLastValue() const
  {
    // Lock to prevent the value being modified
    CSLock lock(criticalsection_);
    return value_;
  }

  // Function called by observed Subject
  virtual void Notify(int value)
  {
    {
      // Lock to prevent the value being read while we're assigning
      CSLock lock(criticalsection_);
      // Store the value
      value_ = value;
    }
    // Print it out (use the parameter, as value_ is no longer guarded here)
    std::cout &lt;&lt; "PastObserver notified: " &lt;&lt; value &lt;&lt; std::endl;
  }
};

// A thread function to generate values
// Takes a Subject* as thread parameter
DWORD WINAPI ValueFunction(void* pParam)
{
  Subject* subject = (Subject*)pParam;

  std::cout &lt;&lt; "Thread started with " &lt;&lt; 
    subject-&gt;ObserverCount() &lt;&lt; " observers" &lt;&lt; std::endl;

  // Seed the random number generator once
  srand((unsigned int)time(0));
  // Run until there are no more observers
  while (0 &lt; subject-&gt;ObserverCount())
  {
    // Get a random value to report
    int val = rand();
    // Report it
    subject-&gt;Notify(val);
    // Take a little break
    Sleep(100 * (val &amp; 0x7));
  }
  std::cout &lt;&lt; "Thread ended" &lt;&lt; std::endl;
  return 0;
}

int main()
{
  // Create subject and observer
  Subject subject;
  PastObserver observer;
  subject.Register(observer);
  
  // Start the thread
  HANDLE thread = CreateThread(NULL, 0, &amp;ValueFunction, &amp;subject, 0, NULL);

  // Let it work for a bit
  Sleep(1000);

  // Close down and report
  subject.Unregister(observer);
  std::cout &lt;&lt; "Last value: " &lt;&lt; observer.GetLastValue() &lt;&lt; std::endl;

  // Wait for thread to terminate, then release its handle
  ::WaitForSingleObject(thread, INFINITE);
  ::CloseHandle(thread);
} 
</pre>
<p>And that&#8217;s it. A reasonably small and clear illustration of how the Observer pattern works. In this example, we synchronise with the past, by reading the last value. Now, let&#8217;s synchronise with the future!</p>
<h2>Reading the future</h2>
<p>So how do you read the future? Well, obviously, our observer has to wait for it to happen, so we&#8217;ll need another synchronisation object: the <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms682655%28v=vs.85%29.aspx">Event</a>. This is a boolean object which can be set or reset, and waited for with <code>WaitForSingleObject</code>. As it turns out, we&#8217;ll need two of those &#8211; one to indicate we&#8217;re waiting for data, and one to indicate we&#8217;ve received it.</p>
<pre class="brush: cpp;">
#include &lt;stdexcept&gt;

class FutureObserver : public Subject::Observer
{
  // Observed value
  int value_;

  // Events 
  HANDLE waiting_;
  HANDLE newValue_;

public:
  // Constructor
  FutureObserver()
  : value_(0),
    waiting_(0),
    newValue_(0)
  {
    // No security attributes, automatic reset, initially reset, no name
    waiting_ = ::CreateEvent(0, 0, 0, 0);
    newValue_ = ::CreateEvent(0, 0, 0, 0);
  }

  // Destructor
  ~FutureObserver()
  {
    CloseHandle(waiting_);
    CloseHandle(newValue_);
  }

  // Wait for next value
  int GetNextValue() const
  {
    // Indicate we're waiting
    SetEvent(waiting_);

    // Wait for a new value
    if (WAIT_OBJECT_0 == ::WaitForSingleObject(newValue_, INFINITE))
    {
      // Success
      return value_;
    }
    else
    {
      // Failure: with an INFINITE timeout this can only be WAIT_FAILED
      throw std::runtime_error("Failed waiting for next value"); // from &lt;stdexcept&gt;
    }
  }

  // Function called by observed Subject
  virtual void Notify(int value)
  {
    // Print it out
    std::cout &lt;&lt; "FutureObserver notified: " &lt;&lt; value &lt;&lt; std::endl;
    // Check to see if we're waiting
    if (WAIT_OBJECT_0 == ::WaitForSingleObject(waiting_, 0))
    {
      // We were waiting, so keep the value...
      value_ = value;
      // ... and flag we have it
      ::SetEvent(newValue_);
    }
  }
};
</pre>
<p>This observer is a bit more complex, as it has to manage the two <code>Event</code> objects, but the principle is simple enough. In the <code>GetNextValue()</code> function, we set one event, and wait for the other. The next time <code>Notify()</code> is called, it will see the <code>waiting_</code> flag is set, so it will store the value and signal that <code>newValue_</code> is ready. The events are created to reset automatically as soon as a <code>WaitForSingleObject</code> call on them succeeds (i.e. when the event it is waiting for has been set).</p>
<p>The <code>GetNextValue()</code> function waits indefinitely here &#8211; it will not continue until it&#8217;s received a value &#8211; and so the exception should never happen, unless the <code>FutureObserver</code> has been deleted in another thread. If you&#8217;d prefer a timeout, just overload the <code>GetNextValue()</code> function:</p>
<pre class="brush: cpp;">
  // Access next value, with timeout and success indicator
  int GetNextValue(DWORD millisecondTimeout, bool&amp; timedOut) const
  {
    // Indicate we're waiting
    SetEvent(waiting_);

    // Wait for a new value
    switch (::WaitForSingleObject(newValue_, millisecondTimeout))
    {
    case WAIT_OBJECT_0:
      // Success
      timedOut = false;
      return value_;
    case WAIT_TIMEOUT:
      timedOut = true;
      return 0;
    default:
      // WAIT_FAILED
      throw std::runtime_error("Failed waiting for next value"); // from &lt;stdexcept&gt;
    }
  }
</pre>
<p>The <code>Notify()</code> function, in contrast, doesn&#8217;t wait at all. When <code>WaitForSingleObject</code> is called with a timeout of zero milliseconds, it returns immediately, so we have to check the return value to see if we were successful. This means we&#8217;re not holding up the <code>Subject::Notify()</code> more than necessary.</p>
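<p>For comparison, the same two-flag handshake can be sketched with the C++11 standard library instead of Win32 events. This is an illustration under stated assumptions rather than a drop-in replacement: <code>NextValueSlot</code> is a hypothetical name, it isn&#8217;t derived from <code>Subject::Observer</code>, and the two <code>bool</code> members stand in for the <code>waiting_</code> and <code>newValue_</code> events.</p>
<pre class="brush: cpp;">
#include &lt;condition_variable&gt;
#include &lt;mutex&gt;

// Portable sketch of the "wait for the next value" handshake
class NextValueSlot
{
  std::mutex mutex_;
  std::condition_variable arrived_;
  bool waiting_;   // plays the role of the waiting_ event
  bool hasValue_;  // plays the role of the newValue_ event
  int value_;

public:
  NextValueSlot()
  : waiting_(false),
    hasValue_(false),
    value_(0)
  {}

  // Block until the next Notify() delivers a value
  int GetNextValue()
  {
    std::unique_lock&lt;std::mutex&gt; lock(mutex_);
    // Indicate we're waiting
    waiting_ = true;
    // Sleep until a value has been stored for us
    arrived_.wait(lock, [this] { return hasValue_; });
    // Reset the flag ourselves, mimicking an auto-reset event
    hasValue_ = false;
    return value_;
  }

  // Holds the lock only briefly; returns true if the value was taken
  bool Notify(int value)
  {
    {
      std::lock_guard&lt;std::mutex&gt; lock(mutex_);
      if (!waiting_)
      {
        // Nobody asked for a value, so drop it
        return false;
      }
      waiting_ = false;
      value_ = value;
      hasValue_ = true;
    }
    // Wake the waiting thread after releasing the lock
    arrived_.notify_one();
    return true;
  }
};
</pre>
<p>As in <code>FutureObserver::Notify()</code>, a notification that arrives while nobody is waiting is simply dropped; here the return value additionally reports whether it was stored.</p>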
<p>Finally, let&#8217;s put it all together:</p>
<pre class="brush: cpp;">
int main()
{
  // Create subject and observers
  Subject subject;
  PastObserver past;
  subject.Register(past);
  FutureObserver future;
  subject.Register(future);
  
  // Start the thread
  HANDLE thread = CreateThread(NULL, 0, &amp;ValueFunction, &amp;subject, 0, NULL);

  // Let it work for a bit
  Sleep(1000);

  // Get the last and next value, twice
  std::cout &lt;&lt; "Last value: " &lt;&lt; past.GetLastValue() &lt;&lt; std::endl;
  std::cout &lt;&lt; "Next value: " &lt;&lt; future.GetNextValue() &lt;&lt; std::endl;
  std::cout &lt;&lt; "Last value: " &lt;&lt; past.GetLastValue() &lt;&lt; std::endl;
  std::cout &lt;&lt; "Next value: " &lt;&lt; future.GetNextValue() &lt;&lt; std::endl;

  // Close down
  subject.Unregister(past);
  subject.Unregister(future);

  // Wait for thread to terminate, then release its handle
  ::WaitForSingleObject(thread, INFINITE);
  ::CloseHandle(thread);
} 
</pre>
<p>As always, if you found this interesting or useful, or have suggestions for improvements, please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2012/11/09/a-synchronous-observer-of-asynchronous-events/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>GetLastError as std::string</title>
		<link>http://coolcowstudio.wordpress.com/2012/10/19/getlasterror-as-stdstring/</link>
		<comments>http://coolcowstudio.wordpress.com/2012/10/19/getlasterror-as-stdstring/#comments</comments>
		<pubDate>Fri, 19 Oct 2012 13:00:32 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Win32]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=138</guid>
		<description><![CDATA[If you haven&#8217;t a function for this already, feel free to re-use this. Putting it here so I don&#8217;t have to look around for it next time I need it. // Needs Windows constant and type definitions #include &#60;windows.h&#62; // Create a string with last error message std::string GetLastErrorStdStr() { DWORD error = GetLastError(); if [...]]]></description>
				<content:encoded><![CDATA[<p>If you haven&#8217;t a function for this already, feel free to re-use this. Putting it here so I don&#8217;t have to look around for it next time I need it.</p>
<pre class="brush: cpp;">
// Needs Windows constant and type definitions, and std::string
#include &lt;windows.h&gt;
#include &lt;string&gt;

// Create a string with last error message
std::string GetLastErrorStdStr()
{
  DWORD error = GetLastError();
  if (error)
  {
    LPVOID lpMsgBuf;
    // Use the ANSI version explicitly, so the buffer holds chars even in Unicode builds
    DWORD bufLen = FormatMessageA(
        FORMAT_MESSAGE_ALLOCATE_BUFFER | 
        FORMAT_MESSAGE_FROM_SYSTEM |
        FORMAT_MESSAGE_IGNORE_INSERTS,
        NULL,
        error,
        MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
        (LPSTR) &amp;lpMsgBuf,
        0, NULL );
    if (bufLen)
    {
      LPCSTR lpMsgStr = (LPCSTR)lpMsgBuf;
      std::string result(lpMsgStr, lpMsgStr+bufLen);
      
      LocalFree(lpMsgBuf);

      return result;
    }
  }
  return std::string();
}
</pre>
<p>This function retrieves the last error code, if any, and gets the text message associated with it, which is then converted to a standard string and returned. The main benefits of using this function are that it saves you from having to remember the syntax of <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/ms679351%28v=vs.85%29.aspx">FormatMessage</a>, and that the memory used is tidied up.</p>
<p>Note that the <code>FORMAT_MESSAGE_FROM_SYSTEM</code> flag means only system error messages will be given. If you want to include error messages from your own modules, you&#8217;ll need to add the <code>FORMAT_MESSAGE_FROM_HMODULE</code> flag, and provide the handle to the module. See the FormatMessage documentation for details.</p>
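<p>As an aside, C++11 added a portable route to the same kind of text in <code>&lt;system_error&gt;</code>. The sketch below is an alternative rather than a replacement for the function above: <code>DescribeSystemError</code> is a hypothetical name, and how well <code>std::system_category()</code> covers Win32 error codes, and the exact wording it returns, depend on the standard library implementation.</p>
<pre class="brush: cpp;">
#include &lt;string&gt;
#include &lt;system_error&gt;

// Hypothetical helper: message text for an operating system error code,
// as reported by the standard library's system error category
std::string DescribeSystemError(int code)
{
  return std::error_code(code, std::system_category()).message();
}
</pre>
<p>On Windows this could be called as <code>DescribeSystemError((int)GetLastError())</code>, assuming the library maps the code as hoped.</p>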
]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2012/10/19/getlasterror-as-stdstring/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>Splitting strings again &#8211; strtok redeemed</title>
		<link>http://coolcowstudio.wordpress.com/2012/10/17/splitting-strings-again-strtok-redeemed/</link>
		<comments>http://coolcowstudio.wordpress.com/2012/10/17/splitting-strings-again-strtok-redeemed/#comments</comments>
		<pubDate>Wed, 17 Oct 2012 16:46:27 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=128</guid>
		<description><![CDATA[The C++ source files for the string tokenisers discussed in this post and the Splitting strings post, plus the code for Removing whitespace and Static assert in C++, can be found here:http://coolcowstudio.co.uk/source/cpp/utilities.zip. One of the more curious omissions from the C++ standard library is a string splitter, e.g. a function that can take a string [...]]]></description>
				<content:encoded><![CDATA[<p><small>The C++ source files for the string tokenisers discussed in this post and the <a href="http://coolcowstudio.wordpress.com/2010/08/10/splitting-strings/">Splitting strings</a> post, plus the code for <a href="http://coolcowstudio.wordpress.com/2010/08/12/removing-whitespace/">Removing whitespace</a> and <a href="http://coolcowstudio.wordpress.com/2010/07/14/static-assert-in-c/">Static assert in C++</a>, can be found here:<br /><a href="http://coolcowstudio.co.uk/source/cpp/utilities.zip">http://coolcowstudio.co.uk/source/cpp/utilities.zip</a>.</small></p>
<p>One of the more curious omissions from the C++ standard library is a string splitter, i.e. a function that can take a string and split it up into its constituent parts, or tokens, based on some delimiter. Other popular languages have one (C# &#8211; String.Split, Java &#8211; String.split, Python &#8211; string.split etc), but C++ programmers are left to roll their own, or use one from a third-party library like the <a href="http://www.boost.org/doc/libs/release/libs/tokenizer/index.html"><code>boost::tokenizer</code></a> (or the one I presented in <a href="http://coolcowstudio.wordpress.com/2010/08/10/splitting-strings/">Splitting strings</a>).</p>
<p>There are many ways of doing this; the Stack Overflow question <a href="http://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c">How do I tokenize a string in C++?</a> has 23 answers at the time of writing, and those contain 20 different solutions (<code>boost::tokenizer</code> and <a href="http://pubs.opengroup.org/onlinepubs/009695399/functions/strtok_r.html"><code>strtok</code></a> are suggested multiple times).</p>
<p>The <code>strtok</code> recommendations, however, all have comments pointing out the problems with this function &#8211; it&#8217;s destructive, and not reentrant (it can&#8217;t be nested or run in parallel on multiple strings). As functions go, <code>strtok</code> has a rather poor reputation &#8211; there&#8217;s even a popular reentrant version, <code>strtok_r</code>, available in many C library implementations, though it&#8217;s specified by POSIX rather than by the C standard.</p>
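<p>To make the &#8220;destructive&#8221; part concrete, here is a small sketch that runs <code>strtok</code> over a private copy of a string and then looks at what is left of that copy. The function name <code>strtok_demo</code> and its <code>leftover</code> parameter are invented for this illustration.</p>
<pre class="brush: cpp;">
#include &lt;cstring&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Tokenise a copy of 'text' with strtok, and report in 'leftover'
// what the copy looks like as a C string afterwards
std::vector&lt;std::string&gt; strtok_demo(const std::string&amp; text,
                                     const char* delimiters,
                                     std::string&amp; leftover)
{
  // strtok needs a writable, null-terminated buffer
  std::vector&lt;char&gt; buffer(text.begin(), text.end());
  buffer.push_back('\0');

  std::vector&lt;std::string&gt; tokens;
  for (char* token = std::strtok(&amp;buffer[0], delimiters); token != NULL;
       token = std::strtok(NULL, delimiters))
  {
    tokens.push_back(token);
  }

  // Each delimiter that ended a token has been overwritten with '\0',
  // so if the text started with a token, the buffer now reads as just that
  leftover = &amp;buffer[0];
  return tokens;
}
</pre>
<p>Splitting <code>"a,b,c"</code> on commas this way yields three tokens, but the buffer that held the text afterwards reads as just <code>"a"</code>, which is why <code>strtok</code> can&#8217;t be used directly on a string you need to keep.</p>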
<p><span id="more-128"></span>
<p>So it&#8217;s a good thing that there are so many other ways of splitting strings, isn&#8217;t it? Well, yes, but the clever thing with <code>strtok</code>, which you don&#8217;t get from any of the string splitters, not even the flexibility-obsessed Boost, is that you can change the token separator as you go along. The splitters often give the option to provide a selection of possible delimiters like this:</p>
<pre class="brush: cpp;">
  const char* punct = " ,.?!;:-";
  std::string text = "Silly, right? This is - obviously - an example: Boo!";
  
  std::vector&lt;std::string&gt; tokens;
  std::vector&lt;char&gt; separators(punct, punct + strlen(punct));
  
  tokenise_string(text, separators, tokens);
  // tokens now contains the eight words of the string
</pre>
</p>
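<p>For readers who haven&#8217;t seen one, a splitter working on a fixed set of delimiters can be sketched in a few lines with <code>std::string::find_first_of</code>. Note that <code>split_any</code> below is a hypothetical stand-in written for this illustration, not the <code>tokenise_string</code> function from the earlier post, and that it always skips empty tokens.</p>
<pre class="brush: cpp;">
#include &lt;string&gt;
#include &lt;vector&gt;

// Split 'text' on any of the characters in 'delimiters', skipping empty tokens
std::vector&lt;std::string&gt; split_any(const std::string&amp; text,
                                   const std::string&amp; delimiters)
{
  std::vector&lt;std::string&gt; tokens;
  // Find the start of the first token
  std::string::size_type start = text.find_first_not_of(delimiters);
  while (start != std::string::npos)
  {
    // The token runs up to the next delimiter, or to the end of the text
    std::string::size_type end = text.find_first_of(delimiters, start);
    tokens.push_back(text.substr(start, end - start));
    // Skip the run of delimiters to reach the next token, if any
    start = text.find_first_not_of(delimiters, end);
  }
  return tokens;
}
</pre>
<p>The set of delimiters is the same for every token here, which is precisely the limitation discussed next.</p>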
<p>But what if there are characters that are both legal inside some tokens, and separators for others? What if you have to parse strings with the format <i>[Last name],[First name] &#8211; [Profession] &#8211; [Age]</i> like this table:</p>
<pre>
Bradshawe, Adam - Colonel, retired - 73
Burton-West,Jenny - Surgeon - 37
Smith,Ben - Taxi driver - 56
</pre>
<p>While a string splitter would stumble over this &#8211; since space, hyphen and comma can each be both a delimiter and valid content &#8211; old <code>strtok</code> doesn&#8217;t even raise an eyebrow:</p>
<pre class="brush: cpp;">
  const char* table = "Bradshawe, Adam - Colonel, retired - 73\n"
    "Burton-West,Jenny - Surgeon - 37\n"
    "Smith,Ben - Taxi driver - 56";
  // strtok modifies its input, so work on a copy (+1 for the terminating null)
  char* changeable = (char*) malloc(strlen(table) + 1); // cast not needed in C
  strcpy(changeable, table);

  char *last, *first, *profession, *age;
  // Start it off with the first search
  last = strtok(changeable, " ,");
  while (last)
  {
    first = strtok(NULL, " -");
    profession = strtok(NULL, "-");
    // Only snag - this has a leading space, and probably a trailing one, to trim
    while (*profession == ' ')
      ++profession;
    size_t proflen = strlen(profession);
    if (proflen &gt; 0 &amp;&amp; profession[proflen - 1] == ' ')
      profession[proflen - 1] = '\0';
      
    age = strtok(NULL, " -\n");
    // Have whole row, take care of data
    // ...
    // Start next row, if any
    last = strtok(NULL, " ,");
  }
  free(changeable);
</pre>
<p>In other words, <code>strtok</code> offers unique functionality not found in the string splitters. So let&#8217;s design a C++ version that is efficient, non-destructive, and reentrant. Thanks to the object-orientation support of C++, we can let each tokeniser have a const reference to the string we&#8217;re tokenising. This gives us both reentrancy and non-destructiveness. And while we&#8217;re at it, let&#8217;s have a flag to decide whether we should include empty tokens or not &#8211; something quite useful that&#8217;s missing from <code>strtok</code>.</p>
<pre class="brush: cpp;">
  class string_tokeniser
  {
    // The string we're searching in
    const std::string&amp; source_;
    // Flag indicating whether to include empty strings
    bool empty_;
    // Current location in string
    std::string::size_type current_;
    // Length of current string
    std::string::difference_type length_;
    // Location to start next search
    std::string::size_type next_;
    
  public:
    // Constructor, setting the string to work on
    string_tokeniser(const std::string&amp; source, bool empty = false);
    ...
</pre>
<p>We&#8217;ll also need variables to keep track of where we are, where to start looking for next token, and so on, which I&#8217;ve also included above.
</p>
<p>Now, what do we want to do with this? Well, we actually want to do two distinct tasks &#8211; advance to the next token, and extract a token. In <code>strtok</code>, those are done in the same step, but since we&#8217;re not constrained by the limitations of C, it&#8217;s better to keep it tidy. Like I did in my <a href="http://coolcowstudio.wordpress.com/2010/08/10/splitting-strings/">string splitter</a>, I&#8217;ll overload the advancing function to allow the user to give a single character, a selection of characters, or a whole string as a delimiter.</p>
<pre class="brush: cpp;">
    ...
    // Advance to next token, by given character separator
    bool next(const std::string::value_type&amp; separator);

    // Advance to next token, by any of given character separators
    bool next(const std::vector&lt;char&gt;&amp; separators);

    // Advance to next token, by given string separator
    bool next(const std::string&amp; separator);
    ...
</pre>
<p>These all return true if a new token was found in the string. We&#8217;ll also need a way of accessing the tokens found, and some housekeeping:</p>
<pre class="brush: cpp;">
    ...
    // Get current token, if any, safely
    bool get_token(std::string&amp; token) const;

    // Get current token, if any, otherwise an empty token
    std::string get_token() const;

    // Reset search
    void reset();

    // Check token availability
    bool has_token() const;

    // Check if search is at end
    bool at_end() const;

    // Get source string
    const std::string&amp; get_source() const;
  };
</pre>
</p>
<p>That&#8217;s the interface, let&#8217;s do some implementation. The way this will work is that we keep hold of a current location and token length, which are used to retrieve the token using <code>std::string::substr</code>, while also keeping the next location in which to start the search for a token. This will have to be <code>next_ = current_ + length_ + length of delimiter</code>, so the next search does not pick up the last delimiter.</p>
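<p>To make that bookkeeping concrete, here is a minimal stand-alone sketch of the same stepping scheme for a single-character delimiter. The function name <code>split_by_char</code> is made up for this illustration and is not part of the tokeniser; it always keeps empty tokens and skips the end-of-string special cases the real class deals with.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: walks the source using the
// current / length / next bookkeeping, for a one-character delimiter.
std::vector<std::string> split_by_char(const std::string& source, char separator)
{
  std::vector<std::string> tokens;
  std::string::size_type next = 0;
  for (;;)
  {
    // The token starts where the previous search told us to continue
    const std::string::size_type current = next;
    const std::string::size_type found = source.find(separator, current);
    // No delimiter found: the token is whatever remains of the string
    const std::string::size_type length =
      (std::string::npos == found) ? source.size() - current : found - current;
    tokens.push_back(source.substr(current, length));
    if (std::string::npos == found)
      break;
    // next = current + length + length of delimiter (1 for a single char)
    next = current + length + 1;
  }
  return tokens;
}
```

<p>With this sketch, <code>split_by_char("a|b||d", '|')</code> gives the four tokens <code>a</code>, <code>b</code>, an empty string and <code>d</code>, because the search never starts on the delimiter it has just found.</p>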
<p>When the <code>string_tokeniser</code> is first created, we have no search results, so need to initialise appropriately. The same values are set on a reset, and used to get the current token and check status:</p>
<pre class="brush: cpp;">
  // Constructor
  string_tokeniser::string_tokeniser(const std::string&amp; source, bool empty /*= false*/)
    : source_(source)
    , empty_(empty)
    , current_(std::string::npos)
    , length_(0)
    , next_(source.empty() ? std::string::npos : 0)
  {}

  // Reset search so it can be restarted
  void string_tokeniser::reset()
  {
    current_ = std::string::npos;
    next_ = source_.empty() ? std::string::npos : 0;
    length_ = 0;
  }

  // Return true if there is a current token
  bool string_tokeniser::has_token() const
  {
    // Not worried about length here, as it might be an empty token
    return (std::string::npos != current_);
  }

  // Return true if no further searches can be done
  bool string_tokeniser::at_end() const
  { 
    return (std::string::npos == next_);
  }

  // Get source string
  const std::string&amp; string_tokeniser::get_source() const
  {
    return source_;
  }

  // Get current token, if any, safely
  bool string_tokeniser::get_token(std::string&amp; token) const
  {
    if (!has_token())
      return false;
    if (0 == length_)
      token.clear();
    else
      token = source_.substr(current_, length_);
    return true;
  }

  // Get current token, if any, otherwise an empty token
  std::string string_tokeniser::get_token() const
  {
    if (!has_token() || (0 == length_))
      return std::string();
    return source_.substr(current_, length_);
  }
</pre>
</p>
<p>Right, that just leaves the implementation of the key function: <code>next()</code>. Since there are three overloads with almost identical implementations, the sensible thing would be to break out the common stuff into a plain helper function. Unfortunately, we can&#8217;t easily do that: if we do not care about empty tokens, then in the case of repeated delimiters we have to recurse and try to find the next token, which means the helper has to know which overload to choose. Instead, we&#8217;ll break out the common handling into a template function:</p>
<pre class="brush: cpp;">
In class declaration:
    ...
    // Helper - handle the result of a search, advancing to prepare for next
    template &lt;typename T&gt;
    bool handle_next(size_t advance, const T&amp; separator);
  public:
    // Constructor, setting the string to work on
    ...

Implementation:
  // Advance to next token
  bool string_tokeniser::next(const std::string::value_type&amp; separator)
  {
    // Store the start
    current_ = next_;
    if (at_end())
      return false;
    // Find next
    next_ = source_.find(separator, current_);
    // Deal with result of search
    return handle_next(1, separator);
  }

  // Advance to next token
  bool string_tokeniser::next(const std::string&amp; separator)
  {
    // Store the start
    current_ = next_;
    if (at_end())
      return false;
    // Find next
    next_ = source_.find(separator, current_);
    // Deal with result of search
    return handle_next(separator.size(), separator);
  }

  // Advance to next token
  bool string_tokeniser::next(const std::vector&lt;char&gt;&amp; separators)
  {
    // Store the start
    current_ = next_;
    if (at_end())
      return false;
    // Find next
    next_ = source_.find_first_of(&amp;separators[0], current_, separators.size());
    // Deal with result of search
    return handle_next(1, separators);
  }

  // Handle the result of a search, advancing to prepare for next search
  template &lt;typename T&gt;
  bool string_tokeniser::handle_next(size_t advance, const T&amp; separator)
  {
    if (std::string::npos == next_)
    {
      // Separator not found, but there might still be data, at the end 
      length_ = source_.size() - current_;
    }
    else
    {
      // Store the length of the current token
      length_ = next_ - current_;
      // and move next starting point to beyond the one we found
      next_ += advance;
      // In the case of double separators (e.g. | in "a|b||d"), this gives an 
      // empty token. If empties aren't accepted, we'll recurse
      if ((0 == length_) &amp;&amp; !empty_)
      {
        return next(separator);
      }
    }
    // Do we have a token?
    if (0 &lt; length_)
      return true;
    // Even if empties are accepted, dismiss an empty token at the end of a 
    // string (e.g. "a|b|" gives "a" and "b" only)
    if (!empty_ || (std::string::npos == next_))
    {
      // Invalidate current, so extraction isn't valid
      current_ = next_;
      return false;
    }
    return true;    
  }
</pre>
</p>
<p>There, all done. And because we can use single characters, a selection of characters, or strings as delimiters, the equivalent of the <code>strtok</code> example avoids the need to trim spaces:</p>
<pre class="brush: cpp;">
  std::string table = "Jones, Adam - Colonel, retired - 73\n"
    "Burton-West,Jenny - Surgeon - 37\n"
    "Smith,Ben - Taxi driver - 56";

  std::string last, first, profession, age;
  // Prepare delimiters to use
  std::vector&lt;char&gt; comma_space;
  comma_space.push_back(' ');
  comma_space.push_back(',');
  std::string sp_dash_sp(" - ");
  char endl('\n');

  string_tokeniser tok(table);
  while (!tok.at_end())
  {
    tok.next(comma_space);
    tok.get_token(last);
    tok.next(sp_dash_sp);
    tok.get_token(first);
    tok.next(sp_dash_sp);
    tok.get_token(profession);
    tok.next(endl);
    tok.get_token(age);
  }
</pre>
<p>Because it is non-destructive, it is by necessity less efficient than <code>strtok</code>, since the tokens have to be copied. On the other hand, if you have to copy the <code>const char*</code> to a <code>char*</code> buffer to use <code>strtok</code>, maybe the efficiency loss isn&#8217;t that bad.</p>
<p>As always, if you found this interesting or useful, or have suggestions for improvements, please let me know.</p>
<p><b>Update:</b> <a>Jens Ayton</a> has informed me that C11 introduced <code>strtok_s</code>, which is even safer, but which has not, I believe, been incorporated into C++11.</p>
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/code/'>Code</a>, <a href='http://coolcowstudio.wordpress.com/category/code/codeproject/'>CodeProject</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/c/'>C++</a>, <a href='http://coolcowstudio.wordpress.com/tag/string/'>string</a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coolcowstudio.wordpress.com&#038;blog=13675819&#038;post=128&#038;subd=coolcowstudio&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2012/10/17/splitting-strings-again-strtok-redeemed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>All your base64 are different to us</title>
		<link>http://coolcowstudio.wordpress.com/2011/08/05/all-your-base64-are-different-to-us/</link>
		<comments>http://coolcowstudio.wordpress.com/2011/08/05/all-your-base64-are-different-to-us/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 12:03:46 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[template]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=111</guid>
		<description><![CDATA[The C++ source files for the stand-alone base64 encoder and decoder discussed in this post, plus a separate implementation of quoted-printable (RFC 2045, section 6.7), and the hex string converter I presented last year, can be found here:http://coolcowstudio.co.uk/source/cpp/coding.zip. There is a quote that goes &#8220;Standards are great! Everyone should have one.&#8221; or something along those [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coolcowstudio.wordpress.com&#038;blog=13675819&#038;post=111&#038;subd=coolcowstudio&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><small>The C++ source files for the stand-alone base64 encoder and decoder discussed in this post, plus a separate implementation of quoted-printable (<a href="http://tools.ietf.org/html/rfc2045#section-6.7">RFC 2045, section 6.7</a>), and the hex string converter I <a href="http://coolcowstudio.wordpress.com/2010/08/05/redux-hex-strings-to-raw-data-and-back/">presented last year</a>, can be found here:<br /><a href="http://coolcowstudio.co.uk/source/cpp/coding.zip">http://coolcowstudio.co.uk/source/cpp/coding.zip</a>.</small></p>
<p>There is a quote that goes &#8220;Standards are great! Everyone should have one.&#8221; or something along those lines. (Somewhat ironically, this quote, too, has many different variations, and has many attributions. The earliest I&#8217;ve found attributes it to George Morrow in <a href="http://books.google.co.uk/books?id=jy8EAAAAMBAJ&amp;lpg=PA58&amp;pg=PA58#v=onepage&amp;f=false">InfoWorld 21 Oct 1985</a>).</p>
<p>A case in point is the <a href="http://en.wikipedia.org/wiki/Base64">base64</a> encoding. Put simply, it&#8217;s a method of encoding an array of 8-bit bytes using an alphabet consisting of 64 different printable characters from the <a href="http://en.wikipedia.org/wiki/ASCII">ASCII</a> character set. This is done by taking three 8-bit bytes of source data, arranging them into a 24-bit word, and converting that into four 6-bit values that each map onto one character of the 64-character alphabet (since 6 bits can hold the values 0-63).</p>
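<p>As a worked example of that arithmetic, here is a small sketch that encodes exactly one triplet with the standard RFC 4648 alphabet. The function name <code>encode_triplet</code> is invented for this illustration, and it deliberately leaves out padding, line breaks and every other variation discussed in this post.</p>

```cpp
#include <cassert>
#include <string>

// Illustration only: encode exactly three bytes using the standard
// (RFC 4648) alphabet. No padding or line-break handling.
std::string encode_triplet(unsigned char b0, unsigned char b1, unsigned char b2)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  // Arrange the three bytes into one 24-bit word
  const unsigned long word = (static_cast<unsigned long>(b0) << 16) |
                             (static_cast<unsigned long>(b1) << 8) |
                             static_cast<unsigned long>(b2);
  std::string out;
  // Peel off four 6-bit indices, most significant first
  for (int shift = 18; shift >= 0; shift -= 6)
    out += alphabet[(word >> shift) & 0x3F];
  return out;
}
```

<p>The bytes of <code>"Man"</code> are 0x4D, 0x61 and 0x6E, which split into the 6-bit values 19, 22, 5 and 46, so <code>encode_triplet('M', 'a', 'n')</code> returns <code>"TWFu"</code>.</p>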
<p>The original implementation was for <a href="http://en.wikipedia.org/wiki/Privacy-enhanced_Electronic_Mail">privacy-enhanced e-mail</a> (<a href="http://www.ietf.org/html/rfc1421">RFC 1421</a>), then altered slightly for <a href="http://en.wikipedia.org/wiki/MIME">MIME</a> (<a href="http://tools.ietf.org/html/rfc2045">RFC 2045</a>), and again in its own standard (<a href="http://tools.ietf.org/html/rfc4648">RFC 4648</a>).</p>
<p>When I was looking at base64, I was interested in three different varieties or flavours, namely the MIME version, the (per <a href="http://tools.ietf.org/html/rfc4648">RFC 4648</a>) standard base64, and base64url. These differ in how they handle line breaks and other illegal characters, what characters are used in the 64-character alphabet, and the use of padding at the end to make up an even triplet of bytes.</p>
<p><span id="more-111"></span>That&#8217;s three mostly similar but slightly different algorithms, so how to design an implementation? A number of designs are possible, of course, like:
<ol>
<li>three copy-pasted and tweaked sets of functions</li>
<li>hugely parameterised functions where every variation in algorithm or data can be altered in the call</li>
<li>an inheritance tree, where virtual overridden functions alter behaviour</li>
</ol>
<p>I chose to use a design that in my opinion takes the best from all of those, with none of the disadvantages.</p>
<p>The <a href="http://en.wikipedia.org/wiki/Base64#Variants_summary_table">Wikipedia article</a> has a handy table giving a summary of the differences between the variants, which gives eight possible differences of data or algorithm. I&#8217;ve elected to ignore the last of those, which is the addition of a checksum only used for <a href="http://en.wikipedia.org/wiki/OpenPGP">OpenPGP</a> (<a href="http://tools.ietf.org/html/rfc4880">RFC 4880</a>), since that is easily added outside the base64 coding. The sixth difference on Wikipedia&#8217;s list &#8211; line separator &#8211; can also be ignored since it can be inferred from the maximum line length. The remaining areas of difference are:
<ul>
<li>Character for index 62</li>
<li>Character for index 63</li>
<li>Character to use for padding</li>
<li>Whether fixed line length is used</li>
<li>Maximum line length</li>
<li>Handling of illegal characters</li>
</ul>
<p>Fortunately, these are all integral types (<code>bool</code> and <code>char</code> are both integral types), so they can be used as template parameters.  In other words, I can define a type to hold all possible variants:</p>
<pre class="brush: cpp;">
template&lt;
  char Tchar62,        // Character for index 62 in alphabet
  char Tchar63,        // Character for index 63 in alphabet
  char TcharPad,       // Character to use for padding, or 0 if 
                       // padding is not used
  bool TfixLineLength, // true if line length is fixed; if false, 
                       // TmaxLineLength is only an upper limit
  int  TmaxLineLength, // Maximum (or fixed) line length. 0 if not 
                       // used. If used, CR+LF is used as linebreak
  bool TignoreIllegal&gt; // if true, discards characters not in 
                       // alphabet; if false, throws error on finding 
                       // illegal character
struct base64_variants
{
  // This type only has type information accessors
  // Get character for index 62 in alphabet
  static char char62()  
    { return Tchar62; } 
    
  // Get character for index 63 in alphabet
  static char char63()  
    { return Tchar63; }          
    
  // Get padding character, if any
  static char charPad() 
    { return TcharPad;  }
    
  // Check if padding is used
  static bool pad()     
    { return TcharPad &gt; 0;  }
    
  // Check if line length is fixed
  static bool fixLineLength() 
    { return TfixLineLength;  }
    
  // Get maximum line length
  static int  maxLineLength() 
    { return TmaxLineLength;  }
    
  // Check if lines should be broken
  static bool lineBreaks()    
    { return TmaxLineLength&gt;0;  }
    
  // Check if illegal characters may be ignored
  static bool ignoreIllegal() 
    { return TignoreIllegal;  }
};
</pre>
<p>What is the benefit of this?  Well, it lets us express distinct sets of parameters to correspond to particular standards, like this:</p>
<pre>
// MIME variety of base64 (RFC 2045) 
typedef base64_variants&lt;'+', '/', '=', false, 76, true&gt; base64MIME;

// Plain standard (RFC 4648) base64
typedef base64_variants&lt;'+', '/', '=', false, 0, false&gt; base64;

// URL variety of base64 (RFC 4648)
typedef base64_variants&lt;'-', '_', 0, false, 0, false&gt; base64url;
</pre>
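<p>To give a feel for how a coding function might consume such a variant type, here is a small sketch that builds the 64-character alphabet from it. Both <code>variants_sketch</code> (a cut-down stand-in for <code>base64_variants</code>, so the example compiles on its own) and <code>alphabet_for</code> are names invented for this illustration, not taken from the downloadable source.</p>

```cpp
#include <cassert>
#include <string>

// Cut-down stand-in for base64_variants, so the sketch compiles on its own
template<char Tchar62, char Tchar63, char TcharPad>
struct variants_sketch
{
  static char char62()  { return Tchar62; }
  static char char63()  { return Tchar63; }
  static char charPad() { return TcharPad; }
  static bool pad()     { return TcharPad != 0; }
};

typedef variants_sketch<'+', '/', '='>  sketch_base64;
typedef variants_sketch<'-', '_', '\0'> sketch_base64url;

// Hypothetical helper: build the 64-character alphabet for a variant.
// The first 62 characters are shared; only the last two differ.
template<typename T>
std::string alphabet_for()
{
  std::string alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
  alphabet += T::char62();
  alphabet += T::char63();
  return alphabet;
}
```

<p>Since everything is resolved from the type, a call such as <code>alphabet_for&lt;sketch_base64url&gt;()</code> has no run-time parameters to get wrong.</p>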
<p>This means that there is no risk of mixing up parameters in function calls &#8211; once a set is defined (and tested and verified to be correct) it can be used everywhere. The actual coding functions then get a very simple interface, with an input, an output, and the variant type as the only parameters:</p>
<pre>
// Encode C-style array of bytes
template&lt;typename T&gt;
void bytes_to_base64(const unsigned char* data, 
        size_t length, 
        std::string&amp; str);

// Encode byte vector
template&lt;typename T&gt;
void bytes_to_base64(const std::vector&lt;unsigned char&gt;&amp; data, 
        std::string&amp; str);

// Decode to byte vector        
template&lt;typename T&gt;
void base64_to_bytes(const std::string&amp; str, 
        std::vector&lt;unsigned char&gt;&amp; data);
</pre>
<p>And this is how the use of these would look in code:</p>
<pre>
// Using test vectors from RFC 4648
const char* src = "fooba";
std::vector&lt;unsigned char&gt; data(src, src+5);
std::string result;

bytes_to_base64&lt;base64&gt;(data, result);

assert(result == "Zm9vYmE=");

base64_to_bytes&lt;base64&gt;(result, data);

result = std::string(data.begin(), data.end());
assert(result == "fooba");
</pre>
<p>Below is a short extract of the implementation, illustrating how the variant specification is used:</p>
<pre>
// Encode
template&lt;typename T&gt;
void bytes_to_base64(const unsigned char* data, size_t length, 
  std::string&amp; str)
{
  str.clear();
  // Calculate maximum expected size (bytes, padding, line breaks) 
  str.reserve((length * 4 ) / 3 + 3 + 
    (T::lineBreaks() ? 2 * length / T::maxLineLength() : 0) );
</pre>
<p>(I won&#8217;t post the whole implementation here. While it&#8217;s only 200 airy and thoroughly commented lines, it&#8217;s better to give you links to all the files, which I do at the beginning of this post.)</p>
<p>Now, the astute reader will note that I only declared the coding functions earlier, and talked about the implementation as if it was separate. That&#8217;s the normal way of doing things, but that won&#8217;t work with templates, right?</p>
<p>Here&#8217;s the thing with template functions and classes: they are not just the one thing. A normal, non-template function is one single thing, fully defined once and once only. Therefore, it can be compiled in one compilation unit, and then linked to. It can be declared elsewhere, and that declaration is essentially a promise that somewhere this thing is defined, which is all the compiler cares about.</p>
<p>A template function is effectively a new function for each template parameter (or combination of parameters) it is used with. (As far as the compiler is concerned, a template class is just a way of saying that all member functions have the same set of template parameters.) To the compiler, there&#8217;s no such thing as a &#8220;template function&#8221;. There is only &#8220;template function with these template parameters&#8221;.</p>
<p>This means that there can&#8217;t be a single compiled variant that can be linked to, only specialisations that use this or that set of template parameters. Even if only one variant, only one specialisation, is used in your project, the compiler can&#8217;t know that. </p>
<p>So instead, the compiler generates these where they are used: for each distinct set of template parameters it meets, it effectively makes a copy of the template function with those parameters filled in, which it can only do if the full definition is visible at that point.</p>
<p>In other words, <i>C++ templates are just a way of bullying the compiler into doing the copy-paste programming you are ashamed of doing yourself.</i></p>
<p>As it happens, though, that also provides the solution to separating definitions and declarations, provided you know what template parameters you&#8217;ll be using. All you need to do is explicitly instantiate the function with the parameters you want, in the same compilation unit as the template function definition.</p>
<p>Here&#8217;s a simple example:</p>
<pre>
-------------------------------------
-- In my_template.h file
... 
template &lt;typename T&gt;
T my_temp_func(const T&amp; t);

-------------------------------------
-- In my_template.cpp file

#include "my_template.h"

// Definition
template &lt;typename T&gt;
T my_temp_func(const T&amp; t)
{
  return t;
}

// Explicit instantiation (note the leading "template" keyword)
template int my_temp_func&lt;int&gt;(const int&amp; t);

-------------------------------------
-- In using_my_template.cpp
#include "my_template.h"
...
int i = my_temp_func&lt;int&gt;(4); // works
string s = my_temp_func&lt;string&gt;("Abob"); // link error
</pre>
<p>In the example above, the declaration in the last line of <code>my_template.cpp</code> tells the compiler there&#8217;ll be a variant of the template function that uses <code>int</code> as template parameter. Okay, says the compiler, I&#8217;ll put an inline copy there. Since the generic definition is right there in the same compilation unit (ie <code>my_template.cpp</code>), this is something the compiler can do &#8211; it has all the information it needs.</p>
<p>The result of that is that in the compiled file (probably called <code>my_template.obj</code>) there is now a function that has the signature <code>int my_temp_func(const int&amp; t)</code>. This is a fully defined specialisation of a template function, so to the linker it looks just like a normal function.</p>
<p>However, the linker won&#8217;t be able to find a <code>string</code> specialisation, so this will generate a linker error.</p>
<p>This illustrates both how to use this trick, and its limitation. It only works if you list all specialisations you are going to use, which makes it unfeasible for generic libraries.</p>
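<p>For reference, here is the same idea squeezed into a single file that compiles on its own, as a sketch of the syntax only. In one compilation unit the explicit instantiations are not strictly needed, since the definition is visible anyway; the point is the leading <code>template</code> keyword, which is what asks the compiler to generate that particular version. The <code>std::string</code> instantiation is my addition, to show what would cure the link error.</p>

```cpp
#include <cassert>
#include <string>

// Declaration only, as it would appear in the header
template <typename T>
T my_temp_func(const T& t);

// Definition, as it would appear in the .cpp file
template <typename T>
T my_temp_func(const T& t)
{
  return t;
}

// Explicit instantiations: each one asks the compiler to generate the
// function for that template parameter in this compilation unit
template int my_temp_func<int>(const int& t);
template std::string my_temp_func<std::string>(const std::string& t);
```

<p>Every type used anywhere in the project needs its own line like the last two, which is exactly the limitation described above.</p>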
<p>In this case, though, it&#8217;s ideal. I have my three variants of base64 defined &#8211; <code>base64MIME</code>, <code>base64</code> and <code>base64url</code> &#8211; and those are the only ones I&#8217;ll need. </p>
<p>Actually, I might as well add a definition for the original variant:</p>
<pre>
// PEM variety of base64 (RFC 1421)
typedef base64_variants&lt;'+', '/', '=', true, 64, false&gt; base64PEM;
</pre>
<p>So I have my four variations defined, and they are the only variations of base64 I&#8217;m interested in, so I&#8217;ll just have to declare them in the same <code>.cpp</code> file as the function definitions are in:</p>
<pre>
template void bytes_to_base64&lt;base64&gt;(const unsigned char* data, 
                    size_t length, std::string&amp; str);
template void bytes_to_base64&lt;base64&gt;(const std::vector&lt;unsigned char&gt;&amp; data, 
                    std::string&amp; str);
template void base64_to_bytes&lt;base64&gt;(const std::string&amp; str, 
                    std::vector&lt;unsigned char&gt;&amp; data);

template void bytes_to_base64&lt;base64MIME&gt;(const unsigned char* data,
                    size_t length, std::string&amp; str);
...                  
</pre>
<p>This lets the linker find compiled varieties for all those base64 definitions.</p>
<p>Should you want to implement a slightly different base64 coder, you could use the code I&#8217;ve written. It wouldn&#8217;t be enough to declare a new variant type; you would also have to add instantiations using that type to the <code>.cpp</code> file. But the source code is both open and free, so help yourself.</p>
<p>(I should note that like with so much else, base64 is something there are lots and lots of implementations of available on the net, but most of the ones I&#8217;ve found tended to be very lax and lack strict checking of syntactical correctness, or implement just one flavour. Hence, writing my own.)</p>
<p>As always, if you found this interesting or useful, or have suggestions for improvements, please let me know.</p>
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/code/'>Code</a>, <a href='http://coolcowstudio.wordpress.com/category/code/codeproject/'>CodeProject</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/c/'>C++</a>, <a href='http://coolcowstudio.wordpress.com/tag/coding/'>coding</a>, <a href='http://coolcowstudio.wordpress.com/tag/template/'>template</a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coolcowstudio.wordpress.com&#038;blog=13675819&#038;post=111&#038;subd=coolcowstudio&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2011/08/05/all-your-base64-are-different-to-us/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>Removing whitespace</title>
		<link>http://coolcowstudio.wordpress.com/2010/08/12/removing-whitespace/</link>
		<comments>http://coolcowstudio.wordpress.com/2010/08/12/removing-whitespace/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 11:17:20 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[STL]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=107</guid>
		<description><![CDATA[Here&#8217;s a std::string, please remove all whitespace from it. How would you do it? Despite its seeming simplicity, it&#8217;s an interesting question, because it can be done in so many ways. To start with, how do you identify whitespace? Let&#8217;s have a look at some different approaches (all of which I&#8217;ve seen in the wild): [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coolcowstudio.wordpress.com&#038;blog=13675819&#038;post=107&#038;subd=coolcowstudio&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s a <code>std::string</code>, please remove all whitespace from it.  How would you do it? Despite its seeming simplicity, it&#8217;s an interesting question, because it can be done in so many ways.</p>
<p>To start with, how do you identify <a href="http://en.wikipedia.org/wiki/Whitespace_character">whitespace</a>?  Let&#8217;s have a look at some different approaches (all of which I&#8217;ve seen in the wild):</p>
<pre class="brush: cpp;">
// Simple
bool iswhitespace1(char c)
{
  // Is it  space   or    tab      or    return   or    newline?
  return (c == ' ') || (c == '\t') || (c == '\r') || (c == '\n');
}
// Cute attempt at cleverness
bool iswhitespace2(char c)
{
  // Is it one of the whitespace characters?
  static const std::string spaces(" \t\r\n");
  return (std::string::npos != spaces.find(c));
}
// Probably ok, for English at least
bool iswhitespace3(char c)
{
  // Using C function, from &lt;cctype&gt;
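  // Go via unsigned char, as negative values other than EOF are undefined
  return ::isspace(static_cast&lt;unsigned char&gt;(c)) != 0;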
}
// As above, but standard C++ instead of standard C
bool iswhitespace4(char c)
{
  // Using current locale, and std function from &lt;locale&gt;
  static const std::locale loc;
  return std::isspace(c, loc);
}
</pre>
<p>If we were to run through these four functions with values of c from 0 to 255, the first two would produce the same result, and the latter two would (probably) produce the same result, but those wouldn&#8217;t be the same as for the first two.<br />
<span id="more-107"></span><br />
There are two reasons for this. First of all, the C and C++ <code>isspace</code> functions include a couple of often forgotten whitespace characters &#8211; the vertical tab (<code>'\v'</code>, 0x0b) and the form feed (<code>'\f'</code>, 0x0c). They don&#8217;t tend to see that much use nowadays, but are still defined as whitespace in both the C and C++ standards.  </p>
<p>The second reason the results from <code>isspace</code> may differ from a hard-coded solution is that they are both dependent on what locale is in use. A changed locale will never indicate that any of the standard list of whitespace characters (<code>" \t\r\n\v\f"</code>) is not a whitespace character, but may indicate that some further characters are also whitespace.</p>
<p>Since the functions already exist in the standard, it&#8217;s rather silly of us to write our own, so let&#8217;s just use <code>isspace</code>.  Unless you muck about and change locales (and let&#8217;s not, if we can avoid it), both the C and C++ version behave the same way, so which you use is up to you.</p>
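<p>If you want to see for yourself which characters count, a loop like the following sketch lists them. The function name <code>c_whitespace_chars</code> is made up for this example, and the result depends on the locale in effect when it is called; in the default "C" locale I would expect just the six standard characters.</p>

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Collect every value from 0 to 255 that the C isspace function reports
// as whitespace in the locale currently in effect
std::string c_whitespace_chars()
{
  std::string found;
  for (int c = 0; c < 256; ++c)
  {
    // isspace expects an int holding an unsigned char value (or EOF),
    // so looping over int avoids any sign problems
    if (std::isspace(c))
      found += static_cast<char>(c);
  }
  return found;
}
```

<p>Note that the loop variable is an <code>int</code> rather than a <code>char</code>; passing a negative <code>char</code> value straight to the C <code>isspace</code> is not well defined.</p>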
<p>Knowing how to identify whitespace characters, we only need to remove them.  How do we do that?  Well, that depends on whether we want to modify the string, or create a copy.  In either case, let&#8217;s avoid the simplistic, completely hand-made solutions again:</p>
<pre class="brush: cpp;">
// Working on std::string str

// Altering original
std::string::size_type p = 0;
while (p &lt; str.size())
{
  // If character at p is space erase it, otherwise go to next
  if (isspace(str[p]))
    str.erase(p, 1);
  else
    ++p;
}

...

// Making a copy
std::string output;
for (std::string::size_type i = 0; i &lt; str.size(); ++i)
{
  if (!isspace(str[i]))
    output += str[i];
}
</pre>
<p>Both these solutions work, but there are well established and standardised ways of doing these things using algorithms:</p>
<pre class="brush: cpp;">
// Working on std::string str

// Altering original
str.erase(std::remove_if(str.begin(), str.end(), 
  &amp;::isspace), str.end());

...

// Making a copy
std::string output;
std::remove_copy_if(str.begin(), str.end(), 
  std::back_inserter(output), &amp;::isspace);
</pre>
<p>Simple!</p>
<p>No? Ok, let&#8217;s break it up. The functions in the C++ <code>&lt;algorithm&gt;</code> header generally work on three types of parameters: <a href="http://www.cppreference.com/wiki/stl/iterators">iterators</a>, <a href="http://www.sgi.com/tech/stl/Predicate.html">predicates</a> and <a href="http://stackoverflow.com/questions/356950/c-functors-and-their-uses">function objects</a> (aka functors). In the code above, we&#8217;re not using any functors, so we&#8217;ll put them aside for the moment.</p>
<p><code>&amp;::isspace</code> &#8211; <i>predicate</i>. This is simply a pointer to a function that takes one parameter and returns something usable as a <code>bool</code> (<code>isspace</code> itself returns an <code>int</code>, where non-zero means true), in this case indicating whether a given character is whitespace or not, as discussed earlier.</p>
<p><code>str.begin(), str.end()</code> &#8211; <i>iterators</i>, in this case indicating where to start and stop running the algorithm. We want to go through the whole string, so we start at the beginning, and end at the, well, end.</p>
<p><code>str.erase(std::remove_if(...), str.end());</code> &#8211; this is the <a href="http://en.wikipedia.org/wiki/Erase-remove_idiom">erase-remove idiom</a>. Because the <code>remove_if</code> function only takes iterators, it can&#8217;t actually remove anything. What it can do is re-shuffle, moving all the elements (or characters in the string, in this case) that <i>don&#8217;t</i> match the predicate (is whitespace) to the front of the given range, keeping their relative order.  It then returns an iterator to the position just past the last of these kept characters; whatever is left between there and the end of the range is unspecified leftovers. This iterator is then given to the <code>erase</code> member function of the string, as the start of the characters to erase, and <code>str.end()</code> as the end.</p>
<p><code>std::back_inserter</code> &#8211; <i>iterator</i>. This is a handy little helper that gives an output iterator for the given container (i.e. an iterator that can be used to insert elements in a container). (Unfortunately, <a href="http://msdn.microsoft.com/en-us/library/12awccbs(VS.80).aspx">Microsoft&#8217;s documentation</a> still says the container given to it must be a <code>std::vector</code>, <code>std::list</code> or <code>std::deque</code>, which is not true. The only thing required is that the container has the member function <code>push_back</code>, which <code>std::string</code> does. Given how popular their development tools are, it&#8217;s surprising this hasn&#8217;t been amended.)</p>
<p><code>std::remove_copy_if</code> &#8211; this is an amazingly poorly named function, which ought to be called <code>std::copy_if_not</code>. What it does is: go through the range given (i.e. <code>begin</code> to <code>end</code>), call the predicate (i.e. <code>isspace</code>) with each element in the range, and if the predicate returns true, <i>don&#8217;t copy it</i>. It doesn&#8217;t remove anything from the input range (it can&#8217;t, as it only has iterators), and in fact doesn&#8217;t change anything at all on the range it&#8217;s given. I guess that conceptually, it removes an element for which the predicate is true from a list of elements to copy. Except, there is no such list. In short: horrible name, copies elements <i>not</i> fulfilling the predicate.</p>
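<p>To see the two algorithms side by side, here is a minimal sketch. The helper names <code>is_space_char</code>, <code>strip_in_place</code> and <code>strip_copy</code> are made up for this illustration, and the predicate is a small wrapper rather than <code>&amp;::isspace</code> itself:</p>

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical predicate for this sketch: true for whitespace.
// The cast keeps the argument to isspace non-negative.
bool is_space_char(char ch)
{
  return std::isspace(static_cast<unsigned char>(ch)) != 0;
}

// Erase-remove: remove_if only shuffles, erase does the shrinking
std::string strip_in_place(std::string str)
{
  std::string::iterator new_end =
    std::remove_if(str.begin(), str.end(), &is_space_char);
  // At this point str.size() is unchanged; the kept characters
  // are in [str.begin(), new_end)
  str.erase(new_end, str.end());
  return str;
}

// remove_copy_if: copies what does NOT match, input untouched
std::string strip_copy(const std::string& input)
{
  std::string output;
  std::remove_copy_if(input.begin(), input.end(),
    std::back_inserter(output), &is_space_char);
  return output;
}
```

<p>Both helpers should give the same result for the same input; the only difference is whether the caller&#8217;s string is modified or left alone.</p>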
<p>So, there we are. Two simple and useful functions to remove whitespace:</p>
<pre class="brush: cpp;">
void remove_whitespace(std::string&amp; str)
{
  str.erase(std::remove_if(str.begin(), str.end(), 
    &amp;::isspace), str.end());
}

void remove_whitespace(const std::string&amp; input, std::string&amp; output)
{
  output.clear();
  std::remove_copy_if(input.begin(), input.end(), 
    std::back_inserter(output), &amp;::isspace);
}
</pre>
<p>(Of course, if you really <i>want</i> to use <code>std::isspace</code> with <code>std::locale</code>, things start to get a bit&#8230; well, complicated.  I might return to that at some later point.)  </p>
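<p>For the curious, a minimal sketch of what the locale-aware version might look like. The functor name <code>is_space_in</code> and the function <code>remove_whitespace_locale</code> are invented for this example, and it only scratches the surface of the complications:</p>

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical functor: wraps the two-argument std::isspace from
// <locale> so it can be used as a one-argument predicate
struct is_space_in
{
  explicit is_space_in(const std::locale& loc) : loc_(loc) {}
  bool operator()(char ch) const { return std::isspace(ch, loc_); }
  std::locale loc_;
};

// Same erase-remove idiom as before, with the functor as predicate.
// Defaults to the classic "C" locale.
void remove_whitespace_locale(std::string& str,
  const std::locale& loc = std::locale::classic())
{
  str.erase(std::remove_if(str.begin(), str.end(),
    is_space_in(loc)), str.end());
}
```

<p>Since the predicate now needs to carry a locale around, a plain function pointer is no longer enough, which is where the function objects mentioned earlier come in.</p>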
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/code/'>Code</a>, <a href='http://coolcowstudio.wordpress.com/category/code/codeproject/'>CodeProject</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/c/'>C++</a>, <a href='http://coolcowstudio.wordpress.com/tag/stl/'>STL</a>, <a href='http://coolcowstudio.wordpress.com/tag/string/'>string</a>]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2010/08/12/removing-whitespace/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>Splitting strings</title>
		<link>http://coolcowstudio.wordpress.com/2010/08/10/splitting-strings/</link>
		<comments>http://coolcowstudio.wordpress.com/2010/08/10/splitting-strings/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 11:22:02 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=96</guid>
		<description><![CDATA[Back in the dawn of time, when men were real men, bytes were real bytes, and floating point numbers were real, um, reals, the journeyman test of every aspiring programmer was to write their own text editor. (This was way before the concept of &#8220;life&#8221; had been invented, so no-one knew they were supposed to [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coolcowstudio.wordpress.com&#038;blog=13675819&#038;post=96&#038;subd=coolcowstudio&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Back in the dawn of time, when men were real men, bytes were real bytes, and floating point numbers were real, um, reals, the journeyman test of every aspiring programmer was to write their own text editor.  (This was way before the concept of &#8220;life&#8221; had been invented, so no-one knew they were supposed to have one.)</p>
<p>Nowadays, we know better, and don&#8217;t write new code to solve problems that have already been solved. Well, unless we need an XML parser &#8211; everybody (including myself, but that&#8217;s a post for another time) has written one of those &#8211; or at least a string tokeniser (aka splitter).</p>
<p><i>Other</i> languages get tokenisers for free (C# &#8211; String.Split, Java &#8211; String.split, Python &#8211; string.split, and so on, and even C has strtok), but not C++. Which is why it&#8217;s something almost every C++ programmer writes, at some point or other.</p>
<p>Of course, you can use the rather nifty <a href="http://www.boost.org/doc/libs/release/libs/tokenizer/index.html">boost::tokenizer</a>, if the place where you work is okay with using Boost (a surprising number of places aren&#8217;t, for various reasons), or find one of the numerous example implementations out there. Like this one, for instance:<br />
<span id="more-96"></span></p>
<pre class="brush: cpp;">
void tokenise_string(const std::string&amp; str, 
  const std::string&amp; separator, 
  std::vector&lt;std::string&gt;&amp; tokens, 
  bool empty /* = false */)
{
  const std::string::size_type strlength = str.length();
  const std::string::size_type seplength = separator.length();

  std::string::size_type prev = 0;
  std::string::size_type next = str.find(separator, prev);

  while (std::string::npos != next)
  {
    if (empty || prev != next)
      tokens.push_back(str.substr(prev, next - prev));
    prev = next + seplength;
    next = str.find(separator, prev);
  }
  if (empty || prev != strlength)
    tokens.push_back(str.substr(prev, strlength - prev));
}
</pre>
<p>There&#8217;s not that much to say about this. Pass in a string to split up into tokens, what separator to look for, and an output parameter which will hold the tokens when we&#8217;re done. What makes this implementation slightly different from some of the others is that the separator is a <code>std::string</code>, and treated as such. Other implementations I&#8217;ve seen take a <code>char</code> (or even <code>std::string::value_type</code>) as a separator, or a string which is treated as a list of possible separators (like &#8220;.!?&#8221; to split a text into sentences).</p>
<p>I dislike the latter, as it&#8217;s ambiguous &#8211; is the separator used as a full string or as an array of characters?  Rather, I&#8217;d prefer to make it explicit by overloading the function:</p>
<pre class="brush: cpp;">
void tokenise_string(const std::string&amp; str, 
  const std::vector&lt;std::string::value_type&gt;&amp; separators, 
  std::vector&lt;std::string&gt;&amp; tokens, 
  bool empty /* = false */)
{
  const std::string::size_type strlength = str.length();
  const std::string::size_type seplength = 1;
  const std::string sep(separators.begin(), separators.end());

  std::string::size_type prev = 0;
  std::string::size_type next = str.find_first_of(sep, prev);

  while (std::string::npos != next)
  {
    if (empty || prev != next)
      tokens.push_back(str.substr(prev, next - prev));
    prev = next + seplength;
    next = str.find_first_of(sep, prev);
  }
  if (empty || prev != strlength)
    tokens.push_back(str.substr(prev, strlength - prev));
}
</pre>
<p>However, there is a problem here, in that <code>std::string</code> can be implicitly created from a native array of characters, and <code>std::vector</code> can&#8217;t:</p>
<pre class="brush: cpp;">
std::vector&lt;std::string&gt; output;
std::string input = "What, me worry? Nah.";
char separators[] = {'.','?'};

// Will call std::string separator version, which we probably don't intend
// (and as the array isn't zero-terminated, it would read past its end)
tokenise_string(input, separators, output);

// Must set up a vector explicitly
std::vector&lt;char&gt; sep_array(&amp;separators[0], &amp;separators[2]);

// Will call std::vector separator version
tokenise_string(input, sep_array, output);
</pre>
<p>For now, that is. C++ 1x will have an <code>initializer_list</code> constructor which will make things interesting here. </p>
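<p>As a sketch of what that could look like, assuming a compiler with C++11 brace initialisation (the wrapper <code>split_sentences</code> is made up for the example, and the character-list overload is repeated so the snippet stands alone):</p>

```cpp
#include <string>
#include <vector>

// The character-list overload from above, repeated here
void tokenise_string(const std::string& str,
  const std::vector<std::string::value_type>& separators,
  std::vector<std::string>& tokens,
  bool empty = false)
{
  const std::string::size_type strlength = str.length();
  const std::string sep(separators.begin(), separators.end());

  std::string::size_type prev = 0;
  std::string::size_type next = str.find_first_of(sep, prev);

  while (std::string::npos != next)
  {
    if (empty || prev != next)
      tokens.push_back(str.substr(prev, next - prev));
    prev = next + 1;
    next = str.find_first_of(sep, prev);
  }
  if (empty || prev != strlength)
    tokens.push_back(str.substr(prev, strlength - prev));
}

// Hypothetical helper: split on full stops and question marks
std::vector<std::string> split_sentences(const std::string& input)
{
  std::vector<std::string> output;
  // Naming the type picks the overload explicitly, with no
  // separate array or vector variable needed
  tokenise_string(input, std::vector<char>{'.', '?'}, output);
  return output;
}
```

<p>The &#8220;interesting&#8221; part is that, as far as I can tell, a bare <code>{'.', '?'}</code> would not be enough once both overloads exist, since <code>std::string</code> and <code>std::vector&lt;char&gt;</code> can both be built from such a list, so the type still has to be spelled out.</p>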
<p>By the way, the benefit of treating a separator string as one single separator is, of course, that it lets us <a href="http://en.wikipedia.org/wiki/Full_stop#Use_in_telegrams">parse telegrams</a>:</p>
<pre class="brush: cpp;">
std::vector&lt;std::string&gt; output;
std::string input = "NO TIME FOR WRENCHES STOP HAMMER TIME STOP";
std::string separator = "STOP";
tokenise_string(input, separator, output);
// Now output has two strings
</pre>
<p>I should probably mention, too, that the <code>empty</code> parameter lets us specify whether to include empty tokens in the output.  In most cases, I don&#8217;t want to, but there are times it&#8217;s significant, if only to indicate whether the string started or ended with a separator.</p>
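<p>To make the effect of <code>empty</code> concrete, here is a small sketch. It repeats the string-separator version from above so that it stands alone; the helper <code>split</code> is just a made-up convenience for the example:</p>

```cpp
#include <string>
#include <vector>

// Same logic as the string-separator version above
void tokenise_string(const std::string& str,
  const std::string& separator,
  std::vector<std::string>& tokens,
  bool empty = false)
{
  const std::string::size_type strlength = str.length();
  const std::string::size_type seplength = separator.length();

  std::string::size_type prev = 0;
  std::string::size_type next = str.find(separator, prev);

  while (std::string::npos != next)
  {
    if (empty || prev != next)
      tokens.push_back(str.substr(prev, next - prev));
    prev = next + seplength;
    next = str.find(separator, prev);
  }
  if (empty || prev != strlength)
    tokens.push_back(str.substr(prev, strlength - prev));
}

// Hypothetical convenience wrapper returning the tokens by value
std::vector<std::string> split(const std::string& str,
  const std::string& separator, bool empty)
{
  std::vector<std::string> tokens;
  tokenise_string(str, separator, tokens, empty);
  return tokens;
}

// split(",a,,b,", ",", false) gives 2 tokens: "a", "b"
// split(",a,,b,", ",", true) gives 5 tokens: "", "a", "", "b", ""
```

<p>The empty first and last tokens in the second call are what tell us the input started and ended with a separator.</p>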
<p>Finally, here&#8217;s a function you see implemented and talked about a lot less often than its counterpart.  If you want to split, presumably you&#8217;ll also want to merge, at some point. While it&#8217;s a very simple function, I&#8217;ve found it handy to have it available, so the merging is consistently done:</p>
<pre class="brush: cpp;">
void merge_tokens(const std::vector&lt;std::string&gt; &amp;tokens, 
  const std::string&amp; separator, 
  std::string&amp; output)
{
  if (!tokens.empty())
  {
    output = tokens.front();
    for (std::vector&lt;std::string&gt;::const_iterator i = 
      tokens.begin() + 1; 
      i != tokens.end(); ++i)
    {
      output += separator + *i;
    }
  }
}
</pre>
<p>Here we see the difference the <code>empty</code> flag makes in a call, by the way:</p>
<pre class="brush: cpp;">
std::vector&lt;std::string&gt; split1, split2;
std::string input = "/usr/tmp";
std::string separator = "/";

tokenise_string(input, separator, split1);
tokenise_string(input, separator, split2, true);

std::string merged1, merged2;

merge_tokens(split1, separator, merged1);
merge_tokens(split2, separator, merged2);

assert(input != merged1);  // Initial / removed
assert(input == merged2);  
</pre>
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/code/'>Code</a>, <a href='http://coolcowstudio.wordpress.com/category/code/codeproject/'>CodeProject</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/c/'>C++</a>, <a href='http://coolcowstudio.wordpress.com/tag/string/'>string</a>]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2010/08/10/splitting-strings/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>Redux: Hex strings to raw data and back</title>
		<link>http://coolcowstudio.wordpress.com/2010/08/05/redux-hex-strings-to-raw-data-and-back/</link>
		<comments>http://coolcowstudio.wordpress.com/2010/08/05/redux-hex-strings-to-raw-data-and-back/#comments</comments>
		<pubDate>Thu, 05 Aug 2010 11:06:19 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=70</guid>
		<description><![CDATA[During the writing of my last post, I did the due diligence thing and considered alternative implementations and algorithms to solve the problem at hand (converting a string representation of an 8-bit hexadecimal value to an unsigned 8-bit integer value). Because I was, in effect, documenting code written some years ago, I can&#8217;t recall exactly [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coolcowstudio.wordpress.com&#038;blog=13675819&#038;post=70&#038;subd=coolcowstudio&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>During the writing of my <a href="http://coolcowstudio.wordpress.com/2010/08/04/string-utilities-hex/">last post</a>, I did the due diligence thing and considered alternative implementations and algorithms to solve the problem at hand (converting a string representation of an 8-bit hexadecimal value to an unsigned 8-bit integer value). Because I was, in effect, documenting code written some years ago, I can&#8217;t recall exactly what other options, if any, I tried at the time.</p>
<p>I think I first tried using a <code>std::stringstream</code>, but gave up on that as being too slow, and went with <code>strtoul</code> instead. I might also have played around with using a <code>std::map</code> lookup table, with all the headaches that brought in terms of storage and initialisation, and decided against it.</p>
<p>What I <i>didn&#8217;t</i> try was a straight, non-clever switch-based lookup table to find the integer value of a hexadecimal character digit:</p>
<pre class="brush: cpp;">
inline unsigned char hex_digit_to_nybble(char ch)
{
  switch (ch)
  {
    case '0': return 0x0;
    case '1': return 0x1;
    case '2': return 0x2;
...
    case 'f': return 0xf;
    case 'F': return 0xf;
    default: throw std::invalid_argument("Invalid hex digit");
  }
}
</pre>
<p><span id="more-70"></span>I did when writing that post, though, and found that this was twice as fast as using <code>strtoul</code>. That in itself isn&#8217;t surprising, as it&#8217;s a much simpler functional requirement to fulfill (the standard <code>strtoul</code> is a very capable parser). </p>
<p>As it happens, using this digit-to-nybble function is not only faster, but also simplifies the main conversion function calling it. For one thing, this function handles all sanity checking and validation of the characters given, and for another, it removes the need to copy the two characters into a hex string before doing the conversion.  </p>
<p>Faster, neater, tidier code &#8211; brilliant!</p>
<p>However, I was troubled by one question: what character set does C++ use?<br />
See, I was using literal characters, and we&#8217;ve all been warned about making assumptions about character sets.  I even referred to that kind of dangerous assumption in the post I wrote, without, if I&#8217;m honest, thinking much about it.</p>
<p>Looking at the code again, I realised that I was already assuming things about the character set, since I use an array of characters when encoding integer values into hex strings:</p>
<pre class="brush: cpp;">
static const std::string hex_char = "0123456789abcdef";
...
char high_nybble_char =  hex_char[(data_char &gt;&gt; 4) &amp; 0x0f];
char low_nybble_char =  hex_char[data_char &amp; 0x0f];
</pre>
<p>In other words, if this was safe, the switch was safe. If it wasn&#8217;t, neither was the switch, and while I already had a portable, safe and working version for the decoding (using <code>strtoul</code>), I might have to come up with an alternative for the encoding.</p>
<p>Essentially, for this code to be guaranteed to be portable, standards-compliant, and safe, three potentially different character sets need to agree that the sixteen characters used to represent hexadecimal numbers are the same. Those three are:</p>
<ol>
<li>The character set used to store the source code files,</li>
<li>The character set used by the compiler when creating executable code,</li>
<li>The character set used by the data sent into the decoding function, and accepted as output by the caller of the encoding function.</li>
</ol>
<p>The problem is that a character is only a character when printed or written or displayed. It&#8217;s a visual shape, a representation. This <b>A</b> (probably) looks like a character to you, but it isn&#8217;t stored as one.  It&#8217;s stored in some sort of binary representation, using some defined rule set and look-up that tells your computer (or mobile, iPad, text-to-speech system, etc) that it should be rendered in a manner that represents the concept of the letter &#8220;A&#8221; in one of the many Latin alphabets. Probably.</p>
<p>So the compiler needs to be able to read the source file, which is a stream of bytes (which, of course, may be of any number of lengths, although 8 bits is the most common), and interpret those into a representation where, for instance, the bytes {0x69, 0x66, 0x20, 0x28} are parsed as &#8220;if (&#8221; and not &#8220;%&ouml;&#8221;.  </p>
<p>Of course, this is what a compiler is required to do, and regardless of how the source code is stored (well, provided it&#8217;s in a form the compiler supports), the compiler will read it and convert it to &#8220;basic source character set&#8221;. This character set includes all hexadecimal digits, so we&#8217;re safe there. (It actually includes most of your basic printable 7-bit ASCII.  There are a couple of good explanations of this on <a href="http://stackoverflow.com/questions/331690/c-source-in-unicode">StackOverflow</a>.)</p>
<p>Next, the compiler has to worry about the &#8220;basic execution character set&#8221;, which is what character set &#8211; at minimum &#8211; is used during execution.  This is defined as the &#8220;basic source character set&#8221; plus a few extra characters, so we&#8217;re good there, too.</p>
<p>Finally, the data sent in to the function is safe, too, because the &#8220;basic source character set&#8221; covers the calling functions too, and in the case of data read from disk or otherwise externally sourced, it is the responsibility of the calling function to provide it in that form.</p>
<p>For more information on these character sets, see the C++ standard, or these two articles, which have been very helpful to me:<br /><a href="http://www.cppgeek.com/2009/05/04/what-character-set-does-cpp-use/">What character set does C++ use?</a><br />
<a href="http://www.glenmccl.com/charset.htm">C++ Character Sets</a></p>
<p>So, anyway, it&#8217;s safe to use character literals to represent hexadecimal digits, which means that the <code>bytes_to_hex</code> function can remain unchanged, and the <code>hex_to_bytes</code> can be rewritten to use the much faster switch lookup:</p>
<pre class="brush: cpp;">
inline unsigned char hex_digit_to_nybble(char ch)
{
  switch (ch)
  {
    case '0': return 0x0;
    case '1': return 0x1;
...
    case 'f': return 0xf;
    case 'F': return 0xf;
    default:
    {
      std::string e = "Invalid character in hex string: \'";
      e += ch;
      e += "'";
      throw std::invalid_argument(e);
    }
  }
}

void hex_to_bytes2(const std::string&amp; str, 
  std::vector&lt;unsigned char&gt;&amp; data)
{
  // Sanity check
  static_assert&lt;8 == CHAR_BIT&gt;::valid_expression();
  
  // Clear output
  data.clear();
  
  // No data? Then we're done
  if (str.empty())
    return;

  // Must be prepared that string can have odd number of 
  // nybbles, in which case the first is treated like the low 
  // nybble of the first byte
  size_t lengthOverflow = str.length() % 2;

  // This also affects the length of the data buffer we
  // allocate (need full  byte for nybble)
  const size_t length = lengthOverflow + str.length() / 2;
  data.resize(length);

  // Indices for output (i) and input (c); the strtoul buffer
  // from the first version is no longer needed
  size_t i = 0;
  size_t c = 0;

  // If the first nybble is a low, we'll do it separately
  if (1 == lengthOverflow)
  {
    data[i++] = hex_digit_to_nybble(str[c++]);
  }
  
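  // For each output byte, we use two input characters for 
  // high and low nybble, respectively. They are read in
  // separate statements, as the order in which the operands
  // of | are evaluated is not specified
  while (i &lt; length)
  {
    const unsigned char hi = hex_digit_to_nybble(str[c++]);
    const unsigned char lo = hex_digit_to_nybble(str[c++]);
    data[i++] = static_cast&lt;unsigned char&gt;((hi &lt;&lt; 4) | lo);
  }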
}
</pre>
<p>So there we are. I spent an evening learning a bit more about C++, improving some old code, and writing this post saying &#8220;You can always improve&#8221;.</p>
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/code/'>Code</a>, <a href='http://coolcowstudio.wordpress.com/category/code/codeproject/'>CodeProject</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/c/'>C++</a>, <a href='http://coolcowstudio.wordpress.com/tag/coding/'>coding</a>, <a href='http://coolcowstudio.wordpress.com/tag/string/'>string</a>]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2010/08/05/redux-hex-strings-to-raw-data-and-back/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>Hex strings to raw data and back</title>
		<link>http://coolcowstudio.wordpress.com/2010/08/04/string-utilities-hex/</link>
		<comments>http://coolcowstudio.wordpress.com/2010/08/04/string-utilities-hex/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 11:08:19 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[coding]]></category>
		<category><![CDATA[string]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=58</guid>
		<description><![CDATA[Here&#8217;s a problem that tends to crop up in a lot of communication domains: how do you transfer binary data in a protocol which limits what characters are permitted? The answer is to encode it into permissible characters (for historical reasons often 7-bit printable ASCII), and because there are few things this wonderful industry likes [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coolcowstudio.wordpress.com&#038;blog=13675819&#038;post=58&#038;subd=coolcowstudio&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s a problem that tends to crop up in a lot of communication domains: how do you transfer binary data in a protocol which limits what characters are permitted?  The answer is to encode it into permissible characters (for historical reasons often 7-bit printable ASCII), and because there are few things this wonderful industry likes more than re-inventing the wheel, there&#8217;s a plethora of <a href="http://en.wikipedia.org/wiki/Binary-to-text_encoding">binary-to-text encoding</a> schemes around.  Each has its own trade-offs in terms of speed and space efficiency, and almost every one has a more or less glorious history of being the favoured scheme on some platform, or in some protocol or application.</p>
<p>The simplest encoding is (in my opinion) the &#8220;hexadecimal text&#8221; encoding. It&#8217;s so simple, it doesn&#8217;t even have a fancy or clever name. You simply take each byte and type its value as a hexadecimal number. Working on the <a href="http://en.wikipedia.org/wiki/Byte">assumption</a> that a byte is 8 bits, its value can be expressed in two characters &#8211; 0x00-0xff. Assuming that a character occupies one byte, we see that the size of the data will double by writing it as hexadecimal text, so it&#8217;s not very efficient space-wise.  But it is simple to understand and implement, and quite useful, so I wrote a pair of encoding/decoding functions. <span id="more-58"></span></p>
<p>Let&#8217;s start with the encoding function, as that&#8217;s simplest. I&#8217;ll use <code>std::string</code> to store the resulting string here, with the above assumptions. (Those assumptions &#8211; 8-bit memory bytes, 8-bit characters &#8211; are quite reasonable, in that if you&#8217;re working on a platform where they&#8217;re not true, you probably know about it.)</p>
<pre class="brush: cpp;">
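#include &lt;climits&gt; // For CHAR_BIT
#include &lt;cstddef&gt; // For size_t
#include &lt;string&gt;
#include &lt;vector&gt;

// The more "raw" version, defined further down
void bytes_to_hex(const unsigned char* data, size_t length,
  std::string&amp; str);

// Encode data buffer to string of hexadecimal values
void bytes_to_hex(const std::vector&lt;unsigned char&gt;&amp; data, 
  std::string&amp; str)
{
  // Just wrapping the more "raw" function (an empty vector
  // has no first element to point at, so pass null instead)
  bytes_to_hex(data.empty() ? 0 : &amp;data[0], data.size(), str);
}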

void bytes_to_hex(const unsigned char* data, size_t length,
  std::string&amp; str)
{
  // Sanity check
  static_assert&lt;8 == CHAR_BIT&gt;::valid_expression();
  
  // Clear output
  str.clear();
  
  // No data? Then we're done
  if (0 == length)
    return;

  // Output is twice the length of input length
  str.resize(length * 2, ' ');
  
  // Working with 4-bit nybbles, we can use the value as
  // index to character
  static const std::string hex_char = "0123456789abcdef";

  for (size_t i = 0; i &lt; length; ++i)
  {
    // High nybble
    str[i&lt;&lt;1] = hex_char[(data[i] &gt;&gt; 4) &amp; 0x0f];
    // Low nybble
    str[(i&lt;&lt;1) + 1] = hex_char[data[i] &amp; 0x0f];
  }
}
</pre>
<p>As you see, it&#8217;s very simple. Given a buffer of bytes {7, 233, 57, 42, 198} the string &#8220;07e9392ac6&#8221; is generated into the output parameter.  </p>
<p>While there is a standard way of turning a number into a string &#8211; with <code>std::stringstream</code> &#8211; I elected to write the code to do it myself here, since it&#8217;s a very simple and safe algorithm. </p>
<p>First, I set up an array of the sixteen hexadecimal digits, and then simply use the high and low <a href="http://en.wikipedia.org/wiki/Nibble">nybble</a> as an index into this array to get the character corresponding to the value of the nybble.  The high nybble has to be shifted down to get the correct range, and that&#8217;s all there is to it.</p>
<p>Since I had already resized the output string, I can write directly into the correct position, instead of appending (which would likely be significantly slower).</p>
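<p>As a self-contained sketch of the same idea, here is the nybble lookup for a single byte, plus a loop over the example buffer from above. The names <code>byte_to_hex</code> and <code>example_hex</code> are made up for the illustration, the built-in <code>static_assert</code> assumes a C++11 compiler, and for brevity it appends rather than writing into a pre-sized string:</p>

```cpp
#include <climits>
#include <cstddef>
#include <string>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical helper: two hex characters for one byte
std::string byte_to_hex(unsigned char byte)
{
  static const char hex_char[] = "0123456789abcdef";
  std::string out(2, ' ');
  out[0] = hex_char[(byte >> 4) & 0x0f]; // High nybble
  out[1] = hex_char[byte & 0x0f];        // Low nybble
  return out;
}

// Encodes the example buffer {7, 233, 57, 42, 198}
std::string example_hex()
{
  const unsigned char data[] = {7, 233, 57, 42, 198};
  std::string str;
  for (std::size_t i = 0; i < sizeof(data); ++i)
    str += byte_to_hex(data[i]);
  return str; // "07e9392ac6"
}
```

<p>Taking 233 as an example: it is 0xe9, so the high nybble 0xe indexes the character 'e' and the low nybble 0x9 indexes '9'.</p>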
<p>It&#8217;s tempting to write the decoding function as a straight reverse, but this stumbles on the character-to-value lookup. How do you get from a character to its corresponding nybble? The naive solution looks as follows:</p>
<pre class="brush: cpp;">
std::string hex = "f3"; // For instance
...
char hi_nybble = hex[0];
char lo_nybble = hex[1];
unsigned char result = 0;

// First for high nybble, then low
// Numeric or alphabetic?
if (hi_nybble &gt; '9')
  result |= (hi_nybble - 'a' + 0xa) &lt;&lt; 4;
else
  result |= (hi_nybble - '0') &lt;&lt; 4;
if (lo_nybble &gt; '9')
  result |= (lo_nybble - 'a' + 0xa);
else
  result |= (lo_nybble - '0');
...
</pre>
<p>Ignoring for the moment that there&#8217;s no sanity checking of the input data, assuming only the characters [0-9,a-f] will be present, this function still fails the &#8220;good engineering&#8221; test by making assumptions about character ordering and values.  It&#8217;s not safe to use, and may stop working when used with a different character set.</p>
<p>An alternative would be to make a proper lookup table, mapping characters to nybble values, with separate entries for upper and lower case characters (a-f), either by populating a <code>std::map</code> or a big switch:</p>
<pre class="brush: cpp;">
inline unsigned char hex_digit_to_nybble(char ch)
{
  switch (ch)
  {
    case '0': return 0x0;
    case '1': return 0x1;
    case '2': return 0x2;
...
    case 'f': return 0xf;
    case 'F': return 0xf;
    default: throw std::invalid_argument("Invalid hex digit");
  }
}
</pre>
<p>Then, after I have the nybbles, I could shift the high one up, and do a bitwise OR to join them.  But frankly, while this works, it feels clunky.  And besides, there are lots of standard ways to convert a string to a number; from the standard C library functions <code>atoi</code> and <code>strtol</code>, to the standard C++ <code>std::stringstream</code> (and even <code>boost::lexical_cast</code> which isn&#8217;t standard, but fairly popular).  However, only two of those can handle numbers in bases other than decimal &#8211; <code>strtol</code> and <code>std::stringstream</code> &#8211; and of those, the latter is much more powerful, and therefore likely to be slower.</p>
<p>The <code>strtol</code> function expects a character string, so I&#8217;ll have to copy each pair of characters into a zero-terminated buffer, and use that as input to get a byte. That&#8217;s simple enough, but what do I do if there isn&#8217;t a pair of characters, but a single one?  </p>
<p>In other words, if I have the hex string &#8220;3da&#8221; to convert, the function should treat it like &#8220;03da&#8221;, and produce {03, da} rather than {3d, a0}. Rather than making a copy of the string with an extra &#8220;0&#8221; prepended, I&#8217;ll treat this as a special case.</p>
<p>Any other potential problems with using <code>strtol</code>?  Well, yes, the matter of what characters count as valid input.  I&#8217;ll use the unsigned version, <code>strtoul</code>, since I&#8217;m expecting an unsigned output, but even this is far too lenient in what it accepts: [whitespace][{+|–}] [0[{x|X}]][digits].  </p>
<p>Since I&#8217;m not converting numbers, but encoding bytes, I can&#8217;t accept any whitespace, signs, or anything that isn&#8217;t a hexadecimal digit. Looking into this further, it&#8217;s less of a problem than it would first appear, as <code>strtoul</code> will let me know if there&#8217;s an un-parsed character at the end. In other words, &#8220;-3&#8221; is fully parsed, but for &#8220;3-&#8221; it will only parse the first digit and then stop, so that&#8217;s a simple thing to check for. Furthermore, although base 16 still permits an initial &#8220;0x&#8221;, a two-character buffer has no room for a digit after such a prefix, so parsing cannot reach the end of the buffer and the same check catches it.</p>
<p>Still, we need to make sure the initial character is valid hex, which means breaking out <code>isxdigit</code> to make sure we only accept hexadecimal digits. Now, this is a function that is both available from the standard C library, via the <code>&lt;cctype&gt;</code> header, and from the standard C++ library, via the <code>&lt;locale&gt;</code> header. The difference is that the <code>std::isxdigit</code> takes a <code>std::locale</code>, which I assume is just for completeness&#8217; sake, as the C <code>isxdigit</code> is not affected by any changes to locale. At best, it&#8217;s just a pass-through call, only adding a level of indirection, and at worst, it adds more unnecessary computing, so I&#8217;ll just stick to the C version.</p>
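<p>A quick sketch of that end-pointer check (the function name <code>parses_fully</code> is made up for the example; it deliberately shows <code>strtoul</code>&#8217;s leniency rather than the final validation):</p>

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true if strtoul, told to use base 16,
// consumed every character of the zero-terminated buffer
bool parses_fully(const char* buf)
{
  char* pend = 0;
  std::strtoul(buf, &pend, 16);
  return pend == buf + std::strlen(buf);
}

// parses_fully("f3") is true: both characters are hex digits
// parses_fully("3-") is false: parsing stops at the '-'
// parses_fully("-3") is true: a leading sign is accepted, which
// is why the first character needs a separate check
```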
<pre class="brush: cpp;">
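#include &lt;cctype&gt;    // For isxdigit
#include &lt;climits&gt;   // For CHAR_BIT
#include &lt;cstdlib&gt;   // For strtoul
#include &lt;stdexcept&gt; // For std::invalid_argument
#include &lt;string&gt;
#include &lt;vector&gt;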

void hex_to_bytes(const std::string&amp; str, 
  std::vector&lt;unsigned char&gt;&amp; data)
{
  // Sanity check
  static_assert&lt;8 == CHAR_BIT&gt;::valid_expression();
  
  // Clear output
  data.clear();
  
  // No data? Then we're done
  if (str.empty())
    return;

  // Must be prepared that string can have odd number of 
  // nybbles, in which case the first is treated like the low 
  // nybble of the first byte
  size_t lengthOverflow = str.length() % 2;

  // This also affects the length of the data buffer we
  // allocate (need full  byte for nybble)
  const size_t length = lengthOverflow + str.length() / 2;
  data.resize(length);

  // Buffer for byte conversion (local rather than static, so
  // that concurrent calls don't share it)
  char buf[3];
  buf[2] = 0;
  // End of input
  char* pend = &amp;buf[2];

  // Iterators for input and output
  size_t i = 0;
  size_t c = 0;

  // If the first nybble is a low, we'll do it separately
  if (1 == lengthOverflow)
  {
    buf[0] = '0';
    buf[1] = str[c++];
    unsigned char x = static_cast&lt;unsigned char&gt;
      (strtoul(buf, &amp;pend, 16));
    
    // Parsing should stop at terminating zero
    if (pend != &amp;buf[2])
    {
      std::string e = "Invalid character in hex string: \'";
      e += *(pend);
      e += "'";
      throw std::invalid_argument(e);
    }
    data[i++] = x;
  }
  
  // For each output byte, we use two input characters for 
  // high and low nybble, respectively
  for (; i &lt; length; ++i)
  {
    buf[0] = str[c++];
    // strtoul accepts initial whitespace or sign, we can't
    // (cast needed: isxdigit is undefined for negative values)
    if (!isxdigit(static_cast&lt;unsigned char&gt;(buf[0])))
    {
      std::string e = "Invalid character in hex string: \'";
      e += buf[0];
      e += "'";
      throw std::invalid_argument(e);
    }

    buf[1] = str[c++];
    unsigned char x = static_cast&lt;unsigned char&gt;
      (strtoul(buf, &amp;pend, 16));

    // Parsing should stop at terminating zero
    if (pend != &amp;buf[2])
    {
      std::string e = "Invalid character in hex string: \'";
      e += *(pend);
      e += "'";
      throw std::invalid_argument(e);
    }

    data[i] = x;
  }
}
</pre>
<p>(As it happens, when writing this post I came across a very interesting <a href="http://tinodidriksen.com/2010/02/16/cpp-convert-string-to-int-speed/">blog entry</a> by <a href="http://tinodidriksen.com/">Tino Didriksen</a> that times different methods of converting strings to integers in C++, including all the methods I mention above, though using decimal strings. Looking at those results there&#8217;s little to recommend <code>std::stringstream</code> in terms of speed, which was a nice validation of code whose first version I wrote years ago.</p>
<p>I also used Tino&#8217;s benchmark code to compare the usage of <code>strtol</code> to the <code>hex_digit_to_nybble</code> outlined above, and found that the latter was almost twice as fast. I&#8217;m currently pondering what drawbacks there might be in using what was quickly conceived as a rhetorical strawman while writing this article.)</p>
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/code/'>Code</a>, <a href='http://coolcowstudio.wordpress.com/category/code/codeproject/'>CodeProject</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/c/'>C++</a>, <a href='http://coolcowstudio.wordpress.com/tag/coding/'>coding</a>, <a href='http://coolcowstudio.wordpress.com/tag/string/'>string</a>]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2010/08/04/string-utilities-hex/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>Home on the range</title>
		<link>http://coolcowstudio.wordpress.com/2010/07/22/home-on-the-range/</link>
		<comments>http://coolcowstudio.wordpress.com/2010/07/22/home-on-the-range/#comments</comments>
		<pubDate>Thu, 22 Jul 2010 17:13:33 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[CodeProject]]></category>
		<category><![CDATA[bounds]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[template]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=54</guid>
		<description><![CDATA[Continuing on the train of thought started with the bounds class I presented a few days ago in Bounds, and staying within them. As so often happens, just having bounds available made me think of what variants of it could be useful. For instance, it would be handy to have it work for floating point or [...]]]></description>
				<content:encoded><![CDATA[<p><i>Continuing on the train of thought started with the <code>bounds</code> class I presented a few days ago in <a href="http://coolcowstudio.wordpress.com/2010/07/12/bounds-and-staying-within-them/">Bounds, and staying within them</a>.</i></p>
<p>As so often happens, just having <code>bounds</code> available made me think of what variants of it could be useful. For instance, it would be handy to have it work for floating point or non-<a href="http://en.wikipedia.org/wiki/Plain_old_data_structure">POD</a> types, which isn&#8217;t possible as it is written. Since the <code>bounds</code> class uses &#8216;<a href="http://msdn.microsoft.com/en-us/library/x5w1yety.aspx">non-type template parameters</a>&#8217; for its limits, only integer types and enums are accepted.<a name="fn1b" /><a href="#fn1">[1]</a> </p>
<p>Even disregarding this restriction, I found that I had use for a dynamic <code>range</code> class, as opposed to the static <code>bounds</code> which has its boundaries set at compile time. Just a simple one, and like <code>std::pair</code> only having two values, but with both of the same type, and with them guaranteed to be ordered. </p>
<p>The last part there would make it a bit more complex than the simple <code>std::pair</code> struct, as I&#8217;d need to validate the values given in order to ensure that the minimum was lower than or equal to the maximum, but still, a simple enough little class.<span id="more-54"></span>  </p>
<pre class="brush: cpp;">
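#include &lt;algorithm&gt; // For std::min and std::max
#include &lt;stdexcept&gt; // For std::invalid_argument

template &lt;typename T,               // data type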
  typename L = less_than_comparison::closed&lt;T&gt;, // lower comparer
  typename U = less_than_comparison::closed&lt;T&gt; &gt;// upper comparer
class range
{ 
  // Member data
  T minimum_, maximum_;
protected:
  // Validation function
  virtual void throw_if_invalid(const T&amp; minimum, 
	const T&amp; maximum)
  {
    if (maximum &lt; minimum)
      throw std::invalid_argument("Minimum &gt; maximum");
  }
public:
  // Type name for templated type
  typedef T type;

  // Default constructor
  range()
    : minimum_(T()),maximum_(T()) 
  {}
  // Assignment constructor
  range(T min, T max)
    : minimum_(min),maximum_(max)
  {
    throw_if_invalid(minimum_, maximum_);
  }

  // Get minimum value
  T get_minimum() const
  {
    return minimum_;
  }
  // Get maximum value
  T get_maximum() const
  {
    return maximum_;
  }
  // Set minimum value 
  void set_minimum(T min)
  {
    throw_if_invalid(min, maximum_);
    minimum_ = min;
  }
  // Set maximum value 
  void set_maximum(T max)
  {
    throw_if_invalid(minimum_, max);
    maximum_ = max;
  }

  // Equality comparison operator
  bool operator==(const range&amp; other) const
  {
    return (minimum_ == other.minimum_) &amp;&amp; 
      (maximum_ == other.maximum_);
  }
  // Inequality comparison operator
  bool operator!=(const range&amp; other) const
  {
    return !operator==(other);
  }

  // Get size of range (as T, so that floating point ranges
  // aren't truncated to int)
  T width() const 
  {
    return maximum_ - minimum_;
  }

  // Check if value is in range
  bool in_range(T val) const
  {
    return L::less(minimum_, val) &amp;&amp; U::less(val, maximum_);
  }
  // Check if other range is subset of this
  bool in_range(const range&amp; other) const
  {
    return L::less(minimum_, other.minimum_) &amp;&amp; 
      U::less(other.maximum_, maximum_);
  }

  // Check if other range intersects with this
  bool intersects(const range&amp; other) const
  {
    return 
	  (in_range(other.minimum_) || other.in_range(minimum_)) &amp;&amp;
      (in_range(other.maximum_) || other.in_range(maximum_));
  }
  // Create union of this and other range
  range make_union(const range&amp; other) const
  {
    if (!intersects(other))
      throw std::invalid_argument("No union of ranges");
  
    return range(std::min(minimum_, other.minimum_), 
      std::max(maximum_, other.maximum_));
  }
  // Create intersection of this and other range
  range make_intersection(const range&amp; other) const
  {
    if (!intersects(other))
      throw std::invalid_argument("No intersection of ranges");

    return range(std::max(minimum_, other.minimum_), 
      std::min(maximum_, other.maximum_));
  }
};
// Create intersection of two ranges
template &lt;typename T&gt;
range&lt;T&gt; operator&amp;(const range&lt;T&gt;&amp; lhs, 
  const range&lt;T&gt;&amp; rhs)
{
  return lhs.make_intersection(rhs);
}

// Create union of two ranges
template &lt;typename T&gt;
range&lt;T&gt; operator|(const range&lt;T&gt;&amp; lhs, 
  const range&lt;T&gt;&amp; rhs)
{
  return lhs.make_union(rhs);
}
</pre>
<p>This uses the same <a href="http://en.wikipedia.org/wiki/Policy-based_design">policy-based design</a> with upper and lower comparators as the <code>bounds</code> class, so that you can have an open, closed, or half-open (in either direction) range. </p>
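<p>The comparers themselves are not shown in this post. As a rough sketch only, assuming each policy exposes a static <code>less</code> function the way <code>L::less</code> and <code>U::less</code> are called above, they could look something like this. The <code>open</code> policy and the free function <code>value_in_range</code> are names assumed for this illustration, and the bodies are a guess at the shape rather than the actual definitions.</p>
<pre class="brush: cpp;">
namespace less_than_comparison
{
  // Closed boundary: the boundary value itself is included
  template &lt;typename T&gt;
  struct closed
  {
    static bool less(const T&amp; lhs, const T&amp; rhs)
    {
      return !(rhs &lt; lhs); // i.e. lhs &lt;= rhs
    }
  };
  // Open boundary: the boundary value itself is excluded
  template &lt;typename T&gt;
  struct open
  {
    static bool less(const T&amp; lhs, const T&amp; rhs)
    {
      return lhs &lt; rhs;
    }
  };
}

// Same test as range::in_range, written as a free function
template &lt;typename T, typename L, typename U&gt;
bool value_in_range(const T&amp; minimum, const T&amp; val, 
  const T&amp; maximum)
{
  return L::less(minimum, val) &amp;&amp; U::less(val, maximum);
}
</pre>
<p>With <code>closed</code> as the lower comparer and <code>open</code> as the upper one, this describes the half-open range [minimum, maximum).</p>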
<p>Note that despite the inclusion of union and intersection functions and operators, this is not a class intended for interval arithmetic. If you have such needs, you&#8217;re much better off with the <code><a href="http://www.boost.org/doc/libs/1_43_0/libs/numeric/interval/doc/interval.htm">boost::numeric::interval</a></code> class.</p>
<p>There is one virtual function in the <code>range</code> class: the validation function. The reason for this is that I can simply combine this with the <code>bounds</code> class into a bounded range, and only need to update the validation to take the bounds into consideration to have a fully functioning class.  Well, that, and write suitable constructors, and provide another bounds-checking function. </p>
<pre class="brush: cpp;">
template &lt;typename T, T lower_, T upper_, 
  typename L = less_than_comparison::closed&lt;T&gt;, 
  typename U = less_than_comparison::closed&lt;T&gt; &gt;
class bounded_range : public range&lt;T, L, U&gt;, 
  public bounds&lt;T, lower_, upper_, L, U&gt;
{
protected:
  /*  Overridden validation function to check bounds as well 
      as validity.
    Throws if minimum &gt; maximum, or if out of bounds
    \param mini lower value of range
    \param maxi upper value of range
  */
  virtual void throw_if_invalid(const T&amp; mini, const T&amp; maxi)
  {
    range&lt;T, L, U&gt;::throw_if_invalid(mini, maxi);
    if (!in_bounds(mini))
      throw std::invalid_argument("Minimum out of bounds");
    if (!in_bounds(maxi))
      throw std::invalid_argument("Maximum out of bounds");
  }
public:
  // Type name for templated type
  typedef T type;
  // Default constructor
  bounded_range()
    : range&lt;T, L, U&gt;(
        bounds&lt;T, lower_, upper_, L, U&gt;::lower_bound(), 
        bounds&lt;T, lower_, upper_, L, U&gt;::upper_bound())
  {}
  // Assignment constructor
  bounded_range(const T&amp;  min, const T&amp;  max)
    : range&lt;T, L, U&gt;(min, max)
  {
    throw_if_invalid(min, max);
  }
  // Conversion constructor
  bounded_range(const range&lt;T, L, U&gt;&amp; other)
    : range&lt;T, L, U&gt;(other)
  {
    throw_if_invalid(other.get_minimum(), other.get_maximum());
  }
  // Check if value is within bounds using base class
  using bounds&lt;T, lower_, upper_, L, U&gt;::in_bounds;
  // Check if range is within bounds
  static bool in_bounds(const range&lt;T, L, U&gt;&amp; other)
  {
    return in_bounds(other.get_minimum()) &amp;&amp; 
	  in_bounds(other.get_maximum());
  }
};
</pre>
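<p>One likely reason the constructors above call <code>throw_if_invalid</code> again: while a base class constructor is running, virtual calls resolve to the base class version, so the <code>range</code> constructor never reaches the overridden check. A minimal sketch of that language rule, using made-up class names rather than the classes above:</p>
<pre class="brush: cpp;">
#include &lt;string&gt;

struct base_checker
{
  std::string called_from_constructor;

  base_checker()
  {
    // The object is still only a base_checker at this point,
    // so this call resolves to base_checker::name()
    called_from_constructor = name();
  }
  virtual ~base_checker() {}
  virtual std::string name() const { return "base"; }
};

struct derived_checker : public base_checker
{
  virtual std::string name() const { return "derived"; }
};
</pre>
<p>After constructing a <code>derived_checker</code>, <code>called_from_constructor</code> holds &#8220;base&#8221;, while calling <code>name()</code> on the finished object returns &#8220;derived&#8221;.</p>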
<p><a name="fn1" />[1] The C++ language also permits address types (pointer or reference) as non-type parameters, provided they&#8217;re known at compile time, but for that loophole to provide a way to implement static bounds checking with float or, say, a user-defined point type, would, if at all possible, require a mastery of template metaprogramming magic that is far beyond my meagre abilities. <a href="#fn1b">Back</a></p>
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/code/'>Code</a>, <a href='http://coolcowstudio.wordpress.com/category/code/codeproject/'>CodeProject</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/bounds/'>bounds</a>, <a href='http://coolcowstudio.wordpress.com/tag/c/'>C++</a>, <a href='http://coolcowstudio.wordpress.com/tag/template/'>template</a>]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2010/07/22/home-on-the-range/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
		<item>
		<title>Old bag of tricks</title>
		<link>http://coolcowstudio.wordpress.com/2010/07/16/old-bag-of-tricks/</link>
		<comments>http://coolcowstudio.wordpress.com/2010/07/16/old-bag-of-tricks/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 11:41:33 +0000</pubDate>
		<dc:creator>Orjan</dc:creator>
				<category><![CDATA[Meta]]></category>
		<category><![CDATA[musing]]></category>

		<guid isPermaLink="false">http://coolcowstudio.wordpress.com/?p=48</guid>
		<description><![CDATA[The main reason for starting this blog was to serve as an incentive to do some long-overdue tidying. Like pretty much any programmer who&#8217;s been working for a few years, I&#8217;ve got a little bag of tricks I&#8217;ve been carrying around throughout my career (17 years so far, of mainly C++, Delphi, and C#). Over [...]]]></description>
				<content:encoded><![CDATA[<p>The main reason for starting this blog was to serve as an incentive to do some long-overdue tidying.  Like pretty much any programmer who&#8217;s been working for a few years, I&#8217;ve got a little bag of tricks I&#8217;ve been carrying around throughout my career (17 years so far, of mainly C++, Delphi, and C#).  Over the years, some bits and bobs have become obsolete (like that string class I wrote back in 1993, and everything I&#8217;ve ever written in Visual Basic), others have been lost in transitions and moves, but it&#8217;s still a sizeable bag.</p>
<p>Most of the things in the bag, though, have been unorganised little snippets of code, random functions and classes that have been half-formed, often neither tidy nor documented enough to be presented publicly.  Some have come from experiments and trials, some from various hobby projects, and some from times my work contract has given me some rights to the software I write.</p>
<p>The last part there is a sensitive area.  In most employment contracts, the employer retains the full right to the code you write as part of your job.  If you are an independent contractor, the rights issue can be more flexibly negotiated, and there&#8217;s sometimes a distinction made between business-related code, and supporting code.</p>
<p>In most cases, the code I&#8217;ve written in my day job is code I have no rights to.  But because I&#8217;m a geek, I&#8217;ve sometimes thought &#8220;Oooh, that&#8217;s interesting&#8221;, and sat down to try out and experiment with some concept in my spare time.  In those cases, the inspiration might have come from my day job, but the code has been written in my spare time, from scratch.  It may be because I&#8217;ve wanted to learn more, or because it&#8217;s been an interesting problem to solve, or simply because the day job has only required a partial, specialised solution, so I&#8217;ve wanted to do a full solution for my own intellectual satisfaction.</p>
<p>It&#8217;s often been the case, too, that I&#8217;ve found a use for these little snippets later, for a different employer, and in those cases I&#8217;ve taken my old code and tweaked or rewritten it to fit into the style and needs of the current workplace.  </p>
<p>So I finally decided to tidy things up.  I&#8217;ll be revisiting my old code folders, extracting what&#8217;s useful or interesting, tidying it up, and packaging it. I will:</p>
<ul>
<li>stick to a single style of coding (per language) and rewrite the code to have a consistent look</li>
<li>organise the code into namespaces and classes</li>
<li>comment the code thoroughly</li>
<li>use <a href="http://www.doxygen.org/">Doxygen</a> or similar to extract the comments to useful documentation</li>
<li>write a simple test app to exercise the code for each module</li>
<li>for each module, create packages with:
<ul>
<li>only source code</li>
<li>source code, documentation, and example/test app</li>
</ul>
</li>
<li>release under a <a href="http://www.opensource.org/licenses/bsd-license.php">BSD license</a></li>
</ul>
<p>Why?  Well, in large part to make it easier for me to find, and introduce at the places I work.  And I also think other people might find these bits and bobs useful.</p>
<p>I wonder, though, whether it would also be useful to make the code available on <a href="http://sourceforge.net/">SourceForge</a> and/or <a href="http://www.codeproject.com/">CodeProject</a>?</p>
<br />Filed under: <a href='http://coolcowstudio.wordpress.com/category/meta/'>Meta</a> Tagged: <a href='http://coolcowstudio.wordpress.com/tag/musing/'>musing</a>]]></content:encoded>
			<wfw:commentRss>http://coolcowstudio.wordpress.com/2010/07/16/old-bag-of-tricks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/e4280f2669475d196742f3f90db0d4b1?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">CoolCowStudio</media:title>
		</media:content>
	</item>
	</channel>
</rss>
