
Cleaning up .net with IDisposable and finalizers

One of the most common mistakes I see in .net code is the misuse of finalizers.

Finalizers should only be used to clean up unmanaged resources. There is no guarantee of the order in which finalizers are called, and they are only given a short time to complete. Finalizers should never call Dispose on anything, and should never do things like flush a stream. If you get to the point where a finalizer needs to flush a stream, you have a bug elsewhere: the object was not properly flushed and disposed.

If you do not have any unmanaged resources to clean up, do not implement a finalizer. Cleaning up managed resources should be done by implementing IDisposable. This is also where large references may be set to null in order for the garbage collector to collect them even if a reference to the object itself still exists. For example, MemoryStream does not set the reference to its buffer to null, so disposing it does not free the memory it consumes. In order to get the memory back from a MemoryStream, you need to remove the reference to it. Luckily, declaring the variable in a using statement will also mean that it goes out of scope after it is disposed and may be garbage collected.
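For example (a small illustrative sketch, not code from the original post):

```csharp
using System.IO;

class Example
{
    static void Main()
    {
        // Dispose does not release the MemoryStream's internal buffer,
        // but the using statement limits the variable's scope, so once
        // the block ends the stream (and its buffer) can be collected.
        using (var ms = new MemoryStream())
        {
            var data = new byte[1024 * 1024];
            ms.Write(data, 0, data.Length);
        }
    }
}
```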

When implementing a finalizer and IDisposable on the same class, we always want the finalizer to be run, but we also want to be able to clean up both managed and unmanaged resources early if Dispose is called.

This is where the Dispose(bool disposing) pattern comes from: managed resources should only be disposed when disposing is true. This is the part a lot of people get wrong.

For this reason, I suggest being explicit and creating two virtual methods rather than one, telling people exactly what they should be doing in those methods. Overriding methods in derived classes should also ensure that they always call their base class’ implementation.
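A sketch of that shape (the names here are mine, not an established API):

```csharp
using System;

public class ResourceHolder : IDisposable
{
    private bool disposed;

    public void Dispose()
    {
        if (disposed) return;
        DisposeManagedResources();
        DisposeUnmanagedResources();
        disposed = true;
        // The finalizer has nothing left to do, so skip it.
        GC.SuppressFinalize(this);
    }

    ~ResourceHolder()
    {
        // Finalizers must only touch unmanaged resources.
        if (!disposed)
        {
            DisposeUnmanagedResources();
        }
    }

    protected virtual void DisposeManagedResources()
    {
        // Overrides dispose managed resources here and call base.
    }

    protected virtual void DisposeUnmanagedResources()
    {
        // Overrides release unmanaged resources here and call base.
    }
}
```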

See also: Implementing Finalize and Dispose to Clean Up Unmanaged Resources (MSDN)

Problems with Transient Fault Handling Application Block (Topaz)

Microsoft’s Transient Fault Handling Application Block (Topaz) is a good way of getting all retry behaviour into the same shape, but it has some problems.

Problems with RetryPolicy.ExecuteAsync
One problem is that the CancellationToken is not passed to Task.Delay, so you can’t cancel mid-delay, which is a major problem if you use delays of more than a few seconds.

Another problem is that if you do cancel, the last (failed) task is returned: if you cancel before the first exception you get a cancellation, but if you cancel after the first exception you get the last exception thrown. This is inconsistent behaviour, and if you need to know about earlier exceptions, there is already a Retrying event which contains them. If an operation is cancelled, you shouldn't need to care what the previous exception was, because the final cause of incompletion was cancellation, not an exception.

Also, if you cancel, you have to handle OperationCanceledException in your detection and retry strategies, which gets old fast. It should be safe to treat a cancellation as a non-transient exception at the RetryPolicy level, because a CancellationToken can't be un-cancelled. The only way you could get a different result after a cancellation is if you were throwing your own OperationCanceledException or using a new CancellationTokenSource inside each retry, both of which I would say are code smells.

Problems with API design
The API is relatively messy and convoluted for the simple functionality it provides. RetryStrategy has an abstract GetShouldRetry method which returns a ShouldRetry delegate, rather than just having an abstract ShouldRetry method to implement.

RetryPolicy has some methods marked as virtual, such as ExecuteAction, but not others, such as ExecuteAsync. There should never be a reason to derive from RetryPolicy anyway, when all of the behaviour you need to manipulate is contained in its dependencies: ITransientErrorDetectionStrategy and RetryStrategy.

There is no need for all of the overloaded constructors of RetryPolicy, most of which just use the extra parameters to construct a RetryStrategy, which could have been done more cleanly by the caller.

There is some redundancy between ITransientErrorDetectionStrategy and RetryStrategy, as both are asked whether the operation should be retried. A library provider may wish to define a detection strategy only, but this should be used by the consumer’s RetryStrategy rather than the RetryPolicy asking both.

RetryStrategy.FastFirstRetry is something which should be used by the RetryStrategy as part of ShouldRetry rather than used by the RetryPolicy.

Other problems
There is some weird behaviour, like calling Task.Delay(...).Wait() rather than Thread.Sleep in ExecuteAction, and some potential bugs, like not saving the Retrying event handler to a local variable before invoking it in OnRetrying.

A solution
Here’s an implementation, based loosely on the RetryPolicy source, which I believe is an improvement:
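A minimal sketch of the shape I have in mind, with assumed names (a single-method strategy interface instead of GetShouldRetry, the token passed to Task.Delay, and cancellation treated as non-transient):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative: one strategy method instead of a delegate factory.
public interface IRetryStrategy
{
    // Returns true and sets the delay if the operation should be retried.
    bool ShouldRetry(int retryCount, Exception lastException, out TimeSpan delay);
}

public class RetryPolicy
{
    private readonly IRetryStrategy strategy;

    public RetryPolicy(IRetryStrategy strategy)
    {
        this.strategy = strategy;
    }

    public async Task<T> ExecuteAsync<T>(Func<Task<T>> operation, CancellationToken cancellationToken)
    {
        for (int retryCount = 0; ; retryCount++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (OperationCanceledException)
            {
                // Cancellation is never transient; surface it as cancellation,
                // not as the previous exception.
                throw;
            }
            catch (Exception ex)
            {
                TimeSpan delay;
                if (!strategy.ShouldRetry(retryCount, ex, out delay))
                    throw;

                // Pass the token so a cancellation interrupts the delay.
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
            }
        }
    }
}
```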

The same behaviour is provided for both ExecuteAction and ExecuteAsync and cancellation is handled correctly.

The following CompatibilityRetryStrategy can be used to migrate existing implementations of ITransientErrorDetectionStrategy and RetryStrategy. You could also use something similar to make use of an existing ITransientErrorDetectionStrategy, such as SqlDatabaseTransientErrorDetectionStrategy.
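A sketch of such an adapter, assuming a replacement strategy contract with a single ShouldRetry method (the ShouldRetry delegate, GetShouldRetry and ITransientErrorDetectionStrategy are the real Topaz types; the adapter shape is illustrative):

```csharp
using System;
using Microsoft.Practices.EnterpriseLibrary.TransientFaultHandling;

// Adapts an existing Topaz detection strategy and retry strategy pair
// to a single ShouldRetry decision.
public class CompatibilityRetryStrategy
{
    private readonly ITransientErrorDetectionStrategy detectionStrategy;
    private readonly ShouldRetry shouldRetry;

    public CompatibilityRetryStrategy(
        ITransientErrorDetectionStrategy detectionStrategy,
        RetryStrategy retryStrategy)
    {
        this.detectionStrategy = detectionStrategy;
        this.shouldRetry = retryStrategy.GetShouldRetry();
    }

    public bool ShouldRetry(int retryCount, Exception lastException, out TimeSpan delay)
    {
        // Ask the detection strategy first, then the retry strategy, so a
        // library-provided detection strategy can still be reused.
        if (!detectionStrategy.IsTransient(lastException))
        {
            delay = TimeSpan.Zero;
            return false;
        }

        return shouldRetry(retryCount, lastException, out delay);
    }
}
```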

At the time of writing, the reference source was last updated 18 Aug 2015, but the NuGet package was last updated 26 April 2013, so it's possible that updates are simply no longer being distributed via NuGet. However, the only fix I can see in the reference source is passing the CancellationToken to Task.Delay.

I have used the MIT licence here. Transient Fault Handling Application Block is licensed under the Apache 2 licence. To the best of my knowledge, these are compatible for this usage.

Introduction to MFC

The Microsoft Foundation Class library is a framework for Windows GUIs in C++, based on the Win32 API C library.

To create a simple window, you need to create an app and a window.

The app class handles most of the behind-the-scenes work, including program entry. There must be one globally defined instance of the app class for the program to run.

Here is a very basic MFC application. It just shows a blank window.
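Something along these lines (class names are illustrative):

```cpp
// Minimal MFC application: one app class and one frame window.
#include <afxwin.h>

class CMainWindow : public CFrameWnd
{
public:
    CMainWindow()
    {
        // Create a default frame window with a title.
        Create(NULL, _T("Hello MFC"));
    }
};

class CApp : public CWinApp
{
public:
    virtual BOOL InitInstance()
    {
        // MFC calls InitInstance at startup; create and show the window here.
        m_pMainWnd = new CMainWindow();
        m_pMainWnd->ShowWindow(m_nCmdShow);
        m_pMainWnd->UpdateWindow();
        return TRUE;
    }
};

// The one globally defined app instance that starts the program.
CApp theApp;
```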

If you want to create a form with buttons, text boxes, etc., you should probably look at CFormView. Alternatively, create a dialog-based application in Visual Studio rather than a single document (SDI) one.

SQL Server random unique identifiers

A common method of producing random unique identifiers in SQL Server is to use a GUID field, calling newid() to generate the data. For the most part this works, because a GUID is 128 bits of random data, which means there is a very low probability of duplicate records for most databases.

However, it is also common to combine this with the checksum() function to reduce it to a 32 bit integer. This makes collisions much more likely, even in relatively small databases. For example, the GUIDs 28258F69-6536-4198-BE37-94960ABF054F and 49B60D4B-DC4A-4E18-825E-B4C99713D011 both checksum to 0xC3AD13D3. With a table of around 100,000 rows, the birthday paradox means collisions will start to occur frequently.

Applying the birthday-paradox maths, with a 32 bit random number the probability of getting at least one collision reaches 50% at around 77,500 rows and 99% at 200,000 rows. If we increase this to a 53 bit number, 10 million rows gives a 0.55% chance of at least one collision and 100 million rows gives a 42.5% chance, so 64 bits should be plenty.
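These figures can be reproduced with the standard birthday-paradox approximation, p ≈ 1 − e^(−n²/2d); a quick Python sketch:

```python
import math

def collision_probability(bits, rows):
    """Probability of at least one collision among `rows` random values
    drawn uniformly from a `bits`-bit space (birthday approximation)."""
    d = 2 ** bits
    return 1.0 - math.exp(-rows * rows / (2.0 * d))

print(f"32-bit,  77,500 rows: {collision_probability(32, 77_500):.1%}")
print(f"32-bit, 200,000 rows: {collision_probability(32, 200_000):.1%}")
print(f"53-bit, 10M rows:     {collision_probability(53, 10_000_000):.2%}")
print(f"64-bit, 1B rows:      {collision_probability(64, 1_000_000_000):.1%}")
```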

For higher precision numbers, we can use mpmath.
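For example (a sketch; mpmath's mpf type gives arbitrary-precision floats):

```python
from mpmath import mp, mpf, exp

mp.dps = 50  # work to 50 significant digits

def collision_probability(bits, rows):
    # p = 1 - exp(-n^2 / (2d)), evaluated at high precision
    d = mpf(2) ** bits
    return 1 - exp(-mpf(rows) ** 2 / (2 * d))

print(collision_probability(64, 10 ** 9))   # ~2.6%
print(collision_probability(64, 10 ** 10))  # ~93%
```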

From here you can see that a 64 bit number has a 2.6% chance of getting a single collision in a 1 billion row table, and a 93% chance in a 10 billion row table.

A compromise between the two is simply to truncate the GUID to 64 bits and optionally convert it to a bigint.

If you leave it as binary and don’t need to convert to an integer type, this does not have to be 8 bytes. For example, you could have a 5 or a 10 byte code.
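A sketch in T-SQL (which 8 bytes are kept doesn't matter since the data is random, but verify the truncation behaviour on your version of SQL Server):

```sql
-- Generate a GUID and keep only the first 8 bytes.
DECLARE @id uniqueidentifier = NEWID();

-- Binary-to-binary conversion to a shorter type truncates the value.
DECLARE @code binary(8) = CAST(CAST(@id AS binary(16)) AS binary(8));

-- Optionally reinterpret those 8 bytes as a 64 bit integer.
SELECT @code AS RandomCode, CAST(@code AS bigint) AS RandomId;
```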

None of these are perfect but the probability of a collision decreases with more bits. If 128 bit is too long for you (e.g. to display to users) but 32 bit generates too many collisions, try a compromise such as 64 bit.

If you are consistent enough, you may even be able to store the original GUID and just display the truncated form, which could allow you to change the length displayed later without changing the probability of collisions. This is more flexible but may lead to confusion among users and consistency is required (differing lengths could lead to bugs).

Licensing your libraries

For most spare time projects, licensing can be an afterthought. Personally, I hardly put licences on anything I write. Most of the time this is just because I expect people to use it anyway. I treat licences more as a restriction than a freedom.

Strictly, if you don't put a licence on your code, then people can't use it, although if anyone asked I would probably let them. A licence should let them know whether they can use it without having to ask you.

However, anything serious that you release needs a licence. Choosing the right licence for your project can be time consuming and involve reading and completely understanding legal documents, which isn’t what most of us want to do. It is very common for people to release code under the GNU General Public Licence (GPL) or Lesser General Public Licence (LGPL).

Terms such as “derived works” and what constitutes these are even more of a problem.

There are a few key points around releasing under these licences. Please bear in mind that I do not have a legal background.

  1. Anyone is free to distribute your work, even if you have sold it to them.
  2. You must provide source code for anything you release.
  3. Anyone is free to modify your work.
  4. If you use code licensed under the GPL in your project, your project must also be licensed under the GPL. This means that for people not releasing under the GPL, your code is essentially useless and to all intents and purposes does not exist.

The first point here makes selling your work difficult. Even if you successfully sell it, anyone who has bought it can just give it away for free.

The second point here makes releasing closed source binaries a problem. Anything you release must be released with accompanying source code. This causes further problems for commercial software, as attempts to safeguard it against piracy can be easily removed, which is also legal by the third point here.

The third point means that people can effectively take what you have done, make slight changes to it and pass it off as their own work.

However, the fourth point here is the biggest problem. When you licence under the GPL, you are not only causing the above problems for yourself (which may be ideal for you), you are also forcing anyone who uses your code to have all the same problems (which probably won’t be ideal for them). If they don’t want these problems in their own work, they can’t use yours. Fortunately there is a solution for this. The LGPL varies from the GPL in that projects don’t have to use the same licence as you in order to use your code. This makes it ideal for freely releasing libraries that anyone can use.

Somewhat worryingly, the FSF and GNU are trying to get people to use the GPL instead of the LGPL and, it seems, the only reason they are doing this is to give “free” software an advantage. I don't agree with this ethically. For free software to be free, everyone should be able to use it, and that includes people who want to use it in commercial projects.

I’ve lost count of how many times I’ve found a nice little snippet of code that does exactly what I want and the author seems to want to release it to the public for everyone, but has released it under the GPL making it unusable in a non-GPL project.

There are many other licences out there to choose from, such as the MIT and BSD licences. So, please choose the licence for your project carefully and think of the consequences for your target audience before blindly slapping a GPL sticker on it.

Printer friendly Fudzilla RSS

Fudzilla is a nice site for catching up with daily tech news. I prefer to set up my email client Thunderbird to aggregate the RSS, but when you have a 1920×1200 monitor and Fudzilla still puts so much crap at the top of the article that you have to scroll down before you can even read a few lines of text, it becomes a terrible user experience.

Wouldn’t it be better to get the text at a reasonable size, taking 100% of the width available, with no crap at the top that you have to scroll past?

Well, you can. The printer friendly view does all of this, and all you have to do is change the link in the RSS to use the printer friendly URL instead of the standard one. Here is some PHP to do just that! Just put this on a server somewhere and subscribe to that URL instead of the normal Fudzilla feed.
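A sketch of the idea (the feed URL and the print=1&tmpl=component parameter are assumptions based on Joomla's usual printer friendly links; check them against the live site):

```php
<?php
// Fetch the feed and swap each article link for its printer friendly form.
$feed = file_get_contents('https://www.fudzilla.com/?format=feed&type=rss');

$feed = preg_replace_callback(
    '/<link>([^<]+)<\/link>/',
    function ($m) {
        // RSS escapes entities inside elements, so decode before appending.
        $url = htmlspecialchars_decode($m[1]);
        $sep = strpos($url, '?') === false ? '?' : '&';
        return '<link>' . htmlspecialchars($url . $sep . 'print=1&tmpl=component') . '</link>';
    },
    $feed
);

header('Content-Type: application/rss+xml');
echo $feed;
```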

You may be able to just view the description in your RSS aggregator, but it tends not to contain the whole article. In the past, Fudzilla has put just the subtitle in there, for example.


Update: This has been broken by Fudzilla using an HTTP 302 redirect and a cookie, but anything that will take the cookie from the first request and use it for the second request should be able to handle this easily. Here's a function for it that uses cURL.
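A sketch of such a function (passing an empty string to CURLOPT_COOKIEFILE turns on cURL's in-memory cookie engine, so the cookie from the first response is sent when the redirect is followed):

```php
<?php
// Fetch a URL, carrying the cookie from the first response through the
// HTTP 302 redirect.
function fetch_with_cookies($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 302 redirect
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');       // enable cookie handling in memory
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```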

As a note to Fudzilla: This is only necessary because of the sheer amount of crap you put at the top of your page. You shouldn’t have to scroll down an entire screen height before you can even see the content of that page. That’s just poor design. If adding cookies and a redirect was an attempt to stop screen scraping, it didn’t work. You can disable that now and save yourself some bandwidth.

Compiling zlib.lib on Windows

Update: see comments section for a more up to date method of doing this. Don’t forget to run contrib\masmx86\bld_ml32.bat from the Visual Studio Command Prompt before compiling this way though.

This article still applies if you want to compile on older versions e.g. MSVC 6.0, for older projects or if you have problems with the new method.


zlib is the standard for lossless data compression. The DEFLATE compression algorithm is the basis for just about every lossless compression format out there, including “zip” and “gzip”, the latter of which is itself part of zlib.

There are two ways that it can be used from C/C++ projects in Windows.

Firstly, it can be used by dynamic linking (dll). This means using zdll.lib and shipping the appropriate version of zlib1.dll with your project. This is not a problem, as Windows versions of both of these files are provided.

The second way is to use static linking. That is, having all of the code in one .lib file and compiling it into your exe so that you do not have to distribute zlib1.dll. This means compiling zlib.lib.

In version 1.2.4 of zlib, a “projects” directory was provided, with a Microsoft Visual C++ 6.0 project. However, it seems that version 1.2.5 has not included this project. This means that the best solution is to go and get the 1.2.4 source and compile it yourself. However, the zlib project seems to be kept inside the libpng project on sourceforge.net, so it is not immediately obvious where to find older versions of the zlib source code.

zlib 1.2.4 source (zip)

Extract the zip, open projects\visualc6\zlib.dsp in Visual Studio (I used 2005) and compile “LIB Release” (and optionally “LIB Debug”).

Copy zlib.h and zconf.h from “include” to your Visual Studio “include” directory, and zlib.lib (and zlibd.lib if you made it) to your Visual Studio “lib” directory.

On 64 bit Windows, with Visual Studio 2005, this is “C:\Program Files (x86)\Microsoft Visual Studio 8\VC\” so adjust for your version of Visual Studio.

You now just need to add “zlib.lib” to your “Linker -> Input -> Additional Dependencies” line in your C++ project configuration to use it (and optionally zlibd.lib for the debug version).
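Once linked, using it is straightforward; for example (a sketch using zlib's compress and uncompress helpers):

```cpp
#include <cstring>
#include <iostream>
#include <vector>
#include <zlib.h>

int main()
{
    const char text[] = "hello hello hello hello";
    uLong sourceLen = sizeof(text);

    // compressBound gives the worst-case compressed size for the input.
    std::vector<Bytef> compressed(compressBound(sourceLen));
    uLongf destLen = compressed.size();
    if (compress(compressed.data(), &destLen,
                 reinterpret_cast<const Bytef*>(text), sourceLen) != Z_OK)
        return 1;

    // Round-trip the data back out to check it survived.
    std::vector<Bytef> restored(sourceLen);
    uLongf restoredLen = sourceLen;
    if (uncompress(restored.data(), &restoredLen,
                   compressed.data(), destLen) != Z_OK)
        return 1;

    std::cout << "compressed " << sourceLen << " bytes to " << destLen << '\n';
    return std::memcmp(text, restored.data(), sourceLen) == 0 ? 0 : 1;
}
```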