Archive for the ‘Code’ Category

Pixel theme

Saturday, June 13th, 2009

I’ve just switched to the Pixel WordPress theme. It’s quite nice but it does have some flaws. Images no longer auto-size so some are too large. I have had to manually resize the widest offenders.

Another problem is that it forces you to have a crappy redundant “Welcome” message. I have removed this in the style.css and, while I was at it, I made the post content justified. If anyone else wants to do the same thing (or I want to repeat this modification after a WordPress update), the following CSS is simply appended to the theme’s style.css:

#welcome { display: none; }
.topContent { text-align: justify; }

There is no logical equivalent to conditional statements!

Monday, May 18th, 2009

For a long time, I have seen chunks of code in languages such as Lua and Python that claim that they can reproduce the C conditional operator just by using two logical operators, “and” and “or”.

The conditional operator in C uses the following format:

v = c ? t : f;

This is equivalent to saying “if the condition c (a boolean expression) is met assign t to v, otherwise assign f to v”. It’s a shorthand way of writing

if (c)
  v = t;
else
  v = f;

Some example outputs:

v = 1 ? "foo" : "bar"; /* v = "foo" */
v = 0 ? "foo" : "bar"; /* v = "bar" */
v = 1 ? 0 : 2;         /* v = 0 */

In the first example, the condition is true (C originally had no boolean type and, even with C99’s _Bool, anything that isn’t zero equates to true) so “t” is assigned. In the second example, the condition is false (zero) so “f” is assigned. The important thing to note here is that in the third example, the condition is true so “t” is still assigned, even though “t” itself is zero.

As proof of this final value, the following was taken from a Cygwin shell using the gcc compiler:

~$ gcc --version
gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
~$ cat cond.c
#include <stdio.h>
 
int main(void)
{
  printf("The result is %d!\n", 1 ? 0 : 2);
 
  return 0;
}
~$ gcc -Wall -o cond cond.c
~$ ./cond
The result is 0!
~$

The following is an implementation in Python that always gives the same behaviour as the conditional operator in C.

def cond_if(c, t, f):
  if c: return t
  else: return f

We can show the same examples again:

>>> cond_if(True, "foo", "bar")
'foo'
>>> cond_if(False, "foo", "bar")
'bar'
>>> cond_if(True, 0, 2)
0

However, many people claim that the same code can be written more concisely and, more importantly, inline by using logical operators to mimic the behaviour. The following is such a function:

def cond_logic(c, t, f):
  return c and t or f

When we try to actually run this, with the same examples again, we see where it all falls down.

>>> cond_logic(True, "foo", "bar")
'foo'
>>> cond_logic(False, "foo", "bar")
'bar'
>>> cond_logic(True, 0, 2)
2

While it works for the first two examples as it should, any value of “t” that itself evaluates to false (0, an empty string, None and so on) will corrupt the logic used. In this example, simply passing zero as the “t” makes the result of “c and t” false (because true and false = false), which reduces it to “false or f”, which returns “f” (because false or _ = _). No combination of parentheses around these three operands will correct this problem.

Here’s the final example again but this time in Lua (with false as the “t”, because 0 counts as true in Lua):

> function cond_logic(c, t, f)
>>   return c and t or f;
>> end
 
> = cond_logic(1, false, 2);
2
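To see which values of “t” trip up the and/or idiom in Python, here is a small sketch that compares it against the if-based version. The helper names cond_expr and broken_values are my own for illustration. Note that Python 2.5 and later also has a real conditional expression (t if c else f), which does not suffer from this problem.

```python
def cond_if(c, t, f):
    # Reference behaviour: same as C's c ? t : f
    if c:
        return t
    return f


def cond_logic(c, t, f):
    # The and/or idiom under test
    return c and t or f


def cond_expr(c, t, f):
    # Python's own conditional expression (Python 2.5+)
    return t if c else f


def broken_values(candidates):
    """Return the candidates for which the and/or idiom picks the wrong branch."""
    fallback = object()  # unique object, so it cannot be confused with any candidate
    return [t for t in candidates
            if cond_logic(True, t, fallback) is not cond_if(True, t, fallback)]


if __name__ == "__main__":
    print(broken_values([0, "", None, False, [], 1, "foo", [0]]))
```

Every falsy candidate ends up in the returned list, while truthy ones such as 1, "foo" and [0] pass through correctly.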

So, the next time someone tells you that your code can be improved in this way or that logical operators can be used to mimic the conditional operator, tell them that they are wrong and give them an example to prove it, then stick to using if statements if the language does not provide a conditional operator.

libvlc media player in C# (part 2)

Friday, May 8th, 2009

I gave some simplified VLC media player code in part 1 to show how easy it was to do and how most wrapper libraries make a mountain out of a mole hill. In that entry, I briefly touched on using some classes to make it easier and safer to implement actual programs with this.

The first thing to do is write a wrapper for the exceptions, so that they are handled nicely in C#. For a program using the library, exceptions should be completely transparent and should be handled in the normal try/catch blocks without having to do anything like initialise them or check them.

Another thing to do is to move all of the initialisation functions into constructors and all of the release functions into destructors or use the System.IDisposable interface.
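As a language-neutral illustration of that shape (acquire in the constructor, turn error codes into exceptions, release in one well-defined place), here is a minimal Python sketch. FakeLib, NativeError and Resource are hypothetical names standing in for a native library and its wrapper; none of this is libvlc’s API.

```python
class FakeLib:
    """Hypothetical stand-in for a native library that hands out integer handles."""

    def __init__(self):
        self.live = set()   # handles that have been created but not released
        self._next = 1

    def new(self, ok=True):
        # Returns (handle, error message); a real library would use its own error type
        if not ok:
            return 0, "could not create object"
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle, None

    def release(self, handle):
        self.live.discard(handle)


class NativeError(Exception):
    """Raised when the (fake) native call reports a failure."""


class Resource:
    """Owns one handle: acquired in the constructor, released in close()."""

    def __init__(self, lib, ok=True):
        self._lib = lib
        self.handle, error = lib.new(ok)
        if error is not None:
            raise NativeError(error)

    def close(self):
        # Safe to call more than once
        if self.handle:
            self._lib.release(self.handle)
            self.handle = 0

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

Used as "with Resource(lib) as r:", the handle is released when the block exits, which is the same job that a using block does with IDisposable in C#.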

Here is the code listing for the 4 classes used (VlcInstance, VlcMedia, VlcMediaPlayer and VlcException). Note that the first 3 of these are very similar and that the main difference is that the media player class has some extra functions for doing things like playing and pausing the content.

class VlcInstance : IDisposable
{
    internal IntPtr Handle;
 
    public VlcInstance(string[] args)
    {
        VlcException ex = new VlcException();
        Handle = LibVlc.libvlc_new(args.Length, args, ref ex.Ex);
        if (ex.IsRaised) throw ex;
    }
 
    public void Dispose()
    {
        LibVlc.libvlc_release(Handle);
    }
}
 
class VlcMedia : IDisposable
{
    internal IntPtr Handle;
 
    public VlcMedia(VlcInstance instance, string url)
    {
        VlcException ex = new VlcException();
        Handle = LibVlc.libvlc_media_new(instance.Handle, url, ref ex.Ex);
        if (ex.IsRaised) throw ex;
    }
 
    public void Dispose()
    {
        LibVlc.libvlc_media_release(Handle);
    }
}
 
class VlcMediaPlayer : IDisposable
{
    internal IntPtr Handle;
    private IntPtr drawable;
    private bool playing, paused;
 
    public VlcMediaPlayer(VlcMedia media)
    {
        VlcException ex = new VlcException();
        Handle = LibVlc.libvlc_media_player_new_from_media(media.Handle, ref ex.Ex);
        if (ex.IsRaised) throw ex;
    }
 
    public void Dispose()
    {
        LibVlc.libvlc_media_player_release(Handle);
    }
 
    public IntPtr Drawable
    {
        get
        {
            return drawable;
        }
        set
        {
            VlcException ex = new VlcException();
            LibVlc.libvlc_media_player_set_drawable(Handle, value, ref ex.Ex);
            if (ex.IsRaised) throw ex;
            drawable = value;
        }
    }
 
    public bool IsPlaying { get { return playing && !paused; } }
 
    public bool IsPaused { get { return playing && paused; } }
 
    public bool IsStopped { get { return !playing; } }
 
    public void Play()
    {
        VlcException ex = new VlcException();
        LibVlc.libvlc_media_player_play(Handle, ref ex.Ex);
        if (ex.IsRaised) throw ex;
 
        playing = true;
        paused = false;
    }
 
    public void Pause()
    {
        VlcException ex = new VlcException();
        LibVlc.libvlc_media_player_pause(Handle, ref ex.Ex);
        if (ex.IsRaised) throw ex;
 
        if (playing)
            paused ^= true;
    }
 
    public void Stop()
    {
        VlcException ex = new VlcException();
        LibVlc.libvlc_media_player_stop(Handle, ref ex.Ex);
        if (ex.IsRaised) throw ex;
 
        playing = false;
        paused = false;
    }
}
 
class VlcException : Exception
{
    internal libvlc_exception_t Ex;
 
    public VlcException() : base()
    {
        Ex = new libvlc_exception_t();
        LibVlc.libvlc_exception_init(ref Ex);
    }
 
    public bool IsRaised { get { return LibVlc.libvlc_exception_raised(ref Ex) != 0; } }
 
    public override string Message { get { return LibVlc.libvlc_exception_get_message(ref Ex); } }
}

Using these classes is even easier than before, can use proper exception handling (removed for brevity) and cleans up better at the end. In this example, I have added an OpenFileDialog, which is where the file is loaded.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
 
namespace MyLibVLC
{
    public partial class Form1 : Form
    {
        VlcInstance instance;
        VlcMediaPlayer player;
 
        public Form1()
        {
            InitializeComponent();
 
            openFileDialog1.FileName = "";
            openFileDialog1.Filter = "MPEG|*.mpg|AVI|*.avi|All|*.*";
 
            string[] args = new string[] {
                "-I", "dummy", "--ignore-config",
                @"--plugin-path=C:\Program Files (x86)\VideoLAN\VLC\plugins",
                "--vout-filter=deinterlace", "--deinterlace-mode=blend"
            };
 
            instance = new VlcInstance(args);
            player = null;
        }
 
        private void Form1_FormClosed(object sender, FormClosedEventArgs e)
        {
            if(player != null) player.Dispose();
            instance.Dispose();
        }
 
        private void Open_Click(object sender, EventArgs e)
        {
            if (openFileDialog1.ShowDialog() != DialogResult.OK)
                return;
 
            using (VlcMedia media = new VlcMedia(instance, openFileDialog1.FileName))
            {
                if (player != null) player.Dispose();
                player = new VlcMediaPlayer(media);
            }
 
            player.Drawable = panel1.Handle;
        }
 
        private void Play_Click(object sender, EventArgs e)
        {
            player.Play();
        }
 
        private void Pause_Click(object sender, EventArgs e)
        {
            player.Pause();
        }
 
        private void Stop_Click(object sender, EventArgs e)
        {
            player.Stop();
        }
    }
}

Update:

I have just corrected a minor bug (the wrong release function being called on the player handle) and uploaded the full Visual Studio 2005 project. You can download the full project here (or see 1.1.2 version below). It comes with the libvlc.dll and libvlccore.dll for VLC 1.0.1 in the bin\x86\Debug directory so if you have a version other than this, just overwrite those files.

Update for VLC 1.1.2:

You can now download the VLC 1.1.2 compatible version. There were some changes to the way libvlc handles exceptions that needed to be corrected. Other than that, there were a couple of minor function name changes.

Please use these posts as a starting point to write your own code though. These posts are intended to stop people from being reliant on the already existing, large, overcomplicated and quickly outdated libraries. They are not intended to be just another library for people to blindly use without understanding how it works. You can use this to learn how to write your own native interop code on a well designed library then adapt it for your own changes and keep it up to date with whichever version of VLC you want. This also means you never have to use the terrible code on pinvoke.net for other libraries, as you can write your own from the original documentation and it will almost always be better.

libvlc media player in C# (part 1)

Wednesday, May 6th, 2009

There seems to be a massive misconception about using VLC inside an application and many, many large wrapper libraries have been written. These are often harder to use than libvlc itself, buggy or just downright don’t work (at least not in what will be “the latest” version of VLC at the time you want to write anything).

Using the libvlc documentation directly and the libvlc example I wrote a simple wrapper class that performs the basics needed to play, pause and stop media. Because it is libvlc, things like resizing the video, toggling full screen by double clicking the video output or streaming media from a source device or network are handled automatically.

This code was all written and tested with VLC 0.9.8a but because it is taken from the documentation and example, it should work for all versions 0.9.x and later with only minor changes. Because it is so simple, these changes should be easy to make. Most of the time, these changes will just be slight function name changes and no re-structuring is needed.

The first thing to note is that there is no version of libvlc for Windows x64. All developers should set their CPU type to x86, even if they have a 32bit machine. If you set it to “Any CPU” then 64bit users will not be able to load libvlc.dll and will crash out. If you are compiling from the command line, this should look something like csc /platform:x86 foobar.cs

The second thing to note, which trips up a lot of users, is that you must specify VLC’s plugin directory. This may make distribution a nightmare, as the plugin directory is a large directory full of DLLs. It may be possible to narrow down these DLLs to just the ones your application actually needs but I don’t know if videolan have any advice on, or licensing terms for, redistributing these.

libvlc is made up of several modules. For the sake of simplicity in this example, I will use 1 static class to contain every exported C function and split them up visually by module with #region.

The nicest thing about VLC, as far as interop with C# goes, is that all memory management is handled internally by libvlc and functions are provided for doing anything that you would need to do to their members. This means that using an IntPtr is suitable for almost everything. You just need to make sure that you pass the correct IntPtr into each function but another layer of C# encapsulating this would easily be able to make sure of that, as discussed in part 2. The only structure that you need to define is an exception, which is very simple. You then simply always pass in references to these structs with ref ex.

The code listing for the wrapper class is as follows:

using System;
using System.Runtime.InteropServices;
 
namespace MyLibVLC
{
  // http://www.videolan.org/developers/vlc/doc/doxygen/html/group__libvlc.html
 
  [StructLayout(LayoutKind.Sequential, Pack = 1)]
  struct libvlc_exception_t
  {
    public int b_raised;
    public int i_code;
    [MarshalAs(UnmanagedType.LPStr)]
    public string psz_message;
  }
 
  static class LibVlc
  {
    #region core
    [DllImport("libvlc")]
    public static extern IntPtr libvlc_new(int argc, [MarshalAs(UnmanagedType.LPArray,
      ArraySubType = UnmanagedType.LPStr)] string[] argv, ref libvlc_exception_t ex);
 
    [DllImport("libvlc")]
    public static extern void libvlc_release(IntPtr instance);
    #endregion
 
    #region media
    [DllImport("libvlc")]
    public static extern IntPtr libvlc_media_new(IntPtr p_instance,
      [MarshalAs(UnmanagedType.LPStr)] string psz_mrl, ref libvlc_exception_t p_e);
 
    [DllImport("libvlc")]
    public static extern void libvlc_media_release(IntPtr p_meta_desc);
    #endregion
 
    #region media player
    [DllImport("libvlc")]
    public static extern IntPtr libvlc_media_player_new_from_media(IntPtr media,
      ref libvlc_exception_t ex);
 
    [DllImport("libvlc")]
    public static extern void libvlc_media_player_release(IntPtr player);
 
    [DllImport("libvlc")]
    public static extern void libvlc_media_player_set_drawable(IntPtr player, IntPtr drawable,
      ref libvlc_exception_t p_e);
 
    [DllImport("libvlc")]
    public static extern void libvlc_media_player_play(IntPtr player, ref libvlc_exception_t ex);
 
    [DllImport("libvlc")]
    public static extern void libvlc_media_player_pause(IntPtr player, ref libvlc_exception_t ex);
 
    [DllImport("libvlc")]
    public static extern void libvlc_media_player_stop(IntPtr player, ref libvlc_exception_t ex);
    #endregion
 
    #region exception
    [DllImport("libvlc")]
    public static extern void libvlc_exception_init(ref libvlc_exception_t p_exception);
 
    [DllImport("libvlc")]
    public static extern int libvlc_exception_raised(ref libvlc_exception_t p_exception);
 
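    // Caution: with a string return type, the marshaller copies the native char* and then
    // tries to free it; returning IntPtr and using Marshal.PtrToStringAnsi avoids that.
    [DllImport("libvlc")]
    public static extern string libvlc_exception_get_message(ref libvlc_exception_t p_exception);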
    #endregion
  }
}

For a sample application to use this simple wrapper, I just created a new Windows form and added a play button, stop button and a panel for viewing the video. In this example, the stop button also cleans everything up so you should make sure to press it before closing the form.

At one point during this code, libvlc can optionally be given a HWND to draw to. If you don’t give it one, it pops up a new player. However, people seem to be confused over how simple this is to do in C# and have been making large amounts of interop calls to the Win32 API to get handles. This is not necessary, as System.Windows.Forms.Control.Handle allows you to get the window handle (HWND) to any component that inherits from the Control class. This includes the Form class and the Panel class (and even the Button class) so all you actually need to pass it is this.Handle (for the handle to the form itself) or panel.Handle (for a Panel called panel). If you want it to start fullscreen, add the command line argument “-f” rather than using the Win32 function GetDesktopWindow().

Because I will be using this to display PAL video, which is interlaced at 576i, I have added some deinterlacing options to the command line. These are --vout-filter=deinterlace and --deinterlace-mode=blend.

Without further ado, here is the code listing for the partial windows form class:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
 
using System.Runtime.InteropServices;
 
namespace MyLibVLC
{
  public partial class Form1 : Form
  {
    IntPtr instance, player;
 
    public Form1()
    {
      InitializeComponent();
    }
 
    private void Play_Click(object sender, EventArgs e)
    {
      libvlc_exception_t ex = new libvlc_exception_t();
      LibVlc.libvlc_exception_init(ref ex);
 
      string[] args = new string[] {
        "-I", "dummy", "--ignore-config",
        @"--plugin-path=C:\Program Files (x86)\VideoLAN\VLC\plugins",
        "--vout-filter=deinterlace", "--deinterlace-mode=blend"
      };
 
      instance = LibVlc.libvlc_new(args.Length, args, ref ex);
      Raise(ref ex);
 
      IntPtr media = LibVlc.libvlc_media_new(instance, @"C:\foobar.mpg", ref ex);
      Raise(ref ex);
 
      player = LibVlc.libvlc_media_player_new_from_media(media, ref ex);
      Raise(ref ex);
 
      LibVlc.libvlc_media_release(media);
 
      // panel1 may be any component including a System.Windows.Forms.Form but
      // this example uses a System.Windows.Forms.Panel
      LibVlc.libvlc_media_player_set_drawable(player, panel1.Handle, ref ex);
      Raise(ref ex);
 
      LibVlc.libvlc_media_player_play(player, ref ex);
      Raise(ref ex);
    }
 
    private void Stop_Click(object sender, EventArgs e)
    {
      libvlc_exception_t ex = new libvlc_exception_t();
      LibVlc.libvlc_exception_init(ref ex);
 
      LibVlc.libvlc_media_player_stop(player, ref ex);
      Raise(ref ex);
 
      LibVlc.libvlc_media_player_release(player);
      LibVlc.libvlc_release(instance);
    }
 
    static void Raise(ref libvlc_exception_t ex)
    {
      if (LibVlc.libvlc_exception_raised(ref ex) != 0)
        MessageBox.Show(LibVlc.libvlc_exception_get_message(ref ex));
    }
  }
}

Note that this section of code is deprecated and the code from part 2 should be used instead.

Adding a pause button is similar to the stop button but without the cleanup.

Here is an example slightly further on down the line but using the same code:
Example of LibVLC

See part 2 for more.

3 colour gradient

Wednesday, March 18th, 2009

Recently I noticed a green-red gradient that I was using wasn’t really what I wanted.
green-red gradient

I wanted it to go through yellow. I have made a Vista sidebar gadget in the past that shows different colours of the horizontal percentage bars for CPU and memory usage which faded from green at 0% to yellow at 50% and then faded from yellow at 50% to red at 100%. I had thought this solved the problem until I tried to use that same formula for a gradient, which turned out to be a triangular gradient.
green-yellow-red triangle gradient

The problem here is that there is only yellow at the very peak of the triangle so it looks pinched. From here it is obvious that a curve is needed. I first looked into Bezier curves, as you can join two of them easily by using the same points on both. However, this seemed a bit complicated. I next used a bell curve, which is actually a Gaussian function, y = e^(-x^2). This worked well and I used it throughout the development of this gradient but after I was finished I realised that a simple sine wave from 0 to PI would have sufficed (and produces almost exactly the same result as a Gaussian function). The sine is also easy to normalise: the input is a percent from 0.0 to 1.0, so it is just multiplied by PI, giving y = sin(x*pi). A better function would be a cosine wave from -PI to PI, as this gives a smooth gradient at either end that repeats perfectly. It needs slightly more normalising so that it takes a percent from 0.0 to 1.0 and outputs a value from 0.0 to 1.0: y = (cos((x*2-1)*pi)+1)/2.

The key to this is that when it hits the peak, at 50%, it changes from a green-yellow gradient to a red-yellow gradient. The sine function is not used directly to determine the colour but rather to determine where on a simple colour gradient to choose the colour from, which allows it to be used with any combination of colours. The end result is this:
green-yellow-red sine gradient (sine)
green-yellow-red cosine gradient (cosine)

C# code listing (for ASPX) is as follows:

using System;
using System.Drawing;
using System.Drawing.Imaging;
 
public partial class PercentBar : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        using (Bitmap bmp = new Bitmap(100, 20, PixelFormat.Format24bppRgb))
        {
            double w = (double)bmp.Width;
            for (int x = 0; x < bmp.Width; x++)
            {
                Color c = GetTriColour(x / w, Color.Lime, Color.Yellow, Color.Red);
                for (int y = 0; y < bmp.Height; y++)
                    bmp.SetPixel(x, y, c);
            }
 
            using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
            {
                bmp.Save(ms, ImageFormat.Png);
                ms.WriteTo(Response.OutputStream);
            }
            Response.ContentType = "image/png";
        }
    }
 
    public static Color GetTriColour(double percent, Color left, Color centre, Color right)
    {
        if (percent < 0 || percent > 1)
            throw new Exception("Percent must be between 0 and 1");
 
        //double weight = Math.Sin(percent * Math.PI);
        double weight = (Math.Cos((percent * 2 - 1) * Math.PI) + 1) / 2;
 
        return GetColourFromLinearGradient(weight,
           percent < 0.5 ? left : right, centre);
    }
 
    public static Color GetColourFromLinearGradient(double percent, Color start, Color end)
    {
        double a, r, g, b;
 
        if (percent < 0 || percent > 1)
            throw new Exception("Percent must be between 0 and 1");
 
        double npercent = 1.0 - percent;
 
        a = Math.Min(start.A, end.A) + Math.Abs(start.A - end.A) * (start.A > end.A ? npercent : percent);
        r = Math.Min(start.R, end.R) + Math.Abs(start.R - end.R) * (start.R > end.R ? npercent : percent);
        g = Math.Min(start.G, end.G) + Math.Abs(start.G - end.G) * (start.G > end.G ? npercent : percent);
        b = Math.Min(start.B, end.B) + Math.Abs(start.B - end.B) * (start.B > end.B ? npercent : percent);
 
        return Color.FromArgb((int)a, (int)r, (int)g, (int)b);
    }
}
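For anyone who wants to check the numbers without an ASPX page, here is a rough Python sketch of the same idea. The names tri_colour and lerp_colour are my own, colours are plain (r, g, b) tuples rather than System.Drawing.Color, and the rounding may differ by one from the C# version in places.

```python
import math

LIME = (0, 255, 0)
YELLOW = (255, 255, 0)
RED = (255, 0, 0)


def lerp_colour(weight, start, end):
    """Pick the colour at position weight (0.0 to 1.0) on a linear start-to-end gradient."""
    return tuple(int(s + (e - s) * weight) for s, e in zip(start, end))


def tri_colour(percent, left, centre, right, use_cosine=True):
    """Three-colour gradient: left at 0.0, centre at 0.5, right at 1.0."""
    if not 0.0 <= percent <= 1.0:
        raise ValueError("percent must be between 0 and 1")

    if use_cosine:
        # Cosine from -PI to PI, normalised to output 0.0 .. 1.0
        weight = (math.cos((percent * 2 - 1) * math.pi) + 1) / 2
    else:
        # Sine from 0 to PI
        weight = math.sin(percent * math.pi)

    # Guard against tiny floating point overshoot
    weight = min(1.0, max(0.0, weight))

    # The curve only decides how far towards the centre colour we are
    return lerp_colour(weight, left if percent < 0.5 else right, centre)


if __name__ == "__main__":
    for step in range(0, 11):
        p = step / 10
        print(p, tri_colour(p, LIME, YELLOW, RED))
```

The two ends come out as pure green and pure red and the midpoint as pure yellow, with the curve controlling how quickly the colour moves towards yellow in between.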

WordPress theme: Kubrick (wide)

Tuesday, February 17th, 2009

The default WordPress 2.7 theme is Kubrick. It’s nice but it is optimised for 800×600 and most people use more than that, so I decided to modify it slightly to optimise for 1024×768 or a similar width. Because of things like window borders and scrollbars, we do not want the width to be exactly 1024 pixels wide so all we do is use the existing widths and add the difference between 1024 and 800 to them. This is a 224 pixel increase in width so if something was 760 it would become 984, if it was 740 it would become 964, etc.
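The arithmetic is trivial but easy to get wrong when repeated by hand, so here is a small Python sketch that works out the new values. The selector labels are just descriptive keys of my own; the old pixel values are the ones that appear in the stock Kubrick style.css for WordPress 2.7.

```python
OLD_TARGET = 800    # screen width the stock theme is optimised for
NEW_TARGET = 1024   # screen width we want to optimise for
DELTA = NEW_TARGET - OLD_TARGET   # 224 pixels


def widen(pixels):
    """Return the widened value for an existing width or left margin."""
    return pixels + DELTA


OLD_VALUES = {
    "#headerimg width": 740,
    "#page width": 760,
    "#header width": 758,
    ".narrowcolumn width": 450,
    ".widecolumn width": 450,
    "#footer width": 760,
    "#sidebar margin-left": 545,
}

NEW_VALUES = {name: widen(value) for name, value in OLD_VALUES.items()}

if __name__ == "__main__":
    for name, value in NEW_VALUES.items():
        print(f"{name}: {OLD_VALUES[name]}px -> {value}px")
```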

Several files need to be changed for this (make backups beforehand):

The 3 jpeg files are quite straightforward. I just opened them in Microsoft Paint, increased the width by 224 (image -> attributes) making them 984 wide, then dragged the right side of the old image to the right side of the new image and stretched out the middle section to fill the space.

The php file is a bit more complicated. This file reads kubrickheader.jpg, modifies it with your chosen colours and adds the white rounded corners. Again, the only changes here are adding 224 to some of the numbers in this file. We don’t even need to work these out, we can just append “+224” to the existing numbers where appropriate. First we change the $corners array from
0 => array ( 25, 734 ),
to
0 => array ( 25, 734+224 ),
and do this for every entry in the array.

Slightly lower down in the “Blank out the blue thing” for loop, we change
$x2 = 740;
to
$x2 = 740+224;

and again in the “Draw a new color thing” for loop, we do the same thing, changing
$x2 = 739;
to
$x2 = 739+224;

Lastly, we change the style.css. This is the most complicated part, just because the widths to change are spread out everywhere and missing one will mess the whole page up. This is a bit long to explain, so here’s the diff (modified slightly for reading clarity – don’t try to patch with it). Remember that you want to change the lines with the - in front of them into the lines with the + in front. We can’t just use +224 any more so we actually have to work them out.

As an example, in the first one (#headerimg) you just change 740px to 964px.

@@ -41,9 +41,9 @@

 #headerimg     {
        margin: 7px 9px 0;
        height: 192px;
-       width: 740px;
+       width: 964px;
        }

@@ -236,18 +236,18 @@
 #page {
        background-color: white;
        margin: 20px auto;
        padding: 0;
-       width: 760px;
+       width: 984px;
        border: 1px solid #959596;
        }

 #header {
        background-color: #73a0c5;
        margin: 0 0 0 1px;
        padding: 0;
        height: 200px;
-       width: 758px;
+       width: 982px;
        }

@@ -258,15 +258,15 @@
 .narrowcolumn {
        float: left;
        padding: 0 0 20px 45px;
        margin: 0px 0 0;
-       width: 450px;
+       width: 674px;
        }

 .widecolumn {
        padding: 10px 0 20px 0;
        margin: 5px 0 0 150px;
-       width: 450px;
+       width: 674px;
        }

@@ -311,9 +311,9 @@

 #footer {
        padding: 0;
        margin: 0 auto;
-       width: 760px;
+       width: 984px;
        clear: both;
        }

@@ -570,9 +570,9 @@
 /* Begin Sidebar */
 #sidebar
 {
        padding: 20px 0 10px 0;
-       margin-left: 545px;
+       margin-left: 769px;
        width: 190px;
        }

If you get lost here, you can use my style.css but if you use a different base version to me (I am on the one that comes with WordPress 2.7), my modified style.css may not work for you.

Update 2009-06-13:
With WordPress 2.8’s release, this theme has changed slightly. The right-hand navigation bar is now a different colour from the rest of the page. This is done with the kubrickbg-ltr.jpg image. I have widened mine but I am still using the old images for the rest of the site (header, footer, etc) so it does not fit in perfectly, as you can see from the top and bottom of the page.

Die thumbs.db, die!

Saturday, January 31st, 2009

Someone just told me that they were going to download a program that would clean their windows hard drive of “thumbs.db” files, so I gave them this command line instead:

for /f "tokens=*" %a in ('dir /b /s /aSH thumbs.db') do @(
  echo %a
  del /f /aSH "%a"
)

This very quickly (under 1 minute for my whole C: drive) scans directories recursively for thumbs.db files that have the “hidden” and “system” attributes and then deletes them, forcing deletion even if the files are read-only.

Of course, you would “cd” to the correct directory first (e.g. cd \ to do the whole drive), if you want to put it in a batch file you need to double up the percent symbols (%a becomes %%a) and you will need appropriate permissions to delete the files it finds (and maybe to change their attributes), so you may need to run this from an elevated command prompt if you are not running it in a directory that you own.

You should also note that the “/aSH” part of the “dir” command (there is no space, as this may also match files and directories called “sh”) assumes that the files are hidden and system files (as they are by default but probably not if you have extracted them from something like a zip file where someone has left them in). If the files are just hidden, just system or neither (just normal files) they will not appear in the list and will not be deleted. An alternative implementation could either run it once with this set and once with just “dir /b /s thumbs.db” or could run it once to remove the S and H attributes (or separately for each of these) and then after they are all “normal” files, run it through again to delete them.

The “echo” line is optional and just shows a verbose output as it deletes things. If you remove it you can also remove the parentheses and put the whole thing on 1 line if you so wish.
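If you would rather not depend on the attributes at all, the same job can be sketched in Python. This is only an illustration under a few assumptions: delete_thumbs and demo are names I made up, the match is on the file name alone (any case, whatever attributes are set), and it defaults to a dry run so nothing is deleted until you ask for it.

```python
import os
import stat
import tempfile


def delete_thumbs(root, dry_run=True):
    """Find every file called thumbs.db (any case) under root.

    Returns the list of matching paths. Files are only deleted when
    dry_run is False.
    """
    matches = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if name.lower() != "thumbs.db":
                continue
            path = os.path.join(folder, name)
            matches.append(path)
            if not dry_run:
                # Clear the read-only flag first, like del /f does
                os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
                os.remove(path)
    return matches


def demo():
    """Build a throwaway directory tree, clean it and report what is left."""
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "photos")
        os.mkdir(sub)
        for path in (os.path.join(root, "Thumbs.db"),
                     os.path.join(sub, "thumbs.db"),
                     os.path.join(sub, "cat.jpg")):
            with open(path, "wb") as handle:
                handle.write(b"x")
        found = delete_thumbs(root, dry_run=False)
        left = sorted(name for _f, _d, files in os.walk(root) for name in files)
        return len(found), left


if __name__ == "__main__":
    print(demo())
```

Running delete_thumbs on a directory with the default dry_run=True first gives you the list of files it would remove, which is the equivalent of the “echo” line above.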

Diffpex

Tuesday, January 13th, 2009

I was messing around with the well known Unix command line utility diff and thought to myself “If we just want to see the differences easily (rather than making a patch file), wouldn’t it be better to get rid of all of these garbage characters and use colours for displaying the output instead?”.

So that’s what I did. Here’s the classic “computer” vs “boathouse” example, letter by letter (you would normally do line by line or word by word, which is far easier to read).

computer-boathouse

Not only is “cbompuathouser” an awesome word but if you cover up all of the green letters, you get “computer” while if you cover up all of the red letters, you get “boathouse”. Black letters are common to both so they are always visible. When this really gets interesting is when you change the red or green to the same colour as the text. In this case, we change the green background to a grey background and end up with a clearly legible word “computer” but now it looks like we have “Tipexed” (liquid paper) out all of the unneeded letters from “boathouse”, just as we would if we used a pen and paper.

no-boathouse

This looks amazing when you do it on some big paragraphs that have a few differences in them.
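The merging step can be sketched with Python’s standard difflib module. This is not the code behind the screenshots, just a minimal illustration: merge_diff and render_ansi are names of my own, the colours are ANSI terminal backgrounds, and difflib may choose a different (equally valid) alignment of the common letters than the one pictured.

```python
import difflib

RED_BG = "\033[41m"
GREEN_BG = "\033[42m"
RESET = "\033[0m"


def merge_diff(old, new):
    """Merge two sequences into a list of (tag, text) pieces.

    tag is "common" for text in both, "old" for text only in the first
    input and "new" for text only in the second.
    """
    pieces = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            pieces.append(("common", old[i1:i2]))
            continue
        if i2 > i1:
            pieces.append(("old", old[i1:i2]))
        if j2 > j1:
            pieces.append(("new", new[j1:j2]))
    return pieces


def render_ansi(pieces):
    """Colour old-only text red and new-only text green for a terminal."""
    colours = {"old": RED_BG, "new": GREEN_BG}
    return "".join(
        colours[tag] + text + RESET if tag in colours else text
        for tag, text in pieces
    )


if __name__ == "__main__":
    print(render_ansi(merge_diff("computer", "boathouse")))
```

Covering up the green pieces leaves the first input and covering up the red pieces leaves the second. Passing lists of words or lines instead of strings would need a join that suits them, but the opcodes work the same way.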

Generic (lossless) compression format

Wednesday, January 7th, 2009

The problem:

When compressing data, what you really want is the smallest possible output that takes the least possible time to produce. You either want to compress it to store it in as small a place as possible or compress it to transfer it to somewhere else quickly. This is a big problem in computer science (and probably in mathematics) and generally, you have to sacrifice one for the other. Usually, a smaller output takes a cleverer algorithm to compute and so takes longer to produce. Sometimes this is desirable as it can make the output significantly smaller without significantly increasing decompression time, meaning that most of the overhead is only done once and people using the compressed data don’t have to worry about it. For example, BZip2 is great for compression but takes significantly longer than gzip to compress and quite a while longer to decompress.

There are several ways of organising compression formats. A few months ago I took a long hard look at the existing popular compression formats and decided that none of them did exactly what I wanted for both storage and data transfer at the same time. In general, they were too complicated.

The common “zip” file format stores its central directory (the index of its entries) at the end of the file, which means that the data cannot start to be processed until all of the data is available.

Both gzip and BZip2 compression are only suitable for single files, which means that they are usually used in conjunction with tar, an archive format that does not make use of any compression itself. The general use of these is to “tar” many files into one archive and then to compress the archive. This leads to an interesting question: should the files be individually compressed and then archived together or should they be archived together and then compressed as a whole?

The “zip” format compresses each file individually while the “tar.gz” and “tar.bz2” formats archive and then compress the resulting archive. Both have their obvious advantages and disadvantages and these can all be applied to other areas such as whether individual files should be encrypted and then used in archives or archived and then encrypted together. When individually compressing files, a file may be extracted by only decompressing that entry in the archive and ignoring the rest. This means that extracting a small file from a very large archive is fast. However, when archiving many similar files and then compressing them, data that is already known from one file may assist in compressing a similar file further, resulting in a smaller output size as less effort needs to be duplicated. When extracting one of these files, the entire archive (at least up to the end of that file) needs to be decompressed.
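The trade-off is easy to see with a rough sketch in Python. The three “files” below are invented and deliberately similar, and zlib stands in for both approaches, so the numbers only illustrate the effect rather than reproduce what zip or tar.gz would give.

```python
import zlib

# Three hypothetical, deliberately similar "files".
files = {
    f"level{i}.cfg": (
        "".join(f"enemy_{n}_health = {100 + n}\n" for n in range(40))
        + f"title = Level {i}\n"
    ).encode("ascii")
    for i in range(3)
}

# zip-style: compress each file on its own.
per_file = {name: zlib.compress(data, 9) for name, data in files.items()}
per_file_total = sum(len(blob) for blob in per_file.values())

# tar.gz-style: join everything together, then compress once.
solid = zlib.compress(b"".join(files.values()), 9)

print("per-file total:", per_file_total, "bytes")
print("solid:", len(solid), "bytes")

# Extracting one file from the per-file layout only touches that entry.
one_file = zlib.decompress(per_file["level1.cfg"])
```

With near-identical inputs like these the single compressed blob comes out smaller, but getting level1.cfg back out of it would mean decompressing everything stored before it.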

These formats also tend to add far too much overhead in the headers and footers of the total archive or the individual entries. In fact, gzip and zip both use the DEFLATE algorithm to actually compress the data. The differences between the two are only in the structure of the format and the different headers and footers used. gzip adds a header with fields such as which compression algorithm is used (always DEFLATE), a modification time and an optional file name, while its footer holds a CRC32 checksum and the original file size. Computing this CRC32 is an overhead compared to the wrapper that zlib puts around DEFLATE (the “zlib format”), which has only a two-byte header and a faster (but less accurate) Adler32 checksum in its footer.

The solution:

As DEFLATE is a well known, well documented compression algorithm with many existing implementations such as zlib, that is what we shall use for compression. As wrapped by zlib, a DEFLATE stream has a small header of just two bytes to set up the decompression and an Adler32 checksum in the footer to check for corruption very quickly (and quite roughly). zlib offers compression levels from 0 (none) to 9 (best). The common choice is the default, a middle level, but as modern machines are more than capable of performing “best” compression very quickly, we will use “best”.
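To make those few bytes of overhead concrete, here is a small sketch in Python that takes apart the stream zlib produces. Strictly speaking this is the zlib format (RFC 1950): a two-byte header, the raw DEFLATE data and a four-byte Adler32 footer. The variable names are only for illustration.

```python
import zlib

data = b"Hello, World!"
stream = zlib.compress(data, 9)  # 9 is zlib's "best" level

header, body, trailer = stream[:2], stream[2:-4], stream[-4:]

# The two header bytes, read as a big-endian number, are a multiple of 31.
header_ok = int.from_bytes(header, "big") % 31 == 0
# The footer is the Adler32 of the *uncompressed* data, stored big-endian.
checksum_ok = int.from_bytes(trailer, "big") == zlib.adler32(data)
# The body on its own is raw DEFLATE (negative wbits means "no wrapper").
raw = zlib.decompress(body, -15)

print(f"{len(stream)} bytes: 2 header + {len(body)} DEFLATE + 4 Adler32")
print(header_ok, checksum_ok, raw)
```

So the fixed cost of the wrapper is six bytes per stream, however large the data is.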

In my tests I have found that the ability to extract individual files almost always outweighs the extra compression achieved by archiving several files and then compressing them together. Usually, the latter creates files less than 1% smaller than the former but in the archives that make the most difference to size, the average file size is so large that having to decompress other files in the same archive takes significantly longer. Of course, this is not a problem if every single file in the archive is required anyway. zlib also supports a rarely used preset (starting) dictionary, flagged in its header, that may allow nearly equal compression in both cases with the former retaining its ability to extract individual files quickly. For these reasons, the files will be compressed individually.
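The starting dictionary mentioned above is available in Python through the zdict argument of zlib.compressobj and zlib.decompressobj. The sketch below is only an illustration of the idea: the dictionary contents, the sample entry and the helper names deflate and inflate are all invented, and both sides have to agree on the dictionary in advance.

```python
import zlib

# Hypothetical shared dictionary: text that most entries are expected to contain.
ZDICT = b'{"name": "", "type": "texture", "width": 0, "height": 0, "format": "rgba8"}'


def deflate(data, zdict=None):
    """Compress data at level 9, optionally primed with a preset dictionary."""
    c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return c.compress(data) + c.flush()


def inflate(blob, zdict=None):
    """Decompress a stream produced by deflate() with the same dictionary."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


entry = b'{"name": "grass", "type": "texture", "width": 256, "height": 256, "format": "rgba8"}'
plain = deflate(entry)
primed = deflate(entry, ZDICT)

print(len(entry), "bytes raw,", len(plain), "without dictionary,", len(primed), "with")
```

Each entry is still a stream of its own, so it can be extracted individually, but it gets some of the benefit of having “seen” similar data before.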

Some DEFLATE headers that we do not intend to use or that will always remain the same could be dropped to shave a few bytes but to make implementation easier, we will leave them in for now. This allows existing libraries such as zlib to be used without modification for the decompression stage.

As mentioned above, any additional steps such as encryption could either be done on the individual files, allowing them to be extracted and decrypted individually, or on the final archive. Either way, they are out of scope here so I will not say anything else about them.

The archive will use a structure similar to a linked list. There will be no header for the archive as a whole and each individual entry will contain a header for the size of that entry rather than a pointer to the next. This means that entries can be skipped quickly if the underlying stream allows seeking or random access. Of course, if there is no way to request an individual entry over the stream, it cannot be skipped to until it is already present. Removing a header for the archive as a whole means that two archives may be merged simply by concatenating them. This can be achieved on Unix/Linux with the “cat” command and on Windows/DOS with the “copy” command. To allow for streaming, there will be no footers at all other than the existing DEFLATE Adler32 footer, which cannot be used until all of that entry is already available anyway.

Many headers and footers such as file permissions are not uniform across platforms (compare Windows ACLs with Unix “chmod” style permissions, for example), are not available for streaming (e.g. file permissions are not needed when sending some compressed data across a network where it will never be saved and where the file’s owner may not even exist), etc and so have been dropped. This format is meant to be as simple as possible, remember.

The result:

The resulting file format is described as follows:

  • 8 bits (1 byte) for a version number to allow for later extensions and backwards compatibility. Version numbers range from 0 to 255 and are only major version changes to the format which would require a new way of parsing the rest of the entry. Entries in the same archive may use different version numbers e.g. if an old archive is merged with a new archive or if new features are only required in some entries.
  • 32 bits (4 bytes) original (inflated) data size to enhance decompression and let the user know how big the entry will be. If this does not benefit decompression at all, there is no problem caused by putting it in an entry’s footer as gzip already does, which would allow it to be set based on how much data was compressed even if the amount of data to be compressed was not known before compression started.
  • 32 bits (4 bytes) compressed (deflated) data size to enhance decompression – this being in a header rather than a footer may cause problems for streaming, as an entry needs to be compressed entirely before it can start to be sent. However, a possible extension such as setting it to “0” to represent “unknown” or changing a version number or flag (if flags are added in a later version) could avoid this if compressing the entire entry before streaming is not possible. If this is set to “unknown” then skipping an entry is not possible without decompressing it first to know where it ends.
  • n*16 bits for an optional 0-terminated text descriptor. In the case of files, this may be the Unicode file name. In the case of other data, it may be an identifier, a description or a string of metadata such as CSV values or an XML structure. If it is to be omitted, a 0 length string should be used, consisting solely of a Unicode null terminator (\u0000).
  • The DEFLATE compressed data (starting with the first DEFLATE header and ending with the last DEFLATE footer)
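As a rough illustration, an entry in this layout could be written with a few lines of Python. The list above does not state a byte order or a text encoding, so little-endian integers and UTF-16LE text are assumptions here (they match the worked example in this post), and pack_entry is just a name picked for the sketch.

```python
import struct
import zlib

VERSION = 1


def pack_entry(data, descriptor=""):
    """Build one archive entry from raw bytes and an optional text descriptor."""
    deflated = zlib.compress(data, 9)  # zlib stream, "best" compression
    # Assumption: version byte, then two little-endian 32-bit sizes.
    header = struct.pack("<BII", VERSION, len(data), len(deflated))
    # Assumption: UTF-16LE text followed by a 16-bit 0 terminator.
    text = descriptor.encode("utf-16-le") + b"\x00\x00"
    return header + text + deflated


entry = pack_entry(b"Hello, World!", "samplei.txt")
print(len(entry), "bytes:", entry.hex())
```

The exact compressed bytes depend on the zlib build in use, so the output is not guaranteed to be byte-for-byte identical everywhere, only to decompress to the same data.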

Example:

01
0D 00 00 00
15 00 00 00
73 00 61 00 6D 00 70 00 6C 00 65 00 69 00 2E 00 74 00 78 00 74 00 00 00
78 DA F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00 1F 9E 04 6A

This version 1 entry (13 bytes originally, 21 bytes when compressed) describes a file called “samplei.txt”. If you decompress the last line (21 bytes) with DEFLATE you get the ASCII text “Hello, World!”, which is 13 bytes long. In this example, the compressed data is actually larger than the uncompressed data due to overheads (headers and footers) and made even larger if you include all of the headers added for this entry in the archive (to 54 bytes) but when the original file is a few dozen bytes larger, the compressed file is smaller.

If we remove the optional descriptor (file name) the resulting data is just 32 bytes long.
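The example entry can be checked with a short reader. This is a sketch rather than a reference implementation: parse_entry is a made-up name, and the little-endian sizes and UTF-16LE descriptor are read off the example bytes rather than stated by the format description.

```python
import struct
import zlib

EXAMPLE = bytes.fromhex(
    "01 "
    "0D 00 00 00 "
    "15 00 00 00 "
    "73 00 61 00 6D 00 70 00 6C 00 65 00 69 00 2E 00 74 00 78 00 74 00 00 00 "
    "78 DA F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00 1F 9E 04 6A"
)


def parse_entry(buf, offset=0):
    """Return (descriptor, data, next_offset) for the entry starting at offset."""
    version, raw_size, packed_size = struct.unpack_from("<BII", buf, offset)
    if version != 1:
        raise ValueError(f"unsupported entry version {version}")
    pos = offset + 9
    # Walk the descriptor 16 bits at a time until the 0 terminator.
    end = pos
    while buf[end:end + 2] != b"\x00\x00":
        if end + 2 > len(buf):
            raise ValueError("unterminated descriptor")
        end += 2
    descriptor = buf[pos:end].decode("utf-16-le")
    start = end + 2
    data = zlib.decompress(buf[start:start + packed_size])
    if len(data) != raw_size:
        raise ValueError("inflated size does not match the header")
    return descriptor, data, start + packed_size


print(len(EXAMPLE), parse_entry(EXAMPLE))
```

This reports 54 bytes in total and returns the descriptor “samplei.txt”, the data “Hello, World!” and 54 as the offset where the next entry would start. Dropping the 22 bytes of file name text (keeping only the terminator) gives the 32-byte version.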

To compare overheads of the same file in different algorithms with and without archiving:

  • samplei.txt.gz = 46 bytes (normal and best)
  • samplei.txt.bz2 = 58 bytes (normal and -9)
  • samplei.txt.tar.gz = 159 bytes (normal) or 153 bytes (best)
  • samplei.txt.tar.bz2 = 149 bytes (normal and -9)
  • samplei.zip = 133 bytes (normal)

If we were just transferring the data “Hello, World!” across a network, we could transmit it twice simply by joining two entries together into a single archive, as you would expect from transmitting the same data twice (optional text descriptor removed for brevity):

01 0D 00 00 00 15 00 00 00 00 00
78 DA F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00 1F 9E 04 6A
01 0D 00 00 00 15 00 00 00 00 00
78 DA F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00 1F 9E 04 6A
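Because every entry carries its own compressed size, a reader can also walk an archive like this one without inflating anything. The following Python sketch lists the entries of two concatenated copies; entry_offsets is a hypothetical helper, and little-endian sizes are assumed as in the bytes above.

```python
import io
import struct

ENTRY = bytes.fromhex(
    "01 0D 00 00 00 15 00 00 00 00 00 "
    "78 DA F3 48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00 1F 9E 04 6A"
)


def entry_offsets(stream):
    """Yield (offset, descriptor, inflated_size) without inflating any data."""
    while True:
        offset = stream.tell()
        header = stream.read(9)
        if not header:
            return  # clean end of the archive
        if len(header) < 9:
            raise ValueError("truncated entry header")
        _version, raw_size, packed_size = struct.unpack("<BII", header)
        units = []
        while True:
            unit = stream.read(2)
            if len(unit) < 2:
                raise ValueError("truncated descriptor")
            if unit == b"\x00\x00":
                break
            units.append(unit)
        descriptor = b"".join(units).decode("utf-16-le")
        stream.seek(packed_size, io.SEEK_CUR)  # jump over the compressed data
        yield offset, descriptor, raw_size


archive = io.BytesIO(ENTRY + ENTRY)  # merging archives is plain concatenation
listing = list(entry_offsets(archive))
print(listing)
```

For the two joined entries this prints [(0, '', 13), (32, '', 13)]: two entries with empty descriptors, starting at bytes 0 and 32, each inflating to 13 bytes.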

This means that data can be transmitted using this compression with no alteration and without needing to know anything about the other packets. If, for example, each packet were limited to a maximum of 23 bytes, we could transmit the above data in the following packets, just as if it were one archive of unknown length that we were sending, rather than 2 separate entries of data:

01 0D 00 00 00 15 00 00 00 00 00 78 DA F3 48 CD C9 C9 D7 51 08 CF 2F
CA 49 51 04 00 1F 9E 04 6A 01 0D 00 00 00 15 00 00 00 00 00 78 DA F3
48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00 1F 9E 04 6A .. ..

As soon as enough data is received for the first entry, that entry can be extracted and used. The same can be said for every following entry. We can keep adding entries on the sending end indefinitely. This is very useful for both storing data and transmitting it, our initial goals.
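A receiver for this could look something like the following Python sketch, which buffers whatever arrives and hands back each entry as soon as its last byte is in. The class name EntryStream and its methods are invented for the example, and little-endian sizes with a UTF-16LE descriptor are assumed as before.

```python
import struct
import zlib


class EntryStream:
    """Accumulates received bytes and returns entries once they are complete."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Add received bytes; return the list of entries they completed."""
        self._buf += chunk
        done = []
        entry = self._take_entry()
        while entry is not None:
            done.append(entry)
            entry = self._take_entry()
        return done

    def _take_entry(self):
        buf = self._buf
        if len(buf) < 9:
            return None  # fixed header not fully received yet
        _version, _raw_size, packed_size = struct.unpack_from("<BII", buf, 0)
        end = 9
        while buf[end:end + 2] != b"\x00\x00":
            if end + 2 > len(buf):
                return None  # descriptor not fully received yet
            end += 2
        start = end + 2
        if len(buf) < start + packed_size:
            return None  # compressed data not fully received yet
        descriptor = bytes(buf[9:end]).decode("utf-16-le")
        data = zlib.decompress(bytes(buf[start:start + packed_size]))
        del buf[:start + packed_size]  # keep only the unread remainder
        return descriptor, data


PACKETS = [
    "01 0D 00 00 00 15 00 00 00 00 00 78 DA F3 48 CD C9 C9 D7 51 08 CF 2F",
    "CA 49 51 04 00 1F 9E 04 6A 01 0D 00 00 00 15 00 00 00 00 00 78 DA F3",
    "48 CD C9 C9 D7 51 08 CF 2F CA 49 51 04 00 1F 9E 04 6A",
]
stream = EntryStream()
completed = [len(stream.feed(bytes.fromhex(p))) for p in PACKETS]
print(completed)
```

Fed the three 23-byte packets from above, this prints [0, 1, 1]: nothing is ready after the first packet, the first entry completes part way through the second and the second entry completes with the third.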

This means that resources needed by an application can be stored in this format and retrieved quickly and easily while at the same time providing a way of compressing sequential packets that may also be used in the application. These are both features commonly needed when developing games, for example.

We can also start to extract the initial entries in an archive before the archive has even completed downloading, allowing for multiple files to be downloaded at once in a single response, which could have useful applications on the web rather than having to send many requests for individual files and provide many individual responses.

Well, that’s my generic compression format useful for both streaming and storage. Any questions, comments or criticisms are welcome.