Image Data

What is an image? As with most things in signal processing, an image in real life is continuous. As light reflects off surfaces into the eye, there's an infinite number of ways it could reflect off a surface and bring light into the eye. For a computer, an image has to be split into discrete parts, called pixels. As images are 2 dimensional, we arrange these pixels in a grid. Computationally, we could store this grid as a 2 dimensional array, i.e.:

1 1 0
1 0 1 → [ [ 1, 1, 0 ], [ 1, 0, 1 ], [ 0, 0, 1 ] ]
0 0 1
Converting an image into a 2D array

To a computer, a 2 dimensional array can be represented as a 1 dimensional array without much extra cost, so long as the width and the height are known. The rows are simply laid out one after another:

[ [ 1, 1, 0 ], [ 1, 0, 1 ], [ 0, 0, 1 ] ] ≡ [ 1, 1, 0, 1, 0, 1, 0, 0, 1 ]
Where:
  w = 3
  h = 3
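
The equivalence above can be sketched in a few lines of Python. This is only an illustration: the helper names (flatten, unflatten, pixel_at) are made up for this example, and it assumes the rows are stored one after another (row-major order), so the pixel at column x, row y sits at index y * w + x.

```python
def flatten(grid):
    """Flatten a 2D list (rows of pixels) into a 1D list, row by row."""
    return [value for row in grid for value in row]


def unflatten(flat, width, height):
    """Rebuild the 2D grid from a flat list, given its width and height."""
    if len(flat) != width * height:
        raise ValueError("flat list length does not match width * height")
    return [flat[y * width:(y + 1) * width] for y in range(height)]


def pixel_at(flat, width, x, y):
    """Look up the pixel at column x, row y without rebuilding the grid."""
    return flat[y * width + x]


grid = [[1, 1, 0], [1, 0, 1], [0, 0, 1]]
flat = flatten(grid)               # [1, 1, 0, 1, 0, 1, 0, 0, 1]
restored = unflatten(flat, 3, 3)   # same nested list as grid
```

Note that the width alone is enough to find a pixel; the height only matters when checking that the flat list is the right length.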

To a computer, images are simply a stream of data. In fact, several APIs which deal with images (I'm looking at you, JavaScript) return a single dimension array of values representing the image data.

Humans have a hard time thinking of images like this, even computer scientists. More naturally, images can be thought of as a 2D array of colour values (when you look at this in detail, it becomes a 3D array, but we'll get there). It's pretty easy to convert between the two in code.

Let's visualise this: figure ? contains a zoomed section of figure ?.

10× zoom section of figure ?

For each pixel, there are several values specifying the colour at that location. In figure 2, we use the classic RGB colour space. We'll discuss colour spaces in the next section, but simply put, the image can be split into channels, each representing a different aspect of colour. RGB is one of the easiest colour spaces to understand. It contains 3 channels - Red, Green and Blue. For each channel, a pixel contains a number between 0 and 255 representing how intense that colour is at that position. 0 indicates no presence of that colour at all. 255 indicates the full presence of the colour. What about whites and blacks in the colour space? Black can simply be thought of as the absence of colour, when R, G and B are all 0. Similarly, white is the presence of all colours at the same time, when R, G and B are all 255.

In some image formats, the transparency of a position is also stored in a fourth channel, known as the Alpha channel. Again this is a value between 0 and 255, where 0 indicates fully transparent and 255 indicates fully opaque.
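
This is where the 2D array becomes a 3D array: rows, then columns, then channels. Here's a rough sketch of converting a flat stream of bytes into that shape. It assumes the values arrive in R, G, B, A order for each pixel, row by row (the same layout the browser canvas API uses for its pixel data); to_pixel_grid and the tiny 2×1 image are made up for the example.

```python
CHANNELS = 4  # red, green, blue, alpha (assumed order)


def to_pixel_grid(data, width, height):
    """Turn a flat RGBA byte stream into grid[y][x] = [r, g, b, a]."""
    if len(data) != width * height * CHANNELS:
        raise ValueError("data length does not match width * height * 4")
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Each pixel occupies 4 consecutive values in the flat stream.
            start = (y * width + x) * CHANNELS
            row.append(list(data[start:start + CHANNELS]))
        grid.append(row)
    return grid


# A hypothetical 2x1 image: an opaque red pixel, then a fully transparent one.
data = bytes([255, 0, 0, 255, 0, 0, 0, 0])
grid = to_pixel_grid(data, 2, 1)
red, green, blue, alpha = grid[0][0]  # 255, 0, 0, 255
```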

If you're new to programming, you might be wondering why we chose values between 0 and 255. This comes down to bits and bytes. Computers handle everything in binary (base 2). With 8 binary digits (bits), we can store 2^8 = 256 different values, from 0 to 255! 8 bits is a common length, and is known as a byte.
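
If you'd like to check the arithmetic yourself, here's a quick sketch (the variable names are only for illustration):

```python
BITS_PER_CHANNEL = 8

# Each extra bit doubles the number of values we can represent.
levels = 2 ** BITS_PER_CHANNEL       # 256 distinct values
largest = levels - 1                 # 255, because we start counting at 0

# The smallest and largest values written out as 8 binary digits.
smallest_as_binary = format(0, "08b")        # "00000000"
largest_as_binary = format(largest, "08b")   # "11111111"
```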

Image formats and compression

Storing images is a technical challenge. Even if we consider just the RGB channels, each pixel of the image requires 3 bytes. For a 100×100 pixel image, we need 30,000 bytes, or 30 kilobytes. For a 1000×1000 pixel image, we would need 3,000,000 bytes, or 3 megabytes. These days we're looking at phone cameras which can take 12 megapixel photos. That's 12,000,000 pixels, each of which needs 3 bytes of storage, or 36 megabytes for a single uncompressed photo. It quickly adds up!
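
The storage arithmetic is just width × height × bytes per pixel. A small sketch, where raw_size_bytes is a hypothetical helper and 4000×3000 is an assumed resolution for a 12 megapixel photo:

```python
def raw_size_bytes(width, height, channels=3, bytes_per_channel=1):
    """Uncompressed size of an image, ignoring any file header."""
    return width * height * channels * bytes_per_channel


small = raw_size_bytes(100, 100)      # 30_000 bytes
medium = raw_size_bytes(1000, 1000)   # 3_000_000 bytes
phone = raw_size_bytes(4000, 3000)    # 36_000_000 bytes (assumed 12 MP resolution)

# Adding an alpha channel means 4 bytes per pixel instead of 3.
phone_with_alpha = raw_size_bytes(4000, 3000, channels=4)  # 48_000_000 bytes
```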

To save space, we can compress images using several techniques. We split these into two categories: lossless compression, where the original image can be reconstructed exactly, and lossy compression, where the image can lose some quality in exchange for a smaller file size.
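
To make the lossy idea concrete, here's a rough sketch contrasting it with a lossless round trip, where the original bytes come back exactly. It is not how any real image format works: zlib is a general-purpose compressor from Python's standard library rather than an image codec, and quantize is a deliberately crude, made-up lossy step that throws away the low bits of each value. The sample data is also invented for the example.

```python
import zlib


def quantize(data, keep_bits=4):
    """Crude lossy step: keep only the top `keep_bits` bits of every byte."""
    mask = (0xFF << (8 - keep_bits)) & 0xFF
    return bytes(value & mask for value in data)


# Hypothetical single-channel data: a 0..255 gradient repeated 16 times.
original = bytes(range(256)) * 16

# Lossless: compress, decompress, and get back exactly what we started with.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossy: same number of values, but the fine detail is gone for good.
quantized = quantize(original)
```

After quantizing, there are only 16 distinct values left instead of 256, which gives a compressor more repetition to work with. The trade-off is that the discarded detail can never be recovered.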

At the core of image storage, you take an image and store it as a series of bytes. Visually, it looks like this:

Image compression