Part 6: Fourier analysis and JPEG compression

In the first five notebooks of this workshop session, we've reviewed the fundamentals tools of image processing: numerical arrays, convolutions, point clouds and gradient descent.

Now, for those of you who went really fast through these first steps, here's a little bonus: an introduction to the JPEG (1992) and JPEG2000 compression standards, which are ubiquitous in cameras and cinemas. These two formats rely on the Fourier and Wavelet transforms, two mathematical tools that can be thought of as precursors to Convolutional Neural Networks.

References, going further. These two bonus notebooks are based on the Numerical Tour on approximation with orthogonal bases, written by Gabriel Peyré. For additional references, you may have a look at:

First, we re-import our libraries:

In [1]:
%matplotlib inline
import center_images             # Center our images
import matplotlib.pyplot as plt  # Display library
import numpy as np               # Numerical computations
from imageio import imread       # Load .png and .jpg images

Re-define custom display routines:

In [2]:
def display(im):  # Define a new Python routine
    """
    Displays an image using the methods of the 'matplotlib' library.
    """
    plt.figure(figsize=(8,8))                     # Square blackboard
    plt.imshow( im, cmap="gray", vmin=0, vmax=1)  # Display 'im' using a gray colormap,
                                                  #         from 0 (black) to 1 (white)
def display_2(im_1, title_1, im_2, title_2):
    """
    Displays two images side by side; typically, an image and its Fourier transform.
    """
    plt.figure(figsize=(12,6))                    # Rectangular blackboard
    plt.subplot(1,2,1) ; plt.title(title_1)       # 1x2 waffle plot, 1st cell
    plt.imshow(im_1, cmap="gray")                 # Auto-equalization
    plt.subplot(1,2,2) ; plt.title(title_2)       # 1x2 waffle plot, 2nd cell
    plt.imshow(im_2, cmap="gray", vmin=-7, vmax=15)       

And load, once again, a simple axial slice:

In [3]:
I = imread("data/aortes/1.jpg", as_gray=True)  # Import as a grayscale array
I = I / 255                   # Normalize intensities in the [0,1] range
I = I[::2,::2]                # Subsample the image, for convenience
display(I)                    # Let's display our slice:

Fourier transform

Compression algorithms rely on transforms f, which turn an image I into a new array f(I) that is supposed to be easier to handle. The most fundamental of these "helpers" is the Fourier Transform (click, it's great!), which decomposes a signal or an image as a superposition of harmonics (just like a piano note, really), with weights encoded in the array


`fI = fft2(I)`.
In [4]:
# The numpy packages provides a Fast Fourier Transform in 2D,
# and its inverse (the iFFT). FFTshift and iFFTshift
# are just there to get nicer, centered plots:
from numpy.fft import fft2, ifft2, fftshift, ifftshift

fI = fft2(I)  # Compute the Fourier transform of our slice

# Display the logarithm of the amplitutde of Fourier coefficients.
# The "fftshift" routine allows us to put the zero frequency in
# the middle of the spectrum, thus centering the right plot as expected.
display_2( I, "Image", 
          fftshift( np.log(1e-7 + abs(fI)) ), "Fourier Transform" )

To get an intuition of this new object, the simplest thing to do is to take our Fourier Transform fI, edit it, and see what the image ifft2( edit( fI )) looks like:

In [5]:
def Fourier_bandpass(fI, fmin, fmax) :
    """
    Truncates a Fourier Transform fI, before reconstructing a bandpassed image.
    """
    Y, X = np.mgrid[:fI.shape[0], :fI.shape[1]]  # Horizontal and vertical gradients
    radius = (X - fI.shape[0]/2) ** 2 \
           + (Y - fI.shape[1]/2) ** 2        # Squared distance to the middle point
    radius = ifftshift( np.sqrt(radius) )    # Reshape to be fft-compatible

    
    fI_band = fI.copy()               # Create a copy of the Fourier transform
    fI_band[ radius <=fmin ] = 0      # Remove all the low frequencies
    fI_band[ radius > fmax ] = 0      # Remove all the high frequencies
    I_band = np.real(ifft2(fI_band))  # Invert the new transform...

    display_2( I_band, "Image",          # And display
               fftshift( np.log(1e-7 + abs(fI_band)) ), "Fourier Transform" )

As evidenced below, Fourier coefficients that are close to the center encode the low frequencies of the signal:

In [6]:
Fourier_bandpass(fI, 0, 10)

As we add more coefficients, we see that details start to appear:

In [7]:
Fourier_bandpass(fI, 0, 50)

We can also keep specific frequencies and compute "detail-only" images at a cheap numerical cost. Convolutions, that we presented in the 2nd notebook, can all be implemented this way:

In [8]:
Fourier_bandpass(fI, 50, 100)

Our image can then be simply expressed as a sum of low, medium and high frequencies:

In [9]:
Fourier_bandpass(fI, 0, 100)  # = Sum of the last two images

Thresholding. Now, what if we only keep the large coefficients of our Fourier Transform, before inverting it?

In [10]:
def Fourier_threshold(fI, threshold) :
    fI_thresh = fI.copy()                  # Create a copy of the Fourier transform
    fI_thresh[ abs(fI) < threshold ] = 0   # Remove all the small coefficients
    I_thresh = np.real(ifft2(fI_thresh))   # Invert the new transform...

    display_2( I_thresh, "Image",          # And display
               fftshift( np.log(1e-7 + abs(fI_thresh)) ), "Fourier Transform" )
    return I_thresh

This is not very convincing:

In [11]:
Fourier_threshold(fI, 100) ;

But in some sense, this method is a simple way of compressing our image. The code below allows us to encode an image made out of 256x256 = 65,536 pixels using only 2,000 coefficients.

For a 1:33 compression ratio, that's not too bad!

In [12]:
Abs_values = np.sort( abs(fI.ravel()) )  # Sort the coefficients' magnitudes
print(Abs_values)                        # in ascending order...

cutoff = Abs_values[-2000]                  # And select the 2000th largest value
Compressed = Fourier_threshold(fI, cutoff)  # as a cutoff:
display(Compressed)
[1.25364680e-02 1.25364680e-02 6.34758246e-02 ... 5.29219473e+03
 5.29219473e+03 7.91748145e+03]

Discrete cosine transform

But, obviously, we can go further. As explained with tons of illustrations in the Digital Signal Processing's guide, the Discrete Cosine Transform is slightly more adapted to the compression of images than the baseline Fourier Transform.

In [13]:
# Scipy's fftpack module implements the Discrete Cosine Transform,
# an extension of the Fourier Transform that is better suited to
# the compression of real-valued signals - Fourier coefficients are complex numbers.
from scipy import fftpack

def dct2(f):
    """
    Discrete Cosine Transform in 2D.
    """
    return np.transpose(fftpack.dct(
           np.transpose(fftpack.dct(f, norm = "ortho")), norm = "ortho"))

def idct2(f):
    """
    Inverse Discrete Cosine Transform in 2D.
    """
    return np.transpose(fftpack.idct(
           np.transpose(fftpack.idct(f, norm = "ortho")), norm = "ortho"))

The low frequencies are now gathered in the upper-left corner, while the other coefficients encode "stripes" of varying spatial frequencies:

In [14]:
dI = dct2(I)  # Compute the Cosine Transform of our image

display_2( I, "Image", 
           np.log(1e-7 + abs(dI)), "Cosine Transform (log grayscale)" )

Crucially, on typical image patches, the coefficients in the upper left corner are much larger than the other ones:

In [15]:
Patch = I[150:182,100:132]  # Truncate our image...
dPatch = dct2(Patch)        # And apply a discrete cosine transform

display_2( Patch, "Image", 
           (1e-7 + abs(dPatch)), "Cosine Transform (normal grayscale)" )

One can show that if there is no "edge" or "discontinuity" in a given patch, then we can throw away the bottom right coefficients without losing much information. In practice, the JPEG format thus relies on small Discrete Cosine Transforms, computed on local square patches of size 8x8 stacked upon each other:

In [16]:
def local_dct(I, w = 8) :  # w = patch size
    lI = np.zeros(I.shape)
    # Loop over the small (w,w) patches ------------------------------
    for i in range(1,I.shape[0]//w+1):
        for j in range(1,I.shape[1]//w+1):
            lI[(i-1)*w: i*w, (j-1)*w: j*w] = dct2(I[(i-1)*w: i*w, (j-1)*w: j*w])
    return lI


def ilocal_dct(lI, w = 8) :  # w = patch size
    I = np.zeros(lI.shape)
    # Loop over the small (w,w) patches ------------------------------
    for i in range(1,I.shape[0]//w+1):
        for j in range(1,I.shape[1]//w+1):
            I[(i-1)*w: i*w, (j-1)*w: j*w] = idct2(lI[(i-1)*w: i*w, (j-1)*w: j*w])
    return I

With a logarithmic scale put on the intensities, here is what our local DCT looks like:

In [17]:
lI = local_dct(I, w = 8)
display_2( I, "Image", 
   np.log(1e-7 + abs(lI)), "Local Cosine Transform (log grayscale)" )

Unfortunately, thresholding induces blocking artifacts, as patches are de-compressed independently from each other:

In [18]:
def lDCT_threshold(lI, threshold) :
    lI_thresh = lI.copy()                  # Create a copy of the local DCT transform
    lI_thresh[ abs(lI) < threshold ] = 0   # Remove all the small coefficients
    I_thresh = ilocal_dct(lI_thresh)       # Invert the new transform...

    display_2( I_thresh, "Image",          # And display
               np.log(1e-7 + abs(lI_thresh) ), "Local Cosine Transform" )
    return I_thresh

lDCT_threshold(lI, .5) ;

But crucially, this method allows us to compress the image adaptatively: the largest coefficients tend to be gathered around textured, edgy regions of the input image. When keeping the "largest 2,000 coefficients" to compress our raw bitmap I by a 1:33 factor, we smooth out large uniform regions... but at least, in contrast to what happened with the "global" Fourier Transform, the boundaries of our organs now stay reasonably sharp.

In [19]:
Abs_values = np.sort( abs(lI.ravel()) )  # Sort the coefficients' magnitudes
print(Abs_values)                        # in ascending order...

cutoff = Abs_values[-2000]               # And select the 2000th largest value
JPEG = lDCT_threshold(lI, cutoff)        # as a cutoff
display(JPEG)
[0.         0.         0.         ... 4.86519623 5.15686226 6.16029406]

To remember: Image processing algorithms rely on transforms, or feature maps, that turn raw bitmaps of pixel-wise values into higher-level descriptors. In this notebook, low and high frequencies, computed globaly or locally.

Using mathematics from the XIXth century (the Fourier/Cosine Transform), the JPEG format already achieves non-trivial performances on compression tasks. Robust, well-known and relatively easy to implement, it is has become the ubiquitous standard for digital photography.

But we can go further: see you in Part 7!