Neural networks have been variously described as "perceptrons", as facsimiles of biological neurons, as imitations of the visual cortex in the context of computer vision, and as universal function approximators, particularly in reinforcement learning. Each of these views can be useful, but it is often best not to project anything fancier onto them than is there and to take the last view: they are function approximators learning something like a convex functional landscape, perhaps exploring a topological manifold.

From that perspective, it does not matter much how the data is represented: just as one can define a homeomorphism from one topological space to another without compromising any expressivity, a change of representation is merely an alternative form. There are many operations that change the representation of data or of a space, such as a change of basis. A particularly useful one -- used in signal processing, image compression, and more cases than one can list -- is the transformation of a space from its spatial representation to something like a phase representation, built from waves like sine and cosine: the Fourier transform. Cosine and sine functions of different frequencies form a basis, and a convenient one, because they are orthogonal to each other. Just as with an orthogonal basis in a vector space, one can read off how much each basis component -- here, a wave of a given frequency -- contributed to an element's representation. Those contributions are exactly the Fourier coefficients the transform produces.
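As a toy illustration of that last point (my own, not from the tutorial below): build a signal from two cosines of known amplitude, and the amplitudes can be read straight back off the `rfft` coefficients, precisely because the basis waves are orthogonal.

```python
import numpy as np

# A signal composed of two cosine waves of known amplitude (3.0 and 0.5).
n = 64
t = np.arange(n)
signal = (3.0 * np.cos(2 * np.pi * 2 * t / n)
          + 0.5 * np.cos(2 * np.pi * 5 * t / n))

# Orthogonality lets each coefficient report one frequency's contribution.
coeffs = np.fft.rfft(signal)
amplitudes = 2 * np.abs(coeffs) / n   # rescale bins back to wave amplitudes
# amplitudes[2] comes out close to 3.0, amplitudes[5] close to 0.5,
# and everything else is near zero.
```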
I took the basic neural network architecture from the following article, which trains a network to recognize MNIST digits:
https://elitedatascience.com/keras-tutorial-deep-learning-in-python
This is about as canonical an example as it gets in ML. But that is boring, and I don't really like most of the code, so I tossed most of it away. I also injected my preference for syntactic conveniences such as passing a list directly to Keras's Sequential model class instead of calling .add() for each layer. And I don't like it when people hardcode various properties of the data instead of deriving them from the data itself. All of that I ripped out and cleaned up (clean according to my tastes). I'll show the code below.
First, though, we need to apply a Fourier transformation to the data. Exactly how should not matter all that much, but we will keep it two-dimensional so the convolutional layers stay largely unmodified.
from numpy.fft import rfft
from numpy import array

def fft_data(x):
    data = []
    for xi in x:
        f = rfft(xi)                   # real FFT along the last image axis
        data.append([f.real, f.imag])  # keep real and imaginary parts separately
    data = array(data)
    shape = data[0].shape
    # (samples, rows, real+imag frequency bins, 1 channel)
    return data.reshape(x.shape[0], shape[1], shape[2] * 2, 1)
The complex Fourier coefficients are split into their real and imaginary parts, which are kept as separate real-valued features, because neural networks expect real inputs and, as far as I know, don't know what to do with complex numbers.
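As a quick sanity check (my own, using a made-up batch shaped like MNIST's 28x28 images), here is what the transform above does to the data's shape:

```python
import numpy as np
from numpy.fft import rfft

def fft_data(x):  # same transform as above
    data = []
    for xi in x:
        f = rfft(xi)
        data.append([f.real, f.imag])
    data = np.array(data)
    shape = data[0].shape
    return data.reshape(x.shape[0], shape[1], shape[2] * 2, 1)

x = np.random.rand(2, 28, 28)  # a tiny stand-in batch of two "images"
out = fft_data(x)
# rfft along the last axis gives 28 // 2 + 1 = 15 complex bins per row;
# splitting real and imaginary parts doubles that to 30 real features.
print(out.shape)  # (2, 28, 30, 1)
```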
Now that we are no longer dealing with pixels, which relate to each other through their spatial arrangement, we can also toss the first convolutional layer in favor of a simple fully connected layer. The Fourier coefficients still carry some structure in their ordering -- the frequencies of the waves they represent rise sequentially -- so the convolutional layer that remains might still help learning, compared to having all coefficients arranged in random order. Getting rid of the convolutional layers altogether, even after transforming the data into the frequency domain, turns out to severely hamper learning, and that handicap remains even after adding deeper fully connected layers to compensate for their absence.
model = Sequential([
    Dense(128, activation='relu', input_shape=x_train[0].shape),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(y_train.shape[1], activation='softmax')])
Not much else needs to change. Now that the silly hardcoded constants are gone, the network is also quite flexible about the format of the data fed into it: whether the input represents pixels or Fourier coefficients makes little difference to it.
So, can the neural network recognize handwritten digits represented as Fourier coefficients just as well? The code I ripped from the website reaches about 85% accuracy by the end of the first epoch. Feeding in Fourier coefficients instead, we get about the same accuracy by the second epoch. Because I was lazy, I did not let it run any further; the point was to show that it can do well enough, and it is satisfying to watch how well it keeps learning. Whether this hampers top accuracy, or how much it slows learning, would take more effort to establish. Apart from that, inserting the Fourier transformation into the pipeline was almost completely transparent to the network. It also no longer seemed to matter whether the data were normalized, or what channel order the images used, so all the circuitous code that bloated the example could be tossed as well.
Here is the complete code. It's pretty much the original (plus the elegance bestowed upon it by my superior taste in aesthetics and coding skills):
from keras.optimizers import Adadelta
from keras.losses import categorical_crossentropy
from keras.utils import to_categorical
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from numpy.fft import rfft
from numpy import array
def fft_data(x):
    data = []
    for xi in x:
        f = rfft(xi)
        data.append([f.real, f.imag])
    data = array(data)
    shape = data[0].shape
    return data.reshape(x.shape[0], shape[1], shape[2] * 2, 1)
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = fft_data(x_train), fft_data(x_test)
y_train, y_test = to_categorical(y_train), to_categorical(y_test)
model = Sequential([
    Dense(128, activation='relu', input_shape=x_train[0].shape),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(y_train.shape[1], activation='softmax')])
model.compile(loss=categorical_crossentropy, optimizer=Adadelta(), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))
A snapshot of some of the TensorFlow output:
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 332s 6ms/step - loss: 2.5453 - accuracy: 0.6988 - val_loss: 0.2552 - val_accuracy: 0.9263
Epoch 2/5
60000/60000 [==============================] - 294s 5ms/step - loss: 0.5154 - accuracy: 0.8641 - val_loss: 0.1962 - val_accuracy: 0.9495
Epoch 3/5
60000/60000 [==============================] - 281s 5ms/step - loss: 0.3650 - accuracy: 0.9020 - val_loss: 0.1326 - val_accuracy: 0.9636
Epoch 4/5
60000/60000 [==============================] - 277s 5ms/step - loss: 0.2738 - accuracy: 0.9225 - val_loss: 0.1266 - val_accuracy: 0.9648
Epoch 5/5
60000/60000 [==============================] - 279s 5ms/step - loss: 0.2220 - accuracy: 0.9371 - val_loss: 0.0856 - val_accuracy: 0.9717
UPDATE: I let it run for a few epochs after all, and, as you can see, despite only using Fourier coefficients, it attained a validation accuracy well into the 90% range.
Play with the code, and maybe plot the learning curve of the Fourier-transformed network. One could even add a new layer to Keras, through its custom layer API, that Fourier-transforms its inputs automatically. Whether that makes any sense, just slows down training, or has no conceivable use at all, I don't know, but it might still be fun. You obviously won't get a paper, or even a blog entry, out of something silly like this, so I might as well paste it here. Maybe you can impress, or rather confuse, some ML newcomers with it. I really don't know.
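A minimal sketch of such a layer, assuming TensorFlow's Keras and tf.signal.rfft; the class name RFFTLayer is my own invention, and I haven't actually trained with it:

```python
import tensorflow as tf

class RFFTLayer(tf.keras.layers.Layer):
    """Applies a real FFT along the last axis and returns the real and
    imaginary parts concatenated as ordinary real-valued features."""
    def call(self, inputs):
        f = tf.signal.rfft(tf.cast(inputs, tf.float32))
        return tf.concat([tf.math.real(f), tf.math.imag(f)], axis=-1)

# A batch of 28x28 inputs becomes (batch, 28, 30): 15 frequency bins,
# doubled by splitting each complex coefficient into real and imaginary parts.
x = tf.random.uniform((2, 28, 28))
y = RFFTLayer()(x)
```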
But it does show, maybe, that with an appropriate architecture most neural networks don't really care how you hand them the data. That is also why the original tutorial could normalize the data points without regard for the visual impact. In fact, it is probably best not to constrain one's view and imagination by clinging too much to the metaphor of a visual context -- of a neural network "seeing" things, or even interpreting the data as anything like an image in the first place. We should let go of such narrow-minded beliefs.
“When He [God] talks of their losing their selves, He means only abandoning the clamour of self-will; once they have done that, He really gives them back all their personality, and boasts (I am afraid, sincerely) that when they are wholly His they will be more themselves than ever.”
― C.S. Lewis, The Screwtape Letters