Emotion Recognition using Keras


by Bhuvnesh Rana, Hrithik Katoch and Jagrit Singh

The interaction between human beings and computers becomes more natural when computers can perceive and respond to non-verbal human communication such as emotions. In recent years, deep learning has made great progress in the field of image classification. The project "Emotion Recognition using Keras" uses Convolutional Neural Networks (CNNs) to detect the emotional state of a person. A dataset of facial expressions labelled with seven emotion classes ("angry", "disgust", "fear", "happy", "neutral", "sad" and "surprise") is used for training. The labelled facial images are fed to a CNN, which is trained to determine which facial expression is being shown. Recognition can be performed on a locally stored video file or in real time through a live webcam feed using OpenCV. The web application interface is built using Flask.

Dataset

The dataset consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in every image. Each face is labelled with one of seven categories (Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral) based on the emotion shown in the facial expression.

The training set consists of 28,708 images and the test set consists of 7,178 images. The compressed version of the dataset takes 92 MB of space, whereas the uncompressed version takes 140 MB.
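For the snippets later in this post, the images are assumed to be organized into one sub-folder per emotion class, which is the layout that Keras's flow_from_directory expects:

train/
    angry/  disgust/  fear/  happy/  neutral/  sad/  surprise/
test/
    angry/  disgust/  fear/  happy/  neutral/  sad/  surprise/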

This dataset was prepared by Pierre-Luc Carrier and Aaron Courville for the Facial Expression Recognition Challenge 2013 on Kaggle.

Methodology

The CNN model is designed and trained using Keras. OpenCV's face detection classifier is used to detect faces and draw bounding boxes around them. After the model is defined, the network is trained and saved. The trained model is then deployed through a web interface built using Flask. Once the emotion recognition model is trained, the main Python script loads the trained model and saved weights, and applies the model to a locally saved video file or to a real-time video stream from a webcam.

Importing the libraries

The following libraries are imported:

import numpy as np                            # numerical arrays
import matplotlib.pyplot as plt               # plotting
import utils                                  # local helper module
import os                                     # filesystem paths
from livelossplot import PlotLossesKerasTF    # live loss/accuracy plots
import tensorflow as tf

In addition, some modules from TensorFlow's Keras API are imported, as sketched below.
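The snippets in the rest of this post rely on roughly the following imports; this is a plausible reconstruction, as the original notebook's exact import list is not shown:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, Dropout, Flatten, Dense)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau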

Creating training and validation batches

Keras is used to automatically feed data from the training and test folders and create mini-batches for training and validation (test). Some hyper-parameter settings are defined for the data loader: the image size is 48x48 and the batch size is 64. The batch size is a hyperparameter of gradient descent that controls the number of training samples processed before the model's internal parameters are updated. Data generators for both the training and validation sets are created using Keras's ImageDataGenerator class, with random horizontal flipping of the images enabled.

img_size = 48
batch_size = 64

# Training generator: shuffled batches with random horizontal flips
datagen_train = ImageDataGenerator(horizontal_flip=True)
train_generator = datagen_train.flow_from_directory("train/",
                                                    target_size=(img_size, img_size),
                                                    color_mode="grayscale",
                                                    batch_size=batch_size,
                                                    class_mode='categorical',
                                                    shuffle=True)

# Validation generator: unshuffled so predictions align with files
datagen_validation = ImageDataGenerator(horizontal_flip=True)
validation_generator = datagen_validation.flow_from_directory("test/",
                                                              target_size=(img_size, img_size),
                                                              color_mode="grayscale",
                                                              batch_size=batch_size,
                                                              class_mode='categorical',
                                                              shuffle=False)

Creating the CNN Model

A sequential CNN model is used in this project. The input first passes through 4 convolution blocks, with the number of filters gradually increasing, which is the general pattern of many convolution architectures. In each block, Convolution, Batch Normalization, ReLU (non-linearity), Max Pooling and Dropout regularization are applied to the data. At each convolution block the spatial dimensions are reduced by a factor of 2 while the number of channels roughly doubles. The output is flattened after the fourth convolution block and passed to two fully connected layers. Finally, a dense layer with softmax activation predicts the output label, which corresponds to one of the seven emotions. The Adam optimizer is used with a learning rate of 0.0005, which speeds up training to about 9 minutes per epoch. The model.summary() function outputs all the parameters the model has to learn (around 3 million in this case).

[Figure: CNN model architecture]
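A minimal sketch of such a model follows. The exact filter counts, dropout rates and dense-layer sizes are assumptions chosen to match the description above (four blocks with roughly doubling channels, two fully connected layers, and about 3 million parameters); the original architecture may differ in detail.

model = Sequential()

# Four convolution blocks: Conv -> BatchNorm -> ReLU -> MaxPool -> Dropout.
# Max pooling halves the spatial size while the filter count roughly doubles.
for i, filters in enumerate([64, 128, 256, 512]):
    if i == 0:
        model.add(Conv2D(filters, (3, 3), padding='same',
                         input_shape=(48, 48, 1)))
    else:
        model.add(Conv2D(filters, (3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

# Flatten, two fully connected layers, then a softmax over the 7 emotions.
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(7, activation='softmax'))

model.compile(optimizer=Adam(learning_rate=0.0005),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()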

Training and evaluating the Model

First, we set the number of epochs to 15. The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset. Steps per epoch are calculated by floor-dividing the number of images in the training generator by its batch size; the same is done for the validation set. Then three callbacks are added to the training. A callback is an object that can perform actions at various stages of training (e.g. at the start or end of an epoch, or before or after a single batch). The first callback is ReduceLROnPlateau, which reduces the learning rate when the validation loss has not improved for two epochs. The ModelCheckpoint callback saves the model weights whenever the validation accuracy improves. Weights are saved in HDF5 format, a grid format that is ideal for storing multi-dimensional arrays of numbers. PlotLossesKerasTF plots the training loss and accuracy per epoch in real time. Each epoch takes around 9 to 10 minutes; the first epoch takes the longest, as resources have to be allocated for the GPU and various libraries and optimization files have to be loaded. The total training time is around 2.5 hours.

epochs = 15
steps_per_epoch = train_generator.n // train_generator.batch_size
validation_steps = validation_generator.n // validation_generator.batch_size

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1,
                              patience=2, min_lr=0.00001,
                              mode='auto')
# save_best_only keeps only the weights with the best validation accuracy
checkpoint = ModelCheckpoint("model_weights.h5",
                             monitor='val_accuracy',
                             save_weights_only=True, mode='max',
                             save_best_only=True, verbose=1)
callbacks = [PlotLossesKerasTF(), checkpoint, reduce_lr]

history = model.fit(
    x=train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=callbacks
)

Saving the Model architecture as JSON

JSON is a simple file format for describing data hierarchically. The model's architecture (configuration) specifies which layers the model contains and how these layers are connected. The architecture is serialized using to_json(), which returns a JSON string, and written to model.json in the local directory.

model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

Class to output Model predictions

A FacialExpressionModel class is created to load the model from the JSON file, load the trained weights into the model, and predict facial expressions. It returns the emotion with the highest predicted probability.
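A minimal sketch of such a class, assuming the model.json and model_weights.h5 files produced above; the class-label order matches the alphabetical folder order used by flow_from_directory:

import numpy as np
from tensorflow.keras.models import model_from_json

class FacialExpressionModel:
    # Alphabetical order, matching flow_from_directory's class indices.
    EMOTIONS = ["Angry", "Disgust", "Fear", "Happy",
                "Neutral", "Sad", "Surprise"]

    def __init__(self, model_json_file, model_weights_file):
        # Rebuild the architecture from JSON, then load the trained weights.
        with open(model_json_file, "r") as json_file:
            self.model = model_from_json(json_file.read())
        self.model.load_weights(model_weights_file)

    def predict_emotion(self, img):
        # img is a preprocessed face of shape (1, 48, 48, 1).
        preds = self.model.predict(img)
        return FacialExpressionModel.EMOTIONS[int(np.argmax(preds))]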

Flask App for predictions

A Flask app is created to serve the model's prediction images directly to a web interface. A basic HTML template defines the layout of the Flask app. The camera class simply reads the image stream from the webcam, detects faces with OpenCV, adds bounding boxes around them, and converts the face regions to grayscale (from RGB) while rescaling them to 48x48.
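A minimal sketch of such a camera class, assuming the FacialExpressionModel class above and OpenCV's bundled Haar cascade for frontal faces:

import cv2
import numpy as np

facec = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
model = FacialExpressionModel("model.json", "model_weights.h5")

class VideoCamera:
    def __init__(self, source=0):
        # source=0 is the default webcam; a file path plays a saved video.
        self.video = cv2.VideoCapture(source)

    def get_frame(self):
        _, frame = self.video.read()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in facec.detectMultiScale(gray, 1.3, 5):
            # Crop each face, rescale to 48x48 and predict its emotion.
            face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
            pred = model.predict_emotion(face[np.newaxis, :, :, np.newaxis])
            cv2.putText(frame, pred, (x, y - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 0), 2)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
        _, jpeg = cv2.imencode('.jpg', frame)
        return jpeg.tobytes()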

Use the model

The main.py script is run to create the Flask app and serve the model's predictions to a web interface. The camera class sends the image stream to the pre-trained CNN model, gets the predictions back, adds labels to the video frames, and finally returns the annotated images to the web interface. The model can be applied to saved videos or run in real time through a webcam.
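A minimal sketch of such a main.py, assuming the VideoCamera class above and an index.html template that embeds the /video_feed stream:

from flask import Flask, render_template, Response

app = Flask(__name__)

def gen(camera):
    # Stream annotated frames as a multipart MJPEG response.
    while True:
        frame = camera.get_frame()
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/video_feed')
def video_feed():
    return Response(gen(VideoCamera()),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(debug=True)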


Outcome

[Plot: Accuracy per epoch]
[Plot: Loss per epoch]
[Video: Disgust emotion detected by the model on a local video]
[Video: Sad emotion detected by the model on a local video]
[Video: Surprise emotion detected by the model on a local video]
[Video: Happy emotion detected by the model on a local video]

Threats posed by Facial Recognition

Technical inaccuracies

There is worrying cause for concern, as many government-established facial recognition surveillance systems have reported high error rates. Facial recognition software has also proven to be biased against people of colour. This raises serious concerns, as it may result in the exploitation of minorities as the technology becomes more mainstream.

Lack of user consent

The primary concern citizens raise about facial recognition is the lack of user consent in the implementation process. CCTV surveillance systems are already employed by many governments around the world, and consent is usually not sought when facial data is collected in public places. This enables automated live surveillance: governments can track citizens' every move, compromising their privacy. Used carelessly, the technology turns every citizen into a walking ID card, which raises privacy, ethics, and security concerns.

Identity fraud

If facial data gets compromised, it poses huge threats to governments as well as ordinary citizens. If the security measures around facial recognition technology are not stringent enough, hackers can easily spoof other people's identities to carry out illegal activities. This may result in huge financial losses if the data required for financial transactions is stolen or duplicated.

Unclear legal or regulatory framework

Common citizens have little detailed and specific information about how facial recognition technology is used. Most countries have no specific legislation or rules regulating its use. This legal loophole opens the door to abuse: governments or business organizations can collect facial recognition data without people's knowledge or consent and use it in unapproved ways.

Unethical use

One of the significant dangers of facial recognition is the unethical use of the technology. Gathering facial data without consent is one thing; collecting it without the subject even being aware raises an even bigger debate about unethical use. Hidden cameras are employed at various places without people's knowledge. Such data can be exploited and used unethically, compromising the privacy of unaware citizens. This not only violates an individual's right to privacy but also infringes their right to information.

Data theft

Facial recognition software depends on and generates a large amount of data, so storage becomes a major concern with this technology; preventing data theft is an even bigger one. Database hacking can compromise the data of thousands, if not millions, of people, and there have been numerous instances of data theft from publicly accessible databases. Prevention of data theft should therefore be one of the priorities when implementing facial recognition technologies. Once user data is compromised, it is compromised forever, and it can be misused for a long period of time if the issue is not resolved.

Conclusion

In this project, we have constructed a CNN model to recognize the facial expressions of human beings. The model achieves an accuracy of about 64%. It can be used to detect emotions in a locally stored video file or in real time through a webcam feed.

[Figure: Accuracy and loss]

As a future enhancement, facial recognition and facial attribute analysis, including age and gender, could be added to the project to make it more useful.

The complete code associated with this project is available in the project's GitHub repository.
