Briefly Diving Into Model Optimization (TensorFlow)

Che-Jui Huang
5 min read · Mar 13, 2022
Captured from Inside TensorFlow Series Videos

TLDR:

After completing a few Medical Imaging Computer Vision projects with TensorFlow Keras, I decided to investigate model optimization a bit further. In this article, I will briefly share my thoughts and study notes on the TensorFlow Model Optimization Toolkit. The article focuses on the importance of LOW LATENCY and REDUCED MODEL FILE SIZE. If you find it helpful, please carry on reading!

GitHub Repo:

Why Do We Need To Optimize Our Models?
What’s The Business Value Behind It?

I have two examples I would like to share. First, let’s talk about
1) self-driving cars, and then we will go into 2) medical devices.

To have cars driving on their own, we need AI to recognize roads, pathways, buildings, pedestrians, road signs, and more (you name it!). With that said, LOW LATENCY is crucial: the AI must recognize its surroundings swiftly to avoid accidents. Therefore, an optimized model that speeds up inference (a.k.a. predictions) is a must!

Next, I would like to introduce the importance of MODEL SIZE. Imagine you have a fitness watch that monitors your health status using an AI model. It is a great technology to have, but what if the app produces lagging results? For instance, you are alerted at 2:40 pm by “Sloth” (an imaginary AI bot), which says:

“Hey, your heart rate went up to 150+ at 2:30 pm. Do I need to call a doctor?” by Super Fast Sloth

That is terrible! Therefore, for medical devices, a smaller AI model is also beneficial. In fact, a smaller model loads faster, which means LOWER LATENCY. I would say these two features go hand in hand.

Photo by Roger Burkhard on Unsplash (You Need A Faster Sloth)

Introduction and Implementations

Weight Pruning

To start optimizing a Keras model, you will need the tensorflow_model_optimization Python package.

It is an official tool released by Google, so there are no worries about compatibility. So what is weight pruning? In short, weight pruning is a technique that forces a portion of your model weights to zero. A sparse weight matrix (one with many zeros) compresses much more easily and quickly, and with fewer non-zero floating-point values to store, you save a lot of storage as well!
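
Here is a minimal sketch of how the package is used, assuming a small placeholder Keras model and made-up sparsity targets and step counts (my actual notebook uses a transfer learning model instead):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model just for illustration; swap in your own Keras model.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(224 * 224 * 3,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Schedule that gradually raises sparsity from 50% to 80% over 1000 steps.
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.50,
        final_sparsity=0.80,
        begin_step=0,
        end_step=1000,
    )
}

model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
    base_model, **pruning_params
)
model_for_pruning.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Pruning needs this callback during training to advance the pruning step.
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
# model_for_pruning.fit(train_ds, epochs=2, callbacks=callbacks)

# After training, strip the pruning wrappers before saving or compressing.
final_model = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
```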

Results:

I followed the tutorial and, with some modifications, built a transfer learning model that classifies cats and dogs in a given image. Both the baseline and the pruned model reach an accuracy of 93% on the test set.

Post Training API for TensorFlow Lite Model Optimization

Above are the results of model size optimization; next, let’s look into model optimization for Mobile & IoT devices. This means converting TensorFlow models into TensorFlow Lite models.

A good thing to know is that the model converter is a built-in tool. You will need to call either
1) tf.lite.TFLiteConverter.from_saved_model
2) tf.lite.TFLiteConverter.from_keras_model

For more details on how to do it, you can visit my GitHub page, where I have well-annotated notes within the Jupyter Notebooks. Nonetheless, the general steps are (see the sketch after this list):

  1. Define A Converter
  2. Select An Optimization Strategy
  3. Define Data Types (Float16 or INT8)
  4. Provide A Representative Dataset
  5. Call converter.convert()
  6. Output the model
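
Below is a minimal sketch of those six steps for full INT8 post-training quantization. The saved model directory, output file name, input shape, and the random representative dataset are placeholders; in my notebook the representative dataset comes from the real cats-and-dogs images:

```python
import numpy as np
import tensorflow as tf

# 1. Define a converter (from_keras_model works the same way for in-memory models).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# 2. Select an optimization strategy.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# 3. + 4. Define data types and provide a representative dataset (needed for INT8).
#    For Float16 instead, set: converter.target_spec.supported_types = [tf.float16]
def representative_dataset():
    for _ in range(100):
        # Replace with real preprocessed images, e.g. batches of one 224x224 RGB image.
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# 5. Run the conversion.
tflite_model = converter.convert()

# 6. Output the model to disk.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```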

But how does this work? Essentially, the conversion shrinks the model size by determining a min-max range for your weights. Full 32-bit floating-point precision may not be necessary to represent them. By knowing the min-max of the weights, the converter can map each value onto a smaller data type within that calculated range, which greatly reduces the model size. (This is known as quantization; recall the image above.)
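
To make the min-max idea concrete, here is a rough back-of-the-envelope sketch of 8-bit affine quantization. It is not the exact scheme TFLite implements internally (which has more details, such as per-axis scales), just an illustration of mapping floats onto a small integer range:

```python
import numpy as np

# Fake "weights" for illustration.
weights = np.random.randn(1000).astype(np.float32)

# Compute a scale and zero point from the min-max range.
w_min, w_max = weights.min(), weights.max()
scale = (w_max - w_min) / 255.0
zero_point = int(np.round(-w_min / scale))

# Quantize to uint8, then dequantize back to float to inspect the error.
quantized = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

print("max reconstruction error:", np.abs(weights - dequantized).max())
```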

Captured from Inside TensorFlow Series Videos

Results:

As in section 1, I built a transfer learning model on the cats and dogs dataset for demonstration. In conclusion, we see a gigantic size reduction with the TensorFlow Lite model and a pretty awesome accuracy of 98.5%.

Conclusion:

There are more techniques that the TensorFlow Model Optimization Toolkit provides; please take some time to explore them on your own! Additionally, here is a side note that I think is quite interesting.

While I was researching and implementing these optimization techniques, I realized that running a TFLite model on PC or laptop CPUs can result in very, very poor inference-time performance!

After a couple of Stack Overflow searches, it seems that TensorFlow Lite models are designed to run on mobile CPUs and other specific hardware, so if you try to run inference on PC CPUs, you can get terrible results! Finally, if you want to build models but are worried about compatibility, you can use TensorFlow Lite Model Maker to build models that are already optimized for mobile devices.
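
For reference, this is roughly how I ran the converted model on my laptop with the built-in interpreter; the file name and the dummy input are placeholders, and this is exactly the setup where I observed slow inference on a PC CPU:

```python
import numpy as np
import tensorflow as tf

# Load the converted .tflite file and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="model_quantized.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```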
