Briefly Diving Into Model Optimization (TensorFlow)

Che-Jui Huang
5 min read · Mar 13, 2022
Captured from Inside TensorFlow Series Videos

TLDR:

After completing a few Medical Imaging Computer Vision projects with TensorFlow Keras, I decided to investigate model optimization a bit further. In this article, I will briefly share my thoughts and study notes on the TensorFlow Model Optimization Toolkit. The article focuses on the importance of LOW LATENCY and REDUCED MODEL FILE SIZE. If you find it helpful, please carry on reading!

GitHub Repo:

Why Do We Need To Optimize Our Models?
What’s The Business Value Behind It?

I have two examples I would like to share. First, let’s talk about
1) self-driving cars, and then we will go into 2) medical devices.

To have cars driving on their own, we need AI to recognize roads, pathways, buildings, pedestrians, road signs, and more (you name it!). With that said, LOW LATENCY is crucial: the AI must recognize its surroundings swiftly to avoid accidents. Therefore, an optimized model that speeds up inference (a.k.a. predictions) is a must!

Next, I would like to introduce the importance of MODEL SIZE. Imagine you have a fitness watch that monitors your health status using an AI model. It is a great technology to have, but what if the app produces lagging results? For instance, you are alerted at 2:40 pm by “Sloth” (an imaginary AI bot), which says:

“Hey, your heart rate went up to 150+ at 2:30 pm. Do I need to call a doctor?” by Super Fast Sloth

That is terrible! Therefore, for medical devices, a smaller AI model is also beneficial. In fact, a smaller model loads faster, which means LOWER LATENCY. I would say these two features go hand in hand.

Photo by Roger Burkhard on Unsplash (You Need A Faster Sloth)

Introduction and Implementations

Weight Pruning

To start optimizing a Keras model, you will need the tensorflow_model_optimization Python package.

It is an official tool released by Google, so there are no worries about compatibility. So what is weight pruning? In short, weight pruning is a technique that forces a portion of your model weights to zero. A sparse weight matrix (one with many zeros) compresses much more easily and quickly, and with fewer non-zero floating-point values to store, you save a lot of storage as well!
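
Here is a minimal sketch of how the package is used, assuming a small placeholder Keras model and made-up sparsity targets and step counts (my actual notebook uses a transfer learning model instead):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model just for illustration; swap in your own Keras model.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(224 * 224 * 3,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Schedule that gradually raises sparsity from 50% to 80% over 1000 steps.
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.50,
        final_sparsity=0.80,
        begin_step=0,
        end_step=1000,
    )
}

model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
    base_model, **pruning_params
)
model_for_pruning.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Pruning needs this callback during training to advance the pruning step.
callbacks = [tfmot.sparsity.keras.UpdatePruningStep()]
# model_for_pruning.fit(train_ds, epochs=2, callbacks=callbacks)

# After training, strip the pruning wrappers before saving or compressing.
final_model = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
```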

Results:

I followed the tutorial and, with some modifications, built a transfer learning model that classifies cats and dogs in a given image. Both the baseline and the pruned model reach an accuracy of 93% on the test set.

Post Training API for TensorFlow Lite Model Optimization

Above are the results of model size optimization; next, let’s look into model optimization for Mobile & IoT devices. This means converting TensorFlow models into TensorFlow Lite models.

A good thing to know is that the model converter is a built-in tool. You will need to call either
1) tf.lite.TFLiteConverter.from_saved_model
2) tf.lite.TFLiteConverter.from_keras_model

For more details on how to do it, you can visit my GitHub page, where I have well-annotated notes within the Jupyter Notebooks. Nonetheless, the general steps are (see the sketch after this list):

  1. Define A Converter
  2. Select An Optimization Strategy
  3. Define Data Types (Float16 or INT8)
  4. Provide A Representative Dataset
  5. Call converter.convert()
  6. Output the model
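
Below is a minimal sketch of those six steps for full INT8 post-training quantization. The saved model directory, output file name, input shape, and the random representative dataset are placeholders; in my notebook the representative dataset comes from the real cats-and-dogs images:

```python
import numpy as np
import tensorflow as tf

# 1. Define a converter (from_keras_model works the same way for in-memory models).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# 2. Select an optimization strategy.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# 3. + 4. Define data types and provide a representative dataset (needed for INT8).
#    For Float16 instead, set: converter.target_spec.supported_types = [tf.float16]
def representative_dataset():
    for _ in range(100):
        # Replace with real preprocessed images, e.g. batches of one 224x224 RGB image.
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# 5. Run the conversion.
tflite_model = converter.convert()

# 6. Output the model to disk.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```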

But how does this work? Essentially, the conversion shrinks the model size by determining a min-max range for your weights. Full 32-bit floating-point precision may not be necessary to represent them. By knowing the min-max of the weights, the converter can map each value onto a smaller data type within that calculated range, which greatly reduces the model size. (This is known as quantization; recall the image above.)
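
To make the min-max idea concrete, here is a rough back-of-the-envelope sketch of 8-bit affine quantization. It is not the exact scheme TFLite implements internally (which has more details, such as per-axis scales), just an illustration of mapping floats onto a small integer range:

```python
import numpy as np

# Fake "weights" for illustration.
weights = np.random.randn(1000).astype(np.float32)

# Compute a scale and zero point from the min-max range.
w_min, w_max = weights.min(), weights.max()
scale = (w_max - w_min) / 255.0
zero_point = int(np.round(-w_min / scale))

# Quantize to uint8, then dequantize back to float to inspect the error.
quantized = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

print("max reconstruction error:", np.abs(weights - dequantized).max())
```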

Captured from Inside TensorFlow Series Videos

Results:

As in section 1, I built a transfer learning model on the cats and dogs dataset for demonstration. In conclusion, we see a gigantic size reduction with the TensorFlow Lite model and a pretty awesome accuracy of 98.5%.

Conclusion:

There are more techniques that the TensorFlow Model Optimization Toolkit provides; please take some time to explore them on your own! Additionally, here is a side note that I think is quite interesting.

While I was researching and implementing these optimization techniques, I realized that running a TFLite model on PC or laptop CPUs can result in very, very poor inference-time performance!

After a couple of Stack Overflow searches, it seems that TensorFlow Lite models are designed to run on mobile CPUs and other specific hardware, so if you try to run inference on PC CPUs, you can get terrible results! Finally, if you want to build models but are worried about compatibility, you can use TensorFlow Lite Model Maker to build models that are already optimized for mobile devices.
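
For reference, this is roughly how I ran the converted model on my laptop with the built-in interpreter; the file name and the dummy input are placeholders, and this is exactly the setup where I observed slow inference on a PC CPU:

```python
import numpy as np
import tensorflow as tf

# Load the converted .tflite file and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="model_quantized.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Dummy input matching the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction)
```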
