Accelerate BiT Model Even More with Quantization Using OpenVINO and NNCF

1. Introduction

In the first part of this blog series, we discussed how to use Intel®'s OpenVINO™ toolkit to accelerate inference of the Big Transfer (BiT) model for computer vision tasks. We covered the process of importing the BiT model into the OpenVINO environment, leveraging hardware optimizations, and benchmarking performance. Our results showed significant performance gains and reduced inference latency for BiT when using OpenVINO compared to the original TensorFlow implementation. With this strong baseline in place, there is still room for further optimization. In this second part, we will further improve BiT model inference with the help of OpenVINO, the Neural Network Compression Framework (NNCF), and low-precision (INT8) inference. NNCF provides sophisticated tools for neural network compression through quantization, pruning, and sparsity techniques tailored for deep learning inference. This allows BiT models to become viable for power- and memory-constrained environments where the original model size may be prohibitive. The techniques presented are applicable to many deep learning models beyond BiT.

2. Model Quantization

Model quantization is an optimization technique that reduces the precision of the weights and activations in a neural network. It converts 32-bit floating-point representations (FP32) to lower bit-widths such as 16-bit floats (FP16), 8-bit integers (INT8), or 4-bit integers (INT4). The key benefit is improved efficiency: smaller model size and faster inference. These improvements not only increase efficiency on server platforms but, more importantly, also enable deployment on resource-constrained edge devices. So, while server platform performance improves, the bigger impact is opening up entirely new deployment opportunities. Quantization transforms models from being restricted to data centers to being deployable even on low-power devices with limited compute or memory, greatly extending the reach of AI to the true edge.

Below are a few of the key model quantization concepts:

  • Precision reduction — Decreases the number of bits used to represent weights and activations. Common bit-widths: INT8, FP16. Enables smaller models (a small numeric sketch follows this list).
  • Efficiency — Compressed models are smaller and faster, leading to more efficient use of system resources.
  • Trade-offs — Balancing model compression, speed, and accuracy for the target hardware. The goal is to optimize across all fronts.
  • Techniques — Post-training quantization and quantization-aware training; the latter bakes resilience to lower precision into the model.
  • Schemes — Quantization schemes such as weight-only, activation, or combined approaches strike a balance between compressing models and preserving accuracy.
  • Preserving accuracy — Fine-tuning, calibration, and retraining maintain quality on real-world data.
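
To make the precision-reduction idea concrete, below is a minimal, self-contained sketch (plain NumPy, with made-up tensor values) of the symmetric per-tensor INT8 mapping that most 8-bit quantization schemes build on; it is illustrative only and does not reproduce NNCF's internal algorithms.

import numpy as np

# Toy FP32 tensor standing in for a layer's weights (illustrative values only).
w_fp32 = np.array([-1.7, -0.3, 0.0, 0.45, 1.2], dtype=np.float32)

# Symmetric per-tensor quantization to signed INT8: one scale, zero-point fixed at 0.
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.clip(np.round(w_fp32 / scale), -128, 127).astype(np.int8)

# Dequantize to inspect the rounding error introduced by the 8-bit representation.
w_dequant = w_int8.astype(np.float32) * scale
print("INT8 values:   ", w_int8)
print("Reconstruction:", w_dequant)
print("Max abs error: ", np.abs(w_fp32 - w_dequant).max())

Storing the INT8 tensor plus a single FP32 scale takes roughly a quarter of the memory of the original FP32 tensor, which is where the size and bandwidth savings come from.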

3. Neural Network Compression Framework (NNCF)

NNCF is a powerful tool for optimizing deep learning models, such as the Big Transfer (BiT) model, to achieve improved performance on a wide range of hardware, from edge to data center. It provides a comprehensive set of features and capabilities for model optimization, making it easy for developers to optimize models for low-precision inference. Some of the key capabilities include:

  • Support for a variety of post-training and training-time algorithms with minimal accuracy drop.
  • Seamless combination of pruning, sparsity, and quantization algorithms.
  • Support for a variety of frameworks: NNCF can be used to optimize models from TensorFlow, PyTorch, ONNX, and OpenVINO.

NNCF provides samples that demonstrate the usage of its compression algorithms for different use cases and models. See the compression results achievable with the NNCF-powered samples on the Model Zoo page. For more details, refer to the NNCF GitHub repository.
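
Before the steps below, make sure the required packages are available in your environment. The following one-liner is a hedged sketch: the version pins are assumptions based on the versions the appendix script was tested with (TensorFlow v2.12.1, OpenVINO v2023.1.0), and the packages beyond openvino and nncf are only needed by the accompanying script.

pip install "openvino==2023.1.0" nncf "tensorflow==2.12.1" tensorflow-hub pandas scikit-learn pillow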

4. BiT Classification Model Optimization with OpenVINO™

Note: Before proceeding with the following steps, ensure you have a conda environment set up. Refer to the previous blog post for detailed instructions on setting up the conda environment.

4.1. Download the BiT_M_R50x1_1 TF classification model:

wget "https://tfhub.dev/google/bit/m-r50x1/1?tf-hub-format=compressed" \
    -O bit_m_r50x1_1.tar.gz

mkdir -p bit_m_r50x1_1 && tar -xvf bit_m_r50x1_1.tar.gz -C bit_m_r50x1_1

4.2. OpenVINO™ Model Optimization:

Execute the command below inside the conda environment to generate the OpenVINO IR model files (.xml and .bin) for the bit_m_r50x1_1 model. These model files will be used for further optimization and for inference accuracy validation in the following sections.

ovc ./bit_m_r50x1_1 --output_model ./bit_m_r50x1_1/ov/fp32/bit_m_r50x1_1 \
    --compress_to_fp16 False
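
Alternatively, the same conversion can be done from Python with OpenVINO's model conversion API. This is a minimal sketch, assuming OpenVINO 2023.x where openvino.convert_model and openvino.save_model are available; it mirrors the ovc command above, including keeping the weights in FP32.

import openvino as ov

# Convert the downloaded TensorFlow SavedModel directory to an in-memory OpenVINO model.
ov_model = ov.convert_model("./bit_m_r50x1_1")

# Serialize as IR (.xml + .bin) without FP16 weight compression, matching --compress_to_fp16 False.
ov.save_model(ov_model, "./bit_m_r50x1_1/ov/fp32/bit_m_r50x1_1.xml", compress_to_fp16=False)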

5. Data Preparation

To evaluate the accuracy impact of quantization on our BiT model, we need a suitable dataset. For this, we use the ImageNet 2012 validation set, which contains 50,000 images across 1,000 classes. The ILSVRC2012 validation ground truth is used for cross-referencing model predictions during accuracy measurement.

By testing our compressed models on established data like the ImageNet validation set, we can better understand the real-world utility of our optimizations. Maintaining maximal accuracy while minimizing resource usage is crucial for edge deployment. This dataset provides a rigorous and unbiased means to validate these trade-offs.

Note: Accessing and downloading the ImageNet dataset requires registration.
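
As a preview of the data handling done in the appendix script, here is a condensed sketch of how the validation images can be paired with the ILSVRC2012 ground-truth labels and wrapped in a tf.data.Dataset; the directory and file names are placeholders matching the usage example in Appendix A.

import tensorflow as tf

# Placeholder paths; substitute your own dataset and label file locations.
dataset_dir = "./ilsvrc2012_val_ds/"
gt_labels_path = "./ground_truth_ilsvrc2012_val.txt"

# One ground-truth class index per line, ordered by validation image number.
with open(gt_labels_path) as f:
    gt_labels = [int(line.strip()) for line in f]

# Pair each image with its label via the index embedded in the file name,
# e.g. ILSVRC2012_val_00000001.JPEG -> index 1.
image_paths, labels = [], []
for path in sorted(tf.io.gfile.glob(dataset_dir + "*.JPEG")):
    idx = int(path.split("/")[-1].split("_")[-1].split(".")[0])
    image_paths.append(path)
    labels.append(gt_labels[idx - 1])

val_ds = tf.data.Dataset.from_tensor_slices((image_paths, labels))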

6. Quantization Using NNCF

In this section, we walk through the specific steps involved in quantizing the BiT model with NNCF. The quantization process involves preparing a calibration dataset and applying 8-bit quantization to the model, followed by accuracy evaluation.

6.1. Preparing the Calibration Dataset:

In this step, create an instance of the nncf.Dataset class that represents the calibration dataset. The nncf.Dataset class can be a wrapper over the framework dataset object used for model training or validation. Below is a sample code snippet of the nncf.Dataset() call with transformed data samples.

# TF Dataset split for NNCF calibration
img2012_val_split = get_val_data_split(tf_dataset_,
                                       train_split=0.7,
                                       val_split=0.3,
                                       shuffle=True,
                                       shuffle_size=50000)

img2012_val_split = img2012_val_split.map(nncf_transform).batch(BATCH_SIZE)

calibration_dataset = nncf.Dataset(img2012_val_split)

The transformation function takes a sample from the dataset and returns data that can be passed to the model for inference. Below is the code snippet of the data transform.

# Data transform function for NNCF calibration
def nncf_transform(image, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    return image

6.2. NNCF Quantization (FP32 to INT8):

Once the calibration dataset is prepared and the model object is instantiated, the next step is to apply 8-bit quantization to the model. This is done with the nncf.quantize() API, which takes the OpenVINO FP32 model generated in the previous steps along with the calibration dataset to run the quantization process. While nncf.quantize() provides numerous advanced configuration knobs, in many cases like this one it works out of the box or with minor adjustments. Below is a sample code snippet of the nncf.quantize() API call.

ov_quantized_model = nncf.quantize(ov_model,
                                   calibration_dataset,
                                   fast_bias_correction=False)

For further details, the official documentation provides a comprehensive guide to the basic quantization flow, including setting up the environment, preparing the calibration dataset, and calling the quantization API to apply 8-bit quantization to the model.
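
After nncf.quantize() returns, the INT8 model can be serialized to IR files for reuse; below is a minimal sketch using the same ov.serialize() call as the appendix script (the output path is illustrative).

import openvino.runtime as ov

# Persist the quantized model as OpenVINO IR (.xml + .bin) for later inference.
ov.serialize(ov_quantized_model, "./bit_m_r50x1_1/ov/int8/bit_m_r50x1_1_int8.xml")

Raw throughput of the FP32 and INT8 IR files can then be compared with OpenVINO's benchmark_app tool, for example benchmark_app -m <model.xml> -d CPU.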

6.3. Accuracy Evaluation

As a result of the NNCF model quantization process, an OpenVINO INT8 quantized model is generated. To evaluate the impact of quantization on model accuracy, we perform a comprehensive benchmarking comparison between the original FP32 model and the quantized INT8 model. This comparison involves measuring the accuracy of the BiT model (m-r50x1/1) on the ImageNet 2012 validation dataset. The accuracy evaluation results are shown in Table 1.

With the TensorFlow (FP32) to OpenVINO™ (FP32) model optimization, the classification accuracy remained consistent at 0.70154, confirming that conversion to the OpenVINO™ model representation does not affect accuracy. Furthermore, with NNCF quantization to an 8-bit integer model, the accuracy was impacted only marginally, by less than 0.03%, demonstrating that the quantization process did not compromise the model's classification ability.

Refer to Appendix A for the Python script bit_ov_model_quantization.py, which includes the data preparation, model optimization, NNCF quantization, and accuracy evaluation tasks.

The usage of the bit_ov_model_quantization.py script is as follows:

$ python bit_ov_model_quantization.py --help
usage: bit_ov_model_quantization.py [-h] [--inp_shape INP_SHAPE] --dataset_dir DATASET_DIR --gt_labels GT_LABELS --bit_m_tf BIT_M_TF --bit_ov_fp32 BIT_OV_FP32
                                    [--bit_ov_int8 BIT_OV_INT8]

BiT Classification model quantization and accuracy measurement

required arguments:
  --dataset_dir DATASET_DIR
                        Directory path to ImageNet2012 validation dataset
  --gt_labels GT_LABELS
                        Path to ImageNet2012 validation ds gt labels file
  --bit_m_tf BIT_M_TF   Path to BiT TF fp32 model file
  --bit_ov_fp32 BIT_OV_FP32
                        Path to BiT OpenVINO fp32 model file

optional arguments:
  -h, --help            show this help message and exit
  --inp_shape INP_SHAPE
                        N,W,H,C
  --bit_ov_int8 BIT_OV_INT8
                        Path to save BiT OpenVINO INT8 model file

7. Conclusion

The results emphasize the efficacy of OpenVINO™ and NNCF in optimizing model efficiency while minimizing computational requirements. The ability to achieve strong performance and accuracy retention, particularly when compressing models to INT8 precision, demonstrates the practicality of leveraging OpenVINO™ for deployment in a variety of environments, including resource-constrained ones. NNCF proves to be a valuable tool for practitioners seeking to balance model size and computational efficiency without substantially compromising classification accuracy, opening avenues for enhanced model deployment across diverse hardware configurations.

Notices & Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details.

No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Additional Resources

Appendix A

  • ILSVRC2012 ground truth: ground_truth_ilsvrc2012_val.txt
  • See bit_ov_model_quantization.py below for the BiT model quantization pipeline with NNCF described in this blog.
"""
Copyright (c) 2022 Intel Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""

"""
This script is tested with TensorFlow v2.12.1 and OpenVINO v2023.1.0

Usage example below (with required parameters):

python bit_ov_model_quantization.py \
    --gt_labels ./<path_to>/ground_truth_ilsvrc2012_val.txt \
    --dataset_dir ./<path-to-dataset>/ilsvrc2012_val_ds/ \
    --bit_m_tf ./<path-to-tf>/model \
    --bit_ov_fp32 ./<path-to-ov>/fp32_ir_model

"""

import os, sys
import argparse
import re
import logging

import numpy as np
import pandas as pd

import nncf
import openvino.runtime as ov
from openvino.runtime import Core

import tensorflow as tf
import tensorflow_hub as hub

from PIL import Image
from sklearn.metrics import accuracy_score

logging.basicConfig(level=logging.ERROR)

ie = Core()
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# For top-1 labels.
MAX_PREDS = 1
BATCH_SIZE = 1
IMG_SIZE = (224, 224)  # Default ImageNet image size
NUM_CLASSES = 1000     # Number of ImageNet classes

# Data transform function for NNCF calibration
def nncf_transform(image, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    return image

# Data transform function for ImageNet ds validation
def val_transform(image_path, label):
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, IMG_SIZE)
    img_reshaped = tf.reshape(image, [IMG_SIZE[0], IMG_SIZE[1], 3])
    image = tf.image.convert_image_dtype(img_reshaped, tf.float32)
    return image, label

# Validation dataset split
def get_val_data_split(tf_dataset_, train_split=0.7, val_split=0.3,
                       shuffle=True, shuffle_size=50000):
    if shuffle:
        ds = tf_dataset_.shuffle(shuffle_size, seed=12)
    else:
        ds = tf_dataset_

    train_size = int(train_split * shuffle_size)
    val_size = int(val_split * shuffle_size)
    val_ds = ds.skip(train_size).take(val_size)

    return val_ds

# OpenVINO IR model inference validation
def ov_infer_validate(model: ov.Model,
                      val_loader: tf.data.Dataset) -> list:

    model.reshape([1, IMG_SIZE[0], IMG_SIZE[1], 3])  # In case the IR was generated with dynamic batching
    compiled_model = ov.compile_model(model)
    output = compiled_model.outputs[0]

    ov_predictions = []
    for img, label in val_loader:
        pred = compiled_model(img)[output]
        ov_result = tf.reshape(pred, [-1])
        top_label_idx = np.argsort(ov_result)[-MAX_PREDS:][::-1]
        ov_predictions.append(top_label_idx)

    return ov_predictions

# OpenVINO IR model NNCF quantization
def quantize(ov_model, calibration_dataset):
    print("Started NNCF quantization process")
    ov_quantized_model = nncf.quantize(ov_model, calibration_dataset, fast_bias_correction=False)
    return ov_quantized_model

# OpenVINO FP32 IR model inference
def ov_fp32_predictions(ov_fp32_model, validation_dataset):
    # Load the OV FP32 model
    ov_model = ie.read_model(ov_fp32_model)
    print("Starting OV FP32 Model Inference...!!!")
    ov_fp32_pred = ov_infer_validate(ov_model, validation_dataset)
    return ov_fp32_pred

def nncf_quantize_int8_pred_results(ov_fp32_model, calibration_dataset,
                                    validation_dataset, ov_int8_model):

    # Load the OV FP32 model
    ov_model = ie.read_model(ov_fp32_model)

    # NNCF quantization of the OpenVINO IR model
    int8_ov_model = quantize(ov_model, calibration_dataset)
    ov.serialize(int8_ov_model, ov_int8_model)
    print("NNCF Quantization Process completed..!!!")

    ov_int8_model = ie.read_model(ov_int8_model)
    print("Starting OV INT8 Model Inference...!!!")
    ov_int8_pred = ov_infer_validate(ov_int8_model, validation_dataset)

    return ov_int8_pred

def tf_inference(tf_saved_model_path, val_loader: tf.data.Dataset):

    tf_model = tf.keras.models.load_model(tf_saved_model_path)
    print("Starting TF FP32 Model Inference...!!!")
    tf_predictions = []
    for img, label in val_loader:
        tf_result = tf_model.predict(img, verbose=0)
        tf_result = tf.reshape(tf_result, [-1])
        top_label_idx = np.argsort(tf_result)[-MAX_PREDS:][::-1]
        tf_predictions.append(top_label_idx)

    return tf_predictions

"""
Module: bit_classificaiton
Description: API to run BiT classificaiton OpenVINO IR mannequin INT8 Quantization on utilizing NNCF and
perfom accuracy metrics for TF FP32, OV FP32 and OV INT8 on ImageNet2012 Validation dataset
"""
def bit_classification(args):

    ip_shape = args.inp_shape
    if isinstance(ip_shape, str):
        ip_shape = [int(i) for i in ip_shape.split(",")]
    if len(ip_shape) != 4:
        sys.exit("Input shape error. Set shape 'N,W,H,C'. For example: '1,224,224,3'")

    # ImageNet2012 validation dataset used for TF and OV FP32 accuracy testing.
    # e.g. dataset_dir = ../dataset/ilsvrc2012_val/1.0/ + "*.JPEG"
    dataset_dir = args.dataset_dir + "*.JPEG"
    tf_dataset = tf.data.Dataset.list_files(dataset_dir)

    gt_labels = open(args.gt_labels)

    val_labels = []
    for l in gt_labels:
        val_labels.append(str(l))

    # Generating the ImageNet 2012 validation dataset dictionary (img, label)
    val_images = []
    val_labels_in_img_order = []
    for i, v in enumerate(tf_dataset):
        img_path = str(v.numpy())
        img_id = int(img_path.split('/')[-1].split('_')[-1].split('.')[0])
        val_images.append(img_path[2:-1])
        val_labels_in_img_order.append(int(re.sub(r'\n', '', val_labels[img_id - 1])))

    val_df = pd.DataFrame(data={'images': val_images, 'label': val_labels_in_img_order})

    # Converting the ImageNet2012 val dictionary into a tf.data.Dataset
    tf_dataset_ = tf.data.Dataset.from_tensor_slices((list(val_df['images'].values), val_df['label'].values))
    imgnet2012_val_dataset = tf_dataset_.map(val_transform).batch(BATCH_SIZE)

    # TF Dataset split for NNCF calibration
    img2012_val_split_for_calib = get_val_data_split(tf_dataset_, train_split=0.7,
                                                     val_split=0.3, shuffle=True,
                                                     shuffle_size=50000)

    img2012_val_split_for_calib = img2012_val_split_for_calib.map(nncf_transform).batch(BATCH_SIZE)

    # TF Model Inference
    tf_model_path = args.bit_m_tf
    print(f"TensorFlow FP32 Model {args.bit_m_tf}")
    tf_p = tf_inference(tf_model_path, imgnet2012_val_dataset)

    acc_score = accuracy_score(tf_p, val_labels_in_img_order)
    print(f"Accuracy of FP32 TF model = {acc_score}\n")

    # OpenVINO Model Inference
    print(f"OpenVINO FP32 IR Model {args.bit_ov_fp32}")
    ov_fp32_p = ov_fp32_predictions(args.bit_ov_fp32, imgnet2012_val_dataset)

    acc_score = accuracy_score(ov_fp32_p, val_labels_in_img_order)
    print(f"Accuracy of FP32 IR model = {acc_score}\n")

print("Beginning NNCF dataset Calibration....!!!")
calibration_dataset = nncf.Dataset(img2012_val_split_for_calib)

# OpenVINO IR FP32 to INT8 Mannequin Quantization with NNCF and
# INT8 predictions outcomes on validation dataset
ov_int8_p = nncf_quantize_int8_pred_results(args.bit_ov_fp32, calibration_dataset,
imgnet2012_val_dataset, args.bit_ov_int8)

print(f"OpenVINO NNCF Quantized INT8 IR Mannequin {args.bit_ov_int8}")
acc_score = accuracy_score(ov_int8_p, val_labels_in_img_order)
print(f"Accuracy of INT8 IR mannequin = {acc_score}n")

#acc_score = accuracy_score(tf_p, ov_fp32_p)
#print(f"TF Vs OV FP32 Accuracy Rating = {acc_score}")

#acc_score = accuracy_score(ov_fp32_p, ov_int8_p)
#print(f"OV FP32 Vs OV INT8 Accuracy Rating = {acc_score}")

if __name__ == "__main__":

    parser = argparse.ArgumentParser(description="BiT Classification model quantization and accuracy measurement")
    optional = parser._action_groups.pop()
    required = parser.add_argument_group("required arguments")
    optional.add_argument("--inp_shape", type=str, help="N,W,H,C", default="1,224,224,3", required=False)
    required.add_argument("--dataset_dir", type=str, help="Directory path to ImageNet2012 validation dataset", required=True)
    required.add_argument("--gt_labels", type=str, help="Path to ImageNet2012 validation ds gt labels file", required=True)
    required.add_argument("--bit_m_tf", type=str, help="Path to BiT TF fp32 model file", required=True)
    required.add_argument("--bit_ov_fp32", type=str, help="Path to BiT OpenVINO fp32 model file", required=True)
    optional.add_argument("--bit_ov_int8", type=str, help="Path to save BiT OpenVINO INT8 model file",
                          default="./bit_m_r50x1_1/ov/int8/saved_model.xml", required=False)
    parser._action_groups.append(optional)

    args = parser.parse_args()
    bit_classification(args)
