Bringing Creativity, Agility, and Efficiency with Generative AI in Industries 24th Edition | Page 131

Advancements in Synthetic Video Generation for Autonomous Driving
Parameters                              Approach A   Approach B
FID (Fréchet inception distance) score  180          115.4
KLT (Kanade-Lucas-Tomasi) score         0.2154       0.0300
Table 5-1: KPI Evaluation A.
Approach A in Table 5-1 is trained with 4 input frames at the start, and the number of frames doubles after every 25th epoch up to a maximum of 32. That is, training starts with batches of 4 consecutive frames, and after every 25th epoch the number of input frames per batch changes to 8, then 16, and finally 32.
Similarly, Approach B starts training with 2 frames and doubles the count after every 25th epoch, up to a maximum of 32 frames.
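The frame-doubling schedule described above can be sketched as a small function. This is a minimal illustration, not the authors' training code: the function name `frames_per_batch`, the 0-based epoch numbering, and the assumption that the first doubling takes effect at epoch 25 are all ours.

```python
def frames_per_batch(epoch, start_frames, max_frames=32, epochs_per_stage=25):
    """Return the number of consecutive input frames used at a 0-based epoch.

    Assumption: the count doubles once every `epochs_per_stage` epochs
    and is capped at `max_frames`.
    """
    stage = epoch // epochs_per_stage          # how many doublings have happened
    return min(start_frames * 2 ** stage, max_frames)


# Print the frame count at the start of each 25-epoch stage.
for name, start in (("Approach A", 4), ("Approach B", 2)):
    milestones = [(e, frames_per_batch(e, start)) for e in range(0, 125, 25)]
    print(name, milestones)
```

Under these assumptions, Approach A reaches 32 frames at epoch 75, while Approach B, starting from 2 frames, reaches 32 frames one stage later, at epoch 100.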
Parameters               Approach 1                              Approach 2
                         (No generated data used for training)   (With generated + original data)
% of accurate detection  75%                                     91%
Table 5-2: KPI Evaluation B.
FID is a commonly used evaluation score for measuring how close generated images are to the originals (here, the original and the generated frames). The lower the FID value, the better the output image/video quality. An FID of 0 means there is no difference between the input and the generated image/video.
The FID score is calculated from the means (μ1, μ2) and covariances (C1, C2) of the feature vectors that a pre-trained Inception-v3 model extracts from the original and the generated images, respectively.
$$\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert^2 + \mathrm{Tr}\left( C_1 + C_2 - 2\sqrt{C_1 C_2} \right)$$
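As a rough illustration of the formula, the sketch below computes FID from two sets of feature vectors using NumPy only. It is a sketch under stated assumptions rather than the evaluation code behind Table 5-1: the helper names `feature_stats` and `fid_from_stats` are hypothetical, and random arrays stand in for the Inception-v3 features. The trace of the matrix square root is obtained from the eigenvalues of C1·C2, which are real and non-negative when both covariances are positive semi-definite.

```python
import numpy as np


def feature_stats(features):
    """Mean vector and covariance matrix of an (n_samples, dim) feature array."""
    features = np.asarray(features, dtype=np.float64)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def fid_from_stats(mu1, c1, mu2, c2):
    """FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * sqrt(C1 C2))."""
    mu1, mu2 = np.asarray(mu1, dtype=np.float64), np.asarray(mu2, dtype=np.float64)
    c1, c2 = np.asarray(c1, dtype=np.float64), np.asarray(c2, dtype=np.float64)
    diff = mu1 - mu2
    # Tr(sqrt(C1 C2)) equals the sum of the square roots of the eigenvalues
    # of C1 C2; clip tiny negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(c1 @ c2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(c1) + np.trace(c2) - 2.0 * trace_sqrt)


# Stand-in features (assumption): real pipelines would use Inception-v3 activations.
rng = np.random.default_rng(0)
original = rng.normal(size=(500, 8))
generated = rng.normal(loc=0.5, size=(500, 8))

fid_same = fid_from_stats(*feature_stats(original), *feature_stats(original))
fid_shifted = fid_from_stats(*feature_stats(original), *feature_stats(generated))
print(fid_same, fid_shifted)
```

Comparing a feature set with itself gives a value of essentially 0, matching the ideal case described in the text, while the shifted set yields a clearly larger score.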
Similarly, our proposed KLT score further reduces human intervention by scoring output quality on the temporal coherence between consecutive frames. A lower KLT score indicates greater consistency between successive frames. As with the FID score, a KLT score of 0 is the ideal case, meaning that all frames are consistent.
The KLT score is calculated from the degree of similarity, between consecutive frames, of selected optical-flow-based features.
126 March 2024