Bringing Creativity, Agility, and Efficiency with Generative AI in Industries 24th Edition | Page 128

Advancements in Synthetic Video Generation for Autonomous Driving
[Figure 4-5 diagram. Stages: Training, Testing, Evaluation. Blocks: Preprocessing, Image Encoder, Label Embedding, Flow Embedding, Concatenation, Generator, Post-processing, Discriminator (Real/Fake). Metrics: KLT Score, FID Score.]
Figure 4-5: Overall system architecture.
Figure 4-5 shows the overall project architecture, in which the input labels, optical flow outputs, and guidance images are fed to a Multi SEAN-based architecture; our proposed model handles the styling part of this architecture. We use a cascading approach to improve the quality of the generated output: training starts on lower-resolution images and then shifts to higher-resolution images. The generator and discriminator architectures and Generative Adversarial Networks (GANs) are described in detail in the preceding sections.
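The cascading approach can be sketched as a simple training plan. This is a minimal illustration, not the authors' implementation: the resolutions and Adam learning rates are the ones reported in Section 4.7, while the names `Stage`, `CASCADE`, `LEARNING_RATES`, `validate_cascade`, `training_plan`, and the `steps_per_stage` parameter are hypothetical, since the text does not state how training time is split between the two resolutions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Stage:
    """One step of the cascade: a single training resolution."""
    height: int
    width: int


# Resolutions reported in Section 4.7, ordered from coarse to fine.
CASCADE: List[Stage] = [Stage(384, 768), Stage(512, 1024)]

# Adam learning rates reported in Section 4.7, keyed by network.
# The key names are illustrative assumptions, not the authors' identifiers.
LEARNING_RATES: Dict[str, float] = {
    "encoder": 0.00008,
    "generator": 0.00008,
    "discriminator": 0.0005,
}


def validate_cascade(stages: List[Stage]) -> None:
    """Check that resolution strictly grows and the aspect ratio stays fixed."""
    for prev, nxt in zip(stages, stages[1:]):
        if nxt.height * nxt.width <= prev.height * prev.width:
            raise ValueError("cascade must move from lower to higher resolution")
        if prev.height * nxt.width != nxt.height * prev.width:
            raise ValueError("all stages must share one aspect ratio")


def training_plan(
    stages: List[Stage], steps_per_stage: int
) -> Iterator[Tuple[int, Stage]]:
    """Yield (global_step, stage) pairs; the same network is reused throughout.

    steps_per_stage is a hypothetical knob: the source gives only the total
    training time, not the per-resolution budget.
    """
    validate_cascade(stages)
    step = 0
    for stage in stages:
        for _ in range(steps_per_stage):
            yield step, stage
            step += 1
```

In a real training loop, each yielded stage would determine the size to which the input labels, flow maps, and guidance images are resized before the forward pass, while the network weights carry over unchanged from the lower-resolution stage to the higher-resolution one.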
4.7 TRAINING AND VALIDATION
Our proposed model is trained and validated on Google Colaboratory 22. The Adam 23 optimizer is used with a learning rate of 0.00008 for the encoder and generator networks and a learning rate of 0.0005 for the discriminators described above. This study uses the Cityscapes 24 dataset for training. The images are resized to 384x768 pixels and 512x1024 pixels, and the model is trained in a single-GPU environment, which takes approximately 100 hours. This study adopts a cascading approach, training the same network first on 384x768-pixel images and then on 512x1024-pixel images. Our model is tested on CamVid 25 with the data generated from CARLA, CARLA ROS Bridge, CarSim, MATLAB, and UnrealCV (Computer Vision). For the segmentation data
22 https://colab.research.google.com/
23 https://arxiv.org/abs/1412.6980
24 https://arxiv.org/abs/1604.01685
25 https://www.sciencedirect.com/science/article/abs/pii/S0167865508001220
Journal of Innovation 123