Bringing Creativity, Agility, and Efficiency with Generative AI in Industries 24th Edition | Page 117

Advancements in Synthetic Video Generation for Autonomous Driving
It then integrates the intermediate feature layer with features extracted from the last feature layer. These integrated features are processed by a series of residual blocks to produce higher-resolution images. This was the first detailed approach of its kind to the video-to-video translation problem. Its limitation is that it fails to address issues such as long-term temporal consistency, alternative AI renderings, and user-perspective control.
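The refinement step described above, residual blocks applied to integrated features followed by upsampling to a higher resolution, can be sketched minimally in NumPy. This is an illustration only, not the authors' implementation: the plain matrix multiplications stand in for learned convolutions, and nearest-neighbour upsampling stands in for a learned upsampling layer.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Simple residual block: two 1x1-conv-like linear maps plus a skip connection."""
    h = np.maximum(0.0, x @ w1)          # ReLU non-linearity
    return x + h @ w2                    # skip connection preserves the input signal

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))            # coarse integrated feature map
w1 = rng.standard_normal((16, 16)) * 0.1          # stand-ins for learned weights
w2 = rng.standard_normal((16, 16)) * 0.1
out = upsample2x(residual_block(feat, w1, w2))    # refined, higher-resolution map
print(out.shape)  # (16, 16, 16)
```

Stacking several such block-plus-upsample stages progressively grows the spatial resolution of the generated frame.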
World-Consistent Video-to-Video Synthesis, proposed by A. Mallya, T.-C. Wang, et al. [4], introduces a new video synthesis framework that uses all previously generated frames. This is achieved by projecting, via structure from motion, all previous frames onto the current frame. It uses a generator based on the SPADE architecture and a discriminator similar to that of video-to-video synthesis [5].
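The projection of previously generated frames onto the current frame can be illustrated with a simple backward-warping sketch. This is a toy NumPy version with nearest-neighbour sampling; in the paper, the pixel correspondences come from structure from motion rather than being hand-built as here.

```python
import numpy as np

def warp_to_current(prev_frame, corr):
    """Backward-warp prev_frame into the current view.

    corr[y, x] = (py, px): the pixel of the previous frame that is visible
    at (y, x) in the current frame (correspondences from structure from motion).
    """
    py = np.clip(corr[..., 0], 0, prev_frame.shape[0] - 1)   # clamp at borders
    px = np.clip(corr[..., 1], 0, prev_frame.shape[1] - 1)
    return prev_frame[py, px]                                # gathered guidance image

# Toy example: the camera shifted one pixel to the right between frames.
prev = np.arange(16).reshape(4, 4)
ys, xs = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
corr = np.stack([ys, xs + 1], axis=-1)      # current (y, x) saw prev (y, x+1)
guidance = warp_to_current(prev, corr)
print(guidance[0])  # [1 2 3 3] -- last column clamped at the image border
```

The warped result acts as a "guidance" image that tells the generator what the scene looked like where it has already been rendered, which is how world consistency is enforced.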
Figure 2-5: Architecture of label/flow embedding, image, and segmentation used in world-consistent video-to-video synthesis. [4]
The overall network architecture consists of a Label Embedding, a Flow Embedding, an Image Encoder, and an Image Generator, as shown in Figures 2-5 and 2-6, respectively. Label embedding uses an encoder-decoder structure to embed the input labels into distinctive features, which serve as one of the inputs to the Multi-SPADE blocks in the Image Generator. Flow embedding handles the optical-flow outputs of the previous frame, which are fed through the Multi-SPADE blocks of the Image Generator. The image and segmentation encoders encode the image and segmented frames, respectively: the image encoder uses previously generated frames, while the segmentation encoder uses the first frame's semantics. Together, these produce the inputs for the Image Generator.
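The role of a SPADE block, normalizing a feature map and then modulating it with spatially varying parameters predicted from a conditioning input, can be sketched as follows. This is a minimal NumPy illustration: `gamma` and `beta` stand in for the small learned convolutions that, in the actual model, map the label, flow, or image embeddings to modulation maps.

```python
import numpy as np

def spade_norm(x, gamma, beta, eps=1e-5):
    """Spatially-adaptive normalization (SPADE-style).

    x: (H, W, C) feature map; gamma, beta: (H, W, C) modulation maps
    predicted from a conditioning input (e.g. the label embedding).
    """
    mu = x.mean(axis=(0, 1), keepdims=True)     # per-channel statistics
    sigma = x.std(axis=(0, 1), keepdims=True)
    x_hat = (x - mu) / (sigma + eps)            # parameter-free normalization
    return gamma * x_hat + beta                 # spatially-varying modulation

rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 8, 4))           # generator feature map
gamma = rng.standard_normal((8, 8, 4))          # stand-in for a learned map of labels
beta = rng.standard_normal((8, 8, 4))
out = spade_norm(feat, gamma, beta)
print(out.shape)  # (8, 8, 4)
```

A Multi-SPADE block chains several such modulations in sequence, one per conditioning source (labels, flow-warped previous output, guidance images), which is how all of these inputs reach the Image Generator.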
112 March 2024