Advancements in Synthetic Video Generation for Autonomous Driving
3 PROPOSALS AND CONTRIBUTIONS
The survey of the state of the art indicates the need for video-to-video synthesis solutions that address the following key aspects of the realistic data generation process:
1. Scenario generation.
2. Label changing.
3. Adding different realistic objects.
4. Avoiding morphs.
5. Reducing flickering and distortion.
Our solution focuses on these key aspects and creates an interactive environment that serves as a one-stop solution for AI rendering, scenario generation, and future video prediction. Our approach generates realistic data covering all of these aspects, while also considering how an end user would eventually use the system.
Having the right contextual information is vital for building meaningful autonomous driving datasets. There may be a requirement to convert legacy data captured with smaller field-of-view (FOV) cameras to larger-FOV cameras, or a need to generate larger datasets from the limited data available for a specific geography.
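The FOV conversion mentioned above can be illustrated with a minimal pinhole-camera sketch (this is an illustrative assumption, not the paper's actual pipeline): a narrow-FOV frame is placed into a wider-FOV canvas by rescaling pixel coordinates through the two focal lengths, leaving the periphery empty for synthesis to fill in. The function names and the nearest-neighbor resampling are hypothetical simplifications.

```python
import numpy as np

def focal_from_fov(fov_deg, width):
    # pinhole model: f = (W / 2) / tan(FOV / 2)
    return (width / 2) / np.tan(np.radians(fov_deg) / 2)

def remap_small_to_large_fov(img, fov_src_deg, fov_dst_deg):
    """Place a narrow-FOV image into a wider-FOV canvas of the same
    resolution. Pixels outside the source FOV stay black; in a full
    pipeline a generative model would inpaint that border region."""
    h, w = img.shape[:2]
    f_src = focal_from_fov(fov_src_deg, w)
    f_dst = focal_from_fov(fov_dst_deg, w)
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    xc, yc = xs - w / 2, ys - h / 2           # center the pixel grid
    # back-project destination pixels to rays, re-project with src focal
    xs_src = (xc * f_src / f_dst + w / 2).round().astype(int)
    ys_src = (yc * f_src / f_dst + h / 2).round().astype(int)
    valid = (xs_src >= 0) & (xs_src < w) & (ys_src >= 0) & (ys_src < h)
    out[ys[valid], xs[valid]] = img[ys_src[valid], xs_src[valid]]
    return out
```

Because the wider-FOV focal length is shorter, the source content lands in the center of the output frame and the uncovered border becomes the region that must be synthesized.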
On the other hand, some regions restrict the sharing of personal information such as human faces and vehicle number plates (for example, under the GDPR in the EU). Such personal information must be anonymized, and the data must then be augmented with synthetic data for training or validation. Because these requirements differ by context, region, and scenario, it is challenging for a single simulation platform to generalize across all of these parameters.
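A common way to anonymize such regions is to pixelate them once an upstream detector has located faces or number plates. The sketch below assumes the detection boxes are already available; the function name and block size are illustrative, not part of the proposed system.

```python
import numpy as np

def pixelate_regions(img, boxes, block=8):
    """Anonymize detected regions (e.g. faces, number plates) by
    pixelation. `boxes` are (x0, y0, x1, y1) pixel rectangles coming
    from an upstream detector, which is outside this sketch."""
    out = img.copy()
    for x0, y0, x1, y1 in boxes:
        patch = out[y0:y1, x0:x1]            # view into `out`
        h, w = patch.shape[:2]
        # replace each block x block tile with its mean color
        for by in range(0, h, block):
            for bx in range(0, w, block):
                tile = patch[by:by + block, bx:bx + block]
                patch[by:by + block, bx:bx + block] = (
                    tile.mean(axis=(0, 1)).astype(img.dtype)
                )
    return out
```

Pixelation (rather than blurring) is irreversible at coarse block sizes, which is why it is often preferred for GDPR-style anonymization of driving footage.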
The proposed model provides the flexibility to add limited unique real data to already available datasets with minimal additional training. Existing simulator-based approaches, by contrast, are not cost-effective: 3D asset creation with these simulators is a complex process, and the generated output often remains far from reality.
The proposed methodology fuses generative AI, deep learning, and image-processing techniques, and explores generating data from simulated, real, or fused environments. It uses a panoptic segmentation-based approach to create semantic labels. A flow-map-based methodology that takes cues from a sequence of frames helps address long-term temporal coherence, a key issue in generated data. Feeding this flow information into the generative AI network is a novel way to handle dynamic objects, scene changes, and environment variations.
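The flow-map idea can be sketched as follows, under the assumption (ours, not the paper's) that a dense per-pixel flow field between consecutive frames is available: the previous generated frame is warped along the flow and blended with the freshly generated frame, which suppresses frame-to-frame flicker. The function names, the backward-mapping convention, and the blend weight are all hypothetical simplifications of the described methodology.

```python
import numpy as np

def warp_with_flow(prev_frame, flow):
    """Warp the previous frame toward the current one using a dense
    flow map of shape (H, W, 2) holding per-pixel (dx, dy) offsets.
    Backward mapping with nearest-neighbor sampling, clamped at edges."""
    h, w = prev_frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xs_src = np.clip((xs - flow[..., 0]).round().astype(int), 0, w - 1)
    ys_src = np.clip((ys - flow[..., 1]).round().astype(int), 0, h - 1)
    return prev_frame[ys_src, xs_src]

def temporally_blend(generated, prev_generated, flow, alpha=0.7):
    """Blend the freshly generated frame with the flow-warped previous
    output; `alpha` trades per-frame sharpness for temporal stability."""
    warped = warp_with_flow(prev_generated, flow).astype(np.float32)
    blended = alpha * generated.astype(np.float32) + (1 - alpha) * warped
    return blended.astype(generated.dtype)
```

In a full pipeline the flow map would come from an optical-flow estimator over the input sequence, and the blended result (or a temporal-consistency loss built on the same warp) would condition the generative network, rather than this simple post-hoc averaging.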
Journal of Innovation 117