Abstract:
Short-form videos are a growing trend on today’s social media. Many tools let users edit video clips and mix them with audio files to create a short-form video. Many of the videos currently on platforms such as TikTok and Instagram were created by manually cutting and mixing video and audio clips. In this paper, we examine the relationship between video scenes and audio clips in order to identify the best strategy for mixing multiple video and audio files. For this purpose, we extract the beats and downbeats of the audio files and augment them with the scene changes in the video files. Next, we use the data to train two models, an LSTM and an RNN, to determine the best places to cut the video based on the audio data. Our system enables users to automatically create high-quality short-form videos by simply specifying the video clips and the audio file that should be mixed with them. We conducted experiments on 200 short-form videos. Our results illustrate the efficiency of the proposed system in determining the optimal cut scenes based on the characteristics of the selected audio file.
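As an illustration of how beats might be augmented with scene changes, the following minimal Python sketch marks each beat with 1 if a scene change falls close to it and 0 otherwise, giving one label per beat that a sequence model could be trained on. The function name `label_beats_with_cuts`, the 0.1-second tolerance, and the sample timestamps are all assumptions made for this example; the abstract does not specify the feature format actually used.

```python
from bisect import bisect_left


def label_beats_with_cuts(beat_times, cut_times, tolerance=0.1):
    """Return one 0/1 label per beat.

    A beat is labelled 1 if a scene change lies within `tolerance`
    seconds of it, else 0. The tolerance is an assumed value.
    """
    cuts = sorted(cut_times)
    labels = []
    for t in beat_times:
        i = bisect_left(cuts, t)
        # The nearest cut is either just before or just after index i.
        neighbours = cuts[max(i - 1, 0):i + 1]
        near = any(abs(c - t) <= tolerance for c in neighbours)
        labels.append(1 if near else 0)
    return labels


# Hypothetical timestamps in seconds (not taken from the paper's data).
beats = [0.5, 1.0, 1.5, 2.0, 2.5]
cuts = [1.02, 2.45]
print(label_beats_with_cuts(beats, cuts))  # [0, 1, 0, 0, 1]
```

In practice the beat and scene-change timestamps would come from an audio beat tracker and a video scene detector; here they are hard-coded to keep the sketch self-contained.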
Citation:
Hedna, M. R., Djemmal, Y., & Mershad, K. (2023, October). A Model for the Automatic Mixing of Multiple Audio and Video Clips. In 2023 International Conference on Cyberworlds (CW) (pp. 110-117). IEEE.