AIxiv专栏是机器之心发布学术、技术内容的栏目。过去数年,机器之心AIxiv专栏接收报道了2000多篇内容,覆盖全球各大高校与企业的顶级实验室,有效促进了学术交流与传播。如果您有优秀的工作想要分享,欢迎投稿或者联系报道。投稿邮箱:liyazhou@jiqizhixin.com;zhaoyunfeng@jiqizhixin.com
最近,来自中国科学技术大学、北京大学、上海 AI Lab等团队的研究人员发布了引人瞩目的 ShareGPT4Video系列,旨在提升视频理解和生成能力。
论文链接: https://arxiv.org/abs/2406.04325v1 项目链接:https://sharegpt4video.github.io/ 数据集链接:https://huggingface.co/datasets/ShareGPT4Video/ShareGPT4Video 代码链接: https://github.com/ShareGPT4Omni/ShareGPT4Video Demo链接: https://huggingface.co/spaces/Lin-Chen/ShareCaptioner-Video
清晰地理解帧间的时序变化。 详细准确地描述帧内内容。 对任意长度视频的可扩展性。
Caption 1:
The video segment documented a significant event in Kochi, Kerala, where 2 buildings razed in Kochi. The broadcast began with a split-screen presentation: on one side, thick clouds of dust were seen billowing into the sky, marking the onset of the demolition process, while on the other side, reporter Gopikrishnan provided live coverage, indicated by "BREAKING NEWS" captions and a consistent timestamp of "11:10 AM." The news ticker at the bottom of the screen simultaneously ran other global events, maintaining a flow of information. As the video progresses, the split-screen footage of the razed house turns into a close-up. A notable change in the headline to "KOCHI FLATS RAZED" signaled the demolition's culmination. A brief interlude offered a visual contradiction by showcasing the flats presumably before their demolition, providing a stark before and after comparison. As the video progressed, the left building's collapse initiated a dramatic alteration in the skyline, marked by significant dust plumes. Subsequently, another building was shown partially collapsing amid debris, fully obscured by dust in seconds, with surrounding greenery remaining untouched. This transitioned into a graphic interlude featuring the "India Today" logo, briefly pausing the live footage. Resuming to the aftermath, split imagery displayed the rubble and ongoing smoke. Then, the imagery continued to juxtapose the scenes of destruction against intact high-rise buildings nearby. The narrative was augmented by the revelation that the Supreme Court directed the demolition within a broader national news context. Throughout, the report maintained a real-time approach, threading continuity and urgency across the unfolding event's documentation.
Caption 2:
The video begins with an individual seated on a gray couch in a cozy domestic setting, about to unbox a product from a red CCM-branded box placed on a white table in front of them. Initially, the person is seen drinking from a blue can, indicating a casual atmosphere. Soon after, the individual shifts attention from the can to the red box, signifying the start of the unboxing process. The red box, initially closed, gradually becomes the focal point as the person prepares to open it, conveying a build-up of anticipation. As the video progresses, the box is flipped over and then opened, revealing its content still hidden under white tissue paper adorned with prints, adding to the suspense. The individual’s engagement with the box evolves, from initially preparing to open it, to actively delving into its contents. A momentary pause in activity is captured before the anticipation culminates with the individual lifting an object from the box. This object, identifiable by a yellow label, is then examined closely by the person, indicating a thorough inspection or perusal of the product or its packaging. Throughout the video, the surrounding environment remains consistent and undisturbed, with household items like a potted plant and a wall clock maintaining the setting's homely ambiance. The camera’s perspective remains fixed, focusing on the unfolding unboxing event without any movement, thus allowing the viewer to observe the narrative closely. Another partially open brown box is visible beside the main red box, though its role or contents are not elaborated upon. The video encapsulates the anticipation, action, and reveal inherent to unboxing experiences in a home setting.