data process for pre-training and fine-tuning #393

liuheng0111 · 2024-05-11T08:25:30Z

You mention preparing a 10M dataset. What is it composed of: panda-10m and HD-VG-130M? How much of the HD-VG dataset was used? Pre-training used 9.7M videos; does this mean the processing pipeline filtered out only 3% of the videos? Which processing steps were applied for pre-training, and which for fine-tuning? What filtering thresholds were used for each?

handsomeZhuang · 2024-05-16T03:44:52Z

Is the data processing performed at the same time as training? Why not preprocess the data offline in advance?

github-actions · 2024-05-24T01:46:53Z

This issue is stale because it has been open for 7 days with no activity.

github-actions · 2024-05-31T01:47:44Z

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot added the stale label May 24, 2024

github-actions bot closed this as not planned May 31, 2024
