data process for pre-training and fine-tuning #393

Closed
liuheng0111 opened this issue May 11, 2024 · 3 comments

@liuheng0111

image

Here you mention preparing a 10M dataset. What is it composed of: panda-10m and HD-VG-130M? How much of the HD-VG dataset was used? Pre-training used 9.7M videos. Does this mean the processing pipeline filtered out only 3% of the videos? Which processing steps were applied for pre-training and which for fine-tuning, and what filtering thresholds were used in each case?
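
For reference, the 3% figure comes from the two numbers quoted above. A minimal sketch of the arithmetic, assuming the "10M" figure is the pre-filter video count and "9.7M" is what remains after filtering (an assumption; the thread does not confirm it, and the function name is only illustrative):

```python
def filtered_fraction(raw_count: int, kept_count: int) -> float:
    """Return the fraction of samples removed between two pipeline stages."""
    if raw_count <= 0:
        raise ValueError("raw_count must be positive")
    if not 0 <= kept_count <= raw_count:
        raise ValueError("kept_count must be between 0 and raw_count")
    return (raw_count - kept_count) / raw_count


# Assumed inputs: the "10M" prepared dataset and the "9.7M" pre-training videos.
raw_videos = 10_000_000
kept_videos = 9_700_000

fraction = filtered_fraction(raw_videos, kept_videos)
print(f"filtered out: {fraction:.1%}")  # prints "filtered out: 3.0%"
```

If the 10M figure were already a post-filter or rounded count, the actual rejection rate of the pipeline could be quite different.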

@handsomeZhuang

image

Is the data processing done at the same time as training? Why not preprocess the data offline in advance?

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label May 24, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned May 31, 2024
2 participants