Yeah! So if given a folder, it adds everything in the folder. But if given a list or iterable, it iterates over the Paths and zips them all up.
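To make that concrete, here's roughly the shape of what I mean (a hypothetical helper, names made up, not the actual code):

```python
from pathlib import Path
from typing import Iterable, Union
from zipfile import ZipFile

def zip_inputs(source: Union[Path, Iterable[Path]], archive: Path) -> Path:
    # Hypothetical sketch of the behavior described above:
    # a single folder -> everything under it gets added;
    # a list/iterable of Paths -> each one is added individually.
    with ZipFile(archive, "w") as zf:
        if isinstance(source, Path) and source.is_dir():
            for f in sorted(source.rglob("*")):
                if f.is_file():
                    zf.write(f, f.relative_to(source))
        else:
            for f in source:
                zf.write(f, Path(f).name)
    return archive
```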
CostlyOstrich36 at the bottom of the screenshot it says "Compute Time: 440 days"
TB = Tensorboard? No idea, I haven't tried to run it with tensorboard specifically. I do have tensorboard installed in the environment, I can confirm that.
OK, definitely fix that in the snippet, lol
This was only a means to that end
CostlyOstrich36 nice, thanks for the link. I know that in "info" on the experiments dashboard it includes gpu_type and started/completed times, I'll give it a go based on that
Or examples of, like, "select all experiments in project with iterations > 0"?
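Something like this sketch is what I'm picturing (guessing at the details; I filter on iterations client-side since I'm not sure the server-side task_filter can express it directly):

```python
from clearml import Task

# Pull all experiments in the project, then keep the ones that actually iterated.
# (Client-side filter; not sure whether task_filter supports "iterations > 0".)
tasks = Task.get_tasks(project_name="my_project")  # placeholder project name
ran = [t for t in tasks if (t.get_last_iteration() or 0) > 0]
for t in ran:
    print(t.id, t.name, t.get_last_iteration())
```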
Ah... so there actually is a way to share it then, so long as people are signed up? How would one do this? Do I just share a link to the experiment, like https://app.pro.clear.ml/projects/b4a1875539cb4d9798529439801402ee/experiments/6f4cb4718c7c4a25b3a041c63f6ff2b4/output/execution?columns=selected&columns=type&columns=last_iteration&columns=hyperparams.Args.num_train_epochs&columns=name&columns=status&columns=users&columns=started&columns=last_update&columns=tags&columns=parent.name&colum...
CostlyOstrich36 I get some weird results for "active duration".
For example, several of the experiments show that their active duration is more than 90 days, but I definitely didn't run them that long.
I will test both! Thanks for the ideas!
the parent task ids are what I originally wanted, remember?
Martin I found a different solution (hardcoding the parent tasks by hand), but I'm curious to hear what you discover!
it's one where I reset it, and cleared out the Installed Packages to only have transformers @ git+https://github.com/huggingface/transformers@61c506349134db0a0a2fd6fb2eff8e29a2f84e79 in it.
We do have the paid tier, I believe. Anywhere we can go and read up some more on this stuff, btw?
BTW, http://clear.ml has this at the bottom:
Yeah, that should work. Basically --train_file needs the path to train.txt, --validation_file needs the path to validation.txt, etc. I just put them all in the same folder for convenience.
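Roughly how I end up calling it, if that helps (everything except the two file flags is a placeholder):

```python
import subprocess

data_dir = "data/my_corpus"  # the folder I dropped train.txt / validation.txt into
subprocess.run([
    "python", "run_mlm.py",
    "--model_name_or_path", "bert-base-uncased",   # placeholder model
    "--train_file", f"{data_dir}/train.txt",
    "--validation_file", f"{data_dir}/validation.txt",
    "--do_train",
    "--do_eval",
    "--output_dir", "mlm_output",                  # placeholder output dir
], check=True)
```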
It would certainly be nice to have. Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
So for example, I'm able to view in the UI that my finetuning task 7725f5bed94848039c68f2a3a573ded6 has an input model, and I can find the creating experiment for that. But how would I do this in code?
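What I'd guess it looks like in code (unverified sketch; in particular I'm not sure of the attribute that points back at the creating task):

```python
from clearml import Task

# Sketch: grab the finetuning task, look at its input models, and try to walk
# back to the experiment that produced each one.
task = Task.get_task(task_id="7725f5bed94848039c68f2a3a573ded6")
for model in task.get_models().get("input", []):
    # Assuming the Model object exposes the id of the task that created it
    # as `model.task` -- not 100% sure that's the right attribute name.
    creating_task = Task.get_task(task_id=model.task)
    print(model.name, "<-", creating_task.name)
```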
Oh, and good job starting your reference with an author that goes early in the alphabetical ordering, lol:
A colleague was asking about it, and especially how hard it would be to, like, save off the "best" model instead of the last
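For the "best" model part, I think the Trainer already handles it through TrainingArguments, something like this (untested sketch, values made up):

```python
from transformers import TrainingArguments

# With load_best_model_at_end the Trainer reloads the best checkpoint
# (judged by metric_for_best_model) once training finishes, so that's what
# ends up kept instead of simply the last one.
args = TrainingArguments(
    output_dir="mlm_output",            # placeholder
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```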
Then I gave that folder a name.
Ah, should've specified we've got a ClearML server running
Sure, if you want to give up that first-place spot! 😉
Well, in my particular case the training data's got, like 200 subfolders, each with 2,000 files. I was just curious whether it was possible to pull down one of the subsets
I know it's running these lines, which get defined in https://github.com/huggingface/transformers/blob/f6e254474cb4f90f8a168a599b9aaf3544c37890/src/transformers/trainer_pt_utils.py#L828:
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
Hopefully it works for you; getting run_mlm.py to work took me some trial and error the first time. There is a --help option for the command line, I believe. Some of the things aren't really intuitive.
One has an active duration of 185502. Dividing that by 60 gives you minutes... oh, I did the math wrong. Need to divide by 60 again to get hours, so that's about 51.5 hours.