
Anyhow, it seems that moving it to main() didn't help. Any ideas?
generally I include the random seed in the name
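For example, a minimal sketch of what that could look like with ClearML (the project/task names here are made up):
```
from clearml import Task

seed = 42  # whatever seed this run uses
# Embed the seed in the task name so runs are distinguishable in the UI
task = Task.init(project_name="examples", task_name=f"train_seed_{seed}")
```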
not much different from the HuggingFace version, I believe
No, they're not in TensorBoard
Then when I queue up a job on the 1x16gb queue, would it run on one of the two GPUs?
I went to https://app.pro.clear.ml/profile and looked in the bottom right. But would this tell us about the version of the server run by Dan?
Local in the sense that my team member set it up, remote to me
CostlyOstrich36 I get some weird results for "active duration".
For example, several of the experiments show that their active duration is more than 90 days, but I definitely didn't run them that long.
Reproduce the training:
```
# How to run
export CLEAR_DATA="./data/dataset_for...
```
You need to pip install the requirements first. I think the following would do: transformers, datasets, clearml, tokenizers, torch.
CLEAR_DATA has train.txt and validation.txt; the .txt files just need to have text data on separate lines. For debugging, anything should do.
For training you need tokenizer files as well: vocab.json, merges.txt, and tokenizer.json. You also need a config.json; then it should work.
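And since anything should do for debugging, here's a quick sketch to generate placeholder data (the paths are made up):
```
from pathlib import Path

# Dummy line-per-example text files, enough for a debug run
data_dir = Path("./data/debug_dataset")  # hypothetical location
data_dir.mkdir(parents=True, exist_ok=True)
(data_dir / "train.txt").write_text("\n".join(f"dummy sentence {i}" for i in range(100)))
(data_dir / "validation.txt").write_text("\n".join(f"dummy sentence {i}" for i in range(10)))
```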
Long story, but in the other thread I couldn't install the particular version of transformers unless I removed it from "Installed Packages" and added it to setup script instead. So I took to just throwing in that list of packages.
There's also https://allegro.ai/clearml/docs/rst/references/clearml_python_ref/task_module/task_task.html
SuccessfulKoala55 what's the difference between the two websites? Is one of them preferred?
Yeah! So if given a folder, it adds everything in the folder. But if given a list or iterable, it iterates over the Paths and zips them all up.
IrritableOwl63 pm'd you a task ID
Aggregating the sort of range across all the runs, maybe like a hurricane track?
This sort of behavior is what I was thinking about when I saw "wildcard or pathlib Path" listed as options
Well, I can just work around it now that I know, by creating a folder with no subfolders and uploading that. But... 🤔 perhaps allow the interface to take in a list or generator? As in:
```
files_to_upload = [f for f in output_dir.glob("*") if f.is_file()]
Task.current_task().upload_artifact("best_checkpoint", artifact_object=files_to_upload)
```
And then it could zip up the list and name it "best_checkpoint"?
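In the meantime, a sketch of the manual workaround, assuming the checkpoint files sit in ./output: zip the files yourself and upload the single archive, since upload_artifact already handles a single file fine.
```
import zipfile
from pathlib import Path

from clearml import Task

output_dir = Path("./output")  # assumption: checkpoint files live here, no subfolders
zip_path = output_dir / "best_checkpoint.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    for f in output_dir.glob("*"):
        if f.is_file() and f != zip_path:  # skip the archive itself
            zf.write(f, arcname=f.name)

# Assumes a Task was initialized earlier via Task.init()
Task.current_task().upload_artifact("best_checkpoint", artifact_object=zip_path)
```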
This discussion might be relevant, it shows how to query a Task for metrics in code: https://clearml.slack.com/archives/CTK20V944/p1626992991375500?thread_ts=1626981377.374400&cid=CTK20V944
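Roughly, the pattern from that thread would look like this (the task ID is a placeholder):
```
from clearml import Task

task = Task.get_task(task_id="0123456789abcdef")  # placeholder ID
scalars = task.get_reported_scalars()
# Nested as {title: {series: {"x": [...], "y": [...]}}}
for title, series in scalars.items():
    print(title, list(series.keys()))
```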
Yes, it trains fine. I can even look at the console output
Ah, makes sense! Have you considered adding a "this is the old website! Click here to get to the new one!" banner, kinda like on the docs for Python 2 functions? https://docs.python.org/2.7/library/string.html
Oh, and good job starting your reference with an author that goes early in the alphabetical ordering, lol:
Hopefully it works for you; getting run_mlm.py to work took me some trial and error the first time. There is a --help option for the command line, I believe. Some of the things aren't really intuitive
Yeah, that should work. Basically --train_file needs the path to train.txt, --validation_file needs the path to validation.txt, etc. I just put them all in the same folder for convenience
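Putting it together, something like this sketch should be close (the flags follow HuggingFace's run_mlm.py example script; the output dir and exact paths are assumptions):
```
import os
import subprocess

# Folder holding train.txt, validation.txt, tokenizer files, and config.json
data_dir = os.environ["CLEAR_DATA"]
subprocess.run(
    [
        "python", "run_mlm.py",
        "--train_file", os.path.join(data_dir, "train.txt"),
        "--validation_file", os.path.join(data_dir, "validation.txt"),
        "--tokenizer_name", data_dir,  # picks up vocab.json / merges.txt / tokenizer.json
        "--config_name", os.path.join(data_dir, "config.json"),
        "--do_train", "--do_eval",
        "--output_dir", "./output",  # assumption
    ],
    check=True,
)
```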
TB = TensorBoard? No idea, I haven't tried to run it with TensorBoard specifically. I can confirm that I do have TensorBoard installed in the environment, though.
It would certainly be nice to have. Lately I've heard of groups that do slices of datasets for distributed training, or who "stream" data.
Sounds doable, I will give it a try.
The `task.execute_remotely` thing is quite interesting, I didn't know about that!
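For anyone else reading, a minimal sketch of how it's used (the queue name is just the example from earlier in this thread):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")
# Everything up to here runs locally; execute_remotely() stops the local
# process and re-enqueues the task to run on an agent instead.
task.execute_remotely(queue_name="1x16gb", exit_process=True)

# From here on, the code only runs on the agent that pulled the task.
```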