Oh yeah, that's been bugging me for a while
Yup, I just wanted to mark it completed, honestly. But then when I run it, Colab crashes.
not much different from the HuggingFace version, I believe
Before I enqueued the job, I manually edited Installed Packages thus: `boto3 datasets clearml tokenizers torch` and added `pip install git+` to the setup script.
And the docker image is `nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu18.04`
I did all that because I've been having this other issue: https://clearml.slack.com/archives/CTK20V944/p1624892113376500
Yeah! So if given a folder, it adds everything in the folder. But if given a list or iterable, it iterates over the Paths and zips them all up.
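For illustration, a minimal sketch of that behavior, assuming the call in question is `Task.upload_artifact()` (the project name and paths are made up, and list support is my assumption to check against your clearml version's docs):

```python
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact-upload-sketch")

# Given a folder, everything inside it is added (uploaded as a single zip).
task.upload_artifact(name="all_logs", artifact_object=Path("./runs"))

# Given a list/iterable of Paths, it iterates over them and zips them all up.
task.upload_artifact(
    name="picked_logs",
    artifact_object=[Path("./runs/events.out.tfevents"), Path("./runs/config.json")],
)
```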
AgitatedDove14 yes, I called init and tensorboard is installed. It successfully uploaded the metrics from `trainer.train()`, just not from the next cell where we do `trainer.predict()`
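For reference, one workaround I could imagine if the automatic capture misses the predict step (not necessarily what's wrong here; `trainer` and `test_dataset` come from the earlier cells, and the title/series names are arbitrary):

```python
from clearml import Task

# trainer.predict() returns a PredictionOutput whose .metrics is a plain dict,
# e.g. {"test_loss": 0.42, ...}; report those scalars to the task by hand.
metrics = trainer.predict(test_dataset).metrics
logger = Task.current_task().get_logger()
for name, value in metrics.items():
    logger.report_scalar(title="predict", series=name, value=value, iteration=0)
```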
It seems to create a folder and put things into it; I was hoping to just observe the TensorBoard folder
Then I gave that folder a name.
Reproduce the training: `# How to run`
You need to pip install requirements first. I think the following would do: `transformers datasets clearml tokenizers torch`
CLEAR_DATA has train.txt and validation.txt; the .txt files just need to have text data on separate lines. For debugging, anything should do.
For training you need tokenizer files as well: vocab.json, merges.txt, and tokenizer.json.
You also need a config.json; then it should work.
`export CLEAR_DATA="./data/dataset_for...`
Oh, of course, that makes total sense
Can you share the code?
Or examples of, like, "select all experiments in project with iterations > 0"?
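Something like this client-side filter is what I have in mind (the project name is made up; assuming `Task.get_tasks()` and `get_last_iteration()` are the right calls):

```python
from clearml import Task

# Pull every task in the project, then keep the ones that actually iterated.
tasks = Task.get_tasks(project_name="my_project")
with_iterations = [t for t in tasks if (t.get_last_iteration() or 0) > 0]
print([t.name for t in with_iterations])
```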
generally I include the random seed in the name
IrritableOwl63 pm'd you a task ID
So presumably you could write a Python loop that goes through and pulls the metrics into a list, then make a plot locally. Not sure about creating a Dashboard within the ClearML web interface though!
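A rough sketch of that loop, assuming `get_reported_scalars()` is the right accessor (the project, title, and series names are placeholders):

```python
import matplotlib.pyplot as plt
from clearml import Task

# Loop over the project's tasks, pull one scalar series from each, plot locally.
for t in Task.get_tasks(project_name="my_project"):
    scalars = t.get_reported_scalars()  # {title: {series: {"x": [...], "y": [...]}}}
    series = scalars.get("eval", {}).get("loss")
    if series:
        plt.plot(series["x"], series["y"], label=t.name)
plt.legend()
plt.show()
```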
OK, I guess
```
training_args_dict = training_args.to_dict()
Task.current_task().set_parameters_as_dict(training_args_dict)
```
works, but how do I change the name from "General"?
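For anyone finding this later: one approach that might do it, assuming your clearml version's `Task.connect()` accepts a `name` argument ("TrainingArgs" is just an example section name):

```python
from clearml import Task

task = Task.current_task()
training_args_dict = training_args.to_dict()
# Connecting the dict with an explicit name files the parameters under that
# section instead of the default "General".
task.connect(training_args_dict, name="TrainingArgs")
```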
Hopefully it works for you; getting run_mlm.py to work took me some trial and error the first time. There is a --help option for the command line, I believe. Some of the things aren't really intuitive
This sort of behavior is what I was thinking about when I saw "wildcard or pathlib Path" listed as options
Yeah that should work. Basically in --train_file it needs the path to train.txt, --validation_file needs the path to validation.txt, etc. I just put them all in the same folder for convenience
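Roughly like this, for example; the extra flags and paths are just illustrative, and `python run_mlm.py --help` is the authoritative list:

```python
import os
import subprocess

data = os.environ["CLEAR_DATA"]  # folder holding train.txt / validation.txt etc.
subprocess.run([
    "python", "run_mlm.py",
    "--train_file", os.path.join(data, "train.txt"),
    "--validation_file", os.path.join(data, "validation.txt"),
    "--tokenizer_name", data,  # vocab.json / merges.txt / tokenizer.json live here
    "--config_name", os.path.join(data, "config.json"),
    "--model_type", "roberta",  # illustrative; match whatever config.json describes
    "--do_train", "--do_eval",
    "--output_dir", "./out",
], check=True)
```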
Interesting, I wasn't aware of the possibilities you outline there at the end, where you, like, programmatically pull all the results down for all the tasks. Neat!
A more complex version of this which I'm trying to figure out:
I trained a model using TaskA. I now need to pull that model down from the saved artifacts of TaskA and fine-tune it in TaskB. That fine-tuning in TaskB spits out a metric.
Is there a way to do this all elegantly? Currently my process is to manually download the model...
AgitatedDove14 I'm making some progress on this. I've currently got the situation that my training run saved all of these files, and `Task.get_task(param['TaskA']).models['output'][-1]` gets me just one of them, training_args.bin. Then -2 gets me another, rng_state.pth
If I just get `Task.get_task(param['TaskA']).models['output']`, I end up getting a huge list of, like, `[<clearml.model.Model object at 0x7fec2841c880>, <clearml.model.Model object at 0x7fec2841...`
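In case it's useful, the loop I've ended up with to see which output model is which and pull one down (assuming `param['TaskA']` holds the task ID as above; `.name`, `.url`, and `get_local_copy()` are standard on clearml Model objects):

```python
from clearml import Task

task_a = Task.get_task(task_id=param["TaskA"])

# Print every output model so you can tell which saved file is which.
for i, m in enumerate(task_a.models["output"]):
    print(i, m.name, m.url)

# Download (and cache) the one you actually want to fine-tune in TaskB.
local_path = task_a.models["output"][-1].get_local_copy()
print("downloaded to", local_path)
```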
A colleague was asking about it, and especially how hard it would be to, like, save off the "best" model instead of the last
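If it's the Trainer side they're after, these are the standard knobs (argument names per transformers' TrainingArguments; the values here are only a sketch), so the final model saved, and hence the last one ClearML registers, is the best rather than the most recent:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./out",
    evaluation_strategy="steps",   # evaluate periodically during training
    save_strategy="steps",         # checkpoint on the same schedule
    load_best_model_at_end=True,   # reload the best checkpoint when training ends
    metric_for_best_model="eval_loss",
    greater_is_better=False,       # lower eval_loss is better
)
```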

