I guess I could try and edit that, somehow. Hmm
We do have the paid tier, I believe. Anywhere we can go and read up some more on this stuff, btw?
here's console output with loss being output
Could I use "register artifact" to get it to update every time there's a new checkpoint created?
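A rough sketch of the "update on every new checkpoint" idea, in plain Python so it does not depend on any particular ClearML call. It assumes the trainer writes folders named checkpoint-<step> into one output directory; the helper name and the `upload` callback are made up for illustration. As far as I know, `register_artifact` is aimed at pandas DataFrames, so for checkpoint folders the callback would more likely wrap something like `task.upload_artifact`.

```python
import re
from pathlib import Path
from typing import Callable, List, Set

# Assumed naming scheme for checkpoint folders: checkpoint-<step>
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def upload_new_checkpoints(
    output_dir: str,
    seen: Set[str],
    upload: Callable[[str, Path], None],
) -> List[str]:
    """Call upload(name, path) once for each checkpoint folder not in `seen`.

    Returns the names of the newly handled folders, ordered by step number.
    """
    new_paths = [
        path
        for path in Path(output_dir).iterdir()
        if path.is_dir()
        and CHECKPOINT_RE.match(path.name)
        and path.name not in seen
    ]
    # Sort numerically so checkpoint-2 comes before checkpoint-10.
    new_paths.sort(key=lambda p: int(CHECKPOINT_RE.match(p.name).group(1)))
    for path in new_paths:
        upload(path.name, path)
        seen.add(path.name)
    return [path.name for path in new_paths]
```

Calling this at the end of each epoch with the same `seen` set would hand each checkpoint folder to the callback exactly once. With ClearML, the callback could be something along the lines of `lambda name, path: task.upload_artifact(name=name, artifact_object=str(path))`, though I have not verified that against the server version mentioned in this thread.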
SuccessfulKoala55 the clearml version on the server, according to my colleague, is:
clearml-agent --version
CLEARML-AGENT version 1.0.0
What I'm curious about is how ClearML hooks into that to know to upload the other artifacts such as optimizer.pt.
Well, in my particular case the training data's got, like, 200 subfolders, each with 2,000 files. I was just curious whether it was possible to pull down just one of the subsets.
Oh, that's a neat tip! I just set that in the Task settings? I didn't know that was possible
No, they're not in Tensorboard
Oh, here's an example, a screenshot I took of the files in my Colab instance:
My other question is: how does it decide what to upload automatically? It picked up almost everything, just not trainer_state.json. Which I'm actually not quite sure is necessary
And the reason is, because I have a bunch of "runs" with the same settings, and I want to compare broadly across several settings. So if I select "a bunch" with setting A I can see a general pattern when compared with setting B.
I've got 7-10 runs per setting, and about 7 or 8 settings
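To make the "compare broadly across settings" idea concrete, here is a minimal sketch that groups runs by setting and summarizes one final metric per group. Everything in it is an assumption for illustration: the function name, the (setting, value) input shape, and the numbers, which are placeholders and not real results.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_setting(runs):
    """Group (setting, metric) pairs and return {setting: (count, mean, stdev)}."""
    grouped = defaultdict(list)
    for setting, value in runs:
        grouped[setting].append(value)

    summary = {}
    for setting, values in grouped.items():
        # stdev needs at least two samples; report 0.0 for a single run.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[setting] = (len(values), mean(values), spread)
    return summary


# Placeholder values only, standing in for e.g. a final loss per run.
example_runs = [("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", 2.0), ("B", 2.0)]

if __name__ == "__main__":
    for setting, (count, avg, spread) in sorted(summarize_by_setting(example_runs).items()):
        print(f"setting {setting}: n={count} mean={avg:.3f} stdev={spread:.3f}")
```

With 7-10 runs per setting, the per-group mean and spread give a quick way to see whether setting A differs from setting B by more than the run-to-run noise.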
I think the model state is just post training loop (not inside the loop), no?
trainer_state.json gets updated every time a "checkpoint" gets saved. I've got that set to once an epoch.
My testing indicates that if the training gets interrupted, I can resume training from a saved checkpoint folder that includes trainer_state.json. It uses the info to determine which data to skip, where to pick back up again, etc
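For what it's worth, here is a small sketch of picking the checkpoint to resume from. It assumes a Hugging Face Trainer-style layout, where each checkpoint-<step> folder holds a trainer_state.json with a `global_step` field. The helper name is made up, and folders without trainer_state.json are skipped because they could not be resumed this way.

```python
import json
from pathlib import Path
from typing import Optional, Tuple


def latest_resumable_checkpoint(output_dir: str) -> Optional[Tuple[Path, int]]:
    """Return (folder, global_step) for the newest checkpoint with trainer_state.json.

    Returns None when no resumable checkpoint folder is found.
    """
    best: Optional[Tuple[Path, int]] = None
    for path in Path(output_dir).glob("checkpoint-*"):
        state_file = path / "trainer_state.json"
        if not (path.is_dir() and state_file.is_file()):
            continue  # incomplete checkpoint, nothing to resume from
        with state_file.open() as fh:
            state = json.load(fh)
        # Assumption: the state file records the step it was saved at.
        step = int(state.get("global_step", 0))
        if best is None or step > best[1]:
            best = (path, step)
    return best
```

If the training script uses the Hugging Face Trainer, the returned folder could then be passed along as `trainer.train(resume_from_checkpoint=str(folder))`.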
oooh, that's awesome lol. Never thought to do it that way
Gave it a try, it seems our GPU Queue doesn't have the S3 creds set up correctly. Making a separate thread about that
Sure, I don't seem to be having any trouble with 1.0.3rc1. As for 1.0.2, like I said, the original issue seems to have mysteriously gone away, like some sort of heisenbug that vanishes when I mess with the Notebook.
With a completely fresh notebook I added the cells to install clearml 1.0.2 and initiate a Task, and ran the notebook again, and... the issue seems to have disappeared again.
Not sure how to even replicate the original issue anymore, sorry I couldn't be of more help!
Oh, and good job starting your reference with an author that goes early in the alphabetical ordering, lol:
Or do you just want:
@misc{clearml,
  title = {ClearML - Your entire MLOps stack in one open-source tool},
  year = {2019},
  note = {Software available from },
  url = { },
  author = {ClearML},
}
Sure, if you want to give up that first-place spot! 😉
BTW, http://clear.ml has this at the bottom:
Or we could do:
@misc{clearml,
  title = {ClearML - Your entire MLOps stack in one open-source tool},
  year = {2019},
  note = {Software available from },
  url = { },
  author = {Allegro AI},
}
sounds good to me!
Oh, that's cool, didn't know about that:
This should work. It has the tokenizer files, the train.txt, the validation.txt and a config.json
