
Hello! integration in what sense? Training a model? Uploading a model to the hub? Something else?
Ah, makes sense! Have you considered adding a "this is the old website! Click here to get to the new one!" banner, kinda like on docs for python2 functions? https://docs.python.org/2.7/library/string.html
There's also https://allegro.ai/clearml/docs/rst/references/clearml_python_ref/task_module/task_task.html
SuccessfulKoala55 what's the difference between the two websites? Is one of them preferred?
Actually at this point, I'd say it's too late, you might want to just generate new credentials...
SuccessfulKoala55 the clearml-agent version on the server, according to my colleague, is:
clearml-agent --version
CLEARML-AGENT version 1.0.0
Here's the actual script I'm using
A colleague was asking about it, and especially how hard it would be to, like, save off the "best" model instead of the last
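Saving the "best" model rather than the last one mostly comes down to tracking the best eval metric seen so far and snapshotting the checkpoint whenever it improves. A minimal, library-agnostic sketch of that bookkeeping (the `BestCheckpointTracker` name and shape are my own, not from this thread):

```python
import math
import shutil
from pathlib import Path


class BestCheckpointTracker:
    """Track the best eval metric seen so far; when it improves, copy the
    matching checkpoint directory to a stable 'best' location."""

    def __init__(self, best_dir, mode="min"):
        self.best_dir = Path(best_dir)
        self.mode = mode  # "min" for losses, "max" for accuracies etc.
        self.best = math.inf if mode == "min" else -math.inf

    def improved(self, value):
        return value < self.best if self.mode == "min" else value > self.best

    def update(self, value, checkpoint_dir=None):
        """Return True (and snapshot the checkpoint, if given) when
        `value` beats the best metric seen so far."""
        if not self.improved(value):
            return False
        self.best = value
        if checkpoint_dir is not None:
            if self.best_dir.exists():
                shutil.rmtree(self.best_dir)
            shutil.copytree(checkpoint_dir, self.best_dir)
        return True
```

With the Hugging Face Trainer specifically, I believe `TrainingArguments(load_best_model_at_end=True, metric_for_best_model=...)` does this bookkeeping for you, though I haven't checked that against the transformers version pinned here.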
Yes, it trains fine. I can even look at the console output
I'm scrolling through the other thread to see if it's there
Before I enqueued the job, I manually edited Installed Packages thus:
boto3
datasets
clearml
tokenizers
torch
and added
pip install git+
to the setup script.
And the docker image is nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu18.04
I did all that because I've been having this other issue: https://clearml.slack.com/archives/CTK20V944/p1624892113376500
It seems to create a folder and put things into it, I was hoping to just observe the tensorboard folder
As in, I edit Installed Packages, delete everything there, and put that particular list of packages.
here's the console output with the loss being printed
oooh, that's awesome lol. Never thought to do it that way
Presumably the correct way to do this is to fork the transformers library, make the change, and add that version to my requirements.txt
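For reference, pinning a fork in requirements.txt is usually just a direct git URL; the org and branch below are placeholders, not the actual fork from this thread:

```
git+https://github.com/<your-org>/transformers.git@<your-branch>#egg=transformers
```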
Well, in my particular case the training data's got, like, 200 subfolders, each with 2,000 files. I was just curious whether it was possible to pull down one of the subsets.
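Pulling down just one of those subfolders is straightforward if the data lives in S3 (an assumption on my part, though boto3 is already in the Installed Packages list above): list the keys under a single prefix and download only those. A sketch, with the key filtering split out as a pure helper:

```python
from pathlib import Path


def keys_under_prefix(keys, prefix):
    """Pure helper: keep only object keys under one subfolder prefix."""
    return [k for k in keys if k.startswith(prefix) and not k.endswith("/")]


def download_subfolder(bucket, prefix, dest="data"):
    """Download every object under `prefix` from `bucket` into `dest`.
    Assumes AWS credentials are already configured for boto3."""
    import boto3  # imported here so the pure helper above has no boto3 dependency

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            target = Path(dest) / key
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, key, str(target))
```

Something like `download_subfolder("my-bucket", "train/subset-017/")` would then fetch just that one subset.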
I see a "publish" button on here, but would that make it visible on the wider internet?
What I'm curious about is how ClearML hooks into that to know to upload the other artifacts, such as optimizer.pt.
Could I use "register artifact" to get it to update every time there's a new checkpoint created?
So in theory we could hook into one of those functions and add a line to have ClearML upload that particular json we want
AgitatedDove14 I should probably have expanded my last message a bit more, to say: "Right, natanM, right now it's on http://app.pro.clear.ml, not http://app.clear.ml; can you advise, given that it is on .pro?"
OK, I added
Task.current_task().upload_artifact(name='trainer_state', artifact_object=os.path.join(output_dir, "trainer_state.json"))
after this line:
And it seems to be working.
When I was answering the question "are you using a local server", I misinterpreted it as "are you running the agents and the queue on a local workstation".
SuccessfulKoala55 I think I just realized I had a misunderstanding. I don't think we are running a local server version of ClearML, no. We have a workstation running a queue/agents, but ClearML itself is via http://app.pro.clear.ml; I don't think we have ClearML running locally. We were tracking experiments before we set up the queue and the workers and all that.
IrritableOwl63 can you confirm: we didn't set up our own server to, like, handle experiment tracking and such?