Or examples of, like, "select all experiments in project with iterations > 0"?
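In other words, something like this rough sketch (filtering client-side on get_last_iteration(); "My Project" is a placeholder and I haven't tested this exact snippet):

```python
from clearml import Task

# Grab all tasks in the project, then keep only the ones that actually
# reported at least one iteration.
tasks = Task.get_tasks(project_name="My Project")
with_iterations = [t for t in tasks if t.get_last_iteration() > 0]

for t in with_iterations:
    print(t.id, t.name, t.get_last_iteration())
```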
I know the documentation says that you can give it a wildcard or pathlib Path - but I'm still not quite sure exactly how to tell it "top-level files only, not subfolders".
This sort of behavior is what I was thinking about when I saw "wildcard or pathlib Path" listed as options
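For example, a sketch of the behavior I want, assuming add_files() really does take a wildcard plus a recursive flag (names and paths here are placeholders):

```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")

# Only take files sitting directly in data/, not anything in subfolders.
ds.add_files(path="data/", wildcard="*", recursive=False)

ds.upload()
ds.finalize()
```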
I suppose I could upload 200 different "datasets", rather than one dataset with 200 folders in it, but then clearml-data search would have 200 entries in it? It seemed like a good idea to put them all in one at the time
I will test both! Thanks for the ideas!
Long story, but in the other thread I couldn't install the particular version of transformers unless I removed it from "Installed Packages" and added it to the setup script instead. So I took to just throwing in that list of packages.
I went to https://app.pro.clear.ml/profile and looked in the bottom right. But would this tell us about the version of the server run by Dan?
CostlyOstrich36 nice, thanks for the link. I know that in "info" on the experiments dashboard it includes gpu_type and started/completed times, I'll give it a go based on that
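Something along these lines is what I'll try, a sketch based on what the "info" tab shows (the started/completed field names on task.data are my guess):

```python
from clearml import Task

tasks = Task.get_tasks(project_name="My Project")
for t in tasks:
    data = t.data
    # Skip tasks that never started or never finished.
    if data.started and data.completed:
        runtime = data.completed - data.started
        print(t.name, runtime)
```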
Good point! Any pointers to API docs to start looking?
A colleague was asking about it, especially how hard it would be to, like, save off the "best" model instead of the last one.
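On the Trainer side, I think the relevant knobs are roughly these (untested sketch, values are just examples):

```python
from transformers import TrainingArguments

# Keep the best checkpoint (by eval loss) rather than just the last one.
args = TrainingArguments(
    output_dir="outputs",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```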
Hang on, CostlyOstrich36 I just noticed that there's a "project compute time" on the dashboard? Do you know how that is calculated/what that is?
sounds good to me!
OK, definitely fix that in the snippet, lol
Or at least not conveniently
CostlyOstrich36 I get some weird results, for "active duration".
For example, several of the experiments show that their active duration is more than 90 days, but I definitely didn't run them that long.
As in, I edit Installed Packages, delete everything there, and put that particular list of packages.
One has an active duration of 185502. Dividing that by 60 gives minutes. Oh, I did the math wrong: I need to divide by 60 again to get hours, so 185502 seconds is about 51.5 hours.
What I'm curious about is how ClearML hooks into that to know to upload the other artifacts, such as optimizer.pt.
Oh, here's an example, a screenshot I took of the files in my Colab instance:
I think the model state is only saved after the training loop (not inside the loop), no?
trainer_state.json gets updated every time a "checkpoint" gets saved. I've got that set to once an epoch.
My testing indicates that if the training gets interrupted, I can resume training from a saved checkpoint folder that includes trainer_state.json. It uses the info to determine which data to skip, where to pick back up again, etc
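Concretely, my resume flow looks roughly like this (the checkpoint path is just an example, and model/args/dataset are assumed to be set up as before):

```python
from transformers import Trainer

# model, training_args, and train_dataset already exist from the earlier setup;
# the path points at the checkpoint folder that contains trainer_state.json.
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train(resume_from_checkpoint="outputs/checkpoint-500")
```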
Alas, no luck. It uploaded the same things, but did not upload trainer_state.json.
I guess I could try and edit that, somehow. Hmm
So in theory we could hook into one of those functions and add a line to have ClearML upload that particular json we want
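Something like this sketch, assuming a TrainerCallback's on_save is one of "those functions" and that the checkpoint folders keep the checkpoint-&lt;step&gt; naming I see on disk:

```python
import os

from clearml import Task
from transformers import TrainerCallback

class UploadTrainerStateCallback(TrainerCallback):
    """Push trainer_state.json to ClearML whenever the Trainer saves a checkpoint."""

    def on_save(self, args, state, control, **kwargs):
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        state_file = os.path.join(checkpoint_dir, "trainer_state.json")
        task = Task.current_task()
        if task is not None and os.path.exists(state_file):
            task.upload_artifact(name="trainer_state", artifact_object=state_file)
```

Then it would just get passed in via callbacks=[UploadTrainerStateCallback()] when building the Trainer.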
This was only a means to that end
SuccessfulKoala55 I think I just realized I had a misunderstanding. I don't think we are running a local server version of ClearML, no. We have a workstation running a queue/agents, but ClearML itself is via http://app.pro.clear.ml ; I don't think we have ClearML running locally. We were tracking experiments before we set up the queue and the workers and all that.
IrritableOwl63 can you confirm - we didn't set up our own server to, like, handle experiment tracking and such?
something like this is what I'm looking for
Martin I found a different solution (hardcoding the parent tasks by hand), but I'm curious to hear what you discover!
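For reference, the hardcoding workaround looks roughly like this (IDs are placeholders; I'm assuming set_parent() accepts a plain task ID):

```python
from clearml import Task

# Manually point the child task at its parent by ID.
child = Task.get_task(task_id="CHILD_TASK_ID")
child.set_parent("PARENT_TASK_ID")
```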