No I agree, it’s probably not worth it
ClearML has a task.set_initial_iteration , I used it as such:
checkpoint = torch.load(checkpoint_fp, map_location="cuda:0")
Checkpoint.load_objects(to_load=self.to_save, checkpoint=checkpoint)
task.set_initial_iteration(engine.state.iteration)
But still the same issue, I am not sure whether I use it correctly and if it’s a bug or not, AgitatedDove14 ? (I am using clearml 1.0.4rc1, clearml-agent 1.0.0)
Yes that’s correct - the weird thing is that the error shows the right detected region
and the agent says agent.cudnn_version = 0
And I do that each time I want to create a subtask. This way I am sure to retrieve the task if it already exists
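For illustration, a get-or-create pattern along those lines could look like the sketch below (project and task names are hypothetical; it uses Task.get_tasks and Task.create from the ClearML SDK, and Task.get_tasks matches the name as a filter):

```python
from clearml import Task

def get_or_create_subtask(project_name: str, task_name: str) -> Task:
    # Look for an existing task with this name before creating a new one
    existing = Task.get_tasks(project_name=project_name, task_name=task_name)
    if existing:
        return existing[0]
    # Nothing found: create a fresh task (no automagic logging)
    return Task.create(project_name=project_name, task_name=task_name)

subtask = get_or_create_subtask("my-project", "preprocessing-subtask")
```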
Is there any channel where we can see when new self-hosted server versions are published?
See my answer in the issue - I am not using docker
This is consistent: Each time I send a new task on the default queue, if trains-agent-1 has only one task running (the long one), it will pick another one. If I add one more experiment in the queue at that point (trains-agent-1 running two experiments at the same time), that experiment will stay in queue (trains-agent-2 and trains-agent-3 will not pick it because they also are running experiments)
(I am not part of the awesome ClearML team, just a happy user 🙂 )
AppetizingMouse58 btw I had to delete the old logs index before creating the alias, otherwise ES won’t let me create an alias with the same name as an existing index
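For anyone hitting the same thing, the sequence is roughly the following (a sketch using the plain Elasticsearch REST API via Python requests; the index and alias names are placeholders, not the actual ClearML index names):

```python
import requests

ES = "http://localhost:9200"  # assuming a locally reachable Elasticsearch instance

# 1. Delete the old index that currently owns the name we want to reuse as an alias
requests.delete(f"{ES}/old-logs-index").raise_for_status()

# 2. Create an alias with that (now free) name, pointing at the new index
requests.post(
    f"{ES}/_aliases",
    json={"actions": [{"add": {"index": "new-logs-index", "alias": "old-logs-index"}}]},
).raise_for_status()
```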
AgitatedDove14 I see that the default is sample_frequency_per_sec=2. , but in the UI I see that there isn’t such resolution (i.e. it logs every ~120 iterations, corresponding to ~30 secs). What is the difference with report_frequency_sec=30. ?
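For context, this is how such a pair of settings usually interacts, sampling at one rate and reporting an aggregate at another (only an illustration of the idea, not ClearML’s actual resource monitor code; the two helper functions are stand-ins):

```python
import random
import time

def read_gpu_utilization() -> float:
    # stand-in for an actual measurement (e.g. via NVML / nvidia-smi)
    return random.uniform(0.0, 100.0)

def report_scalar(value: float) -> None:
    # stand-in for sending the aggregated value to the server
    print(f"reported: {value:.1f}%")

sample_frequency_per_sec = 2.0   # how often a raw sample is taken
report_frequency_sec = 30.0      # how often an aggregated value is reported

samples, last_report = [], time.monotonic()
while True:
    samples.append(read_gpu_utilization())
    if time.monotonic() - last_report >= report_frequency_sec:
        report_scalar(sum(samples) / len(samples))  # e.g. the average over the window
        samples, last_report = [], time.monotonic()
    time.sleep(1.0 / sample_frequency_per_sec)
```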
Hi SuccessfulKoala55 , it's not really wrong, rather I don't understand it: the docker image with the args after it
Just found it, yeah, very cool! Thanks!
So the wheel that was working for me was this one: [torch-1.11.0+cu115-cp38-cp38-linux_x86_64.whl](https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp38-cp38-linux_x86_64.whl)
but then why do I have to do task.connect_configuration(read_yaml(conf_path))._to_dict() ?
Why not task.connect_configuration(read_yaml(conf_path)) simply?
I mean what is the benefit of returning ProxyDictPostWrite instead of a dict?
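For context, the pattern in question looks roughly like this (a sketch; read_yaml is the user's own helper, stubbed here with PyYAML, and _to_dict is the conversion mentioned above on the returned proxy object; project/task names are hypothetical):

```python
import yaml
from clearml import Task

def read_yaml(path: str) -> dict:
    # stand-in for the user's helper: load a YAML file into a plain dict
    with open(path) as f:
        return yaml.safe_load(f)

task = Task.init(project_name="my-project", task_name="config-example")

# connect_configuration returns a proxy dict that tracks edits;
# _to_dict() turns it back into a regular dict for code that expects one
config = task.connect_configuration(read_yaml("config.yaml"))._to_dict()
```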
I see what I described in https://allegroai-trains.slack.com/archives/CTK20V944/p1598522409118300?thread_ts=1598521225.117200&cid=CTK20V944 :
randomly, one of the two experiments is shown for that agent
AgitatedDove14 In theory yes there is no downside, in practice running an app inside docker inside a VM might introduce slowdowns. I guess it’s on me to check whether this slowdown is negligible or not
No space, I will add and test 🙂
yes, the only thing I changed is:
install_requires=[ ... "my-dep @ git+ ]
to:
install_requires=[ ... "git+ "]
yes, because it won’t install the local package whose setup.py has the install_requires problem described in my previous message
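To make the first variant concrete, this is what a PEP 508 direct reference typically looks like in a setup.py (the package name and repository URL are hypothetical, since the originals are truncated above; the second variant simply drops the "my-dep @" prefix and keeps only the git+ URL string):

```python
from setuptools import setup

setup(
    name="my-package",
    version="0.1.0",
    install_requires=[
        # PEP 508 direct reference: package name, '@', then the VCS URL
        "my-dep @ git+https://github.com/example/my-dep.git",
    ],
)
```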
Ok to be fair I get the same curve even when I remove clearml from the snippet, not sure why
correct, you could also use Task.create , that creates a Task but does not do any automagic.
Yes, I didn't use it so far because I didn't know what to expect since the doc states:
"Create a new, non-reproducible Task (experiment). This is called a sub-task."