Not really, because it is in the middle of the controller task; there are other things to be done afterwards (retrieving results, logging new artifacts, creating new tasks, etc.)
awesome 🙂
Maybe then we can extend task.upload_artifact?
def upload_artifact(..., wait_for_upload: bool = False):
    ...
    if wait_for_upload:
        self.flush(wait_for_uploads=True)
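Purely as an illustration of what I mean (the wait_for_upload flag does not exist yet, it is just the proposal above, and the project/task names are placeholders):
import pandas as pd
from clearml import Task

task = Task.init(project_name="example", task_name="controller")  # placeholder names
df = pd.DataFrame({"a": [1, 2, 3]})
# with the proposed flag, this call would only return once the artifact is
# actually available on the server, so the next step can safely consume it
task.upload_artifact("output", artifact_object=df, wait_for_upload=True)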
So I can no longer SSH into the agent after starting clearml-session on it
So I created a symlink in /opt/train/data -> /data
So either I specify agent.python_binary: python3.8 in the clearml-agent config as you suggested, or I force the task locally to run with python3.8 using task.data.script.binary
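For the second option, what I have in mind is roughly this (project/task names are placeholders, and I am not sure the assignment alone persists the change server-side):
from clearml import Task

task = Task.init(project_name="example", task_name="example")  # placeholder names
# force the interpreter the agent will use when it executes this task
task.data.script.binary = "python3.8"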
I assume you're using a self-hosted server?
Yes
Oh nice, thanks for pointing this out!
Sorry, what I meant is that it is not documented anywhere that the agent should run in docker mode, hence my confusion
ok, thanks SuccessfulKoala55!
This is the issue. I will make sure wait_for_status() calls reload at the end, so when the function returns you have the updated object
That sounds awesome! It will definitely fix my problem 🙂
In the meantime, I now do:
task.wait_for_status()
task._artifacts_manager.flush()
task.artifacts["output"].get()
But I still get KeyError: 'output'
... Was that normal? Will it work if I replace the second line with task.refresh()?
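What I am ultimately hoping for is something as simple as this (assuming a reload/refresh method that re-fetches the task from the server; the exact name may differ between SDK versions):
task.wait_for_status()                  # block until the task reaches a final state
task.reload()                           # re-fetch the task so task.artifacts is up to date
output = task.artifacts["output"].get()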
I'd like to move to a setup where I don't need these tricks
So it is there already, but commented out, any reason why?
Thanks AgitatedDove14!
Could we add this task.refresh() to the docs? Might be helpful for other users as well 🙂
OK! Maybe there is a middle ground: for artifacts already registered, simply return the entry, and for artifacts not yet registered, contact the server to retrieve them
Thanks AgitatedDove14! I created a project with a default output destination pointing to an S3 bucket, but I don't have local access to this bucket (only agents have access to it, for security reasons). Because of that, I cannot create a task in this project programmatically from my local machine, because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the SDK)
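The only workaround I can think of is passing output_uri explicitly when creating the task, assuming it takes precedence over the project default (I have not verified that, and all names/paths below are placeholders):
from clearml import Task

task = Task.init(
    project_name="my_project",                 # placeholder
    task_name="local_run",                     # placeholder
    output_uri="file:///tmp/clearml_output",   # a destination the local machine can reach
)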
yes, the only thing I changed is:
install_requires=[
    ...
    "my-dep @ git+
]
to:
install_requires=[
    ...
    "git+
"]
torch==1.7.1 git+
.
I am already trying with the latest pip 🙂
Hey SuccessfulKoala55, unfortunately this doesn't work, because the dict contains other dicts, and only the first-level dict becomes a plain dict; the inner dicts are still ProxyDictPostWrite and will make OmegaConf.create fail
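The workaround I am using for now is to recursively convert everything to plain dicts before handing it to OmegaConf (a rough sketch, assuming ProxyDictPostWrite behaves like a normal mapping when iterated; connected_config is a placeholder name):
from omegaconf import OmegaConf

def to_plain(obj):
    # recursively rebuild plain containers, stripping ClearML's proxy wrappers
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    return obj

conf = OmegaConf.create(to_plain(connected_config))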
This is consistent: each time I send a new task to the default queue, if trains-agent-1 has only one task running (the long one), it will pick another one. If I add one more experiment to the queue at that point (so trains-agent-1 is running two experiments at the same time), that experiment will stay in the queue (trains-agent-2 and trains-agent-3 will not pick it up because they are also running experiments)
line 13 is empty 🤔
No space; I will add one and test 🙂
hooo now I understand, thanks for clarifying AgitatedDove14!
Hi CostlyOstrich36, I am not using Hydra, only OmegaConf, so you mean just calling OmegaConf.load should be enough?
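Just to make sure we are talking about the same thing, I mean literally this (the path is a placeholder):
from omegaconf import OmegaConf

conf = OmegaConf.load("configs/train.yaml")  # placeholder path
print(OmegaConf.to_yaml(conf))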