What does your requirements.txt look like?
So how do you attach the pytorch requirement?
Can you add the agent section of your ~/clearml.conf here?
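For reference, a minimal sketch of what that section can look like (values are placeholders, and your actual file will have many more fields):
```
agent {
    # which package manager the agent uses to install task requirements
    package_manager {
        type: pip  # or conda
    }
}
```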
Also, in the original experiment, what pytorch version is detected?
What version of python is the agent machine running locally?
Does it support torch == 1.12.1?
ExasperatedCrocodile76 , did you run the original experiment on a Linux machine with pip, while the remote machine is a Linux machine with the conda package manager?
@<1523703961872240640:profile|CrookedWalrus33> , pip install clearml==1.5.3rc1
I think that might be the issue. Transferring from the pip to the conda package manager can sometimes be problematic. Try to manually edit the requirements to reflect the settings in https://pytorch.org/
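One way to do this from code, as a rough sketch (the exact version spec here is an example, match it to the selector on pytorch.org):
```python
from clearml import Task

# pin the torch requirement before Task.init so it is recorded on the task
Task.add_requirements("torch", "==1.12.1")
task = Task.init(project_name="examples", task_name="pinned torch")
```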
Yeah, looks like it reproduces. I suggest opening a GitHub issue to get this fixed 🙂
Hi @<1825704341311328256:profile|ManiacalSeahorse6> , you mean you have some existing users with each user having some data on their workspace and you would like to basically 'merge' the workspaces together?
Anything in Elastic? Can you add logs of the startup of the apiserver?
Note that you used an env variable, I want to try the config directly first 🙂
I'm guessing you want the files locally. Please try the following two:
https://clear.ml/docs/latest/docs/references/sdk/dataset#get_mutable_local_copy
https://clear.ml/docs/latest/docs/references/sdk/dataset#get_local_copy
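A minimal sketch of both (dataset name/project are placeholders):
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# read-only cached copy managed by ClearML
local_path = ds.get_local_copy()

# writable copy extracted to a folder you control
mutable_path = ds.get_mutable_local_copy("/tmp/my_dataset_copy")
```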
Strange. Can you add your clearml.conf from the agent machine? Please make sure to obscure all secrets 🙂
Or are you trying to change something in the docker compose?
Hi @<1702492411105644544:profile|YummyGrasshopper29> , console logs are saved in Elastic. I would check the status of your Elastic container
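For example, assuming the default docker compose setup where the Elastic container is named clearml-elastic (that name is an assumption, verify it with docker ps):
```
docker ps
docker logs clearml-elastic
```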
Hi ThankfulHedgehong21 ,
What versions of ClearML & ClearML-Agent are you using?
Also, can you provide a small code snippet to play with?
Hi DepressedFox45 ,
For the agent you'll need to run clearml-agent init
Hi @<1566596960691949568:profile|UpsetWalrus59> , I don't think you can pass it to clearml-agent init since this doesn't come up as any of the prompts, BUT, you can always create the file manually with all the fields filled in. What do you think?
You mentioned you are self-deployed. When you deploy the server, one of the containers deployed is the ES container. Did you not deploy the server via docker compose?
You must perform Task.init() to have something reported 🙂
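Something along these lines (project/task names are placeholders):
```python
from clearml import Task

# creates the experiment on the server and starts auto-logging
task = Task.init(project_name="examples", task_name="my experiment")
```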
Hi @<1719162252994547712:profile|FloppyLeopard12> , not sure what you're trying to do, can you please elaborate?
DashingKoala39 , you'll need to configure each server individually 🙂
I think you can force the script diff to be empty with Task.set_script(diff="") or maybe Task.set_script(diff=None)
https://clear.ml/docs/latest/docs/references/sdk/task#set_script
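A rough sketch of the idea (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="no diff")
# clear the stored uncommitted-changes diff
task.set_script(diff="")
```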
Hi SuperiorPanda77 , I'm not sure I understand, can you elaborate on the specific use case?
Thanks for the info! This happened when you had 2 spot instances running something, correct?
It really depends on how you want to work. The --docker flag will make the agent run in docker mode, allowing it to spin up docker containers to run the jobs inside
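For example (queue name is a placeholder):
```
clearml-agent daemon --queue default --docker
```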
Do you see anything strange in the apiserver logs when restarting the server?
Hi @<1558624430622511104:profile|PanickyBee11> , how are you doing the multi node training?