Hi @<1566596960691949568:profile|UpsetWalrus59> , I think this basically means you have an existing model and it's using it as the starting point.
you can find the different cache folders that clearml uses in ~/clearml.conf
Before anything can be injected into the instances, they need to be spun up somehow. That is done by the running application using the credentials it was given, so the credentials need to be provided to the AWS application somehow.
You certainly can do it with the python APIClient OR through the requests library
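For reference, here is a minimal sketch of the APIClient route - it assumes valid credentials are already set in ~/clearml.conf and uses tasks.get_all purely as an example endpoint; the same endpoints can also be called as plain REST requests with the requests library:

```python
# Sketch only: assumes ~/clearml.conf already holds valid API credentials.
from clearml.backend_api.session.client import APIClient

client = APIClient()

# tasks.get_all is just an illustrative call - filter arguments are placeholders
tasks = client.tasks.get_all(status=["completed"])
for t in tasks:
    print(t.id, t.name)
```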
Hi CurvedHedgehog15 , can you please provide a code snippet to play with?
AbruptHedgehog21 , can you go into the network tab of the developer tools and see which request is returning 500?
Anything in Elastic? Can you add logs of the startup of the apiserver?
Hi @<1774245260931633152:profile|GloriousGoldfish63> , what version did you deploy?
ExcitedSeaurchin87 , I think you can differentiate them by using different worker names. Try using the following environment variable when running the command: CLEARML_WORKER_NAME
I wonder, why do you want to run multiple workers on the same GPU?
Hi @<1559711623147425792:profile|PlainPelican41> , how are you running the pipeline? Where is the agent running?
SubstantialElk6 , the agent is designed to re-run the experiment in an environment as close as possible to the original. Can you please provide logs of the two experiments so we can compare? I'm not sure what the issue is. Do both computers have the same python versions?
And if you clone the same experiment and run it on the same machine, it will again download all packages?
I think if you provide an absolute path it should work 🙂
Or do you have your own code snippet that reproduces this?
I can think of two solutions:
1. Fix the local python environments and begin using virtual environments ( https://github.com/pyenv/pyenv for example)
2. Use the agent in --docker mode. You won't need to worry about python versions, but you will need to install Docker on that machine.
SucculentBeetle7 please give an example of the path that is given to you by the web interface :)
Hi @<1577830989026037760:profile|EnormousGiraffe79> , can you please provide a self-contained code snippet that reproduces this?
Hmmm I would guess something would be possible but I'm not sure how to go about it. Maybe @<1523701087100473344:profile|SuccessfulKoala55> or @<1523701994743664640:profile|AppetizingMouse58> can give some more input.
I think the issue is that the message isn't informative enough. I would suggest opening a GitHub issue on this requesting a better message. Regarding confirming - I'm not sure but this is the default behavior of Optuna. You can run a random or a grid search optimization and then you won't see those messages for sure.
What do you think?
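For illustration, a minimal sketch of switching the search strategy - project/task names, the base task id, and the metric/parameter names are all placeholders that need to match your own experiment:

```python
from clearml import Task
from clearml.automation import (
    HyperParameterOptimizer,
    RandomSearch,
    UniformParameterRange,
)

# Placeholder controller task - names are illustrative only
Task.init(
    project_name="HPO",
    task_name="random search controller",
    task_type=Task.TaskTypes.optimizer,
)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_experiment_task_id>",  # placeholder
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=1e-4, max_value=1e-1),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,  # swap back to OptimizerOptuna for Optuna-based search
    max_number_of_concurrent_tasks=2,
    total_max_jobs=20,
)

optimizer.start_locally()
optimizer.wait()
optimizer.stop()
```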
Hello MotionlessCoral18 ,
Can you please add a log with the failure?
Are you using a self hosted server or the SaaS solution?
I'm happy you found a solution 🙂
Did you set up agent.git_pass & agent.git_user in clearml.conf?
CluelessElephant89 , Hi!
It looks like there is a problem with the API server. Can you please look at the docker logs, see what errors it prints, and paste them here 🙂
Hi WittySeal28 , can you please paste here the entire console output - both what you ran and what was printed?
SoreDragonfly16 Hi, how are you saving/loading those files? You can mute both the save/load messages but not each one separately.
Also, do you see all these files as input models in UI?
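If the goal is just to silence that capture entirely, one option is to turn off the framework binding when the task is created. A minimal sketch, assuming the save/load calls come from PyTorch - use the key matching your framework:

```python
from clearml import Task

# Sketch: "pytorch" is an assumption - substitute the relevant framework key
# (e.g. "tensorflow", "xgboost"). Disabling it mutes the save and load capture
# together; they cannot be muted separately.
task = Task.init(
    project_name="examples",
    task_name="no model auto-logging",
    auto_connect_frameworks={"pytorch": False},
)
```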
Hi @<1558624430622511104:profile|PanickyBee11> , I think this might be a bug. Please open a GitHub issue to follow up on this 🙂