What are you trying to run exactly, and what's the error? 🙂
Just to make sure I get your use-case: the agent itself is started on the host machine with the --docker flag, right?
Is there a preferred way to stop the agent?
Same agent command + --stop
GreasyPenguin14 did you use Task.set_offline() before calling anything else?
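For reference, a minimal sketch of the call order I mean (the project/task names here are just placeholders):
from clearml import Task
# enable offline mode before any other ClearML call is made
Task.set_offline(offline_mode=True)
# from here on everything is recorded locally instead of being sent to the server
task = Task.init(project_name="examples", task_name="offline run")
# ... your training code ...
task.close()
# the resulting offline session zip can later be imported with Task.import_offline_session()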
If this is the failure, are you sure you configured ssh correctly?
It looks like you've set CLEARML_ENV to an unsupported value.
Are the errors in the elastic log the same? (it's hard to see in your pasted log)
Yeah, this looks like it did finally succeed connecting to the apiserver...
Experiment names marked in red
What exactly do you need to pass to the pod for the self-signed cert?
If you keep it as I wrote it, you'll need to modify the SDK configuration as well so it knows to look for the certificate in the new place.
Hi OutrageousSheep60, the first problem is that ClearML is trying to look for /path/to/creds.json, which I assume is not a correct file path.
Is the Glue significant in initialising clearml-agent after the pod is spawned?
Nope - once the pod is spawned the glue only monitors it externally using kubectl - the same way you would - and will only clean it up if the task was explicitly aborted by the user.
The best thing to do is understand why the pod is hanging (can it be related to your apt repo? do you maybe have your own pypi repo?), and enhance the k8s glue so it can detect it and report it correctly.
Can you perhaps send the docker-compose file from the current server?
Storage certificates are handled separately.
Hi TrickySheep9 ,
The chart is being tested as we speak, and will be updated in the next few days 🙂
Can you show the logs for the apiserver pod?
In this scenario, I assume this would have to be pulled somehow from the secret manager on a ClearML remote run - how would ClearML know which user's data should be pulled from the secret manager? I assume your remote executions are using the agent's docker mode?
Hi @<1654294820488744960:profile|DrabAlligator92> ,
Regarding the model metadata, you don't need to actually construct the types, just use a list of dictionaries and they will be cast automatically, for example:
client.models.add_or_update_metadata(model="<model-id>", metadata=[{"key": "foo", "type": "int", "value": "1"}])
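In case it helps, a slightly fuller sketch (assuming you go through the APIClient helper; the key/value here are just placeholders):
from clearml.backend_api.session.client import APIClient
client = APIClient()
# each metadata entry is a plain dict - the server casts "value" according to "type"
client.models.add_or_update_metadata(
    model="<model-id>",
    metadata=[{"key": "foo", "type": "int", "value": "1"}],
)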
Hi @<1523704207914307584:profile|ObedientToad56> , I would assume that will require an integration of the engine to the clearml-serving code (and a PR 🙂 )
(BTW - in the AMI distributions, the docker-compose.yml should reside in /home/ec2-user)
Can I ask you to open a PR with this fix? 🙂
@<1528546301493383168:profile|ThoughtfulElephant4> how is the ClearML Files server configured on your machine? is it None ?
SpicyLion54 the ClearML agent will always create a venv - you can't provide your own venv, hence the best practice is using docker for that purpose 🙂
Alternatively, you can provide an extra_index_url to the agent so it will also look for packages on a different server (you can simply set up your own PyPI mirror) - see https://github.com/allegroai/clearml-agent/blob/742cbf57670815a80a0c502ef61da12521e1e71f/docs/clearml.conf#L66
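Something along these lines in the agent's clearml.conf would do it (the mirror URL here is just a placeholder for your own server):
agent {
    package_manager {
        # additional package repositories the agent will search when installing python packages
        extra_index_url: ["https://my-pypi-mirror.example.com/simple"]
    }
}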