Absolute sense! Thanks a lot Martin, I thought it was being done by the backend!
Thanks, adding environment variables to the agentservice solved it, but for the agentgroup agent I can't see any obvious way to inject environment variables. In the Helm chart template I don't see any way to pass custom environment variables to the pod.
You mean as output target for artifacts?
For example, for some of our models we create pdf reports, that we save in a folder in the NFS disk.
Next week I can take some screenshots if you need them, I just closed the laptop and will be off for a couple of days :))
OK, so... when executed locally "train" prints:
` train:
     SepalLength  SepalWidth  PetalLength  PetalWidth  Species
122          7.7         2.8          6.7         2.0      2.0
86           6.7         3.1          4.7         1.5      1.0
59           5.2         2.7          3.9         1.4      1.0
4            5.0         3.6          1.4         0.2      0.0
77           6.7         3.0          5.0         1.7      1.0
..           ...         ...          ...         ...   ......
`
My local clearml.conf is:
`
# ClearML SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server: host
    web_server: host
    files_server: host
    # Credentials are generated using the webapp,
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {"access_key": "access_key", "secret_key": "secret_key"}
}
sdk {
    # ClearML - default SDK configuration
    storage {
    ...
`
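The comment in the config above says the credentials can be overridden through the environment variables CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY. Below is a minimal Python sketch of that precedence. `resolve_credentials` is a hypothetical helper written only for illustration, not part of the ClearML SDK, and the dict stands in for the parsed `credentials` entry (with the same placeholder values as above).

```python
import os


def resolve_credentials(conf_credentials, environ=None):
    """Return (access_key, secret_key), preferring environment variables.

    Hypothetical helper for illustration only: it mimics the precedence
    described in the clearml.conf comment, it is not ClearML's own code.
    """
    env = os.environ if environ is None else environ
    access = env.get("CLEARML_API_ACCESS_KEY") or conf_credentials.get("access_key")
    secret = env.get("CLEARML_API_SECRET_KEY") or conf_credentials.get("secret_key")
    return access, secret


# Placeholder values, as they appear in the clearml.conf above.
conf = {"access_key": "access_key", "secret_key": "secret_key"}

# No override set: the values from the file are used.
print(resolve_credentials(conf, environ={}))

# Environment variables set: they take precedence over the file.
print(resolve_credentials(conf, environ={
    "CLEARML_API_ACCESS_KEY": "env_access",
    "CLEARML_API_SECRET_KEY": "env_secret",
}))
```

If the agent pod and the local machine print different results here, the difference would come from the pod's environment rather than from the file itself.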
Hi Josh, the agents are running on top of K8s (I used the helm chart to deploy them, it uses K8s glue).
I'll add a sleep so that I have time to enter the pod, and get the clearml.conf and will send you the diff in a few minutes
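One way to produce that diff once both files are at hand is Python's standard `difflib`. This is only a sketch: the two config strings are made-up stand-ins for the local file and the one copied out of the pod, and `conf_diff` is a hypothetical helper name.

```python
import difflib


def conf_diff(local_text, pod_text):
    """Return a unified diff between two clearml.conf contents as one string."""
    lines = difflib.unified_diff(
        local_text.splitlines(),
        pod_text.splitlines(),
        fromfile="local/clearml.conf",
        tofile="pod/clearml.conf",
        lineterm="",
    )
    return "\n".join(lines)


# Made-up contents, for illustration only.
local = "api {\n    api_server: host\n    files_server: host\n}"
pod = "api {\n    api_server: host\n    files_server: other-host\n}"

print(conf_diff(local, pod))
```

Identical files give an empty string, so an empty result means the two configurations match line for line.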
using the --set you advised above, right?
thanks a lot 🙂 that was quick 🙂
perfect, let me try 🙂 (thanks a lot for all the help!)
just to understand well the problems you helped me fix:
for Elasticsearch, it looked like I wasn't running the cluster with enough memory
but what happened to the FileServer? and how can I prevent it from happening in a potential "production" deployment?
thanks a lot! So as long as we have the StorageClass in our Kubernetes cluster configured correctly, the new Helm chart should work out of the box?
many thanks 🙂 I am going to play with ClearML a little bit and re-read carefully the thread to learn something from what you made me do today!
And yes, I am using the agents that come with the Helm chart from the ClearML repository
well there are already processes in place.. we aim at migrating everything to ClearML, but we hoped we could do it gradually
(though so far I am not quite managing to make it work even using the right hosts and ports)
Hi Jake, unfortunately I realized we put a load balancer in front, so any address like address.domain would respond to ping
is there a way I can check whether the apiserver is reachable?
(like: https://clearml-apiserver.ds.bumble.dev/health http://ds.bumble.dev/health )
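Since a ping only shows that the load balancer answers, an HTTP request says more about whether something is actually serving behind the address. Below is a minimal sketch using only Python's standard library. `http_status` is a hypothetical helper, and whether `/health` is a valid path on the apiserver is an assumption taken from the message above, not something confirmed here.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for `url`, or None if no HTTP response came back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx / 5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS problem, ...
        return None
```

For example, `http_status("https://clearml-apiserver.ds.bumble.dev/health")` returns a status code if anything answers over HTTP and `None` otherwise. A status code alone does not prove the apiserver itself replied, because the load balancer may produce its own responses.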
I am not aware of how clearml-dataset works, but I'll have a look 🙂
I can ping it without issues, but I am not sure if the communications are set correctly
OK I could connect with the SDK, so everything is working, I'd just like to get the right hosts shown in the UI when a new token is created
great! thanks a lot!
and one more question, in the values, I also see the values for the default tokens:
` credentials:
  apiserver:
    # -- Set for apiserver_key field
    accessKey: "5442F3443MJMORWZA3ZH"
    # -- Set for apiserver_secret field
    secretKey: "BxapIRo9ZINi8x25CRxz8Wdmr2pQjzuWVB4PNASZqCtTyWgWVQ"
  tests:
    # -- Set for tests_user_key field
    accessKey: "ENP39EQM4SLACGD5FXB7"
    # -- Set for tests_user_secret field
    secretKey: "lPcm0imbcBZ8mwgO7tpadutiS3gnJD05x9j7a...