AgitatedDove14 ok, but how do I deploy a trains-agent?
Thanks, I will try to update the trains.conf on the weekend.
AgitatedDove14 I still do not understand how I can deploy the trains-agent docker image to my trains-server installation so that the 'default' queue will be handled.
Once I can do this, it should not be a big thing to add additional workers for more queues.
I found a template for k8s but as I'm quite new to Kubernetes I don't know how to use it.
As I use Rancher, I'm even able to edit the trains-agent deployment. I added an additional command to handle the default queue as well, but it seems not ...
AgitatedDove14 thanks for the reply. I'm not sure if I understand how to check if the API is running properly. I haven't seen the webUI so far since I don't know how ... I will continue trying to get it running ;)
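One way to check whether the API server answers at all is a small HTTP probe. The following is a minimal sketch, not an official ClearML or Trains tool: `probe_http` is a hypothetical helper name, and the address in the usage note is only an assumed one. Any status code, even an error status, shows that something is listening; `None` means no response arrived.

```python
import urllib.error
import urllib.request

# Ignore any proxy settings from the environment, since the target is
# expected to be a local or LAN address (an assumption of this sketch).
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_http(url, timeout=3.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, unreachable host, failed name lookup.
        return None
```

For example, `probe_http("http://localhost:8008")` could be run on the machine hosting the server; replace the address with wherever the API server is actually exposed.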
AgitatedDove14 The problem I have with getting the ingress running ... seems to be caused by the fact that I'm running Rancher in single-node mode (using a docker image ...), where port 80 is already in use, so the web service (WebUI) of Trains cannot be mapped to the same port ...
Nevertheless, I will continue with a real Kubernetes cluster installation and try to get Trains plus additional agents of my own running on it 😉
Thanks for the support you have provided so far. I will try to collect the i...
AgitatedDove14 unfortunately all attempts to get any response from the webUI failed 😞
(py38) wgo@NVidia-power : ~ $ ping 10.43.138.186
PING 10.43.138.186 (10.43.138.186) 56(84) bytes of data.
^C
--- 10.43.138.186 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms
(py38) wgo@NVidia-power : ~ $ curl http://10.43.97.217:30080
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186...
Thanks a lot. I will let you know if I managed it :)
AgitatedDove14 calling astype(str) on the index did the magic 🙂 thanks
AgitatedDove14 no, I mean the example worked locally and on the server, but my plot is shown only locally
AgitatedDove14 not so far. I just reuse the docker image as it is, and it is not using the gpu parameter at all. The next step will be to create my own image that runs the agent with this parameter, but then I ran into the error messages and the URL http://apiserver:8008, which I don't understand
This is a snippet of the YML configuration I'm currently using:
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  restart: unless-stopped
  privileged: true
  environment:
    CLEARML_HOST_IP: ${CLEARML_HOST_IP}
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
    CLEARML_API_HOST:
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
    CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-...
I think I understand now that the trains.conf has to be located on the node running the trains-agent.
When I start an additional trains-agent that was not instantiated by docker-compose, and is therefore not part of the same network, I have problems finding the api_server. localhost:8008 will certainly not work. I identified the IP of the server running in docker with docker inspect ... and edited ~/trains.conf to use it, but unfortunately it still cannot find the apiserver 😞
` (py38) wgo@NVidi...
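A service name such as `apiserver` is normally resolvable only inside the docker-compose network, which would explain why an agent started outside of it cannot find the server. As a rough sketch (the helper name `resolve_host` is made up for illustration), the name lookup can be tested from the machine that runs the agent:

```python
import socket


def resolve_host(hostname):
    """Return the sorted IP addresses hostname resolves to (empty list if none)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name is unknown from this machine's point of view.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If `resolve_host("apiserver")` returns an empty list on the agent's host, the agent needs an address that is reachable from outside the compose network, for example the host IP together with the published port.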
With `if task.running_locally(): fig.show()`
it works 🙂
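The guard from this post can be wrapped in a small helper. This is only a sketch: `show_if_local` is a hypothetical name, `task` is assumed to be the ClearML task object used in the post, and `fig` is assumed to be any figure object with a `show()` method, such as a Plotly figure.

```python
def show_if_local(task, fig):
    """Call fig.show() only when the script is not being executed by an agent.

    Returns True if the figure was shown, False otherwise.
    """
    if task.running_locally():
        # Interactive display only makes sense on the developer's machine;
        # a remote worker usually has no browser or display attached.
        fig.show()
        return True
    return False
```

Returning a boolean is a design choice of this sketch so that the caller can log or test which branch was taken.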
Thank you for the support
The apiserver pod reports quite a lot
Thanks. I wanted to finalize it, as it has already taken me much longer than I expected
Regarding the list of agents: yes, I can see the additional one I added in the list
AgitatedDove14 ok, and how much storage is an account allowed to use? Once it is reached, will the oldest experiments be deleted?
ok, thanks. This is enough information. You don't need to check how much space is provided to the accounts
As I have to configure my router to forward the requests to my local server, I need to know which ports and protocol settings (I expect TCP, not UDP) I have to configure.
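Before configuring the router, it can help to confirm which ports actually accept TCP connections on the local server. The sketch below rests on assumptions: the helper names are hypothetical, and the port list is supplied by the caller. 8008 is the API port mentioned in this thread; 8080 and 8081 are what I understand the default docker-compose setup uses for the web UI and the file server, so verify them against your own compose file.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or host unreachable.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to True/False depending on whether it accepts TCP connections."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

A call such as `check_ports("192.168.0.10", [8080, 8008, 8081])` (the IP address is a placeholder) returns a dictionary showing which of the assumed ports respond.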
AgitatedDove14 while playing with (and documenting) the way to run clearml dockerized on the local machine, I noticed that the yml file https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml contains CLEARML_API_HOST: http://apiserver:8008
I duplicated this configuration section (agent-services) and adapted it to run the default queue agent with the image allegroai/clearml-agent:latest
I hoped to have GPU support by this but so far haven't seen the GPU usage li...
😞 when I edit the composition to use the configured host IP as the apiserver, the queued work is never processed 😞
AgitatedDove14 unfortunately I still have issues with the plot. After removing the first row I get a weird empty remote plot where the axis is a counter instead of a date. It does not seem to be clearml related, and I need to get more familiar with plotly to analyze it.
It seems I was wrong. The queues are there, but the workers are not
The log of the fileserver pod seems quite empty
root@vmd62521:~# kubectl logs fileserver-6f49b74556-2m4n2 -n trains --all-containers
 - Serving Flask app "fileserver" (lazy loading)
 - Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 - Debug mode: off
root@vmd62521:~#
The same applies to the agentservices pod:
root@vmd62521:~# kubectl logs agentservices-56655788b6-rnbk4 apiserver-7d9cd59844-dfd5s -n train...
I'm quite new to Kubernetes. What I have found is that the ports I expected are in use
root@vmd62521:~# kubectl get services -n trains
NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
mongo-service           ClusterIP   10.43.99.44    <none>        27017/TCP      25h
webserver-service       NodePort    10.43.49.21    <none>        80:30080/TCP   25h
redis                   ClusterIP   10.43.62.222   <none>        6379/TCP       25h
elasticsearch-service   Clust...