AgitatedDove14 while playing with (and documenting) how to run ClearML dockerized on the local machine, I noticed that the yml file https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml contains CLEARML_API_HOST: http://apiserver:8008
I duplicated this configuration (the agent-services section) and adapted it to run an agent for the default queue with the image allegroai/clearml-agent:latest.
I hoped to get GPU support this way, but so far I haven't seen the GPU usage li...
😞 When I edit the composition to use the configured host IP as the apiserver, the queued work is never processed 😞
AgitatedDove14 ok, but how do I deploy a trains-agent?
AgitatedDove14 the reason I'm asking is that I'm going to run the server in my home network and would like to run agents on virtual servers I run on a VPS provider.
AgitatedDove14 I would like to publish two additional articles covering the use with Docker and Kubernetes. Docker I managed, but my Kubernetes knowledge is quite low. I managed to set up a K3S cluster, which might also be worth an article, but I still don't really understand how to add workers with agents to it...
It might take some time before I write the Kubernetes part. Once I do, I will let you know.
As I have to configure my router to forward requests to my local server, I need to know which ports and protocol settings (I expect TCP, not UDP) to configure.
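As a sketch of what has to be forwarded, assuming the stock clearml-server docker-compose ports (web UI 8080, API server 8008, file server 8081), all of which are plain HTTP over TCP, the helper below just lists one TCP forwarding rule per service. The server IP used in the usage note is a placeholder, not taken from this thread.

```python
from typing import Dict, List, Optional

# Assumption: default ports of the clearml-server docker-compose setup (all TCP).
DEFAULT_SERVER_PORTS: Dict[str, int] = {
    "web": 8080,    # web UI (browser access)
    "api": 8008,    # API server: what agents and the SDK talk to
    "files": 8081,  # file server: artifacts, models, debug samples
}


def forwarding_rules(server_ip: str, ports: Optional[Dict[str, int]] = None) -> List[str]:
    """Return one human-readable TCP port-forwarding rule per server component, sorted by port."""
    ports = DEFAULT_SERVER_PORTS if ports is None else ports
    return [
        f"{name}: forward TCP {port} -> {server_ip}:{port}"
        for name, port in sorted(ports.items(), key=lambda item: item[1])
    ]
```

For example, forwarding_rules("192.168.1.10") yields three rules, API server first. Remote agents mainly need the API and file server ports; the web UI port only matters for browser access.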
AgitatedDove14 I'm not sure how to make use of such a config / where to add it.
Should it be added to the Docker image when building my own, can I set it in the web GUI as a property of the experiment I cloned, or should it be added to the original script? And in that case, what variable type is 'agent'?
Thanks, I will try to update the trains.conf on the weekend.
The apiserver pods report quite a lot.
AgitatedDove14 ok, and how much storage is an account allowed to use? Once the limit is reached, will the oldest experiments be deleted?
I have been able to use
image: allegroai/trains-agent:latest
in the docker-compose.yml file 🎉
now I will focus on getting it working on Rancher
stay tuned
Yes, this works, but for completeness I wanted to add it to the composition... never mind, maybe too much detail for an article 😉
I need to read about the PipelineController. At first glance, the example looks like what I would like to do.
I would like to schedule multiple runs, e.g. the same script 30 times with different parameters; it looks like add_step is what I need.
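A minimal sketch of the add_step idea, assuming a base task named my_script already exists in a project called examples and that the script exposes a hyper-parameter learning_rate via task.connect() (all three names are hypothetical). Each step clones the base task with its own parameter_override; since no step declares parents, the controller can run them in parallel.

```python
from typing import Dict, List


def build_step_overrides(values: List[float], param: str = "General/learning_rate") -> List[Dict]:
    """One step definition per parameter value.

    `param` is a hypothetical hyper-parameter; "General/" is the section used for
    dicts connected with task.connect(), argparse arguments live under "Args/".
    """
    return [
        {"name": f"run_{i:02d}", "parameter_override": {param: value}}
        for i, value in enumerate(values)
    ]


def create_pipeline(steps: List[Dict], project: str = "examples", base_task_name: str = "my_script"):
    """Create a PipelineController with one independent step per entry (needs clearml + a server)."""
    # Imported lazily so build_step_overrides() works without clearml installed.
    from clearml import PipelineController

    pipe = PipelineController(name="parameter sweep", project=project, version="1.0.0")
    pipe.set_default_execution_queue("default")  # steps are enqueued on the "default" queue
    for step in steps:
        pipe.add_step(
            name=step["name"],
            base_task_project=project,
            base_task_name=base_task_name,
            parameter_override=step["parameter_override"],
        )
    return pipe


# Usage (requires a configured clearml installation and a reachable server):
#   pipe = create_pipeline(build_step_overrides([0.001 * (i + 1) for i in range(30)]))
#   pipe.start_locally()  # controller runs here, the 30 steps go to the queue
```

Keeping the parameter grid in a plain function makes it easy to check before anything is sent to the server.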
I'm quite new to Kubernetes. What I have found is that the ports I expected are in use:
root@vmd62521:~# kubectl get services -n trains
NAME                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
mongo-service           ClusterIP   10.43.99.44    <none>        27017/TCP      25h
webserver-service       NodePort    10.43.49.21    <none>        80:30080/TCP   25h
redis                   ClusterIP   10.43.62.222   <none>        6379/TCP       25h
elasticsearch-service   Clust...
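In the listing above, only webserver-service is of type NodePort: its PORT(S) value 80:30080/TCP means service port 80 inside the cluster is also exposed on port 30080 of every node, while ClusterIP services such as mongo-service are reachable only from inside the cluster. The small parser below is a hypothetical helper (not part of kubectl) that splits such a PORT(S) cell into its parts.

```python
import re
from typing import List, NamedTuple, Optional


class PortMapping(NamedTuple):
    port: int                 # service port, reachable on the ClusterIP
    node_port: Optional[int]  # port opened on every node (NodePort services only)
    protocol: str


_PORT_RE = re.compile(r"^(\d+)(?::(\d+))?/(TCP|UDP|SCTP)$")


def parse_ports(column: str) -> List[PortMapping]:
    """Parse a PORT(S) cell from `kubectl get services`, e.g. '80:30080/TCP' or '27017/TCP'."""
    mappings = []
    for entry in column.split(","):
        match = _PORT_RE.match(entry.strip())
        if match is None:
            raise ValueError(f"unrecognised port entry: {entry!r}")
        port, node_port, protocol = match.groups()
        mappings.append(PortMapping(int(port), int(node_port) if node_port else None, protocol))
    return mappings
```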
AgitatedDove14 I fixed it. The index looked like a string but wasn't ...
AgitatedDove14 unfortunately I still have issues with the plot. After removing the first row I get a weird, empty remote plot where the axis is a counter instead of a date. This seems not to be ClearML related, and I need to dig deeper into plotly to analyze it.
Regarding the list of agents: yes, I can see the one additional agent I added in the list.
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server: http://vmd63828.contaboserver.net:30008
web_server: http://vmd63828.contaboserver.net:30080
files_server: http://vmd63828.contaboserver.net:30081
..}
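Assuming the API, web and file servers are exposed as NodePorts 30008, 30080 and 30081 as in the snippet above, the URL part of the api section can be generated from the host name alone. This is an illustrative, hypothetical helper; the real api section also contains the credentials, which are elided in the snippet above.

```python
def api_section(host: str, api_port: int = 30008, web_port: int = 30080, files_port: int = 30081) -> str:
    """Render the server URLs of the `api { ... }` block for a server exposed via NodePorts.

    Note: api_server must point at the API server port, not at the web UI.
    Credentials are intentionally not included.
    """
    return "\n".join([
        "api {",
        f"    api_server: http://{host}:{api_port}",
        f"    web_server: http://{host}:{web_port}",
        f"    files_server: http://{host}:{files_port}",
        "}",
    ])
```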
Thanks for the tweet.
The credentials are already deleted
Hi Martin,
you are right. The trains-agent is running with the cpu-only option.
(py38) wgo@NVidia-power:~/dev/catwalk$ docker ps
CONTAINER ID   IMAGE                                    COMMAND                  CREATED      STATUS      PORTS   NAMES
b99d5103a43c   allegroai/trains-agent-services:latest   "/usr/agent/entrypoi…"   2 days ago   Up 2 days
...
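An agent started with the cpu-only option will never pick up a GPU. As a hedged sketch, the helper below builds a daemon command line that requests a GPU only when nvidia-smi is on the PATH and falls back to --cpu-only otherwise. The --queue, --docker, --gpus and --cpu-only flags are the ones clearml-agent documents for daemon mode; check trains-agent daemon --help for your version. An agent running inside a container (like the trains-agent-services container listed above) only sees GPUs if that container itself was started with the NVIDIA runtime.

```python
import shutil
from typing import Callable, List, Optional


def agent_command(queue: str = "default", use_docker: bool = True, gpus: str = "0",
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Build a trains-agent daemon command line.

    `gpus` ("0" here) is a hypothetical default; `which` is injectable for testing.
    """
    cmd = ["trains-agent", "daemon", "--queue", queue]
    if use_docker:
        cmd.append("--docker")  # run each task inside a docker container
    if which("nvidia-smi"):     # NVIDIA driver tools visible on this machine
        cmd += ["--gpus", gpus]
    else:
        cmd.append("--cpu-only")
    return cmd
```

On the host from the docker ps output above, " ".join(agent_command()) would print the command to start a GPU agent next to the services agent.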
AgitatedDove14 it seems that comparing the plots of the original and the cloned experiment is not possible. Is this a bug on the server?
Even when running these commands from within the Docker container I do not get any response 😞
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:30080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8081
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8008
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:8080
^C
root@56a6f444f140:/v...
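curl without a timeout just hangs until interrupted, which does not distinguish a refused connection from one that is silently dropped. A sketch of a TCP probe with a timeout is below; the (host, port) pairs are whatever you want to test. Keep in mind that a ClusterIP (10.43.x.x here) answers on the service port, e.g. 80 for webserver-service in the earlier listing, whereas a NodePort such as 30080 is opened on the node's IP, not on the ClusterIP.

```python
import socket
from typing import Dict, Iterable, Tuple


def probe(targets: Iterable[Tuple[str, int]], timeout: float = 2.0) -> Dict[Tuple[str, int], str]:
    """Try a TCP connect to each (host, port) and classify it as open / refused / timeout / error."""
    results = {}
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = "open"
        except socket.timeout:
            # No answer at all: packets dropped or routed nowhere (what a hanging curl suggests)
            results[(host, port)] = "timeout"
        except ConnectionRefusedError:
            # Host reachable, but nothing is listening on that port
            results[(host, port)] = "refused"
        except OSError:
            # Anything else, e.g. network unreachable or name resolution failure
            results[(host, port)] = "error"
    return results
```

For a quick manual check, curl --max-time 5 gives a similar bounded wait.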
AgitatedDove14 thanks for the reply. I'm not sure I understand how to check whether the API is running properly. I haven't seen the web UI so far since I don't know how... I will keep trying to get it running ;)
Hi AgitatedDove14
It seems I used the wrong IP for the API tests.
When contacting the Docker container's Rancher IP on port 30080, I get the trains web UI 🙂
Strange, I would expect it to also answer on the address the webserver image was assigned:
root@56a6f444f140:/var/lib/rancher# ping 10.42.0.106
PING 10.42.0.106 (10.42.0.106): 56 data bytes
64 bytes from 10.42.0.106: icmp_seq=0 ttl=64 time=0.063 ms
64 bytes from 10.42.0.106: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 10.42.0.106: icmp_seq=2 ttl=6...