AgitatedDove14 if I run an agent on a remote system, which ports do I need to open so that it can work with a clearml-server?
AgitatedDove14 I still do not understand how I can deploy the trains-agent docker image to my trains-server installation so that the 'default' queue will be handled.
Once I can do this, it should not be a big thing to add additional workers for more queues.
I found a template for k8s but as I'm quite new to Kubernetes I don't know how to use it.
As I use Rancher I'm able to even edit the trains-agent deployment. I added an additional command to handle the default queue as well, but it seems not ...
the server name is correct, I have been able to upload the example ...
Thanks, I will try to update the trains.conf on the weekend
redis, mongo and elasticsearch also look ok
AgitatedDove14 it seems a comparison of plots between the original and the cloned experiment is not possible. Is this a bug on the server?
this is a snippet of the YML configuration I'm currently using
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  restart: unless-stopped
  privileged: true
  environment:
    CLEARML_HOST_IP: ${CLEARML_HOST_IP}
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
    CLEARML_API_HOST:
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
    CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-...
AgitatedDove14 not sure how to make use of such config / where to add it
Should it be added to the docker image when I build my own, can I set it in the Web GUI as a property of the experiment I cloned, or should it be added to the original script? And in that case, what type of variable is 'agent'?
Thanks for the Twitter tweet.
The credentials are already deleted
AgitatedDove14 unfortunately all attempts to get any response from the webUI failed
(py38) wgo@NVidia-power : ~ $ ping 10.43.138.186
PING 10.43.138.186 (10.43.138.186) 56(84) bytes of data.
^C
--- 10.43.138.186 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms
(py38) wgo@NVidia-power : ~ $ curl http://10.43.97.217:30080
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186...
Thanks. I wanted to finalize it, as it has already taken me much longer than I expected
Hi AgitatedDove14
It seems I used the wrong IP for the API tests.
When contacting the docker's Rancher IP:30080 I get the trains webUI
Strange, I would expect it to also answer on the address the webserver image was assigned
root@56a6f444f140:/var/lib/rancher# ping 10.42.0.106
PING 10.42.0.106 (10.42.0.106): 56 data bytes
64 bytes from 10.42.0.106: icmp_seq=0 ttl=64 time=0.063 ms
64 bytes from 10.42.0.106: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 10.42.0.106: icmp_seq=2 ttl=6...
seems I'm wrong. The queues are there, but the workers are not
to be honest, I don't know if I will find it as it is a Kubernetes cluster (ok only 2 nodes) and might be installed to somewhere ...
I will check whether I can find any trains configs on the systems, but they should be the defaults coming with the Helm installer
AgitatedDove14 thanks for the reply. I'm not sure if I understand how to check whether the API is running properly. I haven't seen the webUI so far since I don't know how ... I will continue trying to get it running ;)
sdk {
    # TRAINS - default SDK configuration
    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.trains/cache"
        }
        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            # or cached, and any download request will return a direct reference.
            # Objects are specified in glob format, available for url and content_ty...
AgitatedDove14 not so far, I just reuse the docker image as it is, and it is not using the gpu parameter at all. The next step will be to create my own image running the agent with this parameter, but then I ran into the error messages and the url http://apiserver:8008 which I don't understand
with
    if task.running_locally():
        fig.show()
it works
thank you for the support
Sounds good :) I'm currently trying to run an orca instance ... but without success
the one from which I sent you the snippet of the api {} config?
or do you mean the machine on which I ran the experiment locally?
I need to read about the PipelineController. At first glance, the example looks like what I would like to do.
If I would like to schedule multiple actions, like running the same script 30 times with different parameters, it looks like add_step is what I will need
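A minimal sketch of that idea, assuming a recent clearml package where PipelineController can be imported from clearml directly (older trains releases used a different import path). The project name, base task name, step names and the "General/learning_rate" parameter key are hypothetical placeholders; the real key depends on how the base script registers its parameters.

```python
from typing import Dict, List


def build_step_specs(learning_rates: List[float]) -> List[Dict]:
    """Build one step description per parameter value.

    Step names must be unique within a pipeline, so an index is appended.
    The "General/learning_rate" key is a hypothetical example.
    """
    return [
        {
            "name": f"run_{i:02d}",
            "parameter_override": {"General/learning_rate": lr},
        }
        for i, lr in enumerate(learning_rates)
    ]


def build_pipeline(specs: List[Dict], project: str, base_task_name: str):
    """Register every spec as a pipeline step cloned from the same base task."""
    # Imported lazily so build_step_specs works without clearml installed.
    from clearml import PipelineController

    pipe = PipelineController(name="parameter sweep", project=project, version="1.0.0")
    for spec in specs:
        pipe.add_step(
            name=spec["name"],
            base_task_project=project,
            base_task_name=base_task_name,
            parameter_override=spec["parameter_override"],
        )
    return pipe


# 30 runs of the same script, each with a different (hypothetical) parameter value.
specs = build_step_specs([0.001 * (i + 1) for i in range(30)])
```

Passing these specs to build_pipeline with an existing project and base task should give a controller with 30 independent steps, which can then be started once a server and agent are reachable.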
even when running these commands from within the docker container instance I do not get any response
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:30080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8081
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8008
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:8080
^C
root@56a6f444f140:/v...
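Since the curl calls above hang until interrupted, a check with an explicit timeout can be easier to read. The sketch below uses only the Python standard library and assumes the default server ports (8080 for the web UI, 8008 for the API server, 8081 for the file server); a Kubernetes NodePort such as 30080 would have to be added to the dictionary by hand. The function names are made up for illustration.

```python
import socket
from typing import Dict

# Assumed default ports; adjust to match the actual deployment.
DEFAULT_PORTS = {"web": 8080, "api": 8008, "files": 8081}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_server(host: str, ports: Dict[str, int] = DEFAULT_PORTS,
                 timeout: float = 2.0) -> Dict[str, bool]:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_is_open(host, port, timeout) for name, port in ports.items()}
```

Calling check_server with one of the addresses above returns a dictionary of service name to reachability, so an unreachable cluster IP shows up as False for every entry instead of a hanging command.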
but I still need the load balancer ...
never mind, some day I will have it running
AgitatedDove14 today I managed to run what I couldn't a month ago :)
I did not correctly understand what you wrote to me at that time.
The issue I had was that wget was missing in the trains-agent image, so I was not able to make a system call to wget.
Now I managed to do so, based on the input you gave me, by adding
    agent.docker_preprocess_bash_script = [...]
to my trains.conf, and it worked out of the box
Basically this issue was the reason why I started learning how to create a Kube...