
AgitatedDove14 so far not, I just reuse the docker image as it is, and it is not using the gpu parameter at all. The next step will be to create my own image running the agent with this parameter, but then I faced the error messages and the URL http://apiserver:8008 which I don't understand
AgitatedDove14 I'm just about to finalize the article and now have issues getting a clone processed by an agent :(
The problem is plotly and something called orca. First it was not installed, then it was not able to start a browser as I run the agent as a different user in an su - ... shell
I will try to use the plotly option auto_open=False
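Roughly what I have in mind (just a sketch, the figure and filename are made up):
`
import plotly.graph_objects as go
from plotly.offline import plot

fig = go.Figure(data=go.Scatter(y=[1, 2, 3]))

# write the plot to an html file without opening a browser,
# so it should also work when the agent runs headless under another user
plot(fig, filename="my_plot.html", auto_open=False)
`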
😞 when editing the composition to use the configured host IP as apiserver, the queued work is never processed 😞
AgitatedDove14 while playing (and documenting) the way to run clearml dockerized on the local machine, I noticed that the yml file https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml contains CLEARML_API_HOST: http://apiserver:8008
I duplicated this configuration (agent-services) section and adapted it to run an agent for the default queue with the image allegroai/clearml-agent:latest
I hoped to have GPU support by this but so far haven't seen the GPU usage li...
file_server not
`
(py38) wgo@NVidia-power:~/dev/Trains/trains$ curl
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>
`
Hi AgitatedDove14
It seems I used a wrong IP for the API tests.
When contacting the Docker Rancher IP on :30080 I get the Trains webUI 🙂
Strange, I would expect it to also answer on the address the webserver image got assigned
root@56a6f444f140:/var/lib/rancher# ping 10.42.0.106
PING 10.42.0.106 (10.42.0.106): 56 data bytes
64 bytes from 10.42.0.106: icmp_seq=0 ttl=64 time=0.063 ms
64 bytes from 10.42.0.106: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 10.42.0.106: icmp_seq=2 ttl=6...
AgitatedDove14 the first plotly plot is fine and I added a second one which fails again 😞 I checked that the index is of type str but the remote plot again has integers on the x-axis
The code is on GitHub if you would like to check. I will proceed converting the index back and forth to find a way to get it running
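Roughly what I'm trying (a sketch, the column and values are made up):
`
import pandas as pd
import plotly.graph_objects as go

df = pd.DataFrame({"value": [1.0, 2.5, 2.1]}, index=[2018, 2019, 2020])

# force a string index so the x-axis is hopefully treated as categories
# instead of integers when the plot is rebuilt on the server side
df.index = df.index.astype(str)

fig = go.Figure(data=go.Scatter(x=df.index, y=df["value"]))
`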
ok, thanks. This is enough information. You don't need to check how much space is provided to the accounts
As I have to configure my router to forward the requests to my local server, I need to know the ports and protocol settings (I expect TCP, not UDP) I have to configure
this is a snippet of the YML configuration I'm currently using
`
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  restart: unless-stopped
  privileged: true
  environment:
    CLEARML_HOST_IP: ${CLEARML_HOST_IP}
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
    CLEARML_API_HOST:
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
    CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-...
Hi Martin,
you are right. The Trains-agent is running with option cpu-only
`
(py38) wgo@NVidia-power:~/dev/catwalk$ docker ps
CONTAINER ID   IMAGE                                    COMMAND                  CREATED      STATUS      PORTS   NAMES
b99d5103a43c   allegroai/trains-agent-services:latest   "/usr/agent/entrypoi…"   2 days ago   Up 2 days
...
`
- how can I enable TensorBoard and have the graphs stored in Trains?
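Something like this is what I would expect to work (just a sketch; project, task and log names are my own assumptions):
`
from trains import Task
from torch.utils.tensorboard import SummaryWriter

# Task.init() hooks the TensorBoard SummaryWriter, so scalars written to it
# should also show up under the experiment in the Trains webUI
task = Task.init(project_name="catwalk", task_name="tensorboard test")

writer = SummaryWriter(log_dir="./runs")
for step in range(10):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
`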
I ran a local (not dockerized) trains-agent:
trains-agent daemon --queue training --create-queue --foreground
which enabled me to see the GPU load on the corresponding view 🙂
Now I got another issue.
It seems that when cloning an experiment, a virtual environment is created with all the modules identified as being used. Inside this environment the experiment runs.
Am I right?
Is this the case only for clones?
In my Python code I'm trying to read a pandas table which I stored i...
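Roughly what I'm doing, in case it helps (a sketch; I assume the table was stored as a task artifact, names are made up):
`
import pandas as pd
from trains import Task

task = Task.init(project_name="catwalk", task_name="prepare data")

df = pd.DataFrame({"price": [1.2, 3.4], "size": [10, 20]})

# store the table with the task so a clone run by an agent can read it back
task.upload_artifact("training_table", artifact_object=df)

# later, e.g. from the training script: fetch the table from the source task
source_task = Task.get_task(project_name="catwalk", task_name="prepare data")
df_restored = source_task.artifacts["training_table"].get()
`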
Another question I have: are the trained models stored (I guess they are stored) in MongoDB or in the file system, and which format is used?
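My guess is something like this (a sketch; output_uri and the file server address are assumptions on my side):
`
from trains import Task

# pointing output_uri at the file server should make the framework's own
# checkpoint files (e.g. .pt / .h5) get uploaded there instead of staying
# local, while the experiment metadata itself lives in the backend databases
task = Task.init(
    project_name="catwalk",
    task_name="training",
    output_uri="http://192.168.1.10:8081",
)
`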
AgitatedDove14 the reason I'm asking is that I'm going to run the server in my home network and would like to run agents on virtual servers I run on a VPS provider.
the server name is correct, I have been able to upload the example ...
Sorry, but I don't understand how the cloned experiment is provided with parameters.
A task which is cloned by Trains might get its parameters via task.set_parameters(dict)
These parameters come from some magic analysis of the argparse being used in the script.
AgitatedDove14 when is the call to set_parameters(...) performed? Is the argparse call somehow redirected so that it receives the data from Trains instead of getting it via sys.argv or wherever argparse is gettin...
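This is how I currently picture it (just a sketch of my understanding, not verified):
`
from argparse import ArgumentParser
from trains import Task

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--epochs", type=int, default=10)

# Task.init() patches argparse, so when an agent executes a clone the values
# edited in the UI (or set via set_parameters) are injected here instead of
# whatever sys.argv would have provided
task = Task.init(project_name="catwalk", task_name="training")
args = parser.parse_args()

print(args.lr, args.epochs)
`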
To be honest, I don't know if I will find it, as it is a Kubernetes cluster (ok, only 2 nodes) and it might be installed somewhere ...
I will check if I can find any Trains configs on the systems, but they should be the defaults coming with the Helm installer
yes, this works, but just for completeness I wanted to add it to the composition ... never mind, maybe too much detail for an article 😉
AgitatedDove14 unfortunately all attempts to get any response from the webUI failed 😞
(py38) wgo@NVidia-power : ~ $ ping 10.43.138.186
PING 10.43.138.186 (10.43.138.186) 56(84) bytes of data.
^C
--- 10.43.138.186 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms
(py38) wgo@NVidia-power : ~ $ curl http://10.43.97.217:30080
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186...
AgitatedDove14 The problem I have with getting the ingress running ... seems to be caused by the fact that I'm running Rancher in single node mode (using a docker image ...) where port 80 is already in use, so the webservice (WebUI) of Trains cannot be mapped to the same port ...
Nevertheless I will continue with a real Kubernetes cluster installation and try to get Trains + additional own agents running on it 😉
thanks so far for the support you provided. I will try to collect the i...
even when running these commands from within the docker container instance I do not get any response 😞
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:30080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8081
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8008
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:8080
^C
root@56a6f444f140:/v...