with `if task.running_locally(): fig.show()`
it works 🙂
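For reference, a minimal sketch of how I use that guard (assuming a Task created via Task.init as in the trains examples; report_plotly availability may depend on the SDK version):
```python
from trains import Task
import plotly.graph_objects as go

task = Task.init(project_name="examples", task_name="plotly reporting")

# report the figure to the server in any case
fig = go.Figure(data=go.Scatter(y=[1, 2, 3]))
task.get_logger().report_plotly(title="demo", series="demo", iteration=0, figure=fig)

# only open the interactive window when the script runs locally,
# not when an agent executes it
if task.running_locally():
    fig.show()
```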
thank you for the support
seems I'm wrong. The queues are there, but the workers are not
this is a snippet of the YML configuration I'm currently using:
```
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  restart: unless-stopped
  privileged: true
  environment:
    CLEARML_HOST_IP: ${CLEARML_HOST_IP}
    CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
    CLEARML_API_HOST:
    CLEARML_FILES_HOST: ${CLEARML_FILES_HOST:-}
    CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-...
```
AgitatedDove14 if I run an agent on a remote system, which ports do I need to open so it can work with a clearml-server?
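What I assume so far about the ports, based on the default trains-server setup (please correct me if this is wrong):
```bash
# default trains-server ports; the agent only needs outbound access to these
# on the server host, no inbound ports on the agent machine itself
sudo ufw allow 8008/tcp   # api server (the agent talks to this)
sudo ufw allow 8081/tcp   # file server (artifact up/downloads)
sudo ufw allow 8080/tcp   # web UI (for browsers, not needed by the agent)
```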
Sounds good :) I'm currently trying to run an orca instance ... but without success
Thanks. I wanted to finalize it, as it has already taken me much longer than I expected
or do you mean the machine I ran the experiment on locally?
the one I sent you the snippet of the api {} config from?
to be honest, I don't know if I will find it, as it is a Kubernetes cluster (ok, only 2 nodes) and it might be installed somewhere ...
I will check whether I can find any trains configs on the systems, but they should be the defaults coming with the Helm installer
the server name is correct, I have been able to upload the example ...
redis, mongo and elasticsearch also look ok
ok, thanks. This is enough information. You don't need to check how much space is provided to the accounts
the log of the fileserver pod seems quite empty
```
root@vmd62521:~# kubectl logs fileserver-6f49b74556-2m4n2 -n trains --all-containers
 * Serving Flask app "fileserver" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
root@vmd62521:~#
```
same for the agentservices pod:
```
root@vmd62521:~# kubectl logs agentservices-56655788b6-rnbk4 apiserver-7d9cd59844-dfd5s -n train...
```
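In case it helps, these are the generic checks I can run next (assuming the Helm release lives in the "trains" namespace, as in the logs above):
```bash
kubectl get pods -n trains                                   # are all pods Running?
kubectl get svc -n trains                                    # which services/ports are exposed?
kubectl describe pod fileserver-6f49b74556-2m4n2 -n trains   # events, mounts, restarts
```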
I'm not sure about the ports
file_server not:
```
(py38) wgo@NVidia-power:~/dev/Trains/trains$ curl
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title>
<h1>Method Not Allowed</h1>
<p>The method is not allowed for the requested URL.</p>
```
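If I read the 405 correctly, the fileserver is reachable but its root path simply does not accept a plain GET. A sketch of the checks I would try instead (hostnames and ports are assumptions based on the defaults mentioned above):
```bash
curl -i http://<server-host>:8081/             # fileserver: any HTTP answer means it is up
curl -i http://<server-host>:8008/debug.ping   # apiserver health endpoint
```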
AgitatedDove14 I tried editing the ~/trains.conf on the system where I start the dockerized trains server & agent, but without success.
I tried to add the script you provided inside the api and sdk scope, as well as outside of everything; the result is still the same, wget is missing :( `api { ... <here> } sdk { ... <here> } <and here>`
I'm quite sure I need to edit the trains file inside a docker container, but this will be part of the ..., and even if I were able to change it, it's not the solution I'm lo...
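What I would prefer is mounting a host-side config into the container instead of editing files inside it; something like this sketch in the docker-compose service (the in-container path /root/trains.conf is my assumption about where the SDK looks by default):
```yaml
agent-services:
  volumes:
    # keep the config on the host, read-only inside the container
    - /opt/trains/config/trains.conf:/root/trains.conf:ro
```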
AgitatedDove14 the first plotly plot is fine, and I added a second one which fails again 😞 I checked that the index is of type str, but the remote plot again has integers on the x-axis
The code is on GitHub if you would like to check. I will keep converting the index back and forth to find a way to get it running
Does this differ from what I sent earlier?
AgitatedDove14 I don't know why, but now it works
```
runfile('/home/wgo/dev/Trains/trains/examples/reporting/text_reporting.py', wdir='/home/wgo/dev/Trains/trains/examples/reporting')
TRAINS Task: overwriting (reusing) task id=b31459aa2d414ea7b5aaa8c467ee6ad3
This is standard error test
2020-12-12 11:51:44.841 | INFO | __main__:report_logs:26 - That's it, beautiful and simple logging! (using ANSI colors)
TRAINS results page:
reporting text logs
This is standard output test
hello, th...
```
```
# TRAINS SDK configuration file
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:

    # Credentials are generated using the webapp, /profile
    # Override with os environment: TRAINS_API_ACCESS_KEY / TRAINS_API_SECRET_KEY
    credentials {....}
}
sdk {
    # TRAINS - default SDK configuration
```
AgitatedDove14 so far not; I just reuse the docker image as it is, and it is not using the gpu parameter at all. The next step will be to create my own image running the agent with this parameter, but then I faced the error messages and the url http://apiserver:8008 , which I don't understand
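Roughly what I have in mind for that next step, based on the trains-agent CLI (queue name and docker image are placeholders, so please treat this as a sketch):
```bash
# run an agent in docker mode with access to GPU 0, listening on the "default" queue
trains-agent daemon --queue default --docker nvidia/cuda:10.1-runtime-ubuntu18.04 --gpus 0
```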
AgitatedDove14 the index astype(str) did the magic 🙂 thanks
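For anyone hitting the same issue, this is the minimal version of the workaround (plotly express used here just for illustration):
```python
import pandas as pd
import plotly.express as px

df = pd.DataFrame({"value": [3, 1, 4]}, index=[10, 20, 30])
df.index = df.index.astype(str)           # cast the index to str so the x-axis stays categorical
fig = px.line(df, x=df.index, y="value")
```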
AgitatedDove14 regarding the credentials, will I need to take them out of my trains.conf, or might it be common practice to create a dedicated user for such pods that instantiate additional workers listening on queues?
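What I am considering, based on the comment in the config above (the keys are placeholders, generated in the webapp under /profile for a separate "agent" user):
```bash
# pass dedicated credentials to the agent pods via environment variables
# instead of copying them out of my personal trains.conf
export TRAINS_API_ACCESS_KEY=<agent access key>
export TRAINS_API_SECRET_KEY=<agent secret key>
```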