I'm already impressed by what Trains does with just 2 lines of code
even when running these commands from within the docker container instance I do not get any response 😞
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:30080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8080
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8081
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.138.186:8008
^C
root@56a6f444f140:/var/lib/rancher# curl http://10.43.97.217:8080
^C
root@56a6f444f140:/v...
another question I have is: are the trained models stored (I guess they are stored) in mongodb or in the file system, and which format is used?
AgitatedDove14 unfortunately all attempts to get any response from the webUI failed 😞
(py38) wgo@NVidia-power : ~ $ ping 10.43.138.186
PING 10.43.138.186 (10.43.138.186) 56(84) bytes of data.
^C
--- 10.43.138.186 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms
(py38) wgo@NVidia-power : ~ $ curl http://10.43.97.217:30080
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186...
Hi Martin,
you are right. The Trains-agent is running with the cpu-only option
` (py38) wgo@NVidia-power:~/dev/catwalk$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS
NAMES
b99d5103a43c allegroai/trains-agent-services:latest "/usr/agent/entrypoi…" 2 days ago Up 2 days
...
also the webserver pod's log contains entries
ok, thanks. This is enough information. You don't need to check how much space is provided to the accounts
🙂 but I still need the load balancer ...
never mind, some day I will have it running 😉
redis, mongo and elasticsearch also look ok
AgitatedDove14 the first plotly plot is fine and I added a second one which fails again 😞 I checked that the index is of type str but the remote plot again has integers on the x-axis
The code is on GitHub if you would like to check. I will proceed converting the index back and forth to find a way of getting it running
AgitatedDove14 unfortunately I still have issues with the plot. After removing the first row I get a weird empty remote plot where the axis is a counter instead of a date. It seems not to be clearml related and I need to get more familiar with plotly to analyze it.
Hi AgitatedDove14
seems I used the wrong IP for the API tests.
When contacting the docker's Rancher IP:30080 I get the trains webUI 🙂
strange, I would have expected it to answer also on the address the webserver image got assigned to
root@56a6f444f140:/var/lib/rancher# ping 10.42.0.106
PING 10.42.0.106 (10.42.0.106): 56 data bytes
64 bytes from 10.42.0.106: icmp_seq=0 ttl=64 time=0.063 ms
64 bytes from 10.42.0.106: icmp_seq=1 ttl=64 time=0.064 ms
64 bytes from 10.42.0.106: icmp_seq=2 ttl=6...
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server: http://vmd63828.contaboserver.net:30008
web_server: http://vmd63828.contaboserver.net:30080
files_server: http://vmd63828.contaboserver.net:30081
..}
file_server not(py38) wgo@NVidia-power:~/dev/Trains/trains$ curl `
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>405 Method Not Allowed</title> <h1>Method Not Allowed</h1> <p>The method is not allowed for the requested URL.</p> `
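Since the earlier curl calls simply hung until interrupted, a quick TCP probe with a timeout can help tell "port not reachable" apart from "server answers with an error" (such as the 405 above). The following is only a minimal sketch using the Python standard library; the helper name `is_port_open` and the 3 second default timeout are assumptions for illustration, not part of Trains.

```python
import socket
from urllib.parse import urlsplit


def is_port_open(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds in time.

    Hypothetical helper for illustration; it only checks that something accepts
    connections on the port, not that the service behind it is healthy.
    """
    parts = urlsplit(url)
    # Fall back to the scheme's usual port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `is_port_open("http://vmd63828.contaboserver.net:30008")` would check the api_server entry from the config above; the same call can be repeated for web_server and files_server.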
AgitatedDove14 while playing with (and documenting) the way to run clearml dockerized on the local machine, I noticed that the yml file https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml contains CLEARML_API_HOST: http://apiserver:8008
I duplicated this configuration section (agent-services) and adapted it to run the default queue agent with the image allegroai/clearml-agent:latest
I hoped to have GPU support by this but so far haven't seen the GPU usage li...
` sdk {
# TRAINS - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
default_base_dir: "~/.trains/cache"
}
direct_access: [
# Objects matching are considered to be available for direct access, i.e. they will not be downloaded
# or cached, and any download request will return a direct reference.
# Objects are specified in glob format, available for url and content_ty...
AgitatedDove14 it seems that comparing the plots of the original and the cloned experiment is not possible. Is this a bug on the server?
ok thanks, will need to run some tests later
or do you mean the machine on which I ran the experiment locally?
the apiserver pod reports quite a lot
AgitatedDove14 the index astype(str) did the magic 🙂 thanks
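For anyone hitting the same x-axis issue, the conversion mentioned above looks roughly like this in pandas. This is only a sketch with made-up data: the frame, the column name `value` and the helper `with_string_index` are assumptions for illustration, and the actual code is in the poster's repository.

```python
import pandas as pd

# Hypothetical data standing in for the real frame.
df = pd.DataFrame(
    {"value": [1.0, 2.5, 4.0]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)


def with_string_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame whose index labels are plain strings."""
    out = frame.copy()
    # Convert every index label to str so the plot receives text labels.
    out.index = out.index.astype(str)
    return out


labelled = with_string_index(df)
```

The original frame is left untouched, so it is easy to compare plots made from `df` and from `labelled`.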
regarding the clean-up service, do I need to run this as a cron job, or does the trains server support some kind of add-ons folder that I need to copy the script to?
AgitatedDove14 The problem I have with getting the ingress running ... seems to be caused by the fact that I'm running rancher in single node mode (using a docker image ...) where port 80 is already in use, so the webservice (WebUI) of trains cannot be mapped to the same port ...
Nevertheless I will continue with a real Kubernetes cluster installation and try to get Trains + additional own agents running on it 😉
thanks so far for the support you provided. I will try to collect the i...
As I have to configure my router to forward the requests to my local server, I need to know which ports and protocol settings (I expect TCP not UDP) I have to configure
Thanks for the tweet on Twitter.
The credentials are already deleted
Does this differ from what I sent earlier?