
AgitatedDove14 while playing with (and documenting) the way to run clearml dockerized on the local machine, I noticed that the yml file https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml contains CLEARML_API_HOST:
http://apiserver:8008
I duplicated this configuration (agent-services) section and adapted it to run an agent on the default queue with the image allegroai/clearml-agent:latest
I hoped this would give me GPU support, but so far I haven't seen the GPU usage li...
to be honest, I don't know if I will find it, as it is a Kubernetes cluster (ok, only 2 nodes) and it might be installed somewhere ...
I will check whether I can find any trains configs on the systems, but they should be the defaults coming with the Helm installer
I have been able to make use of
image: allegroai/trains-agent:latest
in the docker-compose.yml file 🎉
now I will focus on getting it working on Rancher
stay tuned
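For reference, a minimal sketch of what that duplicated compose section could look like, based only on what is described above; everything except the image name and CLEARML_API_HOST is my assumption (service name, credentials handling, queue argument), not copied from the official file:
`
# hypothetical extra service, modelled on the agent-services entry
agent-default:
  image: allegroai/trains-agent:latest
  restart: unless-stopped
  environment:
    CLEARML_API_HOST: http://apiserver:8008
    CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}
    CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}
  # assuming the image entrypoint forwards arguments to trains-agent
  command: daemon --queue default --docker
  networks:
    - backend
  depends_on:
    - apiserver
`
Note that GPU access would additionally need the NVIDIA container runtime configured on the host; plain docker-compose does not pass GPUs through by itself, which might explain the missing GPU usage mentioned earlier.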
` sdk {
# TRAINS - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
default_base_dir: "~/.trains/cache"
}
direct_access: [
# Objects matching are considered to be available for direct access, i.e. they will not be downloaded
# or cached, and any download request will return a direct reference.
# Objects are specified in glob format, available for url and content_ty...
AgitatedDove14 ok, but how do I deploy a trains-agent?
ok, thanks. This is enough information. You don't need to check how much space is provided to the accounts
AgitatedDove14 unfortunately I still have issues with the plot. After removing the first row I get a weird empty remote plot where the axis is a counter instead of a date. It seems not to be clearml-related, and I need to get more in touch with plotly to analyze it.
- how can I enable tensorboard and have the graphs stored in trains?
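As far as I understand from the trains examples, once a task is created trains hooks the TensorBoard writers automatically, so nothing beyond Task.init() should be needed; a minimal sketch (project and task names are placeholders of mine):
`
from trains import Task
from torch.utils.tensorboard import SummaryWriter

# creating the task is enough; trains patches TensorBoard logging from here on
task = Task.init(project_name="examples", task_name="tensorboard test")

writer = SummaryWriter(log_dir="./runs")
for step in range(10):
    # scalars written to TensorBoard also show up in the experiment's SCALARS tab
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
`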
Thanks for the Twitter tweet.
The credentials are already deleted
Thanks. I wanted to finalize it, as it has already taken me much longer than I expected
ok thanks, will need to run some tests later
Hi Martin,
you are right. The trains-agent is running with the cpu-only option
` (py38) wgo@NVidia-power:~/dev/catwalk$ docker ps
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
b99d5103a43c allegroai/trains-agent-services:latest "/usr/agent/entrypoi…" 2 days ago Up 2 days
...
Does this differ from what I sent earlier?
AgitatedDove14 regarding the credentials, will I need to take them out of my trains.conf, or might it be common practice to create a dedicated user for such pods that instantiate additional workers listening on queues?
AgitatedDove14 not sure how to make use of such a config / where to add it.
Should it be added to the docker image when building my own, can I set it in the Web GUI as a property of the experiment I cloned, or should it be added in the original script, and if so, what kind of variable is 'agent'?
yes, this works, but just for completeness I wanted to add it to the composition ... never mind, maybe too much detail for an article 😉
As I have to configure my router to forward the requests to my local server, I need to know the ports and protocol settings (I expect TCP, not UDP) I have to configure
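For my own notes, these are the port mappings I believe the default docker-compose.yml exposes (all TCP), which should be the candidates for the router forwarding rules; please double-check against the actual file:
`
8080 -> webserver   (Web UI)
8008 -> apiserver   (API, the CLEARML_API_HOST mentioned above)
8081 -> fileserver  (artifacts / debug samples)
`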
AgitatedDove14 ok, and how much storage is an account allowed to use? Once the limit is reached, will the oldest experiments be deleted?
AgitatedDove14 I tried editing the ~/trains.conf on the system where I start the dockerized trains server & agent, but without success.
I tried to add the script you provided inside the api and sdk scopes as well as outside everything; the result is still the same, wget is missing :( api { ... <here> } sdk { ... <here> } <and here>
I'm quite sure I need to edit the trains config file inside a docker container, but this will be part of the ..., and even if I were able to change it, that's not the solution I'm lo...
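If the snippet in question was meant for installing wget inside the agent's docker, my current guess (untested) is that it belongs in a top-level agent section of trains.conf rather than under api or sdk; extra_docker_shell_script is the key I found in the agent's example config, so treat the exact name as an assumption:
`
# ~/trains.conf on the machine (or inside the container) running the agent
api {
    # ... server address and credentials ...
}
sdk {
    # ... sdk settings ...
}
agent {
    # shell commands executed inside the experiment docker before the task starts
    extra_docker_shell_script: ["apt-get update", "apt-get install -y wget"]
}
`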
I'm quite new to Kubernetes. What I have found is that the ports I expected are in use
` root@vmd62521:~# kubectl get services -n trains
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mongo-service ClusterIP 10.43.99.44 <none> 27017/TCP 25h
webserver-service NodePort 10.43.49.21 <none> 80:30080/TCP 25h
redis ClusterIP 10.43.62.222 <none> 6379/TCP 25h
elasticsearch-service Clust...
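In case it helps anyone reading along, a quick way to reach one of those ClusterIP/NodePort services from the outside without touching the Helm chart is port-forwarding; the service name below is taken from the output above (the apiserver entry is cut off, so I can only guess its name):
`
# forward the web UI to localhost:8080 (webserver-service maps port 80, NodePort 30080)
kubectl -n trains port-forward service/webserver-service 8080:80
`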
api_server and web_server look ok
`
(py38) wgo@NVidia-power:~/dev/Trains/trains$ curl
{"meta":{"id":"bb5cd73435fb4127b9509ce3a771e95b","trx":"bb5cd73435fb4127b9509ce3a771e95b","endpoint":{"name":"","requested_version":1.0,"actual_version":null},"result_code":400,"result_spath /","error_stack":null},"data":{}}
(py38) wgo@NVidia-power:~/dev/Trains/trains$ curl
<!doctype html>
<html lang="en">
<head> <meta charset="utf-8"> <title>trains</title> <base href="/"> <meta name="vie...
`
AgitatedDove14 so far not, I just reuse the docker image as it is, and it is not using the gpu parameter at all. The next step will be to create my own image running the agent with this parameter, but then I faced the error messages and the url http://apiserver:8008 which I don't understand
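For that next step, my understanding (to be double-checked) is that GPU selection is a flag on the agent daemon itself, so the custom image would roughly run something along these lines; the queue name and the exact flag spelling are assumptions from the agent's --help, not verified on this setup:
`
# inside the custom image / entrypoint, roughly:
trains-agent daemon --queue default --docker --gpus all
`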