but first I need to understand how parameters are processed. See my last question in my earlier thread: https://app.slack.com/client/TT9ATQXJ5/CTK20V944/thread/CTK20V944-1603740766.425000
to be honest, I don't know if I will find it, as it is a Kubernetes cluster (ok, only 2 nodes) and it might be installed somewhere ...
I will check whether I can find any trains configs on the systems, but they should be the defaults coming with the Helm installer
AgitatedDove14 no, I mean the example worked locally and on the server, but my plot is shown only locally
AgitatedDove14 it seems a comparison of the plots of the original & cloned experiment is not possible. Is this a bug on the server?
api_server and web_server look ok
(py38) wgo@NVidia-power:~/dev/Trains/trains$ curl
{"meta":{"id":"bb5cd73435fb4127b9509ce3a771e95b","trx":"bb5cd73435fb4127b9509ce3a771e95b","endpoint":{"name":"","requested_version":1.0,"actual_version":null},"result_code":400,"result_spath /","error_stack":null},"data":{}}
(py38) wgo@NVidia-power:~/dev/Trains/trains$ curl
<!doctype html>
<html lang="en">
<head> <meta charset="utf-8"> <title>trains</title> <base href="/"> <meta name="vie...
Sounds good :) I'm currently trying to run an orca instance ... but without success
AgitatedDove14 regarding the credentials, will I need to take them out of my trains.conf, or might it be common practice to create a user for such pods instantiating additional workers listening on queues?
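For context, this is roughly how I would pass dedicated credentials to such an agent pod instead of copying my personal trains.conf; the environment variable names are what I understand from the trains-agent docs, and all values are placeholders:

export TRAINS_API_HOST=http://<api-server>:8008      # address of the apiserver, placeholder
export TRAINS_API_ACCESS_KEY=<agent-access-key>      # credentials generated for a dedicated agent user
export TRAINS_API_SECRET_KEY=<agent-secret-key>
trains-agent daemon --queue training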
seems I'm wrong. The queues are there, but the workers are not
AgitatedDove14 unfortunately all tries to get any response from the webUI failed
(py38) wgo@NVidia-power : ~ $ ping 10.43.138.186
PING 10.43.138.186 (10.43.138.186) 56(84) bytes of data.
^C
--- 10.43.138.186 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3062ms
(py38) wgo@NVidia-power : ~ $ curl http://10.43.97.217:30080
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186
^C
(py38) wgo@NVidia-power : ~ $ curl http://10.43.138.186...
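For what it's worth, the 10.43.x.x addresses look like cluster-internal service IPs, so I guess they are not supposed to answer from my workstation anyway; what I will try next (my assumption, service names depend on the Helm release) is a cluster node's real IP together with the NodePort from the service definition:

kubectl get svc -o wide            # look up the webserver service and its NodePort (30080 above)
curl http://<node-ip>:30080        # <node-ip> = LAN address of one of the two nodes, not a 10.43.x.x cluster IP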
but I still need the load balancer ...
never mind, some day I will have it running
As I have to configure my router to forward the requests to my local server, I need to know the ports and protocol settings (I expect TCP, not UDP) I have to configure
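For my notes: assuming the default setup, the Trains server listens on three TCP ports, 8080 (web UI), 8008 (API server) and 8081 (file server). Once the forwarding is in place I should be able to verify it roughly like this (<my-public-address> is a placeholder):

curl http://<my-public-address>:8080    # web server, should return the index.html shown above
curl http://<my-public-address>:8008    # api server, should return the JSON error response shown above
curl http://<my-public-address>:8081    # file server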
sdk {
    # TRAINS - default SDK configuration
    storage {
        cache {
            # Defaults to system temp folder / cache
            default_base_dir: "~/.trains/cache"
        }
        direct_access: [
            # Objects matching are considered to be available for direct access, i.e. they will not be downloaded
            # or cached, and any download request will return a direct reference.
            # Objects are specified in glob format, available for url and content_ty...
another question I have: are the trained models stored (I guess they are stored) in MongoDB or in the file system, and which format is used?
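My current understanding (please correct me if this is wrong) is that the databases only hold the metadata, and the actual weight files go to wherever output_uri points; a minimal sketch with placeholder names:

from trains import Task

# with output_uri set, checkpoints/models saved by the training code are uploaded there,
# while the Trains backend only keeps their metadata and URL
task = Task.init(
    project_name="examples",              # placeholder
    task_name="model storage sketch",     # placeholder
    output_uri="http://localhost:8081",   # e.g. the default file server port
)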
the one I sent you the snippet of, the api {} config?
I think I understand now that the trains.conf has to be located on the node running the trains-agent.
When starting an additional trains-agent that was not instantiated by docker-compose, so it is not part of the same network, I get problems finding the api_server. localhost:8008 for sure will not work. I identified the IP of the server running in docker with docker inspect ... and edited ~/trains.conf to use it, but unfortunately it still cannot find the apiserver
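For reference, this is roughly what the api {} section of ~/trains.conf on the agent machine looks like in my attempt; all addresses and keys are placeholders, and my assumption is that it should point at the docker host (with the published ports) rather than the container-internal IP from docker inspect:

api {
    api_server: http://<docker-host-ip>:8008
    web_server: http://<docker-host-ip>:8080
    files_server: http://<docker-host-ip>:8081
    credentials {
        access_key: "<access-key>"
        secret_key: "<secret-key>"
    }
}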
(py38) wgo@NVidi...
AgitatedDove14 the index astype(str) did the magic, thanks
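In case anyone finds this later, this is the essence of the change (df stands in for my actual table):

import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]})
df.index = df.index.astype(str)   # convert the index to strings before reporting/plotting the table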
Thanks a lot. I will let you know if I managed it :)
with
if task.running_locally(): fig.show()
it works
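To make the pattern explicit, this is roughly what my script does now; project/task names are placeholders and I assume report_plotly is available in the SDK version in use:

from trains import Task
import plotly.graph_objects as go

task = Task.init(project_name="examples", task_name="plot example")   # placeholders
fig = go.Figure(data=go.Scatter(y=[1, 3, 2]))
# send the figure to the server so it shows up in the web UI ...
task.get_logger().report_plotly(title="my plot", series="demo", iteration=0, figure=fig)
# ... and only open a local browser window when not executed by an agent
if task.running_locally():
    fig.show()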
thank you for the support
I ran a local (not dockerized) trains-agent:
trains-agent daemon --queue training --create-queue --foreground
which enabled me to see the GPU load on the corresponding view
Now I got another issue.
It seems when cloning an experiment, a virtual environment is created with all the modules that were identified as being used. Inside this environment the experiment is run.
Am I right?
Is this the case only for clones?
In my Python code I'm trying to read a pandas table which I stored i...
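The relevant part of the code is essentially this (the file path is a placeholder):

import pandas as pd

# reading the table back needs a parquet engine (fastparquet in my case) inside the agent's venv
df = pd.read_parquet("data/table.parquet", engine="fastparquet")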
after adding the
import fastparquet
statement to the code, the reconstruction of a clone is working
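(If I understood the docs correctly, an alternative to the dummy import would be to declare the requirement explicitly before Task.init; a hedged sketch with placeholder names:)

from trains import Task

Task.add_requirements("fastparquet")   # must be called before Task.init to end up in the recorded packages
task = Task.init(project_name="examples", task_name="modeller")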
Summary - installed python packages:
...
- fastparquet==0.4.1
...
Environment setup completed successfully
Starting Task Execution:
...
modeller.py: error: the following arguments are required: --algorithm
unfortunately it raises the next issue.
If the script being used expects to get parameters via the command line (which in Trains experiments are identified and stored as parameters when using...
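My working assumption (not verified) is that the argument needs a recorded value the agent can inject back, e.g. a default instead of required=True; roughly:

from argparse import ArgumentParser
from trains import Task

task = Task.init(project_name="examples", task_name="modeller")   # placeholder names
parser = ArgumentParser()
# a default gets recorded as a hyperparameter, so the cloned task has a value to inject;
# with required=True and no stored value the agent run fails as above
parser.add_argument("--algorithm", default="xgboost")
args = parser.parse_args()
print(args.algorithm)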
the server name is correct, I have been able to upload the example ...
also the webserver pod's log contains entries