I need to read about the PipelineController. At first glance, the example looks like what I would like to do.
I would like to schedule multiple actions, e.g. run the same script 30 times with different parameters. It looks like add_step is what I will need,
but first I need to understand how parameters are processed. See my last question in my earlier thread: https://app.slack.com/client/TT9ATQXJ5/CTK20V944/thread/CTK20V944-1603740766.425000
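A minimal sketch of the scheduling idea above: build one step definition per parameter set and pass each one to `add_step`. The keyword names (`name`, `base_task_project`, `base_task_name`, `parameter_override`) follow the PipelineController example as far as I understand it, and the `Args/` prefix for argparse parameters is an assumption that may depend on the Trains version. The project name, task name, and parameter values are hypothetical. The controller is passed in as an argument, so the sketch runs without a server.

```python
from typing import Any, Dict, Iterable, List


def build_steps(
    project: str,
    task_name: str,
    param_sets: Iterable[Dict[str, Any]],
    section: str = "Args",
) -> List[Dict[str, Any]]:
    """Return one add_step() keyword dict per parameter set."""
    steps = []
    for index, params in enumerate(param_sets):
        steps.append(
            {
                "name": f"run_{index:02d}",
                "base_task_project": project,
                "base_task_name": task_name,
                # Assumed key format: "<section>/<argparse name>".
                "parameter_override": {
                    f"{section}/{key}": value for key, value in params.items()
                },
            }
        )
    return steps


def register_steps(pipeline: Any, steps: Iterable[Dict[str, Any]]) -> int:
    """Call pipeline.add_step(**step) for every step and return the count."""
    count = 0
    for step in steps:
        pipeline.add_step(**step)
        count += 1
    return count


# Hypothetical parameter grid: 30 runs of the same script with different seeds.
param_sets = [{"algorithm": "rf", "seed": seed} for seed in range(30)]
steps = build_steps("catwalk", "modeller", param_sets)
```

With a real controller object, `register_steps(pipe, steps)` would be called before the pipeline is started; whether the steps then run in parallel or in sequence depends on the `parents` argument, which this sketch leaves out.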
Hi Martin,
You are right. The trains-agent is running with the cpu-only option.
` (py38) wgo@NVidia-power:~/dev/catwalk$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS
NAMES
b99d5103a43c allegroai/trains-agent-services:latest "/usr/agent/entrypoi…" 2 days ago Up 2 days
...
Regarding the clean-up service: do I need to run it as a cron job, or does the Trains server support some kind of add-ons to which I need to copy the script?
I ran a local (not dockerized) trains-agent with `trains-agent daemon --queue training --create-queue --foreground`
which enabled me to see the GPU load on the corresponding view 🙂
Now I got another issue.
It seems that when an experiment is cloned, a virtual environment is created with all the modules that were identified as being used. The experiment then runs inside this environment.
Am I right?
Is this the case only for clones?
In my Python code I'm trying to read a pandas table which I stored i...
after adding the
import fastparquet
statement to the code, the reconstruction of a clone is working
` Summary - installed python packages:
...
- fastparquet==0.4.1
...
Environment setup completed successfully
Starting Task Execution:
...
modeller.py: error: the following arguments are required: --algorithm `
Unfortunately, it raises the next issue.
If the script being used expects to get its parameters via the command line (which in Trains experiments are identified and stored as parameters when using...
- How can I enable TensorBoard and have the graphs stored in Trains?
Another question I have: are the trained models stored (I guess they are) in the MongoDB or in the file system, and which format is used?
btw: at https://allegro.ai/docs/task.html#task.Task.enqueue the link to the 'Use Case Examples' is broken
ok thanks, will need to run some tests later
Sorry, but I don't understand how the cloned experiment is provided with parameters.
A task which has been cloned by Trains might get its parameters via task.set_parameters(dict).
These parameters come from some magic analysis of the argparse used in the script.
AgitatedDove14 when is the call to set_parameters(...) performed? Is the argparse call somehow redirected so that it receives the data from Trains instead of getting them via sys.argv or wherever argparse is gettin...
Well, I managed to clone an experiment and adapt its parameters on the Trains server via the browser.
If argparse is being used, no parameter may be defined as required. Instead, the script has to check after parsing whether something mandatory is missing, and terminate if so.
Doing so worked fine for me 😁 at least for this part of work. Now fastparquet and missing packages are failing again...
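A minimal sketch of the argparse workaround described above: the argument is declared without `required=True`, so parsing succeeds even with an empty command line, and the script itself checks for missing mandatory values afterwards. The `--algorithm` option is taken from the error message earlier in this thread; the helper names and the exit code are illustrative assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line without marking any argument as required."""
    parser = argparse.ArgumentParser(prog="modeller.py")
    # A default of None (instead of required=True) lets parsing succeed
    # when the value is supplied later rather than on the command line.
    parser.add_argument("--algorithm", default=None)
    return parser.parse_args(argv)


def check_mandatory(args: argparse.Namespace, names: List[str]) -> List[str]:
    """Return the names of mandatory parameters that are still unset."""
    return [name for name in names if getattr(args, name) is None]


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    missing = check_mandatory(args, ["algorithm"])
    if missing:
        # Terminate in the script instead of letting argparse raise the error.
        print("missing mandatory parameters: " + ", ".join(missing))
        return 2
    print(f"running with algorithm={args.algorithm}")
    return 0
```

In the real script, `main()` would be called without arguments so that argparse reads `sys.argv`, and its return value would be passed to `sys.exit`.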
the picture seemed to be missing.
Sorry, I tried but can't upload the picture here, so I'm adding a link to it: https://drive.google.com/file/d/1HYYKDOY09hnE-DeCTPdZXpKy7537g5Ka/view?usp=sharing
Hi Martin,
thanks for the reply.
The data I'm syncing comes from a data provider which supports only an FTP connection....
I started a Rancher training and will need some time to set up my cluster before I can start using Trains ;)
Cool
I'm already impressed about what Trains does with just 2 lines of code
To be honest, I don't know if I will find it, as it is a Kubernetes cluster (OK, only 2 nodes) and might be installed somewhere ...
I will check whether I can find any Trains configs on the systems, but they should be the defaults coming with the Helm installer
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server: http://vmd63828.contaboserver.net:30008
web_server: http://vmd63828.contaboserver.net:30080
files_server: http://vmd63828.contaboserver.net:30081
..}
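A hedged sketch of setting the same server addresses from Python through environment variables instead of the config file. The host variable names (`TRAINS_API_HOST`, `TRAINS_WEB_HOST`, `TRAINS_FILES_HOST`) are my assumption about the SDK's environment overrides; the two credential variables are the ones named in the config comments later in this thread. The credential values are placeholders, and the helper itself is hypothetical.

```python
import os
from typing import Dict, MutableMapping, Optional

# Config key -> environment variable (host names are assumed, see above).
ENV_NAMES = {
    "api_server": "TRAINS_API_HOST",
    "web_server": "TRAINS_WEB_HOST",
    "files_server": "TRAINS_FILES_HOST",
    "access_key": "TRAINS_API_ACCESS_KEY",
    "secret_key": "TRAINS_API_SECRET_KEY",
}


def export_trains_env(
    settings: Dict[str, str],
    env: Optional[MutableMapping[str, str]] = None,
) -> MutableMapping[str, str]:
    """Copy the settings into env (os.environ by default) and return it."""
    target = os.environ if env is None else env
    for key, value in settings.items():
        if key not in ENV_NAMES:
            raise KeyError(f"unknown setting: {key}")
        target[ENV_NAMES[key]] = value
    return target


# Written into a plain dict here, so the real environment stays untouched.
example_env = export_trains_env(
    {
        "api_server": "http://vmd63828.contaboserver.net:30008",
        "web_server": "http://vmd63828.contaboserver.net:30080",
        "files_server": "http://vmd63828.contaboserver.net:30081",
        "access_key": "<access key from the web app profile page>",
        "secret_key": "<secret key from the web app profile page>",
    },
    env={},
)
```

Calling `export_trains_env(settings)` without the `env` argument would write to `os.environ`; presumably this has to happen before the Trains SDK is first used in the process.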
Or do you mean the machine on which I ran the experiment locally?
The one for which I sent you the snippet of the api {} config?
` sdk {
# TRAINS - default SDK configuration
storage {
cache {
# Defaults to system temp folder / cache
default_base_dir: "~/.trains/cache"
}
direct_access: [
# Objects matching are considered to be available for direct access, i.e. they will not be downloaded
# or cached, and any download request will return a direct reference.
# Objects are specified in glob format, available for url and content_ty...
# TRAINS SDK configuration file
api {
# Notice: 'host' is the api server (default port 8008), not the web server.
api_server:
web_server:
files_server:
`
# Credentials are generated using the webapp, /profile
# Override with os environment: TRAINS_API_ACCESS_KEY / TRAINS_API_SECRET_KEY
credentials {....}
}
sdk {
# TRAINS - default SDK configuration
`
the server name is correct, I have been able to upload the example ...