
Hi, I run the Trains server in a Docker container and have started making use of tasks...
My tests are shown on the Projects dashboard, which is really cool.
What I haven't found so far is a way to clean up the system after the tests I did. I'm able to archive the experiments, but is there also a way to delete them, or to wipe complete projects?
Another point I noticed is that in the Workers & Queues view, GPU usage is not being reported. Only CPU usage is displayed. Do I need to configure the Docker image somehow to make the GPU load visible as well? In a shell I can see significant GPU usage with nvtop while running experiments, but nothing (not even 0 load) in the workers view.

  
  
Posted 4 years ago

Answers 17


Another question I have: are the trained models stored (I guess they are) in MongoDB or in the file system, and which format is used?

  
  
Posted 4 years ago

WickedGoat98
The trains-agent-services Docker container always runs in CPU mode; the idea is to put long-lasting services there (like the auto cleanup, Slack integration, HPO, etc.).
To spin up an agent with GPU on any machine (regardless of where the trains-server runs), check the trains-agent README:
https://github.com/allegroai/trains-agent#running-the-trains-agent

  
  
Posted 4 years ago

Hi Martin,
you are right: the trains-agent is running with the --cpu-only option.
    (py38) wgo@NVidia-power:~/dev/catwalk$ docker ps
    CONTAINER ID   IMAGE                                                 COMMAND                  CREATED      STATUS      PORTS                                           NAMES
    b99d5103a43c   allegroai/trains-agent-services:latest                "/usr/agent/entrypoi…"   2 days ago   Up 2 days                                                   trains-agent-services
    16d20b75acf9   allegroai/trains:latest                               "/opt/trains/wrapper…"   2 days ago   Up 2 days   8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp   trains-webserver
    205af33b09b1   allegroai/trains:latest                               "/opt/trains/wrapper…"   2 days ago   Up 2 days   0.0.0.0:8008->8008/tcp, 8080-8081/tcp           trains-apiserver
    695f57cd5b16   allegroai/trains:latest                               "/opt/trains/wrapper…"   2 days ago   Up 2 days   8008/tcp, 8080/tcp, 0.0.0.0:8081->8081/tcp      trains-fileserver
    9e85517ec9f7   redis:5.0                                             "docker-entrypoint.s…"   2 days ago   Up 2 days   0.0.0.0:6379->6379/tcp                          trains-redis
    9719ab098a42   docker.elastic.co/elasticsearch/elasticsearch:7.6.2   "/usr/local/bin/dock…"   2 days ago   Up 2 days   0.0.0.0:9200->9200/tcp, 9300/tcp                trains-elastic
    17f250415e92   mongo:3.6.5                                           "docker-entrypoint.s…"   2 days ago   Up 2 days   0.0.0.0:27017->27017/tcp                        trains-mongo
    (py38) wgo@NVidia-power:~/dev/catwalk$ docker exec -it b99d5103a43c bash
    root@b99d5103a43c:/usr/agent# ps ax
      PID TTY      STAT   TIME COMMAND
        1 ?        Ss     0:00 /bin/sh /usr/agent/entrypoint.sh
       11 ?        Sl    18:39 /usr/bin/python3 /usr/local/bin/trains-agent daemon --services-mode --queue services --create-queue --docker ubuntu:18.04 --cpu-only
       17 pts/0    Ss     0:00 bash
       31 pts/0    R+     0:00 ps ax
    root@b99d5103a43c:/usr/agent#
I followed the instructions at https://allegro.ai/docs/deploying_trains/trains_server_linux_mac/ to run it in Docker.
Unfortunately, I can't find any information on how to configure the container.

  
  
Posted 4 years ago

  • How can I enable TensorBoard and have the graphs stored in Trains?
  
  
Posted 4 years ago

OK, thanks. I will need to run some tests later.

  
  
Posted 4 years ago

Sorry, but I don't understand how the cloned experiment is provided with parameters.
A task that is cloned by Trains might get its parameters via task.set_parameters(dict);
these parameters come from some magic analysis of the argparse used in the script.
AgitatedDove14, when is the call to set_parameter(...) performed? Is the argparse call somehow redirected so that it receives the data from Trains instead of from sys.argv (or wherever argparse gets it from)? If so, why does my cloned experiment report missing mandatory arguments?
    Starting Task Execution:
    TRAINS results page:
    usage: modeller.py [-h] [-v VERBOSE] [-s MONGODB_SERVER] [-a ASSET] [-d DATABASE] [-f FEATURE_COLLECTION] [-t TARGET_COLLECTION] [-c NR_CORES] [-m MODEL_ROOT] --algorithm ALGORITHM [ALGORITHM ...] [--use_trains] [--epochs EPOCHS] [--tracing]
    modeller.py: error: the following arguments are required: --algorithm

  
  
Posted 4 years ago

are the trained models stored ...

MongoDB stores the URL links; the upload itself is controlled via the output_uri argument to the Task.
If None is provided, Trains logs the locally stored model (i.e. a link to where you stored your model). If you provide one, Trains automatically uploads the model (into a new subfolder) and stores the link to that subfolder.
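To make the two cases concrete, here is a toy model in plain Python of which link ends up recorded. It is not the Trains API: `record_model_link`, the `subfolder` argument, and the way the URI is joined are hypothetical, chosen only to mirror the description above. In real code the value would simply be passed as the output_uri argument when the Task is initialized.

```python
from pathlib import PurePosixPath
from typing import Optional


def record_model_link(local_path: str, output_uri: Optional[str], subfolder: str) -> str:
    """Toy model of the link stored for a model (hypothetical, not the Trains API).

    - output_uri is None: nothing is uploaded, the link points to the local file.
    - output_uri is set: the file is (conceptually) uploaded into a new
      subfolder under output_uri, and the link points to that copy.
    """
    if output_uri is None:
        # Only the local location is recorded.
        return "file://" + str(PurePosixPath(local_path))
    filename = PurePosixPath(local_path).name
    # Join with "/" so that URI schemes such as s3:// are left intact.
    return "/".join([output_uri.rstrip("/"), subfolder, filename])


if __name__ == "__main__":
    print(record_model_link("/home/user/models/model.pkl", None, "exp_1"))
    print(record_model_link("/home/user/models/model.pkl", "s3://bucket/models", "exp_1"))
```

Either way, only a link is kept in the database; the difference is whether it points at your local disk or at an uploaded copy.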

  • How can I enable TensorBoard and have the graphs stored in Trains?

Basically, once you call Task.init, all your TensorBoard (TB) output is automatically logged by Trains as well (you obviously still have the TB files locally).

  
  
Posted 4 years ago

WickedGoat98, the mechanism of cloning and parameter overriding works only when the trains-agent launches the experiment. Think of it this way:
Manual execution: Trains sends data to the server
Automatic (trains-agent) execution: Trains pulls data from the server
This applies to argparse as well as to connect and connect_configuration.
The Trains code itself behaves differently when it is executed from the trains-agent context.
Does that help clear things up?
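The push/pull distinction described above can be modeled in a few lines of plain Python. This is a toy illustration, not the Trains implementation: `ToyServer`, `connect_params`, and the `running_remotely` flag are hypothetical stand-ins for the server's parameter store and for "was this process launched by trains-agent".

```python
from typing import Dict


class ToyServer:
    """Stand-in for the server's per-task parameter store (hypothetical)."""

    def __init__(self) -> None:
        self.params: Dict[str, Dict[str, str]] = {}


def connect_params(task_id: str, local: Dict[str, str], server: ToyServer,
                   running_remotely: bool) -> Dict[str, str]:
    """Toy version of the two execution modes.

    Manual run: local values are sent to the server and used as-is.
    Agent run: values stored on the server (possibly edited in the UI
    after cloning) override the local ones.
    """
    if not running_remotely:
        server.params[task_id] = dict(local)  # push
        return dict(local)
    merged = dict(local)
    merged.update(server.params.get(task_id, {}))  # pull and override
    return merged


if __name__ == "__main__":
    server = ToyServer()
    defaults = {"epochs": "10", "algorithm": "rf"}
    # 1) Manual execution pushes the script's values to the server.
    print(connect_params("orig", defaults, server, running_remotely=False))
    # 2) The experiment is cloned and a parameter is edited in the web UI.
    server.params["clone"] = {**server.params["orig"], "epochs": "50"}
    # 3) The agent runs the clone: the edited value wins over the script default.
    print(connect_params("clone", defaults, server, running_remotely=True))
```

This is also why editing a clone has no effect when the script is simply run by hand: in that mode nothing is pulled from the server.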

  
  
Posted 4 years ago

Another point I noticed is that in the Workers & Queues view, GPU usage is not being reported

It should be reported. If it is not, maybe you are running the trains-agent in CPU mode? (Try adding --gpus.)
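For reference, the sketch below assembles a trains-agent daemon command line from the flags that appear in this thread (--queue, --create-queue, --cpu-only, --gpus, --foreground). `agent_command` is a hypothetical helper, and the device value passed to --gpus (e.g. "0") is an assumption; check the trains-agent README for the exact format your version expects.

```python
import shlex
from typing import List, Optional


def agent_command(queue: str, gpus: Optional[str] = None,
                  foreground: bool = True, create_queue: bool = True) -> List[str]:
    """Build a trains-agent daemon command line (hypothetical helper).

    gpus=None falls back to --cpu-only, like the services container;
    otherwise --gpus is passed with the given device spec (format assumed).
    """
    cmd = ["trains-agent", "daemon", "--queue", queue]
    if create_queue:
        cmd.append("--create-queue")
    if gpus is None:
        cmd.append("--cpu-only")
    else:
        cmd += ["--gpus", gpus]
    if foreground:
        cmd.append("--foreground")
    return cmd


if __name__ == "__main__":
    # Print rather than run; pass the list to subprocess.run() to start the agent.
    print(shlex.join(agent_command("training", gpus="0")))
```

Keeping the command as a list avoids shell-quoting issues if it is later handed to subprocess.run.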

  
  
Posted 4 years ago

OK, will read it later.

  
  
Posted 4 years ago

Hi WickedGoat98

but is there also a way to delete them, or wipe complete projects?

https://github.com/allegroai/trains/issues/16

Auto cleanup service here:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
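As far as we understand the linked example, its core idea is to remove archived experiments once they are older than a grace period. The stdlib sketch below shows only that selection step; `TaskRecord`, its fields, and `tasks_to_delete` are hypothetical, and the actual deletion (an API call in the real service) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class TaskRecord:
    """Minimal stand-in for an experiment record (hypothetical fields)."""
    task_id: str
    archived: bool
    last_change: datetime


def tasks_to_delete(tasks: Iterable[TaskRecord], days: int, now: datetime) -> List[str]:
    """Select archived experiments whose last change is older than `days` days."""
    cutoff = now - timedelta(days=days)
    return [t.task_id for t in tasks if t.archived and t.last_change < cutoff]


if __name__ == "__main__":
    now = datetime(2020, 6, 1, tzinfo=timezone.utc)
    tasks = [
        TaskRecord("a", archived=True, last_change=now - timedelta(days=45)),
        TaskRecord("b", archived=True, last_change=now - timedelta(days=3)),
        TaskRecord("c", archived=False, last_change=now - timedelta(days=90)),
    ]
    print(tasks_to_delete(tasks, days=30, now=now))
```

Only archived experiments are candidates, so archiving remains the "soft delete" step and the service performs the hard delete later.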

  
  
Posted 4 years ago

Well, I managed to clone an experiment and adapt its parameters on the Trains server via the browser.
If argparse is used, no parameter should be defined as required. Instead, the script has to check the arguments after parsing and terminate if something mandatory is missing.
Doing so worked fine for me 😁, at least for this part of the work. Now fastparquet and missing packages are failing again...
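The workaround described above can be sketched with the standard argparse module: declare the argument without required=True and enforce it after parsing. The argument names follow the modeller.py usage line shown earlier in the thread; everything else (the default values, the error message) is illustrative.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="modeller.py")
    # Deliberately NOT required=True: in this thread, a required argparse
    # argument made the cloned run fail with
    # "the following arguments are required: --algorithm".
    parser.add_argument("--algorithm", nargs="+", default=None)
    parser.add_argument("--epochs", type=int, default=10)
    args = parser.parse_args(argv)
    # Enforce the mandatory argument ourselves, after parsing.
    if not args.algorithm:
        parser.error("--algorithm is mandatory")  # prints usage, exits with code 2
    return args


if __name__ == "__main__":
    print(parse_args(["--algorithm", "rf", "xgb", "--epochs", "5"]))
```

parser.error keeps the familiar usage message and exit code, so the script behaves the same for a human user while no longer tripping argparse's own required-argument check.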

  
  
Posted 4 years ago

After adding an
import fastparquet
statement to the code, reconstructing a clone works:

    Summary - installed python packages:
    ...
    - fastparquet==0.4.1
    ...
    Environment setup completed successfully
    Starting Task Execution:
    ...
    modeller.py: error: the following arguments are required: --algorithm

Unfortunately, this raises the next issue.
If the script expects to get its parameters via the command line (which Trains identifies and stores as experiment parameters when argparse is used), it fails to start 😞
I'm sure you have a solution for this.
I could add an option enabling Trains to provide the parameters after command-line parsing, but how are the parameters fed to the script?
  
  
Posted 4 years ago

Thanks, Martin.

  
  
Posted 4 years ago

Regarding the cleanup service: do I need to run it as a cron job, or does the Trains server support some kind of add-on mechanism that I need to copy the script into?

  
  
Posted 4 years ago

BTW: at https://allegro.ai/docs/task.html#task.Task.enqueue, the link to 'Use Case Examples' is broken.

  
  
Posted 4 years ago

I ran a local (not dockerized) trains-agent:
trains-agent daemon --queue training --create-queue --foreground
which enabled me to see the GPU load in the corresponding view 🙂

Now I have another issue.
It seems that when an experiment is cloned, a virtual environment is created with all the modules identified as being used, and the experiment runs inside this environment.
Am I right?
Is this the case only for clones?

In my Python code I'm trying to read a pandas table that I stored in Parquet format. Unfortunately, when running the clone (with a changed parameter) I get an exception caused by a missing package:

    raise ImportError(
    ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
    A suitable version of pyarrow or fastparquet is required for parquet support.
    Trying to import the above resulted in these errors:
     - Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
     - Missing optional dependency 'fastparquet'. fastparquet is required for parquet support. Use pip or conda to install fastparquet.

I also had this on my development system when I started using the Parquet format (https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#parquet). Pandas needs a backend installed to be able to handle the Parquet format. Locally I'm using fastparquet (https://fastparquet.readthedocs.io/en/latest/install.html), which pandas loads on demand. So I haven't added an explicit import fastparquet to the code (I will do this soon to see if it resolves the exception).
But I wondered why the exception is raised only on cloned experiments.
While writing this, I think I understand it now. Running a script locally uses whatever has been installed locally, and by instantiating a task the streams are redirected, configurations are analyzed and stored, ...
When experiments are cloned, they are reconstructed from this information and run in an isolated environment. If needed packages have not been identified as such, they are missing ...
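The reasoning above can be illustrated with a static import scan. This is a simplified sketch under an assumption, namely that package detection works by looking at the script's import statements (which is what the explicit-import fix in this thread suggests); it is not Trains' actual detection code, and `top_level_imports` is a hypothetical helper. A scanner like this sees pandas but not the Parquet engine that pandas loads on demand.

```python
import ast
from typing import Set


def top_level_imports(source: str) -> Set[str]:
    """Return the top-level module names imported explicitly in `source`."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Skip relative imports such as "from . import x".
            found.add(node.module.split(".")[0])
    return found


BEFORE = """
import pandas as pd
df = pd.read_parquet("table.parquet")  # pandas picks the engine at runtime
"""

# Adding the import explicitly makes the dependency visible to the scan.
AFTER = "import fastparquet  # noqa: F401\n" + BEFORE

if __name__ == "__main__":
    print(sorted(top_level_imports(BEFORE)))
    print(sorted(top_level_imports(AFTER)))
```

The code is only parsed, never executed, so read_parquet is not actually called; that is exactly why a dependency resolved at runtime stays invisible.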

Well, really cool stuff, this Trains product 👍
Looking forward to diving deeper into it.

  
  
Posted 4 years ago