Reputation
Badges 1
25 × Eureka!We should probably have a section on that (i.e. running two agents on the same GPU, then explain how top use it)
TRAINS_WORKER_NAME=first_agent trains-agent --gpus 0
andTRAINS_WORKER_NAME=second_agent trains-agent --gpus 0
MysteriousBee56 okay look for the folder ~/.trains/vcs_cache you will find the git repo there, just overwrite the content with your local copy
MysteriousBee56 what do you mean "delete a worker"
stop the agent running remotely ?
I think this one is on us, I don't think a search would have led you to the correct answer ...
I'll try to make sure they add something regrading the configuration π
MysteriousBee56
Well we don't want to ask sudo permission automatically, and usually setups do no change, but you can diffidently call this one before running the agent πsudo chmod 777 -R ~/.trains/
MysteriousBee56 Edit in your ~/trains.conf:api_server:
http://localhost:8008
toapi_server:
http://192.168.1.11:8008
and obliviously the same for web & files
I'll make sure we fix the trains-agent to output an error message instead of trying to silently keep accessing the API server
Getting you machine ip:
just run :ifconfig | grep 'inet addr:'
Then you should see a bunch of lines, pick the one that does not start with 127 or 172
Then to verify runping <my_ip_here>
BTW, we figure out thatΒ Β
'
Β is belong the echo
yep, when seeing the full command it is apparent
ProudMosquito87 Just a few pointers on how we convert the TB histograms to awesome (but less accurate) 3D surfaces.
First I have to admit, I almost never use these histograms, maybe to detect a plateau of if something goes really wrong...
The 3D surface is basically grouping all the histograms and then bucketing them (I think the default is 50 buckets) so that you get a general feel of what's going on, not necessary a detailed view. Bottom line, you are correct, the TB is the source of truth...
PompousParrot44 please try to reply on the thread, so we do not create a mess in the main channel π
What's the "working directory" in the execution section? Do you have package "test" in the installed packages?
PompousParrot44 I assume the folder structure is something like:
repo_root:
--> test
-----> scripts
If this is the case, make sure the ""working directory" is .
which means repository root
Let me know if you managed to get it working, then we can see if we can detect it automatically.
PompousParrot44 obviously you can just archive a task and run the cleanup service, it will actually delete archived tasks older than X days.
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
PompousParrot44 these are the default plotly colors. You can change any of the layout properties with the
https://github.com/allegroai/trains/blob/65a4aa7aa90fc867993cf0d5e36c214e6c044270/trains/logger.py#L600
PompousParrot44 , so you mean like a base conda env?
Configuring trains-agent to use conda is done here:
https://github.com/allegroai/trains-agent/blob/699d13bbb34649c7e5337b4187cda59b7fa6fd33/docs/trains.conf#L44
Then for every experiment trains-agent will create a new conda environment based on the requirements of that experiment.
You can tell it to inherit the base conda env (or the one it is running from, I think) by settingsystem_site_packages: true
https://github.com/allegroai/tr...
PompousParrot44
It should still create a new venv, but inherit the packages from the system-wide (or specific venv) installed packages. Meaning it will not reinstalled packages you already installed, but it will ive you the option of just replacing a specific package (or install a new one) without reinstalling the entire venv
hmm... try to run the trains-agent from the ml
environment with "system_site_packages: true", it might do the trick. Anyhow please let me know if it worked π
CloudyHamster42 you mean that when you set sdk.metrics.tensorboard_single_series_per_graph
to True and you rerun the experiment, you are still getting multiple series on the same graph?
What's your Trains version?
CloudyHamster42
RC probably in a few days, but notice that it will just remove the warnings, I still can't reproduce the double axis issue.
It will be helpful if you could send a small script to reproduce the problem.
Maybe this example code can help ? https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py
CloudyHamster42 FYI the warning will not be shown in the next Trains version, the issue is now fixed, thank you π
Regrading the double axes, see if adding plt.clf()
helps. It seems the axes are leftover from the previous figure, that somehow are still there...
Hi RobustGoldfish9 ,
I'd much rather just have trains-agent just automatically build the image defined there than have to build the image separately and make it available for all the agents to pull.
Do you mean there is no docker image in the artifactory built based on your Dockerfile ?
RobustGoldfish9 I see.
So in theory spinning an experiment on an gent would be clone code -> build docker -> mount code -> execute code inside docker?
(no need for requirements etc.?)
I see, would having this feature solve it (i.e. base docker + bash init script)?
https://github.com/allegroai/trains/issues/236
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see, may I ask why not just build it once, push it into artifactory and then have trains-agent
use it? (it will be much faster)
Hi ImpressionableRaven99
In the UI the re is a download button when you hover over the graph.
Are you asking if there is a programmatic interface?
What is the use case for all experiments ?
Hi WhimsicalLion91
You can always explicitly send a value:from trains import Logger Logger.current_logger().report_scalar("title", "series", iteration=0, value=1337)
A full example can be found here:
https://github.com/allegroai/trains/blob/master/examples/reporting/scalar_reporting.py