SuccessfulKoala55 I found the temp files; they contain what should be the worker IDs, and those look fine.
I've run this 8 times: trains-agent --config-file /opt/trains/trains.conf daemon --detached --cpu-only --queue important_cpu_queue cpu_queue
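For reference, the eight workers can be started from one small script instead of by hand. This is a minimal sketch, assuming Python 3.8+ (for `shlex.join`) and that `trains-agent` is on the PATH; the flags are copied verbatim from the command above, and `launch_workers` is a hypothetical helper name. It defaults to a dry run that only returns the command strings, so nothing is started by accident.

```python
import shlex
import subprocess
from typing import List

# Flags copied from the daemon command above; adjust the config path and queues as needed.
BASE_CMD = [
    "trains-agent", "--config-file", "/opt/trains/trains.conf",
    "daemon", "--detached", "--cpu-only",
    "--queue", "important_cpu_queue", "cpu_queue",
]


def launch_workers(count: int, dry_run: bool = True) -> List[str]:
    """Start `count` detached CPU workers (hypothetical helper).

    With dry_run=True nothing is executed; the shell-quoted commands are just returned.
    """
    commands = []
    for _ in range(count):
        commands.append(shlex.join(BASE_CMD))
        if not dry_run:
            # With --detached the agent is expected to fork into the background,
            # so each call should return instead of blocking.
            subprocess.run(BASE_CMD, check=True)
    return commands


if __name__ == "__main__":
    for line in launch_workers(8):
        print(line)
```

Running it with `dry_run=False` launches the same eight daemons as typing the command eight times.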
The version is 0.16.2rc0 (a version Mushik gave me that supports local conda environments).
It's important to note that this happens when I have more than about 4 workers and then run trains-agent daemon --stop
With fewer than 4 workers it works well.
I understand why this is problematic. This might require more thought if you want to support it.
I think I know what's happening, TimelyPenguin76.
Could it be that trains automatically logs these images to plots?
Because when I removed the report_media/report_image calls, the images were still logged under plots.
Logger.current_logger().report_media(title=f"visualization images {output_category}", series=output_path.split('.')[-2].split('/')[-1], iteration=1, local_path=output_path)
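One side note on the `series` argument: splitting the path on `.` picks the wrong piece when a file name contains more than one dot. For `plots/sample.final.png` it yields `final` rather than `sample.final`. A minimal sketch of a more robust alternative, assuming the intent is "file name without directory and extension"; `series_name` is a hypothetical helper, and the `report_media` call in the comment only reuses the arguments already shown above.

```python
from pathlib import Path


def series_name(output_path: str) -> str:
    """File name without its directory and final extension (hypothetical helper)."""
    return Path(output_path).stem


# Hypothetical usage with the same report_media arguments as above:
# Logger.current_logger().report_media(
#     title=f"visualization images {output_category}",
#     series=series_name(output_path),
#     iteration=1,
#     local_path=output_path,
# )
```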
Well, I do exactly that, but it still puts them under plots.
TimelyPenguin76 it didn't help 😞
Great! Thanks 🙂
No, the task it was cloned from was created with Task.create, but the script that Task.create runs contains a Task.init call.
I think so. The issue is that I want to report only a subset of the images (for example, I create an image for every sample in the dataset, but I want to display only the top 10 by score in trains). When they're logged automatically, I have no control over this. What can be done?
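The selection part of this is independent of trains: generate all images, then report only the best few explicitly. A minimal sketch, assuming each image is paired with a numeric score; `top_k_images` is a hypothetical helper, and it only helps once automatic logging of the figures is actually off, which is the open question in this thread.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_images(scored: Iterable[Tuple[str, float]], k: int = 10) -> List[str]:
    """Return the paths of the k highest-scoring images, best first (hypothetical helper)."""
    # heapq.nlargest avoids sorting the whole dataset when only k items are needed.
    best = heapq.nlargest(k, scored, key=lambda item: item[1])
    return [path for path, _score in best]
```

Each returned path could then be passed to `report_image`/`report_media` in a loop, leaving the remaining images on disk only.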
Okay, so in the end I ran it locally and it behaved as expected (no auto-logging for matplotlib), but with trains-agent it auto-logged anyway. TimelyPenguin76
Never mind, you can find it in apiserver.conf.
Oof, what if all I have is a project name to set? (It could also be a non-existing project.)
You can try copying the entire contents of requirements.txt into the installed packages section of your experiment in the trains UI.
Hmm, I changed my trains-server setup to use a config file in a different location, and successfully set up trains-agent on the second server. But I don't see any new worker, why is that?
Something else: if I want to assign only some of a worker's GPUs, how can I do that?
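One way to split a machine's GPUs is to run one agent per GPU group. This sketch assumes the daemon accepts a `--gpus` option taking a comma-separated list (check `trains-agent daemon --help` for your version). `gpu_groups` and `worker_commands` are hypothetical helpers that only build the commands; they don't run anything.

```python
from typing import List, Sequence


def gpu_groups(gpu_ids: Sequence[int], per_worker: int) -> List[str]:
    """Split GPU ids into comma-separated groups, one group per worker."""
    if per_worker < 1:
        raise ValueError("per_worker must be >= 1")
    return [
        ",".join(str(g) for g in gpu_ids[i:i + per_worker])
        for i in range(0, len(gpu_ids), per_worker)
    ]


def worker_commands(gpu_ids: Sequence[int], per_worker: int, queue: str) -> List[List[str]]:
    # Assumption: `--gpus` restricts each daemon to the listed devices.
    return [
        ["trains-agent", "daemon", "--gpus", group, "--queue", queue, "--detached"]
        for group in gpu_groups(gpu_ids, per_worker)
    ]
```

For example, `worker_commands([0, 1, 2, 3], 2, "default")` describes two agents, one on GPUs 0,1 and one on GPUs 2,3.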
Found it in the init docs 🙂
Actually, two machines with a shared filesystem.
Since my servers share a filesystem, the init process tells me that the configuration file already exists. Can I tell it to place it in another location? GrumpyPenguin23
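A possible workaround, assuming init cannot be told to write elsewhere: keep one config file per host on the shared filesystem and point each agent at its own file with `--config-file`, the flag used in the daemon command earlier in this thread. The directory and the `trains-<hostname>.conf` naming are hypothetical choices, as are the helper names.

```python
import socket
from pathlib import Path
from typing import List, Optional

# Hypothetical layout: one config file per host on the shared filesystem.
CONFIG_DIR = Path("/opt/trains")


def per_host_config(hostname: Optional[str] = None, config_dir: Path = CONFIG_DIR) -> Path:
    """Config path unique to this machine, so hosts don't overwrite each other's file."""
    host = hostname or socket.gethostname()
    return config_dir / f"trains-{host}.conf"


def daemon_command(config_path: Path, queue: str) -> List[str]:
    # --config-file goes before the `daemon` subcommand, as in the command quoted earlier.
    return ["trains-agent", "--config-file", str(config_path), "daemon", "--queue", queue, "--detached"]
```

The per-host file would still need to be filled in (for example by copying a config generated on one machine and editing it).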