AgitatedDove14 Hey, I just reproduced this. Whenever it happens, I also get a warning from torchvision: /home/koe1tv/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/io/video.py:105: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
Unfortunately, I can't suppress this warning because I don't have access to the parameter mentioned in the warning.
For example, here are the last two log lines from my process:
2020-09-11 18:34:50 /home/koe1tv/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/io/video.py:105: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
--2020-09-11 18:34:52 2020-09-11 08:34:52,109 - trains.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###
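On the torchvision warning: even without access to the pts_unit argument, Python's standard warnings module can filter a warning by its message text. A minimal sketch, assuming the warning goes through the normal warnings machinery (the UserWarning prefix in the log suggests it does); the helper name silence_pts_unit_warning is made up for illustration:

```python
import warnings


def silence_pts_unit_warning():
    """Ignore UserWarnings whose message starts with the torchvision pts_unit text."""
    # `message` is a regular expression matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message=r"The pts_unit 'pts' gives wrong results",
        category=UserWarning,
    )
```

Call it once before the video-reading code runs. This only hides the message; it has no effect on the Trains abort.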
Hi SoreDragonfly16
The warning you mention means that the state of the experiment was changed to aborted, which in turn will actually kill the process.
What do you mean by "If I disable the logger"?
SoreDragonfly16 could you test with Task.init using reuse_last_task_id=False?
For example: task = Task.init('project', 'experiment', reuse_last_task_id=False)
The only thing I can think of is two experiments running with the same project/name on the same machine. Passing reuse_last_task_id=False to Task.init ensures that every time you run the code, a new experiment is created.
SoreDragonfly16 could you reproduce the issue?
What's your OS? Which Trains version?
By "disable the logger" I mean not using Trains at all, just to make sure the process doesn't stop by itself.
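One way to run that comparison without editing the training code each time is to put the Trains setup behind a switch. This is only a sketch: maybe_init_task and its enabled flag are hypothetical names, and the project/experiment strings are the placeholders used earlier in this thread.

```python
def maybe_init_task(enabled, project_name="project", task_name="experiment"):
    """Return a Trains Task when enabled, otherwise None."""
    if not enabled:
        # Trains is never imported on this path, so it cannot stop the process.
        return None
    # Imported lazily so the disabled run does not touch Trains at all.
    from trains import Task
    return Task.init(project_name, task_name, reuse_last_task_id=False)
```

Run once with enabled=False and once with enabled=True; if only the second run stops, the abort is coming through Trains rather than from the process itself.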
AgitatedDove14 Updated the Trains version to the mentioned version but it still stops. Regarding exceptions from subprocesses, torchvision doesn't show me any exception that I can handle.
Thanks for your support. My OS is Ubuntu 18.04.5 LTS and the Trains version is 0.16.0. I can't run this code right now because my machine is busy with some other heavy jobs, but I'll try reproducing this as soon as they finish.
SoreDragonfly16 note that if you abort a task in the web UI, it will do exactly what you described: print a message and quit the process. Any chance someone did that?
Also SoreDragonfly16 could you test whether the issue exists with trains==0.16.2rc0?
SoreDragonfly16 the torchvision warning has nothing to do with the Trains warning.
The Trains warning means that somehow someone changed the state of the Task from running (in_progress) to "stopped" (aborted). Could it be that one of the subprocesses raised an exception?
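To check the subprocess theory, it can help to look at a child's exit code and stderr directly. A rough sketch, assuming the child is something you launch yourself with the standard subprocess module (worker processes created inside a library may report failures differently); run_and_report is a made-up helper name and the failing command is only a stand-in:

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, stderr) so a crash in the child is visible."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr


# Stand-in child process that dies with an uncaught exception.
code, err = run_and_report([sys.executable, "-c", "raise RuntimeError('boom')"])
if code != 0:
    print("child failed with exit code", code)
    print(err)
```

A non-zero exit code together with a traceback in stderr would point at the child process rather than at a manual abort.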