AgitatedDove14 I updated Trains to the version you mentioned, but the process still stops. As for exceptions from subprocesses, torchvision doesn't report any exception that I could handle.
SoreDragonfly16 the torchvision warning has nothing to do with the Trains warning.
The Trains warning means that somehow someone changed the state of the Task from running (in_progress) to "stopped" (aborted). Could it be that one of the subprocesses raised an exception?
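One way to check whether a child process is failing silently is to run the suspect work in a separate Python interpreter and inspect its exit code and stderr: a child that dies from an uncaught exception exits with a non-zero code and writes its traceback to stderr. This is a minimal standard-library sketch, not anything Trains or torchvision provides; run_and_report is a hypothetical helper.

```python
import subprocess
import sys
from typing import Tuple


def run_and_report(code: str) -> Tuple[int, str]:
    """Run `code` in a child Python interpreter and return (exit code, stderr).

    Hypothetical helper: an uncaught exception in the child makes it exit
    with a non-zero code and print its traceback to stderr.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the child's traceback in the parent's own output/log.
        print(f"child exited with code {result.returncode}", file=sys.stderr)
        print(result.stderr, file=sys.stderr)
    return result.returncode, result.stderr
```

For example, run_and_report("raise RuntimeError('boom')") returns exit code 1 and a stderr string containing the RuntimeError traceback, while a clean child returns 0.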
Thanks for your support. My OS is Ubuntu 18.04.5 LTS and my trains version is 0.16.0. I can't run this code at the moment because my machine is busy with other heavy jobs, but I'll try to reproduce the issue as soon as they finish.
AgitatedDove14 Hey, I just reproduced this. Whenever it happens, I also get a warning from torchvision:
/home/koe1tv/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/io/video.py:105: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
Unfortunately, I can't suppress this warning because I don't have access to the parameter it mentions.
For example, here are the last two log lines from my process:
2020-09-11 18:34:50 /home/koe1tv/anaconda3/envs/torch/lib/python3.7/site-packages/torchvision/io/video.py:105: UserWarning: The pts_unit 'pts' gives wrong results and will be removed in a follow-up version. Please use pts_unit 'sec'.
2020-09-11 18:34:52 2020-09-11 08:34:52,109 - trains.Task - WARNING - ### TASK STOPPED - USER ABORTED - STATUS CHANGED ###
Hi SoreDragonfly16
The warning you mention means that the state of the experiment was changed to aborted, which in turn actually kills the process.
What do you mean by "If I disable the logger," ?
SoreDragonfly16 note that if you abort a task in the web UI, it will do exactly what you described: print a message and quit the process. Any chance someone did that?
SoreDragonfly16 could you reproduce the issue?
What's your OS? Which trains version?
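To answer the OS/version question in one go, a small script can collect the details. A minimal sketch using only the standard library (importlib.metadata on Python 3.8+, falling back to setuptools' pkg_resources on Python 3.7 such as the anaconda env in the log above); environment_report and the default package list are hypothetical choices.

```python
import platform

try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: fall back to setuptools' pkg_resources
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version


def environment_report(packages=("trains", "torch", "torchvision")):
    """Collect the OS, Python version and installed package versions."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```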
SoreDragonfly16 could you test with Task.init using reuse_last_task_id=False? For example:
task = Task.init('project', 'experiment', reuse_last_task_id=False)
The only cause I can think of is two experiments with the same project/name running on the same machine. Setting reuse_last_task_id=False ensures that every time you run the code, a new experiment is created.
by "disable the logger" I mean not using trains at all, just in order to make sure the process doesn't stop by itself.
Also SoreDragonfly16, could you test whether the issue still exists with trains==0.16.2rc0?