Hi, I have a ClearML experiment that failed to load its scalar plots after a few hours of training. When I look at the log locally with TensorBoard it seems ...
All of the experiments for this particular project behave like this; the console works fine and I'm still able to view debug images.
Task.init() is called in main() of the training script with a user-specified project and task name.
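Roughly, the init call looks like this (a minimal sketch; the names and the training-loop details below are placeholders, not the actual values):

from clearml import Task

def main(project_name, task_name):
    # Sketch only: Task.init() with the user-supplied names; once the task is
    # initialized, ClearML auto-captures the TensorBoard scalars written by the
    # training loop.
    task = Task.init(project_name=project_name, task_name=task_name)
    # ... training loop that logs scalars to TensorBoard ...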
This is how the task gets created:
def create_clearml_task(
        project_name,
        task_name,
        script,
        args,
        docker_args="",
        docker_image_name="<docker image name>",
        add_task_init_call=True,
        requirements_file=None,
        **kwargs):
    print(
        "Creating task: project_name: {project_name}, task_name: {task_name}, script:{script} and args: \n {args}"
        .format(
            project_name=project_name,
            task_name=task_name,
            script=script,
            ...
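The rest of the function maps these arguments onto clearml's Task.create, roughly along these lines (a simplified sketch for illustration, not the exact code; the parameter mapping is assumed):

from clearml import Task

def create_clearml_task_sketch(project_name, task_name, script,
                               docker_args="", docker_image_name="<docker image name>",
                               add_task_init_call=True, requirements_file=None):
    # Illustrative continuation only: build a draft task from the script so it
    # can be enqueued for remote execution. Forwarding of `args` and any extra
    # kwargs is omitted here because it is not visible in the snippet above.
    return Task.create(
        project_name=project_name,
        task_name=task_name,
        script=script,
        requirements_file=requirements_file,
        docker=docker_image_name,
        docker_args=docker_args,
        add_task_init_call=add_task_init_call,
    )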
This is what it said on the console when I tried to load it.
Is there a way to retrieve ClearML error logs for situations like this?
Sorry, not quite sure I understand. I am calling Task.init()
inside main(). My plots load on ClearML correctly for the first few hours or so, but freeze after that.