Hi HelpfulHare30, termination of non-responsive tasks is handled by a server watchdog, which can be configured through the server settings. See https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#non-responsive-task-watchdog for more details. Since the documentation there is a bit lacking, I'll just point out that you can also disable it completely using the services.tasks.non_responsive_tasks_watchdog.enabled: false setting.
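For reference, a minimal override sketch (this assumes a docker-compose deployment with the usual /opt/clearml/config override directory; the exact file name and the threshold_sec key are my assumption, so check the defaults shipped with your server version):

# /opt/clearml/config/services.conf (mounted into the apiserver container)
tasks {
  non_responsive_tasks_watchdog {
    enabled: false        # disable the watchdog entirely
    # threshold_sec: 7200 # or keep it enabled and just raise the inactivity threshold
  }
}

After editing the file, restart the server containers (e.g. docker-compose restart) for the change to take effect.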
Hi SuccessfulKoala55, thank you for the response. So it's not possible if we use the community server, right?
Do I understand correctly that I can avoid task termination (including the dataset task) if I update it periodically (say, by sending a log line)?
Well, any task that runs using the SDK always pings the server in the background anyway.
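For example, something like this (a rough sketch assuming clearml is installed and clearml.conf points at your server; project/task names are placeholders):

from time import sleep
from clearml import Task

# Task.init() starts a background reporting thread that keeps pinging the
# server for as long as this process is alive, so a task that is actually
# running is never seen as non-responsive by the watchdog
task = Task.init(project_name="examples", task_name="long_running_job")

for step in range(10):
    sleep(60)  # stand-in for one long computation step
    # optional explicit heartbeat - not strictly required since the SDK
    # already pings, but it also leaves a visible trace in the console log
    task.get_logger().report_text(f"still alive, finished step {step}")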
So it's not possible if we use the community server, right?
You're right. However, I'm not sure this will be an issue for you at all.
SuccessfulKoala55, I have the following structure now (maybe it's not best practice and you can suggest a better one): there is a sequence of tasks that are run manually or from a pipeline, and every task updates some dataset at the end. The dataset should be closed only after the whole sequence is finished (and some tasks in the sequence can take more than two days). The issue I want to avoid is the dataset task that these regular tasks update being aborted.
The issue I want to avoid is the dataset task that these regular tasks update being aborted.
HelpfulHare30, could you post pseudo code of the dataset update?
(My point is, I'm not sure the Dataset actually supports updating, as it would need to re-upload the previous delta snapshot.) Wouldn't it be easier to add another child dataset and then use Dataset.squash() (like one would do in git)?
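Something along these lines (untested sketch - project/dataset names and paths are placeholders, and you should double-check the Dataset.squash() arguments for your clearml version):

from clearml import Dataset

# each task in the sequence creates its own short-lived child dataset
# instead of keeping one long-lived dataset task open for days
parent = Dataset.get(dataset_project="my_project", dataset_name="base_dataset")

child = Dataset.create(
    dataset_name="base_dataset_step_1",
    dataset_project="my_project",
    parent_datasets=[parent.id],
)
child.add_files(path="./new_files_from_this_step")
child.upload()
child.finalize()

# once the whole sequence is done, squash all the deltas into a single
# standalone version (similar to squashing commits in git)
squashed = Dataset.squash(
    dataset_name="base_dataset_final",
    dataset_ids=[child.id],  # list all the child dataset ids produced by the sequence
)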
AgitatedDove14 I didn't know about Dataset.squash(). Thank you, I'll check this variant today.