Hi HelpfulHare30, termination of non-responsive tasks is handled by a server watchdog, which can be configured using the server settings. See https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#non-responsive-task-watchdog for more details. Since the documentation there is a bit lacking, I'll just point out that you can also disable it completely using the services.tasks.non_responsive_tasks_watchdog.enabled: false setting.
HelpfulHare30, could you post pseudocode of the dataset update?
(My point is, I'm not sure the Dataset actually supports updating, since it would need to re-upload the previous delta snapshot.) Wouldn't it be easier to add another child dataset and then use Dataset.squash (like one would do in git)?
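A minimal sketch of the child-dataset-plus-squash idea, assuming the clearml Python SDK (Dataset.create, add_files, upload, finalize, Dataset.squash). The helper names add_step_version and squash_versions are hypothetical, and the dataset_cls parameter is an assumption added only so the helpers can be exercised without a server. Each step writes its output as a new child version and closes it right away, so no dataset task stays open and idle for days:

```python
from typing import List, Optional


def _dataset_class(dataset_cls=None):
    """Return the injected class, or clearml.Dataset (requires the clearml package)."""
    if dataset_cls is not None:
        return dataset_cls
    from clearml import Dataset  # imported lazily so the module loads without clearml
    return Dataset


def add_step_version(step_name: str, files_path: str, parent_id: Optional[str],
                     dataset_project: str, dataset_cls=None) -> str:
    """Hypothetical helper: store one step's output as a child version and close it."""
    Dataset = _dataset_class(dataset_cls)
    ds = Dataset.create(
        dataset_name=f"{step_name}_delta",
        dataset_project=dataset_project,
        parent_datasets=[parent_id] if parent_id else None,  # chain onto the previous version
    )
    ds.add_files(path=files_path)  # only what this step produced or changed
    ds.upload()
    ds.finalize()  # closes this version, so it does not stay open while later steps run
    return ds.id


def squash_versions(version_ids: List[str], final_name: str, dataset_cls=None) -> str:
    """Hypothetical helper: merge all finalized step versions into one dataset."""
    Dataset = _dataset_class(dataset_cls)
    merged = Dataset.squash(dataset_name=final_name, dataset_ids=version_ids)
    return merged.id
```

In this sketch, each task in the sequence would call add_step_version with the previous version's id (for example, passed along as a pipeline parameter), and only the final step would call squash_versions, once the whole sequence is done.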
SuccessfulKoala55, here is my current structure (maybe it's not the best practice, and you can suggest a better one). There is a sequence of tasks that are run either manually or from a pipeline. At the end, every task updates a dataset. The dataset should be closed only after the whole sequence is finished, and some tasks in the sequence can take more than two days. What I want to avoid is the dataset task that these regular tasks update being aborted.