Running the program through the agent must change something, because I executed exactly the same code on exactly the same Docker image and it doesn't produce this error.
Can you please elaborate on what is happening in your code when this occurs? Can you also add the full log?
The error seems to be connected to the task being initialized twice. I don't know what the "true" way of using transformers' ClearMLCallback within a ClearML pipeline is.
I have attached the full log. The error happened while starting a standard transformers training loop.
The custom callback I have used is:
```
import logging

from transformers.integrations import ClearMLCallback

# Assumption: the original snippet does not show how `logger` was defined.
logger = logging.getLogger(__name__)


class MyClearMLCallback(ClearMLCallback):
    def __init__(self, *args, **kwargs):
        # Pull out the custom names before delegating to the parent constructor.
        self._task_name = kwargs.pop("task_name", None)
        self._project_name = kwargs.pop("project_name", None)
        super().__init__(*args, **kwargs)

    def setup(self, args, state, model, tokenizer, **kwargs):
        if self._clearml is None:
            return
        if state.is_world_process_zero:
            logger.info("Automatic ClearML logging enabled.")
            if self._clearml_task is None:
                # Task.init is called here regardless of whether a task
                # was already created outside the callback.
                self._clearml_task = self._clearml.Task.init(
                    project_name=self._project_name,
                    task_name=self._task_name,
                    auto_connect_frameworks={"tensorboard": False, "pytorch": False},
                    output_uri=True,
                )
                self._initialized = True
                logger.info("ClearML Task has been initialized.")
            self._clearml_task.connect(args, "Args")
            if hasattr(model, "config") and model.config is not None:
                self._clearml_task.connect(model.config, "Model Configuration")
```
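If the problem really is the task being initialized twice, one possible guard is to reuse a task that already exists in the process before calling `Task.init`. The sketch below only illustrates that pattern and has not been checked against the attached log. `get_or_init_task` and `FakeTask` are hypothetical names: `FakeTask` is a stand-in so the example runs without a ClearML server, and in real code you would pass `clearml.Task`, whose `current_task()` and `init()` class methods are what the helper calls.

```python
from typing import Any, Optional


def get_or_init_task(task_cls: Any, project_name: Optional[str], task_name: Optional[str]) -> Any:
    """Return the task already active in this process, or create one if there is none."""
    existing = task_cls.current_task()
    if existing is not None:
        # A task was already initialized elsewhere (for example by pipeline code),
        # so reuse it instead of initializing a second time.
        return existing
    return task_cls.init(
        project_name=project_name,
        task_name=task_name,
        auto_connect_frameworks={"tensorboard": False, "pytorch": False},
        output_uri=True,
    )


class FakeTask:
    """Hypothetical stand-in for clearml.Task, used only to make the sketch runnable."""

    _current = None
    init_calls = 0

    def __init__(self, project_name, task_name):
        self.project_name = project_name
        self.task_name = task_name

    @classmethod
    def current_task(cls):
        return cls._current

    @classmethod
    def init(cls, project_name=None, task_name=None, **kwargs):
        cls.init_calls += 1
        cls._current = cls(project_name, task_name)
        return cls._current


# Calling the helper twice initializes the task only once.
first = get_or_init_task(FakeTask, "demo-project", "demo-task")
second = get_or_init_task(FakeTask, "demo-project", "demo-task")
```

Inside the callback, the same check could replace the direct `self._clearml.Task.init(...)` call in `setup`, so the callback attaches to whatever task is already running and creates a new one only when nothing exists yet.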