Can you please elaborate more on what is happening in your code while this occurs, Can you add the full log?
I have attached full log. This error happened during starting some standard transformers training loop.
Something is changed by executing the program through agent, because I executed exactly the same code on exactly the same docker image and it doesn't produce this error.
The custom callback I have used is:
` class MyClearMLCallback(ClearMLCallback):
def init(self, *args, **kwargs):
self._task_name = kwargs.pop("task_name", None)
self._project_name = kwargs.pop("project_name", None)
super().init(*args, **kwargs)
def setup(self, args, state, model, tokenizer, **kwargs):
if self._clearml is None:
return
if state.is_world_process_zero:
logger.info("Automatic ClearML logging enabled.")
if self._clearml_task is None:
self._clearml_task = self._clearml.Task.init(
project_name=self._project_name,
task_name=self._task_name,
auto_connect_frameworks={"tensorboard": False, "pytorch": False},
output_uri=True,
)
self._initialized = True
logger.info("ClearML Task has been initialized.")
self._clearml_task.connect(args, "Args")
if hasattr(model, "config") and model.config is not None:
self._clearml_task.connect(model.config, "Model Configuration") `
The error is somehow connected to reinitializing task twice, I don't know what's the "true" way of using transformer's ClearMLCallback within clearml pipeline.