I think it might be related to the new run overwriting in this location
We are getting the dataset like this:
clearml_dataset = Dataset.get(
dataset_id=config.get("dataset_id"), alias=config.get("dataset_alias")
)
dataset_dir = clearml_dataset.get_local_copy()
Trying this:
clearml_dataset = Dataset.get(
dataset_id=config.get("dataset_id"), alias=config.get("dataset_alias")
)
dataset_dir = clearml_dataset.get_local_copy()
destination_dir = os.path.join("/datasets", os.path.basename(dataset_dir))
shutil.copytree(dataset_dir, destination_dir)
results = model.train(
data=destination_dir + "/data.yaml", epochs=config.get("epochs"), device=0
)
Our current setup is one clearml agent per GPU on the same machine