Hey,
We are using ClearML 1.9.0 with Transformers 4.25.1… and we started getting errors that do not reproduce in earlier versions (only 1.7.2 works; all 1.8.x versions fail):
Hey @PricklyRaven28, as discussed above, there were two issues. The first one is still waiting on the second, which is on our developers' backlog and should be done soon(tm).
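That said, in the meantime I also wanted to do fun stuff with transformers, so I've written a quick hack that works around the bug. It's basically two functions: one recursively casts every non-string key in the dict to a string and records which keys it changed, and the other uses that record to restore the original keys.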
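```python
def cast_keys_to_string(d, changed_keys=None):
    """Recursively cast non-string keys to str, recording the original keys."""
    # Avoid a mutable default argument: a shared dict() would leak mappings between calls
    if changed_keys is None:
        changed_keys = dict()
    nd = dict()
    for key in d.keys():
        if not isinstance(key, str):
            casted_key = str(key)
            changed_keys[casted_key] = key  # remember the original (e.g. int) key
        else:
            casted_key = key
        if isinstance(d[key], dict):
            nd[casted_key], changed_keys = cast_keys_to_string(d[key], changed_keys)
        else:
            nd[casted_key] = d[key]
    return nd, changed_keys


def cast_keys_back(d, changed_keys):
    """Recursively restore the original keys recorded by cast_keys_to_string."""
    nd = dict()
    for key in d.keys():
        if key in changed_keys:
            original_key = changed_keys[key]
        else:
            original_key = key
        if isinstance(d[key], dict):
            nd[original_key], changed_keys = cast_keys_back(d[key], changed_keys)
        else:
            nd[original_key] = d[key]
    return nd, changed_keys
```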
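One caveat worth knowing: `changed_keys` above is a single flat map shared by every nesting level, so if the string "1" is a genuine key in one sub-dict while the int 1 was stringified in another, `cast_keys_back` turns both into the int 1. Below is a minimal sketch of a path-aware variant that records each change under its full key path instead. The names `stringify_keys` and `restore_keys` are hypothetical (not part of ClearML or the hack above), and the JSON round trip only stands in for whatever string-keyed storage the values pass through; it is an assumption, not a description of ClearML internals.

```python
import json


def stringify_keys(d, path=(), changed=None):
    """Recursively cast non-string keys to str, recording each change by key path."""
    if changed is None:
        changed = {}
    out = {}
    for key, value in d.items():
        if isinstance(key, str):
            skey = key
        else:
            skey = str(key)
            changed[path + (skey,)] = key  # remember the original key at this path
        if isinstance(value, dict):
            value, _ = stringify_keys(value, path + (skey,), changed)
        out[skey] = value
    return out, changed


def restore_keys(d, changed, path=()):
    """Undo stringify_keys using the path-indexed record of original keys."""
    out = {}
    for skey, value in d.items():
        key = changed.get(path + (skey,), skey)
        if isinstance(value, dict):
            value = restore_keys(value, changed, path + (skey,))
        out[key] = value
    return out


# "1" is a real string key in one sub-dict and the stringified int 1 in the other
data = {"label2id": {"1": "one"}, "id2label": {1: "one"}}
as_strings, changed = stringify_keys(data)
# Simulate a string-keyed storage step (JSON here, purely as an illustration)
stored = json.loads(json.dumps(as_strings))
restored = restore_keys(stored, changed)
print(restored)  # {'label2id': {'1': 'one'}, 'id2label': {1: 'one'}}
```

With the flat map, the `label2id` entry would come back as `{1: 'one'}`; keying by path keeps the two levels apart at the cost of slightly larger bookkeeping.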
You can then use them like this:
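```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="my_awesome_model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    dataloader_num_workers=0,
    num_train_epochs=2,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

# Allow ClearML access to the training args and allow it to override the arguments for remote execution
args_class = type(training_args)
# Serialize the args and cast any non-string keys to strings
args, changed_keys = cast_keys_to_string(training_args.to_dict())
# Assumes `task` is the ClearML Task returned by an earlier Task.init(...) call;
# connect() logs the dict and, when running remotely, returns it with any overridden values
args = task.connect(args)
# Restore the original keys and rebuild a TrainingArguments object from the dict
training_args = args_class(**cast_keys_back(args, changed_keys)[0])

# self.model, self.tokenizer, tokenized_dataset, data_collator, etc. come from the surrounding code
self.trainer = Trainer(
    model=self.model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=self.tokenizer,
    data_collator=data_collator,
    compute_metrics=self.compute_metrics,
)
self.trainer.train()
```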
This hack, combined with the Hugging Face patch from above, should work 🙂 That said, it is a hack, so a proper production version should land soon. I'll let you know when that happens!
one year ago