@<1523701435869433856:profile|SmugDolphin23> @<1523701087100473344:profile|SuccessfulKoala55> Yes, the second issue still consists, currently breaking our pipeline
Now worries! Just so I understand fully though: you were already using the patch with success from my branch. Now that it has been merged into transformers main branch you installed it from there and that's when you started having issues with not saving models? Then installing transformers 4.21.3 fixes it (which should have the old clearml integration even before the patch?)
i believe this is because of this code
None
Which initialized the task if clearml is installed… but a task already exists (because of the pipeline), it will replace it
Yes, and the old version only works without the patch.
I see the model on the artifacts tab, but it's not actually uploaded.
that makes more sense 🙂
would this work now as a workaround until the version is released?
Hi PricklyRaven28 ! What dict do you connect? Do you have a small script we could use to reproduce?
Hey @<1523701949617147904:profile|PricklyRaven28> , about the S3 loading issue. The path to the model in the artifact tab, is it an S3 bucket or a local path?
in the meantime, we should have fixed this. I will ping you when 1.9.1 is out to try it out!
yeah, it gets to that error because the previous issue is saved…i’ll try to work on a new example
Thanks! I'm checking now, but might take a little (meeting in between)
SmugDolphin23 BTW, this is using clearml and huggingface’s automatic logging… didn’t log something manual
for now we downgraded to 1.7.2, but of course prefer not to stay that way
@<1523701435869433856:profile|SmugDolphin23>
Hey 🙂
Any update?
We are having more issues with transformers and clearml in their new version.
The step that has transformers 4.25.1
isn’t able to upload artifacts.
If we downgrade transformers==4.21.3
it works
Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but get a IndexError: Invalid key: 5872 is out of bounds for size 0
error. That said, I get the same error without the code running in a pipeline. There seems to be no difference between simply running the code and the pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?
` from clearml.automation import PipelineDecorator
from clearml import TaskTypes
@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer
from datasets import load_dataset
dataset = load_dataset("conllpp")
model_checkpoint = 'bert-base-cased'
lr = 2e-5
num_train_epochs = 5
weight_decay = 0.01
seed = 1234
ner_feature = dataset["train"].features["ner_tags"]
label_names = ner_feature.feature.names
id2label = {str(i): label for i, label in enumerate(label_names)}
label2id = {v: k for k, v in id2label.items()}
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
model = AutoModelForTokenClassification.from_pretrained(
model_checkpoint,
id2label=id2label,
label2id=label2id,
)
trainer_args = TrainingArguments(
'./tmp',
evaluation_strategy="epoch",
save_strategy="epoch",
learning_rate=lr,
num_train_epochs=num_train_epochs,
weight_decay=weight_decay,
seed=seed,
data_seed=seed,
load_best_model_at_end=True,
)
trainer = Trainer(
model=model,
args=trainer_args,
train_dataset=dataset["train"],
eval_dataset=dataset["validation"],
data_collator=data_collator,
tokenizer=tokenizer,
)
trainer.train()
@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
run_demo()
if name == 'main':
PipelineDecorator.set_default_execution_queue("aws_cpu")
PipelineDecorator.run_locally()
pipeline() `
This isn’t a real working example, but it shows that on clearml 1.7.2 it passed initialization part (and has an error on training stuff which is ok)
And on 1.9.0 it errors before onTypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
Hi PricklyRaven28 , can you try with 1.9.1rc0?
confirming that only downgrading to transformers==4.21.3
without the patch worked....
This is a time bomb that eventually we won't be able to ignore... we will need to use new transformers code
@<1523701118159294464:profile|ExasperatedCrab78>
Hey again 🙂
I believe that the transformers patch wasn’t released yet right? we are getting into a problem where we need new features from transformers but can’t use because of this
I tried to work on a reproducible script but then i get errors that my clearml task is already initialized (also doesn’t happen on 1.7.2)
Hi @<1523701949617147904:profile|PricklyRaven28> ! We released ClearmlSDK 1.9.1 yesterday. Can you please try it?
This is the next step not being able to find the output of the last step
ValueError: Could not retrieve a local copy of artifact return_object, failed downloading
Hey @<1523701949617147904:profile|PricklyRaven28> I'm checking! Have you updated anything else and on which exact commit of transformers are you now?
I'm working with the patch, and installing transformers from github