ClearML 1.9.0 with Transformers 4.25.1: errors that do not reproduce in earlier versions (only 1.7.2 works; all 1.8.x fail)

Hey,
We are using ClearML 1.9.0 with Transformers 4.25.1, and we started getting errors that do not reproduce in earlier versions (it only works in 1.7.2; all 1.8.x versions fail):

  File "/tmp/tmp0you5mai.py", line 29, in train_entity_exraction_model
    train(source=source_path.absolute(), output=model_output_path.absolute(), seed=seed, **entity_extraction_trainer)
  File "/usr/src/lib/entity_extractions/train.py", line 74, in train
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1527, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1704, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
    result = getattr(callback, event)(
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1355, in on_train_begin
    self.setup(args, state, model, tokenizer, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1345, in setup
    self._clearml_task.connect(args, "Args")
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 1480, in connect
    return method(mutable, name=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3449, in _connect_object
    a_dict = self._connect_dictionary(a_dict, name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3413, in _connect_dictionary
    flat_dict = self._arguments.copy_to_dict(flat_dict, prefix=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/args.py", line 508, in copy_to_dict
    self._task.set_parameter((prefix or '') + k, v)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1281, in set_parameter
    self._set_parameters(
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1246, in _set_parameters
    description=create_description(),
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1237, in create_description
    created_description += "Values:\n" + ",\n".join(
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
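The last frame fails on `created_description += "Values:\n" + ...` because `created_description` is `None` at that point. The sketch below is not ClearML's code: the two function names are hypothetical, and they only reproduce that failure mode and the common `or ""` guard, assuming a missing description should be treated as empty. The real fix has to come from ClearML itself (the thread later suggests trying 1.9.1rc0 and 1.9.1).

```python
def create_description_unguarded(description, values):
    # Same shape as the failing line: `+=` on a value that may be None
    created_description = description
    created_description += "Values:\n" + ",\n".join(values)
    return created_description


def create_description_guarded(description, values):
    # Treat a missing (None) description as an empty string before appending
    created_description = description or ""
    created_description += "Values:\n" + ",\n".join(values)
    return created_description


if __name__ == "__main__":
    try:
        create_description_unguarded(None, ["a", "b"])
    except TypeError as err:
        print(err)  # unsupported operand type(s) for +=: 'NoneType' and 'str'
    print(repr(create_description_guarded(None, ["a", "b"])))  # 'Values:\na,\nb'
```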

  				
Posted 2 years ago by PricklyRaven28

Answers 62

Hi, yes, we tried it, with the same result.

  				
Posted 2 years ago by PricklyRaven28

I'm currently on vacation; I'll ask my teammates, and if they can't look into it, I'll get to it next week.

  				
Posted 2 years ago by PricklyRaven28

Thanks, Lior

  				
Posted 2 years ago by SmugDolphin23

Hi PricklyRaven28, just letting you know I still have this on my TODO list; I'll update you as soon as I have something!

  				
Posted 2 years ago by ExasperatedCrab78

ExasperatedCrab78 Sorry, I only saw this now.
Thanks for checking it!
Glad to see you found the issue; I hope you find a way to fix the second one. For now, we will keep using the previous version.
We'd be glad if you could post when everything is fixed, so we can upgrade.

  				
Posted 2 years ago by PricklyRaven28

SmugDolphin23 SuccessfulKoala55 Yes, the second issue still persists and is currently breaking our pipeline.

  				
Posted 2 years ago by PricklyRaven28

Hi PricklyRaven28, can you try with 1.9.1rc0?

  				
Posted 2 years ago by SuccessfulKoala55

I tried to put together a reproducible script, but then I get errors saying my ClearML task is already initialized (this also doesn't happen on 1.7.2).

  				
Posted 2 years ago by PricklyRaven28

Hi PricklyRaven28! Which dict do you connect? Do you have a small script we could use to reproduce this?

  				
Posted 2 years ago by SmugDolphin23

BTW, the code above is from the ClearML GitHub repo, so it's the latest.

  				
Posted 2 years ago by PricklyRaven28

Hi PricklyRaven28! We released ClearML SDK 1.9.1 yesterday. Can you please try it?

  				
Posted 2 years ago by SmugDolphin23

PricklyRaven28 Please use this patch instead of the one previously shared. It excludes the dict hack :)

  				
Posted 2 years ago by ExasperatedCrab78

Hey PricklyRaven28, I'm checking! Have you updated anything else, and which exact commit of transformers are you on now?

  				
Posted 2 years ago by ExasperatedCrab78

Confirming that only downgrading to transformers==4.21.3 (without the patch) worked...
This is a time bomb that we eventually won't be able to ignore; we will need to use newer transformers code.

  				
Posted 2 years ago by PricklyRaven28

Hey 🙂 Thanks for the update!

What I'm missing is the point where you report to ClearML between casting and casting back 🤔

  				
Posted 2 years ago by PricklyRaven28

from clearml.automation import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer
    from datasets import load_dataset

    dataset = load_dataset("conllpp")

    model_checkpoint = 'bert-base-cased'
    lr = 2e-5
    num_train_epochs = 5
    weight_decay = 0.01
    seed = 1234

    ner_feature = dataset["train"].features["ner_tags"]
    label_names = ner_feature.feature.names
    # Label maps use string ids, so no int dict keys reach ClearML
    id2label = {str(i): label for i, label in enumerate(label_names)}
    label2id = {v: k for k, v in id2label.items()}

    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

    data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

    model = AutoModelForTokenClassification.from_pretrained(
        model_checkpoint,
        id2label=id2label,
        label2id=label2id,
    )
    trainer_args = TrainingArguments(
        './tmp',
        evaluation_strategy="epoch",
        save_strategy="epoch",
        learning_rate=lr,
        num_train_epochs=num_train_epochs,
        weight_decay=weight_decay,
        seed=seed,
        data_seed=seed,
        load_best_model_at_end=True,
    )

    trainer = Trainer(
        model=model,
        args=trainer_args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    # The ClearML callback connects the TrainingArguments here (on_train_begin)
    trainer.train()

@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
    run_demo()

if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue("aws_cpu")

    PipelineDecorator.run_locally()

    pipeline()

This isn't a real working example, but it shows that on ClearML 1.7.2 it gets past the initialization step (and then fails during training, which is fine here).

On 1.9.0 it fails earlier, with:
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

  				
Posted 2 years ago by PricklyRaven28

It's been accepted into master, but it hasn't been released yet, indeed!

As for the other issue, it seems we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Hugging Face + ClearML, so people can do it themselves.

For now (using the patch), the only thing you need to be careful about is not to connect a dict or object that has ints as keys. If you do need to (e.g. Hugging Face models usually need the id2label dict somewhere), just make sure to cast the keys to strings before connecting it to ClearML, and cast them back to int directly afterwards. That way, when ClearML changes a value, it's handled properly 🙂 My previous sample code is still valid!
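The cast-to-string-then-back advice can be wrapped around the connect call itself. This is a minimal sketch, not ClearML API: `connect_with_string_keys` is a hypothetical helper, and it assumes the callable you pass returns the connected (possibly overridden) dict, which is how ClearML's `Task.connect` behaves for dicts. In practice you might pass something like `lambda d: task.connect(d, name="id2label")`. Only top-level `str` and `int` keys are handled.

```python
from typing import Any, Callable, Dict


def connect_with_string_keys(
    connect: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[Any, Any],
) -> Dict[Any, Any]:
    """Connect `params` with str keys, then cast int keys back.

    `connect` is any callable that takes a dict and returns the (possibly
    overridden) dict, for example a wrapper around Task.connect.
    """
    for key in params:
        # Only str and int keys are supported; bool is rejected on purpose,
        # because bool("False") would not round-trip
        if isinstance(key, bool) or not isinstance(key, (str, int)):
            raise TypeError(f"unsupported key type: {type(key).__name__}")

    # Remember each key's original type under its string form
    original_types = {str(key): type(key) for key in params}
    if len(original_types) != len(params):
        # e.g. both 1 and "1" are present and would collide as "1"
        raise ValueError("keys collide after casting to str")
    as_strings = {str(key): value for key, value in params.items()}

    connected = connect(as_strings)

    # Cast keys back right after connecting; keys the callable added stay str
    return {original_types.get(key, str)(key): value for key, value in connected.items()}


if __name__ == "__main__":
    def fake_connect(d):
        # Stand-in for Task.connect: pretend a value was overridden remotely
        d = dict(d)
        d["1"] = "B-LOC"
        return d

    id2label = {0: "O", 1: "B-PER"}
    print(connect_with_string_keys(fake_connect, id2label))  # {0: 'O', 1: 'B-LOC'}
```

The overridden value survives the round trip while the keys come back as ints, which is the property the advice above is after.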

  				
Posted 2 years ago by ExasperatedCrab78

Yeah, it gets to that error because the previous issue is saved… I'll try to work on a new example.

  				
Posted 2 years ago by PricklyRaven28

I appreciate it!

  				
Posted 2 years ago by PricklyRaven28

For now we've downgraded to 1.7.2, but of course we'd prefer not to stay on it.

  				
Posted 2 years ago by PricklyRaven28

Hey PricklyRaven28, about the S3 loading issue: is the path to the model in the Artifacts tab an S3 bucket or a local path?

  				
Posted 2 years ago by EnthusiasticShrimp49

I'll try to put together something that works on 1.7.2.

  				
Posted 2 years ago by PricklyRaven28

Thanks! Keep me posted.

  				
Posted 2 years ago by PricklyRaven28

Will check.

  				
Posted 2 years ago by PricklyRaven28

SmugDolphin23
Hey 🙂
Any update?

We are having more issues with the new versions of transformers and ClearML.
The step that uses transformers 4.25.1 isn't able to upload artifacts.
If we downgrade to transformers==4.21.3, it works.

  				
Posted 2 years ago by PricklyRaven28

Looks like the first issue has been solved 🙂

I think the second one still persists; still checking.

  				
Posted 2 years ago by PricklyRaven28

This is the next step failing to find the output of the previous step:

ValueError: Could not retrieve a local copy of artifact return_object, failed downloading

  				
Posted 2 years ago by PricklyRaven28

ExasperatedCrab78
Here is an example that reproduces the second error

from clearml.automation import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForSequenceClassification, TrainingArguments, Trainer
    from datasets import load_dataset
    import numpy as np
    import evaluate
    from pathlib import Path

    dataset = load_dataset("yelp_review_full")

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)
    
    
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    
    small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10))
    small_eval_dataset = dataset["test"].shuffle(seed=42).select(range(10))
    
    small_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
    small_eval_dataset = small_eval_dataset.map(tokenize_function, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

    training_args = TrainingArguments(
        output_dir="test_trainer", 
        evaluation_strategy="epoch",
        # num_train_epoch=1,
    )
    
    metric = evaluate.load("accuracy")
    
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )
    
    trainer.train()
    
    return Path('test_trainer')

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def second_step(some_param):
    print("Success!")
    
@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
    data = run_demo()
    second_step(data)

if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue("aws_cpu")
    
    PipelineDecorator.run_locally()
    
    pipeline()

  				
Posted 2 years ago by PricklyRaven28

Damn it, you're right 😅

        # Allow ClearML access to the training args and allow it to override the arguments for remote execution
        args_class = type(training_args)
        args, changed_keys = cast_keys_to_string(training_args.to_dict())
        Task.current_task().connect(args)
        training_args = args_class(**cast_keys_back(args, changed_keys)[0])
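The snippet above calls two helpers, `cast_keys_to_string` and `cast_keys_back`, that are not defined in this thread. The bodies below are a hypothetical sketch written only to match how the snippet calls them: the first returns `(dict_with_str_keys, changed_keys)`, and the second returns a tuple whose first element is the restored dict (the second element, a list of keys that could not be cast back, is an assumption). It recurses into nested dicts but not into lists, and only int and float keys are cast.

```python
from typing import Any, Dict, List, Optional, Tuple

# For each string key: (original key type, or None if it was already a str;
#                       nested changes for a dict value, or None)
Changes = Dict[str, Tuple[Optional[type], Optional[dict]]]


def cast_keys_to_string(d: Dict[Any, Any]) -> Tuple[Dict[str, Any], Changes]:
    """Recursively cast int/float dict keys to str, recording what changed."""
    out: Dict[str, Any] = {}
    changed: Changes = {}
    for key, value in d.items():
        if isinstance(key, bool) or not isinstance(key, (str, int, float)):
            raise TypeError(f"unsupported key type: {type(key).__name__}")
        new_key = key if isinstance(key, str) else str(key)
        if new_key in out:
            raise ValueError(f"key {key!r} collides with another key after casting")
        nested = None
        if isinstance(value, dict):
            value, nested = cast_keys_to_string(value)
        key_type = None if isinstance(key, str) else type(key)
        if key_type is not None or nested:
            changed[new_key] = (key_type, nested or None)
        out[new_key] = value
    return out, changed


def cast_keys_back(d: Dict[str, Any], changed: Changes) -> Tuple[Dict[Any, Any], List[str]]:
    """Undo cast_keys_to_string on a (possibly overridden) dict.

    Returns (restored dict, keys that could not be cast back to their type).
    """
    out: Dict[Any, Any] = {}
    failed: List[str] = []
    for key, value in d.items():
        key_type, nested = changed.get(key, (None, None))
        if nested and isinstance(value, dict):
            value, nested_failed = cast_keys_back(value, nested)
            failed.extend(nested_failed)
        if key_type is not None:
            try:
                key = key_type(key)
            except ValueError:
                failed.append(key)  # keep the string key as-is
        out[key] = value
    return out, failed


if __name__ == "__main__":
    params = {"learning_rate": 2e-5, "id2label": {0: "O", 1: "B-PER"}}
    as_str, changed_keys = cast_keys_to_string(params)
    # Task.current_task().connect(as_str) would go here
    restored = cast_keys_back(as_str, changed_keys)[0]
    print(restored == params)  # True
```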

  				
Posted 2 years ago by ExasperatedCrab78

ExasperatedCrab78
Hey 🙂
Any updates on this? We need to use a newer version of transformers because of another bug in the older version, so we can't stay on the old transformers version anymore.

  				
Posted 2 years ago by PricklyRaven28

90K Views
