Hey, we are using ClearML 1.9.0 with transformers 4.25.1… and we started getting errors that do not reproduce in earlier versions (only 1.7.2 works; all 1.8.x versions fail):

Answered

Hey,
We are using ClearML 1.9.0 with transformers 4.25.1, and we started getting errors that do not reproduce in earlier versions (it only works in 1.7.2; all 1.8.x versions fail):

```
  File "/tmp/tmp0you5mai.py", line 29, in train_entity_exraction_model
    train(source=source_path.absolute(), output=model_output_path.absolute(), seed=seed, **entity_extraction_trainer)
  File "/usr/src/lib/entity_extractions/train.py", line 74, in train
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1527, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1704, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
    result = getattr(callback, event)(
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1355, in on_train_begin
    self.setup(args, state, model, tokenizer, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1345, in setup
    self._clearml_task.connect(args, "Args")
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 1480, in connect
    return method(mutable, name=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3449, in _connect_object
    a_dict = self._connect_dictionary(a_dict, name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3413, in _connect_dictionary
    flat_dict = self._arguments.copy_to_dict(flat_dict, prefix=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/args.py", line 508, in copy_to_dict
    self._task.set_parameter((prefix or '') + k, v)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1281, in set_parameter
    self._set_parameters(
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1246, in _set_parameters
    description=create_description(),
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1237, in create_description
    created_description += "Values:\n" + ",\n".join(
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
```

  				
Posted by PricklyRaven28, one year ago


Answers 62

args.py #504:

```python
for k, v in dictionary.items():
    # if key is not present in the task's parameters, assume we didn't get this far when running
    # in non-remote mode, and just add it to the task's parameters
    if k not in parameters:
        self._task.set_parameter((prefix or '') + k, v)
        continue
```

task.py #1266:

```python
def set_parameter(self, name, value, description=None, value_type=None):
    # type: (str, str, Optional[str], Optional[Any]) -> ()
    """
    Set a single Task parameter. This overrides any previous value for this parameter.

    :param name: The parameter name.
    :param value: The parameter value.
    :param description: The parameter description.
    :param value_type: The type of the parameters (cast to string and store)
    """
    if not Session.check_min_api_version('2.9'):
        # not supported yet
        description = None
        value_type = None

    self._set_parameters(
        {name: value}, __update=True,
        __parameters_descriptions={name: description},
        __parameters_types={name: value_type}
    )
```

task.py #1227:

```python
def create_description():
    if org_param and org_param.description:
        return org_param.description
    created_description = ""
    if org_k in descriptions:
        created_description = descriptions[org_k]
    if isinstance(v, Enum):
        # append enum values to description
        if created_description:
            created_description += "\n"
        created_description += "Values:\n" + ",\n".join(
            [enum_key for enum_key in type(v).__dict__.keys() if not enum_key.startswith("_")]
        )
    return created_description
```

From this code we can see that the description will always be None: `copy_to_dict` never passes a description, so `set_parameter` defaults it to None, and it always ends up in the `descriptions` dict as None. As a result, if the argument is an Enum, `create_description()` will always throw the exception.

  				
Posted by PricklyRaven28, one year ago
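To make the failure path above concrete, here is a minimal, self-contained sketch that mirrors the quoted `create_description()` logic without ClearML. It is simplified (the `org_param` early return is omitted), `Strategy` is a hypothetical enum standing in for an Enum-typed field such as, presumably, the evaluation strategy in `TrainingArguments`, and the "fixed" variant is only one possible fix, not necessarily what ClearML shipped. The fixed variant also lists members by iterating the enum rather than reading `__dict__`; that is our choice, made for a stable output.

```python
from enum import Enum


class Strategy(Enum):
    # hypothetical enum standing in for an Enum-typed training argument
    NO = "no"
    EPOCH = "epoch"


def create_description_buggy(descriptions, org_k, v):
    # mirrors the quoted logic: a stored None replaces the "" default
    created_description = ""
    if org_k in descriptions:
        created_description = descriptions[org_k]  # None when no description was passed
    if isinstance(v, Enum):
        if created_description:  # None is falsy, so no newline is added
            created_description += "\n"
        # None += str raises TypeError here
        created_description += "Values:\n" + ",\n".join(
            [k for k in type(v).__dict__.keys() if not k.startswith("_")]
        )
    return created_description


def create_description_fixed(descriptions, org_k, v):
    # treat a stored None exactly like a missing description
    created_description = descriptions.get(org_k) or ""
    if isinstance(v, Enum):
        if created_description:
            created_description += "\n"
        created_description += "Values:\n" + ",\n".join(m.name for m in type(v))
    return created_description
```

With `descriptions = {"Args/evaluation_strategy": None}` (what `set_parameter` effectively stores), the buggy variant raises the same `TypeError` as the traceback, while the fixed one returns `"Values:\nNO,\nEPOCH"`.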

Hi PricklyRaven28, can you try with 1.9.1rc0?

  				
Posted by SuccessfulKoala55, one year ago

Hi, yes, we tried it, with the same result.

  				
Posted by PricklyRaven28, one year ago

BTW, the code above is from the ClearML GitHub repo, so it's the latest.

  				
Posted by PricklyRaven28, one year ago

We'll check it out 👍

  				
Posted by SuccessfulKoala55, one year ago

For now we downgraded to 1.7.2, but of course we'd prefer not to stay there.

  				
Posted by PricklyRaven28, one year ago

of course

  				
Posted by SuccessfulKoala55, one year ago

Hi PricklyRaven28 ! What dict do you connect? Do you have a small script we could use to reproduce?

  				
Posted by SmugDolphin23, one year ago

I tried to write a reproducible script, but then I get errors saying that my ClearML task is already initialized (this also doesn't happen on 1.7.2).

  				
Posted by PricklyRaven28, one year ago

I'll try to come up with something that works on 1.7.2.

  				
Posted by PricklyRaven28, one year ago

SmugDolphin23 BTW, this is using ClearML's and Hugging Face's automatic logging; we didn't log anything manually.

  				
Posted by PricklyRaven28, one year ago
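Some background on the "automatic logging" mentioned above: in transformers 4.x, `TrainingArguments.report_to` defaults to `"all"`, i.e. every installed integration, so the ClearML callback is attached whenever the `clearml` package is importable. The sketch below illustrates that detection pattern with the standard library only; `is_package_available` and `select_reporters` are hypothetical names, not transformers' API, and the candidate list is illustrative. Assuming the automatic logging is not needed inside a pipeline step, passing `report_to="none"` to `TrainingArguments` should keep the callback from being attached at all.

```python
import importlib.util


def is_package_available(package_name: str) -> bool:
    # True when the package could be imported, without actually importing it
    return importlib.util.find_spec(package_name) is not None


def select_reporters(report_to, candidates=("clearml", "tensorboard", "wandb")):
    # "all" expands to every installed candidate (hypothetical list)
    if report_to == "all":
        return [name for name in candidates if is_package_available(name)]
    # "none" (or None) disables automatic reporting entirely
    if report_to in (None, "none"):
        return []
    # otherwise take the user's explicit choice literally
    return [report_to] if isinstance(report_to, str) else list(report_to)
```

The point of the design is that merely installing `clearml` in the step's environment is enough to turn the integration on, which is how a pipeline step ends up with a second, implicitly created task.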

```python
from clearml.automation import PipelineDecorator
from clearml import TaskTypes


@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer
    from datasets import load_dataset

    dataset = load_dataset("conllpp")

    model_checkpoint = 'bert-base-cased'
    lr = 2e-5
    num_train_epochs = 5
    weight_decay = 0.01
    seed = 1234

    ner_feature = dataset["train"].features["ner_tags"]
    label_names = ner_feature.feature.names
    id2label = {str(i): label for i, label in enumerate(label_names)}
    label2id = {v: k for k, v in id2label.items()}

    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

    data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

    model = AutoModelForTokenClassification.from_pretrained(
        model_checkpoint,
        id2label=id2label,
        label2id=label2id,
    )
    trainer_args = TrainingArguments(
        './tmp',
        evaluation_strategy="epoch",
        save_strategy="epoch",
        learning_rate=lr,
        num_train_epochs=num_train_epochs,
        weight_decay=weight_decay,
        seed=seed,
        data_seed=seed,
        load_best_model_at_end=True,
    )

    trainer = Trainer(
        model=model,
        args=trainer_args,
        train_dataset=dataset["train"],
        eval_dataset=dataset["validation"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    trainer.train()


@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
    run_demo()


if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue("aws_cpu")

    PipelineDecorator.run_locally()

    pipeline()
```

This isn't a real working example, but it shows that on ClearML 1.7.2 it gets past the initialization part (and then fails on the training itself, which is fine).

On 1.9.0, however, it fails earlier, with
`TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'`

  				
Posted by PricklyRaven28, one year ago

SmugDolphin23 SuccessfulKoala55 ^

  				
Posted by PricklyRaven28, one year ago

Thanks, Lior!

  				
Posted by SmugDolphin23, one year ago

In the meantime, we should have fixed this. I will ping you when 1.9.1 is out so you can try it!

  				
Posted by SmugDolphin23, one year ago

Thanks! Keep me posted.

  				
Posted by PricklyRaven28, one year ago

@SmugDolphin23
Hey 🙂
Any update?

We are having more issues with transformers and ClearML in their new versions.
The step that uses transformers 4.25.1 isn't able to upload artifacts.
If we downgrade to transformers==4.21.3, it works.

  				
Posted by PricklyRaven28, one year ago

Hi @PricklyRaven28! We released ClearML SDK 1.9.1 yesterday. Can you please try it?

  				
Posted by SmugDolphin23, one year ago

The version is v1.9.1

  				
Posted by SuccessfulKoala55, one year ago

Will check.

  				
Posted by PricklyRaven28, one year ago

Looks like the first issue has been solved 🙂

I think the second one still persists; still checking.

  				
Posted by PricklyRaven28, one year ago

@SmugDolphin23 @SuccessfulKoala55 Yes, the second issue still persists, and it is currently breaking our pipeline.

  				
Posted by PricklyRaven28, one year ago

This is the next step failing to find the output of the previous step:

```
ValueError: Could not retrieve a local copy of artifact return_object, failed downloading
```

  				
Posted by PricklyRaven28, one year ago

I believe this is because of this code:
None

It initializes a task whenever ClearML is installed, but if a task already exists (because of the pipeline), it replaces it.

  				
Posted by PricklyRaven28, one year ago
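The behavior described in the previous post (the integration creating a task even though the pipeline step already runs inside one) suggests a guard of the form "reuse the running task, otherwise create one". Below is a minimal, dependency-free sketch of that guard; `get_or_init_task` is a hypothetical helper, not part of ClearML or transformers, and the two callables are injected so the logic can be exercised without a ClearML server.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def get_or_init_task(
    current_task: Callable[[], Optional[T]],
    init_task: Callable[[], T],
) -> T:
    # Reuse the task that is already running (e.g. a pipeline step's task)...
    existing = current_task()
    if existing is not None:
        return existing
    # ...and only create a new task when nothing is running yet
    return init_task()
```

With ClearML, one would presumably pass `Task.current_task` as the first argument and something like `lambda: Task.init(project_name="my-project", task_name="trainer")` (hypothetical names) as the second, so a step's existing task is logged to instead of being replaced.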

Hi @PricklyRaven28, sorry that this is happening. I tried to run your minimal example, but I get an `IndexError: Invalid key: 5872 is out of bounds for size 0` error. That said, I get the same error when the code is not running in a pipeline, so (for me) there seems to be no difference between running the code directly and running it in the pipeline. Do you have an updated example, maybe one that also fetches a local copy of an artifact, so I can check?

  				
Posted by ExasperatedCrab78, one year ago

Yeah, it gets to that error because the previous issue is solved… I'll try to work on a new example.

  				
Posted by PricklyRaven28, one year ago

No worries! And thanks for putting in the time.

  				
Posted by ExasperatedCrab78, one year ago

@ExasperatedCrab78
Here is an example that reproduces the second error:

```python
from clearml.automation import PipelineDecorator
from clearml import TaskTypes


@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForSequenceClassification, TrainingArguments, Trainer
    from datasets import load_dataset
    import numpy as np
    import evaluate
    from pathlib import Path

    dataset = load_dataset("yelp_review_full")

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10))
    small_eval_dataset = dataset["test"].shuffle(seed=42).select(range(10))

    small_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
    small_eval_dataset = small_eval_dataset.map(tokenize_function, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

    training_args = TrainingArguments(
        output_dir="test_trainer",
        evaluation_strategy="epoch",
        # num_train_epoch=1,
    )

    metric = evaluate.load("accuracy")

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )

    trainer.train()

    return Path('test_trainer')


@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def second_step(some_param):
    print("Success!")


@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
    data = run_demo()
    second_step(data)


if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue("aws_cpu")

    PipelineDecorator.run_locally()

    pipeline()
```

  				
Posted by PricklyRaven28, one year ago

```
Traceback (most recent call last):
  File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
    kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/usr/src/lib/clearml_test.py", line 69, in <module>
    pipeline()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in internal_decorator
    raise triggered_exception
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3891, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/opt/conda/lib/python3.10/site-packages/clearml/utilities/proxy_object.py", line 392, in trigger_all_remote_references
    func()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3592, in results_reference
    raise ValueError(
ValueError: Pipeline step "second_step", Task ID=94a133dd0325425ab162467146482121 failed
```

  				
Posted by PricklyRaven28, one year ago

I believe this is because of the transformers integration, which logs:

```
Automatic ClearML logging enabled.
ClearML Task has been initialized.
```

even when a task already exists.

  				
Posted by PricklyRaven28, one year ago

