Hey, We Are Using ClearML 1.9.0 With Transformers 4.25.1… And We Started Getting Errors That Do Not Reproduce In Earlier Versions (Only Works In 1.7.2, All 1.8.x Don't Work):

Answered

Hey,
We are using clearml 1.9.0 with transformers 4.25.1, and we started getting errors that do not reproduce in earlier versions (it only works in 1.7.2; all 1.8.x versions fail):

  File "/tmp/tmp0you5mai.py", line 29, in train_entity_exraction_model
    train(source=source_path.absolute(), output=model_output_path.absolute(), seed=seed, **entity_extraction_trainer)
  File "/usr/src/lib/entity_extractions/train.py", line 74, in train
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1527, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1704, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
    result = getattr(callback, event)(
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1355, in on_train_begin
    self.setup(args, state, model, tokenizer, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1345, in setup
    self._clearml_task.connect(args, "Args")
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 1480, in connect
    return method(mutable, name=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3449, in _connect_object
    a_dict = self._connect_dictionary(a_dict, name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3413, in _connect_dictionary
    flat_dict = self._arguments.copy_to_dict(flat_dict, prefix=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/args.py", line 508, in copy_to_dict
    self._task.set_parameter((prefix or '') + k, v)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1281, in set_parameter
    self._set_parameters(
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1246, in _set_parameters
    description=create_description(),
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1237, in create_description
    created_description += "Values:\n" + ",\n".join(
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

  				
Posted one year ago by PricklyRaven28


Answers 62

tnx! keep me posted

  				
Posted one year ago by PricklyRaven28

@ExasperatedCrab78
Hey 🙂
Any updates on this? We need to use a new version of transformers because of another bug in the old version, so we can't use the old transformers version anymore.

  				
Posted one year ago by PricklyRaven28

Looks like the first issue has been solved 🙂

I think the second one still persists; still checking.

  				
Posted one year ago by PricklyRaven28

Traceback (most recent call last):
  File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
    kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/usr/src/lib/clearml_test.py", line 69, in <module>
    pipeline()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in internal_decorator
    raise triggered_exception
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3891, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/opt/conda/lib/python3.10/site-packages/clearml/utilities/proxy_object.py", line 392, in trigger_all_remote_references
    func()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3592, in results_reference
    raise ValueError(
ValueError: Pipeline step "second_step", Task ID=94a133dd0325425ab162467146482121 failed

  				
Posted one year ago by PricklyRaven28

Saw it was merged 🙂 One down, one to go

  				
Posted one year ago by PricklyRaven28

When creating it, I found that this hack should be on our side, not on Huggingface's. So I'm only going to fix issue 1 with the PR, issue 2 is ours 🙂

  				
Posted one year ago by ExasperatedCrab78

Could you please run the misbehaving example, try to add a breakpoint in clearml/backend_interface/task/task.py in Task.update_output_model on the line with url = output_model.update_weights( , and tell me what the value of model_path is? In case you're using virtual environments, clearml library should be installed somewhere in <virtual env directory>/lib/python3.10/site-packages/clearml/
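To find the directory that contains the installed clearml sources before setting the breakpoint, a small sketch like the following can help. The helper name `package_dir` is made up for this example; it relies only on the standard library and returns None when the package is not installed in the active environment.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory a top-level package is installed in, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


# The file mentioned above would then be at
# <printed directory>/backend_interface/task/task.py
print(package_dir("clearml"))
```

Run it with the same interpreter (and virtual environment) that runs the misbehaving example, otherwise it may point at a different installation.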

  				
Posted one year ago by EnthusiasticShrimp49

Hi PricklyRaven28 , can you try with 1.9.1rc0?

  				
Posted one year ago by SuccessfulKoala55

hi, yes we tried with the same result

  				
Posted one year ago by PricklyRaven28

Nothing that I think is relevant. I'm using the latest from master. It might be a new bug on their side, I wasn't sure.

  				
Posted one year ago by PricklyRaven28

This is the next step not being able to find the output of the last step

ValueError: Could not retrieve a local copy of artifact return_object, failed downloading

  				
Posted one year ago by PricklyRaven28

It should, but please check first. This is some code I quickly made for myself. I did write tests for it, but it would be nice to hear from someone else that it worked (as evidenced by the error above 😅 )

  				
Posted one year ago by ExasperatedCrab78

We'll check it out 👍

  				
Posted one year ago by SuccessfulKoala55

Hi @PricklyRaven28, just letting you know I still have this on my TODO. I'll update you as soon as I have something!

  				
Posted one year ago by ExasperatedCrab78

Yes, and the old version only works without the patch.
I see the model on the artifacts tab, but it's not actually uploaded.

  				
Posted one year ago by PricklyRaven28

SmugDolphin23 BTW, this is using ClearML's and Hugging Face's automatic logging; we didn't log anything manually.

  				
Posted one year ago by PricklyRaven28

of course

  				
Posted one year ago by SuccessfulKoala55

@SmugDolphin23 @SuccessfulKoala55 Yes, the second issue still persists and is currently breaking our pipeline.

  				
Posted one year ago by PricklyRaven28

@PricklyRaven28 Please use this patch instead of the one previously shared. It excludes the dict hack :)

  				
Posted one year ago by ExasperatedCrab78

will check

  				
Posted one year ago by PricklyRaven28

Hey 🙂 Thanks for the update!

What I'm missing is the point where you report to ClearML between casting the keys and casting them back 🤔

  				
Posted one year ago by PricklyRaven28

Good to hear 🙏

  				
Posted one year ago by PricklyRaven28

However, I actually do think I can already open the Huggingface PR in the meantime. It has actually relatively little to do with the second bug.

  				
Posted one year ago by ExasperatedCrab78

No worries! Just so I understand fully, though: you were already using the patch from my branch with success. Now that it has been merged into the transformers main branch, you installed it from there, and that's when you started having issues with models not being saved? And installing transformers 4.21.3 fixes it (which should have the old ClearML integration, from even before the patch)?

  				
Posted one year ago by ExasperatedCrab78

It's been accepted in master, but was not released yet indeed!

As for the other issue, it seems like we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Huggingface + ClearML so people can do it themselves.

For now (using the patch), the only thing you need to be careful about is not to connect a dict or object with ints as keys. If you do need to (e.g. Huggingface models usually need the id2label dict somewhere), just make sure to cast the keys to strings before connecting it to ClearML and cast them back to ints directly after. That way, when ClearML changes the value, it's properly taken care of 🙂 My previous sample code is still valid!
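A minimal sketch of that round trip for a flat id2label dict. `fake_connect` is a hypothetical stand-in for `task.connect(...)` so the example runs without a ClearML server; it is assumed here to hand back the dict it was given, possibly with overridden values.

```python
id2label = {0: "O", 1: "B-PER", 2: "I-PER"}


def fake_connect(params):
    # Hypothetical stand-in for task.connect(params): assumed to return
    # the connected dict, possibly with values overridden remotely.
    return params


# 1. Cast the int keys to strings before connecting.
connectable = {str(key): label for key, label in id2label.items()}

# 2. Connect (with ClearML this would be task.connect(connectable)).
connected = fake_connect(connectable)

# 3. Cast the keys back to ints directly afterwards, so the model config
#    sees the key type it expects.
id2label = {int(key): label for key, label in connected.items()}
```

Reading the values back from the connected dict, rather than from the original, is what lets any overridden value carry through.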

  				
Posted one year ago by ExasperatedCrab78

Hey @PricklyRaven28, I'm checking! Have you updated anything else, and which exact commit of transformers are you on now?

  				
Posted one year ago by ExasperatedCrab78

The version is v1.9.1

  				
Posted one year ago by SuccessfulKoala55

args.py #504:

    for k, v in dictionary.items():
        # if key is not present in the task's parameters, assume we didn't get this far when running
        # in non-remote mode, and just add it to the task's parameters
        if k not in parameters:
            self._task.set_parameter((prefix or '') + k, v)
            continue

task.py #1266:

    def set_parameter(self, name, value, description=None, value_type=None):
        # type: (str, str, Optional[str], Optional[Any]) -> ()
        """
        Set a single Task parameter. This overrides any previous value for this parameter.

        :param name: The parameter name.
        :param value: The parameter value.
        :param description: The parameter description.
        :param value_type: The type of the parameters (cast to string and store)
        """
        if not Session.check_min_api_version('2.9'):
            # not supported yet
            description = None
            value_type = None

        self._set_parameters(
            {name: value}, __update=True,
            __parameters_descriptions={name: description},
            __parameters_types={name: value_type}
        )

task.py #1227:

    def create_description():
        if org_param and org_param.description:
            return org_param.description
        created_description = ""
        if org_k in descriptions:
            created_description = descriptions[org_k]
        if isinstance(v, Enum):
            # append enum values to description
            if created_description:
                created_description += "\n"
            created_description += "Values:\n" + ",\n".join(
                [enum_key for enum_key in type(v).__dict__.keys() if not enum_key.startswith("_")]
            )
        return created_description

We can see from this code that the description will always be None: copy_to_dict never passes a description, so it defaults to None and is always stored in the descriptions dict as None. If the arg is an Enum, create_description will therefore always throw the exception.
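The same failure can be reproduced without ClearML. The sketch below is a simplified stand-in that only mirrors the logic quoted above; the function names, the `Color` enum and the guarded variant are hypothetical and are not the fix that was shipped in ClearML.

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2


def enum_names(value):
    # Same filter as the quoted code: names in the enum class __dict__
    # that do not start with an underscore.
    return [key for key in type(value).__dict__.keys() if not key.startswith("_")]


def create_description_buggy(value, descriptions, key):
    created = ""
    if key in descriptions:
        created = descriptions[key]  # becomes None when None was stored
    if isinstance(value, Enum):
        if created:
            created += "\n"
        created += "Values:\n" + ",\n".join(enum_names(value))  # None += str
    return created


def create_description_guarded(value, descriptions, key):
    # Treat a stored None the same as "no description".
    created = descriptions.get(key) or ""
    if isinstance(value, Enum):
        if created:
            created += "\n"
        created += "Values:\n" + ",\n".join(enum_names(value))
    return created
```

Calling `create_description_buggy(Color.RED, {"color": None}, "color")` raises the TypeError from the traceback, while the guarded variant returns a description that lists the enum names.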

  				
Posted one year ago by PricklyRaven28

@ExasperatedCrab78
OK, bummer to hear that it won't be included automatically in the package.

I am now experiencing a bug with the patch. I'm not sure the patch is to blame, but I'm unable to save models in the pipeline. Checking whether it's related.

  				
Posted one year ago by PricklyRaven28

Damn it, you're right 😅

        # Allow ClearML access to the training args and allow it to override the arguments for remote execution
        args_class = type(training_args)
        args, changed_keys = cast_keys_to_string(training_args.to_dict())
        Task.current_task().connect(args)
        training_args = args_class(**cast_keys_back(args, changed_keys)[0])
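The two helpers called in the snippet above are not shown in this thread. One possible shape for them, inferred only from how they are called (both return a tuple whose first element is the dict), is sketched below. This is a hypothetical reconstruction, not the code that was originally shared.

```python
def cast_keys_to_string(d, changed_keys=None):
    """Return (copy of d with non-str keys cast to str, {str_key: original_key})."""
    if changed_keys is None:
        changed_keys = {}
    out = {}
    for key, value in d.items():
        new_key = key
        if not isinstance(key, str):
            new_key = str(key)
            changed_keys[new_key] = key
        if isinstance(value, dict):
            value, _ = cast_keys_to_string(value, changed_keys)
        out[new_key] = value
    return out, changed_keys


def cast_keys_back(d, changed_keys):
    """Return (copy of d with recorded keys restored to their original type, changed_keys)."""
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            value, _ = cast_keys_back(value, changed_keys)
        # Limitation: keys are matched by their string form only, so a key
        # that was already the string "0" elsewhere would also become 0.
        out[changed_keys.get(key, key)] = value
    return out, changed_keys


params = {"learning_rate": 0.1, "id2label": {0: "O", 1: "B-PER"}}
as_strings, changed = cast_keys_to_string(params)
restored = cast_keys_back(as_strings, changed)[0]
```

Here `as_strings` is what would be passed to `Task.current_task().connect(...)`, and `restored` is what would be fed back into the training arguments.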

  				
Posted one year ago by ExasperatedCrab78
