Hey, we are using ClearML 1.9.0 with transformers 4.25.1… and we started getting errors that do not reproduce in earlier versions (only works in 1.7.2, all 1.8.x don't work):

Answered

Hey,
We are using clearml 1.9.0 with transformers 4.25.1, and we started getting errors that do not reproduce in earlier versions (it only works in 1.7.2; none of the 1.8.x releases work):

  File "/tmp/tmp0you5mai.py", line 29, in train_entity_exraction_model
    train(source=source_path.absolute(), output=model_output_path.absolute(), seed=seed, **entity_extraction_trainer)
  File "/usr/src/lib/entity_extractions/train.py", line 74, in train
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1527, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1704, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
    result = getattr(callback, event)(
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1355, in on_train_begin
    self.setup(args, state, model, tokenizer, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1345, in setup
    self._clearml_task.connect(args, "Args")
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 1480, in connect
    return method(mutable, name=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3449, in _connect_object
    a_dict = self._connect_dictionary(a_dict, name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3413, in _connect_dictionary
    flat_dict = self._arguments.copy_to_dict(flat_dict, prefix=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/args.py", line 508, in copy_to_dict
    self._task.set_parameter((prefix or '') + k, v)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1281, in set_parameter
    self._set_parameters(
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1246, in _set_parameters
    description=create_description(),
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1237, in create_description
    created_description += "Values:\n" + ",\n".join(
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

  				
Posted one year ago by PricklyRaven28


Answers 62

Hi @<1523701949617147904:profile|PricklyRaven28> just letting you know I still have this on my TODO, I'll update you as soon as I have something!

  				
Posted one year ago by ExasperatedCrab78

Yes, and the old version only works without the patch.
I see the model on the artifacts tab, but it's not actually uploaded.

  				
Posted one year ago by PricklyRaven28

SmugDolphin23 BTW, this is using clearml and huggingface's automatic logging… we didn't log anything manually

  				
Posted one year ago by PricklyRaven28

of course

  				
Posted one year ago by SuccessfulKoala55

@<1523701435869433856:profile|SmugDolphin23> @<1523701087100473344:profile|SuccessfulKoala55> Yes, the second issue still persists and is currently breaking our pipeline

  				
Posted one year ago by PricklyRaven28

@<1523701949617147904:profile|PricklyRaven28> Please use this patch instead of the one previously shared. It excludes the dict hack :)

  				
Posted one year ago by ExasperatedCrab78

will check

  				
Posted one year ago by PricklyRaven28

Hey 🙂 Thanks for the update!

What I'm missing is the point where you report to ClearML between casting the keys and casting them back 🤔

  				
Posted one year ago by PricklyRaven28

Good to hear 🙏

  				
Posted one year ago by PricklyRaven28

However, I do think I can already open the Huggingface PR in the meantime. It actually has relatively little to do with the second bug.

  				
Posted one year ago by ExasperatedCrab78

No worries! Just so I understand fully though: you were already using the patch with success from my branch. Now that it has been merged into the transformers main branch, you installed it from there, and that's when you started having issues with not saving models? Then installing transformers 4.21.3 fixes it (which should have the old clearml integration, even before the patch)?

  				
Posted one year ago by ExasperatedCrab78

It's been accepted in master, but was not released yet indeed!

As for the other issue, it seems like we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Huggingface + ClearML so people can do it themselves.

For now (using the patch), the only thing you need to be careful about is not connecting a dict or object with ints as keys. If you do need to (e.g. usually huggingface models need the id2label dict somewhere), just make sure to cast the keys to strings before connecting the dict to ClearML and cast them back to int directly after, so that when ClearML changes a value, it's properly taken care of 🙂 My previous sample code is still valid!
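
A minimal sketch of the cast, connect, cast-back step for a single flat id2label mapping. The helper name connect_with_string_keys and its connect parameter are hypothetical and exist only for illustration; in a real run, connect could be something like lambda d: Task.current_task().connect(d, name="id2label"), assuming a ClearML task is already running.

```python
from typing import Callable, Dict


def connect_with_string_keys(
    id2label: Dict[int, str],
    connect: Callable[[Dict[str, str]], Dict[str, str]],
) -> Dict[int, str]:
    """Hypothetical helper: hand a string-keyed copy to `connect`, then restore int keys."""
    # ClearML only sees string keys, so cast before connecting.
    as_strings = {str(key): value for key, value in id2label.items()}
    # `connect` may return overridden values (e.g. during remote execution).
    connected = connect(as_strings)
    # Cast the keys back so the model config gets the int keys it expects.
    return {int(key): value for key, value in connected.items()}


if __name__ == "__main__":
    # Stand-in for the ClearML call so the sketch runs without a server.
    restored = connect_with_string_keys({0: "O", 1: "B-PER"}, connect=lambda d: d)
    print(restored)
```

Passing the connect call in as a parameter keeps the casting logic testable without a ClearML server.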

  				
Posted one year ago by ExasperatedCrab78

Hey @<1523701949617147904:profile|PricklyRaven28> I'm checking! Have you updated anything else and on which exact commit of transformers are you now?

  				
Posted one year ago by ExasperatedCrab78

The version is v1.9.1

  				
Posted one year ago by SuccessfulKoala55

@<1523701118159294464:profile|ExasperatedCrab78>
Ok. bummer to hear that it won't be included automatically in the package.

I am now experiencing a bug with the patch. I'm not sure the patch is to blame, but I'm unable to save models in the pipeline. Checking if it's related.

  				
Posted one year ago by PricklyRaven28

Damn it, you're right 😅

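        from clearml import Task

        # Allow ClearML access to the training args and allow it to override the arguments for remote execution.
        # cast_keys_to_string / cast_keys_back are helpers defined elsewhere (not shown in this post).
        args_class = type(training_args)
        # ClearML only accepts string dict keys, so cast them and remember which keys changed.
        args, changed_keys = cast_keys_to_string(training_args.to_dict())
        Task.current_task().connect(args)
        # Restore the original key types before rebuilding the training arguments.
        training_args = args_class(**cast_keys_back(args, changed_keys)[0])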
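
For readers without the earlier sample code, here is one possible sketch of the two helpers used in the snippet. This is not the code from the patch: the bodies are an assumption written to match only the call shapes visible above (cast_keys_to_string returns a dict plus a record of changed keys, and cast_keys_back returns a tuple whose first element is the restored dict). It handles int keys only and recurses into nested dicts.

```python
from typing import Any, Dict, List, Tuple

KeyPath = Tuple[str, ...]


def cast_keys_to_string(d: Dict[Any, Any], _path: KeyPath = ()) -> Tuple[Dict[Any, Any], List[KeyPath]]:
    """Return a copy of `d` with int keys cast to str, plus the paths of the cast keys."""
    out: Dict[Any, Any] = {}
    changed: List[KeyPath] = []
    for key, value in d.items():
        is_int_key = type(key) is int  # deliberately excludes bool
        new_key = str(key) if is_int_key else key
        path = _path + (str(new_key),)
        if is_int_key:
            changed.append(path)
        if isinstance(value, dict):
            value, nested_changed = cast_keys_to_string(value, path)
            changed.extend(nested_changed)
        out[new_key] = value
    return out, changed


def cast_keys_back(d: Dict[Any, Any], changed_keys: List[KeyPath], _path: KeyPath = ()) -> Tuple[Dict[Any, Any], List[KeyPath]]:
    """Return a copy of `d` with the recorded keys cast back to int (tuple shape is an assumption)."""
    changed = set(changed_keys)
    out: Dict[Any, Any] = {}
    for key, value in d.items():
        path = _path + (str(key),)
        if isinstance(value, dict):
            value, _ = cast_keys_back(value, changed_keys, path)
        out[int(key) if path in changed else key] = value
    return out, changed_keys


if __name__ == "__main__":
    args = {"learning_rate": 0.1, "id2label": {0: "O", 1: "B-PER"}}
    as_strings, changed = cast_keys_to_string(args)
    print(as_strings, changed)
    print(cast_keys_back(as_strings, changed)[0])
```

Recording key paths rather than bare key names keeps a string key such as "0" in one nested dict from being cast to int just because an int key 0 existed in another.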

  				
Posted one year ago by ExasperatedCrab78

I am currently on vacation, so I'll ask my teammates. If they can't, I'll get to it next week.

  				
Posted one year ago by PricklyRaven28

Just for reference, the main issue is that ClearML does not allow non-string types as dict keys for its configuration. Usually the label mapping does have ints as keys, which is why we need to cast them to strings first, then pass them to ClearML, then cast them back.

  				
Posted one year ago by ExasperatedCrab78

Yeah, it gets to that error because the previous issue is solved… I'll try to work on a new example

  				
Posted one year ago by PricklyRaven28

I tried to work on a reproducible script but then i get errors that my clearml task is already initialized (also doesn’t happen on 1.7.2)

  				
Posted one year ago by PricklyRaven28

Hey @<1523701949617147904:profile|PricklyRaven28> , about the S3 loading issue. The path to the model in the artifact tab, is it an S3 bucket or a local path?

  				
Posted one year ago by EnthusiasticShrimp49

Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but I get an "IndexError: Invalid key: 5872 is out of bounds for size 0" error. That said, I get the same error when the code is not running in a pipeline. There seems to be no difference between simply running the code and running the pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?

  				
Posted one year ago by ExasperatedCrab78

sounds good 🙂 I’ll soon check if this fixes our issue and update you

  				
Posted one year ago by PricklyRaven28

Thanks, Lior

  				
Posted one year ago by SmugDolphin23

I'm getting really weird behavior now, the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact but it doesn't exist on S3 (our file server configuration)

  				
Posted one year ago by PricklyRaven28

Hi PricklyRaven28 ! What dict do you connect? Do you have a small script we could use to reproduce?

  				
Posted one year ago by SmugDolphin23

i’ll try to work on something that works on 1.7.2

  				
Posted one year ago by PricklyRaven28

All right, a bit of searching later and I've found 2 things:

You were right about the task! I've staged a fix here. It basically detects whether a task is already running (e.g. from the PipelineDecorator component) and if so, uses that task instead. We should probably do this for all of our integrations.
But then I found another bug. Basically the pipeline decorator task would mess up the internal nested dict of the label mapping inside of the model config. You will probably have the same issue if you run the pipeline with my fix above.
So for now, we're looking into the 2nd bug, because it breaks with Hugging Face models in a pipeline. Until we sort that out, I'm going to hold off on opening a PR to HF with the first fix. Makes sense?

Thanks a lot for the example, it helped tons to be able to reproduce!
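
The "reuse a running task" idea from the first point can be sketched roughly as below. This is not the staged fix itself: get_or_create_task is a hypothetical name, and the project and task names are placeholders. The sketch relies only on two ClearML calls, Task.current_task() (which returns None when no task is running) and Task.init(project_name=..., task_name=...).

```python
from typing import Any


def get_or_create_task(task_cls: Any, project_name: str, task_name: str) -> Any:
    """Hypothetical helper: reuse the running task if there is one, else start a new one.

    `task_cls` is expected to behave like `clearml.Task`.
    """
    existing = task_cls.current_task()
    if existing is not None:
        # E.g. a pipeline component already started a task; do not initialize another.
        return existing
    return task_cls.init(project_name=project_name, task_name=task_name)


# Usage with ClearML installed and configured (placeholder names):
#   from clearml import Task
#   task = get_or_create_task(Task, "HuggingFace Transformers", "Trainer")
```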

  				
Posted one year ago by ExasperatedCrab78

I believe this is because of the transformers integration, which logs:

Automatic ClearML logging enabled.
ClearML Task has been initialized.

when a task already exists

  				
Posted one year ago by PricklyRaven28

@<1523701435869433856:profile|SmugDolphin23>
Hey 🙂
Any update?

We are having more issues with the new versions of transformers and clearml.
The step that uses transformers 4.25.1 isn't able to upload artifacts.
If we downgrade to transformers==4.21.3, it works.

  				
Posted one year ago by PricklyRaven28

