` args.py #504:
for k, v in dictionary.items():
    # if key is not present in the task's parameters, assume we didn't get this far when running
    # in non-remote mode, and just add it to the task's parameters
    if k not in parameters:
        self._task.set_parameter((prefix or '') + k, v)
        continue

task.py #1266:
def set_parameter(self, name, value, description=None, value_type=None):
    # type: (str, str, Optional[str], Optional[Any]) -> ()
    """
    Set a single Task parameter. This overrides any previous value for this parameter.
    :param name: The parameter name.
    :param value: The parameter value.
    :param description: The parameter description.
    :param value_type: The type of the parameters (cast to string and store)
    """
    if not Session.check_min_api_version('2.9'):
        # not supported yet
        description = None
        value_type = None
    self._set_parameters(
        {name: value}, __update=True,
        __parameters_descriptions={name: description},
        __parameters_types={name: value_type}
    )

task.py #1227:
def create_description():
    if org_param and org_param.description:
        return org_param.description
    created_description = ""
    if org_k in descriptions:
        created_description = descriptions[org_k]
    if isinstance(v, Enum):
        # append enum values to description
        if created_description:
            created_description += "\n"
        created_description += "Values:\n" + ",\n".join(
            [enum_key for enum_key in type(v).__dict__.keys() if not enum_key.startswith("_")]
        )
    return created_description `
We can see from this code that the description will always be None (copy_to_dict never passes a description, so it defaults to None and is always stored in the descriptions dict as None), and if the arg is an Enum it will always throw an exception.
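To make that failure mode concrete, here is a minimal standalone sketch (the names mirror the snippet above, not the real clearml internals) of what presumably happens when the stored description is None and the value is an Enum:

from enum import Enum

class Color(Enum):
    RED = 1
    BLUE = 2

# copy_to_dict never passes a description, so it always ends up here as None
descriptions = {"color": None}
org_k, v = "color", Color.RED

created_description = ""
if org_k in descriptions:
    created_description = descriptions[org_k]  # becomes None instead of ""
if isinstance(v, Enum):
    if created_description:  # None is falsy, so no newline is added
        created_description += "\n"
    # TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
    created_description += "Values:\n" + ",\n".join(
        k for k in type(v).__dict__.keys() if not k.startswith("_")
    )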
Alright, a bit of searching later and I've found 2 things:
- You were right about the task! I've staged a fix here. It basically detects whether a task is already running (e.g. from the PipelineDecorator component) and, if so, uses that task instead. We should probably do this for all of our integrations.
- But then I found another bug. Basically the pipeline decorator task would mess up the internal nested dict of the label mapping inside of the model config. You will probably have the same issue if you run the pipeline with my fix above.
So for now, we're looking into the 2nd bug, because it breaks with Hugging Face models in a pipeline. Until we sort that out, I'm going to hold off on opening a PR to HF with the first fix. Makes sense?
Thanks a lot for the example, it helped tons to be able to reproduce!
@<1523701118159294464:profile|ExasperatedCrab78>
Here is an example that reproduces the second error
from clearml.automation import PipelineDecorator
from clearml import TaskTypes


@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForSequenceClassification, TrainingArguments, Trainer
    from datasets import load_dataset
    import numpy as np
    import evaluate
    from pathlib import Path

    dataset = load_dataset("yelp_review_full")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10))
    small_eval_dataset = dataset["test"].shuffle(seed=42).select(range(10))
    small_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
    small_eval_dataset = small_eval_dataset.map(tokenize_function, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
    training_args = TrainingArguments(
        output_dir="test_trainer",
        evaluation_strategy="epoch",
        # num_train_epoch=1,
    )
    metric = evaluate.load("accuracy")
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )
    trainer.train()
    return Path('test_trainer')


@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def second_step(some_param):
    print("Success!")


@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
    data = run_demo()
    second_step(data)


if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue("aws_cpu")
    PipelineDecorator.run_locally()
    pipeline()
I tried to work on a reproducible script, but then I get errors that my ClearML task is already initialized (this also doesn't happen on 1.7.2).
@<1523701435869433856:profile|SmugDolphin23> @<1523701087100473344:profile|SuccessfulKoala55> Yes, the second issue still persists, and it's currently breaking our pipeline.
Damn it, you're right 😅
# Allow ClearML access to the training args and allow it to override the arguments for remote execution
args_class = type(training_args)
args, changed_keys = cast_keys_to_string(training_args.to_dict())
Task.current_task().connect(args)
training_args = args_class(**cast_keys_back(args, changed_keys)[0])
However, I actually do think I can already open the Hugging Face PR in the meantime. It has relatively little to do with the second bug.
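For reference, cast_keys_to_string and cast_keys_back aren't shown in the snippet above; a rough sketch of what they might look like (an assumption for illustration, not the actual helper code from the patch):

# Hypothetical helpers (assumption: not the actual implementation).
# They stringify non-string dict keys so ClearML can connect the dict,
# and remember the original key types so they can be restored afterwards.
def cast_keys_to_string(d, changed_keys=None):
    changed_keys = {} if changed_keys is None else changed_keys
    new_d = {}
    for k, v in d.items():
        if not isinstance(k, str):
            changed_keys[str(k)] = type(k)  # remember the original key type
            k = str(k)
        new_d[k] = cast_keys_to_string(v, changed_keys)[0] if isinstance(v, dict) else v
    return new_d, changed_keys


def cast_keys_back(d, changed_keys):
    new_d = {}
    for k, v in d.items():
        if k in changed_keys:
            k = changed_keys[k](k)  # e.g. int("5") -> 5
        new_d[k] = cast_keys_back(v, changed_keys)[0] if isinstance(v, dict) else v
    return new_d, changed_keys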
I'm getting really weird behavior now: the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact, but it doesn't exist on S3 (our file server configuration).
@<1523701118159294464:profile|ExasperatedCrab78>
Hey 🙂
Any updates on this? We need to use a new version of transformers because of another bug they have in the old version, so we can't use the old transformers version anymore.
Confirming that only downgrading to transformers==4.21.3 (without the patch) worked...
This is a time bomb that eventually we won't be able to ignore... we will need to use new transformers code
Nothing that I think is relevant; I'm using the latest from master. It might be a new bug on their side, I wasn't sure.
Hey @<1523701949617147904:profile|PricklyRaven28> I'm checking! Have you updated anything else, and which exact commit of transformers are you on now?
Looks like the first issue has been solved 🙂
I think the second one still persists, still checking.
@<1523701435869433856:profile|SmugDolphin23>
Hey 🙂
Any update?
We are having more issues with transformers and clearml in their new version.
The step that has transformers 4.25.1 isn’t able to upload artifacts.
If we downgrade to transformers==4.21.3 it works.
For now we downgraded to 1.7.2, but of course we'd prefer not to stay that way.
Hi PricklyRaven28! What dict do you connect? Do you have a small script we could use to reproduce?
I'm working with the patch, and installing transformers from GitHub.
In the meantime, we should have this fixed. I will ping you when 1.9.1 is out so you can try it!
that makes more sense 🙂
Would this work now as a workaround until the version is released?
It's been accepted into master, but indeed it has not been released yet!
As for the other issue, it seems like we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Huggingface + ClearML so people can do it themselves.
For now (using the patch) the only thing you need to be careful about is not to connect a dict or object with ints as keys. If you do need to (e.g. usually Hugging Face models need the id2label dict somewhere), just make sure to cast it to string before connecting it to ClearML and cast it back to int directly after, so that when ClearML changes the value, it's properly taken care of 🙂 My previous sample code is still valid!
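For example, something along these lines (a sketch; the id2label values are just placeholders):

from clearml import Task

# Hugging Face style label mapping with int keys (placeholder values)
id2label = {0: "negative", 1: "neutral", 2: "positive"}

# cast the keys to strings before connecting the dict to ClearML...
connected = {str(k): v for k, v in id2label.items()}
Task.current_task().connect(connected, name="id2label")

# ...and cast them straight back, so any values ClearML overrides keep int keys
id2label = {int(k): v for k, v in connected.items()}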
I believe this is because of this code:
None
which initializes the task if clearml is installed… but if a task already exists (because of the pipeline), it will replace it.
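The fix mentioned earlier presumably boils down to something like this (a sketch of the idea, not the staged patch; project/task names are placeholders):

from clearml import Task

# reuse the task the PipelineDecorator component already created, instead of
# calling Task.init() again and replacing it
task = Task.current_task()
if task is None:
    # no task is running (plain script / remote execution), so it's safe to create one
    task = Task.init(project_name="examples", task_name="huggingface training")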
I'll try to work on something that works on 1.7.2.
Traceback (most recent call last):
  File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
    kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/usr/src/lib/clearml_test.py", line 69, in <module>
    pipeline()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in internal_decorator
    raise triggered_exception
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3891, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/opt/conda/lib/python3.10/site-packages/clearml/utilities/proxy_object.py", line 392, in trigger_all_remote_references
    func()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3592, in results_reference
    raise ValueError(
ValueError: Pipeline step "second_step", Task ID=94a133dd0325425ab162467146482121 failed