Hey,
We are using clearml 1.9.0 with transformers 4.25.1… and we started getting errors that do not reproduce in earlier versions (it only works in 1.7.2; all 1.8.x versions fail):

  File "/tmp/tmp0you5mai.py", line 29, in train_entity_exraction_model
    train(source=source_path.absolute(), output=model_output_path.absolute(), seed=seed, **entity_extraction_trainer)
  File "/usr/src/lib/entity_extractions/train.py", line 74, in train
    trainer.train()
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1527, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1704, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
    result = getattr(callback, event)(
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1355, in on_train_begin
    self.setup(args, state, model, tokenizer, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1345, in setup
    self._clearml_task.connect(args, "Args")
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 1480, in connect
    return method(mutable, name=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3449, in _connect_object
    a_dict = self._connect_dictionary(a_dict, name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3413, in _connect_dictionary
    flat_dict = self._arguments.copy_to_dict(flat_dict, prefix=name)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/args.py", line 508, in copy_to_dict
    self._task.set_parameter((prefix or '') + k, v)
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1281, in set_parameter
    self._set_parameters(
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1246, in _set_parameters
    description=create_description(),
  File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1237, in create_description
    created_description += "Values:\n" + ",\n".join(
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

  
  
Posted 2 years ago

Answers 62


SmugDolphin23 SuccessfulKoala55 ^

  
  
Posted 2 years ago

@<1523701118159294464:profile|ExasperatedCrab78>
OK. Bummer to hear that it won't be included automatically in the package.

I am now experiencing a bug with the patch (not sure the patch is to blame): I'm unable to save models in the pipeline. Checking if it's related.

  
  
Posted 2 years ago

that makes more sense 🙂
would this work now as a workaround until the version is released?

  
  
Posted 2 years ago

yeah, it gets to that error because the previous issue is saved…i’ll try to work on a new example

  
  
Posted 2 years ago

Traceback (most recent call last):
  File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
    kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/usr/src/lib/clearml_test.py", line 69, in <module>
    pipeline()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in internal_decorator
    raise triggered_exception
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3891, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/opt/conda/lib/python3.10/site-packages/clearml/utilities/proxy_object.py", line 392, in trigger_all_remote_references
    func()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3592, in results_reference
    raise ValueError(
ValueError: Pipeline step "second_step", Task ID=94a133dd0325425ab162467146482121 failed
  
  
Posted 2 years ago

Thanks! I'm checking now, but might take a little (meeting in between)

  
  
Posted 2 years ago

Could you please run the misbehaving example with a breakpoint in clearml/backend_interface/task/task.py, inside Task.update_output_model, on the line that starts with url = output_model.update_weights( , and tell me what the value of model_path is? If you're using a virtual environment, the clearml library should be installed under <virtual env directory>/lib/python3.10/site-packages/clearml/
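
If it is unclear which copy of clearml the interpreter actually imports (conda vs. virtual environment), a small helper along these lines may save some searching. This is only a sketch: package_dir is a hypothetical name, not part of clearml, and it relies solely on the standard library's importlib.util.find_spec.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(package_name: str) -> Optional[Path]:
    """Return the directory a top-level package is imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its parent is the package directory
    return Path(spec.origin).parent


if __name__ == "__main__":
    root = package_dir("clearml")
    if root is None:
        print("clearml is not installed in this environment")
    else:
        # File named in the request above; set the breakpoint inside Task.update_output_model
        print(root / "backend_interface" / "task" / "task.py")
```

Running it with the same interpreter that runs the pipeline step prints the path of the task.py file to open in the debugger.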

  
  
Posted 2 years ago

Hi @<1523701949617147904:profile|PricklyRaven28> just letting you know I still have this on my TODO, I'll update you as soon as I have something!

  
  
Posted 2 years ago

The version is v1.9.1

  
  
Posted 2 years ago

I appreciate it!

  
  
Posted 2 years ago

However, I actually do think I can already open the Huggingface PR in the meantime. It has actually relatively little to do with the second bug.

  
  
Posted 2 years ago

will check

  
  
Posted 2 years ago

@<1523701118159294464:profile|ExasperatedCrab78>
Here is an example that reproduces the second error

from clearml.automation import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForSequenceClassification, TrainingArguments, Trainer
    from datasets import load_dataset
    import numpy as np
    import evaluate
    from pathlib import Path

    dataset = load_dataset("yelp_review_full")

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)
    
    
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    
    small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10))
    small_eval_dataset = dataset["test"].shuffle(seed=42).select(range(10))
    
    small_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
    small_eval_dataset = small_eval_dataset.map(tokenize_function, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

    training_args = TrainingArguments(
        output_dir="test_trainer", 
        evaluation_strategy="epoch",
        # num_train_epochs=1,
    )
    
    metric = evaluate.load("accuracy")
    
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )
    
    trainer.train()
    
    return Path('test_trainer')

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def second_step(some_param):
    print("Success!")
    
@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
    data = run_demo()
    second_step(data)

if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue("aws_cpu")
    
    PipelineDecorator.run_locally()
    
    pipeline()
  
  
Posted 2 years ago

No worries! And thanks for putting in the time.

  
  
Posted 2 years ago

confirming that only downgrading to transformers==4.21.3 without the patch worked....
This is a time bomb that eventually we won't be able to ignore... we will need to use new transformers code

  
  
Posted 2 years ago

Hey @<1523701949617147904:profile|PricklyRaven28> I'm checking! Have you updated anything else and on which exact commit of transformers are you now?

  
  
Posted 2 years ago

Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but I get an "IndexError: Invalid key: 5872 is out of bounds for size 0" error. That said, I get the same error when the code is not running in a pipeline, so for me there seems to be no difference between simply running the code and running the pipeline. Do you have an updated example, maybe one that also gets a local copy of an artifact, so I can check?

  
  
Posted 2 years ago

Hi PricklyRaven28! What dict do you connect? Do you have a small script we could use to reproduce?

  
  
Posted 2 years ago

@<1523701435869433856:profile|SmugDolphin23>
Hey 🙂
Any update?

We are having more issues with transformers and clearml in their new version.
The step that has transformers 4.25.1 isn’t able to upload artifacts.
If we downgrade to transformers==4.21.3, it works.

  
  
Posted 2 years ago

@<1523701949617147904:profile|PricklyRaven28> Please use this patch instead of the one previously shared. It excludes the dict hack :)

  
  
Posted 2 years ago

args.py #504:
for k, v in dictionary.items():
    # if key is not present in the task's parameters, assume we didn't get this far when running
    # in non-remote mode, and just add it to the task's parameters
    if k not in parameters:
        self._task.set_parameter((prefix or '') + k, v)
        continue

task.py #1266:
def set_parameter(self, name, value, description=None, value_type=None):
    # type: (str, str, Optional[str], Optional[Any]) -> ()
    """
    Set a single Task parameter. This overrides any previous value for this parameter.

    :param name: The parameter name.
    :param value: The parameter value.
    :param description: The parameter description.
    :param value_type: The type of the parameters (cast to string and store)
    """
    if not Session.check_min_api_version('2.9'):
        # not supported yet
        description = None
        value_type = None

    self._set_parameters(
        {name: value}, __update=True,
        __parameters_descriptions={name: description},
        __parameters_types={name: value_type}
    )

task.py #1227:
def create_description():
    if org_param and org_param.description:
        return org_param.description
    created_description = ""
    if org_k in descriptions:
        created_description = descriptions[org_k]
    if isinstance(v, Enum):
        # append enum values to description
        if created_description:
            created_description += "\n"
        created_description += "Values:\n" + ",\n".join(
            [enum_key for enum_key in type(v).__dict__.keys() if not enum_key.startswith("_")]
        )
    return created_description

From this code we can see that the description is always None: copy_to_dict never passes a description, so set_parameter defaults it to None and stores it in the descriptions dict as None. create_description then picks up that None, and if the arg is an Enum, the += always throws the exception.
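
The failure mode described here can be reproduced without ClearML or transformers. The sketch below is a simplified, standalone copy of the quoted create_description logic, not the real ClearML function; Strategy is a hypothetical stand-in for an Enum-valued training argument, and create_description_safe shows one possible guard (treating a None description as an empty string), which is an assumption and not necessarily the fix that was released.

```python
from enum import Enum
from typing import Dict, Optional


class Strategy(Enum):
    """Hypothetical stand-in for an Enum-valued training argument."""
    NO = "no"
    EPOCH = "epoch"


def create_description(value, key: str, descriptions: Dict[str, Optional[str]]):
    """Simplified copy of the quoted logic: breaks when descriptions[key] is None."""
    created_description = ""
    if key in descriptions:
        created_description = descriptions[key]  # None when no description was passed
    if isinstance(value, Enum):
        if created_description:
            created_description += "\n"
        # With created_description == None this line raises TypeError
        created_description += "Values:\n" + ",\n".join(
            k for k in type(value).__dict__.keys() if not k.startswith("_")
        )
    return created_description


def create_description_safe(value, key: str, descriptions: Dict[str, Optional[str]]) -> str:
    """Same logic, but a missing or None description falls back to an empty string."""
    created_description = descriptions.get(key) or ""
    if isinstance(value, Enum):
        if created_description:
            created_description += "\n"
        created_description += "Values:\n" + ",\n".join(
            k for k in type(value).__dict__.keys() if not k.startswith("_")
        )
    return created_description


if __name__ == "__main__":
    # Mirrors what happens when set_parameter is called without a description
    descriptions = {"evaluation_strategy": None}
    try:
        create_description(Strategy.EPOCH, "evaluation_strategy", descriptions)
    except TypeError as err:
        print("reproduced:", err)
    print(create_description_safe(Strategy.EPOCH, "evaluation_strategy", descriptions))
```

Non-Enum parameters never reach the += line, which would explain why only Enum-valued arguments trigger the error.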

  
  
Posted 2 years ago

When creating it, I found that this hack should be on our side, not on Huggingface's. So I'm only going to fix issue 1 with the PR, issue 2 is ours 🙂

  
  
Posted 2 years ago

@<1523701118159294464:profile|ExasperatedCrab78>
Hey 🙂
Any updates on this? We need to use a new version of transformers because of another bug in the old version, so we can't stay on the old transformers version anymore.

  
  
Posted 2 years ago

I'm getting really weird behavior now, the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact but it doesn't exist on S3 (our file server configuration)

  
  
Posted 2 years ago

Nothing that i think is relevant, I'm using latest from master. It might be a new bug on their side, wasn't sure.

  
  
Posted 2 years ago

Hey 🙂 Thanks for the update!

What I'm missing is the point where you report to clearml between casting and casting back 🤔

  
  
Posted 2 years ago

I'm working with the patch, and installing transformers from github

  
  
Posted 2 years ago

We'll check it out 👍

  
  
Posted 2 years ago

This is the next step not being able to find the output of the last step

ValueError: Could not retrieve a local copy of artifact return_object, failed downloading 
  
  
Posted 2 years ago

I am currently on vacation, so I'll ask my teammates. If they can't get to it, I'll look at it next week.

  
  
Posted 2 years ago