Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Answered
Hey, We Are Using Clearml 1.9.0 With Transformers 4.25.1… And We Started Getting Errors That Do Not Reproduce In Earlier Versions (Only Works In 1.7.2 All 1.8.X Don’T Work):

Hey,
We are using clearml 1.9.0 with transformers 4.25.1… and we started getting errors that do not reproduce in earlier versions (only works in 1.7.2 all 1.8.x don’t work):

File "/tmp/tmp0you5mai.py", line 29, in train_entity_exraction_model train(source=source_path.absolute(), output=model_output_path.absolute(), seed=seed, **entity_extraction_trainer) File "/usr/src/lib/entity_extractions/train.py", line 74, in train trainer.train() File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1527, in train return inner_training_loop( File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1704, in _inner_training_loop self.control = self.callback_handler.on_train_begin(args, self.state, self.control) File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin return self.call_event("on_train_begin", args, state, control) File "/opt/conda/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event result = getattr(callback, event)( File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1355, in on_train_begin self.setup(args, state, model, tokenizer, **kwargs) File "/opt/conda/lib/python3.10/site-packages/transformers/integrations.py", line 1345, in setup self._clearml_task.connect(args, "Args") File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 1480, in connect return method(mutable, name=name) File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3449, in _connect_object a_dict = self._connect_dictionary(a_dict, name) File "/opt/conda/lib/python3.10/site-packages/clearml/task.py", line 3413, in _connect_dictionary flat_dict = self._arguments.copy_to_dict(flat_dict, prefix=name) File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/args.py", line 508, in copy_to_dict self._task.set_parameter((prefix or '') + k, v) File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1281, in set_parameter self._set_parameters( File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1246, in _set_parameters description=create_description(), File "/opt/conda/lib/python3.10/site-packages/clearml/backend_interface/task/task.py", line 1237, in create_description created_description += "Values:\n" + ",\n".join( TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

  
  
Posted 2 years ago
Votes Newest

Answers 62


Traceback (most recent call last):
  File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
    kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
Setting pipeline controller Task as failed (due to failed steps) !
Traceback (most recent call last):
  File "/usr/src/lib/clearml_test.py", line 69, in <module>
    pipeline()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in internal_decorator
    raise triggered_exception
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3891, in internal_decorator
    LazyEvalWrapper.trigger_all_remote_references()
  File "/opt/conda/lib/python3.10/site-packages/clearml/utilities/proxy_object.py", line 392, in trigger_all_remote_references
    func()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3592, in results_reference
    raise ValueError(
ValueError: Pipeline step "second_step", Task ID=94a133dd0325425ab162467146482121 failed
  
  
Posted 2 years ago

No worries! And thanks for putting in the time.

  
  
Posted 2 years ago

thank Lior

  
  
Posted 2 years ago

@<1523701118159294464:profile|ExasperatedCrab78> Sorry only saw this now,
Thanks for checking it!
Glad to see you found the issue, hope you find a way to fix the second one. for now we will continue using the previous version.
Would be glad if you can post when everything is fixed so we can advance our version.

  
  
Posted 2 years ago

I am currently on vacation, I'll ask my team mates. But if not I'll get to it next week

  
  
Posted 2 years ago

i believe this is because of transformer’s integration:

Automatic ClearML logging enabled.
ClearML Task has been initialized.

when a task already exists

  
  
Posted 2 years ago

that makes more sense 🙂
would this work now as a workaround until the version is released?

  
  
Posted 2 years ago

tnx! keep me posted

  
  
Posted 2 years ago

SmugDolphin23 SuccessfulKoala55 ^

  
  
Posted 2 years ago

@<1523701118159294464:profile|ExasperatedCrab78>
Here is an example that reproduces the second error

from clearml.automation import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForSequenceClassification, TrainingArguments, Trainer
    from datasets import load_dataset
    import numpy as np
    import evaluate
    from pathlib import Path

    dataset = load_dataset("yelp_review_full")

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")


    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)
    
    
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    
    small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10))
    small_eval_dataset = dataset["test"].shuffle(seed=42).select(range(10))
    
    small_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
    small_eval_dataset = small_eval_dataset.map(tokenize_function, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

    training_args = TrainingArguments(
        output_dir="test_trainer", 
        evaluation_strategy="epoch",
        # num_train_epoch=1,
    )
    
    metric = evaluate.load("accuracy")
    
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )
    
    trainer.train()
    
    return Path('test_trainer')

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def second_step(some_param):
    print("Success!")
    
@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
    data = run_demo()
    second_step(data)

if __name__ == '__main__':
    PipelineDecorator.set_default_execution_queue("aws_cpu")
    
    PipelineDecorator.run_locally()
    
    pipeline()
  
  
Posted 2 years ago

Hey @<1523701949617147904:profile|PricklyRaven28> , So as discussed above there were 2 issues. The first one is still waiting on the second, it's on the backlog of our devs and should be done soon(tm).

That said, in the meantime I also wanted to do fun stuff with transformers, so I've written a quick hack that deals with the bug. It's bascially 2 functions that keep track of which types of keys are in the dict.

def cast_keys_to_string(d, changed_keys=dict()):
    nd = dict()
    for key in d.keys():
        if not isinstance(key, str):
            casted_key = str(key)
            changed_keys[casted_key] = key
        else:
            casted_key = key
        if isinstance(d[key], dict):
            nd[casted_key], changed_keys = cast_keys_to_string(d[key], changed_keys)
        else:
            nd[casted_key] = d[key]
    return nd, changed_keys

def cast_keys_back(d, changed_keys):
    nd = dict()
    for key in d.keys():
        if key in changed_keys:
            original_key = changed_keys[key]
        else:
            original_key = key
        if isinstance(d[key], dict):
            nd[original_key], changed_keys = cast_keys_back(d[key], changed_keys)
        else:
            nd[original_key] = d[key]
    return nd, changed_keys

You can then use them like this:

        training_args = TrainingArguments(
            output_dir="my_awesome_model",
            learning_rate=2e-5,
            per_device_train_batch_size=16,
            per_device_eval_batch_size=16,
            dataloader_num_workers=0,
            num_train_epochs=2,
            weight_decay=0.01,
            evaluation_strategy="epoch",
            save_strategy="epoch",
            load_best_model_at_end=True
        )

        # Allow ClearML access to the training args and allow it to override the arguments for remote execution
        args_class = type(training_args)
        args, changed_keys = cast_keys_to_string(training_args.to_dict())
        training_args = args_class(**cast_keys_back(args, changed_keys)[0])

        self.trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=tokenized_dataset["train"],
            eval_dataset=tokenized_dataset["test"],
            tokenizer=self.tokenizer,
            data_collator=data_collator,
            compute_metrics=self.compute_metrics,
        )

        self.trainer.train()

This "hack" in combination with the patch to Huggingface from above should work 🙂 That said, it is a hack, so a production version of this should be there soon. I'll let you know when that happens!

  
  
Posted 2 years ago

` from clearml.automation import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer
from datasets import load_dataset

dataset = load_dataset("conllpp")

model_checkpoint = 'bert-base-cased'
lr = 2e-5
num_train_epochs  = 5
weight_decay = 0.01
seed = 1234

ner_feature = dataset["train"].features["ner_tags"]
label_names = ner_feature.feature.names
id2label = {str(i): label for i, label in enumerate(label_names)}
label2id = {v: k for k, v in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

model = AutoModelForTokenClassification.from_pretrained(
    model_checkpoint,
    id2label=id2label,
    label2id=label2id,
)
trainer_args = TrainingArguments(
    './tmp',
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=lr,
    num_train_epochs=num_train_epochs,
    weight_decay=weight_decay,
    seed=seed,
    data_seed=seed,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=trainer_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)
trainer.train()

@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
run_demo()

if name == 'main':
PipelineDecorator.set_default_execution_queue("aws_cpu")

PipelineDecorator.run_locally()

pipeline() `

This isn’t a real working example, but it shows that on clearml 1.7.2 it passed initialization part (and has an error on training stuff which is ok)

And on 1.9.0 it errors before on
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

  
  
Posted 2 years ago

However, I actually do think I can already open the Huggingface PR in the meantime. It has actually relatively little to do with the second bug.

  
  
Posted 2 years ago

Yes, and the old version only works without the patch.
I see the model on the artifacts tab, but it's not actually uploaded.

  
  
Posted 2 years ago

Nothing that i think is relevant, I'm using latest from master. It might be a new bug on their side, wasn't sure.

  
  
Posted 2 years ago

i’ll try to work on something that works on 1.7.2

  
  
Posted 2 years ago

I tried to work on a reproducible script but then i get errors that my clearml task is already initialized (also doesn’t happen on 1.7.2)

  
  
Posted 2 years ago

Hi @<1523701949617147904:profile|PricklyRaven28> ! We released ClearmlSDK 1.9.1 yesterday. Can you please try it?

  
  
Posted 2 years ago

for now we downgraded to 1.7.2, but of course prefer not to stay that way

  
  
Posted 2 years ago

Could you please run the misbehaving example, try to add a breakpoint in clearml/backend_interface/task/task.py in Task.update_output_model on the line with url = output_model.update_weights( , and tell me what the value of model_path is? In case you're using virtual environments, clearml library should be installed somewhere in <virtual env directory>/lib/python3.10/site-packages/clearml/

  
  
Posted 2 years ago

Hey @<1523701949617147904:profile|PricklyRaven28> , about the S3 loading issue. The path to the model in the artifact tab, is it an S3 bucket or a local path?

  
  
Posted 2 years ago

S3 as it should be

  
  
Posted 2 years ago

Good to hear 🙏

  
  
Posted 2 years ago

Now worries! Just so I understand fully though: you were already using the patch with success from my branch. Now that it has been merged into transformers main branch you installed it from there and that's when you started having issues with not saving models? Then installing transformers 4.21.3 fixes it (which should have the old clearml integration even before the patch?)

  
  
Posted 2 years ago

@<1523701118159294464:profile|ExasperatedCrab78>
Ok. bummer to hear that it won't be included automatically in the package.

I am now experiencing a bug with the patch, not sure it's to blame... but i'm unable to save models in the pipeline.. checking if it's related

  
  
Posted 2 years ago

confirming that only downgrading to transformers==4.21.3 without the patch worked....
This is a time bomb that eventually we won't be able to ignore... we will need to use new transformers code

  
  
Posted 2 years ago

I'm working with the patch, and installing transformers from github

  
  
Posted 2 years ago

I appreciate it!

  
  
Posted 2 years ago

@<1523701118159294464:profile|ExasperatedCrab78>
Hey 🙂
Any updates on this? We need to use a new version of transformers because of another bug they have in an old version. so we can’t use the old transformers version anymore.

  
  
Posted 2 years ago

When creating it, I found that this hack should be on our side, not on Huggingface's. So I'm only going to fix issue 1 with the PR, issue 2 is ours 🙂

  
  
Posted 2 years ago
162K Views
62 Answers
2 years ago
2 years ago
Tags
Similar posts