that makes more sense 🙂
would this work now as a workaround until the version is released?
Hey @<1523701949617147904:profile|PricklyRaven28> , about the S3 loading issue. The path to the model in the artifact tab, is it an S3 bucket or a local path?
This is the next step not being able to find the output of the last step
ValueError: Could not retrieve a local copy of artifact return_object, failed downloading
@<1523701118159294464:profile|ExasperatedCrab78> Sorry only saw this now,
Thanks for checking it!
Glad to see you found the issue, hope you find a way to fix the second one. for now we will continue using the previous version.
Would be glad if you can post when everything is fixed so we can advance our version.
@<1523701118159294464:profile|ExasperatedCrab78>
Here is an example that reproduces the second error
from clearml.automation import PipelineDecorator
from clearml import TaskTypes
@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForSequenceClassification, TrainingArguments, Trainer
from datasets import load_dataset
import numpy as np
import evaluate
from pathlib import Path
dataset = load_dataset("yelp_review_full")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
def tokenize_function(examples):
return tokenizer(examples["text"], padding="max_length", truncation=True)
def compute_metrics(eval_pred):
logits, labels = eval_pred
predictions = np.argmax(logits, axis=-1)
return metric.compute(predictions=predictions, references=labels)
small_train_dataset = dataset["train"].shuffle(seed=42).select(range(10))
small_eval_dataset = dataset["test"].shuffle(seed=42).select(range(10))
small_train_dataset = small_train_dataset.map(tokenize_function, batched=True)
small_eval_dataset = small_eval_dataset.map(tokenize_function, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
training_args = TrainingArguments(
output_dir="test_trainer",
evaluation_strategy="epoch",
# num_train_epoch=1,
)
metric = evaluate.load("accuracy")
trainer = Trainer(
model=model,
args=training_args,
train_dataset=small_train_dataset,
eval_dataset=small_eval_dataset,
compute_metrics=compute_metrics,
)
trainer.train()
return Path('test_trainer')
@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def second_step(some_param):
print("Success!")
@PipelineDecorator.pipeline(name="StuffToDelete", project=".Dev", version="0.0.2", pipeline_execution_queue="aws_cpu")
def pipeline():
data = run_demo()
second_step(data)
if __name__ == '__main__':
PipelineDecorator.set_default_execution_queue("aws_cpu")
PipelineDecorator.run_locally()
pipeline()
Hey 🙂 Thanks for the update!
what i’m missing the is the point where you report to clearml between cast and casting back 🤔
Just for reference, the main issue is that ClearML does not allow non-string types as dict keys for its configuration. Usually the labeling mapping does have ints as keys. Which is why we need to cast them to strings first, then pass them to ClearML then cast them back.
Nothing that i think is relevant, I'm using latest from master. It might be a new bug on their side, wasn't sure.
for now we downgraded to 1.7.2, but of course prefer not to stay that way
@<1523701118159294464:profile|ExasperatedCrab78>
Hey again 🙂
I believe that the transformers patch wasn’t released yet right? we are getting into a problem where we need new features from transformers but can’t use because of this
@<1523701118159294464:profile|ExasperatedCrab78>
Hey 🙂
Any updates on this? We need to use a new version of transformers because of another bug they have in an old version. so we can’t use the old transformers version anymore.
i believe this is because of transformer’s integration:
Automatic ClearML logging enabled.
ClearML Task has been initialized.
when a task already exists
Hey @<1523701949617147904:profile|PricklyRaven28> I'm checking! Have you updated anything else and on which exact commit of transformers are you now?
I'm getting really weird behavior now, the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact but it doesn't exist on S3 (our file server configuration)
SmugDolphin23 BTW, this is using clearml and huggingface’s automatic logging… didn’t log something manual
i believe this is because of this code
None
Which initialized the task if clearml is installed… but a task already exists (because of the pipeline), it will replace it
It should, but please check first. This is some code I quickly made for myself. It did make tests for it, but it would be nice to hear from someone else that it worked (as evidenced by the error above 😅 )
Damn it, you're right 😅
# Allow ClearML access to the training args and allow it to override the arguments for remote execution
args_class = type(training_args)
args, changed_keys = cast_keys_to_string(training_args.to_dict())
Task.current_task().connect(args)
training_args = args_class(**cast_keys_back(args, changed_keys)[0])
@<1523701949617147904:profile|PricklyRaven28> Please use this patch instead of the one previously shared. It excludes the dict hack :)
I tried to work on a reproducible script but then i get errors that my clearml task is already initialized (also doesn’t happen on 1.7.2)
Yes, and the old version only works without the patch.
I see the model on the artifacts tab, but it's not actually uploaded.
@<1523701118159294464:profile|ExasperatedCrab78>
Ok. bummer to hear that it won't be included automatically in the package.
I am now experiencing a bug with the patch, not sure it's to blame... but i'm unable to save models in the pipeline.. checking if it's related
in the meantime, we should have fixed this. I will ping you when 1.9.1 is out to try it out!