Hi @<1523701435869433856:profile|SmugDolphin23>
Confirming that the rank0 process does not hang with the new version!
The accelerate CLI problem does still reproduce though (it's in my demo)
For now we downgraded to 1.7.2, but of course we'd prefer not to stay that way
don’t have one ATM
@<1523701118159294464:profile|ExasperatedCrab78>
Hey again 🙂
I believe the transformers patch hasn't been released yet, right? We're running into a problem where we need new features from transformers but can't upgrade because of this
Using api.files_server, not default_output?
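To make sure we're talking about the same thing, here's a minimal sketch of the override I mean, assuming Task.init's output_uri takes precedence over the clearml.conf defaults (api.files_server / sdk.development.default_output_uri); the project, task, and bucket names are made up:
```python
from clearml import Task

# Hypothetical names; output_uri should override the configured default
task = Task.init(
    project_name="demo",
    task_name="storage-check",
    output_uri="s3://my-bucket/models",
)
```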
What I'm doing is getting
parent = Task.get_task(task.parent)
and then checking parent.data.user
but the user is some unknown ID that doesn't exist in the all_users list
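For reference, a minimal sketch of the lookup, assuming APIClient from clearml.backend_api exposes users.get_all as documented:
```python
from clearml import Task
from clearml.backend_api.session.client import APIClient

task = Task.current_task()
parent = Task.get_task(task_id=task.parent)
print(parent.data.user)  # the unknown id

# The list the id is missing from
client = APIClient()
all_users = client.users.get_all()
print([u.id for u in all_users])
```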
Hi, yes, we tried with the same result
BTW, I would expect this to happen automatically when running “local” and “debug”
Also, I don't need to change it during execution; I want it for a specific run
Yes, but it's more complex because I'm using a pipeline… where I don't explicitly call Task.init()
TimelyMouse69
Thanks for the reply. This is only regarding automatic logging, where I want to disable logging altogether (avoiding the task being added to the UI)
It’s a lot of manual work that you need to remember to undo
My use case is developing the code; I don't want to spam the UI
And I am logging some things explicitly
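Something like offline mode is what I'm after; a sketch, assuming Task.set_offline behaves as documented (project/task names are made up):
```python
from clearml import Task

# Nothing is sent to the server, so no task shows up in the UI;
# must be called before Task.init()
Task.set_offline(offline_mode=True)
task = Task.init(project_name="dev", task_name="local-debug")  # hypothetical names
```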
It's with decorators.
Interesting, I wasn't aware of this Python module for executing accelerate. I'll try to use that.
We used subprocess for it, but for some reason, only when invoked in the pipeline, the process freezes and doesn't close the main accelerate process. It works fine outside of ClearML, any idea?
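Roughly what our subprocess invocation looks like; a sketch with a made-up script name and args:
```python
import subprocess

# Hypothetical training script and flags, launched via the accelerate CLI
cmd = ["accelerate", "launch", "--multi_gpu", "train.py"]
proc = subprocess.Popen(cmd)
proc.wait()  # inside a pipeline component, this is where it freezes for us
```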
It's models, not datasets, in our case...
But we can also just tar the folder and return that... I was just hoping to avoid doing that
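For completeness, the workaround I was hoping to avoid would look something like this (paths are made up):
```python
import tarfile
from pathlib import Path

from clearml import Task

model_dir = Path("outputs/model")        # hypothetical model folder
archive = Path("outputs/model.tar.gz")

# Pack the whole model folder into a single archive
with tarfile.open(archive, "w:gz") as tar:
    tar.add(model_dir, arcname=model_dir.name)

# Attach the archive to the current task as one artifact
Task.current_task().upload_artifact(name="model", artifact_object=archive)
```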
```python
from clearml.automation import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer
    from datasets import load_dataset

    dataset = load_dataset("conllpp")
    model_checkpoint = 'bert-base-cased'
    lr = 2e-5
    num_train_epochs = 5
    weight_decay = ...
```
Not sure about this; we really like being in control of reproducibility and not depending on the invoking machine… maybe that's not what you intend
SmugSnake6 yep, that’s exactly it.
Hope the team is aware and will fix it
BTW, the code above is from the clearml GitHub, so it's the latest
SmugDolphin23 BTW, this is using clearml and Hugging Face's automatic logging… I didn't log anything manually
I believe this is because of the transformers integration:
Automatic ClearML logging enabled.
ClearML Task has been initialized.
(this happens when a task already exists)
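If the integration is the trigger, my understanding is that report_to controls which callbacks transformers attaches; a sketch of opting out (output_dir is made up):
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",  # hypothetical
    report_to=[],      # don't attach any integration callbacks (ClearML, TensorBoard, ...)
)
```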
Hey, it took me some time to check it out.
I added 20 retries to check the GPU driver; it says it finds the driver, but the task still starts without the GPU driver
The pipeline is a bit complex, but it did the same with a very dumb example
Saw it was merged 🙂 One down, one to go
We also wanted this; we preferred to create a Docker image with everything we need and let the pipeline steps use that Docker image
That way you don't rely on ClearML capturing the local env, and you can control what exists in the env
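Concretely, a sketch of pinning a step to a prebuilt image via the component's docker argument (image name is made up):
```python
from clearml.automation import PipelineDecorator

# Hypothetical prebuilt image; the step runs inside it instead of
# relying on ClearML capturing the local environment
@PipelineDecorator.component(docker="our-registry/train-env:1.0")
def train_step():
    ...
```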
They also appear to be relying on the TensorBoard callback, which doesn't seem to work with distributed training
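The kind of guard I'd expect for distributed runs, a sketch assuming accelerate's is_main_process flag: only report from the main process:
```python
from accelerate import Accelerator

accelerator = Accelerator()
if accelerator.is_main_process:
    # attach/report to TensorBoard (or ClearML) only on rank 0
    print("only the main process reports here")
```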