that makes more sense 🙂
would this work now as a workaround until the version is released?
looks like it’s working 🙂 tnx
tnx, i just can’t use 1.7.1 because of the pipeline problem from before
i didn’t, i prefer not to add temporary workarounds
We also wanted this; we preferred to create a docker image with everything we need, and let the pipeline steps use that docker image
That way you don’t rely on clearml capturing the local env, and you can control what exists in the env
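In case it helps, a minimal sketch of that setup (the image name, step body and queue setup here are placeholders, not our actual pipeline): each component is pinned to a prebuilt docker image, so the env is whatever the image contains rather than whatever clearml captures from the invoking machine.
```python
from clearml.automation import PipelineDecorator


@PipelineDecorator.component(
    docker="my-registry/ml-base:1.0",  # hypothetical prebuilt image with all deps baked in
    cache=True,
)
def preprocess(n: int) -> int:
    # when an agent executes this step, it runs inside the docker image above
    return n * 2


@PipelineDecorator.pipeline(name="demo pipeline", project="demo", version="0.1")
def pipeline():
    doubled = preprocess(21)
    print(doubled)


if __name__ == "__main__":
    # assumes a clearml-agent is listening on the relevant queue; each step then
    # runs in the image declared on its component
    pipeline()
```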
not sure about this, we really like being in control of reproducibility and not depending on the invoking machine… maybe that’s not what you intend
yeah, it gets to that error because the previous issue is saved… i’ll try to work on a new example
they also appear to be relying on the tensorboard callback, which doesn’t seem to work with distributed training
It's models not datasets in our case...
But we can also just tar the folder and return that... Was just hoping to avoid doing that
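Just to illustrate the tar fallback i mean (folder and artifact names here are made up), something like:
```python
import tarfile

from clearml import Task

task = Task.current_task()

# pack the whole model folder into a single archive...
archive_path = "model_dir.tar.gz"
with tarfile.open(archive_path, "w:gz") as tar:
    tar.add("model_dir", arcname="model_dir")

# ...and upload that one file instead of returning the folder from the step
task.upload_artifact(name="model_folder", artifact_object=archive_path)
```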
also, i don’t need to change it during execution, i want it for a specific run
It’s a lot of manual work that you need to remember to undo
Yes, but it’s more complex because i’m using a pipeline… where i don’t explicitly call Task.init()
BTW, i would expect this to happen automatically when running “local” and “debug”
TimelyMouse69
Thanks for the reply, but this only covers automatic logging, whereas i want to disable logging altogether (avoiding the task being added to the UI)
My use case is developing the code, i don’t want to spam the UI
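For now the closest thing i found is switching to offline mode before anything reaches Task.init() (assuming offline mode fits the use case and plays nicely with the pipeline decorator, since nothing gets sent to the server at all):
```python
from clearml import Task

# must run before Task.init() is reached anywhere in the code
Task.set_offline(offline_mode=True)

# ... the rest of the code runs unchanged; everything is recorded to a local
# folder instead of showing up in the UI
```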
SmugDolphin23 BTW, this is using clearml and huggingface’s automatic logging… i didn’t log anything manually
i’m following this guide
https://docs.fast.ai/distributed.html#Learner.distrib_ctx
so you run it like this: `python -m fastai.launch <script>`
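and in the script itself the training loop is wrapped in distrib_ctx, roughly like this (dataset/model here are just the guide’s illustrative choices, not ours):
```python
from fastai.vision.all import *
from fastai.distributed import *

path = untar_data(URLs.PETS)
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path / "images"), pat=r"(.+)_\d+.jpg", item_tfms=Resize(224)
)
learn = vision_learner(dls, resnet34, metrics=error_rate)

with learn.distrib_ctx():  # sets up / tears down DDP for each launched process
    learn.fine_tune(1)
```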
for now we downgraded to 1.7.2, but of course prefer not to stay that way
@SmugDolphin23
Hey 🙂
Any update?
We are having more issues with transformers and clearml in their new version.
The step that has transformers 4.25.1 isn’t able to upload artifacts. If we downgrade to transformers==4.21.3, it works.
```python
from clearml.automation import PipelineDecorator
from clearml import TaskTypes


@PipelineDecorator.component(task_type=TaskTypes.data_processing, cache=True)
def run_demo():
    from transformers import AutoTokenizer, DataCollatorForTokenClassification, AutoModelForTokenClassification, TrainingArguments, Trainer
    from datasets import load_dataset

    dataset = load_dataset("conllpp")
    model_checkpoint = 'bert-base-cased'
    lr = 2e-5
    num_train_epochs = 5
    weight_decay = ...
```
confirming that only downgrading to transformers==4.21.3 without the patch worked....
This is a time bomb that eventually we won't be able to ignore... we will need to use new transformers code
@ExasperatedCrab78
Hey again 🙂
I believe the transformers patch hasn’t been released yet, right? We are running into a problem where we need new features from transformers but can’t use them because of this
I tried to work on a reproducible script but then i get errors that my clearml task is already initialized (also doesn’t happen on 1.7.2)
This is the next step not being able to find the output of the last step
ValueError: Could not retrieve a local copy of artifact return_object, failed downloading
I'm working with the patch, and installing transformers from github
I'm getting really weird behavior now, the task seems to report correctly with the patch... but the step doesn't say "uploading" when finished... there is a "return" artifact but it doesn't exist on S3 (our file server configuration)
```
Traceback (most recent call last):
  File "/tmp/tmpxlf2zxb9.py", line 31, in <module>
    kwargs[k] = parent_task.get_parameters(cast=True)[return_section + '/' + artifact_name]
KeyError: 'return/return_object'
```
Setting pipeline controller Task as failed (due to failed steps) !
```
Traceback (most recent call last):
  File "/usr/src/lib/clearml_test.py", line 69, in <module>
    pipeline()
  File "/opt/conda/lib/python3.10/site-packages/clearml/automation/controller.py", line 3914, in intern...
```