Hi @<1633638724258500608:profile|BitingDeer35> ! You could attach the configuration using set_configuration_object
None in a pre_execute_callback
. The argument is set here: None
Basically, you would have something like:
def pre_callback(pipeline, node, params):
    node.job.task.set_configuration_object(config)...
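For instance, a fuller sketch (the configuration name and contents below are just placeholders):

def pre_callback(pipeline, node, params):
    # attach a configuration object to the step's task right before it starts
    node.job.task.set_configuration_object(
        name="step_config",              # placeholder name
        config_dict={"param": "value"},  # placeholder content
    )
    return True  # returning False would skip the step

You would then pass it to add_step via pre_execute_callback=pre_callback.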
That makes sense. You should generally have only one task (initialized in the master process). The other subprocesses will inherit this task, which should speed up the process.
Hi @<1654294828365647872:profile|GorgeousShrimp11> ! add_tags
is an instance method, so you will need the controller instance to call it. To get the controller instance, you can do PipelineDecorator.get_current_pipeline()
then call add_tags
on the returned value. So: PipelineDecorator.get_current_pipeline().add_tags(tags=["tag1", "tag2"])
Hi @<1643060801088524288:profile|HarebrainedOstrich43> ! The rc is now out and installable via pip install clearml==1.14.1rc0
Hi @<1626028578648887296:profile|FreshFly37> ! Indeed, the pipeline gets tagged once it is running. Actually, it just tags itself, which is why you are encountering this issue. The version is derived in one of two ways: either you set it manually using the version
argument in the PipelineController
, or the pipeline fetches the latest version out of all the pipelines that have run and auto-bumps that.
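For example, to pin the version explicitly instead of relying on the auto-bump (project/pipeline names below are placeholders):

from clearml import PipelineController

pipe = PipelineController(
    name="my_pipeline",    # placeholder
    project="my_project",  # placeholder
    version="1.0.0",       # set explicitly; omit it to let the pipeline auto-bump
)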
Please reference this function: [None](https://github.com/allegroai/clearml/blob/05...
because I think that what you are encountering now is an NCCL error
Oh I see. I think there might be a mismatch between the clearml versions on your machine. How exactly did you run these scripts? (e.g. from the CLI, like python test.py
?)
Or if you ran it via an IDE, what is the interpreter path?
Hi @<1714813627506102272:profile|CheekyDolphin49> ! It looks as if we can't report these plots as plotly plots, so we default to Debug Samples. You should see both plots under Debug Samples
, but make sure you are setting the Metric
to -- All --
Hi @<1765547897220239360:profile|FranticShark20> ! Do you have any other logs that could help us debug this, such as tritonserver logs?
Also, can you use model.onnx
as the model file name both in the upload and in default_model_filename, just to make sure this is not a file-extension problem? (This can happen with Triton.)
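On the upload side, something along these lines should keep the weights file named exactly model.onnx (a sketch; project/task names are placeholders):

from clearml import Task, OutputModel

task = Task.init(project_name="serving", task_name="upload model")  # placeholder names
output_model = OutputModel(task=task, framework="ONNX")
# "model.onnx" is the local weights file; the uploaded file keeps that exact name
output_model.update_weights(weights_filename="model.onnx")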
would that mean that multiple pre_callback()s would have to be defined for every add_step, since every step would have different configs? Sorry if there's something I'm missing; I'm still not very good at working with ClearML yet.
Yes, you could have multiple callbacks, or you could check the name of each step via node.name
and map the name of the node to its config.
One idea would be to have only one pipeline config file, which would look like:
step_1:
    # step_1 confi...
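Then a single callback could pick the right section based on the node name, e.g. (a sketch, assuming the file is YAML and was saved as pipeline_config.yaml):

import yaml

with open("pipeline_config.yaml") as f:  # the single pipeline config file
    pipeline_config = yaml.safe_load(f)

def pre_callback(pipeline, node, params):
    # look up the section matching this step, e.g. pipeline_config["step_1"]
    step_config = pipeline_config.get(node.name, {})
    node.job.task.set_configuration_object(name="step_config", config_dict=step_config)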
Hmm, in that case you might need to write it. Doesn't hurt to try either way.
Hi @<1590514584836378624:profile|AmiableSeaturtle81> , I think you are right. We will try to look into this asap
Hi FreshParrot56 ! This is currently not supported 🙁
Hi HandsomeGiraffe70 ! We found the cause for this problem, we will release a fix ASAP
That would be much appreciated
Hi @<1639074542859063296:profile|StunningSwallow12> !
This happens because the output_uri
in Task.init
is likely not set.
You could either set the env var CLEARML_DEFAULT_OUTPUT_URI
to the file server you want the model to be uploaded to before running train.py
or set sdk.development.default_output_uri: true
(or to the file server you want the model to be uploaded to) in your clearml.conf
.
Also, you could call Task.init(output_uri=True)
in your train.py
scri...
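For example (project/task names are placeholders):

from clearml import Task

# output_uri=True uploads the model to the default file server;
# a URI such as "s3://my-bucket/models" could be used instead
task = Task.init(project_name="my_project", task_name="train", output_uri=True)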
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! To help us debug this: are you able to simply use the boto3
python package to interact with your cluster?
If so, what does that code look like? This would give us some insight into how the config should actually look or what changes need to be made.
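For reference, a minimal boto3 check could look something like this (endpoint, bucket and credentials below are placeholders):

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://my-storage:9000",  # placeholder endpoint
    aws_access_key_id="KEY",                # placeholder credentials
    aws_secret_access_key="SECRET",
)
print(s3.list_objects_v2(Bucket="my-bucket", MaxKeys=5))  # placeholder bucket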
Hi @<1590514584836378624:profile|AmiableSeaturtle81> ! We have someone investigating the UI issue (I mainly work on the sdk). They will get back to you once they find something...
Hi @<1626028578648887296:profile|FreshFly37> ! You can get the version by doing:
p = Pipeline.get(...)
p._task._get_runtime_properties().get("version")
We will make the version more accessible in a future release
Hi @<1695969549783928832:profile|ObedientTurkey46> ! You could try increasing sdk.storage.cache.default_cache_manager_size
to a very large number
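In clearml.conf that would look something like this (the value is just a placeholder):

sdk {
    storage {
        cache {
            # maximum number of locally cached file copies
            default_cache_manager_size: 100000
        }
    }
}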
Hi @<1523708920831414272:profile|SuperficialDolphin93> ! What if you just do controller.start()
(to start it locally)? The task should not quit in this case.
@<1643060801088524288:profile|HarebrainedOstrich43> we released 1.14.1 as an official version
Hi @<1688721797135994880:profile|ThoughtfulPeacock83> ! Make sure you set agent.package_manager.type: poetry
in your clearml.conf
. If you do, the poetry.lock or pyproject.toml will be used to install the packages. See None
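i.e. in the clearml.conf used by the agent:

agent {
    package_manager {
        # use poetry (and the project's poetry.lock / pyproject.toml) instead of pip
        type: poetry
    }
}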
Hi @<1523702000586330112:profile|FierceHamster54> ! This is currently not possible, but I have a workaround in mind. You could use the artifact_serialization_function
parameter in your pipeline. The function should return a bytes stream of the zipped content of your data with whichever compression level you have in mind.
If I'm not mistaken, you wouldn't even need to write a deserialization function in your case, because we should be able to unzip your data just fine.
Wdyt?
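A sketch of what the serialization function could look like (pickling is just one way to turn the object into bytes, and the names below are placeholders):

import io
import pickle
import zipfile

from clearml import PipelineController

def zip_serialize(obj):
    # serialize the object, then wrap it in a zip archive with the desired compression level
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        zf.writestr("artifact.pkl", pickle.dumps(obj))
    return buffer.getvalue()

pipe = PipelineController(
    name="my_pipeline",    # placeholder
    project="my_project",  # placeholder
    artifact_serialization_function=zip_serialize,
)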
1.10.2 should be old enough
Hi @<1643060801088524288:profile|HarebrainedOstrich43> ! Thank you for reporting. We will get back to you as soon as we have something
Hi @<1643060801088524288:profile|HarebrainedOstrich43> ! Could you please share some code that could help us reproduce the issue? I tried cloning, changing parameters and running a decorated pipeline, but the whole process worked as expected for me.
@<1523721697604145152:profile|YummyWhale40> are you able to manually save models from SageMaker using OutputModel
? None
Your object is likely holding some file descriptor or something like that. The pipeline steps are all running in separate processes (they can even run on different machines while running remotely). You need to make sure that the objects that you are returning are thus picklable and can be passed between these processes. You can check that the logger you are passing around is indeed picklable by calling pickle.dump(s)
on it and then loading it in another run.
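A quick check could look like this (my_object stands for whatever the step returns, e.g. the logger):

import pickle

my_object = ...  # the object you return from the step

# first run: dump the object to disk
with open("check.pkl", "wb") as f:
    pickle.dump(my_object, f)

# another run/process: make sure it loads back without errors
with open("check.pkl", "rb") as f:
    restored = pickle.load(f)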
The best practice would ...
Hi @<1654294820488744960:profile|DrabAlligator92> ! The way chunk size works is:
the upload will try to obtain zips that are smaller than the chunk size. So it will continuously add files to the same zip until the chunk size is exceeded. If the chunk size is exceeded, a new chunk (zip) is created. The initial file in this chunk is the file that caused the previous size to be exceeded (regardless of the fact that the file itself might exceed the size).
So in your case: an empty chunk is creat...
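Roughly, the described logic could be sketched like this (purely illustrative, not the actual implementation; sizes are in arbitrary units):

def split_into_chunks(file_sizes, chunk_size):
    chunks = [[]]
    current_size = 0
    for size in file_sizes:
        if current_size + size > chunk_size:
            # this file exceeds the limit, so it starts a new chunk,
            # even if it is larger than the chunk size on its own
            chunks.append([size])
            current_size = size
        else:
            chunks[-1].append(size)
            current_size += size
    return chunks

# a first file larger than the chunk size leaves the first chunk empty
print(split_into_chunks([600, 100, 100], chunk_size=500))  # -> [[], [600], [100, 100]]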