So if any step corresponding to 'inference_orchestrator_1' fails, then 'inference_orchestrator_2' keeps running.
GiganticTurtle0 I'm not sure it makes sense to halt the entire pipeline if one step fails.
That said, how about using the post_execution callback? You could check there whether the step failed, and if so stop the entire pipeline (and any running steps). What do you think?
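Something along these lines could work (a rough sketch; the node.job.is_failed() check and the pipeline.stop() call are my assumptions here, adjust to your actual pipeline code):
from clearml import PipelineController

def abort_on_failure(pipeline, node):
    # called after each step completes; if the step failed, stop the whole pipeline
    if node.job and node.job.is_failed():
        print(f"step '{node.name}' failed - stopping the pipeline")
        pipeline.stop(mark_failed=True)

pipe = PipelineController(name="inference", project="examples", version="1.0")
pipe.add_step(
    name="inference_orchestrator_1",
    base_task_project="examples",
    base_task_name="orchestrator",
    post_execute_callback=abort_on_failure,
)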
Woot woot! 🤩
Hmm that is odd.
Can you verify with the latest from GitHub?
Is this reproducible with the pipeline example code?
I see the problem now: conda fails to install the package from git, then it falls back to pip install, and pip just fails... "//github.com/ajliu/pytorch_baselines"
okay let me check
Hi LazyFish41
Could it be some permission issue on /home/quetalasj/.clearml/cache/ ?
Hi OutrageousSheep60
AS-IS - without compressing or breaking it up into chunks.
So for that I would suggest manually archiving it and uploading it as an external link?
Or are you saying you want to control the compression used by Dataset class ?
https://github.com/allegroai/clearml/blob/72d9b22e0d27f317a364acfeacbcf5c70f852e8c/clearml/datasets/dataset.py#L603
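If the "upload as external link" route works for you, a minimal sketch (the bucket path and names are placeholders; add_external_files only registers the link, so nothing gets re-uploaded or compressed):
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="raw_archive")
# archive created and uploaded by you; ClearML only stores the link to it
ds.add_external_files(source_url="s3://my-bucket/data/archive.tar")
ds.upload()
ds.finalize()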
Thanks SmallDeer34 !
Would you like us to? How about a footnote/acknowledgement?
How about a reference / footnote?
@misc{clearml,
  title = {ClearML - Your entire MLOps stack in one open-source tool},
  year = {2019},
  note = {Software available from https://github.com/allegroai/clearml},
  url = {https://clear.ml},
  author = {allegro.ai},
}
Hi GrittyKangaroo27
Is it possible to import user-defined modules when wrapping tasks/steps with functions and decorators?
Sure, any package (local included) can be imported, and will be automatically listed in the "installed packages" section of the pipeline component Task
(This of course assumes that on a remote machine you could do pip install <package>)
Make sense ?
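For example (a sketch; 'my_local_package' is a stand-in for your own module, and it assumes the remote machine can pip install it):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["result"], cache=False)
def step_one(value):
    # user-defined module imported inside the component;
    # it will show up under the component Task's "installed packages"
    from my_local_package import preprocess
    return preprocess(value)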
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
try this RC let me know if it works 🙂
pip install clearml==1.13.3rc1
I'm just trying to see what the default server is set to, and whether it is responsive
I'm assuming you mean your own server, not the demo server, is that correct ?
and then the second part is to check if it is up and alive
Yes, you can curl to the ping endpoint:
https://clear.ml/docs/latest/docs/references/api/debug#post-debugping
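Or from python, something like (a sketch; the URL below assumes the default self-hosted api server address, replace it with your own api_server):
import requests

resp = requests.post("http://localhost:8008/debug.ping")
print(resp.status_code, resp.json())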
Wow, thank you very much. And how would I bind my code to task?
you mean the code that creates pipeline Tasks ?
(remember the pipeline itself is a Task in the system, basically if your pipeline code is a single script it will pack the entire thing )
🤔 maybe we should have "sub nodes" as just visual functions running inside the same actual pipeline component ?
Yep 🙂 but only in RC (or github)
The quickest workaround would be, in your final code, to just do something like:
my_params_for_hpo = {'key': omegaconf.key}
task.connect(my_params_for_hpo, name='hpo_params')
call_training_with_value(my_params_for_hpo['key'])
This will initialize my_params_for_hpo with the values from OmegaConf, and allow you to override them in the hyperparameter section (task.connect is two-way: in manual mode it stores the data on the Task, in agent mode it takes the values from the Task and puts them back).
TenseOstrich47 this sounds like a good idea.
When you have a script, please feel free to share, I think it will be useful for other users as well 🙂
When we enqueue the task using the web-ui we have the above error
ShallowGoldfish8 I think I understand the issue,
basically I think the issue is:
task.connect(model_params, 'model_params')
Since this is a nested dict:
model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {0: 1, 1: 60},
    "learning_rate": 0.1
}
The class_weights keys are stored as string keys, but catboost expects int keys, hence it fails.
One op...
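For example, one possible workaround (just a sketch, casting the keys back after connect):
model_params = {
    "loss_function": "Logloss",
    "eval_metric": "AUC",
    "class_weights": {0: 1, 1: 60},
    "learning_rate": 0.1
}
task.connect(model_params, 'model_params')
# connect() stores the nested keys as strings, so cast them back to int for catboost
model_params['class_weights'] = {int(k): v for k, v in model_params['class_weights'].items()}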
What do you mean by 'executed on different machines'? The assumption is that you have machines (i.e. clearml-agents) connected to clearml, which would be running all the different components of the pipeline. Think out-of-the-box scale-up. Each component becomes a standalone Job and the data is passed (i.e. stored and loaded) automatically via the clearml-server (can be configured to use external object storage as well). This means if you have a step that needs a GPU it will be launched on a GPU machine...
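As a sketch of that idea (the queue names are placeholders; each component runs on whichever clearml-agent listens to its queue):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], execution_queue="cpu_queue")
def prepare():
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["model"], execution_queue="gpu_queue")
def train(data):
    return sum(data)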
Yep it should :)
I assume you add the previous iteration somewhere else, and this is the cause for the issue?
Martin, if you want, feel free to add your answer in the stackoverflow so that I can mark it as a solution.
Will do 🙂 give me 5
but who exactly executes agent in this case?
with both the execute / build commands, you execute it on your machine, for debugging purposes. Make sense?
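For reference, roughly (exact flags may vary by clearml-agent version; <task_id> is a placeholder):
clearml-agent execute --id <task_id>    # run the task locally, in the current environment
clearml-agent build --id <task_id> --docker    # only build the environment / docker image, without running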
 is the "installed packages" part editable? good to know
Of course it is, when you clone a Task everything is Editable 🙂
Isn't it a bit risky manually changing a package version?
worst case it will crash quickly, and you reset/edit/enqueue 🙂
(Should work though)
BTW: the above error is a mismatch between TF and the docker image; TF is looking for CUDA 10, while the docker image contains CUDA 11
BTW: Full RestAPI reference here
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html
Correct, which makes sense if you have a stochastic process and you are looking for the best model snapshot. That said I guess the default use case would be min/max (and not the global variant)
Yes, but I'm not sure that they need to have a separate task
Hmm okay I need to check if this can be easily done
(BTW, the downside of that, you can only cache a component, not a sub-component)