
MagnificentSeaurchin79 are you using the latest RC?
(I think this was exactly the issue)
EDIT:
Try to create the version with the file removed after you upgrade to the latest RC (0.17.5rc3); in the summary you should see 1 file removed.
Okay, now I get it!
Let me think about it for an hour or two 🙂
RobustGoldfish9
I think you need to set the trains-agent docker to be aware of the host, so it knows how to mount data/cache/configurations into the sibling docker
It should look something like: `TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains"`
So if running a docker: `docker run -e TRAINS_AGENT_DOCKER_HOST_MOUNT="/mnt/host/data:/root/.trains" ...`
you should have a gpu argument there, set it to true
Hi GrievingTurkey78 yes, /opt/clearml should contain everything.
That said, back up only after you spin down the DBs so they serialize everything.
And how did you connect your example.yaml?
Thanks!
I think this one will cover both cases (the issue is with files on the root of the dataset):
if not (fnmatch(k, path) and fnmatch(k if '/' in k else '/{}'.format(k), '*/' + wildcard))}
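If it helps to see the intent in isolation, here is a small self-contained sketch (not the actual Dataset internals; the file names and wildcard are made up) of why root-level keys get the '/' prefix before being matched against '*/' + wildcard:
` from fnmatch import fnmatch

def matches(k, path, wildcard):
    # Prefix keys with no directory component so that the '*/<wildcard>'
    # pattern also matches files sitting at the root of the dataset
    key_for_wildcard = k if '/' in k else '/{}'.format(k)
    return fnmatch(k, path) and fnmatch(key_for_wildcard, '*/' + wildcard)

files = {'a.csv': 1, 'data/b.csv': 2, 'data/c.txt': 3}
# keep only the entries that do NOT match (mirrors the if not (...) filter above)
kept = {k: v for k, v in files.items() if not matches(k, path='*', wildcard='*.csv')}
print(kept)  # {'data/c.txt': 3} `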
Hi @<1523715429694967808:profile|ThickCrow29>
I am using the PipelineController with abort_on_failure set to False.
Is this a pipeline from code or from Tasks?
What is the clearml version?
Lastly, if a component fails and another component is dependent on its output, how would it run? And if it is not dependent, why is it a child component?
Hi MagnificentSeaurchin79
This sounds like a deeper bug (of a sort), I think the best approach is to open a GitHub issue with some code that can reproduce this behavior, or at least enough information so that we could try to catch the bug.
This way we will make sure it is not forgotten.
Sounds good ?
MagnificentSeaurchin79 no need for the detection api (yes definitely a mess to setup), it will be more helpful to get a toy example.
Hi FierceHamster54
Are you saying the pipeline component is a standalone script?
If this is the case then you are correct, it should not need to, I think you can specify it in the decorator.
I think this might work:
@PipelineDecorator.component(..., repo=False)
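If it helps, a rough sketch of a standalone component along those lines (the function, its argument and return_values here are placeholders, and whether repo=False is honored depends on the clearml version):
` from clearml.automation.controller import PipelineDecorator

# Standalone step: repo=False is meant to signal that no git repository
# needs to be cloned for this component
@PipelineDecorator.component(return_values=['result'], repo=False)
def step_one(x):
    return x * 2 `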
Hi ColossalAnt7, I think we ran into it on a few dockers; I believe the bug was fixed in the latest trains-agent RC. Could you verify please?
ShinyLobster84
fatal: could not read Username for ' ': terminal prompts disabled
This is the main issue: it needs git credentials to clone the repo code containing the pipeline logic (this is the exact same behaviour as pipeline v1 execute_remotely(), which is now the default; could it be that before, you executed the pipeline logic locally?)
WackyRabbit7 could the local/remote pipeline logic apply in your case as well?
I would suggest deleting them immediately when they're no longer needed.
This is the idea for the next RC: it will delete them after it is done using them 🙂
MagnificentSeaurchin79 making sure the basics work.
Can you see the 3D plots under the Plot section ?
Regarding the Tensors, could you provide a toy example for us to test?
Ohh, sorry 🙂
:param run_pipeline_steps_locally: (default False) If True, run the pipeline steps themselves locally as a subprocess (use for debugging the pipeline locally, notice the pipeline code is expected to be available on the local machine)
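In practice that parameter is passed when starting the pipeline locally; a minimal sketch (project and task names are placeholders):
` from clearml import PipelineController

pipe = PipelineController(name='debug pipeline', project='examples', version='0.0.1')
pipe.add_step(
    name='stage_data',
    base_task_project='examples',
    base_task_name='pipeline step 1 dataset artifact',
)
# run the controller and every step as local subprocesses, for debugging
pipe.start_locally(run_pipeline_steps_locally=True) `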
WickedGoat98 are you running the agent with --gpus ?
WickedGoat98 the agent itself can be executed on bare metal, no need to setup a docker for it (although fully supported)
Specifically, the docker compose has the agent container running in services mode, i.e. for lightweight CPU tasks such as running pipelines.
If the agent is running on GPU, the easiest way is to run it on bare metal.
Ohh... I would not delete them then... 🙂
Maybe some kind of heuristic (files created over a week ago can be deleted?!)
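Something along these lines, purely as an illustration of that heuristic (this is not clearml/clearml-agent code, and deleting from /tmp is at your own risk):
` import os
import time

TARGET_DIR = '/tmp'              # directory to scan, adjust as needed
MAX_AGE_SECONDS = 7 * 24 * 3600  # "older than a week" heuristic

now = time.time()
for name in os.listdir(TARGET_DIR):
    path = os.path.join(TARGET_DIR, name)
    try:
        if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)
    except OSError:
        # the file may already be gone or not removable, skip it
        pass `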
And same behavior if I make the dependence explicit via the return of the first one
Wait, are you saying that in the code above, when you abort "step_a" , then "step_b" is executed ?
Okay let me check if we can reproduce, definitely not the way it is supposed to work 🙂
WackyRabbit7 interesting! Are those "local" pipelines all part of the same code repository? do they need their own environment ?
What would be the easiest pipeline interface to run them locally? (I would like it if we could support this workflow; it seems you are not alone in this approach, and of course you can always use them remotely, i.e. clone the pipeline and launch it on an agent)
I have a process that cleans the /tmp each day,
WackyRabbit7 the files (configuration etc.) that are mapped into the containers are stored there.
They should clean themselves, that said, we have noticed that the services-mode skips this cleanup, and it will be solved on the next RC of clearml-agent.
Make sense ?
That's the right place, but like you would use hydra --override, which in your case I think should be "accelerator.gpu".
You can also change `allow_omegaconf_edit` in the UI to True, and then you could just edit the OmegaConf in the UI (if you do not change `allow_omegaconf_edit`, the edit in the UI is ignored).
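For context, a minimal sketch of the Hydra side (the config path/name and the accelerator.gpu key are assumptions based on this thread); the override itself is passed on the command line, e.g. python train.py accelerator.gpu=true :
` import hydra
from omegaconf import DictConfig

@hydra.main(config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # reflects any command-line override, e.g. accelerator.gpu=true
    print(cfg.accelerator.gpu)

if __name__ == "__main__":
    main() `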
I'm glad to hear 🙂
If you can reproduce it, let me know
https://stackoverflow.com/questions/60860121/plotly-how-to-make-an-annotated-confusion-matrix-using-a-heatmap
MagnificentSeaurchin79 see plotly example here:
https://allegro.ai/clearml/docs/docs/examples/reporting/plotly_reporting.html
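Roughly along these lines, combining the two links above (just a sketch; the labels and numbers are made up):
` import plotly.figure_factory as ff
from clearml import Task

task = Task.init(project_name='examples', task_name='confusion matrix plot')

z = [[50, 2], [5, 43]]
labels = ['cat', 'dog']
fig = ff.create_annotated_heatmap(z=z, x=labels, y=labels, colorscale='Blues')

# send the plotly figure to the Plots section of the task
task.get_logger().report_plotly(
    title='Confusion matrix', series='validation', iteration=0, figure=fig) `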
I started running it again and it seems to have passed the phase where it failed last time
Yey!
Yes it is a common case....
I have the feeling ShinyLobster84 WackyRabbit7 you are not alone in this one 🙂 let me make sure we change the default value to False, so the code looks cleaner
Hi PompousParrot44
So do you mean something like:
` task_model_a = Task.get_task(task_id='id_a')
task_model_b = Task.get_task(task_id='id_b')
model_a_file = task_model_a.models['output'][-1].get_local_copy()
model_b_file = task_model_b.models['output'][-1].get_local_copy() `
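For reference, get_local_copy() downloads the model weights file (from wherever it was uploaded) into the local cache and returns the local file path, so you can then load both files with your framework of choice.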