pipe.start_locally() will run the DAG compute part on the same machine, whereas pipe.start() will launch it on a remote worker (if it is not already running on a remote worker)
Basically, pipe.start() executed via an agent will start the compute (no overhead)
does that help?
This seems like the same discussion, no?
EnviousPanda91 this seems like a specific issue with the clearml-task CLI, could that be?
Can you send a full clearml-task command line to test?
That somehow the PV never worked and it was all local inside the pod
Hi ElegantCoyote26
If there is, it will have to be using docker mode, but I do not think this is actually possible, because this is not a feature of docker. It is possible to do on k8s, but that's a different level of integration 🙂
EDIT:
FYI we do support k8s integration
That's why I want to keep it as separate tasks under a single pipeline.
Hmm, yes, if this is the case then you definitely have to have two Tasks (with execution info on each one).
So you could just create a "draft" pipeline Task and report everything to it? Does that make sense?
(By design a pipeline is in charge of spinning the Tasks and pulling the data/metric from them if needed, in your case it sounds like you need the Tasks to push the data/metric onto the pipeline Task, this is ...
Hi GiganticTurtle0
The main issue is the cache=True: it will cause the second call to the function to essentially reuse the Task, ending with the same result.
Can you test with cache=False in the decorator?
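This is not ClearML's caching mechanism itself, but the reuse semantics described above can be sketched with the standard library's functools.lru_cache (the step function and its payload are made up):

```python
from functools import lru_cache

call_count = {"n": 0}

@lru_cache(maxsize=None)
def step(x):
    # stand-in for a cached pipeline step: a repeated call with the
    # same arguments never re-executes the body, it reuses the result
    call_count["n"] += 1
    return x * 2

first = step(21)
second = step(21)  # served from the cache, body not re-run
```

With cache=False the decorator would be dropped and every call would execute the body again.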
https://github.com/allegroai/clearml/issues/199
Seems already supported for a while now ...
I wonder, does it launch all "step two" instances in parallel?
In theory it should, but in practice, since these are the same "template", I'm not sure what would happen.
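Not the ClearML scheduler itself, but the "launch all step-two instances in parallel" pattern can be sketched with the standard library (the step body and inputs are made up):

```python
from concurrent.futures import ThreadPoolExecutor

def step_two(item):
    # stand-in for one "step two" instance of the pipeline
    return item * 2

items = [1, 2, 3]
with ThreadPoolExecutor() as pool:
    # all instances are submitted at once and run concurrently
    results = list(pool.map(step_two, items))
```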
One last note: you can call PipelineDecorator.debug_pipeline() to debug the pipeline locally; it will have the exact same behavior, only it will run the steps as subprocesses.
I assume the task is being launched sequentially. I'm going to prepare a more elaborate example to see what happens.
Let me know if you can produce a mock test, I would love to make sure we support the use case, this is a great example of using pipeline logic 🙂
Thanks GiganticTurtle0 !
I will try to reproduce with the example you provided. Regardless, I already took a look at the code, and I'm pretty sure I know what the issue is. We will be pushing a few fixes after the weekend; I'm hoping this one will be included as well 🙂
SIGINT (Ctrl-C) only.
Because flushing state (i.e. sending the request) might take time, we only do that when users interactively hit Ctrl-C. Make sense?
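The flush-on-interrupt idea can be sketched in plain Python; the handler body here is hypothetical (ClearML's real handler sends pending requests to the server before shutting down):

```python
import signal

flushed = []

def flush_state(signum, frame):
    # hypothetical flush: in the real client this would send
    # pending state to the server before exiting
    flushed.append("state flushed")

# replace the default KeyboardInterrupt behavior with our handler
signal.signal(signal.SIGINT, flush_state)

# simulate the user hitting Ctrl-C
signal.raise_signal(signal.SIGINT)
```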
Have a wrapper over Task to ensure S3 usage, tags, version number etc and project name can be skipped and it picks from the env var
Cool. Notice that when you clone the Task and the agent executes it, the project is already defined, so this env variable is meaningless, no?
if project_name is None and Task.current_task() is not None:
    project_name = Task.current_task().get_project_name()
This should have fixed it, no?
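A hypothetical wrapper illustrating the fallback order discussed above; the helper name and the CLEARML_PROJECT_NAME env var are assumptions for the sketch, not ClearML API:

```python
import os

def resolve_project_name(project_name=None, current_task=None):
    # hypothetical helper mimicking the fallback order:
    # 1. explicit argument, 2. the currently running Task, 3. an env var
    if project_name is not None:
        return project_name
    if current_task is not None:
        return current_task.get_project_name()
    return os.environ.get("CLEARML_PROJECT_NAME", "default")
```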
"General" is the parameter section name (like Args)
btw, I launch the agent daemon outside docker (with --docker), that's the way it is supposed to work right?
Yep that should work
is it?
It seems to try to pull with SSH credentials; add your git user/pass (or better, an API key) to the clearml.conf
(look for git_user / git_pass)
That should solve the issue
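For reference, the relevant clearml.conf section looks like this (the username and token values are placeholders):

```
agent {
    # credentials used to clone repositories over HTTPS
    git_user: "myuser"
    git_pass: "my_personal_access_token"
}
```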
clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
If the user running this command can run "docker run", then you should be fine
it works if I run the same command manually.
What do you mean?
Can you do: docker run -it <my container here> bash
Then immediately get an interactive bash ?
I don't know how I would be able to get the description and name?
Good point, how about doing that in code? Then you have all the information and you can store it in JSONs / pickle next to the data folder.
wdyt?
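A sketch of storing the name/description next to the data folder; the file name and helper are made up for illustration:

```python
import json
import pathlib

def save_dataset_metadata(folder, name, description):
    # hypothetical helper: persist the dataset's name and description
    # as a small JSON file alongside the data folder
    meta = {"name": name, "description": description}
    path = pathlib.Path(folder) / "metadata.json"
    path.write_text(json.dumps(meta, indent=2))
    return path
```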
Yeah I can write a script to transfer it over, I was just wondering if there was a built in feature.
Unfortunately no 🙂
Maybe if you have a script we can put it somewhere?
Hi ShortElephant92
You could get a local copy from the local server, then switch credentials to the hosted server and upload again, would that work?
Hi @<1526371965655322624:profile|NuttyCamel41>
How are you creating the model? Specifically, what do you have in "config.pbtxt"?
Specifically, any Python code should be in the pre/post-processing code (which does not actually run on the GPU instance)
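For reference, a minimal config.pbtxt sketch for a Triton-served model; the model name, platform, and tensor names/shapes here are placeholders, not taken from this discussion:

```
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output_0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```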
Ephemeral Dataset, I like that! Is this like splitting a dataset for example, then training/testing, when done deleting. Making sure the entire pipeline is reproducible, but without storing the data long term?
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
OH that makes sense. I'm assuming on your local machine the certificate is installed, but not on the remote machines / containers.
Add the following to your clearml.conf:
api.verify_certificate: false
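In clearml.conf syntax that setting sits inside the api section, e.g.:

```
api {
    # skip SSL certificate verification
    # (only do this if you trust the server)
    verify_certificate: false
}
```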
See Args section in the screenshot
"Args/counter"