one can containerise the whole pipeline and run it pretty much anywhere.
Does that mean the entire pipeline will be running on the instance spinning the container?
From here: this is what I understand:
https://kedro.readthedocs.io/en/stable/10_deployment/06_kubeflow.html
My thinking was I can use one command and run all steps locally while still registering all "nodes/functions/inputs/outputs etc" with clearml such that I could also then later go into the interface and clone an...
Hi HandsomeCrow5.
Remember the debug images are events with links to the actual images, so you first have to get the events and then you can download the images with https://allegro.ai/docs/examples/examples_storagehelper/#storagemanager (which by definition has the credentials, because it was able to upload them 🙂)
To get the events:
```python
from trains.backend_api.session.client import APIClient

client = APIClient()
client.events.debug_images(task='aabbcc')
```
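Once you have the event entries (each one holds a link to the stored image), a minimal sketch of downloading a single image with StorageManager; the URL value here is illustrative, take it from the event you got back:
```python
from trains.storage import StorageManager

# illustrative URL; in practice take it from one of the debug-image events above
image_url = "s3://my-bucket/debug_images/sample.jpg"
local_path = StorageManager.get_local_copy(remote_url=image_url)
print(local_path)
```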
Hi GracefulDog98
Are argument parameters to the script not passed on to the workers, or am I missing something?
The arguments are passed directly when the code is executed (i.e. the argparser parse_args is called).
If the code fails, I'm assuming argparse is called before clearml is imported; could that be the case?
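If that is the case, a minimal sketch of the ordering that lets ClearML hook argparse (the argument names here are just illustrative):
```python
import argparse
from clearml import Task

# initialize the Task before parse_args(), so ClearML can capture / override the arguments
task = Task.init(project_name="examples", task_name="argparse ordering")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)  # illustrative argument
args = parser.parse_args()
```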
Thanks @<1657918706052763648:profile|SillyRobin38> this is still in the internal git repo (we usually do not develop directly on github)
I want to get familiar with it and, if possible, contribute to the project.
This is a good place to start: None
we are still debating whether to use it directly or as part of Triton ( None ), would love to get your feedback
An easier fix for now will probably be some kind of warning to the user that a task is created but not connected
That is a good point, maybe if you do not have a "main" Task, then we print the warning (with some flag to disable the warning)?
Yep, that would do it ...
You can disable it with:
```python
Task.init(..., auto_connect_frameworks={'scikit': False})
```
Hi CheerfulGorilla72
the "installed packages" section is used as "requirements.txt for the agent.
Are you saying the autodetection fails to detect all packages? You can specify, in "manual execution" (i.e. not when the agent is running the code), to just take the requirements.txt locally:
```python
# notice: this call should be executed before Task.init
Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
task = Task.init(...)
```
3. If you clear all the "installed packages" se...
Hi ClumsyElephant70
What's the clearml version you are using?
(The first error is a byproduct of a python process.Event created before a forkserver is created, some internal python issue. I thought it was solved, let me take a look at the code you attached)
This task is picked up by first agent; it runs DDP launch script for itself and then creates clones of itself with task.create_function_task() and passes its address as argument to the function
Hi UnevenHorse85
Interesting use case, just for my understanding, the idea is to use ClearML for the node allocation/scheduling and PyTorch DDP for the actual communication, is that correct?
passes its address as argument to the function
This seems like a great solution.
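For reference, a rough sketch of that pattern with create_function_task (the worker body, names, and the address are illustrative, not the actual implementation):
```python
from clearml import Task

task = Task.init(project_name="ddp-demo", task_name="ddp-master")

def ddp_worker(master_addr):
    # illustrative body: this is where the clone would join the DDP group at master_addr
    print("connecting to", master_addr)

# create a clone of the current task that runs ddp_worker, passing the master address as an argument
task.create_function_task(
    ddp_worker,
    func_name="ddp_worker_0",
    task_name="ddp worker 0",
    master_addr="10.0.0.1:29500",  # illustrative address
)
```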
the queu...
Although it's still really weird how it was failing silently
totally agree, I think the main issue was that the agent had the correct configuration, but the container / env the agent was spinning up was missing it.
I'll double check how come it did not print anything
Actually with `base-task-id` it uses the cached venv, thanks for this suggestion! Seems like this is equivalent to cloning via UI.
Exactly!
But "cloning" via UI runs an exact copy of the code/config, not a variant.
You can override the commit/branch and get the latest ...
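For completeness, a rough sketch of doing that override programmatically instead of through the UI (task id, branch and queue are illustrative; I'm assuming Task.clone / set_script / enqueue here):
```python
from clearml import Task

template = Task.get_task(task_id="aabbcc")  # illustrative task id
cloned = Task.clone(source_task=template, name="same code, latest master")
# point the clone at the branch and leave the commit empty so the agent checks out the latest
cloned.set_script(branch="master", commit="")
Task.enqueue(cloned, queue_name="default")
```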
run exp
tweak code/configs in IDE, or tweak configs via CLI
have it re-rerun in exact same venv (with no install overhead etc)
So you can actually launch it remotely directly from the code:
...
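The actual snippet is elided above, but as a sketch, launching remotely straight from the code is usually done with Task.execute_remotely (the queue name is illustrative):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote launch")
# stop executing locally and enqueue this exact task (code + config) for an agent to run
task.execute_remotely(queue_name="default", clone=False, exit_process=True)
```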
So "wait" is a better metaphor for me
So I would do something like (I might have a few typos but that's the gist):
```python
from clearml import Model

def post_execute_callback_example(a_pipeline, a_node):
    # type: (PipelineController, PipelineController.Node) -> None
    print('Completed Task id={}'.format(a_node.executed))
    # wait until the model is tagged, then pass it as an argument
    while True:
        found = Model.query_models(...)  # model filter here, including tag and project
        if found:
            ...
```
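And to wire it up, a minimal sketch of registering the callback on a pipeline step (project / task names are illustrative):
```python
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="training task",
    post_execute_callback=post_execute_callback_example,
)
pipe.start()
```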
Any other port that could be open? (if SSH is already open we cannot launch another daemon on the same port)
Could you remove it and test?
Could it be the credentials are actually incorrect? Because it seems like you can access the server (I assume you were able to browse to it and generate credentials, right?)
I'll try to go with this option, I think its actually perfect for my needs
Great!
Was going crazy for a short amount of time yelling to myself: I just installed clear-agent init!
oh noooooooooooooooooo
I can relate so much; it happens to me too often that copy-pasting into bash just uses the unicode character instead of the regular ascii one
I'll let the front-end guys know, so we do not make ppl go crazy 🙂
AstonishingWorm64 can you share the full log (In the UI under Results/Console there is a download button)?
Hi @<1556812486840160256:profile|SuccessfulRaven86>
it does not when I run a flask command inside my codebase. Is it an expected behavior? Do you have some workarounds for this?
Hmm where do you have your Task.init ?
(btw: what's the use case of a flask app tracking?)
Then I deleted those workers,
How did you delete those workers? the autoscaler is supposed to spin the ec2 instances down when they are idle, in theory there is no need for manual spin down.
Hi @<1637624975324090368:profile|ElatedBat21>
I think that what you want is:
```python
Task.add_requirements("unsloth", "@ git+ ")
task = Task.init(...)
```
after you do that, what are you seeing in the Task "Installed Packages"?
BTW: the new documentation should contain a full search over the docstring
PS. I just noticed that this function is not documented. I'll make sure it appears in the doc-string.
What's the clearml version? Is this with the latest from GitHub?
PompousParrot44 I think the website should address that:
https://allegro.ai/
But the TL;DR is the enterprise version adds Full Dataset Versioning on top, with end-to-end integration from code to DLOps (e.g. data sampling, database query capabilities, data visualization, multi-site support, permissions, etc.)
Could you amend the original snippet (or verify that it also produces plots in debug samples)?
(Basically I need something that I can run 🙂)