DeterminedToad86
So based on the log it seems the agent is installing:
torch from https://download.pytorch.org/whl/cu102/torch-1.6.0-cp36-cp36m-linux_x86_64.whl
and torchvision from https://torchvision-build.s3-us-west-2.amazonaws.com/1.6.0/gpu/cuda-11-0/torchvision-0.7.0a0%2B78ed10c-cp36-cp36m-manylinux1_x86_64.whl
See in the log: "Warning, could not locate PyTorch torch==1.6.0 matching CUDA version 110, best candidate 1.7.0"
But torchvision is downloaded from the cuda 11 folder...
I...
HighOtter69, let me check something
is it possible to change an existing model's URL?
Edit the DBs ... That's basically the only way 🙂
Hmm I guess now that you mention it, it's not that obvious when I'm on a Mac as well, maybe we should have the archive button at the bottom as well..
SteadyFox10 What do you think?
The first pipeline step is calling init
GiddyPeacock64 Is this enough to track all the steps?
I guess my main question is: is every step in the pipeline an actual Task/Job, or is it a single small function?
Kubeflow is great for simple DAGs but when you need to build more complex logic it is usually a bit limited
(for example the visibility into what's going on inside each step is missing so you cannot make a decision based on that).
WDYT?
Sorry if it's something trivial. I recently started working with ClearML.
No worries, this has actually more to do with how you work with Dask
The Task ID is the unique id of any Task in the system (task.id will return the UID str)
Can you post a toy Dask example here? I'll explain how to make it compatible with clearml 🙂
CluelessFlamingo93 I would also fix the pip version requirements to: pip_version: ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"]
basically PVC for all the DBs 🙂
That is a bit odd, but SSH keys have to have specific chmod flags for them to work (a security requirement)
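For reference, a minimal sketch of tightening the permissions (the key path is illustrative):

```python
# ssh refuses to use private keys that are readable by the group/world,
# so the key file usually needs owner-only permissions (0600).
import os

key_path = os.path.expanduser("~/.ssh/id_rsa")  # illustrative path
os.chmod(key_path, 0o600)
```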
What was the error?
ElegantCoyote26 could be, if the Task runs for under 30 sec?!
It seems stuck somewhere in the python path... Can you check at runtime what os.environ['PYTHONPATH'] is?
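e.g. something along these lines:

```python
# Quick runtime check of what PYTHONPATH the process actually sees:
import os
print(os.environ.get("PYTHONPATH", "<not set>"))
```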
Hi @<1556812486840160256:profile|SuccessfulRaven86>
it does not work when I run a flask command inside my codebase. Is this expected behavior? Do you have any workarounds for this?
Hmm where do you have your Task.init ?
(btw: what's the use case for tracking a flask app?)
Then I deleted those workers,
How did you delete those workers? The autoscaler is supposed to spin the EC2 instances down when they are idle; in theory there is no need for manual spin-down.
I can't find out how to pass my custom clearml.conf
Hi @<1544491301435609088:profile|TeenyElk27>
The easiest way is to map it into the container in your docker-compose
(map a host clearml.conf into /root/clearml.conf inside the container)
Hi @<1523704157695905792:profile|VivaciousBadger56>
You should replace
task.mark_completed()
with:
task.close()
To your point
parameters = task.connect(parameters)
Will be retrieved with:
task.get_parameters()
fyi:
connect_configuration -> get_configuration_objects
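A minimal sketch putting both pairs together (project/task names and values are illustrative):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="params-demo")
parameters = task.connect({"batch_size": 32, "lr": 0.001})       # hyper-parameters
task.connect_configuration({"augment": True}, name="my_config")  # configuration object
task.close()

# Later, possibly from a different process/machine:
fetched = Task.get_task(task_id=task.id)
print(fetched.get_parameters())             # e.g. {'General/batch_size': '32', ...}
print(fetched.get_configuration_objects())  # {'my_config': ...}
```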
Not sure: They also have the feature store (data management), as mentioned, which is pretty MLOps-y
Right, sorry, I was thinking about "Nuclio", my bad.
How would you compare those to ClearML?
At least based on the documentation and git state I would say this is at a very early stage. In terms of features they "tick all the boxes", but I'm a bit skeptical about their ability to scale and support these features.
Taking a look at the screenshots from the docs, it also seem...
Hi VivaciousBadger56
Basically you can think of MLRun as "amazon lambda service without amazon". It is designed to run a "function" at scale on multiple nodes.
ClearML on the other hand is an MLOps platform. It does the experiment tracking, it orchestrates Task (think jobs), it does data management and lastly we recently released the serving. These are two different use cases.
Am I making sense here?
In the documentation it warns about
.close()
"Only call Task.close if you are certain the Task is not needed."
Maybe this is not clear enough: it means you no longer need to automatically Add/Log/Track things into the Task from the current process.
This does Not mean you cannot access the Task or its artifacts
Mark closed means to externally (i.e. not from the process that created the Task, maybe even from a different machine) close and mark the task as completed (this...
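As a sketch ("<task-id>" is a placeholder):

```python
from clearml import Task

# From any machine: fetch the Task by its ID and mark it completed externally
t = Task.get_task(task_id="<task-id>")
t.mark_completed()
```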
Scenario 1 & 2 are essentially the same from a caching perspective (the fact that B != B' means they have different caching hashes, but in both cases they are cached).
Scenario 3 is basically removing the cache flag from those components.
Not sure if I'm missing something.
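For illustration, a minimal sketch of the cache flag on pipeline components (the step bodies are placeholders):

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(cache=True)   # scenarios 1 & 2: results are cached
def step_b(x):
    return x * 2

@PipelineDecorator.component(cache=False)  # scenario 3: caching disabled
def step_c(x):
    return x + 1
```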
Back to @<1523701083040387072:profile|UnevenDolphin73>
From decorators - when the pipeline logic is very straightforward ...
Actually I would disagree, the decorators should be used when the pipeline logic is not a D...
- Components anyway need to be available when you define the pipeline controller/decorator, i.e. same codebase
No, you can specify a different code base, see here:
None
- The component code still needs to be self-composed (or, function component can also be quite complex)
Well it can address the additional repo (it will be automatically added to the PYTHONPATH), and you c...
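A hedged sketch of pointing a component at a different repository (the URL and the import are illustrative):

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(
    repo="https://github.com/user/other-repo.git",  # assumed URL
    repo_branch="main",
)
def preprocess(path):
    # the repo is cloned for this step and added to the PYTHONPATH
    from other_repo.utils import load  # illustrative import
    return load(path)
```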
MagnificentPig49 I was not aware of jsonargparse
from what I understand it's a nicer way to parse json configuration files, with an argparse-like interface. Did I get that correctly?
Regarding the missing argparse arguments, you are correct: the auto-magic is not working, since jsonargparse is calling an internal ArgumentParser function and not the external one (hence we miss it).
The quickest fix is adding the following line before you call parse_args():
task.connect(parent_parser)
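i.e. something along these lines ("parent_parser" being your jsonargparse/argparse parser):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="jsonargparse-demo")
task.connect(parent_parser)  # explicit connect, since the auto-magic misses it
cfg = parent_parser.parse_args()
```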
Hi @<1720249421582569472:profile|NonchalantSeaanemone34>
Is it possible to read data directly from server w/o using get_local_copy()?
do you mean an artifact? what is "direct" here?
How did you add the args? Is it argparse? If so the help is automatically picked up, so you can see it in the UI. BTW, the ability to provide a list of options is a really cool feature to have, I'll make sure to pass it to product 🙂
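e.g. with plain argparse, the help string travels with the argument (a sketch):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001,
                    help="learning rate")  # this help text is what shows up in the UI
args = parser.parse_args()
```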
others from the local environment and this causes a conflict when importing the attr module
Inside the docker container? "local environment"?
This is all under "root", no?
VexedCat68
But what's happening is, that I only publish a dataset once but every time it polls,
this seems wrong (i.e. a bug?!), how do you set up the trigger? is the Trigger Task constantly running, or are you re-launching it?
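For reference, a hedged sketch of a constantly-running dataset trigger (IDs/names are placeholders):

```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_dataset_trigger(
    schedule_task_id="<task-to-launch>",  # Task cloned on every new dataset version
    schedule_queue="default",
    trigger_project="datasets",           # watch this dataset project
)
trigger.start()  # blocks and keeps polling; run it once, don't re-launch per event
```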
EnchantingWorm39 you have great timing ;)
one can containerise the whole pipeline and run it pretty much anywhere.
Does that mean the entire pipeline will be running on the instance spinning up the container?
From here, this is what I understand:
https://kedro.readthedocs.io/en/stable/10_deployment/06_kubeflow.html
My thinking was I can use one command and run all steps locally while still registering all "nodes/functions/inputs/outputs etc" with clearml such that I could also then later go into the interface and clone an...
Yes, because when a container is executed, the agent creates a new venv and inherits from the system-wide installed packages, but it cannot inherit or "understand" that there is an existing venv, and where it is.
Hi SmugLizard24
The question is: what is the reason for the issue?
That is a good question, could it be out of memory? (trying to compress or send the file in one chunk?)
ImmensePenguin78
I think the latest RC adds it, should be released later today 🙂
TrickyRaccoon92 Thank you so much! 🙂