
task.models["outputs"][-1].tags
(plural, a list of strings) and yes I mean the UI 🙂
I get the n_saved
what's missing for me is how would you tell the TrainsLogger/Trains that the current one is the best? Or are we assuming the last saved model is always the best? (in that case there is no need for a tag, you just take the last in the list)
If we are going with: "I'm only saving the model if it is better than the previous checkpoint" then just always use the same name i.e. " http:/...
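For context, a minimal sketch of reading the tags on the last saved output model (the task id here is hypothetical, and the "outputs" key follows the snippet above):

from clearml import Task

task = Task.get_task(task_id="aabbccdd")  # hypothetical task id
last_model = task.models["outputs"][-1]   # the most recently saved checkpoint
print(last_model.tags)                    # plural: a list of strings, e.g. ["best"]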
you can also just create a venv and run the tests there (with the latest python package) ?
Thanks CharmingShrimp37!
Could you PR the fix?
It will be just in time for the 0.16 release 🙂
Hi DeliciousBluewhale87
This sounds like a great workflow to implement.
I guess my first question is: how do you imagine the manager/director interacting with the system? What will they be shown, to allow them to approve/decline the model promotion?
ThickDove42 Windows also works 🙂
Any specifics on the setup?
Oh, yes, that might be it (the threshold is 3 minutes if there are no reports), but you can change that:
task.set_resource_monitor_iteration_timeout(seconds_from_start=10)
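A minimal sketch of that call in context (project/task names are hypothetical):

from clearml import Task

task = Task.init(project_name="examples", task_name="resource-monitor-demo")
# start reporting machine stats by seconds-from-start after 10 seconds,
# instead of waiting out the default 3-minute no-report threshold
task.set_resource_monitor_iteration_timeout(seconds_from_start=10)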
Hi ElegantCoyote26
is there a way to get a Task's docker container id/name?
you mean like Task.get_task("task_id_here").get_base_docker()?
Now a Task's results page also has a plot for this, but I guess it's at the machine level and not the task level?
This is actually on the container level, meaning checked from inside the container. It should be what you are looking for
Hmm that is odd, could it be you are changing sys.path?
(What I'm assuming is happening is that it detects the packages in the PYTHONPATH and for some reason the order is different so it finds the "system" package before the "venv" package, hence the incorrect version)
How did you define the decorator of "train_image_classifier_component"?
Did you define:
@PipelineDecorator.component(return_values=['run_model_path', 'run_tb_path'], ...
Notice two return values
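For reference, a minimal sketch of such a component (the function body and output paths are hypothetical):

from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["run_model_path", "run_tb_path"])
def train_image_classifier_component(dataset_path):
    # ... training code would go here ...
    run_model_path = "/models/model.pt"  # hypothetical checkpoint path
    run_tb_path = "/tb/logs"             # hypothetical tensorboard log dir
    return run_model_path, run_tb_path   # two return values, matching return_values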
Sure. JitteryCoyote63, so what was the problem? Can we fix something?
Hmm, can you send the full log of the pipeline component that failed, because this should have worked
Also, could you test it with the latest clearml python version (i.e. 1.10.2)?
Hey JitteryCoyote63 I think I need to better explain the config feature:
agent.package_manager.post_packages = ["PyJWT"]
Basically this means that IF you have PyJWT in the installation packages, it will be installed after everything else is installed.
This doesn't mean it will always be installed.
Think of "horovod", for example: it has to be installed after you have TF / PyTorch installed.
(The same goes for "pre_package" and Cython)
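In clearml.conf that would look something like this (a sketch; post_packages is taken from the snippet above, the horovod entry is just an illustration):

agent {
  package_manager {
    # installed after everything else, IF they appear in the task's requirements
    post_packages: ["PyJWT", "horovod"]
  }
}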
In terms of creating dynamic pipelines and cyclic graphs, the decorator approach seems the most powerful to me.
Yes that is correct, the decorator approach is the most powerful one, I agree.
DeterminedToad86
So based on the log it seems the agent is installing:
torch from https://download.pytorch.org/whl/cu102/torch-1.6.0-cp36-cp36m-linux_x86_64.whl
and torchvision from https://torchvision-build.s3-us-west-2.amazonaws.com/1.6.0/gpu/cuda-11-0/torchvision-0.7.0a0%2B78ed10c-cp36-cp36m-manylinux1_x86_64.whl
See in the log:
Warning, could not locate PyTorch torch==1.6.0 matching CUDA version 110, best candidate 1.7.0
But torchvision is downloaded from the cuda 11 folder...
I...
HighOtter69 , let me check something
is it possible to change an existing model's URL?
Edit the DBs ... That's basically the only way 🙂
Hmm, now that you mention it, it's not that obvious when I'm on a Mac either; maybe we should have the archive button at the bottom as well...
SteadyFox10 What do you think?
The first pipeline step is calling init
GiddyPeacock64 Is this enough to track all the steps?
I guess my main question is: is every step in the pipeline an actual Task/Job, or is it a single small function?
Kubeflow is great for simple DAGs but when you need to build more complex logic it is usually a bit limited
(for example the visibility into what's going on inside each step is missing so you cannot make a decision based on that).
WDYT?
Sorry if it's something trivial. I recently started working with ClearML.
No worries, this has actually more to do with how you work with Dask
The Task ID is the unique id of any Task in the system (task.id will return the UID str)
Can you post a toy Dask code snippet here? I'll explain how to make it compatible with clearml 🙂
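A tiny sketch of reading it (project/task names are hypothetical):

from clearml import Task

task = Task.init(project_name="examples", task_name="dask-demo")
print(task.id)  # the unique Task UID, returned as a string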
CluelessFlamingo93 I would also fix the pip version requirements to:
pip_version: ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"]
That is a bit odd, but SSH keys have to have specific chmod flags for them to work (a security restriction).
What was the error?
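If it turns out to be the permissions, a minimal sketch of tightening them from Python (the key path is hypothetical; owner-only access, i.e. chmod 600, is the usual requirement for private keys):

import os
import stat

key_path = os.path.expanduser("~/.ssh/id_rsa")   # hypothetical key path
os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)  # owner read/write only (600)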
ElegantCoyote26 could be, if the Task run is under 30sec?!
It seems stuck somewhere in the python path... Can you check at runtime what's in os.environ['PYTHONPATH']?
Hi @<1556812486840160256:profile|SuccessfulRaven86>
it does not when I run a flask command inside my codebase. Is it an expected behavior? Do you have some workarounds for this?
Hmm, where do you have your Task.init?
(btw: what's the use case of a flask app tracking?)
Then I deleted those workers,
How did you delete those workers? The autoscaler is supposed to spin the EC2 instances down when they are idle; in theory there is no need for manual spin-down.
I can't find out how to pass my custom clearml.conf
Hi @<1544491301435609088:profile|TeenyElk27>
The easiest is to map it into the container in your docker-compose
(map a host clearml.conf into /root/clearml.conf inside the container)
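Something along these lines in the docker-compose file (the service name and host path are hypothetical):

services:
  agent-services:
    volumes:
      - /home/user/clearml.conf:/root/clearml.conf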
Hi @<1523704157695905792:profile|VivaciousBadger56>
You should replace
task.mark_completed()
with:
task.close()
To your point
parameters = task.connect(parameters)
Will be retrieved with:
task.get_parameters()
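Putting those together, a minimal sketch (project/task names and parameter values are hypothetical):

from clearml import Task

task = Task.init(project_name="examples", task_name="params-demo")
parameters = {"lr": 0.001, "batch_size": 32}  # hypothetical defaults
parameters = task.connect(parameters)  # registers the dict (remotely edited values override it)
print(task.get_parameters())           # retrieves the stored parameters as a dict
task.close()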
fyi:
connect_configuration -> get_configuration_objects
Not sure: They also have the feature store (data management), as mentioned, which is pretty MLOps-y
Right, sorry, I was thinking about "Nuclio", my bad.
How would you compare those to ClearML?
At least based on the documentation and git state I would say this is very early stages. In terms of features they "tick all the boxes", but I'll be a bit skeptical about their ability to scale and support these features.
Taking a look at the screenshots from the docs, it also seem...
Hi VivaciousBadger56
Basically you can think of MLRun as "amazon lambda service without amazon". It is designed to run a "function" at scale on multiple nodes.
ClearML on the other hand is an MLOps platform. It does the experiment tracking, it orchestrates Tasks (think jobs), it does data management, and lastly we recently released the serving. These are two different use cases.
Am I making sense here?