If the same Task is run with different parameters...
ShinyWhale52 sorry, I kind of missed that in the explanation
The pipeline will always* create a new copy (clone) of the original Task (step), then modify the step's inputs etc.
The idea is that you have the experiment management (read: execution management) to create full transparency into the pipelines and steps. Think of it as the missing part in a lot of pipeline platforms, where after you executed the pipeline you need to furthe...
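To make the clone-then-override behavior concrete, a rough sketch (project/Task names and the parameter are placeholders, assuming the PipelineController interface):
```
from clearml import PipelineController

# each step clones its base Task, then the clone's parameters are overridden
pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",          # hypothetical template Task location
    base_task_name="train template",
    parameter_override={"General/learning_rate": 0.01},  # applied to the cloned step, not the original
)
pipe.start_locally(run_pipeline_steps_locally=False)
```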
Hi @<1523715429694967808:profile|ThickCrow29> , thank you for pinging!
We fixed the issue (hopefully). Can you verify with the latest RC, 1.14.0rc0?
RoundMole15 what does the Task.init call look like?
Hi @<1739818374189289472:profile|SourSpider22>
What are you trying to install, just the agent? If so, pip install clearml-agent is all you need
JitteryCoyote63 no I think this is all controlled from the python side.
Let me check something
Hi NaughtyFish36
c++ module fails to import, anyone have any insight? required c++ compilers seem to be installed on the docker container.
Can you provide log for the failed Task?
BTW: if you need build-essential you can add it as the Task startup script: apt-get install build-essential
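For example, something along these lines (a rough sketch, assuming your clearml version supports the docker_setup_bash_script argument; image and names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="docker setup")  # placeholder names

# run extra setup commands inside the container before the Task starts
task.set_base_docker(
    docker_image="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    docker_setup_bash_script=["apt-get update", "apt-get install -y build-essential"],
)
```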
Can you run the entire thing on your own machine (just making sure it doesn't give this odd error) ?
Well, it should work out of the box as long as you have the full route, i.e. Section/param
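i.e. something like this (a minimal sketch, the names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="param sections")  # placeholder names

params = {"param": 0.1}
params = task.connect(params, name="Section")  # appears (and is overridden) as "Section/param"
```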
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when it comes to serving multiple models. As well as the ability to quickly update models from the clearml model repository (just tag + publish and the end...
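On the "tag + publish" side, selecting from the model repository can be sketched roughly like this (assuming Model.query_models; project and tag names are placeholders):
```
from clearml import Model

# find published models carrying the serving tag
models = Model.query_models(
    project_name="serving-demo",   # hypothetical project
    tags=["production"],           # hypothetical tag
    only_published=True,
)
if models:
    print(models[0].id, models[0].url)
```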
1.
One reason I don't like using the configuration section is that it makes debugging much much harder.
Debugging? Please explain how it relates to the configuration, and to the presentation (i.e. preview)
2.
Yes in theory, but in your case it will not change things, unless these "configurations" are copied to every Task (which is just storage, otherwise no real harm)
3.
I was thinking of a "zip" file that the Task creates and uploads, and a new configuration type, say "external/zip", and in the c...
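Until something like an "external/zip" configuration type exists, a rough sketch of what can be done today (file names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")  # placeholder names

# attach the zipped configuration bundle as an artifact (uploaded, downloadable from the UI)
task.upload_artifact(name="config_bundle", artifact_object="configs.zip")  # hypothetical local zip

# and/or attach a single config file so it is previewed under the Configuration tab
task.connect_configuration(configuration="config.yaml", name="external config")
```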
Yes, no reason to attach the second one (imho)
CheerfulGorilla72 could it be the server address has changed when migrating ?
With pleasure 🙂
My plan is to have an AWS Step Functions state machine (DAG) that treats running a ClearML job as one step (task) in the DAG.
...
Yep, that should work
That said, after you have that working, I would actually check pipelines + clearml aws autoscaler, easier setup, and possibly cheaper on the cloud (Lambda vs EC2 instance)
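For the Step Functions route, one state could boil down to something like this rough sketch (the template Task and queue names are placeholders):
```
from clearml import Task

# clone a template Task, enqueue it for an agent, and block until it finishes
template = Task.get_task(project_name="examples", task_name="train template")  # hypothetical template
cloned = Task.clone(source_task=template, name="train - step functions run")
Task.enqueue(cloned, queue_name="default")
cloned.wait_for_status()  # blocks until the Task reaches a final state
```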
If this works, we might be able to fully replace Metaflow with ClearML!
Can't wait for your blog post on it 😉
Hi @<1523701323046850560:profile|OutrageousSheep60>
What do you mean by "in clearml server"? I do not see any reason a subprocess call from a Task will be an issue. What am I missing?
I'm checking the preview HTML and it seems like it was not uploaded...
, how do different tasks know which arguments were already dispatched if the arguments are generated at runtime?
A bit on how clearml-agent works (and actually on how clearml itself works).
When running manually (i.e. not executed by an agent), Task.init (and similarly task.connect etc.) will log data on the Task itself (i.e. will send arguments/parameters to the server). This includes logging the argparser, for example (and any other part of the automagic or manual connect).
When run...
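As a minimal sketch of the argparse part described above (project/Task names and the argument are placeholders):
```
from argparse import ArgumentParser
from clearml import Task

task = Task.init(project_name="examples", task_name="argparse demo")  # placeholder names

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
args = parser.parse_args()
# running manually: the parsed values are logged to the Task on the server
# running under an agent: the values stored on the Task override these defaults
```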
When I say accessing, it means I want to use the data for training (without actually getting a local copy of it).
How can you "access" it without downloading it?
Do you mean train locally on a subset, then on the full dataset remotely ?
Hi SillyPuppy19
I think I lost you halfway through.
I have a single script that launches training jobs for various models.
Is this like the automation example on GitHub, i.e. cloning/enqueuing experiments?
flag which is the model name, and dynamically loading the module to train it.
a Model has a UUID in the system as well, so you can use that instead of name (which is not unique), would that solve the problem?
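e.g. something along these lines (rough sketch, the model UUID is a placeholder):
```
from clearml import InputModel

model = InputModel(model_id="<model-uuid>")   # reference by unique ID instead of the (non-unique) name
local_weights = model.get_local_copy()        # download the weights if/when you actually need them
```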
This didn't mesh well with Trains, because the project a...
so "add_external_files" means the files are not actually uploaded, they are "registered" as external links. This means the upload is basically doing nothing, because there is nothing to upload
Where exactly are you getting the error?
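i.e. roughly (a sketch, the dataset/bucket names are placeholders):
```
from clearml import Dataset

ds = Dataset.create(dataset_project="examples", dataset_name="external demo")  # placeholder names
ds.add_external_files(source_url="s3://my-bucket/data/")  # registered as links only, nothing is copied
ds.upload()    # effectively a no-op for the external entries
ds.finalize()
```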
If you edit the requirements to have
https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
You mean, is one solution better than combining, maintaining and automating 3+ solutions (dvc/lakefs + mlflow + kubeflow/airflow)?
Yes, I'd say it is. BTW if you have airflow running for other automations you can very easily combine the automation with clearml and have a single airflow automation for everything, but the main difference is that now airflow only launches logic, never the actual compute/data (which are launched and scaled via clearml)
Does that make sense?
Hi FunnyTurkey96
what's the clearml server you are using?
I want the model to be stored in a way that clearml-serving can recognise it as a model
Then OutputModel or task.update_output_model(...)
You have to serialize it, in a way that later your code will be able to load it.
With XGBoost, when you do model.save clearml automatically picks it up and uploads it for you
assuming you created the Task with Task.init(..., output_uri=True)
You can also manually upload the model with task.update_output_model or equivalent with OutputModel class.
if you want to dis...
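Putting it together, a minimal sketch of the explicit route (file/project names are placeholders):
```
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="model upload", output_uri=True)  # placeholder names

# ... train and serialize, e.g. model.save("model.xgb"), which clearml can also pick up automatically ...

# or register/upload the serialized model explicitly:
output_model = OutputModel(task=task, framework="xgboost")
output_model.update_weights(weights_filename="model.xgb")  # uploads to the Task's output_uri
```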
Hi SuperiorCockroach75
You mean like turning on caching? What do you mean by taking too long?
Yeah the docstring is always the most updated 🙂
Yes... I think that this might be a bit much automagic even for clearml 😄