Generally speaking, for exactly that reason: if you are passing a list of files, or a folder, it will actually zip them and upload the zip file. Specifically for pipelines it should be similar. BTW, I think you can change the number of parallel upload threads in StorageManager, but as you mentioned, it is faster to zip into one file. Make sense?
SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
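For reference, this is roughly what that section of ~/clearml.conf looks like (the values are placeholders):
```
agent {
    # git credentials the agent uses when cloning the Task's repository
    git_user: "my-git-username"
    git_pass: "my-git-token-or-password"
}
```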
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...
Hi SubstantialElk6
No need for that; you can use the helm chart (or spin them up once with kubectl), then they take care of scheduling by themselves.
You can also use the k8s glue (basically spinning kubernetes pods automatically for you, based on the Tasks that you push into the ClearML queue)
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
In short, two possible deployments
Static k8s pod running the agent (then the agent runs all the experiments inside t...
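Either way, work reaches the agents by pushing Tasks into a ClearML queue. A minimal sketch (the queue, project and task names are placeholders):
```python
from clearml import Task

# clone an existing (template) experiment and push it into the queue
# that the k8s glue / static agent pod is listening on
template = Task.get_task(project_name="examples", task_name="my training task")
cloned = Task.clone(source_task=template, name="my training task (k8s)")
Task.enqueue(cloned, queue_name="k8s_glue_queue")
```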
Hi RoughTiger69
A. Yes, makes total sense. Basically you can use Task.export_task / Task.import_task to achieve this process (notice we assume the dataset artifact links are available on both servers; usually this is the case).
B. The easiest way would be to use subprocesses: one subprocess exports from dev, with the credentials and configuration passed via OS environment variables; another subprocess imports it into the prod server (again with the OS environment pointing to the prod server). Make sense?
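A rough sketch of the subprocess idea, assuming a recent SDK where Task.export_task / Task.import_task are available (hosts, keys and the task ID below are placeholders):
```python
import os
import subprocess

EXPORT_CODE = """
from clearml import Task
import json
data = Task.get_task(task_id="<dev_task_id>").export_task()
json.dump(data, open("task_export.json", "w"), default=str)
"""

IMPORT_CODE = """
from clearml import Task
import json
Task.import_task(json.load(open("task_export.json")))
"""

# each subprocess talks to a different server, selected purely via environment variables
dev_env = dict(os.environ,
               CLEARML_API_HOST="https://api.dev.example.com",
               CLEARML_API_ACCESS_KEY="<dev_access_key>",
               CLEARML_API_SECRET_KEY="<dev_secret_key>")
prod_env = dict(os.environ,
                CLEARML_API_HOST="https://api.prod.example.com",
                CLEARML_API_ACCESS_KEY="<prod_access_key>",
                CLEARML_API_SECRET_KEY="<prod_secret_key>")

subprocess.run(["python", "-c", EXPORT_CODE], env=dev_env, check=True)
subprocess.run(["python", "-c", IMPORT_CODE], env=prod_env, check=True)
```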
WackyRabbit7 you can configure the AWS autoscaler with two types of instances, with priority given to one of them. So in theory you do not need two autoscaler processes; with that in mind, I "think" a single IAM should suffice.
Hi SubstantialElk6
saved in the files_server (indicated in ClearML.conf) instead of the indicated output_uri in the dataset.create argument
What's the clearml SDK version? How are you specifying the output target?
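For reference, a minimal sketch of what I would expect to work on a recent clearml SDK (names and the bucket are placeholders):
```python
from clearml import Dataset

ds = Dataset.create(
    dataset_name="my_dataset",
    dataset_project="datasets",
    output_uri="s3://my-bucket/datasets",  # should override the default files_server target
)
ds.add_files("/path/to/local/data")
ds.upload()
ds.finalize()
```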
Yes 🙂
BTW: do you guys do remote machine development (i.e. Jupyter / vscode-server) ?
It is way too much to pass as an env variable 😞
The easiest is to pass an entire trains.conf file.
Nice!
Is trainsConfig a pure text blob?
Is an implementation of this kind interesting for you, or do you suggest forking?
You mean adding a config map storing a default trains.conf for the agent?
but I can't seem to figure out a way to do something similar using a task in add_step
VexedCat68 With "add_step" it assumes the Task you are adding is self contained (i.e. there is no "return object" to serialize), this means you can only add arguments, or use the artifacts the Task (i.e. step) will recreate, obviously you knowing in advance what the step creates. Make sense ?
Hi ReassuredOwl55
How would I find Tasks that have the same code with different inputs/parameters?
Assuming you have the git repo
you can do:
`Task.query_tasks(..., task_filter={'_all_': dict(fields=['script.repository'], pattern='github.com/user/repo')})`
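For example, to then compare their inputs/parameters (a sketch; the project name and pattern are placeholders):
```python
from clearml import Task

task_ids = Task.query_tasks(
    project_name="examples",
    task_filter={"_all_": dict(fields=["script.repository"], pattern="github.com/user/repo")},
)
for tid in task_ids:
    t = Task.get_task(task_id=tid)
    print(tid, t.get_parameters())  # compare hyperparameters across runs of the same code
```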
wdyt?
Hi JitteryCoyote63
Is this close ?
https://github.com/allegroai/clearml/issues/283
and the step is "queued" or is it "queued" in the pipeline state (i.e. the visualization did not update) ?
WackyRabbit7 I might be missing something here, but the pipeline itself should be launched on the "pipelines" queue. Is the pipeline itself running, or is it the step itself that is stuck in the "queued" state?
So I'm guessing the cli will be in the folder of python:
`import sys; from pathlib2 import Path; (Path(sys.executable).parent / 'cli-util-here').as_posix()`
JuicyFox94
NICE!!! this is exactly what I had in mind.
BTW: you do not need to put the default values there; it reads the defaults from the package itself (trains-agent/trains) and uses the conf file as overrides, so this section only needs to contain the parts that are important (like cache location, credentials, etc.)
Hi JuicyFox94
you pointed to exactly the issue 🙂
In your trains.conf
https://github.com/allegroai/trains/blob/f27aed767cb3aa3ea83d8f273e48460dd79a90df/docs/trains.conf#L94
I think this is great! That said, it only applies when you are spinning up agents (the default helm chart is for the server). So maybe we need another one? Or an option?
Thanks @<1689446563463565312:profile|SmallTurkey79> ! 🙏
and I install the tar
I think the only way to do that is to add it into the docker bash setup script (this is a bash script executed before the Task starts).
When you install using `pip install <filename>` you should end up with something like: `minerva @ file://...` or `minerva @ https://...`
So `from foo.mod import` "translates" to `foo-mod @ git+` None ..?
Hi DilapidatedCow43
I'm assuming the returned object cannot be pickled (which is ClearML's way of serializing it)
You can upload it as a model with:
`uploaded_model_url = Task.current_task().update_output_model(model_path="/path/to/local/model")
...
return uploaded_model_url`
wdyt?
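On the consuming side, a later step could then fetch it with something like this (a sketch; load_my_model is a hypothetical loader for your framework):
```python
from clearml import StorageManager

# download (and cache) the uploaded weights file locally, then load it
local_model_path = StorageManager.get_local_copy(remote_url=uploaded_model_url)
model = load_my_model(local_model_path)  # hypothetical: replace with your framework's loader
```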
GitLab has support for S3-based cache, BTW.
This might still be considered "slow" compared to a local-disk/cluster mount.
Would adding support for some sort of post task script help? Is something already there?
Interesting, can you expand on the use case? (currently there is only pre-task script, for setup)
But functionality is working
Awesome, I will hold off on merging until it has been tested internally.
There is a release coming out after the weekend; once it is out, I expect we will merge it.
(or woman or in between, we are supportive as long as code is working 🙂 )