GentleSwallow91 how come it does not already find the correct PyTorch version inside the docker? What's the clearml-agent version you are using?
Is there an easy way to add a docker argument in the python script?
On the task itself in the UI you can edit the docker arguments and add any missing flags
(task.set_base_docker will do the same from code)
You can also edit the configuration and always add this flag:
None
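For reference, a minimal sketch of the from-code option (the image name and flags here are just placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="docker args")
# a single string: the docker image followed by any extra docker run flags
task.set_base_docker("nvidia/cuda:11.7.1-runtime-ubuntu22.04 --ipc=host --shm-size=8g")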
I understand, but then the toml file needs to be parsed to ensure poetry is used. It's just a tool entry in the pyproject.toml.
Probably too much for the agent... and specifically it seems poetry actually managed to parse it?! What are you getting in the log?
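For reference, the tool entry poetry looks for is just the [tool.poetry] table in pyproject.toml, something along these lines (names and versions are placeholders):
[tool.poetry]
name = "my-project"
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.9"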
Hi @<1523702932069945344:profile|CheerfulGorilla72>
This is a property on the Model object
model.published
Not sure why we do not have it here...
None
(I'll ask them to fix that)
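A minimal sketch of checking it from code (the model id is a placeholder):
from clearml import Model

model = Model(model_id="<your-model-id>")
print(model.published)  # True once the model was published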
You described getting a secret key pair from the UI and feeding it back into the compose file. Does this mean it's not possible to seed the secrets in the compose file, starting from clean state? If so, that would explain why I can't get it to work.
Long story short, no. This would basically mean you have pre-built credentials baked into the docker image, which sounds dangerous 🙂
I'm not sure I'm following the use case here, what exactly are we trying to do?
(or maybe I missed something here?)
So the agent installed okay. It's the specific Task that the agent is failing to create the environment for, correct?
If this is the case, what do you have in the "Installed Packages" section of the Task (under the Execution tab)?
Hi GiganticTurtle0
dataset_task = Task.get_task(task_id=dataset.id)
Hmmm I think that when you retrieve the Task, the "output_uri" is not taken from the stored Task (you can obviously set it again).
This seems like a bug that is unrelated to Datasets.
Basically any Task that you retrieve will default to the default output_uri (not the stored one)
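As a workaround until the fix lands, you can re-apply the destination on the retrieved Task, something like (path taken from your example):
from clearml import Task

dataset_task = Task.get_task(task_id=dataset.id)
# restore the output destination on the retrieved Task
dataset_task.output_uri = "file:///home/mount/user/server_local_storage"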
GiganticTurtle0 where in the code do you set the output destination to "file:///home/mount/user/server_local_storage"?
GiganticTurtle0 found it, fix will be pushed tomorrow 🙂
Hmm could you try to upload to your files server (not the S3)
Maybe some credentials error?
Hi GiganticTurtle0
Let me check
Ohh, two options:
From the script itself you can do:
from clearml import Task

task = Task.init(...)
task.execute_remotely(queue_name='default')
Then run the script locally; it will run until the execute_remotely call, quit the process, and re-launch it on the "default" queue.
Option B:
Use the clearml-task CLI
$ clearml-task --folder <where the script is> --project ...
See https://github.com/allegroai/clearml/blob/master/docs/clearml-task.md#launching-a-job-from-a-local-script
You can change the CWD folder: if you put . in the working dir it will be the root of the git repo, but you can use any subfolder; obviously you need to change the script path to match the folder, e.g. ./folder/script.py etc.
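Putting it together, a fuller sketch of the CLI call (project/queue/task names are placeholders):
$ clearml-task --project MyProject --name my-run --folder . --script ./folder/script.py --queue default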
Hi @<1541954607595393024:profile|BattyCrocodile47>
Do you mean to start a remote session directly from the VSCode UI instead of the CLI, and connect to it? If so, that would be awesome!! We have a remote session from the web where it spins up your remote session and launches VSCode inside the container, so you work on it in your browser. But a VSCode plugin is a great idea, do you have a reference to similar plugins?
Hi CheerfulGorilla72 ,
Sure there are:
https://github.com/allegroai/clearml/tree/master/examples/frameworks/pytorch-lightning
what just happened next time and what is happening underneath.
Not sure I follow, is there still an issue?
error [Errno 13] Permission denied:
Seems like a permission issue?
Try to remove your entire clearml cache folder: None
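Assuming the default cache location (~/.clearml), that would be something like:
$ rm -rf ~/.clearml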
the task is being Aborted rather than be in Draft. Am I missing something?
Yes, the reason is so you do not lose anything that might have already been reported on the Task.
And usually execute_remotely will get the execution queue as a parameter (i.e. immediately launching the Task)
You can now (starting v1.0) enqueue an aborted Task, so it should not make a difference; you can also reset the Task and edit it in the UI
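For reference, a sketch of the call with the queue parameter (clone=True would enqueue a copy instead of this Task):
task.execute_remotely(queue_name='default', clone=False, exit_process=True)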
Hi PanickyMoth78
So the current implementation of the pipeline parallelization is exactly like Python async function calls:
for dataset_conf in dataset_configs:
    dataset = make_dataset_component(dataset_conf)
    for training_conf in training_configs:
        model_path = train_image_classifier_component(training_conf)
        eval_result_path = eval_model_component(model_path)
Specifically here, since you are passing the output of one function to another, imagine that what happens is a wait operation, hence it ...
BTW: we are now adding "dataset chunks" for more efficient large dataset storage
EnviousStarfish54
Can you check with the latest clearml from GitHub?
pip install git+https://github.com/allegroai/clearml.git
Hi EnthusiasticCoyote38
Does clearml-agent have an option
Fully supported 🙂
Should work out of the box; it will always clone with --recursive and bring in all the submodules
I'm not sure this is configurable from the outside 🙂
or creating a dedicated function I would suggest also including the actual sampled point in the HP space.
Could you expand?
This would be the most common use case, and essentially the reason for running the HPO: understanding the sensitivity of metrics with respect to the hyper-parameters
Does this relates to:
https://github.com/allegroai/clearml/issues/430
manually" filtering the keys I've put in for the HP space. I find it a bit strange that they are not saved as part of t...
Correct (if this is running on k8s it is most likely passed via env variables, e.g. CLEARML_WEB_HOST etc.)
But first I want to make sure the verify argument is actually used, hence False
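(If this is about SSL certificate verification, the matching SDK setting in clearml.conf would be something like the following, assuming that is the context:)
api {
    verify_certificate: false
}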