The problem is that, due to tight security on this k8s cluster, the k8s pod cannot reach the public file server URL associated with the dataset.
Understood, that makes sense. If that's the case, then the path_substitution feature is exactly what you are looking for.
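For reference, a rough sketch of the clearml.conf entry (the two prefixes are placeholders for your actual URL and mount):
sdk {
    storage {
        path_substitution = [
            {
                # the URL prefix as registered on the dataset/task
                registered_prefix = "https://files.public-server.example/datasets"
                # a prefix the pod can actually reach (internal mirror / mount)
                local_prefix = "file:///mnt/shared/datasets"
            }
        ]
    }
}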
Hi GrievingTurkey78
Can you test with the latest clearml-agent RC? (I remember a fix just for that)
pip install clearml-agent==1.2.0rc0
Right! I just noticed that! This is odd... and yes, it definitely has something to do with the multi-pipeline being executed on the agent. I think I know what to look for ...
(just making sure (again), running_locally produced exactly what we were expecting, is that correct?)
No worries, I'll see if I can replicate it anyhow
Hi PompousBeetle71
Try this one, let me know if it helped:
import logging
logging.getLogger('trains.frameworks').setLevel(logging.ERROR)
What's the trains-server version ?
Yes! Thanks so much for the quick turnaround
My pleasure 🙂
BTW: did you see this (it seems like the same bug?!)
https://github.com/allegroai/clearml-helm-charts/blob/0871e7383130411694482468c228c987b0f47753/charts/clearml-agent/templates/agentk8sglue-configmap.yaml#L14
So if I do this in my local repo, will it mess up my git state, or should I do it in a fresh directory?
It will install everything fresh into the target folder (including venv and code + uncommitted changes)
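For reference, a minimal sketch of that flow (the task ID and target folder are placeholders):
clearml-agent build --id aabbcc112233 --target /path/to/fresh_folder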
The file is never touched; nowhere in the process is that file deleted.
It should never have gotten there, as this is not the git repo folder, it's one level above...
Check the examples on the github page, I think this is what you are looking for 🙂
https://github.com/allegroai/trains-agent#running-the-trains-agent
- Yes, Task.init should be called in each subprocess (because torch forks them before they are patched); see the sketch after this list
- I think the main issue is that we patch argparse on the subprocess (this is assuming you did not manually parse non-argv arguments)
- If you can create a mock test I think we can work around the issue, as long as the way you spin it up is the standard PyTorch distributed way
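A minimal sketch of the first point (project/task names are placeholders, and I'm assuming spawn-style workers):
import torch.multiprocessing as mp
from clearml import Task

def worker(rank):
    # each spawned subprocess calls Task.init itself, since the workers
    # are created before ClearML's framework patching kicks in
    Task.init(project_name='examples', task_name='distributed-train')
    # ... training code for this rank ...

if __name__ == '__main__':
    mp.spawn(worker, nprocs=2)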
CooperativeFox72
Could you try to run the docker container, and then inside the docker try to do:
su root
whoami
WackyRabbit7
Long story short, yes, only by name (hashing might be too slow on large files)
The easiest solution: if the hash is incorrect, delete the local copy it returns and ask again; it will re-download it.
I'm not sure if the hashing is exposed, but if it is not, we can add it.
What do you think?
But this will require some code changes...
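Something along these lines (a rough sketch; remote_url and expected_md5 are placeholders, and md5 is just an example hash):
import hashlib
import os

from clearml import StorageManager

local_path = StorageManager.get_local_copy(remote_url=remote_url)
with open(local_path, 'rb') as f:
    if hashlib.md5(f.read()).hexdigest() != expected_md5:
        os.remove(local_path)  # drop the stale cached copy
        # asking again triggers a fresh download into the cache
        local_path = StorageManager.get_local_copy(remote_url=remote_url)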
In the main pipeline I want to work with the secondary pipeline and other functions decorated with PipelineDecorator. Does ClearML allow this? I have not been able to get it to work.
Usually when we think about pipelines of pipelines, the nested pipeline is just another Task you are running in the DAG (where the target queue is the services queue).
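Roughly along these lines (the controller task ID and queue name are placeholders):
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['inner_task_id'])
def run_inner_pipeline():
    # the nested pipeline is just another Task in the DAG:
    # clone its controller and enqueue it on the services queue
    inner = Task.clone(source_task='inner_pipeline_controller_id')
    Task.enqueue(inner, queue_name='services')
    inner.wait_for_status()  # block this step until the nested run finishes
    return inner.id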
When you say nested pipelines with decorators, what exactly do you have in mind ?
HealthyStarfish45 you mean as in a REST API ?
but I cannot compare between them
I think we noticed it, and this will be fixed in the next server update (again, some plotly.js issue there)
You can try calling:
task._update_repository()
I'm still trying to figure out how to reproduce it...
Specifically for model files, if you set Task.init(..., output_uri=True) it will automatically upload any saved model to the files server (you can also point it to any object storage / shared folder).
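For example (project/task names are placeholders):
from clearml import Task

# output_uri=True uploads saved models to the default files server;
# an object-storage URI such as 's3://my-bucket/models' works as well
task = Task.init(project_name='examples', task_name='train', output_uri=True)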
What's the framework you are using ?
Merged, is it working for you now?
Are you sure it's the trains-server and not the trains package (i.e. the backend)?
Hi EmbarrassedSpider34
clearml-init will try to create ~/clearml.conf
I'm assuming that when you execute under root it is resolved to /root/clearml.conf
That said, you might be able to override it with:
sudo CLEARML_CONFIG_FILE=$HOME/clearml.conf clearml-init
Thanks @<1523701868901961728:profile|ReassuredTiger98>
From the log this is what conda is installing, it should have worked
/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
-...
Correct, you can pass it as keys on the task_filter argument, e.g.:
Task.get_tasks(..., task_filter={'status': ['failed']})
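A fuller runnable version (the project name is a placeholder):
from clearml import Task

failed_tasks = Task.get_tasks(
    project_name='examples',
    task_filter={'status': ['failed']},
)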