Quick question: where does clearml place the venv again? I want to take a look at it after the task has failed.
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0
This gives me the CPU version of PyTorch.
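In case it helps: a sketch of what I would try first, assuming the CUDA build is published on the pytorch channel — listing that channel first so channel priority prefers its build over the CPU one from defaults:
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit==11.1.1
- pytorch==1.8.0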
Ah, now I see. This sounds like a good solution.
Locally it works fine.
To answer my own question: In the WebUI where one inputs the credentials, use https
for the host instead of the auto-added http
It is weird though. The task is submitted by the original user and then run on the agent, yet it is still registered to the original user, since that is who created it.
Wouldn't it make more sense to inherit the user from the task rather than from the agent?
@<1523701994743664640:profile|AppetizingMouse58> Thank you very much. I forgot the volume mapping.
So can I just add the config to the async_delete container and mirror the directory structure from GitHub?
volumes:
- /opt/clearml/config:/opt/clearml/config
- /opt/clearml/logs:/var/log/clearml
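I.e. something like this in the server's docker-compose.yml, keeping the rest of the stock async_delete service definition as-is (a sketch, not verified):
  async_delete:
    ...
    volumes:
      - /opt/clearml/config:/opt/clearml/config
      - /opt/clearml/logs:/var/log/clearml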
The package is just a subdirectory, by the way. So it should not be in the installed packages anyway, right?
Perfect, just what I always wanted. Looking forward to the MinIO version. Thank you :)
==> 2021-03-11 13:54:59 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
# conda version: 4.9.2
+defaults/linux-64::_libgcc_mutex-0.1-main
+defaults/linux-64::ca-certificates-2021.1.19-h06a4308_1
+defaults/linux-64::certifi-2020.12.5-py38h06a4308_0
+defaults/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
+defaults/linux-64::libedit-3.1.20191231-h14c3975_1
+defaults/linux-64::libffi-3.3-he6710b0_2
+defaults/linux-64...
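So to poke around in the environment of the failed task, something like this should work (the prefix is taken from the log above):
conda activate /home/tim/.clearml/venvs-builds/3.8
pip list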
Thanks, I will look into it. For me the weird thing is that saving works and only deletion fails somehow.
Yes, that works fine. The http vs. https was the only problem. The UI automatically changes
s3://<minio-address>:<port>
to
http://<minio-address>:<port>
in http://myclearmlserver.org/settings/webapp-configuration . However, what I need is https://<minio-address>:<port>
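For completeness, the client-side counterpart in clearml.conf, as far as I understand it (a sketch; secure: true should force https for the MinIO endpoint):
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "<minio-address>:<port>"
                    key: "<access-key>"
                    secret: "<secret-key>"
                    multipart: false
                    secure: true
                }
            ]
        }
    }
}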
Based on https://github.com/lanpa/tensorboardX/blob/34d1616c035faaa0f3f7c9d19cb8bb4425f19939/tensorboardX/summary.py#L355 I would guess that it is already encoded before being added to the tensorboard summary.
You can add and remove clearml-agents to/from the clearml-server anytime.
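For example, spinning an agent up and tearing it down again (assuming a queue named default):
clearml-agent daemon --queue default --detached
clearml-agent daemon --stop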
Maybe if you have time, you can take a look at the log I posted at the beginning. I think I have the same extra_index_url
and the nightly flag activated 😕
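Roughly this, in clearml.conf (the index URL here is only an example, not necessarily the one from my log):
agent {
    package_manager {
        extra_index_url: ["https://download.pytorch.org/whl/nightly/cu111"]
        torch_nightly: true
    }
}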
Thank you very much. I tested it on a different machine now and it works like intended. So there must be something misconfigured with this one machine.
Is this working in the latest version? clearml-agent falls back to /usr/bin/python3.8 no matter how I configure clearml.conf.
Just want to make sure, so I can investigate what's wrong with my machine if it is working for you.
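For context, this is the kind of setting I tried (a sketch; the conda path matches the log above):
agent {
    python_binary: "/home/tim/miniconda3/bin/python3.8"
}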
Installed packages:
# Python 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
absl-py==0.12.0
aiostream==0.4.2
attrs==20.3.0
cached-property==1.5.2
cffi==1.14.5
chardet==4.0.0
clearml==0.17.5
cython==0.29.22
dm-control==0.0.364896371
dm-env==1.4
dm-tree==0.1.5
fasteners==0.16
furl==2.1.0
future==0.18.2
glfw==2.1.0
gym==0.18.0
h5py==3.2.1
humanfriendly==9.1
idna==2.10
imageio-ffmpeg==0.4.3
importlib-metadata==3.7.3
jsonschema==3.2.0
labmaze==1.0.4
lxml==4.6.3
moviepy==1.0.3
mujoco-py==...
Maybe this opens up another question, which is more about how clearml-agent is supposed to be used. The "pure" way would be to make the docker image provide everything, so that clearml-agent does no setup at all.
What I currently do instead is let the docker image provide all the system dependencies and have clearml-agent set up all the Python dependencies. This allows me to reuse a docker image across more experiments. However, then it would make sense to have as many configs as possib...
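One middle ground I have been wondering about (a sketch, assuming the image already ships the heavy packages): let the venv the agent builds see the image's preinstalled packages:
agent {
    package_manager {
        system_site_packages: true
    }
}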
Hi CostlyOstrich36 , thank you for answering so quickly. I think that's not how it works, because if it were, one would always have to match the local machine to the servers. Afaik clearml finds the correct PyTorch version, but I was not sure how (custom logic vs. letting pip do it).
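Related, in case the detection ever goes wrong: clearml.conf seems to allow forcing the CUDA/cuDNN version the agent resolves wheels against (a sketch; the values are examples):
agent {
    cuda_version: "11.1"
    cudnn_version: "8.0"
}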
An upload of 11 GB took around 20 hours, which cannot be right. Do you have any idea whether ClearML could have something to do with this slow upload speed? If not, I am going to start debugging the hardware/network.
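To rule ClearML out, I might first measure the raw transfer speed to the MinIO endpoint, e.g. with the MinIO client (assuming an alias myminio is already configured):
mc cp ./some-11gb-file myminio/<bucket>/
If that is just as slow, it is the network/hardware; if it is fast, ClearML's upload path is worth a closer look.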
Thanks for your help again. I will just use detect_with_conda_freeze: true. Seems like a perfect solution for me!
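For anyone searching later: the key sits in the sdk development section of clearml.conf, i.e.
sdk {
    development {
        detect_with_conda_freeze: true
    }
}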
Ah, it actually is also a string with remote_execution, but still not what it should be.
Seems more like a bug, or like something is not properly configured on my side.