Hard to answer now. I just wiped everything and reinstalled. If I encounter this problem again, I will investigate further.
I am not sure what happened, but my experiments are gone. However, the data directory is still filled.
So I just tried again, this time deleting manually via the Web UI.
I mean that locally I was able to install the correct version without a problem.
Any idea why deletion of artifacts on my second fileserver does not work?
fileserver_datasets:
  networks:
    - backend
    - frontend
  command:
    - fileserver
  container_name: clearml-fileserver-datasets
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver-datasets:/mnt/fileserver
    - /opt/clearml/config:/opt/clearml/config
  ports:
    - "8082:8081"
ClearML successfu...
No, it is just a pain to find files that have been deleted by a user, but are actually not deleted in the fileserver/s3 🙂
But no worries, nothing that is crucial.
I created this issue today, which can alleviate the pain temporarily: https://github.com/allegroai/clearml-server/issues/133
Yea, the clearml-data is immutable, but not the underlying data if I just store a pointer to some location.
For everyone who had the patience to read through everything, here is my solution to make clearml work with ssh-agent forwarding in the current version:
1. Start an ssh-agent.
2. Add ssh keys with ssh-add to the agent.
3. echo $SSH_AUTH_SOCK and paste it into clearml.conf as here: https://github.com/allegroai/clearml-agent/issues/45#issuecomment-779302144 (replace $SSH_AUTH_SOCKET with the actual value).
4. Move all the files except known_hosts out of ~/.ssh of the clearml-agent workstation.
5. Start the...
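For the step that reads $SSH_AUTH_SOCK, here is a small Python sketch that fetches the value to paste into clearml.conf and fails loudly if no agent is running. The function name `ssh_auth_sock` is made up for illustration; it only reads the environment and does not touch clearml.

```python
import os
from typing import Mapping, Optional


def ssh_auth_sock(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the ssh-agent socket path from the given environment.

    Hypothetical helper: falls back to os.environ when no mapping is passed.
    """
    env = os.environ if env is None else env
    sock = env.get("SSH_AUTH_SOCK", "")
    if not sock:
        # No agent socket exported in this shell/session.
        raise RuntimeError("SSH_AUTH_SOCK is not set; start ssh-agent first")
    return sock


if __name__ == "__main__":
    try:
        # Print the value that should replace the placeholder in clearml.conf.
        print(ssh_auth_sock())
    except RuntimeError as err:
        print(err)
```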
Thanks for answering. I don't quite get your explanation. You mean if I have 100 experiments and I start up another one (experiment "101"), then experiment "0" logs will get replaced?
I am still trying to solve the add_requirements + importlib combo. If I use detect_with_freeze I cannot use add_requirements, and if I use automatic code analysis it will not find all packages because of importlib.
For now I have come to the conclusion that keeping a requirements.txt and making clearml parse the requirements from there should be the most robust solution. Unfortunately, there seems to be no way to do this with Task.init.
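To make the requirements.txt idea concrete, here is a minimal sketch of the parsing half only. It assumes simple lines of the form `name`, `name==version` or `name>=version`, and skips pip options, URLs and comments. The function name `parse_requirements` is made up for illustration and is not part of clearml.

```python
import re
from typing import List, Tuple

# Package name followed by an optional version specifier.
_REQ_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")
# A comment starts at "#" at line start or after whitespace.
_COMMENT = re.compile(r"(^|\s)#.*$")


def parse_requirements(text: str) -> List[Tuple[str, str]]:
    """Return (package, specifier) pairs from requirements.txt content.

    Simplified sketch: pip options (-r, -e, --index-url, ...) and
    URL/VCS requirements are skipped, not resolved.
    """
    pairs = []
    for raw in text.splitlines():
        line = _COMMENT.sub("", raw).strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _REQ_LINE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2).strip()))
    return pairs
```

The resulting pairs could then be handed to Task.add_requirements one by one before calling Task.init. Whether that works together with detect_with_freeze is exactly the open question above, so treat that part as an untested assumption.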
However, because of the import carla, it is added to the task requirements and clearml-agent tries to install it, although it is meant to be included at runtime.
One last question, then I have everything solved: Is it possible to pass clearml the files to analyze manually? For example, my setup consists of a run_this.py and this_should_be_run_A.py and this_should_be_run_B.py. I can then programmatically choose which file to import with importlib. Is there a way to tell clearml programmatically to analyze the files, so it can build up the requirements correctly?
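As a workaround sketch outside of clearml (this is not clearml's own code analysis), the files that are only loaded through importlib could be scanned with the standard-library ast module to list their top-level imports. The function names `imported_modules` and `collect_imports` are made up for illustration.

```python
import ast
from typing import Iterable, Set


def imported_modules(source: str) -> Set[str]:
    """Return the top-level module names imported by Python source code."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" -> "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> "a"; relative imports are skipped
            names.add(node.module.split(".")[0])
    return names


def collect_imports(paths: Iterable[str]) -> Set[str]:
    """Union of imported module names over several files."""
    found: Set[str] = set()
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            found |= imported_modules(handle.read())
    return found
```

Calling `collect_imports(["this_should_be_run_A.py", "this_should_be_run_B.py"])` gives module names, not pip package names, and it includes standard-library modules, so the result still needs mapping and filtering by hand before it is used as a requirements list.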
I was wondering whether some solution is builtin in clearml, so I do not have to configure each server manually. However, from your answer I take that this is not the case.
Okay, thanks for the info! I am currently not using k8s, but may be good to know for the future.
Thank you. Seems like someone implemented a type check: Error: Dataset id=8d7355655830427f9243671c8cf0a6b0 is not of type Dataset :)
Perfect and thank you for your efforts! :)
Hey AgitatedDove14 is there any update on this?
Mhhm, now conda env creation takes forever, probably because it is resolving conflicts. At least that is what happened when I tried to install my environment manually.
Makes sense, but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).
drwxr-xr-x 10 root root 4096 Jul 31 2020 .
drwxr-xr-x 14 root root 4096 Jul 31 2020 ..
drwxr-xr-x 2 root root 4096 Feb 4 13:52 bin
drwxr-xr-x 2 root root 4096 Jul 31 2020 etc
drwxr-xr-x 2 root root 4096 Jul 31 2020 games
drwxr-xr-x 2 root root 4096 Jul 31 2020 include
drwxr-xr-x 4 root root 4096 Feb 3 13:40 lib
lrwxrwxrwx 1 root root 9 Dez 10 14:29 man -> share/man
drwxr-xr-x 2 root root 4096 Jul 31 2020 sbin
drwxr-xr-x 7 root root 4096 Jul 31 2020 share
drwxr-xr-x ...
name: core
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- blas=1.0
- bzip2=1.0.8
- ca-certificates=2020.12.5
- certifi=2020.12.5
- cudatoolkit=11.1.1
- ffmpeg=4.3
- freetype=2.10.4
- gmp=6.2.1
- gnutls=3.6.13
- jpeg=9b
- lame=3.100
- lcms2=2.11
- ld_impl_linux-64=2.33.1
- libedit=3.1.20191231
- libffi=3.3
- libgcc-ng=9.3.0
- libiconv=1.16
- libpng=1.6.37
- libstdcxx-ng=9.3.0
- libtiff...
Thank you very much, gonna try that!