SuccessfulKoala55 So what happens is that always when/after the cleanup_service runs, ClearML will throw these kinds of errors
For example, I get the following error if I simply clone and rerun:
ERROR: Could not find a version that satisfies the requirement ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28)) (from versions: none)
ERROR: No matching distribution found for ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28))
I have a related question: I read here that 4GB is an HTTP limitation and ClearML will not chunk single files. I take from that that ClearML did not want to, or had no need to, implement its own solution so far. But what about models that are larger than 4GB?
@<1523701994743664640:profile|AppetizingMouse58> Thank you very much. I forgot the volume mapping.
So can I just add the config to the async_delete container and mirror the directory structure from GitHub?
volumes:
- /opt/clearml/config:/opt/clearml/config
- /opt/clearml/logs:/var/log/clearml
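For reference, a minimal sketch of how the async_delete service entry in the server's docker-compose.yml could look with that mapping; the image tag and the rest of the service definition are assumptions, only the volume lines are taken from above:

async_delete:
  image: allegroai/clearml:latest
  restart: unless-stopped
  volumes:
    # mount the same config directory the other server services use
    - /opt/clearml/config:/opt/clearml/config
    - /opt/clearml/logs:/var/log/clearml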
@<1576381444509405184:profile|ManiacalLizard2> I'll check again 🙂 thanks
I just want to avoid ClearML leaving files lingering around. Btw: in my opinion, a better default behavior would be to delete a task only after its files have been deleted, and to delete the task anyway only with the force option!
Yeah, when the server handles the deletes everything's fine, and imo that is how it should always have been.
I don't think it is a viable option. You are looking at the best case, but I think you should expect the worst from the users 🙂 Also I would rather know there is a problem and have some clutter than to hide it and never be able to fix it because I cannot identify which artifacts are still in use without spending a lot of time comparing artifact IDs.
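On a related note, here is a minimal sketch (not the actual cleanup_service code) of deleting a task together with its stored artifacts via the ClearML SDK; the task ID is a placeholder:

from clearml import Task

# placeholder ID, replace with a real task ID
task = Task.get_task(task_id="<task-id>")
# remove the stored artifacts/models first, then the task itself;
# raise_on_error=False keeps going if some files are already gone
task.delete(
    delete_artifacts_and_models=True,
    skip_models_used_by_other_tasks=True,
    raise_on_error=False,
)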
I guess this is the current way to do it: https://github.com/tensorflow/tensorboard/issues/39#issuecomment-568917607 so I would say: Yes, it supports gif.
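For completeness, a minimal sketch of reporting a GIF through the ClearML logger directly, as an alternative to going through TensorBoard; the project/task names and file path are placeholders:

from clearml import Task

task = Task.init(project_name="examples", task_name="gif report")
# report_media uploads the file as-is (no re-encoding), so the GIF stays animated
task.get_logger().report_media(
    title="animation",
    series="rollout",
    iteration=0,
    local_path="rollout.gif",  # placeholder path
)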
@<1576381444509405184:profile|ManiacalLizard2> Yes, exactly. I just didn't know how, but now it is all working 🙂
And yes, I have multiple credentials in the clearml.conf of the agents. It's not a good solution, but since I am currently limited to the free version of ClearML, it is the best I could do.
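To illustrate what I mean by multiple credentials (assuming per-bucket storage credentials here; bucket names, keys, and secrets are placeholders), the agents' clearml.conf can list several entries:

sdk {
  aws {
    s3 {
      # one entry per bucket the agents need to read/write
      credentials: [
        { bucket: "team-a-bucket", key: "KEY_A", secret: "SECRET_A" },
        { bucket: "team-b-bucket", key: "KEY_B", secret: "SECRET_B" },
      ]
    }
  }
}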
Ah, perfect. Did not know this. Will try! Thanks again! 🙂
Maybe this is something that is only possible with the vault of the enterprise version?
MortifiedDove27 Sure did, but I do not understand it very well, otherwise I would not be asking here for an intuitive explanation 🙂 Maybe you can explain it to me?
https://clearml.slack.com/archives/CTK20V944/p1620855259093200 This thread may also be interesting for you.
I see. Thank you very much. For my current problem, giving priority according to queue priority would kinda solve it. For experimentation I will sometimes enqueue a task and then later enqueue another one of a different kind, but what happens is that even though this could be trivially solved, I will have to wait for the first one to finish. I guess this is only a problem for people with small "clusters" where SLURM does not make sense, but no scheduling at all is also suboptimal.
However, I...
==> 2021-03-11 13:54:59 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
# conda version: 4.9.2
+defaults/linux-64::_libgcc_mutex-0.1-main
+defaults/linux-64::ca-certificates-2021.1.19-h06a4308_1
+defaults/linux-64::certifi-2020.12.5-py38h06a4308_0
+defaults/linux-64::ld_impl_linux-64-2.33.1-h53a641e_7
+defaults/linux-64::libedit-3.1.20191231-h14c3975_1
+defaults/linux-64::libffi-3.3-he6710b0_2
+defaults/linux-64...
drwxr-xr-x 10 root root 4096 Jul 31 2020 .
drwxr-xr-x 14 root root 4096 Jul 31 2020 ..
drwxr-xr-x 2 root root 4096 Feb 4 13:52 bin
drwxr-xr-x 2 root root 4096 Jul 31 2020 etc
drwxr-xr-x 2 root root 4096 Jul 31 2020 games
drwxr-xr-x 2 root root 4096 Jul 31 2020 include
drwxr-xr-x 4 root root 4096 Feb 3 13:40 lib
lrwxrwxrwx 1 root root 9 Dez 10 14:29 man -> share/man
drwxr-xr-x 2 root root 4096 Jul 31 2020 sbin
drwxr-xr-x 7 root root 4096 Jul 31 2020 share
drwxr-xr-x ...
But this seems like something that is not related to clearml 🙂 Anyways, thanks again for the explanations!
Hey AgitatedDove14 is there any update on this?
Ah, now I see. This sounds like a good solution.
Okay, no worries. I will check first. Thanks for helping!
I will try again tomorrow. It's getting late! Thank you for helping so far!
Quick question: where does ClearML place the venv again? I want to take a look at it after the task has failed.
Sure, no problem!
Currently, my solution is to create an "agent-git" account and users can give read access to this account, which the clearml-agent then uses to clone. However, I find access tokens to be a better solution. Unfortunately, clearml-agent removes the token from the git URL.
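For context, a minimal sketch of how the agent-side clearml.conf can carry the credentials instead of putting the token in the URL (the account name and token value are placeholders):

agent {
  # the agent injects these credentials when cloning;
  # with GitHub, a personal access token can be used as the password
  git_user: "agent-git"
  git_pass: "<personal-access-token>"
}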
Local execution output:
ClearML Task: created new task id=855948f5d73c47e2ae37bb821385e15b
======> WARNING! Git diff to large to store (2190kb), skipping uncommitted changes <======
ClearML results page:
uploading artifact
done uploading artifact
2021-02-05 16:24:56,112 - clearml.Task - INFO - Waiting to finish uploads
2021-02-05 16:24:58,499 - clearml.Task - INFO - Finished uploading
Btw: with SSH agent forwarding I do not have any issues ( https://github.com/allegroai/clearml-agent/issues/45 )
Oh, interesting!
So a pip version on a per-task basis makes sense ;D?