Yea, and the script ends with clearml.Task - INFO - Waiting to finish uploads
So my network seems to be fine. Downloading artifacts from the server to the agents is around 100 MB/s, while uploading from the agent to the server is slow.
An upload of 11 GB took around 20 hours, which cannot be right. Do you have any idea whether ClearML could have something to do with this slow upload speed? If not, I am going to start debugging the hardware/network.
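To put the numbers above side by side, here is a quick back-of-the-envelope sketch. It assumes decimal units (1 GB = 1000 MB), which the post does not specify, and the function name is only illustrative. Under that assumption the upload averages roughly 0.15 MB/s, far below the reported 100 MB/s download rate.

```python
# Effective throughput implied by the figures quoted above.
# Assumption: decimal units (1 GB = 1000 MB); the post does not say which.

def throughput_mb_per_s(size_gb: float, hours: float) -> float:
    """Average transfer rate in MB/s for size_gb transferred in the given hours."""
    return size_gb * 1000 / (hours * 3600)

upload_rate = throughput_mb_per_s(11, 20)  # slow agent -> server upload
download_rate = 100.0                      # MB/s, server -> agent, as reported

print(f"upload: {upload_rate:.3f} MB/s")
print(f"download is about {download_rate / upload_rate:.0f}x faster")
```

A gap of this size is hard to explain with ordinary link variance, which is why it seems worth ruling out the software side before swapping hardware.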
But it is not related to network speed, rather to clearml. A simple file transfer test gives me approximately 1 Gbit/s transfer rate between the server and the agent, which is to be expected from the 1 Gbit/s network.
Yea, correct! No problem. Uploading such large artifacts as I am doing seems to be an absolute edge case 🙂
I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.
I just realized that I forgot again that I am using importlib, and this is probably why everything's weird. I tried to reproduce the error with a smaller project and was not able to get the error again. Sorry for having wasted your time! 😐
Thanks for your help again. I will just use detect_with_conda_freeze: true. Seems like a perfect solution for me!
I was wondering whether some solution is built into clearml, so I do not have to configure each server manually. However, from your answer I take it that this is not the case.
I just wanna add: I can run this task on the same workstation with the same conda installation just fine.
btw: I also tested the clearml-agent running on a different machine and with python 3.8 and I get the same problems.
So only a short update for today: I did not yet start a run with conda 4.7.12.
But one question: conda cannot actually be at fault here, right? I can install pytorch just fine locally on the agent when I do not use clearml(-agent).
Or there should be an early error for trying to run conda-based tasks on pip agents.
Can you ping me when it is updated in None so I can update my installation?
Would it help you diagnose this problem if I ran conda env create --file=environment.yml and checked whether it works?
In the logs from running with --foreground, I do not see any conda create command.
Yes, that works fine. Just the http vs https was the problem. The UI will automatically change s3://<minio-address>:<port> to http://<minio-address>:<port> in http://myclearmlserver.org/settings/webapp-configuration . However, what I need is https://<minio-address>:<port>
To answer my own question: In the WebUI where one inputs the credentials, use https for the host instead of the auto-added http
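For anyone scripting this, the rewrite described above can be sketched as a tiny helper. This is only an illustration of the s3:// to https:// mapping, not ClearML's own code; minio_endpoint, its secure flag, and the example hostname are all hypothetical.

```python
from urllib.parse import urlsplit

def minio_endpoint(bucket_url: str, secure: bool = True) -> str:
    """Turn s3://host:port[/path] into http(s)://host:port (hypothetical helper)."""
    parts = urlsplit(bucket_url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"expected s3://host:port, got {bucket_url!r}")
    # The WebUI auto-fills http; a TLS-enabled MinIO needs https instead.
    scheme = "https" if secure else "http"
    return f"{scheme}://{parts.netloc}"

# Placeholder address, standing in for <minio-address>:<port>.
print(minio_endpoint("s3://minio.example.org:9000/bucket"))
print(minio_endpoint("s3://minio.example.org:9000", secure=False))
```

Defaulting secure to True mirrors my setup; if your MinIO is served over plain http, the auto-added value is already correct.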