I will debug this myself a little more.
These are the errors I get if I use file_servers without a bucket (s3://my_minio_instance:9000):
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key and secret for S3 storage access ( )
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key...
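For reference, this is the kind of credentials section clearml expects for a MinIO endpoint in clearml.conf, as far as I understand; a minimal sketch, with placeholder key/secret values:

sdk {
    aws {
        s3 {
            credentials: [
                {
                    # MinIO endpoint as host:port, without the s3:// scheme
                    host: "my_minio_instance:9000"
                    key: "minio-access-key"       # placeholder
                    secret: "minio-secret-key"    # placeholder
                    multipart: false
                    secure: false                 # plain HTTP endpoint
                }
            ]
        }
    }
}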
I use fixed users!
It seems like the services-docker is always started with Ubuntu 18.04, even when I use task.set_base_docker("continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount)).
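For clarity, here is a minimal sketch of how I invoke it; the project, task and queue names as well as the container-side mount path are just illustrative:

from clearml import Task

# illustrative project/task names
task = Task.init(project_name="examples", task_name="services docker test")

# single-string form: image name followed by extra docker run arguments;
# the mount target inside the container is a placeholder
task.set_base_docker(
    "continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:/mnt/fileserver"
)

# send the task off to be picked up by the services agent
task.execute_remotely(queue_name="services")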
Is this working in the latest version? clearml-agent falls back to /usr/bin/python3.8 no matter how I configure clearml.conf. Just want to make sure, so I can investigate what's wrong with my machine if it is working for you.
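To be concrete, this is the setting I am changing; a sketch of the relevant clearml.conf section, with the interpreter path only as an example:

agent {
    # tell the agent which interpreter to use instead of the auto-detected one
    python_binary: "/usr/bin/python3.9"
}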
What do you mean by "Why not add the extra_index_url to the installed packages part of the script?"?
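For context, the alternative I had in mind is the agent-side setting; a sketch of clearml.conf with a made-up index URL:

agent {
    package_manager {
        # extra pip index consulted when the agent installs the task's packages
        extra_index_url: ["https://my.private.pypi/simple"]
    }
}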
Maybe deletion happens "async" and is not reflected in parts of clearml? It seems that if I try to delete often enough, at some point it is successful.
But here is the funny thing:
channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0
This installs the GPU build of PyTorch.
I restarted it after I got the errors, because as everyone knows, turning it off and on usually works 😄
Okay, thanks for explaining!
I am also wondering how to integrate my (preexisting) main task into the pipeline. I start my main task like this: python my_script.py --myarg "myargs". How are the arguments captured? I am very confused about how one integrates this correctly...
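To make my setup concrete, here is a minimal sketch of how I understand the argument capture to work: calling Task.init hooks into argparse, so --myarg shows up on the task and can be overridden when the pipeline clones it (project and task names are made up):

import argparse

from clearml import Task

def main():
    # Task.init patches argparse, so the parsed arguments are recorded
    # as the task's hyperparameters (and can be overridden on clone)
    task = Task.init(project_name="my_project", task_name="main_task")

    parser = argparse.ArgumentParser()
    parser.add_argument("--myarg", type=str, default="myargs")
    args = parser.parse_args()

    print(args.myarg)

if __name__ == "__main__":
    main()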
For now I can tell you that with conda_freeze: true it fails, but with conda_freeze: false it works!
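For reference, this is the switch I am toggling in the agent's clearml.conf; a sketch:

agent {
    package_manager {
        type: conda
        # with conda_freeze: true the task fails for me, with false it works
        conda_freeze: false
    }
}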
I see. I was just wondering what the general approach is. I think PyTorch used to ship the pip package without CUDA packaged into it. So with conda it was nice to only install CUDA in the environment and not the host. But with pip, you had to use the host version as far as I know.
I have no idea myself, but what the serverfault thread says about man-in-the-middle attacks makes sense. However, this also prohibits an automatic solution, except for a shared known_hosts file, I guess.
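One way I could imagine distributing such a shared known_hosts file to the agent containers is mounting it read-only via the agent's extra docker arguments; a sketch, where the host-side path is an assumption:

agent {
    # mount a pre-populated known_hosts into every task container (read-only)
    extra_docker_arguments: [
        "-v", "/opt/clearml/ssh/known_hosts:/root/.ssh/known_hosts:ro"
    ]
}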
Sounds like a good hack, but not like a good solution 😄 But thank you anyways! 🙂
What's the reason for the shift?
So the environment variables are not set by the clearml-agent, but by clearml itself
Well, I guess no hurdles vs. safety is inherently not solvable. I am all for hurdles, if it is clear how to overcome them. And in my opinion, referring to clearml-init is something that makes sense from both a developer and a user perspective.
But I do not have anything linked correctly, since I rely on conda installing CUDA/cuDNN for me.
I just wanna add: I can run this task on the same workstation with the same conda installation just fine.
I see, so it is actually not related to clearml 🎉
Yea, but before in my original setup the config file was filled. I just added some lines to the config and now the error is back.
It is weird though. The task is submitted by the original user and then run on the agent. The task, however, is still registered to the original user, since it was created by the original user.
Wouldn't it make more sense to just inherit the user from the task rather than from the agent?
Okay, great! I just want to run the cleanup service, but I am running into SSH issues, so I wanted to restart it to try to debug.