If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion.
Is sdk.development.default_output_uri used with s3://ip:9000/clearml or ip:9000/clearml?
Thank you very much for the fast work!
One last question: Is it possible to set the pip_version task-dependent?
And how do I specify this in the output_uri? The default file server is specified by passing True. How would I specify that the second one should be used?
I only added

# Python 3.8.2 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu

and I used an amd64/ubuntu:20.04 docker image with python3.8. Same error. If it is not too much to ask, could you try to run it with this docker image?
I have a carla.egg file on my local machine and on the worker that I add with sys.path.append before I can do import carla. The procedure is the same on my local machine and on the clearml-agent worker.
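The sys.path.append step described above could look roughly like the sketch below. The egg location is a hypothetical placeholder (the thread does not give the real path), and the helper name add_egg_to_path is made up for illustration.

```python
import sys
from pathlib import Path


def add_egg_to_path(egg_path: str) -> bool:
    """Append an .egg archive to sys.path if the file exists.

    Returns True if the egg is on sys.path after the call, False otherwise.
    """
    egg = Path(egg_path).expanduser()
    if not egg.is_file():
        # Missing file: leave sys.path untouched so a later import fails loudly.
        return False
    entry = str(egg)
    if entry not in sys.path:
        sys.path.append(entry)
    return True


# Hypothetical location; replace with the real path on each machine.
CARLA_EGG = "/path/to/carla.egg"

if add_egg_to_path(CARLA_EGG):
    import carla  # noqa: F401  (only importable once the egg is on sys.path)
```

Checking that the file exists before appending makes it easier to tell a wrong path on the worker apart from a genuine import problem inside the egg.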
For example, in our case we do reinforcement learning and we would call a script like this: python run_openai_gym.py some_package.my_agent.
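The thread does not show how run_openai_gym.py consumes the module path, so the following is only one plausible sketch: it reads a dotted path from the command line and imports it with importlib. The function names and the argument name "agent" are assumptions for illustration.

```python
import argparse
import importlib


def load_agent_module(dotted_path: str):
    """Import and return the module named by a dotted path, e.g. 'some_package.my_agent'."""
    return importlib.import_module(dotted_path)


def parse_args(argv=None):
    """Parse the single positional argument: the dotted module path of the agent."""
    parser = argparse.ArgumentParser(description="Run an agent given by its module path.")
    parser.add_argument("agent", help="dotted module path, e.g. some_package.my_agent")
    return parser.parse_args(argv)


def main(argv=None):
    """Resolve the agent module from the command line and return it."""
    args = parse_args(argv)
    return load_agent_module(args.agent)
```

Calling main(["some_package.my_agent"]) would import that module, provided the package is importable in the environment where the task runs.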
Good to know!
I think the current solutions are fine. I will try it first and probably will have some more questions/problems.
The default behavior mimics Python's assert statement: validation is on by default, but is disabled if Python is run in optimized mode (via python -O). Validation may be expensive, so you may want to disable it once a model is working.
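The behavior in the quoted passage can be illustrated with Python's built-in __debug__ constant, which is True normally and False under python -O. This is a generic sketch of the pattern, not the library's implementation; resolve_validate_args and checked_sqrt are hypothetical names.

```python
def resolve_validate_args(validate_args=None):
    """Return the explicit setting if given, otherwise follow assert semantics."""
    if validate_args is None:
        # __debug__ is True in a normal run and False when Python is started with -O.
        return __debug__
    return bool(validate_args)


def checked_sqrt(x, validate_args=None):
    """Square root with an optional (possibly expensive) argument check."""
    if resolve_validate_args(validate_args) and x < 0:
        raise ValueError(f"expected a non-negative value, got {x}")
    return x ** 0.5
```

Passing validate_args=False skips the check regardless of how the interpreter was started, which corresponds to disabling validation once a model is working.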
Maybe the difference is that I am using pip now and I used to use conda! The NVIDIA PyTorch container uses conda. Could that be a reason?
Hi KindChimpanzee37, I was asking more about the general idea of making these settings task-specific, but thank you for the suggestion anyway. I will definitely apply it.
Is there a way to see the contents of /tmp/conda_envaz1ne897.yml? It seems to be deleted after the task is finished.
And in the WebUI I can see arguments similar to the second print statement's.
These are the errors I get if I use file_servers without a bucket (s3://my_minio_instance:9000):
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key and secret for S3 storage access ( )
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key...
Is there a simple way to get the response of the MinIO instance? Then I can verify whether the problem is the MinIO instance or my client.
I will debug this myself a little more.
name: core
channels:
- pytorch
- anaconda
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- blas=1.0
- bzip2=1.0.8
- ca-certificates=2020.10.14
- certifi=2020.6.20
- cloudpickle=1.6.0
- cudatoolkit=11.1.1
- cycler=0.10.0
- cytoolz=0.11.0
- dask-core=2021.2.0
- decorator=4.4.2
- ffmpeg=4.3
- freetype=2.10.4
- gmp=6.2.1
- gnutls=3.6.13
- imageio=2.9.0
- jpeg=9b
- kiwisolver=1.3.1
- lame=3.100
- lcms2=2.11
-...
Thank you very much!
Setting the api.files_server: s3://myhost:9000/clearml in clearml.conf works!
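Since the thread contrasts s3://myhost:9000/clearml (works), s3://my_minio_instance:9000 (no bucket, fails) and ip:9000/clearml (no scheme), a small client-side check of the URI shape may help before editing clearml.conf. This sketch uses only the standard library and does not reflect how ClearML itself parses the value; both function names are hypothetical.

```python
from urllib.parse import urlparse


def describe_s3_uri(uri: str) -> dict:
    """Split a URI into scheme, host, port, bucket and optional key prefix."""
    parsed = urlparse(uri)
    # The first path segment is treated as the bucket, the rest as a prefix.
    bucket, _, prefix = parsed.path.lstrip("/").partition("/")
    return {
        "scheme": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        "bucket": bucket or None,
        "prefix": prefix or None,
    }


def looks_like_minio_uri(uri: str) -> bool:
    """True if the URI has the s3://host:port/bucket shape that worked in the thread."""
    info = describe_s3_uri(uri)
    return (
        info["scheme"] == "s3"
        and bool(info["host"])
        and info["port"] is not None
        and info["bucket"] is not None
    )
```

For example, looks_like_minio_uri("s3://myhost:9000/clearml") is True, while the bucket-less and scheme-less variants both return False.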
Nvm, that does not seem to be a problem. I added a part to the logs in the post above. It shows that some packages are found from conda.
Is there a way to capture uncommitted changes with Task.create just like Task.init does? Actually, I would like to populate the repo, branch and packages automatically...
When I passed specific arguments (for example --steps) it ignored them...
I am not sure what you mean by this. It should not ignore anything.
One question: Does clearml resolve the CUDA version from the driver or from conda?