Is there a way to specify this on a per task basis? I am running clearml-agent in docker mode btw.
Thank you very much for the fast work!
One last question: Is it possible to set the pip_version task-dependent?
And how do I specify this in the output_uri? The default file server is specified by passing True. How would I specify using the second one?
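For context, a minimal sketch of what I mean (host and bucket are placeholders for the second server, not our real setup):

```python
from clearml import Task

# Instead of output_uri=True (which uses the default files server),
# pass the URI of the second server directly.
task = Task.init(
    project_name="examples",
    task_name="upload to second file server",
    output_uri="s3://my_second_server:9000/clearml",
)
```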
I only added

# Python 3.8.2 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu

and I used an amd64/ubuntu:20.04 docker image with python3.8. Same error. If it is not too much to ask, could you try to run it with this docker image?
I have a carla.egg file on my local machine and on the worker that I include with sys.path.append before I can do import carla. It is the same procedure on my local machine and on the clearml-agent worker.
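Roughly what both environments do before the import (the egg path is a placeholder):

```python
import sys

# Placeholder path; it differs between my local machine and the worker.
sys.path.append("/path/to/carla-0.9.13-py3.8-linux-x86_64.egg")

import carla  # resolved from the egg appended above
```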
For example, in our case we do reinforcement learning and then we would call a script like this: python run_openai_gym.py some_package.my_agent
Good to know!
I think the current solutions are fine. I will try it first and probably will have some more questions/problems 🙂
Maybe the difference is that I am using pip now and I used to use conda! The NVIDIA PyTorch container uses conda. Could that be a reason?
Hi KindChimpanzee37 I was more asking about the general idea to make these settings task-specific, but thank you for the suggestion anyways, I will definitely apply it.
And in the WebUI I can see arguments similar to the second print statement's.
These are the errors I get if I use files_server without a bucket (s3://my_minio_instance:9000)
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key and secret for S3 storage access ( )
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object Reason: Missing key...
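If these errors really are just about missing credentials, a clearml.conf section along these lines should cover the MinIO case (key/secret are placeholders, and secure: false assumes plain HTTP on port 9000):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    host: "my_minio_instance:9000"
                    key: "<minio_access_key>"
                    secret: "<minio_secret_key>"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}
```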
Is there a simple way to get the response of the MinIO instance? Then I can verify whether it is the MinIO instance or my client
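One way I could check that independently of ClearML is to talk to MinIO directly, e.g. with boto3 (endpoint and credentials are placeholders):

```python
import boto3

# Placeholder endpoint and credentials for the MinIO instance.
s3 = boto3.client(
    "s3",
    endpoint_url="http://my_minio_instance:9000",
    aws_access_key_id="<minio_access_key>",
    aws_secret_access_key="<minio_secret_key>",
)

# If this call fails, the problem is on the MinIO/credentials side, not in my client.
print(s3.list_buckets()["Buckets"])
```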
name: core
channels:
- pytorch
- anaconda
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- blas=1.0
- bzip2=1.0.8
- ca-certificates=2020.10.14
- certifi=2020.6.20
- cloudpickle=1.6.0
- cudatoolkit=11.1.1
- cycler=0.10.0
- cytoolz=0.11.0
- dask-core=2021.2.0
- decorator=4.4.2
- ffmpeg=4.3
- freetype=2.10.4
- gmp=6.2.1
- gnutls=3.6.13
- imageio=2.9.0
- jpeg=9b
- kiwisolver=1.3.1
- lame=3.100
- lcms2=2.11
-...
Thank you very much! 😃
Setting the api.files_server: s3://myhost:9000/clearml in clearml.conf works!
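For reference, the relevant part of my clearml.conf now looks roughly like this (other api settings unchanged):

```
api {
    # api_server, web_server, credentials, ... unchanged
    files_server: "s3://myhost:9000/clearml"
}
```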
Nvm, that does not seem to be a problem. I added a part to the logs in the post above. It shows that some packages are found from conda.
Is there a way to capture uncommitted changes with Task.create just like Task.init does? Actually, I would like to populate the repo, branch and packages automatically...
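What I have in mind is something along these lines (values are placeholders; whether uncommitted changes can be captured this way is exactly my question):

```python
from clearml import Task

# Placeholder repo/branch/packages; ideally these would be detected
# automatically, the way Task.init does it.
task = Task.create(
    project_name="reinforcement-learning",
    task_name="created-not-inited",
    repo="https://github.com/me/my_repo.git",
    branch="main",
    script="run_openai_gym.py",
    packages=["clearml", "gym"],
)
```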
One question: Does clearml resolve the CUDA Version from driver or conda?
args = parser.parse_args()
print(args)  # args PRINTED HERE ON LOCAL
command = args.command
enqueue = args.enqueue
track_remote = args.track_remote
preset_name = args.preset
type_name = args.type
environment_name = args.environment
nvidia_docker = args.nvidia_docker

# Initialize ClearML Task
task = (
    Task.init(
        project_name="reinforcement-learning/" + type_name,
        task_name=args.name or preset_name,
        tags=...
That seems to be the case. After parsing the args I run task = Task.init(...) and then task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True) .
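Trimmed down, the flow is roughly this (argument definitions shortened to the ones used here):

```python
import argparse
from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--enqueue")
parser.add_argument("--name")
parser.add_argument("--preset")
parser.add_argument("--type")

args = parser.parse_args()  # printed here, these are still the local values

task = Task.init(
    project_name="reinforcement-learning/" + args.type,
    task_name=args.name or args.preset,
)

# Locally this enqueues the task and exits the process; on the worker it is a
# no-op and execution continues with the (possibly UI-edited) arguments.
task.execute_remotely(queue_name=args.enqueue, clone=False, exit_process=True)
```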
Python 3.8.8, clearml 1.0.2
Good, at least now I know it is not a user-error 😄