In my case I use the conda freeze option and do not even have CUDA installed on the agents.
I was wrong: I think it uses the agent.cuda_version setting, not the local env's CUDA version.
Ah, perfect. Did not know this. Will try! Thanks again!
Ah, very cool! Then I will try this, too.
I just updated my server to 1.0 and now the services agent is stuck in restarting:
It seems like the services-docker is always started with Ubuntu 18.04, even when I use task.set_base_docker("continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount)).
However, to use conda as the package manager I need a docker image that provides conda.
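For context, this is roughly the full snippet I am using (a sketch; the project/task names are placeholders, the mount variable is the same as above):

```python
from clearml import Task

# Placeholder: where the fileserver data should be mounted inside the container
file_server_mount = "/mnt/fileserver"

task = Task.init(project_name="examples", task_name="conda-based-task")

# Single-string form: "<base docker image> <extra docker arguments>".
# Here I ask for a conda-providing image plus a -v mount of the fileserver data.
task.set_base_docker(
    "continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount)
)
```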
```
docker-compose ps
          Name                       Command              State                      Ports
clearml-agent-services   /usr/agent/entrypoint.sh        Restarting
clearml-apiserver        /opt/clearml/wrapper.sh ap ...  Up          0.0.0.0:8008->8008/tcp, 8080/tcp, 8081/tcp ...
```
How can I get the agent log?
It is not explained there, but do you mean CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-} and CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}?
I see. Thanks a lot!
Hey Martin, thank you for answering!
I see your point, however in my opinion this is really unexpected behavior. Sure, I can do some work to make it "safe", but shouldn't that be the default? So throw an error when there is no clearml.conf and expect CLEARML_USE_DEFAULT_SERVER=1.
Well, I guess "no hurdles" vs. safety is inherently not solvable. I am all for hurdles, if it is clear how to overcome them. And in my opinion referring to clearml-init is something which makes sense from both a developer and a user perspective.
Installed packages:
```
# Python 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
absl-py==0.12.0
aiostream==0.4.2
attrs==20.3.0
cached-property==1.5.2
cffi==1.14.5
chardet==4.0.0
clearml==0.17.5
cython==0.29.22
dm-control==0.0.364896371
dm-env==1.4
dm-tree==0.1.5
fasteners==0.16
furl==2.1.0
future==0.18.2
glfw==2.1.0
gym==0.18.0
h5py==3.2.1
humanfriendly==9.1
idna==2.10
imageio-ffmpeg==0.4.3
importlib-metadata==3.7.3
jsonschema==3.2.0
labmaze==1.0.4
lxml==4.6.3
moviepy==1.0.3
mujoco-py==...
```
I got some warnings about broken packages. I cleaned the conda cache with conda clean -a and now it installed fine!
Good to know the --debug flag exists in master!
Give me 5 min and I'll send the full log.
Yeah, I know, I reported this.
The default behavior mimics Python's assert statement: validation is on by default, but is disabled if Python is run in optimized mode (via python -O). Validation may be expensive, so you may want to disable it once a model is working.
I don't know, actually. But the PyTorch documentation says it can make a difference: https://pytorch.org/docs/stable/distributions.html#torch.distributions.distribution.Distribution.set_default_validate_args
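For reference, a minimal sketch of what toggling that validation looks like (purely illustrative, based on the linked docs):

```python
import torch
from torch.distributions import Distribution, Normal

# Turn off argument validation globally (mirrors running Python with -O);
# the docs suggest disabling it once a model is known to be working.
Distribution.set_default_validate_args(False)

# It can also be controlled per instance via validate_args
dist = Normal(loc=torch.zeros(3), scale=torch.ones(3), validate_args=False)
print(dist.sample())
```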
Is there a way to specify this on a per task basis? I am running clearml-agent in docker mode btw.
Thank you very much, didn't know about that!
I can put anything there: s3://my_minio_instance:9000/bucket_that_does_not_exist and it will work.
These are the errors I get if I use files_server without a bucket (s3://my_minio_instance:9000):
```
2022-11-16 17:13:28,852 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key and secret for S3 storage access ( )
2022-11-16 17:13:28,853 - clearml.metrics - WARNING - Failed uploading to ('NoneType' object has no attribute 'upload_from_stream')
2022-11-16 17:13:28,854 - clearml.storage - ERROR - Failed creating storage object
Reason: Missing key...
```
I have set default_output_uri to s3://my_minio_instance:9000/clearml.
If I set files_server to s3://my_minio_instance:9000/bucket_that_does_not_exist it fails at uploading metrics, but model upload still works:
WARNING - Failed uploading to s3://my_minio_instance:9000/bucket_that_does_not_exist ('NoneType' object has no attribute 'upload')
clearml.Task - INFO - Completed model upload to s3://my_minio_instance:9000/clearml
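For completeness, this is roughly how the task gets pointed at that bucket (a sketch; the project/task names are placeholders), as opposed to the files_server / default_output_uri values in clearml.conf:

```python
from clearml import Task

# Per-task destination for models/artifacts; this overrides the
# default_output_uri from clearml.conf for this task only
# (project/task names below are placeholders)
task = Task.init(
    project_name="examples",
    task_name="minio-output-test",
    output_uri="s3://my_minio_instance:9000/clearml",
)
```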
What is default_out...