Thanks a lot. But even for a user, I cannot set a default for all projects, right?
I see, I just checked the logs and it shows:
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f246f0d6c18>: Failed to establish a new connection: [Errno 111] Connection refused
[2022-04-29 08:45:55,018] [9] [WARNING] [elasticsearch] POST [status:N/A request:0.000s]
Unfortunately, there are no logs in /usr/share/elasticsearch/logs to see what Elasticsearch was up to.
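In the meantime, this is the quick check I would run to see whether Elasticsearch is reachable at all (host and port are assumptions; adjust them to however your docker-compose exposes the container):
```python
import requests

# Minimal reachability check against the Elasticsearch REST API.
# localhost:9200 is an assumption about where the container is exposed.
resp = requests.get("http://localhost:9200/_cluster/health", timeout=5)
print(resp.status_code, resp.json())
```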
Or maybe a different question: what is not Artifacts and Models? Debug samples (or anything else the Logger class creates)?
Also, is it not possible to use multiple files servers? E.g. log tasks to different S3 buckets without changing clearml.conf?
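For context, what I had in mind is roughly this, assuming output_uri can be overridden per task (project, task and bucket names are just placeholders):
```python
from clearml import Task

# Sketch: send this task's models/artifacts to a specific bucket,
# without touching clearml.conf (bucket name is a placeholder).
task = Task.init(
    project_name="examples",
    task_name="custom bucket upload",
    output_uri="s3://my_minio_instance:9000/other_bucket",
)
```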
Thank you. I am not trying to use this option to speed up the setup. I have a package (the CARLA simulator PythonAPI) that has no pip support (only easy_install), so I am thinking about just installing it manually on the worker, so that tasks can assume that CARLA is provided by the system.
I guess then it is hard to solve and probably not worth it for me to make suggestions without any knowledge about the internals 😕 Seems like a small weakness in the design of the open-source version. But not much of an issue 🙂
When I go into the GUI there are no artifacts displayed.
Yeah, I am still trying to get Docker to work with ClearML. I do not have much experience with Docker besides creating Dockerfiles, and it seems like the ~/.ssh/config ownership is broken when it is mounted into the container on my workstations.
I can put anything there: s3://my_minio_instance:9000/bucket_that_does_not_exist and it will work.
I have set default_output_uri to s3://my_minio_instance:9000/clearml
If I set files_server to s3://my_minio_instance:9000/bucket_that_does_not_exist it fails at uploading metrics, but model upload still works:
WARNING - Failed uploading to s3://my_minio_instance:9000/bucket_that_does_not_exist ('NoneType' object has no attribute 'upload')
clearml.Task - INFO - Completed model upload to s3://my_minio_instance:9000/clearml
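Just to make sure we are talking about the same settings, this is a sketch of the relevant parts of my clearml.conf as I understand them (same MinIO endpoint and buckets as above):
```
api {
    # where metric/debug-sample uploads go (the "files server")
    files_server: "s3://my_minio_instance:9000/bucket_that_does_not_exist"
}
sdk {
    development {
        # default destination for model/artifact uploads
        default_output_uri: "s3://my_minio_instance:9000/clearml"
    }
}
```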
What is `default_out...
mytask.get_logger().current_logger().set_default_upload_destination("s3://ip:9000/clearml")
this is what I do. Do you do the same?
I am pretty sure there is a flag in the clearml.conf where you can specify which python binary to use.
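I think it is something like this in clearml.conf (the interpreter path is just an example, and I have not double-checked the exact key):
```
agent {
    # use this interpreter for the virtualenvs the agent creates
    python_binary: "/usr/bin/python3.8"
}
```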
Could be that the log was cleared by the restart. Unfortunately, I restarted the server right away 😞 I will post again if it happens, with the appropriate logs.
Makes sense, but does this mean we are not able to tell clearml-agent where to save on a per-task basis? I see output_destination set correctly in the ClearML web interface, but as you say, clearml-agent always uses its api.files_server?
SuccessfulKoala55 I just had the issue again. The logs show nothing of interest. It looks like OOM to me, but I will test this again with a much larger swap, so the server only slows down but does not kill anything. Unfortunately, the kernel logs also do not show much (maybe I have my server logs misconfigured, I am no expert).
What is interesting, though, is that docker only showed my nginx, MinIO, and docker-registry containers as exited, while all the ClearML containers were still running. I restarted ...
Works with 1.4. Sorry for not checking versions myself!
I created an issue on using conda as package manager: https://github.com/allegroai/clearml-agent/issues/44
Yea, I also get it when I zoom in.
One question: does ClearML resolve the CUDA version from the driver or from conda?
AppetizingMouse58 Thank you very much. I forgot the volume mapping.
So can I just add the config to the async_delete container and mirror the directory structure from GitHub?
volumes:
- /opt/clearml/config:/opt/clearml/config
- /opt/clearml/logs:/var/log/clearml
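Something like this is what I mean, i.e. the volumes from above added under the async_delete service in docker-compose.yml (the rest of the service definition stays as in the upstream file, so treat this only as a sketch):
```yaml
services:
  async_delete:
    # ... image/entrypoint etc. unchanged from the upstream docker-compose.yml ...
    volumes:
      - /opt/clearml/config:/opt/clearml/config
      - /opt/clearml/logs:/var/log/clearml
```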
I installed my local conda environment from an environment.yml without issues, so maybe clearml makes some changes that lead to conflicts, which finally lead to the CPU-version install.
Tried to install cudatoolkit==11.1 manually in this environment and got:
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Package xz conflicts for:
python=3....
Here it is
My driver says "CUDA Version: 11.2" (I am not even sure this is correct, since I do not remember installing CUDA on this machine, but idk) and there is no pytorch for 11.2, so maybe it falls back to cpu?
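This is how I checked what the installed build actually supports (plain torch introspection, nothing ClearML-specific):
```python
import torch

# Which CUDA version the installed torch build was compiled against
# (None means a CPU-only build), and whether a GPU is usable at runtime.
print("torch build CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```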
Thank you very much! 😃
Thank you very much, didn't know about that 🙂
=============
== PyTorch ==
NVIDIA Release 22.03 (build 33569136)
PyTorch Version 1.12.0a0+2c916ef ...
Looking in indexes: ,
Requirement already satisfied: pip in /root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages (22.0.4)
2022-04-07 16:40:57
Looking in indexes: ,
Requirement already satisfied: Cython in /opt/conda/lib/python3.8/site-packages (0.29.28)
Looking in indexes: ,
Requirement already satisfied: numpy==1.22.3 in /opt/conda/...
Is there a simple way to get the response of the MinIO instance? Then I can verify whether the problem is the MinIO instance or my client.
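What I am thinking of is talking to MinIO directly with boto3, bypassing ClearML entirely (endpoint, credentials and bucket below are placeholders):
```python
import boto3

# Direct check against the MinIO endpoint, independent of ClearML,
# to see whether the bucket is reachable with these credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="http://my_minio_instance:9000",
    aws_access_key_id="<access_key>",
    aws_secret_access_key="<secret_key>",
)
print(s3.list_objects_v2(Bucket="clearml", MaxKeys=5))
```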