I will create a minimal example.
I have set default_output_uri to s3://my_minio_instance:9000/clearml
If I set files_server to s3://my_minio_instance:9000/bucket_that_does_not_exist, it fails at uploading metrics, but model upload still works:
WARNING - Failed uploading to s3://my_minio_instance:9000/bucket_that_does_not_exist ('NoneType' object has no attribute 'upload')
clearml.Task - INFO - Completed model upload to s3://my_minio_instance:9000/clearml
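For reference, this is roughly how the two settings look in my clearml.conf (just a sketch; the hostname and bucket names are the placeholders from above):

```
api {
    # this is the target of the failing metric upload above
    files_server: "s3://my_minio_instance:9000/bucket_that_does_not_exist"
}
sdk {
    development {
        # this is where the model upload still succeeds
        default_output_uri: "s3://my_minio_instance:9000/clearml"
    }
}
```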
What is `default_out...
So: clearml 1.0.1, clearml-agent 1.0.0, and clearml-server from master.
The agent is run in pip mode. However, the docker image uses conda (most probably because NVIDIA uses conda to build PyTorch). My theory is that when the task is run the first time on an agent, Task.init will update the requirements. Then, when it is run a second time, the task will contain the requirements of the (conda) environment from the first run.
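If that theory is right, this is what I would try as a workaround (just a sketch, assuming force_requirements_env_freeze with a requirements file is the right way to pin the recorded packages; project/task names are placeholders):

```python
from clearml import Task

# Sketch: record packages from a fixed requirements.txt instead of freezing
# whatever environment (pip locally, conda in the docker image) the task
# happens to run in. Must be called before Task.init.
Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")

task = Task.init(project_name="examples", task_name="pinned requirements test")
```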
Yea, I am still trying to get docker to work with clearml. I do not have much experience with docker besides creating Dockerfiles, and it seems like the ~/.ssh/config ownership is broken when it is mounted into the container on my workstations.
Yea, but doesn't this feature make sense on a task level? If I remember correctly, some dependencies sometimes require specific pip versions, and dependencies are defined per task.
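For reference, as far as I can tell the only knob right now is the agent-wide setting in clearml.conf, which is exactly the limitation I mean, since it applies to every task that agent runs (version string is just an example):

```
agent {
    package_manager {
        # one pip version for all tasks this agent executes
        pip_version: "<20.2"
    }
}
```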
```python
from clearml import Task

# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name="examples", task_name="artifacts example")
task.set_base_docker(
    "my_docker",
    docker_arguments="--memory=60g --shm-size=60g -e NVIDIA_DRIVER_CAPABILITIES=all",
)

# running_remotely and Timer come from the rest of my script
if not running_remotely():
    task.execute_remotely("docker", clone=False, exit_process=True)

timer = Timer()
with timer:
    # add and upload Numpy Object (stored as .npz file)
    task.upload_a...
```
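The idea being: when I run this locally, execute_remotely registers the task (with my_docker as the base image), enqueues it to the "docker" queue and exits, and the agent then runs the rest inside the container.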
@SmugDolphin23 Good catch. I have good but unsatisfying news for you guys: I restarted the whole machine (server and agent) and now it works fine ...
Or alternatively: I just saw that Task.create takes a requirements.txt as an argument. This would also be fine for me; however, I am not sure whether I should be using Task.create?
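Something like this is what I had in mind (just a sketch; script name, project and queue are placeholders from my setup):

```python
from clearml import Task

# Sketch: define the task directly from a script plus a pinned requirements
# file, instead of letting Task.init analyse the imports at runtime.
task = Task.create(
    project_name="examples",
    task_name="created with pinned requirements",
    script="train.py",
    requirements_file="requirements.txt",
)

# then enqueue it for an agent to pick up
Task.enqueue(task, queue_name="docker")
```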
I have to correct myself: I do not even have CUDA installed. There is only the driver, and everything CUDA-related is provided by the docker container. This works with a container that has CUDA 11.4, but now I have one with 11.6 (the latest NVIDIA PyTorch docker image).
However, even after changing the clearml.conf and overriding with CUDA_VERSION, the clearml-agent inside the docker container still prints agent.cuda_version = 114! (Other changes to the clearml.conf on the agent are reflected in the docker, so only...
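For completeness, this is roughly what I tried (a sketch, assuming agent.cuda_version in clearml.conf is the right key), in addition to exporting CUDA_VERSION=116 in the agent's environment before starting it:

```
# clearml.conf on the agent machine
agent {
    # tried forcing the version the agent reports (it keeps printing 114)
    cuda_version: 116
}
```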
And in the WebUI I can see arguments similar to the second print statement's.
Okay, but are your logs still stored on MinIO when only using sdk.development.default_output_uri?
Thanks, I will look into it. For me the weird thing is that saving works and only deletion fails somehow.