Yea, tensorboardX is using moviepy.
I have to correct myself: I do not even have CUDA installed. Only the driver is installed; everything CUDA-related is provided by the docker container. This works with a container that has CUDA 11.4, but now I have one with 11.6 (the latest NVIDIA PyTorch docker).
However, even after changing the clearml.conf and overriding with CUDA_VERSION, the clearml-agent still prints agent.cuda_version = 114 inside the docker container! (Other changes to the clearml.conf on the agent are reflected in the docker, so only...
Hi CostlyOstrich36, thank you for answering so quickly. I think that's not how it works, because if that were true, one would always have to match the local machine to the servers. Afaik clearml finds the correct PyTorch version, but I was not sure how (whether clearml's custom logic or pip does it).
I used the wrong docker container. The docker container I used had version 11.4. Interestingly, the override from clearml.conf and the CUDA_VERSION env variable did not work there.
With the correct docker container everything works fine. Shame on me.
Nvm, I think it's my mistake. I will investigate.
The agent and server also have similar hardware, so I would expect the same read/write speed.
I will debug this myself a little more.
Perfect, works! I was looking for "host"; it didn't come to my mind to search for "worker". Any idea about getting the user that created the task?
Or maybe even better: how can I get all the information shown on the "INFO" page of a task in the WebUI?
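For what it's worth, a rough sketch of how one might pull those fields programmatically (the task ID is a placeholder; that export_task() covers most of the "INFO" page and that task.data.user holds the creating user's ID are assumptions on my part):
` from clearml import Task

# placeholder task ID
task = Task.get_task(task_id="<task_id>")

# full task record as a dict; assumed to cover most of what the WebUI "INFO" page shows
info = task.export_task()
print(sorted(info.keys()))

# the creating user seems to be stored as a user ID on the raw task object (assumption)
print(task.data.user) `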
I think doing all that work is not worth it right now; I am just trying to understand why clearml does not seem to be designed for something like this:
` task_name = args.task_name
task = Task()
task = task.load_statedict(await Task.load_or_create(task_name))
task.requirements.add(...)
await task.synchronize()
task.execute_remotely(queue_name, exit=True) `
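As far as I can tell, the closest thing to that flow with the current synchronous API would be something like this (just a sketch; project and queue names are made up, and reusing a task with the same name is, as far as I know, the default behaviour of Task.init):
` import argparse

from clearml import Task

parser = argparse.ArgumentParser()
parser.add_argument("--task_name", default="my_task")
args = parser.parse_args()

# create the task, or reuse the previous run with the same name (default behaviour)
task = Task.init(project_name="my_project", task_name=args.task_name)

# enqueue it for remote execution on a worker and stop the local process
task.execute_remotely(queue_name="default", exit_process=True) `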
Also, is max_workers about compression threads or upload threads or both?
I see. But I just realized: subsampling means you just show every n-th datapoint, right? I still do not get why this leads to some 0.5 values when my plot should only contain 0 and 1.
[2021-05-07 10:53:00,566] [9] [WARNING] [elasticsearch] POST [status:N/A request:60.061s]
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib64/python3.6/http/client.py", lin...
Or there should be an early error when trying to run conda-based tasks on pip-based agents.
Is there a way for me to configure/add the run arguments for the docker run call?
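One hedged option: as in the set_base_docker call quoted later in this thread, extra arguments for docker run can apparently be appended after the image name (the image and mount below are made up):
` from clearml import Task

task = Task.init(project_name="my_project", task_name="docker-args-example")

# hypothetical image; everything after the image name is assumed to be passed to docker run
task.set_base_docker(
    "nvcr.io/nvidia/pytorch:22.03-py3 --ipc=host -v /opt/clearml/data/fileserver/:/mnt/fileserver"
) `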
I am pretty sure there is a flag in the clearml.conf where you can specify which python binary to use.
I think I still don't get how clearml is supposed to work/be used. Why wouldn't the following work currently?
Example:
` task = Task.init(...)
if not running_remotely:
    task_dict = task.export_task()
    requirements = task_dict["script"]["requirements"]["pip"].splitlines()
    requirement_torch = [r for r in requirements if r.startswith("torch==")]
    requirements.remove(requirement_torch[0])
    requirements.append("torch >= 1.8.1")
    task_dict["script"]["requirements"]["pip"] = "\n"....
No idea what's happening there.
So the environment variables are not set by the clearml-agent, but by clearml itself
Maybe a related question: has anyone ever worked with datasets larger than the clearml-agent cache? A colleague of mine has a dataset of ~1 terabyte...
mytask.get_logger().current_logger().set_default_upload_destination("s3://ip:9000/clearml")
This is what I do. Do you do the same?
It seems like the services-docker is always started with Ubuntu 18.04, even when I use task.set_base_docker("continuumio/miniconda:latest -v /opt/clearml/data/fileserver/:{}".format(file_server_mount))
Hi SuccessfulKoala55
I meant that in the WebUI deletion should only be allowed for artifacts for which deletion actually works.
For example, I now have a lot of lingering artifacts that exist on the fileservers, but not on the clearml-api-server (I think).
Another example: I delete a task via the WebUI. The ClearML server tries to delete the task and the artifacts belonging to the task. However, it will show that the task has been successfully deleted even though some artifacts have not been. Now there is no way...
I have set default_output_uri to s3://my_minio_instance:9000/clearml.
If I set files_server to s3://my_minio_instance:9000/bucket_that_does_not_exist, it fails at uploading metrics, but model upload still works:
WARNING - Failed uploading to s3://my_minio_instance:9000/bucket_that_does_not_exist ('NoneType' object has no attribute 'upload')
clearml.Task - INFO - Completed model upload to s3://my_minio_instance:9000/clearml
What is `default_out...
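For reference, a minimal sketch of pointing a single task at that MinIO bucket via output_uri instead (assuming the endpoint above; project/task names are made up):
` from clearml import Task

# per-task upload destination; assumed to behave like default_output_uri for this task only
task = Task.init(
    project_name="my_project",
    task_name="minio-output-example",
    output_uri="s3://my_minio_instance:9000/clearml",
) `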