Answered

Hi everyone, is it possible to show the upload progress of artifacts? E.g. I use torch.save to store some very large model, so it hangs forever when it uploads the model. Is there some flag to show a progress bar?

  
  
Posted one year ago

Answers 26


I use torch.save to store some very large model, so it hangs forever when it uploads the model. Is there some flag to show a progress bar?

I'm assuming the upload is http upload (e.g. the default files server)?
If this is the case, the main issue is that we do not have callbacks on http uploads to update the progress (which I would love a PR for, but this is actually a "requests" issue)
I think we had a draft somewhere, but I'm not sure ...
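Since requests has no built-in upload-progress callback, one common workaround (a sketch of my own, not the draft mentioned above; ProgressReader is a hypothetical name) is to wrap the file object so every read() reports cumulative bytes, which a progress bar can then consume:

```python
import io

class ProgressReader:
    """File-like wrapper that calls a callback with cumulative bytes read.

    requests streams a file-like data= object by repeatedly calling read(),
    so routing those calls through this wrapper yields upload progress.
    """

    def __init__(self, fileobj, total, callback):
        self._f = fileobj
        self._total = total
        self._done = 0
        self._cb = callback

    def read(self, size=-1):
        chunk = self._f.read(size)
        self._done += len(chunk)
        self._cb(self._done, self._total)
        return chunk

    def __getattr__(self, name):
        # Delegate everything else (seek, close, ...) to the wrapped file
        return getattr(self._f, name)
```

With requests it would look roughly like `requests.post(url, data=ProgressReader(open(path, "rb"), os.path.getsize(path), my_progress_fn))` — an untested sketch, and the url/path names here are placeholders.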

  
  
Posted one year ago

Hi ReassuredTiger98 ,
Do you see the Starting upload:... message in the log?

  
  
Posted one year ago

ReassuredTiger98 after 20 hours, was it done uploading ?
What do you see in the Task resource monitoring? (notice there is a network_tx_mbs metric that, according to this, should read about 0.152)

  
  
Posted one year ago

I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.

  
  
Posted one year ago

An upload of 11GB took around 20 hours which cannot be right.

That is very, very slow: roughly 152 KB/s ...
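For reference, the back-of-the-envelope arithmetic behind that rate (my own calculation, using decimal GB):

```python
# 11 GB uploaded over ~20 hours, expressed as MB/s
size_mb = 11 * 1000            # 11 GB in MB (decimal units)
seconds = 20 * 60 * 60         # 20 hours in seconds
rate_mb_s = size_mb / seconds
print(round(rate_mb_s, 3))     # ~0.153 MB/s, in line with the ~0.152 network_tx_mbs reading
```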

  
  
Posted one year ago

So my network seems to be fine. Downloading artifacts from the server to the agents is around 100 MB/s, while uploading from the agent to the server is slow.

  
  
Posted one year ago

As far as I know, the automatic binding uses async upload, which should be verbose

  
  
Posted one year ago

Yea, correct! No problem. Uploading such large artifacts as I am doing seems to be an absolute edge case 🙂

  
  
Posted one year ago

Agent runs in docker mode. I ran the agent on the same machine as the server this time.

  
  
Posted one year ago

An upload of 11GB took around 20 hours which cannot be right. Do you have any idea whether ClearML could have something to do with this slow upload speed? If not I am going to start debugging with the hardware/network.

  
  
Posted one year ago

Seems more like a bug or something is not properly configured on my side.

  
  
Posted one year ago

```
from clearml import Task
import numpy as np

# Connecting ClearML with the current process,
# from here on everything is logged automatically
task = Task.init(project_name="examples", task_name="artifacts example")
task.set_base_docker(
    "my_docker",
    docker_arguments="--memory=60g --shm-size=60g -e NVIDIA_DRIVER_CAPABILITIES=all",
)

if not running_remotely():
    task.execute_remotely("docker", clone=False, exit_process=True)

timer = Timer()
with timer:
    # add and upload Numpy object (stored as .npz file)
    task.upload_artifact("Numpy Eye", np.eye(100000, 100000))

print(timer.duration)

# we are done
print("Done")
```

  
  
Posted one year ago

481.2130692792125 seconds

This is very slow.
It makes no sense, it cannot be network (this is basically http post, and I'm assuming both machines on the same LAN, correct ?)
My guess is the filesystem on the clearml-server... Are you having any other performance issues ?
(I'm thinking HD degradation, which could lead to slow write speeds, which would affect the Elastic/Mongo as well)

  
  
Posted one year ago

But it is not related to network speed, rather to clearml. A simple file transfer test gives me approximately 1 Gbit/s transfer rate between the server and the agent, which is to be expected from the 1 Gbit/s network.

  
  
Posted one year ago

481.2130692792125 seconds
Done

  
  
Posted one year ago

ReassuredTiger98 is it possible the fileserver component's data folder mount is incorrect? This would mean the docker FS is used and can maybe account for the low performance?

  
  
Posted one year ago

Simple file transfer test gives me approximately 1 GBit/s transfer rate between the server and the agent, which is to be expected from the 1Gbit/s network.

Ohhh I missed that. What is the speed you get for uploading the artifacts to the server? (you can test it with a simple toy artifact upload snippet)
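A toy upload-timing harness along those lines could look like this (a sketch; the ClearML calls are commented out since they need a live server, and parameter names such as wait_on_upload should be checked against your SDK version):

```python
import time

def measure(upload_fn):
    """Time a callable and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = upload_fn()
    return result, time.perf_counter() - start

# With ClearML it would look roughly like:
# from clearml import Task
# import numpy as np
# task = Task.init(project_name="examples", task_name="toy artifact upload")
# _, secs = measure(lambda: task.upload_artifact(
#     "toy", np.random.rand(2048, 2048), wait_on_upload=True))
# print(f"upload took {secs:.1f}s for ~32 MB")  # 2048*2048*8 bytes
```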

  
  
Posted one year ago

I guess this is from clearml-server and seems to be bottlenecking artifact transfer speed.

I'm assuming you need multiple "file-server" instances running on the "clearml-server" with a load-balancer of a sort...

  
  
Posted one year ago

Artifact Size: 74.62 MB

  
  
Posted one year ago

Yea, it was finished after 20 hours. Since the artifact only starts uploading when the experiment finishes, there is no resource reporting for the time during which it uploaded. I will debug it and report what I find out

  
  
Posted one year ago

AgitatedDove14 Yea, I also had this problem: https://github.com/allegroai/clearml-server/issues/87 . I have a Samsung 970 Pro 2TB on all machines, but maybe something is misconfigured like SuccessfulKoala55 suggested. I will take a look. Thank you for now!

  
  
Posted one year ago

I see a single python3 fileserver.py process running on one thread at 100% load.

  
  
Posted one year ago

server-->agent is fast, but agent-->server is slow.

Then multiple connections will not help, this is the bottleneck of the upload speed of your machine, regardless of what the target is (file-server, S3, etc...)

  
  
Posted one year ago

Yea, and the script ends with clearml.Task - INFO - Waiting to finish uploads

  
  
Posted one year ago

The agent and server also have similar hardware, so I would expect the same read/write speed.

  
  
Posted one year ago

It is only a single agent that is sending a single artifact. server-->agent is fast, but agent-->server is slow.

  
  
Posted one year ago
226 Views