Yes, this will work, I think.
I tried removing all the images and content from Docker with `docker-compose down` and `docker rmi`, and also removed all the content in each directory of /opt/trains/ created by the containers. Do you have any idea why this happens?
I opened an issue on the clearml-server GitHub: https://github.com/allegroai/clearml-server/issues/89. Thanks for your help.
Thanks a lot, I'll check how to do this correctly.
Even simpler than a GitHub repo: this code reproduces the issue I have.
I'll try to write code that reproduces this behavior and post it on GitHub, is that fine? That way you could check whether I'm the problem (which is really likely) 😛
Is it better to open it on clearml or clearml-server?
I have made some changes in the code:

```python
logger.clearml_logger.report_image(
    self.tag,
    f"{self.tag}_{epoch:0{pad}d}",
    iteration=iteration,
    image=image,
)
```

The `epoch` range is 0-150 and the `iteration` range is 0-100, and the error is still there:

```
General data error (TransportError(503, 'search_phase_execution_exception', 'Trying to create too many buckets. Must be less than or equal to: [10000] but was [10001]. This limit can be set by changing the [search.max_buckets] clus...
```
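As a rough back-of-the-envelope check, the sketch below counts how many distinct (series, iteration) pairs the call above produces, reading both ranges as inclusive. It rests on a hypothesis that nothing in this thread confirms: that the server-side query needs roughly one bucket per such pair. The `count_pairs` helper and `pad=3` are illustrative assumptions, not part of ClearML.

```python
# Back-of-the-envelope count for the naming scheme above.
# ASSUMPTION (not confirmed by ClearML docs): the server-side query needs
# roughly one bucket per distinct (series, iteration) pair.

MAX_BUCKETS = 10_000  # default search.max_buckets quoted in the error message


def count_pairs(tag: str, epochs: range, iterations: range, pad: int) -> int:
    """Return the number of distinct (series, iteration) pairs reported."""
    pairs = {
        (f"{tag}_{epoch:0{pad}d}", iteration)
        for epoch in epochs
        for iteration in iterations
    }
    return len(pairs)


if __name__ == "__main__":
    # Ranges from the message above, read as inclusive: 0-150 and 0-100.
    n = count_pairs("train", range(151), range(101), pad=3)
    print(n, "pairs; over the limit:", n > MAX_BUCKETS)
```

If that assumption holds, the per-tag count alone is well past the default limit, which would fit the error regardless of how the names are padded.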
So I see two options:
- Reduce the number of images reported (already in our plan)
- Make one big image per epoch
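The two options above could look roughly like this. It is a minimal sketch assuming the images are same-shaped HxWxC NumPy arrays. `select_minibatches` and `tile_images` are hypothetical helpers, not ClearML functions: the first keeps only evenly spaced minibatches (option 1), and the second packs one epoch's images into a single grid (option 2).

```python
from __future__ import annotations

import numpy as np


def select_minibatches(num_minibatches: int, max_reported: int) -> list[int]:
    """Option 1: pick at most `max_reported` evenly spaced minibatch indices."""
    if max_reported >= num_minibatches:
        return list(range(num_minibatches))
    step = num_minibatches / max_reported  # step >= 1, so indices are distinct
    return [int(i * step) for i in range(max_reported)]


def tile_images(images: list[np.ndarray], columns: int) -> np.ndarray:
    """Option 2: pack same-shaped HxWxC images into one grid image."""
    if not images:
        raise ValueError("need at least one image")
    h, w, c = images[0].shape
    rows = -(-len(images) // columns)  # ceiling division
    # Unused cells at the end of the last row stay black (zeros).
    grid = np.zeros((rows * h, columns * w, c), dtype=images[0].dtype)
    for idx, img in enumerate(images):
        if img.shape != (h, w, c):
            raise ValueError("all images must share the same shape")
        r, col = divmod(idx, columns)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return grid
```

The grid could then be reported once per epoch, for example `report_image(self.tag, self.tag, iteration=epoch, image=grid)`, so each tag gets one series instead of one per minibatch. That exact call is an untested assumption. With 100+ images per epoch the grid gets large, so combining it with `select_minibatches` is probably the more practical route.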
Yes, the tag is fixed.
Something like 100 epochs, with at least 100 images reported per epoch.
I have 6 plots with one or two metrics each, but I have a lot of debug samples.
This is a run I made with the changes. As you can see, the iterations now go from 0 to 111, and in each of them I have images named train_{001|150}.
I call it like this:

```python
logger.clearml_logger.report_image(
    self.tag,
    f"{self.tag}_{iteration:0{pad}d}",
    epoch,
    image=image,
)
```

`self.tag` is `train` or `valid`, and `iteration` is an int for the minibatch index within the epoch.
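The `{iteration:0{pad}d}` in the call above is a nested format spec: the inner `{pad}` supplies the field width and the leading `0` zero-pads the number. The thread never shows where `pad` comes from, so the hypothetical `series_name` helper below derives it from the largest index as an assumption. Zero-padding makes lexicographic order of the names match numeric order.

```python
def series_name(tag: str, index: int, max_index: int) -> str:
    """Zero-pad `index` to the width of `max_index` so names sort correctly."""
    pad = len(str(max_index))  # ASSUMPTION: one plausible way to derive `pad`
    return f"{tag}_{index:0{pad}d}"


if __name__ == "__main__":
    names = [series_name("valid", i, 150) for i in (0, 7, 42, 150)]
    print(names)
```

Note that padding only changes how names look and sort; it does not change how many distinct series names are reported.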
SuccessfulKoala55, feel free to roast my errors.