OK that's great, thanks for the info SuccessfulKoala55 👍
Thanks SuccessfulKoala55, I’ve taken a look. Is force merging what you’re referring to? Do you know how often ES is configured to merge in ClearML Server?
The shards I can see using a lot of disk space are:
- events-training_stats_scalar
- events-log
- and then various worker_stats_*
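For anyone wanting to check the same thing, here is a minimal sketch of how the sizes could be inspected and a force merge triggered through Elasticsearch's REST API (`_cat/indices` and `_forcemerge`). The `ES_URL` address and the function names are my own assumptions, not something from the ClearML docs; the URL depends on how the Elasticsearch container is exposed in your deployment.

```python
import json
import urllib.request

# Assumption: Elasticsearch is reachable at this address from where you run this.
ES_URL = "http://localhost:9200"


def fetch_indices(es_url=ES_URL):
    """Return the JSON rows from Elasticsearch's _cat/indices endpoint (sizes in bytes)."""
    url = f"{es_url}/_cat/indices?format=json&bytes=b"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def largest_indices(rows, top=5):
    """Sort _cat/indices rows by on-disk size, largest first.

    Rows without a size (for example closed indices) are skipped.
    """
    sized = [
        (row["index"], int(row["store.size"]))
        for row in rows
        if row.get("store.size")
    ]
    return sorted(sized, key=lambda pair: pair[1], reverse=True)[:top]


def force_merge(index, es_url=ES_URL):
    """Ask Elasticsearch to expunge deleted documents from one index."""
    url = f"{es_url}/{index}/_forcemerge?only_expunge_deletes=true"
    req = urllib.request.Request(url, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `largest_indices(fetch_indices())` gives `(index name, size in bytes)` pairs, which should make it easy to spot whether the `events-*` or `worker_stats_*` indices dominate. A force merge can be I/O heavy, so it is probably best run when the server is quiet.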
Hi CostlyOstrich36, thanks for the response, that makes sense.
What sort of problems could happen? Would it just be corruption of the data being written at the time, or could it break something more serious?
For context, I’m currently backing up the server (spinning it down) every night, but I now need to run tasks overnight and don’t want to miss any logs/artifacts while the server is shut down.
Ok, thanks Jake!
I think a note about the fileserver should be added to the https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_security page!
CumbersomeCormorant74 just to confirm, in my case the files aren't actually deleted. I have to manually delete them from the fileserver via a terminal.
CostlyOstrich36 I use the GCP disk image to launch a Compute Engine instance which sits behind an HTTP load balancer
Thanks CumbersomeCormorant74
connect_configuration seems to take about the same amount of time, unfortunately!
it might be an issue in the UI due to this unconventional address or network settings
I think this is related to an issue (https://github.com/allegroai/clearml-server/issues/112#issue-1149080358) that seems to recur across many different setups
Ah apologies for getting the wrong end of the stick a bit!
Not sure if it helps you or not, but when the link to an artifact didn't work for me it was because the URL being used was internal to the server (I had an agent that had access to internal endpoints). In my case setting the agent fileserver url to the public domain solved my issue.
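The fix itself was a configuration change (as far as I know, the relevant setting is `api.files_server` in the agent's `clearml.conf`). For links that were already registered with the internal address, something like the sketch below could be used to see what the public equivalent would look like. `to_public_url` is a hypothetical helper of mine, not part of ClearML, and the hostnames in the example are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def to_public_url(artifact_url, public_base):
    """Swap the scheme and host of artifact_url for those of public_base.

    The path, query string and fragment of the original URL are kept as-is.
    """
    src = urlsplit(artifact_url)
    dst = urlsplit(public_base)
    return urlunsplit((dst.scheme, dst.netloc, src.path, src.query, src.fragment))


# Hypothetical addresses, for illustration only.
internal = "http://fileserver:8081/proj/task/artifacts/model.pkl"
public = to_public_url(internal, "https://files.example.com")
```

This only rewrites the string; whether the file is actually reachable at the public address still depends on the load balancer and fileserver setup.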
Ah right, nice! I didn’t think it was, as I couldn’t see it in the Task reference. Should it be there too?