Thanks for your reply!
We worked around the bug by calling Dataset.add_files only once per folder that contains files (~120 folders) using a wildcard, rather than once per individual file (~75,000 files).
I'm not sure exactly why this helps, but I assume some log or other metadata is created per add_files call, so calling it fewer times keeps the MongoDB document smaller?
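Roughly, the workaround looks like the sketch below (dataset/project names and the local data root are placeholders, not the actual values we use):

```python
from pathlib import Path
from clearml import Dataset

data_root = Path("/data/my_dataset")  # assumed local root containing the ~120 sub-folders

ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")

# One add_files() call per folder (~120 calls) with a wildcard,
# instead of one call per individual file (~75,000 calls).
for folder in sorted(p for p in data_root.iterdir() if p.is_dir()):
    ds.add_files(path=str(folder), wildcard="*")

ds.upload()
ds.finalize()
```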
[Mongo has a way to store documents larger than the 16MB limit using GridFS](https://www.mongodb.com/docs/manual/...
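Just to illustrate what GridFS usage looks like on the client side (this is a minimal pymongo sketch with a placeholder connection string, not how the ClearML backend stores dataset state):

```python
from pymongo import MongoClient
import gridfs

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["example_db"]
fs = gridfs.GridFS(db)

# put() chunks the payload (255 KB chunks by default), so it can exceed the 16 MB BSON limit
file_id = fs.put(b"x" * (20 * 1024 * 1024), filename="large_blob.bin")
print(fs.get(file_id).length)  # -> 20971520
```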
Yes, this solved it for me, although it's not ideal: pip now looks in both our private package repo and PyPI for every dependency it installs, which slows down environment setup significantly.
Just for clarity, this is the contents of my ~/clearml.conf file. I'd like to replace this with env vars for programmatically setting the config, rather than editing the file:
```
sdk {
  aws {
    s3 {
      use_credentials_chain: true
    }
  }
}
```
Ah okay, thank you!
Would be cool if it worked on the SDK as well; in my case it's much easier to manage env vars than the config file.
Additionally, I'd like to understand what is stored in Elasticsearch vs MongoDB, Redis, etc. From my understanding, it's the metrics and console logs that are stored in Elastic?
I'm thinking the solution may be to reduce the volume of metrics logged by averaging them locally and only reporting them once every 60s or so?
Or is there a way to tune the Elasticsearch config so it can handle the high volume of requests?
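For the local-averaging idea above, a minimal sketch of what I have in mind, assuming the standard clearml Logger API (project/task names, the 60s interval, and the log_metric helper are all placeholders of mine):

```python
import time
from collections import defaultdict
from clearml import Task

task = Task.init(project_name="my_project", task_name="averaged-metrics")
logger = task.get_logger()

REPORT_EVERY_S = 60
_buffer = defaultdict(list)      # series name -> values collected since last flush
_last_report = time.monotonic()
_iteration = 0

def log_metric(series: str, value: float) -> None:
    """Accumulate values locally; flush one averaged scalar per series every ~60s."""
    global _last_report, _iteration
    _buffer[series].append(value)
    if time.monotonic() - _last_report >= REPORT_EVERY_S:
        for name, values in _buffer.items():
            logger.report_scalar(
                title="training",
                series=name,
                value=sum(values) / len(values),
                iteration=_iteration,
            )
        _buffer.clear()
        _last_report = time.monotonic()
        _iteration += 1
```

This would cut the number of scalar reports hitting the server (and Elasticsearch) from once per step to roughly once per minute per series.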