Can confirm: for me, increasing RAM usually solves the problem. ES is sometimes very aggressive.
It's plotly, yes
Setting sdk.development.default_output_uri to the same URL as api.files_server seems to do the trick.
Thanks for responding quickly. For this specific use case I need a regression sklearn model (trained in 10-fold CV) whose hyperparameters I want to optimize using optuna. As my datasets are updated regularly, I'd like to define all of this in a pipeline so that I can easily run everything again once the data changes.
Thanks, I solved it. It was a very weird error in restoring the backup.
Yeah, both of them. The HPO though requires everything to be defined by python code. The Hydra config is parsed and stored nicely, but it isn't recognized as describing HPO.
I'm now thinking I need some main process that first runs a base_template task so that everything gets initialized properly. The same process then starts the HPO, which will add subtasks to the queue. This main process (also a task) will then wait until all other tasks (i.e. the hyperparameter setups) have completed before wrapping up and reporting back.
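To make the flow I have in mind concrete, here is a minimal sketch in plain Python. It does not use ClearML at all: a thread pool stands in for the queue, and run_base_template, run_trial and the alpha parameter are hypothetical placeholders I made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_base_template() -> dict:
    """Hypothetical base step: set up whatever every trial shares."""
    return {"dataset": "placeholder"}


def run_trial(shared: dict, params: dict) -> float:
    """Hypothetical trial: score one hyperparameter setup (lower is better)."""
    return (params["alpha"] - 0.3) ** 2


def main_process(setups: list[dict]) -> tuple[dict, float]:
    # 1. Run the base step once so everything is initialized.
    shared = run_base_template()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # 2. Add one subtask per hyperparameter setup to the (stand-in) queue.
        futures = {pool.submit(run_trial, shared, p): p for p in setups}
        # 3. Block until every subtask has completed.
        wait(futures)
    # 4. Wrap up and report the best setup with its score.
    results = [(futures[f], f.result()) for f in futures]
    return min(results, key=lambda r: r[1])


if __name__ == "__main__":
    best, score = main_process([{"alpha": 0.1}, {"alpha": 0.3}, {"alpha": 0.9}])
    print(best, score)
```

The point is only the ordering: initialize once, enqueue the setups, wait for all of them, then report.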
Oh yeah, one more thing. The initial link you sent me contains a snippet that is written to a file using cat, but for me it only works with a plain echo on a single line. If I copy from the website, it inserts weird end-of-line characters that mess it up (at least that's my hypothesis), so you might want to consider putting a warning on the website or updating it to the instruction below:
echo 'db.model.find({uri:{$regex:/^http:\/\/10\.0\.0\.12:8081/}}).forEach(function(e,i) { e.uri = e.uri.r...
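In case it helps anyone else who copies from the website, here is a rough Python sketch of how a pasted snippet could be collapsed onto one clean line before echoing it. Which characters the website actually inserts is my assumption (CRLF, Unicode line separators, non-breaking spaces, zero-width characters), and clean_pasted_snippet is a name I made up.

```python
def clean_pasted_snippet(text: str) -> str:
    """Collapse a pasted multi-line snippet into a single clean line."""
    # Normalize Windows/old-Mac line endings and Unicode line separators.
    for eol in ("\r\n", "\r", "\u2028", "\u2029"):
        text = text.replace(eol, "\n")
    # Turn non-breaking spaces into plain spaces.
    text = text.replace("\u00a0", " ")
    # Drop zero-width spaces and byte-order marks.
    for invisible in ("\u200b", "\ufeff"):
        text = text.replace(invisible, "")
    # Join the non-empty lines with single spaces.
    return " ".join(line.strip() for line in text.split("\n") if line.strip())
```

Joining with spaces is only safe when the script does not rely on newlines as statement separators, so check the result before running it.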
Could it be that in this message, Failed getting object 10.0.0.12:8081/Esti/, the address is stored without the 'http' part, and that I also have to replace all those occurrences?
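To illustrate what I mean, a small sketch of a replacement that catches the old host both with and without the scheme. The old address is the one from the error message; NEW_BASE is a made-up placeholder, and rewrite_uri is just a helper name for this example, not anything from ClearML.

```python
import re

OLD_HOST = "10.0.0.12:8081"               # old files-server address from the error
NEW_BASE = "http://new-host.example:8081"  # hypothetical placeholder, use your own

# Old host at the start of the string, with or without http(s)://,
# followed by a slash or the end of the string.
_OLD_PREFIX = re.compile(r"^(?:https?://)?" + re.escape(OLD_HOST) + r"(?=/|$)")


def rewrite_uri(uri: str) -> str:
    """Replace the old host prefix; leave every other URI unchanged."""
    return _OLD_PREFIX.sub(lambda _match: NEW_BASE, uri, count=1)
```

With this, both http://10.0.0.12:8081/Esti/ and 10.0.0.12:8081/Esti/ would end up under the new base, while URIs pointing elsewhere stay as they are.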
Hi @<1523701070390366208:profile|CostlyOstrich36> - I'm using WebApp: 1.16.2-502 • Server: 1.16.2-502 • API: 2.30.
You mean that created the task? I probably should have added to the problem description that I'm able to delete the task manually, also using the SDK.
I'll elaborate on the setup.
I'm deploying the server in the recommended way, with very minor changes. The relevant portions of the YAMLs:
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  deploy:
    restart_policy:
      condition: on-failure
...
Thanks for the quick and helpful answer @<1722061389024989184:profile|ResponsiveKoala38>! It works, at least in the sense that I can see my artifacts are updated. However, my datasets are still on the wrong address. How do I update those as well?
Awesome, thanks very much for this detailed reply! This indeed seems to have updated every URL.
One note: I had to pass the mongo host as --mongo-host
Of course. You can see it in the error message that I already shared, but here is another one just in case.
.venv/bin/python -c "from clearml import Dataset; Dataset.get(dataset_project='Esti', dataset_name='bulk_density')"
2024-10-09 18:56:03,137 - clearml.storage - WARNING - Failed getting object size: ValueError('Failed getting object 10.0.0.12:8081/Esti/.datasets/bulk_density/bulk_density.f66a70c6cda440dd8fdaccb52d5e9055/artifacts/state/state.json (401): UNAUTHORIZED')
2024-10-09 ...
Great, thanks for the fast and extremely helpful answers!
Ok, thanks, I'll ask the networking people to dig into this.
Ok, even weirder now: the model paths seem to be updated to 172., but I also have the CSVs as artifacts, and those are still at 10.
Any clues @<1722061389024989184:profile|ResponsiveKoala38> ?
Does this help in any way @<1523701087100473344:profile|SuccessfulKoala55> ? Should I provide something else instead?
Ah okay, this python script is meant to replace all the other scripts? That makes sense then 🙂