
Does this help in any way, @SuccessfulKoala55? Should I provide something else instead?
Hi @CostlyOstrich36, I'm using WebApp: 1.16.2-502 • Server: 1.16.2-502 • API: 2.30.
Thanks, I solved it. It was a very weird error that happened while restoring the backup.
It's Plotly, yes.
You mean the one that created the task? I probably should have mentioned in the problem description that I am able to delete the task manually, including via the SDK.
I'll elaborate on the setup.
I'm deploying the server in the recommended way, with very minor changes. The relevant portion of the YAML files:

```yaml
agent-services:
  networks:
    - backend
  container_name: clearml-agent-services
  image: allegroai/clearml-agent-services:latest
  deploy:
    restart_policy:
      condition: on-failure
...
```
Yeah, both of them. The HPO, though, requires everything to be defined in Python code. The Hydra config is parsed and stored nicely, but it isn't recognized as describing an HPO.
Great, thanks for the fast and extremely helpful answers!
Could it be that here, in 'Failed getting object 10.0.0.12:8081/Esti/', the URL is stored without the 'http' part? Do I also have to replace all those occurrences?
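If the stored values really lack the scheme (the log line alone doesn't prove it; the client may simply print the URL without one), a pattern anchored on "http://" would skip them. A minimal Python sketch of the difference; the sample URIs, including the NEW_HOST placeholder, are made up for illustration:

```python
import re

# Anchored on the scheme: only matches values that start with "http://10.0.0.12:8081".
WITH_SCHEME = re.compile(r"^http://10\.0\.0\.12:8081")
# Scheme made optional: also matches bare "10.0.0.12:8081/..." values.
OPTIONAL_SCHEME = re.compile(r"^(?:https?://)?10\.0\.0\.12:8081")


def stale_uris(uris, pattern):
    """Return the URIs that still point at the old 10.0.0.12:8081 server."""
    return [u for u in uris if pattern.match(u)]


# Made-up sample values, for illustration only.
stored = [
    "http://10.0.0.12:8081/Esti/model.pkl",
    "10.0.0.12:8081/Esti/.datasets/bulk_density/state.json",
    "http://NEW_HOST:8081/Esti/model.pkl",  # hypothetical already-migrated value
]

print(stale_uris(stored, WITH_SCHEME))      # only the first entry
print(stale_uris(stored, OPTIONAL_SCHEME))  # the first two entries
```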
Ah okay, so this Python script is meant to replace all the other scripts? That makes sense then 🙂
OK, even weirder now: the model paths seem to be updated to the 172. address, but the CSV files I stored as artifacts still point to the 10. address.
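One possible explanation (a guess, not verified against the ClearML database schema) is that a model's URI sits in a single top-level field, while artifact URLs are nested deeper inside each task document, so a query on one field misses them. A minimal sketch of a recursive rewrite over a nested, JSON-like document; the document shape, the regex, and the NEW_HOST placeholder (standing in for the actual 172.* address) are all hypothetical:

```python
import re
from typing import Any

# Old server prefix, with or without a scheme (assumption: both forms may occur).
OLD = re.compile(r"^(?:https?://)?10\.0\.0\.12:8081")
# Hypothetical placeholder for the new 172.* files-server address.
NEW = "http://NEW_HOST:8081"


def rewrite_urls(value: Any) -> Any:
    """Return a copy of `value` in which every string starting with the old prefix is rewritten."""
    if isinstance(value, str):
        return OLD.sub(NEW, value)
    if isinstance(value, dict):
        return {key: rewrite_urls(item) for key, item in value.items()}
    if isinstance(value, list):
        return [rewrite_urls(item) for item in value]
    return value  # numbers, None, booleans, etc. are left untouched


# Made-up task-like document; the real ClearML task schema may differ.
task = {
    "name": "train",
    "artifacts": [
        {"key": "train.csv", "uri": "http://10.0.0.12:8081/Esti/train.csv"},
        {"key": "test.csv", "uri": "10.0.0.12:8081/Esti/test.csv"},
    ],
    "iterations": 10,
}

print(rewrite_urls(task)["artifacts"])
```

Because the function builds new containers instead of mutating the input, the original document can be kept for comparison or rollback.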
Any clues, @ResponsiveKoala38?
Thanks for responding quickly. For this specific use case I need an sklearn regression model (trained with 10-fold CV) whose hyperparameters I want to optimize with Optuna. As my datasets are updated regularly, I'd like to define all of this in a pipeline so that I can easily rerun everything once the data changes.
Venv mode, I believe. Just the basic version that I can copy from your docs, and then I hit 'y' to execute it remotely.
Setting sdk.development.default_output_uri to the same URL as api.files_server seems to do the trick.
Ok, thanks, I'll ask the networking people to dig into this.
Oh yeah, one more thing. The initial link you sent me contains a snippet that is written to a file using cat, but for me it only works with a simple echo on a single line. If I copy from the website, it inserts weird end-of-line characters that mess it up (at least, that's my hypothesis), so you might want to consider adding a warning on the website or updating it to the instruction below:
```
echo 'db.model.find({uri:{$regex:/^http:\/\/10\.0\.0\.12:8081/}}).forEach(function(e,i) { e.uri = e.uri.r...
```
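To check the end-of-line hypothesis before running a pasted snippet, a small stdlib-only helper can flag carriage returns and invisible Unicode characters and normalize them. Which characters a given web page actually inserts is an assumption; the set below only covers common culprits:

```python
# Characters that often sneak in when copying from a web page (assumed, not confirmed for this site).
_REPLACEMENTS = {
    "\r\n": "\n",   # Windows line endings (must be handled before a lone "\r")
    "\r": "\n",     # stray carriage returns
    "\u00a0": " ",  # non-breaking space
    "\u200b": "",   # zero-width space
    "\ufeff": "",   # byte-order mark
}
_SUSPICIOUS = {"\r", "\u00a0", "\u200b", "\ufeff"}


def find_suspicious(text: str) -> list[tuple[int, str]]:
    """Return (line number, hex code point) for every suspicious character."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            if ch in _SUSPICIOUS:
                hits.append((lineno, hex(ord(ch))))
    return hits


def clean_pasted(text: str) -> str:
    """Replace the suspicious characters with plain equivalents (dict order matters)."""
    for bad, good in _REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


snippet = "echo 'db.model.find()'\u00a0\r\nmongo\u200b"
print(find_suspicious(snippet))
print(repr(clean_pasted(snippet)))
```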
Awesome, thanks very much for this detailed reply! This indeed seems to have updated every URL.
One note: I had to specify the mongo host with --mongo-host
Thanks for the quick and helpful answer, @ResponsiveKoala38! It works, at least in the sense that I can see my artifacts are updated. However, my datasets still point to the wrong address. How do I update those as well?
I can confirm that, for me, increasing the RAM usually solves the problem. Elasticsearch (ES) is sometimes very aggressive with memory.
Of course. You can see it in the error message I already shared, but here is another one just in case:
```
.venv/bin/python -c "from clearml import Dataset; Dataset.get(dataset_project='Esti', dataset_name='bulk_density')"
2024-10-09 18:56:03,137 - clearml.storage - WARNING - Failed getting object size: ValueError('Failed getting object 10.0.0.12:8081/Esti/.datasets/bulk_density/bulk_density.f66a70c6cda440dd8fdaccb52d5e9055/artifacts/state/state.json (401): UNAUTHORIZED')
2024-10-09 ...
```
I'm now thinking I need a main process that first runs a base_template task so that everything gets initialized properly. The same process then starts the HPO, which adds subtasks to the queue. This main process (also a task) then waits until all other tasks (i.e., the hyperparameter setups) have completed before wrapping up and reporting back.
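The control flow described above can be sketched locally with the standard library: run the base template once, fan the trials out, block until all of them finish, then report. This is only an analogy, not ClearML's API: a thread pool stands in for the queue and agents, and `objective`, `run_base_template`, and the toy search space are hypothetical placeholders (in the real setup each trial would be a cloned task, e.g. a 10-fold CV score):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def objective(params: dict) -> float:
    """Hypothetical stand-in for one trial (e.g. a CV error); lower is better."""
    return (params["alpha"] - 0.3) ** 2


def run_base_template() -> float:
    """Hypothetical base_template step: runs once, first, so everything is initialized."""
    return objective({"alpha": 1.0})


def controller(search_space: list[dict], max_workers: int = 4) -> dict:
    baseline = run_base_template()  # 1. the base template runs before any trial
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:  # 2. the "queue" of trials
        futures = {pool.submit(objective, p): p for p in search_space}
        for fut in as_completed(futures):  # 3. wait until every trial has completed
            results.append((fut.result(), futures[fut]))
    best_score, best_params = min(results, key=lambda r: r[0])  # 4. wrap up and report
    return {
        "baseline": baseline,
        "best_params": best_params,
        "best_score": best_score,
        "n_trials": len(results),
    }


if __name__ == "__main__":
    space = [{"alpha": a} for a in (0.1, 0.2, 0.3, 0.4)]
    print(controller(space))
```

Keeping the controller as the only place that aggregates results mirrors the idea of the main task reporting back once all hyperparameter setups are done.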