
@AmiableSeaturtle81 this was the last time I tried: https://clearml.slack.com/archives/CTK20V944/p1725534932820309
diff --git a/docker-compose.yml b/docker-compose.diff.yml
index c6b49e1..07f7f43 100644
--- a/docker-compose.yml
+++ b/docker-compose.diff.yml
@@ -5,7 +5,7 @@ services:
     command:
     - apiserver
     container_name: clearml-apiserver
-    image: allegroai/clearml:1.15.0
+    image: allegroai/clearml:latest
     restart: unless-stopped
     volumes:
     - /opt/clearml/logs:/var/log/clearml
@@ -19,17 +19,18 @@ services:
     environment:
       CLEARML_ELASTIC_SERVICE_HOST: elastics...
I've tried setting output_uri on Task.init, but that seems to affect only model checkpoints and artifacts.
Do you mean to the Web UI?
Yes, that's what I meant; sorry, I'm still coming to terms with ClearML terminology. Is it possible to store the web app's cloud access token server-side so we don't have to enter it in the Web UI?
Yes, I tried updating recently; it cost me a full day's work of rolling back versions until I found something that worked.
Thanks for responding, @SuccessfulKoala55. Good question! One solution could be to create a new open-source project with Lightning + ClearML integrations and link it to the Lightning ecosystem-ci. I believe most people use the basic TensorBoard logger with ClearML, but the extended use case of a ClearML model-checkpoint callback might make it valuable.
I guess one would have to disable auto-logging of p...
I don't have issues with setting the hyperparameters; I would just like to link changes in one hyperparameter (e.g. encoder.layers) to another parameter (e.g. decoder.in_layers) when optimizing over encoder.layers.
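A minimal sketch of one way to keep the two values in sync, assuming the training script receives the parameters as a flat dictionary: let the optimizer vary only encoder.layers and derive decoder.in_layers from it inside the script. The helper name link_layer_params and the flat-dict layout are hypothetical and not part of the ClearML API.

```python
from typing import Any, Dict


def link_layer_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params in which decoder.in_layers follows encoder.layers.

    Hypothetical helper: only "encoder.layers" is searched by the optimizer,
    and the dependent value is derived here so the two can never disagree.
    """
    linked = dict(params)  # copy so the caller's dictionary is not mutated
    linked["decoder.in_layers"] = linked["encoder.layers"]
    return linked


# Example: a configuration as it might be sampled by an optimizer
sampled = {"encoder.layers": 6, "decoder.in_layers": 4, "lr": 1e-3}
print(link_layer_params(sampled))
```

Deriving the dependent value in the script keeps the search space one-dimensional, so the optimizer never spends trials on mismatched combinations.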
Hi Martin,
It doesn't seem to work with dev.azure though:
Using user/pass credentials - replacing ssh url 'git@ssh.dev.azure.com:v3/ORG/TEAM/PROJECT' with https url '
'
fatal: repository '
' not found
The expected format for the https protocol is None .
Thoughts, @AgitatedDove14?
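For reference, Azure DevOps uses a different path layout for SSH and HTTPS clone URLs, which may be why a generic SSH-to-HTTPS rewrite ends in "repository not found". The sketch below, assuming the usual Azure DevOps formats (SSH `git@ssh.dev.azure.com:v3/{organization}/{project}/{repository}`, HTTPS `https://dev.azure.com/{organization}/{project}/_git/{repository}`), shows the mapping. azure_ssh_to_https is a hypothetical helper, not part of clearml-agent.

```python
import re

# Assumed Azure DevOps SSH clone format: git@ssh.dev.azure.com:v3/<org>/<project>/<repo>
# (also accepts the ssh:// form and an optional trailing ".git")
_AZURE_SSH = re.compile(
    r"^(?:ssh://)?git@ssh\.dev\.azure\.com[:/]v3/"
    r"(?P<org>[^/]+)/(?P<project>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def azure_ssh_to_https(url: str) -> str:
    """Map an Azure DevOps SSH clone URL to its HTTPS equivalent (hypothetical helper)."""
    match = _AZURE_SSH.match(url)
    if match is None:
        raise ValueError(f"not an Azure DevOps SSH URL: {url!r}")
    # Azure DevOps HTTPS URLs insert a "_git" segment before the repository name
    return "https://dev.azure.com/{org}/{project}/_git/{repo}".format(**match.groupdict())


print(azure_ssh_to_https("git@ssh.dev.azure.com:v3/ORG/TEAM/PROJECT"))
```

The three path segments are mapped positionally, so the ORG/TEAM/PROJECT placeholders above end up as organization, project and repository respectively.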
None for visibility
Perfect! Thanks, SuccessfulKoala55; that would be an acceptable workaround until setup_upload also supports Azure.
Which version of the server are you running?
Any tips on how to check whether we are storing data for deleted tasks? Maybe @ResponsiveKoala38 knows? Is there a field on each scalar that I can cross-check with ClearML?
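One way to frame the cross-check, assuming each stored scalar event records the ID of the task it belongs to (an assumption about the storage layout, not confirmed here): collect the task IDs referenced by the scalars, collect the IDs of tasks that still exist in ClearML, and take the difference. find_orphaned_task_ids and the sample IDs are hypothetical; fetching the two ID lists from Elasticsearch and from the ClearML server is left out.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def find_orphaned_task_ids(
    scalar_task_ids: Iterable[str], existing_task_ids: Iterable[str]
) -> Dict[str, int]:
    """Count scalar events whose task ID does not match any existing task."""
    existing: Set[str] = set(existing_task_ids)
    counts = Counter(scalar_task_ids)  # number of events per referenced task ID
    return {task_id: n for task_id, n in counts.items() if task_id not in existing}


# Hypothetical input: task IDs read from scalar events, and IDs of tasks that still exist
events = ["a1", "a1", "b2", "c3", "c3", "c3"]
live = ["a1"]
print(find_orphaned_task_ids(events, live))
```

Counting events per orphaned ID, rather than only listing the IDs, gives a rough sense of how much storage the deleted tasks still occupy.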
@CostlyOstrich36, any thoughts? Are the model files themselves easier to serve?
How does it look in the Web UI?
I just had a look: they are visible under Debug Samples, but not under Plots, where I had expected them. I thought that using report_matplotlib_figure would group them under Plots?
@AmiableSeaturtle81 that's the service we are using :-)
How much RAM have you assigned to your elastic service?
Hi CostlyOstrich36,
I have created a base task on which I'm optimizing hyperparameters. With clearml-param-search I could use --params-override to set a static parameter, which should not be optimized, e.g. changing the number of epochs for all experiments. It seems to me that this capability is not present in HyperParameterOptimizer. Does that make sense?
From the example on https://clear.ml/docs/latest/docs/apps/clearml_param_search/ :
` clearml-param-search {...} --p...
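Conceptually, a static override is a set of fixed values applied on top of every configuration the optimizer samples: the searched parameters vary while the overridden ones (such as the epoch count) stay constant. The sketch below only illustrates that merge in plain Python; apply_static_overrides and the "General/epochs" key are hypothetical and are not how clearml-param-search or HyperParameterOptimizer is implemented.

```python
from typing import Any, Dict, List


def apply_static_overrides(
    sampled_configs: List[Dict[str, Any]], overrides: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Return new configs in which every key in `overrides` is forced to its fixed value."""
    # In a dict merge the right-hand side wins, so the overrides take precedence
    return [{**config, **overrides} for config in sampled_configs]


sampled = [
    {"General/lr": 0.01, "General/epochs": 30},
    {"General/lr": 0.001, "General/epochs": 30},
]
print(apply_static_overrides(sampled, {"General/epochs": 5}))
```

Building new dictionaries instead of updating in place leaves the optimizer's own record of what it sampled untouched.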
Hi @CostlyOstrich36, the task is being aborted via the Web UI. I have another method that catches local interrupts (exceptions such as keyboard interrupts and crashes). The behavior is the same whether the task runs via an agent or locally from the CLI.
Hi @SuccessfulKoala55, thanks for responding. I've found out that my first error came from cloning a very old version of the cleanup task in the Web UI.
I don't know about the other error. To me it looks like the task gets deleted before errors are handled: since an error occurred (some 404 stuff; maybe the files actually aren't there) when deleting some artifacts on the task, ClearML tries to reload the task and fails with 400/201 or 400/101. ...
I just tried, and the result is the same. The other method only triggers on exceptions.
Specifically, this is what I get in the console log when the agent spins up a task:
Poetry Enabled: Ignoring requested python packages, using repository poetry lock file!
Creating virtualenv latent-features in /data/clearml/venvs-builds/3.9/task_repository/our-repo/.venv
Installing dependencies from lock file
CostlyOstrich36, any thoughts on how we can debug this further? It's making ClearML practically useless for us.