I'm guessing that's not on pypi yet?
That's fine for the current use-case I believe.
Once the team is happy with the logging functionality, we'll move on to remote execution and things will update.
That's what I found as well, but it did not like it after all (boto is fine with it, but the underlying urllib and requests were not?)
It's fine -- I see the added benefit in making sure the users set up their clearml.conf, and I've made a script to edit it to our needs as part of the installation process 🙂 Thanks Martin!
Always great to find a bug! I'll make relevant SDK updates then.
That would be nice :)
Sorry to keep this up - what about support for MinIO using the environment variable? Do I set CLEARML_FILES_HOST to the endpoint instead of an S3 bucket?
One last MinIO-related question (sorry for the long thread!)
While I do have the access and secret defined in clearml.conf, and even in the WebUI, I still get similar warnings as David does here - https://clearml.slack.com/archives/CTK20V944/p1640135359125200
If I add the bucket to that (so CLEARML_FILES_HOST=s3://minio_ip:9000/minio/bucket), I then get the following error instead --
2021-12-21 22:14:55,518 - clearml.storage - ERROR - Failed uploading: SSL validation failed for
... [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076)
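For anyone following along: WRONG_VERSION_NUMBER usually means the client attempted a TLS handshake against a port that speaks plain HTTP. To see which part of the URL above ends up as the endpoint and which as the bucket, here is a minimal sketch. It assumes the first path component is treated as the bucket; split_s3_url is a hypothetical helper, not part of ClearML.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3://host:port/bucket/key URL into (endpoint, bucket, key).

    Hypothetical helper for illustration only; it assumes the first path
    component is the bucket name.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    endpoint = parsed.netloc  # "host:port" for a MinIO-style endpoint
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return endpoint, bucket, key


if __name__ == "__main__":
    print(split_s3_url("s3://minio_ip:9000/minio/bucket"))
```

Under that assumption, the URL above names a bucket called minio with bucket as the key prefix, which may not be what was intended.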
Ah! Makes sense. Thanks!
Yes, exactly! I've added instructions for the users on creating their account and running clearml-init, and then they run the snippet that updates the api and sdk sections.
Or did you mean I can couple a short "mini config" with the package and redirect clearml to use this local one (instead of the one at ~/clearml.conf)?
Is it CLEARML_CONFIG_FILE? (I had to dig this from the GH code 😅 )
AgitatedDove14 another option I thought would be nice is to actually self-sign the internal MinIO bucket, but then I get [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1076)
Are you aware of any other way then (other than the secure: false flag)?
The odd thing is that it was already defined, and then when I clicked an S3 link, it asked me to fill it in again, adding a duplicate credentials row
An internal project I've accidentally made with a hidden tag while playing around with the ClearML internal code.
Thanks! To clarify, all the agent does is then spawn new nodes to cover the tasks?
Some examples of the mess it creates (also posted in the main channel):
- A single project now has multiple subprojects
- The subprojects have the .datasets hidden subproject (with really frustrating project names)
- The subprojects are empty
- To access the original project, I have to go twice into the same project because of these hidden projects
- Because of these hidden subprojects, I cannot delete a project that has 0 experiments
Nothing I can spot --
ClearML results page:
ClearML pipeline page:
Launching the next 2 steps
Launching step [...]
Launching step [...]
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
Launching step: ...
Parameters:
{...}
Configurations:
{}
Overrides:
{}
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2023-02-21 13:53:48
ClearML Monitor: Could not detect iteration reporting, falling back to itera...
Oh! Nice! I'll have a go at it and report back at the PR if it's in a functional state 🙂 Thanks AgitatedDove14 !
I’ll also post this on the main channel -->
Eek. Is there a way to merge a backup from elastic to current running server?
But... which queue does it listen to, which type of instances will it use, etc.?
ClearML 1.1.4, Matplotlib 3.3.0 (it's not the latest as we have some backward compatibility issues)
Sorry AgitatedDove14 , forgot to get back to this.
I've been trying to convince my team to drop poetry 😄
This was a long time running since I could not access the macbook in question to debug this.
It is now resolved and indeed a user error - they had implicitly defined CLEARML_CONFIG_FILE to e.g. /home/username/clearml.conf instead of /Users/username/clearml.conf as is expected on Mac.
I guess the error message could be made clearer in this case (i.e. CLEARML_CONFIG_FILE='/home/username/clearml.conf' file does not exist). Thanks for the support! ❤
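A rough sketch of the kind of check I have in mind, something a user could run before importing clearml. check_config_env is a hypothetical helper of mine, not anything in the ClearML SDK, and the error text simply mirrors the wording suggested above.

```python
import os
from pathlib import Path


def check_config_env(environ=None):
    """Return the config path from CLEARML_CONFIG_FILE, or None if unset.

    Hypothetical pre-flight check (not ClearML code): raises a clear
    error when the variable points at a file that does not exist.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("CLEARML_CONFIG_FILE")
    if value is None:
        return None  # variable not set, nothing to verify here
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"CLEARML_CONFIG_FILE={value!r} file does not exist"
        )
    return path


if __name__ == "__main__":
    try:
        print(check_config_env())
    except FileNotFoundError as err:
        print(err)
```

Passing a plain dict as environ makes it easy to try without touching the real environment.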
Internally yes, but in Task.init the default argument is a boolean, not an int.
We don't want to close the task, but we have a remote task that spawns more tasks. With this change, subsequent calls to Task.init fail because it goes in the deferred init clause and fails on validate_defaults.
Maybe. When the container spins up, are there any identifiers regarding the task etc. available? I create a folder on the bucket per python train.py run so that the environment variable files don't get overwritten if two users execute almost simultaneously.
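To illustrate the per-run folder idea, here is a minimal sketch. It assumes a UTC timestamp plus a short random suffix is unique enough; run_prefix, the base URL, and the user name are made-up placeholders, not our actual setup.

```python
import uuid
from datetime import datetime, timezone


def run_prefix(base, user):
    """Build a per-run folder prefix that near-simultaneous runs won't share.

    Hypothetical helper: combines a UTC timestamp with a short random
    suffix so two users launching at the same second get different folders.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:8]  # random part guards against same-second clashes
    return f"{base.rstrip('/')}/{user}/{stamp}-{suffix}"


if __name__ == "__main__":
    # Placeholder bucket URL and user name, for illustration only.
    print(run_prefix("s3://minio_ip:9000/bucket/env-files", "username"))
```

A task identifier, if one is available inside the container, would be a better suffix than a random one since it could be traced back to the run.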
Feels like we've been over this 😄 Have there been any new developments perhaps?
It's essentially that this - https://clear.ml/docs/latest/docs/guides/advanced/multiple_tasks_single_process cannot work in a remote execution.
Thanks SuccessfulKoala55 ! Is this listed anywhere in the documentation?
Could I set an environment variable there and then refer to it internally in the config with the ${...} notation?
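To be concrete about what I mean by the ${...} notation, here is a generic sketch of that kind of substitution. This is not ClearML's implementation and says nothing about whether the config loader actually does this; expand_env and the MINIO_HOST variable are invented for the example.

```python
import os
import re

# Matches ${NAME} where NAME is a typical environment variable name.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, environ=None):
    """Replace ${NAME} with the value of NAME; leave unknown names untouched.

    Generic illustration only, not taken from ClearML or clearml-agent.
    """
    environ = os.environ if environ is None else environ
    return _PATTERN.sub(lambda m: environ.get(m.group(1), m.group(0)), text)


if __name__ == "__main__":
    # MINIO_HOST is a made-up variable name for this example.
    print(expand_env("host: ${MINIO_HOST}", {"MINIO_HOST": "minio_ip:9000"}))
```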
I see https://github.com/allegroai/clearml-agent/blob/d2f3614ab06be763ca145bd6e4ba50d4799a1bb2/clearml_agent/backend_config/utils.py#L23 but not where it's called 🤔