I just ran into this too recently. Are you passing these also in the extra_clearml_conf for the autoscaler?
FWIW It’s also listed in other places @<1523704157695905792:profile|VivaciousBadger56> , e.g. None says:
In order to make sure we also automatically upload the model snapshot (instead of saving its local path), we need to pass a storage location for the model files to be uploaded to.
For example, upload all snapshots to an S3 bucket…
Not sure I understand your comment - why not let the user start with an empty comparison page and add them from "Add Experiment" button as well?
Yes exactly, but I guess I could've googled for that 😅
Copy the uncommitted changes captured by ClearML using the UI, write them to changes.patch, and run git apply changes.patch 👍
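A minimal Python sketch of those steps, assuming the diff text has already been copied out of the UI by hand. The function name `apply_uncommitted_changes` and the `runner` parameter are made up for illustration and are not part of ClearML.

```python
import subprocess
from pathlib import Path


def apply_uncommitted_changes(diff_text, repo_dir, runner=subprocess.run):
    """Write the copied diff to changes.patch inside repo_dir and run `git apply` on it.

    `runner` defaults to subprocess.run; it is a parameter only so the command
    can be inspected (or faked) without touching a real repository.
    """
    patch_path = Path(repo_dir) / "changes.patch"
    # Make sure the patch ends with a newline; a missing one is an easy
    # copy/paste mistake.
    if not diff_text.endswith("\n"):
        diff_text += "\n"
    patch_path.write_text(diff_text)

    cmd = ["git", "apply", patch_path.name]
    # check=True raises CalledProcessError if git rejects the patch.
    runner(cmd, cwd=str(repo_dir), check=True)
    return cmd
```

Calling `apply_uncommitted_changes(diff_text, "/path/to/repo")` from a clean checkout of the same commit should leave the working tree as it was when the task ran.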
I'm trying, let's see; our infra person is away on holidays :X Thanks! Uh, which configuration exactly would you like to see? We're running using the helm charts on K8s, so I don't think I have direct access to the agent configuration/update it separately?
From the log you shared, the task is picked up by the worker_d1bd92a3b039400cbafc60a7a5b1e52b_4e831c4cbaf64e02925b918e9a3a1cf6_<hostname>:gpu0,1 worker
I can try and target the default one if it helps..?
Ah I see, if the pipeline controller begins in a Task it does not add the tags to it…
I’ll give the create_function_task one more try 🤔
Oh nono, more like:
1. Create a pipeline
2. Add N steps to it
3. Run the pipeline
4. It fails/succeeds, the user does something with the output
5. The user would like to add/modify some steps based on the results now (after closer inspection).
I wonder if at (5), do I have to recreate the pipeline every time? 🤔
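One way to make step (5) cheaper, assuming the pipeline does have to be rebuilt for every run (this thread does not confirm that): keep the step definitions as plain data and rebuild the controller from them. `StepSpec`, `replace_step` and `build_pipeline` are hypothetical helpers. Only `PipelineController` and `add_function_step` come from ClearML, and the keyword arguments used here should be checked against the installed version.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StepSpec:
    """Plain-data description of one pipeline step (hypothetical helper)."""
    name: str
    function: Callable
    kwargs: Dict[str, object] = field(default_factory=dict)
    parents: List[str] = field(default_factory=list)


def replace_step(steps, new_step):
    """Return a new list with the same-named step swapped in, or appended if new."""
    if any(s.name == new_step.name for s in steps):
        return [new_step if s.name == new_step.name else s for s in steps]
    return list(steps) + [new_step]


def build_pipeline(steps, name, project, version="0.0.1"):
    """Create a fresh controller from the step list (assumes clearml is installed)."""
    from clearml import PipelineController  # imported lazily on purpose

    pipe = PipelineController(name=name, project=project, version=version)
    for step in steps:
        pipe.add_function_step(
            name=step.name,
            function=step.function,
            function_kwargs=step.kwargs,
            parents=step.parents,
        )
    return pipe
```

With this layout, modifying a step after inspecting the results is a `replace_step` call followed by another `build_pipeline`, rather than hand-editing the controller code.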
PricklyRaven28 That would be my fallback, it would make development much slower (having to build containers with every small change)
The results from searching in the "Add Experiment" view (can't resize column widths -> can't see project name ...)
Any updates @<1523701087100473344:profile|SuccessfulKoala55> ? 🙂
Hm, I'm not sure I follow 🤔 How does the API server config relate to the file server?
And this is of course strictly with the update to 1.6.3 (or newer) that should support API 2.20
From our IT dept:
Not really, when you launch the instance, the launch has to already be in the right VPC/Subnet. Configuration tools are irrelevant here.
I believe it may be a race condition that's tangential to ClearML now...
No worries @<1537605940121964544:profile|EnthusiasticShrimp49> ! I made some headway by using Task.create, writing a temporary Python script, and using task.update in a similar way to how pipeline steps are created.
I'll try and create an MVC to reproduce the issue, though I may have strayed from your original suggestion because I need to be able to use classes and not just functions.
Uhhh, but pyproject.toml does not necessarily entail poetry... It's a new Python standard
Now, the original pyhocon does support include statements as you mentioned - https://github.com/chimpler/pyhocon
Oh! Nice! I'll have a go at it and report back at the PR if it's in a functional state 🙂 Thanks AgitatedDove14 !
So the ..data referenced in the example above are part of the git repository?
What about setting the working_directory to the user working directory using Task.init or Task.create?
I mean, I know I could connect_configuration({k: os.environ.get(k) for k in [...]}), but then those environment variables would be exposed in the ClearML UI, which is not ideal (the environment variables in question hold usernames and passwords, required for DB access)
Any simple ways around this for now? @<1523701070390366208:profile|CostlyOstrich36>
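A stdlib-only sketch of one partial workaround: record only which variables are set (safe to show in a UI) and resolve the actual values locally at runtime. It does not solve getting the variables onto the remote machine. The variable names and both helper functions are hypothetical.

```python
import os

# Hypothetical names; substitute the variables the DB client actually needs.
REQUIRED_ENV_VARS = ["DB_USER", "DB_PASSWORD"]


def env_var_report(names, environ=None):
    """Return {name: is_set}. Contains no values, so it is safe to log or connect."""
    environ = os.environ if environ is None else environ
    return {name: name in environ for name in names}


def resolve_secrets(names, environ=None):
    """Read the real values from the local environment; never pass these to the UI."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    return {name: environ[name] for name in names}
```

Only the output of `env_var_report(REQUIRED_ENV_VARS)` would be passed to something like `connect_configuration`. `resolve_secrets` is called inside the job itself, right before opening the DB connection.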
One last MinIO-related question (sorry for the long thread!)
While I do have the access and secret defined in clearml.conf, and even in the WebUI, I still get similar warnings as David does here - https://clearml.slack.com/archives/CTK20V944/p1640135359125200
If I add the bucket to that (so CLEARML_FILES_HOST=s3://minio_ip:9000/minio/bucket), I then get the following error instead:
2021-12-21 22:14:55,518 - clearml.storage - ERROR - Failed uploading: SSL validation failed for ... [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:1076)
SweetBadger76 TimelyPenguin76
We're finally tackling this (since it has kept us back at 1.3.2 even though 1.6.2 is out...), and noticed that now the bucket name is also part of the folder?
So following up from David's latest example: StorageManager.download_folder(remote_url='s3://****-bucket/david/', local_folder='./') actually creates a new folder ./****-bucket/david/ and puts its contents there.
EDIT: This is with us using internal MinIO, so I believe ClearML parses that end...
I'm saying it's a bug
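To make the difference concrete, here is a stdlib-only sketch of the two local layouts. It does not call ClearML and is not its implementation. Both helper names are made up, and `my-bucket` stands in for the redacted bucket name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def expected_local_path(remote_url, remote_file, local_folder):
    """Expected layout: files land relative to the requested remote folder."""
    base = PurePosixPath(urlparse(remote_url).path)
    rel = PurePosixPath(urlparse(remote_file).path).relative_to(base)
    return str(PurePosixPath(local_folder) / rel)


def observed_local_path(remote_file, local_folder):
    """Observed layout: bucket name and prefix are recreated under local_folder."""
    parsed = urlparse(remote_file)
    return str(PurePosixPath(local_folder) / parsed.netloc / parsed.path.lstrip("/"))


remote_url = "s3://my-bucket/david/"
remote_file = "s3://my-bucket/david/data/a.csv"

print(expected_local_path(remote_url, remote_file, "./"))  # data/a.csv
print(observed_local_path(remote_file, "./"))              # my-bucket/david/data/a.csv
```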