I cannot, the instance is long gone... But it's no different from any other scaled instance; it seems it just took a while to register in ClearML
Note that it would succeed if e.g. run with pytest -s
We have a more complicated case but I'll work around it 🙂
Follow up though - can configuration objects refer to one-another internally in ClearML?
I'll have a look; at least it seems to only use `from clearml import Task`, so unless mlflow changed their SDK, it might still work!
Not that I recall
It's given as the second form you suggested in the mini config (`http://${...}:8080`). The quotation marks are added later by pyhocon.
Ah, it already exists: https://github.com/allegroai/clearml-server/issues/134 , so I commented on it
This was a long-running one, since I could not access the MacBook in question to debug it. It is now resolved and was indeed a user error: they had implicitly defined CLEARML_CONFIG_FILE to e.g. /home/username/clearml.conf instead of /Users/username/clearml.conf, as is expected on Mac.
I guess the error message could be made clearer in this case (i.e. CLEARML_CONFIG_FILE='/home/username/clearml.conf' file does not exist). Thanks for the support! ❤
We have a mini default config (if you remember from a previous discussion we had) that actually uses the second form you suggested. I wrote a small "fixup" script that combines this default with the one generated by clearml-init, and it simply does:

```python
from pyhocon import ConfigFactory, ConfigTree

# Parse both configs without resolving substitutions yet
def_config = ConfigFactory.parse_file(DEF_CLEARML_CONF, resolve=False)
new_config = ConfigFactory.parse_file(new_config_file, resolve=False)

# Merge the defaults into the freshly generated config
updated_new_config = ConfigTree.merge_configs(new_config, def_config)
```
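(If I recall pyhocon's semantics correctly, `ConfigTree.merge_configs(a, b)` merges b into a with b's values taking precedence, so here the defaults win over any conflicting keys from the clearml-init output.)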
… And it's failing on typing hints for functions passed in `pipe.add_function_step(…, helper_function=[…])`
… I guess those aren't being removed like the wrapped function step?
Bump, SuccessfulKoala55?
-ish, still debugging some weird stuff. Sometimes ClearML picks `ip` and sometimes `ip2`, and I can't tell why 🤔
That's what I thought SuccessfulKoala55 , but the server URL is correct (and the WebUI is functional and responsive).
In part of our code, we look for projects with a given name, and pull all tasks in that project. That's the crash point, and it seems to be related to having running tasks in that project.
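For reference, a minimal sketch of what that lookup looks like on our side, assuming the stock clearml SDK (the project name here is made up):

```python
from clearml import Task

# Pull all tasks under a given project name; this is roughly the call
# that hits the crash when the project contains running tasks.
tasks = Task.get_tasks(project_name="my-project")
for task in tasks:
    print(task.id, task.name)
```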
AgitatedDove14 for future reference this is indeed a PEP-610 related bug, fixed in https://python-poetry.org/blog/announcing-poetry-1.2.0a1/ . I see we can choose the `pip` version in the config; can we also set the `poetry` version used? Or is it updated from the lock file itself, or...?
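For context, this is the kind of clearml.conf section I mean; pinning the pip version is supported there, and I'm asking whether an analogous key exists for poetry (the `poetry_version` key below is hypothetical, not a confirmed setting):

```
agent {
    package_manager {
        # supported: pin the pip version the agent installs
        pip_version: "<20.2"

        # hypothetical analogue I'm asking about:
        # poetry_version: ">=1.2.0a1"
    }
}
```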
AgitatedDove14 hmmm... they are important, but only when starting the process. Any specific suggestion?
(and they are deleted after the Task is done, so they are temp)
Ah, then no, sounds temporary. If they're only relevant when starting the process, though, I would suggest deleting them immediately once they're no longer needed, rather than waiting for the end of the task (if possible, of course)
Yeah, and just thinking out loud about what I like in the numpy/pandas documentation
TimelyPenguin76 that would have been nice but I'd like to upload files as artifacts (rather than parameters).
AgitatedDove14 I mean like a grouping in the artifact. If I add e.g. `foo/bar` to my artifact name, it will be uploaded as `foo/bar`.
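Something like this, assuming `Task.upload_artifact` keeps the name verbatim (project, task, and file names here are made up):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact-naming")

# "foo/bar" is stored as the literal artifact name,
# not as a nested "bar" under a "foo" group
task.upload_artifact(name="foo/bar", artifact_object="results.csv")
```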
yes, a lot of moving pieces here as we're trying to migrate to AWS and set up the autoscaler and more 🙂
No, I have no running agents listening to that queue. It's as if it's retained in some memory somewhere and the server keeps recreating it.
Hmmm, what 🙂
CostlyOstrich36 so internal references are not resolved somehow? Or, how should one achieve:

```python
def my_step():
    from ..utils import foo
    foo("bar")
```
Hm. Is there a simple way to test tasks, one at a time?
Ah. Apparently getting a task ID while it's running can cause this behaviour 🤔
The network is configured correctly 🙂 But the newly spun up instances need to be set to the same VPC/Subnet somehow
Or some users that update their `poetry.lock` and some that update manually, as they prefer to resolve on their own.
I realized it might work too, but looking for a more definitive answer 🙂 Has no-one attempted this? 🤔
i.e.

```
ERROR Fetching experiments failed. Reason: Backend timeout (600s)
ERROR Fetching experiments failed. Reason: Invalid project ID
```
AgitatedDove14 Unfortunately not; the queues tab shows only the number of tasks, but not the resources used in the queue. I can toggle between the different workers, but then I don't get the full picture.