Is it currently broken? 🤔
I cannot, the instance is long gone... But it's no different from any other scaled instance; it seems it just took a while to register in ClearML
Note that it would succeed if e.g. run with pytest -s
We have a more complicated case, but I'll work around it 🙂
Follow-up though: can configuration objects refer to one another internally in ClearML?
I'll have a look; at least it seems to only use from clearml import Task, so unless mlflow changed their SDK, it might still work!
Bump SuccessfulKoala55?
-ish, still debugging some weird stuff. Sometimes ClearML picks ip and sometimes ip2, and I can't tell why 🤔
That's what I thought SuccessfulKoala55, but the server URL is correct (and the WebUI is functional and responsive).
In part of our code, we look for projects with a given name, and pull all tasks in that project. That's the crash point, and it seems to be related to having running tasks in that project.
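For reference, a minimal sketch of roughly what that lookup does, using the standard clearml SDK calls (the project name and printout are placeholders, the real code is more involved):

```
from clearml import Task

# Pull every task under a given project name and walk over them;
# the failure seems to show up when the project contains running tasks.
tasks = Task.get_tasks(project_name="MyProject")  # placeholder project name
for t in tasks:
    print(t.id, t.name, t.get_status())
```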
AgitatedDove14
Hmmm... they are important, but only when starting the process. Any specific suggestion?
(and they are deleted after the Task is done, so they're temporary)
Ah, then no, sounds temporary. If they're only relevant when starting the process, though, I would suggest deleting them immediately once they're no longer needed, rather than waiting for the end of the task (if possible, of course)
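(Just to illustrate the pattern being suggested, a small sketch in plain Python using tempfile; launch_with_config is a hypothetical stand-in for whatever actually starts the process:)

```
import os
import tempfile

def start_process():
    # Startup-only files live inside this block and are removed as soon as it
    # exits, instead of lingering until the Task itself finishes.
    with tempfile.TemporaryDirectory() as tmp_dir:
        cfg_path = os.path.join(tmp_dir, "startup.cfg")  # placeholder file name
        with open(cfg_path, "w") as f:
            f.write("key=value\n")  # placeholder content
        launch_with_config(cfg_path)  # hypothetical launcher that only needs the file at startup
    # tmp_dir and startup.cfg are already gone here
```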
Yeah, and just thinking out loud about what I like about the numpy/pandas documentation
TimelyPenguin76 that would have been nice, but I'd like to upload files as artifacts (rather than parameters).
AgitatedDove14 I mean like a grouping in the artifact. If I add e.g. foo/bar to my artifact name, it will be uploaded as foo/bar.
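(For reference, this is roughly the call in question; task.upload_artifact is the standard SDK method, the project and file name are made up:)

```
from pathlib import Path
from clearml import Task

task = Task.init(project_name="MyProject", task_name="artifact grouping test")
# Using a slash in the artifact name, hoping it acts as a folder-like grouping
task.upload_artifact(name="foo/bar", artifact_object=Path("report.csv"))
```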
Yes, a lot of moving pieces here, as we're trying to migrate to AWS and set up the autoscaler and more 🙂
No, I have no running agents listening to that queue. It's as if it's retained in some memory somewhere and the server keeps creating it.
Hmmm, what? 🙂
CostlyOstrich36 so internal references are not resolved somehow? Or, how should one achieve:
def my_step():
    from ..utils import foo
    foo("bar")
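(One possible workaround sketch, not verified: move the helper into a package the agent can install and use an absolute import inside the step. my_package is a placeholder, and this assumes the packages argument of the decorator-based pipeline API in a recent clearml version:)

```
from clearml import PipelineDecorator

@PipelineDecorator.component(packages=["my-package"])  # placeholder package name
def my_step():
    # absolute import of an installable package instead of "from ..utils import foo"
    from my_package.utils import foo
    foo("bar")
```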
Hm. Is there a simple way to test tasks, one at a time?
Ah. Apparently getting a task ID while it's running can cause this behaviour 🤔
I realized it might work too, but I'm looking for a more definitive answer 🙂 Has no-one attempted this? 🤔
AgitatedDove14 Unfortunately not, the queues tab shows only the number of tasks, not the resources used in the queue. I can toggle between the different workers, but then I don't get the full picture.
That's probably in the newer ClearML server pages then; I'll still have to wait 🙂
Can I query where the worker is running (IP)?
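(A rough sketch of how that might be queried from the server, assuming the workers endpoint reports an ip field — I haven't verified the exact field name:)

```
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    # "ip" is an assumption about the response schema; print the whole worker
    # object to see what the server actually reports
    print(worker.id, getattr(worker, "ip", None))
```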
But there's nothing of that sort happening. The point where it fails is getting the tasks for a project.
Still, anyone? 🥹 CostlyOstrich36 AgitatedDove14
Well, the individual tasks do not seem to have the expected environment.
We can change the project names, of course, if there's a suggestion/guide that will make them see past the namespace…
It is installed on the pipeline-creating machine.
I have no idea why it did not automatically detect it 🙂