Bump SuccessfulKoala55?
-ish, still debugging some weird stuff. Sometimes ClearML picks ip and sometimes ip2, and I can't tell why 🤔
That's what I thought @<1523701087100473344:profile|SuccessfulKoala55> , but the server URL is correct (and WebUI is functional and responsive).
In part of our code, we look for projects with a given name, and pull all tasks in that project. That's the crash point, and it seems to be related to having running tasks in that project.
AgitatedDove14
hmmm... they are important, but only when starting the process. any specific suggestion ?
(and they are deleted after the Task is done, so they are temp)
Ah, then no, sounds temporary. If they're only relevant when starting the process though, I would suggest deleting them immediately when they're no longer needed, and not wait for the end of the task (if possible, of course)
Yeah, and just thinking out loud about what I like in the numpy/pandas documentation
TimelyPenguin76 that would have been nice but I'd like to upload files as artifacts (rather than parameters).
AgitatedDove14 I mean like a grouping in the artifact. If I add e.g. foo/bar to my artifact name, it will be uploaded as foo/bar.
yes, a lot of moving pieces here as we're trying to migrate to AWS and set up autoscaler and more 😅
No, I have no running agents listening to that queue. It's as if it's retained in some memory somewhere and the server keeps creating it.
Hmmm, what 😄
CostlyOstrich36 so internal references are not resolved somehow? Or, how should one achieve:
def my_step():
    from ..utils import foo
    foo("bar")
Hm. Is there a simple way to test tasks, one at a time?
Ah. Apparently getting a task ID while it’s running can cause this behaviour 🤔
I realized it might work too, but looking for a more definitive answer 😄 Has no-one attempted this? 🤔
AgitatedDove14 Unfortunately not, the queues tab shows only the number of tasks, but not the resources used in the queue. I can toggle between the different workers, but then I don't get the full picture.
That's probably in the newer ClearML server pages then, I'll have to wait still 😅
Can I query where the worker is running (IP)?
But there's nothing of that sort happening. The process where it's failing is on getting tasks for a project.
Still; anyone? 🥹 @<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14>
Well the individual tasks do not seem to have the expected environment.
We can change the project names, of course, if there's a suggestion/guide that will make them see past the namespace…
It is installed on the machine creating the pipeline.
I have no idea why it did not automatically detect it 😞
Yes. Though again, just highlighting that the naming of foo-mod is arbitrary. The actual module simply has a folder structure with an implicit namespace:
foo/
mod/
__init__.py
# stuff
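For reference, here's a runnable sketch of how such an implicit namespace package behaves (the foo/mod layout and the VALUE constant are just illustrative — foo/ has no __init__.py, so it becomes a PEP 420 namespace package):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build the illustrative layout on disk:
#   foo/          <- no __init__.py: implicit namespace package
#   foo/mod/__init__.py
root = Path(tempfile.mkdtemp())
mod_dir = root / "foo" / "mod"
mod_dir.mkdir(parents=True)
(mod_dir / "__init__.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

mod = importlib.import_module("foo.mod")  # the submodule imports normally
print(mod.VALUE)

import foo
print(foo.__path__)  # namespace package: a path list, no __file__
```

The point being: nothing about foo itself needs to be installed, it only has to be on sys.path.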
FWIW, for the time being I’m just setting the packages to all the packages the pipeline tasks sees with:
packages = get_installed_pkgs_detail()
packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
packages = task.data.script.require...
There's no decorator, just e.g.
def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff
Then the type hints are not removed from helper, and the code immediately crashes when run
There's code that strips the type hints from the component function; I just think it should be applied to the helper functions too :)
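Just to illustrate what I mean — a minimal sketch of annotation stripping via ast, not ClearML's actual implementation (the helper source string here is made up):

```python
import ast

SRC = '''
def helper(foo: Optional[Any] = None) -> Any:
    return foo
'''

def strip_type_hints(source: str) -> str:
    """Remove parameter and return annotations from all function defs (sketch only)."""
    tree = ast.parse(source)  # annotations are parsed, never evaluated
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.returns = None
            for arg in node.args.posonlyargs + node.args.args + node.args.kwonlyargs:
                arg.annotation = None
            if node.args.vararg:
                node.args.vararg.annotation = None
            if node.args.kwarg:
                node.args.kwarg.annotation = None
    return ast.unparse(tree)  # Python 3.9+

print(strip_type_hints(SRC))
```

Because the annotations are stripped before the code is executed, the Optional/Any names no longer need to be importable on the worker.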
We just inherit from logging.Handler and use that in our logging.config.dictConfig; weird thing is that it still logs most of the tasks, just not the last one?
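For context, roughly the shape of the setup (the handler class and its behavior here are a toy stand-in, not our actual code):

```python
import logging
import logging.config

class CollectingHandler(logging.Handler):
    """Toy stand-in for our custom handler: just records formatted lines."""
    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

handler = CollectingHandler()

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    # The special "()" key tells dictConfig to build the handler from a
    # callable; here a lambda hands back the pre-built instance.
    "handlers": {"custom": {"()": lambda: handler, "level": "INFO"}},
    "root": {"handlers": ["custom"], "level": "INFO"},
})

logging.getLogger().info("task finished")
print(handler.lines)
```

With this shape every record that reaches the root logger should hit emit(), which is why the missing last task is so confusing.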