AgitatedDove14
hmmm... they are important, but only when starting the process. any specific suggestion?
(and they are deleted after the Task is done, so they are temp)
Ah, then no, sounds temporary. If they're only relevant when starting the process though, I would suggest deleting them immediately when they're no longer needed, and not wait for the end of the task (if possible, of course)
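Something along these lines is what I mean (just a sketch; start_process here is a stand-in for whatever actually consumes the file):
import os
import tempfile

def run_with_temp_config(config_bytes: bytes) -> None:
    # the file is only needed while the process starts up
    fd, path = tempfile.mkstemp(suffix=".cfg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(config_bytes)
        start_process(path)  # stand-in for the real startup call
    finally:
        # delete right away instead of waiting for the end of the task
        os.remove(path)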
Yeah, and just thinking out loud what I like about the numpy/pandas documentation
TimelyPenguin76 that would have been nice but I'd like to upload files as artifacts (rather than parameters).
AgitatedDove14 I mean like a grouping in the artifact. If I add e.g. foo/bar to my artifact name, it will be uploaded as foo/bar.
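For reference, roughly what I'm doing (a sketch; results.csv is just a placeholder file):
from clearml import Task

task = Task.current_task()
# the "/" in the name is meant as a grouping, but the artifact just ends up named "foo/bar"
task.upload_artifact(name="foo/bar", artifact_object="results.csv")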
yes, a lot of moving pieces here as we're trying to migrate to AWS and set up autoscaler and more
No, I have no running agents listening to that queue. It's as if it's retained in some memory somewhere and the server keeps creating it.
Hmmm, what?
CostlyOstrich36 so internal references are not resolved somehow? Or, how should one achieve:
def my_step():
    from ..utils import foo
    foo("bar")
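One thing that might work (just a sketch, assuming the component's helper_functions argument packs the function into the generated standalone step; foo here is a stand-in for our utility):
from clearml.automation.controller import PipelineDecorator

def foo(msg):
    # stand-in for the real utility in ..utils
    print(msg)

# pass the helper explicitly so the generated step doesn't need the relative import
@PipelineDecorator.component(helper_functions=[foo])
def my_step():
    foo("bar")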
Hm. Is there a simple way to test tasks, one at a time?
Ah. Apparently getting a task ID while it's running can cause this behaviour.
The network is configured correctly. But the newly spun up instances need to be set to the same VPC/Subnet somehow
I realized it might work too, but I'm looking for a more definitive answer. Has no one attempted this?
AgitatedDove14 Unfortunately not, the queues tab shows only the number of tasks, but not the resources used in the queue. I can toggle between the different workers, but then I don't get the full picture.
That's probably in the newer ClearML server pages then, so I'll still have to wait.
Can I query where the worker is running (IP)?
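Something like this is what I have in mind (a sketch, assuming the workers API reports an ip field for each worker):
from clearml.backend_api.session.client import APIClient

client = APIClient()
# list the registered workers and where each one is running
for worker in client.workers.get_all():
    print(worker.id, getattr(worker, "ip", None))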
Yeah, they did. I'll give your suggested fix a go on Monday!
AFAIU, something like this happens (oversimplified):
from clearml import Task  # <--- Crash already happens here

import argparse
import dotenv

if __name__ == "__main__":
    # set up argparse with an optional flag for a dotenv file
    parser = argparse.ArgumentParser()
    parser.add_argument("--env-file")
    args = parser.parse_args()
    dotenv.load_dotenv(args.env_file)
    # more stuff
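One obvious thing to try here (just a sketch, assuming the crash is clearml reading the environment at import time) is delaying the import until after the .env is loaded:
import argparse
import dotenv

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--env-file")
    args = parser.parse_args()
    dotenv.load_dotenv(args.env_file)

    # only import clearml once the variables from the .env file are in place
    from clearml import Task
    task = Task.init(project_name="my-project", task_name="my-task")  # placeholder names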
They are set with a .env file - it's a common practice. The .env file is, at the moment, uploaded to a temporary cache (if you remember the discussion regarding the StorageManager), so it's also available remotely (related to issue #395)
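Roughly the flow, as I understand it (a sketch; the remote path is a placeholder):
from clearml import StorageManager

# locally: push the .env file somewhere the remote run can reach (placeholder URL)
StorageManager.upload_file(local_file=".env", remote_url="s3://my-bucket/tmp/.env")

# remotely: pull a local copy back before anything reads it
local_env = StorageManager.get_local_copy(remote_url="s3://my-bucket/tmp/.env")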
But there's nothing of that sort happening. The process where it's failing is on getting tasks for a project.
Still; anyone? @<1523701070390366208:profile|CostlyOstrich36> @<1523701205467926528:profile|AgitatedDove14>
Well the individual tasks do not seem to have the expected environment.
We can change the project names of course, if there's a suggestion/guide that will make them see past the namespace…
It is installed on the pipeline-creating machine.
I have no idea why it did not automatically detect it.
Yes. Though again, just highlighting that the naming of foo-mod is arbitrary. The actual module simply has a folder structure with an implicit namespace:
foo/
    mod/
        __init__.py
        # stuff
FWIW, for the time being I'm just setting the packages to all the packages the pipeline task sees, with:
packages = get_installed_pkgs_detail()
packages = [f"{name}=={version}" if version else name for name, version in packages.values()]
packages = task.data.script.require...
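(The tidier route is probably to hand the list to the component itself; a sketch, assuming the packages argument takes pip-style specifiers, and the versions below are just examples:)
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(packages=["pandas==2.0.3", "numpy"])  # example specifiers only
def step_one():
    import pandas  # imported inside the step so the remote run resolves it
    ...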
There's no decorator, just e.g.
def helper(foo: Optional[Any] = None):
    return foo

def step_one(...):
    # stuff
Then the type hints are not removed from helper and the code immediately crashes when being run
There's code that strips the type hints from the component function; I just think it should be applied to the helper functions too :)
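In the meantime the crude workaround is presumably just dropping the hints from the helpers, e.g. (a sketch, assuming it's the helper's annotation that trips up the generated standalone script):
def helper(foo=None):
    # hints dropped only here; the component function's own hints do get stripped
    return foo

def step_one(data):
    return helper(data)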
And actually it fails on quite a few tasks for us with this Python 3.6.
I tried to set up a different image (agentk8sglue.defaultContainerImage: "ubuntu:20.04"), but that did not change much.
I suspect the culprit is agentk8sglue.image, which is set to tag 1.24-21 of clearml-agent-k8s-base. That image is quite old… Any updates on that?