In which repo? :)
IIRC, `get_local_copy()` downloads a local copy and returns the path to the downloaded file. So you might be interested in e.g. `local_csv = pd.read_csv(a_task.artifacts['train_data'].get_local_copy())`
With the models, you're looking for `get_weights()`. It acts the same as `get_local_copy()`, so it returns a path.
EDIT: I think `get_local_copy()` for a model should also work 👍
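To make the two calls above concrete, here's a minimal sketch. It only assumes what was said above: an artifact object with `get_local_copy()` and a model object with `get_weights()`, both returning a path. The helper names `artifact_path` and `model_path` are made up for illustration, and the commented usage at the bottom uses placeholder IDs.

```python
from pathlib import Path


def artifact_path(task, name):
    """Return the local path of a task artifact (downloaded on demand)."""
    # Assumes task.artifacts[name] exposes get_local_copy(), as described above.
    return Path(task.artifacts[name].get_local_copy())


def model_path(model):
    """Return the local path of a model's weights file."""
    # Assumes the model object exposes get_weights(), which also returns a path.
    return Path(model.get_weights())


# Hypothetical usage against a real server (IDs and names are placeholders):
#   import pandas as pd
#   from clearml import Task
#   a_task = Task.get_task(task_id="<task id>")
#   local_csv = pd.read_csv(artifact_path(a_task, "train_data"))
```

Wrapping the result in `Path` is just a convenience; the raw string works with `pd.read_csv` as well.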
Is it currently broken? 🤔
@AgitatedDove14 this (the `extra_vm_bash_script` is what you're after)
I mean, if I search for "model", will it automatically search for tasks containing "model" in their name?
@CheerfulKoala77 you may also need to define subnet or security groups.
Personally I do not see the point in Docker over EC2 instances for CPU instances (virtualization on top of virtualization).
Finally, just to make sure, you only ever need one autoscaler. You can monitor multiple queues with multiple instance types with one autoscaler.
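Roughly what "one autoscaler, multiple queues" could look like as a config, written as a Python dict. Treat the key names (`resource_configurations`, `queues`, `subnet_id`, `security_group_ids`) as my assumption of how the AWS autoscaler example config is laid out, and all values as placeholders. The `undefined_resources` helper is hypothetical, just a sanity check that every queue points at a defined instance type.

```python
# Assumed layout, placeholder values: one autoscaler serving two queues.
autoscaler_config = {
    "resource_configurations": {
        "cpu_small": {
            "instance_type": "<cpu instance type>",
            "subnet_id": "<subnet id>",
            "security_group_ids": ["<security group id>"],
        },
        "gpu_large": {
            "instance_type": "<gpu instance type>",
            "subnet_id": "<subnet id>",
            "security_group_ids": ["<security group id>"],
        },
    },
    # queue name -> list of (resource name, max number of instances)
    "queues": {
        "cpu_queue": [("cpu_small", 4)],
        "gpu_queue": [("gpu_large", 2)],
    },
}


def undefined_resources(config):
    """Return resource names used by queues but missing from resource_configurations."""
    defined = set(config["resource_configurations"])
    used = {res for entries in config["queues"].values() for res, _max in entries}
    return sorted(used - defined)


print(undefined_resources(autoscaler_config))
```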
We just redeployed to use the 1.1.4 version as Jake suggested, so the logs are gone 😞
Yeah 🤔 🤔 they did. I'll give your suggested fix a go on Monday!
After setting `sdk.development.default_output_uri` in the configs, my code kinda looks like:
` task = Task.init(project_name=..., task_name=..., tags=...)
logger = task.get_logger()
# report with logger freely `
Thanks! That's what I thought, but then I get:
2021-12-21 22:08:35,376 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid bucket name "": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
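A quick way to check whether a given output URI even contains a bucket name, using the plain-name regex from the error above. This is only a guess at one way an empty bucket name could come about (nothing between `s3://` and the path); the helper names and example URIs are made up.

```python
import re
from urllib.parse import urlparse

# Plain bucket-name pattern copied from the error message above.
BUCKET_RE = re.compile(r"^[a-zA-Z0-9.\-_]{1,255}$")


def bucket_of(uri):
    """Return the bucket part of an s3://bucket/key style URI ('' if absent)."""
    return urlparse(uri).netloc


def has_valid_bucket(uri):
    """True if the URI's bucket part matches the plain bucket-name pattern."""
    return bool(BUCKET_RE.match(bucket_of(uri)))


print(has_valid_bucket("s3://my-bucket/some/prefix"))
print(has_valid_bucket("s3:///some/prefix"))
```

The second URI has an empty bucket part, so it fails the check in the same way the upload did.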
Follow up on this btw, from the WebUI/Server POV, I see there's an "Admin" role, etc. Do those have additional views available, such as users etc?
I can see the task in the UI, it is not archived, and that's pretty much the snippet, but in full I do e.g.
Oh! Nice! I'll have a go at it and report back at the PR if it's in a functional state 🙂 Thanks AgitatedDove14 !
I'll have a look, at least it seems to only use `from clearml import Task`, so unless mlflow changed their SDK, it might still work!
The `deferred_init` input argument to `Task.init` is `bool` by default, so checking `type(deferred_init) == int` makes no sense to begin with, and is altering the flow.
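For reference, this is plain Python behaviour (not ClearML's code): `bool` is a subclass of `int`, so an exact `type(...) == int` comparison is false for `True`/`False`, while `isinstance(..., int)` is true for them. The `describe` helper is just for illustration.

```python
def describe(value):
    """Show how the different type checks treat a value."""
    return {
        "type_eq_int": type(value) == int,        # exact type comparison
        "isinstance_int": isinstance(value, int),  # also true for bools
        "type_is_bool": type(value) is bool,
    }


print(describe(False))  # a bool default
print(describe(0))      # an actual int
```

So a bool default never takes a `type(x) == int` branch, and an `isinstance(x, int)` check can't tell the two apart unless bool is tested for first.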
I'm saying it's a bug
For now we've monkey-patched it to our usecase:
` Dataset._Dataset__hidden_tag = "active"

def foo(cls, dataset_project, dataset_name):
    dataset_project = dataset_project or "Datasets"
    return dataset_project, dataset_project.rpartition("/")[0]

Dataset._build_hidden_project_name = foo `
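The same patching pattern on a stand-in class, in case the `_Dataset__hidden_tag` spelling looks odd: a class attribute written as `__hidden_tag` is name-mangled to `_<ClassName>__hidden_tag`, which is why it can be reached that way from outside. `Registry` and everything in it is hypothetical and has nothing to do with ClearML's internals.

```python
class Registry:
    __hidden_tag = "hidden"  # stored on the class as _Registry__hidden_tag

    @classmethod
    def _build_name(cls, project, name):
        return f"{project}/.hidden/{name}"

    @classmethod
    def describe(cls, project, name):
        # cls.__hidden_tag is mangled to cls._Registry__hidden_tag here
        return cls.__hidden_tag, cls._build_name(project, name)


def patched_build_name(cls, project, name):
    """Replacement that falls back to a default project and drops the hidden folder."""
    return f"{project or 'Default'}/{name}"


# Monkey-patch: override the mangled attribute and swap the classmethod.
Registry._Registry__hidden_tag = "active"
Registry._build_name = classmethod(patched_build_name)

print(Registry.describe(None, "my_dataset"))
```

In this stand-in the replacement is wrapped in `classmethod(...)` because a plain function assigned to a class and called on the class gets no implicit `cls`. Whether a wrapper is needed in a real library depends on how the original method is declared and called.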
Not necessarily on the same branch, no
We're still working these quirks out. But one issue after we changed the AMI is that the VPC (SubnetId?) was missing from the instance so it could not reach the ClearML API server.
I think maybe the autoscaler service is missing some additional settings...
I... did not, ashamed to admit. The documentation says only boolean values.
Right, so where can one find documentation about it?
The repo just has the variables, without much explanation.
Maybe. When the container spins up, are there any identifiers regarding the task etc. available? I create a folder on the bucket per `python train.py` so that the environment variable files don't get overwritten if two users execute almost simultaneously.
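What I mean by a folder per run, as a minimal sketch: a prefix built from a UTC timestamp plus a random suffix, so two near-simultaneous runs don't collide. The `run_prefix` name, the `env-files` base and the layout are all made up for illustration. If a task identifier turns out to be available inside the container, it could replace the random suffix.

```python
import uuid
from datetime import datetime, timezone


def run_prefix(base="env-files", user="unknown"):
    """Build a unique per-run key prefix: <base>/<user>/<UTC timestamp>-<random>."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = uuid.uuid4().hex[:8]  # random part keeps same-second runs apart
    return f"{base}/{user}/{stamp}-{suffix}"


print(run_prefix(user="alice"))
```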
Aw you deleted your response fast CostlyOstrich36 xD
Indeed it does not appear in `ps aux`, so I cannot simply kill it (or at least, find it).
I was wondering if it's maybe just a zombie in the server API or similar