Can we somehow choose the pool of ports that clearml-session will use?
Yes, I think you can.
How do you spin up the worker nodes? Is it Kubernetes?
BoredHedgehog47 you need to configure the ClearML k8s glue to spin up pods (instead of statically allocating agents per pod). Does that make sense?
EnviousPanda91 'connect' will log the object properties; the automagic logging is controlled in the Task.init call. Specifically, which framework produces metrics that are not logged? Your sample code manually reports some scalars/values, do you see these as well?
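For reference, a minimal sketch of the difference (the project/task names, framework selection and config dict below are just examples):
```python
from clearml import Task

# auto_connect_frameworks controls the automagic logging done by Task.init
task = Task.init(
    project_name="examples",            # example project
    task_name="logging-demo",           # example task name
    auto_connect_frameworks={"pytorch": True, "matplotlib": True},  # example selection
)

# task.connect() logs (and syncs) the object's properties as hyperparameters
params = {"batch_size": 32, "lr": 0.001}
task.connect(params)
```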
but can it NOT use /tmp for this? I'm merging about 100GB
You mean to configure your Temp folder for when squashing ?
you can hack the following:
```python
import tempfile

tempfile.tempdir = "/my/new/temp"
# ... run the Dataset squash here ...
tempfile.tempdir = None
```
But regardless, I think this is worth a GitHub issue with a feature request, to be able to set the temp folder.
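To make the hack concrete, a sketch of how it could look around a squash call (the dataset name and IDs below are placeholders):
```python
import tempfile
from clearml import Dataset

# point Python's temp directory at a disk with enough free space
tempfile.tempdir = "/my/new/temp"
try:
    Dataset.squash(
        dataset_name="merged_dataset",                        # placeholder name
        dataset_ids=["<dataset_id_1>", "<dataset_id_2>"],     # placeholder IDs
    )
finally:
    # restore the default temp folder
    tempfile.tempdir = None
```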
EnviousPanda91 so which frameworks are being missed? Is this a request to support a new framework, or are you saying there is a bug somewhere?
BitingKangaroo95 nice work!
I think that what did it was changing the sshd_config so that it allows port forwarding, agent forwarding and X11 forwarding.
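For anyone else hitting this, the relevant sshd_config lines would look roughly like the following (a sketch; exact option names may vary between distributions):
```
# /etc/ssh/sshd_config
AllowTcpForwarding yes
AllowAgentForwarding yes
X11Forwarding yes
```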
But just in case, it might be that there was a pre-existing SSH identifier on your machine, and hence the error.
Clearing known_hosts under ~/.ssh is also something I would try.
Hi AbruptHedgehog21
can you send the two models' info pages (i.e. the original and the updated one)?
do you see the two endpoints ?
BTW: --version would add a version to the model (i.e. create a new endpoint with the version "endpoint/{version}")
ValueError: Missing key and secret for S3 storage access
Yes, that makes sense. I think we should make sure we do not suppress this warning; it is too important.
Bottom line: a configuration section is missing from your clearml.conf.
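For reference, a minimal sketch of the section that is usually missing (bucket, key and secret are placeholders):
```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"      # placeholder
                    key: "ACCESS_KEY"        # placeholder
                    secret: "SECRET_KEY"     # placeholder
                }
            ]
        }
    }
}
```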
Woot woot!
Awesome. This RC is stable, so feel free to use it; the official release is probably due out next week :)
GrittyStarfish67
I do not wish for data duplication. Any idea how to do this with the clearml-data CLI/GUI/python?
At least in theory creating a new version with parents from multiple Datasets should just work out of the box.
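Something like this sketch (the project, name and parent IDs below are placeholders):
```python
from clearml import Dataset

# create a new version whose parents are two existing datasets; the child only
# references the parents' files, so the data itself is not duplicated
child = Dataset.create(
    dataset_project="examples",                              # placeholder project
    dataset_name="combined_dataset",                         # placeholder name
    parent_datasets=["<dataset_id_a>", "<dataset_id_b>"],    # placeholder IDs
)
child.upload()
child.finalize()
```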
wdyt?
Thank you for saying so!
MotionlessCoral18 I think there is a fix in the latest clearml-agent RC 1.4.0rc0. Can you test and update whether you are still having this issue?
Is it possible to do something so that changing the server address is supported, and the images are pulled from the new server?
The link itself (the full link) is stored inside the server. Can I assume the access is IP-based, not host-based (i.e. DNS)?
I think RoughTiger69 was discussing this exact scenario
https://clearml.slack.com/archives/CTK20V944/p1629885416175500?thread_ts=1629881415.172600&cid=CTK20V944
wdyt?
Hi GrievingTurkey78
Turning off pytorch auto-logging:
Task.init(..., auto_connect_frameworks={'pytorch': False})
To manually log a model:
from clearml import OutputModel
OutputModel().update_weights('my_best_model.pt')
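Putting the two together, a minimal sketch (the project/task names and weights path are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(
    project_name="examples",                      # placeholder project
    task_name="manual-model-logging",             # placeholder task name
    auto_connect_frameworks={"pytorch": False},   # disable pytorch automagic logging
)

# ... training happens here, producing my_best_model.pt ...

# manually register the weights file as an output model of this task
OutputModel(task=task).update_weights("my_best_model.pt")
```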
Maybe failed pipelines with zero steps count as completed
A pipeline with zero steps counts as successful.
That said, how could it have zero steps if one of the steps failed? no?
my question is how to recover, must i recreate the agents or there is another way?
Yes you have to recreate the Task (I assume they failed, no?!)
Ohh... I would not delete them then...
Maybe some kind of heuristic (files created more than a week ago can be deleted?!)
SparklingElephant70, let me make sure I understand: the idea is to make sure the pipeline launches a specific commit/branch, and that you can control it? Also, are you using the pipeline add_step function, or are you decorating a function with PipelineDecorator?
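To make sure we are talking about the same thing, the two patterns look roughly like this sketch (project, task and pipeline names are placeholders, and in practice you would use only one of the two patterns):
```python
from clearml import PipelineController
from clearml.automation.controller import PipelineDecorator

# Pattern 1: build the pipeline from existing Tasks with add_step
pipe = PipelineController(name="my_pipeline", project="examples", version="1.0")
pipe.add_step(
    name="step_one",
    base_task_project="examples",      # placeholder project
    base_task_name="step one task",    # placeholder task name
)
pipe.start()

# Pattern 2: build the pipeline by decorating functions
@PipelineDecorator.component(return_values=["value"])
def step_one():
    return 42

@PipelineDecorator.pipeline(name="my_pipeline", project="examples", version="1.0")
def run_pipeline():
    value = step_one()
```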
However, when 'extra' is a positional argument, it is transformed to 'str'
Hmm... okay let me check something
Hmm, are you getting the warning on the client side, or in the clearml-server?
Hi JuicyFox94 ,
Actually we just added that (still on GitHub, RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
Hi EagerOtter28
I think the replacement should happen here:
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/clearml_agent/helper/repo.py#L277
You mean the job with the exact same arguments?
Do you have other arguments you are passing?
Are you using Optuna / BOHB?
Internally we use blob.upload_from_file; it has a default 60-second timeout on the connection (I'm assuming the upload could take longer).
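For context, this is roughly the google-cloud-storage call in question; it accepts a per-call timeout, so a sketch of raising it would look like this (bucket and paths are placeholders):
```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")           # placeholder bucket
blob = bucket.blob("models/my_model.pt")      # placeholder object path

# upload_from_file accepts a per-request timeout (the default is 60 seconds)
with open("my_model.pt", "rb") as f:
    blob.upload_from_file(f, timeout=600)
```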
Ohh sorry, you will also need to fix the def _patched_task_function.
The parameter order is important, as the partial call relies on it.
Hi CooperativeFox72
Sure:
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
Ohh I see, the force SSH option did not replace the user in the SSH link (it only does so if the original was http), right?