
Sure SharpDove45,
from clearml import Model
model = Model('model_id_aabbcc')
model.system_tags += ['archived']
Hi @<1523701260895653888:profile|QuaintJellyfish58>
You mean some "daemon service" that aborts Tasks that do not end after X hours? Or is it based on CPU/GPU utilization?
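If it is the time-based variant, the selection logic could look roughly like the sketch below. This is only an illustration under assumptions: RunningTask and select_overdue are made-up names, and actually listing and aborting the Tasks through the ClearML API is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class RunningTask:
    """Hypothetical record; a real service would fill this from the server."""
    task_id: str
    started: datetime


def select_overdue(tasks: Iterable[RunningTask], max_hours: float, now: datetime) -> List[str]:
    """Return the IDs of tasks that have been running longer than max_hours."""
    limit = timedelta(hours=max_hours)
    return [t.task_id for t in tasks if now - t.started > limit]


# Fixed "now" so the example is deterministic
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
tasks = [
    RunningTask("short_run", now - timedelta(hours=1)),
    RunningTask("long_run", now - timedelta(hours=30)),
]
overdue = select_overdue(tasks, max_hours=24, now=now)
print(overdue)
```

A utilization-based variant would swap the time comparison for a check on reported CPU/GPU metrics, which is why the two cases need different answers.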
BTW:
Just making sure, 74 was not supposed to be the last checkpoint? (In other words, it is not stuck on leaving the training process, but actually stuck in the middle.)
JitteryCoyote63 try to add the prefix to the parameter name, e.g. instead of "artifact_name" use "Args/artifact_name"
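For example, a small helper could add the section prefix to any override that does not have one yet. This is just a sketch: with_section and the "General/lr" entry are hypothetical, only the "Args/artifact_name" form comes from the suggestion above.

```python
def with_section(name: str, section: str = "Args") -> str:
    """Prefix a parameter name with its section unless it already has one."""
    return name if "/" in name else f"{section}/{name}"


# "General/lr" is a made-up entry showing that prefixed names are left alone
overrides = {"artifact_name": "my_artifact", "General/lr": 0.1}
qualified = {with_section(k): v for k, v in overrides.items()}
print(qualified)
```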
The reasoning is that most likely simultaneous processes will fail on GPU due to memory limit
I have it deployed successfully with Istio.
Nice!
The only thing we had to do to get it to work was to modify the nginx.conf in the webserver pod to allow HTTP 1.1
I was under the impression we fixed that, let me check
at the end it's just another env var
It should work, GIT_SSH_COMMAND is used by pip
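Since it is just an environment variable, it can be set on the process that launches pip or git. A minimal sketch, assuming a key at a placeholder path; env_with_git_ssh is a hypothetical helper, and the ssh options shown are one common choice rather than anything required.

```python
import os
import shlex
from typing import Dict, Optional


def env_with_git_ssh(key_path: str, base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with GIT_SSH_COMMAND pointing at key_path."""
    env = dict(os.environ if base_env is None else base_env)
    # git runs this command instead of plain `ssh` when it talks to a remote
    env["GIT_SSH_COMMAND"] = f"ssh -i {shlex.quote(key_path)} -o IdentitiesOnly=yes"
    return env


# "/path/to/deploy_key" is a placeholder, not a real location
env = env_with_git_ssh("/path/to/deploy_key", base_env={})
print(env["GIT_SSH_COMMAND"])
```

The returned dict can be passed as env=... to subprocess.run when starting the install.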
Train Data Params/a = {}
Train Data Params/b = ...
Then maybe we could "hack" it so that if you edit it in the UI like so:
Train Data Params/a = {'new': 'value'}
Train Data Params/b = ...
You end up with:
param = {'a': {'new': 'value'}, 'b': ...}
What do you think?
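To make the idea concrete, here is a rough sketch of how the flattened entries could be folded back into a nested dict. It is not how ClearML does it today; unflatten is a hypothetical helper and the value 3 for "b" is only filler.

```python
from typing import Any, Dict


def unflatten(flat: Dict[str, Any], section: str) -> Dict[str, Any]:
    """Rebuild a nested dict from 'section/key' style entries."""
    prefix = section + "/"
    nested: Dict[str, Any] = {}
    for full_name, value in flat.items():
        if not full_name.startswith(prefix):
            continue  # entry belongs to another section
        node = nested
        *parents, leaf = full_name[len(prefix):].split("/")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Values as they might look after editing in the UI (3 is a placeholder)
edited = {
    "Train Data Params/a": {"new": "value"},
    "Train Data Params/b": 3,
}
param = unflatten(edited, "Train Data Params")
print(param)
```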
task._wait_for_repo_detection()
You can use the above, to wait until repository & packages are detected
(If this is something users need, we should probably make it a "public function")
BulkyTiger31 could it be there is some issue with the elastic container?
Can you see any experiment's metrics?
Can you please tell me if it is possible to set up slack monitoring in clearml?
It is 🙂
This one?
https://clear.ml/docs/latest/docs/guides/services/slack_alerts
That's with the key at /root/.ssh/id_rsa
You mean inside the container that the autoscaler spun up?
Notice that the agent by default mounts the host .ssh over the existing .ssh inside the container. If you do not want this behavior, you need to set agent.disable_ssh_mount: true in clearml.conf
now realise that the ignite events callbacks seem to not be fired
So this is an ignite issue?
oh, if this is the case, why not use the "main" server?
Hi @<1598487094601191424:profile|MysteriousCow84>
You should put it in the dedicated section:
None
TrickyRaccoon92 the title provided by write.scalars is also the string that identifies the specific metric. This is more than just a title on the plot itself.
It means that this will be the name of the scalar metric (title/series combination).
Is that your intention, or is it for viewing purpose only?
What happens when you call:
from clearml.backend_interface.task.repo import ScriptInfo
print(ScriptInfo._ScriptInfo__legacy_jupyter_notebook_server_json_parsing(None))
What do you have in "server_info['url']" ?
Hmm what do you have here?
os.system("cat /var/log/studio/kernel_gateway.log")
So this is optuna 🙂 the idea is it will test which parameters have potential (with early stopping), then launch a subset of the selected parameters
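Very roughly, the "test with early stopping, then continue with a subset" idea looks like the sketch below. This is a generic successive-halving style illustration, not Optuna's actual implementation; successive_halving, toy_score and the lr grid are all made up for the example.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def successive_halving(
    candidates: List[Params],
    score_at: Callable[[Params, int], float],
    budgets: List[int],
) -> List[Params]:
    """Score all candidates on a small budget, keep the better half, repeat."""
    survivors = list(candidates)
    for budget in budgets:
        ranked = sorted(survivors, key=lambda p: score_at(p, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]
    return survivors


def toy_score(params: Params, budget: int) -> float:
    # Stand-in for "train for `budget` iterations and report a metric"
    return budget * (1.0 - abs(params["lr"] - 0.1))


grid = [{"lr": lr} for lr in (0.001, 0.01, 0.1, 0.5)]
best = successive_halving(grid, toy_score, budgets=[1, 4])
print(best)
```

Weak candidates are dropped after the cheap first round, so only the promising ones get the larger budget.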
I am running from a notebook and the cell has returned
Well the Task will close when you shut down the notebook 🙂
Hi ContemplativePuppy11
This is a really interesting point.
Maybe you can provide a pseudo class abstract of your current pipeline design? This will help in understanding what you are trying to achieve and how to make it easier to get there.