I am using clearml_agent v1.0.0 and clearml 0.17.5 btw
AgitatedDove14 I see https://github.com/allegroai/clearml-session/blob/main/clearml_session/interactive_session_task.py#L21 that a key pair is hardcoded in the repo. Is it being used to ssh into the instance?
with open(path, "r") as stream:
    return yaml.load(stream, Loader=yaml.FullLoader)
AgitatedDove14 I made some progress:
In the clearml.conf of the agent, I set sdk.development.report_use_subprocess = false (because I had the feeling that Task._report_subprocess_enabled = False wasn’t taken into account), and I set task.set_initial_iteration(0). Now I was able to get the following graph after resuming -
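For reference, a minimal sketch of the two changes (not verbatim; the project/task names are placeholders and I'm assuming the standard clearml.conf layout):
```python
from clearml import Task

# clearml.conf on the agent machine:
#   sdk.development.report_use_subprocess: false

# resume the previous run instead of starting a fresh task
task = Task.init(
    project_name="my_project",   # placeholder
    task_name="my_experiment",   # placeholder
    continue_last_task=True,
)
task.set_initial_iteration(0)  # reset the iteration offset so resumed scalars start at 0
```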
This https://discuss.elastic.co/t/index-size-explodes-after-split/150692 seems to say that with the _split API such a situation happens and resolves itself after a couple of days; maybe it's the same case for me?
SuccessfulKoala55 This is not the exact corresponding request (I refreshed the tab since then), but the request is an events.get_task_logs, with the following content:
I also don't understand what you mean by "unless the domain is different"... The same way SSH keys are global, I would have expected the git creds to be used for any git operation
AgitatedDove14 ok, but this happens in my local machine, not in the agent
So previous_task actually ignored the output_uri
Note: Could be related to https://github.com/allegroai/clearml/issues/790 , not sure
CostlyOstrich36 I don’t see such a number; can you please share a screenshot of where to look?
Thanks, the message is not logged in the GCloud instance logs when using startup scripts, which is why I did not see it. 👍
but according to the disk graphs, the OS disk is being used, but not the data disk
I think we should switch back, and have a configuration to control which mechanism the agent uses, wdyt?
That sounds great!
I get the following error:
AnxiousSeal95 The main reason for me not to use clearml-serving triton is the lack of documentation tbh 😄 I am not sure how to make my PyTorch model run there
I am not sure I can do both operations at the same time (migration + splitting); do you think it's better to do the splitting first or the migration first?
So that I don’t lose what I worked on when stopping the session, and if I need to, I can ssh into the machine and directly access the content inside the user folder
automatically promote models to be served from within clearml
Yes!
AgitatedDove14 That's a good point: the experiment failing with this error does show the correct AWS key:
...
sdk.aws.s3.key = *****
sdk.aws.s3.region = ...
even if I explicitly use previous_task.output_uri = "s3://my_bucket", it is ignored and the JSON file is still saved locally
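Roughly what I'm doing (a sketch; the task id and the artifact content are placeholders):
```python
from clearml import Task

previous_task = Task.get_task(task_id="...")   # id of the finished task, elided
previous_task.output_uri = "s3://my_bucket"    # explicitly point artifact storage at S3

# expected: the dict is serialized to JSON and uploaded to s3://my_bucket
# observed: the JSON file is written to the local filesystem instead
previous_task.upload_artifact("results", artifact_object={"some": "values"})
```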
Add carriage return flush support using the sdk.development.worker.console_cr_flush_period configuration setting (GitHub trains Issue 181)
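If I read that right, it would go in clearml.conf something like this (the value and its unit are my assumption, not from the release notes):
```
sdk {
  development {
    worker {
      # flush console lines rewritten with carriage returns every N seconds (assumed unit)
      console_cr_flush_period: 10
    }
  }
}
```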
Ok, so after updating to trains==0.16.2rc0, my problem is different: when I clone a task, update its script and enqueue it, it does not have any Hyper-parameters/argv section in the UI
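For clarity, this is the flow I mean (a sketch; project/queue names are placeholders, using the trains API of that version):
```python
from trains import Task

# grab the original experiment and clone it
template = Task.get_task(project_name="my_project", task_name="my_experiment")
cloned = Task.clone(source_task=template, name="my_experiment (clone)")

# after editing the cloned task's script in the UI, send it to an agent queue;
# I'd expect the cloned task to still show a Hyper-parameters/argv section
Task.enqueue(cloned, queue_name="default")
```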
```
ssh my-instance
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:O2++ST5lAGVoredT1hqlAyTowgNwlnNRJrwE8cbM...
```
in my clearml.conf, I only have:
sdk.aws.s3.region = eu-central-1
sdk.aws.s3.use_credentials_chain = true
agent.package_manager.pip_version = "==20.2.3"