
Because clearml-agent is not installed in my GKE cluster.
@<1523701087100473344:profile|SuccessfulKoala55> yes. It only occurs when running on the cloud. It's fine when running on-premises.
Hi again 🙂 @<1523701087100473344:profile|SuccessfulKoala55> sure!
I want to get the task ID and properties right after submitting a clearml-session task.
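For example, here is a minimal sketch of what I mean with the SDK (assuming clearml-session registers its task under a "DevOps" project with the task name "Interactive Session"; those names and the query approach are just my guess):

from clearml import Task

# Sketch: look up the interactive-session task right after submitting it
# and read its id, parameters and user properties.
# "DevOps" / "Interactive Session" are assumed defaults, not verified.
task_ids = Task.query_tasks(project_name="DevOps", task_name="Interactive Session")
if task_ids:
    session_task = Task.get_task(task_id=task_ids[0])
    print("task id:", session_task.id)
    print("parameters:", session_task.get_parameters())
    print("user properties:", session_task.get_user_properties())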
I hope clearml-session becomes as well developed as clearml-agent, because it is so useful! 🙂
My issue: None
Hi @<1523701205467926528:profile|AgitatedDove14>
The server is already self-hosted. I realized I can't create a report using the ClearML SDK, so I think I need to find other ways.
Alright, thanks 🙂 I hope so too.
It had been working well until I removed the virtualenv and recreated it; then I reinstalled only clearml and clearml-session.
It works on the on-premise machine (I can see GPU usage on the WORKERS & QUEUES dashboard), but it does not work on the cloud pod.
Oh, it didn't generate the conf file properly. I will try again.
I found the solution!! I added the configuration below to the Helm chart's values.yaml.
additionalConfigs:
  # services.conf: |
  #   tasks {
  #     non_responsive_tasks_watchdog {
  #       # In-progress tasks that haven't been updated for at least "value" seconds will be stopped by the watchdog
  #       threshold_sec: 21000
  #       # Watchdog will sleep for this number of seconds after each cycle
  #       watch_interval_sec: 900
  #     }
  #   }
  apiserver.co...
I am having the same issue: None
I tried using K8S_GLUE_POD_AGENT_INSTALL_ARGS=1.5.3rc2 instead of CLEARML_AGENT_UPDATE_VERSION=1.5.3rc2, but it's the same; it still doesn't read GPU usage. 🥲
I'm also curious whether it's possible to bind the same GPU to multiple queues.
Here are the agent and task log files~!
It seems there is no way to add environment variables, so I customized the charts and am using my own version.
Here is the log when executing with --foreground, but is there any difference?
@<1523701205467926528:profile|AgitatedDove14> Good! I will try it
@<1523701087100473344:profile|SuccessfulKoala55> Okay... but how can I specify the agent's version in the Helm chart?
Oh, it's not an issue with EKS. We had the same issue on an on-premise cluster too (clearml-agent is installed). Could it be because clearml-agent is installed?
Nope, just running "clearml-agent daemon --queue shelley".
@<1523701205467926528:profile|AgitatedDove14> @<1529271085315395584:profile|AmusedCat74> Hi guys 🙂
- "I think that by default it uses the host network so it can take care of that, are you saying you added k8s integration?" -> Yes, I modified the clearml-agent Helm chart.
- "'SSH allows access with password': it is a very long random password, not sure I see a risk here, wdyt?" -> Currently, when enqueuing a task, clearml-session generates a long random password for SSH and VS Code and...
pls also refer to None :)
Thanks! The logs too?
@<1523701087100473344:profile|SuccessfulKoala55> What is the task log? Do you mean the log of the pod provisioned by clearml-agent? Do you want me to show it?
root@shelley-gpu-pod:/# clearml-agent daemon --queue shelley2 --foreground
/usr/local/lib/python3.8/dist-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (2.0.2) or chardet (None)/charset_normalizer (3.1.0) doesn't match a supported version!
warnings.warn(
Using environment access key CLEARML_API_ACCESS_KEY=""
Using environment secret key CLEARML_API_SECRET_KEY=********
Current configuration (clearml_agent v1.5.2, location: None):
agent.worker_id ...