@<1523701205467926528:profile|AgitatedDove14> Thanks for the quick reply. You are correct, the issue was resolved after removing https.
@<1523701087100473344:profile|SuccessfulKoala55> After enabling debug mode, below are the logs. Just to let you know, this agent does not have internet access and pip packages are installed via a proxy, which is working, but for PyTorch it seems to be going to the internet: "DEBUG:urllib3.connectionpool: http://api.clearml.domain.com:80 "GET /v2.5/tasks.started HTTP/1.1" 200 353
Executing task id [d3807deae2644e00824e774ff8997eaa]:
repository =
branch =
version_num =
tag =
dock...
@<1523701087100473344:profile|SuccessfulKoala55> Do you think the YAML below is okay?
“apiserver:
  image:
    registry: "harbor.example.com/projectname"
    repository: "allegroai/clearml"
    pullPolicy: IfNotPresent
    tag: "1.10.0-357"
  service:
    type: ClusterIP
  ingress:
    enabled: true
    hostName: "api.clearml.example.com"
fileserver:
  image:
    registry: " [harbor.example.com/projectname](http://harbo...
@<1523701087100473344:profile|SuccessfulKoala55> Any idea why it goes to the internet (download.pytorch.org) only when I run training with the PyTorch framework?
Yes, I just want to know where to provide the private registry name when deploying this Helm chart for ClearML server, as well as for its dependent charts like Elasticsearch and MongoDB.
@<1523701827080556544:profile|JuicyFox94> tlsSecretName for the ClearML web server, API server and file server, all of them? In the serving YAML? I am getting an error on pod clearml-serving-inference-6bdb9c757d-ww4vx: "for /auth.login"
@<1523701087100473344:profile|SuccessfulKoala55> Thanks a lot, it worked! However, I am getting an error when I open the ClearML web application: Fetch tag failed "Error 0 : You can't write against a read only replica." Do you know if this is a known issue and whether a fix is available for it?
@<1523701087100473344:profile|SuccessfulKoala55> It’s an on-prem server and a remote agent. Both the remote agent and my machine are on the same network, and I can SSH to the agent from my machine. Do we need to open anything other than SSH to make JupyterLab work from my computer to the agent, or from the agent to the ClearML server?
@<1523701087100473344:profile|SuccessfulKoala55> When I use Docker I see it go out for NVIDIA, Ubuntu and pip packages. I can fix pip via the above, but what about NVIDIA and Ubuntu?
@<1523701087100473344:profile|SuccessfulKoala55> Yes, I am able to create a ClearML task and perform training from the same machine. This error only appears when I start clearml-session. Do I need a special config in the clearml.conf file for clearml-session to work? Just to add: when I run this command, it works and executes the task but does not give any interactive Jupyter or code URL.
clearml-session --jupyter-lab true --queue P2000 --base-task-id=515159dab92d4baabcb6b3647263a144 , it run the task...
@<1523701087100473344:profile|SuccessfulKoala55> As I mentioned earlier, if I do not specify --base-task-id then the error is as below. @Jake I am running the command clearml-session --jupyter-lab but getting the below error: "Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Creating new session
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection w...
@<1523701087100473344:profile|SuccessfulKoala55> It’s hosted on Kubernetes, behind the ingress controller. I use the Helm chart provided on the ClearML page with ingress set to true. I can access the web UI from the browser, and currently it is on HTTP only.
@<1523701087100473344:profile|SuccessfulKoala55> It was blocked on the load balancer, and after allowing the traffic it is working. Thanks a lot!
@<1523701087100473344:profile|SuccessfulKoala55> Is it a fix for the error below, which we are getting with the new version of ClearML server? "Starting Task Execution:
Traceback (most recent call last):
File "/home/admin/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/utilities/requests_toolbelt/_compat.py", line 48, in <module>
from requests.packages.urllib3.contrib import appengine as gaecontrib
ImportError: cannot import name 'appengine' from 'requests.packages.urllib3.contr...
@<1523701087100473344:profile|SuccessfulKoala55> Sorry for the delayed reply. I have attached the logs; the issue only happens when doing ML training with PyTorch. Training with other frameworks, like TensorFlow and sklearn, works fine.
@<1523701087100473344:profile|SuccessfulKoala55> It works once I allow traffic to download.pytorch.org from the proxy. 🙂
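For an air-gapped agent like the one in this thread, one way to find out which hosts need to be allowed on the proxy is to scan the agent's debug output for urllib3 request lines. The following is a minimal sketch, not part of ClearML: the function name `external_hosts`, the `internal_suffixes` parameter and the second sample log line are assumptions made for illustration, and the regex only assumes the log line shape quoted earlier in the thread.

```python
import re
from urllib.parse import urlsplit

# Matches urllib3 debug lines shaped like the one quoted in the thread:
#   DEBUG:urllib3.connectionpool: http://api.clearml.domain.com:80 "GET /v2.5/tasks.started HTTP/1.1" 200 353
_REQUEST_LINE = re.compile(r'urllib3\.connectionpool:\s*(https?://\S+)\s+"')


def external_hosts(log_lines, internal_suffixes):
    """Return the sorted hostnames contacted outside the given internal domains.

    `internal_suffixes` is a hypothetical allowlist of domains that are
    considered internal, e.g. ["domain.com"].
    """
    hosts = set()
    for line in log_lines:
        match = _REQUEST_LINE.search(line)
        if not match:
            continue  # not a urllib3 request line
        host = urlsplit(match.group(1)).hostname
        if not host:
            continue
        is_internal = any(
            host == suffix or host.endswith("." + suffix)
            for suffix in internal_suffixes
        )
        if not is_internal:
            hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    sample_log = [
        # Line taken from the thread.
        'DEBUG:urllib3.connectionpool: http://api.clearml.domain.com:80 "GET /v2.5/tasks.started HTTP/1.1" 200 353',
        # Hypothetical line, added only to illustrate an external request.
        'DEBUG:urllib3.connectionpool:https://download.pytorch.org:443 "GET /whl/ HTTP/1.1" 200 100',
        "Executing task id [d3807deae2644e00824e774ff8997eaa]:",
    ]
    print(external_hosts(sample_log, ["domain.com"]))
```

Any host this reports is a candidate for the proxy allowlist, which is how download.pytorch.org was eventually unblocked here.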
@<1523701070390366208:profile|CostlyOstrich36> I am looking for pod logs and api server logs
@<1523701087100473344:profile|SuccessfulKoala55> Yes, this is the end of the logs and nothing happens after it. I am using this command to launch the agent: clearml-agent daemon --detached --gpu 0 --queue A40
Yes, the machine is connected to the on-prem ClearML server.
@<1523701087100473344:profile|SuccessfulKoala55> When I add the extra index URL, it gives a certificate error, and I am not sure where to configure all these settings in the agent settings.
@<1523701087100473344:profile|SuccessfulKoala55> Thanks a lot! It's fixed after I redeployed the container. Could you please help me fix clearml-session? I am running the command clearml-session --jupyter-lab but getting the below error: "Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Creating new session
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'A...
@<1523701087100473344:profile|SuccessfulKoala55> Thanks, I will try it and let you know. I have one more question: I have installed the latest version of ClearML server and now I see the issue with urllib3 v2, which will be fixed next week with the new releases. How can I install an older version with the Helm chart that is stable and working?
@<1523701087100473344:profile|SuccessfulKoala55> Yes, we have a load balancer which provides the IP for ClearML server, and it works for all operations such as normal task creation and remote training; only clearml-session is not working.
@<1523701087100473344:profile|SuccessfulKoala55> Agent is running outside Kubernetes on a standalone VM running Ubuntu 22.04
@<1523701087100473344:profile|SuccessfulKoala55> How can I install the latest one? Do you have a link to refer to?
The script I am running is hello.py, with the code "from clearml import Task
task = Task.init(project_name="mlops", task_name="Say Hellow")
task.execute_remotely(queue_name="P2000")
print("Hello")". Console output: "clearml-session --jupyter-lab true --queue P2000 --base-task-id=515159dab92d4baabcb6b3647263a144
clearml-session - CLI for launching JupyterLab / VSCode on a remote machine
Verifying credentials
Use previous queue (resource) 'P2000' [Y]/n? Y
Interactive session config:
{
"base_task_...
@<1537605940121964544:profile|EnthusiasticShrimp49> Yes