@<1523701087100473344:profile|SuccessfulKoala55> Is it a fix for the error below, which we are getting with the new version of ClearML server? "Starting Task Execution:
Traceback (most recent call last):
File "/home/admin/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/clearml/utilities/requests_toolbelt/_compat.py", line 48, in <module>
from requests.packages.urllib3.contrib import appengine as gaecontrib
ImportError: cannot import name 'appengine' from 'requests.packages.urllib3.contr...
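A later message in this thread attributes this import error to urllib3 v2, which no longer ships `urllib3.contrib.appengine`. The sketch below is one possible way to check whether the environment running the task has a 2.x urllib3 installed. The helper name `is_urllib3_v2_or_later` is hypothetical and not part of ClearML; the check uses only the standard library.

```python
from importlib import metadata


def is_urllib3_v2_or_later(version: str) -> bool:
    """Return True when the version string has a major version of 2 or more.

    Hypothetical helper for illustration: urllib3 2.x dropped
    urllib3.contrib.appengine, which the traceback above tries to import.
    """
    major = int(version.split(".")[0])
    return major >= 2


if __name__ == "__main__":
    try:
        installed = metadata.version("urllib3")
    except metadata.PackageNotFoundError:
        print("urllib3 is not installed in this environment")
    else:
        if is_urllib3_v2_or_later(installed):
            print(f"urllib3 {installed}: 2.x, the appengine import will fail")
        else:
            print(f"urllib3 {installed}: 1.x, the appengine import should work")
```

If the check reports a 2.x version, pinning `urllib3<2` in the task's requirements is a commonly used workaround until a fixed release is available; whether that is appropriate here is an assumption, not something confirmed in the thread.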
@<1523701087100473344:profile|SuccessfulKoala55> Do you think the YAML below is okay? "apiserver:
  image:
    registry: "harbor.example.com/projectname"
    repository: "allegroai/clearml"
    pullPolicy: IfNotPresent
    tag: "1.10.0-357"
  service:
    type: ClusterIP
  ingress:
    enabled: true
    hostName: "api.clearml.example.com"
fileserver:
  image:
    registry: " [harbor.example.com/projectname](http://harbo...
@<1523701827080556544:profile|JuicyFox94> Is tlsSecretName needed for the ClearML web server, API server, and file server, all of them? In the serving YAML? I am getting an error on pod clearml-serving-inference-6bdb9c757d-ww4vx for "/auth.login".
Yes, I just want to know where to provide the private registry name when deploying this Helm chart for ClearML server, as well as for its dependent charts such as Elasticsearch and MongoDB.
@<1537605940121964544:profile|EnthusiasticShrimp49> Yes
The script I am running is hello.py, with the code "from clearml import Task
task = Task.init(project_name="mlops", task_name="Say Hellow")
task.execute_remotely(queue_name="P2000")
print("Hello")"
Console output: "clearml-session --jupyter-lab true --queue P2000 --base-task-id=515159dab92d4baabcb6b3647263a144
clearml-session - CLI for launching JupyterLab / VSCode on a remote machine
Verifying credentials
Use previous queue (resource) 'P2000' [Y]/n? Y
Interactive session config:
{
"base_task_...
@<1523701087100473344:profile|SuccessfulKoala55> Thanks a lot, it worked! However, I am getting an error when I open the ClearML web application: Fetch tag failed "Error 0 : You can't write against a read only replica." Do you know if this is a known issue and whether a fix is available for it?
@<1523701087100473344:profile|SuccessfulKoala55> The agent is running outside Kubernetes on a standalone VM running Ubuntu 22.04.
@<1523701205467926528:profile|AgitatedDove14> Thanks for the quick reply. You are correct, the issue was resolved after removing https.
@<1523701087100473344:profile|SuccessfulKoala55> Yes, I am able to create a ClearML task and perform training from the same machine. The error only occurs when I start clearml-session. Do I need any special config in the clearml.conf file for clearml-session to work? Just to add: when I run the command below, it works and executes the task but does not give any interactive Jupyter or VS Code URL.
clearml-session --jupyter-lab true --queue P2000 --base-task-id=515159dab92d4baabcb6b3647263a144 , it runs the task...
@<1523701087100473344:profile|SuccessfulKoala55> Thanks. I will try it and let you know. I have one more question. I have installed the latest version of ClearML server and now I see an issue with urllib3 v2, which will be fixed next week with new releases. How can I install an older version with the Helm chart that is stable and working?
@<1523701087100473344:profile|SuccessfulKoala55> When I use Docker, I see it go out to the internet for NVIDIA, Ubuntu, and pip packages. I can fix pip via the above, but what about NVIDIA and Ubuntu?
@<1523701087100473344:profile|SuccessfulKoala55> Yes, we have a load balancer which provides an IP to the ClearML server, and it is working for all operations such as normal task creation and remote training; only clearml-session is not working.
@<1523701087100473344:profile|SuccessfulKoala55> It was blocked on the load balancer, and after allowing the traffic it is working. Thanks a lot!
@<1523701087100473344:profile|SuccessfulKoala55> Any idea why it goes to the internet (download.pytorch.org) only when I run training with the PyTorch framework?
@<1523701087100473344:profile|SuccessfulKoala55> Thanks a lot! It was fixed after I redeployed the container. Could you please help me fix clearml-session? I am running the command clearml-session --jupyter-lab but getting the error below: "Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Creating new session
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'A...
@<1523701087100473344:profile|SuccessfulKoala55> As I mentioned earlier, if I do not specify --base-task-id, the error is as below. @Jake, running the command clearml-session --jupyter-lab gives the error below: "Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Creating new session
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection w...
@<1523701087100473344:profile|SuccessfulKoala55> It's hosted on Kubernetes, behind the ingress controller. I used the Helm chart provided on the ClearML page with ingress set to true. I can access the web UI from the browser, and currently it is on HTTP only.
@<1523701087100473344:profile|SuccessfulKoala55> It's an on-prem server and a remote agent. Both the remote agent and my machine are on the same network, and I can SSH to the agent from my machine. Do we need to open ports other than SSH to make JupyterLab work, either from my computer to the agent or from the agent to the ClearML server?
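Since the question above is about which ports must be reachable, and a later message reports that traffic was blocked on the load balancer, a quick TCP reachability check can help narrow things down. This is a minimal sketch using only the Python standard library; the function name `can_connect` is hypothetical, and which hosts and ports actually matter for a given deployment is an assumption to verify against the ClearML documentation.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Any OSError (refused, reset, timed out, unresolvable name) is
    treated as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, running `can_connect("api.clearml.example.com", 80)` from both the local machine and the agent machine would show whether the API endpoint is reachable from each side; the host name here is only the placeholder used elsewhere in this thread. A successful TCP connection does not rule out a proxy or load balancer resetting the connection later, so this is a first check rather than a full diagnosis.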
@<1523701087100473344:profile|SuccessfulKoala55> When I add an extra index URL, it gives a certificate error, and I am not sure where to configure all these settings in the agent settings.
@<1523701087100473344:profile|SuccessfulKoala55> How can I install the latest one? Do you have a link to refer to?
@<1523701070390366208:profile|CostlyOstrich36> I am looking for the pod logs and the API server logs.
@<1523701087100473344:profile|SuccessfulKoala55> Sorry for the delayed reply. I have attached the logs; the issue only happens when doing ML training with PyTorch. Training with other frameworks, such as TensorFlow and sklearn, works fine.
@<1523701087100473344:profile|SuccessfulKoala55> Yes, this is the end of the logs and nothing happens after it. I am using this command to launch the agent: clearml-agent daemon --detached --gpu 0 --queue A40
@<1523701087100473344:profile|SuccessfulKoala55> It works once I allow traffic to download.pytorch.org from the proxy. 🙂
Yes, the machine is connected to the on-prem ClearML server.
@<1523701087100473344:profile|SuccessfulKoala55> After enabling debug mode, the logs are below. Just to let you know, this agent does not have internet access and pip packages are installed via a proxy, which is working, but for PyTorch it seems to be going to the internet: "DEBUG:urllib3.connectionpool: http://api.clearml.domain.com:80 "GET /v2.5/tasks.started HTTP/1.1" 200 353
Executing task id [d3807deae2644e00824e774ff8997eaa]:
repository =
branch =
version_num =
tag =
dock...