Answered
Clearml Agent Can Work On Gpu Machine In No Internet Environment Where We Have Proxy For Pip Packages And Ubuntu Updates ? If Yes, How To Configure These Proxy In Agent Settings. I See At Launch Of Script Saying "Hello" It Install Many Packages Befor

Can the ClearML agent work on a GPU machine in a no-internet environment where we have a proxy for pip packages and Ubuntu updates?

If yes, how do we configure these proxies in the agent settings?

I see that when launching a script that prints "Hello", it installs many packages first and only then prints "Hello".

  
  
Posted one year ago

Answers 27


The urllib3 issue is not related to the server; it's an SDK issue that was resolved in an RC and also in an official release last night.

  
  
Posted one year ago

pip install clearml==1.10.4

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> How can I install the latest one? Do you have a link I can refer to?

  
  
Posted one year ago

You'll have to set up the apt repository in the container - see here for example: None

  
  
Posted one year ago

An easier approach might be to inject something like the following into the docker container (using the init bash script), or to prepare it in the image in advance:

[global]
extra-index-url = 

cert = /path/to/my/bundle.pem

In /etc/pip.conf
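
As a rough illustration of the /etc/pip.conf suggestion above, here is a minimal Python sketch that writes such a file. The helper name write_pip_conf and the index URL in the usage lines are hypothetical (the original URL was not preserved in the post). Something like this could be run while preparing the image:

```python
import configparser
import tempfile
from pathlib import Path


def write_pip_conf(path, extra_index_url, cert_bundle):
    """Write a pip config file with a [global] section (hypothetical helper)."""
    # interpolation=None so that a '%' in a URL is written literally
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {
        "extra-index-url": extra_index_url,
        "cert": cert_bundle,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as fh:
        config.write(fh)
    return path


if __name__ == "__main__":
    # Assumption: placeholder values; in the container the target would be /etc/pip.conf
    target = Path(tempfile.gettempdir()) / "pip.conf.example"
    write_pip_conf(
        target,
        extra_index_url="https://pypi.example.internal/simple",  # hypothetical proxy index
        cert_bundle="/path/to/my/bundle.pem",
    )
    print(target.read_text())
```

The cert entry is what addresses certificate errors against an internal index, provided the bundle actually contains the CA that signed the proxy's certificate.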

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> When I add the extra index URL, it gives a certificate error, and I am not sure where to configure all of this in the agent settings.

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> When I use docker, I see it go out to the internet for NVIDIA, Ubuntu, and pip packages. I can fix pip via the above, but what about NVIDIA and Ubuntu?

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> Agent is running outside Kubernetes on a standalone VM running Ubuntu 22.04

  
  
Posted one year ago

And do you have any network proxy or load balancer or firewall between the client running clearml-session and the server?

  
  
Posted one year ago

And what are the details? (The task log)

  
  
Posted one year ago

This means the clearml-session client cannot reach the ClearML server - did you configure the clearml.conf file where you're running the clearml-session CLI?
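
As a first check for this kind of reachability problem, a small Python sketch like the one below can confirm whether the machine running the CLI can open a TCP connection to the API server address taken from clearml.conf. The function names and the URL in the usage lines are assumptions for illustration. A successful connection only rules out the most basic failure: a proxy or load balancer can still drop specific requests.

```python
import socket
from urllib.parse import urlparse


def host_port(url):
    """Split a URL into (host, port), defaulting to 443 for https and 80 otherwise."""
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"no host found in URL: {url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def can_reach(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: replace with the api_server value from your own clearml.conf
    api_server = "http://localhost:8008"
    print(api_server, "reachable:", can_reach(api_server, timeout=3.0))
```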

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> It was blocked on the load balancer, and after allowing the traffic it is working. Thanks a lot!

  
  
Posted one year ago

Looks like the Elasticsearch on your server has some issue, possibly with storage. Can you share the Elasticsearch logs?

  
  
Posted one year ago

Hi @<1562973095227035648:profile|ThoughtfulOctopus83> , if the agent can reach the ClearML Server, it should work. If you have a proxy for pip packages and Ubuntu updates, you'll need to configure an extra index URL for pip (using the agent.package_manager.extra_index_url setting, see here). If you're using the agent in docker mode, then it will try to install Ubuntu packages in the spawned docker container, so you will need to either use a docker image already set up with the proxy, or make sure the proxy is set up in the init bash script (under agent.docker_preprocess_bash_script).
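
For the Ubuntu side, a hedged sketch of what "setting up the proxy in the init bash script" could look like: the Python helper below only generates shell lines that point apt at an HTTP proxy, which could then be copied into the init script or baked into the image. The helper name, the proxy URL, and the 95proxy file name are assumptions for illustration, not values from this thread.

```python
import shlex


def apt_proxy_commands(proxy_url, conf_path="/etc/apt/apt.conf.d/95proxy"):
    """Return shell commands that configure apt to use the given proxy (hypothetical helper)."""
    apt_lines = [
        f'Acquire::http::Proxy "{proxy_url}";',
        f'Acquire::https::Proxy "{proxy_url}";',
    ]
    commands = []
    for index, line in enumerate(apt_lines):
        # The first line creates or overwrites the file, later lines append to it
        redirect = ">" if index == 0 else ">>"
        commands.append(f"echo {shlex.quote(line)} {redirect} {shlex.quote(conf_path)}")
    commands.append("apt-get update")
    return commands


if __name__ == "__main__":
    # Assumption: placeholder proxy address
    for command in apt_proxy_commands("http://proxy.example.internal:3128"):
        print(command)
```

The NVIDIA apt repositories are fetched by apt as well, so they would go through the same proxy setting, assuming the proxy allows or mirrors them.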

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> As I mentioned earlier, if I do not specify --base-task-id, then the error is as below. @Jake I run the command clearml-session --jupyter-lab but get the error below: "Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Creating new session
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit
Retrying (Retry(total=236, connect=240, read=236, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit
Retrying (Retry(total=235, connect=240, read=235, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit
Retrying (Retry(total=234, connect=240, read=234, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit"

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> Yes, I am able to create a ClearML task and perform training from the same machine. The error only occurs when I start clearml-session. Do I need a special config in the clearml.conf file for clearml-session to work? Just to add: when I run the command below, it works and executes the task, but it does not give any interactive Jupyter or code URL.
clearml-session --jupyter-lab true --queue P2000 --base-task-id=515159dab92d4baabcb6b3647263a144
It runs the task and at the end gives the error "ERROR: Remote setup failed (status=completed)"; see details:
clearml-session - CLI for launching JupyterLab / VSCode on a remote machine
Verifying credentials
Use previous queue (resource) 'P2000' [Y]/n? Y

Interactive session config:
{
"base_task_id": "515159dab92d4baabcb6b3647263a144",
"git_credentials": false,
"jupyter_lab": true,
"keepalive": false,
"password": "*********",
"queue": "P2000",
"remote_ssh_port": "22",
"username": "mlopsadmin",
"vscode_server": true
}

Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Cloning base session 515159dab92d4baabcb6b3647263a144
Configuring new session
New session created [id=f17d0e89a3ad43bf93e455a23109ccce]
Waiting for remote machine allocation [id=f17d0e89a3ad43bf93e455a23109ccce]
.Status [queued]
Remote machine allocated
Setting remote environment [Task id=f17d0e89a3ad43bf93e455a23109ccce]
Setup process details: None
Waiting for environment setup to complete [usually about 20-30 seconds, see last log line/s below]

task f17d0e89a3ad43bf93e455a23109ccce pulled from a3039785e5d54587a36a4af3e310bf73 by worker
Worker:gpu0

  • urllib3==2.0.2

Environment setup completed successfully

Starting Task Execution:

ClearML results page: None
Hello

Process completed successfully

ERROR: Remote setup failed (status=completed) see details: None

  
  
Posted one year ago

The script I am running is hello.py, with this code: "from clearml import Task

task = Task.init(project_name="mlops", task_name="Say Hellow")
task.execute_remotely(queue_name="P2000")
print("Hello")" and this console output: "clearml-session --jupyter-lab true --queue P2000 --base-task-id=515159dab92d4baabcb6b3647263a144
clearml-session - CLI for launching JupyterLab / VSCode on a remote machine
Verifying credentials
Use previous queue (resource) 'P2000' [Y]/n? Y

Interactive session config:
{
"base_task_id": "515159dab92d4baabcb6b3647263a144",
"git_credentials": false,
"jupyter_lab": true,
"keepalive": false,
"password": "************",
"queue": "P2000",
"remote_ssh_port": "22",
"username": "mlopsadmin",
"vscode_server": true
}

Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Cloning base session 515159dab92d4baabcb6b3647263a144
Configuring new session
New session created [id=0e9cd1cdbba44fad87e7742a7e25af8f]
Waiting for remote machine allocation [id=0e9cd1cdbba44fad87e7742a7e25af8f]
.Status [queued]
..Status [in_progress] - queued pulled by agent
Remote machine allocated
Setting remote environment [Task id=0e9cd1cdbba44fad87e7742a7e25af8f]
Setup process details: None
Waiting for environment setup to complete [usually about 20-30 seconds, see last log line/s below]

task 0e9cd1cdbba44fad87e7742a7e25af8f pulled from a3039785e5d54587a36a4af3e310bf73 by worker
WORKER:gpu0

  • urllib3==2.0.2

Environment setup completed successfully

Starting Task Execution:

ClearML results page: None
Hello

Process completed successfully

ERROR: Remote setup failed (status=completed) see details: None" and the task logs are attached.

  
  
Posted one year ago

Did you set up SSL termination on the server? How do you access the web UI?

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> Yes, we have a load balancer which provides the IP for the ClearML Server, and it works for all operations such as normal task creation and remote training. Only clearml-session is not working.

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> Thanks a lot, it worked! However, I am getting an error when I open the ClearML web application: Fetch tag failed "Error 0 : You can't write against a read only replica." Do you know if this is a known issue and whether a fix is available for it?

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> Thanks, I will try it and let you know. I have one more question. I have installed the latest version of the ClearML server and now I see an issue with urllib3 v2, which will be fixed next week with the new releases. How can I install an older version that is stable and working using the Helm chart?

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> It's an on-prem server and a remote agent. Both the remote agent and my machine are on the same network, and I can SSH to the agent from my machine. Do we need to open anything other than SSH to make JupyterLab work from my computer to the agent, or from the agent to the ClearML server?

  
  
Posted one year ago

I see, the issue is that you're using a --base-task-id, but the base task you're using is your own custom task, which does not have the interactive session settings in it. This is an advanced feature. If you want to use a base task, I'd recommend first starting without it, then examining the task created by the interactive session to figure out what exactly you need.

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> It's hosted on Kubernetes, behind the ingress controller. I used the Helm chart provided on the ClearML page with ingress set to true. I can access the web UI from the browser, and currently it is on HTTP only.

  
  
Posted one year ago

OK, so the server is hosted in k8s, where is the agent running?

  
  
Posted one year ago

Again, this is a network issue, it might have something to do with the different requests sent by the CLI when you use it this way (larger requests, with payloads, etc.) - are you using some proxy or is the server hosted on a cloud provider?

  
  
Posted one year ago

@<1523701087100473344:profile|SuccessfulKoala55> Thanks a lot! It was fixed after I redeployed the container. Could you please help me fix clearml-session? I am running the command clearml-session --jupyter-lab but getting the error below: "Launch interactive session [Y]/n? Y
Removing stale interactive sessions
Creating new session
Retrying (Retry(total=237, connect=240, read=237, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit
Retrying (Retry(total=236, connect=240, read=236, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit
Retrying (Retry(total=235, connect=240, read=235, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit
Retrying (Retry(total=234, connect=240, read=234, redirect=240, status=240)) after connection broken by
'ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))': /v2.23/tasks.edit"

  
  
Posted one year ago