What Sort Of Integration Is Possible With ClearML And SageMaker?

Answered

What sort of integration is possible with ClearML and SageMaker? On the page describing ClearML Remote it says:

Create a remote development environment (e.g. AWS SageMaker, GCP CoLab, etc.) on any on-prem machine or any cloud.

But the only mention of SageMaker I see in the docs is the release notes for 0.13 saying "Add support for SageMaker".

I have SageMaker Studio up and running with access to my ClearML server and it's successfully able to log plots and scalars from experiments, but in terms of code it just logs the code used to launch the kernel:

"""Entry point for launching an IPython kernel.
This is separate from the ipykernel package so we can avoid doing imports until
after removing the cwd from sys.path.
"""
import sys

if __name__ == '__main__':
    # Remove the CWD from sys.path while we load stuff.
    # This is added back by InteractiveShellApp.init_path()
    if sys.path[0] == '':
        del sys.path[0]
    from ipykernel import kernelapp as app
    app.launch_new_instance()

Is it possible to capture more than that while using SageMaker?

  				
Posted 2 years ago by LittleReindeer37


Answers 77

What do you have in "server_info['url']" ?

  				
Posted 2 years ago by AgitatedDove14

so my reading of the jupyter-kernel-gateway docs is that each session is containerized, so each notebook "session" is totally isolated

  				
Posted 2 years ago by LittleReindeer37

I can get it to run up to here: None

  				
Posted 2 years ago by LittleReindeer37

but r.json() is an empty list

  				
Posted 2 years ago by LittleReindeer37

if I instead change the request url to f"http://{server_info['hostname']}:{server_info['port']}/api/sessions" then it gets a 200 response... however, the response is an empty list

  				
Posted 2 years ago by LittleReindeer37
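
For readers following along, the lookup being debugged in this thread can be sketched roughly as follows. This is not ClearML's actual code, and all three helper names are hypothetical. It assumes the Jupyter sessions endpoint returns a list of session objects that each carry a kernel ID plus a notebook path, and that the kernel's connection file is named kernel-<id>.json.

```python
import re
from typing import Optional
from urllib.parse import urljoin


def kernel_id_from_connection_file(path: str) -> Optional[str]:
    """Extract the kernel ID from a file name such as 'kernel-<id>.json'."""
    match = re.search(r"kernel-(.+)\.json$", path)
    return match.group(1) if match else None


def sessions_url(server_info: dict) -> str:
    """Build the api/sessions URL from a running-server record's 'url' field."""
    base = server_info["url"]
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, "api/sessions")


def notebook_path_for_kernel(sessions: list, kernel_id: str) -> Optional[str]:
    """Return the notebook path of the session that owns kernel_id, if any."""
    for session in sessions:
        kernel = session.get("kernel") or {}
        if kernel.get("id") == kernel_id:
            # Assumption: newer servers expose 'path' at the top level,
            # older ones nest it under a 'notebook' object.
            legacy = session.get("notebook") or {}
            return session.get("path") or legacy.get("path")
    return None
```

In a live kernel the inputs would come from ipykernel.connect.get_connection_file() and jupyter_server.serverapp.list_running_servers(). When the sessions list is empty, as reported here for SageMaker Studio, notebook_path_for_kernel returns None and the notebook cannot be identified this way.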

so notebooks ends up empty

  				
Posted 2 years ago by LittleReindeer37

api/kernels does report back the active kernel, but doesn't give notebook paths or anything

  				
Posted 2 years ago by LittleReindeer37

We will add this to the SDK soon

  				
Posted 2 years ago by SmugDolphin23

and that requests.get() throws an exception:

ConnectionError: HTTPConnectionPool(host='default', port=8888): Max retries exceeded with url: /jupyter/default/api/sessions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7ba9cadc30>: Failed to establish a new connection: [Errno -2] Name or service not known'))

  				
Posted 2 years ago by LittleReindeer37

but the call to jupyter_server.serverapp.list_running_servers() does return the server

  				
Posted 2 years ago by LittleReindeer37

if I change it to 0.0.0.0 it works

  				
Posted 2 years ago by LittleReindeer37
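
One way to picture the workaround described in this thread (swapping the unresolvable host name for 0.0.0.0) is a small fallback loop. The sketch below is only an illustration: candidate_urls and first_successful are hypothetical names, not part of ClearML, and the fetch callable stands in for whatever performs the HTTP request.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(url, fallback_hosts=("0.0.0.0",)):
    """Return the original URL followed by copies with the host swapped out."""
    parts = urlsplit(url)
    urls = [url]
    for host in fallback_hosts:
        netloc = f"{host}:{parts.port}" if parts.port else host
        swapped = urlunsplit(parts._replace(netloc=netloc))
        if swapped not in urls:
            urls.append(swapped)
    return urls


def first_successful(urls, fetch):
    """Call fetch(url) on each URL in order and return (url, result) for the first success."""
    last_error = None
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as error:
            # requests' ConnectionError and SSLError both derive from OSError
            # (via RequestException), so this catches more than SSL failures.
            last_error = error
    raise ConnectionError(f"no candidate URL was reachable: {urls}") from last_error
```

With requests installed, fetch could be something like lambda u: requests.get(u).json(). Catching the broader OSError is a design choice for this sketch; a real fix might prefer to catch requests.exceptions.ConnectionError explicitly.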

curious whether it impacts anything besides sagemaker. I'm thinking it's generically a kernel gateway issue, but I'm not sure if other platforms are using that yet

  				
Posted 2 years ago by LittleReindeer37

environ{'PYTHONNOUSERSITE': '0',
        'HOSTNAME': 'gfp-science-ml-t3-medium-d579233e8c4b53bc5ad626f2b385',
        'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/_sagemaker-instance-credentials/xxx',
        'JUPYTER_PATH': '/usr/share/jupyter/',
        'SAGEMAKER_LOG_FILE': '/var/log/studio/kernel_gateway.log',
        'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tmp/miniconda3/condabin:/tmp/anaconda3/condabin:/tmp/miniconda2/condabin:/tmp/anaconda2/condabin',
        'REGION_NAME': 'us-east-1',
        'AWS_INTERNAL_IMAGE_OWNER': 'Custom',
        'AWS_DEFAULT_REGION': 'us-east-1',
        'PWD': '/home/sagemaker-user',
        'AWS_REGION': 'us-east-1',
        'SHLVL': '1',
        'HOME': '/home/sagemaker-user',
        'AWS_SAGEMAKER_PYTHONNOUSERSITE': '0',
        'AWS_ACCOUNT_ID': 'xxx',
        '_': '/opt/.sagemakerinternal/conda/bin/jupyter-kernelgateway',
        'LC_CTYPE': 'C.UTF-8',
        'KERNEL_LAUNCH_TIMEOUT': '40',
        'KERNEL_WORKING_PATH': '',
        'KERNEL_GATEWAY': '1',
        'JPY_PARENT_PID': '9',
        'PYDEVD_USE_FRAME_EVAL': 'NO',
        'TERM': 'xterm-color',
        'CLICOLOR': '1',
        'FORCE_COLOR': '1',
        'CLICOLOR_FORCE': '1',
        'PAGER': 'cat',
        'GIT_PAGER': 'cat',
        'MPLBACKEND': '

_inline'}

  				
Posted 2 years ago by LittleReindeer37
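
Going only by the environment dump above, a kernel started through a kernel gateway might be recognised with a heuristic like the one below. The function name is hypothetical, and treating KERNEL_GATEWAY and the "_" variable as reliable signals is an assumption drawn from this single dump, not documented behaviour.

```python
def looks_like_kernel_gateway(env) -> bool:
    """Guess whether the process was launched by jupyter-kernelgateway."""
    # Assumption: 'KERNEL_GATEWAY' == '1' and a launcher path ending in
    # 'jupyter-kernelgateway' are taken from the dump in this thread only.
    if env.get("KERNEL_GATEWAY") == "1":
        return True
    return "jupyter-kernelgateway" in env.get("_", "")
```

Inside a notebook this would be called as looks_like_kernel_gateway(os.environ) after importing os.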

LittleReindeer37 nice!!! 😍
Do you want to PR? it will be relatively easy to merge and test, and I think that they might even push it to the next version (or worst case quick RC)

  				
Posted 2 years ago by AgitatedDove14

but maybe that doesn't matter, actually - it might be one session per host I guess

  				
Posted 2 years ago by LittleReindeer37

local Jupyter Lab:

  				
Posted 2 years ago by LittleReindeer37

"right now I can't figure out how to get the session in order to get the notebook path"

You mean the code that fires "HTTPConnectionPool"?

  				
Posted 2 years ago by AgitatedDove14

right now I can't figure out how to get the session in order to get the notebook path

  				
Posted 2 years ago by LittleReindeer37

which I looked at previously to see if I could import sagemaker.kg or kernelgateway or something, but no luck

  				
Posted 2 years ago by LittleReindeer37

Hmm what do you have here?

os.system("cat /var/log/studio/kernel_gateway.log")

  				
Posted 2 years ago by AgitatedDove14

lots of things like {"__timestamp__": "2023-02-23T23:49:23.285946Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0007679462432861328, "method": "GET", "uri": "/api/kernels/6ba227af-ff2c-4b20-89ac-86dcac95e2b2", "status": 200}

  				
Posted 2 years ago by LittleReindeer37
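
Since each entry in that log appears to be a single JSON object, the kernel IDs it mentions could be pulled out with a short parser. This is a sketch under the assumption that every line follows the shape of the sample above; kernel_ids_from_log is a hypothetical helper, and the log carries kernel IDs only, not notebook paths.

```python
import json
import re

# Matches URIs such as /api/kernels/6ba227af-ff2c-4b20-89ac-86dcac95e2b2
KERNEL_URI = re.compile(r"^/api/kernels/([0-9a-f-]{36})")


def kernel_ids_from_log(lines):
    """Collect unique kernel IDs, in first-seen order, from JSON log lines."""
    seen = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(record, dict):
            continue
        match = KERNEL_URI.match(str(record.get("uri", "")))
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

It could be fed with open("/var/log/studio/kernel_gateway.log"), the path shown in this thread.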

but the only exception handler is for requests.exceptions.SSLError

  				
Posted 2 years ago by LittleReindeer37

the problem is here: None

  				
Posted 2 years ago by LittleReindeer37

print(os.environ)

  				
Posted 2 years ago by AgitatedDove14

SageMaker Studio:

  				
Posted 2 years ago by LittleReindeer37

  				

Yes, I'm running a notebook in Studio. Where should it be captured?

  				
Posted 2 years ago by LittleReindeer37

if there are any tests/debugging you'd like me to try, just let me know

  				
Posted 2 years ago by LittleReindeer37

This is very odd ... let me check something

  				
Posted 2 years ago by AgitatedDove14

What happens when you call:

from clearml.backend_interface.task.repo import ScriptInfo

print(ScriptInfo._ScriptInfo__legacy_jupyter_notebook_server_json_parsing(None))

  				
Posted 2 years ago by AgitatedDove14
