What Sort Of Integration Is Possible With ClearML And SageMaker?

What sort of integration is possible with ClearML and SageMaker? On the page describing ClearML Remote it says:

Create a remote development environment (e.g. AWS SageMaker, GCP CoLab, etc.) on any on-prem machine or any cloud.

But the only mention of SageMaker I see in the docs is the release notes for 0.13 saying "Add support for SageMaker".

I have SageMaker Studio up and running with access to my ClearML server and it's successfully able to log plots and scalars from experiments, but in terms of code it just logs the code used to launch the kernel:

"""Entry point for launching an IPython kernel.
This is separate from the ipykernel package so we can avoid doing imports until
after removing the cwd from sys.path.
"""
import sys

if __name__ == '__main__':
    # Remove the CWD from sys.path while we load stuff.
    # This is added back by InteractiveShellApp.init_path()
    if sys.path[0] == '':
        del sys.path[0]
    from ipykernel import kernelapp as app
    app.launch_new_instance()

Is it possible to capture more than that while using SageMaker?

  
  
Posted 2 years ago

Answers 77


so notebook path is empty

  
  
Posted 2 years ago

the server_info is

[{'base_url': '/jupyter/default/',
  'hostname': '0.0.0.0',
  'password': False,
  'pid': 9,
  'port': 8888,
  'root_dir': '/home/sagemaker-user',
  'secure': False,
  'sock': '',
  'token': '',
  'url': '
',
  'version': '1.23.2'}]
  
  
Posted 2 years ago

Just ran the same notebook in a local Jupyter Lab session and it worked as I expected it might, saving a copy to Artifacts

  
  
Posted 2 years ago

if I use the same kernel there'll be two

  
  
Posted 2 years ago

yeah, even then it'll run but return 0 notebooks

  
  
Posted 2 years ago

it does return kernels, just not sessions

  
  
Posted 2 years ago

As another test I ran Jupyter Lab locally using the same custom Docker container that we're using for Sagemaker Studio, and it works great there, just like the native local Jupyter Lab. So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.

  
  
Posted 2 years ago

seems like it's using None and that doesn't provide the normal api/sessions endpoint - or, it does, but returns an empty list

  
  
Posted 2 years ago

so my reading of the jupyter-kernel-gateway docs is that each session is containerized, so each notebook "session" is totally isolated

  
  
Posted 2 years ago

still empty

  
  
Posted 2 years ago

so notebooks ends up empty

  
  
Posted 2 years ago

poking around a little bit, and clearml.backend_interface.task.repo.scriptinfo.ScriptInfo._get_jupyter_notebook_filename() returns None

  
  
Posted 2 years ago

This is very odd ... let me check something

  
  
Posted 2 years ago

but maybe that doesn't matter, actually - it might be one session per host I guess

  
  
Posted 2 years ago

but the call to jupyter_server.serverapp.list_running_servers() does return the server
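
For anyone who wants to reproduce the check, here is a minimal sketch of turning one entry from list_running_servers() into a sessions request. The helper names `sessions_endpoint` and `fetch_sessions` are made up for illustration and are not part of ClearML or Jupyter; the dictionary keys are the ones shown in the server_info output in this thread, and passing the token as a `token` query parameter is the usual Jupyter convention.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def sessions_endpoint(server: dict) -> str:
    """Build the api/sessions URL from one list_running_servers() entry (hypothetical helper)."""
    scheme = "https" if server.get("secure") else "http"
    base = server.get("base_url", "/")
    if not base.endswith("/"):
        base += "/"
    url = f"{scheme}://{server['hostname']}:{server['port']}{base}api/sessions"
    token = server.get("token")
    # Only append the token when the server actually reports one
    return f"{url}?{urlencode({'token': token})}" if token else url


def fetch_sessions(server: dict, timeout: float = 5.0) -> list:
    """GET the sessions list; an empty list means there is nothing to match a notebook against."""
    with urlopen(sessions_endpoint(server), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In Studio this is the request that comes back as an empty list, so the sketch only helps confirm the symptom; it does not fix it.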

  
  
Posted 2 years ago

Hi @LittleReindeer37 @AgitatedDove14
I got the session with a bit of "hacking".
See this script:

import json
from urllib.parse import urlparse

import boto3
import requests

def get_notebook_data():
    # Studio metadata file holding the domain ID and user profile name
    log_path = "/opt/ml/metadata/resource-metadata.json"
    with open(log_path, "r") as logs:
        _logs = json.load(logs)
    return _logs

notebook_data = get_notebook_data()
client = boto3.client("sagemaker")
# Ask SageMaker for a presigned URL for this domain and user profile
response = client.create_presigned_domain_url(
    DomainId=notebook_data["DomainId"],
    UserProfileName=notebook_data["UserProfileName"]
)
authorized_url = response["AuthorizedUrl"]
authorized_url_parsed = urlparse(authorized_url)
# Keep only scheme and host, so the Jupyter API path can be appended
unauthorized_url = authorized_url_parsed.scheme + "://" + authorized_url_parsed.netloc
with requests.Session() as s:
    # Open the presigned URL first so later requests on this session are authenticated
    s.get(authorized_url)
    print(s.get(unauthorized_url + "/jupyter/default/api/sessions").content)

Basically, we can get the session directly from AWS, but we need to be authenticated.
One way I found was to create a presigned URL through boto3, using the domain ID and user profile name from the resource-metadata file that is found on the machine.
Then use that to get the session...
Maybe there are other (safer) ways to do this, but this is a good start. We know it's possible.
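
Once the sessions response is available, the notebook path still has to be matched to the running kernel. A rough sketch of that step, assuming the standard Jupyter sessions format (a list of objects with a `path` and a nested `kernel.id`) and assuming the kernel ID the kernel sees locally is the same one the server reports, which this thread does not confirm for Studio. Both helper names are hypothetical.

```python
import json
import os
from typing import Optional


def kernel_id_from_connection_file(connection_file: str) -> str:
    """Extract the kernel id from a connection file named like 'kernel-<id>.json'."""
    stem = os.path.splitext(os.path.basename(connection_file))[0]
    return stem.split("-", 1)[1] if stem.startswith("kernel-") else stem


def notebook_path_for_kernel(sessions_json: bytes, kernel_id: str) -> Optional[str]:
    """Return the notebook path of the session that owns kernel_id, or None if no session matches."""
    for session in json.loads(sessions_json):
        if session.get("kernel", {}).get("id") == kernel_id:
            # Older servers nest the path under "notebook"; newer ones expose it at the top level
            return session.get("path") or session.get("notebook", {}).get("path")
    return None
```

The bytes printed by the script above could be passed straight in as `sessions_json`; the connection file name would typically come from `ipykernel.connect.get_connection_file()` inside the running kernel.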

  
  
Posted 2 years ago

image

  
  
Posted 2 years ago

sh-4.2$ cat /var/log/studio/kernel_gateway.log | head -n10
{"__timestamp__": "2023-02-23T21:48:28.036559Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0012829303741455078, "method": "GET", "uri": "/api", "status": 200}
{"__timestamp__": "2023-02-23T21:48:39.111068Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0012879371643066406, "method": "GET", "uri": "/api/kernels", "status": 200}
{"__timestamp__": "2023-02-23T21:48:39.116324Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0007715225219726562, "method": "GET", "uri": "/api/terminals", "status": 200}
{"__timestamp__": "2023-02-23T21:48:39.272822Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0007491111755371094, "method": "GET", "uri": "/api/terminals", "status": 200}
{"__timestamp__": "2023-02-23T21:48:43.000795Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 2.539133071899414, "method": "POST", "uri": "/api/kernels", "status": 201}
{"__timestamp__": "2023-02-23T21:48:43.073568Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0013430118560791016, "method": "GET", "uri": "/api/kernels/6ba227af-ff2c-4b20-89ac-86dcac95e2b2", "status": 200}
{"__timestamp__": "2023-02-23T21:48:43.469751Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0013761520385742188, "method": "GET", "uri": "/api/kernels/6ba227af-ff2c-4b20-89ac-86dcac95e2b2", "status": 200}
{"__timestamp__": "2023-02-23T21:48:43.702549Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0013780593872070312, "method": "GET", "uri": "/api/kernels/6ba227af-ff2c-4b20-89ac-86dcac95e2b2", "status": 200}
{"__timestamp__": "2023-02-23T21:48:43.986808Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0007445812225341797, "method": "GET", "uri": "/api/kernels/6ba227af-ff2c-4b20-89ac-86dcac95e2b2", "status": 200}
{"__timestamp__": "2023-02-23T21:48:43.992860Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.001028299331665039, "method": "GET", "uri": "/api/kernels", "status": 200}
  
  
Posted 2 years ago

right now I can't figure out how to get the session in order to get the notebook path

you mean the code that fires "HTTPConnectionPool" ?

  
  
Posted 2 years ago

sadly no

  
  
Posted 2 years ago

and the only calls to "uri": "/api/sessions" are the ones I made during testing - sagemaker doesn't seem to ever call that itself
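
A quick way to double-check this is to tally the gateway log by method and URI. This is only a sketch: the `method` and `uri` fields are the ones visible in the kernel_gateway.log excerpt in this thread, and `count_requests` is a made-up helper name.

```python
import json
from collections import Counter


def count_requests(lines):
    """Count (method, uri) pairs in JSON-lines log output, skipping blank or malformed lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON record, ignore it
        counts[(record.get("method"), record.get("uri"))] += 1
    return counts


# Example usage on the Studio host (path taken from the log excerpt in this thread):
# with open("/var/log/studio/kernel_gateway.log") as f:
#     for (method, uri), n in count_requests(f).most_common():
#         print(n, method, uri)
```

If `/api/sessions` only shows up with the counts from manual testing, that supports the idea that Studio never calls it on its own.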

  
  
Posted 2 years ago

if I change it to 0.0.0.0 it works

  
  
Posted 2 years ago

So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.

Yeah, I think that for some reason it fails to detect that this is actually a Jupyter notebook (not really sure why). Thank you for double checking on the container!!

  
  
Posted 2 years ago

but the only exception handler is for requests.exceptions.SSLError

  
  
Posted 2 years ago

right now I can't figure out how to get the session in order to get the notebook path

  
  
Posted 2 years ago

which I looked at previously to see if I could import sagemaker.kg or kernelgateway or something, but no luck

  
  
Posted 2 years ago

We will add this to the SDK soon

  
  
Posted 2 years ago

I'm thinking it's generically a kernel gateway issue, but I'm not sure if other platforms are using that yet

The odd thing is that you can access the notebook, but it returns zero kernels ..

  
  
Posted 2 years ago

Yep I think you are correct, you should have had the same output as a local jupyter notebook, and it seems that in sagemaker studio it is not working 😞
Let me check something

  
  
Posted 2 years ago

I additionally tried using a SageMaker Notebook instance, to see if it was the kernel dockerization that Studio uses that was messing things up. But it seems to actually log less information from a Notebook instance than from Studio.

  
  
Posted 2 years ago