poking around a little bit, and clearml.backend_interface.task.repo.scriptinfo.ScriptInfo._get_jupyter_notebook_filename()
returns None
but the only exception handler is for requests.exceptions.SSLError
Hi @<1532532498972545024:profile|LittleReindeer37>
Yes you are correct it should capture the entire jupyter notebook in sagemaker studio.
Just verifying this is the use case, correct ?
and cat /var/log/studio/kernel_gateway.log | grep ipynb
comes up empty
the server_info
is
[{'base_url': '/jupyter/default/',
'hostname': '0.0.0.0',
'password': False,
'pid': 9,
'port': 8888,
'root_dir': '/home/sagemaker-user',
'secure': False,
'sock': '',
'token': '',
'url': '
',
'version': '1.23.2'}]
As another test I ran Jupyter Lab locally using the same custom Docker container that we're using for Sagemaker Studio, and it works great there, just like the native local Jupyter Lab. So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.
Try to add here:
None
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/"
weird that it won't return that single session
Hmm and you are getting empty list for thi one:
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/"
but the call to jupyter_server.serverapp.list_running_servers()
does return the server
which I looked at previously to see if I could import sagemaker.kg or kernelgateway or something, but no luck
At the top there should be the URL of the notebook (I think)
@<1532532498972545024:profile|LittleReindeer37> nice!!! 😍
Do you want to PR? it will be relatively easy to merge and test, and I think that they might even push it to the next version (or worst case quick RC)
curious whether it impacts anything besides sagemaker. I'm thinking it's generically a kernel gateway issue, but I'm not sure if other platforms are using that yet
and that requests.get()
throws an exception:
ConnectionError: HTTPConnectionPool(host='default', port=8888): Max retries exceeded with url: /jupyter/default/api/sessions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7ba9cadc30>: Failed to establish a new connection: [Errno -2] Name or service not known'))
This is very odd ... let me check something
This is strange, let me see if we can get around it, because I'm sure it worked 🙂
as best I can tell it'll only have one .ipynb in $HOME
with this setup, which may work...
if I instead change the request url to f"http://{server_info['hostname']}:{server_info['port']}/api/sessions"
then it gets a 200 response... however , the response is an empty list
Just ran the same notebook in a local Jupyter Lab session and it worked as I expected it might, saving a copy to Artifacts
I think it just ends up in /home/sagemaker-user/{notebook}.ipynb
every time
but one possible workaround is to try to figure out if it's running in a gateway and then find the only notebook running on that server
and this
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/{server_info['base_url']}/"
Hi @<1532532498972545024:profile|LittleReindeer37> @<1523701205467926528:profile|AgitatedDove14>
I got the session with a bit of "hacking".
See this script:
import boto3, requests, json
from urllib.parse import urlparse
def get_notebook_data():
log_path = "/opt/ml/metadata/resource-metadata.json"
with open(log_path, "r") as logs:
_logs = json.load(logs)
return _logs
notebook_data = get_notebook_data()
client = boto3.client("sagemaker")
response = client.create_presigned_domain_url(
DomainId=notebook_data["DomainId"],
UserProfileName=notebook_data["UserProfileName"]
)
authorized_url = response["AuthorizedUrl"]
authorized_url_parsed = urlparse(authorized_url)
unauthorized_url = authorized_url_parsed.scheme + "://" + authorized_url_parsed.netloc
with requests.Session() as s:
s.get(authorized_url)
print(s.get(unauthorized_url + "/jupyter/default/api/sessions").content)
Basically, we can get the session directly from AWS, but we need to be authenticated.
One way I found was to create a presigned url through boto3, by getting the domain id and profile name from a resoure-metadata file that is found on the machine None .
Then use that to get the session...
Maybe there are some other ways to do this (safer), but this is a good start. We know it's possible
print(requests.get(url='
print(requests.get(url='