
the problem is here: None
As another test I ran Jupyter Lab locally using the same custom Docker container that we're using for Sagemaker Studio, and it works great there, just like the native local Jupyter Lab. So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.
and that requests.get()
throws an exception:
ConnectionError: HTTPConnectionPool(host='default', port=8888): Max retries exceeded with url: /jupyter/default/api/sessions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7ba9cadc30>: Failed to establish a new connection: [Errno -2] Name or service not known'))
but the call to jupyter_server.serverapp.list_running_servers()
does return the server
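for reference, the kind of lookup that's failing is roughly this (just a sketch of the flow, not the actual SDK code):

import requests
from jupyter_server import serverapp

# rough sketch of the lookup being discussed: list_running_servers() yields
# dicts that include 'url' and 'token', and {url}api/sessions maps kernels
# to notebook paths
for server in serverapp.list_running_servers():
    # in Studio the reported url is http://default:8888/jupyter/default/,
    # and that 'default' hostname is what fails to resolve in the error above
    resp = requests.get(
        server["url"] + "api/sessions",
        params={"token": server.get("token", "")},
        timeout=5,
    )
    resp.raise_for_status()
    for session in resp.json():
        print(session.get("kernel", {}).get("id"), session.get("path"))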
I think it just ends up in /home/sagemaker-user/{notebook}.ipynb
every time
Just ran the same notebook in a local Jupyter Lab session and it worked as I expected it might, saving a copy to Artifacts
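in the meantime I could probably work around it in Studio by uploading the .ipynb myself - rough sketch, assuming the task is already initialized in the notebook, and the path below is just the guess from above:

from clearml import Task

# manual fallback: attach the notebook file as an artifact ourselves
task = Task.current_task()  # assumes Task.init() already ran in this notebook
# placeholder path - in Studio the file seems to land under /home/sagemaker-user/
task.upload_artifact(name="notebook copy", artifact_object="/home/sagemaker-user/my-notebook.ipynb")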
Yes, I'm running a notebook in Studio. Where should it be captured?
if there are any tests/debugging you'd like me to try, just let me know
seems like it's using None and that doesn't provide the normal api/sessions
endpoint - or, it does, but returns an empty list
As in, which tab when I'm viewing the Experiment should I see it on? Should it be code, an artifact, or something else?
and $QUEUE and $NUM_WORKERS are particular to my setup, but they just give the name of the queue and how many copies of the agent to run
I'm doing that and it's working well
the key point is you just loop through the number of workers, set a unique CLEARML_WORKER_ID for each, and then run it in the background
the CLEARML_*
variables are all explained here: None
here's my script:
#!/bin/bash
echo "******************** Starting Agent ********************"
echo "******************** Getting ENV Variables ********************"
source /etc/profile.d/env-vars.sh
# test that we can access the API
echo "******************** Waiting for ${CLEARML_API_HOST} connectivity ********************"
curl --retry 10 --retry-delay 10 --retry-connrefused ${CLEARML_API_HOST}/debug.ping
# start the agent
for i in $(seq 1 ${NUM_WORKERS})
do
export CLEARML_WORK...
cool, thanks! the first one was what I had thought of but seemed unpythonic, so I'll give the second a shot
I could just loop through and create separate pipelines with different parameters, but that seems sort of inefficient. The hyperparameter optimization might actually work in this case using grid search, but it seems like kind of a hack
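to be concrete, the loop I had in mind is something like this (a sketch - the project/task/queue names and the Args/dataset key are placeholders):

from clearml import Task

# clone a template pipeline task once per parameter value and enqueue each copy
template = Task.get_task(project_name="my-project", task_name="my-pipeline")

for value in ["A", "B", "C"]:
    run = Task.clone(source_task=template, name=f"my-pipeline {value}")
    run.set_parameters({"Args/dataset": value})  # hypothetical parameter name
    Task.enqueue(run, queue_name="services")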
I'm not sure if Subprojects will work for that - can you use the Web UI to compare the artifacts from two separate subprojects?
but the only exception handler is for requests.exceptions.SSLError
api/kernels
does report back the active kernel, but doesn't give notebook paths or anything
one possibility for getting the notebook filepath is finding and parsing /home/sagemaker-user/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace
I think, but I don't know if I can tie that to a specific session
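something like this is what I had in mind for the workspace-file route (a sketch - I'm assuming open notebooks show up as "notebook:<path>" keys under "data", and the glob pattern is a guess):

import glob
import json

# scan JupyterLab workspace files for open notebook paths
for ws_path in glob.glob("/home/sagemaker-user/.jupyter/lab/workspaces/*.jupyterlab-workspace"):
    with open(ws_path) as f:
        workspace = json.load(f)
    for key in workspace.get("data", {}):
        # open documents appear as keys like "notebook:subdir/name.ipynb"
        if key.startswith("notebook:"):
            print(ws_path, "->", key.split(":", 1)[1])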
I will once I figure out the fix!
awesome, I'll test it out - thanks for the tips!
those look like linear DAGs to me, but maybe I'm missing something. I'm thinking something like the map operator in Prefect where I can provide an array of ["A", "B", "C"]
and run the steps outlined with dotted lines independently, with each of those as arguments
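to make the map idea concrete, I'm picturing something like this (a sketch - the project/task names, the Args/variant key, and the queue are placeholders):

from clearml import PipelineController

pipe = PipelineController(name="fanout-example", project="my-project", version="1.0")

# shared upstream step
pipe.add_step(
    name="prepare_data",
    base_task_project="my-project",
    base_task_name="prepare data",
)

# one copy of the dotted-line step per item, run independently
for item in ["A", "B", "C"]:
    pipe.add_step(
        name=f"train_{item}",
        parents=["prepare_data"],
        base_task_project="my-project",
        base_task_name="train model",
        parameter_override={"Args/variant": item},
    )

pipe.start(queue="services")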
and is there any way to capture hydra from a notebook as a Configuration? you don't use the typical @hydra.main()
but rather call the compose API, and so far in my testing that doesn't capture the OmegaConf in ClearML
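for what it's worth, the notebook pattern is basically this, with a manual connect_configuration() call as my current workaround to get the composed config into the task (sketch - the conf path, config name, and override are placeholders):

from clearml import Task
from hydra import compose, initialize
from omegaconf import OmegaConf

task = Task.init(project_name="my-project", task_name="notebook-run")  # placeholders

# compose the config the notebook way, without the @hydra.main() entry point
with initialize(config_path="conf", version_base=None):
    cfg = compose(config_name="config", overrides=["model=baseline"])

# manual workaround: hand the composed OmegaConf to ClearML explicitly
task.connect_configuration(OmegaConf.to_container(cfg, resolve=True), name="OmegaConf")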
But we're also testing out new models all the time, which are typically implemented as git branches - they run on the same set of inputs but don't output their results into production