and $QUEUE and $NUM_WORKERS are particular to my setup, but they just give the name of the queue and how many copies of the agent to run
cool, thanks! the first one was what I had thought of but seemed unpythonic, so I'll give the second a shot
those look like linear DAGs to me, but maybe I'm missing something. I'm thinking of something like the map operator in Prefect, where I can provide an array like ["A", "B", "C"] and run the steps outlined with dotted lines independently for each of those as arguments
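for reference, the Prefect pattern I mean looks roughly like this (just a sketch assuming Prefect 2; the task/flow names are made up):

from prefect import flow, task

@task
def run_forecast(business: str) -> str:
    # per-element step that should run independently for each input
    return f"forecast for {business}"

@flow
def forecast_all():
    # .map fans the task out over the list, one run per element
    run_forecast.map(["A", "B", "C"])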
I'm not sure if Subprojects will work for that - can you use the Web UI to compare the artifacts from two separate subprojects?
So I'm thinking maybe a Project for each thing we're forecasting, and then new Tasks for each time we run it
My use case is running forecasting models in production across multiple businesses
my reading of the jupyter-kernel-gateway docs is that each session is containerized, so each notebook "session" is totally isolated
I think it just ends up in /home/sagemaker-user/{notebook}.ipynb every time
environ{'PYTHONNOUSERSITE': '0',
'HOSTNAME': 'gfp-science-ml-t3-medium-d579233e8c4b53bc5ad626f2b385',
'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/_sagemaker-instance-credentials/xxx',
'JUPYTER_PATH': '/usr/share/jupyter/',
'SAGEMAKER_LOG_FILE': '/var/log/studio/kernel_gateway.log',
'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tmp/miniconda3/condabin:/tmp/anaconda3/condabin:/tmp/miniconda2/condabin:/tmp/anaconda2/condabin'...
I could just loop through and create separate pipelines with different parameters, but that seems sort of inefficient. the hyperparameter optimization might actually work in this case using grid search, but it seems like kind of a hack
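the loop I'm picturing is roughly this (sketch only - the project/task names, parameter key, and queue are placeholders):

from clearml.automation import PipelineController

for business in ["A", "B", "C"]:
    pipe = PipelineController(
        name=f"forecast-{business}",
        project="Forecasting",
        version="1.0",
    )
    # expose the per-business value as a pipeline parameter
    pipe.add_parameter(name="business", default=business)
    pipe.add_step(
        name="train",
        base_task_project="Forecasting",
        base_task_name="train template",
        parameter_override={"General/business": "${pipeline.business}"},
    )
    pipe.start(queue="default")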
right now I'm doing
import hydra
import hydra.core.global_hydra

# clear existing configs in case we're rerunning the notebook
if hydra.core.global_hydra.GlobalHydra.instance().is_initialized():
    hydra.core.global_hydra.GlobalHydra.instance().clear()

# initialize Hydra
hydra.initialize(
    version_base=None,
    config_path=".",
    job_name="test_app",
)

# use the compose API since we're running in a notebook instead of __main__
cfg = hydra.compose(
    config_name="config",
    overrides=[],
)

# report in ClearML Config UI
clearm...
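(the cut-off line is just handing the composed config to ClearML - roughly something like this, assuming a Task has already been created; the project/task names here are placeholders:)

from clearml import Task
from omegaconf import OmegaConf

task = Task.init(project_name="Forecasting", task_name="test_app")
# push the composed Hydra config into the ClearML Configuration UI
task.connect_configuration(
    OmegaConf.to_container(cfg, resolve=True),
    name="hydra_config",
)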
the CLEARML_* variables are all explained here: None
As another test I ran Jupyter Lab locally using the same custom Docker container that we're using for Sagemaker Studio, and it works great there, just like the native local Jupyter Lab. So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.
api/kernels does report back the active kernel, but doesn't give notebook paths or anything
thanks for the thoughtful response, @<1523701205467926528:profile|AgitatedDove14> ! I think I'll need to test out some workflows to see what works
I'll give it a shot and see! I'm just setting up a test server now, so this is still a hypothetical question based on reading the docs so far
And then we want to compare backtests or just this week's estimates across multiple of those models/branches
But we're also testing out new models all the time, which are typically implemented as git branches - they run on the same set of inputs but don't output their results into production
awesome, I'll test it out - thanks for the tips!
the key point is you just loop through the number of workers, set a unique CLEARML_WORKER_ID for each, and then run it in the background
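the shape of it is roughly this (sketch in Python; the queue name and worker count stand in for $QUEUE and $NUM_WORKERS):

import os
import subprocess

queue = "default"   # stands in for $QUEUE
num_workers = 4     # stands in for $NUM_WORKERS

for i in range(num_workers):
    env = os.environ.copy()
    # each copy of the agent needs its own unique worker id
    env["CLEARML_WORKER_ID"] = f"{os.uname().nodename}:worker-{i}"
    # run the agent copy in the background
    subprocess.Popen(["clearml-agent", "daemon", "--queue", queue], env=env)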
one possibility for getting the notebook filepath is finding and parsing /home/sagemaker-user/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace I think, but I don't know if I can tie that to a specific session
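if that pans out, reading it would look something like this (sketch - I'm assuming the workspace file is JSON with "notebook:<path>" keys in its data section):

import json

ws_file = "/home/sagemaker-user/.jupyter/lab/workspaces/default-37a8.jupyterlab-workspace"
with open(ws_file) as f:
    workspace = json.load(f)

# entries for open notebooks would show up as "notebook:<path>" keys
notebook_paths = [
    key.split(":", 1)[1]
    for key in workspace.get("data", {})
    if key.startswith("notebook:")
]
print(notebook_paths)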
I've poked around both the internal URL that Jupyter kernel is running on and some of the files in /sagemaker/.jupyter but no luck so far - I can find plenty of kernel info, but not session
lots of things like {"__timestamp__": "2023-02-23T23:49:23.285946Z", "__schema__": "sagemaker.kg.request.schema", "__schema_version__": 1, "__metadata_version__": 1, "account_id": "", "duration": 0.0007679462432861328, "method": "GET", "uri": "/api/kernels/6ba227af-ff2c-4b20-89ac-86dcac95e2b2", "status": 200}
and that requests.get() throws an exception:
ConnectionError: HTTPConnectionPool(host='default', port=8888): Max retries exceeded with url: /jupyter/default/api/sessions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7ba9cadc30>: Failed to establish a new connection: [Errno -2] Name or service not known'))
if I change it to 0.0.0.0 it works
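i.e. something like this returns the session list (same Studio-prefixed path as in the error above):

import requests

# hit the Jupyter sessions API on the local server directly
# (0.0.0.0 instead of the "default" hostname that doesn't resolve inside Studio)
resp = requests.get("http://0.0.0.0:8888/jupyter/default/api/sessions")
resp.raise_for_status()
for session in resp.json():
    # each session entry should carry the notebook path and its kernel id
    print(session.get("path"), session.get("kernel", {}).get("id"))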
I can get it to run up to here: None
