Hi @<1523701205467926528:profile|AgitatedDove14>.
What I mean is shown in the minimal example notebook below. Executing cell [4] works only once per kernel lifetime. When I execute cell [4] again (after one successful run), it crashes with the error message shown below.
Notebook:
```
%env CLEARML_WEB_HOST=http://localhost:8080
%env CLEARML_API_HOST=http://localhost:8008
%env CLEARML_FILES_HOST=http://localhost:8081
%env CLEARML_API_ACCESS_KEY=<your_key>
%env CLEARML_API_SECRET_KEY=<your_key>
```
```
env: CLEARML_WEB_HOST=http://localhost:8080
env: CLEARML_API_HOST=http://localhost:8008
env: CLEARML_FILES_HOST=http://localhost:8081
...
```
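If the same setup is run as a plain Python script instead of a notebook, the `%env` magics can be replaced by writing to `os.environ` directly, since `%env VAR=value` just sets that process environment variable. This is a minimal sketch that assumes the local server ports shown in the output above. `<your_key>` stays a placeholder. Setting the variables before `clearml` is imported is a precaution, not something confirmed in this thread.

```python
import os

# Equivalent of the %env magics above: IPython's %env writes to os.environ.
# Assumption: same local clearml-server ports as in the notebook output.
CLEARML_ENV = {
    "CLEARML_WEB_HOST": "http://localhost:8080",
    "CLEARML_API_HOST": "http://localhost:8008",
    "CLEARML_FILES_HOST": "http://localhost:8081",
    "CLEARML_API_ACCESS_KEY": "<your_key>",  # placeholder, replace with a real key
    "CLEARML_API_SECRET_KEY": "<your_key>",  # placeholder, replace with a real secret
}
os.environ.update(CLEARML_ENV)

# Import clearml only after the environment is set (precaution).
# from clearml import PipelineDecorator
```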
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(cache=False, return_values=['value'])
def step1():
    value = 1
    return value

@PipelineDecorator.pipeline(name="test-pipeline", project="Test", add_pipeline_tags="True")
def test_pipeline():
    value = step1()

# PipelineDecorator.debug_pipeline() # works, even when run repeatedly
PipelineDecorator.run_locally() # works on first execution of cell only, not when run repeatedly
test_pipeline()
```
```
ClearML Task: created new task id=32cb8577db6140d8bdcce7e520948c30
ClearML results page: http://localhost:8080/projects/4d92231c28b748f190fde5b8b25d1a5a/experiments/32cb8577db6140d8bdcce7e520948c30/output/log
ClearML pipeline page: http://localhost:8080/pipelines/4d92231c28b748f190fde5b8b25d1a5a/experiments/32cb8577db6140d8bdcce7e520948c30
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/jupyter-server/notebooks/clearml_test/clearml-pipeline-text copy.ipynb Cell 4 line 3
      1 # PipelineDecorator.debug_pipeline() # works, even when run repeatedly
      2 PipelineDecorator.run_locally() # works on first execution of cell only, not when run repeatedly
----> 3 test_pipeline()

File /opt/jupyter-server/venv-jupyter-server/lib/python3.11/site-packages/clearml/automation/controller.py:4429, in PipelineDecorator.pipeline.<locals>.decorator_wrap.<locals>.internal_decorator(*args, **kwargs)
   4427 # this time the pipeline is executed only on the remote machine
   4428 try:
-> 4429     pipeline_result = func(**pipeline_kwargs)
   4430 except Exception:
   4431     a_pipeline.stop(mark_failed=True)

/opt/jupyter-server/notebooks/clearml_test/clearml-pipeline-text copy.ipynb Cell 4 line 3
      1 @PipelineDecorator.pipeline(name="test-pipeline", project="Test", add_pipeline_tags="True")
      2 def test_pipeline():
----> 3     value = step1()

File /opt/jupyter-server/venv-jupyter-server/lib/python3.11/site-packages/clearml/automation/controller.py:4069, in PipelineDecorator.component.<locals>.decorator_wrap.<locals>.wrapper(*args, **kwargs)
   4067 # get node and park is as launched
   4068 cls._singleton._launched_step_names.add(_node_name)
-> 4069 _node = cls._singleton._nodes[_node_name]
   4070 cls._retries[_node_name] = 0
   4071 cls._retries_callbacks[_node_name] = retry_on_failure if callable(retry_on_failure) else \
   4072     (functools.partial(cls._singleton._default_retry_on_failure_callback, max_retries=retry_on_failure)
   4073         if isinstance(retry_on_failure, int) else cls._singleton._retry_on_failure_callback)

KeyError: 'step1'
```
Hi @<1627478122452488192:profile|AdorableDeer85>
Are you referring to running the pipeline on a remote machine? Could you provide the full Task/Pipeline log?
Hi @<1627478122452488192:profile|AdorableDeer85>
Sorry, I'm a bit confused here. Any chance you could share the entire notebook?
Also, is there any reason this is pointing to "localhost" rather than the IP/host of the clearml-server? Is the agent running on the same machine?