I started running it again and it seems to have passed the phase where it failed last time
Yey!
Yes it is a common case....
I have the feeling ShinyLobster84 WackyRabbit7 you are not alone in this one 🙂 let me make sure we change the default value of run_pipeline_steps_locally to False, so the code looks cleaner
Worth mentioning, nothing has changed before we executed this, it worked before and now after the update it breaks
Hi Martin,
I upgraded the ClearML version to 1.1.1 and I updated the pipeline code according to v2 as you wrote here and I got a new error which I haven't got before.
Just noting that I did a git push before.
Do you know what can cause this error?
Thanks!
version_num = 1c4beae41a70c526d0efd064e65afabbc689c429
tag =
docker_cmd = ubuntu:18.04
entry_point = tasks/pipelines/monthly_predictions.py
working_dir = .
Warning: could not locate requested Python version 3.8, reverting to version 3.6
created virtual environment CPython3.6.9.final.0-64 in 648ms
creator CPython3Posix(dest=/root/.clearml/venvs-builds/3.6, clear=False, no_vcs_ignore=False, global=True)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
added seed packages: pip==21.2.4, setuptools==58.1.0, wheel==0.37.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
cloning:
fatal: could not read Username for '
': terminal prompts disabled Repository cloning failed: Command '['clone', '
` ', '/root/.clearml/vcs-cache/my_project.git.64cf56a47b594c89c550b96afe67e25f/my_project.git', '--quiet', '--recursive']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
(repository='', branch='monthly_pipeline', commit_id='1c4beae41a70c526d0efd064e65afabbc689c429', tag='', docker_cmd='ubuntu:18.04', entry_point='tasks/pipelines/monthly_predictions.py', working_dir='.')
- Check if remote-worker has valid credentials [see worker configuration file]
2021-10-05 10:33:39
Process failed, exit code 1 `
ShinyLobster84
fatal: could not read Username for '
': terminal prompts disabled
This is the main issue: it needs git credentials to clone the repo code containing the pipeline logic. (This is the exact same behaviour as pipeline v1 execute_remotely(), which is now the default. Could it be that before, you executed the pipeline logic locally?)
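To check this outside of ClearML, here is a minimal sketch that asks git to contact the repository with terminal prompts disabled, which is the condition the error above reports. It assumes the `git` CLI is on the PATH of the machine (or docker image) where the agent runs; `can_clone_without_prompt` is a hypothetical helper name, not part of ClearML, and the agent's actual clone command may differ.

```python
import os
import subprocess


def can_clone_without_prompt(repo_url, timeout=30):
    """Return True if git can reach `repo_url` without asking for credentials.

    Hypothetical helper (not a ClearML API). GIT_TERMINAL_PROMPT=0 makes git
    fail instead of prompting for a username/password, similar to the
    "terminal prompts disabled" failure in the agent log.
    """
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", repo_url],
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time
        return False
    return result.returncode == 0
```

If this returns False for your repository URL on the agent machine, the worker has no usable stored credentials, and the worker configuration file mentioned in the error message is the place to look.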
WackyRabbit7 could the local/remote pipeline logic apply in your case as well?
I understand it starts locally... what does it mean to run the steps locally?
awesome this will help a lot with debugging
WackyRabbit7 interesting! Are those "local" pipelines all part of the same code repository? do they need their own environment ?
What would be the easiest pipeline interface to run them locally? (I wonder if we could support this workflow, it seems you are not alone in this approach, and of course you can always use them remotely, i.e. clone the pipeline and launch it on an agent)
And I'm getting this in the command line when the process fails:
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 73 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Is this a common case? Maybe we should change the default of the run_pipeline_steps_locally argument to False?
(The idea of run_pipeline_steps_locally=True is that it will be easier to debug the entire pipeline on the same machine)
Not sure I understand, if I run pipe.start_locally(run_pipeline_steps_locally=True|False), what is the difference between True and False? Assuming I want to execute locally
WackyRabbit7
we did execute locally
Sure, instead of pipe.start() use pipe.start_locally(run_pipeline_steps_locally=False), this is it 🙂
what is the difference between False and True?
the ability to execute without an agent, I was just talking about this functionality the other day in the community channel
That was the idea behind the feature (and BTW any feedback on usability and debugging will be appreciated here, pipelines are notoriously hard to debug 🙂 )
What would be the use case ? (actually the infrastructure now supports it)
Does it mean that if it is set to False I need an agent, but if I set it to True I don't need one?
Ohh, sorry 🙂
:param run_pipeline_steps_locally: (default False) If True, run the pipeline steps themselves locally as a subprocess (use for debugging the pipeline locally; note the pipeline code is expected to be available on the local machine)
We try to break everything up into independent tasks and group them using a pipeline. The dependency on an agent caused unnecessary overhead, since we just want to execute locally. It became a burden once new data scientists joined the project: instead of just telling them "yeah, just execute this script", you now have to teach them about clearml, the role of agents, how to launch them, how they behave, how to remove them and stuff like that... things you want to avoid with data scientists
what does it mean to run the steps locally?
start_locally: the pipeline code itself (the logic that runs/controls the DAG) runs on the local machine (i.e. no agent). This control logic still creates/clones Tasks and enqueues them, and for those Tasks you need an agent to execute them.
run_pipeline_steps_locally=True: the Tasks the pipeline creates are launched on the same local machine instead of being enqueued for an agent to run (think debugging, otherwise I cannot actually see the value). And to your question: yes, it means there is no need for an agent.
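To make the three options concrete, here is a small sketch of how the calls discussed in this thread map to "who needs an agent". `RunMode`, `launch` and `needs_agent` are hypothetical names made up for illustration; the only ClearML calls assumed are `pipe.start()` and `pipe.start_locally(run_pipeline_steps_locally=...)` as quoted above, and `pipe` is assumed to be an already configured pipeline controller.

```python
from enum import Enum


class RunMode(Enum):
    # Hypothetical labels for the three ways of launching discussed above
    REMOTE = "remote"                      # pipe.start(): controller logic runs remotely
    LOCAL_CONTROLLER = "local_controller"  # controller local, steps enqueued for an agent
    FULLY_LOCAL = "fully_local"            # controller and steps on this machine


def launch(pipe, mode):
    """Start `pipe` in the requested mode.

    `pipe` is assumed to expose start() and start_locally(), like the
    pipeline controller object used in this thread.
    """
    if mode is RunMode.REMOTE:
        pipe.start()
    elif mode is RunMode.LOCAL_CONTROLLER:
        pipe.start_locally(run_pipeline_steps_locally=False)
    elif mode is RunMode.FULLY_LOCAL:
        pipe.start_locally(run_pipeline_steps_locally=True)
    else:
        raise ValueError("unknown run mode: {!r}".format(mode))


def needs_agent(mode):
    """Only the fully local mode runs without any agent (per the explanation above)."""
    return mode is not RunMode.FULLY_LOCAL
```

So for "just execute this script" on one machine, `launch(pipe, RunMode.FULLY_LOCAL)` would be the variant to reach for, while `RunMode.LOCAL_CONTROLLER` still relies on an agent picking up the enqueued steps.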
WackyRabbit7 did that help?
we are running the agent on the same machine AgitatedDove14 , it worked before upgrading the clearml... we never set these credentials
Yes, it is a common case, but I think pipe.start_locally(run_pipeline_steps_locally=False) solved it. I started running it again and it seems to have passed the phase where it failed last time