I regularly run into the same problem when I launch pipelines locally (for remote execution). However, when I clone the pipeline from the web UI and launch it once again, it works. Is there a way to bypass this?
AgitatedDove14 is it expected behavior?
No, when I run the pipeline from the console on my local machine, it for some reason launches on the clearml-services host (despite the fact that I specified the queue with the desired agent via `pipe.set_default_execution_queue` in my code)
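For reference, my understanding is that `set_default_execution_queue` only applies to the step tasks, while the controller task itself goes to whatever queue `start()` is given (defaulting to the services queue). Something like this sketch (the queue name is a placeholder) is what I would expect to pin both:
`
# sketch with a placeholder queue name
from clearml import PipelineController

pipe = PipelineController(project='my_project', name='my-pipeline', version='1.0.0')
pipe.set_default_execution_queue('my-queue')  # queue used for the step tasks
pipe.start(queue='my-queue')                  # queue used for the controller task itself (defaults to 'services')
`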
AgitatedDove14 yeah, makes sense, that would require some refactoring in our projects though...
But why is `my_name` a subproject? Why not just `my_project/.datasets`?
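To make the question concrete, this is roughly the call I mean (names are made up); the resulting dataset then shows up as the nested project `my_project/.datasets/my_name` instead of just `my_project/.datasets`:
`
# sketch with made-up project/dataset names
from clearml import Dataset

ds = Dataset.create(dataset_project='my_project', dataset_name='my_name')
ds.add_files('data/')
ds.upload()
ds.finalize()
`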
Thanks. What is the difference between `import_model` and `load_model`? Why can I only call `import_model` with `weights_url` and not a model name or ID? This means I need to call `query_models` first, if I got it right
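To be concrete, this is the pattern I have in mind (project and model names are placeholders):
`
# placeholder names; showing the two access paths as I understand them
from clearml import InputModel, Model

# import_model only takes a direct weights URL...
model_from_url = InputModel.import_model(weights_url='s3://bucket/models/model.pt', name='my-model')

# ...so going from a name to a model means querying first, then using the id
candidates = Model.query_models(project_name='my_project', model_name='my-model')
model_from_id = InputModel(model_id=candidates[0].id)
`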
AgitatedDove14
`
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
(repository='git@...', branch='main', commit_id='...', tag='', docker_cmd='registry.gitlab.com/...:...', en...
I initialize tasks not as functions, but as scripts from different repositories, with different images
` 1633204284443 clearml-services INFO Executing: ['docker', 'run', '-t', '-l', 'clearml-worker-id=clearml-services:service:58186f9e975f484683a364cf9ce69583', '-l', 'clearml-parent-worker-id=clearml-services', '-e', 'NVIDIA_VISIBLE_DEVICES=none', '-e', 'CLEARML_WORKER_ID=clearml-services:service:58186f9e975f484683a364cf9ce69583', '-e', 'CLEARML_DOCKER_IMAGE=', '-v', '/tmp/.clearml_agent.pgsygoh2.cfg:/root/clearml.conf', '-v', '/root/.clearml/apt-cache:/var/cache/apt/archives', '-v', '/root/.cl...
`
# import the previously exported base task data and register it as a pipeline step
task = Task.import_task(export_data)
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    task_overrides={'script.branch': 'main', 'script.version_num': ''},
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False,
)
`
Agent 1.1.0
Python client 1.1.1
OK, I managed to launch the example and it works
the pipeline controller itself is stuck in running mode forever; all step tasks are created but never enqueued
I can share some code
it creates all the step tasks in draft mode and then gets stuck
But I still cannot launch my own pipelines
You are right, I had `[None]` as parents in one of the tasks. Now this error is gone
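For anyone hitting the same thing, a trivial guard along these lines (reusing the variable names from my snippet above) avoids passing `[None]` as parents:
`
# filter out accidental None entries before registering the step
parents = [p for p in (parents or []) if p is not None]
pipe.add_step(name=name,
              base_task_id=task.id,
              parents=parents,
              execution_queue=pipe_cfg['step_queue'])
`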
it has the same effect as start/wait/stop, kinda weird
The pipeline is initialized like this:
`
pipe = PipelineController(project=cfg['pipe']['project_name'],
                          name='pipeline-{}'.format(name_postfix),
                          version='1.0.0',
                          add_pipeline_tags=True)
pipe.set_default_execution_queue('my-queue')
`
Then for each step I have a base task which I want to clone
` step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
task_name=...
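The rest of that step setup is roughly along these lines (a sketch, not my exact code; the 'base_task_name' config key is illustrative):
`
# sketch: fetch the base task by project/name and register it as a pipeline step
step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_task_name'])
pipe.add_step(name=name,
              base_task_id=step_base_task.id,
              parents=parents,
              execution_queue=pipe_cfg['step_queue'])
`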
but at that point it hadn't actually added any steps. Maybe failed pipelines with zero steps count as completed
hm, not quite clear how it is implemented. For example, this is how I do it now (explicitly)
I could insert some updated info into my conference talk if you share the recording by tomorrow morning 😄