As part of the docker compose, there is a container with a special agent dedicated to managing services for the system, such as the pipeline controllers.
Let's separate things. You have the control plane, which is the server (API/files/web/mongo/elastic/redis), and then you have the compute fabric (GPU/CPU machines).
The control plane manages things, the compute fabric actually executes the code, and agents manage the compute fabric.
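For example, here is a minimal sketch of that split from the SDK side (the project/task names and the "default" queue are just placeholders): registering or enqueuing a task only tells the control plane about it; an agent servicing that queue on the compute fabric is what actually runs it.

from clearml import Task

# Task.init only registers the run with the control plane (API server);
# the code itself still runs wherever this script is executed.
task = Task.init(project_name="examples", task_name="local run")

# To have the compute fabric execute it instead, clone it and push the clone
# to a queue that an agent is servicing ("default" here is an assumption).
remote_copy = Task.clone(source_task=task, name="remote run")
Task.enqueue(remote_copy, queue_name="default")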
I suggest watching the following two videos:
None
None
The video on pipelines talks about pipelines built from tasks; however, the abstraction logic works the same with decorators.
I resolved the original error message, but now I can't access the webserver, None . The webserver log has:
Updating configuration from env
Updating configuration from env
Updating configuration from env
Updating configuration from env
Updating configuration from env
any thoughts?
You need the agents to run the various pipeline steps + controllers
Hi FloppySwan0, the error message suggests you don't have a queue called 'services' running. Did you read up on the agent?
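If you want to check from code, here is a rough sketch using the APIClient (assuming your clearml.conf credentials are valid; 'services' is the queue the services agent is supposed to listen to):

from clearml.backend_api.session.client import APIClient

client = APIClient()  # uses the credentials/hosts from clearml.conf

# List the queues the server knows about, and create 'services' if it is
# missing (normally the agent-services container creates/serves it for you).
queue_names = [q.name for q in client.queues.get_all()]
print(queue_names)
if "services" not in queue_names:
    client.queues.create(name="services")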
how about the clearml api, file and web servers?
ah, these ports are taken on my machine so I mapped the fileserver, apiserver and webserver ports.
I got past that error by setting the default_queue in the pipeline decorator. I also called set_default_execution_queue.
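Roughly what that looks like (the queue name is just an example):

from clearml import PipelineDecorator

# Either pass default_queue=... in the @PipelineDecorator.pipeline(...) call,
# or set it globally before starting the pipeline:
PipelineDecorator.set_default_execution_queue("default")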
any idea what this means?
{"timestamp":"2025-03-06T22:26:20Z","level":"INFO","msg":"No repository found, storing script code instead","file":"scriptinfo.py","line":"1087","module":"scriptinfo","func":"_get_script_info",}
{"timestamp":"2025-03-06T22:26:22Z","level":"WARNING","msg":"Could not fetch full definition of function upload_video_task: 'Attribute' object has no attribute 'id'","file":"populate.py","line":"1177","module":"populate","func":"__get_source_with_decorators",}
{"timestamp":"2025-03-06T22:26:22Z","level":"INFO","msg":"No repository found, storing script code instead","file":"scriptinfo.py","line":"1087","module":"scriptinfo","func":"_get_script_info",}
{"timestamp":"2025-03-06T22:26:24Z","level":"WARNING","msg":"Could not fetch full definition of function download_video_task: 'Attribute' object has no attribute 'id'","file":"populate.py","line":"1177","module":"populate","func":"__get_source_with_decorators",}
{"timestamp":"2025-03-06T22:26:24Z","level":"INFO","msg":"No repository found, storing script code instead","file":"scriptinfo.py","line":"1087","module":"scriptinfo","func":"_get_script_info",}
{"timestamp":"2025-03-06T22:26:26Z","level":"WARNING","msg":"Could not fetch full definition of function encode_video_task: 'Attribute' object has no attribute 'id'","file":"populate.py","line":"1177","module":"populate","func":"__get_source_with_decorators",}
{"timestamp":"2025-03-06T22:26:26Z","level":"INFO","msg":"No repository found, storing script code instead","file":"scriptinfo.py","line":"1087","module":"scriptinfo","func":"_get_script_info",}
{"timestamp":"2025-03-06T22:26:28Z","level":"WARNING","msg":"Could not fetch full definition of function upload_video_task: 'Attribute' object has no attribute 'id'","file":"populate.py","line":"1177","module":"populate","func":"__get_source_with_decorators",}
{"timestamp":"2025-03-06T22:26:28Z","level":"INFO","msg":"No repository found, storing script code instead","file":"scriptinfo.py","line":"1087","module":"scriptinfo","func":"_get_script_info",}
{"timestamp":"2025-03-06T22:26:31Z","level":"WARNING","msg":"Could not fetch full definition of function download_video_task: 'Attribute' object has no attribute 'id'","file":"populate.py","line":"1177","module":"populate","func":"__get_source_with_decorators",}
{"timestamp":"2025-03-06T22:26:31Z","level":"INFO","msg":"No repository found, storing script code instead","file":"scriptinfo.py","line":"1087","module":"scriptinfo","func":"_get_script_info",}
And from the error you get, like I mentioned, it looks like there is no 'services' queue. I would check the logs of the agent-services container to see if there are any errors, as this is the agent in charge of listening to the 'services' queue.
perfect! I've also scheduled a demo since we are planning on acquiring the Enterprise license.
do I need to run the agent with this pipeline?
Regarding pipelines, did you happen to play with this example? - None
The idea is that each step in the pipeline, including the pipeline controller, is a task in the system. So you have to choose separate queues for the steps and for the controller. The controller is by default mapped to the 'services' queue, but you can control that as well.
The controller simply runs the logic of the pipeline and requires minimal resources. All the heavy computation happens on the nodes/machines running the steps
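For example, a minimal sketch of how the queues map with the decorator syntax (the names and queues here are only placeholders):

from clearml import PipelineDecorator

# Each step becomes its own task and runs on the queue you point it at.
@PipelineDecorator.component(return_values=["doubled"], execution_queue="default")
def step_one(x):
    return x * 2

# The controller is also a task. default_queue covers steps that don't set a
# queue themselves; pipeline_execution_queue is where the controller itself
# gets enqueued ('services' is the built-in default).
@PipelineDecorator.pipeline(
    name="example pipeline",
    project="examples",
    version="0.0.1",
    default_queue="default",
    pipeline_execution_queue="services",
)
def run_pipeline(x=1):
    return step_one(x)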
I am running a docker-compose based on the one in the server sample:
version: "3.9"
services:
apiserver:
command:
- apiserver
container_name: clearml-apiserver
image: allegroai/clearml # Match your SDK version
restart: unless-stopped
volumes:
- clearml_logs:/var/log/clearml
- clearml_config:/opt/clearml/config
- clearml_data_fileserver:/mnt/fileserver
depends_on:
- redis
- mongo
- elasticsearch
- fileserver
environment:
CLEARML_API_HOST:
# Internal reference
CLEARML_WEB_HOST:
# Internal reference
CLEARML_FILES_HOST:
# Internal reference
CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
CLEARML_ELASTIC_SERVICE_PORT: 9200
CLEARML_MONGODB_SERVICE_HOST: mongo
CLEARML_MONGODB_SERVICE_PORT: 27017
CLEARML_REDIS_SERVICE_HOST: redis
CLEARML_REDIS_SERVICE_PORT: 6379
CLEARML_SERVER_DEPLOYMENT_TYPE: linux
CLEARML__apiserver__pre_populate__enabled: "true"
CLEARML__apiserver__pre_populate__zip_files: "/opt/clearml/db-pre-populate"
CLEARML__apiserver__pre_populate__artifacts_path: "/mnt/fileserver"
CLEARML__services__async_urls_delete__enabled: "true"
CLEARML__services__async_urls_delete__fileserver__url_prefixes: "[${CLEARML_FILES_HOST:-}]"
CLEARML__secure__credentials__services_agent__user_key: ${CLEARML_AGENT_ACCESS_KEY:-}
CLEARML__secure__credentials__services_agent__user_secret: ${CLEARML_AGENT_SECRET_KEY:-}
ports:
- "9008:8008"
networks:
- backend
- frontend
elasticsearch:
networks:
- backend
container_name: clearml-elastic
environment:
- bootstrap.memory_lock=true
- cluster.name=clearml
- cluster.routing.allocation.node_initial_primaries_recoveries=500
- cluster.routing.allocation.disk.watermark.low=500mb
- cluster.routing.allocation.disk.watermark.high=500mb
- cluster.routing.allocation.disk.watermark.flood_stage=500mb
- ES_JAVA_OPTS=-Xms4g -Xmx4g
- discovery.type=single-node
- http.compression_level=7
- node.name=clearml
- reindex.remote.whitelist="'*.*'"
- xpack.security.enabled=false
ulimits:
memlock:
soft: -1
hard: -1
nofile:
soft: 65536
hard: 65536
image: elasticsearch:8.17.0
restart: unless-stopped
volumes:
- clearml_data_elastic:/usr/share/elasticsearch/data
- clearml_logs:/usr/share/elasticsearch/logs
fileserver:
networks:
- backend
- frontend
command:
- fileserver
container_name: clearml-fileserver
image: allegroai/clearml
environment:
CLEARML_API_HOST:
# Internal reference
CLEARML_WEB_HOST:
# Internal reference
CLEARML_FILES_HOST:
# Internal reference
CLEARML__fileserver__delete__allow_batch: "true"
restart: unless-stopped
volumes:
- clearml_logs:/var/log/clearml
- clearml_data_fileserver:/mnt/fileserver
- clearml_config:/opt/clearml/config
ports:
- "9081:8081"
mongo:
networks:
- backend
container_name: clearml-mongo
image: mongo:6.0.19
restart: unless-stopped
command: --setParameter internalQueryMaxBlockingSortMemoryUsageBytes=196100200
volumes:
- clearml_data_mongo_db:/data/db
- clearml_data_mongo_configdb:/data/configdb
redis:
networks:
- backend
container_name: clearml-redis
image: redis:7.4.1
restart: unless-stopped
volumes:
- clearml_data_redis:/data
webserver:
command:
- webserver
container_name: clearml-webserver
image: allegroai/clearml
restart: unless-stopped
depends_on:
- apiserver
environment:
CLEARML_API_HOST:
# Internal reference
CLEARML_WEB_HOST:
# Internal reference
CLEARML_FILES_HOST:
# Internal reference
ports:
- "9080:8080"
networks:
- backend
- frontend
async_delete:
depends_on:
- apiserver
- redis
- mongo
- elasticsearch
- fileserver
container_name: async_delete
image: allegroai/clearml
networks:
- backend
restart: unless-stopped
environment:
CLEARML_API_HOST:
# Internal reference
CLEARML_WEB_HOST:
# Internal reference
CLEARML_FILES_HOST:
# Internal reference
CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
CLEARML_ELASTIC_SERVICE_PORT: 9200
CLEARML_MONGODB_SERVICE_HOST: mongo
CLEARML_MONGODB_SERVICE_PORT: 27017
CLEARML_REDIS_SERVICE_HOST: redis
CLEARML_REDIS_SERVICE_PORT: 6379
PYTHONPATH: /opt/clearml/apiserver
CLEARML__services__async_urls_delete__fileserver__url_prefixes: "[${CLEARML_FILES_HOST:-}]"
entrypoint:
- python3
- -m
- jobs.async_urls_delete
- --fileserver-host
-
# Internal reference
volumes:
- clearml_logs:/var/log/clearml
- clearml_config:/opt/clearml/config
agent-services:
networks:
- backend
container_name: clearml-agent-services
image: allegroai/clearml-agent-services
deploy:
restart_policy:
condition: on-failure
privileged: true
environment:
# CLEARML_HOST_IP: ${CLEARML_HOST_IP:-localhost}
CLEARML_WEB_HOST:
# Internal reference
CLEARML_API_HOST:
# Internal reference
CLEARML_FILES_HOST:
# Internal reference
CLEARML_API_ACCESS_KEY: ${CLEARML_AGENT_ACCESS_KEY:-$CLEARML_API_ACCESS_KEY}
CLEARML_API_SECRET_KEY: ${CLEARML_AGENT_SECRET_KEY:-$CLEARML_API_SECRET_KEY}
CLEARML_WORKER_NAME: "clearml-agent-services"
CLEARML_AGENT_QUEUES: "default" # Match pipeline queue
CLEARML_AGENT_DEFAULT_BASE_DOCKER: "python:3.12-slim" # Match your Python env
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-} # Optional, for repo access
CLEARML_AGENT_GIT_PASS: ${CLEARML_AGENT_GIT_PASS:-} # Optional, for repo access
AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID:-}
AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY:-}
AWS_DEFAULT_REGION: ${AWS_DEFAULT_REGION:-}
SHUTDOWN_IF_NO_ACCESS_KEY: "1"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- clearml_agent:/root/.clearml
depends_on:
- apiserver
entrypoint: >
bash -c "curl --retry 10 --retry-delay 10 --retry-connrefused '
' && /usr/agent/entrypoint.sh"
networks:
backend:
driver: bridge
frontend:
driver: bridge
volumes:
clearml_logs:
clearml_config:
clearml_data_fileserver:
clearml_data_elastic:
clearml_data_mongo_db:
clearml_data_mongo_configdb:
clearml_data_redis:
clearml_agent:
my pipeline decorator is:
@PipelineDecorator.pipeline(
    name="Run Video Encoder Pipeline",
    project="Video Encoder Project",
    version="0.1.0",
    return_value="encoding_detail_report",
    repo="git@github.com:LiveViewTech/video_encoding_service.git",
    repo_branch="main",
    default_queue="default",
)
I am setting up an initial task and invoking the pipeline function:
# Initialize ClearML task, connecting to local server
task = Task.init(
    project_name="Video Encoder Project",
    task_name="Run Video Encoder Pipeline",
)
task.set_repo(repo="git@github.com:LiveViewTech/video_encoding_service.git", branch="main")
PipelineDecorator.set_default_execution_queue("default")
pipe.start()
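For comparison, here is a minimal sketch of how a decorated pipeline is usually launched; run_pipeline is just a placeholder for the decorated pipeline function:

if __name__ == "__main__":
    # Calling the decorated function enqueues the controller (to 'services' by
    # default) and the steps to their queues, so an agent must be listening.
    # Calling PipelineDecorator.run_locally() before this would instead run
    # everything in the current process, which is handy for debugging.
    run_pipeline(x=1)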
Here is the api section of my clearml.conf file:
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    file_server:
    # Credentials are generated using the webapp,
    # Override with os environment: CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY
    credentials {
        "access_key": "215M9OS00JVS6UXJ6HO2MJ64IN0R6T"
        "secret_key": "pjR3FatZ9Cda6t0hWIHU4dYKnrn8fkeB5774SziELaxpxCHwZKEupJEYrtlQug8w5VM"
    }
}
From where did you get the 9008/9080/9081 ports? I don't see them in the docker compose anywhere