The CLEARML_* variables are all explained in the ClearML documentation.
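For reference, the core CLEARML_* connection variables look like this — the host values below are the hosted clear.ml defaults and the keys are placeholders, so substitute your own server and credentials:

```shell
# Server endpoints (a self-hosted server uses its own host:port values).
export CLEARML_API_HOST="https://api.clear.ml"
export CLEARML_WEB_HOST="https://app.clear.ml"
export CLEARML_FILES_HOST="https://files.clear.ml"

# Credentials generated from the ClearML web UI (placeholder values here).
export CLEARML_API_ACCESS_KEY="your-access-key"
export CLEARML_API_SECRET_KEY="your-secret-key"
```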
here's my script:
#!/bin/bash
echo "******************** Starting Agent ********************"
echo "******************** Getting ENV Variables ********************"
source /etc/profile.d/env-vars.sh
# test that we can access the API
echo "******************** Waiting for ${CLEARML_API_HOST} connectivity ********************"
curl --retry 10 --retry-delay 10 --retry-connrefused "${CLEARML_API_HOST}/debug.ping"
# start the agent
for i in $(seq 1 "${NUM_WORKERS}")
do
  export CLEARML_WORKER_ID="${AGENT_NAME}:${i}"
  if [[ "$QUEUE" == "services" ]]; then
    echo "******************** Launching Services Worker ${i} ********************"
    echo "Worker ID: ${CLEARML_WORKER_ID}"
    python3 -m clearml_agent daemon \
      --services-mode \
      --queue "${CLEARML_QUEUE}" \
      --create-queue \
      --docker \
      --cpu-only \
      &
  else
    echo "******************** Launching Worker ${i} in ${QUEUE} queue ********************"
    echo "Worker ID: ${CLEARML_WORKER_ID}"
    python3 -m clearml_agent daemon \
      --queue "${CLEARML_QUEUE}" \
      --create-queue \
      --docker \
      --cpu-only \
      &
  fi
done
The key point is that you just loop over the number of workers, set a unique CLEARML_WORKER_ID for each, and run each agent in the background.
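One caveat, assuming the script is used as a container entrypoint: since every agent is backgrounded, the shell would exit immediately unless something keeps it alive. A minimal sketch of the loop-and-background pattern with a trailing `wait` (placeholder `sleep` commands stand in for the agent daemon):

```shell
#!/bin/bash
# Launch N placeholder "workers" in the background, then wait on all of them,
# so the entrypoint script doesn't exit while the agents are still running.
NUM_WORKERS=2
done_log=$(mktemp)
for i in $(seq 1 "${NUM_WORKERS}"); do
  ( sleep 0.1; echo "worker ${i} finished" >> "${done_log}" ) &  # stand-in for the agent daemon
done
wait  # blocks until every backgrounded job has exited
finished=$(wc -l < "${done_log}" | tr -d ' ')
echo "${finished} workers exited"
```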
Can you give more details on what exactly is happening here? What are the env variables and the other settings?
$QUEUE and $NUM_WORKERS are specific to my setup; they just give the name of the queue and how many copies of the agent to run.
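To make it concrete, my env-vars.sh sets things roughly like this (all values are placeholders for illustration, not anything ClearML requires):

```shell
# /etc/profile.d/env-vars.sh -- example values, specific to my setup
export AGENT_NAME="agent-$(hostname)"   # base name used to build worker IDs
export QUEUE="default"                  # which queue this machine serves
export CLEARML_QUEUE="${QUEUE}"         # queue name passed to the daemon
export NUM_WORKERS=2                    # how many copies of the agent to run
```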
You don't even need to set CLEARML_WORKER_ID; the agent will automatically assign one based on the machine's name.
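Right — if CLEARML_WORKER_ID is unset, the agent derives a name from the machine's hostname. A rough sketch of that fallback idea in shell (the exact naming scheme is the agent's own, this just shows the default-if-unset pattern):

```shell
# Use the explicit worker ID if set, otherwise fall back to the hostname.
WORKER_ID="${CLEARML_WORKER_ID:-$(hostname)}"
echo "worker id: ${WORKER_ID}"
```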