can you give more details on what exactly is happening here? what are the env variables and the other stuff?
and $QUEUE and $NUM_WORKERS are particular to my setup, but they just give the name of the queue and how many copies of the agent to run
here's my script:
#!/bin/bash
echo "******************** Starting Agent ********************"
echo "******************** Getting ENV Variables ********************"
source /etc/profile.d/env-vars.sh

# test that we can access the API
echo "******************** Waiting for ${CLEARML_API_HOST} connectivity ********************"
curl --retry 10 --retry-delay 10 --retry-connrefused ${CLEARML_API_HOST}/debug.ping

# start the agent
for i in $(seq 1 ${NUM_WORKERS})
do
    export CLEARML_WORKER_ID="${AGENT_NAME}:${i}"
    if [[ "$QUEUE" == "services" ]]; then
        echo "******************** Launching Services Worker ${i} ********************"
        echo "Worker ID: ${CLEARML_WORKER_ID}"
        python3 -m clearml_agent daemon \
            --services-mode \
            --queue $CLEARML_QUEUE \
            --create-queue \
            --docker \
            --cpu-only \
            &
    else
        echo "******************** Launching Worker ${i} in ${QUEUE} queue ********************"
        echo "Worker ID: ${CLEARML_WORKER_ID}"
        python3 -m clearml_agent daemon \
            --queue $CLEARML_QUEUE \
            --create-queue \
            --docker \
            --cpu-only \
            &
    fi
done
the key point is that you just loop over the number of workers, set a unique CLEARML_WORKER_ID for each one, and then run each agent in the background
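To make that pattern easier to see on its own, here is a minimal sketch with the agent command swapped out for a placeholder. The `sleep` call stands in for `python3 -m clearml_agent daemon ...`, and the default values for NUM_WORKERS and AGENT_NAME are made up for illustration. The trailing `wait` is an optional addition that is not in the script above; it only matters if the parent script has to stay alive while the workers run.

```shell
#!/bin/bash
# Sketch of the launch pattern only: `sleep` is a stand-in for the real
# `python3 -m clearml_agent daemon ...` command.
NUM_WORKERS="${NUM_WORKERS:-3}"        # hypothetical default
AGENT_NAME="${AGENT_NAME:-my-agent}"   # hypothetical default

launched=0
for i in $(seq 1 "${NUM_WORKERS}"); do
    # Each background process inherits the value exported at the moment it
    # is started, so re-exporting on the next iteration does not affect it.
    export CLEARML_WORKER_ID="${AGENT_NAME}:${i}"
    ( echo "started ${CLEARML_WORKER_ID}"; sleep 1 ) &
    launched=$((launched + 1))
done

# Optional: block until every background worker has exited.
wait
echo "launched ${launched} workers"
```

The order of the "started" lines can vary between runs because the workers start concurrently.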
You don't even need to set CLEARML_WORKER_ID; the agent will automatically assign one based on the machine's name
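If you want the explicit ID to be optional, one way to sketch it is below. `set_worker_id` is a hypothetical helper, not part of ClearML: it exports CLEARML_WORKER_ID only when AGENT_NAME is set, and otherwise leaves the variable unset so the agent can pick its own ID. The exact format of the auto-assigned ID is not shown in this thread, so the sketch does not try to reproduce it.

```shell
#!/bin/bash
# Hypothetical helper: $1 is the worker index from the launch loop.
set_worker_id() {
    if [ -n "${AGENT_NAME:-}" ]; then
        # Explicit, human-readable ID such as "gpu-box:2".
        export CLEARML_WORKER_ID="${AGENT_NAME}:$1"
    else
        # Leave the variable unset and let the agent choose an ID itself.
        unset CLEARML_WORKER_ID
    fi
}

AGENT_NAME="gpu-box"
set_worker_id 2
echo "explicit: ${CLEARML_WORKER_ID}"

unset AGENT_NAME
set_worker_id 2
echo "auto: ${CLEARML_WORKER_ID:-<assigned by the agent>}"
```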
the CLEARML_* variables are all explained in the ClearML Agent environment variables docs
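For a rough idea of what /etc/profile.d/env-vars.sh might contain, here is a sketch. The file's real contents are not shown in this thread, so every value below is a placeholder. The CLEARML_API_HOST, CLEARML_WEB_HOST, CLEARML_FILES_HOST, CLEARML_API_ACCESS_KEY and CLEARML_API_SECRET_KEY names are standard ClearML settings; AGENT_NAME, NUM_WORKERS, QUEUE and CLEARML_QUEUE are assumed to be specific to this setup and only mean what the script above does with them.

```shell
# Sketch of a possible /etc/profile.d/env-vars.sh (all values are placeholders)

# Where the agent finds the ClearML server, and its credentials
export CLEARML_API_HOST="https://api.clearml.example.com"
export CLEARML_WEB_HOST="https://app.clearml.example.com"
export CLEARML_FILES_HOST="https://files.clearml.example.com"
export CLEARML_API_ACCESS_KEY="<your-access-key>"
export CLEARML_API_SECRET_KEY="<your-secret-key>"

# Setup-specific variables used only by the launch script
export AGENT_NAME="my-agent"    # prefix for CLEARML_WORKER_ID
export NUM_WORKERS=2            # how many agent copies to start
export QUEUE="services"         # decides whether --services-mode is used
export CLEARML_QUEUE="${QUEUE}" # the value actually passed to --queue
```

Note that the script tests $QUEUE but passes $CLEARML_QUEUE to the agent, so both have to be defined somewhere, and here they are simply assumed to hold the same name.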