Hi all! I am currently using a self-hosted ClearML server and was looking to integrate the ClearML Agent to make better use of our HPC resources with GPU autoscaling.
I am aware that ClearML already supports an AWS autoscaler (in the Pro tier), but my tea…
Hi @HighCoyote66
> However, we need to allocate resources to ourselves manually, using an `srun` command or `sbatch`
Long story short, there is a full SLURM integration: you push a job into the ClearML queue, and it produces a SLURM job that uses the agent to set up the venv/container and run your Task. Unfortunately, this is only part of the enterprise version 😞
You can, however, do the following (note that this is pseudocode; I probably have a typo in the `srun` command):
- Clone your Task in the UI
- Copy the new Task ID
- Run `srun clearml-agent execute --id <task-id-here>`
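The three steps above can also be scripted instead of clicking through the UI. Below is a minimal Python sketch, assuming the `clearml` SDK is installed and configured (`clearml.conf`) on the node you submit from, and that `srun` and `clearml-agent` are available on the cluster. `Task.clone` is part of the ClearML SDK; the helper names `build_srun_command` and `clone_and_run`, and any SLURM options you pass, are hypothetical and for illustration only.

```python
import shlex
import subprocess
import sys
from typing import List, Optional, Sequence


def build_srun_command(task_id: str, slurm_args: Sequence[str] = ()) -> List[str]:
    """Build the srun command line that runs an existing ClearML Task via the agent.

    slurm_args are passed to srun verbatim (e.g. ["--gres=gpu:1"]); which options
    make sense depends on your cluster's configuration.
    """
    if not task_id:
        raise ValueError("task_id must be a non-empty string")
    return ["srun", *slurm_args, "clearml-agent", "execute", "--id", task_id]


def clone_and_run(source_task_id: str, slurm_args: Sequence[str] = (),
                  name: Optional[str] = None) -> int:
    """Clone a Task (step 1), take the new ID (step 2), and run it under srun (step 3)."""
    # Imported lazily so build_srun_command can be used without the SDK installed.
    from clearml import Task

    cloned = Task.clone(source_task=source_task_id, name=name)
    cmd = build_srun_command(cloned.id, slurm_args)
    print("Running:", shlex.join(cmd))
    # srun blocks until the job step finishes; return its exit code.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: clone_and_run.py <source-task-id> [srun options...]")
    sys.exit(clone_and_run(sys.argv[1], slurm_args=sys.argv[2:]))
```

For example, `python clone_and_run.py <task-id-here> --gres=gpu:1` would clone the Task and run the clone on one GPU. Arguments you edit on the cloned Task in the UI before running are still picked up by the agent.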
Running the Task this way uses SLURM to allocate the job and clearml-agent to set up the environment automatically and run your code (you can still override arguments from the UI, as you normally would). The missing part is, of course, the integration with the queue system and the automation, which unfortunately is not part of the open-source version.
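Since the question also mentions `sbatch`, the same approach can be submitted as a batch job instead of through `srun`. A minimal sketch, assuming a standard SLURM setup where `sbatch` reads the job script from standard input when no file name is given: the helper below renders a batch script whose only real work is `clearml-agent execute --id <task-id>`. The helper names and the `#SBATCH` options are hypothetical placeholders; adjust them to your cluster. This does not add the queue integration; you still clone the Task and pass its ID yourself.

```python
import shlex
import subprocess
import sys
from typing import Sequence


def render_sbatch_script(task_id: str, sbatch_options: Sequence[str] = ()) -> str:
    """Return a SLURM batch script that runs one ClearML Task through clearml-agent."""
    if not task_id:
        raise ValueError("task_id must be a non-empty string")
    lines = ["#!/bin/bash", f"#SBATCH --job-name=clearml-{task_id}"]
    # Each option becomes its own #SBATCH directive, e.g. "--gres=gpu:1".
    lines += [f"#SBATCH {opt}" for opt in sbatch_options]
    # The agent sets up the venv/container and runs the Task inside the allocation.
    lines.append(f"clearml-agent execute --id {shlex.quote(task_id)}")
    return "\n".join(lines) + "\n"


def submit(task_id: str, sbatch_options: Sequence[str] = ()) -> str:
    """Pipe the rendered script into sbatch via stdin and return sbatch's output."""
    result = subprocess.run(
        ["sbatch"],
        input=render_sbatch_script(task_id, sbatch_options),
        text=True,
        capture_output=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: submit_task.py <task-id> [sbatch options...]")
    print(submit(sys.argv[1], sys.argv[2:]))
```

Rendering the script separately from submitting it makes it easy to inspect (or save to a file) before handing it to `sbatch`.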