Eureka!
Yes, I can verify the location. I'm uploading the file and then running:
! cp clearml.conf ~/clearml.conf
Sure. Just a sec.
I also tried setting it with the library manually using
Task.set_credentials(web_host=web_server, api_host=api_server, files_host=files_server, key=access_key, secret=secret_key)
1.1.5 both locally and on Google Colab
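For reference, the same credentials can also live in a `~/clearml.conf` file instead of going through `Task.set_credentials`. A minimal sketch (stdlib only) that writes such a file; the endpoint URLs and keys below are placeholders, not values from this thread:

```python
from pathlib import Path

# Placeholder endpoints and keys -- substitute your own server values.
conf_text = """\
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "YOUR_ACCESS_KEY"
        "secret_key" = "YOUR_SECRET_KEY"
    }
}
"""

# Write locally first, then copy to ~/clearml.conf so the SDK picks it up
# (e.g. the `cp clearml.conf ~/clearml.conf` step above in a Colab cell).
Path("clearml.conf").write_text(conf_text)
```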
AgitatedDove14 With the GPU job management, do you know if you can limit usage per user? That is, could I limit a certain user to at most 2 GPUs at a time, or to particular GPUs, or something similar?
Ahhhh... I see.
Yes, sorry I forgot to follow up on this other thing. Thanks for the quick response. If you are curious what happened was:
Previously we were manually mounting the poetry directories via an additional docker flag in the config file. Then we updated the config file to select poetry, which left us with conflicting docker volume mounts.
(of course now we can't log in. I'm assuming the two things aren't related)
This is great AgitatedDove14 . Thanks for the info on job queuing. That is one of the main things we are trying to enable. Do you work at Allegro, or is there someone else here I could talk with about Enterprise? I’m interested in the user management and permissions side of things along with the Data layer. Depending of course on pricing (because we are a non-profit).
I would ideally just want to have NVIDIA drivers and Docker on the on-prem nodes (along with the clearML agents). Would that allow me to get by with basic job scheduling/queues through clearML?
GrumpyPenguin23 and are priority queues something that exists in clearML, or would that require an external queuing solution like SLURM?
Thanks for the info. I will reach out to the team via email to see about an upgrade.
Ah, yeah, I don't see that. So we likely don't have that option enabled.
Maybe our plan doesn't include the vault?
That sounds great. We have the paid version. Where would I find this?
Yeah, I don’t necessarily want a traditional queuing system like in HPC clusters. I just want functional GPU management for users. As long as there is a job queue that works well, I can communicate the rest to the team for now.
I really want to avoid HPC craziness and keep things as simple as we can.
TimelyPenguin76 I ran the agent with Docker as
$ clearml-agent daemon --detached --gpus 0 --queue idx_gandalf_titan-rtx --docker nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu18.04
However, that nvidia image ships Python 3.6, so anything I run falls back to Python 3.6. Thus, I thought maybe all I need to do is clone the job and change the Docker image to one with Python 3.8 and the right PyTorch, etc. Is my thinking right? I'm finding the right image to use now.
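For the record, a sketch of the adjusted agent launch (shown as a dry run that only prints the command). The image tag is an assumption: a CUDA 11.2 runtime on Ubuntu 20.04, whose system Python is 3.8, instead of the 18.04 image's Python 3.6; verify the image's Python version before relying on it:

```shell
# Assumed image tag -- pick one matching your CUDA/PyTorch requirements.
IMAGE="nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04"

# Dry run: print the agent command rather than starting the daemon here.
CMD="clearml-agent daemon --detached --gpus 0 --queue idx_gandalf_titan-rtx --docker $IMAGE"
echo "$CMD"
```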
I got it running, and now I know some more tricks. Which is good. Part of it was me facing this really weird PyTorch + setuptools issue: https://github.com/pytorch/pytorch/issues/69894