Maybe our plan doesn't include the vault?
Yes, I can verify the location. I'm uploading the file and then running: !cp clearml.conf ~/clearml.conf
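In a single Colab cell it's roughly this (a minimal sketch; google.colab.files is the stock Colab upload helper, and ~/clearml.conf is just the default location the SDK checks):

```python
# Minimal sketch, assuming this runs inside a Google Colab notebook
# (google.colab is only available there).
import os
import shutil
from google.colab import files

# Opens Colab's file picker so the local clearml.conf can be uploaded
files.upload()

# Copy it to the home directory, where the ClearML SDK looks for it by default
shutil.copy("clearml.conf", os.path.expanduser("~/clearml.conf"))
```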
I also tried setting it with the library manually using Task.set_credentials(web_host=web_server, api_host=api_server, files_host=files_server, key=access_key, secret=secret_key)
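For reference, the fuller version of that attempt looks roughly like this (a minimal sketch; the server URLs and keys are placeholders, not our real values):

```python
# Minimal sketch; endpoints and keys below are placeholders.
from clearml import Task

web_server = "https://app.clear.ml"
api_server = "https://api.clear.ml"
files_server = "https://files.clear.ml"
access_key = "YOUR_ACCESS_KEY"
secret_key = "YOUR_SECRET_KEY"

# Set the credentials programmatically instead of relying on ~/clearml.conf
Task.set_credentials(
    web_host=web_server,
    api_host=api_server,
    files_host=files_server,
    key=access_key,
    secret=secret_key,
)

# Task.init() is only called after the credentials are set
task = Task.init(project_name="my_project", task_name="credentials_check")
```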
Sure. Just a sec.
Thanks for the info. I will reach out to the team via email to see about an upgrade.
That sounds great. We have the paid version. Where would I find this?
Ah, yeah, I don't see that. So we likely don't have that option enabled.
I got it running, and now I know some more tricks, which is good. Part of it was me facing this really weird PyTorch + setuptools issue: https://github.com/pytorch/pytorch/issues/69894
This is great, AgitatedDove14. Thanks for the info on job queuing. That is one of the main things we are trying to enable. Do you work at Allegro, or is there someone else here I could talk with about Enterprise? I'm interested in the user management and permissions side of things, along with the Data layer. Depending of course on pricing (because we are a non-profit).
GrumpyPenguin23 and are priority queues something that exists in ClearML, or would that require an external queuing solution like SLURM?
Yes, sorry I forgot to follow up on this other thing. Thanks for the quick response. If you're curious, what happened was:
Previously we were manually mounting the poetry stuff via an additional Docker flag in the config file. Then we updated the config file to select poetry, and that left us with conflicting Docker volume mounts.
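To illustrate what I mean (the paths here are made up, not our actual ones), the agent section of clearml.conf ended up with both the old manual mount and the new poetry setting, something like:

```
agent {
    # Old approach: manually bind-mount the poetry cache into the container
    extra_docker_arguments: ["-v", "/home/user/poetry-cache:/root/.cache/pypoetry"]

    # New setting: let the agent drive poetry itself
    package_manager {
        type: poetry
    }
}
```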
(of course now we can't log in. I'm assuming the two things aren't related)
I would ideally just want to have NVIDIA drivers and Docker on the on-prem nodes (along with the ClearML agents). Would that allow me to get by with basic job scheduling/queues through ClearML?
AgitatedDove14 With the GPU job management, do you know if you can limit usage per user? That is, could I limit a certain user to using at most 2 GPUs at a time, or only particular GPUs, or something similar to that?
1.1.5 both locally and on Google Colab
I really want to avoid HPC craziness and keep things as simple as we can.
Yeah, I don't necessarily want a traditional queuing system like in HPC clusters. I just want functional GPU management for users. As long as there is a job queue that works well, I can communicate the rest to the team for now.
TimelyPenguin76 I ran the agent with Docker as:
$ clearml-agent daemon --detached --gpus 0 --queue idx_gandalf_titan-rtx --docker nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu18.04
However, that nvidia image ships with Python 3.6, so when I run anything it falls back to Python 3.6. Thus, I thought maybe all I would need to do is clone the job and change the Docker image to one with Python 3.8 and the right PyTorch, etc. Is my thinking right? I'm finding the right image to use now.
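In code, the clone-and-retag idea would be roughly this (a sketch, untested; the task ID and the ubuntu20.04 tag are placeholders I still need to verify, and set_base_docker's exact arguments can vary a bit between clearml versions):

```python
# Rough sketch: clone the existing job onto a Docker image with a newer Python.
from clearml import Task

# Fetch the original job by its ID (placeholder)
original = Task.get_task(task_id="ORIGINAL_TASK_ID")

# Clone it so the original run stays untouched
cloned = Task.clone(source_task=original, name=original.name + " (py38 image)")

# Point the clone at an image whose system Python is 3.8 (placeholder tag)
cloned.set_base_docker("nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04")

# Put it back on the same GPU queue the agent is watching
Task.enqueue(cloned, queue_name="idx_gandalf_titan-rtx")
```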