cuDNN isn't CUDA, it's a separate library.
are you running in docker or on bare metal? you should have cuda installed at /usr/local/cuda-<>
try: sudo updatedb, then locate libcudart
this is the cuda driver api. you need libcudart.so
can you initialize a tensor on the GPU?
so you don't have cuda installed 🙂
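a quick way to check, assuming PyTorch is installed (a minimal sketch, not specific to any setup):
```python
import torch

# False here means the CUDA runtime/driver combo isn't usable at all
print(torch.cuda.is_available())

# this raises immediately if no usable CUDA runtime is found
x = torch.ones(3, device="cuda")
print(x)
```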
just to be clear, multiple CUDA runtime versions can coexist on a single machine, and the only thing that points to which one you are using when running an application is the library search path (which can be set either with LD_LIBRARY_PATH or, preferably, by creating a file under /etc/ld.so.conf.d/ that contains the path to your cuda directory and then executing ldconfig)
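fwiw, a quick way to see what the search path actually resolves to from Python, as a sketch using ctypes (on Linux, find_library consults the same ldconfig cache):
```python
import ctypes
import ctypes.util

# reflects which libcudart an application would pick up via the
# ld cache / library search path
name = ctypes.util.find_library("cudart")
print(name)  # e.g. "libcudart.so.11.0", or None if nothing is found
if name:
    ctypes.CDLL(name)  # load it to confirm it resolves cleanly
```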
conda sets up cuda, I think
The legacy version worked just before I mv'ed the folder, but now (after reverting to the old name) it doesn't work either 😢
I see what you mean. So in a simple "all-or-nothing" solution I have to choose between potentially starving either the single-node tasks (high priority + wait) or the multi-node tasks (wait for a time when there are enough available agents and only then allocate the resources).
I actually meant NCCL. nvcc is the CUDA compiler 😅
NCCL communication can be both inter- and intra-node
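a minimal single-process sketch of selecting the NCCL backend via torch.distributed (the address, port, rank, and world_size here are placeholder values for illustration):
```python
import os
import torch.distributed as dist

# the "nccl" backend handles both intra-node (NVLink/PCIe) and
# inter-node (network) GPU-to-GPU communication
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="nccl", rank=0, world_size=1)
print(dist.get_backend())  # "nccl"
dist.destroy_process_group()
```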
this is pretty weird. PL should only save from rank==0:
https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/connectors/checkpoint_connector.py#L394
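the usual rank-zero guard looks roughly like this, as a sketch of the pattern rather than PL's actual code (save_checkpoint is a made-up helper):
```python
import torch
import torch.distributed as dist

def save_checkpoint(model, path):
    # only rank 0 writes, so concurrent ranks don't clobber the file
    if not dist.is_initialized() or dist.get_rank() == 0:
        torch.save(model.state_dict(), path)
    # keep the other ranks from racing ahead while rank 0 writes
    if dist.is_initialized():
        dist.barrier()
```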
I thought some sort of gang-scheduling scheme should be implemented on top of the job.
Maybe the agents should somehow go through a barrier with a counter and wait there until enough agents have arrived
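as a toy illustration of that counter idea (in-process threads standing in for agents; a real deployment would need a shared counter, e.g. in the backend DB, instead of an in-process Barrier):
```python
import threading

world_size = 4  # assumed number of agents the multi-node task needs
gate = threading.Barrier(world_size)

def agent(rank):
    print(f"agent {rank} waiting at the barrier")
    gate.wait()  # blocks until world_size agents have arrived
    print(f"agent {rank} starting the multi-node task")

threads = [threading.Thread(target=agent, args=(r,)) for r in range(world_size)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```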
oops. I used create instead of init 😳
I think so. IMHO all API calls should reside in a separate module, since they usually happen inside some control code
An easier fix for now would probably be some kind of warning to the user that a task was created but not connected
sounds great.
BTW the code is working out-of-the-box now. Just 2 magic lines: the import + Task.init
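i.e. something like this (assuming the clearml package; project/task names are placeholders):
```python
from clearml import Task

# the two "magic lines": the import plus Task.init are enough to
# hook automatic logging into the run
task = Task.init(project_name="examples", task_name="my experiment")
```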
not really... what do you mean by "free" agent?