ReassuredTiger98

95 Questions, 639 Answers

Active since 10 January 2023

Last activity 8 months ago

Reputation

Badges 1

606 × Eureka!

Answers 639

0 Hello! Since Today I Get

But here is the funny thing:

channels:
- pytorch
- conda-forge
- defaults
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0

Installs GPU

3 years ago

0 Hello! Since Today I Get

Is there a way to see the contents of /tmp/conda_envaz1ne897.yml? It seems to be deleted after the task is finished.
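One possible workaround, sketched under assumptions: if the file only exists in /tmp for a short time, a small polling script running in a second terminal could copy it before it disappears. The function names, the `conda_env*.yml` pattern, the destination directory and the polling interval are all illustrative choices. None of this is part of clearml-agent.

```python
import glob
import os
import shutil
import time


def snapshot_matching(src_dir, pattern, dst_dir):
    """Copy files in src_dir that match pattern into dst_dir.

    Returns the base names of the files copied during this call.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for path in glob.glob(os.path.join(src_dir, pattern)):
        name = os.path.basename(path)
        target = os.path.join(dst_dir, name)
        try:
            # Skip files already saved with the same size; re-copy if the
            # source has grown since the last poll.
            if os.path.exists(target) and os.path.getsize(target) == os.path.getsize(path):
                continue
            shutil.copy2(path, target)
            copied.append(name)
        except OSError:
            # The file vanished between listing and copying; ignore it.
            pass
    return copied


def watch(src_dir="/tmp", pattern="conda_env*.yml",
          dst_dir="conda_env_snapshots", interval=0.2, duration=600.0):
    """Poll src_dir for `duration` seconds and save every matching file."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        for name in snapshot_matching(src_dir, pattern, dst_dir):
            print("saved", name)
        time.sleep(interval)
```

Calling `watch()` just before enqueuing the task should leave a copy of the generated environment file in `conda_env_snapshots/`, provided the file lives longer than one polling interval.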

3 years ago

0 Hello! Since Today I Get

Sure, but I will try it tomorrow then.

3 years ago

0 Hello! Since Today I Get

And then?

3 years ago

0 Hello! Since Today I Get

Perfect, will try it. FYI: the conda_channels that I used are from clearml-agent init.

3 years ago

0 Hello! Since Today I Get

conda 4.9.2

3 years ago

0 Can Someone Confirm That

Is this working in the latest version? clearml-agent falls back to /usr/bin/python3.8 no matter how I configure clearml.conf. I just want to make sure, so that I can investigate what's wrong with my machine if it is working for you.

3 years ago

0 Can Someone Confirm That

Thank you very much. I tested it on a different machine now and it works as intended. So there must be something misconfigured on this one machine.

3 years ago

0 I Have A Problem That Might Not Directly Be Clearml Related, But Maybe Someone Here Has An Idea: I Run A Clearml-Server On A Machine With 128Gb Ram, 32 Cores And 2 Gpus. On The Same Machine I Run 2 Clearml-Agent Each With Access To 1 Gpu, 12 Cores, An 48G

CostlyOstrich36 Actually, no container exits. So if it is because of OOM, as SuccessfulKoala55 implies, then maybe a process inside the container gets killed and the container hangs? Is this possible?
SuccessfulKoala55 I did not observe elastic using much RAM (at least right after starting). Doesn't this line in the docker-compose control the RAM usage?
ES_JAVA_OPTS: -Xms2g -Xmx2g -Dlog4j2.formatMsgNoLookups=true

2 years ago

0 Hello! Since Today I Get

For now I can tell you that with conda_freeze: true it fails, but with conda_freeze: false it works!

3 years ago

0 Hello! Since Today I Get

Do you know how I can make sure I do not have CUDA or a broken installation installed?

3 years ago

0 Hello! Since Today I Get

I mean the version which it bases the PyTorch installation on.

3 years ago

0 Hello! Since Today I Get

Also tried conda version 4.7.12. Same problem.

3 years ago

0 Hello! Since Today I Get

Type "help", "copyright", "credits" or "license" for more information.
>>> from clearml_agent.helper.gpu.gpustat import get_driver_cuda_version
>>> get_driver_cuda_version()
'110'

3 years ago

0 Hello! Since Today I Get

I do not have a global cuda install on this machine. Everything except for the driver is installed via conda.

3 years ago

0 Hello! Since Today I Get

And this works fine.

3 years ago

0 Hello! Since Today I Get

What do you mean?

3 years ago

0 Hello! Since Today I Get

I tried to run the task with detect_with_conda_freeze: false instead of true and got

Executing Conda: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch 'pip<20.2' --quiet --json
Pass
Conda: Trying to install requirements:
['pytorch~=1.8.0']
Executing Conda: /home/tim/miniconda3/condabin/conda env update -p /home/tim/.clearml/venvs-builds/3.8 --file /tmp/conda_envh7rq4qmc.yml --quiet --json
Conda error: Unsati...

3 years ago

0 Hello! Since Today I Get

ca-certificates           2021.1.19            h06a4308_1  
certifi                   2020.12.5        py38h06a4308_0  
cudatoolkit               11.0.221             h6bb024c_0  
ld_impl_linux-64          2.33.1               h53a641e_7  
libedit                   3.1.20191231         h14c3975_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 9.1.0                hdf63c60_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
ncurses    ...

3 years ago

0 Hello! Since Today I Get

I get 110 but it should be 111
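To make the two numbers easier to compare, here is a minimal sketch that decodes such a compact code. It assumes, based only on the values seen in this thread, that the last digit is the minor version and the leading digits are the major version, so '110' would mean CUDA 11.0 and '111' would mean CUDA 11.1. The helper names are hypothetical and not part of clearml-agent.

```python
from typing import Tuple


def split_cuda_code(code: str) -> Tuple[int, int]:
    """Split a compact CUDA version code into (major, minor).

    Assumption: last digit = minor version, remaining digits = major version.
    """
    if not code.isdigit() or len(code) < 2:
        raise ValueError(f"unexpected CUDA version code: {code!r}")
    return int(code[:-1]), int(code[-1])


def format_cuda_code(code: str) -> str:
    """Render a compact code such as '110' as a dotted version string."""
    major, minor = split_cuda_code(code)
    return f"{major}.{minor}"


detected = "110"  # value returned by get_driver_cuda_version() in this thread
expected = "111"  # value that would match the cudatoolkit=11.1.1 pin
print(format_cuda_code(detected), "vs", format_cuda_code(expected))
```

Under that reading, the agent detects 11.0 while the environment pins cudatoolkit 11.1.1, which could explain why the package versions cannot be resolved together.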

3 years ago

0 Another Question: Is It Possible To Specify In Which Directory To Save All The Files That Clearml-Agent Creates (E.G. Cache Files Or Results Of The Currently Running Experiments)

Perfect, thank you very much.

3 years ago

0 I Have A Self-Hosted Clearm-Server And And Clearml-Agent Started With

However, I cloned the experiment again via the web UI. Then I enqueued it.

3 years ago

0 I Have A Self-Hosted Clearm-Server And And Clearml-Agent Started With

Sure, let me try.

3 years ago

0 When An Environment Variable Is Tracked Via

TRAINS_LOG_ENVIRONMENT=MYENVVAR works. Thank you!

3 years ago

0 Hi Everyone, I Tried To Implement Ssl Support With Nginx And Everything Seems To Work So Far, But I Get "The Following Artifacts Could Not Be Deleted". How Can I Debug This? I Do Not See Any Error In The Logs. I Can Safe Artifacts And Retrieve Them (Howev

Hey, thank you for answering.
I know this issue and I run into it sometimes, but my current issue is a direct result of me trying to make SSL work. So I am not asking for help in solving my problem, only for help with how to debug it: finding out which step leads to the artifact not being deleted (e.g. the fileserver cannot be reached from wherever the delete request is sent).

2 years ago

0 Is It Possible To Ask An Agent To Use A Specified Existing Python Environment Instead Of Building One From Scratch?

At least when you use docker containers, the agent will reuse the existing python environment.

3 years ago

0 Currently, To Provide Ssh Access To The Docker Images For A Task,

I just checked and my user is part of the docker group.

3 years ago

0 Hi Everyone, Is It Possible To Not Create A Copy Of A Dataset When Adding To Clearml? My Data Is Already In A Directory On The Clearml-Server Machine And I Do Not Want To Copy It, Just Add It To Clearml As Dataset.

Sounds like a good hack, but not like a good solution 😄 But thank you anyways! 🙂

2 years ago

0 How Many People Are Actually Working At Allegroai/On Clearml?

No reason in particular. How many people work at http://allegro.ai ?