This one seems to be compatible: [nvcr.io/nvidia/pytorch:22.04-py3](http://nvcr.io/nvidia/pytorch:22.04-py3)
If I run `nvidia-smi` it returns valid output and says the CUDA version is 11.2
I have set `agent.package_manager.pip_version=""`, which resolved that message
Solved that by setting docker_args=["--privileged", "--network=host"]
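For reference, a minimal sketch of passing those arguments when creating the task from Python, assuming they belong in the same Task.create call shown later in this thread; the project name, task name, and base image are placeholders (the image is the one mentioned above as compatible):

```python
from clearml import Task

# Sketch only: project/task names are placeholders; the base image is the
# nvcr.io one mentioned earlier in this thread as compatible.
task = Task.create(
    project_name="gpu-tests",
    task_name="test_gpu privileged run",
    script="test_gpu.py",
    packages=["torch"],
    docker="nvcr.io/nvidia/pytorch:22.04-py3",
    docker_args=["--privileged", "--network=host"],
)
```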
It seems to find CUDA 11, then it installs CUDA 12:
Torch CUDA 111 index page found, adding ``
PyTorch: Adding index `` and installing `torch ==2.4.0.*`
Looking in indexes: , ,
Collecting torch==2.4.0.*
Using cached torch-2.4.0-cp310-cp310-manylinux1_x86_64.whl (797.2 MB)
2024-08-12 12:40:37
Collecting clearml
Using cached clearml-1.16.3-py2.py3-none-any.whl (1.2 MB)
Collecting triton==3.0.0
Using cached
...
Hi @<1523701070390366208:profile|CostlyOstrich36> I am not specifying a version 🙂
I am trying Task.create like so:
from clearml import Task

# create a task from a standalone script, with torch as the only explicit requirement
task = Task.create(
    script="test_gpu.py",
    packages=["torch"],
)
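For context, the script referenced here is not shown in the thread; a minimal, assumed sketch of what a GPU check like test_gpu.py could contain:

```python
# test_gpu.py -- hypothetical GPU sanity check (the actual script is not shown in this thread)
import torch

if __name__ == "__main__":
    print("torch version:", torch.__version__)
    print("built against CUDA:", torch.version.cuda)   # reveals a CUDA 11 vs CUDA 12 wheel
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
```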
@<1523701070390366208:profile|CostlyOstrich36> do you have any ideas?
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
I can install the correct torch version with this command: `pip install --pre torchvision --force-reinstall --index-url`
Isn't the problem that CUDA 12 is being installed?
Hey, yes I can see machine statistics on the experiments themselves
Seems to work!
@<1523701205467926528:profile|AgitatedDove14> if we go with the ultralytics case:
INSTALLED PACKAGES for working manual execution
absl-py==2.1.0
albucore==0.0.13
albumentations==1.4.14
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work
annotated-types==0.7.0
anyio==4.4.0
archspec @ file:///croot/archspec_1709217642129/work
astor==0.8.1
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
attrs @ file:///croot/attrs_169571782329...
In a cloned run with the new container `ultralytics/ultralytics:latest` I get this error:
clearml_agent: ERROR: Could not install task requirements!
Command '['/root/.clearml/venvs-builds/3.10/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r', '/tmp/cached-reqs7171xfem.txt', '--extra-index-url', '', '--extra-index-url', '']' returned non-zero exit status 1.
The original run completes successfully, it's only the runs cloned from the GUI which fail
Resetting and enqueuing a task which has built successfully also fails 😞
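For what it's worth, the same reset-and-enqueue step can also be reproduced from Python; the task ID and queue name below are placeholders:

```python
from clearml import Task

# Placeholders: substitute the real task ID and agent queue name.
task = Task.get_task(task_id="<task-id>")
task.reset()  # may need force=True if the task already completed
Task.enqueue(task, queue_name="default")
```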
Thank you for your help @<1523701205467926528:profile|AgitatedDove14>
Hi @<1523701205467926528:profile|AgitatedDove14>
ClearML Agent 1.9.0
Setting `agent.venvs_cache.path` back to `~/.clearml/venvs-cache` seems to have done the trick!
It was pointing to a network drive before, to avoid the local directory filling up
Thank you so much for your help @<1523701205467926528:profile|AgitatedDove14> !
Thanks @<1523701205467926528:profile|AgitatedDove14> , will take a look
But that doesn't explain why the model JSON files are missing.
@<1523701070390366208:profile|CostlyOstrich36> do you have any ideas? Thank you
Try `save_safetensors=False` in `TrainingArguments`. Not sure if ClearML supports safetensors
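A minimal sketch of where that flag goes, assuming the Hugging Face Trainer is in use; output_dir is a placeholder:

```python
from transformers import TrainingArguments

# With save_safetensors=False the Trainer writes classic PyTorch .bin
# checkpoints instead of .safetensors files. output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="outputs",
    save_safetensors=False,
)
```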