EnchantingPenguin77

13 Questions, 46 Answers

Active since 03 August 2023

Last activity 7 months ago

Reputation

Badges 1

46 × Eureka!

0 Votes

3 Answers

2K Views

Hi, I Am Trying To Log The Hydra Configuration Using Clearml-Task, And I Am Following The Demo Script:

Hi, I am trying to log the hydra configuration using clearml-task, and I am following the demo script: None . It's able to log the default hydra configs, but...

clearml

2 years ago

0 Votes

6 Answers

2K Views

Hi, I Am Trying To Save My Trained Model Weights In S3 Bucket Instead Of Using Clearml Storage When Using Clearml-Task For Ml Training Remotely. I Tried To Use --Skip-Task-Init In Clearml-Task And Set Task.Init In My Scripts, But It Doesn't Seem To Work.

Hi, I am trying to save my trained model weights in S3 bucket instead of using ClearML storage when using clearml-task for ml training remotely. I tried to u...

clearml

2 years ago

0 Votes

3 Answers

1K Views

Hi, Is

Hi, Is clearml-task --docker able to take docker image > 10GB? I got some issue when using clearml-task --docker with AWS autoscaler. The error shows no spac...

clearml

one year ago

0 Votes

3 Answers

2K Views

Hi, I am trying to save my trained model weights in S3 bucket instead of using ClearML storage when using clearml-task for ml training remotely. I tried to u...

clearml

2 years ago

0 Votes

1 Answers

658 Views

Hi, I Was Using Pipeline Controller To Run Pipeline Tasks With 2 Steps, The 1st Step Is Supposed To Create 51 Task And The 2nd Task Will Compute Result Based On The 51st Tasks Output In The 1st Step. But I Only See 34 Tasks In Step 1, And Got Error Of The

Hi, I was using pipeline controller to run pipeline tasks with 2 steps, the 1st step is supposed to create 51 task and the 2nd task will compute result based...

clearml

6 months ago

0 Votes

10 Answers

800 Views

Hi, I'm Using Aws Ec2 Instance To Train My Models With Clearml Autoscaler, But It Says Cuda Device Is Not Available. The Code Runs Well On My Local Pc And It Runs Well On Clearml With Ec2 Yesterday, But It Suddenly Doesn't Work Today. Is There Anyway To S

Hi, I'm using an AWS EC2 instance to train my models with the ClearML autoscaler, but it says the CUDA device is not available. The code runs well on my local PC and it...

clearml

7 months ago

0 Votes

1 Answers

1K Views

Hi, Is There Anything Changed On Clearml? I Saw The Web Ui Was Updated, And After Then, I Am Experiencing Package Not Found Issue Shown In The Log. I Have The Exactly Same Docker Image And Aws Ami Setting As Before, And Using The Same Previous Git Branch

Hi, is there anything changed on clearml? I saw the web UI was updated, and after then, I am experiencing package not found issue shown in the log. I have th...

aws

7 months ago

0 Votes

1 Answers

2K Views

Hi, I Am Using Aws Autoscaler To Train Model. I Have A Fairly Large Dataset (400G) And The Data Is Private So I Can't Really Store It In Clearml Dataset. Everytime When I Launch A Job, It's Going To Take Very Long Time To Download The Data From S3. Is There

Hi, I am using the AWS autoscaler to train a model. I have a fairly large dataset (400G) and the data is private, so I can't really store it in a ClearML dataset. Every...

clearml

one year ago

0 Votes

3 Answers

1K Views

Hi I Got Issue When I Am Trying To Mount Ebs To Aws Ec2 Instance When Running Clearml Pipeline, I've Checked The Dev/Sdb Is Dev/Nvme1N1 In Ec2 And I Was Using

Hi, I ran into an issue when trying to mount EBS to an AWS EC2 instance when running a ClearML pipeline. I've checked that dev/sdb is dev/nvme1n1 in EC2 and I was usin...

mlops

7 months ago

0 Votes

4 Answers

1K Views

Hi, I Got Some Issue When Using

Hi, I ran into an issue when using clearml-task --docker with the AWS autoscaler. The error shows no space left on device, and my docker image is 12GB. I've tested...

clearml

one year ago

0 Votes

4 Answers

1K Views

Hi, Is There A Way To Have The Docker Extra Arguments Take

Hi, is there a way to have the docker extra arguments take a dockerd command instead of docker run? I tried dockerd --storage-opt dm.basesize=20G in docker e...

clearml

one year ago

0 Votes

2 Answers

1K Views

Hello Team, I Got An Issue Of

Hello Team, I have an issue where the dataloader's workers run out of shared memory when using the AWS Autoscaler, even though I've raised the shared memory to 64 GB in dock...

clearml

one year ago

0 Votes

38 Answers

160K Views

Hi All, I Was Trying To Use Clearml-Task To Run A Custom Docker(With Poetry To Install All The Python Dependencies And Activated The Environment) Using Clearml Gpu, But It Seems Like Clearml Always Create A Virtual Environment And Run The Python Script Fr

Hi all, I was trying to use clearml-task to run a custom docker(with poetry to install all the python dependencies and activated the environment) using clear...

clearml

2 years ago

0 Hi All, I Was Trying To Use Clearml-Task To Run A Custom Docker(With Poetry To Install All The Python Dependencies And Activated The Environment) Using Clearml Gpu, But It Seems Like Clearml Always Create A Virtual Environment And Run The Python Script Fr

@<1523701205467926528:profile|AgitatedDove14> I'm trying to run ClearML GPU compute (RTX 3080) with pytorch-lightning but keep getting a CUDA error. Is there any specific CUDA/Ubuntu/torch/python version required? I tried several different versions but can't make it work

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 as telos_algorithms

  File "/code/.venv/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1013, in _run_stage
    with isolate_rng():
  Fi...

2 years ago

0 Hi, I'm Using Aws Ec2 Instance To Train My Models With Clearml Autoscaler, But It Says Cuda Device Is Not Available. The Code Runs Well On My Local Pc And It Runs Well On Clearml With Ec2 Yesterday, But It Suddenly Doesn't Work Today. Is There Anyway To S

@<1523701070390366208:profile|CostlyOstrich36> sorry, I uploaded the wrong log. Here is the error:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

7 months ago
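
The RuntimeError quoted above suggests passing `map_location` to `torch.load` when no GPU is visible. A minimal sketch of that fallback follows, assuming PyTorch is installed; the helper names `pick_map_location` and `load_checkpoint` are hypothetical and not part of the thread. This only lets the checkpoint load on a CPU-only machine; it does not explain why the autoscaled instance failed to see its GPU.

```python
from typing import Any


def pick_map_location(cuda_available: bool) -> str:
    """Return the device string to use as map_location."""
    return "cuda" if cuda_available else "cpu"


def load_checkpoint(path: str) -> Any:
    """Load a checkpoint, remapping CUDA tensors to CPU when no GPU is visible."""
    import torch  # imported lazily so pick_map_location works without PyTorch

    device = pick_map_location(torch.cuda.is_available())
    return torch.load(path, map_location=torch.device(device))
```

Logging the value returned by `torch.cuda.is_available()` at the start of the task may also help tell a missing GPU apart from a checkpoint problem.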

Hi @<1523701070390366208:profile|CostlyOstrich36>, here is the configuration. The GPU is sometimes found when I clone the previous successful run, but only at random. Also, I am unable to run multiple tasks at the same time, even when cloning the previous run

7 months ago

screenshot of AWS Autoscaler setup, cpu mode is NOT enabled

7 months ago

Hi @<1523701070390366208:profile|CostlyOstrich36> Any idea why this happens?

7 months ago

It was pending the whole day yesterday, but today it's able to run the task

2 years ago

Thanks @<1523701205467926528:profile|AgitatedDove14>. I just ran into an issue running clearml-task remotely. It was working fine before today, but now every time I run clearml-task it shows pending, and after waiting for 3 hours the status is still pending. The autoscaler was charging the hourly rate even though the task had been pending for 3 hours. From the console log of the ClearML GPU instance, I saw it is listening to the queue, but there is no log even after 3 hours. There is not...

2 years ago

I am using hydra in main.py

2 years ago

I got the same CUDA issue after being able to use the GPU

2 years ago

It seems like CPU is working on something, I saw the usage is spiking periodically but I didn't run any task this morning

2 years ago

Here it is @<1523701205467926528:profile|AgitatedDove14>

2 years ago

The queue will be empty when I run task

2 years ago

Actually never mind, it's working now!

2 years ago

0 Hi I Got Issue When I Am Trying To Mount Ebs To Aws Ec2 Instance When Running Clearml Pipeline, I've Checked The Dev/Sdb Is Dev/Nvme1N1 In Ec2 And I Was Using

Hi @<1808672071991955456:profile|CumbersomeCamel72>, the instance with the error is launched from the ClearML AWS Autoscaler on the webpage. The successfully mounted instance was launched manually from the AWS web interface

7 months ago

0 Hi I Got Issue When I Am Trying To Mount Ebs To Aws Ec2 Instance When Running Clearml Pipeline, I've Checked The Dev/Sdb Is Dev/Nvme1N1 In Ec2 And I Was Using

@<1808672071991955456:profile|CumbersomeCamel72> It can be mounted without Docker, but it can't be mounted if I run a Docker container on the instance

7 months ago

@<1523701070390366208:profile|CostlyOstrich36> yes, at the end of the new file

7 months ago

And this issue happens randomly, I was able to run it again last night, but failed again this morning

7 months ago

0 Hi, Is

Hi @<1523701087100473344:profile|SuccessfulKoala55>, what preconfiguration does the Docker service need? I ran docker pull manually on an AWS EC2 instance with the same docker image and did not hit the space limit issue.

one year ago

@<1523701205467926528:profile|AgitatedDove14> Is there any reason why you mentioned that the "correct" way to work with python and containers is to actually install everything on the system (not venv)?

2 years ago

There is nothing on the queue and worker

2 years ago

I did pass --args to the clearml-task command for this run, but it looks like the Docker container didn't receive it

2 years ago

@<1523701205467926528:profile|AgitatedDove14> Yes, I can see the worker:

2 years ago

I see, it seems the --args for the script weren't passed to the Docker container:

--script fluoro_motion_detection/src/run/main.py \
--args experiment=example.yaml \

2 years ago
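
The `--script` and `--args` fragment above can be assembled programmatically, which makes it easier to inspect exactly what is sent to `clearml-task`. The sketch below only builds the argument list. The project name, task name, and queue are hypothetical placeholders, and whether a Hydra entry point picks up overrides passed this way is not confirmed by the thread.

```python
import shlex
from typing import Dict, List


def build_clearml_task_cmd(
    script: str,
    script_args: Dict[str, str],
    project: str,
    name: str,
    queue: str,
) -> List[str]:
    """Assemble a clearml-task command line as a list of arguments."""
    cmd = [
        "clearml-task",
        "--project", project,
        "--name", name,
        "--script", script,
        "--queue", queue,
    ]
    if script_args:
        # --args takes a list of <argument>=<value> strings
        cmd.append("--args")
        cmd.extend(f"{key}={value}" for key, value in script_args.items())
    return cmd


if __name__ == "__main__":
    # Project, name, and queue below are placeholders, not values from the thread.
    example = build_clearml_task_cmd(
        script="fluoro_motion_detection/src/run/main.py",
        script_args={"experiment": "example.yaml"},
        project="my-project",
        name="my-task",
        queue="default",
    )
    print(shlex.join(example))
```

Printing the joined command before launching it is a cheap way to check that the `experiment=example.yaml` pair is present in what gets submitted.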

but it is still unable to run any task after I abort and rerun another task

2 years ago

I actually have aborted it

2 years ago

0 Hi, I Am Trying To Save My Trained Model Weights In S3 Bucket Instead Of Using Clearml Storage When Using Clearml-Task For Ml Training Remotely. I Tried To Use --Skip-Task-Init In Clearml-Task And Set Task.Init In My Scripts, But It Doesn't Seem To Work.

@<1523701087100473344:profile|SuccessfulKoala55> Hi Jake, I tried to use --output-uri in clearml-task but got the same error: clearml.storage - ERROR - Failed uploading: 'LazyEvalWrapper' object cannot be interpreted as an integer

2 years ago

Hi @<1523701087100473344:profile|SuccessfulKoala55> I was able to solve this issue after upgrading clearml to 1.12.2, but my training/val loss became NaN after the update

2 years ago

@<1523701087100473344:profile|SuccessfulKoala55> Hi Jake, I am using 1.12.0

2 years ago

Hi @<1523701070390366208:profile|CostlyOstrich36> , any suggestion for this error?

2 years ago
