Can you paste here what is inside "Installed packages" so we can double check?
So I tried:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/data/hieu/opt/python-venv/fastai/bin/python3.10
clearml-agent daemon --queue no_venv
Then enqueue a cloned task to no_venv
It is still trying to create a venv (and failing):
[...]
tag =
docker_cmd =
entry_point = debug.py
working_dir = apple_ic
created virtual environment CPython3.10.10.final.0-64 in 140ms
creator CPython3Posix(dest=/data/hieu/deleteme/clearml-agent/venvs-builds/3.10, clear=False, no_vcs_ignore=False, gl...
If you have 2 agents serving the same queue and then send 2 tasks to that queue, each agent should take one task.
But if you enqueue sequentially, one task at a time, waiting until each task finishes before enqueuing the next, then it is random which agent will take the task. It can be the same one as for the previous task.
Are you saying that you have 1 agent running a task and 1 agent sitting idle while there is a task waiting in the queue that no one is processing??
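For example, a rough sketch of sending 2 tasks to the same queue from code (project, task and queue names here are placeholders):
from clearml import Task

# Clone an existing experiment twice and enqueue both clones to the same queue.
# With 2 agents serving 'my_queue', each agent should pick up one of them.
base = Task.get_task(project_name='my_project', task_name='baseline')
for i in range(2):
    cloned = Task.clone(source_task=base, name=f'baseline clone {i}')
    Task.enqueue(cloned, queue_name='my_queue')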
You don't need an agent on your local machine.
You want an agent running on the GPU machine.
Local code will create an experiment in the ClearML Server, then run up to the line execute_remotely() and stop.
Once the local code stops, the ClearML Server will take over and enqueue the experiment to the prescribed queue.
The agent on the GPU machine sees there is an experiment in its queue, pulls it and executes it. This time, the clearml lib magic will make the code on the GPU machine, launched by the agent, run...
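Roughly, the local script looks like this (a minimal sketch; project, task and queue names are placeholders):
from clearml import Task

# Runs locally only up to execute_remotely(): the experiment is registered
# in the ClearML Server, the local process exits, and the task is enqueued.
task = Task.init(project_name='my_project', task_name='train_on_gpu')
task.execute_remotely(queue_name='gpu_queue', exit_process=True)

# Everything from here on only runs on the GPU machine, launched by the agent.
train_model()  # placeholder for your actual training code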
I use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3.12 and it works for me
@<1523701868901961728:profile|ReassuredTiger98> I found that you can set the file_server in your local clearml.conf to your own cloud storage. In our case, we use something like this in our clearml.conf:
api {
file_server: "azure://<account>..../container"
}
All non-artifact models are then stored in our Azure storage. In our self-hosted ClearML setup, we don't even have a file server running at all.
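If I remember correctly, you can also point a single task at that storage from code instead of (or on top of) clearml.conf, e.g.:
from clearml import Task

# output_uri sends this task's models/artifacts to the Azure container
# instead of the default file server (same placeholder URI as above).
task = Task.init(
    project_name='hieutest',
    task_name='foo',
    output_uri='azure://<account>..../container',
)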
You should be able to use as many agents as you want.
On the same or different queues.
While creating the autoscaler instance I did provide my git credentials, i.e. my username and Personal Access Token.
How exactly did you do that?
Is this MongoDB-type filtering?
I don't think there is a "kill task" code path. In principle, on Linux, the ClearML agent launches the training process as a child process. When the parent process is terminated, in most cases all child processes, including your training process, are terminated as well.
There may be some way to resume a task from the ClearML agent when it restarts, but I don't think that is the default behavior.
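Just to illustrate the Linux mechanism (this is not ClearML's actual code, only a sketch of one common way a parent takes its children down with it; train.py is a placeholder):
import os
import signal
import subprocess

# Parent launches the training script in its own process group...
proc = subprocess.Popen(['python', 'train.py'], start_new_session=True)

# ...so on shutdown the parent can signal the whole group, which is why
# the training process dies together with the agent.
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)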
Create a new task from app.clearml by pulling a repo from GitHub. Do I need to make changes (add the ClearML 2-line code) in the entrypoint file in the repo for the task to execute in the ClearML dashboard?
Can you re-explain/re-word this? What exactly are you trying to do, and what exactly did you do??
I am trying to place the clearml-agent in a Docker container and run it in docker mode.
If you are running the clearml-agent in Docker, I don't think that is compatible with "doc...
You need both in certain cases.
It depends on how the agent is launched...
Should I open a feature request?
Yup, you have the flexibility and the options; that's what is so nice about ClearML.
Nevermind: None
By default, the File Server is not secured even if Web Login Authentication has been configured. Using an object storage solution that has built-in security is recommended.
My bad
Are you talking about this: None
It seems to not do anything about the database data...
What about migrating existing experiments in the on-prem server?
The configs that I mentioned above are the clearml.conf for each agent.
Even if it's just a local image? You need a docker repository even if it will only be on the local PC?
You can use a docker image that already has those packages and dependencies, then have clearml-agent running inside it or launching the docker container.
How did you deploy your ClearML server?
No. I set api.file_server to None in both the remote agent's clearml.conf and my local clearml.conf.
In which case, whether the code is run locally or remotely, metrics will be stored in cloud storage.
Ok. Found the solution.
The important thing is to use this:
Task.add_requirements("requirements.txt")
task = Task.init(project_name='hieutest', task_name='foo',reuse_last_task_id=False)
And not:
task = Task.init(project_name='hieutest', task_name='foo',reuse_last_task_id=False)
task.add_requirements("requirements.txt")
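(If I remember correctly, add_requirements needs to be called before Task.init because the task's package requirements are captured when the task is initialized, so calling it afterwards has no effect on the recorded "Installed packages".)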

