Solving the replica issue now allowed me to get better insights into why the one index is red.
```
{
  "index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2021-11-09T22:30:47.018Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a...
```
I think Anna means that if artifacts and models are stored on the ClearML fileserver, their paths will contain the IP or domain of the fileserver. If you then move the fileserver to a different host, all the URLs break since the host changed.
`curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/_settings' -d '{"index" : {"number_of_replicas" : 0}}'` This command made all my indices besides the broken one, which is still red, come back green. It comes from https://stackoverflow.com/questions/63403972/elasticsearch-index-in-red-health/63405623#63405623 .
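To see which indices (if any) are still red after dropping the replica requirement, the _cat/indices endpoint can be filtered by health. A small sketch, again assuming Elasticsearch on localhost:9200 with no auth:

```python
import requests

# List only the indices that are still in red health after the replica change.
resp = requests.get(
    "http://localhost:9200/_cat/indices",
    params={"v": "true", "health": "red"},
)
resp.raise_for_status()
print(resp.text)
```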
SuccessfulKoala55 Hey, for us the artifact download URLs, model download URLs, images in plots, and debug image URLs are broken. In the linked example I can see a solution for the debug images and potentially the plot images, but I can't find the artifact and model URLs inside ES. Are those URLs maybe stored inside the MongoDB? Any idea where to find them?
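In case it helps anyone searching later: one way to look for such URLs is to query the server's MongoDB directly. The sketch below uses pymongo; the database, collection, and field names (`backend`, `model`, `uri`) as well as the old fileserver address are assumptions about the ClearML server schema and should be verified against your own instance first:

```python
from pymongo import MongoClient

OLD_HOST = "http://old-fileserver:8081"  # hypothetical old fileserver address

client = MongoClient("mongodb://localhost:27017")
db = client["backend"]  # assumed ClearML server database name

# Inspect which collections actually exist before trusting the names below.
print(db.list_collection_names())

# Look for model documents whose stored URI still points at the old host.
# Collection and field names are assumptions, not verified schema.
for doc in db["model"].find({"uri": {"$regex": OLD_HOST}}, {"uri": 1}).limit(10):
    print(doc["_id"], doc["uri"])
```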
The error you are citing happens when running `clearml-agent daemon --gpus 0 --queue default --docker nvidia/cuda`
`python3.6 -m virtualenv /home/tobias_vitt/.clearml/venvs-builds/3.6` returns `StopIteration`:
```
2021-05-06 13:46:34.032391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:a1:00.0 name: NVIDIA Quadro RTX 8000 computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 47.46GiB deviceMemoryBandwidth: 625.94GiB/s
2021-05-06 13:46:34.032496: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: ...
```
The cache on the host is mounted via NFS, and the NFS server was configured to not allow clients to perform root operations.
It appears in multiple places. It seems like the mapping of the pip and apt caches does work, but the access rights are now an issue.
I'm now running the code shown above and will let you know if there is still an issue.
`clearml-agent daemon --gpus 0 --queue default --docker nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu18.0` results in the GPUs not being used because of missing libs.
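As a quick sanity check that is independent of the agent, something like this can be run inside the same container image (assuming TensorFlow 2.x is installed there):

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can see; an empty list usually means the CUDA
# runtime libraries (e.g. libcudart.so.11.0) are missing from the image or
# not on LD_LIBRARY_PATH, which matches the dlerror messages above.
print(tf.config.list_physical_devices("GPU"))
```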
Try to restart ES and see if it helps
docker-compose down / up does not help
This happens inside the agent, since I use `task.execute_remotely()` I guess. The agent runs on Ubuntu 18.04 and not in docker mode.
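For context, the pattern I use looks roughly like this (project, task, and queue names are placeholders):

```python
from clearml import Task

# Create the task locally, then hand execution over to an agent.
task = Task.init(project_name="examples", task_name="remote run")  # placeholder names

# Stops the local run and enqueues the task on the given queue; the agent
# listening on that queue picks it up and executes it remotely.
task.execute_remotely(queue_name="default", exit_process=True)
```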
What version of ClearML is your server running?
The docker-compose file uses `clearml:latest`.
We do have a queue called office and another queue called default, so the agent is not listening to queues that are not defined. Or do I misunderstand something? The server has all the queues defined that the agents are using.
Hey AgitatedDove14, I fixed my code issue and am now able to train on multiple GPUs using the spawn_dist.py script from https://github.com/facebookresearch/fastMRI/blob/master/banding_removal/fastmri/spawn_dist.py . Since I create the ClearML Task in the main thread, I now can't see any training plots and probably also not the output model. What would be the right approach? I would like to avoid using Task.current_task().upload_artifact() or manual logging. I really enjoy the automatic detection.
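One pattern that may help here, as a sketch rather than the confirmed ClearML answer: pass the id of the task created in the main process to the spawned workers and have rank 0 re-attach to it for reporting. The SDK calls below exist, but the wiring into spawn_dist and the project/task names are my own placeholders:

```python
from clearml import Task

def worker(rank: int, task_id: str) -> None:
    # Re-attach to the task that was created in the main process.
    task = Task.get_task(task_id=task_id)
    logger = task.get_logger()

    for iteration in range(10):
        loss = 1.0 / (iteration + 1)  # placeholder for the real training loss
        if rank == 0:
            # Only rank 0 reports, so scalars are not duplicated per process.
            logger.report_scalar(title="loss", series="train", value=loss, iteration=iteration)

if __name__ == "__main__":
    # In the real setup the spawn helper would call worker() in every subprocess;
    # here the wiring is shown for a single "rank 0" worker only.
    main_task = Task.init(project_name="fastMRI", task_name="banding removal")  # placeholder names
    worker(rank=0, task_id=main_task.id)
```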
I can see the following using `docker ps`: `d5330ec8c47d allegroai/clearml-agent "/usr/agent/entrypoi…" 3 weeks ago Up 3 weeks clearml`
I execute the following to access the container: `docker exec -u root -t -i clearml /bin/bash`
I went to `/root/.clearml/venv-builds` but it is empty.
SuccessfulKoala55 I'm currently inside the docker container to recover the ckpt files. But `/root/.clearml/venvs-builds` seems to be empty. Any idea where I could then find the ckpt files?
I like this approach more, but it still requires the environment variables to be resolved inside the clearml.conf.
I'm running the following agent: `clearml-agent --config-file /clearml-cache/config/clearml-cpu.conf daemon --queue cpu default services --docker ubuntu:20.04 --cpu-only --services-mode 4 --detached`
The goal is to have an agent that can run multiple CPU-only tasks at the same time. I noticed that when enqueueing multiple tasks, all except one stay pending until the first one has finished downloading all packages and started code execution. Only then do the tasks switch, one by one, to "run...
I can figure out a way to resolve it, but is there any other way to get env vars / any value or secret from the host to the docker of a task?
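One option I'm considering, sketched below and not a confirmed ClearML recipe: set the docker arguments on the task itself so the agent starts the container with the extra `-e` flags. `Task.set_base_docker` is a real SDK method, but the exact keyword arguments are my assumption from the docs, the variable name is a placeholder, and note that the value gets baked in at task-creation time on the submitting machine rather than read on the agent host:

```python
import os

from clearml import Task

task = Task.init(project_name="examples", task_name="docker env passthrough")  # placeholder names

# Ask the agent to start the task container with an extra environment variable.
# MY_SECRET is a placeholder; its value is captured here, when the task is created.
task.set_base_docker(
    docker_image="nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu18.04",
    docker_arguments=["-e", f"MY_SECRET={os.environ.get('MY_SECRET', '')}"],
)

task.execute_remotely(queue_name="default")
```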
One more thing: the dockerized version is still not working as I want it to. If I use any specific docker image like `docker: nvidia/cuda:11.3.0-cudnn8-runtime-ubuntu18.04` on a host machine with NVIDIA-SMI 465.19.01, Driver Version 465.19.01, CUDA Version 11.3, I always get a similar error as above where a lib is missing. If I use the example from http://clear.ml , `clearml-agent daemon --gpus 0 --queue default --docker nvidia/cuda`, I always get this error: ` docker: Error...
Thanks for the info, that's really bad 😬 I thought that output_uri defaults to the fileserver 🙄
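For anyone else stumbling over this: the upload destination can be set explicitly per task; a minimal sketch where the project/task names and the fileserver URL are placeholders for your own deployment:

```python
from clearml import Task

# Upload models and artifacts to the fileserver instead of only recording
# local paths; the URL below is a placeholder for your own deployment.
task = Task.init(
    project_name="examples",
    task_name="explicit output_uri",
    output_uri="http://my-clearml-fileserver:8081",
)
```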
CostlyOstrich36 Thank you for your response. Is there something like a public project roadmap?
We run a lot of pipelines that are CPU-only with some parallel steps. It's just about improving the execution time.
Ok, if I'd like a different behaviour I would need one agent per task, right?
tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
Thanks a lot, yes it was the daemon :man-facepalming: I was already able to recover one checkpoint!