AgitatedDove14

49 Questions, 8124 Answers

Active since 10 January 2023

Last activity one year ago

Badges 1

25 × Eureka!

0 Hi All! I Tried To Run The

Sure, is it reproducible?

4 years ago

0 Hello, I Have Some Problems With Allegro. I Ran A Program And Then Saw It On The Trains Server. But Now I Changed Something In The Code And Pushed It Again. Now I Cloned It, But The Old Code Was Executed. How Can I Run The New Code I Pushed?

SuperiorDucks36, from code or from the UI?
(You can always clone an experiment and change the entire thing; the question is how you will get the data to fill in the experiment, i.e. repo / arguments / configuration etc.)
There is a discussion here; I would love to hear another angle.
https://github.com/allegroai/trains/issues/230

4 years ago

0 Hey, Using K8S With Trains 0.16.1-320, All Of A Sudden The Entire Data (I.E Experiments, Tasks, Api Creds) Is Not Showing In The Ui Anymore. All Logs Seems To Be Fine Afai Can Tell... Any Idea What Went Wrong?

https://stackoverflow.com/questions/37743683/why-is-an-empty-mongodb-database-so-big

4 years ago

0 Hi Friends! I'M Trying To Upgrade The

With pleasure 🙂

4 years ago

0 When Using Docker Mode (And Specifically K8S Glue), What Are The Options For Caching? One Option Is Definitely Having A Base Image That Has The Things Needed. Anything Else? Thanks!

pip cache, git cache and venvs cache are all supported; you just need to map the folders.
If you do not want to spin up a PVC with an NFS mount, you can just mount an S3 bucket with s3fs as part of the container's extra bash script:
https://github.com/allegroai/clearml-agent/blob/b39b54bbafab39e6731cb742fdf317bc6dcae54a/docs/clearml.conf#L140

S3 FUSE filesystems:
https://github.com/kahing/goofys
https://github.com/s3fs-fuse/s3fs-fuse
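A minimal sketch of the s3fs route, assuming a Debian/Ubuntu-based image (so apt-get is available), a hypothetical bucket name and mount point, and an s3fs credentials file already present at the assumed path /root/.passwd-s3fs. It only builds the shell lines; the second helper renders them as an extra_docker_shell_script list for clearml.conf, which I assume is the setting the link above points to.

```python
import json
import shlex


def s3fs_mount_commands(bucket, mount_point, passwd_file="/root/.passwd-s3fs"):
    """Shell lines that install s3fs and mount an S3 bucket inside the container."""
    mount = shlex.quote(mount_point)
    return [
        # Assumes a Debian/Ubuntu based image
        "apt-get update && apt-get install -y s3fs",
        f"mkdir -p {mount}",
        # The passwd file holds ACCESS_KEY_ID:SECRET_ACCESS_KEY and must be chmod 600
        f"s3fs {shlex.quote(bucket)} {mount} -o passwd_file={shlex.quote(passwd_file)}",
    ]


def as_conf_entry(commands):
    """Render the lines as an extra_docker_shell_script list (JSON strings are valid HOCON)."""
    return "extra_docker_shell_script: " + json.dumps(commands)


if __name__ == "__main__":
    # Hypothetical bucket and mount point
    print(as_conf_entry(s3fs_mount_commands("my-cache-bucket", "/mnt/clearml-cache")))
```

FUSE inside a container usually needs extra privileges (e.g. --device /dev/fuse --cap-add SYS_ADMIN on the docker run), and you would then point the pip / vcs / venvs cache paths in clearml.conf at folders under the mount point. Both are worth verifying on your own setup.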

WDYT?

4 years ago

0 Hi, Is There Any Way To Upload Data To A Clearml Dataset Without Compression At All? I Have Very Small Text Files That Make Up A Dataset And Compression Seems To Take Most Of The Upload Time And It Provide Almost No Benefits W.R.T Size

As a hack you can try DEFAULT_VERSION
(it's just a flag and should basically do Store)
EDIT: sorry that won't work 😞

2 years ago

0 Good Morning Folks, I Am Setting Up Clearml On A (Self-Hosted) K8S Cluster Using The

Correct (if this is running on k8s, it will most likely be passed via environment variables, e.g. CLEARML_WEB_HOST etc.).
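If you want to verify what the pod actually received, here is a small sketch that reports which server endpoint variables are unset. CLEARML_WEB_HOST is from above; CLEARML_API_HOST and CLEARML_FILES_HOST are the other two endpoint variables ClearML reads, and the helper names are my own.

```python
import os

# Server endpoint variables ClearML reads from the environment
SERVER_ENV_VARS = ("CLEARML_API_HOST", "CLEARML_WEB_HOST", "CLEARML_FILES_HOST")


def server_endpoints(environ=None):
    """Return {variable: value or None}; defaults to the current process environment."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in SERVER_ENV_VARS}


def missing_endpoints(environ=None):
    """List the endpoint variables that are unset or empty."""
    return [name for name, value in server_endpoints(environ).items() if not value]


if __name__ == "__main__":
    print(missing_endpoints() or "all server endpoints are set")
```

Run inside the pod, missing_endpoints() checks the real environment; kubectl exec <pod> -- env | grep CLEARML shows the same thing from outside.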

3 years ago

0 Hi All! I Have A Couple Of Things That Are Not Completely Clear To Me, Hope You Can Help Me To Sort Them Out.

Thanks OutrageousGrasshopper93
I will test it with the "!".
By the way, is the "!" in the project name or in the Task name?

4 years ago

0 Hi! Is There Something Happening With The

Yes, both work :(

4 years ago

0 Hello, Is There An Api For Trains? To List/Edit Projects And Experiments From Other Code Externally?

Hi JuicyDog96
The easiest way is:
from trains.backend_api.session.client import APIClient
client = APIClient()
client.projects.get_all()

You can just run it from a Python console and check what you are getting.
Full API is https://github.com/allegroai/trains/tree/master/trains/backend_api/services/v2_8
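get_all() gives you back a list of project objects. A hedged sketch of turning that into a name-to-id lookup, assuming each returned object exposes .name and .id; SimpleNamespace stand-ins with made-up values are used so the snippet runs without a server.

```python
from types import SimpleNamespace


def index_by_name(items):
    """Map name -> id for API objects that expose .name and .id attributes."""
    mapping = {}
    for item in items:
        if item.name in mapping:
            raise ValueError(f"duplicate name: {item.name!r}")
        mapping[item.name] = item.id
    return mapping


if __name__ == "__main__":
    # Stand-ins for what client.projects.get_all() returns (hypothetical values)
    fake_projects = [
        SimpleNamespace(name="examples", id="p1"),
        SimpleNamespace(name="sandbox", id="p2"),
    ]
    print(index_by_name(fake_projects))
```

With a configured server you would pass client.projects.get_all() instead of the stand-ins.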

5 years ago

0 Hi, Anyone Seen This Issue?

Hmm, I'm assuming something is wrong here:
https://github.com/allegroai/clearml-server/blob/a64c4d264d00eadd2d11818b37151d3cc6266d99/docker/docker-compose.yml#L119
What's the host machine OS ?

3 years ago

0 Clearml (Remote Execution) Sometimes Doesn'T "Pick-Up" Gpu. After I Rerun The Task It Picks It Up. Seems Random, Doesn'T Happen Too Often (Maybe Once In 30-40 Times) And I Cannot Seem To Detect Any Pattern. Did Anyone Else Notice This? Agents Are Vms On G

This smells like a driver/image issue on the instance VM.
What do you get if you add this inside your code?

import os
os.system('nvidia-smi')
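If you want the Task to fail fast instead of silently running without a GPU, a sketch along these lines could run at the very start of the script. It calls nvidia-smi -L (list GPUs) through subprocess and treats a missing binary, a non-zero exit code, the NVML initialization error, or an empty device list as "no GPU"; the function name is my own.

```python
import shutil
import subprocess


def gpu_visible(timeout=60):
    """Return (ok, message): ok is True only if nvidia-smi lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "nvidia-smi timed out"
    output = (proc.stdout + proc.stderr).strip()
    # e.g. "Failed to initialize NVML: Unknown Error" when the driver is broken
    if proc.returncode != 0 or "Failed to initialize NVML" in output:
        return False, output or f"nvidia-smi exited with code {proc.returncode}"
    # nvidia-smi -L prints one "GPU <index>: ..." line per device
    if not any(line.startswith("GPU ") for line in output.splitlines()):
        return False, output or "nvidia-smi listed no GPUs"
    return True, output
```

For example, call ok, message = gpu_visible() first and raise RuntimeError(message) when ok is False, so the bad runs fail immediately and are easy to spot and re-enqueue.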

one year ago

0 Hello All! I Have Some Trouble With Running Remotely Task With Code From Gitlab Repo With Ssl Cert. On The Machine Where Clearml Agent Installed Cert Is Added And Repo Cloning Successfully, But When I Tried To Run Task - It Failing With Git Repo Cloning F

Hi @RoughSeaturtle43

code from gitlab repo with ssl cert.

What do you mean by SSL cert? Is it SSH or an app token?

one year ago

0 If I Am Using The Demo Servers, Do I Need To Do Something Special To Use

HealthyStarfish45
No, it should work 🙂

4 years ago

0 How Can I Add My Requirements.Txt File To The Pipeline Instead Of Each Task?

but actually that path doesn't exist and it is giving me an error

So you are saying you only uploaded the "meta-data" i.e. a text file with links to the files, and this is why it is missing?

Is there a way to change the path inside the .txt file to clearml cache, because my images are stored in clearml cache only

I think a good solution would be to store the paths in the txt file as relative paths, i.e. ./data/folder instead of /Users/adityachaudhry/data/folder...
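A small sketch of that rewrite, assuming the .txt file holds one path per line: it converts absolute paths under a chosen base folder into ./-relative ones, and resolves them again against whatever folder the dataset ends up in (for example the local copy folder returned by ClearML's Dataset.get(...).get_local_copy()). The paths and function names are illustrative.

```python
import os


def to_relative_lines(lines, base_dir):
    """Rewrite absolute paths under base_dir as ./-prefixed relative paths."""
    out = []
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        if os.path.isabs(path):
            rel = os.path.relpath(path, base_dir)
            path = "./" + rel.replace(os.sep, "/")
        out.append(path)
    return out


def resolve_lines(lines, root):
    """Turn relative entries back into absolute paths under root (e.g. a dataset copy)."""
    return [os.path.normpath(os.path.join(root, line)) for line in lines]
```

Writing the relative version once, before creating the dataset, means the txt file stays valid wherever the cache puts the files.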

2 years ago

0 Hey All. Another Question - How Are Private Packages Handled/Installed So That Clearml-Agent Can Execute A Task? I Have A Bunch Of Private Repos For Communicating With The Data Warehouse. I Could Do A System-Wide Installation For It On The Clearml-Agent I

Hi TenseOstrich47
Does the .ssh folder of the user running the agent contain the correct credentials?
Basically, as the user running the agent, on the agent's machine, can you clone the repo with:
git clone ssh://git@github.com/15gifts/py-db.git
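A non-interactive way to run the same check from a script, as the agent's user: git ls-remote only needs read access, and the two environment variables make git and ssh fail instead of waiting for a prompt. The helper name is hypothetical, and overriding GIT_SSH_COMMAND replaces any custom ssh command you may already have set.

```python
import os
import subprocess


def can_access_repo(url, timeout=60):
    """True if `git ls-remote url` succeeds without any interactive prompt."""
    env = dict(
        os.environ,
        GIT_TERMINAL_PROMPT="0",                 # no username/password prompt over https
        GIT_SSH_COMMAND="ssh -o BatchMode=yes",  # no passphrase/password prompt over ssh
    )
    try:
        proc = subprocess.run(
            ["git", "ls-remote", url],
            capture_output=True, text=True, timeout=timeout, env=env,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # git is missing, or the remote hung
    return proc.returncode == 0
```

For example, print(can_access_repo("ssh://git@github.com/15gifts/py-db.git")) run under the same account the agent uses.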

4 years ago

0 Hello, In The Following Context:

PS. I just noticed that this function is not documented. I'll make sure it appears in the doc-string.

5 years ago

0 Hi All, I Have A Question Regarding Multi-Node Training Using The Clearml-Agent. What Is The Recommended Setup In This Case? Say I Have 3 Nodes With 3 Agents Running On Them. How Do I Make Sure They All Run The Same Job?

So in theory the Task can clone itself 2 extra times and push the clones into an execution queue, but the real issue might be making sure the resources are actually available. What did you have in mind?
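If the clones are meant to join one torch.distributed job, each of them also needs its own rank. A sketch, assuming PyTorch's env:// initialization (MASTER_ADDR, MASTER_PORT, WORLD_SIZE, RANK) and a hypothetical master address: rank 0 would stay with the original Task and the other dicts would go to the clones (e.g. made with Task.clone and pushed with Task.enqueue). How each clone receives its dict is left open here.

```python
def node_env(world_size, master_addr, master_port=29500):
    """One env-var dict per node for torch.distributed's env:// initialization."""
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    return [
        {
            "MASTER_ADDR": master_addr,       # must be reachable from every node
            "MASTER_PORT": str(master_port),
            "WORLD_SIZE": str(world_size),
            "RANK": str(rank),                # 0 = the original Task
        }
        for rank in range(world_size)
    ]


if __name__ == "__main__":
    # 3 nodes: the original Task plus 2 clones (hypothetical address)
    for env in node_env(3, "10.0.0.1"):
        print(env)
```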

4 years ago

0 Hi

I am using pipeline from tasks method and not pipeline from decorator.

Wait, I'm confused now. If this is a pipeline from Tasks, then the Tasks themselves should have clearml in their "installed packages", no? And if they do not, how were they created?

2 years ago

0 Hello,

What are the clearml package version and the clearml-session version?
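A quick way to get both from the same Python environment (equivalent to pip show clearml clearml-session); importlib.metadata needs Python 3.8+, and the helper name is my own.

```python
from importlib import metadata


def package_versions(names=("clearml", "clearml-session")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions().items():
        print(f"{name}: {version or 'not installed'}")
```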

2 years ago

0 Hi All, I'M Trying To Deploy Trains On Rancher (Nice Kubernetes Cluster Orchestration Project) Where I'M Quite New To Rancher And Kubernetes. I Have Been Able To Install Trains Using Helm

Hi WickedGoat98,
I think you are correct 😞
I would guess it is something with the ingress configuration (i.e. ConfigMap)

4 years ago

0 I .

Correct

3 years ago

Failed to initialize NVML: Unknown Error

Yeah, this is a driver issue. I think you need to check whether the drivers in the VM image match the GPU on that machine.

one year ago

0 Hello Everyone. Nice To Meet You I Got This Error When I Run Docker-Compose After Upgrading Clearml-Serving From 1.0 => 1.3 Have You Seen This Error? If You Did And Solved, Could You Tell Me How To Solve It?

Hi @FantasticSquid9
There is some backwards-compatibility issue with 1.2 (I think).
Basically, what you need is to spin up a new instance on a new session ID and re-register the endpoints.
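After creating the new serving session, the docker-compose side has to point at the new ID. A sketch for updating the env file, assuming it stores the ID in CLEARML_SERVING_TASK_ID (I believe that is what the clearml-serving docker example uses, but check the variable your compose file actually reads); the helper is hypothetical.

```python
import re


def set_env_var(text, key, value):
    """Return .env file text with KEY=value replaced in place, or appended if absent."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids re treating backslashes in value specially
        return pattern.sub(lambda _match: line, text, count=1)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"
```

Then restart the compose stack with the same --env-file and re-register the endpoints against the new ID.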

2 years ago

0 Hi! How To Add Files Locally To

MelancholyElk85

How do I add files without uploading them anywhere?

The files themselves need to be packaged into a zip file (so we have an immutable copy of the dataset). This means you cannot "register" existing files (in your example, files on your S3 bucket?!). The idea is to make sure your dataset is protected against changes on the one hand, but on the other to allow you to change it, and only store the changeset.
Does that make sense?
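To illustrate the "only store the changeset" idea, a toy sketch (not ClearML's actual implementation): each version is a mapping of relative path to file content, files are compared by SHA-256 digest, and only added or modified files would need to be stored, while removals are just recorded.

```python
import hashlib


def _digests(files):
    """files: {relative_path: bytes} -> {relative_path: sha256 hex digest}."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changeset(parent, child):
    """Compare two dataset versions and report what the child version changes."""
    old, new = _digests(parent), _digests(child)
    return {
        "added": sorted(p for p in new if p not in old),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "removed": sorted(p for p in old if p not in new),
    }
```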

3 years ago

0 Hi, Anyone Seen This Issue?

1633204289496 clearml-services DEBUG docker: invalid reference format.

This is the strange message; it looks like the execution command is not valid...
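docker prints "invalid reference format" when the image string it receives does not parse, e.g. uppercase letters in the repository path, a stray space, or extra arguments ending up in the image field. A simplified checker for the image string the service is using, based on a subset of Docker's reference grammar (lowercase path components, optional registry host and port, optional tag or sha256 digest); it is not the full grammar.

```python
import re

# Simplified subset of Docker's image reference grammar
_COMPONENT = r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*"
_HOST = r"localhost(?::[0-9]+)?|[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+(?::[0-9]+)?|[a-zA-Z0-9-]+:[0-9]+"
_TAG = r"\w[\w.-]{0,127}"
_DIGEST = r"sha256:[0-9a-f]{64}"
_REFERENCE = re.compile(
    rf"(?:(?:{_HOST})/)?{_COMPONENT}(?:/{_COMPONENT})*(?::{_TAG})?(?:@{_DIGEST})?",
    re.ASCII,
)


def looks_like_valid_image(ref):
    """True if ref parses as [host[:port]/]path[:tag][@sha256:digest]."""
    return _REFERENCE.fullmatch(ref) is not None
```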

3 years ago

0 I Am Trying To Do A Remote Execution Of A Test Task, But It Fails During Env Setup Due To Trying To Install An Obscure Version Of Pytorch. Been Trying To Solve This For Three Days! The Script:

but it fails during env setup due to trying to install an obscure version of pytorch. Been trying to solve this for three days!

AdventurousButterfly15, it tries to resolve the correct PyTorch version based on the CUDA version inside the container.

ERROR: torch-1.12.1+cu116-cp310-cp310-linux_x86_64.whl is not a supported wheel on this platform.

It seems it is trying to install PyTorch for Python 3.10 with CUDA 11.6 support, which seems reasonable, no?
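"Not a supported wheel on this platform" means pip compared the tags in the wheel file name with the interpreter running it. A simplified sketch of that comparison, which you could run with the container's python to see which tag disagrees (e.g. the container's Python not being 3.10). It ignores manylinux/abi3 compatibility rules and uses CPython-style tags, so treat it as a diagnostic only.

```python
import sys
import sysconfig


def wheel_tags(filename):
    """Split a wheel file name into its (python, abi, platform) tag lists."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    python_tags, abi_tags, platform_tags = (p.split(".") for p in parts[-3:])
    return python_tags, abi_tags, platform_tags


def interpreter_tags():
    """Rough CPython-style tags for the running interpreter (simplified)."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def explain_mismatch(filename):
    """List the tag mismatches between a wheel and this interpreter."""
    python_tags, _abi_tags, platform_tags = wheel_tags(filename)
    python_tag, platform_tag = interpreter_tags()
    problems = []
    if python_tag not in python_tags:
        problems.append(f"wheel is for {'/'.join(python_tags)}, interpreter is {python_tag}")
    if platform_tag not in platform_tags:
        problems.append(f"wheel is for {'/'.join(platform_tags)}, platform is {platform_tag}")
    return problems
```

For example, explain_mismatch("torch-1.12.1+cu116-cp310-cp310-linux_x86_64.whl") returns an empty list only on a CPython 3.10, linux x86_64 interpreter.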

2 years ago

0 Hi, I Am Trying To Execute My Code On Nvidia/Cuda Docker, But It Keeps Running, It Does Not Fail Or Abort. The Last Log Message Is

MysteriousBee56, that is so weird... last one, I promise 🙂
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo \$(which python3) && echo \$(which trains-agent)"

5 years ago

0 Hi, I’M Having Troubles Initializing Connection To Clearml (“Error: Could Not Verify Credentials:“). Who Can Help? Thanks

No worries 🙂 glad it worked

3 years ago

0 Hello, I'M Trying Clearml-Serving On Any Of The Example Models From The 'Clearml Examples' Project. After Running 'Clearml-Serving Triton ...' I Always Get The Following Error: Clearml-Serving Triton --Endpoint "Keras_Mnist" --Model-Project "Clearml Exa

Hi ScaryLeopard77
I think the error message you are getting is actually "passed" from Triton. Basically, someone needs to tell Triton what the model's inputs and outputs look like (matrix size/type). This is essentially the content of the "config.pbtxt", and it has to be set when spinning up the model endpoint. Does that make sense to you?
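For reference, a sketch that renders a minimal config.pbtxt in Triton's text format. The endpoint name comes from the question, but the platform, tensor names and shapes are placeholders you would replace with your model's real ones (with max_batch_size: 0 the dims include the batch dimension).

```python
def triton_config(name, platform, inputs, outputs, max_batch_size=0):
    """Render a minimal Triton config.pbtxt.

    inputs/outputs: lists of (tensor_name, data_type, dims) tuples,
    e.g. ("dense_input", "TYPE_FP32", [-1, 784]).
    """
    lines = [
        f'name: "{name}"',
        f'platform: "{platform}"',
        f"max_batch_size: {max_batch_size}",
    ]
    for section, tensors in (("input", inputs), ("output", outputs)):
        entries = []
        for tensor_name, data_type, dims in tensors:
            dims_text = ", ".join(str(d) for d in dims)
            entries.append(
                "  {\n"
                f'    name: "{tensor_name}"\n'
                f"    data_type: {data_type}\n"
                f"    dims: [ {dims_text} ]\n"
                "  }"
            )
        lines.append(f"{section} [\n" + ",\n".join(entries) + "\n]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder tensor names/shapes for an MNIST-style Keras model
    print(triton_config(
        "keras_mnist", "tensorflow_savedmodel",
        inputs=[("dense_input", "TYPE_FP32", [-1, 784])],
        outputs=[("activation_2", "TYPE_FP32", [-1, 10])],
    ))
```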

3 years ago