Awesome! Any chance you feel like contributing it? I'm sure ppl would be thrilled 🙂
Hi ReassuredOwl55
How would I find Tasks that have the same code with different inputs/parameters?
Assuming you have the git repo
you can do: Task.query_tasks(..., task_filter={'_all_': dict(fields=['script.repository'], pattern='github.com/user/repo')})
wdyt?
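Something along these lines should work (the repo pattern is a placeholder, assuming a recent clearml version):
from clearml import Task

# Return the IDs of all Tasks whose recorded repository matches the pattern;
# '_all_' applies the regex pattern across the listed task fields
task_ids = Task.query_tasks(
    task_filter={
        '_all_': dict(fields=['script.repository'], pattern='github.com/user/repo'),
    }
)
print(task_ids)  # list of matching Task IDs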
this issue on when trying to set up on our remote machines
You mean setting up the trains-server on a remote machine?
Let me check the API reference
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all
So not a straight query, but maybe the
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all_ex
section might do the trick.
SuccessfulKoala55 any chance you have an idea on what to pass there ?
DilapidatedDucks58 no don't say that, you are wonderful 😉
trains-agent --gpus 0 --queue my_queue -d
should create a worker named machine:gpu0
Then you can do trains-agent --gpus 1 --queue my_queue -d
which will create a second worker named machine:gpu1
Hi SuperiorDucks36
you have such a great and clear GUI
😊
I personally would love to do it with a CLI
Actually a lot of stuff is harder to get from the UI (like the current state of your local repository, etc.). But I think your point stands 🙂 We will start with the CLI, because it is faster to deploy/iterate on, then when you guys say it's a winner we will add a wizard in the UI.
What do you think?
Hi AbruptHedgehog21
can you send the two models' info pages (i.e. the original and the updated one) ?
do you see the two endpoints ?
BTW: --version would add a version to the model (i.e. create a new endpoint with version "endpoint/{version}")
This means that if something happens with the k8s node the pod runs on,
Actually if the pod crashed (the pod, not the Task) k8s should re-spin it, no?
I also experience that if a worker pod running a task is terminated, clearml does not fail/abort the task.
From the k8s perspective, if the task ended (failed/completed) it always returns with exit code 0, i.e. success, because the agent was able to spin the Task. We do not want Tasks with exceptions to litter the k8s with endless r...
SuperiorDucks36 from code ? or UI?
(You can always clone an experiment and change the entire thing, the question is how you will get the data to fill in the experiment, i.e. repo / arguments / configuration, etc.)
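For example, a minimal sketch of cloning and editing from code (project/task/parameter names are placeholders):
from clearml import Task

# Grab the template experiment, clone it, tweak a value, and enqueue the clone
template = Task.get_task(project_name='examples', task_name='my_experiment')
cloned = Task.clone(source_task=template, name='my_experiment (clone)')
cloned.set_parameter('Args/batch_size', 64)  # hypothetical hyper-parameter
Task.enqueue(cloned, queue_name='default')   # queue name is an assumption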
There is a discussion here, I would love to hear another angle.
https://github.com/allegroai/trains/issues/230
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana. I
it will not delete it from Grafana, it just means it's no longer being collected, make sense ?
Hi ReassuredTiger98
Are you running the agent in venv mode ?
Does this mean that I need to create multiple ssh keys? 1 key for each user?
I think so
Use .git-credentials; each line stores credentials for one host, e.g. https://username:token@github.com
This might also support multiple users/repos
How would one do this? Do I just share a link to the experiment, like
See "Share" in the right click menu on the experiment
Fixed in pip install clearml==1.8.1rc0
🙂
You can try calling task._update_repository()
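Something like this (a debugging sketch; note that _update_repository() is an internal method, so treat it as a workaround rather than a stable API):
from clearml import Task

task = Task.init(project_name='examples', task_name='repo detection test')  # placeholder names
# Force re-detection/update of the Task's repository information
task._update_repository()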
I'm still trying to figure out how to reproduce it...
you mean in the enterprise version?
Enterprise has the smarter GPU scheduler. This is an inherent problem of sharing resources, there is no perfect solution: you either have fairness, but then you get idle GPUs, or you have races, where you can get starvation
Hi MistakenDragonfly51
Hello everyone! First, thanks a lot to everyone that made ClearML possible,
❤
To your questions 🙂
Long story short: no, unless you really want to compile the dockers, and I can't see the real upside here.
Yes, add the following volume mount /opt/clearml.conf:/root/clearml.conf here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L154
and configure your host's /opt/clearml.conf with ...
ThickDove42 Windows conda python3.6 was exactly what I was using,
started the jupyter with:
"python -m jupyter notebook"
Then opened / created a new notebook, everything worked.
Tested on latest clearml 0.17.2
Maybe it's something with the path to the repo that breaks it? Because obviously the issue is that it is looking in the wrong folder.
Hi CloudySwallow27
how can I just "define" it on my local PC, but not actually run it.
You can use the clearml-task
CLI
https://clear.ml/docs/latest/docs/apps/clearml_task#how-does-clearml-task-work
Or you can add the following lines in your code, which will cause the execution to stop and continue on a remote machine (basically creating the Task and pushing it into an execution queue, or just aborting it):
task = Task.init(...)
task.execute_remotely()
https://clear.ml/do...
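A minimal sketch of that flow (the project/task/queue names are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')
# Stop the local process here and enqueue the Task for a remote agent
task.execute_remotely(queue_name='default', exit_process=True)
# everything below this line only runs on the remote machine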
ThickDove42 Windows also works 😞
Any specifics on the setup?
Since you are running in venv mode, setting the OS environment variable before the clearml-agent command (e.g. MY_ENV=value clearml-agent daemon --queue default) will basically make sure it propagates to the process itself.
ReassuredTiger98 make sense ?
AdventurousRabbit79 are you passing cache_executed_step=False
to the PipelineController ?
https://github.com/allegroai/clearml/blob/332ceab3eadef4997e897d171957975a247a6dc1/clearml/automation/controller.py#L129
Could you send a usage example ?
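To clarify what I mean, a minimal sketch (project/task names are placeholders; cache_executed_step is passed per step):
from clearml.automation import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0.0')
pipe.add_step(
    name='stage_train',
    base_task_project='examples',
    base_task_name='train task',
    cache_executed_step=False,  # force re-execution instead of reusing a cached run
)
pipe.start()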
my pipeline controller always updates to the latest git commit id
This will only happen if the Task the pipeline creates has no specific commit ID, and instead just uses the latest from the git repo. Is this the case ?
Right, so this "vault" design is built into the paid tiers of ClearML to achieve exactly that. Long story short, users can put their credentials/configs on the clearml-server and the agent (or the clients) will pull and merge them into the execution.
It's very cool and works really nice, but not part of the open source (or the SaaS tier).
What you could do is store these configurations on the Task itself (one way or another). Maybe for example have an empty definitions.py
file part of ...
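For example, a hedged sketch using connect_configuration (file/names are placeholders): when executed by an agent, the stored content is fetched back from the Task, so users can edit it in the UI before a run.
from clearml import Task

task = Task.init(project_name='examples', task_name='config example')
# Locally this uploads definitions.py to the Task; under an agent the returned
# path points to the content pulled back from the server
config_path = task.connect_configuration('definitions.py', name='definitions')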
- ...that file and the logs of the agent service always say the same thing as before:
Oh, in that case you need to fill in your credentials here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L137
Basically CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY will let the agent running inside the docker talk to the server itself. Just put your own credentials there as a start, it should solve the issue
unless the domain is different?
Imagine that you are working with both GitHub and Bitbucket, for example; if you are using git-ssh then git will know which of the domains to send the key to. Currently there is a single user/pass entry, so all domains will get the same credentials. But I think this is a rare use case.
Hi @<1715175986749771776:profile|FuzzySeaanemone21>
and then run "clearml-agent daemon --gpus 0 --queue gcp-l4" to start the worker.
I'm assuming the docker service cannot spin up a container with GPU access; usually this means you are missing the nvidia docker runtime component
Hi FantasticPig28
or does every individual user have to configure their own minio credentials?
You can configure the client's files_server entry in the clearml.conf (or use an OS environment variable):
files_server: "..."
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/docs/clearml.conf#L10
Notice to make sure you also provide credentials here:
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/docs/clearml.conf#L97
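A minimal sketch of the OS-environment alternative (the address is a placeholder; I believe CLEARML_FILES_HOST mirrors the files_server conf entry, and the s3 credentials still come from the aws section linked above):
import os

# Per-user override of the files server without editing clearml.conf
os.environ['CLEARML_FILES_HOST'] = 's3://my-minio-host:9000/bucket'  # placeholder

from clearml import Task
task = Task.init(project_name='examples', task_name='minio files example')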