Hi @<1541954607595393024:profile|BattyCrocodile47>
Can you help me make the case for ClearML pipelines/tasks vs Metaflow?
Based on my understanding
- Metaflow cannot have custom containers per step (at least I could not find where to push them)
- DAG-only execution, i.e. you cannot have logic-driven flows
- cannot connect git repositories to different components in the pipeline
- Visualization of results / artifacts is rather limited
- Only Kubernetes is supported as underlying prov...
Would this be best if it were executed in the Triton execution environment?
It seems the issue is unrelated to the Triton ...
Could I use the clearml-agent build command and the Triton serving engine task ID to create a docker container that I could then use interactively to run these tests?
Yep, that should do it
I would start simple, no need to get the docker itself; it seems like a clearml credentials issue?!
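If you do end up needing the container, a rough sketch of the build command (the task ID and target image name here are placeholders):
clearml-agent build --id <serving_task_id> --docker --target clearml-serving-test
That should build a docker image from the Task's environment, which you can then run interactively.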
Like what would be the exact query given an endpoint, for requests per sec.
You mean in Grafana ?
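If so, a rough PromQL sketch for requests per second would be something like (the metric and label names here are hypothetical, check what your clearml-serving instance actually exports to Prometheus):
sum(rate(requests_total{endpoint="my_endpoint"}[1m]))
i.e. rate() over the per-endpoint request counter, summed across serving instances.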
Sorry I missed the additional "." in the _update_requirements
Let me check ....
Hi @<1569858449813016576:profile|JumpyRaven4>
- The gunicorn logs do not show anything, including any error or trace of the 502; only siege reports the 502, as well as the ALB.
Is this an ALB or an ELB ?
What's the timeout it's configured with?
Do you have GPU instances as well? What's the clearml-serving-inference docker version ?
Regarding the first direction, this was just pushed
https://github.com/allegroai/clearml/commit/597a7ed05e2376ec48604465cf5ebd752cebae9c
Regarding the opposite direction:
That is a good question, I really like the idea of just adding another section named Datasets
SucculentBeetle7 should we do that automatically?
Hi @<1610083503607648256:profile|DiminutiveToad80>
Yes, it does. They are also cached by default (on the machine with the agent)
None
Hi UptightBeetle98
The hyperparameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (which they are now, aka pending), set up the environment for the jobs (venv or docker+venv), and execute the job with the specific arguments the optimizer chose.
Make sense ?
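For reference, a minimal sketch of such an optimizer script (the base task ID, queue name and parameter names below are placeholders, adjust them to your experiment):
from clearml.automation import (
    HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange)

optimizer = HyperParameterOptimizer(
    base_task_id='<template_task_id>',   # the experiment the optimizer will clone
    hyper_parameters=[
        UniformParameterRange('General/learning_rate', min_value=1e-4, max_value=1e-1),
        DiscreteParameterRange('General/batch_size', values=[16, 32, 64]),
    ],
    objective_metric_title='validation',
    objective_metric_series='accuracy',
    objective_metric_sign='max',
    max_number_of_concurrent_tasks=2,
    execution_queue='default',           # the queue your agents are listening on
)
optimizer.start()   # clones + enqueues the experiments for the agents to pull
optimizer.wait()
optimizer.stop()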
The RC you can see on the main readme (for some reason the Conda badge will show the RC and the PyPI badge won't)
https://github.com/allegroai/clearml/
If you take a look here, the returned objects are automatically serialized and stored on the files server or object storage, and also deserialized when passed to the next step.
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
You can of course do the same manually
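A minimal sketch of the decorator flow (the project/pipeline names and the CSV URL are just placeholders):
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=['data'], cache=True)
def load_data(source_url):
    import pandas as pd
    # the returned DataFrame is serialized and stored as an artifact automatically
    return pd.read_csv(source_url)

@PipelineDecorator.component(return_values=['n_rows'])
def count_rows(data):
    # "data" was deserialized back into a DataFrame before this step started
    return len(data)

@PipelineDecorator.pipeline(name='example pipeline', project='examples', version='0.1')
def run_pipeline(source_url):
    data = load_data(source_url)
    return count_rows(data)

if __name__ == '__main__':
    PipelineDecorator.run_locally()   # drop this line to run the steps on agents
    run_pipeline('https://example.com/some.csv')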
GiganticTurtle0
What do you mean by "reuse_last_task_id" ? Each component always generates a new Task (unless it is cached, in which case it will reuse the previously executed one)
What am I missing here?
RoundMosquito25 how is that possible ? could it be they are connected to a different server ?
Thank you DilapidatedDucks58 for the ping!
totally slipped my mind
How are you starting the agent?
JitteryCoyote63 I remember something with "!" in the name or maybe "/" in the name that might cause this behavior. May I suggest checking with clearml-server 1.3 ?
The other way will not work: if you start with "pip" you cannot fail ... (if you fail, it's at runtime, which is too late)
Hi JumpyPig73
import data from old experiments into the dashboard.
what do you mean by "old experiments" ?
Hi @<1607909176359522304:profile|UnevenCow76>
followed the below documentation to implement the clearml monitoring using prometheus and grafana
Did you try following this example? It includes both deploying a model and adding Grafana metrics:
None
Found it
GiganticTurtle0 you are 🧨 ! thank you for stumbling across this one as well.
Fix will be pushed later today
The easiest is export_task / update_task:
https://allegro.ai/docs/task.html#trains.task.Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.update_task
Check the structure returned by export_task, you'll find the entire configuration text there,
then, you can use that to update back the Task.
BTW:
Partial update is also supported...
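Roughly (a minimal sketch, assuming you already have the task ID):
from clearml import Task

task = Task.get_task(task_id='<task_id>')
exported = task.export_task()           # the full task structure as a dict
# ... inspect / modify e.g. the configuration section of "exported" here ...
task.update_task(task_data=exported)    # push the (partially) modified structure back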
ShortElephant92 yep, this is definitely an enterprise feature
But you can configure user/pass on the open source version, and even store the passwords hashed if you need.
Hi JitteryCoyote63
Yes I think you are correct, since torch is installed automatically as a requirement by pip, the agent is not aware of it, so it cannot download the correct one.
I think the easiest is just to add torch as an additional package:
# call before Task.init()
Task.add_requirements(package_name="torch", package_version="==1.7.1")
PungentLouse55 could you test with 0.15.2rc0 and see if there is any difference ?
TenseOstrich47 / PleasantGiraffe85
The next version (I think releasing today) will already contain scheduling, and the next one (probably an RC right after) will include triggering. That said, currently the UI wizard for both (i.e. creating the triggers) is only available in the community hosted service. Still, I think that creating them from code (triggers/schedule) actually makes a lot of sense; see the sketch below.
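A rough sketch of what the code-based scheduling could look like (the task ID, queues and the actual schedule are placeholders):
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# clone + enqueue the given task every day at 07:30 on the "default" queue
scheduler.add_task(
    schedule_task_id='<task_id>',
    queue='default',
    minute=30,
    hour=7,
    recurring=True,
)
scheduler.start_remotely(queue='services')   # or scheduler.start() to run it locally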
pipeline presented in a clear UI,
This is actually actively worked on, I think Anxious...
E.g. I might need to have different N-numbers for the local and remote (ClearML) storage.
Hmm yes, that makes sense
That'd be a great solution, thanks! I'll create a PR shortly
Thank you! 🤩
SlipperyDove40 Yes, there is TRAINS_CONFIG_FILE
https://allegro.ai/docs/faq/faq/#trains-configuration
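For example, to point the SDK at a non-default configuration file (the path here is just an example):
export TRAINS_CONFIG_FILE=/path/to/my_trains.conf
and that file will be used instead of the default ~/trains.conf.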
Done!
Thanks
fatal: unable to find a suitable socket path; use --socket
I think that's the root cause, we should probably also add https://github.com/allegroai/trains-agent/issues/16
PungentLouse55 you can find the metrics in the "original" (aka base template) experiment.
agent.cuda_driver_version = ...
agent.cuda_runtime_version = ...
Interesting idea! (I assume for reporting only, not configuration)
... The agent mentioned used output from nvcc (2) ...
The dependencies I shared are not how the agent works, but how Nvidia CUDA works
Regarding the cuda check with nvcc, I'm not saying this is a perfect solution, I just mentioned that this is how it is currently done.
I'm actually not sure if there is an easy way to get it from nvid...
well cudnn is actually missing from the base image...