AgitatedDove14

49 Questions, 8094 Answers

Active since 10 January 2023

Last activity 10 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8094

0 Hi Everyone, I Wanted To Inquire If It'S Possible To Have Some Type Of Model Unloading. I Know There Was A Discussion Here About It, But After Reviewing It, I Didn'T Find An Answer. So, I Am Curious: Is It Possible To Explicitly Unload A Model (By Calling

Suppose that I have three models and these models can't be loaded simultaneously on GPU memory(

Oh!!!

For now, this is the behavior I observe: Suppose I have two models, A and B. ....

Correct

Yes this is a current limitation of the Triton backend BUT!
we are working on a new version that does Exactly what you mentioned (because it is such a common case where in some cases models are not being used that frequently)
The main caveat is the loading time, re-loading models from dist...

one year ago

0 Hello All, I Wanted To Get The Advice Of The People Here About Data Versioning And Tracking Using Clearml. Many Of The Dataset We Work With Are Generated By Sql Query. It’S Not Necessary To Generate Them Every Time But I’M Trying To Get Advice On How To

Hi @<1545216070686609408:profile|EnthusiasticCow4>

Many of the dataset we work with are generated by SQL query.

The main question in these scenarios is, are those DB stable.
By that I mean, generally speaking DB serve applications, and from time to time they undergo migration (i.e. change in schema, more/less data etc).
The most stable way is to create a script that runs the SQL query, and creates a clearml dateset from it (that script becomes part of the Dataset, to have full tracta...

one year ago

0 Hi People! I Think The Clearml

The image is

allegroai/clearml:1.0.2-108

Yep, that makes sense, seems like a backwards compatibility issue

one year ago

0 I'M Getting Some Weird Clearml Behavior. I'Ve Deployed It To An Ec2 Instance. When I Access

Hi @<1541954607595393024:profile|BattyCrocodile47>
see here: None
Try with app.clearml.mlops-club.org
and the rest of them

one year ago

0 Hi Everyone, I Have A Few Questions To Understand The Clearml-Serving A Little Better (And How Much Resources To Allocate For The Serving Pods): Are The Models I Defined To Be Served E.G. Via The Cli Downloaded To The Serving Pod? So That They Are Physica

Hi @<1649221394904387584:profile|RattySparrow90>

: Are the models I defined to be served e.g. via the CLI downloaded to the serving pod

Yes this is done automatically and online (i.e. when you update the using CLI/API) , based on the models/endpoints you set

So that they are physically lying there as a file I can see in the filesystem?

They are, and cached there

Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?

I...

11 months ago

0 Hello, Is It Possible To Run Trains Offline Where There'S No Http Connection Between The Node Running The Job And Where The Web Ui Runs? I See In Your Diagram The Connection Between Training Machine And Trains Server (Which Contains The Web Ui) Is Over Ht

I see.
You can get the offline folder programmatically then copy the folder content (it's the same as the zip, and you can also pass a folder instead of zip to the import function)
task.get_offline_mode_folder()You can also have a soft link of the offline folder (if you are working on a linux machine:
ln -s myoffline_folder ~/.trains/cache/offline

4 years ago

0 Hey All! Ive Gone Through The Doco And Not Found Anything At The Moment, But Does Clearml Have Model Versioning And Staging (Similar To Mlflow).

LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it kind of generic "serving" solution?
FYI:
Model artifact is, usually, a weights/model file. The idea that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) there is usually a specific piece of code tied to that model that can use it (a.k.a pyfunc)
A few ideas:
These days everyone is trying to build their models with generic interface, so that scik...

4 years ago

0 Hi I Saw This On The Clearml-Agent Docs But Other Than The Docker Image, I'M Not Sure How To Integrate This With Clearml Py And Clearml-Server. Please Advise.

SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...

3 years ago

0 Can Someone Help Me With Deploying This Example Model (From Triton Inference Server) Deployed In Clearml-Serving? Too Many Random Errors For Me To Figure It Out

Do we launch multiple gorups of these in different projects?

Actually Triton can serve multiple models and the endpoints/models are controlled from the clearml-serving.
The only issue is adding a load-balancer in front of multiple nodes to balance the requests between them. wdyt?

3 years ago

0 Hi, With Clearml-Agent 1.5.1, I Tried To Run An Experiment Within A Docker With Image Python3:8 And It Failed Executing The Task While Trying To Call Python3.9. I Am Not Sure Why It'S Using Python3.9, Since The Agent.Default_Python Is 3.8 And The Image Is

packages are updated, and I don't know which python version I get, + changing the python version of the OS is not really recommended

Wait I'm confused, this is inside a container, no?

and the python version running my code should not depend of the python version running the clearml-agent (especially for experiments running in containers)

Generally speaking you are correct, but some packages will not have the same version for all python versions

Specifically in this case I think...

2 years ago

0 Hi, I Have Another Problem

This is also set in the command line.
--cpu-only or maybe without any --gpus flag at all

4 years ago

0 Hello Everyone, I’M Newcomer For Clearml. I Have Question Related To

That’s the question i want to raise too,

No file size limit
Let me try to run it myself

3 years ago

0 Let’S Imagine I’M Building A Pipeline With Five Consecutive Steps, Where Some Of The Steps Are Non Ml/Dl Based. Using Clearml I Run A Lot Of Experiments To Find The Right Pipeline Configuration. After I Found The Right Algorithms And Parameters For My Pip

Hi ClumsyElephant70

s there a way to run all pipeline steps, not in isolation but consecutive in the same environment?

You mean as part of a real-time inference process ?

3 years ago

0 Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:

Yes, that means the nvidia drivers are present (as you mentioned the GPU seems to be detected).
Could you check you have libnvidia-ml.so.1 inside the container ?
For example in /usr/lib/nvidia-XYZ/

4 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Setting the credentials on agent machine means the users cannot use their own credentials since an k8s glue agent serves multiple users.

Correct, I think "vault" option is only available on the paid tier 😞

but how should we do this for the credentials?

I'm not sure how to pass them, wouldn't it make sense to give the agent an all accessing credentials ?

3 years ago

0 When I Do

is this a config file on your side or something I can change, if we had enterprise version?

Yes, this is one of the things you can configure

2 years ago

0 Hello, I'M Trying To Save A Keras Model As A Task Artifact, And Then Upload It From Another Task. Does Anyone Know The Syntax For That? What I'Ve Seen Is Not Quite Working.

ConfusedPig65 could you send the full log (console) of this execution?

3 years ago

0 Hey, I'M Running A Pipeline, And 1 Stage Passed - But The Next One Failed. I Fixed The Bug For The Second One - Is There Any Way To Retry The Pipeline From The Failure?

Is there an option to do this from a pipeline, from within the

add_step

method? Can you link a reference to cloning and editing a task programmatically?

Hmm, I think there is an open GitHub issue requesting a similar ability , let me check on the progress ...

nope, it works well for the pipeline when not I don't choose to continue_pipeline

Could you send the full log please?

3 years ago

0 Hi All, We’Re Interested In Using Trains For A New Ml Project. This Project Is An Early Proof Of Concept So We’D Like To Start With The Open Source Version. One Question We’Re Finding Difficult To Answer Is: What Tools Do People Successfully Combine With

EnchantingWorm39 you have great timing ;)

4 years ago

0 Is This An Expected Behaviour? Trains Version 0.16.4, Not Able To Upgrade Now To Latest Version But I Doubt This Was Changed

New version will contain much more advanced search (including all the task fields)

are there any more fields in this function with partial matching? for example project? tags?

Yes they can all be filtered (basically everything you see in the UI)
notice: tags are strings (you can provide list of tags), project is an ID of the project
(Use Task.get_project_id, I think)

4 years ago

0 Hi All, I'M A New User With Clearml-Agent. I Know It'S Supposed To Automatically Replicate The Environment Of A Task, Based On Installed Packages List. However, Installed Packages Of My Task Is Misses Many Of Installed Packages (Any Idea Why?) How Do I Co

Hi @<1523702969063706624:profile|PoisedShark13>

However, INSTALLED PACKAGES of my task is misses many of installed packages (any idea why?)

It automatically detects the directly imported packages, literally analyzing your code base and looking for imports
The derivative packages (i.e. the one that any of the "main" packages need, will be listed after the first time the agent installs everything)
If something specific is missing, you can manually add it with:

Task.add_requiremen...

one year ago

0 Hi Everyone! I Have A Short Question That You Can For Sure Help Me With. Is There A Way To Avoid Each Task To Create A New Environment? I'D Like To Specify Which Env To Use. I Tried With

@<1523703080200179712:profile|NastySeahorse61> / @<1523702868694011904:profile|AbruptCow41>

Is there a way to avoid each task to create a new environment?

You can just define CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 it will just use whatever you have there (notice it will totally ignore requirements.txt and "installed packages" on the Task)

BTW I would recommend turning on the venv caching, this is per docker/python/packages caching so the next time you are using th exact requi...

2 years ago

0 Assuming I Have A

A few implementation / design details:
When you run code with Trains (and call init) it will record your environment (python packages, git code, uncommitted changes etc) Everything is stored on the Task object in the trains-server, when you clone a task you literally create a copy of the Task object (i.e. a second experiment). on the cloned experiment, you can edit everything (parameters, git, base docker image etc) When you enqueue a Task you add its ID to the execution queue list a trains-a...

4 years ago

0 Hi Anyone

That sounds like an internal tritonserver error.
https://forums.developer.nvidia.com/t/provided-ptx-was-compiled-with-an-unsupported-toolchain-error-using-cub/168292

3 years ago

0 Hello, I'M Trying To Save A Keras Model As A Task Artifact, And Then Upload It From Another Task. Does Anyone Know The Syntax For That? What I'Ve Seen Is Not Quite Working.

Okay ConfusedPig65 I found the problem. For some reason the latest TF.keras.load_model . save_model is not tracked.
I'll make sure we push a fix later today

3 years ago

0 Hello! Since Today I Get

Thanks @<1523701868901961728:profile|ReassuredTiger98>
From the log this is what conda is installing, it should have worked

/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
-...

3 years ago

0 Hey! Is There A Way To Ignore The Spammy Output Of Progressbars Like

Long story short, work in progress.
BTW: are you referring to manual execution or trains-agent ?

4 years ago

0 Hello Everyone! The Question About Dataset.Squash(). The Squash Operation Copies All The Data And Is No Longer Linked To Previous Commits? I Thought This Operation Is Like Git Squash But It Seems To Me That Clearml Dataset.Squash() Create Just A Copy Of S

The Squash operation copies all the data and is no longer linked to previous commits?

Yes, basically the idea is if you have data version that relies on many parents that needs to be merged, the squash will create a merged copy and push it all as a single version, and then yes the parent versions are no longer needed

I thought this operation is like git squash but it seems to me

yeah... we did not want to actually delete the parents because unlike git, the operation is done ...

11 months ago

0 Well, This Is My Question... I'M Trying To Adapt Clearml To Aws Using Basically Ecs Fargate + Documentdb + Aws Es + Elasticache + Efs. I Could Start The Fileserver Component, But Now I'M Trying To Start The Api Server And Is Not Working, Before Stop The T

BTW: Full RestAPI reference here
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html

3 years ago

0 I Have Set

Hi Guys, just curious here, what's was the final issue?
Also out of curiosity, what does that mean? "1.12.2 because some bug that make fastai lag 2x" ?

9 months ago

Show more results