Hmm, are you running from inside the Kaggle Jupyter notebook?
ReassuredTiger98 no, but I might be missing something.
How do you mean project-specific?
Ephemeral Dataset, I like that! Is this like splitting a dataset, for example, then training/testing, and deleting it when done? Making sure the entire pipeline is reproducible, but without storing the data long term?
RoughTiger69
Apparently, … doesn't populate that dict with any keys that don't already exist in it.
Are you saying new entries are not added to the dict even if they are on the Task (i.e. only entries that already exist in the dict are populated)?
But you already have all the entries defined here:
https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L22
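For context, a minimal sketch of the connect() behavior being described (the project/task names and keys here are illustrative):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect-dict")

# only keys already present in this dict are populated from the Task's
# stored configuration; keys that exist only on the Task are not added
hyper_params = {"cloud_credentials_key": "", "cloud_credentials_secret": ""}
hyper_params = task.connect(hyper_params)
```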
Since all this is ha...
FlutteringWorm14 an RC is out (1.7.3rc1) with the ability to configure it from clearml.conf
you can now set sdk.development.worker.report_event_flush_threshold in clearml.conf
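Something like this (the threshold value here is just illustrative):

```
sdk {
    development {
        worker {
            report_event_flush_threshold: 100
        }
    }
}
```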
@<1541954607595393024:profile|BattyCrocodile47> not restarting the docker, restarting the Docker service (on Mac it's an app, I think there is an option on the Docker app to do that)
Ohh, if this is the case and this is a constant stream of inference results, then yes, you should push it to some stream-supported DB.
Simple SQL tables would work, but for actual scale I would push into a Kafka stream, then pull it (serially) somewhere else and push into a DB, along the lines of the sketch below.
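A rough sketch of the producer side (the broker address and topic name are placeholders; uses the kafka-python package):

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def report_inference(result: dict) -> None:
    # push each inference result onto the stream; a separate consumer
    # pulls these serially and writes them into the DB
    producer.send("inference-results", value=result)
```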
JitteryCoyote63 look for the latest RC, it should have the fix (output_uri=False): 1.7.3rc1
... training script was set to upload every epoch. Seems like this resulted in a torrent of metrics being uploaded.
oh that makes sense, so basically you were bombarding the server with requests, ending up with a kind of denial of service
This one should work:
```python
path = task.connect_configuration(path, name=name)
if task.running_locally():
    my_params = read_from_path(path)
    my_params = change_params(my_params)  # change some stuff
    # store back the change; my_params is assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params)
```
The experiment finished completely again this time
With the RC version or the latest?
Hi @<1523711619815706624:profile|StrangePelican34>
if I am trying to deploy 100 models on a GPU that can handle 5 concurrently,
The main limitation is Triton's ability to dynamically load/unload models. We know Nvidia is adding this capability, but I think it is still not out; once they support it, it should be transparent
try Hydra/trainer.params.batch_size
Hydra separates nesting with ".", so that might be it.
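Something along these lines should work (the task ID is a placeholder, and I'm assuming the parameter shows up under the Hydra section):

```python
from clearml import Task

template = Task.get_task(task_id="<template_task_id>")  # placeholder ID
cloned = Task.clone(source_task=template)

# Hydra separates nesting with ".", so the nested batch_size is addressed as:
cloned.set_parameter("Hydra/trainer.params.batch_size", 64)
```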
Is the web UI working properly?
What ports are you using?
Hi BeefyHippopotamus73
. I checked the template task and the list of “Installed Packages” indeed does not have one of my required packages in the list.
Basically the "installed packages" is auto populated based on the directly imported packages n your code base.
Could it be you do not directly import snowflake-connector-python, and it is a derivative package (i.e. required by a different package)?
BTW: when you clone your Task in the UI you can edit and add the missing packages,...
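You can also force the package in from code; a sketch (the project/task names are placeholders, and add_requirements must be called before Task.init):

```python
from clearml import Task

# explicitly add the indirectly-required package so it appears in "Installed Packages"
Task.add_requirements("snowflake-connector-python")
task = Task.init(project_name="examples", task_name="snowflake-job")
```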
Should work out of the box, maybe the only thing to notice is that you will get a Task for every local_rank 0 process
Does that make sense?
In theory it should not, in practice you could run out of space while running the experiment itself...
You can always clean up everything from time to time (maybe worth a flag?)
JitteryCoyote63 no, I think this is all controlled from the Python side.
Let me check something
Hi @<1628565287957696512:profile|AloofBat92>
Yeah, the name is confusing, we should probably change that. The idea is it's a low-code / high-code way to train your own LLM and deploy it. Not really a 1:1 ChatGPT comparison; more like GenAI for enterprises. Make sense?
VexedCat68 actually a few users already suggested we auto log the dataset ID used as an additional configuration section, wdyt?
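In the meantime, a rough sketch of logging it manually (the dataset name and section name are illustrative):

```python
from clearml import Dataset, Task

ds = Dataset.get(dataset_name="my_dataset")  # placeholder dataset
task = Task.current_task()

# store the dataset ID used, as its own configuration section
task.connect({"dataset_id": ds.id}, name="Datasets")
```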
Hi AgitatedTurtle16
You can find documentation here:
https://github.com/allegroai/clearml-session
Basically it uses the clearml-agents to launch a session on one of the machines in the cluster.
In the remote session itself it installs JupyterLab + vscode-server, then it connects to the remote session (running on the agent's machine) automatically over SSH and creates a tunnel to these services.
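Basic usage looks something like this (the queue name and docker image are placeholders):

```
clearml-session --queue default --docker nvidia/cuda:11.8.0-runtime-ubuntu22.04
```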
Hi RipeGoose2
Yes, the slider feature is definitely on the to-do list (a lot of users asked for it).
Unfortunately, other than actually PR-ing to the UI repo, there is no easy way to add customization (if you have an idea on how we could have an easy interface, that would be great).
I'll check what's the status with the slider; maybe we will be lucky enough to see it in the next update 🙂
Hmm, in the credentials popup there should be a "secure connect" checkbox, it tells it to use https instead of http. Can you verify?
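If it helps, with that box checked the credentials snippet you paste into clearml.conf should look roughly like this (the hosts are placeholders):

```
api {
    web_server: https://app.clearml.example.com
    api_server: https://api.clearml.example.com
    files_server: https://files.clearml.example.com
    credentials {
        access_key: "..."
        secret_key: "..."
    }
}
```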
Hi @<1671689437261598720:profile|FranticWhale40>
Are you positive the Triton container finished syncing?
Could you provide the docker logs (both the serving and the Triton containers)?
What is the clearml-serving version you are using?
Could you add a print in the "preprocess" function, just to validate you are getting to the correct model version?
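For example, a minimal sketch of the debug print inside the serving Preprocess class (based on the standard clearml-serving preprocess signature):

```python
from typing import Any


class Preprocess(object):
    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # debug: confirm this endpoint / model version is actually being hit
        print("preprocess called with request body:", body)
        return body
```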
although ideally i'd like to tell it exactly where to unzip it.
Ohh, you can use .get_mutable_local_copy()
It will unzip it to a specific folder
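A sketch using the Dataset API (the dataset ID and target folder are placeholders):

```python
from clearml import Dataset

ds = Dataset.get(dataset_id="<dataset_id>")  # placeholder ID
# copies/unzips the dataset content into the exact folder you choose
local_path = ds.get_mutable_local_copy(target_folder="/path/to/unzip")
```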
Hi ZippyAlligator65
You can configure it in the clearml.conf: see here:
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/clearml_agent/backend_api/config/default/agent.conf#L202
Hi @<1658281093108862976:profile|EncouragingPenguin15>
Should work. I'm assuming multiple nodes are running agents? Or are you saying Ray spins up the jobs and ClearML logs them?