Task.debug_simulate_remote_task
simulates the Task being executed by the agent (basically the same behaviour, only local). The argument it gets is the Task ID (string).
The way to see how it works is to run the code once (without the debug_simulate call), get the Task ID we created, then rerun with debug_simulate_remote_task passing the previous Task ID.
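A minimal sketch of that flow, assuming a recent clearml SDK (the ID, project and task names are placeholders):
```python
from clearml import Task

# Use the Task ID from the first (regular) run of your script;
# 'aabb1122ccdd' is just a placeholder.
Task.debug_simulate_remote_task(task_id='aabb1122ccdd')

# From here on, Task.init() behaves as if the agent were executing the Task,
# pulling parameters/configuration from the server instead of from the code.
task = Task.init(project_name='examples', task_name='remote debug')
```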
Make sense ?
Is trains-agent using docker-mode or virtual-env ?
Please attach the log 🙂
Hi MelancholyElk85
Can I manually delete .zip files with datasets in .clearml/cache/storage_manager/datasets directory?
Yes, you can. I "think" the .zip is stored for easier access, but you can delete it; as long as the "extracted" folder exists, it should be fine.
If I try to connect a dictionary of type dict[str, list] with task.connect, when retrieving this dictionary with...
Wait, this should work out of the box, do you have any specific example?
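For reference, a quick sketch of what should work out of the box (names and values are placeholders):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='connect dict')

# a dict[str, list], as in the question above
params = {'layers': [128, 64, 32], 'tags': ['baseline', 'v2']}
task.connect(params)

# locally this is a pass-through; when executed by an agent the values
# are overridden with whatever was edited in the UI
print(params)
```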
Still feels super hacky tho, think it would be nice to have a simpler way or at least some nice documentation
YES you are absolutely correct, we should add it to the Task interface.
Any chance you add a GitHub issue so we do not forget ?
Hi @<1657918724084076544:profile|EnergeticCow77>
Can I launch training with HuggingFace's accelerate package using multi-gpu?
Yes,
It detects torch distributed but I guess I need to setup main task?
It should 🤞
Under the Execution tab, in the script path, you should see something like -m torch.distributed.launch ...
Hi @<1730033904972206080:profile|FantasticSeaurchin8>
Is this only related to this
https://github.com/coqui-ai/Trainer/issues/7
Or is it a clearml sdk issue?
However, SNPE performs quantization with precompiled CLI binary instead of python library (which also needs to be installed). What would be the pipeline in this case?
I would imagine a container with preinstalled SNPE compiler / quantizer, and a python script triggering the process ?
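Something along these lines, as a sketch (the binary name and flags are placeholders for whatever SNPE CLI is preinstalled in the container):
```python
import subprocess
from clearml import Task

task = Task.init(project_name='examples', task_name='snpe quantize')

# hypothetical CLI invocation; replace with the real SNPE quantizer command
proc = subprocess.run(
    ['snpe-dlc-quantize',
     '--input_dlc', 'model.dlc',
     '--output_dlc', 'model_quantized.dlc'],
    capture_output=True, text=True, check=True,
)
print(proc.stdout)  # stdout ends up in the Task console log

# register the quantized model so downstream steps can fetch it
task.upload_artifact('quantized_model', artifact_object='model_quantized.dlc')
```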
one more question: in case of triggering the quantization process, will it be considered a separate task?
I think this makes sense, since you probably want a container with the SNPE environment, m...
Please let me know what you find 🤞
Hi @<1610083503607648256:profile|DiminutiveToad80>
<h1>Request Entity Too Large</h1>
What's the size of the file? how are you running your clearml-server?
Hmm, can you test with the latest RC?
pip install clearml==0.17.6rc1
Also, how would one ensure immutability ?
I guess this is the big question, assuming we "know" a file was changed, this will invalidate all versions using it, this is exactly why the current implementation stores an immutable copy. Or are you suggesting a smarter "sync" function ?
Hi ElegantCoyote26
If there is, it will have to be using the docker-mode, but I do not think this is actually possible because this is not a feature of docker. It is possible to do on k8s, but that's a different level of integration 🙂
EDIT:
FYI we do support k8s integration
Regarding the missing packages, you might want to test with:
force_analyze_entire_repo: false
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L162
Or if you have a full venv you like to store instead:
https://github.com/allegroai/trains/blob/c3fd3ed7c681e92e2fb2c3f6fd3493854803d781/docs/trains.conf#L169
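Putting the two options together, a trains.conf sketch (I'm assuming both keys live under sdk.development, per the linked lines):
```
sdk {
    development {
        # analyze only the packages actually imported, not the entire repo
        force_analyze_entire_repo: false
        # or: store the full venv (pip freeze) instead of analyzing imports
        detect_with_pip_freeze: true
    }
}
```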
BTW:
What is the missed package?
Hi MortifiedCrow63 , thank you for pinging! (seriously greatly appreciated!)
See here:
https://github.com/googleapis/python-storage/releases/tag/v1.36.0
https://github.com/googleapis/python-storage/pull/374
Can you test with the latest release, see if the issue was fixed?
https://github.com/googleapis/python-storage/releases/tag/v1.41.0
Hi DeliciousBluewhale87
My theory is that the clearml-agent is configured correctly (which means you see it in the clearml-server). The issue (I think) is that the Task itself (running inside the docker) is missing the configuration. The way the agent passes the configuration into the docker is by mapping a temporary configuration file into the docker itself. If the agent is running bare-metal, this is quite straight forward. If the agent is running on k8s (or basically inside a docker) th...
SmarmyDolphin68 okay, what's happening is the process exits before the actual data is sent (report_matplotlib_figure is an async call, and data is sent in the background).
Basically you should just wait for all the events to be flushed:
task.flush(wait_for_uploads=True)
That said, quickly testing it it seems it does not wait properly (again I think this is due to the fact we do not have a main Task here, I'll continue debugging)
In the meantime you can just do:
sleep(3.0)
And it wil...
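Putting it together, a sketch of the workaround (assuming a Task was initialized; names are placeholders):
```python
import time
from clearml import Task

task = Task.init(project_name='examples', task_name='report figures')
# ... report_matplotlib_figure(...) calls happen here ...

# block until all background events and uploads are sent
task.flush(wait_for_uploads=True)
# temporary extra wait, while flush does not fully wait on its own
time.sleep(3.0)
```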
In the Task log itself it will list the versions of all the packages; basically I wonder whether maybe it is using an older clearml version, and this is why I cannot reproduce it...
Thanks PompousBaldeagle18 !
Which software did you use to create the graphics?
Our designer, should I send your compliments 😉 ?
You should add which tech is being replaced by each product.
Good point! we are also missing a few products from the website, they will be there soon, hence the "soft launch"
Hi HarebrainedBear62
What's the type of data ?
This is odd... can you post the entire trigger code ?
also what's the clearml version?
Just to clarify, where do I run the second command?
Anywhere, just open a python console and import the offline task:
from trains import Task
Task.import_offline_session('./my_task_aaa.zip')
Related, how do I specify in my code the cache_dir where the zip is saved?
This is the Trains cache folder, you can set it in the trains.conf file:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/docs/trains.conf#L24
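For reference, a trains.conf sketch (assuming the cache key from the linked line):
```
sdk {
    storage {
        cache {
            # folder where downloaded artifacts (the zip) are stored
            default_base_dir: "~/.trains/cache"
        }
    }
}
```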
I think so. When you are saying "clearml (bash script..." you basically mean "put my code + packages together and run it", correct?
Hi EnchantingWorm39
Great question!
Regarding the data management, I know the enterprise edition has full support for unstructured data, and we plan to soon have a solution for structured data as part of the open source (soon = hopefully within a month).
Regarding model serving, I know you can integrate with TFServing or Seldon with very little effort (usually the challenge is creating triggers etc., but in most cases this is custom code anyhow 🙂 )
I do not have experience with Cortex/B...
Hi, Is there a way to stop a clearml-agent from within an experiment?
It is possible but only in the paid tier (it needs backend support for that) 😞
My use case is: in a spot instance marked for termination after 2 mins by AWS
Basically what you are saying is you want the instance to spin down after the job is completed, correct?