AgitatedDove14

49 Questions, 8126 Answers

Active since 10 January 2023

Last activity one year ago

Reputation

Badges 1

25 × Eureka!

Answers 8126

0 Folks, Could You Please Clarify/Help? I Correct Understand, If --Docker Is Enable That Will Means Every New Experiments Will Be Executed Into Dedicated Agent Worker Containers? Also I See For

Hi UnevenOstrich23

if --docker is enable that will means every new experiments will be executed into dedicated agent worker containers?

Correct

I think the missing part is how to specify the docker for the experiment?
If this is the case, in the web UI, clone your experiment (which will create a draft copy, that you can edit), then in the Execution tab, scroll down to the "base docker image" and specify the docker image to use.
Notice that you can also add flags after the docker im...

4 years ago

0 Is It Possible To Add A Callback For A Pipeline From A Step?

"General" is the parameter section name (like Args)

4 years ago

0 Hi, Anyone Seen This Issue?

what's the docker version?

4 years ago

0 Hi There, I Have A Batch Prediction Task That Load A Model Published On Clearml.

Hi IrritableGiraffe81
Can you share a code snippet ?
Generally I would try
task = Task.init(..., auto_connect_frameworks={"pytorch': False, 'tensorflow': False)

3 years ago

0 Playing Around With Hpo For First Time. I Am Giving This As Hyperparameter:

Ok, just my ignorance then?

LOL, no it is just that with a single discrete parameter the strategy makes less sense 🙂

4 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

I still see things being installed when the experiment starts. Why does that happen?

This only means no new venv is created, it basically means install in "default" python env (usually whatever is preset inside the docker)
Make sense ?
Why would you skip the entire python env setup ? Did you turn on venvs cache ? (basically caching the entire venv, even if running inside a container)

3 years ago

0 Hi! Is There A Simple Way To Visualize Tensors In Clearml? Something Like Tensorboard'S Tsne Or Pca...

FrustratingWalrus87 Unfortunately TB's TSNE is not automatically captured by ClearML (Scalars, histograms etc. are)
That said, matplotlib will be automatically captured do you can run your own PCA/tSNE and use matplotlib to visualize (ClearML will capture it).
The same applies for plotly.
What do you think?

4 years ago

0 Is There A Reason

Now that we have the free tier (a.k.a community server) we might change the default behavior.
The idea is always to allow an easy way to on-board and test the system.
ReassuredTiger98
BTW: what's the scenario where your machine reverted to the default configuration (i.e. no configuration file) ?

4 years ago

0 Heya, Is There Any Plan For Clearml To Leverage The New

Hi FierceHamster54
This is already supported, unfortunately the open-source version only supports static allocation (i.e you can spin multiple agents and connect each one to specific number of GPUs), the dynamic option (where you have single agent allocating jobs to multiple GPUs / Slices is only part of the enterprise edition
(there is the hidden assumption there that if you spent so much on a DGX you are probably not a small team 🙂 )

3 years ago

0 I Am Getting This Specific Message When Trying To Run Hyper Parameters Optimization (Running Remotely My Task). Does It Affect My Flow? Do I Have Something To Worry About?

Although I didn't understand why you mentioned

torch

in my case?

Just a guess 🙂 other frameworks do multi-process as well,

I would guess it relates to parallelization of Tasks execution of the

HyperParameterOptimizer

class?

Yes that might be it, it's basically by product of using python "Process" class for multiprocessing. we are working on a fix, not a trivial unfortunately

3 years ago

0 Hi All! I Noticed When A Pipeline Fails, All Its Components Continue Running. Wouldn'T It Make More Sense For The Pipeline To Send An Abort Signal To All Tasks That Depend On The Pipeline? I'M Using Clearml V1.1.3Rc0 And Clearml-Agent 1.1.0

Okay so my thinking is, on the pipelinecontroller / decorator we will have:
abort_all_running_steps_on_failure=False (if True, on step failing it will abort all running steps and leave)
Then per step / component decorator we will have
continue_pipeline_on_failure=False (if True, on step failing, the rest of the pipeline dag will continue)
GiganticTurtle0 wdyt?

4 years ago

0 Hello! It'S My Second Time Trying Clearml - Hoping This Time I Will Succeed

So sorry for the delay here! totally lost track of this thread

Should I be deploying the entire docker file for every model, with the updated requirements?

It's for every "environment" i.e. if models need the same set of python packages , you canshare

2 months ago

0 Hi, I Would Like To Bring Awareness

if this is the case pytorch really messed things up, this means they removed packages
Let me check something

2 years ago

0 Maybe This Is More A Git Question Than A Clearml Question, But How Do I Get The Clearml_Agent_Git_User And Clearml_Agent_Git_Pass For Step 11 In

Make sense 🙂
Just make sure you configure the git user/pass in the docker-compose so the agent has your credentials for the repo clone.

4 years ago

0 Hi, Does Anyone Have Some Issues With Cloning Git Repos Within Alegro? I Always Got Some Error Massage: Fatal: Unable To Access '

Okay, this seems to be the problem

5 years ago

0 Hi All, Playing Around With Hp Optimisation, And I Notice In The Hyperparameteroptimizer Class Itself, The

Hmmm:

WOOT WOOT we broke the record! Objective reached 17.071016994817196
WOOT WOOT we broke the record! Objective reached 17.14302934610711

These two seems strange, let me look into it

4 years ago

0 Clearml Server Deployment Uses Node Storage. If More Than One Node Is Labeled As App=Clearml, And You Redeploy Or Update Later, Then Clearml Server May Not Locate All Your Data.

Hi TrickySheep9
You should probably check the new https://github.com/allegroai/clearml-server-helm-cloud-ready helm chart 😉
https://github.com/allegroai/clearml-server-helm-cloud-ready

4 years ago

0 Hello, There'S A Particular Metric (Perplexity) I'D Like To Track, But Clearml Didn'T Seem To Catch It. Specifically, This "Evaluation" Section Of Run_Mlm.Py In The Transformers Repo:

Clearml automatically gets these reported metrics from TB, since you mentioned see the scalars , I assume huggingface reports to TB. Could you verify? Is there a quick code sample to reproduce?

4 years ago

0 What Happens To File That Are Downloaded To A Remote_Execution Via Storagemanager? Are They Removed At The End Of The Run, Or Does It Continuously Increases Disk Space?

regrading the artifact, yes that make sense, I guess this is why there is "input" type for an artifact, the actual use case was never found (I guess until now?! what are you point there?)
Regrading the configuration

It's very useful for us to be able to see the contents of the configuration and understand

Wouldn't that just do exactly what you are looking for:
` local_config_file_that_i_can_always_open = task.connect_configuration("important", "/path/to/config/I/only/have/on/my/machi...

3 years ago

0 So I'M In A Colab Notebook, And After Running My Trainer(), How Do I Upload My Test Metrics To Clearml? Clearml Caught These Metrics And Uploaded Them:

(second cell)

4 years ago

0 Has Anyone Successfully Deployed Clearml On A Kube Cluster Utilizing Istio? I Don’T See Any Mention Of Istio In The Docs.

Hmm I think the easiest is using the helm chart:
https://github.com/allegroai/clearml-server-helm-cloud-ready
I know there is work on a teraform template, not sure about instio.
Is helm okay for you ?

4 years ago

0 Is There A Way To Report A Simple Series With X And Y Coords, X And Y Being Two Lists Of Same Length?

🙂

5 years ago

0 Question About Pipelines - So The Default For Pipeline Tasks That Are Executed Remotely Is To Execute On The

Hmm, this is a good question, I "think" the easiest is to mount the .ssh folder form the host to the container itself. Then also mount clearml.conf into the container with force_git_ssh_protocol: true see here
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25

btw: ssh credentials even though sound more secure are usually less (since they easily contain too broad credentials and other access rights), just my 2 cents 🙂 I ...

3 years ago

0 Hi All, I Am Having Trouble Using The

Hi StraightDog31

I am having trouble using the

StorageManager

to upload files to GCP bucket

Are you using the storagemanager directly ? or are you using task.upload_artifact ?
Did you provide the GS credentials in the clearml.conf file, see example here:
https://github.com/allegroai/clearml/blob/c9121debc2998ec6245fe858781eae11c62abd84/docs/clearml.conf#L110

4 years ago

0 Hey All. I'M Seeing A Strange Error When Trying To Run Hyperparameter Optimisation By Cloning A Base Training Task

Verified, you are correct "." in label enumeration will break the clone .
I'll make sure this bug is passed to backend guys to fix. Thanks TenseOstrich47 !
meanwhile maybe "_" instead ? 😁

4 years ago

0 How Can I Upload A Model Manually If I’M Training Using Catboost Framework, Which Is Not Natively Supported By

BTW: I think we had a better example, I'll try to look for one

4 years ago

0 Hi Team, How To Configure Gerrit Details In Clearml So That Tasks Or Pipeline Will Be Executed Depends On Gerrit?

Hi @<1542316991337992192:profile|AverageMoth57>
Not sure I follow how the integration what you have in mind regarding Gerrit integration None
Sounds interesting ...
wdyt?

2 years ago

0 Hello, My Name Is Gabriel, I'M Using Clearml For Our Machine Learning Experiments, Which Is An Amazing Tool To Manage This Type Of Stuff So Thank You Guys For Creating This. But The Last Time I Tried To Use It Some Unexpected Error Came Up For Which I Can

https://github.com/pypa/pip/issues/9313

4 years ago

0 Question About The Configuration Format - I'D Like To Parse It Within My Python Code So I'Ll Be Able To Access Things Like

WackyRabbit7 in that case:
from trains.utilities.pyhocon import ConfigFactory, HOCONConverter from trains.config import config_obj new_conf_text = HOCONConverter.to_hocon(config=ConfigFactory.from_dict(config_obj.as_plain_ordered_dict()), compact=False, level=0, indent=2) print(new_conf_text)

5 years ago

0 Another Question, Is It Possible To Run A Single Experiment Which Is Composed Of Multiple Steps Executed As Sequential Sub-Processes Where The Current Task Is Fetched As

Hmm, let me see if you can somehow "signal" to the subprocess that it should not use the main process Task. (btw: are you forking or spawning a subprocess?)

4 years ago

Show more results