It also seems that PipelineDecorator.upload_artifact is not compatible with caching, sadly,
Both use the exact same mechanism for uploading artifacts (including caching for downloaded artifacts). As for caching pipeline components, that works at the component level (i.e. same code/task + same arguments = cache hit)
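For example, a minimal sketch (the component name and arguments here are just illustrative) of enabling the per-component cache:
` from clearml.automation.controller import PipelineDecorator

# cache=True: same component code + same arguments => cache hit,
# the previously stored outputs are reused instead of re-running the step
@PipelineDecorator.component(return_values=["resized_dataset_id"], cache=True)
def resize_images(dataset_id: str, size: int = 256):
    # illustrative body, replace with the actual resizing logic
    resized_dataset_id = f"resized-{dataset_id}-{size}"
    return resized_dataset_id `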
What exactly are you getting? How is it that PipelineDecorator.upload_artifact uploads to a different storage? Is that reproducible?
assuming you have hparams.my_param
my suggestion is:
` import hydra
from omegaconf import DictConfig
from clearml import Task

@hydra.main(config_path="solver/config", config_name="config")
def train(hparams: DictConfig):
    task = Task.init(hparams.task_name, hparams.tag)
    overrides = {'my_param': hparams.value}
    task.connect(overrides, name='overrides')
    # in remote execution this will print the value we put in "overrides/my_param"
    print(overrides['my_param'])
    # now we actually use overrides['my_param'] `
Make sense ?
BTW:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
Googling it points to a docker issue (which makes sense, considering the error):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?
Are you running inside a kubernetes cluster ?
a task of queue B if the next task is of type A it will have to wait,
It seems you imply there are two types of Tasks and they need to be executed one after the other ?
ClumsyElephant70
Could it be virtualenv package is not installed on the host machine ?
(From the log it seems you are running in venv mode, is that correct?)
One example is a node that resizes the images: this node receives a Dataset as input, iterates over each image, resizes it and outputs a new Dataset, which is used in the next node downstream in the Pipeline.
I agree, this sounds like a "function" rather than a job, so better suited for Kedro.
organization structure
and see for yourself (this pipeline has two nodes: train_model and predict)
Interesting! Let me dive into that and ...
I do have the SSH key placed at /root/.ssh/id_rsa on the machine,
@<1541954607595393024:profile|BattyCrocodile47> is the SSH key part of the containers? or are you saying it is on the EC2 instance ?
Hi @<1649221394904387584:profile|RattySparrow90>
Are the models I defined to be served e.g. via the CLI downloaded to the serving pod
Yes, this is done automatically and online (i.e. when you update using the CLI/API), based on the models/endpoints you set
So that they are physically lying there as a file I can see in the filesystem?
They are, and cached there
Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?
I...
Hmm I see what you mean. It is on the roadmap to add multiple models per Task (ETA the next version, 0.17; 0.16 is due in a week or so) so it is easier to see the connections in the UI. I'm assuming this will solve the problem?
Hi OutrageousGiraffe8
when I save model using tf.keras.save_model
This should create a new Model in the system (not an artifact); models have their own entity and UID.
Are you creating the Task with output_uri="gs://bucket/folder" ?
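For reference, a minimal sketch (the project/task names and the bucket path are illustrative):
` from clearml import Task

# output_uri is the destination where saved models (e.g. tf.keras.save_model output) get uploaded
task = Task.init(
    project_name="examples",
    task_name="keras training",
    output_uri="gs://bucket/folder",
) `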
RoundMosquito25 are you using clearml-agent daemon --stop, or are you killing them ?
Killing them basically means you lose them in the UI when they time out: the backend does not see them for 10 min, so it assumes they died. When you call clearml-agent daemon --stop they will unregister themselves and disappear immediately.
PompousParrot44 I think the website should address that:
https://allegro.ai/
But the TL;DR is: the enterprise version adds full Dataset Versioning on top, with end-to-end integration from code to DLOps (e.g. data sampling, database query capabilities, data visualization, multi-site support, permissions, etc.)
(Just a thought, maybe we just need to combine it with Kedro-Viz ?)
You can change the CWD folder: if you put . in the working dir it will be the root of the git repo, but you can use any subfolder; obviously you need to change the script path to match the folder, e.g. ./folder/script.py etc.
Basically just change the helm yaml: queue: my_second_queue_name_here
Actually this is the default for any multi-node training framework (torch DDP / OpenMPI etc.)
oh dear 😞 if that's the case I think you should open an Issue on pypa/pip , I'm not sure what we can do other than that ...
PompousBeetle71 Check the beginning of the log, it should print the configuration, including the access key (excluding the secret) see if it makes sense...
Yep 🙂
Basically:
` from time import sleep
from clearml import Task

task = Task.get_task(task_id='aaaa')
while task.status not in ('completed', 'stopped',):
    sleep(15)  # do something here, then poll again `
(Notice task.status / task.get_status() will refresh the Task status on every call)
Thanks SubstantialElk6 !
Happy new year 🎉 🍺 🍾 🎇
Wait I might be completely off.
Is this the line that "hangs" ?
task.execute_remotely(..., exit_process=True)
It uses only one CPU core, could I use multiprocessing somehow?
Hi EcstaticMouse10
Hmm, yes it should be multi core:
https://github.com/allegroai/clearml/blob/a9774c3842ea526d222044092172980ae505e24f/clearml/datasets/dataset.py#L1175
wdyt?
Sorry @<1524922424720625664:profile|TartLeopard58> 😞 we probably missed it
clearml-session is still being developed 🙂
Which issue are you referring to ?
I've been running my script from VSCode for the first time,
In the initial Task (the one created when running inside VSCode) do you have all the packages listed in the "Installed Packages" section ?
yes you are correct, OS environment: TRAINS_PROC_MASTER_ID=1:task_id_here
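For reference, a small sketch (assuming the value format shown above, "<master id>:<task id>") of reading it from a subprocess:
` import os

# TRAINS_PROC_MASTER_ID is set by the master process for its subprocesses
master_id = os.environ.get("TRAINS_PROC_MASTER_ID", "")
master_part, _, task_id = master_id.partition(":") `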