@<1523707653782507520:profile|MelancholyElk85> what are you trying to change? Maybe there is a better way?
BTW: if you do step_base_task.export_task() you can take the parts you need from the dict and pass them to the task_overrides argument in add_step (you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
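For illustration, a minimal sketch, assuming an existing PipelineController named pipe and a Task object step_base_task (the specific override keys are just examples):
exported = step_base_task.export_task()
# pick only the parts you need, flattened with '.' for task_overrides
overrides = {
    'script.repository': exported['script']['repository'],
    'script.branch': exported['script']['branch'],
}
pipe.add_step(name='my_step', base_task_id=step_base_task.id, task_overrides=overrides)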
it seems like each task is set up to run on a single pod/node based on attributes like GPU memory, OS, number of cores, and worker
BoredHedgehog47 of course you can scale on multiple nodes.
The way to do that is to create a k8s YAML with replicas; each pod actually runs the exact same code with the exact same setup. Notice that inside the code itself the DL frameworks need to be able to communicate with one another and b...
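For example, a minimal sketch of the per-pod side, assuming PyTorch DDP and that the k8s manifest injects the standard MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE environment variables (that wiring is an assumption, not something ClearML sets up for you):
import torch.distributed as dist

# every replica runs this exact code; 'env://' reads rank/world-size from the pod environment
dist.init_process_group(backend='nccl', init_method='env://')
print(f'rank {dist.get_rank()} of {dist.get_world_size()}')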
Hi FierceHamster54
Dataset download is already multi-threaded.
But yes, get_local_copy() is thread/process safe.
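For example, a minimal sketch of calling it concurrently (the dataset project/name here are hypothetical):
from concurrent.futures import ThreadPoolExecutor
from clearml import Dataset

ds = Dataset.get(dataset_project='my_project', dataset_name='my_dataset')
# concurrent calls are safe, and all resolve to the same cached local copy
with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(lambda _: ds.get_local_copy(), range(4)))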
Not sure on the cause, but if you do:
import multiprocessing as mp
mp.set_start_method('fork', force=True)
there is no semaphore leakage.
And having a PDF is easier/better than sharing a link to the results page?
If you edit the requirements to include:
https://download.pytorch.org/whl/cpu/torch-1.4.0%2Bcpu-cp37-cp37m-linux_x86_64.whl
So this is an additional config file with enterprise?
Extension to the "clearml.conf" capabilities
Is this new config file deployable via helm charts?
Yes, you can also set it company/user wide using the clearml Vault feature (again enterprise, sorry 😞 )
Simply record the type of each argument when you store it, and keep it in the database, unbeknownst to the user. What do you say?
This is now supported, but then you still need to flatten the dict.
Maybe we can just support "empty_dict/new_value = 42" if the original was "empty_dict = {}"
WDYT?
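To make the idea concrete, a minimal sketch of that kind of flattening, using '/' as the separator (following the example above):
def flatten(d, prefix=''):
    # {'empty_dict': {'new_value': 42}} -> {'empty_dict/new_value': 42}
    flat = {}
    for k, v in d.items():
        key = f'{prefix}/{k}' if prefix else k
        if isinstance(v, dict) and v:
            flat.update(flatten(v, key))
        else:
            flat[key] = v
    return flat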
So the thing with IAM roles: they are designed to give AWS instances "automatic" permissions (based on the IAM role). They are not actually designed to generate a key/secret, as I think the lifetime is by default relatively short. Since the actual request to S3 comes from the client browser (i.e. outside of the AWS cluster), the IAM role cannot apply, and you have to provide the key/secret. The easiest way is to generate S3 keys regardless of the IAM roles, to be used with the clients (sp...
How are you starting the agent?
but this gives me an idea, I will try to check if the notebook is considered trusted; perhaps it isn't, and that causes the issue?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
Hmm yes this is exactly what should not happen 🙂
Let me check it
Seems like everything is in order. Can you curl to the API/web/files server?
JitteryCoyote63 are you suggesting it happens?
(obviously it should not 🙂)
... these nested components are not tagged with 'pipe: <pipeline_task_id>'. I assume this should not be like that, right?
Helper functions are not "components"; they are actually files that will be accessible when running the component itself.
am I missing something?
That is a bit odd, but SSH keys have to have specific chmod flags (e.g. 600) for them to work (security issues)
What was the error?
CleanWhale17 what is "Online-Training Support (for Dataset Shifts)"?
Thanks!
fyi: This section is not necessary if you have a clearml.conf file in ~/
Task.set_credentials(
    api_host=" ",
    web_host=" ",
    files_host=" ",
    key='********************',
    secret='***********************'
)
Let me check the code for a min
I mean to use a function decorated with PipelineDecorator.pipeline inside another pipeline decorated in the same way.
Ohh... so would it make sense to add "helper_functions" so that a function will be available in the step's context?
Or maybe we need a new way to support a "standalone" decorator?! Currently, to actually "launch" the function step, you have to call it from the "pipeline" main logic function, but, at least in theory, one could do without the Pipeline itself.....
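To make that concrete, a minimal sketch of how the "helper_functions" idea could look with the current decorator interface (the function names are hypothetical):
from clearml import PipelineDecorator

def normalize(values):
    # helper made available inside the step's execution context
    top = max(values) if values else 0
    return [v / top for v in values] if top else values

@PipelineDecorator.component(return_values=['result'], helper_functions=[normalize])
def preprocess(values):
    return normalize(values)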
The easiest way is to pass an entire trains.conf file.
Our server is deployed on a kube cluster. I'm not too clear on how Helm charts etc. work.
The only thing that I can think of is that something is not right with the load balancer on the server, so maybe some requests coming from an instance on the cluster are blocked ...
Hmm, saying that out loud, it actually could be?! Try adding the following line to the end of the clearml.conf on the machine running the agent:
api.http.default_method: "put"
Thank you WackyRabbit7, please feel free to remind me if it slips away during my night time (yes I do sleep, contrary to common belief :))
Your git execution needs this file, just like your machine does, to know where the server is and how to authenticate. You have to manually pass it to your git action.
If you think the explanation takes too much time, no worries! I do not want to waste your time on my confusion
LOL no worries 🙂
Basically the git & python analysis can take some time (I mean, it can take a minute on a large repository!)
And we wanted to make sure Task.init returns quickly (it already has to authenticate with the server that slows it down, and a few more things)
The easiest way is to have the code analysis run in the background since usually there is no interaction ...
Hi @<1523702786867335168:profile|AdventurousButterfly15>
I do not think they log more than that?!
(what happens if you use TB?)
The -m src.train is just the entry script for the execution; all the rest is taken care of by the Configuration section (whatever you pass after it will be ignored if you are using argparse, as it auto-connects with ClearML)
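For reference, a minimal sketch of that auto-connect behavior (the project/task names are placeholders):
import argparse
from clearml import Task

task = Task.init(project_name='examples', task_name='train')
parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.001)
# the parsed arguments are auto-logged into the task's Configuration section
args = parser.parse_args()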
Make sense?
sorry that I keep bothering you, I love ClearML and try to promote it whenever I can, but this thing is a real pain in the ass
No worries I totally feel you.
As a quick hack in the actual code of the Task itself, is it reasonable to have:
task = Task.init(....)
task.set_initial_iteration(0)