AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 "5451Af93E0Bf68A4Ab09F654B222Ccae": { "1B790A3Da2E8D6Cd939Cf271694Fe81B": { "Metric": ":Monitor:Gpu", "Variant": "Gpu_0_Utilization", "Value": 0.0, "Min_Value": 0.0,

Can I get GPU usage over a time frame via the API as well?

task.get_reported_scalars()
But this will get you all the scalars; I think the next version of the server supports asking for a specific one as well.
How are you implementing the alert monitoring?
Is it a stateless process starting every X minutes, or a stateful process that keeps running and monitoring?
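
A minimal sketch of picking one series out of the result of task.get_reported_scalars(), assuming the nested title → series → {"x", "y"} layout described in the SDK reference. The helper name pick_series and the sample values are hypothetical; the metric and variant names come from the thread title and are matched case-insensitively because their exact casing is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Assumed shape of the get_reported_scalars() result: title -> series -> axis -> values
Scalars = Dict[str, Dict[str, Dict[str, List[float]]]]


def pick_series(scalars: Scalars, title: str, series: str) -> Optional[Tuple[List[float], List[float]]]:
    """Return (x, y) for one title/series pair, or None if it is absent."""
    for found_title, variants in scalars.items():
        if found_title.lower() != title.lower():
            continue
        for found_series, points in variants.items():
            if found_series.lower() == series.lower():
                return list(points.get("x", [])), list(points.get("y", []))
    return None


# Hypothetical sample data; with a real Task you would use
#   scalars = task.get_reported_scalars()
sample = {
    ":monitor:gpu": {"gpu_0_utilization": {"x": [0, 1, 2], "y": [0.0, 35.5, 80.0]}},
    "loss": {"train": {"x": [0, 1, 2], "y": [1.0, 0.5, 0.25]}},
}

result = pick_series(sample, ":monitor:gpu", "gpu_0_utilization")
```

Filtering on the client side like this still downloads every scalar, which is the limitation mentioned above.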

one year ago

0 Hi. I'M Encountering A Problem With

PanickyMoth78 ScantMoth28

With several models saved by the training process (whose code is not task-aware)

You can actually specify which models are saved:
task = Task.init(..., auto_connect_frameworks={'pytorch': ['*.pt']})
https://clear.ml/docs/latest/docs/references/sdk/task#taskinit

This way you can upload only the model you need.

one year ago

0 Bug?

Just verified with the code base, it should work out of the box 🙂 nothing to worry about

one year ago

0 Automatic Ssh Keys Export To Agent In Docker Mode

Thanks GentleSwallow91
That's a good tip, where in the docs would you add it?

2 years ago

0 How Come

what does it mean to run the steps locally?

start_locally: the pipeline code itself (the logic that runs and controls the DAG) runs on the local machine (i.e. no agent), but this control logic creates/clones Tasks and enqueues them, and you still need an agent to execute those Tasks.
run_pipeline_steps_locally=True: instead of being enqueued and run by an agent, the Tasks the pipeline creates are launched on the same local machine (think debugging, other...
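
A minimal sketch of the two modes, assuming pipe is a PipelineController-like object with a start_locally method. The function debug_pipeline and the RecordingPipeline stand-in are hypothetical and exist only so the call can be shown without a server.

```python
from typing import Any, List


def debug_pipeline(pipe: Any, steps_on_this_machine: bool = True) -> None:
    """Run the pipeline control logic locally.

    steps_on_this_machine=True  -> the step Tasks are also launched on this machine
    steps_on_this_machine=False -> the step Tasks are enqueued and need an agent
    """
    pipe.start_locally(run_pipeline_steps_locally=steps_on_this_machine)


class RecordingPipeline:
    """Hypothetical stand-in that records the flag; not a ClearML class."""

    def __init__(self) -> None:
        self.calls: List[bool] = []

    def start_locally(self, run_pipeline_steps_locally: bool = False) -> None:
        self.calls.append(run_pipeline_steps_locally)


demo = RecordingPipeline()
debug_pipeline(demo)                               # everything on this machine
debug_pipeline(demo, steps_on_this_machine=False)  # steps go to a queue
```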

3 years ago

0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

I think we should open a GitHub Issue and get some more feedback, maybe we should just add support on the backend side?

2 years ago

0 Hi All! Let'S Say I Have Two Functions Decorated With

why are all defined components shown in the UI Results/Plots/PipelineDetails/ExecutionDetails section? Shouldn't it make more sense to show only the ones that are used in that pipeline?

They are listed there because of the decorator (you are basically "saying" these are steps, so they are listed); the actual resolving (i.e. which steps are actually being called) is done in "real-time".
Makes sense?

2 years ago

0 How To Use

I specifically set it as empty with

export_data['script']['requirements'] = {}

in order to reduce overhead during launch. I have everything installed inside the container

Do you have everything inside the container installed inside a venv?

3 years ago

0 Hello! I'M Just Starting Out With Clearml, And I Seem To Be Having Some Sort Of Conflict Between

Hi SmallDeer34
Can you try with the latest RC? I think we fixed something with the jupyter/colab/vscode support
!pip install clearml==1.0.3rc1

3 years ago

0 Hi, I Am Saving Plt Chart To Clearml Using

Yes I think the writer.add_figure somehow crops the image

3 years ago

0 Hi, I Would Like To Add Artifacts From Two Parallel Process In The Same Task. But One One Process Finished It Changed Task Status To Complete. May Be You Know Some Save Way To Deal With Such Situation? Or Maybe The Best Way To Check Task Status Before Upl

Hi EnthusiasticCoyote38

But once one process finished, it changed the task status to complete. Maybe you know some safe way to deal with such a situation? Or maybe the best way is to check the task status before uploading the object?

Well, you can actually forcefully set the state of the Task to running, then add artifacts, then close it?
would that work?

my_other_task.reload()
my_other_task.mark_started(force=True)
my_other_task.upload_artifact(...)
my_other_task.flush(wait_for_uploads=True)
my_othe...
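
A minimal sketch of the sequence described above, wrapped in a helper. The function name add_artifact_to_finished_task and the RecordingTask stand-in are hypothetical, and the final mark_completed() call is an assumption about how to "close it", since the original snippet is cut off.

```python
from typing import Any, List


def add_artifact_to_finished_task(task: Any, name: str, artifact_object: Any) -> None:
    """Reopen a finished Task, attach one artifact, then mark it completed again."""
    task.reload()                      # refresh the local view of the Task state
    task.mark_started(force=True)      # force the Task back to "running"
    task.upload_artifact(name, artifact_object=artifact_object)
    task.flush(wait_for_uploads=True)  # block until the upload is done
    task.mark_completed()              # assumption: one way to "close it"


class RecordingTask:
    """Hypothetical stand-in that only records call names; not a ClearML class."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __getattr__(self, method_name: str):
        # Only reached for attributes that do not exist, i.e. the Task methods.
        def record(*args: Any, **kwargs: Any) -> None:
            self.calls.append(method_name)
        return record


demo = RecordingTask()
add_artifact_to_finished_task(demo, "stats", {"rows": 10})
```

With a real server, the task argument would be the other Task object, for example one fetched with Task.get_task(task_id=...).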

3 years ago

0 We Had A Problem Moving From Google Drive To Bucket Storage (S3, Google Storage, Etc.) In That We Still Wanted To Be Able To Mount The Bucket As A Network Drive. We Were Able To Find A Stable, Free, Open Source, Multiplatform Way To Do This. The Instruc

We were able to find a stable, free, open source, multiplatform way to do this

Do you mean moving the data from the gdrive to object storage, or just mounting the gdrive?

3 years ago

0 Hey, How Can I Add A Private Key In Order To Let The Clearml Agent To Clone From A Private Git Repository?

If it cannot find the Task ID I'm guessing it is trying to connect to the demo server and not your server (i.e. configuration is missing)

3 years ago

0 Hey All. I'M Seeing A Strange Error When Trying To Run Hyperparameter Optimisation By Cloning A Base Training Task

Ohh, hmm, that is odd, there should not be a limit there. Let me check ....

3 years ago

0 Clearml_Agent_Git_User Is This My Github Username? Or I Need To Setup A Custom Git Server?

CLEARML_AGENT_GIT_USER

Is your git user (on whatever git host/server you are using, GitHub/GitLab/BitBucket etc.)

3 years ago

0 Hey Folks, When I Run

The 'on-premise' server fails to connect to the ClearML server because of the VPN I think

I think you are correct.
You can quickly test it: try to run curl http://local-server:8008 and see if that works
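
If curl is not available on the machine, a rough equivalent can be sketched in Python. This is a TCP-level check only (weaker than the curl request: it shows that the port accepts connections, not that the API answers), and the helper name port_is_open is hypothetical. local-server and 8008 are the placeholders from the command above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (replace with your own server address):
#   port_is_open("local-server", 8008)
```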

3 years ago

0 I Hit A Issue That I Cannot See My Matplotlib Plot, But It Was Shown In The Panel. Any Idea?

I'm looking into the savefig issue, meanwhile you can disable the popup by adding at the top of your code the following:
import matplotlib
matplotlib.rcParams['backend'] = 'agg'
import matplotlib.pyplot
matplotlib.pyplot.switch_backend('agg')

4 years ago

0 I Am Also Experiencing A Weird Behaviour When Running A Script Using The Module Flag. For Example I Run:

Command-line arguments for the arg parser should be passed via the "Args" section in the Configuration tab.
What is the working directory of the experiment?

3 years ago

0 Very Weird Error, Trying To Run An Experiment Through An Agent In Docker Mode, And I Get This Error

3 years ago

0 Hi, I'M Trying To Set Up My Trains-Server And I'M Getting The Following:

ElegantCoyote26 could you upgrade the docker-compose ?

3 years ago

0 Hi Maya, Can You Please Copy The Response For "Events.Get_Task_Plots" Request From The Network Tab In The Browser Developer Tools (F12)?

FranticCormorant35 DeterminedCrab71 please continue the discussion in this thread

4 years ago

0 Hi There

Also, for a single parameter you can use:
cloned_task.set_parameter(name="Args/artifact_name", value="test-artifact", description="my help text that will appear in the UI next to the value")
This way, you are not overwriting the other parameters, you are adding to them.
(Similar to update_parameters, only for a single parameter)

4 years ago

0 How Can I Download The Plots From 'Scalars' And 'Plots' In High Resolution?

BeefyCow3 On the plot itself click on the json download button

4 years ago

0 Also, I Am Confused About Whether Trains Is Fully Open Source Because I Didn’T See Where The Source For The Web Client Is.

Trains is fully open-source. That said, properly publishing and maintaining the web client is still on our to-do list (I mean there is totally readable JavaScript code packaged in the trains-server and the dockers). It keeps getting pushed back because there are generally fewer contributions on the front-end with these kinds of projects. That said, if you guys are willing to help, it will greatly help in pushing it forward... LivelyLion31 what do you think, would you guys like to help with the fronte...

4 years ago

0 Hi - Quick Question. I Am Using The Pipelinecontroller With Abort_On_Failure Set To False. I Have A Pipe With A First Task That Branch Out In 3 Branches.

And same behavior if I make the dependency explicit via the return of the first one

Wait, are you saying that in the code above, when you abort "step_a", then "step_b" is executed?

9 months ago

0 Hello,

What are the clearml package version and the clearml-session version?

one year ago

0 I Have A Set Up An Agent, On A Gpu Machine, And Spun Up The Daemon In Docker Moder, And Specifically Specified A Gpu That It Will Work With. The Image Is Okay And I Verified That By Running

Okay, I'll make sure we change the default image to the runtime flavor of nvidia/cuda

4 years ago

0 What Sort Of Integration Is Possible With Clearml And Sagemaker? On The Page

I'm thinking it's generically a kernel gateway issue, but I'm not sure if other platforms are using that yet

The odd thing is that you can access the notebook, but it returns zero kernels ..

one year ago

0 Hi All, I Am Trying To Spin Up Some Aws Autoscaler Instances, But I Seem To Have Some Issues With The Instance Creation:

Yes the one you create manually is not really of the same "type" as the one you create online, this is why you do not see it there 😞

one year ago

0 Given I Want To Run A Task In A Pipeline Using A Base Task Id. One Of My Steps Just Finds The Latest Model To Use. I Want The Task To Output The Id, And The Next Step To Use It. How Would I Go About Doing This?

VexedCat68 yes 🙂 you can also pass the parent folder and it will zip the entire subfolders into a single artifact

2 years ago
