AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 Hi, There Is Small Bug In The Web Ui When Comparing Two Experiments Scalars: If The Two Tasks Have The Same Name, Then Clicking On The “Maximize Graph” Button On One Scalar Series To Get The Bigger View On That Scalar Series, Then The Color Of Both Series

Thanks, yes you are correct, the color is derived from the series name, so I guess the issue is that the name+ID is not kept in full screen
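A rough sketch of why that would happen, assuming the color is picked by hashing the series label. The hashing scheme and the `series_key` / `series_color` helpers are made up for illustration; this is not the actual Web UI code:

```python
import hashlib


def series_key(task_name: str, task_id: str = "") -> str:
    """Label used for a series; appending the task ID keeps same-named tasks apart."""
    return f"{task_name}.{task_id}" if task_id else task_name


def series_color(label: str) -> str:
    """Map a label to a stable hex color (hypothetical scheme: first 6 hex digits of MD5)."""
    digest = hashlib.md5(label.encode("utf-8")).hexdigest()
    return f"#{digest[:6]}"


# Two tasks with the same name: without the ID both series share one label,
# hence one color. With the ID appended, the labels differ.
same_a = series_color(series_key("train"))
same_b = series_color(series_key("train"))
key_a = series_key("train", "aaa111")
key_b = series_key("train", "bbb222")
```

If the full-screen view drops the ID part of the label, both series fall back to the same key and so to the same color.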

2 years ago

Fix in the next version 🙂

2 years ago

0 I'M Using Tensorboard Summarywriter To Add Scalar Metrics For The Experiment. If Experiment Crashed, And I Want To Continue It From Checkpoint, For Some Reason It Plots Metrics In A Really Weird Way. Even Though I Pass Global_Step=Epoch To The Summarywrit

maybe I should use explicit reporting instead of Tensorboard

It will do just the same 😞

there is no method for setting `last iteration`, which is used for reporting when continuing the same task. maybe I could somehow change this value for the task?

Let me double check that...

overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine ...

That is a very good point

but for the metrics, I explicitly pass th...

2 years ago

0 In Pipelinev2, Is It Possible To Register Artifacts To The Pipeline Task? I See There Is A Private Variable

If this is the case I would do:

```
# Add the collector steps (i.e. the 10 Tasks)
pipe.add_task(
    ...,
    post_execute_callback=Collector.collect_me,
)

pipe.start()
pipe.wait()
Collector.process_results(pipe)
```
wdyt?
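For what it's worth, a minimal sketch of what such a `Collector` could look like. The class is hypothetical, and I'm assuming the callback is called with the pipeline and the finished node, and that the node exposes a `name`; the fake nodes below only stand in for real pipeline steps:

```python
from types import SimpleNamespace


class Collector:
    """Hypothetical helper that gathers the nodes of finished pipeline steps."""

    finished_nodes = []

    @classmethod
    def collect_me(cls, pipeline, node):
        # Called after a step finishes; remember the node for later processing.
        cls.finished_nodes.append(node)

    @classmethod
    def process_results(cls, pipeline):
        # Placeholder processing: return the names of all collected steps.
        return [node.name for node in cls.finished_nodes]


# Simulate three finished steps with stand-in node objects.
for i in range(3):
    Collector.collect_me(None, SimpleNamespace(name=f"step_{i}"))
collected = Collector.process_results(None)
```

In a real pipeline, `process_results` is where you would go over the collected steps and register whatever you need on the pipeline Task.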

2 years ago

0 Hi, Just Want To Report A Small Bug In The Clearml Dashboard: After Queuing An Experiment, If I Change The Experiment Queue, Then Go Back To The Experiment Info Tab, The Queue Property Still Shows The Previous Queue

Thanks JitteryCoyote63 let me double check if there is a reason for that (there might be one, not sure)

2 years ago

JitteryCoyote63, just making sure, does a refresh fix the issue?

2 years ago

0 Hi, I Am Having Difficulties When Using The Dataset Functionality. I Am Trying To Create A Dataset With The Following Simple Code:

But what I get with `get_local_copy()` is the following path: ...

`get_local_copy()` returns an immutable copy of the dataset; by definition this will not be the "source" storing the data.
(Also notice that the dataset itself is stored in zip files, and when you get the "local copy" you get the extracted files.)
Does that make sense?
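Something along these lines, assuming the `clearml` package is installed and configured. The `fetch_dataset_copies` helper, the dataset ID and the target folder are placeholders of mine:

```python
def fetch_dataset_copies(dataset_id: str, target_folder: str):
    """Return (read_only_path, writable_path) for a ClearML dataset."""
    # Imported here so the sketch can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.get(dataset_id=dataset_id)
    # Cached, extracted, read-only copy: do not modify files under this path.
    read_only_path = dataset.get_local_copy()
    # Separate writable copy, extracted into a folder you control.
    writable_path = dataset.get_mutable_local_copy(target_folder)
    return read_only_path, writable_path
```

If you need to edit the files, work on the mutable copy and leave the cached one alone.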

3 years ago

0 Hi, I Am Having Difficulties When Using The Dataset Functionality. I Am Trying To Create A Dataset With The Following Simple Code:

Hi GiganticTurtle0
Let me check

3 years ago

0 Hi All—First Off, Thanks For Being Such A Helpful And Thorough Group Of People. I Learn A Ton Just Searching Through The Channel For Problems. I’M Seeing A Weird Issue. I Have A Conda Env On My Linux Machine, And I Can Successfully Run A Training Script

not sure if this is considered a bug or not! but I’d happily make an issue on github if needed.

I think we should, at least for the sake of transparency and visibility 🙂

thanks again for all your help.

My pleasure 🙂

3 years ago

that must have been it. here’s the installed packages when not using `-m`:

Hmm yes, can you open a GitHub issue on that? (this seems like a bug)

3 years ago

BTW: could it be that Task.init is not called on the "module.name" entry point, but somewhere internally?

3 years ago

Sounds great! let me know what you find out 🙂

3 years ago

0 Hi! I Need Help Debugging The Following Issue Please. I'M Training A Cnn And Plotting The Confusion Matrices For Train And Val In Each Epoch. When I Get To Epoch 101, The Ui Kind Of Breaks..It Starts Showing Me The Images For Epoch 1. When I Right Click O

MuddySquid7 you mean you are creating them with TB ? or are you uploading them as debug images ?
Specifically in the ClearML UI, do you have it under "plots" tab or "debug samples" tab ?

3 years ago

MuddySquid7 I might have found something, and this is very odd: it seems it will not upload any new images past the history size, which is surprising considering the number of users actively using this feature...
Do you want to try a hack to see if it solves your issue?

3 years ago

Hi MuddySquid7 issue is verified, v1.1.1 will be released in a few hours with a fix.
Thank you for noticing!

3 years ago

oh...so is this a bug?

It was always a bug, only an elusive one 😉
Anyhow, I'll make sure we push a fix to GitHub, an RC is planned for later this week, it will contain it

3 years ago

I still wonder how no one noticed ... (maybe 100 unique title/series reports is a relatively high threshold)

3 years ago

0 Any Chance Storagemanager Could Re-Download Files Only If Their Size Is Different From File In Cache (As An Option)?

any chance StorageManager could re-download files only if their size is different from file in cache (as an option)?

I think there is a force argument, to force the download.
I think the main issue is getting the size from the different backends (e.g. S3, HTTPS, etc.)
Maybe we should add it as a GitHub feature request issue?
The main limitation is that the driver "list()" does not return file size.
For example it might be an issue with the default http files-server.
wdyt?
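Just to illustrate the idea, a small sketch of the size check itself. This is not an existing StorageManager option; `needs_redownload` is a made-up helper, and getting `remote_size` from the backend is exactly the hard part mentioned above:

```python
import os
import tempfile
from typing import Optional


def needs_redownload(cached_path: str, remote_size: Optional[int]) -> bool:
    """Decide whether a cached file should be fetched again, based on size only."""
    if not os.path.isfile(cached_path):
        return True  # nothing cached yet
    if remote_size is None:
        return True  # backend could not report a size, so play it safe
    return os.path.getsize(cached_path) != remote_size


# Demo with a 5-byte temporary file standing in for the cached copy.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"12345")
cached = tmp.name
same_size = needs_redownload(cached, 5)
different_size = needs_redownload(cached, 6)
unknown_size = needs_redownload(cached, None)
os.remove(cached)
```

Note that a size-only check will miss files that changed but kept the same length, which is one more reason to keep it optional.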

3 years ago

0 Dear Clearml Community, I Am Trying To Optimize Storage On My Clearml File Server When Doing A Lot Of Experiments. To Achieve This, I Already Upload Only The Newest And Best Checkpoints To Clearml File Server Instead Of All Checkpoints. Another Component

Notice that you need to pass the returned scroll_id to the next call

scroll_id = response["scroll_id"]
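The general pattern looks roughly like this. `fetch_page` stands in for whatever API call you are making, and the `"items"` key is an assumption for the sketch, so adjust both to the actual response:

```python
from typing import Callable, Dict, Iterator, List, Optional


def iterate_pages(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[List]:
    """Yield pages until an empty one comes back, threading scroll_id through."""
    scroll_id = None
    while True:
        response = fetch_page(scroll_id)
        items = response.get("items", [])
        if not items:
            break
        yield items
        # Pass the returned scroll_id to the next call.
        scroll_id = response["scroll_id"]


def fake_fetch_page(scroll_id: Optional[str]) -> Dict:
    """Stand-in for the real API call: serves 5 items, 2 per page."""
    data = [1, 2, 3, 4, 5]
    start = int(scroll_id) if scroll_id else 0
    return {"items": data[start:start + 2], "scroll_id": str(start + 2)}


pages = list(iterate_pages(fake_fetch_page))
```

The first call goes out without a scroll_id, and every following call reuses the one returned by the previous response.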

7 months ago

0 For Remote Execution Where The Queue Has

Hmm UnevenDolphin73 I think this is the reason,
and this means that even without a full lock file poetry can still build an environment

one year ago

0 Anyone Doing Sagemaker With Clearml - Something Like The K8S Glue But The Tasks Are Pulled Into Sagemaker Training Jobs

The AWS autoscaler will work with IAM rules as long as you have them configured on the machine itself. For SageMaker job scheduling (I'm assuming this is what you are referring to, and not the notebook) you need to select the instance as well (basically the same as EC2). What do you mean by using the k8s glue, like inherit and implement the same mechanism but for SageMaker instead of kubectl?

3 years ago

0 Does The New 2.0 Helm Charts (App Ver 1.1.0) Not Support Nfs?

I think this is the discussion you are after:
https://clearml.slack.com/archives/C01H5VAUZ8R/p1612452197004900?thread_ts=1612273112.002400&cid=C01H5VAUZ8R

3 years ago

0 Hello Clearml Ppl

Hi SmoggyGoat53
What do you mean by "feature store" ? (These days the definition is quite broad, hence my question)

2 years ago

0 Hi Team, Me Again! Im Curious If Someone Can Explain To Me Better How Task And Optimisers Integrate With Each Other. In The Example Hyperparameter Optimisation, There Is Both A Task Initialised With

, is the team open to PRs from external people?

Yes please do! PRs are welcomed! I thought we fixed the GitHub readme to reflect it, anyhow I'll make sure we do 🙂

3 years ago

0 Hi, Plotting A Debug Sample With A

Thanks VirtuousFish83 !
This is great

3 years ago

0 Hi Guys. Say That We Train A Model With 10 Epoch, And Suddenly Interruption Occur On Epoch 5. How Can We Continue The By Using Clearml?

Hi AttractiveFrog67

Make sure you stored the model's checkpoint (either pass `output_uri=True` in `Task.init` or upload it manually).
When you call `Task.init`, pass `continue_last_task=True`.
Now you can do `last_checkpoint = task.models["output"][-1].get_local_copy()`, and all you need is to load `last_checkpoint`.
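Put together, it could look something like this. `resume_last_checkpoint` is a name I made up, the project and task names are yours to fill in, and I'm assuming the last output model is the latest checkpoint:

```python
def resume_last_checkpoint(project_name: str, task_name: str) -> str:
    """Continue the previous run of a task and return a local path to its last checkpoint."""
    # Imported here so the sketch can be loaded without clearml installed.
    from clearml import Task

    task = Task.init(
        project_name=project_name,
        task_name=task_name,
        output_uri=True,          # keep uploading new checkpoints
        continue_last_task=True,  # continue the previous run instead of starting a new task
    )
    # Assumption: the last entry in the output models is the most recent checkpoint.
    return task.models["output"][-1].get_local_copy()
```

Loading the returned file is then up to your framework.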

one year ago

0 Hi, I Would Like To Check What Would Be The Recommended Hardware Specs For The Server Host Clearml Server. I Had One Configured With 32 Cpu Cores, 64Gb Ram And I Noticed That If We Have A Surge In Remote Task Creation, The Following Delays Occurs.

no worries

3 years ago

0 Hi Guys! Is There A Way To Tell An Agent To Run A Task In An Existing Venv (Without Creating A New One)?

Oh if this is the case you can probably do:
```
import os
import subprocess
from time import sleep

from clearml import Task
from clearml.backend_api.session.client import APIClient

client = APIClient()

queue_ids = client.queues.get_all(name="queue_name_here")

while True:
    # Pull the next task from the queue; wait and retry if the queue is empty
    result = client.queues.get_next_task(queue=queue_ids[0].id)
    if not result or not result.entry:
        sleep(5)
        continue
    task_id = result.entry.task
    client.tasks.started(task=task_id)
    env = dict(**os.environ)
    env['CLEARML_TASK_ID'] = ta...
```

2 years ago

0 I’M Getting These Errors When Using Agent In Docker Mode

it works if I run the same command manually.

What do you mean?
Can you do:
`docker run -it <my container here> bash`
Then immediately get an interactive bash?

3 years ago

0 Hi I Saw This On The Clearml-Agent Docs But Other Than The Docker Image, I'M Not Sure How To Integrate This With Clearml Py And Clearml-Server. Please Advise.

Hi SubstantialElk6
I'm not sure what you are asking 🙂
Basically the clearml-agent will pull a Task from an execution queue, and execute it (based on the definition on the Task, i.e. git repo, python packages, docker image, etc.)

3 years ago
