DefeatedCrab47 yes that is correct. I actually meant if you see it on TensorBoard's UI 🙂
Anyhow if it's there, you should find it in the Task's Results > Debug Samples
Can you let me know if I can override the docker image using template.yaml?
No, you cannot.
But you can pass the OS environment variable "CLEARML_DOCKER_IMAGE" to set a different default one
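For example, if you launch the agent/glue from a Python wrapper, something like this should do it (just a sketch, the image name below is only an example):
import os

# default docker image the agent will use when a Task does not specify one
os.environ["CLEARML_DOCKER_IMAGE"] = "nvidia/cuda:11.8.0-runtime-ubuntu22.04"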
connect_configuration seems to take about the same amount of time unfortunately!
I think it is a better solution, that said from your description it sounds like the issue is the upload bandwidth (i.e. json-ing the dict itself), could that be it?
(and even 1000 entries seems like something that would end up as a ~1MB upload, which is not that much)
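BTW this is roughly the pattern I have in mind (project/task names are made up), just to rule out the dict itself:
from clearml import Task

task = Task.init(project_name="examples", task_name="config timing test")  # hypothetical names
# ~1000 entries; the whole dict is json-ed once and uploaded as a single configuration object
config = {"entry_%04d" % i: "value_%d" % i for i in range(1000)}
config = task.connect_configuration(config, name="large_config")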
The reason is that it is logged as an image, not a plot 🙂
@<1577468638728818688:profile|DelightfulArcticwolf22>
How can I tell clearml-agent not to run pip install unless my requirements.txt file was changed?
The agent has a built-in cache, it will reuse the previous venv if nothing changed (the cache is local on the agent's machine).
Make sure this line is not commented:
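If it helps, the relevant section in the agent's clearml.conf looks roughly like this (values are the defaults as far as I remember, double-check your own file); the line to uncomment is the path one:
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum required free space to allow for a cache entry
        free_space_threshold_gb: 2.0
        # uncomment to enable venv caching
        path: ~/.clearml/venvs-cache
    }
}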
I think they (DevOps) said something about next week, internal roll-out is this week (I think)
Based on your code snippet:
Logger.current_logger().report_confusion_matrix(title='confusion', series='confusion', value=confmat_tensor.cpu().numpy(), iteration=i)
or Task.current_task().get_logger()
which is the same as Logger.current_logger()
Just dropping this here but I've had some funky compressions with very small datasets!
Odd deflate behavior ...?!
HugeArcticwolf77 from the CLI you cannot control it (but we could probably add that), from code you can:
https://github.com/allegroai/clearml/blob/d17903d4e9f404593ffc1bdb7b4e710baae54662/clearml/datasets/dataset.py#L646
pass compression=ZIP_STORED
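Something along these lines (dataset name/project are made up):
from zipfile import ZIP_STORED

from clearml import Dataset

# create a small dataset and upload it without deflate compression
ds = Dataset.create(dataset_name="tiny_dataset", dataset_project="examples")  # hypothetical names
ds.add_files("data/")
ds.upload(compression=ZIP_STORED)
ds.finalize()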
Hi @<1523701523954012160:profile|ShallowCormorant89>
This is generally based on number of agents, or am I missing something ? Also is it based on Task or decorated functions ?
With offline mode,
Later, if you need to, you can actually import the execution (including artifacts etc.); you just need the zip file it creates when you are done.
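Roughly like this (paths/names are placeholders):
from clearml import Task

# switch to offline mode before Task.init, everything is stored locally
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")  # hypothetical names
# ... training code, reporting, artifacts ...
task.close()

# later, on a machine with server access, import the zip created by the offline session
Task.import_offline_session("/path/to/offline_session.zip")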
Hi @<1634001100262608896:profile|LazyAlligator31>
Is this because the code repo is being recreated in this directory?
Yes this is correct 🙂
Basically the entire code base + venv is installed there, to make sure it does not interfere with the "system" preinstalled environment
(it also allows for caching on the host machine 🙂)
Hi @<1569496075083976704:profile|SweetShells3>
Are you using the standard docker-compose ? Are you using the default elastic container ?
What exactly changed ?
Honestly, this is all related to issue #340.
makes total sense.
But actually this is different from #340. The feature is to store the data on the Task, which means each Task in your "pipeline" will upload a new copy of the data. No?
I'd suggest some task.detach() method for remote execution maybe
That is a good idea, in theory it can also be used in local execution
... but when we try to do a "New Run" from the UI, it tries to follow the DAG of the previous run (the run with all child nodes skipped) and the new run fails too.
This is odd, is this reproducible ? what's the clearml python package version ?
@<1523701523954012160:profile|ShallowCormorant89> can you verify it is reproducible in 1.9.3 ? because if it is I'd like to fix that 🙂
will it be possible for us to configure the "new run" button in a way so that it always clones from a particular pipeline ?
What do you mean by "particular pipeline" ? by default it will clone the last successful one, and by right clicking a specific one you can run a copy of that one. what am I missing ?
when you say use Task.current_task() for logging? which I'm guessing the fastai binding should do, right?
right, this is a fancy way to say: make sure the actual sub-process is initializing ClearML so all the automagic kicks in; since this is not "forked" but a whole new process, calling Task.current_task is the equivalent of calling Task.init with the same arguments (which you can also do, I'm not sure which one is more straightforward, wdyt?)
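i.e. something like this at the top of the spawned process entry point (just a sketch, the scalar report is only there as an example):
from clearml import Task

def worker_entry_point():
    # in a spawned (not forked) sub-process this re-attaches to the main Task,
    # effectively the same as calling Task.init with the same arguments
    task = Task.current_task()
    logger = task.get_logger()
    logger.report_scalar(title="loss", series="worker", value=0.1, iteration=0)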
The problem is not really for the agents to wait (this is easily solved by additional high priority queue) the problem is will you have a "free" agent... you see my point ?
Hi ShinyRabbit94
system_site_packages: true
This is set automatically when running in "docker mode", no need to worry 🙂
What is exactly the error you are getting ?
Could it be the container itself has the python packages installed in a venv not as "system packages" ?
BeefyCow3 if you are trying to optimize a specific metric (i.e. a scalar on a graph), the template Task should report it with the same title/series combination, which should be easy enough to verify in the UI 🙂
You can either report with Tensorboard or with the Trains Logger, either way will work.
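i.e. something like this in the template Task (title/series values are just examples, the optimizer's objective_metric_title/objective_metric_series should point at the same pair):
from clearml import Task

task = Task.init(project_name="examples", task_name="template task")  # hypothetical names
logger = task.get_logger()
# this title/series pair is what the optimizer will look for
logger.report_scalar(title="validation", series="accuracy", value=0.93, iteration=10)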
That sounds like an internal tritonserver error.
https://forums.developer.nvidia.com/t/provided-ptx-was-compiled-with-an-unsupported-toolchain-error-using-cub/168292
For example:
examples/k8s_glue_example.py --queue k8s_gpu --namespace <namespace> --pod-clearml-conf ~/trains.conf --template-yaml example/base.yml
OSError: [Errno 28] No space left on device
Hi PreciousParrot26
I think this says it all 🙂 there is no more storage left to run all those subprocesses
btw:
I am curious about why a ThreadPool of 16 threads is gathered,
This is the maximum number of jobs it will try to launch simultaneously (it will launch more after the launching is done; notice this limits the launching, not the actual execution), but it is just a way to limit it.
@<1699955693882183680:profile|UpsetSeaturtle37> good progress, regarding the error, 0.15.0 is supposed to be out tomorrow, it includes a fix to that one.
BTW: can you run with --debug
This is strange... Could you send the browser console log, maybe there is an exception there
I see, so in theory you could call add_step with a pipeline parameter (i.e. pipe.add_parameter etc.)
But currently the implementation is such that if you are starting the pipeline from the UI
(i.e. rerunning it with a different argument), the pipeline DAG is deserialized from the Pipeline Task (the idea being that one could control the entire DAG externally without changing the code)
I think a good idea would be to actually allow the pipeline class to have an argument saying always create from cod...
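Conceptually it looks something like this today (the names and the parameter are made up):
from clearml.automation import PipelineController

pipe = PipelineController(name="my_pipeline", project="examples", version="1.0.0")  # hypothetical names
pipe.add_parameter(name="dataset_id", default="abc123", description="dataset to process")
pipe.add_step(
    name="process_step",
    base_task_project="examples",
    base_task_name="process data",
    # the pipeline-level parameter is editable from the UI when cloning/rerunning the pipeline
    parameter_override={"Args/dataset_id": "${pipeline.dataset_id}"},
)
pipe.start()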
this
from fastai.callbacks.tensorboard import LearnerTensorboardWriter
doesn't exist anymore in fastai2
Hmm we should definitely update the example to fastai2 API
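If I remember correctly the fastai2 equivalent lives here (not tested on my side):
# fastai v1:
#   from fastai.callbacks.tensorboard import LearnerTensorboardWriter
# fastai v2 moved the TensorBoard integration:
from fastai.callback.tensorboard import TensorBoardCallback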
maybe the fastai bindings in clearml package are outdated
Are you getting any scalars reported to clearml?
they also appear to be relying on the tensorboard callback which seems not to work on distributed training
Yes that is correct, usually the way it works is that all nodes report back to "master...
I think you cannot change it for a running process, do you want me to check for you if this can be done ?