Yes.
Some mechanism that would allow for follow-up code execution, ideally in a way that is not susceptible to the same things that may cause a task to fail.
I'm looking for a minimal set of permissions because we have other sensitive EC2 instances running in the same account, and our IT people are rightfully concerned about providing external access to that account.
My local environment has clearml version 1.6.3rc0
and the agents in AWS were started with the AWS Autoscaler, which has no explicit place for Google credentials.
I see a place for Additional ClearML Configuration in the AWS Autoscaler UI, which I suspect may help, but I don't see how I can pass a secrets file along with my agent.
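For what it's worth, this is a sketch of the kind of snippet I imagine going into that Additional ClearML Configuration field. It assumes the clearml.conf `files` section (which writes files on the agent machine at startup) combined with `sdk.google.storage`; the bucket name and the ~/gs.cred path are the ones that come up later in this thread, the `gsc` key is an arbitrary label, and the placeholder is meant to be replaced with the (escaped) contents of the credentials JSON. An absolute path may be safer than ~ here.

files {
    gsc {
        contents: "<this is your GCP storage credentials file>"
        path: "~/gs.cred"
    }
}
sdk {
    google.storage {
        credentials = [
            {
                bucket: "clearml-evaluation"
                credentials_json: "~/gs.cred"
            },
        ]
    }
}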
We're using a self-hosted clearml server version 1.14.0
It seems to be doing ok on the app side:
I didn't realise Datasets had tasks associated with them but there is one and it seems to be doing ok.
I've attached its log file, which only mentions skipping one file (a warning).
Thanks AgitatedDove14 for all the guidance.
Thanks ! 🎉
I'll give it a try.
I think that clearml should be able to do parameter sweeps using pipelines in a manner that makes use of parallelisation.
If that's not happening with the new RC, I wonder how I would do a parameter sweep within the pipelines framework.
For example - how would this task-based example be done with pipelines?
https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py
I'm thinking of a case where you want t...
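To make the question concrete, here is roughly what I have in mind (a minimal sketch, not code I've run; the component/pipeline names and the toy "training" are made up): call a decorated component once per parameter value inside the pipeline function, and rely on the steps being independent so they can run in parallel on the agents.

from clearml import TaskTypes
from clearml.automation.controller import PipelineDecorator


@PipelineDecorator.component(return_values=["accuracy"], task_type=TaskTypes.training)
def train_one(learning_rate: float):
    # stand-in for real training; each call becomes its own Task/step
    accuracy = 1.0 - abs(learning_rate - 0.01)
    return accuracy


@PipelineDecorator.pipeline(name="lr_sweep", project="sweep_example", version="0.0.1")
def sweep(learning_rates=(0.001, 0.01, 0.1)):
    # one step per value; no data dependencies between them,
    # so they should be able to run concurrently
    results = {lr: train_one(learning_rate=lr) for lr in learning_rates}
    # accessing a returned value waits for that step to complete
    for lr, acc in results.items():
        print(f"lr={lr}: accuracy={acc}")


if __name__ == "__main__":
    # PipelineDecorator.run_locally()  # uncomment to debug the whole thing locally
    sweep()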
there may have been some interaction between the training task and a preceding dataset creation task :shrug:
no retry messages
CLEARML_FILES_HOST is set to a gs:// bucket
CLEARML_API_HOST is a self hosted clearml server (in google compute engine).
Note that earlier in the process the code uploads a dataset just fine
That's amazing speed 🚀
That would be a better message; however, I must have misunderstood the meaning of auto_create=True.
I thought that flag made the get function into a "get-or-create"
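i.e. I expected something along these lines to either fetch the existing dataset or create an empty one (the project/name here are placeholders):

from clearml import Dataset

ds = Dataset.get(
    dataset_project="some_project",  # placeholder
    dataset_name="some_dataset",     # placeholder
    auto_create=True,                # my (apparently mistaken) reading: "get or create"
)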
To be specific, there is "model name", which is not unique, and there is model-key, which is unique to the Task.
Not sure why the two fields don't simply match. I guess there may be situations where the file name (without the full path) is used several times.
I found that instead of returning some_returned_url (which triggers zipping and saving of the files under that url), I can wrap it in a dict: {"the url": some_returned_url}, which then lets me pass the url back to the pipeline, and only that dict gets uploaded (e.g. {'run_datasets_path': Path('/data/my_datasets_path/run_id_1')} ). I can divert all files that I do want uploaded and tracked by clearml to gs:// by adding, at the start of the task function: ` Logger.current_logger().se...
Is there a way to set the default upload destination for all tasks in my ~/clearml.conf
.. yes by setting files_server: gs://clearml-evaluation/
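For reference, the relevant bits of my ~/clearml.conf would look roughly like this (the default_output_uri line is an assumption on my part about also setting a default upload destination for task models/artifacts):

api {
    # api_server / web_server / credentials unchanged
    files_server: gs://clearml-evaluation/
}
sdk {
    development {
        # assumed alternative/complement: default destination for task models and artifacts
        default_output_uri: gs://clearml-evaluation/
    }
}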
Restarting the autoscaler, the instances, and a single running pipeline - I still get the same error:
clearml.utilities.locks.exceptions.LockException: [Errno 11] Resource temporarily unavailable
now trying with added lines as Alon suggested:
` @PipelineDecorator.component(
return_values=["run_model_path", "run_info"],
cache=True,
task_type=TaskTypes.training,
repo="git@github.com:shpigi/clearml_evaluation.git",
repo_branch="main",
packages="./requirements.txt",
)
def train_image_classifier_component(
clearml_dataset,
backbone_name,
image_resize: int,
batch_size: int,
run_model_uri,
run_tb_uri,
local_data_path,
num_epochs: int,
)...
so..
I restarted the autoscaler with this configuration object:
` [{"resource_name": "cpu_default", "machine_type": "n1-standard-1", "cpu_only": true, "gpu_type": null, "gpu_count": 1, "preemptible": false, "num_instances": 5, "queue_name": "default", "source_image": "projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20220131", "disk_size_gb": 100}, {"resource_name": "cpu_services", "machine_type": "n1-standard-1", "cpu_only": true, "gpu_type": null, "gpu_count": 1, "preemptible": fa...
Thanks for the fix and the mock HPO example code !
Pipeline behaviour with the fix is looking good.
I see the point about changes to data inside the controller possibly causing dependencies for step 3 (or, at least, making it harder for the interpreter to know).
Note that the same model files were previously also generated by a non-parallelized version of the same pipeline without the out-of-space error, but the storage manager was downloading zip files in that version as well (maybe those files were downloaded and then removed as the object reference counts went to 0?).
Sort of, though it seems like the rules for model.name can be a bit non-obvious.
I think that the first model saved gets the task name as its name and the following models take f"{task_name} - {file_name}"
here is the code in text if you feel like giving it a try:
import tensorboard_logger as tb_logger
from clearml import Task

task = Task.init(project_name="great project", task_name="test_tb_logging")
task_tb_logger = tb_logger.Logger(logdir='./tb/run1', flush_secs=2)
for i in range(10):
    task_tb_logger.log_value("some_metric", 42, i)
task.close()
That's strange because, opening the currently running autoscaler config, I see this:
I now get this error:
2022-07-18 21:51:29,168 - clearml.storage - ERROR - Failed creating storage object Reason: [Errno 2] No such file or directory: '~/gs.cred'
to be clear, I replaced <this is your GCP storage credentials file> with the contents of that file, escaping every " with a \" and removing newlines.
I can't find version 1.8.1rc1 but I believe I see a relevant change in code of Dataset.upload in 1.8.1rc0
I suppose one way to perform this is with a https://clear.ml/docs/latest/docs/references/sdk/scheduler that kicks off a health-check task (checking the exit state of executed tasks). It seems more efficient to support a triggered response to task failure.
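Something along these lines is what I'm imagining (a rough sketch based on my reading of clearml.automation.TriggerScheduler; argument names may be off, and the project/queue names are placeholders):

from clearml.automation import TriggerScheduler


def on_task_failed(task_id: str):
    # hypothetical follow-up code: send an alert, enqueue a cleanup/retry task, etc.
    print(f"Task {task_id} failed")


trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_task_trigger(
    name="failure-watch",
    trigger_project="my_project",     # placeholder
    trigger_on_status=["failed"],
    schedule_function=on_task_failed,
)
# run the trigger scheduler itself as a task on the services queue so it outlives my local machine
trigger.start_remotely(queue="services")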
I think this should be a valid use of pipelines. For example, at some step I choose to sweep across several values of some parameter, and the rest of the steps are duplicated for each value of that parameter.
The additional edges in the graph suggest that these steps somehow contain dependencies that I do not wish them to have.
There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes)

