If Dataset.upload() does not crash or return a success value that I can check and
Are you saying that with this error showing, the data upload does not crash?
Unfortunately that is correct. It continues as if nothing happened!
To replicate this in Linux (even with max_workers=1):
Use https://averagelinuxuser.com/limit-bandwidth-linux/ to throttle your connection: sudo apt-get install wondershaper
Throttle your connection to 1mb/s with somethin...
The same occurs when I run a single training component instead of two.
Would you expect this fastai callback to work?
(Uses SummaryWriter):
https://github.com/fastai/fastai/blob/d7f4863f1ee3c0fa9f2d9feeb6a05f0625a53696/fastai/callback/tensorboard.py
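To check, I'd run something minimal like this (the synthetic learner, the project/task names and log_dir below are placeholders, not my actual training code):
` # Minimal sketch: does ClearML's TensorBoard auto-logging pick up fastai's SummaryWriter?
from clearml import Task
from fastai.test_utils import synth_learner
from fastai.callback.tensorboard import TensorBoardCallback

task = Task.init(project_name="debug", task_name="fastai-tensorboard-check")
learn = synth_learner()  # tiny synthetic learner shipped with fastai
# log only scalars; skip prediction/graph logging to keep the check minimal
learn.fit(3, cbs=TensorBoardCallback(log_dir="runs", trace_model=False, log_preds=False))
# if the auto-logging works, the train/valid loss scalars should appear in the task's Scalars tab
`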
It seems to have failed as well (but I'd need to check more carefully)
I tried playing with those parameters on my laptop to no great effect.
Here is code you can use to reproduce the issue:
` import os
from pathlib import Path
from tqdm import tqdm
from clearml import Dataset, Task

def dataset_upload_test(project_id: str, bucket_name: str):
    def _random_file(fpath, sizekb):
        fileSizeInBytes = 1024 * sizekb
        with open(fpath, "wb") as fout:
            fout.write(os.urandom(fileSizeInBytes))

    def random_dataset(dataset_path, num_files, file...
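For reference, here is a rough, self-contained sketch of what the truncated remainder does (the file counts/sizes, names and bucket below are placeholders, not the exact original code):
` import os
from pathlib import Path
from tqdm import tqdm
from clearml import Dataset

def _random_file(fpath, sizekb):
    # write `sizekb` kilobytes of random bytes to fpath
    with open(fpath, "wb") as fout:
        fout.write(os.urandom(1024 * sizekb))

def random_dataset(dataset_path, num_files=500, file_size_kb=100):
    # build a folder of small random files to upload
    dataset_path = Path(dataset_path)
    dataset_path.mkdir(parents=True, exist_ok=True)
    for i in tqdm(range(num_files)):
        _random_file(dataset_path / f"file_{i:04d}.bin", file_size_kb)
    return dataset_path

def dataset_upload_test(project_id: str, bucket_name: str):
    folder = random_dataset("/tmp/random_dataset")
    ds = Dataset.create(dataset_project=project_id, dataset_name="upload-test")
    ds.add_files(path=str(folder))
    # this is the call that carries on as if nothing happened when the throttled upload fails
    ds.upload(output_url=f"gs://{bucket_name}", max_workers=1, show_progress=True)
    ds.finalize()

if __name__ == "__main__":
    dataset_upload_test("debug", "<my-bucket>")
`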
Unfortunately, waiting a while did not make this go away 🙂
Sure. It is a minor change from the code in the clearml examples for pipelines.
I just repeat the last two pipeline steps from that code in a loop (x3)
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
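The shape of the change is roughly this (the component bodies and names here are stand-ins, not the real training code):
` from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=False)
def step_process(seed: int):
    # stand-in for the data-processing step
    return list(range(seed, seed + 10))

@PipelineDecorator.component(return_values=["score"], cache=False)
def step_train(data):
    # stand-in for the training step
    return sum(data)

@PipelineDecorator.pipeline(name="loop repro", project="debug", version="0.0.1")
def pipeline_logic():
    # repeat the last two pipeline steps in a loop (x3)
    for seed in range(3):
        data = step_process(seed)
        score = step_train(data)
        print("iteration", seed, "score", score)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # or set_default_execution_queue(...) to run remotely
    pipeline_logic()
`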
I'll try a more carefully checked run a bit later but I know it's getting a bit late in your time zone
I have tried this several times now. Sometimes one runs and the other fails, and sometimes both fail with this same error.
Hmm interesting, so like a callback?!
like the pipe-step level callbacks in https://github.com/allegroai/clearml/blob/bca9a6de3095f411ae5b766d00967535a13e8401/examples/pipeline/pipeline_from_tasks.py#L54-L55 ? I guess that mechanism could serve. Where do these callbacks run? In the instantiating process? If so, that would work (since the callback function can be any code I wish, right?)
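Just to make sure we mean the same mechanism, here is a quick sketch of what I'm referring to (the base task names/projects are placeholders):
` from clearml.automation.controller import PipelineController

def pre_execute_callback(pipeline, node, param_override):
    # called before the step is launched; returning False skips the node
    print("launching", node.name, "with overrides", param_override)
    return True

def post_execute_callback(pipeline, node):
    # called after the step completes; any code could go here, e.g. dispatching other jobs
    print("node", node.name, "finished, executed task id:", node.executed)

pipe = PipelineController(name="pipeline demo", project="debug", version="0.0.1")
pipe.add_step(
    name="stage_train",
    base_task_project="debug",
    base_task_name="train task",
    pre_execute_callback=pre_execute_callback,
    post_execute_callback=post_execute_callback,
)
# pipe.start_locally() or pipe.start(queue=...) would then actually launch the controller
`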
I might want to dispatch other jobs from within the same process.
This is actually something t...
here is the log from the failing component:
` File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/clearml/utilities/locks/portalocker.py", line 140, in lock
    fcntl.flock(file_.fileno(), flags)
BlockingIOError: [Errno 11] Resource temporarily unavailable `
That job was using clearml 1.8.3 so I take it that setting max_workers to 1 would not make a difference?
Looking at the docs:
https://clear.ml/docs/latest/docs/references/sdk/dataset/#upload
they say that max_workers defaults to the number of cores, but looking at the log it does seem like it's doing one chunk every 5 minutes (a long time for a 500MB upload from a node running in GCP...)
If I run from the terminal, I see:
ValueError: Task object can only be updated if created or in_progress [status=stopped fields=['configuration']]
anyhow - looks like the keys are simple enough to use (so I can just ignore the model names)
Thanks 🙂
I wonder if it'll also include the fix that went into the RC I was using there (1.6.3rc0).
For anyone following: you can "inject" a credentials json file for a Google Cloud service account, so as to get access to your Google Cloud Storage from agents on AWS EC2 instances managed by the AWS autoscaler, by providing the following in the ADDITIONAL CLEARML CONFIGURATION when starting the autoscaler:
` sdk.google.storage.credentials_json: "/root/gs.cred"
sdk.google.storage.project: "<my-gcp-project-id>"
files {
  gsc {
    contents: """<copy-paste the contents of yo...
First, thanks for having these discussions. I appreciate this kind of support is an effort 🙏
Yes. I perfectly understand that once a pipeline job (or a task) is sent off in this manner, it executes separately (and most likely on a different machine) from the process that instantiated it.
I still feel strongly that such a command should not be thought of as a fire-and-exit operation. I can think of several scenarios where continued execution of the instantiating process is desired:
I ...
or, barring that, something similar on AWS?
Q: is there an equivalent env var for sdk.google.storage.pool_connections / pool_maxsize? My jobs are running remotely and not within a clearml agent at the moment, so they get their clearml config through env vars.
Hi. Just a reminder that I'd love to know if / when this issue is looked at
Is there any chance the experiment itself has a docker image specified?
It does not as far as I know. The decorators do not have docker fields specified
Console output shows uploads of 500 files on every new dataset. The lineage is as expected: each additional upload is the same size as the previous ones (~50MB), and Dataset.get on the last dataset's ID retrieves all the files from the separate parts into one local folder.
Checking the remote storage location (gs://) shows artifact zip files, each with 500 files
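The check itself is essentially this (the dataset id is a placeholder for the last part's id):
` from pathlib import Path
from clearml import Dataset

ds = Dataset.get(dataset_id="<id-of-the-last-dataset-part>")
local_copy = ds.get_local_copy()  # pulls this part plus all its parent parts into one folder
print(len(list(Path(local_copy).rglob("*"))), "files in", local_copy)
`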
I can find the tasks in the "all experiments" project but there are over 500 tasks there (I guess it includes the archived tasks as well), so that's not much help.
Hi again.
Thanks for the previous replies and links, but I haven't been able to find the answer to my question: how do I prevent the content of a URI returned by a task from being saved by clearml at all?
I'm using this simplified snippet (that avoids fastai and large data)
` from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(
    return_values=["run_datasets_path"], cache=False, task_type=TaskTypes.data_processing
)
def ma...
I had several pipeline components getting it and uploading files to it concurrently.
Can Datasets handle that?
another weird thing:
Before my training task is done:
print(task.models['output'].keys())
outputs
odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
after task.close()
I can do:
task = Task.get_task(task_id)
for i in range(100):
    print(task.models["output"].keys())
which prints
odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
in the first iteration
and prints the file names in the latter iterations:
` od...
oops, I deleted two messages here because I had a bug in a test I've done.
I'm retesting now