
here is the code in text if you feel like giving it a try:

import tensorboard_logger as tb_logger
from clearml import Task

task = Task.init(project_name="great project", task_name="test_tb_logging")
task_tb_logger = tb_logger.Logger(logdir='./tb/run1', flush_secs=2)
for i in range(10):
    task_tb_logger.log_value("some_metric", 42, i)
task.close()
I now get this error:

2022-07-18 21:51:29,168 - clearml.storage - ERROR - Failed creating storage object
Reason: [Errno 2] No such file or directory: '~/gs.cred'
to be clear, I replaced <this is your GCP storage credentials file> with the contents of that file, escaping every " with a \" and removing newlines.
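For anyone following along, here is a sketch of the kind of snippet I mean (my assumption is the standard clearml.conf google.storage layout; "my-bucket" is a placeholder):

sdk {
    google.storage {
        credentials = [
            {
                bucket: "my-bucket"
                # I pasted the contents of the credentials file here,
                # escaping every " as \" and removing newlines
                credentials_json: "<this is your GCP storage credentials file>"
            },
        ]
    }
}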
essentially, several running processes were performing:

model_evals_dataset = Dataset.get(
    dataset_project=dataset_project,
    dataset_name=f"model_evals",
)
model_evals_dataset.add_files(run_eval_path)
model_evals_dataset.upload()
oops, I deleted two messages here because I had a bug in a test I've done.
I'm retesting now
AgitatedDove14
Adding repo and repo_branch to the pipeline.component decorator worked (and I can move on to my next issue 🙂 ).
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
In case anyone else is interested, we found two alternative solutions:
1. Repeating the first steps but from within a Docker container (docker run -it --rm python:3.9 bash) worked for me.
2. Alternatively: the example tasks (or at least those I've tried) that appear in the ClearML examples within a new workspace have clearml==0.17.5 (an old clearml version) listed in "INSTALLED PACKAGES". Updating the clearml package within the task to 1.5.0 let me run the clearml-agent daemon lo...
also, whereas the pipeline agent's log has:

Executing task id [7a0ad1fb243a4ff3b9e6c477442ded4a]:
repository = git@github.com:shpigi/clearml_evaluation.git
branch = main
version_num = e045904094cf2f4fa61ce92f7b91682f5de64ab8

the component agent's log has:

Executing task id [90de043e354b4b28a84d5cc0788fe63c]:
repository =
branch =
version_num =
That would be a better message. However, I must have misunderstood the meaning of auto_create=True; I thought that flag made the get function into a "get-or-create".
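For context, this is roughly the get-or-create pattern I had in mind (a sketch; the project name is a placeholder, and I'm assuming Dataset.get raises a ValueError when the dataset does not exist):

from clearml import Dataset

# what I expected auto_create=True to be shorthand for (my assumption)
try:
    ds = Dataset.get(dataset_project="my_project", dataset_name="model_evals")
except ValueError:
    # dataset not found, so create it instead
    ds = Dataset.create(dataset_project="my_project", dataset_name="model_evals")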
Simpler than I had thought, thanks !
You can have parents as one of the @PipelineDecorator.component args. The step will be executed only after all the parents are executed and completed.
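A minimal sketch of how I read that (my assumption: parents takes the names of other component functions as strings; the steps and return values here are made up):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def prepare_data():
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["report"], parents=["prepare_data"])
def make_report():
    # runs only after prepare_data has completed
    return "done"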
Is there an example of using parents someplace? I'm not sure what to pass, and also how to pass a component from one pipeline that was just kicked off to execute remotely (which I'd like to block on) to a component of the next pipeline's run.
now trying with added lines as Alon suggested:
@PipelineDecorator.component(
    return_values=["run_model_path", "run_info"],
    cache=True,
    task_type=TaskTypes.training,
    repo="git@github.com:shpigi/clearml_evaluation.git",
    repo_branch="main",
    packages="./requirements.txt",
)
def train_image_classifier_component(
    clearml_dataset,
    backbone_name,
    image_resize: int,
    batch_size: int,
    run_model_uri,
    run_tb_uri,
    local_data_path,
    num_epochs: int,
)...
I had several pipeline components getting it and uploading files to it concurrently.
Can Datasets handle that?
Would you expect this fastai callback to work?
(Uses SummaryWriter):
https://github.com/fastai/fastai/blob/d7f4863f1ee3c0fa9f2d9feeb6a05f0625a53696/fastai/callback/tensorboard.py
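For comparison, this is the kind of plain SummaryWriter usage I'd expect ClearML to pick up automatically once Task.init has been called (a sketch; the task name and metric are placeholders):

from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="great project", task_name="tb_summarywriter_test")
writer = SummaryWriter(log_dir="./tb/run1")
for i in range(10):
    # ClearML should capture these scalars on the task automatically
    writer.add_scalar("some_metric", 42, i)
writer.close()
task.close()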
It seems to have failed as well (but I'd need to check more carefully)
uploads are a bit slow though (~4 minutes for 50mb)
My local environment has clearml version 1.6.3rc0, and agents in AWS were started with the AWS Autoscaler, which has no explicit place for Google credentials.
I see a place for Additional ClearML Configuration in the AWS Autoscaler UI which I suspect may help, but I don't see how I can pass a secrets file along with my agent.
Trying the AWS Autoscaler for the first time, I get this error on instance spin-up:

An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-04c0416d6bd8e4b1f]' does not exist

I tried both us-west-2 and us-east-1b (thinking it might be zone specific). I'm not sure if this is a permissions issue or a config issue.
The same occurs when I try a different image: ami-06bafe528da33cdb8 (an AWS public image).
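To narrow it down, this is the small check I have in mind (a sketch using boto3; the region and AMI id are the ones I tried, and my assumption is that a region mismatch surfaces as the same NotFound error, whereas a permissions problem would look different, e.g. UnauthorizedOperation):

import boto3
from botocore.exceptions import ClientError

# check whether the AMI is visible from this account in the target region
ec2 = boto3.client("ec2", region_name="us-west-2")
try:
    resp = ec2.describe_images(ImageIds=["ami-04c0416d6bd8e4b1f"])
    print(resp["Images"])
except ClientError as err:
    print(err)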
console output shows uploads of 500 files on every new dataset. The lineage is as expected: each additional upload is the same size as the previous ones (~50mb), and Dataset.get on the last dataset's ID retrieves all the files from the separate parts into one local folder.
Checking the remote storage location (gs://) shows artifact zip files, each with 500 files.
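For reference, this is how I'm pulling everything down (a sketch; the dataset id is a placeholder, and my understanding is that get_local_copy merges the files from all parent versions into one folder):

from clearml import Dataset

# fetch the last dataset version by id and materialize all files locally
ds = Dataset.get(dataset_id="<last dataset id>")
local_folder = ds.get_local_copy()
print(local_folder)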
It seems to be doing ok on the app side:
I didn't realise Datasets had tasks associated with them, but there is one and it seems to be doing ok.
I've attached its log file, which only mentions skipping one file (a warning).
or, barring that, something similar on AWS?
Sure. It is a minor change from the code in the clearml examples for pipelines.
I just repeat the last two pipeline steps from that code in a loop (x3)
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
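Concretely, the change is roughly this (a sketch; the step functions below are stand-ins for the ones defined in the linked example, and the loop is the only change I made):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def step_one():
    return list(range(10))

@PipelineDecorator.component(return_values=["model"])
def step_two(data):
    return sum(data)

@PipelineDecorator.component(return_values=["score"])
def step_three(model):
    return model * 2

@PipelineDecorator.pipeline(name="looped pipeline sketch", project="examples", version="0.0.1")
def executing_pipeline():
    data = step_one()
    for i in range(3):
        # repeat the last two steps of the original example in a loop (x3)
        model = step_two(data)
        score = step_three(model)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    executing_pipeline()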
feature request: tell me what gets passed along each edge of the pipeline graph
I'm looking for a minimal set of permissions because we have other sensitive ec2 instances running in the same account and our IT people are rightfully concerned about providing access to that account externally.
thanks. Seems like I was on the right path. Do datasets specified as parents need to be finalized ( https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk/#finalizing-a-dataset )?
This idea seems to work.
I tested this for a scenario where data is periodically added to a dataset and, to "version" the steps, I create a new dataset with the old as parent:
To do so, I split a set of image files into separate folders (pets_000, pets_001, ... pets_015), each with 500 image files
I then run the code here to make the datasets.
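The per-folder step looks roughly like this (a sketch; the project/dataset names and the folder are placeholders, and parent_datasets is how I chain each new version to the previous one):

from clearml import Dataset

# create a new dataset version whose parent is the previous version,
# add one more folder of files, then upload and finalize it
previous = Dataset.get(dataset_project="pets", dataset_name="pets")
new_version = Dataset.create(
    dataset_project="pets",
    dataset_name="pets",
    parent_datasets=[previous.id],
)
new_version.add_files("pets_001")
new_version.upload()
new_version.finalize()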