PanickyMoth78

34 Questions, 167 Answers

Active since 10 January 2023

Last activity 5 months ago

Reputation

Badges 1

166 × Eureka!

Questions 34
Answers 167

0 Votes

11 Answers

1K Views

0 Votes 11 Answers 1K Views

Hi. I Have A Few Questions About The Snippet Attached

Hi. I have a few questions about the snippet attached re-running this code produces the same printouts... I chose 47 out of 100 in the pipeline ... I chose 8...

clearml

2 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi. I'M Using

Hi. I'm using @PipelineDecorator.component to define a task from a function (to run in a pipeline) I'd like to get the task object within this function so th...

clearml

2 years ago

0 Votes

3 Answers

1K Views

0 Votes 3 Answers 1K Views

Hi. Looking Into Clearml Support For Datasets, I'D Like To Understand How To Work With Large Datasets And Cases Where Not All The Data Is Downloaded At Once. (E.G. 1. Each Training Epoch Is Performed On A (Preferably Random) Sample Of The Data That Is Dow

Hi. Looking into clearml support for datasets, I'd like to understand how to work with large datasets and cases where not all the data is downloaded at once....

clearml

2 years ago

0 Votes

7 Answers

1K Views

0 Votes 7 Answers 1K Views

I Have 5 Unarchived Pipeline Runs That Were Defined With This Decorator:

I have 5 unarchived pipeline runs that were defined with this decorator: @PipelineDecorator.pipeline( name="fastai_image_classification_pipeline", project="l...

clearml

2 years ago

Show more results

0 Hi. I'M Running This Little Pipeline:

2 years ago

0 Hi. I Have A Question About Pipelines And Their Generated Dependency Graphs. I Took The Code Of The Clearml Pipeline From Decorator Example:

Sure. It is a minor change from the code in the clearml examples for pipelines.
I just repeat the last two pipeline steps from that code in a loop (x3)
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py

2 years ago

0 Hi (Again... Sorry For Asking So Many Questions) Question About Using Google Cloud Storage In A Clearml Agent Running In Aws Ec2 Instance. My

I now get this error:
2022-07-18 21:51:29,168 - clearml.storage - ERROR - Failed creating storage object Reason: [Errno 2] No such file or directory: '~/gs.cred'
to be clear, I replaced <this is your GCP storage credentials file> with the contents of that file, escaping every " with a \" and removing newlines.

2 years ago

0 Hi. I Have A Question About Pipelines And Their Generated Dependency Graphs. I Took The Code Of The Clearml Pipeline From Decorator Example:

feature request: tell me what gets passed along each edge of the pipeline graph

2 years ago

0 Hi. Help

It seems to be doing ok on the app side:
I didn't realise Datasets had tasks associated with them but there is one and it seems to be doing ok.
I've attached it's log file which only mentions skipping one file (a warning)

2 years ago

0 Hi. I'M Encountering A Problem With

Ooh nice.
I wasn't aware task.models["output"] also acts like a dict.
I can get the one I care about in my code with something like task.models["output"]["best_model"]
however can you see the inconsistency between the key and the name there:

2 years ago

0 Hi (Again... Sorry For Asking So Many Questions) Question About Using Google Cloud Storage In A Clearml Agent Running In Aws Ec2 Instance. My

Thanks AgitatedDove14 for all the guidance.

2 years ago

0 Hi. I Have A Job That Processes Images And Creates ~5 Gb Of Processed Image Files (Lots Of Small Ones). At The End - It Creates A

That job was using clearml 1.8.3 so I take it that setting max_workers to 1 would not make a difference?
Looking at the docs:
https://clear.ml/docs/latest/docs/references/sdk/dataset/#upload
they say that max_workers = number of cores but looking at the log it does seem like it's doing one chunk every 5 minutes (long time for 500mb upload for a node running in gcp...)

2 years ago

0 Hi. Looking Into Clearml Support For Datasets, I'D Like To Understand How To Work With Large Datasets And Cases Where Not All The Data Is Downloaded At Once. (E.G. 1. Each Training Epoch Is Performed On A (Preferably Random) Sample Of The Data That Is Dow

cool. How can I get started with hyper datasets? is it part of the clearml package?
Is it limited to https://clear.ml/pricing/?gclid=Cj0KCQjw5ZSWBhCVARIsALERCvzehkqVOiqJPaum5fsVyyTNMKce91PBHZd1IhQpEFaKvV7toze2A_0aAgXXEALw_wcB accounts?

2 years ago

0 Bug?

hmm.
this isn't supported though:
dataset_args = dataset.connect(dataset_args)

2 years ago

0 Hi. Shoulf This Command Succeed In The Presence Of Project

That would be a better message however, I must have misunderstood the meaning of auto_create=True
I thought that flag made the get function into a "get-or-create"

2 years ago

0 Hi. I Have A Job That Processes Images And Creates ~5 Gb Of Processed Image Files (Lots Of Small Ones). At The End - It Creates A

I ran another version of the above code where
output_uri="./random_dataset_local_target"
(i.e. db target on local disk instead of gcp).
I still see large memory usage.
I also find it worrisome that while generating the random dataset and writing it to disk took under 3 minutes, generating the hash took 9 minutes and saving the files to a dataset target in an adjacent folder took 30 minutes (10 times longer than writing the original files)! Simply copying the files to an adjacent folde...

2 years ago

0 Hi. I'M Encountering A Problem With

I imagine that one workaround is to
Disable automatic model uploads Perform manual model upload (with the correct name).Can you point me to how to do these?

2 years ago

0 Hi. I Have A Few Questions About The Snippet Attached

Thanks,

Just to be clear, you are saying the "random" results are consistent over runs ?

yes !
By re-runs I mean re-running this script (not cloning the pipeline)

2 years ago

0 Hi. I Have A Few Questions About The Snippet Attached

multi_instance_support=True lets me run the pipeline again 👍
The second run prints out the same (non) "random" numbers as the first run

2 years ago

0 Hi. I Am Experimenting With

TimelyPenguin76 , this turned out to be the reason I was having locking issues https://clearml.slack.com/archives/CTK20V944/p1658761943458649 :
SweetBadger76 , CostlyOstrich36 : I've attempted essentially the same thing before https://clearml.slack.com/archives/CTK20V944/p1657124102133519 and I thought it had worked in the past so I'm not sure why it is failing me now.

2 years ago

0 Hi. I Am Experimenting With

I'm on clearml==1.6.3rc1

2 years ago

0 Hi. I Have A

Yes. I thought this happened automagically with the current git repo when I send a pipeline for execution from my local python environment. Shouldn't it?
It seems to have happened with the agent running the pipeline task.

I'll try adding repo and repo_branch to the pipeline.component decorator

2 years ago

0 Hi. Help

essentially, several running processes were performing:
model_evals_dataset = Dataset.get( dataset_project=dataset_project, dataset_name=f"model_evals", ) model_evals_dataset.add_files(run_eval_path) model_evals_dataset.upload()

2 years ago

0 Hello Community. I'D Like To Try The Aws Autoscaler (I Actually Prefer To Try The Gcp One But I Think It'S Broken Or, At Least, I'Ve Failed To Make It Work So Far) I Can'T Find Documentation On What Permissions Would Be Required From An Aws Sub-Account

trying the AWS Autoscaler for the first time I get his error on instance spin up:
An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-04c0416d6bd8e4b1f]' does not existI tried both us-west-2 and us-east-1b (thinking it might be zone specific).

I'm not sure if this is a permissions issue or a config issue.

The same occures when I try a different image:
ami-06bafe528da33cdb8
(an aws public image)

2 years ago

I'm on the pro tier

2 years ago

0 Hi. I Have A Question About Pipelines And Their Generated Dependency Graphs. I Took The Code Of The Clearml Pipeline From Decorator Example:

I think this should be a valid use of pipelines. for example - at some step I choose to sweep across several values of some parameter and the rest of the steps are duplicated for each value of that parameter.
The additional edges in the graph suggest that these steps somehow contain dependencies that I do not wish them to have.

2 years ago

0 I Have 5 Unarchived Pipeline Runs That Were Defined With This Decorator:

In fact, all my projects seems empty of tasks.

2 years ago

0 Hi. I Have A Few Questions About The Snippet Attached

perhaps anecdotal but just calling random.seed() will set the seed using the system time for you
https://docs.python.org/3/library/random.html#random.seed

2 years ago

0 Hi. I'D Like To Try The Gcp Autoscaler.

Trying to switch to a resources using gpu-enabled VMs failed with that same error above.
Looking at spawned VMs, they were spawned by the autoscaler without gpu even though I checked that my settings ( n1-standard-1 and nvidia-tesla-t4 and https://console.cloud.google.com/compute/imagesDetail/projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10?project=ml-tooling-test-external image for the VM) can be used to make vm instances and my gcp autoscaler...

2 years ago

0 Hi. I'D Like To Try The Gcp Autoscaler.

so..
I restarted the autoscaler with this configuration object:
` [{"resource_name": "cpu_default", "machine_type": "n1-standard-1", "cpu_only": true, "gpu_type": null, "gpu_count": 1, "preemptible": false, "num_instances": 5, "queue_name": "default", "source_image": "projects/ubuntu-os-cloud/global/images/ubuntu-1804-bionic-v20220131", "disk_size_gb": 100}, {"resource_name": "cpu_services", "machine_type": "n1-standard-1", "cpu_only": true, "gpu_type": null, "gpu_count": 1, "preemptible": fa...

2 years ago

0 Hi. I'D Like To Try The Gcp Autoscaler.

Here are screen shots of a VM I started with a gpu and one stared by the autoscaler with the setting above but whose GPU is missing (both in the zame gcp zone, us-central1-f ) . I may have misconfigured something or perhaps the autoscaler is failing to specify the GPU requirement correctly. :shrug:

2 years ago

0 Is There Some Built-In Way In Clearml To Trigger Further Action On Task Fail (Or Pipeline Fail)?

I suppose one way to perform this is with a https://clear.ml/docs/latest/docs/references/sdk/scheduler that kicks off a health check task (check exit state of executed tasks). It seems more efficient to support a triggered response to task fail.

2 years ago

0 Hi. I Have A Few Questions About The Snippet Attached

Re
re-running this code produces the same printoutsI guess repeatable behaviour is a great default to have for, well, repeatability 🙂

I'm able to "randomize" my results by adding a seed pipeline argument and calling random.seed(seed)
within the pipeline and component. Results then change with change of seed.

I think most veteran ML practitioners are bitten at some point by randomising when they shouldn't and not randomising when they should. It would be nice to have some docu...

2 years ago

Just updating here that I got the AWS autoscaler working with CostlyOstrich36 ’s generous help 🎉

I thought I'd share here some details in case others experience similar difficulties

With regards to permissions, this is the list of actions that the autoscaler uses which your aws account would need to permit:
GetConsoleOutput RequestSpotInstances DescribeSpotInstanceRequests RunInstances DescribeInstances TerminateInstances DescribeInstancesthe instance image ` ami-04c0416d6bd8e...

2 years ago

Show more results