PanickyMoth78
Moderator
33 Questions, 165 Answers
  Active since 10 January 2023
  Last activity one month ago

0 Hi. I'M Encountering A Problem With

BTW:

If I try to find the right model in the task.models["output"] (this time there is just one but in my code there may be several) it appears with the (see other attached screenshot).

What would make sense here ? (I have to be honest I'm not sure).

If the model was saved with a file name (is that the trigger for auto-upload?), I think it makes sense for the model name to match the file name (not the task name), especially when there may be ...

one year ago
0 Bug?

I don't mind assigning to the task the same name that I'd assign to the dataset. I just think that the create function should expect dataset_name to be None in the case of use_current_task=True (or allow the dataset name to differ from the task name)

one year ago
0 Hi. I Have A Job That Processes Images And Creates ~5 Gb Of Processed Image Files (Lots Of Small Ones). At The End - It Creates A

Q: is there an equivalent env var for sdk.google.storage.pool_connections/pool_maxsize ? My jobs are running remotely and not within a clearml agent at the moment so they get clearml config through env vars.
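One possible workaround when only environment variables are available, sketched under assumptions: ClearML reads the location of its configuration file from the CLEARML_CONFIG_FILE environment variable, so a job could write a small config file at startup that holds just the sdk.google.storage keys and point that variable at it before importing clearml. The helper name write_storage_config, the pool values and the temporary directory are illustrative only, and this sketch does not settle whether a direct per-key environment variable exists for these two settings.

```python
import os
import tempfile
from pathlib import Path


def write_storage_config(pool_connections: int, pool_maxsize: int, directory: str) -> Path:
    """Write a minimal clearml.conf holding only the Google storage pool settings.

    Hypothetical helper: the key names come from the question above,
    the values passed in are placeholders to tune.
    """
    config_text = (
        "sdk {\n"
        "  google.storage {\n"
        f"    pool_connections: {pool_connections}\n"
        f"    pool_maxsize: {pool_maxsize}\n"
        "  }\n"
        "}\n"
    )
    config_path = Path(directory) / "clearml.conf"
    config_path.write_text(config_text)
    return config_path


config_dir = tempfile.mkdtemp(prefix="clearml_conf_")
config_path = write_storage_config(
    pool_connections=512, pool_maxsize=1024, directory=config_dir
)

# Set this before `import clearml` so the SDK can pick the file up
# when it loads its configuration.
os.environ["CLEARML_CONFIG_FILE"] = str(config_path)
```

Credentials and hosts supplied through the usual CLEARML_API_* variables would presumably still apply alongside the file, but that interaction is worth checking on a test job first.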

one year ago
0 Hi. I Have A Job That Processes Images And Creates ~5 Gb Of Processed Image Files (Lots Of Small Ones). At The End - It Creates A

I tried playing with those parameters on my laptop to no great effect.

Here is code you can use to reproduce the issue:

import os
from pathlib import Path
from tqdm import tqdm
from clearml import Dataset, Task


def dataset_upload_test(project_id: str, bucket_name: str):
    def _random_file(fpath, sizekb):
        # Write sizekb kilobytes of random bytes to fpath.
        fileSizeInBytes = 1024 * sizekb
        with open(fpath, "wb") as fout:
            fout.write(os.urandom(fileSizeInBytes))

    def random_dataset(dataset_path, num_files, file...
one year ago
0 Hi. I Have A Job That Processes Images And Creates ~5 Gb Of Processed Image Files (Lots Of Small Ones). At The End - It Creates A

That job was using clearml 1.8.3 so I take it that setting max_workers to 1 would not make a difference?
Looking at the docs:
https://clear.ml/docs/latest/docs/references/sdk/dataset/#upload
they say that max_workers defaults to the number of cores, but looking at the log it does seem like it's doing one chunk every 5 minutes (a long time for a 500 MB upload from a node running in GCP...)

one year ago
0 Hi. I Have A Job That Processes Images And Creates ~5 Gb Of Processed Image Files (Lots Of Small Ones). At The End - It Creates A

I ran another version of the above code where
output_uri="./random_dataset_local_target"
(i.e. db target on local disk instead of gcp).
I still see large memory usage.
I also find it worrisome that while generating the random dataset and writing it to disk took under 3 minutes, generating the hash took 9 minutes and saving the files to a dataset target in an adjacent folder took 30 minutes (10 times longer than writing the original files)! Simply copying the files to an adjacent folde...
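To put numbers like these in context, a plain-Python baseline can help: the sketch below writes random files, hashes them and copies them to an adjacent folder, timing each stage with no ClearML involved. The file count and size are small placeholders to scale up, the helper names are made up for illustration, and SHA-256 is chosen here for convenience rather than because it is known to be what the SDK uses.

```python
import hashlib
import os
import shutil
import tempfile
import time
from pathlib import Path


def make_random_files(root: Path, num_files: int, size_kb: int) -> None:
    """Create num_files files of size_kb kilobytes of random bytes under root."""
    root.mkdir(parents=True, exist_ok=True)
    for i in range(num_files):
        (root / f"file_{i:05d}.bin").write_bytes(os.urandom(1024 * size_kb))


def hash_tree(root: Path) -> dict:
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


workdir = Path(tempfile.mkdtemp(prefix="baseline_"))
source, target = workdir / "source", workdir / "copy"

_, write_s = timed(make_random_files, source, 200, 16)
hashes, hash_s = timed(hash_tree, source)
_, copy_s = timed(shutil.copytree, source, target)

print(f"write: {write_s:.2f}s  hash: {hash_s:.2f}s  copy: {copy_s:.2f}s")
```

If hashing and copying the same files take seconds here but minutes through the dataset API, that points at per-file overhead in the upload path rather than at the disk.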

one year ago
0 Bug?

I have a task where I create a dataset but I also create a set of matplotlib figures, some numeric statistics and a pandas table that describe the data, which I wish to have associated with the dataset and viewable from the clearml web page for the dataset.

one year ago
0 Bug?

I was doing it with the task that I had been using. Mostly for logging arguments that control what the dataset will contain.

one year ago
0 Bug?

hmm.
this isn't supported though:
dataset_args = dataset.connect(dataset_args)

one year ago
0 Bug?

Yeah. I was only using the task for the process of creating the dataset.

My code does start out with a step that checks for the existence of the dataset, returning it if it exists (search by project name/dataset name/version) rather than recreating it.
I noticed the name mismatch when that check kept failing me...

I think that init-ing the encompassing task with the relevant dataset name still allows me to search for the dataset by dataset_name=task_name / project_name (shared by both datas...

one year ago
0 Bug?

here is what I do:
try:
    dataset = Dataset.get(
        dataset_project=bucket_name,
        dataset_name=dataset_name,
        dataset_version=dataset_version,
    )
    print(
        f"dataset found {dataset.project}/{dataset.name} v{dataset.version}\n(id: {dataset.id})"
    )
    return dataset
except ValueError:
    pass

task = Task.current_task()
if task is None:
    task = Task.init(
        project_name=bucket_name,...
one year ago
0 Task Struck At

I mean that it was uploading console logs, scalar plots and images fine just a while ago, and then it seems to have stopped uploading all scalar plot metrics and the figures, but log upload was still fine.

Anyway, it is back to working properly now without any code change (as far as I can tell. I tried commenting out a line or two and then brought them all back)

If I end up with something reproducible I'll post here.

one year ago
0 Task Struck At

no retry messages
CLEARML_FILES_HOST is gs
CLEARML_API_HOST is a self hosted clearml server (in google compute engine).

Note that earlier in the process the code uploads a dataset just fine

one year ago
0 Task Struck At

there may have been some interaction between the training task and a preceding dataset creation task :shrug:

one year ago
0 Hi. Question About Dataset Upload Errors: When Uploading A

If Dataset.upload() does not crash or return a success value that I can check and

Are you saying that with this error showing upload data does not crash?

Unfortunately that is correct. It continues as if nothing happened!

To replicate this in Linux (even with max_workers=1):
https://averagelinuxuser.com/limit-bandwidth-linux/ to throttle your connection: sudo apt-get install wondershaper
Throttle your connection to 1mb/s with somethin...

one year ago
0 Hi. Question About Dataset Upload Errors: When Uploading A

Thanks AgitatedDove14
setting max_workers to 1 prevents the error (but, I assume, it may come at the cost of slower sequential uploads).

My main concern now is that this may happen within a pipeline leading to unreliable data handling.

If Dataset.upload() does not crash or return a success value that I can check and if Dataset.get_local_copy() also does not complain as it retrieves partial data - how will I ever know that I lost part of my dataset?
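One way to guard against this from the outside, offered as a rough sketch rather than a ClearML feature: record a manifest of relative paths and SHA-256 digests from the source folder before uploading, then rebuild the manifest from the folder returned by Dataset.get_local_copy() and compare the two. The function names below are hypothetical, and the demo simulates a lost file on local disk instead of calling ClearML.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: str) -> dict:
    """Map each file's path relative to root to its SHA-256 digest."""
    root_path = Path(root)
    manifest = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with open(path, "rb") as fin:
                # Read in 1 MiB blocks so large files are not loaded whole.
                for block in iter(lambda: fin.read(1 << 20), b""):
                    digest.update(block)
            manifest[path.relative_to(root_path).as_posix()] = digest.hexdigest()
    return manifest


def compare_manifests(expected: dict, actual: dict) -> dict:
    """Report files that are missing, unexpected, or have a different digest."""
    shared = expected.keys() & actual.keys()
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "corrupted": sorted(k for k in shared if expected[k] != actual[k]),
    }


def assert_complete(expected: dict, local_copy_dir: str) -> None:
    """Raise if the retrieved copy lacks or alters any expected file."""
    report = compare_manifests(expected, build_manifest(local_copy_dir))
    if report["missing"] or report["corrupted"]:
        raise RuntimeError(f"dataset copy does not match the source: {report}")


# Demo on local disk: delete one file to simulate a partial upload.
demo_root = Path(tempfile.mkdtemp(prefix="manifest_demo_"))
(demo_root / "a.bin").write_bytes(b"first")
(demo_root / "b.bin").write_bytes(b"second")
expected = build_manifest(str(demo_root))

(demo_root / "b.bin").unlink()
report = compare_manifests(expected, build_manifest(str(demo_root)))
print(report)
```

In a pipeline, the manifest would be built from the folder passed to add_files and assert_complete would run on the path returned by get_local_copy(), so a silently truncated upload fails the step. Files inherited from parent datasets would show up under "unexpected", which is why only "missing" and "corrupted" raise here.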

one year ago
0 Hi. Question About Dataset Upload Errors: When Uploading A

I have google-cloud-storage==2.6.0 installed

one year ago
0 Hi. I'M Encountering A Problem With

Right. Thanks.
With several models saved by the training process (whose code is not task-aware) I suspect that doing the update call after training completed will only update the last of the uploaded models.
I'm currently looking at a workaround where:
I disable auto saving by https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk/#automatic-logging
Manually upload the models
Manually register the models with https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345b...

one year ago
0 Is There Some Built-In Way In Clearml To Trigger Further Action On Task Fail (Or Pipeline Fail)?

There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes)

one year ago
0 Hi. Help

essentially, several running processes were performing:
model_evals_dataset = Dataset.get(
    dataset_project=dataset_project,
    dataset_name=f"model_evals",
)
model_evals_dataset.add_files(run_eval_path)
model_evals_dataset.upload()

one year ago
0 I Started Two Pipelines (Using Aws Autoscaler In App.Clear.Ml ). The Pipelines Ran Concurrently, Using The Same Pipeline Code. Both Failed In The Same Component Half-Way Though The Pipeline Run With:

Hey Alon,
See
https://clearml.slack.com/archives/CTK20V944/p1658892624753219
I was able to isolate this as a bug in clearml 1.6.3rc1 that can be reproduced outside of a task / app simply by doing get_local_copy() on a dataset with parents.

one year ago
0 I Started Two Pipelines (Using Aws Autoscaler In App.Clear.Ml ). The Pipelines Ran Concurrently, Using The Same Pipeline Code. Both Failed In The Same Component Half-Way Though The Pipeline Run With:

switching back to version 1.6.2 cleared this issue (but re-introduced others for which I have been using the release candidate)

one year ago
0 Another Question On The Topic Of How A Remote Execution Of A Pipeline Kills The Calling Process (Previously Discussed

You can have parents as one of the @PipelineDecorator.component args. The step will be executed only after all the parents are executed and completed

Is there an example of using parents someplace? I'm not sure what to pass, or how to pass a component from one pipeline that was just kicked off to execute remotely (which I'd like to block on) to a component of the next pipeline's run

one year ago