would setting the max_workers to 1 be a (slower) workaround?
AgitatedDove14
Adding repo and repo_branch to the pipeline.component decorator worked (and I can move on to my next issue 🙂).
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
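For reference, this is roughly what the decorator ended up looking like (paraphrased and untested; the component name and the other arguments are made up, only repo / repo_branch are the point):
```
from clearml.automation.controller import PipelineDecorator

# passing repo / repo_branch explicitly so the component task clones the same
# repository as the pipeline task (component name below is made up)
@PipelineDecorator.component(
    return_values=["result"],
    repo="git@github.com:shpigi/clearml_evaluation.git",
    repo_branch="main",
)
def some_step(x):
    # the component body runs on the agent, inside the cloned repo
    return x
```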
sort of. Though it seems like the rules for model.name can be a bit non-obvious.
I think that the first model saved gets the task name as its name and the following models take f"{task_name} - {file_name}"
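If it helps, here's a rough sketch of how I'd pin the names explicitly instead of relying on that auto-naming (untested; assumes OutputModel's name / update_weights arguments, and the project / file names are made up):
```
from clearml import Task, OutputModel

task = Task.init(project_name="my_project", task_name="train")  # names made up

# give the output model an explicit name instead of the
# "{task_name} - {file_name}" default
model = OutputModel(task=task, name="encoder", framework="PyTorch")
model.update_weights(weights_filename="checkpoints/encoder.pt")
```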
Hey Alon,
See
https://clearml.slack.com/archives/CTK20V944/p1658892624753219
I was able to isolate this as a bug in clearml 1.6.3rc1 that can be reproduced outside of a task / app simply by doing get_local_copy() on a dataset with parents.
perhaps anecdotal but just calling random.seed() will set the seed using the system time for you
https://docs.python.org/3/library/random.html#random.seed
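i.e. something like:
```
import random

random.seed()    # no argument: seeded from an OS randomness source, or the system time
print(random.random())

random.seed(42)  # explicit seed: reproducible sequence
print(random.random())
```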
also, whereas the pipeline agent's log has:
Executing task id [7a0ad1fb243a4ff3b9e6c477442ded4a]:
repository = git@github.com:shpigi/clearml_evaluation.git
branch = main
version_num = e045904094cf2f4fa61ce92f7b91682f5de64ab8
the component agent's log has:
Executing task id [90de043e354b4b28a84d5cc0788fe63c]:
repository =
branch =
version_num =
Trying to switch to a resource that uses gpu-enabled VMs failed with that same error above.
Looking at the spawned VMs, the autoscaler spawned them without a gpu, even though I checked that my settings (n1-standard-1 with nvidia-tesla-t4, and the https://console.cloud.google.com/compute/imagesDetail/projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10?project=ml-tooling-test-external image for the VM) can be used to create VM instances, and my gcp autoscaler...
There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes)
Hmm interesting, so like a callback?!
like https://github.com/allegroai/clearml/blob/bca9a6de3095f411ae5b766d00967535a13e8401/examples/pipeline/pipeline_from_tasks.py#L54-L55 pipe-step level callbacks? I guess that mechanism could serve. Where do these callbacks run? In the instantiating process? If so, that would work (since the callback function can be any code I wish, right?)
I might want to dispatch other jobs from within the same process.
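For reference, my reading of those step-level callbacks from the linked example (a sketch only, names shortened; my understanding is that returning False from the pre-callback skips the step):
```
from clearml import PipelineController

def pre_execute_callback(pipeline, node, param_override):
    # runs in the controller process right before the step is launched;
    # return False to skip the step
    print(f"launching {node.name}")
    return True

def post_execute_callback(pipeline, node):
    # runs in the controller process after the step completes, so it could
    # dispatch other jobs, send notifications, etc.
    print(f"finished {node.name}")

pipe = PipelineController(name="my_pipeline", project="examples", version="0.0.1")
pipe.add_step(
    name="stage_process",
    base_task_project="examples",
    base_task_name="pipeline step 2 process dataset",
    pre_execute_callback=pre_execute_callback,
    post_execute_callback=post_execute_callback,
)
```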
This is actually something t...
another weird thing:
Before my training task is done, print(task.models['output'].keys()) outputs odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
after task.close()
I can do:
task = Task.get_task(task_id)
for i in range(100):
    print(task.models["output"].keys())
which prints odict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2']) in the first iteration
and prints the file names in the latter iterations:
` od...
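To check what's actually behind those names, I'd probably try something like this (a sketch; assumes task.models["output"] iterates over Model objects):
```
from clearml import Task

task = Task.get_task(task_id="<training_task_id>")  # placeholder id
output_models = task.models["output"]
print(output_models.keys())      # the display names, as above
for m in output_models:          # assuming iteration yields Model objects
    print(m.name, m.url)         # display name vs. the stored weights file
```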
Oh sure, use
they will be visible on the Dataset page on the version in question
That sounds simple enough.
Though I imagine I'd need to explicitly report every figure. Correct?
I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics, and a pandas table that describe the data, which I wish to have associated with the dataset and viewable from the clearml web page for the dataset.
here is what I do:
```
try:
    dataset = Dataset.get(
        dataset_project=bucket_name,
        dataset_name=dataset_name,
        dataset_version=dataset_version,
    )
    print(
        f"dataset found {dataset.project}/{dataset.name} v{dataset.version}\n(id: {dataset.id})"
    )
    return dataset
except ValueError:
    pass
task = Task.current_task()
if task is None:
    task = Task.init(
        project_name=bucket_name,...
```
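...and then, for attaching the figures / stats, what I have in mind is roughly this (a sketch, assuming Dataset.get_logger() exposes the usual Logger reporting calls; df and the column name are placeholders):
```
import matplotlib.pyplot as plt
import pandas as pd
from clearml import Dataset

df = pd.DataFrame({"some_column": [1, 2, 2, 3]})   # placeholder data

dataset = Dataset.get(dataset_project=bucket_name, dataset_name=dataset_name)
logger = dataset.get_logger()   # assumption: reports end up on the dataset's page

fig, ax = plt.subplots()
ax.hist(df["some_column"])
logger.report_matplotlib_figure(
    title="distribution", series="some_column", figure=fig, iteration=0
)
logger.report_table(title="summary", series="describe", iteration=0, table_plot=df.describe())
logger.report_scalar(title="stats", series="num_rows", value=len(df), iteration=0)
```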
I can try switching to gpu-enabled machines just to see if that path can be made to work, but the services queue shouldn't need a gpu, so I hope we can figure out running the pipeline task on cpu nodes.
Something else that I feel is missing from the docs regarding pipelines, as someone who has given kubeflow pipelines a try (in the http://vertex.ai pipelines environment), is some explanation of how functions become pipelines and components.
More specifically, I've learned to watch out for kubeflow pipeline code which is run at definition time (at compilation time, to be more accurate) instead of at pipeline execution time.
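To illustrate the distinction I mean, with the clearml decorators (just my understanding, a minimal sketch):
```
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["n"])
def add_one(x):
    # runs at pipeline *execution* time, as its own task (possibly on an agent)
    return x + 1

@PipelineDecorator.pipeline(name="demo", project="examples", version="0.0.1")
def pipeline_logic():
    # this body is the pipeline's control logic; it also runs when the pipeline runs,
    # rather than at a separate "compile"/definition step as in kubeflow
    n = add_one(1)
    print(n)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run controller and steps locally for illustration
    pipeline_logic()
```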
This whole experiment with random numbers started as my attempt ...
