Reputation
Badges 1
166 × Eureka!in order for the autoscaler to access your git , in the wizard you have to provide the git user/token
git_pass has the token
Perhaps I should have mentined that I start the AWS autoscaler with the https://app.clear.ml/applications/aws-autoscaler/ .
Hmm, how does the decorator of the component looks like ? meaning did you specify a repo/branch/commit there
Neither my pipeline decorator not my component specify any repos:
` # pipeline
@PipelineDecorator.pipeline(
name=...
Note that if I change the component to return a regular meaningless string - "mock_path" , the pipeline completes rather quickly and the dataset is not uploaded.
Two values:
`
@PipelineDecorator.component(
return_values=["run_model_path", "run_tb_path"],
cache=False,
task_type=TaskTypes.training,
packages=[
"clearml",
"tensorboard_logger",
"timm",
"fastai",
"torch==1.11.0",
"torchvision==0.12.0",
"protobuf==3.19.*",
"tensorboard",
"google-cloud-storage>=1.13.2",
],
repo="git@github.com:shpigi/clearml_evaluation.git",
repo_branch="main",
)
def train_ima...
(I see the same thing in some evaluation code that I've written so I thought I'd reproduce it in the standard example)
I get the same error with those added lines
would setting the max_workers to 1 be a (slower) workaround?
AgitatedDove14
Adding adding repo and repo_branch to the pipeline.component decorator worked (and I can move on to my next issue 🙂 ).
I'm still unclear on why cloning the repo in use happens automatically for the pipeline task and not for component tasks.
sort of. Though it seems like the rules for model.name can be a bit non-obvious.
I think that the first model saved gets the task name as its name and the following models take f"{task_name} - {file_name}"
Hey Alon,
See
https://clearml.slack.com/archives/CTK20V944/p1658892624753219
I was able to isolate this as a bug in clearml 1.6.3rc1 that can be reproduced outside of a task / app simply be doing get_local_copy() on a dataset with parents.
perhaps anecdotal but just calling random.seed() will set the seed using the system time for you
https://docs.python.org/3/library/random.html#random.seed
also, whereas the pipeline agent's log has:Executing task id [7a0ad1fb243a4ff3b9e6c477442ded4a]: repository = git@github.com:shpigi/clearml_evaluation.git branch = main version_num = e045904094cf2f4fa61ce92f7b91682f5de64ab8
The component agent's log has:Executing task id [90de043e354b4b28a84d5cc0788fe63c]: repository = branch = version_num =
I have a task where I create a dataset but I also create a set of matplotlib figures, some numeric statistics and a pandas table that describe the data which I wish to have associated with the dataset and vieawable from the clearml web page for the dataset.
Trying to switch to a resources using gpu-enabled VMs failed with that same error above.
Looking at spawned VMs, they were spawned by the autoscaler without gpu even though I checked that my settings ( n1-standard-1 and nvidia-tesla-t4 and https://console.cloud.google.com/compute/imagesDetail/projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10?project=ml-tooling-test-external image for the VM) can be used to make vm instances and my gcp autoscaler...
There may be cases where failure occurs before my code starts to run (and, perhaps, after it completes)
Hmm interesting, so like a callback?!
like https://github.com/allegroai/clearml/blob/bca9a6de3095f411ae5b766d00967535a13e8401/examples/pipeline/pipeline_from_tasks.py#L54-L55 pipe-step level callbacks? I guess that mechanism could serve. Where do these callbacks run? In the instantiating process? If so, that would work (since the callback function can be any code I wish, right?)
I might want to dispatch other jobs from within the same process.
This is actually something t...
another weird thing:
Before my training task is done:print(task.models['output'].keys())outputsodict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])
after task.close()
I can do:task = Task.get_task(task_id) for i in range(100): print(task.models["output"].keys())which printsodict_keys(['Output Model #0', 'Output Model #1', 'Output Model #2'])in the first iteration
and prints the file names in the latter iterations:
` od...
