Unanswered
I Started Two Pipelines (Using Aws Autoscaler In App.Clear.Ml ). The Pipelines Ran Concurrently, Using The Same Pipeline Code. Both Failed In The Same Component Half-Way Though The Pipeline Run With:
start a training task. From what I can tell from the console log, the agent hasn't actually started running the component.
This is the component code. It is a wrapper around a non-component training function
` @PipelineDecorator.component(
return_values=["run_model_path", "run_info"],
cache=True,
task_type=TaskTypes.training,
repo="git@github.com:shpigi/clearml_evaluation.git",
repo_branch="main",
packages="./requirements.txt",
)
def train_image_classifier_component(
clearml_dataset,
backbone_name,
image_resize: int,
batch_size: int,
run_model_uri,
run_tb_uri,
local_data_path,
num_epochs: int,
):
import sys
sys.path.insert(0, "/src/clearml_evaluation/")
from image_classifier_training import training_functions
return training_functions.train_image_classifier(
clearml_dataset,
backbone_name,
image_resize,
batch_size,
run_model_uri,
run_tb_uri,
local_data_path,
num_epochs,
) `
179 Views
0
Answers
2 years ago
one year ago
Tags