Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Hello! I'M Trying To Figure Out How To Deploy A Scheduled Pipeline. I Have A Sample Pipeline Here

I'm trying to figure out how to deploy a scheduled pipeline.
I have a sample pipeline here

###------EXAMPLE FROM GITHUB------
from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes

# Make the following function an independent pipeline component step
# notice all package imports inside the function will be automatically logged as
# required packages for the pipeline execution step
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_url: str, extra: int = 43):
    # make sure we have scikit-learn for this step, we need it to use to unpickle the object
    import sklearn  # noqa
    import pickle
    import pandas as pd
    from clearml import StorageManager
    local_iris_pkl = StorageManager.get_local_copy(remote_url=pickle_data_url)
    with open(local_iris_pkl, 'rb') as f:
        iris = pickle.load(f)
    data_frame = pd.DataFrame(iris['data'], columns=iris['feature_names'])
    data_frame.columns += ['target']
    data_frame['target'] = iris['target']
    return data_frame

    return_values=['X_train, X_test, y_train, y_test'], cache=True, task_type=TaskTypes.data_processing
def step_two(data_frame, test_size=0.2, random_state=42):
    # make sure we have pandas for this step, we need it to use the data_frame
    import pandas as pd  # noqa
    from sklearn.model_selection import train_test_split
    y = data_frame['target']
    X = data_frame[(c for c in data_frame.columns if c != 'target')]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state

    return X_train, X_test, y_train, y_test

@PipelineDecorator.component(return_values=['model'], cache=True, task_type=TaskTypes.training)
def step_three(X_train, y_train):
    # make sure we have pandas for this step, we need it to use the data_frame
    import pandas as pd  # noqa
    from sklearn.linear_model import LogisticRegression
    model = LogisticRegression(solver='liblinear', multi_class='auto')
    model.fit(X_train, y_train)
    return model

@PipelineDecorator.component(return_values=['accuracy'], cache=True, task_type=TaskTypes.qc)
def step_four(model, X_data, Y_data):
    from sklearn.linear_model import LogisticRegression  # noqa
    from sklearn.metrics import accuracy_score
    Y_pred = model.predict(X_data)
    return accuracy_score(Y_data, Y_pred, normalize=True)

# The actual pipeline execution context
# Only when a return value is used, the pipeline logic will wait for the component execution to complete
@PipelineDecorator.pipeline(name='custom pipeline logic', project='jean_clearml_demo', version='0.0.5')
def executing_pipeline(pickle_url, mock_parameter='mock'):
    print('pipeline args:', pickle_url, mock_parameter)

    # Use the pipeline argument to start the pipeline and pass it ot the first step
    print('launch step one')
    data_frame = step_one(pickle_url)

    print('launch step two')
    X_train, X_test, y_train, y_test = step_two(data_frame)

    print('launch step three')
    model = step_three(X_train, y_train)

    print('returned model: {}'.format(model))

    print('launch step four')
    accuracy = 100 * step_four(model, X_data=X_test, Y_data=y_test)

    # we actually deserialize the object from the fourth step, and thus wait for the fourth step to complete.

if __name__ == '__main__':
    # set the pipeline steps default execution queue (per specific step we can override it with the decorator)
    # PipelineDecorator.set_default_execution_queue('default')
    # Run the pipeline steps as subprocesses on the current machine, great for local executions
    # (for easy development / debugging, use `PipelineDecorator.debug_pipeline()` to execute steps as regular functions)

    # Start the pipeline execution logic.

    print('process completed')

And the scheduler script here

"""Schedules for ClearML"""
from clearml import Task
from clearml.automation import TaskScheduler

# Create the scheduler
scheduler = TaskScheduler(
    force_create_task_name='sample schedule'

# Get the task that we want to rerun
task_to_schedule = Task.get_task(project_name="jean_clearml_demo", task_name='custom pipeline logic')

# Add the to run the pipeline, default timezone is UTC, 2 minutes just for demo

# Start the scheduler

I set up an agent using clearml-agent daemon --queue default
What I'm unclear about is how to set up the agent remotely, so that it runs within the server, even when my machine is shut down

Posted 9 months ago
Votes Newest

Answers 9

Hi @<1603560525352931328:profile|BeefyOwl35> , what do you mean set up the agent remotely? You run the agent on your machine and want the agent to run when it's shut down?

Posted 9 months ago

Sorry, still new to how this environment works; would this then work if I deployed it, say from an EC2 instance rather than my machine?

Posted 9 months ago

It would work from your machine as well, but the machine needs to be turned on... like when an ec2 instance that is running.

Posted 9 months ago

But... if the machine is shut down, how do you expect the agent to run?

Posted 9 months ago

normally, you should have a agent running behind a "services" queue, as part of your docker-compose. You just need to make sure that you populate the appropriate configuration on the Server (aka set the right environment variable for the docker services)
That agent will run as long as your self-hosted server is running

Posted 9 months ago

Hi @<1523701070390366208:profile|CostlyOstrich36> yes, this is what I'd want to do. Would this be the right way to do it?

Posted 9 months ago

The agent is basically a daemon process that is sitting on any machine and is capable of running jobs. You can set it up on any machine you would like but the machine has to be turned on...

Posted 9 months ago

@<1603560525352931328:profile|BeefyOwl35> You would usually run the agent in a remote machine. Are you using a self-deployed ClearML server?

Posted 9 months ago

Yes @<1523701087100473344:profile|SuccessfulKoala55> , a self deployed server

Posted 9 months ago
9 Answers
9 months ago
9 months ago
Similar posts