Additionally, I have the following error now:
` 2022-08-10 19:53:25,366 - clearml.Task - INFO - Waiting to finish uploads
2022-08-10 19:53:36,726 - clearml.Task - INFO - Finished uploading
Traceback (most recent call last):
File "/home/zanini/repo/RecSys/src/dataset/backtest.py", line 186, in <module>
backtest = run_backtest(
File "/home/zanini/repo/RecSys/.venv/lib/python3.9/site-packages/clearml/automation/controller.py", line 3329, in internal_decorator
a_pipeline.stop()
File...
` import importlib
import argparse
from datetime import datetime
import pandas as pd
from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes, Task
@PipelineDecorator.component(
    return_values=['model', 'features_to_build']
)
def get_model_and_features(task_id, model_type):
    from clearml import Task
    import sys
    sys.path.insert(0,'/home/zanini/repo/RecSys')
    from src.dataset.backtest import load_model
    task = Task.get_task(task_id=task_i...
Should work as long as they are in the same file. You can, however, launch and wait for any Task (see pipelines from tasks).
Do I call it as a normal function, as with the others, or do I need to import it? (My initial problem was actually that it was not finding the other function as a pipeline component, so I thought it was not able to import it as a secondary sub-component.)
Apparently the error comes up when I try to access the pipeline component load_model from get_model_and_features. If it is not set as a pipeline component and is used only as a helper function, it works (provided it is declared before the components that call it; I already understood and fixed that, unlike in the code I sent above).
That's the script that produces the error. You can also observe the struggle with importing the load_model function. (Any tips on best practices to structure the pipeline are also gladly accepted)
I noticed that when a pipeline step returns an instance of a class, it tries to pickle it. I am currently facing an issue where it is not able to pickle the output of the "load_baseline_model" function.
` Traceback (most recent call last):
File "/tmp/tmpqr2zwiom.py", line 37, in <module>
task.upload_artifact(name=name, artifact_object=artifact)
File "/home/zanini/repo/RecSys/.venv/lib/python3.9/site-packages/clearml/task.py", line 1877, in upload_artifact
return self._artifacts_man...
It works if I use it as a helper function, but not as a component (using the decorator).
It is an instance of a custom class.
But how do I link it to the specific task so that it is listed as an artifact?
The error comes out after the execution of the component backtest_prod
Apparently I found a solution:
dataset_zip = dataset._task.artifacts['data'].get()
will return the path to the zip file containing all the files (which will be downloaded to the local machine).
After that:
import zipfile
zip_file = zipfile.ZipFile(dataset_zip, 'r')
files = zip_file.namelist()
retrieves the names of the files.
Unzip using
import os
os.system(f'unzip {dataset_zip}')  # in this case to your script directory
and, using the files list, one can then open them selectively.
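The steps above can also be done with the standard library alone, without shelling out to unzip. This is only a minimal sketch: the helper names list_zip_members and extract_selected are hypothetical, and a small stand-in zip is created locally so the example runs on its own. In the real case the path would be whatever dataset._task.artifacts['data'].get() returned.

```python
import os
import tempfile
import zipfile


def list_zip_members(zip_path):
    """Return the names of all files stored in the zip archive."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        return zf.namelist()


def extract_selected(zip_path, wanted, target_dir):
    """Extract only the members listed in `wanted`; return the extracted paths."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        return [zf.extract(name, path=target_dir) for name in wanted]


# Stand-in for the downloaded artifact (assumption: the real path comes from
# dataset._task.artifacts['data'].get(), as described in the post above).
workdir = tempfile.mkdtemp()
dataset_zip = os.path.join(workdir, "data.zip")
with zipfile.ZipFile(dataset_zip, "w") as zf:
    zf.writestr("train.csv", "a,b\n1,2\n")
    zf.writestr("test.csv", "a,b\n3,4\n")

files = list_zip_members(dataset_zip)
# Open the files selectively: here only those whose name starts with "train".
train_paths = extract_selected(
    dataset_zip, [name for name in files if name.startswith("train")], workdir
)
```

Extracting by name avoids unpacking the whole archive into the script directory when only a few files are needed.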
Could you supply any reference of this dataset containing other datasets? I might have skipped that when reading the documentation, but I do not recall seeing this functionality.
UnsightlyHorse88, do you know?
` from importlib.machinery import EXTENSION_SUFFIXES
import catboost
from clearml import Task, Logger, Dataset
import lightgbm as lgb
import numpy as np
import pandas as pd
import dask.dataframe as dd
import matplotlib.pyplot as plt
MODELS = {
    'catboost': {
        'model_class': catboost.CatBoostClassifier,
        'file_extension': 'cbm'
    },
    'lgbm': {
        'model_class': lgb.LGBMClassifier,
        'file_extension': 'txt'
    }
}
class ModelTrainer():
    def init(sel...
I simplified it a little and removed private parameters, but that's pretty much the code. We did not try with toy examples, since that was already done with the example pipelines when we implemented this, and the model training itself is already quite basic (only a few hyperparameters are set).
That would make sense, although clearml, at least in the UI, shows the deeper level of the nested dict as an int, as one would expect
oooohhh.. you mean the key of the nested dict, that would make a lot of sense
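If the issue is indeed that integer keys of a nested dict come back as strings, the effect can be reproduced with a plain JSON round-trip. This is only an illustration: JSON stands in here for whatever serialization is actually applied, which is an assumption, and restore_int_keys is a hypothetical helper, not part of clearml.

```python
import json


def restore_int_keys(obj):
    """Recursively convert string keys that look like integers back to int."""
    if isinstance(obj, dict):
        restored = {}
        for key, value in obj.items():
            new_key = key
            if isinstance(key, str):
                try:
                    new_key = int(key)
                except ValueError:
                    pass  # not an integer-like string, keep it unchanged
            restored[new_key] = restore_int_keys(value)
        return restored
    return obj


params = {"weights": {0: 0.25, 1: 0.75}}

# JSON object keys are always strings, so the int keys are lost here.
round_tripped = json.loads(json.dumps(params))

# Converting the keys back restores the original structure.
restored = restore_int_keys(round_tripped)
```

The values survive the round-trip with their types intact; only the keys of the nested dict change, which matches the behaviour discussed above.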
I will try the suggested edit here
It worked!
Martin, if you want, feel free to add your answer on Stack Overflow so that I can mark it as the solution.
I saw the part regarding the chunks, but it is not clear how one can retrieve the dataset based on files.
I was checking here, and apparently if I use a parameter as suggested, together with a Task.init(task_name=f'{task name in this loop}') for each iteration of the loop, it should work, right? That is, it would create different tasks on the server.
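A rough sketch of that loop, under some assumptions: run_variations, clearml_start_task, the project name "examples" and the task-name prefix are all hypothetical, and the sketch assumes each task is closed with task.close() before the next Task.init call in the same process. The task factory is passed in as a callable so the loop can be tried without a server.

```python
def run_variations(feature_subsets, start_task, train):
    """Run `train` once per feature subset, each under its own task.

    feature_subsets: dict mapping a variation name to a list of features.
    start_task: callable(task_name=...) returning an object with .close().
    train: callable(features) returning a result for that variation.
    """
    results = {}
    for name, features in feature_subsets.items():
        task = start_task(task_name=f"train-{name}")
        try:
            results[name] = train(features)
        finally:
            # Close the task so the next iteration can start a new one.
            task.close()
    return results


def clearml_start_task(task_name, project_name="examples"):
    """Hypothetical factory that creates a real ClearML task."""
    from clearml import Task  # imported lazily; only needed for real runs

    return Task.init(project_name=project_name, task_name=task_name)
```

With a configured server, run_variations(subsets, clearml_start_task, train_fn) would create one task per subset; for a dry run, any object with a close() method can stand in for the task.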
yes, variations of the data, using only a subset of the features
Looks quite good indeed! Thanks! Is the experiment template used in this example available in the repository? I am just not fully sure how the parameters are used/connected in it. Could I just build it and log these parameters using task.set_parameters() so that I can call task.get_parameters() later?
yes, but is there a way to generate multiple tasks like I mentioned, using Task.init at different points of a .py file, and run each of them as a separate remote execution? Didn't you just say that once I trigger task.execute_remotely it will ignore the Task.init?
regarding (2), if I use run_remote, does it also ignore the init?
Considering something along the lines of
https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py
Is there a way to do that to trigger separate remote executions?