
Looks quite good indeed! Thanks! Is the experiment template used in this example available in the repository? I'm just not fully sure how the parameters are used/connected in it. Could I just build it and log these parameters using task.set_parameters(), so that I can call task.get_parameters() later?
from importlib.machinery import EXTENSION_SUFFIXES
import catboost
from clearml import Task, Logger, Dataset
import lightgbm as lgb
import numpy as np
import pandas as pd
import dask.dataframe as dd
import matplotlib.pyplot as plt

MODELS = {
    'catboost': {
        'model_class': catboost.CatBoostClassifier,
        'file_extension': 'cbm'
    },
    'lgbm': {
        'model_class': lgb.LGBMClassifier,
        'file_extension': 'txt'
    }
}

class ModelTrainer():
    def __init__(sel...
I will try the suggested edit here
Simplified a little bit and removed private parameters, but that's pretty much the code. We did not try toy examples, since that was already done with the example pipelines when we implemented this, and the model training itself is already quite basic there (only a few hyperparameters set).
My code pretty much creates a dataset, uploads it, trains a model (that's where the current task starts), evaluates it, and uploads all the artifacts and metrics. The artifacts and configurations are uploaded all right, but the metrics and plots are not. As with Lavi, my code hangs on task.close(), where it seems to be waiting for the metrics etc. but never finishes. No retry message is shown either.
After a print I added for debugging right before task.close(), the only message I get in the consol...
oooohhh.. you mean the key of the nested dict, that would make a lot of sense
Is there a way to do that to trigger separate remote executions?
I was checking here, and apparently if I use a parameter as suggested, together with a Task.init(task_name=f'{task name in this loop}') for each of the loops, it should work, right? That would create different tasks on the server.
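A minimal sketch of the loop idea, assuming each iteration trains on a different feature subset. The feature names and project name are hypothetical, and running several Task.init calls in one process assumes each task is closed before the next one starts; this has not been checked against a live server.

```python
from itertools import combinations


def feature_subsets(features, min_size=2):
    """Yield every subset of `features` with at least `min_size` columns."""
    for size in range(min_size, len(features) + 1):
        for subset in combinations(features, size):
            yield list(subset)


def run_variant(project_name, features):
    """Create one ClearML task for a single feature subset.

    Needs a configured ClearML server; the import is deferred so that
    feature_subsets() can be used without ClearML installed.
    """
    from clearml import Task

    task = Task.init(
        project_name=project_name,  # hypothetical project name
        task_name=f"train_{'_'.join(features)}",
        reuse_last_task_id=False,  # ask for a fresh task on every call
    )
    task.connect({"features": features})  # record the subset as a parameter
    # ... training code for this subset goes here ...
    task.close()  # close before the next Task.init in the same process
```

Calling run_variant("feature-ablation", subset) for each subset from feature_subsets(["age", "country", "n_purchases"]) should then register one task per subset. Whether each of them can also be sent to a remote agent from the same script is a separate question that this sketch does not answer.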
That would make sense, although ClearML, at least in the UI, shows the deeper level of the nested dict as an int, as one would expect.
Yes, but is there a way to generate multiple tasks like I mentioned, using Task.init at different points of a .py file, and run each of them as a separate remote execution? Didn't you just say that once I trigger task.execute_remotely it will ignore the Task.init?
Regarding (2), if I use run_remote, does it also ignore the init?
Considering something along the lines of
https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py
I saw the part regarding the chunks, but it is not clear how one can retrieve the dataset based on files.
yes, variations of the data, using only a subset of the features
UnsightlyHorse88 , do you know?
Martin, if you want, feel free to add your answer on Stack Overflow so that I can mark it as the solution.
It worked!
Apparently found a solution:
    dataset_zip = dataset._task.artifacts['data'].get()
will return the path to the zip file containing all the files (which will be downloaded to the local machine).
After that, retrieve the names of the files:
    import zipfile
    zip_file = zipfile.ZipFile(dataset_zip, 'r')
    files = zip_file.namelist()
Then unzip using:
    import os
    os.system(f'unzip {dataset_zip}')  # in this case to your script directory
Using the files list, one can then open them selectively.
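The listing and selective-open steps can also be done with the standard zipfile module alone, without shelling out to unzip. This is only a sketch: the small archive built here is a stand-in for the path returned by the artifact lookup, and the file names are made up for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def list_files(zip_path):
    """Return the names of all members in the archive without extracting it."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        return archive.namelist()


def extract_selected(zip_path, wanted, target_dir):
    """Extract only the members named in `wanted` and return their local paths."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        return [archive.extract(name, path=target_dir) for name in wanted]


# Stand-in archive (hypothetical contents) replacing the downloaded dataset zip
workdir = Path(tempfile.mkdtemp())
dataset_zip = workdir / "data.zip"
with zipfile.ZipFile(dataset_zip, "w") as archive:
    archive.writestr("train.csv", "a,b\n1,2\n")
    archive.writestr("test.csv", "a,b\n3,4\n")

files = list_files(dataset_zip)  # names only, nothing extracted yet
csv_paths = extract_selected(dataset_zip, ["train.csv"], workdir / "out")
```

Extracting only the wanted members avoids unpacking the whole dataset into the script directory, which matters when the archive is large and only a few files are needed.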
Could you supply any reference of this dataset containing other datasets? I might have skipped that when reading the documentation, but I do not recall seeing this functionality.
Yes, it seems it was indeed waiting for the uploads, which weren't happening (I did give it quite a while to try to finish the process in my tests). I thought it was a problem with the metrics, but apparently it was the artifacts before them. The artifacts were shown in the web UI dashboard, but were not on S3.
all done
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
^CTraceback (most recent call last):
  File "/home/zanini/repo/RecSys/src/cli/retraining_script.py", line 710, in <module>
    mr.retrain()
  File "/home/zanini/repo/RecSys/src/cli/retraining_script.py", line 701, in retrain
    self.task.close()
  File "/home/zanini/repo/RecSys/.venv/lib/python3.9/site-packages/clearml/task.py", line 1783, in close
    self.__shutdown()
  File "...
After commenting out all the metric/plot reporting, we noticed the artifacts were not being uploaded to S3. A solution was to add wait_on_upload=True in task.upload_artifact().
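A hedged sketch of that fix, assuming the flag is spelled wait_on_upload as in the current ClearML SDK and that `task` is the object returned by Task.init. The helper name upload_and_finish is made up for illustration.

```python
def upload_and_finish(task, artifacts):
    """Upload every artifact synchronously, then flush and close the task.

    `task` is expected to be a clearml.Task; `artifacts` maps artifact
    names to objects or file paths.
    """
    for name, obj in artifacts.items():
        # wait_on_upload=True blocks until this artifact has been uploaded
        ok = task.upload_artifact(name=name, artifact_object=obj, wait_on_upload=True)
        if not ok:
            # upload_artifact reports success as a boolean
            raise RuntimeError(f"upload of artifact {name!r} failed")
    # Push pending reports and wait for outstanding uploads before closing
    task.flush(wait_for_uploads=True)
    task.close()
```

Uploading synchronously means a storage problem (for example, wrong S3 credentials) is likely to surface at the upload call itself, rather than as a silent hang later in task.close().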
Sorted by running the command below before the docker-compose call:
    export DOCKER_DEFAULT_PLATFORM=linux/amd64
But how do I link it to the specific task to be listed as artifact?