My code pretty much creates a dataset, uploads it, trains a model (that's where the current task starts), evaluates it, and uploads all the artifacts and metrics. The artifacts and configurations are uploaded alright, but the metrics and plots are not. As with Lavi, my code hangs on task.close(), where it seems to be waiting for the metrics etc. but never finishes. No retry message is shown either.
After a print I added for debugging right before task.close(), the only message I get in the console...
UnsightlyHorse88, do you know?
```
all done
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
^CTraceback (most recent call last):
  File "/home/zanini/repo/RecSys/src/cli/retraining_script.py", line 710, in <module>
    mr.retrain()
  File "/home/zanini/repo/RecSys/src/cli/retraining_script.py", line 701, in retrain
    self.task.close()
  File "/home/zanini/repo/RecSys/.venv/lib/python3.9/site-packages/clearml/task.py", line 1783, in close
    self.__shutdown()
  File "...
```
That's the script that produces the error. You can also observe the struggle with importing the load_model function. (Any tips on best practices for structuring the pipeline are also gladly accepted.)
I did manage to get it working, but only by hardcoding the repository path with sys.path.append(), using the absolute path on my machine.
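A less brittle variant would be to derive the repo root from the script's own location instead of hardcoding it. A minimal sketch, assuming the script still sits at <repo>/src/cli/ as in the traceback above (note this only helps for local runs; a component executed remotely by an agent won't have the repo checked out at the same path):

```python
import sys
from pathlib import Path

# Derive the repo root relative to this file instead of hardcoding
# '/home/zanini/repo/RecSys' (assumes <repo>/src/cli/retraining_script.py)
REPO_ROOT = Path(__file__).resolve().parents[2]
sys.path.insert(0, str(REPO_ROOT))

from src.dataset.backtest import load_model  # noqa: E402
```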
Looks quite good indeed! Thanks! Is the experiment template used in this example available in the repository? I'm just not fully sure how the parameters are used/connected in it. Could I just build it and log these parameters using task.set_parameters() so that I can call task.get_parameters() later?
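Something like this minimal sketch is what I have in mind (project, task, and parameter names are made up for illustration):

```python
from clearml import Task

task = Task.init(project_name='RecSys', task_name='experiment-template')

# Store the parameters on the task; keys without an explicit section
# are placed under the 'General' section
task.set_parameters({'General/model_type': 'xgboost',
                     'General/n_estimators': 100})

# Later (e.g. from a clone of the template) read them back as a flat dict
params = task.get_parameters()
print(params['General/model_type'])
```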
```python
import importlib
import argparse
from datetime import datetime

import pandas as pd
from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes, Task


@PipelineDecorator.component(
    return_values=['model', 'features_to_build']
)
def get_model_and_features(task_id, model_type):
    from clearml import Task
    import sys
    # Hardcoded absolute repo path -- the workaround mentioned above
    sys.path.insert(0, '/home/zanini/repo/RecSys')
    from src.dataset.backtest import load_model
    task = Task.get_task(task_id=task_i...
```
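For completeness, a minimal sketch of how a component like this could be wired into a pipeline function and run locally for debugging (the pipeline name, project, and arguments are made up for illustration):

```python
@PipelineDecorator.pipeline(
    name='retraining-pipeline', project='RecSys', version='0.1'
)
def run_pipeline(task_id, model_type):
    # Each component call becomes a pipeline step; the controller
    # passes return values between steps
    model, features_to_build = get_model_and_features(task_id, model_type)
    return model


if __name__ == '__main__':
    # Execute the whole pipeline in the local process for debugging
    PipelineDecorator.run_locally()
    run_pipeline(task_id='<template-task-id>', model_type='xgboost')
```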
Yes, it seems it was indeed waiting for the uploads, which weren't happening (I did give it quite a while to finish in my tests). I thought it was a problem with the metrics, but apparently it was the artifacts before them. The artifacts were shown in the web UI dashboard, but were not on S3.
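In case it helps anyone else: one way to surface this earlier is to wait on each upload and flush explicitly before closing, so a broken S3 upload fails loudly instead of hanging in task.close(). A minimal sketch (bucket, project, task, and artifact names are made up; output_uri is assumed to point at the target S3 bucket):

```python
from clearml import Task

task = Task.init(
    project_name='RecSys',
    task_name='retraining',
    output_uri='s3://my-bucket/clearml',  # hypothetical bucket
)

# ... create dataset, train, evaluate, report metrics ...

# Block until the artifact actually reaches storage instead of queueing it
task.upload_artifact('model', artifact_object='model.pkl',
                     wait_on_upload=True)

# Flush any pending reports/uploads explicitly, then close
task.flush(wait_for_uploads=True)
task.close()
```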