
Reputation
Badges 1
131 × Eureka!Hey, did you checked that out ? None
Yes but not in the controller itself, which is also remotely executed in a docker container
AnxiousSeal95 Okay it seems to work with a compute optimized c2-standard-4
instance
AgitatedDove14 I have annotation logs from the end-user that I fetch periodically, I process it and I want to add it as a new version of my dataset where all versions correspond to the data collected during a precise time window, currently I'm doing it by fetching the latest dataset, incrementing the versionmm and creating a new dataset version
And by extension is there a way to upsert a dataset by automatically creating an entry wich a incremented version or create it if it does not exists ? Or am I forced to do a get, check if the latest version is fainallyzed, then increment de version of that version and create my new version ?
And running with a Python 3.10
interpreter
The expected behavior is that the task would capture the iteration scalar of the PL trainer but nothing is recorded
import clearml
from darts.models import TFTModel
model = TFTModel(
input_chunk_length=28,
output_chunk_length=14,
n_epochs=300,
batch_size=4096,
add_relative_index=True,
num_attention_heads=4,
dropout=0.3,
full_attention=True,
save_checkpoints=True,
)
task = Task.init(
project_name='sales-prediction',
task_name='TFT Training 2'...
I doubt there is a direct way to do it since they are stored as archive chunks 😕
No I was was pointing out the lack of one, but turns out on some model the iteration is so slow even on GPUs when training on a lots of time serie that you have to set the pytorch lightning trainer argument log_every_n_steps
to 1
(default 50
) to prevent the ClearML iteration logger from timing-out
Sorry, I meant the scalar logging doesn't collect anything like it would do during a vanilla Pytorch Lightning training, here is the repo of the lib https://github.com/unit8co/darts
I have a pipeline with a single component:
` @PipelineDecorator.component(
return_values=['dataset_id'],
cache=True,
task_type=TaskTypes.data_processing,
execution_queue='Quad_VCPU_16GB'
)
def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):
"""
Convert autocut logs from a specified time window into usable dataset in generic format.
"""
print('[STEP 1/4] Generating dataset from autocut logs...')
import os
...
SmugDolphin23 But the training.py has already a CLearML task created under the hood since its integration with ClearML, beside initing the task before the execution of the file like in my snippet is not sufficient ?
Nice it works 😍
I'll try to update the version in the image I provide to the workers of th autoscaler app (but sadly I don't control the version of those in itself since it's CLearML managed)
Takling about that decorator which shouyld also have a docker_arg param since it is executed as an "orchestration component" but the param is missing https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller/#pipelinedecoratorpipeline
Hey SuccessfulKoala55 currently using the clearml
package version 1.7.1
and my server is a PRO SaaS deployment
SuccessfulKoala55 Mostly the VM instances types and properties, execution queue and app name.
As specified in the initial message, the instance type used is e2-standard-4
Well if you have:
ret_obj = None
for in in range(5):
ret_obj = step_x(ret_obj)
SInce the orchestration automatically determine the order of execution using the logic of return objects the controller will execute them sequentially.
However, if your steps don't have dependencies like this:
for i in range(5):
step_x(...)
It will try to execute them concurrently
ClearML package version used: 1.9.1
ClearML Server: SaaS - Pro Tier
Oh wow, would definitely try it out if there were an Autoscaler App integrating it with ClearML
You can set a dummy step which is executed in parallel of your pre-processing step and which is set to be executed in your GPU queue, provided that your autoscaler doesn't scale back your compute before your pre-processing is complete that should do the trick
Would gladly try to run it on a remote instance to verify the thesis on some local cache acting up but unfortunately also ran into an issue with the GCP autoscaler https://clearml.slack.com/archives/CTK20V944/p1665664690293529
So basically CostlyOstrich36 I feel like debug_pipeline()
use the latest version of my code as it is defined on my filesystem but the run_locally()
used a previous version it cached somehow
The value of start_date
and end_date
seems to be None
Hey, I'm a SaaS user in PRO tier and I was wondering if it was a feature available on the auto-scaler apps so I could improve the cost-efficiency of my provisionned GCP A100 instances
print(f"start_date: {start_date} end_date: {end_date}") time_range = pd.date_range(start=start_date, end=end_date, freq='D').to_pydatetime().tolist()
AgitatedDove14 Here you go, I think it's inside the container since it's after the worker pulls the image
Component's prototype seems fine:@PipelineDecorator.component( return_values=['dataset_id'], cache=False, task_type=TaskTypes.data_processing, execution_queue='Quad_VCPU_16GB', ) def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):
Nice, that's a great feature! I'm also trying to have a component executing Giskard QA test suites on model and data, is there a planned feature when I can suspend execution of the pipeline, and display on the UI that this pipeline "steps" require a human confirmation to go on or stop while displaying arbitrary text/plot information ?