 
I got some credentials issues too in some pipeline steps and I solved it using
task = Task.current_task()
task.setup_aws_upload(...)
It allows you to explicitly specify credentials
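For reference, here's roughly what that looked like on my side (a minimal sketch; the bucket name, credentials and region below are placeholders, and the exact keyword arguments should be double-checked against your clearml version):
from clearml import Task

# grab the task running inside the pipeline step
task = Task.current_task()

# pass the S3 credentials explicitly instead of relying on clearml.conf
# (all values here are placeholders)
task.setup_aws_upload(
    bucket="my-artifacts-bucket",
    key="AWS_ACCESS_KEY_ID_PLACEHOLDER",
    secret="AWS_SECRET_ACCESS_KEY_PLACEHOLDER",
    region="eu-west-1",
)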
So it seems to be an issue with the component parameter used in:
@PipelineDecorator.pipeline(
    name="VINZ Auto-Retrain",
    project="VINZ",
    version="0.0.1",
    pipeline_execution_queue="Quad_VCPU_16GB"
)
def executing_pipeline(start_date, end_date):
    print("Starting VINZ Auto-Retrain pipeline...")
    print(f"Start date: {start_date}")
    print(f"End date: {end_date}")
    window_dataset_id = generate_dataset(start_date, end_date)

if __name__ == '__main__':
    PipelineDec...
I can test it empirically, but I want to be sure what the expected behavior is so my pipeline doesn't get auto-magically broken after a patch
I suppose your worker is not persistent, so I might suggest having a very cheap instance as a persistent worker where you keep your dataset persistently synced using https://clear.ml/docs/latest/docs/references/sdk/dataset/#sync_folder , and then taking the subset of files that interests you and pushing it as a different dataset, marking it as a subset of your main dataset id using a tag
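Roughly something like this (a rough sketch only; the dataset id, project name and tag are placeholders, and I'm using get_mutable_local_copy here just for illustration, with sync_folder from the link above being the call I had in mind for keeping the copy up to date):
from clearml import Dataset

# keep a local working copy of the main dataset on the cheap persistent worker
main_ds = Dataset.get(dataset_id="MAIN_DATASET_ID")  # placeholder id
local_copy = main_ds.get_mutable_local_copy(target_folder="/data/main_dataset")

# push the subset of files you care about as a new dataset,
# parented to the main one and tagged so the relation is visible in the UI
subset_ds = Dataset.create(
    dataset_name="my-subset",                     # placeholder name
    dataset_project="my-project",                 # placeholder project
    dataset_tags=["subset-of:MAIN_DATASET_ID"],   # tag marking it as a subset
    parent_datasets=["MAIN_DATASET_ID"],
)
subset_ds.add_files(path="/data/main_dataset/subset_of_interest")
subset_ds.upload()
subset_ds.finalize()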
If you're using Helm it would be at the service level in your  values.yml , not pod level
CostlyOstrich36 Should I start a new issue since I pinpointed the exact problem, given that the beginning of this one was clearly confusing for both of us?
Nope, same result after having deleted .clearml
Hey, did you check that out? None
Would have been great if the ClearML resolver would just inline the code of locally defined vanilla functions and execute that inlined code under the import scope of the component from which it is called
That makes sense, since this function executes your components as classic Python functions
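If it helps, this is how I picture the debug mode (just an illustrative sketch, assuming the function we're talking about is PipelineDecorator.debug_pipeline(); the dates are placeholders):
from clearml.automation.controller import PipelineDecorator

# in debug mode the pipeline and its components run in the current process
# as plain Python function calls, instead of being launched as separate Tasks
PipelineDecorator.debug_pipeline()
executing_pipeline(start_date="2022-01-01", end_date="2022-03-01")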
Okay thanks! Please keep me posted when the hotfix is out on the SaaS
When running with  PipelineDecorator.run_locally()  I get the legitimate pandas error that I fixed by specifying the  freq  param in the  pd.date_range(....  line in the component:
Launching step [generate_dataset]
ClearML results page:
[STEP 1/4] Generating dataset from autocut logs...
Traceback (most recent call last):
File "/tmp/tmp2jgq29nl.py", line 137, in <module>
results = generate_dataset(**kwargs)
File "/tmp/tmp2jgq29nl.py", line 18, in generate_dataset
...
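For the record, the fix itself was a one-liner; a minimal sketch, assuming a daily frequency and placeholder date bounds (the real ones come from the pipeline parameters):
import pandas as pd

start_date, end_date = "2022-01-01", "2022-03-01"  # placeholders

# passing freq explicitly (daily here) is what resolved the error for me
dates = pd.date_range(start=start_date, end=end_date, freq="D")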
I doubt there is a direct way to do it since they are stored as archive chunks 😕
Did you properly install Docker and the NVIDIA Docker toolkit? Here's the init script I'm using on my autoscaled workers:
#!/bin/sh
sudo apt-get update -y
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | s...
Yup, if you want to access it through https you're required to have a domain pointing to that IP with a certificate in place (using letsencrypt for instance) or else you'll get an SSL error
The expected behavior is that the task would capture the iteration scalars of the PyTorch Lightning trainer, but nothing is recorded
from clearml import Task
from darts.models import TFTModel
model = TFTModel(
    input_chunk_length=28,
    output_chunk_length=14,
    n_epochs=300,
    batch_size=4096,
    add_relative_index=True,
    num_attention_heads=4,
    dropout=0.3,
    full_attention=True,
    save_checkpoints=True,
)
task = Task.init(
    project_name='sales-prediction',
    task_name='TFT Training 2'...
AgitatedDove14 Here you go, I think it's inside the container since it's after the worker pulls the image
Hey SuccessfulKoala55, I'm currently using the  clearml  package version  1.7.1  and my server is a PRO SaaS deployment
pip package  clearml-agent  is version  1.3.0
And additionally, does the "When executing a Task (experiment) remotely, this method has no effect." part mean that if it is executed in a remote worker inside a pipeline, without the dataset downloaded, the method will have no effect?
I was launching a pipeline run, but I don't remember having set the autoscaler to use spot instances (I believe the GCP terminology for spot instance is "preemptible" and I set it to false)
AnxiousSeal95  Okay it seems to work with a compute optimized  c2-standard-4  instance
@<1523701087100473344:profile|SuccessfulKoala55> I had already bumped boto3 to its latest version and all the files I added to the dataset were pickle binary files
There is a gap in the GPU offering on GCP: there is no modern middle ground for a GPU with more than 16 GB and less than 40 GB of VRAM, so sometimes we need to provision an A100 to get the training speed we want. We don't use all the VRAM, so I figured that if we could batch 2 training tasks on the same A100 instance we would still come out ahead in terms of CUDA cores and get the most out of the GPU-time we're paying for.
Oh, that's nice. If I import a model using InputModel, do I still need to specify an OutputModel?
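For context, this is roughly the pattern I have in mind (the model id is a placeholder; whether an explicit OutputModel is still required on top of this is exactly my question):
from clearml import Task, InputModel

task = Task.init(project_name="sales-prediction", task_name="TFT fine-tune")

# import an existing model by id (placeholder) and attach it to the task
input_model = InputModel(model_id="EXISTING_MODEL_ID")
task.connect(input_model)

local_weights = input_model.get_weights()  # local path to the weights file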
Thus the main difference in behavior must be coming from the  _debug_execute_step_function  property in the  Controller  class. I'm currently skimming through it to try to identify a cause. Did I provide you enough info btw, CostlyOstrich36?
Talking about that decorator, which should also have a docker_args param since it is executed as an "orchestration component", but the param is missing: https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller/#pipelinedecoratorpipeline
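For comparison, the step-level decorator does expose it (a sketch from memory; the queue, image and arguments below are placeholders), which is why the asymmetry on the pipeline-level decorator is surprising:
from clearml.automation.controller import PipelineDecorator

# the component decorator accepts docker/docker_args, but the pipeline-level
# decorator (the "orchestration component") has no equivalent parameter
@PipelineDecorator.component(
    execution_queue="Quad_VCPU_16GB",
    docker="nvidia/cuda:11.6.2-runtime-ubuntu20.04",
    docker_args="--shm-size=8g",
)
def generate_dataset(start_date, end_date):
    ...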
Thanks a lot @<1523701435869433856:profile|SmugDolphin23> ❤