But the task appeared with the correct name and outputs in the pipeline and the experiment manager
Sorry, I meant that scalar logging doesn't collect anything the way it would during a vanilla PyTorch Lightning training. Here is the repo of the lib: https://github.com/unit8co/darts
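In the meantime I can fall back to reporting scalars by hand, something like this sketch (assuming ClearML's explicit Logger API; the metric names and loss values are placeholders):
```python
from clearml import Logger

# Placeholder loop: report each training loss explicitly,
# since the automatic Lightning binding doesn't pick them up here.
logger = Logger.current_logger()
for step, loss in enumerate([0.9, 0.7, 0.5]):
    logger.report_scalar(title='loss', series='train', value=loss, iteration=step)
```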
AgitatedDove14 Yup, I tested it to no avail; a bit sad that there is no working integration with one of the leading time series frameworks...
Sure, as mentioned above: "I had to revert the change (a simple policy)".
The upload worked after the rollback, which confirms my suspicion that the lifecycle policy was causing the issue.
The worker Docker image was running Python 3.8 and we are on a Pro tier SaaS deployment. This failed run is from a few weeks ago, and we have not run any pipeline since then.
This is funny because the autoscaler works fine on GPU instances, but as the backtrace suggests, the issue seems to be linked to this instance family.
And this is a standard Pro SaaS deployment; the autoscaler scale-up was triggered by the remote execution attempt of a pipeline.
Okay, great! I haven't tested it yet, but I was wondering about it while writing my pipeline.
Looks like the user running your ClearML agent has not been added to the docker group.
Does it happen for all your packages or for a specific one?
Hey CostlyOstrich36, did you find anything of interest on the issue?
Okay thanks! Please keep me posted when the hotfix is out on the SaaS
Looks like you need the https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving and https://clear.ml/docs/latest/docs/pipelines/pipelines features, with a https://clear.ml/pricing/ plan in a SaaS deployment, so you can use the https://clear.ml/docs/latest/docs/webapp/applications/apps_gcp_autoscaler to manage the workers for you.
Have you determined yet whether it was a strictly internal issue, or should I continue investigating on my side?
Yes, but not in the controller itself, which is also executed remotely in a Docker container.
I'm considering opening a PR in a few days to add the param, if it's not too complex.
AnxiousSeal95 Okay, it seems to work with a compute-optimized c2-standard-4 instance.
Oh wow, okay, I'll test it with another instance type.
Nice, thank you for the quick response ❤
Ah, thank you, I'll try that ASAP.
I have a pipeline with a single component:
```python
from clearml import PipelineDecorator, TaskTypes

@PipelineDecorator.component(
    return_values=['dataset_id'],
    cache=True,
    task_type=TaskTypes.data_processing,
    execution_queue='Quad_VCPU_16GB'
)
def generate_dataset(start_date: str, end_date: str, input_aws_credentials_profile: str = 'default'):
    """
    Convert autocut logs from a specified time window into a usable dataset in a generic format.
    """
    print('[STEP 1/4] Generating dataset from autocut logs...')
    import os
    ...
```
Btw AgitatedDove14, is there a way to define parallel tasks and use the pipeline as an acyclic compute graph instead of simply sequential tasks?
As opposed to the Controller/Task approach, where add_step() only allows executing them sequentially. A minimal sketch of what I'm after is below.
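(A sketch assuming the decorator API schedules steps with independent inputs concurrently; all names, values, and the project/queue here are placeholders:)
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=['a'])
def step_a():
    return 1

@PipelineDecorator.component(return_values=['b'])
def step_b():
    return 2

@PipelineDecorator.component(return_values=['total'])
def merge(a, b):
    return a + b

@PipelineDecorator.pipeline(name='dag_example', project='examples', version='1.0')
def my_pipeline():
    a = step_a()  # step_a and step_b share no inputs,
    b = step_b()  # so they could run in parallel
    print(merge(a, b))  # merge depends on both branches
```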
Nice, that's a great feature! I'm also trying to have a component execute Giskard QA test suites on models and data. Is there a planned feature where I can suspend execution of the pipeline and display in the UI that a pipeline "step" requires human confirmation to continue or stop, while showing arbitrary text/plot information?
We're successfully using Ray for hyperparameter search on non-CV models with ClearML.
Are you executing your script with the right Python interpreter?
/venv/bin/python my_clearml_script.py
No, I was pointing out the lack of one. But it turns out that on some models the iteration is so slow, even on GPUs, when training on lots of time series, that you have to set the PyTorch Lightning trainer argument log_every_n_steps to 1 (default 50) to prevent the ClearML iteration logger from timing out.
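E.g. something like this (a sketch assuming darts' pl_trainer_kwargs pass-through to the underlying Lightning Trainer; the model and chunk lengths are just placeholders):
```python
from darts.models import NBEATSModel

# Placeholder model: darts torch-based models forward pl_trainer_kwargs
# to the underlying PyTorch Lightning Trainer.
model = NBEATSModel(
    input_chunk_length=24,
    output_chunk_length=12,
    pl_trainer_kwargs={'log_every_n_steps': 1},  # default is 50
)
```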
Hey Mathias,
The project SDK is pretty barebones, and according to the docs you should use the REST API for further actions. The simplest approach would be to simply use the project id with the POST /projects.get_by_id endpoint.
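For example, a sketch using the SDK's APIClient wrapper around the REST API (assuming credentials are already configured in clearml.conf; the project id is a placeholder):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()  # picks up credentials from clearml.conf / env vars
project = client.projects.get_by_id(project='<your-project-id>')
print(project.name)
```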
Best regards,
Yup, we too had to implement a lot of little things for ClearML in our tooling library, due to it being pretty barebones in some areas.