Are you executing your script using the right python interpreter?
/venv/bin/python my_clearml_script.py
Talking about that decorator, which should also have a docker_args param since it is executed as an "orchestration component", but the param is missing: https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller/#pipelinedecoratorpipeline
This is funny because the auto-scaler on GPU instances is working fine, but as the backtrace suggests, it seems to be linked to this instance family
(currently I am a SaaS customer in Pro tier)
Thanks @<1523701435869433856:profile|SmugDolphin23>, though are you sure I don't need to override the deserialization function even if I pass multiple distinct objects as a tuple?
You can enable the logarithmic scale in the graph settings if I remember correctly
Did you properly install Docker and the NVIDIA container toolkit? Here's the init script I'm using on my autoscaled workers:
#!/bin/sh
# install prerequisites
sudo apt-get update -y
sudo apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
# add Docker's official GPG key
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
# set up the Docker apt repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | s...
I suppose your worker is not persistent, so I might suggest having a very cheap instance as a persistent worker where you keep your dataset persistently synced using sync_folder ( https://clear.ml/docs/latest/docs/references/sdk/dataset/#sync_folder ), and then taking the subset of files that interests you and pushing it as a different dataset, marking it as a subset of your main dataset id using a tag
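Something like this for the subset part (rough sketch; the names, tags and paths are placeholders, and I'm assuming the Dataset.get / get_local_copy / Dataset.create calls from the SDK docs):

from clearml import Dataset

MAIN_DATASET_ID = "<main_dataset_id>"  # placeholder id

# pull a local copy of the main dataset on the persistent worker
main = Dataset.get(dataset_id=MAIN_DATASET_ID)
local_root = main.get_local_copy()

# push only the files you care about as a separate dataset,
# tagged so you can trace it back to the main dataset id
subset = Dataset.create(
    dataset_name="my_dataset_subset",   # placeholder name
    dataset_project="my_project",       # placeholder project
    dataset_tags=["subset-of:" + MAIN_DATASET_ID],
)
subset.add_files(path=local_root + "/images/val")  # placeholder subfolder
subset.upload()
subset.finalize()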
AnxiousSeal95 Okay, it seems to work with a compute-optimized c2-standard-4 instance
Well I simply duplicated code across my components instead of centralizing the operations that needed that env variable in the controller
It doesn't seem so if you look at the REST API documentation, it might be available as an Enterprise plan feature
This is an instance that I launched like last week and was running fine until now, the version is v1.6.0-335
(if for instance I wanna pull a yolov5 repo in the retraining component)
Okay, the force_store_standalone_script() works
Ah thank you I'll try that ASAP
Well aside from the obvious removal of the PipelineDecorator.run_locally() line on both our sides, the decorator's arguments seem to be the same:
@PipelineDecorator.component(
    return_values=['dataset_id'],
    cache=True,
    task_type=TaskTypes.data_processing,
    execution_queue='Quad_VCPU_16GB',
    repo=False
)
And my pipeline controller:
@PipelineDecorator.pipeline(
    name="VINZ Auto-Retrain",
    project="VINZ",
    version="0.0.1",
    pipeline_execution_queue="Quad_V...
Well it's not working, this param seems to be used to override the repo to pull since it has a str type annotation; anyway, ClearML still attempted to pull the repo
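For reference, this is the kind of usage I mean (a sketch; the repo URL, branch and queue are placeholders, and I'm assuming the repo / repo_branch arguments exposed on PipelineDecorator.component):

from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes

# point the component at an external repo to be cloned on the worker
# (URL / branch / queue below are placeholders)
@PipelineDecorator.component(
    return_values=['weights_path'],
    task_type=TaskTypes.training,
    execution_queue='Quad_VCPU_16GB',
    repo='https://github.com/ultralytics/yolov5.git',
    repo_branch='master',
)
def retrain_component(dataset_id: str):
    ...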
If you feel you have a specific enough issue, you can also open a GitHub issue and link this thread to it
Oh, it's a little strange that the comment lines about it were in the agent section
@<1523701087100473344:profile|SuccessfulKoala55> here you go
The default compression parameter value is ZIP_MINIMAL_COMPRESSION, I guess you could try to check if there is a tarball-only option, but anyway most of the CPU time taken by the upload process is the generation of the hashes of the file entries
Thanks a lot @<1523701435869433856:profile|SmugDolphin23> ❤
Well given a file architecture looking like this:
|_ __init__.py
|_ my_pipeline.py
|_ my_utils.py
With the content of my_pipeline.py being:
from clearml.automation.controller import PipelineDecorator
from clearml import Task, TaskTypes
from my_utils import do_thing

Task.force_store_standalone_script()

@PipelineDecorator.component(...)
def my_component(dataset_id: str):
    import pandas as pd
    from clearml import Dataset
    dataset = Dataset.get(dataset_id=input_dataset_id...
Okay, looks like the call dependency resolver does not support cross-file calls and relies instead on the local repo cloning feature to handle multiple files, so Task.force_store_standalone_script() does not allow for a pipeline defined across multiple files (now that I think of it, it was kind of implied by the name), but what is interesting is that calling an auxiliary function in the SAME file from a component also raises a NameError: <function_name> is not defined, that's ki...
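For what it's worth, the workaround I'd expect to work for the same-file case is passing the auxiliary function explicitly via the component's helper_functions argument (rough sketch; do_thing and the queue name are placeholders, helper_functions being the argument listed on PipelineDecorator.component):

from clearml.automation.controller import PipelineDecorator
from clearml import Task, TaskTypes

Task.force_store_standalone_script()

def do_thing(dataset_id: str) -> str:
    # auxiliary function defined in the same file (placeholder logic)
    return dataset_id.strip()

# helper_functions packs do_thing into the component's standalone script,
# so the remote worker can resolve the name at runtime
@PipelineDecorator.component(
    return_values=['dataset_id'],
    task_type=TaskTypes.data_processing,
    execution_queue='Quad_VCPU_16GB',  # placeholder queue
    helper_functions=[do_thing],
)
def my_component(dataset_id: str):
    return do_thing(dataset_id)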
Well it is also failing within the same file if you read until the end, but for the cross-file issue, it's mostly because my repo architecture is organized in a v1/v2 scheme and I didn't want to pull a lot of unused files and inject GitHub PATs, which frankly lack granularity, into the worker
I can test it empirically but I want to be sure what the expected behavior is, so my pipeline doesn't get auto-magically broken after a patch
I would try to not run it locally but in your execution queues on a remote worker; if that's not it, it is likely a bug