Can you point me at the relevant code in ClearML for the autoconnect, so that I can understand exactly what's happening?
Would adding support for some sort of post task script help? Is something already there?
GitLab has support for an S3-based cache, btw.
Task.add_requirements would fit the bill, yeah - thanks
This commit doesn't really fix the issue - https://github.com/allegroai/clearml/commit/189a2b54dec071ec5d58835aef526bfaf5842155
I use a custom helm chart and terraform helm provider for these things
Not really, right? They deprecated a param which wasn't removed in the commit I mentioned above
Thanks CostlyOstrich36
I just run the k8s daemon with a simple Helm chart and manage it with Terraform's Helm provider. Nothing much to share, as it's just a basic chart 🙂
I was having this confusion as well. Did the behavior of execute_remote change? Is what used to be Draft now Aborted?
Any specific use case for the required "draft" mode?
Nothing, except that Draft makes more sense: it feels like the task is being prepped, while Aborted feels like something went wrong
I would prefer controlled behavior over whatever version happens to be available being used. Here we triggered a bunch of jobs that all went fine, and even the evaluations were fine, and then when we triggered an inference deploy it failed
In this case, specifically because the default pickle protocol differs between Python 3.7 and 3.8
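For context on that pickle mismatch: Python 3.8 bumped the default pickle protocol from 3 to 5, so artifacts pickled with 3.8's default cannot be loaded by a 3.7 runtime. A minimal sketch of the defensive fix - pinning the protocol explicitly (the data here is just an illustrative stand-in):

```python
import pickle

model_state = {"weights": [0.1, 0.2], "epoch": 7}  # stand-in for a real artifact

# Python 3.8's default protocol (5) is unreadable by 3.7's pickle,
# so pin an explicitly low protocol when older runtimes must load it.
payload = pickle.dumps(model_state, protocol=3)

assert pickle.loads(payload) == model_state
```

Pinning the protocol on the *producer* side is the controlled behavior: the consumer's Python version then no longer matters for deserialization.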
that or in clearml.conf or both
But it seems to make the current task the data processing task. I don't want it to take over the task.
Cool, didn't know it was disabled. This exact reason was why I created a wrapper over ClearML for my use so that people don't ever accidentally talk to demo server
AgitatedDove14 - this was an interesting one. I think I have found the issue, but am verifying the fix as of now.
One of the devs was using shutil.copy2 to copy parts of the dataset to a temporary directory in a with block - something like:
```python
with TemporaryDirectory(dir=temp_dir) as certificates_directory:
    for file in test_paths:
        shutil.copy2(f"{dataset_local}/{file}", f"{certificates_directory}/{file}")
```
My suspicion is since copy2 copies with full data and symlinks...
Fix - use shutil.copy instead of shutil.copy2 - verifying now.
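Not the dataset code itself, just a sketch of why swapping the call matters: `shutil.copy` copies file contents and permission bits only, while `shutil.copy2` additionally tries to preserve metadata such as timestamps, which is the extra behavior that can interact badly with cached/linked dataset files:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "source.bin")
with open(src, "wb") as f:
    f.write(b"payload")
os.utime(src, (0, 0))  # backdate the source mtime to the epoch

plain = os.path.join(tmp, "plain.bin")
meta = os.path.join(tmp, "meta.bin")
shutil.copy(src, plain)   # contents + mode bits only; mtime becomes "now"
shutil.copy2(src, meta)   # also copies timestamps and other metadata

print(os.path.getmtime(meta))   # 0.0 - epoch timestamp preserved
```

So with `copy2` the copies keep the original file metadata, while plain `copy` produces fresh files, which is the difference being tested by the fix above.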
Also, it's not happening when running locally, only when running remotely on an agent
Will try it out. A weird one this.
Only one. Will replicate it in detail and see what's actually up
The pipeline code itself is pretty standard:
```python
pipe = PipelineController(
    default_execution_queue="minerva-default",
    add_pipeline_tags=True,
    target_project=pipelines_project,
)
for step in self.config["steps"]:
    name = self._experiment_name(step)
    pipe.add_step(
        name=name,
        base_task_project=pipelines_project,
        base_task_name=name,
        parents=self._get_parents(step),
        task_overrides...
```
AgitatedDove14 - are there cases when it tries to skip steps?