I believe n1-standard-8 would work for that. I initially just tried going with the autoscaler defaults, which has GPU on but with n1-standard-1 specified as the machine.
I'll give it a try.
And if I wanted to support GPU in the default queue, are you saying that I'd need a different machine from the n1-standard-1?
Switching the base image seems to have failed with the following error: 2022-07-13 14:31:12 Unable to find image 'nvidia/cuda:10.2-runtime-ubuntu18.04' locally
attached is a pipeline task log file
Is there any chance the experiment itself has a docker image specified?
It does not as far as I know. The decorators do not have docker fields specified
Trying to switch to resources using GPU-enabled VMs failed with that same error above.
Looking at the spawned VMs, they were spawned by the autoscaler without GPU, even though I checked that my settings (n1-standard-1, nvidia-tesla-t4, and the https://console.cloud.google.com/compute/imagesDetail/projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10?project=ml-tooling-test-external image for the VM) can be used to make VM instances, and my GCP autoscaler...
I'm on clearml 1.6.2
The Jupyter notebook service and two clearml agents (version 1.3.0, one in queue "default" and one in queue "services" with the --cpu-only flag) are all running inside a docker container.
If I run from the terminal, I see: ValueError: Task object can only be updated if created or in_progress [status=stopped fields=['configuration']]
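That error suggests the task's status gates whether configuration updates are accepted. A minimal guard sketch, assuming the real clearml Task methods get_status() and set_configuration_object(); the helper names (can_update, safe_update) are my own, not part of the ClearML API:

```python
# Sketch: only attempt a configuration update when the task status allows it.
# Per the error message, updates are accepted for "created" or "in_progress".
UPDATABLE_STATUSES = {"created", "in_progress"}

def can_update(status: str) -> bool:
    """Return True if a task in this status accepts configuration updates."""
    return status in UPDATABLE_STATUSES

def safe_update(task, name: str, config: str) -> bool:
    """Attempt the update only when the task status allows it.

    `task` is expected to expose get_status() and set_configuration_object(),
    as a clearml Task does. Returns True if the update was attempted.
    """
    if can_update(task.get_status()):
        task.set_configuration_object(name=name, config_text=config)
        return True
    return False
```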
Hi Martin. See that ValueError:
https://clearml.slack.com/archives/CTK20V944/p1657583310364049?thread_ts=1657582739.354619&cid=CTK20V944
Perhaps something else is going on?
Actually, re-running pipeline_from_decorator.py a second time (and a third time) from the command line seems to have executed without that ValueError, so maybe that issue was a fluke.
Nevertheless, those runs exit prior to the line print('process completed'), and I would definitely prefer the command executing_pipeline not to kill the process that called it.
For example, maybe, having started the pipeline I'd like my code to also report having started the pipeline to som...
What I think would be preferable is that the pipeline be deployed and that the Python process that deployed it be allowed to continue on to whatever I had planned for it to do next (i.e. not exit).
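One generic workaround, using only the standard library rather than any ClearML feature: run the pipeline entry point in a background thread. If the entry point ends by raising SystemExit (as sys.exit() does), only that thread stops; the main process continues. The helper name is mine, and executing_pipeline below stands in for the decorated pipeline function from the example:

```python
import threading

def launch_in_background(target, *args, **kwargs):
    """Run `target` in a background thread.

    If `target` ends by raising SystemExit (e.g. via sys.exit()), only
    that thread terminates; the calling process keeps running.
    """
    t = threading.Thread(target=target, args=args, kwargs=kwargs)
    t.start()
    return t

# Hypothetical usage:
# t = launch_in_background(executing_pipeline)
# ... continue with other work in this process ...
# t.join()  # optionally wait for the pipeline call later
```

Caveat: this only helps if the pipeline call exits via sys.exit/SystemExit; if it hard-exits the interpreter with os._exit, a multiprocessing.Process child would be needed instead.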
first, thanks for having these discussions. I appreciate this kind of support is an effort 🙏
Yes, I perfectly understand that once a pipeline job (or a task) is sent off in this manner, it executes separately (and most likely on a different machine) from the process that instantiated it.
I still feel strongly that such a command should not be thought of as a fire and exit operation. I can think of several scenarios where continued execution of the instantiating process is desired:
I ...
Hmm interesting, so like a callback?!
like https://github.com/allegroai/clearml/blob/bca9a6de3095f411ae5b766d00967535a13e8401/examples/pipeline/pipeline_from_tasks.py#L54-L55 pipe-step level callbacks? I guess that mechanism could serve. Where do these callbacks run? In the instantiating process? If so, that would work (since the callback function can be any code I wish, right?)
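For reference, the step-level callbacks in that linked example are plain Python functions attached via add_step; a sketch of their shape, modeled on pipeline_from_tasks.py (the function bodies and messages here are mine, and where the callbacks execute depends on where the controller runs):

```python
# Sketch of step-level callbacks for a task-based pipeline.
# Signatures follow the pipeline_from_tasks.py example: the pre-callback
# receives the controller, the node, and the parameter overrides; the
# post-callback receives the controller and the node.

def pre_step(pipeline, node, param_override):
    """Called before a step launches; the body can be arbitrary Python."""
    print(f"Launching step: {node.name}")
    return True

def post_step(pipeline, node):
    """Called after a step completes."""
    print(f"Completed step: {node.name}")

# Attached when adding a step, as in the example:
# pipe.add_step(
#     name="stage_process",
#     base_task_project="examples",
#     base_task_name="pipeline step 2 process dataset",
#     pre_execute_callback=pre_step,
#     post_execute_callback=post_step,
# )
```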
I might want to dispatch other jobs from within the same process.
This is actually something t...
Thanks ! 🎉
I'll give it a try.
I think that clearml should be able to do parameter sweeps using pipelines in a manner that makes use of parallelisation.
If that's not happening with the new RC, I wonder how I would do a parameter sweep within the pipelines framework.
For example - how would this task-based example be done with pipelines?
https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py
I'm thinking of a case where you want t...
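For concreteness, here is how a sweep might be expressed with the decorator API: a minimal sketch, assuming a hypothetical train_model component; the grid helper is plain Python, and the parallelism would rely on the controller dispatching the per-combination step calls:

```python
import itertools

def param_grid(**space):
    """Expand keyword lists into a list of parameter-combination dicts.

    param_grid(lr=[0.01, 0.1], batch_size=[32, 64]) yields 4 combinations.
    """
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(space[k] for k in keys))
    ]

# Hypothetical pipeline sketch (train_model would be a
# @PipelineDecorator.component; names and parameters are illustrative):
#
# @PipelineDecorator.pipeline(name="sweep", project="examples", version="0.1")
# def sweep_pipeline():
#     return [
#         train_model(**params)
#         for params in param_grid(lr=[0.01, 0.1], batch_size=[32, 64])
#     ]
```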
Thanks for the fix and the mock HPO example code !
Pipeline behaviour with the fix is looking good.
I see the point about changes to data inside the controller possibly causing dependencies for step 3 (or, at least, making it harder for the interpreter to know).
Sure. It is a minor change from the code in the clearml examples for pipelines.
I just repeat the last two pipeline steps from that code in a loop (x3)
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
I'm connecting to the hosted clear.ml
Packages in use are:
# Python 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
clearml == 1.6.2
fastai == 2.7.5
In case it matters, I'm running this code in a Jupyter notebook within a docker container (to keep things well isolated). The /data path is volume-mapped to my local filesystem (and, in fact, already contains the dataset files, so the fastai call to untar_data should see the data there and return immediately).
That same make_data fu...
The pipeline eventually completed after ~20 minutes, and the log shows it downloaded a 755 MB file.
I can also download the zip file from the artifacts tab for the component now.
Why is the data being up/down loaded? Can I prevent that?
I get that clearml likes to take good care of my data, but I must be doing something wrong here, as it doesn't make sense for a dataset to be uploaded to files.clear.ml.
Note that if I change the component to return a regular meaningless string, "mock_path", the pipeline completes rather quickly and the dataset is not uploaded.
Thanks TimelyPenguin76 .
From your reply I understand that I have control over what the destination is, but that all files generated in a task get transferred regardless of the return_values decorator argument. Is that correct? Can I disable auto-save of artifacts?
Ideally, I'd like to have better control over what gets auto-saved. E.g. I'm happy for tensorboard events to be captured and shown in clearml and for matplotlib figures to be uploaded (perhaps to gcs) but I'd like to avoid ...
Hi again.
Thanks for the previous replies and links, but I haven't been able to find the answer to my question: how do I prevent the content of a URI returned by a task from being saved by clearml at all?
I'm using this simplified snippet (that avoids fastai and large data)
from clearml.automation.controller import PipelineDecorator
from clearml import TaskTypes

@PipelineDecorator.component(
    return_values=["run_datasets_path"], cache=False, task_type=TaskTypes.data_processing
)
def ma...
anyhow - looks like the keys are simple enough to use (so I can just ignore the model names)
Trying the AWS Autoscaler for the first time, I get this error on instance spin-up: An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-04c0416d6bd8e4b1f]' does not exist
I tried both us-west-2 and us-east-1b (thinking it might be zone specific).
I'm not sure if this is a permissions issue or a config issue.
The same occurs when I try a different image: ami-06bafe528da33cdb8 (an AWS public image).
Just updating here that I got the AWS autoscaler working with CostlyOstrich36 ’s generous help 🎉
I thought I'd share here some details in case others experience similar difficulties
With regards to permissions, this is the list of actions that the autoscaler uses, which your AWS account would need to permit: GetConsoleOutput, RequestSpotInstances, DescribeSpotInstanceRequests, RunInstances, DescribeInstances, TerminateInstances
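For reference, those actions could be granted with an IAM policy along these lines. This is a sketch, not an official ClearML policy; in practice you would scope the Resource more tightly than "*":

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:GetConsoleOutput",
        "ec2:RequestSpotInstances",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:RunInstances",
        "ec2:DescribeInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*"
    }
  ]
}
```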
the instance image ` ami-04c0416d6bd8e...
Hi John. Sort of. It seems that archiving pipelines does not also archive the tasks that they contain, so /projects/lavi-testing/.pipelines/fastai_image_classification_pipeline is a very long list...
Cool. How can I get started with hyper datasets? Is it part of the clearml package?
Is it limited to https://clear.ml/pricing/?gclid=Cj0KCQjw5ZSWBhCVARIsALERCvzehkqVOiqJPaum5fsVyyTNMKce91PBHZd1IhQpEFaKvV7toze2A_0aAgXXEALw_wcB accounts?
This idea seems to work.
I tested this for a scenario where data is periodically added to a dataset and, to "version" the steps, I create a new dataset with the old as parent:
To do so, I split a set of image files into separate folders (pets_000, pets_001, ... pets_015), each with 500 image files
I then run the code here to make the datasets.
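The loop is roughly the following: a sketch against the clearml Dataset API (Dataset.create with parent_datasets, add_files, upload, finalize are real calls; the project name, naming scheme, and helper functions are illustrative):

```python
def version_names(n: int):
    """Folder names for n incremental chunks: pets_000, pets_001, ..."""
    return [f"pets_{i:03d}" for i in range(n)]

def add_version(folder: str, name: str, parent_id: str = None) -> str:
    """Create a dataset version from `folder`, chained to `parent_id`.

    Each new version lists the previous one as a parent, so the chain
    records how the dataset grew over time.
    """
    from clearml import Dataset  # deferred import; requires clearml installed

    ds = Dataset.create(
        dataset_project="lavi-testing",  # illustrative project name
        dataset_name=name,
        parent_datasets=[parent_id] if parent_id else None,
    )
    ds.add_files(path=folder)
    ds.upload()
    ds.finalize()
    return ds.id

# Hypothetical driver over the 16 chunk folders:
# parent = None
# for folder in version_names(16):
#     parent = add_version(folder, f"{folder}_version", parent)
```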
Thanks. Seems like I was on the right path. Do datasets specified as parents need to be finalized? ( https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk/#finalizing-a-dataset )
oops, I deleted two messages here because I had a bug in a test I've done.
I'm retesting now