perhaps anecdotal but just calling random.seed()
will set the seed using the system time for you
https://docs.python.org/3/library/random.html#random.seed
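To illustrate what I mean (a minimal sketch using only the standard library; per the docs, seeding without an argument falls back to an OS randomness source or the current time):

    import random

    random.seed(42)          # fixed seed -> the same sequence on every run
    print(random.random())   # identical value each time this script runs

    random.seed()            # no argument -> seeded from OS entropy / system time
    print(random.random())   # differs between runs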
Hmm interesting, so like a callback?!
like https://github.com/allegroai/clearml/blob/bca9a6de3095f411ae5b766d00967535a13e8401/examples/pipeline/pipeline_from_tasks.py#L54-L55 pipe-step level callbacks? I guess that mechanism could serve. Where do these callbacks run? In the instantiating process? If so, that would work (since the callback function can be any code I wish, right?)
I might want to dispatch other jobs from within the same process.
This is actually something t...
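Roughly what I have in mind, based on that linked example (a sketch only; the callback signature is what I understood from the example file, and the project/task names are placeholders):

    from clearml import PipelineController

    def step_done(pipeline, node):
        # if the controller runs locally, this executes in the instantiating process,
        # so I could dispatch other jobs from here
        print(f"step {node.name} finished, kicking off follow-up work...")

    pipe = PipelineController(name="example pipeline", project="examples", version="1.0")
    pipe.add_step(
        name="stage_train",
        base_task_project="examples",
        base_task_name="train task",
        post_execute_callback=step_done,
    )
    pipe.start_locally()   # keep the controller (and its callbacks) in this process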
Re
re-running this code produces the same printouts
I guess repeatable behaviour is a great default to have for, well, repeatability 🙂
I'm able to "randomize" my results by adding a seed
pipeline argument and calling random.seed(seed)
within the pipeline and component. Results then change when the seed changes.
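Roughly what that looks like (a sketch with my own names, following the decorator examples):

    from clearml.automation.controller import PipelineDecorator

    @PipelineDecorator.component(cache=False)
    def make_numbers(seed: int):
        import random
        random.seed(seed)                    # re-seed inside the component
        return [random.random() for _ in range(3)]

    @PipelineDecorator.pipeline(name="seed_demo", project="examples", version="0.1")
    def seed_pipeline(seed: int = 0):
        import random
        random.seed(seed)                    # re-seed inside the pipeline logic too
        print(make_numbers(seed=seed))

    if __name__ == "__main__":
        PipelineDecorator.run_locally()
        seed_pipeline(seed=17)               # changing the seed changes the "random" results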
I think most veteran ML practitioners are bitten at some point by randomising when they shouldn't and not randomising when they should. It would be nice to have some docu...
Thanks ! 🎉
I'll give it a try.
I think that clearml should be able to do parameter sweeps using pipelines in a manner that makes use of parallelisation.
If that's not happening with the new RC, I wonder how I would do a parameter sweep within the pipelines framework.
For example - how would this task-based example be done with pipelines?
https://github.com/allegroai/clearml/blob/master/examples/automation/manual_random_param_search_example.py
I'm thinking of a case where you want t...
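Something along these lines is what I imagine (only a sketch; whether the steps actually run in parallel is exactly what I'm asking about):

    from clearml.automation.controller import PipelineDecorator

    @PipelineDecorator.component()
    def train_eval(lr: float):
        # stand-in for a real train/eval step; returns a score for this value
        return 1.0 - lr

    @PipelineDecorator.pipeline(name="param_sweep", project="examples", version="0.1")
    def sweep(learning_rates=(0.1, 0.01, 0.001)):
        # each call should become its own pipeline step; the loop is the sweep
        scores = [train_eval(lr=lr) for lr in learning_rates]
        for lr, score in zip(learning_rates, scores):
            print(f"lr={lr} -> score={score}")   # consuming the outputs waits for the steps

    if __name__ == "__main__":
        PipelineDecorator.run_locally()
        sweep()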
Something else that I feel is missing from the docs regarding pipelines, as someone who has given kubeflow pipelines a try (in the http://vertex.ai pipelines environment), is some explanation of how functions become pipelines and components.
More specifically, I've learned to watch out for kubeflow pipeline code which is run at definition time (at compilation time, to be more accurate) instead of at pipeline execution time.
This whole experiment with random numbers started as my attempt ...
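The pitfall I mean, written as a generic (non-clearml) Python illustration:

    import datetime

    def build_pipeline():
        # evaluated ONCE, when the pipeline is defined/compiled
        compiled_at = datetime.datetime.now()

        def step():
            # evaluated every time the pipeline actually runs
            run_at = datetime.datetime.now()
            print(f"compiled at {compiled_at}, running at {run_at}")

        return step

Anything computed at the outer level is frozen into the pipeline definition, which is easy to miss.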
Is there any chance the experiment itself has a docker image specified?
It does not as far as I know. The decorators do not have docker fields specified
If I run from the terminal, I see: ValueError: Task object can only be updated if created or in_progress [status=stopped fields=['configuration']]
Hi John. Sort of. It seems that archiving pipelines does not also archive the tasks that they contain, so /projects/lavi-testing/.pipelines/fastai_image_classification_pipeline
is a very long list...
Did you mean that I was running in CPU mode? I tried both, but I'll try CPU mode with that base docker image.
Thanks AgitatedDove14
setting max_workers to 1 prevents the error (but, I assume, it may come at the cost of slower sequential uploads).
My main concern now is that this may happen within a pipeline leading to unreliable data handling.
If Dataset.upload()
does not crash or return a success value that I can check, and if Dataset.get_local_copy()
also does not complain as it retrieves partial data - how will I ever know that I lost part of my dataset?
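What I would like to be able to write is roughly this (a sketch; I'm not sure which of these calls actually reports partial failures, which is the point):

    from clearml import Dataset

    ds = Dataset.create(dataset_project="examples", dataset_name="my_data")
    ds.add_files("local_data/")
    try:
        ds.upload(show_progress=True)   # I'd want this to raise (or return False) on a partial upload
        ds.finalize()
    except Exception as exc:
        print("upload failed, not finalizing:", exc)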
Trying to switch to a resource using gpu-enabled VMs failed with that same error above.
Looking at the spawned VMs, they were spawned by the autoscaler without a GPU even though I checked that my settings ( n1-standard-1
and nvidia-tesla-t4
and https://console.cloud.google.com/compute/imagesDetail/projects/ml-images/global/images/c0-deeplearning-common-cu113-v20220701-debian-10?project=ml-tooling-test-external image for the VM) can be used to make vm instances and my gcp autoscaler...
Hi TimelyPenguin76
Thanks for working on this. The clearml gcp autoscaler is a major feature for us to have. I can't really evaluate clearml without some means of instantiating multiple agents on GCP machines, and I'd really prefer not to have to set up a k8s cluster with agents and manage scaling it myself.
I tried the settings above with two resources, one for default queue and one for the services queue (making sure I use that image you suggested above for both).
The autoscaler started up...
I can't find version 1.8.1rc1
but I believe I see a relevant change in the code of Dataset.upload
in 1.8.1rc0
I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics and a pandas table that describe the data, which I wish to have associated with the dataset and viewable from the clearml web page for the dataset.
For a component, task = Task.current_task()
will get me the task object (right?).
This does not work for the pipeline. Is the pipeline a task?
Edit: the same works for the pipeline.
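What I ended up trying looks roughly like this (a sketch; I'm assuming Dataset.get_logger() is the right way to attach previews to the dataset's page):

    import pandas as pd
    import matplotlib.pyplot as plt
    from clearml import Dataset

    ds = Dataset.create(dataset_project="examples", dataset_name="images_with_stats")
    ds.add_files("data/")

    fig, ax = plt.subplots()
    ax.hist([1, 2, 2, 3])                 # some figure describing the data
    stats = pd.DataFrame({"n_images": [1234], "mean_size_kb": [56.7]})

    logger = ds.get_logger()
    logger.report_matplotlib_figure(title="class histogram", series="preview", iteration=0, figure=fig)
    logger.report_table(title="summary stats", series="preview", iteration=0, table_plot=stats)

    ds.upload()
    ds.finalize()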
Here are screen shots of a VM I started with a GPU and one started by the autoscaler with the settings above but whose GPU is missing (both in the same gcp zone, us-central1-f). I may have misconfigured something or perhaps the autoscaler is failing to specify the GPU requirement correctly. :shrug:
oops, should it have been multi_instance_support=True ?
I'll try a more carefully checked run a bit later but I know it's getting a bit late in your time zone
actually, re-running pipeline_from_decorator.py
a second time (and a third time) from the command line seem to have executed without that ValueError, so maybe that issue was a fluke.
Nevertheless, those runs exit prior to the line print('process completed')
and I would definitely prefer the command executing_pipeline
to not kill the process that called it.
For example, maybe, having started the pipeline I'd like my code to also report having started the pipeline to som...
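Concretely, what I would like to be able to write is something like this (wishful sketch, not what happens today as far as I can tell; notify_somewhere is a hypothetical helper of mine):

    if __name__ == "__main__":
        executing_pipeline(pickle_url="...")   # launch the pipeline
        notify_somewhere("pipeline started")   # follow-up code I still want to run
        print("process completed")             # currently this line is never reached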
You can have parents as one of the @PipelineDecorator.component args. The step will be executed only after all the parents are executed and completed
Is there an example of using parents some place? I'm not sure what to pass, and also how to pass a component from one pipeline that was just kicked off to execute remotely (which I'd like to block on) to a component of the next pipeline's run
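My best guess at what the parents usage looks like (a sketch; I'm assuming a step's name defaults to the decorated function's name):

    from clearml.automation.controller import PipelineDecorator

    @PipelineDecorator.component()
    def prepare():
        return "ready"

    @PipelineDecorator.component(parents=["prepare"])   # run only after the "prepare" step completes
    def train():
        return "model"

    @PipelineDecorator.pipeline(name="parents_demo", project="examples", version="0.1")
    def parents_pipeline():
        prepare()
        train()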
Thanks,
Just to be clear, you are saying the "random" results are consistent over runs?
yes !
By re-runs I mean re-running this script (not cloning the pipeline)
I'm on clearml 1.6.2
The jupyter notebook service and two clearml agents (version 1.3.0, one in queue "default" and one in queue "services", with the --cpu-only flag) are all running inside a docker container
I was doing it with the task that I had been using. Mostly for logging arguments that control what the dataset will contain.
On the same topic: what if (I were able to iterate and) I wanted the pipeline calls to be blocking, so that the next pipeline executes only after the previous one completes?
Yeah. I was only using the task for the process of creating the dataset.
My code does start out with a step that checks for the existence of the dataset, returning it if it exists (search by project name/dataset name/version) rather than recreating it.
I noticed the name mismatch when that check kept failing me...
I think that init-ing the encompassing task with the relevant dataset name still allows me to search for the dataset by dataset_name=task_name / project_name (shared by both datas...
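The existence check itself is roughly (a sketch; I'm matching on project/name and treating the not-found error as "create it"):

    from clearml import Dataset

    def get_or_create_dataset(project: str, name: str):
        try:
            return Dataset.get(dataset_project=project, dataset_name=name)     # reuse if it exists
        except ValueError:
            return Dataset.create(dataset_project=project, dataset_name=name)  # otherwise recreate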
What I think would be preferable is that the pipeline be deployed and that the python process that deployed it were allowed to continue on to whatever I had planned for it to do next (i.e. not exit)
I believe n1-standard-8
would work for that. I initially just tried going with the autoscaler defaults, which have the GPU on but with n1-standard-1
specified as the machine.
I have google-cloud-storage==2.6.0
installed