> re-running this code produces the same printouts
I guess repeatable behaviour is a great default to have for, well, repeatability 🙂
I'm able to "randomize" my results by adding a seed pipeline argument and calling random.seed(seed) within the pipeline and the component. The results then change when the seed changes.
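For reference, a minimal sketch of the pattern I mean (the names draw_numbers / my_pipeline and the project/name values are made up, not from the actual script):

```python
import random

from clearml import PipelineDecorator


@PipelineDecorator.component(return_values=["numbers"])
def draw_numbers(seed: int = 0, n: int = 5):
    # components run in their own process, so import inside the function
    import random
    random.seed(seed)  # re-seed inside the component
    return [random.random() for _ in range(n)]


@PipelineDecorator.pipeline(name="seed-demo", project="examples", version="0.1")
def my_pipeline(seed: int = 42):
    random.seed(seed)  # re-seed inside the pipeline logic as well
    print(draw_numbers(seed=seed))


if __name__ == "__main__":
    PipelineDecorator.run_locally()  # keep execution on this machine
    my_pipeline(seed=123)  # changing the seed changes the printouts
```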
I think most veteran ML practitioners are bitten at some point by randomising when they shouldn't and by not randomising when they should. It would be nice to have some documentation explaining how randomness behaves when running tasks (in all their variations), e.g. should I trust seeds to be reset, or should I not assume anything and manage the seeds myself?
> It would be nice to have some documentation explaining how randomness behaves when running tasks (in all their variations), e.g. should I trust seeds to be reset, or should I not assume anything and manage the seeds myself?
That is a good point, I'll make sure we mention it somewhere in the docs. Any thoughts on where?
> re-running this code produces the same printouts
Just to be clear, you are saying the "random" results are consistent over runs?
If I don't specify the type for N in the component, I get an error because N is interpreted as a string.
Yes, the default value is used for proper casting. In the next version we will use the type hints for that as well 🙂
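A minimal sketch of that, assuming a hypothetical component sum_first:

```python
from clearml import PipelineDecorator


# Hypothetical component: when the step runs as a task, its arguments come
# back from the task parameters as text; the default value (an int here) is
# what currently tells ClearML to cast "10" back to 10.
@PipelineDecorator.component(return_values=["total"])
def sum_first(N: int = 10):
    return sum(range(N))
```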
> If I un-comment the last two lines and rerun this script, the second pipeline call results in an error:
I think that if you need multiple pipeline runs you should do:
@PipelineDecorator.pipeline(..., multi_instance_support=True)
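i.e. something along these lines (a sketch; the pipeline name/project and my_pipeline are made-up placeholders):

```python
from clearml import PipelineDecorator


@PipelineDecorator.pipeline(
    name="seed-demo",
    project="examples",
    version="0.1",
    multi_instance_support=True,  # allow calling the pipeline more than once
)
def my_pipeline(seed: int = 42):
    ...


if __name__ == "__main__":
    PipelineDecorator.run_locally()
    my_pipeline(seed=1)
    my_pipeline(seed=2)  # the second call no longer raises
```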
> is some explanation of how functions become pipelines and components.
That is a good point! I'll make sure we mention it in the pipeline section of the docs
> This whole experiment with random numbers started as my attempt at verifying that code in clearml.pipeline
Correct 🙂 BTW: this is why we added PipelineDecorator.run_locally() so it is easier to debug. You can also use PipelineDecorator.debug_pipeline() to run the entire pipeline in a single process as plain python functions.
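e.g. a sketch of how those two calls are used (assuming a pipeline function my_pipeline as above):

```python
from clearml import PipelineDecorator

if __name__ == "__main__":
    # run each component as a local subprocess instead of a remote task:
    PipelineDecorator.run_locally()
    # ...or run the whole pipeline in a single process as plain functions,
    # which makes it easy to step through with a debugger:
    # PipelineDecorator.debug_pipeline()
    my_pipeline(seed=42)  # my_pipeline: the pipeline function sketched earlier
```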
> That is a good point, I'll make sure we mention it somewhere in the docs. Any thoughts on where?
maybe in (all of) these places:
https://clear.ml/docs/latest/docs/faq
https://clear.ml/docs/latest/docs/fundamentals/task
https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk
Thanks,
> Just to be clear, you are saying the "random" results are consistent over runs?
yes!
By re-runs I mean re-running this script (not cloning the pipeline)
multi_instance_support=True lets me run the pipeline again 👍
The second run prints out the same (non) "random" numbers as the first run
> The second run prints out the same (non) "random" numbers as the first run
ClearML sets the initial random seed for you, basically trying to help with reproducibility. That said, inside the function you can always do:
```python
import random
import time

random.seed(time.time())
```
Perhaps anecdotal, but just calling random.seed() with no argument will seed from an OS randomness source (falling back to the current time) for you:
https://docs.python.org/3/library/random.html#random.seed
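i.e., per the linked docs, a two-liner like:

```python
import random

random.seed()           # no argument: seeded from OS randomness (or the time)
print(random.random())  # varies from run to run, unlike with a fixed seed
```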
Something else that I feel is missing from the docs regarding pipelines, as someone who has given Kubeflow pipelines a try (in the http://vertex.ai pipelines environment), is some explanation of how functions become pipelines and components.
More specifically, I've learned to watch out for Kubeflow pipeline code which is run at definition time (at compilation time, to be more accurate) instead of at pipeline execution time.
This whole experiment with random numbers started as my attempt at verifying that code in clearml.pipeline is executed at pipeline execution time 😝
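For what it's worth, the check boils down to plain Python semantics: module-level code runs whenever the script is loaded, while the component body should only run when the pipeline executes. A sketch of the kind of probe I mean (hypothetical names):

```python
from clearml import PipelineDecorator

print("definition time: runs whenever this module is loaded/compiled")


@PipelineDecorator.component(return_values=["value"])
def probe():
    # if this only prints while the pipeline runs (not at import time),
    # the component body is indeed executed at pipeline execution time
    print("execution time: runs only when the pipeline actually runs")
    return 1
```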