Answered
Hi. I Have A Few Questions About The Snippet Attached

Hi.
I have a few questions about the snippet attached.
Re-running this code produces the same printouts: "I chose 47 out of 100" in the pipeline and "I chose 80 out of 100" within the component. Is it because the seed is reset? How would I randomise the results in the pipeline and in the component?
If I don't specify the type for N in the component I get an error because N is interpreted as a string. Do component definitions expect typing hints? What types are accepted in these definitions? If I un-comment the last two lines and rerun this script, the second pipeline call results in an error: KeyError: 'random_number_component'. Why is that?

  
  
Posted one year ago

Answers 11


Perhaps anecdotal, but just calling random.seed() with no argument will seed from the system time (or the OS randomness source, when available) for you:
https://docs.python.org/3/library/random.html#random.seed
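For example, a minimal sketch (the component name here is hypothetical, not from the attached snippet):

import random

def random_number_component(n=100):
    # random.seed() with no argument seeds from the OS randomness source if available,
    # otherwise from the current time, so each run produces a different stream
    random.seed()
    return random.randint(1, n)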

  
  
Posted one year ago

re-running this code produces the same printouts

Just to be clear, you are saying the "random" results are consistent over runs?

If I don't specify the type for N in the component I get an error because N is interpreted as a string.

Yes, the default value is used for proper casting. In the next version we will use the type hints for that as well 🙂
If I un-comment the last two lines and rerun this script, the second pipeline call results in an error:

I think that if you need multiple pipeline runs you should do:
@PipelineDecorator.pipeline(..., multi_instance_support=True)
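Roughly how that fits together, as a sketch rather than the original snippet (the component/pipeline names, project name, and the from clearml import PipelineDecorator import are assumptions; in older versions the decorator lives under clearml.automation.controller):

from clearml import PipelineDecorator
import random

@PipelineDecorator.component(return_values=["number"])
def random_number_component(n=100):
    # the default value (100) is what lets ClearML cast the incoming argument to int
    number = random.randint(1, n)
    print(f"I chose {number} out of {n} within the component")
    return number

@PipelineDecorator.pipeline(
    name="random demo",
    project="examples",
    version="0.1",
    multi_instance_support=True,  # needed to call the pipeline function more than once per script
)
def pipeline(n=100):
    print(f"I chose {random.randint(1, n)} out of {n} in the pipeline")
    print(random_number_component(n=n))

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    pipeline(n=100)
    pipeline(n=100)  # without multi_instance_support=True this second call raised the KeyError above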

  
  
Posted one year ago

Re:

re-running this code produces the same printouts

I guess repeatable behaviour is a great default to have for, well, repeatability 🙂

I'm able to "randomize" my results by adding a seed pipeline argument and calling random.seed(seed)
within the pipeline and within the component. The results then change whenever the seed changes.
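For reference, roughly what that seeding looks like (a sketch with hypothetical names; the key part is calling random.seed(seed) in both places, since the component runs as its own task):

import random
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["number"])
def random_number_component(n=100, seed=0):
    random.seed(seed)  # reseed inside the component; it executes as its own task/process
    return random.randint(1, n)

@PipelineDecorator.pipeline(name="random demo", project="examples", version="0.1")
def pipeline(n=100, seed=0):
    random.seed(seed)  # reseed inside the pipeline logic as well
    print(random.randint(1, n), random_number_component(n=n, seed=seed))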

I think most veteran ML practitioners are bitten at some point by randomising when they shouldn't and not randomising when they should. It would be nice to have some documentation explaining how randomness behaves when running tasks (in all their variations), e.g. should I trust seeds to be reset, or should I assume nothing and manage seeds myself?

  
  
Posted one year ago

The second run prints out the same (non) "random" numbers as the first run

ClearML sets the initial random seed for you, basically trying to help with reproducibility. That said, inside the function you can always do:

import random
import time

random.seed(time.time())

  
  
Posted one year ago

Thanks,

Just to be clear, you are saying the "random" results are consistent over runs ?

Yes!
By re-runs I mean re-running this script (not cloning the pipeline).

  
  
Posted one year ago

multi_instance_support=True lets me run the pipeline again 👍
The second run prints out the same (non) "random" numbers as the first run

  
  
Posted one year ago

is some explanation of how functions become pipelines and components.

That is a good point! I'll make sure we mention it in the pipeline section of the docs

This whole experiment with random numbers started as my attempt at verifying that code in clearml.pipeline

Correct 🙂 BTW: this is why we added PipelineDecorator.run_locally() so it is easier to debug. You can also use PipelineDecorator.debug_pipeline() to run the entire pipeline in a single process, as plain Python functions.
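For example (a sketch; the pipeline name and project are placeholders):

from clearml import PipelineDecorator

@PipelineDecorator.pipeline(name="random demo", project="examples", version="0.1")
def pipeline():
    print("pipeline body runs here")

if __name__ == "__main__":
    # Execute the pipeline logic on this machine; components still run as separate local tasks
    PipelineDecorator.run_locally()
    # Alternatively, PipelineDecorator.debug_pipeline() runs everything in a single process,
    # as plain Python function calls, which is convenient for stepping through with a debugger
    pipeline()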

  
  
Posted one year ago

That is a good point, I'll make sure we mention it somewhere in the docs. Any thoughts on where?

maybe in (all of) these places:
https://clear.ml/docs/latest/docs/faq
https://clear.ml/docs/latest/docs/fundamentals/task
https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk

  
  
Posted one year ago

Thanks! (and good to know)

  
  
Posted one year ago

It would be nice to have some documentation explaining how randomness behaves when running tasks (in all their variations), e.g. should I trust seeds to be reset, or should I assume nothing and manage seeds myself?

That is a good point, I'll make sure we mention it somewhere in the docs. Any thoughts on where?

  
  
Posted one year ago

Something else that I feel is missing from the docs regarding pipelines, as someone who has given kubeflow pipelines a try (in the http://vertex.ai pipelines environment), is some explanation of how functions become pipelines and components.
More specifically, I've learned to watch out for kubeflow pipeline code which is run at definition time (at compilation time, to be more accurate) instead of at pipeline execution time.

This whole experiment with random numbers started as my attempt at verifying that code in clearml.pipeline is executed at pipeline execution time 😝
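A quick way to see the difference in a ClearML pipeline script (a sketch with hypothetical names): anything at module level runs when the pipeline is defined, while the component body only runs when the step executes.

from clearml import PipelineDecorator
import random

print("definition time: runs as soon as the script is loaded/compiled")

@PipelineDecorator.component(return_values=["number"])
def random_number_component(n=100):
    print("execution time: runs only when the pipeline actually executes this step")
    return random.randint(1, n)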

  
  
Posted one year ago