Answered
Hi. I Have A Few Questions About The Snippet Attached

Hi.
I have a few questions about the snippet attached
Re-running this code produces the same printouts:
... I chose 47 out of 100 in the pipeline
... I chose 80 out of 100 within component
Is it because the seed is reset? How would I randomise the results in the pipeline and in the component?
If I don't specify the type for N in the component, I get an error because N is interpreted as a string. Do component definitions expect type hints? What types are accepted in these definitions?
If I un-comment the last two lines and rerun this script, the second pipeline call results in an error: KeyError: 'random_number_component'. Why is that?

  
  
Posted 2 years ago

Answers 11


re-running this code produces the same printouts

I guess repeatable behaviour is a great default to have for, well, repeatability 🙂

I'm able to "randomize" my results by adding a seed pipeline argument and calling random.seed(seed) within the pipeline and component. Results then change when the seed changes.
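A minimal sketch of that approach, using plain Python functions so it runs without ClearML installed. The names pick_number and run_pipeline are hypothetical stand-ins for the decorated component and pipeline functions, and deriving the component's seed as seed + 1 is just one assumed way to keep the two draws from sharing a seed:

```python
import random


def pick_number(n: int = 100, seed: int = 0) -> int:
    """Hypothetical stand-in for the component: seed explicitly, then draw."""
    random.seed(seed)  # explicit seed, so the draw depends only on `seed`
    return random.randint(0, n)


def run_pipeline(seed: int = 0) -> tuple:
    """Hypothetical stand-in for the pipeline function."""
    random.seed(seed)
    in_pipeline = random.randint(0, 100)          # drawn in the pipeline body
    in_component = pick_number(100, seed + 1)     # drawn inside the component
    return in_pipeline, in_component


if __name__ == "__main__":
    print(run_pipeline(seed=1))
    print(run_pipeline(seed=2))
```

Running with the same seed repeats the same pair of numbers; passing a different seed is what changes the results.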

I think most veteran ML practitioners are bitten at some point by randomising when they shouldn't and not randomising when they should. It would be nice to have some documentation proclaiming how randomness behaves when running tasks (in all their variations). E.g. Should I trust seeds to be reset or should I not assume anything and do my own control over seeds.

  
  
Posted 2 years ago

It would be nice to have some documentation proclaiming how randomness behaves when running tasks (in all their variations). E.g. Should I trust seeds to be reset or should I not assume anything and do my own control over seeds.

That is a good point, I'll make sure we mention it somewhere in the docs. Any thoughts on where?

  
  
Posted 2 years ago

re-running this code produces the same printouts

Just to be clear, you are saying the "random" results are consistent over runs?

If I don't specify the type for N in the component I get an error because N is interpreted as a string.

Yes, the default value is used for proper casting. In the next version we will use the type hints for that as well 🙂
If I un-comment the last two lines and rerun this script, the second pipeline call results in an error:

I think that if you need multiple pipeline runs you should do:
@PipelineDecorator.pipeline(..., multi_instance_support=True)
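On the typing question earlier in this answer: the idea of using a default value to decide how to cast an argument that arrives as a string can be illustrated roughly as below. This is only a sketch of the general idea, not ClearML's actual implementation, and cast_like_default is a hypothetical helper name:

```python
def cast_like_default(raw: str, default):
    """Cast a string argument to the type of its default value (illustrative only)."""
    if default is None:
        # No default to infer a type from, so the value stays a string.
        return raw
    if isinstance(default, bool):
        # bool("False") would be True, so booleans are parsed explicitly.
        return raw.strip().lower() in ("1", "true", "yes")
    # For int, float, str and similar types, call the default's type on the string.
    return type(default)(raw)


if __name__ == "__main__":
    print(cast_like_default("100", 0))       # default is an int
    print(cast_like_default("100", None))    # no default, stays a string
```

Under this scheme a parameter declared as N=100 would be cast back to an integer, while a parameter with no default has nothing to guide the cast and stays a string, which matches the behaviour described in the question.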

  
  
Posted 2 years ago

is some explanation of how functions become pipelines and components.

That is a good point! I'll make sure we mention it in the pipeline section of the docs

This whole experiment with random numbers started as my attempt at verifying that code in clearml.pipeline

Correct 🙂 BTW: this is why we added PipelineDecorator.run_locally(), so it is easier to debug. You can also use PipelineDecorator.debug_pipeline() to run the entire pipeline in a single process as Python functions.

  
  
Posted 2 years ago

Thanks! (and good to know)

  
  
Posted 2 years ago

That is a good point, I'll make sure we mention it somewhere in the docs. Any thoughts on where?

maybe in (all of) these places:
https://clear.ml/docs/latest/docs/faq
https://clear.ml/docs/latest/docs/fundamentals/task
https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk

  
  
Posted 2 years ago

Thanks,

Just to be clear, you are saying the "random" results are consistent over runs?

Yes!
By re-runs I mean re-running this script (not cloning the pipeline)

  
  
Posted 2 years ago

multi_instance_support=True lets me run the pipeline again 👍
The second run prints out the same (non) "random" numbers as the first run

  
  
Posted 2 years ago

The second run prints out the same (non) "random" numbers as the first run

ClearML sets the initial random seed for you, basically trying to help with reproducibility. That said, inside the function you can always do:
import random
import time
random.seed(time.time())

  
  
Posted 2 years ago

perhaps anecdotal but just calling random.seed() will set the seed for you, using OS randomness sources where available and otherwise the system time
https://docs.python.org/3/library/random.html#random.seed
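A small standard-library sketch of the difference between a fixed seed and reseeding with no argument. The helper names draws_with_seed and draws_reseeded are made up for illustration; per the Python docs linked above, random.seed() with no argument uses the operating system's randomness sources if they are available and falls back to the system time otherwise:

```python
import random


def draws_with_seed(seed: int, count: int = 3) -> list:
    """Seed explicitly, then draw: repeatable for a given seed."""
    random.seed(seed)
    return [random.randint(0, 100) for _ in range(count)]


def draws_reseeded(count: int = 3) -> list:
    """Reseed with no argument, then draw: not repeatable across calls or runs."""
    random.seed()  # no argument: OS entropy if available, else system time
    return [random.randint(0, 100) for _ in range(count)]


if __name__ == "__main__":
    print(draws_with_seed(1234))
    print(draws_with_seed(1234))  # same list as the line above
    print(draws_reseeded())       # expected to vary between runs
```

Calling random.seed() at the top of the component should therefore undo any seed that was set before the function body runs, assuming nothing reseeds afterwards.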

  
  
Posted 2 years ago

Something else that I feel is missing from the docs regarding pipelines, as someone who has given Kubeflow Pipelines a try (in the http://vertex.ai pipelines environment), is some explanation of how functions become pipelines and components.
More specifically, I've learned to watch out for Kubeflow pipeline code which is run at definition time (at compilation time, to be more accurate) instead of at pipeline execution time.

This whole experiment with random numbers started as my attempt at verifying that code in clearml.pipeline is executed at pipeline execution time 😝
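To make the definition-time versus execution-time distinction concrete, here is a plain-Python sketch. The component decorator and the EVENTS log are hypothetical and only mimic the shape of a pipeline decorator; they say nothing about how ClearML or Kubeflow actually build their components:

```python
import functools
import random

EVENTS = []  # records what ran, and in which order


def component(func):
    """Hypothetical decorator: its body runs once, when the function is defined."""
    EVENTS.append(f"defined {func.__name__}")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append(f"executed {func.__name__}")  # runs on every call
        return func(*args, **kwargs)

    return wrapper


@component
def pick(n=100, drawn_at_definition=random.randint(0, 100)):
    # The default above was evaluated once, when the `def` statement ran.
    drawn_at_execution = random.randint(0, n)  # evaluated on each call
    return drawn_at_definition, drawn_at_execution
```

Calling pick() twice returns the same first element both times, because the default was drawn at definition time, while the second element is drawn again on every call. Printing a random number inside the function body, as in the original experiment, is the same kind of check.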

  
  
Posted 2 years ago