Hello! I am trying to play around with the platform in order to gain some understanding of it. I am using this example:

https://github.com/allegroai/clearml/tree/master/examples/pipeline

I have been able to make it work, running the agent on my laptop and using the demo server. I have run the pipeline twice, with the same parameters, and I see some things that confuse me a little bit:
Both runs have generated different copies of the iris dataset (with different prefixes, which I guess are related to the task_id, although I cannot establish the connection). The same goes for X_test, y_test, etc.: both copies have been generated. However, the model has been overwritten. I guess this is due to this instruction: joblib.dump(model, 'model.pkl', compress=True). Maybe this is the normal way to go, but I wish I could understand better why. I also don't understand how ClearML knows that the dumped model is what should be registered as the "output model" of the task.
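
To make the question concrete, the training step in that example boils down to something like the following (I am paraphrasing from memory, so the exact project/task names may differ):

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from clearml import Task

    # project/task names paraphrased from the example, they may not match exactly
    task = Task.init(project_name='examples', task_name='pipeline step 3 train model')

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)

    # the local path is the same on every run, so the file gets overwritten;
    # ClearML hooks joblib.dump and registers the file as the task's output model
    joblib.dump(model, 'model.pkl', compress=True)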

If I wanted to reuse the previous tasks' outputs (in case neither code nor parameters nor data have changed), as I said in my conversation with Martin last week, how could I change the pipeline_controller.py script?

I am sorry for asking basic questions...

Posted 3 years ago

Answers 2


Hi ShinyWhale52
Every execution of the pipeline (by definition) will create a new job based on the pipeline steps.
This is the reason you see all the steps twice: the default assumption is that you wish to re-run each step, as this is part of the processing workflow (e.g. training a model).
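
For reference, a minimal controller along the lines of that example looks something like this (a sketch; the base task names are paraphrased). Every add_step references a template Task, and each pipeline run clones the templates into new jobs:

    from clearml.automation.controller import PipelineController

    # minimal pipeline controller sketch; project/task names are illustrative
    pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)

    # each step points at a "template" task; on every run the controller
    # clones the template into a new job and enqueues it, which is why you
    # see fresh copies of every step (and of their artifacts)
    pipe.add_step(name='stage_data',
                  base_task_project='examples',
                  base_task_name='pipeline step 1 dataset artifact')
    pipe.add_step(name='train_model', parents=['stage_data'],
                  base_task_project='examples',
                  base_task_name='pipeline step 3 train model')

    pipe.start()
    pipe.wait()
    pipe.stop()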

the model has been overwritten. I guess this is due to this instruction:

This is because you are storing it locally to the same path; the overwrite just reflects the fact that you overwrote your local file.
To create a new unique copy of the model on the clearml-server (or any other object storage), pass output_uri to the Task.init call in the specific step (or configure a default_output_uri in the clearml.conf of the agent):

    Task.init(..., output_uri='s3://my_bucket/storage')

or, to store on the clearml-server:

    Task.init(..., output_uri=True)
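
In the training step from the example, that is the only change needed; a minimal sketch (names paraphrased as before):

    from clearml import Task

    # with output_uri set, every run uploads its model snapshot to a unique
    # destination (derived from the task id) instead of only overwriting the
    # local model.pkl; use output_uri='s3://my_bucket/storage' for object storage
    task = Task.init(project_name='examples',
                     task_name='pipeline step 3 train model',
                     output_uri=True)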

If I wanted to reuse the previous tasks' outputs (in case neither code nor parameters nor data have changed), as I said in my conversation with Martin last week, how could I change the pipeline_controller.py script?

So the question is: what exactly is the logic for reusing Tasks?
If this is like a parameter for the "Dataset", then adding a parameter to the Pipeline itself makes a lot of sense (the pipeline is also a Task, so we can add arguments that we can later control from the UI). A sketch of what I mean is below.
wdyt?
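
As a sketch of what that could look like (the parameter name is made up, and I am assuming a clearml version where PipelineController supports add_parameter and cache_executed_step):

    from clearml.automation.controller import PipelineController

    pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)

    # hypothetical pipeline-level argument; since the controller is itself a
    # Task, this shows up in the UI and can be changed before each run
    pipe.add_parameter(name='reuse_dataset', default=True)

    # cache_executed_step reuses a previously executed step when its code
    # and parameters are unchanged, instead of launching a new clone
    pipe.add_step(name='stage_data',
                  base_task_project='examples',
                  base_task_name='pipeline step 1 dataset artifact',
                  cache_executed_step=True)

    pipe.start()
    pipe.wait()
    pipe.stop()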

  
  
Posted 3 years ago

Thank you very much, Martin. Step by step I am understanding the platform better (and the more I do, the more I like it!). If you don't mind, I will write down a summary of a use case for reusing Tasks, taken from a recent project I did using Luigi.

  
  
Posted 3 years ago