Hello! I am trying to play around with the platform in order to gain some understanding of it. I am using this example:

https://github.com/allegroai/clearml/tree/master/examples/pipeline

I have been able to make it work, running the agent on my laptop and using the demo server. I have run the pipeline twice, with the same parameters, and I see some things that confuse me a little bit:
Both runs have generated different copies of the iris dataset (with different prefixes, which I guess are related to the task_id, although I cannot establish the connection). The same goes for X_test, y_test, etc.: both copies have been generated. However, the model has been overwritten. I guess this is due to this instruction: joblib.dump(model, 'model.pkl', compress=True). Maybe this is the normal way to go, but I wish I could understand better why. I also don't understand how ClearML knows that the dumped model is what should be registered as the "output model" of the task.
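
To make the question concrete, the training step in that example boils down to something like the following (I am paraphrasing from memory, so the exact project/task names may differ):

    import joblib
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from clearml import Task

    # project/task names paraphrased from the example, they may not match exactly
    task = Task.init(project_name='examples', task_name='pipeline step 3 train model')

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)

    # the local path is the same on every run, so the file gets overwritten;
    # ClearML hooks joblib.dump and registers the file as the task's output model
    joblib.dump(model, 'model.pkl', compress=True)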

If I wanted to reuse the previous tasks' outputs (in case neither code nor parameters nor data have changed), as I said in my conversation with Martin last week, how could I change the pipeline_controller.py script?

I am sorry for asking basic questions...

Posted 3 years ago

Answers 2


Hi ShinyWhale52
Every execution of the pipeline (by definition) will create a new job based on the pipeline steps.
This is the reason you see all the steps twice: the default assumption is that you wish to re-run each step, as this is part of the processing workflow (e.g. training a model).
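
For reference, a minimal controller along the lines of that example looks something like this (a sketch; the base task names are paraphrased). Every add_step references a template Task, and each pipeline run clones the templates into new jobs:

    from clearml.automation.controller import PipelineController

    # minimal pipeline controller sketch; project/task names are illustrative
    pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)

    # each step points at a "template" task; on every run the controller
    # clones the template into a new job and enqueues it, which is why you
    # see fresh copies of every step (and of their artifacts)
    pipe.add_step(name='stage_data',
                  base_task_project='examples',
                  base_task_name='pipeline step 1 dataset artifact')
    pipe.add_step(name='train_model', parents=['stage_data'],
                  base_task_project='examples',
                  base_task_name='pipeline step 3 train model')

    pipe.start()
    pipe.wait()
    pipe.stop()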

the model has been overwritten. I guess this is due to this instruction:

This is because you are storing it locally to the same path; the overwrite just reflects the fact that you overwrote your local file.
To create a new unique copy of the model on the clearml-server (or any other object storage), pass output_uri to the Task.init call in the specific step (or configure a default_output_uri in the clearml.conf of the agent):

    Task.init(..., output_uri='s3://my_bucket/storage')

or, to store on the clearml-server:

    Task.init(..., output_uri=True)
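
In the training step from the example, that is the only change needed; a minimal sketch (names paraphrased as before):

    from clearml import Task

    # with output_uri set, every run uploads its model snapshot to a unique
    # destination (derived from the task id) instead of only overwriting the
    # local model.pkl; use output_uri='s3://my_bucket/storage' for object storage
    task = Task.init(project_name='examples',
                     task_name='pipeline step 3 train model',
                     output_uri=True)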

If I wanted to reuse the previous tasks' outputs (in case neither code nor parameters nor data have changed), as I said in my conversation with Martin last week, how could I change the pipeline_controller.py script?

So the question is: what exactly is the logic for reusing Tasks?
If this is like a parameter for the "Dataset", then adding a parameter to the Pipeline itself makes a lot of sense (the pipeline is also a Task, so we can add arguments that we can later control from the UI). A sketch of what I mean is below.
wdyt?
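
As a sketch of what that could look like (the parameter name is made up, and I am assuming a clearml version where PipelineController supports add_parameter and cache_executed_step):

    from clearml.automation.controller import PipelineController

    pipe = PipelineController(default_execution_queue='default', add_pipeline_tags=False)

    # hypothetical pipeline-level argument; since the controller is itself a
    # Task, this shows up in the UI and can be changed before each run
    pipe.add_parameter(name='reuse_dataset', default=True)

    # cache_executed_step reuses a previously executed step when its code
    # and parameters are unchanged, instead of launching a new clone
    pipe.add_step(name='stage_data',
                  base_task_project='examples',
                  base_task_name='pipeline step 1 dataset artifact',
                  cache_executed_step=True)

    pipe.start()
    pipe.wait()
    pipe.stop()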

  
  
Posted 3 years ago

Thank you very much, Martin. Step by step I am understanding the platform better (and the more I do, the more I like it!). If you don't mind, I will write down a summary of a use case for reusing Tasks, taken from a recent project I did using Luigi.

  
  
Posted 3 years ago