What happens if Task.init doesn't happen in the same .py file as the "data science" code?

Answered

What happens if Task.init isn't called in the same .py file as the "data science" code?

I have a list of classes that do the coding, and I initialise the task outside of them. Something like:

    task = Task.init(...)
    for foo in foos:
        foo.run()

where all kinds of dataframe/XGBoost/SQL work happens inside foo.run().

  				
Posted 3 years ago by ConvolutedSealion94


Answers 30

    # Save the model together with its training/test data in one pickle file.
    with open(self.output_filename, 'wb') as f:
        pickle.dump({
            'model': model,
            'X_train': X_train, 'Y_train': Y_train,
            'X_test': X_test, 'Y_test': Y_test,
            'impute_values': impute_values,
        }, f)

  				
Posted 3 years ago by ConvolutedSealion94

Why do I need an output_uri for the model saving? The dataset API can figure this out on its own

  				
Posted 3 years ago by ConvolutedSealion94

This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas

What's the value of running outside of an experiment management context? Don't you want to log it?
There is no real penalty here, is there?

  				
Posted 3 years ago by AgitatedDove14

If I swap it to Framework, then it auto-records both.

  				
Posted 3 years ago by ConvolutedSealion94

Apparently it did not work.

  				
Posted 3 years ago by ConvolutedSealion94

I think I figured this out but now I have a problem:

auto_connect_frameworks={ 'xgboost': False, 'scikitlearn': False }

  				
Posted 3 years ago by ConvolutedSealion94

So the only thing left is models

  				
Posted 3 years ago by ConvolutedSealion94

That's because then I need to teach every DS how to use the ClearML API.

  				
Posted 3 years ago by ConvolutedSealion94

    task = Task.init(...)
    for foo in foos:
        data = foo.run()
        for key, value in data.items():
            if key in ('auc', 'f1'):  # etc.
                logger.log(key, value)
            elif key.startswith('model'):
                ...  # save the model, etc.

  				
Posted 3 years ago by ConvolutedSealion94

No, pickling is the only thing that will not trigger clearml (it is just too generic to automagically log)

So what is the mechanism by which you "automagically" pick things up? (Just for information; I don't think this is relevant to our use case.)

  				
Posted 3 years ago by ConvolutedSealion94

but here I can tell them: return a dictionary of what you want to save

  				
Posted 3 years ago by ConvolutedSealion94

I want the model to be stored in a way that clearml-serving can recognise it as a model

Then use OutputModel or task.update_output_model(...).
You have to serialize it in a way that your code will later be able to load.
With XGBoost, when you call model.save_model(...), ClearML automatically picks it up and uploads it for you, assuming you created the task with Task.init(..., output_uri=True).
You can also manually upload the model with task.update_output_model, or the equivalent with the OutputModel class.
If you want to disable the auto-logging for XGBoost:

    task = Task.init(..., auto_connect_frameworks={'xgboost': False})

You can check the docs / docstring for more details on that.
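
A minimal sketch of the manual route, assuming an XGBoost booster and an already initialised task. The function name `register_xgboost_model` and the `output_model_cls` parameter are hypothetical; the latter exists only so the sketch can be exercised without a ClearML server. The `OutputModel(task=..., framework=...)` constructor and `update_weights(weights_filename=...)` are used as I understand them from the ClearML SDK; check the docstrings of your installed version.

```python
from pathlib import Path


def register_xgboost_model(task, booster, path, output_model_cls=None):
    """Save an XGBoost booster to `path` and attach the file to `task`.

    `output_model_cls` is a hypothetical hook for testing; by default it
    resolves to clearml.OutputModel.
    """
    if output_model_cls is None:
        from clearml import OutputModel  # lazy import; requires clearml
        output_model_cls = OutputModel

    path = Path(path)
    # Framework-level serialization (not pickle), so the file can be
    # loaded again with XGBoost itself.
    booster.save_model(str(path))

    output_model = output_model_cls(task=task, framework="XGBoost")
    # Keep the local file after the weights are registered/uploaded.
    output_model.update_weights(
        weights_filename=str(path), auto_delete_file=False
    )
    return output_model


# Intended usage (needs clearml, xgboost and a configured server):
#   task = Task.init(project_name="demo", task_name="train", output_uri=True)
#   register_xgboost_model(task, booster, "model.json")
```

Whether the file is actually uploaded, and where, is assumed to depend on the task's output_uri, as described above.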

  				
Posted 3 years ago by AgitatedDove14

I handle artifacts, datasets, tables, scalars, anything really, by simply making foo.run() return a dictionary like

    {'artifact_X_train': X_train, 'table_confusion_matrix': cf, 'dataset_x': _, ...}

and then calling the appropriate logger, artifact uploader, or dataset uploader. (For the dataset uploader I use foo.output_file, which every foo has.)
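
A minimal sketch of that routing, assuming keys of the form `<prefix>_<name>`. The names `route_outputs` and `clearml_handlers` are hypothetical. The second function shows one possible wiring to `Task.upload_artifact`, `Logger.report_scalar` and `Logger.report_table`; it needs a live task, so it is defined but not called here.

```python
def route_outputs(data, handlers):
    """Send each item of `data` to the handler matching its key prefix.

    `handlers` maps a prefix such as "artifact" to a callable(name, value).
    Keys without a matching handler are returned so the caller can decide
    what to do with them.
    """
    unhandled = {}
    for key, value in data.items():
        prefix, _, name = key.partition("_")
        handler = handlers.get(prefix)
        if handler is None or not name:
            unhandled[key] = value
        else:
            handler(name, value)
    return unhandled


def clearml_handlers(task):
    """Hypothetical wiring to ClearML (not executed in this sketch)."""
    logger = task.get_logger()
    return {
        "artifact": lambda name, obj: task.upload_artifact(
            name, artifact_object=obj
        ),
        "scalar": lambda name, val: logger.report_scalar(
            title=name, series=name, value=val, iteration=0
        ),
        "table": lambda name, df: logger.report_table(
            title=name, series=name, iteration=0, table_plot=df
        ),
    }


# Dry run with a stand-in handler instead of ClearML:
seen = []
leftover = route_outputs(
    {"artifact_X_train": [1, 2, 3], "notes": "no prefix"},
    {"artifact": lambda name, value: seen.append(name)},
)
```

Returning the unhandled keys rather than raising keeps the data-science classes free to return extra items that the tracking layer does not know about yet.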

  				
Posted 3 years ago by ConvolutedSealion94

I didn't realise that pickling is what triggers clearml to pick it up. I am actually saving a dictionary that contains the model as a value (+ training datasets)

  				
Posted 3 years ago by ConvolutedSealion94

BTW, you are not exporting Framework in __init__, so you need to import it like this: from clearml.model import Framework

  				
Posted 3 years ago by ConvolutedSealion94

What I'm trying to do is give the DSes a lightweight base class that is independent of clearml, and have a framework that holds all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas

So you want the DS to manually tell the base class what to store?
Then the base class will store it for them, for example with joblib. Is this the use case?
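
One way to picture the separation in the quoted message, as a sketch only: the base class knows nothing about ClearML, and tracking is an optional callable supplied from outside. The names `Step`, `run_steps` and `tracker` are hypothetical and do not come from the thread or from ClearML.

```python
class Step:
    """ClearML-free base class: subclasses only implement run() -> dict."""

    name = "step"

    def run(self):
        raise NotImplementedError


def run_steps(steps, tracker=None):
    """Run every step; forward its outputs to `tracker` only if one is given.

    With tracker=None nothing is logged, so a step can be developed and
    run without any experiment being created.
    """
    results = {}
    for step in steps:
        outputs = step.run()
        results[step.name] = outputs
        if tracker is not None:
            tracker(step.name, outputs)
    return results


class TrainModel(Step):
    """Toy step standing in for the real dataframe/XGBoost work."""

    name = "train"

    def run(self):
        return {"scalar_auc": 0.5}


# Untracked run: no ClearML import, no task, nothing logged.
untracked = run_steps([TrainModel()])
```

In a tracked run, `tracker` would be a function defined next to Task.init that forwards the outputs to the ClearML calls of your choice.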

  				
Posted 3 years ago by AgitatedDove14

I didn't realise that pickling is what triggers clearml to pick it up.

No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)

  				
Posted 3 years ago by AgitatedDove14

ahh, ok, well, I tried to find an example that I can extend but this was the only reference I found: https://github.com/allegroai/clearml/blob/ca384aa75c236e0a8af7c5dd85406a359c3eb703/clearml/model.py#L35

  				
Posted 3 years ago by ConvolutedSealion94

I just disabled all of them with auto_connect_frameworks=False

  				
Posted 3 years ago by ConvolutedSealion94

What I'm trying to do is give the DSes a lightweight base class that is independent of clearml, and have a framework that holds all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas.

  				
Posted 3 years ago by ConvolutedSealion94

but here I can tell them: return a dictionary of what you want to save

If this is the case, you have two options: either store the dict as an artifact (this makes sense if it is not a standalone model you would like to use later), or store it as a model.
Artifact example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
Getting them back:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
Model example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py

  				
Posted 3 years ago by AgitatedDove14

Is there an explicit OutputModel + xgboost example somewhere?

  				
Posted 3 years ago by ConvolutedSealion94

ConvolutedSealion94, try 'scikit', not 'scikitlearn'.
I think we should add a warning if a key is there and is being ignored... let me make sure of that.

I just disabled all of them with auto_connect_frameworks=False

Yep, that also works.
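
Until such a warning exists, a check like the following could catch a misspelled key before it is passed to Task.init. This is a sketch: the helper name `unknown_framework_keys` is hypothetical, and the key list is a partial one recalled from the ClearML docstring, so treat it as an assumption and verify it against your installed version.

```python
# Assumption: partial list of keys accepted by
# Task.init(auto_connect_frameworks=...); check the docstring of your version.
KNOWN_FRAMEWORK_KEYS = frozenset({
    "matplotlib", "tensorflow", "tensorboard", "pytorch",
    "xgboost", "scikit", "fastai", "lightgbm", "hydra", "joblib",
})


def unknown_framework_keys(config, known=KNOWN_FRAMEWORK_KEYS):
    """Return the keys of `config` that are not in `known`, sorted."""
    return sorted(key for key in config if key not in known)


# The configuration from this thread: 'scikitlearn' is flagged.
config = {"xgboost": False, "scikitlearn": False}
for key in unknown_framework_keys(config):
    print(f"warning: auto_connect_frameworks key {key!r} is not recognised")
```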

  				
Posted 3 years ago by AgitatedDove14

I was just looking at the model example. How does OutputModel store the binary, for example for an XGBoost model?

  				
Posted 3 years ago by ConvolutedSealion94

TBH ClearML doesn't seem to be picking the model up so I need to do it manually

This is odd. ClearML will pick up framework-level serialization, but not just any pickle call.

Why do I need an output_uri for the model saving? The dataset API can figure this out on its own

So that it knows where to upload it. If you set it to True, this will be the default files server; you can also set it to a shared file system, S3, GCP storage, etc.
If no value is passed, it will just log the local file being stored. Does that make sense?

  				
Posted 3 years ago by AgitatedDove14

If I do this, it still auto-records the sklearn one.

  				
Posted 3 years ago by ConvolutedSealion94

I want the model to be stored in a way that clearml-serving can recognise it as a model

  				
Posted 3 years ago by ConvolutedSealion94

TBH ClearML doesn't seem to be picking the model up so I need to do it manually

  				
Posted 3 years ago by ConvolutedSealion94

I am actually saving a dictionary that contains the model as a value (+ training datasets)

How are you specifically doing that? pickle?

  				
Posted 3 years ago by AgitatedDove14

So what is the mechanism by which you "automagically" pick things up? (Just for information; I don't think this is relevant to our use case.)

If you use joblib.dump (which is like pickle but safer/faster), it will be auto-logged:
https://github.com/allegroai/clearml/blob/4945182fa449f8de58f2fc6d380918075eec5bcf/examples/frameworks/scikit-learn/sklearn_joblib_example.py#L28

  				
Posted 3 years ago by AgitatedDove14
