That's because then I need to teach every DS how to use the ClearML api
Is there an explicit OutputModel + xgboost example somewhere?
TBH ClearML doesn't seem to be picking the model up so I need to do it manually
If I do this it still autorecords the sklearn one
BTW you are not exporting Framework in __init__ so you need to import it like from clearml.model import Framework
I want the model to be stored in a way that clearml-serving can recognise it as a model
Then OutputModel or task.update_output_model(...)
You have to serialize it, in a way that later your code will be able to load it.
With XGBoost, when you call model.save_model(...), clearml automatically picks it up and uploads it for you
assuming you created the Task.init(..., output_uri=True)
You can also manually upload the model with task.update_output_model or equivalent with OutputModel class.
If you want to disable the auto logging for xgboost: Task.init(..., auto_connect_frameworks={'xgboost': False})
You can check the docs / docstring for more details on that
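A minimal sketch of the manual route mentioned above, assuming the model has already been serialized to a file and that `task` is the object returned by Task.init(..., output_uri=True). The helper name register_model_file is hypothetical; the only ClearML call used is task.update_output_model with its model_path and name arguments.

```python
from pathlib import Path


def register_model_file(task, model_path, name=None):
    """Attach an already-serialized model file to a ClearML task.

    `task` is assumed to be a clearml.Task created with
    Task.init(..., output_uri=True), so the file has an upload target.
    """
    path = Path(model_path)
    if not path.is_file():
        # ClearML registers a file, so the model must be serialized first.
        raise FileNotFoundError(f"serialize the model first: {path}")
    # Default the model name to the file name without its extension.
    task.update_output_model(model_path=str(path), name=name or path.stem)
    return path
```

With XGBoost this would probably look like booster.save_model("model.json") followed by register_model_file(task, "model.json"). If the automatic logging is left on, the same file may be recorded twice.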
I solve the artifact, dataset, table, scalar, anything, by simply making foo.run() return a dictionary like { 'artifact_X_train': X_train, 'table_confusion_matrix': cf, 'dataset_x': _ ... }
and then calling the appropriate logger, artifact uploader or dataset uploader. (In the case of the dataset uploader I use foo.output_file, which every foo has.)
ahh, ok, well, I tried to find an example that I can extend but this was the only reference I found: https://github.com/allegroai/clearml/blob/ca384aa75c236e0a8af7c5dd85406a359c3eb703/clearml/model.py#L35
This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas
What's the value of running outside of an experiment management context? Don't you want to log it?
There is no real penalty here, no?!
TBH ClearML doesn't seem to be picking the model up so I need to do it manually
This is odd, clearml will pick up framework-level serialization, but not just any pickle call
Why do I need an output_uri for the model saving? The dataset API can figure this out on its own
So that it knows where to upload it. If you are setting True,
this will be the default files server; you can also set it to a shared file system, S3, GCP storage etc.
If no value is passed, it will just log the local file being stored. Makes sense?
So what is the mechanism by which you "automagically" pick things up? (For information, I don't think this is relevant to our use case.)
If you use joblib.dump (which is like pickle but safer/faster) it will be auto logged
https://github.com/allegroai/clearml/blob/4945182fa449f8de58f2fc6d380918075eec5bcf/examples/frameworks/scikit-learn/sklearn_joblib_example.py#L28
I didn't realise that pickling is what triggers clearml to pick it up.
No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)
What I am trying to do is give the DSes a lightweight base class that is independent of clearml, while a framework holds all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas
So you want the DS to manually tell the base class what to store?
Then the base class will store it for them, for example with joblib. Is this the use case?
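A rough sketch of that split, under the assumption that each step returns a dictionary and owns an output_file. BaseStep and run_and_store are hypothetical names, not anything from clearml. The base class has no clearml import, and the storing function defaults to joblib.dump, which the thread says is picked up automatically once a Task exists.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Callable, Dict


class BaseStep(ABC):
    """ClearML-free base class that data scientists subclass."""

    def __init__(self, output_file):
        self.output_file = Path(output_file)

    @abstractmethod
    def run(self) -> Dict[str, Any]:
        """Return a dictionary of everything that should be stored."""


def _joblib_dump(obj: Any, path: str) -> None:
    # Imported lazily so BaseStep stays usable without joblib installed.
    import joblib

    joblib.dump(obj, path)


def run_and_store(step: BaseStep,
                  dump: Callable[[Any, str], None] = _joblib_dump) -> Dict[str, Any]:
    """Framework side: run the step and serialize its result dictionary."""
    outputs = step.run()
    dump(outputs, str(step.output_file))
    return outputs
```

Outside of clearml, run_and_store just writes a file. Inside a script that called Task.init first, the same joblib.dump call is the one that would be expected to get logged.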
I think I figured this out but now I have a problem:
auto_connect_frameworks={ 'xgboost': False, 'scikitlearn': False }
I was just looking at the model example. How does output model store the binary? For example of an xgboost model
but here I can tell them: return a dictionary of what you want to save
If this is the case you have two options: either store the dict as an artifact (this makes sense if this is not a standalone model you would like to use later), or store it as a model.
Artifact example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
getting them back
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
Model example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py
task = Task.init(...)
for foo in foos:
    data = foo.run()
    for key, value in data.items():
        if key in ('auc', 'f1', etc):
            logger.log(key, value)
        elif key.startswith('model'):
            savemodel etc
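The loop above could be fleshed out roughly as follows. This is a sketch, not the thread author's code: log_outputs and the 'scalar_' prefix are hypothetical, 'model_' values are assumed to be paths to files that were already serialized, and 'table_' values are assumed to be pandas DataFrames. The ClearML calls used are task.get_logger(), Logger.report_scalar, Logger.report_table, Task.upload_artifact and Task.update_output_model.

```python
def log_outputs(task, outputs, iteration=0):
    """Route each entry of a step's result dict to a ClearML call by key prefix.

    Returns the keys that were handled; keys with an unknown prefix are skipped.
    """
    logger = task.get_logger()
    handled = []
    for key, value in outputs.items():
        # 'artifact_X_train' -> kind='artifact', name='X_train'
        kind, _, name = key.partition("_")
        name = name or key
        if kind == "scalar":
            logger.report_scalar(title=name, series=name,
                                 value=value, iteration=iteration)
        elif kind == "table":
            logger.report_table(title=name, series=name,
                                iteration=iteration, table_plot=value)
        elif kind == "artifact":
            task.upload_artifact(name=name, artifact_object=value)
        elif kind == "model":
            task.update_output_model(model_path=str(value), name=name)
        else:
            continue
        handled.append(key)
    return handled
```

Comparing the returned list with outputs.keys() shows which entries were ignored. The 'dataset_' prefix is left out here because the dataset upload depends on foo.output_file rather than on the dictionary value.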
ConvolutedSealion94 try scikit
not scikitlearn
I think we should add a warning if a key is there and is being ignored... let me make sure of that
I just disabled all of them with
auto_connect_frameworks=False
Yep that also works
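Since a misspelled key such as 'scikitlearn' is silently ignored, a small guard on the caller's side may help until a warning exists in clearml itself. This is only a sketch: auto_connect_setting is a hypothetical helper, and KNOWN_KEYS is a deliberately partial list assumed for illustration, so check the Task.init docstring for the full set.

```python
import warnings

# Partial list, assumed for illustration; see the Task.init docstring.
KNOWN_KEYS = frozenset({"xgboost", "scikit", "joblib"})


def auto_connect_setting(disable, known=KNOWN_KEYS):
    """Build a value for Task.init(auto_connect_frameworks=...).

    Pass "all" to switch every framework hook off, or an iterable of
    framework keys to switch off individually.
    """
    if disable == "all":
        return False
    setting = {}
    for key in disable:
        if key not in known:
            warnings.warn(f"unrecognised framework key {key!r}; it may be ignored")
        setting[key] = False
    return setting
```

The result would be passed as Task.init(..., auto_connect_frameworks=auto_connect_setting(["xgboost", "scikit"])).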
if I swap it to Framework then it autorecords both
I didn't realise that pickling is what triggers clearml to pick it up. I am actually saving a dictionary that contains the model as a value (+ training datasets)
pickle.dump({ 'model': model, 'X_train': X_train, 'Y_train': Y_train, 'X_test': X_test, 'Y_test': Y_test, 'impute_values': impute_values }, open(self.output_filename, 'wb'))
How are you specifically doing that? pickle?