Answered

What happens if the Task.init doesn't happen in the same py file as the "data science" stuff

I have a list of classes that do the coding and I initialise the task outside of them. Something like
task = Task.init(...)
for foo in foos:
    foo.run()
Where in foo.run() all kind of dataframe/xgboost/sql stuff is happening
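As context for the question, here is a minimal sketch of that setup (function and argument names are hypothetical). The relevant point is that ClearML installs its auto-logging hooks process-wide when Task.init runs, so the worker classes can live in other .py files:

```python
def run_pipeline(foos, project='examples', name='pipeline run'):
    """Hypothetical driver: Task.init once, then run the DS classes."""
    from clearml import Task  # imported lazily so the sketch stays importable

    task = Task.init(project_name=project, task_name=name)
    for foo in foos:
        foo.run()  # dataframe/xgboost/sql work; auto-logging still applies
    return task
```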

  
  
Posted one year ago

Answers 30


What I'm trying to do is give the DSes a lightweight base class, independent of clearml, that they use, while a framework holds all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas.

  
  
Posted one year ago

apparently it did not work

  
  
Posted one year ago

What I'm trying to do is give the DSes a lightweight base class, independent of clearml, that they use, while a framework holds all the clearml-specific code. This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas.

So you want the DS to manually tell the base class what to store?
Then the base class will store it for them, for example with joblib. Is this the use case?
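The pattern being discussed could look like the sketch below (all class and function names are hypothetical, and plain pickle stands in for joblib to keep the sketch dependency-free). The DS subclasses a clearml-free base class and returns a dict; the framework layer decides how to persist it:

```python
import pickle


class AnalysisStep:
    """Lightweight base class with no clearml dependency (hypothetical)."""

    output_filename = 'step_output.pkl'

    def run(self):
        """Subclasses return a dict of everything they want saved."""
        raise NotImplementedError


class TrainStep(AnalysisStep):
    def run(self):
        # stand-in for the real dataframe/xgboost work
        return {'auc': 0.91, 'model_path': 'model.bin'}


def store_results(step):
    """Framework-side helper: persists whatever the step returned."""
    results = step.run()
    with open(step.output_filename, 'wb') as f:
        pickle.dump(results, f)
    return results
```

The framework side is the only place that would later be swapped to use the ClearML API, which is what keeps the DS code clearml-free.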

  
  
Posted one year ago

This will allow them to experiment outside of clearml and only switch to it when they are in an OK state. This will also help not to pollute clearml spaces with half-baked ideas

What's the value of running outside of an experiment management context? Don't you want to log it?
There is no real penalty here, no?!

  
  
Posted one year ago

That's because then I need to teach every DS how to use the ClearML api

  
  
Posted one year ago

but here I can tell them: return a dictionary of what you want to save

  
  
Posted one year ago

but here I can tell them: return a dictionary of what you want to save

If this is the case you have two options: either store the dict as an artifact (this makes sense if it is not a standalone model you would like to use later), or store it as a model.
Artifact example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
getting them back
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
Model example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py
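The artifact option could be wired into the "return a dictionary" idea roughly like this (the helper name and dict layout are illustrative, not ClearML API; upload_artifact itself is the real Task method used in the artifacts example above):

```python
def upload_results_as_artifacts(task, results):
    """Upload each entry of a results dict as a ClearML artifact.

    `task` is a clearml.Task (or anything exposing upload_artifact);
    this helper itself is hypothetical glue code, not part of clearml.
    """
    for name, value in results.items():
        task.upload_artifact(name=name, artifact_object=value)
```

The retrieval side would then use the approach from the artifacts_retrieval example, i.e. fetch the task and read its artifacts back.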

  
  
Posted one year ago

task = Task.init(...)
for foo in foos:
    data = foo.run()
    for key, value in data.items():
        if key in ('auc', 'f1'):  # etc.
            logger.log(key, value)
        elif key.startswith('model'):
            # save model, etc.

  
  
Posted one year ago

I was just looking at the model example. How does output model store the binary? For example of an xgboost model

  
  
Posted one year ago

I want the model to be stored in a way that clearml-serving can recognise it as a model

  
  
Posted one year ago

I want the model to be stored in a way that clearml-serving can recognise it as a model

Then OutputModel or task.update_output_model(...)
You have to serialize it, in a way that later your code will be able to load it.
With XGBoost, when you call model.save clearml automatically picks it up and uploads it for you,
assuming you created the task with Task.init(..., output_uri=True).
You can also manually upload the model with task.update_output_model or the equivalent OutputModel class.
If you want to disable the auto logging for xgboost:
Task.init(..., auto_connect_frameworks={'xgboost': False})
You can check the docs / docstring for more details on that.
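Putting those pieces together, a sketch might look like the following (project/task names and the helper functions are hypothetical; Task.init's output_uri and auto_connect_frameworks arguments, and Task.update_output_model, are the real ClearML API being described above):

```python
def init_task(project, name):
    """Create a task that uploads saved models to the files server,
    with xgboost auto-logging disabled (per the snippet above)."""
    from clearml import Task  # lazy import: needs a clearml setup to actually run

    return Task.init(
        project_name=project,
        task_name=name,
        output_uri=True,  # upload model weights, not just log local paths
        auto_connect_frameworks={'xgboost': False},
    )


def upload_model_manually(task, weights_path):
    """Manual alternative to the automatic xgboost hook."""
    task.update_output_model(model_path=weights_path)
```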

  
  
Posted one year ago

TBH ClearML doesn't seem to be picking the model up so I need to do it manually

  
  
Posted one year ago

No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)

So what is the mechanism that you "automagically" pick things up (for information, I don't think this is relevant to our usecase)

  
  
Posted one year ago

pickle.dump({
    'model': model,
    'X_train': X_train,
    'Y_train': Y_train,
    'X_test': X_test,
    'Y_test': Y_test,
    'impute_values': impute_values,
}, open(self.output_filename, 'wb'))

  
  
Posted one year ago

So what is the mechanism that you "automagically" pick things up (for information, I don't think this is relevant to our usecase)

If you use joblib.dump (which is like pickle but safer/faster) it will be auto-logged:
https://github.com/allegroai/clearml/blob/4945182fa449f8de58f2fc6d380918075eec5bcf/examples/frameworks/scikit-learn/sklearn_joblib_example.py#L28
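The distinction in this exchange, sketched (the bundle contents are made up; the joblib call is shown as a comment since it needs joblib installed and an active Task to demonstrate):

```python
import pickle

bundle = {'model': 'fake-model-object', 'impute_values': {'age': 30}}

# NOT auto-logged: plain pickle is too generic for clearml to hook
with open('bundle.pkl', 'wb') as f:
    pickle.dump(bundle, f)

# Auto-logged when a clearml Task is active (requires joblib):
# import joblib
# joblib.dump(bundle, 'bundle.joblib')

with open('bundle.pkl', 'rb') as f:
    restored = pickle.load(f)
```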

  
  
Posted one year ago

I think I figured this out but now I have a problem:

auto_connect_frameworks={ 'xgboost': False, 'scikitlearn': False }

  
  
Posted one year ago

I solve the artifact/dataset/table/scalar (anything) by simply making foo.run()
return a dictionary like

{
    'artifact_X_train': X_train,
    'table_confusion_matrix': cf,
    'dataset_x': _,
    ...
}

And then call the appropriate logger, artifact uploader, or dataset uploader. (In the case of the dataset uploader I use foo.output_file, which every foo has.)
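The key-prefix routing described there could look like this sketch (the function and prefixes are hypothetical; the sink callables stand in for the ClearML logger and uploaders):

```python
def dispatch(results, report_scalar, upload_artifact, upload_dataset):
    """Route each entry of a step's result dict by key prefix (hypothetical)."""
    for key, value in results.items():
        if key.startswith('artifact_'):
            upload_artifact(key[len('artifact_'):], value)
        elif key.startswith('table_'):
            upload_artifact(key[len('table_'):], value)  # tables as artifacts
        elif key.startswith('dataset_'):
            upload_dataset(key[len('dataset_'):], value)
        else:
            report_scalar(key, value)  # metrics like 'auc', 'f1'
```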

  
  
Posted one year ago

So the only thing left is models

  
  
Posted one year ago

BTW you are not exporting Framework in __init__ so you need to import it like from clearml.model import Framework
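Using that import path, registering xgboost weights so clearml-serving can recognise them might look like this (the helper name is hypothetical; OutputModel's framework argument and update_weights are the real clearml API, though the exact Framework attribute name should be checked against the docstring):

```python
def register_model(task, weights_path):
    """Register xgboost weights on a task as its output model (sketch)."""
    from clearml import OutputModel
    from clearml.model import Framework  # not re-exported in clearml/__init__

    output_model = OutputModel(task=task, framework=Framework.xgboost)
    output_model.update_weights(weights_filename=weights_path)
    return output_model
```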

  
  
Posted one year ago

I just disabled all of them with auto_connect_frameworks=False

  
  
Posted one year ago

if I swap it to Framework then it auto-records both

  
  
Posted one year ago

If I do this it still auto-records the sklearn one

  
  
Posted one year ago

ConvolutedSealion94 try scikit not scikitlearn
I think we should add a warning if a key is there and is being ignored... let me make sure of that

I just disabled all of them with

auto_connect_frameworks=False

Yep that also works

  
  
Posted one year ago

ahh, ok, well, I tried to find an example that I can extend but this was the only reference I found: https://github.com/allegroai/clearml/blob/ca384aa75c236e0a8af7c5dd85406a359c3eb703/clearml/model.py#L35

  
  
Posted one year ago

TBH ClearML doesn't seem to be picking the model up so I need to do it manually

This is odd, clearml will pick up framework-level serialization, but not just any pickle call

Why do I need an output_uri for the model saving? The dataset API can figure this out on its own

So that it knows where to upload it. If you set it to True this will be the default files server; you can also set it to a shared file system, S3, GCP storage, etc.
If no value is passed, it will just log the local file being stored. Makes sense?

  
  
Posted one year ago

I didn't realise that pickling is what triggers clearml to pick it up.

No, pickling is the only thing that will Not trigger clearml (it is just too generic to automagically log)

  
  
Posted one year ago

I didn't realise that pickling is what triggers clearml to pick it up. I am actually saving a dictionary that contains the model as a value (+ training datasets)

  
  
Posted one year ago

Is there an explicit OutputModel + xgboost example somewhere?

  
  
Posted one year ago

Why do I need an output_uri for the model saving? The dataset API can figure this out on its own

  
  
Posted one year ago

I am actually saving a dictionary that contains the model as a value (+ training datasets)

How are you specifically doing that? pickle?

  
  
Posted one year ago