Answered

Hello. I have a very basic question. I'm still exploring ClearML to see if it fits our needs. I have taken a look at the WebUI, and I am confused about what constitutes a project. It seems that a project is composed of a series of experiments and models, basically. And I don't see the data. Is there no way of directly specifying the data sources and data transformations of a project? Excuse me for asking something that basic, but I am a little bit confused...

  
  
Posted 3 years ago

Answers 6


Ok, this makes more sense. Thank you very much. I'll take a closer look at your code when I have a better picture of ClearML.

  
  
Posted 3 years ago

Hi ShinyWhale52
This is just a suggestion, but this is what I would do:

  1. Use clearml-data to create a dataset from the local CSV file:

```
clearml-data create ...
clearml-data sync --folder <where the csv file is>
```

  2. Write a Python script that takes the CSV file from the dataset and creates a new dataset of the preprocessed data:

```
from clearml import Dataset

original_csv_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()

# process csv file -> generate a new csv

preprocessed = Dataset.create(...)
preprocessed.add_files(new_created_file)
preprocessed.upload()
preprocessed.close()
```

  3. Train the model (i.e. get the dataset prepared in (2)); add output_uri to upload the model (say, to your S3 bucket or the clearml-server):

```
preprocessed_csv_folder = Dataset.get(dataset_id='preprocessed_dataset_id').get_local_copy()

# Train here
```

  4. Use the ClearML model repository (see the Models tab in the project's experiment table) to get / download the trained model.
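The "process csv file -> generate a new csv" step in (2) is where your own transformation code goes. As a minimal, self-contained sketch using only the standard library (the file names and the derived `price_per_unit` column are hypothetical examples):

```python
import csv

def preprocess_csv(src_path: str, dst_path: str) -> str:
    """Read a raw CSV, add a derived feature column, and write a new CSV."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["price_per_unit"])
        writer.writeheader()
        for row in reader:
            # Hypothetical derived feature: price divided by quantity
            row["price_per_unit"] = float(row["price"]) / float(row["quantity"])
            writer.writerow(row)
    return dst_path
```

The returned path is what you would then pass to `preprocessed.add_files(...)` before calling `upload()` and `close()`.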

wdyt?

  
  
Posted 3 years ago

To organize work, we designate a special task type for datasets (so it's easy to search and browse through them), as well as tags that give you finer-grained search capabilities.

  
  
Posted 3 years ago

Ok, thanks a lot. This is not exactly what I expected, so I don't fully understand. For example, let's say you have a basic project in which the workflow is:

  1. You read a CSV stored on your filesystem.
  2. You transform this CSV, adding some new features, scaling, and things like that.
  3. You train a model (usually running several experiments with different hyperparameters).
  4. You deploy the model, and it is ready for making predictions.

How would you structure this workflow in Tasks in ClearML?

  
  
Posted 3 years ago

ShinyWhale52 any time 🙂
Feel free to followup with more questions

  
  
Posted 3 years ago

In ClearML Opensource, a dataset is represented by a task (or experiment, in UI terms). You can add datasets to projects to indicate that a dataset is related to the project, but it's purely a logical entity, i.e., you can have a dataset (or datasets) per project, or one project holding all your datasets.

  
  
Posted 3 years ago