This is just a suggestion, but this is what I would do:
1. Use `clearml-data` to create a dataset from the local CSV file:
`clearml-data create ...` then `clearml-data sync --folder <folder where the csv file is>`
2. Write a Python script that takes the csv file from the dataset and creates a new dataset of the preprocessed data:
```
from clearml import Dataset

# Get a local copy of the original CSV dataset
original_csv_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()
# process the csv file -> generate a new csv
preprocessed = Dataset.create(...)
preprocessed.close()
```
3. Train the model (i.e. get the dataset prepared in step 2), and add `output_uri` to upload the model (say, to your S3 bucket or to the clearml-server):
```
preprocessed_csv_folder = Dataset.get(dataset_id='preprocessed_dataset_id').get_local_copy()
# Train here
```
4. Use the ClearML model repository (see the Models tab in the project's experiment table) to get / download the trained model.
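For the "process csv file -> generate a new csv" part of step 2, a minimal self-contained sketch could look like the following. The column names (`price`, `units`) and the min-max scaling are hypothetical stand-ins for your own feature engineering, and the clearml `Dataset.get` / `Dataset.create` calls from the snippet above are left out so this runs on its own:

```python
import csv

def preprocess_csv(src_path, dst_path):
    """Read a raw csv, add derived features, and write a new csv.
    In step 2 the output folder would then be wrapped in a new
    clearml Dataset via Dataset.create(...)."""
    with open(src_path, newline="") as f:
        rows = list(csv.DictReader(f))

    # Hypothetical feature engineering: min-max scale 'price' to [0, 1]
    prices = [float(r["price"]) for r in rows]
    lo, hi = min(prices), max(prices)
    for r, p in zip(rows, prices):
        r["price_scaled"] = (p - lo) / (hi - lo) if hi > lo else 0.0
        # Hypothetical derived feature: price per unit
        r["price_per_unit"] = p / float(r["units"])

    with open(dst_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```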
Ok, thanks a lot. This is not exactly what I expected, so I don't fully understand. For example, let's say you have a basic project with the following workflow:
You read a csv stored on your filesystem. You transform this csv, adding some new features, scaling, and things like that. You train a model (usually running several experiments with different hyperparameters). You deploy the model and it is ready to make predictions. How would you structure this workflow in Tasks in ClearML?
In ClearML open source, a dataset is represented by a task (or an experiment, in UI terms). You can add datasets to projects to indicate that a dataset is related to the project, but this is purely a logical grouping, i.e. you can have a dataset (or datasets) per project, or one project with all your datasets.