Hi ShinyWhale52
This is just a suggestion, but this is what I would do:
1. Use `clearml-data` to create a dataset from the local CSV file: `clearml-data create ...`, then `clearml-data sync --folder (where the csv file is)`
2. Write Python code that takes the CSV file from the dataset and creates a new dataset from the preprocessed data
```
from clearml import Dataset

# args.dataset: ID of the dataset created in step 1 (e.g. parsed with argparse)
original_csv_folder = Dataset.get(dataset_id=args.dataset).get_local_copy()
# process the csv file -> generate a new csv
preprocessed = Dataset.create(...)
preprocessed.add_files(new_created_file)
preprocessed.upload()
# finalize() closes the dataset version (the SDK counterpart of `clearml-data close`)
preprocessed.finalize()
```
3. Train the model (i.e. get the dataset prepared in step 2), and add `output_uri` to upload the model (say, to your S3 bucket or the clearml-server)
```
preprocessed_csv_folder = Dataset.get(dataset_id='preprocessed_dataset_id').get_local_copy()
# Train here
```
4. Use the ClearML model repository (see the Models tab in the project experiment table) to get / download the trained model
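For the "process csv file" part of step 2, something along these lines might work. It's only a rough sketch using the standard library: the column names (`price`, `qty`), the derived `total` feature and the min-max scaling are made up for illustration, and `preprocess_csv` is a hypothetical helper, not part of ClearML. The file it writes is what you would then pass to `add_files`.

```python
import csv


def preprocess_csv(src_path, dst_path):
    """Read src_path, add a 'total' feature plus a min-max scaled copy, write dst_path."""
    with open(src_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("input csv has no data rows")

    # New feature (hypothetical columns): total = price * qty
    totals = [float(r["price"]) * float(r["qty"]) for r in rows]

    # Min-max scale the new feature into [0, 1]; fall back to 1.0 for a constant column
    lo, hi = min(totals), max(totals)
    span = (hi - lo) or 1.0

    fieldnames = list(rows[0].keys()) + ["total", "total_scaled"]
    with open(dst_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row, total in zip(rows, totals):
            row["total"] = total
            row["total_scaled"] = (total - lo) / span
            writer.writerow(row)
    return dst_path
```

Keeping the transformation in a plain function like this means you can run it on the folder returned by `get_local_copy()` and test it without a ClearML server.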
wdyt?
Ok, this makes more sense. Thank you very much. I'll take a closer look at your code when I have a better picture of ClearML.
To organize work, we designate a special task type for datasets (so it's easy to search and browse through them) as well as tags that help you get finer granularity search capabilities.
In open-source ClearML, a dataset is represented by a task (or an experiment, in UI terms). You can add datasets to projects to indicate that a dataset is related to a project, but this is purely a logical grouping, i.e., you can have a dataset (or datasets) per project, or a single project holding all your datasets.
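In SDK terms, attaching a dataset to a project and tagging it might look like the sketch below. The two helper functions are hypothetical, and the `Dataset` class is passed in as an argument only so the snippet can be tried with a stand-in when no server is available; in real use you would pass `clearml.Dataset`. I'm assuming the `dataset_project` / `dataset_tags` arguments of `Dataset.create` and the `tags` filter of `Dataset.list_datasets`, so please check them against your installed clearml version.

```python
def register_dataset(dataset_cls, csv_path, project, name, tags):
    """Create a dataset version inside `project`, tagged with `tags`, and return its ID."""
    ds = dataset_cls.create(dataset_name=name, dataset_project=project, dataset_tags=tags)
    ds.add_files(csv_path)
    ds.upload()
    ds.finalize()
    return ds.id


def find_datasets(dataset_cls, project, tags):
    """Return the IDs of the datasets in `project` that match the given tags."""
    found = dataset_cls.list_datasets(dataset_project=project, tags=tags)
    return [d["id"] for d in found]
```

With a server configured, the calls would be something like `register_dataset(Dataset, "preprocessed.csv", "my_project", "preprocessed", ["csv", "scaled"])` followed by `find_datasets(Dataset, "my_project", ["scaled"])`, where the project, dataset name and tags are placeholders.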
ShinyWhale52 any time 🙂
Feel free to follow up with more questions
Ok, thanks a lot. This is not exactly what I expected, so I don't fully understand. For example, let's say you have a basic project in which the workflow is:
1. You read a CSV stored in your filesystem.
2. You transform this CSV, adding some new features, scaling, and things like that.
3. You train a model (usually running several experiments with different hyperparameters).
4. You deploy the model, and it is ready to make predictions.

How would you structure this workflow as Tasks in ClearML?