For example, I am logging these metrics as "configuration/hyperparameters". The reason I am not using report_scalar() is that it only supports "last/min/max"; this way I can apply whatever custom logic I need in my code.
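To illustrate, here is a rough sketch of the pattern I mean, assuming the clearml (formerly trains) SDK; the `summary/best_loss` name and the loop are made-up examples:

```python
from clearml import Task  # `from trains import Task` on older releases

task = Task.init(project_name='project', task_name='experiment')

best_loss = float('inf')
for epoch_loss in (0.9, 0.5, 0.7):          # stand-in for a real training loop
    best_loss = min(best_loss, epoch_loss)  # any custom aggregation logic goes here

# store the custom summary under a hyperparameter section instead of a scalar,
# so it can be pulled into the experiment table as a sortable column
task.set_parameter('summary/best_loss', best_loss)
```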
I need to compare this metadata across experiments. Although the dashboard supports choosing "min/max/last", it does not support comparing "the lowest loss" across experiments.
I want support for click as well, or is there any ad hoc solution?
It would be nice if there were an "export" function to export the whole/selected experiment table view.
```python
from clearml import Task  # `from trains import Task` on older releases

task_reporting = Task.init(project_name='project', task_name='report')
tasks = Task.get_tasks(project_name='project', task_name='partial_task_name_here')
for t in tasks:
    metrics = t.get_last_scalar_metrics()  # nested dict of the last reported scalars
task_reporting.get_logger().report_something  # placeholder for the actual report call
```
Instead of get_last_scalar_metrics(), I am using t._data.hyperparams['summary'] to get the metrics I need.
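By the way, the same section can also be read through the public API rather than the private `_data` attribute; a rough sketch, assuming the values were written under a `summary/...` section:

```python
from clearml import Task

tasks = Task.get_tasks(project_name='project', task_name='partial_task_name_here')
for t in tasks:
    # hyperparameters come back flattened as 'section/name' keys
    params = t.get_parameters() or {}
    summary = {k: v for k, v in params.items() if k.startswith('summary/')}
    print(t.id, summary)
```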
Wow! This is exactly what I need. I am surprised that I don't need to configure anything on the server side.
Hi, just to be clear, the self-hosted option is still available, right? I need to know this as we have spent some effort on integrating Trains internally and expect to continue the development for a while.
It's good that you version your dataset with a name; I have seen many trained models where people just replace the dataset directly.
May I ask if there is a planned release date?
Cool, versioning the differences is useful. It also depends on the kind of data. For tabular data, a database might be a natural choice, though integrating it and keeping track of the metadata could be tricky. Images, on the other hand, are probably better suited to blob storage on a per-file basis.
Oh, I did not realize I asked this in an old thread, sorry about that.
VivaciousPenguin66 What's your thought on Prefect? There are so many pipeline libraries and I wasn't sure how different they are. I have experience with Airflow. With Kedro, we hoped that data scientists would write the pipelines themselves, with minimal effort to hand them over to another engineer. For serious production (where we need to scale), we considered converting Kedro pipelines to Airflow; there are plugins to do that, though I am not sure how mature they are.
i.e. some files live on a shared drive, then someone silently updates them, all the experiments become invalid, and no one knows when that happened.
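One cheap guard I can think of (the shared-drive path and parameter name below are hypothetical): hash the input file and record the hash with the experiment, so a silent change at least becomes detectable later:

```python
import hashlib

from clearml import Task

def file_sha256(path):
    # stream the file in chunks so this also works for multi-GB inputs
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

# record the input hash as a parameter of the experiment
task = Task.init(project_name='project', task_name='experiment')
task.set_parameter('data/train_csv_sha256', file_sha256('/shared/drive/train.csv'))
```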
lol...... mine is best_model_20210611_v1.pkl
and better_model_20210611_v2.pkl
or best_baseline_model_with_more_features.pkl
Yup, I am only familiar with the experiment tracking part, so I don't think I can have a good understanding before I have reasonable knowledge of the entire ClearML system.
VivaciousPenguin66 How are you using the dataset tool? I'd love to hear more about that.
Hi, I think I can confirm this is a bug in Trains. Is it ok if I submit a PR to fix it?
TimelyPenguin76 It works fine. I may need to check on my side; I just noticed the issue was caused by the @funcy.log_durations decorator. It may change the function signature and cause some issue with it. I haven't had time to look into it yet, but the example works fine.
Ok, I will prepare a PR and a script to reproduce the error.
Potentially both, but let's just say structured data first: CSV, pickle (which may not be a table; it could be any Python object), feather, parquet, and other common data formats.
AgitatedDove14 Is the data versioning completely different from the Trains artifact/storage solution, or is it an enhanced feature?
I am interested in machine learning experiment management tools.
I understand Trains already handles a lot on the model side, i.e. hyperparameters, logging, metrics, and comparing two experiments.
I also want it to help with reproducibility. To achieve that, I need code/data/configuration all tracked.
For code and configuration I am happy with the current Trains solution, but I am not sure about the data versioning.
So if you have more details about the dataset versioning in the enterprise offering...
For the most common workflow, I may have some CSVs, which may be updated from time to time.
I wonder what extra features are offered in the enterprise solution, though.
For the open source version, if I use artifacts and I already have a local file, does it know to skip the download, or will it always replace the file? My dataset is large (~100 GB), so I cannot afford to re-download it every time.
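For reference, the pattern I am asking about looks roughly like this (the task id and artifact name are placeholders); my understanding is that get_local_copy() resolves through the local download cache, but I would like confirmation:

```python
from clearml import Task

# hypothetical id of the experiment that produced the ~100 GB artifact
producer = Task.get_task(task_id='<producer_task_id>')

# get_local_copy() goes through the local cache, so an artifact that was
# already fetched should be reused instead of downloaded again
local_path = producer.artifacts['dataset'].get_local_copy()
print(local_path)
```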
AnxiousSeal95 At first sight, the pipeline logic of ClearML seems quite tightly coupled to ClearML. Back then I figured I needed something that could easily be converted to a production pipeline (e.g. Airflow DAGs), as we need pipelines not just for experiments, and Airflow seems to be the default choice.
Also, clearml-data was not available when we started the development of our internal framework. As for clearml-agent, from my previous experience it sometimes does not work well on Windows, and als...
Great discussion, I agree with you both. For me, we are not using clearml-data, so I am a bit curious how a "published experiment" locks everything (including the inputs? I assume someone could still just go into the S3 bucket and delete a file without ClearML noticing).
From my experience, absolute reproducibility is code + data + parameters + execution sequence. For example, random seeds or parallelism can cause different results and can be tricky to deal with sometimes. We did bu...
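For the random-seed part, the usual mitigation is something like this (plain Python; parallelism and GPU nondeterminism can still leak through):

```python
import os
import random

import numpy as np

def set_seed(seed: int = 42):
    # pin the obvious sources of randomness; execution order and
    # parallelism can still make results differ between runs
    random.seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)  # note: only affects child processes

set_seed(42)
```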
Ok, then maybe it can still be used as a data versioning solution, except that I have to manually track the task IDs (of the tasks that generate the artifacts) for versioning myself.
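Roughly what I mean by manual tracking (the project, task, and artifact names are made up): the producing task's id becomes the dataset "version", and consumers pin it explicitly:

```python
from pathlib import Path

from clearml import Task

# --- producer script: upload the data, keep the task id as the "version" ---
producer = Task.init(project_name='data', task_name='build_dataset')
producer.upload_artifact(name='dataset', artifact_object=Path('data/train.csv'))
version_id = producer.id  # this is the id I have to track manually

# --- consumer script: pin the exact dataset version by task id ---
dataset_task = Task.get_task(task_id=version_id)
local_copy = dataset_task.artifacts['dataset'].get_local_copy()
```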