ClearML doesn't add any YAML or lock files to your repo; actually, I think it only needs read permissions.
In the full/official documentation the clearml-data CLI is not mentioned anywhere, so perhaps it should be refreshed
Yep, you are right. clearml-data is really new, and it will be included in one of the next documentation versions.
UnevenDolphin73 FYI: clearml-data is documented, unfortunately only on GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
Hi UnevenDolphin73
I’m not sure I understand - can you share the use case you are looking for? You want to interact with the ClearML-agent?
If everything is managed with a git repo, does this also mean PRs will have a messy metadata file attached to them?
For data versioning you can use ClearML data management.
It is done through the CLI: after an easy installation you are ready to go. You can view a full example, including the installation, at this link - https://github.com/allegroai/clearml/blob/master/docs/datasets.md
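For anyone who prefers Python over the CLI, here is a minimal sketch of the same idea using the `Dataset` class from the `clearml` package. The helper name `version_folder` and the project and dataset names in the usage note are made up for illustration, and the sketch assumes `clearml` is installed and already configured against a server.

```python
from pathlib import Path


def version_folder(folder: str, project: str, name: str) -> str:
    """Register `folder` as a new dataset version and return the dataset ID.

    Hypothetical helper for illustration; not part of ClearML itself.
    """
    # Fail early on a bad path, before contacting the ClearML server.
    if not Path(folder).is_dir():
        raise FileNotFoundError(f"not a directory: {folder}")

    # Imported lazily so the path check above works without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_project=project, dataset_name=name)
    dataset.add_files(path=folder)  # register the files under `folder`
    dataset.upload()                # push the file contents to storage
    dataset.finalize()              # close this version to further changes
    return dataset.id
```

A call such as `version_folder("data/raw", "my-project", "raw-data")` would create one dataset version. A teammate could then fetch a local copy with `Dataset.get(dataset_project="my-project", dataset_name="raw-data").get_local_copy()`.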
Every task in ClearML includes the git repo, the changes and the full running environment.
You have some more cool things you can use (like pipelines, HPO, ClearML task CLI and more), you can find all of them here - https://github.com/allegroai/clearml#additional-modules
Is this what you were looking for?
I think so, it was just missing from the official documentation 🙂 Thanks!
Not everything is managed with a git repo. If your script is standalone, the full script will be stored in the uncommitted changes section (EXECUTION tab).
The repository information consists of the repository location, the uncommitted changes, and the branch with commit ID / tag.
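To make that list concrete, here is a rough sketch of how the same pieces of information can be read from a local checkout with plain git commands. This is not ClearML's actual implementation; `RepoSnapshot`, `run_git` and `snapshot_repo` are hypothetical names, and the sketch assumes the remote is called `origin`.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass
class RepoSnapshot:
    location: str             # URL of the remote repository
    branch: str               # currently checked-out branch
    commit_id: str            # full hash of the current commit
    uncommitted_changes: str  # diff of the working tree against HEAD


def run_git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def snapshot_repo(run: Callable[..., str] = run_git) -> RepoSnapshot:
    """Collect repository location, branch, commit and uncommitted diff."""
    return RepoSnapshot(
        location=run("config", "--get", "remote.origin.url"),
        branch=run("rev-parse", "--abbrev-ref", "HEAD"),
        commit_id=run("rev-parse", "HEAD"),
        uncommitted_changes=run("diff", "HEAD"),
    )
```

Calling `snapshot_repo()` inside a checkout returns the four fields. The `run` parameter exists only so the function can be exercised without a real repository.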
BTW, the full link to the docs - https://allegro.ai/clearml/docs/
Thanks Alon. In the full/official documentation the clearml-data CLI is not mentioned anywhere, so perhaps it should be refreshed 😉
I think we're referring to different things here.
I won't be using the UI (and neither will my team).
But as mentioned, we've used DVC before and it adds a lot of junk metadata files to each GitHub PR (many dvc.yaml, dvc.lock and .gitignore files). We're trying to avoid that as much as possible, hence my question about GitHub pull requests when using clearml-data.
I'm trying to decide if ClearML is a good use case for my team 🙂
Right now we're not looking for a complete overhaul into new tools, just some enhancements (specifically, model repository, data versioning).
We've been burnt by DVC and the like before, so I'm trying to minimize the pain for my team before we set out to explore ClearML.