Hey Alon, thank you for the quick response! 🙂 This clarifies some points, we also experimented a little more now with it.
Our use-cases are unfortunately not completely covered I guess.
Let's say we have a pool of >300k images and growing. With queries in a database, we identify 80k that should form a dataset. We can create a dataset A and have it stored in the cloud, managed by clearml-data. Let's say we query another time and get 60k images. Now it is not trivial to create a new d...
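For context, this is roughly how we create such a dataset today (project, dataset and folder names are just placeholders; the glob stands in for our actual database query):
```python
from pathlib import Path
from clearml import Dataset

# stand-in for our actual DB query: here we just glob a local folder,
# in reality this list of ~80k paths comes from the database
image_paths = sorted(Path("/data/pool").glob("**/*.jpg"))

# create dataset A and register the queried files with clearml-data
dataset_a = Dataset.create(dataset_name="dataset_A", dataset_project="our_project")
for path in image_paths:
    dataset_a.add_files(path=str(path))

dataset_a.upload()    # pushes the files to the configured cloud storage
dataset_a.finalize()
```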
Hm OK 🤔
I am not sure whether it's heresy to say that here, but why wouldn't you use a mechanism comparable to what DVC does in the backend?
When you create a dataset, you could hash the individual files and upload them to a cache. Datasets are then groupings of file hashes. When you want to download a dataset, all you have to do is reproduce the folder structure with the files identified by hashes.
This way, it does not matter whether you recreate a dataset with the same files, they wou...
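To make the idea concrete, here is a minimal sketch of the content-addressable scheme I have in mind (explicitly not how clearml-data works today, just an illustration):
```python
import hashlib
import shutil
from pathlib import Path

def file_hash(path: Path) -> str:
    """Hash a file's content so identical files map to the same cache entry."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def upload_dataset(root: Path, cache: Path) -> dict:
    """A 'dataset' is just a mapping from relative paths to content hashes."""
    cache.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in root.rglob("*"):
        if path.is_file():
            digest = file_hash(path)
            target = cache / digest
            if not target.exists():      # files already in the cache are never copied again
                shutil.copy2(path, target)
            manifest[str(path.relative_to(root))] = digest
    return manifest

def download_dataset(manifest: dict, cache: Path, dest: Path) -> None:
    """Reproduce the original folder structure from the hash-named cache files."""
    for rel_path, digest in manifest.items():
        out = dest / rel_path
        out.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cache / digest, out)
```
With this layout, creating a second dataset that shares files with the first one costs nothing extra: the shared hashes are already in the cache.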
Thank you for the hint with Dataset.sync and the explanation AgitatedDove14 🙂
The interfaces look alright. I think we are rather concerned about the performance implications of a backend implementation detail - but maybe I misunderstood?
When I create a dataset with say 5GB of images, it will be uploaded to the server/cloud as one .zip archive. Let's say I now create several 5GB datasets A, B, C and then want to create a new dataset D that inherits 1GB each of A, B, C. If I checkout/downl...
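To make sure we mean the same thing, this is how I would express dataset D with the current interface (the parent IDs and the remove pattern are placeholders):
```python
from clearml import Dataset

# IDs of the previously finalized 5GB datasets (placeholders)
parent_ids = ["<dataset_A_id>", "<dataset_B_id>", "<dataset_C_id>"]

# D inherits from A, B and C, then drops everything except the ~1GB subsets we need
dataset_d = Dataset.create(
    dataset_name="dataset_D",
    dataset_project="our_project",
    parent_datasets=parent_ids,
)
dataset_d.remove_files(dataset_path="parts_we_do_not_need/*")
dataset_d.upload()
dataset_d.finalize()

# the question: does this pull the full 5GB archives of A, B and C,
# or only the files that D actually references?
local_copy = Dataset.get(dataset_id=dataset_d.id).get_local_copy()
```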
Hm I tried it again (even cleaning up the vcs cache first, since that caused an issue before) but it still does not work. Looking at the code, I also could not find the place where this should happen. For all I can tell, there are only translations from https->ssh and ssh->https, but not ssh->ssh.
To add that, I quickly coded up this PR:
https://github.com/allegroai/clearml-agent/pull/72
Could you take a look at it? On our installation here, it shows the desired behavior...
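Conceptually, the PR adds something along these lines (heavily simplified Python, not the actual clearml-agent code):
```python
import re

def force_ssh_user(url: str, ssh_user: str = "git") -> str:
    """Inject the configured user into an ssh:// URL that has none; leave everything else alone."""
    match = re.match(r"^ssh://(?:(?P<user>[^@/]+)@)?(?P<rest>.+)$", url)
    if not match or match.group("user"):
        return url  # not an ssh:// URL, or a user is already set
    return "ssh://{}@{}".format(ssh_user, match.group("rest"))

# the URL produced from our local checkout vs. what the agent needs for key-based auth
print(force_ssh_user("ssh://git.mycompany.com:2022/myuser/repo.git"))
# -> ssh://git@git.mycompany.com:2022/myuser/repo.git
```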
Ah OK 🤔 So should I maybe update the PR to not touch the URL if neither user nor port are 'force-set'?
When we run a script containing Task.init from within our repo, it creates a repo URL that looks like this:
Now the agents trying to execute this task fail with:
cloning: agent_user@git.mycompany.com: Permission denied (publickey). fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
An easy fix is to change the URL in the UI to include the user, e.g. ssh://git@git.mycompany.com:2022/myuser/repo.git, bu...
Hey AgitatedDove14 - thank you for the help! 🙂 Though in our case, most developers have the repo set up with ssh key authentication. Thus the task gets an 'ssh url' like ssh:// and not https://. Consequently, the conversion is never called. Or is it already expected behavior that the ClearML agent rewrites ssh://mydomain.com:2022/ ... to ssh://git@mydomain.com:2022/ ... if I have force_git_ssh_protocol: true and force_git_ssh_user: "git"?
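For reference, the relevant part of our agent's clearml.conf looks like this (as far as I understand these options):
```
agent {
    # rewrite repository URLs to ssh:// so key-based auth is used
    force_git_ssh_protocol: true
    # the user to set on ssh:// URLs
    force_git_ssh_user: "git"
}
```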
Found a related comment: https://github.com/allegroai/clearml-agent/issues/42#issuecomment-887331420. Though would any of the above proposed solutions be feasible?
Ah OK, thank you a lot for clarifying SuccessfulKoala55! 🙂 Then I guess in our case, we should just use our Dev image as the default image of the docker agents. For debugging, it would be cool to avoid having to install libraries and a minimal venv every time, but we do need the repo cloning, so I think we will not run in standalone mode.
For debugging, those 2-3 min of setup time are annoying, but for production use where jobs run for hours/days, it does not matter so much I guess 🤔
LazyTurkey38 OK thank you for sharing! 🙂 I'll have a look in a few days 👍