Thank you for the hint with Dataset.sync
and the explanation AgitatedDove14 🙂
The interfaces look alright. I think we are rather concerned about the performance of a backend implementation detail - but maybe I misunderstood?
When I create a dataset with, say, 5GB of images, it will be uploaded to the server/cloud as one .zip archive. Let's say I now create several 5GB datasets A, B, C and then want to create a new dataset D that inherits 1GB each from A, B, C. If I checkout/downl...
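(For context, the composition we have in mind would look roughly like the sketch below; the Dataset API does support parents, but the dataset names, project and pruning pattern here are made-up placeholders, so treat it as an illustration of the workflow, not a tested snippet.)

    from clearml import Dataset

    # Assumption: datasets A, B, C already exist and are finalized.
    parents = [
        Dataset.get(dataset_project="demo", dataset_name=name).id
        for name in ("A", "B", "C")
    ]

    # D starts from the full contents of A, B and C ...
    d = Dataset.create(dataset_name="D", dataset_project="demo", parent_datasets=parents)

    # ... and would then be pruned down to the ~1GB slice per parent that we actually want
    # (the wildcard is purely illustrative).
    d.remove_files(dataset_path="unwanted/*", recursive=True)

    d.upload()
    d.finalize()

The question above is essentially about how much data upload() and a later download of D would actually have to transfer in this setup.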
LazyTurkey38 OK thank you for sharing! 🙂 I'll have a look in a few days 👍
Ah OK, thank you a lot for clarifying SuccessfulKoala55 ! 🙂 Then I guess in our case, we should just use our Dev image as the default image of the docker agents. For debugging, it would be cool to avoid having to install libraries and a minimal venv every time, but we do need the repo cloning, so I think we will not run in standalone mode.
For debugging, that 2-3 min setup time is annoying, but for production use where jobs run for hours/days, it does not matter so much I guess 🤔
Hm OK 🤔
I am not sure whether it's heresy to say that here, but why wouldn't you use a mechanism comparable to what DVC does in the backend?
When you create a dataset, you could hash the individual files and upload them to a cache. Datasets are then groupings of file hashes. When you want to download a dataset, all you have to do is reproduce the folder structure with the files identified by hashes.
This way, it does not matter whether you recreate a dataset with the same files, they wou...
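To make the idea concrete, here is a minimal sketch of the mechanism I mean (plain Python, not ClearML code; paths are placeholders): files are stored once under their content hash, and a dataset is just a manifest mapping relative paths to hashes.

    import hashlib
    import json
    import shutil
    from pathlib import Path

    CACHE = Path("cache")  # content-addressable store: one file per unique hash

    def file_hash(path: Path) -> str:
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def create_dataset(src_dir: Path, manifest_path: Path) -> None:
        """Hash every file, copy unseen content into the cache, write a manifest."""
        CACHE.mkdir(exist_ok=True)
        manifest = {}
        for f in src_dir.rglob("*"):
            if f.is_file():
                h = file_hash(f)
                manifest[str(f.relative_to(src_dir))] = h
                if not (CACHE / h).exists():  # identical files are stored only once
                    shutil.copy2(f, CACHE / h)
        manifest_path.write_text(json.dumps(manifest, indent=2))

    def checkout_dataset(manifest_path: Path, dst_dir: Path) -> None:
        """Recreate the folder structure from the hashed files in the cache."""
        manifest = json.loads(manifest_path.read_text())
        for rel_path, h in manifest.items():
            out = dst_dir / rel_path
            out.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(CACHE / h, out)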
Hey Alon, thank you for the quick response! 🙂 This clarifies some points, we also experimented a little more now with it.
Our use-cases are unfortunately not completely covered I guess.
Let's say we have a pool of >300k images and growing. With queries in a database, we identify 80k that should form a dataset. We can create a dataset A and have it stored in the cloud, managed by clearml-data. Let's say we query another time and get 60k images. Now it is not trivial to create a new d...
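(What we do for dataset A today is roughly the following; query_image_paths and the names are placeholders, so this is only meant to illustrate the workflow.)

    from clearml import Dataset

    # Hypothetical helper that runs our database query and returns ~80k file paths.
    image_paths = query_image_paths("label = 'cat' AND split = 'train'")

    ds = Dataset.create(dataset_name="A", dataset_project="demo")
    for p in image_paths:
        ds.add_files(path=p)  # one call per file; staging them in a folder first also works
    ds.upload()    # packs and uploads the archive(s)
    ds.finalize()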
When we run a script containing Task.init from within our repo, it creates a repo URL that looks like this:
Now the agents trying to execute this task fail with:
cloning:
agent_user@git.mycompany.com: Permission denied (publickey). fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
An easy fix is to change the URL in the UI to include the user, e.g. ssh://git@git.mycompany.com:2022/myuser/repo.git , bu...
Hey AgitatedDove14 - thank you for the help! 🙂 Though in our case, most developers have the repo set up with SSH key authentication. Thus the task gets an 'ssh url' like ssh:// and not https:// . Consequently, the conversion is never called. Or is it already expected behavior that the ClearML agent rewrites ssh://mydomain.com:2022/ ... to ssh://git@mydomain.com:2022/ ... if I have force_git_ssh_protocol: true and force_git_ssh_user: "git" ?
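(For completeness, this is the relevant part of our clearml.conf on the agent machines, with the values as written above.)

    agent {
        # rewrite https:// repository URLs to ssh:// before cloning
        force_git_ssh_protocol: true
        # user we would like to have injected into the git URL
        force_git_ssh_user: "git"
    }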
Hm I tried it again (even cleaning up the vcs cache before, since that caused an issue earlier) but it still does not work. Looking at the code, I also could not find the place where this should happen. As far as I can tell, there are only translations from https->ssh and ssh->https , but not ssh->ssh .
To add that, I quickly coded up this PR:
https://github.com/allegroai/clearml-agent/pull/72
Could you take a look at it? On our installation here, it shows the desired behavior...
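(The gist of the change is roughly the following; a simplified illustration using the standard library, not the actual PR code.)

    from urllib.parse import urlparse, urlunparse

    def force_ssh_user(url: str, ssh_user: str = "git") -> str:
        """If the URL is already ssh:// but has no user, inject the forced one."""
        parsed = urlparse(url)
        if parsed.scheme == "ssh" and parsed.username is None:
            parsed = parsed._replace(netloc=f"{ssh_user}@{parsed.netloc}")
        return urlunparse(parsed)

    # ssh://mydomain.com:2022/group/repo.git -> ssh://git@mydomain.com:2022/group/repo.git
    print(force_ssh_user("ssh://mydomain.com:2022/group/repo.git"))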
Found a related comment: https://github.com/allegroai/clearml-agent/issues/42#issuecomment-887331420 . Though, would any of the above proposed solutions be feasible?
Ah OK 🤔 So should I maybe update the PR to not touch the URL if neither user nor port is 'force-set'?