Ingress is enabled - I can't control the ports in Ingress, so I had to use the subdomain method.
clearml.conf is the file that clearml-init is supposed to create, right?
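For reference, I'd expect it to look roughly like this once created - just a sketch, the URLs and keys below are placeholders for our self-hosted server, not real values:
```
# clearml.conf (sketch - placeholder URLs and credentials)
api {
    web_server: "http://clearml-webserver.our-domain.com"
    api_server: "http://clearml-apiserver.our-domain.com"
    files_server: "http://clearml-fileserver.our-domain.com"
    credentials {
        access_key: "GENERATED-ACCESS-KEY"
        secret_key: "GENERATED-SECRET-KEY"
    }
}
```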
Sure, with pleasure. However, we're using a self-hosted (on premise) version of ClearML...
Thanks! And how can I validate that it properly connects?
I did not use a Helm chart. We just don't have enough permissions here to use one - I had to install everything manually, service by service.
For us it is both - having the process/pipeline presented in a clear UI, and the ability to trigger it, e.g. every evening.
In addition, tools like Dagster offer code organization and a separation of the code itself from the data and the configuration, so that we can use the same data/ML pipeline for different use-cases.
Sorry - my bad.
It did not work.
When I don't set anything in the clearml.conf, it takes the repo from the cache. When I delete the cache, it can't get the repo any longer.
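(For context, the caching I mean is, as far as I understand, this part of the agent config - these should be the defaults, nothing I changed:)
```
# clearml.conf, agent section (sketch - default values as I understand them)
agent {
    vcs_cache {
        enabled: true
        path: ~/.clearml/vcs-cache
    }
}
```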
Yes - I think, however, that it is our overly strict security policy that doesn't let anyone access the repo. We're lucky they let the developers see their own code...
We have defined an SSH key for clearml, and it is also set in the /clearml-agent/.ssh/config, and it still can't clone it. So it must be some internal security issue...
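Roughly what's in there (the host, port and key path here are placeholders rather than our real values):
```
# /clearml-agent/.ssh/config (sketch - placeholder host, port and key path)
Host repo.our-domain.com
    HostName repo.our-domain.com
    Port 1234
    User git
    IdentityFile ~/.ssh/clearml_deploy_key
    StrictHostKeyChecking no
```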
The manual clearml.conf worked 🙂 thanks for this!!
I'm also currently in a similar process, and giving a shot to http://DAGster.io
Thank you, SuccessfulKoala55 !
yes, we're "fighting" here to set up the ES on our local K8s through Rancher ( https://rancher.com/ ). Is it mandatory, for example, to label the node app=clearml?
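(If it turns out to be mandatory, I assume it would be something like this on the relevant node - just my guess at the command:)
```
# sketch - label one of our Rancher-managed nodes so the ClearML pods can be scheduled on it
kubectl label node <our-node-name> app=clearml
```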
Yup, these were the values I was missing! 🙏 Thank you so much!
I am very new to all these things, and I didn't even use the Helm chart (stupid me...). Only after you asked did I check it out, and I saw I could have made use of it and that it would have saved me so much time 😛
Well, we always get smarter by learning from these experiences. So, next time... 🙂
Hi SuccessfulKoala55, thank you for the clarification. So how can I change it to point the API at our k8s service? If it's nginx, I guess I'll have to configure it manually...?
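Just to check I understand - is it something along these lines? (a sketch only; the api.our-domain.com host, the clearml-apiserver service name and port 8008 are my assumptions from the manual install, not something I've verified):
```
# Ingress sketch (assumed names: api.our-domain.com, service clearml-apiserver on port 8008)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: clearml-api
spec:
  ingressClassName: nginx
  rules:
    - host: api.our-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: clearml-apiserver
                port:
                  number: 8008
```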
Thank you both so much for the efforts to fix it 🙂
One of my colleagues once ran some training with tons of data in the git folder that was not .gitignored - so I suspect it's related to this.
hmm... the volume is already attached - already used by clearml-fileserver ... so it fails on this
I'll continue reporting if it happens again
Hi SuccessfulKoala55 , thanks for assisting, yes we used the helm to install it. It isn't the latest version though. We installed it a month or two ago.
I wonder what we did to reach it, though... Could be we flooded it at some point.
and the storage class name (I hope that what you meant, SuccessfulKoala55 ) is ceph-c2-prod-rz01-cephfs
The version we're using is: 1.1.1-135 • 1.1.1 • 2.14
Hi AgitatedDove14 , thanks for the quick response.
I didn't set git_host, only force_git_ssh_protocol: true, force_git_ssh_port: ... and force_git_ssh_user: ...
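To be concrete, the relevant part of my agent's clearml.conf looks roughly like this (the port value here is a placeholder, not our real one):
```
# clearml.conf, agent section (sketch - placeholder port value)
agent {
    force_git_ssh_protocol: true
    force_git_ssh_port: 1234
    force_git_ssh_user: git
    # git_host / git_user / git_pass were left unset
}
```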
If I understand correctly, git_host / git_user / git_password are all for HTTP, and we're using SSH to clone the project through the agent.
I've only set force_git_ssh_protocol to false, but kept force_git_ssh_port / force_git_ssh_user - which are set to simply some port and 'git'. It didn't work unfortunately... could not connect to GitHub with the port (connection timed out).
If I change or remove the port, I can't clone the whole project, so I don't even reach the installation of the detectron part.
A public one is easy:
https://github.com/facebookresearch/detectron2
The internal one is something like:
ssh://repo.our-domain.com:1234
We use SSH so that we can easily access it, without needing to store a name/password, and without everyone who uses the code having to set up their credentials in env vars and such in advance...
Exactly - we need a mixed behavior!
We host on a private (self-hosted) GitLab, which can only be cloned through SSH, and we would like to import packages by compiling them from a public GitHub (or by installing them using wheels with find-links).
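So ideally something like this in the task's requirements - just a sketch of what I mean, assuming the agent keeps cloning our private repo over SSH while pip builds detectron2 from the public GitHub:
```
# requirements.txt (sketch)
# the private repo itself is cloned by the agent over SSH, so only the
# public dependency needs to be built from source here:
git+https://github.com/facebookresearch/detectron2.git
```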
After a restart, that seems to have helped, thanks! 🙂
Now we just need to solve the compilation of the git source installation... it would have been nice to have the find-links for the wheels, but I understand it's unstable in terms of reproducibility.
I also have no idea how it happened.
I managed to redeploy it and it seems to be accessible now