git config --system credential.helper 'store --file /root/.git-credentials'
Maybe we should use this hack for cloning with user/token in general ...
What's the error you are getting?
(open the browser web developer, see if you get something on the console log)
Shout-out to Emilio for quickly stumbling on this rare bug and letting us know. If you have a feeling your process is stuck on exit, just upgrade to 1.0.1 😉
VictoriousPenguin97 I'm assuming the exact same server version?
Hi SoreHorse95
I am exploring hiding our clearml server behind
Do you mean adding an additional reverse proxy to authenticate clearml-server from the outside?
So the clearml server already contains an authentication layer (JWT token), and you do have full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying is: if you add HTTPS on top of the communication and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc., unfortunately that is not part of the open source, but I know they have it in the scale/ent...
You mean, is one solution better than combining, maintaining, and automating 3+ solutions (dvc/lakefs + mlflow + kubeflow/airflow)?
Yes, I'd say it is. BTW if you have airflow running for other automations you can very easily combine that automation with clearml and have a single airflow automation for everything. The main difference is that now airflow only launches logic, never the actual compute/data (which are launched and scaled via clearml).
Does that make sense?
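For illustration, a rough sketch of that split, where the Airflow DAG only triggers and clearml-agent does the compute; the project/task/queue names here are assumptions:
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from clearml import Task


def launch_training():
    # Clone a pre-existing template task and enqueue it; the actual
    # compute runs wherever a clearml-agent is serving the queue.
    template = Task.get_task(project_name="examples", task_name="train")  # assumed names
    cloned = Task.clone(source_task=template, name="train (scheduled)")
    Task.enqueue(cloned, queue_name="default")  # assumed queue name


with DAG(dag_id="clearml_launcher", start_date=datetime(2024, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    PythonOperator(task_id="launch_training", python_callable=launch_training)
```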
Hi @AdventurousElephant3
I think your issue is that Task supports two types of code:
- single script/jupyter notebook
- git repo + git diff
In your example (if I understand correctly) you have a notebook calling another notebook, which means the first notebook will be stored on the Task, but the second notebook (not being part of a repository) will not be stored on the Task, and this is why the agent fails to find the second notebook when running the code...
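A sketch of a layout that would work, with the shared logic kept as a module committed to the same git repo (file and function names here are hypothetical):
```python
# entry notebook/script, committed to the git repo together with helpers/preprocess.py
from clearml import Task
from helpers.preprocess import run_preprocess  # hypothetical module in the same repo

task = Task.init(project_name="examples", task_name="notebook entry")
run_preprocess("data/raw")  # the agent can resolve this import from the cloned repo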
VictoriousPenguin97 basically spin down serverA (this should flush all DBs), then copy /opt/clearml to the new server and spin it up with docker-compose. As long as the new server is on the same address as the previous one, everything should work out of the box
the issue was related to task.connect being called multiple times I guess.
This is odd?! How would that affect the crash?
Do notice that when you connect objects, each time you call connect you are basically deserializing the configuration from the backend into the code; maybe this somehow affected the object?
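To make the deserialization point concrete, a minimal sketch (project/task names and values are made up):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect demo")

config = {"lr": 0.001, "batch_size": 32}
# Under the agent, connect() overwrites the local values with whatever
# is stored on the Task in the backend (i.e. it deserializes into the dict).
task.connect(config)

config["lr"] = 0.01
# A second connect() call re-applies the backend values onto the same
# object, so local mutations made in between may be silently overwritten.
task.connect(config)
```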
@NastySeahorse61 so glad you managed to solve it 🎊 🚀
So the only difference is how I log in to the machine to start clearml
the only difference I can think of is the OS environment variables in the two login types:
can you run `export` in the two cases and check the diff between them?
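If it helps, a minimal sketch for diffing the two dumps, assuming you saved them as ssh_env.txt and local_env.txt (file names are made up):
```python
import difflib

# Compare the `export` output captured from the two login types;
# sorting first so a different variable order doesn't show up as a diff
with open("ssh_env.txt") as a, open("local_env.txt") as b:
    diff = difflib.unified_diff(
        sorted(a.readlines()), sorted(b.readlines()),
        fromfile="ssh_env.txt", tofile="local_env.txt",
    )
print("".join(diff))
```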
I can't see any reason it should not work 😀
Hi JuicyFox94 ,
Actually we just added that 🙂 (still on GitHub, RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
Hey LethalDolphin75 , when it works, could you PR it?
Hmm this is odd indeed, let me verify (thanks! @HarebrainedOstrich43)
Okay, progress.
What are you getting when running the following from the git repo folder:
`git ls-remote --get-url origin`
okay, the odd thing is that `git ls-remote --get-url origin` should have returned the same...
what's your git version? (`git --version`)
I will take any suggestion 🙂
`git remote -v` could be a good start, but I'm not familiar with the output structure; is there a template for parsing?
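For illustration, a sketch of parsing that output, assuming the usual `<name>\t<url> (fetch|push)` layout:
```python
import subprocess

out = subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout
remotes = {}
for line in out.splitlines():
    # each line looks like: "origin\thttps://github.com/user/repo.git (fetch)"
    name, rest = line.split("\t", 1)
    url, direction = rest.rsplit(" ", 1)
    remotes.setdefault(name, {})[direction.strip("()")] = url
print(remotes)  # e.g. {'origin': {'fetch': '...', 'push': '...'}}
```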
I think this all ties into the non-standard git repo definition; I cannot find any other reason for it. Is it actually stuck for 5 min at the end of the process, waiting for the repo detection?
LOL, thanks!
Hi WittyOwl57
I think what happens is it auto-logs the joblib load/save calls; these calls track models used/created by the code and attach them to the model repository representing these models.
I'm assuming there are multiple load/save calls, and there are multiple model instances pointing to the same local file "file:///tmp/...". The warning basically says it is re-registering existing models.
Make sense?
(BTW: you can disable the auto-logging feature of joblib)
`Task.init(..., auto_connect_frameworks={'scikit': False})`
Can I make the Tasks that I'm adding to the pipeline also run locally, such that the entire pipeline runs locally?
Ohh I think only if you have an agent running on your machine.
What is the use case? (maybe we can add local execution as well?!)
Hmm I think you are correct:
`:param auto_create: Create new dataset if it does not exist yet`
it should have created it, this seems like a bug, I'll make sure to pass it along 🙂
well it should fail, but I think the error message should be fixed 🙂
maybe:
`ValueError: dataset 'tmp_datset' not found in project 'lavi-testing'`
wdyt?
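For context, the call under discussion would look roughly like this (project/dataset names taken from the message above):
```python
from clearml import Dataset

# auto_create=True is documented to create the dataset if it does not
# exist yet; the bug discussed here is that it raises an error instead
ds = Dataset.get(
    dataset_project="lavi-testing",
    dataset_name="tmp_datset",
    auto_create=True,
)
```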
BattyLion34 is this consistent?
(Really, I can't see any difference; one time it is able to create the venv and another time it fails with a permission error)
ReassuredTiger98 that is a good point. At the moment they are designed as "machine level" configs, but we do have built-in support to allow multiple configurations. The technical issue is that we have to read the configuration file before we initialize the Task object, which means we are still not aware of the git root (which I assume is where we could put a configuration file)
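In the meantime, one built-in way to switch configurations is pointing the SDK at an alternate file via the `CLEARML_CONFIG_FILE` environment variable; a minimal sketch (the path is an example):
```python
import os

# must be set before the clearml SDK initializes
os.environ["CLEARML_CONFIG_FILE"] = "/path/to/repo/clearml.conf"  # example path

from clearml import Task

task = Task.init(project_name="examples", task_name="per-repo config")
```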
BTW: regarding the `detect_with_conda_freeze`
we hope that this flag is rarely used, as ClearML should auto-detect t...