Hi QuizzicalDove0
I guess the reason is that the integration is literally 2 lines, and it takes less time to execute the code on a system with a working env (we assume there is one) than to configure all the git, python packages, arguments etc...
All that said, you can create an experiment from code, using Task.import_task https://allegro.ai/docs/task.html#trains.task.Task.import_task
Sorry my bad, you are looking for:
None
You can see the RC on the main readme (for some reason the Conda badge will show the RC and the PyPi one won't)
https://github.com/allegroai/clearml/
Hmm so the Task.init should be called on the main process, this way the subprocess knows the Task is already created (you can call Task.init twice to get the task object). I wonder if we somehow can communicate between the sub processes without initializing in the main one...
Hi @<1687643893996195840:profile|RoundCat60> , I just saw the message,
Just by chance I set the SSH deploy keys to write access and now we're able to clone the repo. Why would the SSH key need write access to the repo to be able to clone?
Let me explain, the default use case for the agent is to use user/pass (as configured in the clearml.conf file).
It will change any ssh links to https links and will add the credentials to clone the repository.
You can also provide SSH keys (basicall...
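To make the ssh-to-https rewrite above concrete, here is a minimal sketch of the idea. This is not the agent's actual code; the function name `ssh_to_https` and both regexes are made up for illustration, and it only covers the two common SSH URL shapes.

```python
import re
from urllib.parse import quote

# Hypothetical helper, for illustration only (not clearml-agent internals).
# scp-like form: git@github.com:org/repo.git
SCP_STYLE = re.compile(r"^(?:[^@/\s]+@)?(?P<host>[^:/\s]+):(?P<path>[^/\s].*)$")
# explicit scheme form: ssh://git@github.com[:port]/org/repo.git
SSH_SCHEME = re.compile(r"^ssh://(?:[^@/\s]+@)?(?P<host>[^:/\s]+)(?::\d+)?/(?P<path>.+)$")


def ssh_to_https(url, user=None, password=None):
    """Rewrite an SSH git URL as https, optionally embedding credentials."""
    if url.startswith(("http://", "https://")):
        return url  # already http(s), nothing to rewrite
    match = SSH_SCHEME.match(url) or SCP_STYLE.match(url)
    if match is None:
        raise ValueError("unrecognized git URL: {}".format(url))
    auth = ""
    if user and password:
        # percent-encode so characters like '@' or ':' do not break the URL
        auth = "{}:{}@".format(quote(user, safe=""), quote(password, safe=""))
    return "https://{}{}/{}".format(auth, match["host"], match["path"])
```

For example, `ssh_to_https("git@github.com:allegroai/clearml.git", "user", "secret")` gives an https link with the user/pass embedded, which is roughly what ends up being cloned.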
Create one experiment (I guess in the scheduler)
task = Task.init('test', 'one big experiment')
Then make sure the scheduler creates the "main" process as a subprocess (basically the default behavior)
Then the sub process can call Task.init and it will get the scheduler Task (i.e. it will not create a new task). Just make sure they all call Task.init with the same task name and the same project name.
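A rough sketch of the steps above, assuming the scheduler starts its workers with Python's multiprocessing (that part is my assumption, and the names `INIT_KWARGS`, `get_task`, `worker` and `main` are just for the example). It needs the clearml package and a configured server to actually run.

```python
import multiprocessing

# Same project / task name everywhere, so every process resolves to one Task
INIT_KWARGS = dict(project_name="test", task_name="one big experiment")


def get_task():
    # imported lazily so the module loads even where clearml is not installed
    from clearml import Task
    return Task.init(**INIT_KWARGS)


def worker(index):
    # expected to return the scheduler Task, not create a new one
    task = get_task()
    task.get_logger().report_text("worker {} using task {}".format(index, task.id))


def main():
    task = get_task()  # create the Task once, in the main process
    procs = [multiprocessing.Process(target=worker, args=(i,)) for i in range(2)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    task.close()
```

Call `main()` from under an `if __name__ == "__main__":` guard in the scheduler script; the guard matters on platforms where multiprocessing spawns a fresh interpreter.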
Hi AverageBee39
It seems the json is corrupted, could that be ?
UnevenDolphin73 sounds great, any chance you can open a git issue on clearml-agent repo for this feature request ?
So what you’re saying is to first kick off a new run and then rename the underlying Pipeline Task, which will cause that particular run to become a new pipeline name?
Correct, basically you are not changing the "pipeline" per-se but the execution name of the pipeline, if that makes sense
What would be most ideal would be to be able to right-click on a pipeline run and have a “clone” option, like you can with a task, where you can start a new run with a new name in a single step.
...
If you want to rename it (any pipeline), click on the "Full details" in the "Run Info" (right hand side panel), then in the full detail of the Pipeline Task you will be able to rename the pipeline execution
(Is renaming useful? should we add a right click to rename ?)
Hi @<1533620191232004096:profile|NuttyLobster9>
Hi All, is there a way to clone a pipeline from the web UI like you can with a task?
Right click on the pipeline and select Run (it is basically the same thing as cloning it)
Hi GreasyPenguin66
So the way clearml can store your notebook is by using the jupyter-notebook rest api. It assumes that it can communicate with it, as the kernel is running on the same machine. What exactly is the setup? Is the jupyter-lab/notebook running inside the docker? Maybe the docker itself is running with some --network argument?
Is there any contingency plan for an agent to continue running a task without reading the repository on the GitLab server?
Not sure what can be done ... any suggestions ?
At runtime, can I ask the agent to use some cached repository?
Sometimes you will have it (as the agent stores a cached copy), but I would hardly count on it (and it might be at different states on different machines...)
... (due to regular maintenance service, something I cannot control).
Maybe let "th...
Hi Martin, of course not,
Smart!
I was just wondering if it has been patched yet and if not what is the expected timeline for patching it
Yes, I believe the target is a patch version 1.15.1 to be released in a couple of weeks. This is not a major issue but it's always better to have it fixed. (btw: the enterprise version never had this issue to begin with, because it is of course authenticated, and it also has an additional RBAC layer on top.)
Hi @<1689808977149300736:profile|CharmingKoala14> , let me double check that
Hi @<1658281099807166464:profile|SmallCamel52>
Lack of authentication in all versions of the fileserver component
Are you leaving the fileserver open to the world ?
quick update, still trying to reproduce ...
(Venv mode makes sense if running inside a container, if you need docker support you will need to mount the docker socket inside)
What exactly is the error you are getting from clearml? And what do you have in the configuration file?
Can you try to manually install it and see what you are getting?
python3.10 -m pip install /home/boris/.clearml/pip-download-cache/cu117/torch-1.12.1+cu116-cp310-cp310-linux_x86_64.whl
Hi IcySwallow94
Are you deploying the clearml server with the helm chart ?
is it also possible to somehow propagate ssh keys to the agent pod? Not sure how to approach that
I would use the k8s secret manager to do that (there is a way to mount secrets files into pod, SSH is relatively standard to do)
GaudyPig83
I think there is some mismatch between the code creating the pipeline and the actual Task?! Could that somehow be the case? "relaunch_on_instance_failure" is a missing argument somehow
Can you try to launch the entire Pipeline with the latest RC?
pip3 install clearml==1.7.3rc0
GrumpySeaurchin29 you can pass s3 credential for the autoscaler, but all the tasks will have them. Are you saying two diff sets of credentials is the issue, or is it the visibility?
I think that just backing up /opt/clearml and moving it should be just fine 🤞
When I look at the details, model artifact in the ClearML UI, it's been saved the usual way, and no tags that I added in the OutputModel constructor are there.
Did you disable the autologging ? Are you saying the tags not appearing is a bug (it might be) ?
Also, I don't mind auto logging either if I have control over publishing the model or not directly from that script, and adding tags etc, like OutputModel.
Sure you can publish models / add tags etc, either from the UI or pr...
WittyOwl57 that is odd, there is a specific catch for SystemExit
https://github.com/allegroai/clearml/blob/51d70efbffa87aa41b46c2024918bf4c584f29cf/clearml/backend_interface/task/repo/scriptinfo.py#L773
How do I reproduce this issue/warning ?
Also: "Repository and package analysis timed out (300.0 sec), giving up" seriously, over 5 minutes?! How large is the git repo?