Okay, so I can't figure out why it would "kill" the new experiments; I mean, it should run them. Is there any "smart stopping" that causes it to kill the process before it ends?
BTW: can this be reproduced with the ClearML Hydra example?
Hi AttractiveCockroach17
> Many of these experiments appear with status "running" on ClearML even though they have finished running,
Could it be their process just terminated (i.e. was not properly shut down)?
How are you running these multiple experiments?
BTW: if the server does not see any change in a Task for a while (I think the default is 2 hours), it will automatically mark the Task as aborted.
You mean to design the entire pipeline from YAML?
(this assumes your Tasks know how to process links to artifacts)
Is this what you are after?
(BTW: any reason for working with YAML files instead of coding it?)
The imports inside the functions are because the function itself becomes a stand-alone job running on a remote machine, not the entire pipeline code. This also automatically picks packages to be installed on the remote machine. Make sense?
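For reference, a minimal sketch of such a standalone step using the decorator interface (the component name and logic here are made up for illustration):
```
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["processed"])
def preprocess(data_path):
    # imports live inside the function: this body is what ships to the
    # remote machine as a standalone job, and these imports are how the
    # packages to install for the step are picked up
    import pandas as pd
    df = pd.read_csv(data_path)
    return df.dropna()
```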
Regarding the YAML, how would you pass data? Like the pipeline-from-tasks example?
BroadSeaturtle49 agent RC is out with a fix:
```
pip3 install clearml-agent==1.5.0rc0
```
Let me know if it solved the issue
Hi PerfectChicken66
> every X iterations and delete the older ones with `delete_artifacts` from `Task`
I have to ask, why not just overwrite the artifact? It is basically the same, no?!
I think you are correct: when you delete the entire Task you can specify "delete artifacts", but it does not do that on `delete_artifacts` 😞
You can manually do that with:
```
# remove the stored file, then remove the artifact entry from the Task
task._delete_uri(task.artifacts["artifact"].url)
task.delete_artifacts(["artifact"])
```
Hi RotundSquirrel78
Could those be the example experiments?
Are you running your own server, or is it the SaaS free-tier server?
If you want to change the Args, go to the Args section in the Configuration tab; when the Task is in draft mode you can edit them there.
So essentially, the server helm chart creates a randomly generated secret pair and deploys it as a shared k8s secret that pods can access.
This is the tricky part: for the helm chart to be able to create it, it must be able to log in to the server, which means there is a secret embedded in the helm chart that lets you access the default server. You see my point?
Bummer... that seems like a bit of an oversight tbh.
There is no real solution for those, unless the helm chart "knows" something about the server before spinning it up the first time, which basically means a predefined access key; I do not think we want that 😉
I have to admit, I'm not sure...
Let me talk to the backend guys; in theory you are correct, the "initial secret" could be injected via a helm env var, but I'm not sure how that would work in this specific case.
I think you are correct, and the first time you spin up the server it is not possible (I mean, you need it up to get the access/secret key, and only then can you insert them into the helm values)... 😞
Hi ZippyAlligator65
You mean like env vars?
Yes, you have to spin up the server in order to generate the access/secret key...
hmm DeliciousKoala34
what are you getting if you put this at the top of your code (the one you are running in the remote docker)?
```
import os
print([(k, os.environ[k]) for k in os.environ if k.startswith("CLEARML_")])
```
BTW: latest PyCharm plugin with 2022 support was just released:
https://github.com/allegroai/clearml-pycharm-plugin/releases/tag/1.1.0
And this is with the latest PyCharm plugin, 1.1.0?
So now, for it to take effect, you need to enqueue the Task and set up an agent to pick it up and run it.
When the agent runs the Task, the new parameter will be passed.
Does that make sense?
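In code it would look something like this (a minimal sketch; the task ID and queue name are placeholders):
```
from clearml import Task

# the draft Task whose parameters were just edited
task = Task.get_task(task_id="<task-id>")

# push it into a queue; an agent listening on that queue picks it up,
# and the edited parameters are passed to the running code
Task.enqueue(task, queue_name="default")
```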
Hi @<1540142641931358208:profile|FancyBaldeagle86>
You mean in the UI? i.e. clone an experiment, hover over the Configuration / Hyperparameters section, and click edit?
Hi @<1546665666675740672:profile|AttractiveFrog67>
- Make sure you stored the model's checkpoint (either pass `output_uri=True` in `Task.init`, or upload it manually)
- When you call `Task.init`, pass `continue_last_task=True`
- Now you can do `last_checkpoint = task.models["output"][-1].get_local_copy()` and all you need is to load `last_checkpoint` (see the sketch below)
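Putting it together, a minimal sketch (assuming a PyTorch checkpoint; the project/task names are placeholders):
```
import torch
from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="train",
    output_uri=True,           # upload checkpoints so they can be fetched later
    continue_last_task=True,   # resume the previous run instead of starting a new Task
)

# fetch the last logged output model and load it
last_checkpoint = task.models["output"][-1].get_local_copy()
model = torch.load(last_checkpoint)
```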
Then running by using the ..., am I right?
yep
I have put the `--save-period` flag while running YOLOv5, and ClearML does not save the weights per epoch that I have trained. Why is this happening?
But do you still see it in the ClearML UI? Do you see the models logged in the ClearML UI?
Hi @<1734020162731905024:profile|RattyBluewhale45>
What's the clearml agent version? And could you verify with the latest RC?
Lastly, how are you running the agent, docker mode? What's the base container?
If this is the case then the easiest is:
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
res = client.events.get_task_plots(task="<task-id>")
```
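Then something like this to inspect what came back (a sketch; I'm assuming the `plots` / `plot_str` layout of the events response here):
```
import json

for entry in res.plots:
    figure = json.loads(entry["plot_str"])   # the plotly figure JSON
    print(figure.get("layout", {}).get("title"))
```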
We should definitely have a nice interface 🙂
> Looking at this example here, it looks like it only works with tasks:
Aha! Pipeline is a Task 🙂 (a specific type of Task, nonetheless a Task)
Just use the pipeline ID, and make sure you push it into the services queue, voila
Hi @<1654294828365647872:profile|GorgeousShrimp11>
> can you run a pipeline on a schedule, or are schedules only for Tasks?
I think one tiny detail got lost here: Pipelines (the logic driving them) are a type of Task, which means you can clone and enqueue them like other Tasks
(`Task.enqueue` / `Task.clone`)
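i.e. something along these lines (a minimal sketch; the pipeline ID is a placeholder):
```
from clearml import Task

# the pipeline controller is itself a Task, so clone-and-enqueue just works
pipeline = Task.get_task(task_id="<pipeline-id>")
cloned = Task.clone(source_task=pipeline)
Task.enqueue(cloned, queue_name="services")  # controllers usually run in the services queue
```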
Other than that looks good to me, did I miss anything?
> error [Errno 13] Permission denied:
Seems like a permission issue?
Try to remove your entire clearml cache folder (by default `~/.clearml`)
nice @<1724960458047229952:profile|EnergeticKoala33> !
The issue was that the agent was trying to start the docker container but had no credentials to do so; your solution is exactly what needed to be done.
Notice: `dataset_rgb.list_files()` will list the content of the dataset, not the local files, e.g. `/folder/myfile.ext` and not `/home/user/cache/folder/myfile.ext`
So basically I think you are just not passing actual files; you should probably do:
```
for local_file in Path(folder_rgb).rglob('*'):
    ...
```
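For example, a minimal sketch of that loop (assuming you are adding the files to a `Dataset` object; the dataset name/project are placeholders):
```
from pathlib import Path
from clearml import Dataset

dataset = Dataset.create(dataset_name="rgb-images", dataset_project="examples")
for local_file in Path(folder_rgb).rglob('*'):
    if local_file.is_file():
        dataset.add_files(path=local_file)
dataset.upload()
dataset.finalize()
```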