Indeed that should be the case. By default Debian is used, but it's good that you ran with a custom image, so now we know it's not clear that more permissions are needed.
Great! Please let me know if it works when adding this permission; we'll update the docs in a jiffy!
Are you running a self-hosted/enterprise server or on app.clear.ml? Can you confirm that the field in the screenshot is empty for you?
Or are you using the SDK to create an autoscaler script?
Yeah, I do the same thing all the time. You can limit the number of tasks that are kept in HPO with the `save_top_k_tasks_only` parameter, and you can create subprojects by simply using a slash in the name: https://clear.ml/docs/latest/docs/fundamentals/projects#creating-subprojects
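For illustration, a rough sketch of both of those together; the parameter range, metric names, and project names here are just examples, not a prescribed setup:

```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange

# a slash in the project name automatically creates a subproject
task = Task.init(project_name="my-project/hpo-runs", task_name="optimizer")

optimizer = HyperParameterOptimizer(
    base_task_id="<base task id>",  # the task to clone for each trial
    hyper_parameters=[
        UniformIntegerParameterRange(
            "General/batch_size", min_value=16, max_value=128, step_size=16
        ),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    save_top_k_tasks_only=5,  # archive everything except the 5 best tasks
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```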
Just for reference, the main issue is that ClearML does not allow non-string types as dict keys for its configuration. Usually the label mapping does have ints as keys, which is why we need to cast them to strings first, pass them to ClearML, and then cast them back.
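A small sketch of that dance, assuming a hypothetical label map and a `Task.init` setup:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="label-map-demo")

label_map = {0: "cat", 1: "dog"}  # int keys: not allowed in a ClearML configuration

# cast the keys to strings before handing the dict to ClearML...
connected = {str(k): v for k, v in label_map.items()}
task.connect(connected, name="label_map")

# ...then cast them back before using the mapping in your code
label_map = {int(k): v for k, v in connected.items()}
```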
It looks like you need to add the `compute.imageUser` role to your credentials.
Did you by any chance set up the autoscaler to use a custom image? It's trying to use `projects/image-processing/global/images/image-for-clearml`, which is a path I don't recognise. Is this your own custom image? If so, we can add this role to the documentation as required when using a custom image.
You will have to provide more information. What other docker containers are running and how did you start the server?
Also, the answer to blocking on the pipeline might be in the `.wait()` function: https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#wait-1
TimelyPenguin76 I can't seem to make it work though, on which object should I run the `.wait()` method?
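Going by the linked docs, `wait()` is a method on the `PipelineController` object itself; a minimal sketch (the name, project, version, and steps are illustrative):

```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
# ... pipe.add_step(...) / pipe.add_function_step(...) calls go here ...

pipe.start()  # launches the pipeline without blocking
pipe.wait()   # blocks until the pipeline run finishes
pipe.stop()
```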
Great to hear! Then it comes down to waiting for the next Hugging Face release!
An update: using your code (the snippet above) I was getting no scalars when simply installing the ultralytics and clearml packages using pip, because indeed tensorboard is not installed. When I do install tensorboard, metrics come in like normal, so I can't seem to reproduce the issue when tensorboard is correctly installed. That said, maybe we should look at not having this dependency.
Would you mind posting a `pip freeze` of the environment you're using to run YOLO?
Can you elaborate a bit more? I don't quite understand yet. So it works when you update an existing task by adding a tag to it, but it doesn't work when adding a tag for the first time?
ExuberantBat52 The dataset alias thing giving you multiple prompts is still an issue I think, but it's on the backlog of our devs.
How large are the datasets? To learn more you can always try to run something like `line_profiler`/`kernprof` to see exactly how long a specific Python line takes. How fast/stable is your internet?
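For instance, a minimal `line_profiler` sketch; the profiled function is a stand-in for whatever upload code you want to time:

```python
from line_profiler import LineProfiler

def upload_my_dataset():
    # ... the dataset upload code you want to time, line by line ...
    pass

profiler = LineProfiler()
profiled = profiler(upload_my_dataset)  # wrap the function for per-line timing
profiled()
profiler.print_stats()  # prints hit counts and time spent on each line
```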
I tried answering them as well, let us know what you end up choosing, we're always looking to make ClearML better for everyone!
Hi @<1541592213111181312:profile|PleasantCoral12> thanks for sending me the details. Out of curiosity, could it be that your codebase / environment (apart from the clearml code, e.g. the whole git repo) is quite large? ClearML does a scan of your repo and packages every time a task is initialized, maybe that could be it. In the meantime I'm asking our devs if they can see any weird lag with your account on our end.
When creating it, I found that this hack should be on our side, not on Huggingface's. So I'm only going to fix issue 1 with the PR, issue 2 is ours.
Wow awesome! Really nice find! Would you mind compiling your findings into a github issue? Then we can help you search better :) this info is enough to get us going at least!
That's a good idea! I think the YOLO models would be a great fit for a tutorial/example like this. We can add it to our internal list of TODOs, or if you want, you could take a stab at it and we'll try to support you through it. It might take some engineering though! Serving is never drag and drop.
That said, I think it should be quite easy to do, since YOLOv8 supports exporting to TensorRT format, which is native to the Triton engine underlying ClearML Serving. So the process shoul...
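For reference, the export step itself is a one-liner with the ultralytics API; a small sketch (the checkpoint name is just an example, and a local TensorRT installation is assumed):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # any YOLOv8 checkpoint
model.export(format="engine")  # writes a TensorRT .engine file next to the weights
```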
Yes, with docker, auto-starting containers is def a thing. We set the containers to restart automatically (a reboot will do that too), so that when a container crashes it immediately restarts, say in a production environment.
So the best thing to do there is to use `docker ps` to get all running containers and then kill them using `docker kill <container_id>`. ChatGPT tells me this command should kill all currently running containers: `docker rm -f $(docker ps -aq)`
And I...
Wow! Awesome to hear :D
AstonishingRabbit13 If I'm not mistaken, you can add images to the preview tab by reporting them as debug samples.
So you'd run: `dataset.get_logger().report_image()` or `report_media()`
This is not scalable though, so don't expect the server to handle millions of images well; for that you'd need Hyperdatasets.
But it works well (as the name suggests) for some previews of the images!
Relevant docs:
https://clear.ml/docs/latest/docs/references/sdk/dataset/#get_logger
https://...
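Roughly like this; a quick sketch (the project, dataset name, and file paths are placeholders):

```python
from clearml import Dataset

dataset = Dataset.create(dataset_project="examples", dataset_name="my-images")
dataset.add_files("data/images")

# attach a few preview images as debug samples
dataset.get_logger().report_image(
    title="preview", series="sample", local_path="data/images/img_0.png"
)

dataset.upload()
dataset.finalize()
```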
Hi NuttyCamel41!
Your suspicion is correct: there should be no need to specify the `config.pbtxt` manually; normally this file is generated automatically from the information you provide on the command line.
It might be somehow silently failing to parse your CLI input to correctly build the `config.pbtxt`. One difference I see immediately is that you opted for the `"[1, 64]"` notation instead of the `1 64` notation from the example. Might be worth a try to change the input for...
Hey @<1523701949617147904:profile|PricklyRaven28>, so as discussed above there were 2 issues. The first one is still waiting on the second; it's on the backlog of our devs and should be done soon(tm).
That said, in the meantime I also wanted to do fun stuff with transformers, so I've written a quick hack that deals with the bug. It's basically 2 functions that keep track of which types of keys are in the dict.
```python
def cast_keys_to_string(d, changed_keys=dict()):
    nd = dict()
    for k...
```
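The snippet above is cut off; here is a rough reconstruction of what such a pair of helpers could look like, based purely on the description (not the exact original code):

```python
def cast_keys_to_string(d, changed_keys=None):
    """Recursively cast non-string dict keys to str, recording the originals."""
    changed_keys = {} if changed_keys is None else changed_keys
    nd = {}
    for key, value in d.items():
        if not isinstance(key, str):
            changed_keys[str(key)] = key  # remember the original key
            key = str(key)
        if isinstance(value, dict):
            value, changed_keys = cast_keys_to_string(value, changed_keys)
        nd[key] = value
    return nd, changed_keys

def cast_keys_back(d, changed_keys):
    """Restore the original non-string keys recorded by cast_keys_to_string."""
    nd = {}
    for key, value in d.items():
        if isinstance(value, dict):
            value = cast_keys_back(value, changed_keys)
        nd[changed_keys.get(key, key)] = value
    return nd
```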
Nope! The helm chart sets up all the infrastructure to run everything. What exactly to run is decided using the clearml-serving CLI. Using it, you can swap out models, set up A/B testing of different versions, do canary rollouts, etc. But the helm stack is there only to run what you defined using the CLI.
Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but got an `IndexError: Invalid key: 5872 is out of bounds for size 0` error. That said, I get the same error without the code running in a pipeline; there seems to be no difference between simply running the code and running it in the pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?
Ah I see. So then I would guess it is due to the remote machine (the ClearML agent) not being able to properly access your ClearML server.
I'm not quite sure what you mean here? From the docs it seems like you should be able to simply send an HTTP request to the localhost url to get the metrics. Is this not working for you? Otherwise, all the metrics end up in Prometheus, so you can also query that instead or use something like Grafana to visualize it
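For example, something along these lines should do it; the port here is an assumption, so substitute whatever your metrics endpoint is actually exposed on:

```python
import requests

# hypothetical local metrics endpoint; adjust host/port to your deployment
response = requests.get("http://localhost:9999/metrics")
response.raise_for_status()
print(response.text)  # Prometheus text-format metrics
```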
It should, but please check first. This is some code I quickly made for myself. I did make tests for it, but it would be nice to hear from someone else that it worked (as evidenced by the error above).