
Hi @<1547028116780617728:profile|TimelyRabbit96> Awesome that you managed to get it working!
What might also help is to look inside the Triton docker container while it's running. You can check the example; there should be a pbtxt file in there. Just to double-check that it is also in your own folder
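A minimal sketch of how you could peek inside the running container (the container name and model repository path are assumptions, adjust them to your setup):
```bash
# find the name or id of the running triton container
docker ps

# list the model repository inside the container
docker exec -it <triton_container_name> ls -R /models

# print the config to compare it against the one in your own folder
docker exec -it <triton_container_name> cat /models/<model_name>/config.pbtxt
```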
Yes, with docker auto-starting containers is definitely a thing 🙂 We set the containers to restart automatically (a reboot will do that too) so that when the container crashes, it immediately restarts, say in a production environment.
So the best thing to do there is to use `docker ps` to get all running containers and then kill them using `docker kill <container_id>`. ChatGPT tells me this command should kill all currently running containers: `docker rm -f $(docker ps -aq)`
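For context, a minimal sketch of how such an auto-restart policy is typically set (the image name is a placeholder):
```bash
# restart the container on crash and on reboot,
# unless it was explicitly stopped
docker run -d --restart unless-stopped <your_image>
```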
And I...
Most likely you are running a self-hosted server. External embeds are not available for self-hosted servers due to difficult network routing and safety concerns (the embeds need to be reachable from the public internet). The free hosted server at app.clear.ml does have it.
Hey @<1539780272512307200:profile|GaudyPig83> !
Are you running a self-hosted server? Is this the only type of HTTP call that fails, or does e.g. logging experiments also not work? A connection error usually means your docker containers can't reach each other.
Thank you so much! In the meantime, I checked once more and the closest I could get was using `report_single_value()`. It forces you to report each and every row though, but the comparison looks a little better this way. No color coding yet, but maybe it can already help you a little 🙂
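A minimal sketch of that workaround (the project/task names and reported values are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="single value report")
logger = task.get_logger()

# each row has to be reported as its own single value
logger.report_single_value(name="precision", value=0.92)
logger.report_single_value(name="recall", value=0.87)
```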
Yes, you will indeed need to add all ensemble endpoints separately 🙂
I can see 2 kinds of errors: `Error: Failed to initialize NVML` and `Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version`. These 2 lines make me think something went wrong with the GPU itself. Chances are you won't be able to run `nvidia-smi`
This looks like a non-ClearML issue 🙂 It might be that Triton hogs the GPU memory if not properly closed down (double Ctrl-C). It says the driver ver...
Wow! Awesome to hear :D
Wow, awesome! Really nice find! Would you mind compiling your findings into a GitHub issue? Then we can help you search better :) This info is enough to get us going at least!
Hey ExasperatedCrocodile76 and ExuberantBat52
Thanks for your contributions, I've updated the example here: https://github.com/allegroai/clearml-blogs/tree/master/urbansounds8k
Hi GrittyHawk31 ! ClearML is integrated with a bunch of frameworks from which it tries to automatically gather information. You can find a list here: https://clear.ml/docs/latest/docs/integrations/libraries
For example, if you're already reporting scalars to tensorboard, you won't have to add any clearml code, it will automatically be captured. The same will happen with e.g. LightGBM. Take a look at the example code in the link to find what is automatically supported for your framework.
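A minimal sketch of that auto-capture behavior with TensorBoard (the project/task names and scalar values are placeholders):
```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# the only ClearML-specific line; everything below is plain TensorBoard
task = Task.init(project_name="examples", task_name="tensorboard auto capture")

writer = SummaryWriter()
for step in range(10):
    # these scalars are picked up automatically and shown in the ClearML UI
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```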
...
Yeah, I do the same thing all the time. You can limit the amount of tasks that are kept in HPO with the `save_top_k_tasks_only` parameter, and you can create subprojects by simply using a slash in the name 🙂 https://clear.ml/docs/latest/docs/fundamentals/projects#creating-subprojects
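A minimal sketch combining both (the base task id, parameter range, and metric names are assumptions, adjust them to your experiment):
```python
from clearml import Task
from clearml.automation import (
    HyperParameterOptimizer, RandomSearch, UniformIntegerParameterRange
)

# a slash in the project name creates a subproject
task = Task.init(project_name="HPO/uniform-search", task_name="optimizer")

optimizer = HyperParameterOptimizer(
    base_task_id="<your_base_task_id>",
    hyper_parameters=[
        UniformIntegerParameterRange("General/epochs", min_value=5, max_value=50, step_size=5),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=RandomSearch,
    save_top_k_tasks_only=3,  # keep only the 3 best tasks
)
optimizer.start_locally()
```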
Hi ComfortableShark77 !
Which commands did you use exactly to deploy the model?
You're not the first one with this problem, so I think I'll ask the devs to maybe add it as a parameter for `clearml-agent`. That way it will show up in the docs and you might have found it sooner. Do you think that would help?
@<1523701949617147904:profile|PricklyRaven28> Please use this patch instead of the one previously shared. It excludes the dict hack :)
Also, this might be a little stupid, sorry, but your torch save command saves the model in the current folder, whereas you give clearml the 'model_folder/model' path instead. Could it be that the path is just incorrect?
With the screenshots above, the locally run experiment (left), does it have an HTTP URL in the model URL field? The one you whited out?
Ah I see. So then I would guess it is due to the remote machine (the clearml agent) not being able to properly access your clearml server
Usually those models are PyTorch, right? So, yeah, you should be able to. Feel free to follow the PyTorch example if you want to know how 🙂
Hi Fawad, maybe this can help you get started! They're both C++ and Python examples of Triton inference. Be careful though: the pre- and postprocessing used is specific to the model (in this case YOLOv4) and you'll have to change it to your own model's needs
ExuberantBat52 The dataset alias thing giving you multiple prompts is still an issue I think, but it's on the backlog of our devs 🙂
Yes you can! The filter syntax can be quite confusing, but for me it helps to print `task.__dict__` on an existing task object to see what options are available. You can get values in a nested dict by appending them into a string with a `.`
Example code:
```python
from clearml import Task

task = Task.get_task(task_id="17cbcce8976c467d995ab65a6f852c7e")
print(task.__dict__)

list_of_tasks = Task.query_tasks(task_filter={
    "all": dict(fields=['hyperparams.General.epochs.value'], p...
```
You can apply git diffs by copying the diff to a file and then running `git apply <file_containing_diff>`. But check this thread to make sure to dry-run first, to check what it will do before you overwrite anything:
https://stackoverflow.com/questions/2249852/how-to-apply-a-patch-generated-with-git-format-patch
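A minimal sketch of that dry-run flow (the patch filename is a placeholder):
```bash
# show which files the patch would touch, without changing anything
git apply --stat my_changes.diff

# dry run: report whether the patch applies cleanly, but don't apply it
git apply --check my_changes.diff

# only then actually apply it
git apply my_changes.diff
```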
Are you running a self-hosted/enterprise server or on app.clear.ml? Can you confirm that the field in the screenshot is empty for you?
Or are you using the SDK to create an autoscaler script?
HomelyShells16 Thanks for the detailed write-up and minimal example. I'm running it now too
Hi OddShrimp85
Do you have some more information than that? It could be a whole list of things 🙂
Also a big thank you for so thoroughly testing the system and providing this amount of feedback, it really does help us make the tool better for everyone! 🙂
AgitatedDove14 I was able to recreate the error, simply by running Lavi's example on `clearml==1.6.3rc1` in a fresh env. I don't know what is unique to the flow itself, but it does seem reproducible