effectively making us lose 24 hours of GPU compute
Oof, sorry about that, man 😞
Looking at the screenshots above: for the locally run experiment (left), does the model URL field contain an HTTP URL? The one you whited out?
Usually those models are PyTorch, right? So yes, you should be able to; feel free to follow the PyTorch example if you want to see how 🙂
As I understand it, vertical scaling means giving each container more resources to work with. This should always be possible in a k8s context, because you decide which types of machines go in your pool and you define the requirements for each container yourself 🙂 So if you want to set the container to use 10,000 CPUs, feel free! Unless you mean something else with this, in which case please correct me!
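For reference, a minimal sketch of what that per-container vertical scaling looks like in a k8s pod spec (the numbers are placeholders, not recommendations):

```yaml
# Part of a container spec in a k8s pod; the scheduler will only place
# the pod on a node that can satisfy the requested resources.
resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:
    cpu: "8"
    memory: 16Gi
```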
Don't paste your API keys! 🙈
Hi there!
Technically there should be nothing stopping you from deploying a Python-backend model. I just checked the source code, and ClearML basically just downloads the model artifact and renames it based on the inferred model type.
As far as I'm aware (could def be wrong here!), the Triton Python backend essentially requires a folder...
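For context, a minimal Triton Python-backend model repository conventionally looks like this (standard Triton layout, nothing ClearML-specific):

```
model_repository/
└── my_model/             # model name used in inference requests
    ├── config.pbtxt      # declares backend: "python" plus the I/O tensors
    └── 1/                # version directory
        └── model.py      # implements the TritonPythonModel class
```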
To prevent these kinds of collisions, it's always necessary to provide a parent dataset ID at creation time, so it's very clear which dataset an update is based on. If multiple updates happen at the same time, they won't know of each other and will both use the same dataset as the parent. This leads to 2 new versions based on the same parent dataset, but not sharing data with each other. If that happens, you could create a 3rd dataset (potentially automatically) that can have bot...
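For illustration, a minimal sketch of creating a new version with an explicit parent (project and dataset names are placeholders):

```python
from clearml import Dataset

# Fetch the current version to use as an explicit parent.
parent = Dataset.get(dataset_project="examples", dataset_name="my-data")

# Passing parent_datasets makes the lineage unambiguous, even if two
# updates are created around the same time.
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my-data",
    parent_datasets=[parent.id],
)
child.add_files(path="new_files/")
child.upload()
child.finalize()
```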
Wow! Awesome to hear :D
Pipelines! 😄
ClearML allows you to create pipelines, with each step either created from code or from pre-existing tasks. Each task, by the way, can have a custom Docker container assigned that it should run inside of, so it should fit nicely with your workflow! (See the sketch after the links below.)
YouTube videos:
https://www.youtube.com/watch?v=prZ_eiv_y3c
https://www.youtube.com/watch?v=UVBk337xzZo
Relevant Documentation:
https://clear.ml/docs/latest/docs/pipelines/
Custom docker container per task:
https://...
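To make that concrete, here's a minimal sketch using pre-existing tasks as steps (all project, task, and queue names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")

# Each step clones a pre-existing task; the container it runs in is
# whatever that base task has configured (e.g. via Task.set_base_docker).
pipe.add_step(
    name="preprocess",
    base_task_project="examples",
    base_task_name="preprocess task",
)
pipe.add_step(
    name="train",
    parents=["preprocess"],
    base_task_project="examples",
    base_task_name="train task",
)

pipe.start(queue="services")
```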
AdventurousButterfly15 The fact that it tries to ping localhost means you are running the ClearML server locally, right? In that case it's a Docker thing: the container cannot access localhost, because localhost inside a Docker container is not the same as on your machine itself; they're isolated. That said, adding --network=host to the docker command usually fixes this by connecting the container to the host network instead of the internal Docker one.
You can add a custom argument either i...
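For example, from the task side it would look roughly like this, assuming a recent SDK where Task.set_base_docker accepts docker_arguments (the image name is a placeholder):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")

# "--network=host" attaches the container to the host network, so
# "localhost" inside the container resolves to the machine itself.
task.set_base_docker(
    docker_image="python:3.10",
    docker_arguments="--network=host",
)
```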
Hi ComfortableShark77 !
Which commands did you use exactly to deploy the model?
That would explain why it reports the task ID as 'a' in the error: it tried to index the first element of a list, but took the first character of a string instead.
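In other words, the classic Python gotcha:

```python
task_ids = ["abc123"]   # a list with one task id
print(task_ids[0])      # "abc123" -- the intended first element

task_ids = "abc123"     # the same value passed as a bare string
print(task_ids[0])      # "a" -- the first *character* instead
```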
Thank you so much! In the meantime, I checked once more, and the closest I could get was using report_single_value(). It forces you to report each and every row though, but the comparison looks a little better this way. No color coding yet, but maybe it can already help you a little 🙂
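For illustration, a minimal sketch of what that looks like (names and values are made up):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="single values")
logger = task.get_logger()

# One call per value ("row"); they show up in the experiment's scalars
# and in side-by-side comparisons.
logger.report_single_value(name="precision", value=0.92)
logger.report_single_value(name="recall", value=0.87)
```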
Hi! Have you tried adding custom metrics to the experiment table itself? You can add any scalar as a column in the experiment list; it doesn't have color formatting, but it might be closer to what you want than the compare functionality 🙂
Hi Jax! We have a blog post explaining how to use it almost ready to go. I'll ping you here when it's out.
In the meantime you can check out the TAO getting-started resources at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/tao-getting-started . Download the zip file with the examples, and under notebooks > tao_launcher_starter_kit > detectnet_v2 you'll find a notebook with an example of how to use the integration.
Also a big thank you for so thoroughly testing the system and providing this amount of feedback, it really does help us make the tool better for everyone! 😄
Just for reference, the main issue is that ClearML does not allow non-string types as dict keys in its configuration, while label mappings usually do have ints as keys. That's why we need to cast them to strings first, pass them to ClearML, and then cast them back.
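A minimal sketch of that round trip (label names are made up):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="label mapping")

label_map = {0: "cat", 1: "dog"}  # int keys, as most frameworks produce

# Cast keys to strings before handing the dict to ClearML...
stored = task.connect_configuration(
    {str(k): v for k, v in label_map.items()}, name="label_map"
)
# ...and cast them back when you need the original mapping.
label_map = {int(k): v for k, v in stored.items()}
```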
Hey! Thanks for all the work you're putting in and the awesome feedback 😄
So, it's weird that you get the shm error; this is most likely our fault for not configuring the containers correctly 😞 The containers are brought up using the docker-compose file, so you'll have to add it in there. The service you want is called clearml-serving-triton, you can find it [here](https://github.com/allegroai/clearml-serving/blob/2d3ac1fe63637db1978df2b3f5ea4903ef59788a/docker/docker-...
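For reference, the change would look roughly like this in the compose file (a sketch; the service name follows the clearml-serving repo, and '2gb' is just an example value):

```yaml
services:
  clearml-serving-triton:
    # shm_size enlarges /dev/shm inside the container; Triton uses shared
    # memory for its inter-process I/O.
    shm_size: '2gb'
```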
That wasn't my intention! Not a dumb question, just a logical one 😄
As long as your clearml-agents have access to the Redis instance it should work! Cool use case though, interested to see how well it works 🙂
That's what happens in the background when you click "New run": a pipeline is simply a task under the hood. You can find that task by querying, and you can clone it too! It is placed in a "hidden" folder called .pipelines, a subfolder of your main project. Check the settings: you can enable "show hidden folders".
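For illustration, a sketch of finding and cloning that hidden controller task (project and pipeline names are placeholders):

```python
from clearml import Task

# Pipeline runs live under "<project>/.pipelines/<pipeline name>".
controller = Task.get_task(
    project_name="examples/.pipelines/my-pipeline",
    task_name="my-pipeline",
)
# Cloning the controller task is equivalent to starting a new run.
new_run = Task.clone(source_task=controller)
```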
Also, please note that the dataset UI has changed since the video was uploaded. A dataset will now be found under the Datasets tab on the left instead of in the experiment manager 🙂
Thank you so much ExasperatedCrocodile76, I'll check it tomorrow 🙂
I tried answering them as well. Let us know what you end up choosing, we're always looking to make ClearML better for everyone!
Hey @<1526371965655322624:profile|NuttyCamel41> Thanks for coming back on this, and sorry for the late reply. This does look like a bug, especially because it seems to work when coming from the ClearML servers.
Would you mind copy-pasting this info into a GitHub issue on the clearml-serving repo? Then we can track the progress we make on fixing it 🙂
I see. Are you able to manually boot a VM on GCP, SSH into it, and run the docker login command from there? Just to rule out networking or permissions as possible issues.
Maybe you can add https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller/#set_default_execution_queue to your PipelineController and have the actual value linked to a pipeline parameter? Then, when you create a new run, you can manually enter a queue name, and the parameter will be used by the pipeline controller script to set the default execution queue.
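Something along these lines (a sketch; "execution_queue" is a made-up parameter name, and reading the value back via get_parameters() is an assumption you should verify against your SDK version):

```python
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")

# Becomes an editable field when starting a new run from the UI.
pipe.add_parameter(name="execution_queue", default="default")

# Assumed accessor -- check the PipelineController reference for your version.
queue_name = pipe.get_parameters().get("execution_queue", "default")
pipe.set_default_execution_queue(queue_name)
```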
The point of the alias is better visibility in the experiment manager. Check the screenshots above for what it looks like in the UI. Essentially, setting an alias makes sure the task that fetches the dataset automatically logs the ID it receives from Dataset.get(). That way, when you later look back at your experiment, you can also see exactly which dataset was retrieved back then.
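For illustration (project, dataset, and alias names are placeholders):

```python
from clearml import Dataset

# Passing alias= makes the consuming task log the resolved dataset id
# under that name, so you can later see exactly which version it used.
ds = Dataset.get(
    dataset_project="examples",
    dataset_name="my-data",
    alias="training_data",
)
local_path = ds.get_local_copy()
```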
ExuberantBat52 When you still get the log messages, where did you specify the alias?...
I'm not exactly sure what is going wrong without an exact error or a reproducible example.
However, passing around the dataset object is not ideal: passing info from one step to another in a pipeline requires ClearML to pickle said object, and I'm not sure a Dataset object is picklable.
Next to that, running get_local_copy() in the first step does not guarantee that you can access that data from the other step. Both might be executed in different Docker containers or even on different...
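A sketch of the safer pattern: pass the dataset ID (a plain string) between steps and fetch a local copy inside each step that needs the data (names are placeholders):

```python
from clearml import Dataset

def step_one() -> str:
    ds = Dataset.get(dataset_project="examples", dataset_name="my-data")
    # Return the id, a plain string -- trivially picklable between steps.
    return ds.id

def step_two(dataset_id: str) -> None:
    # Fetch a local copy inside whichever container runs this step.
    local_path = Dataset.get(dataset_id=dataset_id).get_local_copy()
    print(local_path)
```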