Can you try setting the env variables to 1 instead of True? In general, those should indeed be the correct variables to set. For me it works when I start the agent with the following command:
CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 clearml-agent daemon --queue "demo-queue"
I'm sorry, but I will need more context. Where exactly is this log from? Can you confirm you're working with a self-hosted open source server? Which container/microservice is giving you this last error message?
Usually those models are PyTorch, right? So, yeah, you should be able to; feel free to follow the PyTorch example if you want to know how 🙂
With the screenshots above, the locally run experiment (left), does it have an HTTP URL for the model URL field? The one you whited out?
VivaciousBadger56 Thanks for your patience, I was away for a week 🙂 Can you check that you properly changed the project name in the line above the one you posted?
In the example, the default project name is "ClearML Examples/Urbansounds". But it should give you an error when first running the get_data.py script that you can't actually modify that project (by design). You need to change it to one of your own choice. You might have done that in get_data.py but forgot to do s...
The scheduler just downloads a dataset using the ID, right? So if you don't upload a new dataset, the scheduler is just downloading the dataset from the last known ID. I don't really see how that could lead to a new dataset with its own ID as the parent. Would you mind explaining your setup in a little more detail? 🙂
Unfortunately, ClearML HPO does not "know" what is inside the task it is optimizing. It is like that by design, so that you can run HPO with no code changes inside the experiment. That said, this also limits us: we can't "smartly" optimize based on the task's internals.
However, is there a way you could use caching within your code itself? Such as using functools' LRU cache? This is built into Python and will cache function return values whenever the function is called again with the same input arguments.
There also see...
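For instance, a minimal sketch of that caching idea (the function name and workload here are made up for illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_preprocess(path: str) -> int:
    # stand-in for a costly computation (loading/transforming data)
    return sum(ord(c) for c in path)

expensive_preprocess("data.csv")  # computed on the first call
expensive_preprocess("data.csv")  # served from the cache

# cache_info() reports hits/misses, handy to verify caching kicks in
print(expensive_preprocess.cache_info())
```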
I'm able to reproduce, but your workaround seems to be the best one for now. I tried launching with the clearml-task command as well, but we have the same issue there: only argparse arguments are allowed.
AgitatedDove14 any better workaround for this, other than waiting for the jsonargparse issue to be fixed?
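For reference, a minimal sketch of the plain-argparse style that works in this situation, as opposed to jsonargparse (the argument names here are made up):

```python
import argparse

# only hyperparameters exposed through plain argparse are picked up;
# jsonargparse/LightningCLI-style parsers hit the issue described above
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args([])  # empty list so the sketch runs anywhere
print(args.lr, args.epochs)
```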
Maybe you can add https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller/#set_default_execution_queue to your pipeline controller, and have the actual value linked to a pipeline parameter? Then when you create a new run, you can manually enter a queue name and the parameter will be used by the pipeline controller script to set the default execution queue.
VivaciousBadger56 hope you had a great time while away :)
That looks correct indeed. Do you mind checking for me if the dataset actually contains the correct metadata?
Go to the datasets section, select the one you need and, on the right, click "More information". It should send you to the experiment manager view. Then, under artifacts, do you see a key in the list named metadata? Can you post a screenshot?
When I run the example this way, everything seems to be working.
Hey @<1523701949617147904:profile|PricklyRaven28> I'm checking! Have you updated anything else and on which exact commit of transformers are you now?
Hey ExasperatedCrocodile76 ! Thanks for checking back in and letting me know 😄 Glad I could help!
The above works for me, so if you try and the command line version does not work, there might be a bug. Please post the exact commands you use when you try it 🙂
That's a good idea! I think the YOLO models would be a great fit for a tutorial/example like this. We can add it to our internal list of TODOs, or if you want, you could take a stab at it and we'll try to support you through it 🙂 It might take some engineering though! Serving is never drag and drop 🙂
That said, I think it should be quite easy to do since YOLOv8 supports exporting to tensorrt format, which is native to Triton serving which underlies ClearML serving. So the process shoul...
Well, that's what open source is for 😉 code borrowing is like 90% of the job of software engineers 😄
Hey! Thanks for all the work you're putting in and the awesome feedback 😄
So, it's weird you get the shm error, this is most likely our fault for not configuring the containers correctly 😞 The containers are brought up using the docker-compose file, so you'll have to add it in there. The service you want is called clearml-serving-triton, you can find it [here](https://github.com/allegroai/clearml-serving/blob/2d3ac1fe63637db1978df2b3f5ea4903ef59788a/docker/docker-...
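As a sketch (the exact compose layout may differ in your version, and the shm_size value is just a starting point to tune), the addition could look like:

```yaml
services:
  clearml-serving-triton:
    # give Triton more shared memory; tune the value to your model sizes
    shm_size: '2gb'
```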
Hi CourageousKoala93 ! Have you tried https://clear.ml/docs/latest/docs/references/sdk/task#set_comment by any chance? There's a description field under the info tab 🙂
What might also help is to look inside the Triton docker container while it's running. You can check the example; there should be a pbtxt file in there. Just to double-check that it is also in your own folder.
Do you have a screenshot of what happens? Have you checked the console when pressing f12?
From any computer that has ClearML serving installed. It is simply used to submit configurations, not actually run anything. Please refer to the step by step setup here for more info 🙂
One more thing: are you running the snippet inside a Jupyter notebook? (Asking because you have Jupyter in your environment.)
Hi there! There are several services that need persistent storage, check here for an overview diagram.
If I'm not mistaken, these are the fileserver, Elastic, Mongo and Redis. All info is spread across them (e.g. model files on the fileserver, logs in Elastic), so there is no one server holding everything.
I'm not a k8s expert, but I think that even a dynamic PVC should not delete itself. Just to be sure though, you can indee...
Indeed, that should be the case. By default Debian is used, but it's good that you ran with a custom image; now we know it wasn't clear that more permissions are needed.
I'm still struggling to reproduce the issue. Trying on my own PC locally as well as on Google Colab yields nothing.
The fact that you do get tensorboard logs, but none of them are captured by ClearML, means there might be something wrong with our tensorboard bindings, but it's hard to pinpoint exactly what if I can't get it to fail like yours 😅 Let me try and install exactly your environment using your packages above. Which python version are you using?
Thanks again for the extra info Jax, we'll take it back to our side and see what we can do 🙂
Well I'll be had, you're 100% right, I can recreate the issue. I'm logging it as a bug now and we'll fix it asap! Thanks for sharing!!
What do you mean exactly? Is it that you want more visibility into what kind of preprocessing code is running for each endpoint?
VivaciousBadger56 Thank you for the screenshots! I appreciate the effort. You indeed clicked on the right link, I was on mobile so had to instruct from memory 🙂
First of all: every 'object' in the ClearML ecosystem is a task. Experiments are tasks, so are dataset versions and even pipelines! Each task can be viewed using the experiment manager UI, that's just how the backend is structured. Of course we keep experiments and data separate by giving them a separate tab and different UI, but...