Hi Adib!
I saw this question about the datastores before, and it was answered then with this:

> Redis is used for caching, so it's fairly 'lightly' used; you don't need many resources for it. Mongo is for artifacts, system info and some metadata. Elastic is for events and logs; this one might require more resources depending on your usage.

Hope it can already help a bit!
If I'm not mistaken:
Fileserver - model files and artifacts.
MongoDB - all experiment objects are saved there.
Elastic - console logs, debug samples, and scalars are all saved there.
Redis - caching related to the agents, I think.
Hello!
What is the use case here, why would you want to do that? If they're the same dataset, you don't really need lineage, no?
Doing this might actually help with the previous issue as well, because when there are multiple docker containers running they might interfere with each other 🙂
Yes, with docker auto-starting containers is def a thing 🙂 We set the containers to restart automatically (a reboot will do that too) so that when a container crashes, it is immediately restarted, e.g. in a production environment.
So the best thing to do there is to use `docker ps` to get all running containers and then kill them using `docker kill <container_id>`. ChatGPT tells me this command should kill all currently running containers:

`docker rm -f $(docker ps -aq)`

And I...
Wow! Awesome to hear :D
Do you have a screenshot of what happens? Have you checked the console when pressing F12?
For the record, this is a minimal reproducible example:
Local folder structure:
```
├── remove_folder
│   ├── batch_0
│   │   ├── file_0_0.txt
│   │   ├── file_0_1.txt
│   │   ├── file_0_2.txt
│   │   ├── file_0_3.txt
│   │   ├── file_0_4.txt
│   │   ├── file_0_5.txt
│   │   ├── file_0_6.txt
│   │   ├── file_0_7.txt
│   │   ├── file_0_8.txt
│   │   └── file_0_9.txt
│   └── batch_1
│       ├── file_1_0.txt
│       ├── file_1_1.txt
│       ├── file_1_2.txt
│       ├── file_1_3.txt
│       ├── fi...
```
Did you by any chance save the checkpoint without any file extension? Or with a weird name containing slashes or dots? The error seems to suggest the content type was not properly parsed.
Isitdown seems to be reporting it as up. Any issues with other websites?
@<1558986839216361472:profile|FuzzyCentipede59> Would you mind sharing how you're running the training? i.e. a minimal code example so we can reproduce the issue?
Hi William!
So if I understand correctly, you want to get an artifact from another task into your preprocessing.
You can do this using the `Task.get_task()` call. So imagine your anomaly detection task is called `anomaly_detection`, produces an artifact called `my_anomaly_artifact`, and is located in the `my_project` project; then you can do:
```python
from clearml import Task

anomaly_task = Task.get_task(project_name='my_project', task_name='anomaly_detection')
treshold = anomaly_ta...
```
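Since the snippet above is cut off, here is a minimal sketch of the full pattern, assuming the project, task, and artifact names mentioned above (the artifact access line is my completion, not the original):

```python
from clearml import Task

# Fetch the anomaly detection task by project and task name
anomaly_task = Task.get_task(project_name='my_project', task_name='anomaly_detection')

# .get() downloads the artifact and deserializes it into a Python object
threshold = anomaly_task.artifacts['my_anomaly_artifact'].get()
```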
Can you give a little more explanation about your use case? It seems I don't fully understand it yet. So you have multiple endpoints, but always the same preprocessing script to go with them? And you need to gather a different threshold for each of the models?
Not completely sure of this, but I think an AMD APU simply won't work. ClearML Serving uses Triton as the inference engine for GPU-based models, and that is written by NVIDIA, specifically for NVIDIA hardware. I don't think Triton will ...
It's been accepted into master, but it hasn't been released yet indeed!
As for the other issue, it seems like we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Huggingface + ClearML so people can do it themselves.
For now (using the patch), the only thing you need to be careful about is to not connect a dict or object with ints as keys. If you do need to (e.g. usually huggingface models need the id2label dict some...
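To illustrate the workaround, a small sketch (the project/task names and dict contents here are made up) that casts the int keys to strings before connecting:

```python
from clearml import Task

task = Task.init(project_name='demo', task_name='hf-id2label-workaround')

# Huggingface-style label mapping with int keys (example values)
id2label = {0: 'negative', 1: 'positive'}

# Cast the keys to str before connecting, since non-string dict keys are not supported
task.connect({'id2label': {str(k): v for k, v in id2label.items()}})
```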
I can see 2 kinds of errors: `Error: Failed to initialize NVML` and `Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version`.
These 2 lines make me think something went wrong with the GPU itself. Chances are you won't be able to run `nvidia-smi`; this looks like a non-clearml issue 🙂 It might be that Triton hogs the GPU memory if not properly closed down (double Ctrl-C). It says the driver ver...
Most likely you are running a self-hosted server. External embeds are not available for self-hosted servers due to difficult network routing and safety concerns (they need to be accessible from the public internet). The free hosted server at app.clear.ml does have it.
Hi NuttyCamel41!
Your suspicion is correct: there should be no need to specify the `config.pbtxt` manually; normally this file is generated automatically using the information you provide on the command line.
It might be somehow silently failing to parse your CLI input and so not correctly building the `config.pbtxt`. One difference I see immediately is that you opted for the `"[1, 64]"` notation instead of the `1 64` notation from the example. Might be worth a try to change the input for...
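For reference, a hedged sketch of what the space-separated notation could look like on the CLI (the endpoint and tensor names are placeholders, modeled on the clearml-serving examples, not taken from the original message):

```
clearml-serving --id <service_id> model add \
  --engine triton \
  --endpoint "my_model" \
  --input-name "INPUT__0" --input-type float32 --input-size 1 64 \
  --output-name "OUTPUT__0" --output-type float32 --output-size -1 10
```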
Hey CheekyFox58, like Martin said, it should at least work locally. If not, can you give us some more details on what exactly the weird behaviour is?
Hey @<1539780305588588544:profile|ConvolutedLeopard95>, unfortunately this is not built into the YOLOv8 tracker. Would you mind opening an issue on the YOLOv8 GitHub page and tagging me? (I'm thepycoder on GitHub.)
I can then follow up on its progress, because it makes sense to expose this parameter through the YAML.
That said, to help you right now, please change [this line](https://github.com/ultralytics/ultralytics/blob/fe61018975182f4d7645681b4ecc09266939dbfb/ultralytics/yolo/uti...
Thank you so much, sorry for the inconvenience and thank you for your patience! I've pushed it internally and we're looking for a patch 🙂
Hi @<1523701949617147904:profile|PricklyRaven28> just letting you know I still have this on my TODO, I'll update you as soon as I have something!
Sorry, I jumped the gun before I fully understood your question 🙂 So with a simple docker compose file, you mean that you don't want to use the `docker-compose-triton.yaml` file and so want to run the huggingface model on CPU instead of Triton?
Or do you want to know if the general docker compose version is able to handle a huggingface model?
@<1547028116780617728:profile|TimelyRabbit96>
Pipelines has little to do with serving, so let's not focus on that for now.
Instead, if you need an `ensemble_scheduling` block, you can use the CLI's `--aux-config` flag to add any extra stuff that needs to be in the `config.pbtxt`.
For example, here under the Setup section, step 2, we use the `--aux-config` flag to add a dynamic batching block.
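As a rough sketch (the endpoint name and value are placeholders, and the exact key syntax may differ; check the example referenced above), that could look like:

```
# plus the usual --input-*/--output-* flags for your model
clearml-serving --id <service_id> model add \
  --engine triton --endpoint "my_endpoint" \
  --aux-config dynamic_batching.max_queue_delay_microseconds=100
```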
Ah I see 😄 I have submitted a ClearML patch to Huggingface transformers.
It is merged, but not in a release yet. Would you mind checking if it works if you install transformers from github? (aka the latest master version)
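In case it helps, installing straight from the master branch can be done with standard pip syntax (this command is my addition, not from the original message):

`pip install git+https://github.com/huggingface/transformers.git`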
Can you please post the result of running `df -h` in this chat? Chances are quite high your actual machine does indeed have no more space left 🙂
Can you try setting the env variables to `1` instead of `True`? In general, those should indeed be the correct variables to set. For me it works when I start the agent with the following command:

`CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=1 clearml-agent daemon --queue "demo-queue"`
I'm still struggling to reproduce the issue. Trying on my own PC locally as well as on Google Colab yields nothing.
The fact that you do get tensorboard logs, but none of them are captured by ClearML, means there might be something wrong with our tensorboard bindings, but it's hard to pinpoint exactly what if I can't get it to fail like yours 😅 Let me try to install exactly your environment using your packages above. Which python version are you using?
Nice! Well found and thanks for posting the solution!
May I ask out of curiosity, why mount X11? Are you planning to use a GUI app on the k8s cluster?
It is not filled in by default?
`projects/debian-cloud/global/images/debian-10-buster-v20210721`