RoundMosquito25 it is true that the TaskScheduler requires a task_id, but that does not mean you have to run the pipeline every time 🙂
When setting up, you indeed need to run the pipeline once, to get it into the system. But from that point on, you should be able to just use the task_scheduler on the pipeline ID. The scheduler should automatically clone the pipeline and enqueue it. It will basically use the 1 existing pipeline as a "template" for subsequent runs.
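A minimal sketch of what that could look like, assuming the clearml.automation.TaskScheduler interface and a made-up pipeline task ID and queue names:
from clearml.automation import TaskScheduler

# Use the existing pipeline task as a template and re-enqueue a clone every day
scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="<your_pipeline_task_id>",  # hypothetical placeholder
    queue="default",
    hour=9,
    minute=0,
    recurring=True,
)
scheduler.start_remotely(queue="services")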
Don't paste your API keys! 🙈
Could you use tags for that? In that case you can easily filter on which group you're interested in, or do you have a more impactful UI change in mind to implement groups? 🙂
Can you elaborate a bit more? I don't quite understand yet. So it works when you update an existing task by adding a tag to it, but it doesn't work when adding a tag for the first time?
It's been accepted into master, but hasn't been released yet indeed!
As for the other issue, it seems like we won't be adding support for non-string dict keys anytime soon. I'm thinking of adding a specific example/tutorial on how to work with Huggingface + ClearML so people can do it themselves.
For now (using the patch) the only thing you need to be careful about is to not connect a dict or object with ints as keys. If you do need to (e.g. usually huggingface models need the id2label dict some...
Damn it, you're right 😅
# Allow ClearML access to the training args and allow it to override the arguments for remote execution
from clearml import Task

# cast_keys_to_string / cast_keys_back are the helpers from the patch shared above
args_class = type(training_args)
args, changed_keys = cast_keys_to_string(training_args.to_dict())
Task.current_task().connect(args)
training_args = args_class(**cast_keys_back(args, changed_keys)[0])
Just for reference, the main issue is that ClearML does not allow non-string types as dict keys for its configuration. Usually the labeling mapping does have ints as keys. Which is why we need to cast them to strings first, then pass them to ClearML then cast them back.
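For illustration, a minimal sketch of what such casting helpers could look like for flat dicts (hypothetical implementations; the real patch may also handle nested dicts):
def cast_keys_to_string(d):
    # Cast non-string keys to str, remembering each original key type
    changed_keys = {str(k): type(k) for k in d if not isinstance(k, str)}
    return {str(k): v for k, v in d.items()}, changed_keys

def cast_keys_back(d, changed_keys):
    # Restore the original key types, e.g. "0" -> 0 for an id2label mapping
    return {changed_keys.get(k, str)(k): v for k, v in d.items()}, changed_keys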
HomelyShells16 Thanks for the detailed write-up and minimal example. I'm running it now too
@<1523701949617147904:profile|PricklyRaven28> Please use this patch instead of the one previously shared. It excludes the dict hack :)
After re-reading your question, it might be difficult to have cross-process communication though. So if you want the preprocessing to happen at the same time as the training, and the training to pull data from the preprocessing on the fly, that might be more difficult. Is this your use case?
As long as your clearml-agents have access to the redis instance it should work! Cool use case though, interested to see how well it would work 🙂
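Roughly what that could look like, as a hypothetical sketch using redis-py with a made-up host and key name:
import json
import redis

r = redis.Redis(host="my-redis-host", port=6379)

# In the preprocessing task: push samples as they become ready
r.rpush("preprocessed_samples", json.dumps({"x": [1, 2, 3], "y": 0}))

# In the training task: block until the next sample is available
_, raw = r.blpop("preprocessed_samples")
sample = json.loads(raw)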
Sorry, I jumped the gun before I fully understood your question 🙂 So with a simple docker compose file, do you mean you don't want to use the docker-compose-triton.yaml file and instead want to run the huggingface model on CPU rather than Triton?
Or do you want to know if the general docker compose version is able to handle a huggingface model?
As I understand it, vertical scaling means giving each container more resources to work with. This should always be possible in a k8s context, because you decide which types of machines go in your pool and you define the resource requirements for each container yourself 🙂 So if you want to set the container to use 10,000 CPUs, feel free! Unless you mean something else with this, in which case please correct me!
Hi EmbarrassedSpider34 , would you mind showing us a screenshot of your machine configuration? Can you check for any output logs that ClearML might have given you? Depending on the region, maybe there were no GPUs available, so could you maybe also check if you can manually spin up a GPU vm?
Hi @<1534344450795376640:profile|VividSwallow28> ! I've seen your github issue and will answer you there 🙂 I'll leave a link here for others facing the same issue.
Thanks! I've asked the autoscaler devs about this and it might be a bug, you're the second person to report it. He's checking and we'll get back to you!
Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but I get an IndexError: Invalid key: 5872 is out of bounds for size 0 error. That said, I get the same error without the code running in a pipeline. There seems to be no difference between simply running the code and running it in a pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?
No worries! And thanks for putting in the time.
Hi @<1541592213111181312:profile|PleasantCoral12> thanks for sending me the details. Out of curiosity, could it be that your codebase / environment (apart from the clearml code, e.g. the whole git repo) is quite large? ClearML does a scan of your repo and packages every time a task is initialized, maybe that could be it. In the meantime I'm asking our devs if they can see any weird lag with your account on our end 🙂
I'll update you once I have more!
VivaciousBadger56 hope you had a great time while away :)
That looks correct indeed. Do you mind checking for me if the dataset actually contains the correct metadata?
Go to the datasets section, select the one you need and on the right click on more information. It should send you to the experiment manager view. Then, under artifacts, do you see a key in the list named metadata? Can you post a screenshot?
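If it's there, you should also be able to fetch it from code. A quick sketch, assuming a recent clearml version and made-up project/dataset names:
from clearml import Dataset

ds = Dataset.get(dataset_project="your_project", dataset_name="your_dataset")
print(ds.get_metadata())  # should return what's stored under the "metadata" key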
You should be able to! But we just saw that it isn't supported as part of the dataset interface, only on the task interface. So for now you can get the dataset's underlying task and add a comment this way:
your_dataset._task.comment = "your text here"
We have flagged this as a bug and we'll add this soon!
Thank you so much, sorry for the inconvenience and thank you for your patience! I've pushed it internally and we're looking for a patch 🙂
Hi there!
Technically there should be nothing stopping you from deploying a python backend model. I just checked the source code and ClearML basically just downloads the model artifact and renames it based on the inferred type of model.
As far as I'm aware (could def be wrong here!), the Triton Python backend essentially requires a folder...
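For reference, a minimal sketch of a Triton Python-backend model file (the standard Triton layout puts this at models/<model_name>/1/model.py next to a config.pbtxt; tensor names here are illustrative):
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # Echo the input tensor back as output (placeholder logic)
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out_tensor = pb_utils.Tensor("OUTPUT0", in_tensor.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses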
VivaciousBadger56 Thanks for your patience, I was away for a week 🙂 Can you check that you properly changed the project name in the line above the one you posted?
In the example, by default, the project name is "ClearML Examples/Urbansounds". But it should give you an error when first running the get_data.py script that you can't actually modify that project (by design). You need to change it to one of your own choice. You might have done that in get_data.py but forgot to do s...
Hi Oriel!
If you want to only serve an if-else model, why do you want to use clearml-serving for that? What do you mean by "online featurer"?
Yes, with docker auto-starting containers is def a thing 🙂 We set the containers to restart automatically (a reboot will trigger that too), so that when a container crashes it immediately restarts, e.g. in a production environment.
So the best thing to do there is to use docker ps to get all running containers and then kill them using docker kill <container_id>. ChatGPT tells me this command should kill all currently running containers:
docker rm -f $(docker ps -aq)
And I...
Wow! Awesome to hear :D