I'm assuming the Triton serving engine is running on the serving queue in my case. Is the serving example also running on the serving queue, or is it running on the services queue? And lastly, I don't have a ClearML agent listening to the services queue; does ClearML do this on its own?
For now, installing venv fixes the problem.
No matter how many times I run this code, it always gives the same output: the tag gets appended to the list but isn't saved. Unless there's something else I'm supposed to do as well.
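To make it concrete, this is roughly the kind of thing I mean (a minimal sketch, assuming a clearml Task; the task ID and tag name are placeholders):

```python
from clearml import Task

# "<task-id>" is a placeholder, just for illustration
task = Task.get_task(task_id="<task-id>")

# Appending to the tag list read off the object only changes the local
# Python list; nothing is sent to the server, so the tag is gone on re-run.
# Going through the SDK call is what persists it on the backend:
task.add_tags(["needs-review"])
```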
So I took the dataset trigger from this and added it to my own test code, which needs to run a task every time the trigger is activated.
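Roughly what I ended up with (a minimal sketch; the project, task ID, queue, and tag names are placeholders):

```python
from clearml.automation import TriggerScheduler

# Poll the backend periodically for trigger events
scheduler = TriggerScheduler(pooling_frequency_minutes=1.0)

# Whenever a new dataset version appears in the project, clone and enqueue
# the template task so it runs once per trigger event.
scheduler.add_dataset_trigger(
    name="retrain-on-new-data",             # placeholder trigger name
    schedule_task_id="<template-task-id>",  # placeholder task to launch
    schedule_queue="default",               # queue the cloned task goes to
    trigger_project="my_datasets",          # placeholder dataset project
    trigger_on_tags=["ready"],              # only fire for tagged dataset versions
)

scheduler.start()
```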
Thank you, I'll take a look
I've also mentioned it on the issue I created, but I had the problem even when I set the type to bool in parser.add_argument(type=bool).
Are there packages other than venv required on the agent? I'm not sure exactly which packages the agent needs, since the function itself wouldn't normally need venv; it just increments a number by 1.
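For context, the step is basically just this (a sketch, assuming a decorator-style pipeline step; the names are made up). My question is what, beyond this, the agent machine itself needs installed:

```python
from clearml.automation.controller import PipelineDecorator

# The whole step: it really does nothing but add 1 to a number.
# packages= declares what goes into the step's execution environment,
# which the agent builds for it (typically in a virtualenv).
@PipelineDecorator.component(return_values=["result"], packages=["clearml"])
def add_one(x: int):
    return x + 1
```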
Would you know what the pros of online learning would be, other than the fact that the incoming data is as close as possible to the current distribution of data over time? Also, would those benefits be worth it to train online?
I just copied the commands from the page in order and pasted them, specifically all of the Linux ones.
Also, the steps say that I should run the serving process on the default queue, but I've run it on a queue I created called 'serving' and have an agent listening to it.
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Is there a difference? I mean, my use case is pretty simple. I have a training job that basically creates a lot of checkpoints. I just want to keep the N best checkpoints, and whenever there are more than N, delete the worst-performing one, both locally and from the task's artifacts.
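In other words, something like this sketch (N, the scoring, and the artifact names are placeholders, and the delete_artifacts call is my assumption about the right SDK method, so worth double-checking):

```python
import os
from clearml import Task

N_BEST = 5  # keep this many checkpoints

# (path, score, artifact_name) for every checkpoint kept so far;
# higher score == better in this sketch
kept = []

def register_checkpoint(task: Task, path: str, score: float, artifact_name: str):
    kept.append((path, score, artifact_name))
    if len(kept) <= N_BEST:
        return
    # Drop the worst-performing checkpoint
    kept.sort(key=lambda c: c[1])
    worst_path, _, worst_artifact = kept.pop(0)
    os.remove(worst_path)                    # delete the local file
    task.delete_artifacts([worst_artifact])  # assumed SDK call -- verify for your clearml version
```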
I'll test it with the updated one.
Since that is an immediate concern for me as well.
Also, my execution just completed, and so far I can only see the hyperparameters as a report, not in a configurable form. I've just started with ClearML and am running into these issues.
I get what you're saying. I was considering training on just the new data to see how it works; to me that felt like the fastest way to deal with data drift, though I understand it may introduce instability. I'm curious how other developers who have successfully set up continuous training deal with it: 100% new data, or a ratio between new and old data? And if it's the latter, which should be the majority, old data or new data?
My main question is: do I wait until there's a sufficiently large batch of new data, or do I send each image for training as soon as it comes in?
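To make the question concrete, the kind of thing I have in mind is roughly this (only a sketch; the threshold and the old/new ratio are exactly the numbers I'm unsure about):

```python
import random

BATCH_THRESHOLD = 256   # wait for this many new images before retraining
OLD_FRACTION = 0.7      # fraction of each training batch drawn from old data

new_images = []         # filled as images arrive

def on_new_image(image, old_pool):
    new_images.append(image)
    if len(new_images) < BATCH_THRESHOLD:
        return None  # keep accumulating instead of training per image

    # Mix old and new data so the model doesn't forget the old distribution
    n_old = int(len(new_images) * OLD_FRACTION / (1 - OLD_FRACTION))
    batch = new_images + random.sample(old_pool, min(n_old, len(old_pool)))
    random.shuffle(batch)
    new_images.clear()
    return batch  # hand this off to the training job
```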
That, but also in the proper directory on the file system.
So the minimum would be 2 cores with 8 GB of RAM. I'm going to assume 4 cores and 16 GB would be recommended.
I don't think I changed anything.
Basically, if I pass an arg with a default value of False, which is a bool, it runs fine originally, since it just accepts the default value.
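For anyone hitting the same thing, the underlying Python behaviour shows up outside ClearML too; a quick sketch:

```python
import argparse

parser = argparse.ArgumentParser()

# type=bool is the trap: bool("False") is True, because any non-empty
# string is truthy. The default (False) only works because argparse
# never calls bool() on the default value.
parser.add_argument("--flag", type=bool, default=False)
print(parser.parse_args(["--flag", "False"]).flag)   # True (!)

# A store_true action (or an explicit str-to-bool converter) behaves as expected
parser2 = argparse.ArgumentParser()
parser2.add_argument("--flag", action="store_true")
print(parser2.parse_args([]).flag)          # False
print(parser2.parse_args(["--flag"]).flag)  # True
```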
That's weird. It doesn't work on my main Ubuntu installation, but it does work in an Ubuntu VM I created on Windows.
It's a simple DAG pipeline.
I have a step at which I want to run a task that finds the model I need.
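Concretely, the step would be something like this sketch (project name and tag are placeholders, and I'm assuming Model.query_models is the right way to look the model up):

```python
from clearml import Model

def find_model_step(project_name: str = "my_project"):
    # Return the newest published model in the project carrying the tag;
    # a downstream step would then load it by ID or URL.
    candidates = Model.query_models(
        project_name=project_name,
        tags=["production"],   # placeholder tag
        only_published=True,
    )
    if not candidates:
        raise RuntimeError("no matching model found")
    best = candidates[0]       # newest first, as far as I know
    return best.id
```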
That makes sense. But doesn't that also hold true for dataset.get_mutable_local_copy()?
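i.e., the distinction I'm asking about, as I understand it (a sketch; the dataset name, project, and target folder are placeholders):

```python
from clearml import Dataset

ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")

# Cached, shared, read-only copy -- fast, but not meant to be modified in place
read_only_path = ds.get_local_copy()

# Full copy written to a folder I own, safe to modify
writable_path = ds.get_mutable_local_copy(target_folder="./my_dataset_copy")

print(read_only_path, writable_path)
```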
So it won't work without clearml-agent? Sorry for the barrage of questions. I'm just very confused right now.
It's basically data for binary image classification, simple.
I recall being able to pass a script to the agent using the command line along with a requirements file.
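Something along these lines is what I remember, done from the SDK side (a sketch with placeholder names and paths; I believe the clearml-task CLI exposes equivalent options):

```python
from clearml import Task

# Create a task straight from a local script plus a requirements file,
# then enqueue it so an agent picks it up and builds the environment.
task = Task.create(
    project_name="my_project",             # placeholder
    task_name="script-from-cli",           # placeholder
    script="train.py",                     # placeholder script path
    requirements_file="requirements.txt",  # placeholder requirements path
)
Task.enqueue(task, queue_name="default")
```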
I get the following error.