Agreed. The issue does not occur when I set trigger_on_publish to True, or when I use tag matching.
It works; however, it shows the task as enqueued and pending. Note that I am using .start() and not .start_remotely() for now.
Alright, so is there no way to kill it using the worker ID or worker name?
Also, since I plan not to train on the whole dataset but only on a subset of the data, I was thinking of making each batch of data a new dataset and then merging just the subset I want to train on.
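The batch-then-merge idea above could be sketched roughly like this, assuming the clearml package is installed and configured; the dataset/project names are placeholders. ClearML's Dataset.create() accepts parent_datasets, which combines the parents' contents into the new dataset version.

```python
# Hypothetical sketch: register each batch as its own dataset, then merge a
# chosen subset of batches into one training dataset via parent_datasets.

def make_batch_dataset(folder, project="detector-data"):
    """Register one batch of files as its own small dataset."""
    from clearml import Dataset  # assumes clearml is installed and configured

    ds = Dataset.create(dataset_name="batch", dataset_project=project)
    ds.add_files(path=folder)
    ds.upload()
    ds.finalize()
    return ds.id


def merge_batches(batch_ids, project="detector-data"):
    """Merge a subset of batch datasets into one training dataset."""
    from clearml import Dataset

    merged = Dataset.create(
        dataset_name="training-subset",
        dataset_project=project,
        parent_datasets=batch_ids,  # contents of all parents are combined
    )
    merged.finalize()
    return merged.id
```

The training task would then only need the merged dataset's ID.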
I'm kind of new to developing end-to-end applications, so I'm also learning how the predefined pipelines work. I'll take a look at the ClearML custom pipelines.
The situation is such that I needed a continuous training pipeline to train a detector, the detector being Ultralytics YOLOv5.
To me, it made sense that I would have a training task. The whole training code seemed complex to me, so I modified it just a bit to fit my needs: fetching the dataset and model from ClearML. Nothing more.
I then created a task using clearml-task and pointed it at the repo I had created. The task runs fine.
I am unsure about the details of the training code...
I have never done something like this before, and I'm unsure about the whole process, from successfully serving the model to sending it requests for inference. Is there any tutorial or example for it?
You can see there's no task bar on the left. Basically, I can't get any credentials for the server or check queues or anything.
Thank you, I'll start reading up on this once I've finished setting up the basic pipeline
I just made a custom repo from the Ultralytics YOLOv5 repo, where I fetch the data and model using a dataset ID and model ID.
OK, I'm a bit confused now. Suppose I have an agent listening to some queue X. If someone else on some other machine enqueues their task on queue X, will my agent run it?
I did this, but it gets me an InputModel. I went through the InputModel class, but I'm still unsure how to get the actual TensorFlow model.
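For context, a minimal sketch of what seems to be the missing step, assuming clearml and TensorFlow are installed: the InputModel only references the stored weights, so get_local_copy() downloads the weights file, which you then load with the framework itself.

```python
# Hypothetical helper: turn a ClearML model ID into a loaded Keras model.

def load_tf_model(model_id):
    from clearml import InputModel  # assumes clearml is installed
    import tensorflow as tf  # assumes TensorFlow is available

    input_model = InputModel(model_id=model_id)
    weights_path = input_model.get_local_copy()  # downloads, returns local path
    return tf.keras.models.load_model(weights_path)
```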
This is the simplest I could get the inference request. The model, input, and output names are the ones the server expected.
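If the model is served through TensorFlow Serving's REST API (an assumption; the host, port, and names below are placeholders), the request shape would look roughly like this, using only the standard library:

```python
import json
import urllib.request


def build_predict_request(host, model_name, input_name, values, port=8501):
    """Build the URL and JSON body for a TF Serving REST predict call."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    payload = {"inputs": {input_name: values}}
    return url, payload


def predict(host, model_name, input_name, values):
    url, payload = build_predict_request(host, model_name, input_name, values)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["outputs"]
```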
Alright, but is it saved as a text file or a pickle file?
Basically, I want to train AI models, right? So I'm trying to set up an architecture where I can automate the process from data fetching to model training, and I need a GPU for training.
My current approach is: watch a folder; when there are sufficient data points, move N of them into another folder, create a raw dataset, and call the pipeline with this dataset.
It gets downloaded, preprocessed, and then uploaded again.
In the final step, the preprocessed dataset is downloaded and is used to train the model.
However, I have another problem. I have a dataset trigger that has a schedule task.
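For reference, a hedged sketch of such a dataset trigger, assuming clearml is installed; the task ID, project name, and queue are placeholders. The TriggerScheduler polls the server and enqueues the given schedule task when a matching dataset event occurs.

```python
# Hypothetical setup: enqueue a template task whenever a dataset in the
# watched project is published.

def start_dataset_trigger(template_task_id, queue="default"):
    from clearml.automation import TriggerScheduler  # assumes clearml installed

    scheduler = TriggerScheduler(pooling_frequency_minutes=3)
    scheduler.add_dataset_trigger(
        schedule_task_id=template_task_id,  # task to clone and enqueue
        schedule_queue=queue,
        trigger_project="detector-data",
        trigger_on_publish=True,  # fire only when a dataset is published
    )
    scheduler.start()  # blocks; polls for new dataset events
```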
Basically, when I have to re-run the experiment with different hyperparameters, I should clone the previous experiment and change the hyperparameters before putting it in the queue?
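A minimal sketch of that clone-edit-enqueue flow, assuming clearml is installed; the task ID and queue name are placeholders. ClearML addresses hyperparameters as "section/name" keys, e.g. "Args/epochs" for argparse-captured values.

```python
# Hypothetical helper: clone a finished task, override some hyperparameters,
# and push the clone onto a queue for an agent to pick up.

def prefix_args(overrides, section="Args"):
    """Map plain names to ClearML's 'section/name' parameter keys."""
    return {"{}/{}".format(section, k): v for k, v in overrides.items()}


def rerun_with_params(template_task_id, overrides, queue="default"):
    from clearml import Task  # assumes clearml is installed and configured

    cloned = Task.clone(source_task=template_task_id, name="re-run")
    cloned.set_parameters(prefix_args(overrides))
    Task.enqueue(cloned, queue_name=queue)
    return cloned.id
```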
AgitatedDove14, I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?
That is true. If I'm understanding correctly, by configuration parameters you mean using argparse, right?
The server is on a different machine. I'm experimenting on the same machine though.
Previously I wasn't; I would just call model.save, but I was unsure how to make modifications to the output model, which is why I created the OutputModel.
I hope you understand my problem statement. I want to solve the issue with or without an output model. Any help would be appreciated.
Basically, at a minimum, I would like to be able to add tags, set the name, and choose whether to publish the model that I'm saving.
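A hedged sketch of those three wishes, assuming clearml is installed; names, tags, and the weights path are placeholders. OutputModel lets you set the name and tags at creation, attach the weights file, and optionally publish.

```python
# Hypothetical helper: save model weights with an explicit name and tags,
# and optionally publish the resulting model.

def save_and_publish(task, weights_path, name, tags, publish=False):
    from clearml import OutputModel  # assumes clearml is installed

    output_model = OutputModel(task=task, name=name, tags=tags)
    output_model.update_weights(weights_filename=weights_path)  # uploads weights
    if publish:
        output_model.publish()  # mark the model version as published
    return output_model
```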
I'll create a GitHub issue. Overall, I hope you understand.
Basically, I'm trying to figure out how much of the tracking and record keeping ClearML does for me, and what I need to keep track of manually in a database.