Basically, I want to train AI models. I'm trying to set up an architecture where I can automate the whole process from data fetching to model training, and I need a GPU for the training.
It works; however, it shows the task as enqueued and pending. Note that I am using .start() and not .start_remotely() for now.
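For context, here's roughly what I'm running: a minimal sketch assuming the TriggerScheduler API, with the project name, task ID, and queue as placeholders.

```python
from clearml.automation import TriggerScheduler

scheduler = TriggerScheduler(pooling_frequency_minutes=1.0)
scheduler.add_dataset_trigger(
    name="retrain-on-new-data",
    schedule_task_id="<task-id-to-clone>",  # task cloned when the trigger fires
    schedule_queue="default",               # the fired task is ENQUEUED here, so a
                                            # clearml-agent must be listening on this
                                            # queue or the task stays pending
    trigger_project="datasets/my-project",
)

# .start() runs the scheduler loop in this process; .start_remotely()
# would enqueue the scheduler itself (which also needs an agent).
scheduler.start()
```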
So it won't work without a clearml-agent? Sorry for the barrage of questions, I'm just very confused right now.
I set the host variable to the IP assigned to my laptop by the network.
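Concretely, something like this (a sketch using Task.set_credentials, assuming the default ClearML server ports, with 192.168.1.42 standing in for my laptop's LAN IP):

```python
from clearml import Task

# Point the SDK at the server running on my laptop
# (placeholder IP, default ClearML ports).
Task.set_credentials(
    api_host="http://192.168.1.42:8008",
    web_host="http://192.168.1.42:8080",
    files_host="http://192.168.1.42:8081",
    key="<access-key>",
    secret="<secret-key>",
)
```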
There are other parameters for add_task as well; I'm just curious how I pass the folder and batch size in the schedule_fn=watch_folder part.
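To make the question concrete, this is the shape of what I'm after. A sketch assuming the TaskScheduler API, where the callback parameter is documented as schedule_function; watch_folder, the folder path, and the batch size are my own placeholders, and binding the extra arguments with functools.partial is just my guess at how it could work:

```python
from functools import partial

from clearml.automation import TaskScheduler

def watch_folder(folder: str, batch_size: int) -> None:
    # Placeholder body: scan `folder` for new data and kick off
    # training with `batch_size`.
    print(f"checking {folder} with batch_size={batch_size}")

scheduler = TaskScheduler()
# The scheduler expects a zero-argument callable, so bind the extra
# parameters up front rather than passing them through add_task.
scheduler.add_task(
    schedule_function=partial(watch_folder, folder="/data/incoming", batch_size=32),
    minute=30,  # placeholder cadence; exact semantics per the TaskScheduler docs
)
scheduler.start()
```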
Okay, so when I add trigger_on_tags, the repetition issue is resolved.
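For the record, this is the change that fixed it, as a sketch; the tag, task ID, queue, and project are placeholders:

```python
from clearml.automation import TriggerScheduler

scheduler = TriggerScheduler(pooling_frequency_minutes=1.0)
scheduler.add_dataset_trigger(
    name="retrain-on-tagged-data",
    schedule_task_id="<task-id-to-clone>",
    schedule_queue="default",
    trigger_project="datasets/my-project",
    # Only fire for datasets carrying this tag; without it the
    # trigger kept firing repeatedly.
    trigger_on_tags=["published"],
)
scheduler.start()
```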
How would the two be different, other than that I can pass a directory to the local mutable copy?
Thank you, I'll take a look
Let me share the code with you, and how I think they interact with each other.
and then also write down my git username and password.
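i.e., something like this in clearml.conf (a sketch of the agent section, with placeholder values):

```
agent {
    # Credentials the agent uses to clone private repos (placeholders).
    git_user: "my-git-username"
    git_pass: "my-git-token-or-password"
}
```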
Lastly, I have asked this question multiple times, but since the MLOps process is so new, I want to learn from others' experience regarding evaluation strategies. What would be a good evaluation strategy? Splitting each batch into train/test? That would mean less data for training, but we could test it right away. Another idea I had was training on the current batch and then evaluating on incoming batches. Any other ideas?
AgitatedDove14 I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?
Let me tell you what I think is happening and you can correct me where I'm going wrong.
Under certain conditions at certain times, a Dataset is published, and that activates a Dataset trigger. So if I publish one dataset every day, I activate the Dataset Trigger once that day, when the dataset is published.
N publishes = N trigger activations = N anonymous Tasks, right?
I'll read the 3 examples now. Am I right to assume that I should drop Pipeline_Controller.py?
Wait, so the pipeline step only runs if the pre_execute_callback returns True? And will the pipeline stop if the step doesn't run?
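To make sure I'm reading it right, here's a sketch of what I mean; the project, step, callback, and parameter names are all placeholders:

```python
from clearml import PipelineController

def only_if_batch_nonzero(pipeline, node, param_override):
    # Returning False skips this step; returning True lets it run.
    return param_override.get("General/batch_size", 0) > 0

pipe = PipelineController(name="my-pipeline", project="examples", version="0.1")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="training task",
    parameter_override={"General/batch_size": 32},
    pre_execute_callback=only_if_batch_nonzero,
)
pipe.start()
```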
Also, the task just prints a small string to the console.
The situation is that I needed a continuous training pipeline to train a detector, the detector being Ultralytics YOLOv5.
To me, it made sense to have a training task. The whole training code seemed complex to me, so I modified it only slightly to fit my needs: it now gets the dataset and model from ClearML, nothing more.
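The modification boils down to something like this: a sketch of just the fetch-from-ClearML part, with placeholder IDs, while the rest of the YOLOv5 training code stays as-is.

```python
from clearml import Dataset, InputModel, Task

task = Task.init(project_name="detection", task_name="yolov5-train")

# Pull the training data registered with ClearML Data (placeholder ID).
data_path = Dataset.get(dataset_id="<dataset-id>").get_local_copy()

# Pull the starting weights registered as a ClearML model (placeholder ID).
weights_path = InputModel(model_id="<model-id>").get_local_copy()

# ... hand data_path and weights_path to the regular YOLOv5 training code ...
```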
I think I created a task using clearml-task and pointed it towards the repo I had created. The task runs fine.
I am unsure about the details of the training code...
Basically, I don't want storage to fill up on the ClearML Server machine.
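What I have in mind is pointing model and artifact uploads at external storage instead of the server's fileserver. A sketch assuming an S3 bucket (the bucket name is a placeholder):

```python
from clearml import Task

# With output_uri set, models and artifacts are uploaded to external
# storage instead of the ClearML fileserver, so the server machine's
# disk doesn't fill up.
task = Task.init(
    project_name="detection",
    task_name="yolov5-train",
    output_uri="s3://my-bucket/clearml",  # placeholder bucket
)
```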
So I took the dataset trigger from this and added it to my own test code, which needs to run a task every time the trigger is activated.
Thanks, I went through it and this seems easy
Basically, when I load the model with InputModel, it loads fine, but I can't seem to get a local copy.
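Here's the bit that isn't working for me, roughly. A sketch with a placeholder model ID; I'd expect get_local_copy() to hand back a path to the downloaded weights:

```python
from clearml import InputModel

model = InputModel(model_id="<model-id>")  # this part loads fine
print(model.url)                           # the remote location shows up

local_path = model.get_local_copy()        # here I'd expect a local file
print(local_path)                          # path, but I don't get one
```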
Let me try to be a bit more clear.
Say I have a training task in which I'm getting multiple ClearML Datasets via multiple ClearML IDs. I get local copies, train the model, save the model, and delete the local copies in that script.
Does ClearML keep track of which data versions were fetched and used from ClearML Data?
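In code, the pattern I'm describing looks roughly like this (placeholder IDs and names):

```python
import shutil

from clearml import Dataset, Task

task = Task.init(project_name="detection", task_name="multi-dataset-train")

dataset_ids = ["<id-1>", "<id-2>"]  # placeholder ClearML Dataset IDs
local_copies = [Dataset.get(dataset_id=ds_id).get_local_copy() for ds_id in dataset_ids]

# ... train on the local copies, save the model ...

# Clean up the local copies afterwards.
for path in local_copies:
    shutil.rmtree(path, ignore_errors=True)
```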
Can you guys let me know what the finalize and publish methods do?
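For context, here's where I'm calling them, as a sketch with placeholder names; my current understanding (please correct me) is in the comments:

```python
from clearml import Dataset

ds = Dataset.create(dataset_name="daily-batch", dataset_project="datasets/my-project")
ds.add_files("/data/incoming")  # placeholder path
ds.upload()

ds.finalize()  # my understanding: closes this dataset version, so no
               # further files can be added or removed
ds.publish()   # my understanding: marks the version as published
               # (read-only in the UI), which is also the event my
               # dataset trigger watches for
```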
I don't think I changed anything.