I'm getting this error.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
- Check if remote worker has valid credentials
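For context, the agent reads its git credentials from the `clearml.conf` on the worker machine; a sketch of the relevant section, with placeholder values:

```
agent {
    # Credentials the agent uses when cloning (placeholders, not real values)
    git_user: "my-bitbucket-username"
    git_pass: "my-app-password-or-token"
    # Restrict these credentials to a single git host
    git_host: "bitbucket.org"
}
```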
Also the repository is on bitbucket which is why I set git_host to that.
Alright. Anyway, I'm practicing with the pipeline. I have an agent listening to the queue. The only problem is that it fails because of requirement issues, and I don't know how to pass requirements in this case.
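One way to hand requirements to the agent is `Task.add_requirements`, called before `Task.init`, so they are recorded in the task's installed-packages list. A sketch (package names and versions here are only illustrations):

```python
from clearml import Task

# Record extra packages before Task.init so the agent installs them
# when it recreates the environment for this task.
Task.add_requirements("scikit-learn")          # latest available
Task.add_requirements("tensorflow", "2.11.0")  # pinned version

task = Task.init(project_name="My Workshop Examples", task_name="pipeline step")
```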
AgitatedDove14 Can you help me with this? Maybe something like storing the returned values in a variable outside the pipeline?
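In case it helps anyone reading later: assuming function-style pipeline steps whose return values ClearML stores as artifacts on the step's task, reading one back outside the pipeline might look like this (the task id and artifact name are placeholders):

```python
from clearml import Task

# A step's return value is stored as an artifact on that step's task,
# so it can be fetched into a regular variable after the step has run.
step_task = Task.get_task(task_id="<step_task_id>")
value = step_task.artifacts["data"].get()  # artifact name depends on the step
```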
When you connect to the server properly, you're able to see the dashboard like this, with menu options on the side.
There's a whole task bar on the left in the server. I only get this page when I use the IP 0.0.0.0.
Ok this worked. Thank you.
Should I just train for 1 epoch? Or multiple epochs? Given I'm only training on the new batch of data and not the whole dataset?
from sklearn.datasets import load_iris
import tensorflow as tf
import numpy as np
from clearml import Task, Logger
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', metavar='N', default=64, type=int)
    args = parser.parse_args()
    parsed_args = vars(args)
    task = Task.init(project_name="My Workshop Examples", task_name="scikit-learn joblib example")
    iris = load_iris()
    data = iris.data
    target = i...
AgitatedDove14 Sorry for pinging you on this old thread. I had an additional query: if you've worked on a process similar to the one mentioned above, how do you set the learning rate? And what was the learning strategy? Adam? RMSProp?
It'll be labeled in the folder I'm watching.
Quick follow-up question. Once I parse args, will they be directly available even before I enqueue it for the first time, or will I only be able to access the hyperparameters after running it once?
Lastly, I have asked this question multiple times, but since the MLOps process is so new, I want to learn from others' experience regarding evaluation strategies. What would be a good evaluation strategy? Splitting the batch into train/test? That would mean less data for training, but we can test it right away. Another idea I had was training on the current batch, then evaluating it on incoming batches. Any other ideas?
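The first idea above, holding out part of each incoming batch for immediate evaluation, can be sketched in a few lines (the 20% holdout is an arbitrary illustration, not a recommendation):

```python
import numpy as np

def split_batch(batch, test_fraction=0.2, rng=None):
    """Hold out a slice of an incoming batch for immediate evaluation."""
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(batch))           # shuffle before splitting
    n_test = int(len(batch) * test_fraction)
    return batch[idx[n_test:]], batch[idx[:n_test]]   # (train, test)
```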
I'm using clearml-agent right now. I just upload the task inside a project. I've used argparse as well; however, as of yet I have not been able to find writable hyperparameters in the UI. Is there any tutorial video you can recommend that deals with this? I was following this one on YouTube ( https://www.youtube.com/watch?v=Y5tPfUm9Ghg&t=1100s ), but I can't seem to recreate his steps as he sifts through his code.
Basically, when I have to re-run the experiment with different hyperparameters, I should clone the previous experiment, change the hyperparameters, and then put it in the queue?
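That clone-edit-enqueue flow can also be done programmatically; a sketch assuming argparse-captured parameters live under the "Args" section (project, task, and queue names are placeholders):

```python
from clearml import Task

# Clone the previous experiment, override hyperparameters, and enqueue the clone.
base = Task.get_task(project_name="My Workshop Examples",
                     task_name="scikit-learn joblib example")
cloned = Task.clone(source_task=base, name="rerun with new hyperparameters")

# argparse parameters usually appear under the "Args" section in the UI
cloned.set_parameters({"Args/epochs": 128})

Task.enqueue(cloned, queue_name="default")
```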
I've finally gotten the Triton engine to run. I'll be going through the NVIDIA Triton docs to find out how to make an inference request. If you have an example inference request, I'd appreciate it if you could share it with me.
I get what you're saying. I was considering training on just the new data to see how it works. To me it felt like that was the fastest way to deal with data drift. I understand that it may introduce instability, however. I was curious how other developers who have successfully set up continuous training deal with it: 100% new data, or a ratio between new and old data? And if it's the latter, which should be the majority, old data or new data?
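The ratio idea is essentially rehearsal/replay: sample every training batch from both pools so the model sees fresh data without forgetting the old. A minimal sketch, where the 70/30 split is an arbitrary illustration:

```python
import numpy as np

def mixed_batch(new_data, old_data, batch_size=32, new_fraction=0.7, rng=None):
    """Sample a training batch that mixes fresh and historical examples.

    new_fraction controls how much of each batch comes from the new data;
    the rest is replayed from the old dataset to reduce forgetting.
    """
    rng = rng or np.random.default_rng()
    n_new = int(round(batch_size * new_fraction))
    n_old = batch_size - n_new
    new_idx = rng.choice(len(new_data), size=n_new, replace=False)
    old_idx = rng.choice(len(old_data), size=n_old, replace=False)
    return np.concatenate([new_data[new_idx], old_data[old_idx]])
```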
Understandable. I mainly have regular image data, not video sequences so I can do the train test splits like you mentioned normally. What about the epochs though? Is there a recommended number of epochs when you train on that new batch?
I was getting a different error when I posted this question. Now I'm just getting this connection error.
My use case is basically if I want to now access this dataset from somewhere else, shouldn't I be able to do so using its id?
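Assuming the standard ClearML Dataset API, fetching a dataset by id from another machine might look like this (the id is a placeholder):

```python
from clearml import Dataset

# Fetch the dataset from anywhere by its id and pull a local cached copy.
dataset = Dataset.get(dataset_id="<your_dataset_id>")
local_path = dataset.get_local_copy()  # read-only folder containing the files
```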
It's basically data for binary image classification, simple.
For anyone who's struggling with this. This is how I solved it. I'd personally not worked with GRPC so I instead looked at the HTTP docs and that one was much simpler to use.
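For reference, the Triton v2 HTTP protocol accepts a JSON POST to `/v2/models/<model_name>/infer`. A sketch of building such a request with only the standard library (model name, input name, host, and data are placeholders):

```python
import json
import urllib.request

def build_infer_request(model_name, input_name, data,
                        datatype="FP32", host="http://localhost:8000"):
    """Build a Triton v2 HTTP inference request (built, not sent, here)."""
    payload = {
        "inputs": [{
            "name": input_name,       # must match the input name in the model config
            "shape": [1, len(data)],  # a batch of one flat vector
            "datatype": datatype,
            "data": list(data),       # tensor contents, flattened row-major
        }]
    }
    return urllib.request.Request(
        f"{host}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires a running Triton server):
# response = json.load(urllib.request.urlopen(build_infer_request(...)))
```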
Agreed. The issue does not occur when I set the trigger_on_publish to True, or when I use tag matching.
Another question: in the parents sequence in pipe.add_step, we have to pass in the name of the step, right?
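A sketch of that `parents` usage, assuming the PipelineController API (project, pipeline, and task names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="my pipeline",
                          project="My Workshop Examples",
                          version="1.0")

pipe.add_step(name="preprocess",
              base_task_project="My Workshop Examples",
              base_task_name="preprocess task")

# parents takes the *names* of earlier steps, not task ids
pipe.add_step(name="train",
              parents=["preprocess"],
              base_task_project="My Workshop Examples",
              base_task_name="train task")
```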
Thank you. I didn't realize that the output could be accessed like this.