Well yeah, you could say that. In add_function_step I pass in a function that returns something, and since I've listed the name of the returned value in add_function_step, I can use it downstream. But I can't seem to figure out a way to do something similar with a Task in add_step.
I came to that conclusion too, I think. Basically, I can access them as artifacts.
That's why I wanted to pass the model ID from the prior step to the next one.
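Roughly what I'm trying to do, as a sketch (the step names, the "eval task" base task, and the General/model_id parameter are placeholders I made up, and I'm assuming the ${train.model_id} reference resolves a function step's return value for the downstream step):
```python
from clearml import PipelineController

def train_model():
    # ... training logic ...
    return "model-id-123"  # placeholder return value

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0")

# Function step: the returned value is stored as an artifact named "model_id"
pipe.add_function_step(
    name="train",
    function=train_model,
    function_return=["model_id"],
)

# Task step: reference the previous step's output via parameter_override.
# Assumption: "${train.model_id}" resolves to the function step's return;
# if not, the artifact reference syntax would be needed instead.
pipe.add_step(
    name="evaluate",
    parents=["train"],
    base_task_project="examples",
    base_task_name="eval task",
    parameter_override={"General/model_id": "${train.model_id}"},
)

pipe.start()
```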
I understand that storing data outside ClearML won't ensure its immutability. I guess this could be built into ClearML as a feature at some point in the future.
AgitatedDove14 Your second option is somewhat like how shortcuts work, right? Storing pointers to the actual data?
On both the main Ubuntu machine and the VM, I simply installed it in a conda environment using pip.
The storage is basically the machine the ClearML server is on; it's not using S3 or anything.
So I got my answer to the first one: I found where the data is stored on the server.
I'm still unsure about the difference between finalize and publish, since upload should be what actually sends the data to the server.
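My working mental model, as a sketch (the dataset name and project are placeholders, and the publish comment is my understanding rather than something I've confirmed):
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
ds.add_files(path="data/")   # register local files with the dataset version
ds.upload()                  # push the file contents to the file server / storage
ds.finalize()                # close this version; no more files can be added
# ds.publish()               # my understanding: marks it published (read-only)
```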
I'm not in the best position to answer these questions right now.
In the case of an API call, given that I have the ID of the task I want to stop, would I make a POST request to [CLEARML_SERVER_URL]:8080/tasks.stop with the request body set up like the one described in the API reference?
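Something like this, I mean (a rough sketch; the server address, credentials and task ID are placeholders, and I'm assuming the API server port here, which I believe defaults to 8008 while 8080 serves the web UI):
```python
import requests

# Placeholders: replace with your server address, API credentials and task ID.
api_server = "http://my-clearml-server:8008"  # API server (the web UI is on 8080)
task_id = "abc123"

resp = requests.post(
    f"{api_server}/tasks.stop",
    json={"task": task_id},             # request body per the API reference
    auth=("ACCESS_KEY", "SECRET_KEY"),  # ClearML API credentials (basic auth)
)
print(resp.status_code, resp.json())
```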
Would you know what the pros of online learning would be, other than the fact that the incoming data is as close as possible to the current (time-wise) data distribution? And would those benefits be worth it, enough to justify training online?
Lastly, I've asked this question multiple times, but since the MLOps process is so new to us, I want to learn from others' experience with evaluation strategies. What would be a good one? Splitting each batch into train/test? That would mean less data for training, but we could evaluate right away. Another idea I had was training on the current batch and then evaluating on the incoming batches. Any other ideas?
Should I train for just one epoch, or multiple epochs, given that I'm only training on the new batch of data and not the whole dataset?
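The second idea is basically a test-then-train loop over the stream, something like this (a sketch; the random batches stand in for the incoming data, and any scikit-learn-style estimator with partial_fit would slot in):
```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

model = SGDClassifier()  # any estimator with partial_fit works here
rng = np.random.default_rng(0)

# Placeholder stream: in practice these would be the incoming data batches.
batches = [(rng.normal(size=(32, 10)), rng.integers(0, 2, size=32))
           for _ in range(5)]

first = True
for X_batch, y_batch in batches:
    if not first:  # can't evaluate before the model has seen any data
        score = accuracy_score(y_batch, model.predict(X_batch))
        print(f"batch accuracy: {score:.3f}")   # evaluated on unseen data
    model.partial_fit(X_batch, y_batch, classes=[0, 1])  # then train on it
    first = False
```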
It's basically data for binary image classification, simple.
It'll be labeled in the folder I'm watching.
With online learning, my main concerns are that the training would be completely stochastic in nature, that I wouldn't be able to split the data into train/test sets, and that it would be very expensive and inefficient to train online.
AgitatedDove14 Sorry for pinging you on this old thread. I had an additional query: if you've worked on a process similar to the one mentioned above, how do you set the learning rate, and which optimizer did you use? Adam? RMSProp?
Basically I'm trying to work out how much of the tracking and record-keeping ClearML does for me, and what I need to keep track of manually in a database.
It does to me. However, I'm proposing a situation where a user gets N datasets using Dataset.get but only uses M of them for training, where M < N. Would it make sense to log only the M datasets that were actually used? How would that be done?
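For example, something like this is what I mean (a sketch; the IDs are placeholders, and recording the used IDs as a task parameter is just one way I can think of to do it):
```python
from clearml import Dataset, Task

task = Task.init(project_name="examples", task_name="training")

# Fetch the N candidate datasets (placeholder IDs).
candidate_ids = ["dataset-id-1", "dataset-id-2", "dataset-id-3"]
datasets = [Dataset.get(dataset_id=ds_id) for ds_id in candidate_ids]

# ... some selection logic picks the M datasets actually used ...
used = datasets[:2]

# One option: record only the used datasets on the task explicitly.
task.set_parameter("Datasets/used_ids", ",".join(ds.id for ds in used))
```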
OK, I'm a bit confused now. Suppose I have an agent listening to some queue X. If someone else on another machine enqueues their task on queue X, will my agent run it?
Basically, since I want to train AI models, I'm trying to set up an architecture where I can automate the whole process from data fetching to model training, and I need a GPU for the training.
And multiple agents can listen to the same queue, right?
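Just to make sure I've got the model right: anyone configured against the same ClearML server can enqueue onto queue X, e.g. (a sketch; the task ID and queue name are placeholders):
```python
from clearml import Task

# From any machine configured against the same ClearML server:
task = Task.get_task(task_id="abc123")  # placeholder ID
Task.enqueue(task, queue_name="X")      # any agent listening on "X" can pick it up
```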
Also, the repository is on Bitbucket, which is why I set git_host to that.
You mean I should set it to this?
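i.e. something along these lines in the agent section of clearml.conf (the credentials are placeholders)?
```
# clearml.conf (agent section) - placeholder values
agent {
    git_host: "bitbucket.org"
    git_user: "my_bitbucket_username"
    git_pass: "my_app_password"
}
```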