However, I have another problem. I have a dataset trigger with a scheduled task.
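For context, the trigger is set up roughly like this. This is only a sketch from memory, assuming the TriggerScheduler from clearml.automation; the task id, queue, and project names are placeholders, and the exact keyword arguments may need double-checking:

```python
from clearml.automation import TriggerScheduler

# Poll the server every few minutes for new/changed datasets.
trigger = TriggerScheduler(pooling_frequency_minutes=3)

# When a dataset in the watched project is published, clone and enqueue the given task.
trigger.add_dataset_trigger(
    schedule_task_id='<training-task-id>',  # placeholder task to clone and enqueue
    schedule_queue='default',               # placeholder queue name
    trigger_project='datasets',             # placeholder dataset project to watch
    trigger_on_publish=True,                # assumption: fire only on publish
    name='retrain-on-new-dataset',
)
trigger.start()
```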
Basically, when I have to re-run the experiment with different hyperparameters, should I clone the previous experiment, change the hyperparameters, and then put it in the queue?
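Concretely, is this roughly the intended flow? Just a sketch assuming the ClearML SDK; the task id, parameter key, and queue name are placeholders:

```python
from clearml import Task

# Fetch the previous experiment and clone it as a new draft.
template = Task.get_task(task_id='<previous-experiment-id>')
cloned = Task.clone(source_task=template, name='experiment with new hyperparameters')

# Override only the hyperparameters that change (argparse args live under 'Args/').
cloned.set_parameters({'Args/learning_rate': 0.001})

# Push the clone to a queue for an agent to pick up.
Task.enqueue(cloned, queue_name='default')
```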
AgitatedDove14 I'm also trying to understand why this is happening. Is this normal and how it should be, or am I doing something wrong?
That is true. If I'm understanding correctly, by configuration parameters you mean using argparse, right?
Previously I wasn't. I would just call model.save, but I was unsure how to make modifications to the output model, which is why I created the OutputModel.
I hope you understood my problem statement. I want to solve the issue with or without OutputModel. Any help would be appreciated.
Basically, at the least, I would like to be able to add tags, set the name, and choose whether to publish the model that I'm saving.
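Something like this is what I have in mind. A sketch assuming ClearML's OutputModel API; the model name, tags, and weights file name are placeholders:

```python
from clearml import Task, OutputModel

task = Task.current_task()

# Register the model manually so I control its name and tags.
output_model = OutputModel(task=task, name='cassava-classifier', tags=['baseline', 'resnet50'])

# Upload the saved weights file and attach it to this model entry.
output_model.update_weights(weights_filename='model_best.pth')

# Optionally mark the model as published.
output_model.publish()
```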
Basically, I'm trying to figure out how much of the tracking and record keeping ClearML does for me, and what I need to keep track of manually in a database.
I'll look into those three. Do those files use the step 1, step 2, and step 3 files, though?
Also, I made another thread regarding ClearML Agent; can you respond to that? I'm going to try setting up a ClearML Server properly on a server machine. I want to test how to train models, enqueue tasks, and automate this whole process, with GPU training included.
I hope what I said was clear. Basically, in reality they both seem mutable; the only difference is that the download directory is optional in one, while the other always downloads to the cache folder.
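To show what I mean, roughly (a sketch assuming the ClearML Dataset API; the dataset id and target folder are placeholders):

```python
from clearml import Dataset

ds = Dataset.get(dataset_id='<dataset-id>')

# "Read-only" copy: always lands in the local ClearML cache folder.
cached_path = ds.get_local_copy()

# Mutable copy: I choose the target directory and can modify the files there.
working_path = ds.get_mutable_local_copy(target_folder='/tmp/my_dataset_copy')
```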
How about, instead of uploading the entire dataset to the ClearML server, I upload a text file with the location of the dataset on the machine? I would think that should do the trick.
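Something along these lines, a rough sketch assuming a plain artifact upload; the artifact name and the path are placeholders:

```python
from clearml import Task

task = Task.init(project_name='examples', task_name='dataset location pointer')

# Write a tiny text file that only contains the path to the data on this machine.
with open('dataset_location.txt', 'w') as f:
    f.write('/mnt/data/cassava/train\n')

# Upload just the pointer file (a few bytes) instead of the dataset itself.
task.upload_artifact(name='dataset_location', artifact_object='dataset_location.txt')
```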
Is this the correct way to upload an artifact?
checkpoint.split('.')[0] is the name I want assigned to it, and the second argument is the path to the file.
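For reference, this is roughly the call I'm making, with placeholder values for the checkpoint name and its path:

```python
from clearml import Task

task = Task.current_task()

checkpoint = 'epoch_10.pth'                        # placeholder file name
checkpoint_path = '/tmp/checkpoints/epoch_10.pth'  # placeholder full path to the file

# First argument: artifact name (file name without extension); second: path to the file.
task.upload_artifact(checkpoint.split('.')[0], artifact_object=checkpoint_path)
```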
I want to serve using NVIDIA Triton for now.
My use case is that the code, which uses PyTorch, saves additional info like the state dict when saving the model. I'd like to save that information as an artifact as well, so that I can load it later.
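Roughly what I'd like to do, as a sketch assuming the extra info is a PyTorch checkpoint saved with torch.save; the model, optimizer, and file name here are stand-ins:

```python
import torch
import torch.nn as nn
from clearml import Task

task = Task.init(project_name='examples', task_name='checkpoint artifact sketch')

model = nn.Linear(10, 2)                                  # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stand-in optimizer

# Bundle the extra training state the code already saves.
checkpoint = {
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'epoch': 5,
}
torch.save(checkpoint, 'checkpoint.pth')

# Upload the whole checkpoint file so it can be fetched later via task.artifacts.
task.upload_artifact(name='full_checkpoint', artifact_object='checkpoint.pth')
```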
I'm not sure about auto-logging, since you might be using different datasets, or you might fetch a dataset but not use it based on specific conditions. However, as a developer choosing something like ClearML, and one who considers it more of an ecosystem than just a continuous-training pipeline, I would want as many aspects of the MLOps process, and of the information around the experiment, as possible to be loggable within ClearML, without having to use external databases or libraries.
Basically, when I'm loading the model with InputModel, it loads fine, but I can't seem to get a local copy.
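This is roughly what I expected to work, as a sketch assuming InputModel.get_local_copy; the model id is a placeholder:

```python
from clearml import InputModel

# Load the registered model by id.
input_model = InputModel(model_id='<model-id>')

# Download the weights file and get its local path.
local_weights_path = input_model.get_local_copy()
print(local_weights_path)
```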
This shows my situation. You can see the code on the left and the tasks called 'Cassava Training' on the right. They keep getting enqueued even though I only sent a trigger once; by that I mean I only published a dataset once.
Basically, I want the model to be uploaded to the server alongside the experiment results.
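In other words, something like this. A sketch assuming Task.init's output_uri flag; the project and task names are placeholders:

```python
from clearml import Task

# output_uri=True uploads saved model checkpoints to the ClearML file server,
# so they appear next to the experiment instead of staying as local paths.
task = Task.init(
    project_name='Cassava',
    task_name='Cassava Training',
    output_uri=True,
)
```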
I already have the dataset ID as a hyperparameter, and I use it to get the dataset. I'm only handling one dataset right now, but merging multiple ones would be a simple task as well.
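Roughly like this, as a sketch assuming the ClearML SDK; the project, task name, and default dataset id are placeholders:

```python
from clearml import Task, Dataset

task = Task.init(project_name='examples', task_name='dataset id hyperparameter sketch')

# Expose dataset_id as a hyperparameter (editable/overridable from the UI).
params = {'dataset_id': '<default-dataset-id>'}
params = task.connect(params)

# Fetch the dataset referenced by the hyperparameter and get a local copy.
dataset = Dataset.get(dataset_id=params['dataset_id'])
data_path = dataset.get_local_copy()
```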
Also, I'm not very experienced, and I'm unsure what the proposed querying is, and how (and whether) it works in ClearML here.
Let me share the code with you, and how I think the pieces interact with each other.
I'm kind of at a point where I don't even know what to search for.
Well, yeah, you could say that. In add_function_step, I pass in a function that returns something, and since I've written the name of the returned parameter in add_function_step, I can use it. But I can't seem to figure out a way to do something similar using a task in add_step.
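The closest workaround I've found looks like this. A sketch assuming the first step's base task registers an artifact named 'dataset'; all project, task, and parameter names are placeholders:

```python
from clearml import PipelineController

pipe = PipelineController(name='training pipeline', project='examples', version='1.0')

pipe.add_step(
    name='stage_data',
    base_task_project='examples',
    base_task_name='prepare dataset',
)
pipe.add_step(
    name='stage_train',
    parents=['stage_data'],
    base_task_project='examples',
    base_task_name='train model',
    # Instead of a "returned value", reference an artifact (or parameter) of the
    # previous step's task and inject it into this step's hyperparameters.
    parameter_override={'General/dataset_url': '${stage_data.artifacts.dataset.url}'},
)

pipe.start()  # launch the pipeline controller
```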
Wait, is it possible to do what I'm doing but with just one big Dataset object or something?
I checked, and it seems that when I run an example from git, it works as it should, but when I try to run my own script, the draft is in read-only mode.
Creating a new dataset object for each batch allows me to just publish said batches, introducing immutability.
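Roughly like this per batch, as a sketch assuming the ClearML Dataset API; the project, dataset name, parent id, and paths are placeholders:

```python
from clearml import Dataset

# One dataset version per incoming batch; the parent points at the previous batch.
batch_ds = Dataset.create(
    dataset_name='cassava-batch-007',
    dataset_project='datasets',
    parent_datasets=['<previous-batch-dataset-id>'],
)
batch_ds.add_files('/mnt/data/incoming/batch_007')
batch_ds.upload()
batch_ds.finalize()  # after this, the dataset version is immutable and can be published
```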