What about the amount of storage required?
My use case is basically: if I want to access this dataset from somewhere else later, shouldn't I be able to do so using its id?
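To make the use case concrete, here's a small sketch of what I mean (the function name is mine; it assumes the clearml package and a configured server):

```python
def load_dataset_by_id(dataset_id: str) -> str:
    """Fetch a previously registered ClearML dataset by id, from any machine."""
    from clearml import Dataset  # lazy import: requires `pip install clearml`
    ds = Dataset.get(dataset_id=dataset_id)
    # get_local_copy() downloads (and caches) a read-only local copy of the files
    return ds.get_local_copy()
```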
This works, thanks. Do you have a link where I can also see the parameters of the Dataset class, or is it only in the git repo?
Here's the thread
https://clearml.slack.com/archives/CTK20V944/p1636613509403900
The question has been answered, though; you can take a look there to check whether I understood it correctly.
You could be right; I just had a couple of packages with this issue, so I removed the version requirement for now. Another possibility: I'm on Ubuntu, and some of the packages might have been for Windows, which would explain why those versions don't exist.
Also, I made another thread regarding the ClearML agent; could you respond to that? I'm going to try to set up a ClearML server properly on a server machine. I want to test how to train models, enqueue tasks, and automate this whole process, GPU training included.
My use case is that my PyTorch code saves additional info, like the state dict, when saving the model. I'd like to save that information as an artifact as well, so that I can load it later.
I actually just asked about this in another thread; here's the link. I was asking about the usage of upload_artifact.
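A minimal sketch of what I'm after, assuming PyTorch and an existing ClearML task (the function and artifact names are hypothetical):

```python
def save_checkpoint_artifact(task, model, optimizer, epoch, path="checkpoint.pt"):
    """Save a PyTorch checkpoint (state dicts plus metadata) and attach it to the task."""
    import torch  # lazy import: requires `pip install torch`
    checkpoint = {
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }
    torch.save(checkpoint, path)
    # upload_artifact registers the file on the server so it can be fetched later
    task.upload_artifact(name="checkpoint", artifact_object=path)
```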
I just made a custom repo from the ultralytics yolov5 repo, where I fetch the data and the model using a dataset id and a model id.
That is true. If I'm understanding correctly, by configuration parameters you mean using argparse, right?
It does to me. However, I'm proposing a situation where a user gets n datasets using Dataset.get but uses only m of them for training, where m < n. Would it make sense to only log the m datasets that were actually used for training? How would that be done?
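One way I imagine this could work, as a sketch (the helper names are mine; the second function assumes the clearml package): defer the actual Dataset.get calls until you know which datasets will be trained on, so only those get fetched and associated with the task.

```python
def select_for_training(candidate_ids, used_ids):
    """Pure helper: keep only the m dataset ids actually used, out of n candidates."""
    used = set(used_ids)
    return [d for d in candidate_ids if d in used]

def fetch_training_datasets(candidate_ids, used_ids):
    """Fetch only the datasets that will actually be used for training."""
    from clearml import Dataset  # lazy import: requires `pip install clearml`
    return [Dataset.get(dataset_id=d)
            for d in select_for_training(candidate_ids, used_ids)]
```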
I'm kind of new to developing end-to-end applications, so I'm also learning how the predefined pipelines work. I'll take a look at the ClearML custom pipelines.
Yeah, I kept seeing the message, but I was sure there were files at that location.
I just realized: I hadn't worked with the Datasets API for a while, and I forgot that I'm supposed to call add_files(location) and then upload(), not upload(location). My bad.
I understand that storing data outside ClearML won't ensure its immutability. I guess this could be added to ClearML as a feature at some future point.
Understandable. I mainly have regular image data, not video sequences, so I can do the train/test splits normally, like you mentioned. What about the epochs, though? Is there a recommended number of epochs when you train on that new batch?
This is the original repo which I've slightly modified.
After the step that gets the merged dataset, should I call pipe.stop() if that step returned -1?
I've also mentioned it in the issue I created, but I had the problem even when I set the type to bool with parser.add_argument(type=bool).
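For reference, the type=bool behavior is a Python pitfall rather than a ClearML one: argparse passes the raw string to the type callable, and any non-empty string is truthy, so --flag False still gives True. A common workaround is an explicit string parser (str2bool is my own helper name):

```python
import argparse

def str2bool(v):
    # bool("False") == True, since any non-empty string is truthy;
    # parse the text explicitly instead of relying on type=bool
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "1"):
        return True
    if v.lower() in ("no", "false", "f", "0"):
        return False
    raise argparse.ArgumentTypeError(f"boolean value expected, got {v!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--flag", type=str2bool, default=False)
args = parser.parse_args(["--flag", "False"])  # args.flag is now really False
```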
Alright. Anyway, I'm practicing with the pipeline. I have an agent listening to the queue. The only problem is that it fails because of requirement issues, and I don't know how to pass requirements in this case.
Is there any way to make it automatically install whatever packages it finds it requires? Or do I have to pass them explicitly in packages?
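In case it helps, here's a sketch of the explicit route as I understand it (assuming the clearml package; the project and task names are placeholders): register requirements before creating the task so the agent picks them up.

```python
def init_task_with_requirements(project="demo-project", name="demo-task"):
    """Register extra requirements for the agent, then create the task."""
    from clearml import Task  # lazy import: requires `pip install clearml`
    # add_requirements must be called *before* Task.init to take effect;
    # it accepts a package name or a path to a requirements.txt file
    Task.add_requirements("requirements.txt")
    return Task.init(project_name=project, task_name=name)
```

For pipeline function steps, I believe the packages argument on add_function_step serves the same purpose, but I haven't verified that myself.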
I don't think the function itself requires a venv to run normally, but in this case it says it can't find venv.
Can you spot something here? Because to me it still looks like it should only create a new Dataset object once the batch-size requirement is fulfilled, after which it creates and publishes the dataset and empties the directory.
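To make the intended logic concrete, here it is stripped of the ClearML calls (the threshold and function name are made up for illustration):

```python
import os

BATCH_SIZE = 8  # hypothetical threshold for creating a new dataset version

def maybe_collect_batch(watch_dir, batch_size=BATCH_SIZE):
    """Return the batch of files (and empty the dir) if the threshold is met, else None."""
    files = sorted(
        os.path.join(watch_dir, f)
        for f in os.listdir(watch_dir)
        if os.path.isfile(os.path.join(watch_dir, f))
    )
    if len(files) < batch_size:
        return None  # threshold not met: no Dataset object should be created
    # a real version would Dataset.create(...), add_files(watch_dir),
    # upload(), and finalize()/publish() here, then empty the directory
    for f in files:
        os.remove(f)
    return files
```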
Once the data is published, a dataset trigger fires in the checkbox_.... file, which creates a clearml-task for training the model.
Let me share the code with you, and how I think the pieces interact with each other.
I'll test it with the updated one.
I'll read the 3 examples now. Am I right to assume that I should drop Pipeline_Controller.py?
I think I understand. Still, I've possibly pinned the issue down to something else. I'm not sure if I'll be able to fix it on my own, though.