
So I had an issue where it didn't add the tags for some reason. There was no error, just no tags on the model.
Agreed. The issue does not occur when I set the trigger_on_publish to True, or when I use tag matching.
I'd maybe like to have a variable in simple-pipeline.py that holds the value returned by split_dataset.
Is there a difference? I mean, my use case is pretty simple. I have a training run that creates a lot of checkpoints. I just want to keep the N best checkpoints, and whenever there are more than N, delete the worst-performing one, both locally and from the task artifacts.
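The keep-the-N-best logic described above could be sketched like this. Everything here is a hypothetical illustration, not ClearML API: the function name, the `(score, path)` tuple shape, and the "higher score is better" assumption all depend on how your training loop tracks checkpoint metrics.

```python
import heapq

def prune_checkpoints(checkpoints, n_best):
    """Return the checkpoint paths that should be deleted.

    `checkpoints` is a list of (score, path) tuples, where a higher
    score means a better checkpoint. Both the tuple layout and the
    scoring direction are assumptions -- adapt to your own metrics.
    """
    if len(checkpoints) <= n_best:
        return []
    # The n_best highest-scoring checkpoints survive.
    best = heapq.nlargest(n_best, checkpoints)
    keep = {path for _, path in best}
    # Everything else is a deletion candidate, in original order.
    return [path for _, path in checkpoints if path not in keep]
```

Each time a new checkpoint is written you would call this, `os.remove()` the returned local paths, and delete the matching task artifacts via the SDK.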
How do I go about uploading those registered artifacts, would I just pass artifacts[i] and the name for the artifact?
Here's the thread
https://clearml.slack.com/archives/CTK20V944/p1636613509403900
The question has been answered, though you can take a look there to check whether I understood it correctly.
I came to that conclusion, I think, yeah. Basically, I can access them as artifacts.
I ran a training code from a github repo. It saves checkpoints every 2000 iterations. Only problem is I'm training it for 3200 epochs and there's more than 37000 iterations in each epoch. So the checkpoints just added up. I've stopped the training for now. I need to delete all of those checkpoints before I start training again.
What about amount of storage required?
Yes it works, thanks for the overall help.
Basically want to be able to serve a model, and also send requests to it for inference.
As of now, I can only select the ones that are visible; to select more, I'll have to click on "view more", which gets extremely slow.
Is there a code example for this?
I was going through the Python API, and the closest thing that resembled my use case was task.close(); however, it didn't do anything.
Can you please share the endpoint link?
In the case of an API call, given that I have the ID of the task I want to stop, would I make a POST request to [CLEARML_SERVER_URL]:8080/tasks.stop with the request body set up like the one mentioned in the API?
Let me give it a try.
Thank you for the help.
When I try to get a local copy, I get this error:
  File "load_model.py", line 8, in <module>
    location = input_model.get_local_copy()
  File "/home/fawad-nizamani/anaconda3/envs/ocr-sip-clearml/lib/python3.8/site-packages/clearml/model.py", line 424, in get_local_copy
    return self.get_weights_package(return_path=True, raise_on_error=raise_on_error)
  File "/home/fawad-nizamani/anaconda3/envs/ocr-sip-clearml/lib/python3.8/site-packages/clearml/model.py", line 318, in get_weights_package
  ...
When you connect to the server properly, you're able to see the dashboard like this, with menu options on the side.
Even though I ended my schedulers and triggers, the anonymous tasks keep increasing.
My main query is: do I wait until there's a sufficient batch size, or do I just send each image for training as soon as it comes in?
def watch_folder(folder, batch_size):
    count = 0
    classes = os.listdir(folder)
    class_count = len(classes)
    files = []
    dirs = []
    for cls in classes:
        class_dir = os.path.join(folder, cls)
        fls = os.listdir(class_dir)
        count += len(fls)
        files.append(fls)
        dirs.append(class_dir)
    if count >= batch_size:
        dataset = Dataset.create(project='data-repo')
        dataset.add_files(folder)
        dataset.upload()
        dataset.final...
AgitatedDove14 Alright I think I understand, changes made in storage will be visible in the front end directly.
Will using Model.remove completely delete it from storage as well?
My use case is that the code (using PyTorch) saves additional info, like the state dict, when saving the model. I'd like to save that information as an artifact as well, so that I can load it later.
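One way to sketch that: bundle the extra training info into a single object and attach it to the task as an artifact. The helper below and its argument names are hypothetical placeholders (in the PyTorch case, `model_state` would be `model.state_dict()`, and so on); only `task.upload_artifact()` / `task.artifacts[...]` are actual ClearML SDK calls.

```python
def build_checkpoint_payload(model_state, optimizer_state, epoch, metrics):
    """Bundle extra training info so it can be stored alongside the model.

    All parameter names are illustrative placeholders -- swap in whatever
    your training loop actually produces (e.g. model.state_dict()).
    """
    return {
        "model_state_dict": model_state,
        "optimizer_state_dict": optimizer_state,
        "epoch": epoch,
        "metrics": metrics,
    }

# With ClearML, the bundle could then be attached to the task:
#   payload = build_checkpoint_payload(model.state_dict(),
#                                      optimizer.state_dict(),
#                                      epoch, {"val_loss": val_loss})
#   task.upload_artifact(name="training_state", artifact_object=payload)
# and later retrieved with:
#   task.artifacts["training_state"].get()
```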
Also what's the difference between Finalize vs Publish?