I know how to enqueue using the UI. I'm trying to do it programmatically.
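For reference, a minimal sketch of programmatic enqueueing with the ClearML SDK (the task ID and queue name are placeholders):

```python
from clearml import Task

# Fetch an existing (draft or cloned) task -- the ID here is a placeholder
task = Task.get_task(task_id="<task-id>")

# Push it onto an execution queue so an agent picks it up
Task.enqueue(task, queue_name="default")
```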
Let me give it a try.
Wait, so the pipeline step only runs if the pre-execute callback returns True? It'll be skipped if the callback doesn't return True?
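For context, a sketch of how that hook is wired through PipelineController's pre_execute_callback; the project, task, and step names here are illustrative:

```python
from clearml.automation import PipelineController

def pre_execute(pipeline, node, param_override):
    # Returning False skips this step; returning True lets it run
    return node.name != "skip_me"

pipe = PipelineController(name="demo-pipeline", project="demo")
pipe.add_step(
    name="train",
    base_task_project="demo",
    base_task_name="train-task",
    pre_execute_callback=pre_execute,
)
pipe.start()
```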
When I pass the repo to clearml-task with the parameters, it runs fine and finishes. But when I clone the task and run it again, I get the assert error above, and I don't know why.
Big thank you though.
Thanks, I went through it and this seems easy
Finalize locks the model, and publish, I assume, publishes it to the server.
However, when I reset or clone the task, it won't just accept the default value; ClearML will pass the argument directly.
Basically, as soon as I get the trigger that a new dataset has been published, I want to pass the dataset ID to the script as a CLI argument and hand the code to the agent.
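A sketch of that flow with clearml's TriggerScheduler, assuming the script takes a dataset_id argparse argument; the project names, repo URL, and queue are placeholders:

```python
from clearml import Task
from clearml.automation import TriggerScheduler

def on_dataset_published(dataset_id):
    # Build a task from the training script, injecting the dataset ID as a CLI arg
    task = Task.create(
        project_name="demo",                         # placeholder project
        task_name="train-on-new-dataset",
        repo="https://github.com/example/repo.git",  # placeholder repo
        script="train.py",
        argparse_args=[("dataset_id", dataset_id)],
    )
    # Hand it to an agent listening on the queue
    Task.enqueue(task, queue_name="default")

trigger = TriggerScheduler()
trigger.add_dataset_trigger(
    schedule_function=on_dataset_published,  # called with the new dataset's ID
    trigger_project="my-datasets",           # placeholder dataset project
    trigger_on_publish=True,
)
trigger.start()
```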
If it helps, I can try and record my steps in a video.
Yeah exact same usage.
Since I want to save the model to the ClearML server, what should the port be alongside the URL?
And in that case, if I do model.save('test'), will it also save the model to the ClearML server?
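For the record, a sketch of pointing model uploads at the ClearML file server via output_uri; on a standard server install the file server listens on port 8081 (web UI 8080, API 8008), and the address below is a placeholder:

```python
from clearml import Task

# output_uri tells ClearML where to upload model weights;
# the file server's default port is 8081
task = Task.init(
    project_name="demo",
    task_name="train",
    output_uri="http://<server-ip>:8081",  # or output_uri=True for the default file server
)

# With auto-logging enabled, framework saves such as model.save('test')
# are captured and uploaded to output_uri automatically
```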
Then I accessed it using the IP directly instead of localhost.
up to date with https://fawad_nizamani@bitbucket.org/fawad_nizamani/custom_yolov5 ✅
Traceback (most recent call last):
  File "train.py", line 682, in <module>
    main(opt)
  File "train.py", line 525, in main
    assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
AssertionError: ERROR: --resume checkpoint does not exist
I basically forked it for myself and made it accept ClearML dataset and model IDs.
I ran a training code from a GitHub repo. It saves checkpoints every 2000 iterations. The only problem is I'm training for 3200 epochs, and there are more than 37000 iterations in each epoch, so the checkpoints just added up. I've stopped the training for now; I need to delete all of those checkpoints before I start training again.
AgitatedDove14 Alright, I think I understand: changes made in storage will be visible in the frontend directly.
Will using Model.remove completely delete it from storage as well?
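A sketch of clearing those checkpoints in bulk, assuming a recent clearml version where Model.remove accepts delete_weights_file; the project name is a placeholder:

```python
from clearml import Model

# Find the checkpoint models piled up under the training project -- placeholder name
checkpoints = Model.query_models(project_name="my-project")

for m in checkpoints:
    # delete_weights_file=True removes the weights from storage as well,
    # not just the model entry on the ClearML server
    Model.remove(m, delete_weights_file=True)
```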
After the previous code, I fetched the model it uploaded using its ID. When I then added tags to it, they were visible in the UI.
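A sketch of that tagging step, assuming the tags property is settable on Model in the installed SDK version; the model ID and tag are placeholders:

```python
from clearml import Model

model = Model(model_id="<model-id>")  # placeholder ID

# Assumption: tags is a readable/settable property in this SDK version
model.tags = (model.tags or []) + ["reviewed"]
```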
Sorry for the late reply. The situation is that when I ran the task initially, it took in a lot of arguments via argparse. My understanding is that add_step() clones that task. I want that to happen, but I'd like to be able to modify some of the argument values, e.g. epochs.
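A sketch of overriding the cloned task's argparse values through add_step's parameter_override; argparse arguments land under the "Args/" section, and the base task ID here is a placeholder:

```python
from clearml.automation import PipelineController

pipe = PipelineController(name="train-pipeline", project="demo")

pipe.add_step(
    name="train",
    base_task_id="<base-task-id>",  # placeholder: the task add_step will clone
    # Argparse values appear as "Args/<name>" in the cloned task's hyperparameters
    parameter_override={
        "Args/epochs": 50,
        "Args/batch_size": 16,
    },
)
pipe.start()
```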
Shouldn't I get redirected to the login page instead of the dashboard if I'm not logged in? 😞
In another answer, I was shown that I can access it like this. How can I go about accessing the value of merged_dataset_id, which was returned by merge_n_datasets and stored as an artifact?
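A sketch of pulling that artifact off the step's task; the task ID is a placeholder, and the artifact name follows the question:

```python
from clearml import Task

# Fetch the task that ran merge_n_datasets -- placeholder ID
merge_task = Task.get_task(task_id="<merge-step-task-id>")

# Artifacts are exposed as a dict; .get() deserializes the stored object
merged_dataset_id = merge_task.artifacts["merged_dataset_id"].get()
print(merged_dataset_id)
```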
It does to me. However, I'm proposing a situation where a user gets n datasets using Dataset.get but only uses m of them for training, where m < n. Would it make sense to only log the m datasets that were actually used? How would that be done?
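One manual approach (a sketch, not a built-in feature): connect only the m dataset IDs that were actually used, so they show up on the task; the IDs and section name are placeholders:

```python
from clearml import Task

task = Task.current_task()

# Suppose n datasets were fetched, but only these m were used for training
used_ids = ["<dataset-id-1>", "<dataset-id-2>"]  # placeholder IDs

# Record just the used datasets on the task, under a "Datasets" section
task.connect({"used_datasets": ", ".join(used_ids)}, name="Datasets")
```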
Also, the task just prints a small string on the console.
I basically had to set the tag manually in the UI
This works, thanks. Do you have a link where I can also see the parameters of the Dataset class, or is it just in the Git repo?
Thank you for the help.
Another issue I'm having: I ran a task using clearml-task with a repo, and it ran fine. However, when I clone that task and run it on the same queue again, it throws an error from the code. I can't figure out why it's happening.
Is there a code example for this?
I was going through the Python API, and the closest thing that resembled my use case was task.close(); however, it didn't do anything.