Even though I ended my schedulers and triggers, the anonymous tasks keep increasing.
Can you please share the endpoint link?
So far, I can only select the tasks that are visible; to select more, I have to click "View more", which gets extremely slow.
Can you give me an example url for the api call to stop_many?
Yeah, I kept seeing the message, but I was sure there were files in the location.
I just realized: I hadn't worked with the Datasets API in a while, and I forgot that I'm supposed to call add_files(location) and then upload(), not upload(location). My bad.
Thanks for the help.
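For reference, the flow described above can be sketched like this. It's a minimal sketch assuming a reachable ClearML server; the dataset name and project are placeholders, and the ClearML import is deferred so the snippet stays self-contained.

```python
def upload_dataset(location, name="my_dataset", project="examples"):
    """Register local files with a ClearML Dataset, then upload them."""
    from clearml import Dataset  # deferred import; requires a configured ClearML setup

    ds = Dataset.create(dataset_name=name, dataset_project=project)
    ds.add_files(location)  # first register the files at `location`...
    ds.upload()             # ...then upload; note: NOT ds.upload(location)
    ds.finalize()
    return ds.id
```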
Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb2191dcaf0>: Failed to establish a new connection: [Errno 111] Connection refused')': /auth.login
Retrying (Retry(total=238, connect=238, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb2191e10a0>: Failed to establish a new connection: ...
I think I understand now: I first need to have the ClearML server up and running.
Another issue I'm having: I ran a task using clearml-task, pointing it at a repo. It runs fine; however, when I clone that task and run it on the same queue again, it throws an error from the code. I can't figure out why it's happening.
I just made a custom repo from the Ultralytics yolov5 repo, where I fetch the data and model using a dataset ID and a model ID.
When I pass the repo to clearml-task with the parameters, it runs fine and finishes. But when I clone the task and run it again, I get the assert error above, and I don't know why.
I basically forked it for myself and made it accept ClearML dataset and model IDs.
The situation is that I needed a continuous-training pipeline for a detector, the detector being Ultralytics YOLOv5.
To me, it made sense to have a training task. The training code seemed complex, so I modified it just enough to fetch the dataset and model from ClearML. Nothing more.
I then created a task using clearml-task and pointed it at the repo I had created. The task runs fine.
I'm unsure about the details of the training code...
I've basically just added dataset-ID and model-ID parameters to the args.
I download the dataset and model, load them, and then train again.
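The "download by ID" step can be sketched roughly as below. Dataset.get and InputModel are ClearML SDK calls, but the function name and the way the two paths are used are my own placeholders; a live server and valid IDs are assumed.

```python
def fetch_inputs(dataset_id, model_id):
    """Resolve a ClearML dataset and model ID to local paths for training."""
    from clearml import Dataset, InputModel  # deferred; needs a configured ClearML setup

    # Local cached copy of the dataset's files
    data_dir = Dataset.get(dataset_id=dataset_id).get_local_copy()
    # Local path to the model's weights file
    weights = InputModel(model_id=model_id).get_weights()
    return data_dir, weights
```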
However, cloning it takes the values from the ClearML args, which somehow converts them to strings?
Anyway, in the docs there is a function called task.register_artifact().
It takes a name and an artifact object.
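A minimal sketch of that call: register_artifact takes a name and the artifact object. In ClearML the registered artifact is a *live* object (a pandas DataFrame that is monitored and re-uploaded as it changes); the function and artifact names here are placeholders of my own.

```python
def register_live_table(task, df):
    """Register `df` as a live artifact on a ClearML task.

    task: a clearml.Task instance (e.g. from Task.init(...))
    df:   a pandas DataFrame to keep in sync with the server
    """
    # Signature per the docs: a name plus the artifact object itself
    task.register_artifact(name="live_table", artifact=df)
```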
up to date with https://fawad_nizamani@bitbucket.org/fawad_nizamani/custom_yolov5 ✅
Traceback (most recent call last):
  File "train.py", line 682, in <module>
    main(opt)
  File "train.py", line 525, in main
    assert os.path.isfile(ckpt), 'ERROR: --resume checkpoint does not exist'
AssertionError: ERROR: --resume checkpoint does not exist
This is the original repo, which I've slightly modified.
It's a simple DAG pipeline.
I have a step at which I want to run a task that finds the model I need.
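One way such a "find the model" step could look, as a sketch: Model.query_models is a ClearML SDK call, but the project name, tag, and the pick-the-first-result policy are assumptions for illustration.

```python
def find_model_id(project="detector", tag="production"):
    """Return the ID of a matching published model, or None if none is found."""
    from clearml import Model  # deferred; requires a configured ClearML setup

    models = Model.query_models(
        project_name=project,   # placeholder project
        tags=[tag],             # placeholder tag filter
        only_published=True,
    )
    return models[0].id if models else None
```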
I think I get what you're saying, yeah. I don't know how I would give each server a different cookie name. I can see this problem being resolved by clearing cookies or by manually appending /login to the URL.
Is this how I'm supposed to send the request to stop all running tasks, if task_ids is the list of task IDs that are still running?
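For comparison, a request to the tasks.stop_many endpoint could be sketched like this. The server address is a placeholder, and obtaining an authenticated session (e.g. via /auth.login or key/secret credentials) is assumed to have happened elsewhere.

```python
API_SERVER = "http://localhost:8008"  # placeholder apiserver address

def build_stop_many_body(task_ids, force=True):
    """Build the JSON body for a tasks.stop_many call."""
    return {"ids": list(task_ids), "force": force}

def stop_tasks(task_ids, session):
    """POST to tasks.stop_many using an already-authenticated requests.Session."""
    resp = session.post(
        f"{API_SERVER}/tasks.stop_many",
        json=build_stop_many_body(task_ids),
    )
    resp.raise_for_status()
    return resp.json()
```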
Yes it works, thanks for the overall help.
Shouldn't I get redirected to the login page instead of the dashboard if I'm not logged in? 😞
I have a lot of anonymous tasks running that I would like to stop immediately.