FYI CostlyOstrich36
After a ClearML restart, all experiments appear again 😃
CostlyOstrich36
NO errors in developer tools and result code is 200:{"meta":{"id":"f131cde7b77545a5b4802e73f1b5e78e","trx":"f131cde7b77545a5b4802e73f1b5e78e","endpoint":{"name":"tasks.get_all_ex","requested_version":"2.17","actual_version":"1.0"},"result_code":200,"result_subcode":0,"result_msg":"OK","error_stack":"","error_data":{}},"data":{"tasks":[{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"4b2fcf54203e4930b7a9a7b511e31ca3","last_change":"2022-11-21T20:40:53.107000+00:00","last_iteration":274900,"last_update":"2022-11-21T20:40:53.107000+00:00","name":"mae_arch_masked_75_random_fix-eval","project":{"id":"316e67462401437e9f17971564a3e5e2","name":"beit_v2/ablation"},"started":"2022-11-21T17:50:09.835000+00:00","status":"completed","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"6badc9f25ae44e1ea50c3907c54304f9","last_change":"2022-11-21T17:57:09.176000+00:00","last_iteration":10,"last_update":"2022-11-21T17:57:09.176000+00:00","name":"bb_fitter_cpu_21","project":{"id":"ed78ce96b0654383be5de08b4a49a437","name":"box_fitter"},"started":"2022-11-21T17:51:00.789000+00:00","status":"stopped","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"3ac0f27555fe4c0fb90d608d88dd781f","last_change":"2022-11-22T07:43:08.636000+00:00","last_iteration":27,"last_update":"2022-11-22T07:43:08.636000+00:00","name":"cp-141-filter-below-32-pts-evaluate","project":{"id":"aa617227670a4c65b314def279557ddd","name":"fusion-ptk/centerpoint"},"started":"2022-11-21T18:01:53.991000+00:00","status":"completed","system_tags":["development"],"tags":["adamkapl"],"type":"testing","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default 
user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"c0b950b59d8e430d95a67c3e515bcbe9","last_change":"2022-11-21T20:08:04.351000+00:00","last_iteration":746,"last_update":"2022-11-21T20:08:04.351000+00:00","name":"LRG-hm-only-10epochs-7frames-evaluate","project":{"id":"60ff76d2a0b44527ab54778aae0125cc","name":"fusion-ptk/LidarRoadGeometryGen2"},"started":"2022-11-21T18:19:06.004000+00:00","status":"completed","system_tags":["development"],"tags":["aryehn"],"type":"testing","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"952d80049c5a49e6bd9abb5e422db034","last_change":"2022-11-21T20:29:23.826000+00:00","last_iteration":3,"last_update":"2022-11-21T20:29:23.826000+00:00","name":"bb_fitter_cpu_11","project":{"id":"ed78ce96b0654383be5de08b4a49a437","name":"box_fitter"},"started":"2022-11-21T18:21:46.538000+00:00","status":"stopped","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"b274410fe2df4999acceb9a4552512af","last_change":"2022-11-22T01:32:54.475000+00:00","last_iteration":26480,"last_update":"2022-11-22T01:32:54.475000+00:00","name":"cp-142-fltr-post-agmnt-front","project":{"id":"aa617227670a4c65b314def279557ddd","name":"fusion-ptk/centerpoint"},"started":"2022-11-21T18:31:39.042000+00:00","status":"completed","system_tags":["development"],"tags":["adamkapl"],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default 
user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"a41e7e90f4f947febae4e465b7c573b4","last_change":"2022-11-22T08:56:25.553000+00:00","last_iteration":145,"last_update":"2022-11-22T08:56:25.553000+00:00","name":"bb_fitter_cpu1_2","project":{"id":"ed78ce96b0654383be5de08b4a49a437","name":"box_fitter"},"started":"2022-11-21T18:32:50.456000+00:00","status":"stopped","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"45040a9e5ad34dc7a19efe5261af3555","last_change":"2022-11-23T13:05:53.042000+00:00","last_iteration":240000,"last_update":"2022-11-23T13:05:53.042000+00:00","name":"cloud-head-no-dups-mask","project":{"id":"6249944a4b6a401b8c5b429ce6e49232","name":"mae/tsr-downstream-classification"},"started":"2022-11-21T18:33:00.953000+00:00","status":"stopped","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"60961a23b45c4dbe81ff693d53cc7873","last_change":"2022-11-22T01:09:12.390000+00:00","last_iteration":26000,"last_update":"2022-11-22T01:09:12.390000+00:00","name":"cp-143-fltr-post-agmnt-right","project":{"id":"aa617227670a4c65b314def279557ddd","name":"fusion-ptk/centerpoint"},"started":"2022-11-21T18:34:04.889000+00:00","status":"completed","system_tags":["development"],"tags":["adamkapl"],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default 
user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"344cee521d854601a2d23a07d7c5af3c","last_change":"2022-11-23T11:29:34.993000+00:00","last_iteration":250000,"last_update":"2022-11-23T11:29:34.993000+00:00","name":"cloud-head-no-dups-w-mean-class","project":{"id":"6249944a4b6a401b8c5b429ce6e49232","name":"mae/tsr-downstream-classification"},"started":"2022-11-21T18:36:50.327000+00:00","status":"stopped","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"f926d65de1384aef89c3f25f8baa6fd0","last_change":"2022-11-23T14:29:36.020000+00:00","last_iteration":250000,"last_update":"2022-11-23T14:29:36.020000+00:00","name":"cloud-head-no-dups-w-mean-feature","project":{"id":"6249944a4b6a401b8c5b429ce6e49232","name":"mae/tsr-downstream-classification"},"started":"2022-11-21T18:42:55.661000+00:00","status":"stopped","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"00bf5ba428ea4e9db2d773fb26ecab13","last_change":"2022-11-22T01:45:16.395000+00:00","last_iteration":39,"last_update":"2022-11-22T01:45:16.395000+00:00","name":"cp-142-fltr-post-agmnt-front-evaluate","project":{"id":"aa617227670a4c65b314def279557ddd","name":"fusion-ptk/centerpoint"},"started":"2022-11-21T18:49:16.200000+00:00","status":"completed","system_tags":["development"],"tags":["adamkapl"],"type":"testing","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default 
user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"10eb8c3cbe7f4d188c5c30073da077d8","last_change":"2022-11-22T01:19:58.806000+00:00","last_iteration":39,"last_update":"2022-11-22T01:19:58.806000+00:00","name":"cp-143-fltr-post-agmnt-right-evaluate","project":{"id":"aa617227670a4c65b314def279557ddd","name":"fusion-ptk/centerpoint"},"started":"2022-11-21T18:51:21.202000+00:00","status":"completed","system_tags":["development"],"tags":["adamkapl"],"type":"testing","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"367b275d60b646aa8781e1f421c4eec8","last_change":"2022-11-22T13:28:26.071000+00:00","last_iteration":107300,"last_update":"2022-11-22T13:28:26.071000+00:00","name":"bf_main_aug_wd001_5a","project":{"id":"ed78ce96b0654383be5de08b4a49a437","name":"box_fitter"},"started":"2022-11-21T19:43:48.115000+00:00","status":"completed","system_tags":["development"],"tags":[],"type":"training","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}},{"company":{"id":"d1bd92a3b039400cbafc60a7a5b1e52b"},"id":"54fde807ef204a1a87b1825fc6b7f91e","last_change":"2022-11-23T16:14:36.763000+00:00","last_iteration":2,"last_update":"2022-11-23T16:14:36.763000+00:00","name":"cp-128-fix-negloss","project":{"id":"aa617227670a4c65b314def279557ddd","name":"fusion-ptk/centerpoint"},"started":"2022-11-21T20:37:05.014000+00:00","status":"stopped","system_tags":["development"],"tags":["nivk","shazut"],"type":"testing","user":{"id":"0283ba889ae7105ba1db4e91bdada228","name":"Trains default user"}}],"scroll_id":"ac0595892c3f4ca59c02864df56f3adf"}}
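A response this long is hard to read by eye. As a rough sketch, a small helper pasted into the browser console can count the returned tasks per status and per project. The field names (`data.tasks`, `status`, `project.name`) are taken from the response above; the function name `summarizeTasks` is made up for illustration.

```javascript
// Summarize an already-parsed tasks.get_all_ex response body.
// Assumes the shape shown in the response above: { data: { tasks: [...] } }.
function summarizeTasks(response) {
  const tasks = (response.data && response.data.tasks) || [];
  const byStatus = {};
  const byProject = {};
  for (const t of tasks) {
    // Fall back to "unknown" when a field is missing from a task entry.
    const status = t.status || "unknown";
    const project = (t.project && t.project.name) || "unknown";
    byStatus[status] = (byStatus[status] || 0) + 1;
    byProject[project] = (byProject[project] || 0) + 1;
  }
  return { total: tasks.length, byStatus, byProject };
}
```

Usage would be something like `summarizeTasks(JSON.parse(copiedResponseText))`, where `copiedResponseText` is the response body copied from the Network tab.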
In the UI, open developer tools (F12), go to 'All Experiments', and check which request is sent and what is returned for tasks.get_all_ex
We tried copying to a test machine and it worked (we deleted and then restored tasks in the DB, and they appeared again in the UI).
When we did the same on prod, nothing happened.
Also, we see that the data on the fileserver has not been removed and Mongo shows 17000 tasks, so it looks like the tasks were removed from the UI but still exist in MongoDB and on the local file system.
CostlyOstrich36
` > db.task.count()
17262
> db.task__trash.count()
383
> db.task__trash__trash.count()
15
> db.task__trash.aggregate([ {$merge: "task"}])
> db.task__trash__trash.count()
15
> db.task__trash.count()
385
> db.task.count()
17647 `
Thanks CostlyOstrich36
Actually, I was able to find the IP of the machine that triggered the API call in the web logs, and from it the user who ran the delete action.
The user tried to remove only the archived experiments in his project (he tried several times and got some errors), and that is what we see in the API call. Somehow ClearML removed all experiments on the server 🤔
Any idea why this might have happened if the user only ran "delete archived experiments of his project" in the web UI?
xx.xxx.xxx.xx - - [21/Nov/2022:17:39:14 +0000] "POST /api/v2.17/tasks.delete_many HTTP/1.1" 499 0 "
" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36" "-"
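For anyone who needs to repeat this search, here is a minimal sketch that scans access-log lines for `tasks.delete_many` calls and pulls out the client IP, timestamp, and status code. It assumes the lines follow the layout of the entry above; the function name `findDeleteCalls` and the regular expression are illustrative, not part of ClearML.

```javascript
// Matches: <ip> <ident> <user> [<time>] "<method> <path> <protocol>" <status> <bytes>
// The layout is assumed from the single log entry quoted above.
const LOG_LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)/;

// Return one record per log line that calls the tasks.delete_many endpoint.
function findDeleteCalls(lines) {
  const hits = [];
  for (const line of lines) {
    const m = LOG_LINE.exec(line);
    if (!m) continue; // skip lines that do not follow the assumed layout
    const [, ip, time, method, path, status] = m;
    if (path.includes("tasks.delete_many")) {
      hits.push({ ip, time, method, path, status: Number(status) });
    }
  }
  return hits;
}
```

With Node.js this could be fed from a file, e.g. `findDeleteCalls(require("fs").readFileSync(logPath, "utf8").split("\n"))`, where `logPath` is wherever your web server writes its access log.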
You can restore these tasks by copying or moving them from the task__trash collection into the task collection, but the events for these tasks cannot be restored. As for the user who deleted them: unfortunately, ClearML does not record this information in Mongo, and without logging to ES there is no place to retrieve it (I can suggest using Kibana to monitor ES). You can try inspecting the Mongo collection url_to_delete. It contains all the links from the deleted tasks that should be removed from the fileserver. If you see any documents there that correspond to files from the deleted tasks, then the user recorded in those documents is the one who performed the delete.
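The copy from task__trash into task can be sketched as an aggregation pipeline ending in `$merge`. The helper below only builds the pipeline; `buildRestorePipeline` is a hypothetical name, and matching on `_id` assumes that the task ID is stored as the document's `_id`, which you should verify on your own data first. `whenMatched: "keepExisting"` is chosen here so that documents already present in task are not overwritten. Take a backup before running anything like this on a production database.

```javascript
// Build an aggregation pipeline that copies documents from task__trash
// into the task collection. With no IDs given, every trashed task is copied.
// Assumption: task IDs are stored in the _id field.
function buildRestorePipeline(taskIds) {
  const pipeline = [];
  if (Array.isArray(taskIds) && taskIds.length > 0) {
    // Restrict the copy to the listed task IDs.
    pipeline.push({ $match: { _id: { $in: taskIds } } });
  }
  pipeline.push({
    $merge: {
      into: "task",
      on: "_id",
      whenMatched: "keepExisting", // leave tasks that already exist untouched
      whenNotMatched: "insert",    // copy tasks that are missing from task
    },
  });
  return pipeline;
}

// Possible use in the mongo shell (not executed here):
//   db.task__trash.aggregate(buildRestorePipeline(["<task id>"]))
```

This copies rather than moves, so the documents also stay in task__trash. As noted above, the events of the deleted tasks are not brought back by this.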
Hi CostlyOstrich36
We do indeed see the tasks in the task__trash collection in the MongoDB backend database.
Is there any way to restore them?
Also, can we see in the logs who triggered the command?
Please check the task__trash collection in the MongoDB backend database. If you find all your tasks there, then someone did indeed delete them
Elastic only holds part of the task data. Mongo is what actually stores the task objects. Can you look inside to see what's there?
Hi LackadaisicalHedgehong78. It seems that someone or something sent a command to delete a bunch of tasks. Do you have backups?