Answered

Hello,
I'm getting this weird error from time to time when running a pipeline: it adds my tasks as drafts but never launches them. When I checked the logs, I saw the following:
launch step one
2022-02-25 13:46:31,253 - clearml.Task - ERROR - Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=945ff9ec87904964a0c7763467033e26, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 13:46:31,253 - clearml.Task - ERROR - Task deletion failed: Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=945ff9ec87904964a0c7763467033e26, order=asc, batch_size=100, event_type=training_debug_image)
launch step two
2022-02-25 13:46:31,417 - clearml.Task - ERROR - Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=88be3bfc9e784a5d8cfb7836e22ed3f3, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 13:46:31,417 - clearml.Task - ERROR - Task deletion failed: Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=88be3bfc9e784a5d8cfb7836e22ed3f3, order=asc, batch_size=100, event_type=training_debug_image)
launch step three
2022-02-25 13:46:31,684 - clearml.Task - ERROR - Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=aa026690cdbc46a9bef3c53764e2dda7, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 13:46:31,684 - clearml.Task - ERROR - Task deletion failed: Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=aa026690cdbc46a9bef3c53764e2dda7, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 14:46:37
pipeline completed with model: <xgboost.core.Booster object at 0x7f9e85a45a90>
2022-02-25 13:46:32,061 - clearml.Task - INFO - Waiting to finish uploads
2022-02-25 14:46:42
2022-02-25 13:46:41,899 - clearml.Task - INFO - Finished uploading
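
The failing call above is events.get_task_events, and the 404 inside it is Elasticsearch reporting a missing index. A minimal sketch to reproduce the same request outside the pipeline, assuming your local clearml.conf points at the same server (task ID and parameters are copied from the log above):

from clearml.backend_api.session.client import APIClient

# Call the same endpoint that fails in the log, with the same arguments,
# to confirm the 500 reproduces outside the pipeline run.
client = APIClient()
res = client.events.get_task_events(
    task="945ff9ec87904964a0c7763467033e26",  # task ID from the log
    order="asc",
    batch_size=100,
    event_type="training_debug_image",
)
print(res)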

Posted 2 years ago

Answers 14


I think the issue is coming from task caching, because once I deactivated it, it started working again

Posted 2 years ago

Can you give an example of a pipeline to play with?
Are you running self-deployed?

Posted 2 years ago

And I'm executing the pipeline script locally

Posted 2 years ago

AgitatedDove14 in the logs I see nothing out of the ordinary, and I tried redeploying the container and removing the persistence volume attached to it, but I still got the same error

Posted 2 years ago

BulkyTiger31 could it be there is some issue with the elastic container?
Can you see any experiment's metrics?
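
One hedged way to check that from the SDK rather than the UI — assuming you have the ID of an experiment that previously reported scalars (the ID below is a placeholder):

from clearml import Task

# Fetch scalar metrics for a known experiment; this goes through the same
# events.* endpoints the dashboard uses, so a broken Elasticsearch index
# should fail here too.
task = Task.get_task(task_id="<some-existing-task-id>")
print(task.get_reported_scalars())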

Posted 2 years ago

How did you do that?

Posted 2 years ago

It's the same error I'm getting on the ClearML dashboard

Posted 2 years ago

All the messages are like that

Posted 2 years ago

Yep... something went wrong with the elastic container, I think it lost its indexes (or they got corrupted somehow)
Do you have a backup of the persistence volume attached to the container? Can you try restoring it?

I would restart the entire clearml-server (docker-compose), then can you post the startup logs here? They should provide some info on what's wrong
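
And a minimal sketch for checking whether the event indices survived in Elasticsearch — this assumes Elasticsearch's default port 9200 is reachable from wherever you run it (adjust host/port for your deployment):

import json
import urllib.request

# List the ClearML event indices Elasticsearch currently knows about.
# If events-training_debug_image-* is missing here, the 404 in the log follows.
url = "http://localhost:9200/_cat/indices/events-*?format=json"
with urllib.request.urlopen(url) as resp:
    for idx in json.load(resp):
        print(idx["index"], idx["health"], idx["docs.count"])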

Posted 2 years ago

I have ClearML running on a k8s cluster

Posted 2 years ago

Can you gain access to the apiserver logs?

Posted 2 years ago

CostlyOstrich36 on the pipeline decorator there is a cache parameter; I disabled it.
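
For context, a minimal sketch of that setup — the step names and bodies are illustrative, not the actual pipeline from this thread:

from clearml import PipelineDecorator

@PipelineDecorator.component(cache=False)  # task caching disabled for this step
def step_one(x: int) -> int:
    return x + 1

@PipelineDecorator.pipeline(name="example-pipeline", project="examples", version="0.1")
def run_pipeline(x: int = 1):
    return step_one(x)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # executing the pipeline script locally
    run_pipeline()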

Posted 2 years ago

Actually, in the apiserver logs I see this:

Posted 2 years ago