Hello,
I'm getting this weird error from time to time when running a pipeline. It adds my tasks as drafts but never launches them. When I checked the logs, I see the following:
launch step one
2022-02-25 13:46:31,253 - clearml.Task - ERROR - Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=945ff9ec87904964a0c7763467033e26, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 13:46:31,253 - clearml.Task - ERROR - Task deletion failed: Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=945ff9ec87904964a0c7763467033e26, order=asc, batch_size=100, event_type=training_debug_image)
launch step two
2022-02-25 13:46:31,417 - clearml.Task - ERROR - Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=88be3bfc9e784a5d8cfb7836e22ed3f3, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 13:46:31,417 - clearml.Task - ERROR - Task deletion failed: Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=88be3bfc9e784a5d8cfb7836e22ed3f3, order=asc, batch_size=100, event_type=training_debug_image)
launch step three
2022-02-25 13:46:31,684 - clearml.Task - ERROR - Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=aa026690cdbc46a9bef3c53764e2dda7, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 13:46:31,684 - clearml.Task - ERROR - Task deletion failed: Action failed <500/100: events.get_task_events/v1.0 (General data error (NotFoundError(404, 'index_not_found_exception', 'no such index [events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b]', events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b, index_or_alias)))> (task=aa026690cdbc46a9bef3c53764e2dda7, order=asc, batch_size=100, event_type=training_debug_image)
2022-02-25 14:46:37
pipeline completed with model: <xgboost.core.Booster object at 0x7f9e85a45a90>
2022-02-25 13:46:32,061 - clearml.Task - INFO - Waiting to finish uploads
2022-02-25 14:46:42
2022-02-25 13:46:41,899 - clearml.Task - INFO - Finished uploading

  
  
Posted one year ago

Answers 14


Can you give an example of a pipeline to play with?
Are you running a self-deployed server?

  
  
Posted one year ago

I have ClearML running on a k8s cluster.

  
  
Posted one year ago

and I'm executing the pipeline script locally

  
  
Posted one year ago

Can you gain access to the apiserver logs?
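(On a k8s deployment they are usually reachable with something like kubectl logs -n <namespace> deployment/clearml-apiserver; the exact resource name depends on your chart.)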

  
  
Posted one year ago

all the messages are like that

  
  
Posted one year ago

I think the issue is coming from task caching, because once I deactivated it, it started working again.

  
  
Posted one year ago

BulkyTiger31 could it be there is some issue with the Elastic container?
Can you see any experiment's metrics?
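If you want to check the index directly, a quick sanity check (assuming the Elasticsearch container is reachable on localhost:9200, e.g. via a port-forward) would be something like:

import requests

# list the ClearML event indices; the index from the error should show up here if it still exists
print(requests.get("http://localhost:9200/_cat/indices/events-*?v").text)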

  
  
Posted one year ago

How did you do that?

  
  
Posted one year ago

CostlyOstrich36 on the pipeline decorator there is a cache parameter; I disabled it.
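Something like this, roughly (a minimal sketch; the step and names are just placeholders):

from clearml import PipelineDecorator

@PipelineDecorator.component(cache=False)  # per-step task caching disabled
def step_one():
    return 42

@PipelineDecorator.pipeline(name="my pipeline", project="examples", version="0.1")
def pipeline_logic():
    step_one()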

  
  
Posted one year ago

AgitatedDove14 in the logs I see nothing out of the ordinary, and I tried redeploying the container and removing the persistence volume attached to it, but I still got the same error.

  
  
Posted one year ago

Actually, in the apiserver logs I see this:

  
  
Posted one year ago

It's the same error I'm getting on the ClearML dashboard.

  
  
Posted one year ago

Yep... something went wrong with the Elastic container, I think it lost its indexes (or they got corrupted somehow).
Do you have a backup of the persistence volume attached to the container? Can you try restoring it?

I would restart the entire clearml-server (docker-compose); then can you post the startup logs here? They should provide some info on what's wrong.

  
  
Posted one year ago