java - How to Diagnose Elasticsearch Search Queue Growth
I'm trying to diagnose an issue where our Elasticsearch search queue seemingly randomly fills up.
The behavior we observe in our monitoring is that on one node of our cluster (just one) the search queue grows, and once the search thread pool is used up we start getting timeouts, of course. There seems to be one query blocking the whole thing. The only way to resolve the problem at the moment is to restart the node.
You can see the relevant behavior in the charts below: first the queue size, then the pending cluster tasks (to show that no other operations are blocking or queuing up, e.g. index operations), and then the active threads of the search thread pool. The spike at 11 o'clock is the restart of the node.
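The queue and active-thread numbers charted here can also be polled directly from the cluster, which makes it easier to catch the affected node early. The following is a minimal sketch, not the asker's monitoring setup: it parses whitespace-separated output in the shape returned by `_cat/thread_pool` and lists the nodes whose search queue is above a threshold. The column selection in `CAT_PATH` follows newer Elasticsearch releases; older releases name the columns differently, so check `_cat/thread_pool?help` on your version. The class name, the threshold, and the sample data are made up for illustration, and node names containing spaces are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class SearchQueueCheck {
    // Assumed request path (newer releases); verify the column names on your version.
    static final String CAT_PATH = "/_cat/thread_pool/search?h=node_name,active,queue,rejected";

    /**
     * Returns the names of nodes whose search queue size is above the threshold.
     * Expects one node per line: node_name active queue rejected.
     */
    static List<String> nodesOverThreshold(String catOutput, int queueThreshold) {
        List<String> result = new ArrayList<>();
        for (String line : catOutput.split("\n")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 4) {
                continue; // blank or malformed line
            }
            int queue;
            try {
                queue = Integer.parseInt(cols[2]);
            } catch (NumberFormatException e) {
                continue; // header row or unexpected value
            }
            if (queue > queueThreshold) {
                result.add(cols[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed column order.
        String sample = "node-1 2 0 0\nnode-2 16 874 12\nnode-3 1 0 0\n";
        System.out.println("GET " + CAT_PATH);
        System.out.println(nodesOverThreshold(sample, 100));
    }
}
```

Polling this every few seconds and alerting on a sustained non-zero queue on a single node would flag the situation before the client timeouts start.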
The log files on the nodes show no entries during the hour before or after the issue, until we restarted the node. There are only garbage collection events of around 200-600 ms, and only one of them on the relevant node, around 20 minutes before the event.
My questions:
- How can I debug this, given that no information is logged anywhere about a failing or timing-out query?
- What are possible reasons for this? We don't have dynamic queries or similar.
- Can I set a query timeout, or clear / reset the active searches when this happens, to prevent a node restart?
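On the timeout part of the last question: the search request body accepts a `timeout` field, and the response carries a `timed_out` flag when the limit was hit and partial results were returned. The timeout is best effort, so a query that is stuck somewhere the check is not performed may not be interrupted by it. The sketch below only shows how such a body could be built and how the flag could be detected; the class and method names are hypothetical, and the regex check is a crude stand-in for a real JSON parser.

```java
import java.util.regex.Pattern;

class SearchTimeout {
    private static final Pattern TIMED_OUT =
            Pattern.compile("\"timed_out\"\\s*:\\s*true");

    /** Wraps an already serialized query in a search body with a timeout, e.g. "2s". */
    static String withTimeout(String queryJson, String timeout) {
        return "{\"timeout\":\"" + timeout + "\",\"query\":" + queryJson + "}";
    }

    /** Returns true if the search response reports that the timeout was hit. */
    static boolean timedOut(String responseJson) {
        return TIMED_OUT.matcher(responseJson).find();
    }

    public static void main(String[] args) {
        String body = withTimeout("{\"match_all\":{}}", "2s");
        System.out.println(body);
        // Made-up response fragment for illustration.
        System.out.println(timedOut("{\"took\":2003,\"timed_out\":true}"));
    }
}
```

Logging every response where `timedOut` returns true, together with the query that produced it, would at least leave a client-side trace of which queries run long.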
Some more details on what doesn't apply, based on the questions so far:
- Exactly the same hardware on all nodes (16 cores, 60 GB memory)
- Same config, no special nodes
- No swap enabled
- Nothing noticeable on other metrics such as IO or CPU
- Not the master node
- No special shards, 3 shards per node on each node, pretty standard queries. The queries sent to ES 10 minutes before the event typically finish within 5-10 ms, and the ones that time out are the same queries; there is no increase in query rate or anything else
- We have a 5-node deployment, accessed round robin
- We have a slow log threshold of 2 seconds at info level, with no entries
The hot threads after 1 minute of queue build-up are at https://gist.github.com/elm-/5ed398054ea6b46522c0, with several snapshots of the dumps taken a few moments apart.
Andrei Stefan's answer isn't wrong, but I'd start by looking at the hot_threads of the clogged node rather than trying to figure out what might be special about that node.
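A minimal sketch of pulling hot threads for just the clogged node, assuming Java 11+ for `java.net.http` and a cluster reachable over plain HTTP. The nodes hot threads endpoint accepts a `threads` count and a `type` of `cpu`, `wait`, or `block`; since CPU looked normal in the question, `wait` or `block` may be more revealing than the default. The node name `node-3`, the base URL, and the class name are placeholders, and node names with characters that are not URL-safe would need encoding first.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class HotThreadsSnapshot {
    /** Builds the hot threads URI for a single node. */
    static URI hotThreadsUri(String baseUrl, String node, int threads, String type) {
        return URI.create(baseUrl + "/_nodes/" + node
                + "/hot_threads?threads=" + threads + "&type=" + type);
    }

    /** Fetches the plain-text hot threads report. */
    static String fetch(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder defaults; pass the base URL and node name as arguments.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        String node = args.length > 1 ? args[1] : "node-3";
        URI uri = hotThreadsUri(baseUrl, node, 5, "wait");
        System.out.println("GET " + uri);
        if (args.length > 0) {
            // Only contact the cluster when a base URL was given explicitly.
            System.out.println(fetch(HttpClient.newHttpClient(), uri));
        }
    }
}
```

Taking a few of these snapshots some seconds apart, as was done for the gist in the question, shows whether the same search threads stay stuck in the same stack frames.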
I don't know of a way to look inside the queue. Slowlogs, as Andrei says, are a great idea though.
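Since the existing 2-second slow log produced no entries, one thing to try is lowering the thresholds on the affected indices. The sketch below builds a `PUT /{index}/_settings` request using the `index.search.slowlog.threshold.query.warn` and `index.search.slowlog.threshold.fetch.warn` settings; the index name, the threshold values, and the class name are placeholders chosen for illustration, and Java 11+ is assumed for `java.net.http`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class SlowlogSettings {
    /** Settings path for one index. */
    static String settingsPath(String index) {
        return "/" + index + "/_settings";
    }

    /** JSON body that sets the warn thresholds for the query and fetch phases. */
    static String thresholdsJson(String queryWarn, String fetchWarn) {
        return "{"
                + "\"index.search.slowlog.threshold.query.warn\":\"" + queryWarn + "\","
                + "\"index.search.slowlog.threshold.fetch.warn\":\"" + fetchWarn + "\""
                + "}";
    }

    /** Builds (but does not send) the settings update request. */
    static HttpRequest buildRequest(String baseUrl, String index, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + settingsPath(index)))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder index name and thresholds.
        String body = thresholdsJson("500ms", "200ms");
        HttpRequest request = buildRequest("http://localhost:9200", "my-index", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

The request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Lower thresholds mean more log volume, so it is probably worth reverting them once the offending query has been caught.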
