Distributed key-value stores employ replication for high availability. Yet they do not always take efficient advantage of the multiple replicas available for each value, and read operations often exhibit high tail latencies. Various replica selection strategies have been proposed to address this problem, together with local request scheduling policies. It is difficult, however, to determine the absolute performance gain that each of these strategies can achieve. We present a formal framework that allows the systematic study of request scheduling strategies in key-value stores. We contribute a definition of the optimization problem of reducing tail latency in a replicated key-value store as a minimization problem with respect to the maximum weighted flow criterion. Using scheduling theory, we show the difficulty of this problem, and therefore the need to develop performance guarantees. We also study the behavior of heuristic methods using simulations, which highlight the properties that are useful for limiting tail latency: for example, the EFT strategy, which uses the earliest available time of servers, exhibits a tail latency less than half that of state-of-the-art strategies and often matches the lower bound. Our study also illustrates the importance of considering other metrics, such as the stretch, to properly evaluate a replica selection strategy and a local execution policy.
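To make the criteria named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes requests are handled in arrival order, that processing times are known in advance, and that each server runs one request at a time without preemption. The function names (`eft_schedule`, `max_weighted_flow`, `max_stretch`) and the tuple layout are hypothetical. The flow of a request is taken in the standard scheduling sense, completion time minus release time, and the stretch as flow divided by processing time.

```python
def eft_schedule(requests, num_servers):
    """Assign each request to the replica that becomes available earliest.

    requests: list of (release_time, processing_time, replica_ids),
              assumed sorted by release time (illustrative layout).
    Returns a list of (server, start, completion), one per request.
    """
    free_at = [0.0] * num_servers  # time at which each server next becomes idle
    schedule = []
    for release, proc, replicas in requests:
        # Pick the replica with the earliest possible start time;
        # ties are broken by the lowest server id.
        server = min(replicas, key=lambda s: (max(free_at[s], release), s))
        start = max(free_at[server], release)
        completion = start + proc
        free_at[server] = completion
        schedule.append((server, start, completion))
    return schedule


def max_weighted_flow(requests, schedule, weights):
    """Maximum over requests of weight * (completion - release)."""
    return max(
        w * (completion - release)
        for w, (release, _, _), (_, _, completion) in zip(weights, requests, schedule)
    )


def max_stretch(requests, schedule):
    """Maximum weighted flow with weights 1 / processing_time."""
    weights = [1.0 / proc for _, proc, _ in requests]
    return max_weighted_flow(requests, schedule, weights)


if __name__ == "__main__":
    # Four read requests on two servers; the third is stored on server 0 only.
    demo = [(0, 4, [0, 1]), (1, 2, [0, 1]), (2, 1, [0]), (2, 3, [0, 1])]
    plan = eft_schedule(demo, num_servers=2)
    print(plan)
    print(max_weighted_flow(demo, plan, [1, 1, 1, 1]))
    print(max_stretch(demo, plan))
```

In this toy instance the short request that can only be served by server 0 waits behind a long one, so its stretch is much larger than its flow suggests. This is one way to see why a metric such as the stretch can be informative alongside the maximum flow.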
Taming Tail Latency in Key-Value Stores: a Scheduling Perspective