The newly-added tracepoint shows the following results on
the tscdeadline_latency test:
        qemu-kvm-8387  [002]  6425.558974: kvm_vcpu_wakeup:      poll time 10407 ns
        qemu-kvm-8387  [002]  6425.558984: kvm_vcpu_wakeup:      poll time 0 ns
        qemu-kvm-8387  [002]  6425.561242: kvm_vcpu_wakeup:      poll time 10477 ns
        qemu-kvm-8387  [002]  6425.561251: kvm_vcpu_wakeup:      poll time 0 ns
and so on.  This is because we need to go through kvm_vcpu_block again
after the timer IRQ is injected.  Avoid it by polling once before
entering kvm_vcpu_block.
On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%)
from the latency of the TSC deadline timer.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
 
 static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
 {
-       srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
-       kvm_vcpu_block(vcpu);
-       vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
-
-       if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
-               return 1;
+       if (!kvm_arch_vcpu_runnable(vcpu)) {
+               srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+               kvm_vcpu_block(vcpu);
+               vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+               if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
+                       return 1;
+       }
 
        kvm_apic_accept_events(vcpu);
        switch(vcpu->arch.mp_state) {