Doing high IOPS testing with blk-cgroups enabled spends ~15-20% of the
time just doing ktime_get_ns() -> readtsc. We essentially read and
set the start time twice, once for the bio and then again when that bio
is mapped to a request.

Given that the time between the two is very short, inherit the bio
start time instead of reading it again. This cuts 1/3rd of the overhead
of the timekeeping.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
        trace_block_rq_issue(rq);
 
        if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
-               rq->io_start_time_ns = ktime_get_ns();
+               u64 start_time;
+#ifdef CONFIG_BLK_CGROUP
+               if (rq->bio)
+                       start_time = bio_issue_time(&rq->bio->bi_issue);
+               else
+#endif
+                       start_time = ktime_get_ns();
+               rq->io_start_time_ns = start_time;
                rq->stats_sectors = blk_rq_sectors(rq);
                rq->rq_flags |= RQF_STATS;
                rq_qos_issue(q, rq);