Add a new blk_rq_dma_map / blk_rq_dma_unmap pair that does away with
the wasteful scatterlist structure. Instead it uses the mapping iterator
to either add segments to the IOVA for IOMMU operations, or just map
them one by one for the direct mapping. For the IOMMU case, instead of
a scatterlist with an entry for each segment, only a single [dma_addr,len]
pair needs to be stored for processing a request, and for the direct
mapping the per-segment allocation shrinks from
[page,offset,len,dma_addr,dma_len] to just [dma_addr,len].
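
As a rough illustration of the storage difference described above, here
is a minimal userspace sketch. All names with a sketch_ prefix are
hypothetical and are not the kernel's definitions; the field layout of
the scatterlist-like entry is simplified, and 64-bit DMA addresses are
assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: DMA addresses are 64 bits wide. */
typedef uint64_t sketch_dma_addr_t;

/* Simplified stand-in for the per-segment fields of a scatterlist entry:
 * [page,offset,len,dma_addr,dma_len]. Not the real struct scatterlist. */
struct sketch_sg_entry {
	void *page;
	unsigned int offset;
	unsigned int len;
	sketch_dma_addr_t dma_addr;
	unsigned int dma_len;
};

/* Reduced per-segment record: just [dma_addr,len]. */
struct sketch_dma_vec {
	sketch_dma_addr_t addr;
	unsigned int len;
};

/* Per-request mapping state when every segment gets a scatterlist entry. */
static size_t sketch_sg_bytes(size_t nr_segs)
{
	return nr_segs * sizeof(struct sketch_sg_entry);
}

/* Direct mapping: one [dma_addr,len] record per segment. */
static size_t sketch_direct_bytes(size_t nr_segs)
{
	return nr_segs * sizeof(struct sketch_dma_vec);
}

/* IOMMU case with a collapsed IOVA range: a single [dma_addr,len] pair,
 * independent of the number of segments. */
static size_t sketch_iommu_bytes(size_t nr_segs)
{
	(void)nr_segs;
	return sizeof(struct sketch_dma_vec);
}
```

The exact byte counts depend on the ABI; the point is only that the
direct mapping stores fewer fields per segment, and the collapsed IOMMU
mapping does not grow with the segment count.
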
The major downside of this API is that the IOVA collapsing only works
when the driver sets a virt_boundary that matches the IOMMU granule.
Note that struct blk_dma_vec, struct blk_dma_mapping and blk_rq_dma_unmap
aren't really block specific, but for now they are kept with the only
mapping routine to keep things simple.