This adds support for writing and reading back in a single
oneshot packet.  This is needed to send a tlb invalidation
and wait for ack in a single operation.
v2: squash sdma hang fix into this patch
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
        .pad_ib = sdma_v4_0_ring_pad_ib,
        .emit_wreg = sdma_v4_0_ring_emit_wreg,
        .emit_reg_wait = sdma_v4_0_ring_emit_reg_wait,
+       .emit_reg_write_reg_wait = amdgpu_ring_emit_reg_write_reg_wait_helper,
 };
 
 static void sdma_v4_0_set_ring_funcs(struct amdgpu_device *adev)