Move the tx bottom function from NAPI to a new tasklet. Then, for
multi-cores, the bottom functions of tx and rx may be run at same
time with different cores. This is used to improve performance.
On x86, Tx/Rx 943/943 Mbits/sec -> 945/944.
For arm platform, Tx/Rx: 917/917 Mbits/sec -> 933/933.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>