Currently the destination rep pointer is only used for comparisons or to
obtain vport number from it. Since it is used both during flow creation and
deletion it may point to representor of another eswitch instance which can
be deallocated during driver unload even when there are rules pointing to
it[0]. Refactor the code to store vport number and 'valid' flag instead of
the representor pointer.
[0]:
[176805.886303] ==================================================================
[176805.889433] BUG: KASAN: slab-use-after-free in esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.892981] Read of size 2 at addr 
ffff888155090aa0 by task modprobe/27280
[176805.895462] CPU: 3 PID: 27280 Comm: modprobe Tainted: G    B              6.6.0-rc3+ #1
[176805.896771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[176805.898514] Call Trace:
[176805.899026]  <TASK>
[176805.899519]  dump_stack_lvl+0x33/0x50
[176805.900221]  print_report+0xc2/0x610
[176805.900893]  ? mlx5_chains_put_table+0x33d/0x8d0 [mlx5_core]
[176805.901897]  ? esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.902852]  kasan_report+0xac/0xe0
[176805.903509]  ? esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.904461]  esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.905223]  __mlx5_eswitch_del_rule+0x1ae/0x460 [mlx5_core]
[176805.906044]  ? esw_cleanup_dests+0x440/0x440 [mlx5_core]
[176805.906822]  ? xas_find_conflict+0x420/0x420
[176805.907496]  ? down_read+0x11e/0x200
[176805.908046]  mlx5e_tc_rule_unoffload+0xc4/0x2a0 [mlx5_core]
[176805.908844]  mlx5e_tc_del_fdb_flow+0x7da/0xb10 [mlx5_core]
[176805.909597]  mlx5e_flow_put+0x4b/0x80 [mlx5_core]
[176805.910275]  mlx5e_delete_flower+0x5b4/0xb70 [mlx5_core]
[176805.911010]  tc_setup_cb_reoffload+0x27/0xb0
[176805.911648]  fl_reoffload+0x62d/0x900 [cls_flower]
[176805.912313]  ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core]
[176805.913151]  ? __fl_put+0x230/0x230 [cls_flower]
[176805.913768]  ? filter_irq_stacks+0x90/0x90
[176805.914335]  ? kasan_save_stack+0x1e/0x40
[176805.914893]  ? kasan_set_track+0x21/0x30
[176805.915484]  ? kasan_save_free_info+0x27/0x40
[176805.916105]  tcf_block_playback_offloads+0x79/0x1f0
[176805.916773]  ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core]
[176805.917647]  tcf_block_unbind+0x12d/0x330
[176805.918239]  tcf_block_offload_cmd.isra.0+0x24e/0x320
[176805.918953]  ? tcf_block_bind+0x770/0x770
[176805.919551]  ? _raw_read_unlock_irqrestore+0x30/0x30
[176805.920236]  ? mutex_lock+0x7d/0xd0
[176805.920735]  ? mutex_unlock+0x80/0xd0
[176805.921255]  tcf_block_offload_unbind+0xa5/0x120
[176805.921909]  __tcf_block_put+0xc2/0x2d0
[176805.922467]  ingress_destroy+0xf4/0x3d0 [sch_ingress]
[176805.923178]  __qdisc_destroy+0x9d/0x280
[176805.923741]  dev_shutdown+0x1c6/0x330
[176805.924295]  unregister_netdevice_many_notify+0x6ef/0x1500
[176805.925034]  ? netdev_freemem+0x50/0x50
[176805.925610]  ? _raw_spin_lock_irq+0x7b/0xd0
[176805.926235]  ? _raw_spin_lock_bh+0xe0/0xe0
[176805.926849]  unregister_netdevice_queue+0x1e0/0x280
[176805.927592]  ? unregister_netdevice_many+0x10/0x10
[176805.928275]  unregister_netdev+0x18/0x20
[176805.928835]  mlx5e_vport_rep_unload+0xc0/0x200 [mlx5_core]
[176805.929608]  mlx5_esw_offloads_unload_rep+0x9d/0xc0 [mlx5_core]
[176805.930492]  mlx5_eswitch_unload_vf_vports+0x108/0x1a0 [mlx5_core]
[176805.931422]  ? mlx5_eswitch_unload_sf_vport+0x50/0x50 [mlx5_core]
[176805.932304]  ? rwsem_down_write_slowpath+0x11f0/0x11f0
[176805.932987]  mlx5_eswitch_disable_sriov+0x6f9/0xa60 [mlx5_core]
[176805.933807]  ? mlx5_core_disable_hca+0xe1/0x130 [mlx5_core]
[176805.934576]  ? mlx5_eswitch_disable_locked+0x580/0x580 [mlx5_core]
[176805.935463]  mlx5_device_disable_sriov+0x138/0x490 [mlx5_core]
[176805.936308]  mlx5_sriov_disable+0x8c/0xb0 [mlx5_core]
[176805.937063]  remove_one+0x7f/0x210 [mlx5_core]
[176805.937711]  pci_device_remove+0x96/0x1c0
[176805.938289]  device_release_driver_internal+0x361/0x520
[176805.938981]  ? kobject_put+0x5c/0x330
[176805.939553]  driver_detach+0xd7/0x1d0
[176805.940101]  bus_remove_driver+0x11f/0x290
[176805.943847]  pci_unregister_driver+0x23/0x1f0
[176805.944505]  mlx5_cleanup+0xc/0x20 [mlx5_core]
[176805.945189]  __x64_sys_delete_module+0x2b3/0x450
[176805.945837]  ? module_flags+0x300/0x300
[176805.946377]  ? dput+0xc2/0x830
[176805.946848]  ? __kasan_record_aux_stack+0x9c/0xb0
[176805.947555]  ? __call_rcu_common.constprop.0+0x46c/0xb50
[176805.948338]  ? fpregs_assert_state_consistent+0x1d/0xa0
[176805.949055]  ? exit_to_user_mode_prepare+0x30/0x120
[176805.949713]  do_syscall_64+0x3d/0x90
[176805.950226]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176805.950904] RIP: 0033:0x7f7f42c3f5ab
[176805.951462] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
[176805.953710] RSP: 002b:
00007fff07dc9d08 EFLAGS: 
00000206 ORIG_RAX: 
00000000000000b0
[176805.954691] RAX: 
ffffffffffffffda RBX: 
000055b6e91c01e0 RCX: 
00007f7f42c3f5ab
[176805.955691] RDX: 
0000000000000000 RSI: 
0000000000000800 RDI: 
000055b6e91c0248
[176805.956662] RBP: 
000055b6e91c01e0 R08: 
0000000000000000 R09: 
0000000000000000
[176805.957601] R10: 
00007f7f42d9eac0 R11: 
0000000000000206 R12: 
000055b6e91c0248
[176805.958593] R13: 
0000000000000000 R14: 
000055b6e91bfb38 R15: 
0000000000000000
[176805.959599]  </TASK>
[176805.960324] Allocated by task 20490:
[176805.960893]  kasan_save_stack+0x1e/0x40
[176805.961463]  kasan_set_track+0x21/0x30
[176805.962019]  __kasan_kmalloc+0x77/0x90
[176805.962554]  esw_offloads_init+0x1bb/0x480 [mlx5_core]
[176805.963318]  mlx5_eswitch_init+0xc70/0x15c0 [mlx5_core]
[176805.964092]  mlx5_init_one_devl_locked+0x366/0x1230 [mlx5_core]
[176805.964902]  probe_one+0x6f7/0xc90 [mlx5_core]
[176805.965541]  local_pci_probe+0xd7/0x180
[176805.966075]  pci_device_probe+0x231/0x6f0
[176805.966631]  really_probe+0x1d4/0xb50
[176805.967179]  __driver_probe_device+0x18d/0x450
[176805.967810]  driver_probe_device+0x49/0x120
[176805.968431]  __driver_attach+0x1fb/0x490
[176805.968976]  bus_for_each_dev+0xed/0x170
[176805.969560]  bus_add_driver+0x21a/0x570
[176805.970124]  driver_register+0x133/0x460
[176805.970684]  0xffffffffa0678065
[176805.971180]  do_one_initcall+0x92/0x2b0
[176805.971744]  do_init_module+0x22d/0x720
[176805.972318]  load_module+0x58c3/0x63b0
[176805.972847]  init_module_from_file+0xd2/0x130
[176805.973441]  __x64_sys_finit_module+0x389/0x7c0
[176805.974045]  do_syscall_64+0x3d/0x90
[176805.974556]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176805.975566] Freed by task 27280:
[176805.976077]  kasan_save_stack+0x1e/0x40
[176805.976655]  kasan_set_track+0x21/0x30
[176805.977221]  kasan_save_free_info+0x27/0x40
[176805.977834]  ____kasan_slab_free+0x11a/0x1b0
[176805.978505]  __kmem_cache_free+0x163/0x2d0
[176805.979113]  esw_offloads_cleanup_reps+0xb8/0x120 [mlx5_core]
[176805.979963]  mlx5_eswitch_cleanup+0x182/0x270 [mlx5_core]
[176805.980763]  mlx5_cleanup_once+0x9a/0x1e0 [mlx5_core]
[176805.981477]  mlx5_uninit_one+0xa9/0x180 [mlx5_core]
[176805.982196]  remove_one+0x8f/0x210 [mlx5_core]
[176805.982868]  pci_device_remove+0x96/0x1c0
[176805.983461]  device_release_driver_internal+0x361/0x520
[176805.984169]  driver_detach+0xd7/0x1d0
[176805.984702]  bus_remove_driver+0x11f/0x290
[176805.985261]  pci_unregister_driver+0x23/0x1f0
[176805.985847]  mlx5_cleanup+0xc/0x20 [mlx5_core]
[176805.986483]  __x64_sys_delete_module+0x2b3/0x450
[176805.987126]  do_syscall_64+0x3d/0x90
[176805.987665]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176805.988667] Last potentially related work creation:
[176805.989305]  kasan_save_stack+0x1e/0x40
[176805.989839]  __kasan_record_aux_stack+0x9c/0xb0
[176805.990443]  kvfree_call_rcu+0x84/0xa30
[176805.990973]  clean_xps_maps+0x265/0x6e0
[176805.991547]  netif_reset_xps_queues.part.0+0x3f/0x80
[176805.992226]  unregister_netdevice_many_notify+0xfcf/0x1500
[176805.992966]  unregister_netdevice_queue+0x1e0/0x280
[176805.993638]  unregister_netdev+0x18/0x20
[176805.994205]  mlx5e_remove+0xba/0x1e0 [mlx5_core]
[176805.994872]  auxiliary_bus_remove+0x52/0x70
[176805.995490]  device_release_driver_internal+0x361/0x520
[176805.996196]  bus_remove_device+0x1e1/0x3d0
[176805.996767]  device_del+0x390/0x980
[176805.997270]  mlx5_rescan_drivers_locked.part.0+0x130/0x540 [mlx5_core]
[176805.998195]  mlx5_unregister_device+0x77/0xc0 [mlx5_core]
[176805.998989]  mlx5_uninit_one+0x41/0x180 [mlx5_core]
[176805.999719]  remove_one+0x8f/0x210 [mlx5_core]
[176806.000387]  pci_device_remove+0x96/0x1c0
[176806.000938]  device_release_driver_internal+0x361/0x520
[176806.001612]  unbind_store+0xd8/0xf0
[176806.002108]  kernfs_fop_write_iter+0x2c0/0x440
[176806.002748]  vfs_write+0x725/0xba0
[176806.003294]  ksys_write+0xed/0x1c0
[176806.003823]  do_syscall_64+0x3d/0x90
[176806.004357]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176806.005317] The buggy address belongs to the object at 
ffff888155090a80
                 which belongs to the cache kmalloc-64 of size 64
[176806.006774] The buggy address is located 32 bytes inside of
                 freed 64-byte region [
ffff888155090a80, 
ffff888155090ac0)
[176806.008773] The buggy address belongs to the physical page:
[176806.009480] page:
00000000a407e0e6 refcount:1 mapcount:0 mapping:
0000000000000000 index:0x0 pfn:0x155090
[176806.010633] flags: 0x200000000000800(slab|node=0|zone=2)
[176806.011352] page_type: 0xffffffff()
[176806.011905] raw: 
0200000000000800 ffff888100042640 ffffea000422b1c0 dead000000000004
[176806.012949] raw: 
0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[176806.013933] page dumped because: kasan: bad access detected
[176806.014935] Memory state around the buggy address:
[176806.015601]  
ffff888155090980: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.016568]  
ffff888155090a00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.017497] >
ffff888155090a80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.018438]                                ^
[176806.019007]  
ffff888155090b00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020001]  
ffff888155090b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020996] ==================================================================
Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>