9fc6a635 hselasky June 21, 2022, 9:33 a.m.
If uverbs_user_mmap_disassociate() is called while the mmap is
concurrently doing exit_mmap then the ordering of the
rdma_user_mmap_entry_put() is not reliable.

The put must be done before uvers_user_mmap_disassociate() returns,
otherwise there can be a use after free on the ucontext, and a left over
entry in the xarray. If the put is not done here then it is done during
rdma_umap_close() later.

Add the missing put to the error exit path.

Linux commit:

PR:		264473
MFC after:	3 days
Sponsored by:	NVIDIA Networking
55d18336 hselasky June 21, 2022, 9:33 a.m.
The kernel commit cited below restructured ib device management
so that the device kobject is initialized in ib_alloc_device.

As part of the restructuring, the kobject is now initialized in
procedure ib_alloc_device, and is later added to the device hierarchy
in the ib_register_device call stack, in procedure
ib_device_register_sysfs (which calls device_add).

However, in the ib_device_register_sysfs error flow, if an error
occurs following the call to device_add, the cleanup procedure
device_unregister is called. This call results in the device object
being deleted -- which results in various use-after-free crashes.

The correct cleanup call is device_del -- which undoes device_add
without deleting the device object.

The device object will then (correctly) be deleted in the
ib_register_device caller's error cleanup flow, when the caller invokes

Linux commit:

PR:		264472
MFC after:	3 days
Sponsored by:	NVIDIA Networking
f2deb5e4 pho June 21, 2022, 8:20 a.m.
0ba1d860 alc June 21, 2022, 4:48 a.m.
Release the domain lock when iommu_gas_reserve_region_extend()'s call to
iommu_gas_reserve_region_locked() fails.

MFC after:	2 weeks
32e82bcc alc June 21, 2022, 4:03 a.m.
Since OFF_TO_IDX() inherently truncates the given value, there is no
need to perform trunc_page() on it.

MFC after:	2 weeks
70b5d8fa dougm June 21, 2022, 12:34 a.m.
The loop iteration in iommu_gas_lowermatch checks the bound
a->common->lowaddr twice per loop iteration. Rewrite to test only once
per iteration.  Do not worry about passing to iommu_gas_match_one a
range wholly beyond lowaddr. Since that function checks the upper end
of the range against lowaddr, it'll get rejected there.

Reviewed by:	alc
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D35522
0586a129 rmacklem June 20, 2022, 8:23 p.m.
The vfs_flags() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
remove it to clean up the code.

This commit should not result in a semantics change.
164491fb alc June 20, 2022, 5:30 p.m.
As of 19bb5a7244ff, the size passed to iommu_gas_map is no longer
required to be a multiple of the CPU page size.

MFC after:	2 weeks
6405997f markj June 20, 2022, 4:48 p.m.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
e123264e markj June 20, 2022, 4:48 p.m.
Commit 4b8365d752ef introduced the ability to dynamically register
VM object types, for use by tmpfs, which creates swap-backed objects.
As a part of this, checks for such objects changed from

  object->type == OBJT_DEFAULT || object->type == OBJT_SWAP


  object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0

In particular, objects of type OBJT_DEFAULT do not have OBJ_SWAP set;
the swap pager sets this flag when converting from OBJT_DEFAULT to

A few of these checks are done without the object lock held.  It turns
out that this can result in false negatives since the swap pager
converts objects like so:

  object->type = OBJT_SWAP;
  object->flags |= OBJ_SWAP;

Fix the problem by adding explicit tests for OBJT_SWAP objects in
unlocked checks.

PR:		258932
Fixes:		4b8365d752ef ("Add OBJT_SWAP_TMPFS pager")
Reported by:	bdrewery
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35470
9553bc89 markj June 20, 2022, 4:48 p.m.
- Remove the AIO proc zone.  This zone gets one allocation per AIO
  daemon process, which isn't enough to warrant a dedicated zone.  Plus,
  unlike other AIO structures, aiops are small (32 bytes with LP64), so
  UMA doesn't provide better space efficiency than malloc(9).  Change
  one of the malloc types in vfs_aio.c to make it more general.

- Don't set the NOFREE flag on the other AIO zones.  This flag means
  that memory allocated to the AIO subsystem is never freed back to the
  VM, so it's always preferable to avoid using it when possible.  NOFREE
  was set without explanation when AIO was converted to use UMA 20 years
  ago, but it does not appear to be required; all of the structures
  allocated from UMA (per-process kaioinfo, kaiocb, and aioliojob) keep
  track of references and get freed only when none exist.  Plus, these
  structures will contain dangling pointer after they're freed (e.g.,
  the "cred", "fd_file" and "uiop" fields of struct kaiocb), so
  use-after-frees are dangerous even when the structures themselves are

Reviewed by:	asomers
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35493
60b4ad4b markj June 20, 2022, 4:48 p.m.
BPF headers are word-aligned when copied into the store buffer.  Ensure
that pad bytes following the preceding packet are cleared.

Reported by:	KMSAN
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
c88f6908 markj June 20, 2022, 4:48 p.m.
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
540da48d markj June 20, 2022, 4:48 p.m.
This addresses a couple of false positive reports for memory returned by

Sponsored by:	The FreeBSD Foundation
a932a5a6 markj June 20, 2022, 4:48 p.m.
Otherwise zone initializers can produce false positives, e.g., when
lock_init() attempts to detect double initialization.

Sponsored by:	The FreeBSD Foundation