dc1ab04e mjg July 5, 2021, 8:42 a.m.
Instead serialize against these operations with a dedicated lock.

Prior to the change, When pushing 17 mln pps of traffic, calling
DIOCRGETTSTATS in a loop would restrict throughput to about 7 mln.  With
the change there is no slowdown.

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
cgit
f92c21a2 mjg July 5, 2021, 8:42 a.m.
Creating tables and zeroing their counters induces excessive IPIs (14
per table), which in turns kills single- and multi-threaded performance.

Work around the problem by extending per-CPU counters with a general
counter populated on "zeroing" requests -- it stores the currently found
sum. Then requests to report the current value are the sum of per-CPU
counters subtracted by the saved value.

Sample timings when loading a config with 100k tables on a 104-way box:

stock:

pfctl -f tables100000.conf  0.39s user 69.37s system 99% cpu 1:09.76 total
pfctl -f tables100000.conf  0.40s user 68.14s system 99% cpu 1:08.54 total

patched:

pfctl -f tables100000.conf  0.35s user 6.41s system 99% cpu 6.771 total
pfctl -f tables100000.conf  0.48s user 6.47s system 99% cpu 6.949 total

Reviewed by:	kp (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
cgit
c5d6dd80 pho July 5, 2021, 7:16 a.m.
7ebe83dd pho July 5, 2021, 7:14 a.m.
Reviewed by:	 rgrimes
cgit
16789751 lwhsu July 5, 2021, 2:14 a.m.
Reported by:	jrtc27
MFC with:	ffe6afc4f0121f1909f2fa88694228f771dd3fcb
cgit
5fa1eb1c wulf July 5, 2021, 12:22 a.m.
8b33cb83 wulf July 5, 2021, 12:20 a.m.
as a thin wrapper around native version found in sys/seqc.h.
This replaces out-of-base GPLv2-licensed code used by drm-kmod.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D31006
cgit
019391bf wulf July 5, 2021, 12:20 a.m.
strscpy copies the src string, or as much of it as fits, into the dst
buffer.  The dst buffer is always NUL terminated, unless it's zero-sized.
strscpy returns the number of characters copied (not including the
trailing NUL) or -E2BIG if len is 0 or src was truncated.

Currently drm-kmod replaces strscpy with strncpy that is not quite
correct as strncpy does not NUL-terminate truncated strings and returns
different values on exit.

Reviewed by:	hselasky, imp, manu
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D31005
cgit
98a6984a wulf July 5, 2021, 12:20 a.m.
This allows to remove unimplemented attrs parameter which type differs
between Linux kernel versions and to compile both drm-kmod and ofed
callers unmodified.
Also convert it to 'unsigned long' type to match modern Linuxes.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D30932
cgit
864b1100 wulf July 5, 2021, 12:20 a.m.
irq_work_sync() performs draining of irq_work task.
Required by drm-kmod.

Reviewed by:	hselasky
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30818
cgit
1ab61a19 wulf July 5, 2021, 12:19 a.m.
Linux docs explicitly state that this is not required [1]:

"Important note: The rcu_barrier() function is not, repeat, not,
obligated to wait for a grace period.  It is instead only required to
wait for RCU callbacks that have already been posted.  Therefore, if
there are no RCU callbacks posted anywhere in the system, rcu_barrier()
is within its rights to return immediately.  Even if there are
callbacks posted, rcu_barrier() does not necessarily need to wait for
a grace period."

[1] https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

Reviewed by:	emaste, hselasky, manu
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30809
cgit
c0862b2b wulf July 5, 2021, 12:19 a.m.
so this list-traversal primitive may safely run concurrently with the
_rcu list-mutation primitives such as list_add_rcu() as long as the
traversal is guarded by rcu_read_lock().

Do it by reusing the "list_for_each_entry_rcu" macro which does the same.
On Linux it implements some additional lockdep stuff which we skip.

Also move the macro to linux/rculist.h where it resides on Linux.

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D30795
cgit
c77ec79b wulf July 5, 2021, 12:19 a.m.
On Linux atomic_dec_and_lock_irqsave is a wrapper macro which provides
a reference to third parameter rather than parameter value itself to
implementation routine called _atomic_dec_and_lock_irqsave [1].

While here, implement a fast path.

[1] https://github.com/torvalds/linux/blob/master/include/linux/spinlock.h#L476

Reviewed by:	hselasky
Differential revision:	https://reviews.freebsd.org/D30781
cgit
78a02d8b wulf July 5, 2021, 12:18 a.m.
Reviewed by:	hselasky, manu
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D30767
cgit
a2b83b59 wulf July 5, 2021, 12:18 a.m.
as it is required by i915kms driver from Linux kernel v 5.5.
This is done with asynchronous freeing of requested memory areas from
taskqueue thread. As memory to be freed is reused to store linked list
entry, backing UMA zone item size is rounded up to pointer size.

While here, make struct linux_kmem_cache private to LKPI to reduce amount
of BSD headers included by linux/slab.h and switch RCU code to usage of
LKPI's linux_irq_work_tq taskqueue to avoid injection of current into
system-wide taskqueue_fast thread context.

Submitted by:	nc (initial version for drm-kmod)
Reviewed by:	manu, nc
Differential revision:	https://reviews.freebsd.org/D30760
cgit