Besides removing hand-translation to assembler, this also adds missing
wrappers for arm64 and risc-v.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D7694
Expect sub testcases 2-4 in :tests_time_rotate to fail today due to changes
to newsyslog, etc made in the past month.
The issue is being root-caused as part of the bug noted below. This commit
will need to be partially reverted once the issue has been found/fixed
Reported by: Jenkins
Sponsored by: EMC / Isilon Storage Division
Add screen locking calls to sc cn grab and ungrab. The locking functions
just use the same mutex locking as sc cn putc so they have the same
The locking calls to acquire the lock are actually in sc cn open and close.
Ungrab has to unlock, although this opens a race window.
Change the direct mutex lock calls in sc cn putc to the new locking
functions via the open and close functions. Putc also has to unlock, but
doesn't keep the screen open like grab. Screen open and close reduce to
locking, except screen open for grab also attempts to switch the screen.
Keyboard locking is more difficult and still null, even when keyboard
input calls screen functions, except some of the functions have locks
too deep to work right.
This organization gives a single place to fix some of the locking.
On amd64, declare sse2_pagezero() and start using it again, but only
for zeroing pages in idle where nontemporal writes are clearly best.
This is almost a no-op since zeroing in idle works does nothing good
and is off by default. Fix END() statement forgotten in previous
Align the loop in sse2_pagezero(). Since it writes to main memory,
the loop doesn't have to be very carefully written to keep up.
Unrolling it was considered useless or harmful and was not done on
i386, but that was too careless.
Timing for i386: the loop was not unrolled at all, and moved only 4
bytes/iteration. So on a 2GHz CPU, it needed to run at 2 cycles/
iteration to keep up with a memory speed of just 4GB/sec. But when
it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/
iteration so it gave a maximum speed of 2.67GB/sec and couldn't even
keep up with PC3200 memory. Fix the alignment so that it keep up with
4GB/sec memory, and unroll once to get nearer to 8GB/sec. Further
unrolling might be useless or harmful since it would prevent the loop
fitting in 16-bytes. My test system with an old CPU and old DDR1 only
needed 5+ GB/sec. My test system with a new CPU and DDR3 doesn't need
any changes to keep up ~16GB/sec.
Timing for amd64: with 8-byte accesses and newer faster CPUs it is
easy to reach 16GB/sec but not so easy to go much faster. The
alignment doesn't matter much if the CPU is not very old. The loop
was already unrolled 4 times, but needs 32 bytes and uses a fancy
method that doesn't work for 2-way unrolling in 16 bytes. Just
align it to 32-bytes.
Use both the MACHINE and MACHINE_CPUARCH directories for finding sources.
When fixing this module to build on PC98, I actually broke the build on
ARM64. On PC98 we need to pull in the sources from the MACHINE_CPUARCH
(i386), but on ARM64 we need to use the MACHINE, as MACHINE_CPUARCH is
set to aarch64 instead of just arm64.
Sync libarchive with vendor including security fixes
Vendor issues fixed:
Issue #731: Reject tar entries >= INT64_MAX
Issue #744 (part of Issue #743): Enforce sandbox with very long pathnames
Issue #748: Zip decompression failure with highly-compressed data
Issue #767: Buffer overflow printing a filename
Issue #770: Zip read: be more careful about extra_length
MFC after: 3 days