ec5753e0 pfg Jan. 20, 2017, 12:02 a.m.
There were several places where references to compression were left
unfinished. Furthermore, the KASSERTs contained references to
MPPC_INVALID, which is not defined in the tree, so they were sure to
break with INVARIANTS: comment them out.

Reported by:	Eugene Grosbein
PR:		216265
MFC after:	3 days
2d4d81c4 behlendorf1 Jan. 19, 2017, 10:41 p.m.
rt_mutex_owner is internal to kernel/locking/rtmutex_common.h and
inaccessible for SPL via the public kernel headers. The way of
accessing the owner has been stable since at least 3.13 ([1], [2]),
which is masking the lowest bit in the owner pointer in rt_mutex. We
do the same.
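
The masking described above can be illustrated with a minimal userspace sketch. The types and names here (sim_task, sim_rt_mutex, SIM_OWNER_FLAG) are hypothetical stand-ins, not the kernel's definitions; the only thing taken from the commit message is that the owner is recovered by masking the lowest bit of the owner word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct sim_task { int pid; };
struct sim_rt_mutex { uintptr_t owner; };

/* Assumption: bit 0 of the owner word carries a flag, not address bits. */
#define SIM_OWNER_FLAG ((uintptr_t)1)

/* Store an owner pointer, optionally with the low flag bit set. */
static void
sim_rt_mutex_set_owner(struct sim_rt_mutex *m, struct sim_task *t, int flag)
{
	/* The trick only works if the pointer's own bit 0 is clear. */
	assert(((uintptr_t)t & SIM_OWNER_FLAG) == 0);
	m->owner = (uintptr_t)t | (flag ? SIM_OWNER_FLAG : 0);
}

/* Recover the owner by masking off the lowest bit. */
static struct sim_task *
sim_rt_mutex_owner(const struct sim_rt_mutex *m)
{
	return (struct sim_task *)(m->owner & ~SIM_OWNER_FLAG);
}
```

Because the task structure is at least 2-byte aligned, bit 0 of its address is always zero, which is what leaves that bit free for a flag.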


Reviewed-by: Brian Behlendorf <>
Signed-off-by: Clemens Fruhwirth <>
Closes #593
5cb44271 behlendorf1 Jan. 19, 2017, 10:32 p.m.
Reviewed-by: Brian Behlendorf <>
Signed-off-by: George Melikov <>
Closes #594
61673a1f jkim Jan. 19, 2017, 10:07 p.m.
040dab99 behlendorf1 Jan. 19, 2017, 9:56 p.m.
When doing recv and rollback, dsl_dataset_clone_swap_sync_impl will be
called to swap out the ds_objset and do dmu_objset_evict on the old one.
However, currently zv->zv_objset will not be swapped out accordingly, so
if anyone currently holds a fd on the zvol, we risk hitting a use-after-free.

We fix this by introducing the suspend and resume mechanism of zsb to
zv.  Before recv or rollback, we use zvol_suspend to block all access to
zv_objset and shut it down. After the recv or rollback, we use zvol_resume
to swap in zv_objset with the new ds_objset and unblock the access.

Reviewed-by: Brian Behlendorf <>
Signed-off-by: Chunwei Chen <>
Closes #4866 
Closes #5609
76fe529b behlendorf1 Jan. 19, 2017, 9:50 p.m.
Porting notes:
- This issue was first fixed in ZoL by commit d862cb0d.  That fix was
then modified and an equivalent version of the patch landed in the
upstream code base.  For additional details see the discussion in .

This commit aligns ZoL with the OpenZFS codebase.

Authored by: Andriy Gapon <>
Reviewed by: Brian Behlendorf <>
Reviewed by: Matthew Ahrens <>
Reviewed by: Ned Bass <>
Reviewed by: Tim Chase <>
Approved by: Gordon Ross <>
Ported-by: George Melikov

Closes #5606
c11c484f scottl Jan. 19, 2017, 9:47 p.m.
All of the printing from the tables file now has wrappers so that the
handling is cleaner and it's possible to print something out (say, during
development) without having to fight the global debug flags. This re-org
will also make it easier to have the tables be compiled out at build time
if desired.

Other than fixing some minor bugs, there are no user-visible changes from
this change.

Sponsored by:	Netflix, Inc.
Differential Revision:	D9238
1e37d7e5 jpaetzel Jan. 19, 2017, 8:44 p.m.
  The core issue I've found is that there is no throttle for how many
  deletes get assigned to one TXG. As a result, when deleting large files
  we end up filling consecutive TXGs with deletes/frees, which then
  write-throttles other (more important) ops.

  There is an easy test case for this problem. Try deleting several
  large files (at least 1/2 TB) while you do write ops on the same
  pool. What we've seen is performance of these write ops (let's
  call it sideload I/O) would drop to zero.

  More specifically the problem is that dmu_free_long_range_impl()
  can/will fill up all of the dirty data in the pool "instantly",
  before many of the sideload ops can get in. So sideload
  performance will be impacted until all the files are freed.

  The solution we have tested at Nexenta (with positive results)
  creates a relatively simple throttle for how many "free" ops we let
  into one TXG.
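
  A minimal sketch of such a throttle, as a userspace simulation: the cap
  SIM_MAX_FREES_PER_TXG, the sim_pool structure, and the helper names are all
  hypothetical, and the commit message does not state the real tunable or its
  value. The point is only that a long free spills into later TXGs once the
  per-TXG budget is used up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-TXG budget; the real tunable is not named in the source. */
#define SIM_MAX_FREES_PER_TXG 4

struct sim_pool {
	uint64_t txg;		/* currently open transaction group */
	uint64_t frees_in_txg;	/* free ops already assigned to it */
};

/* Stand-in for waiting until the next TXG opens. */
static void
sim_txg_advance(struct sim_pool *p)
{
	p->txg++;
	p->frees_in_txg = 0;
}

/* Free nchunks chunks, assigning at most the budget to any one TXG. */
static void
sim_free_long_range(struct sim_pool *p, uint64_t nchunks)
{
	while (nchunks > 0) {
		/* Budget exhausted: leave room for other ops, move on. */
		if (p->frees_in_txg >= SIM_MAX_FREES_PER_TXG)
			sim_txg_advance(p);
		p->frees_in_txg++;
		nchunks--;
		assert(p->frees_in_txg <= SIM_MAX_FREES_PER_TXG);
	}
}
```

  With a budget of 4, freeing 10 chunks starting in TXG 1 places 4 frees in
  TXG 1, 4 in TXG 2, and 2 in TXG 3, instead of all 10 in TXG 1.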

  However this solution exposes other problems that should also be
  addressed. If we are to slow down freeing of data, that means one
  has to wait even longer (assuming a vnode ref count of 1) to get the
  shell back after an rm, or for an NFS thread to finish the freeing op.
  To avoid this, the proposed solution is to call zfs_inactive() async
  for "large" files. Async freeing then begs for the reclaimed space
  to be accounted for in the zpool's "freeing" prop.

  The other issue with having a longer delete is the inability to
  export/unmount for a longer period of time. The proposed solution
  is to interrupt freeing of blocks when a fs is unmounted.

Author: Alek Pinchuk <>
Reviewed by: Matt Ahrens <>
Reviewed by: Sanjay Nadkarni <>
Reviewed by: Pavel Zakharov <>
Approved by: Dan McDonald <>
538ee0d7 kib Jan. 19, 2017, 8:03 p.m.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
d23d2945 sbruno Jan. 19, 2017, 7:58 p.m.
users that choose not to use EARLY_AP_STARTUP.

There is still an initialization issue/panic with !SMP and !EARLY_AP_STARTUP
that we have yet to resolve.

Submitted by:	bde
00ac6a98 kib Jan. 19, 2017, 7:46 p.m.
The option "nonc" disables use of the namecache for the created mount;
by default the namecache is used.  The rationale for the option is that
the namecache duplicates information which is already kept in memory
by tmpfs.  Since it is believed that the namecache scales better than
tmpfs, or will scale better, do not enable the option by default.  On the
other hand, smaller machines may benefit from lesser namecache

Discussed with:	mjg
Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
a0b3a9cf jkim Jan. 19, 2017, 7:46 p.m.
08c053e7 kib Jan. 19, 2017, 7:29 p.m.
For directories, the node->tn_spec.tn_dir.tn_parent pointer to the
parent is used.  For non-directories, the implementation is naive: all
directory nodes are scanned to find a dirent linking the specified
node.  This can be significantly improved later by maintaining
tn_parent for all nodes.
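
The two lookup paths can be sketched in userspace as follows. The sim_node layout, the fixed-size entry array, and sim_find_parent() are hypothetical simplifications, not the tmpfs structures; only the strategy (parent pointer for directories, full scan for everything else) comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_ENTRIES 8

/* Hypothetical, simplified node; not the real struct tmpfs_node. */
struct sim_node {
	int is_dir;
	struct sim_node *parent;			/* directories only */
	struct sim_node *entries[SIM_MAX_ENTRIES];	/* directories only */
	size_t nentries;
};

/* Return the directory that links 'node', or NULL if none is found. */
static struct sim_node *
sim_find_parent(struct sim_node *node, struct sim_node **all, size_t nall)
{
	size_t i, j;

	/* Directories carry a direct pointer to their parent. */
	if (node->is_dir)
		return (node->parent);

	/* Naive path: scan every directory for an entry linking 'node'. */
	for (i = 0; i < nall; i++) {
		struct sim_node *d = all[i];

		if (!d->is_dir)
			continue;
		assert(d->nentries <= SIM_MAX_ENTRIES);
		for (j = 0; j < d->nentries; j++)
			if (d->entries[j] == node)
				return (d);
	}
	return (NULL);
}
```

The scan costs time proportional to the total number of directory entries on the mount, which is why keeping a parent pointer on every node would be the cheaper design.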

Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
b4ba3b64 kib Jan. 19, 2017, 7:25 p.m.
Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
64c25043 kib Jan. 19, 2017, 7:15 p.m.
On dotdot lookup and fhtovp operations, it is possible for the file
represented by the tmpfs node to be removed after the thread calculated
the pointer.  In this case, tmpfs_alloc_vp() accesses freed memory.

Introduce a reference count on the nodes.  The allnodes list from the
tmpfs mount owns one reference, and threads performing unlocked
operations on the node add one transient reference.  Similarly, since
struct tmpfs_mount maintains the list where nodes are enlisted,
refcount it by one reference from struct mount and one reference from
each node on the list.  Both nodes and tmpfs_mounts are removed when
their refcount goes to zero.
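
The ownership scheme can be sketched as a single-threaded userspace model. All names (sim_mount, sim_node, the *_ref and *_unref helpers, the freed counters) are hypothetical; the real code needs locking or atomic counters, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct tmpfs_mount and struct tmpfs_node. */
struct sim_mount { unsigned refcount; };
struct sim_node { unsigned refcount; struct sim_mount *mount; };

/* Counters so a demonstration can observe when objects are freed. */
static int sim_nodes_freed;
static int sim_mounts_freed;

/* A new mount starts with the one reference held by struct mount. */
static struct sim_mount *
sim_mount_alloc(void)
{
	struct sim_mount *m = malloc(sizeof(*m));

	if (m == NULL)
		abort();
	m->refcount = 1;
	return (m);
}

static void
sim_mount_unref(struct sim_mount *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		free(m);
		sim_mounts_freed++;
	}
}

/* A new node gets the list's reference and pins the mount by one. */
static struct sim_node *
sim_node_alloc(struct sim_mount *m)
{
	struct sim_node *n = malloc(sizeof(*n));

	if (n == NULL)
		abort();
	n->refcount = 1;
	n->mount = m;
	m->refcount++;
	return (n);
}

/* Transient reference taken by a thread doing an unlocked operation. */
static void
sim_node_ref(struct sim_node *n)
{
	n->refcount++;
}

/* Dropping the last node reference frees it and unpins the mount. */
static void
sim_node_unref(struct sim_node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0) {
		struct sim_mount *m = n->mount;

		free(n);
		sim_nodes_freed++;
		sim_mount_unref(m);
	}
}
```

If a thread holds a transient reference while the node is removed and the filesystem is unmounted, neither object is freed until that thread calls sim_node_unref(), at which point both go away.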

Note that this means that nodes and tmpfs_mounts might survive for some
time after the node is deleted or tmpfs_unmount() has finished.  In
these cases tmpfs_alloc_vp() returns an error, either due to node
removal (tn_nlinks == 0) or because of an insmntque1(9) error.

Tested by:	pho (as part of larger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks