<feed xmlns='http://www.w3.org/2005/Atom'>
<title>src/include/sys, branch zfs-0.6.0-rc1</title>
<subtitle>FreeBSD source tree</subtitle>
<id>https://cgit-dev.freebsd.org/src/atom?h=zfs-0.6.0-rc1</id>
<link rel='self' href='https://cgit-dev.freebsd.org/src/atom?h=zfs-0.6.0-rc1'/>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/'/>
<updated>2011-02-18T17:31:25Z</updated>
<entry>
<title>Merge branch 'zpl'</title>
<updated>2011-02-18T17:31:25Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-18T17:31:25Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=5d0265c0dd54d798a35babe587ad5138392fe807'/>
<id>urn:sha1:5d0265c0dd54d798a35babe587ad5138392fe807</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Add API to wait for pending commit callbacks</title>
<updated>2011-02-16T19:20:06Z</updated>
<author>
<name>Ricardo M. Correia</name>
<email>ricardo.correia@oracle.com</email>
</author>
<published>2011-01-21T22:35:41Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=54a179e7b80413bd48cd2cd259110fb493d0215e'/>
<id>urn:sha1:54a179e7b80413bd48cd2cd259110fb493d0215e</id>
<content type='text'>
This adds an API to wait for pending commit callbacks of already-synced
transactions to finish processing.  This is needed by the DMU-OSD in
Lustre during device finalization, when some callbacks may not yet have
been called, which leads to non-zero reference count errors.  See
lustre.org bug 23931.
</content>
</entry>
<entry>
<title>Linux 2.6.36 compat, sops-&gt;evict_inode()</title>
<updated>2011-02-11T21:47:51Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-11T21:46:10Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=2c395def2763ccc7a549d297f7f11bd304caaeae'/>
<id>urn:sha1:2c395def2763ccc7a549d297f7f11bd304caaeae</id>
<content type='text'>
The new preferred interface for evicting an inode from the inode cache
is the -&gt;evict_inode() callback.  It replaces both the -&gt;delete_inode()
and -&gt;clear_inode() callbacks which were previously used for this.
</content>
</entry>
<entry>
<title>Linux 2.6.35 compat, fops-&gt;fsync()</title>
<updated>2011-02-11T17:05:51Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-11T16:58:55Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=7268e1bec8478639b7a1047e02ab931f30bc2f92'/>
<id>urn:sha1:7268e1bec8478639b7a1047e02ab931f30bc2f92</id>
<content type='text'>
The fsync() callback in the file_operations structure used to take
3 arguments.  The callback now only takes 2 arguments because the
dentry argument was determined to be unused by all consumers.  To
handle this a compatibility prototype was added to ensure the right
prototype is used.  Our implementation never used the dentry argument
either so it's just a matter of using the right prototype.
</content>
</entry>
<entry>
<title>Linux 2.6.35 compat, const struct xattr_handler</title>
<updated>2011-02-11T00:29:00Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-11T00:16:52Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=777d4af89137907adc91377327505f40c296035d'/>
<id>urn:sha1:777d4af89137907adc91377327505f40c296035d</id>
<content type='text'>
The const keyword was added to the 'struct xattr_handler' in the
generic Linux super_block structure.  To handle this we define an
appropriate xattr_handler_t typedef which can be used.  This was
the preferred solution because it keeps the code clean and readable.
</content>
</entry>
<entry>
<title>Use 'noop' IO Scheduler</title>
<updated>2011-02-10T17:27:22Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-07T21:54:59Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=6839eed23e3c9d85cf0de767be32af0759e5bf2d'/>
<id>urn:sha1:6839eed23e3c9d85cf0de767be32af0759e5bf2d</id>
<content type='text'>
Initial testing has shown that the right IO scheduler to use under Linux
is noop.  This strikes the ideal balance by allowing the zfs elevator
to do all request ordering and prioritization, while allowing the
Linux elevator to do the maximum front/back merging allowed by the
physical device.  This yields the largest possible requests for the
device with the lowest total overhead.

While 'noop' should be right for your system, you can choose a different
IO scheduler with the 'zfs_vdev_scheduler' option.  You may set this
value to any of the standard Linux schedulers: noop, cfq, deadline,
anticipatory.  In addition, if you choose 'none' zfs will not attempt
to change the IO scheduler for the block device.
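
The selection rule described above can be sketched in Python.  This is
only an illustration: the function name, the tuple of names, and the
choice to reject unknown values are assumptions and are not taken from
the zfs code.

```python
# Hypothetical sketch of the 'zfs_vdev_scheduler' selection rule.
STANDARD_SCHEDULERS = ("noop", "cfq", "deadline", "anticipatory")


def scheduler_to_apply(zfs_vdev_scheduler="noop"):
    """Return the scheduler name to set, or None to leave the device alone."""
    if zfs_vdev_scheduler == "none":
        # 'none' means: do not attempt to change the IO scheduler.
        return None
    if zfs_vdev_scheduler not in STANDARD_SCHEDULERS:
        # Assumption: unknown names are rejected rather than passed through.
        raise ValueError("unknown IO scheduler: " + zfs_vdev_scheduler)
    return zfs_vdev_scheduler
```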
</content>
</entry>
<entry>
<title>Add mmap(2) support</title>
<updated>2011-02-10T17:27:21Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-03T18:34:05Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=c0d35759c5ab1abaa6b72062cc4ecd0d86628de8'/>
<id>urn:sha1:c0d35759c5ab1abaa6b72062cc4ecd0d86628de8</id>
<content type='text'>
It's worth taking a moment to describe how mmap is implemented
for zfs because it differs considerably from other Linux filesystems.
However, this issue is handled the same way under OpenSolaris.

The issue is that by design zfs bypasses the Linux page cache and
leaves all caching up to the ARC.  This has been shown to work
well for the common read(2)/write(2) case.  However, mmap(2)
is a problem because it relies on being tightly integrated with the
page cache.  To handle this we cache mmap'ed files twice, once in
the ARC and a second time in the page cache.  The code is careful
to keep both copies synchronized.

When a file with an mmap'ed region is written to using write(2)
both the data in the ARC and existing pages in the page cache
are updated.  For a read(2) data will be read first from the page
cache then the ARC if needed.  Neither a write(2) nor a read(2) will
ever result in new pages being added to the page cache.

New pages are added to the page cache only via .readpage() which
is called when the vfs needs to read a page off disk to back the
virtual memory region.  These pages may be modified without
notifying the ARC and will be written out periodically via
.writepage().  This will occur due to either a sync or the usual
page aging behavior.  Note that because a read(2) of an mmap'ed file
will always check the page cache first, correct data will still be
returned even when the ARC is out of date.
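
The rules above can be modeled with a small Python toy.  It is a sketch
of the described behavior only, not the kernel code: the class and the
mmap_store() helper are hypothetical, and the ARC is simplified to a
dict that always holds every page.

```python
class DualCacheFile:
    """Toy model of one file cached in both the ARC and the page cache."""

    def __init__(self, disk_pages):
        self.arc = dict(disk_pages)  # page index to bytes
        self.page_cache = {}         # filled only by readpage()

    def write(self, index, data):
        # write(2): update the ARC and any page that is already cached.
        self.arc[index] = data
        if index in self.page_cache:
            self.page_cache[index] = data

    def read(self, index):
        # read(2): page cache first, then the ARC; never adds a page.
        if index in self.page_cache:
            return self.page_cache[index]
        return self.arc[index]

    def readpage(self, index):
        # Fault on the mapping: the only path that adds a page.
        self.page_cache[index] = self.arc[index]

    def mmap_store(self, index, data):
        # Store through the mapping: the ARC is not notified.
        if index not in self.page_cache:
            self.readpage(index)
        self.page_cache[index] = data

    def writepage(self, index):
        # Writeback from a sync or page aging: refresh the ARC copy.
        self.arc[index] = self.page_cache[index]
```

After mmap_store() the ARC copy is stale until writepage() runs, yet
read() still returns the new data because it checks the page cache
first.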

While this implementation ensures correct behavior, it does have
some drawbacks.  The most obvious is that it increases the required
memory footprint when accessing mmap'ed files.  It also adds
complexity to the code, which must keep both caches synchronized.

Longer term it may be possible to cleanly resolve this wart by
mapping page cache pages directly on to the ARC buffers.  The
Linux address space operations are flexible enough to allow
selection of which pages back a particular index.  The trick
would be working out the details of which subsystem is in
charge, the ARC, the page cache, or both.  It may also prove
helpful to move the ARC buffers to scatter-gather lists
rather than a vmalloc'ed region.

Additionally, zfs_write/read_common() were used in the readpage
and writepage hooks because it was fairly easy.  However, it
would be better to update zfs_fillpage and zfs_putapage to be
Linux friendly and use them instead.
</content>
</entry>
<entry>
<title>Add Hooks for Linux File Operations</title>
<updated>2011-02-10T17:27:21Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-01-26T20:03:58Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=1efb473f8919c5f195e127136b79c6d3b1eb1c81'/>
<id>urn:sha1:1efb473f8919c5f195e127136b79c6d3b1eb1c81</id>
<content type='text'>
The Linux specific file operations have all been located in the
file zpl_file.c.  These functions primarily rely on the reworked
zfs_* functions to do their job.  They are also responsible for
converting the possible Solaris style error codes to negative
Linux errors.
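
The error conversion can be illustrated with a short Python sketch.  It
assumes the Solaris style result is 0 on success or a positive errno
value; the function name is hypothetical and is not part of zpl_file.c.

```python
import errno


def to_linux_error(solaris_error):
    """Map 0 or a positive errno value to the negative Linux convention."""
    if solaris_error == 0:
        return 0
    # abs() keeps a value that is already negative from flipping sign.
    return -abs(solaris_error)
```

For example, to_linux_error(errno.ENOENT) gives -errno.ENOENT, and a
value that is already negative is returned unchanged.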

This first zpl_* commit also includes a common zpl.h header with
minimal entries to register the Linux specific hooks.  It also
adds all the new zpl_* files to the Makefile.in.  This is not a
standalone commit; it requires the following zpl_* commits.
</content>
</entry>
<entry>
<title>Add zp-&gt;z_is_zvol flag</title>
<updated>2011-02-10T17:27:21Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-08T19:29:50Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=3c4988c83e4f278cd6c8076f6cdb8e4858d05840'/>
<id>urn:sha1:3c4988c83e4f278cd6c8076f6cdb8e4858d05840</id>
<content type='text'>
A new flag is required for the zfs_rlock code to determine if
it is operating on a zvol or a zpl dataset.  This used to be
keyed off the zp-&gt;z_vnode, which was a hack to begin with, but
with the removal of vnodes we needed a dedicated flag.
</content>
</entry>
<entry>
<title>Prototype/structure update for Linux</title>
<updated>2011-02-10T17:27:21Z</updated>
<author>
<name>Brian Behlendorf</name>
<email>behlendorf1@llnl.gov</email>
</author>
<published>2011-02-08T19:16:06Z</published>
<link rel='alternate' type='text/html' href='https://cgit-dev.freebsd.org/src/commit/?id=3558fd73b5d863304102f6745c26e0b592aca60a'/>
<id>urn:sha1:3558fd73b5d863304102f6745c26e0b592aca60a</id>
<content type='text'>
I apologize in advance that too many things ended up in this commit.
While it could be separated into a whole series of commits, teasing
that all apart now would take considerable time and I'm not sure
there's much merit in it.  As such I'll just summarize the intent
of the changes which are all (or partly) in this commit.  Broadly
the intent is to remove as much Solaris specific code as possible
and replace it with native Linux equivalents.  More specifically:

1) Replace all instances of zfsvfs_t with zfs_sb_t.  While the
type is largely the same calling it private super block data
rather than a zfsvfs is more consistent with how Linux names
this.  While non critical it makes the code easier to read when
you're thinking in Linux friendly VFS terms.

2) Replace vnode_t with struct inode.  The Linux VFS doesn't have
the notion of a vnode and there's absolutely no good reason to
create one.  There are in fact several good reasons to remove it.
It just adds overhead on Linux if we were to manage one, it
complicates the code, and it will likely lead to bugs because there's
a good chance it will be out of date.  The code has been updated
to remove all need for this type.

3) Replace all vtype_t's with umode types.  Along with this shift
all uses of types to mode bits.  The Solaris code would pass a
vtype which is redundant with the Linux mode.  Just update all the
code to use the Linux mode macros and remove this redundancy.

4) Remove use of the vn_* helpers and replace them where needed with
inode helpers.  The big example here is creating iput_async to
replace vn_rele_async.  Other vn helpers will be addressed as
needed but they should not be emulated.  They are a Solaris VFS'ism
and should simply be replaced with Linux equivalents.

5) Update znode alloc/free code.  Under Linux it's common to
embed the inode specific data with the inode itself.  This removes
the need for an extra memory allocation.  In zfs this information
is called a znode and it now embeds the inode with it.  Allocators
have been updated accordingly.

6) Minimal integration with the vfs flags for setting up the
super block and handling mount options has been added.  This
code will need to be refined, but functionally it's all there.

This will be the first and last of these too-large-to-review commits.
</content>
</entry>
</feed>
