Kernel modules are object files used to extend an operating system’s kernel functionality at run time.
In this post, we’ll look at implementing a simple character device driver as a
kernel module in NetBSD.
Once it is loaded, userspace processes will be able to write
an arbitrary byte string to
the device,
and on every successive read
expect a cryptographically-secure pseudorandom permutation of
the original byte string.
Before we begin, compiling a kernel module requires the NetBSD source code to live in /usr/src
.
This explains how to
get that.
Usually, most userspace interfaces to character or block devices are through
special files that live in /dev
. We’ll create one such special file through
the command
$ mknod /dev/rperm c 420 0
The c
indicates that this file is an interface
to a character device, 420
indicates this device’s major number, and 0
indicates this device’s minor number. The major number is used by the kernel
to uniquely identify each device, and the minor number is usually used
internally by device drivers but we won’t be bothering with it.
Our device driver will specifically implement the open
, read
, write
, and close
I/O methods. To
register our implementations of those methods with the kernel, we first
prototype them in way that makes the compiler happy using the
dev_type_*
set of macros, and then put them into a struct cdevsw
.
dev_type_open(rperm_open);
dev_type_close(rperm_close);
dev_type_write(rperm_write);
dev_type_read(rperm_read);
static struct cdevsw rperm_cdevsw = {
.d_open = rperm_open,
.d_close = rperm_close,
.d_read = rperm_read,
.d_write = rperm_write,
.d_ioctl = noioctl,
.d_stop = nostop,
.d_tty = notty,
.d_poll = nopoll,
.d_mmap = nommap,
.d_kqfilter = nokqfilter,
.d_discard = nodiscard,
.d_flag = D_OTHER
};
As we can see, there are plenty of functions we won’t be implementing.
devsw
stands for device switch.
Every kernel module is required to define it’s metadata through the C macro
MODULE(class, name, required)
. Since our module is a device driver, named
rperm
, and won’t require another module being pre-loaded, we write
MODULE(MODULE_CLASS_DRIVER, rperm, NULL);
Every module is also required to implement a MODNAME_modcmd
function, which the kernel
calls to report important module-related events, like when the module loads
or unloads. This is where we’ll register our struct cdevsw
.
#define CMAJOR 420
static int
rperm_modcmd(modcmd_t cmd, void *args)
{
devmajor_t bmajor, cmajor;
bmajor = -1;
cmajor = CMAJOR;
switch(cmd) {
case MODULE_CMD_INIT:
devsw_attach("rperm", NULL, &bmajor, &rperm_cdevsw, &cmajor);
break;
case MODULE_CMD_FINI:
devsw_detach(NULL, &rperm_cdevsw);
break;
default:
break;
}
return 0;
}
The NULL
argument to the devsw_*
functions is for a block device switch structure,
which we aren’t bothered with. Similarly for bmajor
, but the kernel ends up
assigning an unused block device number for our driver anyway.
Now we turn to actually implementing the four device I/O methods.
On every write
, we need to store the byte string somewhere. We use a static
structure for that.
static struct rperm_softc {
char *buf;
int buf_len;
} sc;
sc.buf
will end up pointing to a location in the kernel’s heap that contains
the byte string. sc
and softc
stand for software context,
which is just a convention followed in the NetBSD kernel for naming static structures in
device driver code.
open
is a required implementation, as it is always the first
syscall in Unix I/O. But, there is nothing meaningful for us to do there, so we
simply write a stub.
int
rperm_open(dev_t self, int flag, int mod, struct lwp *l)
{
return 0;
}
In write
, we allocate enough memory in the kernel’s heap to store the
byte string, and then transfer the byte string from userspace to kernelspace.
int
rperm_write(dev_t self, struct uio *uio, int flags)
{
if (sc.buf)
kmem_free(sc.buf, sc.buf_len);
sc.buf_len = uio->uio_iov->iov_len;
sc.buf = (char *)kmem_alloc(sc.buf_len, KM_SLEEP);
uiomove(sc.buf, sc.buf_len, uio);
return 0;
}
First, let’s discuss the allocations.
kmem_alloc
is similar to userspace malloc
,
in that it allocates some number of bytes of memory in the heap. Interestingly, this memory
is wired, which means that during physical memory pressure, it is not paged
out to a swap disk like userspace memory is. The KM_SLEEP
flag to kmem_alloc
tells the kernel that
the current kernel thread should sleep until enough physical memory is avaiable for the request, if it
already isn’t, as opposed to kmem_alloc
simply returning NULL
in such a situation. Hence,
our allocation request never fails, and we don’t have to test for sc.buf == NULL
.
kmem_free
is similar to userspace free
, except for a second argument that has to be the number of bytes allocated using kmem_alloc
.
Next, we come to the transfer of the byte string from userspace to kernelspace.
Generally, memory to be transfered, in either direction, comes in one or more
non-contiguous chunks of memory (think scatter-gather I/O) along with some additional
state variables like the amount of data remaining to be transfered in the current
session, an offset into a block device, and some flags. All that information
is encapulated in a struct uio
data type. And uiomove
performs the actual transfer
by using that information. For example, here uio->uio_rw
is set to UIO_WRITE
,
telling uiomove
that data
from uio
should be transfered to sc.buf
. uiomove
also ends up updating
uio->uio_resid
, which is the total number of bytes left to transfer to uio
.
Next we come to read
.
int
rperm_read(dev_t self, struct uio *uio, int flags)
{
if (sc.buf == NULL || uio->uio_resid < sc.buf_len)
return EINVAL;
char c;
uint32_t i, n, r;
for (i = 0; i < sc.buf_len-1; i++) {
r = rand_n(i, sc.buf_len);
c = sc.buf[r];
sc.buf[r] = sc.buf[i];
sc.buf[i] = c;
}
uiomove(sc.buf, sc.buf_len, uio);
return 0;
}
We first check if there is enough space in uio
to transfer a permuted byte
string, then use the Fisher–Yates shuffle
to permute the original byte string using a random number generated by rand_n
,
and then copy the permuted string back to userspace. In this case, uio->uio_rw
would be set to UIO_READ
, telling uiomove
that data
from sc.buf
should be transfered to uio
.
The rand_n
function, which we need to implement, returns a random integer n
uniformly distributed
over the range [low, high)
.
#define R32MAX 4294967295
uint32_t rand_n(uint32_t low, uint32_t high) {
uint32_t limit, diff, r;
diff = high - low;
limit = diff * (R32MAX/diff);
do {
r = cprng_strong32();
} while (r > limit);
return (r % diff) + low;
}
For a source of randomness, we use cprng_strong32
. The cprng_*
family of
functions supply cryptographically secure pseudorandom
bytes (in this case, 4
) to callers within the kernel.
Once we have it, we transform the range of our random number from [0, 2^32)
to
[low, high)
by an iterative test that discards those values of r
that are
larger than the largest multiple of r
less than 2^32
, as using those numbers
would result in numbers in a certain subrange of [low, high)
being more
likely to occur than those not.
In close
, we free sc.buf
if it was allocated before.
int
rperm_close(dev_t self, int flag, int mod, struct lwp *l)
{
if (sc.buf != NULL) {
kmem_free(sc.buf, sc.buf_len);
sc.buf = NULL;
}
return 0;
}
Lastly, we write a three line Makefile
to build our module.
KMOD= rperm
SRCS= rperm.c
.include <bsd.kmodule.mk>
Now we compile and load the module.
$ make
$ modload ./rperm.kmod
The ./
has to be present for modload
.
Unfortunately, we can’t simply do
$ echo 'bloop' > /dev/rperm
$ cat /dev/rperm
as that would end up copying the \0
along with the rest of the string into the driver, and the driver
would end up shuffling the \0
as well, which we don’t want.
So we settle for a simple test program.
#define BUF_LEN 80
#define N_ITER 20
int main() {
char buf[BUF_LEN] = "Hello NetBSD!";
int i, fd, str_len;
str_len = strlen(buf);
fd = open("/dev/rperm", O_RDWR);
write(fd, buf, str_len);
for (i = 0; i < N_ITER; i++) {
read(fd, buf, BUF_LEN);
printf("%s\n", buf);
}
close(fd);
return 0;
}
$ ./test
NDe eBlS!tHol
Hl!elDSNeo Bt
N!llD eSoHetB
...
tHelS!BlNeo D
DBeSlt!HloNe
llSetB!oeNHD
All of the above code is available in it’s entirety on github.
More examples of kernel modules can be found in the NetBSD source tree
at src/sys/modules/examples/
.
posted on Feb 2, 2017 by Saurav Sachidanand