Chapter 4
Modifying System Parameters
|
|
|
|
|
|
In this chapter: |
|
|
|
|
Modifying kernel parameters by writing into files found in /proc/sys
|
|
Exploring the files which modify certain parameters
|
|
Review of the /proc/sys file tree
|
|
|
|
|
|
A very interesting part of /proc is the directory
/proc/sys. This is not only a source of information, it
also allows you to change parameters within the kernel. Be very
careful when attempting this. You can optimize your system, but you
can also cause it to crash. Never alter kernel parameters on a
production system. Set up a development machine and test to make sure
that everything works the way you want it to. You may have no
alternative but to reboot the machine once an error has been made.
To change a value, simply echo the new value into the
file. An example is given below in the section on the file system
data. You need to be root to do this. You can create your own boot
script to perform this every time your system boots.
|
Figure 4-1 |
|
| Files and directories in /proc/sys |
|
Figure 4-1 shows a snapshot of /proc/sys on
a SuSE system running kernel version 2.1.131. Please note that this
figure shows actual files, not just only directories like the examples
we've seen so far. Since the contents of /proc change
dynamically, this picture may look different on your system. It does
not include the subtree /proc/sys/net/ipv4, which is
shown in figure 4-2
and discussed later on.
The files in /proc/sys can be used to fine tune and
monitor miscellaneous and general things in the operation of the Linux
kernel. Since some of the files can inadvertently disrupt your
system, it is advisable to read both documentation and source before
actually making adjustments. In any case, be very careful when writing
to any of these files. The entries in /proc may change
slightly between the 2.1.* and the 2.2 kernel, so if there is any
doubt review the kernel documentation in the directory
/usr/src/linux/Documentation. This chapter is heavily
based on the documentation included in the pre 2.2 kernels, and became
part of it in version 2.2.1 of the Linux kernel.
|
4.1 | /proc/sys/fs - File system data |
|
This subdirectory contains specific file system, file handle, inode,
dentry and quota information.
Currently, these files are in /proc/sys/fs:
|
4.1.1 | dentry-state |
|
Status of the directory cache. Since directory entries are
dynamically allocated and deallocated, this file indicates
the current status. It holds six values, in which the last
two are not used and are always zero. The others are listed in
table 4-1
.
|
Table 4-1 |
Status files of the directory cache |
|
File |
Content |
nr_dentry |
Almost always zero |
nr_unused |
Number of unused cache entries |
age_limit |
Time in seconds after the entry may be reclaimed, when memory is short |
want_pages |
internally |
|
|
|
4.1.2 | dquot-nr and dquot-max |
|
The file dquot-max shows the maximum number of cached disk quota
entries.
The file dquot-nr shows the number of allocated disk quota
entries and the number of free disk quota entries.
If the number of available cached disk quotas is very low and you have
a large number of simultaneous system users, you might want
to raise the limit.
|
4.1.3 | file-nr and file-max |
|
The kernel allocates file handles dynamically, but
doesn't free them again at this time.
The value in file-max denotes the maximum number of file handles
that the Linux kernel will allocate. When you get a lot of error
messages about running out of file handles, you might want to raise
this limit. The default value is 4096. To change it, just write the
new number into the file:
|
| # cat /proc/sys/fs/file-max
4096
# echo 8192 > /proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
8192
|
|
This method of revision is useful for all customizable parameters
of the kernel - simply echo the new value to the corresponding
file.
The three values in file-nr denote the number of allocated file
handles, the number of used file handles, and the maximum number of
file handles. When the allocated file handles come close to the
maximum, but the number of actually used ones is far behind, you've
encountered a peak in your usage of file handles and you don't need
to increase the maximum.
However, there is still a per process limit of open files, which
unfortunately can't be changed that easily. It is set to 1024 by
default. To change this you have to edit the files limits.h
and fs.h in the kernel source tree. Finally, change the definition of
NR_OPEN and recompile the kernel.
|
4.1.4 | inode-state, inode-nr and inode-max |
|
As with file handles, the kernel allocates the inode structures
dynamically, but can't free them yet.
The value in inode-max denotes the maximum number of inode
handlers. This value should be 3 to 4 times larger than the value
in file-max, since stdin, stdout, and network sockets also need an
inode struct to handle them. If you regularly run out of inodes,
you should increase this value.
The file inode-nr contains the first two items from inode-state, so we'll skip to that file...
inode-state contains three actual numbers and four dummy values. The
numbers are nr_inodes, nr_free_inodes,
and preshrink (in order of appearance).
|
| nr_inodes |
|
Denotes the number of inodes the system has allocated. This can
be slightly more than inode-max because Linux allocates them one
pageful at a time.
|
| nr_free_inodes |
|
Represents the number of free inodes and preshrink is
nonzero when nr_inodes is greater than
inode-max and the system needs to prune the inode list
instead of allocating more.
|
| super-nr and super-max |
|
Again, super block structures are allocated by the kernel,
but not freed. The file super-max contains the maximum number of
super block handlers, where super-nr shows the number of
currently allocated ones.
Every mounted file system needs a super block, so if you plan to
mount lots of file systems, you may want to increase these
numbers.
|
4.2 | /proc/sys/fs/binfmt_misc - Miscellaneous binary formats |
|
Besides these files, there is the subdirectory
/proc/sys/fs/binfmt_misc. This handles the kernel support for
miscellaneous binary formats.
Binfmt_misc provides the ability to register additional
binary formats to the Kernel without compiling an additional
module/kernel. Therefore, binfmt_misc needs to know magic
numbers at the beginning or the filename extension of the binary.
It works by maintaining a linked list of structs that contain a
description of a binary format, including a magic with size (or the
filename extension), offset and mask, and the interpreter name. On
request it invokes the given interpreter with the original program as
argument, as binfmt_java and binfmt_em86 and
binfmt_mz do. Since binfmt_misc does not
define any default binary-formats, you have to register an additional
binary-format.
There are two general files in binfmt_misc and one file per registered
format. The two general files are register and status.
|
4.2.1 | Registering a new binary format |
|
To register a new binary format you have to issue the command
|
| echo :name:type:offset:magic:mask:interpreter: > /proc/sys/fs/binfmt_misc/register
|
|
with appropriate name (the name for the /proc-dir entry),
offset (defaults to 0, if omitted), magic,
mask (which can be omitted, defaults to all 0xff) and
last but not least, the interpreter that is to be invoked (for
example and testing /bin/echo). Type can be M for
usual magic matching or E for filename extension matching (give
extension in place of magic).
|
4.2.2 | Check or reset the status of the binary format handler |
|
If you do a cat on the file /proc/sys/fs/binfmt_misc/status, you will
get the current status (enabled/disabled) of binfmt_misc. Change the
status by echoing 0 (disables) or 1 (enables) or -1 (caution: this
clears all previously registered binary formats) to status. For
example echo 0 > status to disable binfmt_misc (temporarily).
|
4.2.3 | Status of a single handler |
|
Each registered handler has an entry in /proc/sys/fs/binfmt_misc.
These files perform the same function as status, but their scope is
limited to the actual binary format. By cating this file, you also
receive all related information about the interpreter/magic of the
binfmt.
|
4.2.4 | Example usage of binfmt_misc (emulate binfmt_java) |
|
|
| cd /proc/sys/fs/binfmt_misc
echo ':Java:M::\xca\xfe\xba\xbe::/usr/local/java/bin/javawrapper:' > register
echo ':HTML:E::html::/usr/local/java/bin/appletviewer:' > register
echo ':Applet:M::<!--applet::/usr/local/java/bin/appletviewer:' > register
echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register
|
|
These four lines add support for Java executables and Java applets
(like binfmt_java, additionally recognizing the .html
extension with no need to put <!--applet> to every applet
file). You have to install the JDK and the shell-script
/usr/local/java/bin/javawrapper too. It works around the brokenness of
the Java filename handling. To add a Java binary, just create a link
to the class-file somewhere in the path.
|
4.3 | /proc/sys/kernel - general kernel parameters |
|
This directory reflects general kernel behaviors. As I've said before,
the contents depend on your configuration. Here you'll find the most
important files, along with descriptions of what they mean and how to
use them.
|
4.3.1 | acct |
|
The file contains three values; highwater, lowwater,
and frequency.
It exists only when BSD-style process accounting is enabled. These
values control its behavior. If the free space on the file system
where the log lives goes below lowwater percentage, accounting
suspends. If it goes above highwater percentage, accounting
resumes. Frequency determines how often you check the amount
of free space (value is in seconds). Default settings are: 4, 2,
and 30. That is, suspend accounting if there is less than 2 percent free;
resume it if we have a value of 3 or more percent; consider information about the
amount of free space valid for 30 seconds
|
4.3.2 | ctrl-alt-del |
|
When the value in this file is 0, ctrl-alt-del is trapped and sent
to the init program to handle a graceful restart. However, when the
value is greater that zero, Linux's reaction to this key combination will be an
immediate reboot, without syncing its dirty buffers.
|
|
When a program (like dosemu) has the keyboard in raw mode,
the ctrl-alt-del is intercepted by the program before it ever
reaches the kernel tty layer, and it is up to the program to decide
what to do with it.
|
|
|
4.3.3 | domainname and hostname |
|
These files can be controlled to set the NIS domainname and
hostname of your box. For the classic darkstar.frop.org a simple:
|
| # echo "darkstar" > /proc/sys/kernel/hostname
# echo "frop.org" > /proc/sys/kernel/domainname
|
|
would suffice to set your hostname and NIS domainname.
|
4.3.4 | osrelease, ostype and version |
|
The names make it pretty obvious what these fields contain:
|
| > cat /proc/sys/kernel/osrelease
2.2.12
> cat /proc/sys/kernel/ostype
Linux
> cat /proc/sys/kernel/version
#4 Fri Oct 1 12:41:14 PDT 1999
|
|
The files osrelease and ostype should be clear
enough. Version needs a little more clarification.
The #4 means that this is the 4th kernel built from
this source base and the date after it indicates the time the
kernel was built. The only way to tune these values is to rebuild
the kernel.
|
4.3.5 | panic |
|
The value in this file represents the number of seconds the kernel
waits before rebooting on a panic. When you use the software
watchdog, the recommended setting is 60. If set to 0, the auto
reboot after a kernel panic is disabled, which is the default
setting.
|
4.3.6 | printk |
|
The four values in printk denote
|
|
|
console_loglevel,
|
|
default_message_loglevel,
|
|
minimum_console_level and
|
|
default_console_loglevel
|
|
|
respectively.
These values influence printk() behavior when printing or
logging error messages, which come from inside the kernel. See
syslog(2) for more information on the different log levels.
|
| console_loglevel |
|
Messages with a higher priority than this will be printed to
the console.
|
| default_message_level |
|
Messages without an explicit priority will be printed with
this priority.
|
| minimum_console_loglevel |
|
Minimum (highest) value to which the console_loglevel can be set.
|
| default_console_loglevel |
|
Default value for console_loglevel.
|
4.3.7 | sg-big-buff |
|
This file shows the size of the generic SCSI (sg) buffer. At this
point, you can't tune it yet, but you can change it at compile time
by editing include/scsi/sg.h and changing the value of
SG_BIG_BUFF.
If you use a scanner with SANE (Scanner Access Now Easy) you
might want to set this to a higher value. Refer to the SANE
documentation on this issue.
|
4.3.8 | modprobe |
|
The location where the modprobe binary is located. The kernel
uses this program to load modules on demand.
|
4.4 | /proc/sys/vm - The virtual memory subsystem |
|
The files in this directory can be used to tune the operation of the
virtual memory (VM) subsystem of the Linux kernel. In addition, one of
the files (bdflush) has some influence on disk usage.
|
4.4.1 | bdflush |
|
This file controls the operation of the bdflush kernel daemon. It
currently contains nine integer values, six of which are actually used
by the kernel. They are listed in table 4-2
.
|
Table 4-2 |
Parameters in /proc/sys/vm/bdflush |
|
Value |
Meaning |
nfract |
Percentage of buffer cache dirty to activate bdflush |
ndirty |
Maximum number of dirty blocks to write out per wake-cycle |
nrefill |
Number of clean buffers to try to obtain each time we call refill |
nref_dirt |
Dirty buffer threshold for activating bdflush when trying to refill buffers. |
dummy |
Unused |
age_buffer |
Time for normal buffer to age before we flush it |
age_super |
Time for superblock to age before we flush it |
dummy |
Unused |
dummy |
Unused |
|
|
|
| nfract |
|
This parameter governs the maximum number of dirty buffers
in the buffer cache. Dirty means that the contents of the
buffer still have to be written to disk (as opposed to a
clean buffer, which can just be forgotten about). Setting
this to a higher value means that Linux can delay disk writes
for a long time, but it also means that it will have to do a
lot of I/O at once when memory becomes short. A lower value
will spread out disk I/O more evenly.
|
| ndirty |
|
Ndirty gives the maximum number of dirty buffers that
bdflush can write to the disk at one time. A high value will
mean delayed, bursty I/O, while a small value can lead to
memory shortage when bdflush isn't woken up often enough.
|
| nrefill |
|
This is the number of buffers that bdflush will add to the list
of free buffers when refill_freelist() is called. It is
necessary to allocate free buffers beforehand, since the
buffers are often different sizes than the memory pages
and some bookkeeping needs to be done beforehand. The
higher the number, the more memory will be wasted and the
less often refill_freelist() will need to run.
|
| nref_dirt |
|
When refill_freelist() comes across more
than nref_dirt dirty buffers, it will wake up bdflush.
|
| age_buffer and age_super |
|
Finally, the age_buffer and age_super parameters govern the
maximum time Linux waits before writing out a dirty buffer
to disk. The value is expressed in jiffies (clockticks), the
number of jiffies per second is 100. Age_buffer is the
maximum age for data blocks, while age_super is for
filesystems meta data.
|
4.4.2 | buffermem |
|
The three values in this file control how much memory should be
used for buffer memory. The percentage is calculated as a
percentage of total system memory.
The values are:
|
| min_percent |
|
This is the minimum percentage of memory that should be
spent on buffer memory.
|
| borrow_percent |
|
When Linux is short on memory, and the buffer cache uses more
than it has been allotted, the memory management (MM) subsystem
will prune the buffer cache more heavily than other memory to
compensate.
|
| max_percent |
|
This is the maximum amount of memory that can be used for
buffer memory.
|
4.4.3 | freepages |
|
This file contains three values: min, low and high:
|
| min |
|
When the number of free pages in the system reaches this number,
only the kernel can allocate more memory.
|
| low |
|
If the number of free pages falls below this point, the kernel
starts swapping aggressively.
|
| high |
|
The kernel tries to keep up to this amount of memory free; if
memory falls below this point, the kernel starts gently swapping
in the hopes that it never has to do really aggressive swapping.
|
4.4.4 | kswapd |
|
Kswapd is the kernel swap out daemon. That is, kswapd is that piece
of the kernel that frees memory when it gets fragmented or
full. Since every system is different, you'll probably want some
control over this piece of the system.
The file contains three numbers:
|
| tries_base |
|
The maximum number of pages kswapd tries to free in one round is
calculated from this number. Usually this number will be divided
by 4 or 8 (see mm/vmscan.c), so it isn't as big as it looks.
When you need to increase the bandwidth to/from swap, you'll want
to increase this number.
|
| tries_min |
|
This is the minimum number of times kswapd tries to free a page
each time it is called. Basically it's just there to make sure
that kswapd frees some pages even when it's being called with
minimum priority.
|
| swap_cluster |
|
This is probably the greatest influence on system
performance.
swap_cluster is the number of pages kswapd writes in
one turn. You'll want this value to be large so that kswapd does
its I/O in large chunks and the disk doesn't have to seek as
often, but you don't want it to be too large since that would
flood the request queue.
|
4.4.5 | overcommit_memory |
|
This file contains one value. The following algorithm is used to
decide if there's enough memory: if the value of
overcommit_memory is positive, then there's always enough
memory. This is a useful feature, since programs often
malloc() huge amounts of memory 'just in case', while they
only use a small part of it. Leaving this value at 0 will lead to
the failure of such a huge malloc(), when in fact the system has
enough memory for the program to run.
On the other hand, enabling this feature can cause you to run out
of memory and thrash the system to death, so large and/or important
servers will want to set this value to 0.
|
4.4.6 | pagecache |
|
This file does exactly the same job as buffermem, only this file
controls the amount of memory allowed for memory mapping and
generic caching of files.
You don't want the minimum level to be too low, otherwise your
system might thrash when memory is tight or fragmentation is
high.
|
4.4.7 | pagetable_cache |
|
The kernel keeps a number of page tables in a per-processor cache
(this helps a lot on SMP systems). The cache size for each
processor will be between the low and the high value.
On a low-memory, single CPU system, you can safely set these values
to 0 so you don't waste memory. It is used on SMP systems so that
the system can perform fast pagetable allocations without having to
aquire the kernel memory lock.
For large systems, the settings are probably fine. For normal
systems they won't hurt a bit. For small systems ( less than 16MB ram) it
might be advantageous to set both values to 0.
|
4.4.8 | swapctl |
|
This file contains no less than 8 variables. All of these values
are used by kswapd.
The first four variables
|
|
|
sc_max_page_age,
|
|
sc_page_advance,
|
|
sc_page_decline and
|
|
sc_page_initial_age
|
|
|
are used to keep track of Linux's page
aging. Page aging is a bookkeeping method to track which pages of
memory are often used, and which pages can be swapped out without
consequences.
When a page is swapped in, it starts at sc_page_initial_age
(default 3) and when the page is scanned by kswapd, its age is
adjusted according to the following scheme:
|
|
|
If the page was used since the last time we scanned, its age
is increased by sc_page_advance (default 3). Where the maximum
value is given by sc_max_page_age (default 20).
|
|
Otherwise (meaning it wasn't used) its age is decreased by
sc_page_decline (default 1).
|
|
|
When a page reaches age 0, it's ready to be swapped out.
The variables sc_age_cluster_fract,
sc_age_cluster_min, sc_pageout_weight and
sc_bufferout_weight, can be used to control kswapd's
aggressiveness in swapping out pages.
Sc_age_cluster_fract is used to calculate how many pages
from a process are to be scanned by kswapd. The formula used is
(sc_age_cluster_fract divided by 1024) times
resident set size
So if you want kswapd to scan the whole process,
sc_age_cluster_fract needs to have a value of 1024. The
minimum number of pages kswapd will scan is represented
by sc_age_cluster_min, which is done so that kswapd will
also scan small processes.
The values of sc_pageout_weight and
sc_bufferout_weight are used to control how many tries
kswapd will make in order to swap out one page/buffer. These values
can be used to fine-tune the ratio between user pages and buffer/cache
memory. When you find that your Linux system is swapping out too many
process pages in order to satisfy buffer memory demands, you may want
to either increase sc_bufferout_weight, or decrease the
value of sc_pageout_weight.
|
4.5 | /proc/sys/dev - Device specific parameters |
|
Currently there is only support for CDROM drives, and for those, there
is only one read-only file containing information about the CD-ROM
drives attached to the system:
|
| >cat /proc/sys/dev/cdrom/info
CD-ROM information, Id: cdrom.c 2.55 1999/04/25
drive name: sr0 hdb
drive speed: 32 40
drive # of slots: 1 0
Can close tray: 1 1
Can open tray: 1 1
Can lock tray: 1 1
Can change speed: 1 1
Can select disk: 0 1
Can read multisession: 1 1
Can read MCN: 1 1
Reports media changed: 1 1
Can play audio: 1 1
|
|
You see two drives, sr0 and hdb, along with
a list of their features.
|
4.6 | /proc/sys/sunrpc - Remote procedure calls |
|
This directory contains four files, which enable or disable debugging
for the RPC functions NFS, NFS-daemon, RPC and NLM. The default values
are 0. They can be set to one to turn debugging on. (The default
value is 0 for each)
|
4.7 | /proc/sys/net - Networking stuff |
|
The interface to the networking parts of the kernel is located in
/proc/sys/net. Table 4-3
shows all
possible subdirectories. You may see only some of them, depending on
your kernel's configuration.
|
Table 4-3 |
Subdirectories in /proc/sys/net |
|
Directory |
Content |
Directory |
Content |
core |
General parameter |
appletalk |
Appletalk protocol |
unix |
Unix domain sockets |
netrom |
NET/ROM |
802 |
E802 protocol |
ax25 |
AX25 |
ethernet |
Ethernet protocol |
rose |
X.25 PLP layer |
ipv4 |
IP version 4 |
x25 |
X.25 protocol |
ipx |
IPX |
token-ring |
IBM token ring |
bridge |
Bridging |
decnet |
DEC net |
ipv6 |
IP version 6 |
|
|
|
|
We will concentrate on IP networking here. Since AX15, X.25, and DEC Net
are only minor players in the Linux world, we'll skip them in this
chapter. You'll find some short info on Appletalk and IPX further on
in this chapter. Review the online documentation and the
kernel source to get a detailed view of the parameters for those
protocols. In this section we'll discuss the subdirectories printed in
bold letters in the table above. As default values are suitable for
most needs, there is no need to change these values.
|
4.7.1 | /proc/sys/net/core - Network core options |
|
|
| rmem_default |
|
The default setting of the socket receive buffer in bytes.
|
| rmem_max |
|
The maximum receive socket buffer size in bytes.
|
| wmem_default |
|
The default setting (in bytes) of the socket send buffer.
|
| wmem_max |
|
The maximum send socket buffer size in bytes.
|
| message_burst and message_cost |
|
These parameters are used to limit the warning messages written to
the kernel log from the networking code. They enforce a rate limit
to make a denial-of-service attack impossible. A higher
message_cost factor, results in fewer messages that will be
written. Message_burst controls when messages will be dropped. The
default settings limit warning messages to one every five seconds.
|
| netdev_max_backlog |
|
Maximum number of packets, queued on the INPUT side, when the interface
receives packets faster than kernel can process them.
|
| optmem_max |
|
Maximum ancillary buffer size allowed per socket. Ancillary data is
a sequence of struct cmsghdr structures with appended data.
|
4.7.2 | /proc/sys/net/unix - Parameters for Unix domain sockets |
|
There are only two files in this subdirectory. They control the delays
for deleting and destroying socket descriptors.
|
4.8 | /proc/sys/net/ipv4 - IPV4 settings |
|
IP version 4 is still the most used protocol in Unix networking. It
will be replaced by IP version 6 in the next couple of years, but for
the moment it's the de facto standard for the internet and is used in
most networking environments around the world. Because of the
importance of this protocol, we'll have a deeper look into the subtree
controlling the behavior of the IPv4 subsystem of the Linux kernel.
|
Figure 4-2 |
|
| The IPv4 subtree of /proc/sys/net |
|
Figure 4-2 shows the relevant files for the IPv4
settings. Since some directories have the same entries, they are shown
only once.
Let's start with the entries in /proc/sys/net/ipv4.
|
4.8.1 | ICMP settings |
|
|
| icmp_echo_ignore_all and icmp_echo_ignore_broadcasts |
|
Turn on (1) or off (0), if the kernel should ignore all ICMP ECHO
requests, or just those to broadcast and multicast addresses.
Please note that if you accept ICMP echo requests with a
broadcast/multi\-cast destination address your network may be used
as an exploder for denial of service packet flooding attacks to
other hosts.
|
| icmp_destunreach_rate, icmp_echoreply_rate, icmp_paramprob_rate and icmp_timeexeed_rate |
|
Sets limits for sending ICMP packets to specific targets. A value of
zero disables all limiting. Any positive value sets the maximum
package rate in hundredth of a second (on Intel systems).
|
4.8.2 | IP settings |
|
|
| ip_autoconfig |
|
This file contains the number one if the host received its IP configuration by
RARP, BOOTP, DHCP or a similar mechanism. Otherwise it is zero.
|
| ip_default_ttl |
|
TTL (Time To Live) for IPv4 interfaces. This is simply the
maximum number of hops a packet may travel.
|
| ip_dynaddr |
|
Enable dynamic socket address rewriting on interface address change. This
is useful for dialup interface with changing IP addresses.
|
| ip_forward |
|
Enable or disable forwarding of IP packages between interfaces.
Changing this value resets all other parameters to their default
values. They differ if the kernel is configured as host or router.
|
| ip_local_port_range |
|
Range of ports used by TCP and UDP to choose the local
port. Contains two numbers, the first number is the lowest port,
the second number the highest local port. Default is 1024-4999.
Should be changed to 32768-61000 for high-usage systems.
|
| ip_no_pmtu_disc |
|
Global switch to turn path MTU discovery off. It can also be set
on a per socket basis by the applications or on a per route
basis.
|
| ip_masq_debug |
|
Enable/disable debugging of IP masquerading.
|
4.8.3 | IP fragmentation settings |
|
|
| ipfrag_high_trash and ipfrag_low_trash |
|
Maximum memory used to reassemble IP fragments. When
ipfrag_high_thresh bytes of memory is allocated for this
purpose, the fragment handler will toss packets until
ipfrag_low_thresh is reached.
|
| ipfrag_time |
|
Time in seconds to keep an IP fragment in memory.
|
4.8.4 | TCP settings |
|
|
| tcp_retrans_collapse |
|
Bug-to-bug compatibility with some broken printers. On retransmit, try
to send larger packets to work around bugs in certain TCP stacks. Can
be turned off by setting it to zero.
|
| tcp_keepalive_probes |
|
Number of keep alive probes TCP sends out, until it decides that the
connection is broken.
|
| tcp_keepalive_time |
|
How often TCP sends out keep alive messages, when keep alive is
enabled. The default is 2 hours.
|
| tcp_syn_retries |
|
Number of times initial SYNs for a TCP connection attempt will be
retransmitted. Should not be higher than 255. This is only the
timeout for outgoing connections, for incoming connections the
number of retransmits is defined by tcp_retries1.
|
| tcp_sack |
|
Enable select acknowledgments after RFC2018.
|
| tcp_timestamps |
|
Enable timestamps as defined in RFC1323.
|
| tcp_stdurg |
|
Enable the strict RFC793 interpretation of the TCP urgent pointer
field. The default is to use the BSD compatible interpretation of the
urgent pointer pointing to the first byte after the urgent data. The
RFC793 interpretation is to have it point to the last byte of urgent
data. Enabling this option may lead to interoperatibility
problems. Disabled by default.
|
| tcp_syncookies |
|
Only valid when the kernel was compiled with
CONFIG_SYNCOOKIES. Send out syncookies when the syn
backlog queue of a socket overflows. This is to ward off the common
'syn flood attack'. Disabled by default.
Note that the concept of a socket backlog is abandoned. This
means the peer may not receive reliable error messages from an
over loaded server with syncookies enabled.
|
| tcp_window_scaling |
|
Enable window scaling as defined in RFC1323.
|
| tcp_fin_timeout |
|
The length of time in seconds it takes to receive a final
FIN before the socket is always closed. This is strictly
a violation of the TCP specification, but required to prevent
denial-of-service attacks.
|
| tcp_max_ka_probes |
|
Indicates how many keep alive probes are sent per slow timer
run. Should not be set too high to prevent bursts.
|
| tcp_max_syn_backlog |
|
Length of the per socket backlog queue. Since Linux 2.2 the backlog
specified in listen(2) only specifies the length of the
backlog queue of already established sockets. When more connection
requests arrive Linux starts to drop packets. When syncookies are
enabled the packets are still answered and the maximum queue is
effectively ignored.
|
| tcp_retries1 |
|
Defines how often an answer to a TCP connection request is
retransmitted before giving up.
|
| tcp_retries2 |
|
Defines how often a TCP packet is retransmitted before giving up.
|
4.8.5 | Interface specific settings |
|
In the directory /proc/sys/net/ipv4/conf you'll find one
subdirectory for each interface the system knows about and one
directory calls all. Changes in the all subdirectory affect all
interfaces, whereas changes in the other subdirectories affect only one
interface.
All directories have the same entries:
|
| accept_redirects |
|
This switch decides if the kernel accepts ICMP redirect messages
or not. The default is 'yes' if the kernel is configured for a
regular host and 'no' for a router configuration.
|
| accept_source_route |
|
Should source routed packages be accepted or declined. The
default is dependent on the kernel configuration. It's 'yes' for
routers and 'no' for hosts.
|
| bootp_relay |
|
Accept packets with source address 0.b.c.d with
destinations not to this host as local ones. It is supposed that a
BOOTP relay daemon will catch and forward such packets.
The default is 0, since this feature is not implemented
yet (kernel version 2.2.12).
|
| forwarding |
|
Enable or disable IP forwarding on this interface.
|
| log_martians |
|
Log packets with source addresses with no known route to kernel log.
|
| mc_forwarding |
|
Do multicast routing. The kernel needs to be compiled with
CONFIG_MROUTE and a multicast routing daemon is required.
|
| proxy_arp |
|
Does (1) or does not (0) perform proxy ARP.
|
| rp_filter |
|
Integer value determines if a source validation should be made.
1 means yes, 0 means no. Disabled by default, but
local/broadcast address spoofing is always on.
If you set this to 1 on a router that is the only connection
for a network to the net, it will prevent spoofing attacks
against your internal networks (external addresses can still be
spoofed), without the need for additional firewall rules.
|
| secure_redirects |
|
Accept ICMP redirect messages only for gateways, listed in
default gateway list. Enabled by default.
|
| shared_media |
|
If it is not set the kernel does not assume that different subnets
on this device can communicate directly. Default setting is 'yes'.
|
| send_redirects |
|
Determines whether to send ICMP redirects to other hosts.
|
4.8.6 | Routing settings |
|
The directory /proc/sys/net/ipv4/route contains several file to
control routing issues.
|
| error_burst and error_cost |
|
These parameters are used to limit the warning messages written to
the kernel log from the routing code. The higher the
error_cost factor is, the fewer messages will be
written. Error_burst controls when messages will be dropped. The
default settings limit warning messages to one every five seconds.
|
| flush |
|
Writing to this file results in a flush of the routing cache.
|
| gc_elastic, gc_interval, gc_min_interval, gc_tresh, gc_timeout |
|
Values to control the frequency and behavior of the garbage
collection algorithm for the routing cache.
|
| max_size |
|
Maximum size of the routing cache. Old entries will be purged
once the cache reached has this size.
|
| max_delay, min_delay |
|
Delays for flushing the routing cache.
|
| redirect_load, redirect_number |
|
Factors which determine if more ICPM redirects should be sent to
a specific host. No redirects will be sent once the load limit or
the maximum number of redirects has been reached.
|
| redirect_silence |
|
Timeout for redirects. After this period redirects will be sent
again, even if this has been stopped, because the load or number
limit has been reached.
|
4.8.7 | Network Neighbor handling |
|
Settings about how to handle connections with direct neighbors (nodes
attached to the same link) can be found in the directory
/proc/sys/net/ipv4/neigh.
As we saw it in the conf directory, there is a default subdirectory
which holds the default values, and one directory for each
interface. The contents of the directories are identical, with the
single exception that the default settings contain additional options
to set garbage collection parameters.
In the interface directories you'll find the following entries:
|
| base_reachable_time |
|
A base value used for computing the random reachable time value
as specified in RFC2461.
|
| retrans_time |
|
The time, expressed in jiffies (1/100 sec), between retransmitted
Neighbor Solicitation messages. Used for address resolution and to
determine if a neighbor is unreachable.
|
| unres_qlen |
|
Maximum queue length for a pending arp request - the number of packets which
are accepted from other layers while the ARP address is still
resolved.
|
| anycast_delay |
|
Maximum for random delay of answers to neighbor solicitation
messages in jiffies (1/100 sec). Not yet implemented (Linux does
not have anycast support yet).
|
| ucast_solicit |
|
Maximum number of retries for unicast solicitation.
|
| mcast_solicit |
|
Maximum number of retries for multicast solicitation.
|
| delay_first_probe_time |
|
Delay for the first time probe if the neighbor is reachable.
(see gc_stale_time)
|
| locktime |
|
An ARP/neighbor entry is only replaced with a new one if the old
is at least locktime old. This prevents ARP cache thrashing.
|
| proxy_delay |
|
Maximum time (real time is random [0..proxytime]) before
answering to an ARP request for which we have an proxy ARP entry.
In some cases, this is used to prevent network flooding.
|
| proxy_qlen |
|
Maximum queue length of the delayed proxy arp timer.
(see proxy_delay).
|
| app_solcit |
|
Determines the number of requests to send to the user level ARP
daemon. Use 0 to turn off.
|
| gc_stale_time |
|
Determines how often to check for stale ARP entries. After an ARP
entry is stale it will be resolved again (which is useful when an IP address
migrates to another machine). When ucast_solicit is greater than 0 it first
tries to send an ARP packet directly to the known host When that
fails and mcast_solicit is greater than 0, an ARP request is broadcasted.
|
4.9 | Appletalk |
|
The /proc/sys/net/appletalk directory holds the Appletalk
configuration data when Appletalk is loaded. The configurable
parameters are:
|
4.9.1 | aarp-expiry-time |
|
The amount of time we keep an ARP entry before expiring
it. Used to age out old hosts.
|
4.9.2 | aarp-resolve-time |
|
The amount of time we will spend trying to resolve an Appletalk
address.
|
4.9.3 | aarp-retransmit-limit |
|
The number of times we will retransmit a query before giving up.
|
4.9.4 | aarp-tick-time |
|
Controls the rate at which expires are checked.
The directory /proc/net/appletalk holds the list of active Appletalk
sockets on a machine.
The fields indicate the DDP type, the local address (in network:node
format) the remote address, the size of the transmit pending queue,
the size of the received queue (bytes waiting for applications to
read) the state and the uid owning the socket.
/proc/net/atalk_iface lists all the interfaces configured for
appletalk.It shows the name of the interface, its Appletalk address,
the network range on that address (or network number for phase 1
networks), and the status of the interface.
/proc/net/atalk_route lists each known network route. It lists the
target (network) that the route leads to, the router (may be directly
connected), the route flags, and the device the route is using.
|
4.10 | IPX |
|
The IPX protocol has no tunable values in proc/sys/net.
The IPX protocol does, however, provide proc/net/ipx. This
lists each IPX socket giving the local and remote addresses in Novell
format (that is network:node:port). In accordance with the strange
Novell tradition, everything but the port is in hex. Not_Connected is
displayed for sockets that are not tied to a specific remote
address. The Tx and Rx queue sizes indicate the number of bytes
pending for transmission and reception. The state indicates the state the
socket is in and the uid is the owning uid of the socket.
The /proc/net/ipx_interface file lists all IPX interfaces. For
each interface it gives the network number, the node number, and
indicates if the network is the primary network. It also indicates
which device it is bound to (or Internal for internal networks) and
the Frame Type if appropriate. Linux supports 802.3, 802.2, 802.2 SNAP
and DIX (Blue Book) ethernet framing for IPX.
The /proc/net/ipx_route table holds a list of IPX routes. For each
route it gives the destination network, the router node (or Directly)
and the network address of the router (or Connected) for internal
networks.
|
|
Summary: |
|
Certain aspects of kernel behavior can be modified at
runtime, without the need to recompile the kernel, or even to reboot the
system. The files in the /proc/sys tree can not only be read,
but also modified. You can use the echo command to write value
into these files, thereby changing the default settings of the kernel.
|
|