IPROUTE2 Utility Suite Howto



9.0 Obtaining & Compiling IPROUTE2

9.1 IP Command Set

9.2 ip address - protocol address management

9.3 IP Interface Primary and Secondary Addressing:

9.4 ip neighbour --- neighbour/arp table management.

9.5 ip route - routing table management.

9.6 ip rule --- routing policy database management.

9.7 ip tunnel - ip tunnelling configuration

9.8 ip monitor and rtmon --- route state monitoring

9.9 rtacct - route realms and policy propagation

9.10 IP Utility Summary

9.11 IP Usage in Scripting

9.12 IPUP & IPDOWN

9.13 IPNetwork Init Script

9.14 ifcfg script

9.15 arping utility

9.16 Policy Routing - Multiple Route Tables Example

IPROUTE2 Utility Suite Documentation

This documentation covers the ip utility from IPROUTE2. This utility was written by Alexey N. Kuznetsov, who also wrote the IPv6 and IPv4 routing code for Linux 2.2. This is the utility he uses for manipulating the Linux 2.2-2.6 network interface code.

We will begin by explaining where to obtain the utility collection and how to compile it. After it is compiled we will cover the utilities created and in what location on the system they should reside. This includes all of the utilities in the IPROUTE2 suite.

Then we will begin extensive coverage of the ip command with documentation of usage and examples. This section draws heavily upon Alexey's own documentation of the command with additional discussion and examples. Some usages of the command, such as multicast and IPv6-specific usage, are deferred for now, but we will extend this document with that coverage as time goes on. While this is the kind of material usually found in man pages, no man pages currently exist for the ip command and Alexey's own current documentation is only available in LaTeX format. With Alexey's permission we have edited and expanded the LaTeX documentation into the sections found here. If there are errors in these sections they probably belong to Matthew's translation and should be addressed to him first.

To tie together what we have learned about the ip utility we will list a few working examples. These include several longer script examples from Alexey along with some daily usage features of the utility. The Table of Contents then lists a set of real-life examples that are collected here.

Obtaining & Compiling IPROUTE2

The ip utility is just one of the utilities in the IPROUTE2 utility package from Alexey. The primary FTP site was located in Russia at ftp://ftp.inr.ac.ru/ip-routing/ but is no longer running. The most complete mirror is located at http://www.linuxgrill.com/anonymous/iproute2/ with the newest OSDL source code located within the http://www.linuxgrill.com/anonymous/iproute2/NEW-OSDL/ directory. We will assume that you have obtained the latest package usually called iproute2-current symlinked to the latest dated version. The version we primarily cover here is the 1999-06-30 version of IPROUTE2.

Once the utility has been obtained you need to unpack it into whatever directory you use for compiling source code. The default is to use /usr/src. When you have the package untarred you can enter the directory and just type make. You must have the kernel source code that was used to compile your current running kernel located in /usr/src/linux. You do want to compile a version of your own unless you are using a distribution that includes the utility and you have not remade your kernel. Since one of the best tuning and security functions you can perform on your system is to obtain and compile your own specific kernel you will want to compile this utility also as it is the single most important utility in the IP configuration of your system.

After you have typed make the utility suite will compile. Then we have to install the various parts. There is no install target in the makefile. All of the utilities in this package should be installed into the /sbin directory. This is so that they are available even before your /usr directory is mounted. There is additionally a /etc/iproute2 directory in the package that contains sample definition files. If you do not have a /etc/iproute2/ directory on your system then create one and copy the contents of the package directory to the new directory. If an /etc/iproute2/ directory exists and you do not know what it is being used for then you will want to find out if the files in that directory have some meaning to the system you are running. If not then replacing them with the files in the package directory will not hurt.

In a nutshell we want to perform the following steps:

1. Compile the utilities by typing make

2. Check /etc/iproute2/ with ls -l /etc/iproute2

3. If needed create /etc/iproute2/ with

mkdir /etc/iproute2/

4. Populate it with cp ./etc/iproute2/* /etc/iproute2/

5. Change into the ip directory with cd ip

6. cp ifcfg ip routef routel rtacct rtmon rtpr /sbin

7. Change into tc directory with cd ../tc

8. cp tc /sbin

This will compile the utility and copy the configuration files and the executables into the appropriate directories. We should now be able to execute the ip utility from anywhere on the system by typing ip. To test and see if this worked type ip addr and you should get a list of the interfaces and addresses on your system.
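The steps above can be collected into a small shell function. This is only a sketch: the function name install_iproute2 and its optional staging prefix argument are illustrative and not part of the package, and it assumes that make has already been run and that the current directory is the top of the unpacked source tree.

```shell
#!/bin/sh
# Sketch of the manual install steps; run from the top of the source tree
# after "make" has completed. The optional first argument is a hypothetical
# staging prefix (an absolute path); leave it empty to install into the
# real /etc and /sbin.
install_iproute2() {
    prefix=${1:-}

    # Steps 2-4: populate /etc/iproute2 only if it does not exist yet,
    # so that an existing configuration is never overwritten blindly.
    if [ -d "$prefix/etc/iproute2" ]; then
        echo "$prefix/etc/iproute2 already exists, leaving it alone" >&2
    else
        mkdir -p "$prefix/etc/iproute2" || return 1
        cp ./etc/iproute2/* "$prefix/etc/iproute2/" || return 1
    fi

    # Steps 5-8: copy the executables into /sbin.
    mkdir -p "$prefix/sbin" || return 1
    (cd ip && cp ifcfg ip routef routel rtacct rtmon rtpr "$prefix/sbin/") || return 1
    (cd tc && cp tc "$prefix/sbin/") || return 1
}
```

Calling install_iproute2 with no argument as root performs the install described above; calling it with a scratch directory first is a cheap way to see what would be copied.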

IP Command Set

In this section we will present a comprehensive description of the ip utility from Alexey Kuznetsov's IPROUTE2 package. We will start by going through most of the ip command in extreme detail. We will cover the link, addr, route, rule, neigh, tunnel, and monitor parts of the ip command. The multicast sections will be covered in a "to be added later" section on IPv6 and multicasting.

We will first go through all of the command syntax of the ip command. This is because, as of February 2000, there are no man pages for ip and the documentation is only available in LaTeX format. If you have read the ip-cref.tex document that Alexey wrote, as included in the 1999-06-30 distribution of IPROUTE2, then feel free to just skim through most of this section. Matthew has extended the discussion and examples somewhat but the core is taken from ip-cref.tex. If you have any questions or comments about the examples or statements in this section please direct them to Matthew. Note also that by the time you read this the ip command may have changed for 2.3/2.4. As it changes we will attempt to keep this document current.

IP Global Command Syntax

The generic form of the ip command is

ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]

OPTIONS:

OPTIONS is a multivalued set of modifiers that affect the general behaviour and output of the ip utility. All options begin with the "-" character and may be used both in long and abbreviated form. Currently the following options are available

-V, -Version --- print the version of the ip utility and exit.

-s, -stats, -statistics --- output more information.

This option may be repeated to increase the verbosity level of the output. As a rule the additional information is device or function statistics or values. In many cases the values output should be considered in the same sense as output from the /proc/ directory where the name of the value is not directly related to the value itself. See later when we run this option with different network device drivers.

-f, -family {inet, inet6, link} --- enforce which protocol family to use.

If this option is not present, the protocol family is guessed from the other command line arguments. If the rest of the command line does not provide sufficient information to guess a protocol family, the ip command falls back to a default family of inet in the case of network protocols or to any. Link is a special family identifier meaning that no networking protocol is involved. There are several shortcuts for this option, listed here:

-4 --- shortcut for -family inet.

-6 --- shortcut for -family inet6.

-0 --- shortcut for -family link.

-o, -oneline --- format the output records as single lines by replacing any line feeds with the "\" character.

This option provides a convenient method for sending the command output through a pipe, for example when you want to count the number of output records with wc or grep through the output. As of 1999-06-30 the IPROUTE2 utility package includes the trivial script rtpr to convert the output back to the original readable form.
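As a rough illustration of why single-line records are convenient, the sketch below works on a canned sample instead of live output so that it can be run anywhere. The sample text and the helper names count_records and unfold_records are illustrative, and the tr call only approximates what rtpr is assumed to do, namely turn each backslash back into a line break.

```shell
# Illustrative sample in the style of "ip -o link list": two records,
# each on one line, with "\" where the line break used to be.
sample='2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop\    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100\    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff'

# One line per record, so counting lines counts records.
count_records() { wc -l | tr -d ' '; }

# Restore the multi-line form by turning each "\" into a newline.
unfold_records() { tr '\\' '\n'; }

printf '%s\n' "$sample" | count_records          # number of devices
printf '%s\n' "$sample" | grep 'qdisc cbq'       # whole record matches on one line
printf '%s\n' "$sample" | unfold_records         # readable form again
```

On a live system the same pipelines would start with ip -o link list in place of the printf.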

-r, -resolve --- use system name resolution to output DNS names

Do not use this option if you are reporting bugs with the ip utility or querying for usage advice. ip itself never uses DNS to resolve names to addresses. This option exists for the administrator's convenience only.

OBJECT:

OBJECT is the object type on which you wish to operate or about which you wish to obtain information. The object types understood by the current ip utility are link, address, neighbour, route, rule, maddress, mroute, and tunnel.

link --- physical or logical network device.

address --- protocol (IPv4 or IPv6) address on a device.

neighbour --- ARP or NDISC cache entry.

route --- routing table entry.

rule --- rule in routing policy database.

maddress --- multicast address.

mroute --- multicast routing cache entry.

tunnel --- tunnel over IP.

The names of all of the objects may be written in full or abbreviated form. IE: address may be abbreviated as addr or just a. However if you use these commands within scripts you should make it a habit to always use the full specification of the action. Using the abbreviation makes it easy to use on the command line but hard to understand the logic within scripts. Since you may not be the only person who ever has to deal with your scripts then you should strive to make them as complete as possible.

COMMAND:

COMMAND specifies the action to perform on the object. The set of possible actions depends on the object type. Typically it is possible to add, delete, and show (list) the object(s), but some objects will not allow all of these operations and many have additional actions and commands. Note that the command syntax help which is available for all objects prints out the full list of available commands and argument syntax conventions. If no command is given a default command is assumed. The default command is usually show (list) but if the objects of the class cannot be listed then the default is to print out the command syntax help.

ARGUMENTS:

ARGUMENTS is the list of command options specific to the command. The arguments depend on the command and the object. There are two types of arguments that can be issued:

--- flags - which are abbreviated with a single keyword

--- parameters - consisting of a keyword followed by a value

Each command has a default parameter whose keyword may be omitted. IE: The dev parameter is the default for the ip link command, thus ip link list eth0 is equivalent to ip link list dev eth0. Within all the command descriptions below we distinguish default parameters with the marker (default).

As we mentioned above for the names of objects, all keywords may be abbreviated with the first or first few unique letters. These shortcuts are convenient when ip is used interactively, but they are not recommended for use in scripts and please do not use them when reporting bugs or asking for help. Officially allowed abbreviations are listed along with the first mention of the command.

Error Conditions

The ip command most commonly fails for the following reasons:

* Wrong command line syntax

This is often due to using an unknown keyword, a wrongly formatted IP address, wrong keyword argument for the command, etc. In this case the ip command exits without performing any actions and prints out an error message containing information about the reason for failure. In some cases it prints out the command syntax help.

* The arguments did not pass self-consistency verification

* ip failed to compile a kernel request from the arguments due to insufficient user provided information

* Kernel returned an error to a syscall. In this case ip prints the error message as it was output from perror(3), prefixed with a comment and the syscall identifier.

* Kernel returned an error to a RTNETLINK request. In this case ip prints the error message as it was output from perror(3) prefixed with "RTNETLINK answers".

Note that all ip command operations are atomic. This means that if the ip command fails it does not change anything in the system. One harmful exception is the ip link command which may change only part of the device parameters given on the command line. We will mention this again in the section on ip link usage and recommend that all ip link actions be performed individually. This is actually a preferred use for the ip command in general. If you need to perform many repetitions of the command, use a script or a script loop, so that any generated error message can be associated with the ip command action that caused it.

It is difficult to list all possible error messages especially the syntax errors. As a rule their meaning should be clear from the context of the command that was issued. For example if we issue the command ip link sub eth0 with the obvious misspelling of set then we get the error message "Command "sub" is unknown, try "ip link help"" which should prompt us to check our command syntax.
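Scripts that wrap ip can turn the messages quoted in this section into a hint about the likely cause. The function below is a minimal sketch: its name and the wording of the hints are illustrative, and it only matches the message fragments that this document itself quotes.

```shell
# Read an error message from ip on stdin and print a likely cause.
explain_ip_error() {
    msg=$(cat)
    case $msg in
        *"Cannot open netlink socket"*)
            echo "NETLINK is probably not configured in the kernel" ;;
        *"Cannot talk to rtnetlink"*|*"Cannot send dump request"*)
            echo "RTNETLINK is probably not configured in the kernel" ;;
        *"RTNETLINK answers"*)
            echo "the kernel rejected the RTNETLINK request" ;;
        *"is unknown, try"*)
            echo "command line syntax error" ;;
        *)
            echo "unrecognised error" ;;
    esac
}

# Example use (swaps stderr into the pipe and discards normal output):
#   ip link set eth0 up 2>&1 >/dev/null | explain_ip_error
```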

In using the ip command there are several facilities that need to be present in order for the command to perform its functions. The ip command talks to the kernel through the NETLINK interface. This is turned on by the NETLINK options which are enabled in the kernel compile. If the ip command does not work or you get an error message then you may not have the needed functions defined or your kernel is not the one you compiled. The most common mistakes are:

* NETLINK is not configured in the kernel. The error message is

"Cannot open netlink socket: Invalid value"

* RTNETLINK is not configured in the kernel.

In this case one of the following messages may be printed depending on the actual command issued:

"Cannot talk to rtnetlink: Connection refused"

"Cannot send dump request: Connection refused"

ip link - network device configuration

A link refers to a network device. The ip link object and the corresponding command set allow viewing and manipulating the state of network devices. The commands for the link object are just two, set and show.

ip link set --- change device attributes.

Abbreviations: set, s

Warning

You can request multiple parameter changes with ip link. If you request multiple parameter changes and any ONE change fails then ip aborts immediately after the failure thus the parameter changes previous to the failure have completed and are not backed out on abort. This is the only case where using the ip command can leave your system in an unpredictable state. The solution is to avoid changing multiple parameters with one ip link set call. Use as many individual ip link set commands as necessary to perform the actions you desire.
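One way to follow this advice in a script is to issue each change as its own ip link set call and stop at the first failure. The helper below is a sketch; the name set_link_params and its calling convention are illustrative, not part of IPROUTE2.

```shell
# Apply several "ip link set" changes one at a time and stop at the first
# failure, reporting exactly which change failed.
# Usage: set_link_params eth0 "mtu 1500" "txqueuelen 100" "up"
set_link_params() {
    dev=$1
    shift
    for change in "$@"; do
        # $change is deliberately unquoted so "mtu 1500" becomes two words.
        if ! ip link set dev "$dev" $change; then
            echo "failed: ip link set dev $dev $change" >&2
            return 1
        fi
    done
}
```

Because every change is a separate command, a failure leaves the earlier changes applied but tells you precisely where the sequence stopped, which a single multi-parameter call does not.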

Arguments:

* dev NAME (default) --- NAME specifies the network device to operate on

* up / down --- change the state of the device to UP or to DOWN

* arp on / arp off --- change NOARP flag status on the device

Note that this operation is not allowed if the device is already in the UP state. Since neither the ip utility nor the kernel check for this condition, you can get very unpredictable results changing the flag while the device is running. It is better to set the device down then issue this command.

* multicast on / multicast off --- change MULTICAST flag on the device.

* dynamic on / dynamic off --- change DYNAMIC flag on the device.

* name NAME --- change name of the device.

Note that this operation is not recommended if the device is running or has some addresses already configured. You can break your system's security and screw up other networking daemons and programs by changing the device name while the device is running or has addressing assigned.

* txqueuelen NUMBER / txqlen NUMBER --- change transmit queue length of the device

* mtu NUMBER --- change MTU of the device.

* address LLADDRESS --- change station address of the interface.

* broadcast LLADDRESS, brd LLADDRESS or peer LLADDRESS --- change link layer broadcast address or peer address in the case of a POINTOPOINT interface

Note that for most physical network devices (Ethernet, TokenRing, etc) changing the link layer broadcast address will break networking. Do not use this argument if you do not understand what this operation really does.

* The ip command does not allow changing the PROMISC or ALLMULTI flags as these flags are considered obsolete and should not be changed administratively.

Examples:

ip link set dummy address 00:00:00:00:00:01 --- change station address of the interface dummy.

ip link set dummy up --- start the interface dummy.
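The note on the arp argument suggests taking the device down before changing the flag. A minimal sketch of that sequence follows; the function name set_arp_flag is illustrative, and note that it always brings the device back up afterwards, which interrupts traffic on a live interface.

```shell
# Change the NOARP flag only while the device is down, then bring it up.
# Usage: set_arp_flag eth0 off
set_arp_flag() {
    dev=$1
    state=$2            # "on" or "off"
    ip link set dev "$dev" down || return 1
    ip link set dev "$dev" arp "$state" || return 1
    ip link set dev "$dev" up
}
```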

ip link show --- look at device attributes.

Abbreviations: show, list, lst, sh, ls, l

Arguments:

* dev NAME (default) --- NAME specifies network device to show.

If this argument is omitted, the command lists all the devices.

* up --- display only running interfaces.

Output:

kuznet@alisa:~ $ ip link ls dummy
2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

The number followed by a colon is the interface index or ifindex. This number uniquely identifies the interface. If you look at the output from cat /proc/net/dev you will see that the network devices are listed in the same order as the numbering you see here. After the ifindex is the interface name (eth0, sit0 etc.). The interface name is also unique at any given moment; however, an interface may disappear from the list, such as when the corresponding driver module is unloaded, and another interface with the same name may be created later. Additionally, with the ip link set DEVICE name NEWNAME command the system administrator may change the device's name.

The interface name may also have another name or the keyword NONE appended after an "@" sign. This signifies that this device is bound to another device in a master/slave device relationship. Thus packets sent through this device are encapsulated and forwarded on via the master device. If the name is NONE, then the master device is unknown.
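When a script needs the ifindex, name, and any name after the "@" sign, the header of each record can be split with awk. This is a sketch only: the function name link_names is illustrative, and the tunl1@NONE line in the sample is a made-up record used to exercise the "@" case.

```shell
# Read "ip -o link list" style output on stdin and print
# "ifindex name master" per record ("-" when there is no @master part).
link_names() {
    awk '{
        idx = $1;  sub(/:$/, "", idx)      # "3:"    becomes "3"
        name = $2; sub(/:$/, "", name)     # "eth0:" becomes "eth0"
        master = "-"
        at = index(name, "@")
        if (at > 0) {
            master = substr(name, at + 1)
            name = substr(name, 1, at - 1)
        }
        print idx, name, master
    }'
}

# Illustrative input; the second record is hypothetical.
printf '%s\n' \
    '3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100' \
    '5: tunl1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop' |
    link_names
```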

After the interface name we see the interface mtu (maximal transfer unit) which determines maximal size of data packet which can be sent as a single packet over this interface.

The qdisc (queuing discipline) shows which queuing algorithm is used on the interface. In particular the keyword noqueue means that this interface does not queue anything and the keyword noop indicates that the interface is in blackhole mode in which all of the packets sent to it are immediately discarded.

The qlen indicates the default transmit queue length of the device measured in packets.

Between the interface name and the mtu is a section within angle brackets. Within the angle brackets is where the interface flags are summarized. The most applicable flags are as follows:

UP --- this device is turned on, ready to accept packets for transmission onto the network and it may receive packets from other nodes on the network.

LOOPBACK --- the interface does not communicate with other hosts. All the packets which are sent through it will be returned back to the sender and nothing but bounced back packets can be received.

BROADCAST --- this device has the facility to send packets to all other hosts sharing the same physical link. Example: Ethernet

POINTOPOINT --- the network has only two ends with two nodes attached. All the packets sent to the link will reach the peer and all packets received originate from the peer.

If neither LOOPBACK nor BROADCAST nor POINTOPOINT are set, the interface is assumed to be a NBMA (Non-Broadcast Multi-Access) link. NBMA is the most generic type of device and also the most complicated type of device because a host attached to a NBMA link cannot send information to any other host without additional manually provided configuration information.

MULTICAST --- an advisory flag noting the interface is aware of multicasting. Broadcasting is a particular case of multicasting where the multicast group contains all of the nodes on the link as members. Note that software must NOT interpret the absence of this flag as the incapability of the interface to multicast. Any POINTOPOINT and BROADCAST link is multicasting by definition because we have direct access to all the link neighbours and thus to any particular group of them. The use of high bandwidth multicast transfers is not recommended on broadcast-only networks due to the high expenses associated with the transmission, but such use is not strictly prohibited.

PROMISC --- the device listens and feeds to the kernel all of the traffic on the link. This includes every packet on the network that passes our transceiver. Usually this mode exists only on broadcast links and is used by bridges and network monitoring devices.

ALLMULTI --- the device receives all multicast packets wandering on the link. This mode is used by multicast routers.

NOARP --- this flag is different from the other flags. It has no invariant value and its interpretation depends on network protocols involved. As a rule it indicates that the device does not need any address resolution and that the software or hardware knows how to deliver packets without any help from the protocol stacks.

DYNAMIC --- is an advisory flag marking this interface as dynamically created and destroyed.

SLAVE --- this interface is bonded to other interfaces in order to share link capacities.

Other flags do exist and can be seen within the angle brackets but they are either obsolete (NOTRAILERS), not implemented (DEBUG), or specific to certain devices (MASTER, AUTOMEDIA and PORTSEL). We will not discuss them here. Additionally the values of the PROMISC and ALLMULTI flags as shown by the ifconfig utility and by the ip utility are different. The ip link list command provides the current true device state, whereas ifconfig shows the flag state which was set through ifconfig itself.
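A script that needs to test for one of these flags can pick the angle bracket section out of the header line and compare whole flag names. The sketch below uses only shell parameter expansion; the function name link_has_flag is illustrative.

```shell
# Succeed if FLAG appears in the <...> flag list of an "ip link" header line.
# Usage: link_has_flag "$line" UP
link_has_flag() {
    line=$1
    flag=$2
    flags=${line#*<}        # drop everything up to and including "<"
    flags=${flags%%>*}      # drop ">" and everything after it
    case ",$flags," in
        *",$flag,"*) return 0 ;;
        *)           return 1 ;;
    esac
}

line='3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100'
link_has_flag "$line" UP    && echo "UP is set on eth0"
link_has_flag "$line" NOARP || echo "NOARP is not set on eth0"
```

Wrapping the list in commas before matching means a flag only matches as a whole name, never as part of a longer one.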

The second line of the output from the example contains information about the link layer addresses associated with the device. The first word (ether, sit) defines the interface hardware type which then determines the format and semantics of the addresses and thus logically is part of the address itself. The default format of station and broadcast addresses (or peer addresses for pointopoint links) is a sequence of hexadecimal bytes separated by colons. However some link types may instead have their own natural address formats which are used in the presentation. IE: The addresses of IP tunnels are printed as dotted-quad IP addresses. While NBMA links have no well-defined broadcast or peer address, this field may contain useful information such as the address of a broadcast relay or the address of an ARP server. Multicast addresses are not shown by this command, see ip maddr list output.

When given the option -statistics ip will print the interface statistics as additional information in the listing. Note that you can give this option multiple times with each repetition increasing the verbosity of output.

kuznet@alisa:~ $ ip -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172

The RX and TX lines summarize receiver and transmitter statistics. The information output breaks down into:

bytes --- total number of bytes received or transmitted on the interface.

This number wraps when the maximal length of the natural data type on the architecture is exceeded. In order to provide correct long term data from this output these statistics should be continuously monitored. Continuous monitoring of this data requires a user level daemon to sample the output periodically.

packets --- total number of packets received or transmitted on the interface.

errors --- total number of receiver or transmitter errors.

dropped --- total number of packets dropped because of lack of resources.

overrun --- total number of receiver overruns resulting in packet drops. As a rule, if the interface is overrun you have a serious problem: either something is wrong within the kernel or your machine is too slow to handle the speed of this interface.

mcast --- total number of received multicast packets. This statistic is supported only on certain devices.

carrier --- total number of link media failures such as those due to lost carrier.

collsns --- total number of collision events on Ethernet-like media. This number has different interpretations on other link types.

compressed --- total number of compressed packets. It is available only for links using VJ header compression.
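For the periodic sampling mentioned under bytes, the counters first have to be pulled out of the listing. The sketch below reads the statistics output on stdin and prints the received and transmitted byte counts; the function name link_bytes is illustrative, and it assumes the value line directly follows each header line as in the listing above.

```shell
# Print "rx_bytes tx_bytes" from "ip -s link list DEV" output on stdin.
# The value line is the one directly after each RX/TX header line.
link_bytes() {
    awk '
        $1 ~ /^RX:?$/ && $2 == "bytes" { getline; rx = $1 }
        $1 ~ /^TX:?$/ && $2 == "bytes" { getline; tx = $1 }
        END { print rx, tx }
    '
}

# Sample input taken from the eth0 listing in this section.
link_bytes <<'EOF'
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
EOF
```

A monitoring daemon would run something like ip -s link list eth0 | link_bytes at a fixed interval and account for counter wrap between samples itself.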

When you issue the -statistics option more than once you get additional output depending on the statistics supported by the device itself as in the following example with Ethernet:

kuznet@alisa:~ $ ip -s -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
    TX errors: aborted  fifo    window  heartbeat
               0        0       0       332

In this case the error names are pure Ethernetisms. Other devices may have non-zero fields in these positions but the headers are generated independently of the device responses. It is up to the device driver to send more appropriate error messages to the system logging facility such as is done by the TokenRing driver.

ip address - protocol address management

Abbreviations: address, addr, a

Arguments: add, delete, flush, show (list)

The address refers to a protocol (IP or IPv6) address attached to a network device. Each device must have at least one address in order to use the corresponding protocol. It is possible to have several different addresses attached to one device. These addresses are not discriminated within the protocol structure so that the term alias is not quite appropriate for such multiple addresses and we will not refer to this situation in those terms.

The ip addr command allows you to look at the addresses and their properties on an interface. You can add new addresses and delete old ones without regard to any ordering. Later on we will discuss the concept of primary and secondary addresses as applied to Linux.

ip address add --- add new protocol address.

Abbreviations: add, a

Arguments:

dev NAME --- name of the device to which we add the address

local ADDRESS (default) --- address of the interface.

The format of the address depends on the protocol. IPv4 uses dotted quad and IPv6 uses a sequence of hexadecimal halfwords separated by colons. The ADDRESS may be followed by a slash and a decimal number, which encodes network prefix (netmask) length in CIDR notation. If no CIDR netmask notation is specified then the command assumes a host (/32 mask) address is specified.

peer ADDRESS --- address of remote endpoint for pointopoint interfaces.

Again, the ADDRESS may be followed by a slash and decimal number, encoding the network prefix length. If a peer address is specified then the local address cannot have a network prefix length as the network prefix is associated with the peer rather than with the local address. In other words, netmasks can only be assigned to peer addresses when specifying both peer and local addresses.

broadcast ADDRESS --- broadcast address on the interface.

The special symbols "+" and "-" can be used instead of specifying the broadcast address. In this case the broadcast address is derived by either setting all of the interface host bits to one (+) or by setting all of the interface host bits to zero (-). In most modern implementations of IPv4 networking you will want to use the (+) setting. See the ipup init script in Chapter 15. Unlike ifconfig, the ip command does not set a broadcast address unless explicitly requested.

label NAME --- Each address may be tagged with a label string.

In order to preserve compatibility with Linux-2.0 net aliases, this string must coincide with the name of the device or must be prefixed with device name followed by a colon. (eth0:duh)

scope SCOPE_VALUE --- scope of the area within which this address is valid.

The available scopes are listed in the file /etc/iproute2/rt_scopes. The predefined scope values are:

global --- the address is globally valid.

site --- (IPv6 only) address is site local, valid only inside this site.

link --- the address is link local, valid only on this device.

host --- the address is valid only inside this host.

Examples:

ip addr add 127.0.0.1/8 dev lo brd + scope host

--- adds the usual loopback address to loopback device. The device must be enabled before this address will show up.

ip addr add 10.0.0.1/24 brd + dev eth0

--- adds address 10.0.0.1 with prefix length 24 (netmask 255.255.255.0) and standard broadcast to interface eth0
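To make the "+" shorthand for the broadcast argument concrete, the following sketch computes the address obtained by setting all host bits to one. It is plain shell arithmetic for illustration only, not code from the ip utility, and the function name ipv4_broadcast is made up.

```shell
# Compute the IPv4 address with all host bits of ADDRESS/PREFIXLEN set to
# one, which is what the "+" broadcast shorthand describes.
# Usage: ipv4_broadcast 10.0.0.1 24
ipv4_broadcast() {
    addr=$1
    plen=$2
    old_ifs=$IFS
    IFS=.
    set -- $addr            # split the dotted quad into $1..$4
    IFS=$old_ifs
    ip_int=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    host_mask=$(( (1 << (32 - $plen)) - 1 ))
    b=$(( $ip_int | $host_mask ))
    echo "$(( ($b >> 24) & 255 )).$(( ($b >> 16) & 255 )).$(( ($b >> 8) & 255 )).$(( $b & 255 ))"
}

ipv4_broadcast 10.0.0.1 24      # the eth0 example above
```

The "-" form is the mirror image: clear the host bits instead of setting them.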

ip address delete --- delete protocol address.

Abbreviations: delete, del, d

Arguments:

The arguments coincide with arguments of ip addr add. The device name is a required argument, the rest are optional. If no arguments are given, the first address listed is deleted.

Examples:

ip addr del 127.0.0.1/8 dev lo

--- deletes the loopback address from loopback device.

Alexey states:

"It would be better not to try to repeat this experiment 8-}"

Delete all IPv4 addresses on interface eth0:

while ip -f inet addr del dev eth0; do
    : nothing
done

Another method to disable IP on an interface using ip addr flush is discussed later.

ip address show --- look at protocol addresses.

Abbreviations: show, list, lst, sh, ls, l

Arguments:

dev NAME (default) --- name of the device.

scope SCOPE_VAL --- list only addresses with this scope.

to PREFIX --- list only addresses matching this prefix.

label PATTERN --- list only addresses with labels matching the PATTERN.

PATTERN is the usual shell style wildcard pattern.

dynamic / permanent --- (IPv6 only) list only addresses installed due to stateless address configuration or list only the permanent (not dynamic) addresses.

tentative --- (IPv6 only) list only addresses which did not pass duplicate address detection.

deprecated --- (IPv6 only) list only deprecated addresses.

primary / secondary --- list only primary (or secondary) addresses.

Example:

kuznet@alisa:~ $ ip addr ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
       valid_lft forever preferred_lft 604746sec
    inet6 fe80::2a0:ccff:fe66:1878/10 scope link

The first two lines coincide with the output of ip link list as it is only natural to interpret link layer addresses as being addresses of the protocol family AF_PACKET. The list of IPv4 and IPv6 addresses follows accompanied by additional attributes such as scope value, flags, and address label. Address flags are set by the kernel and cannot be changed administratively. Currently the following flags are defined:

secondary --- this address is not used when selecting the default source address for outgoing packets. An IP address becomes secondary if another address within the same prefix (network) already exists. The first address within the prefix is primary and is the tag address for the group of all the secondary addresses. When the primary address is deleted all of the secondaries are purged too. See the examples for the actual functionality of these steps.

dynamic --- the address was created due to stateless autoconfiguration. In this case the output also contains information on the times for which the address remains valid. After the preferred lifetime (preferred_lft) expires the address is moved to the deprecated state and after the valid lifetime (valid_lft) expires the address is finally invalidated.

deprecated --- the address is deprecated. It is still valid but cannot be used by newly created connections. See dynamic above.

tentative --- the address is not used because duplicate address detection is still not complete or has failed.
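The lifetime handling described for dynamic addresses can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not kernel code: the function name address_state and the convention that None stands for "forever" are our own assumptions, and the elapsed time is counted from the moment the lifetimes were last set.

```python
def address_state(elapsed_sec, preferred_lft, valid_lft):
    """Classify a dynamic address by the time elapsed since its lifetimes were set.

    preferred_lft and valid_lft are in seconds; None stands for 'forever'
    (an assumption of this sketch, mirroring 'valid_lft forever' in the output).
    """
    if valid_lft is not None and elapsed_sec >= valid_lft:
        return "invalid"        # valid lifetime expired: address is removed
    if preferred_lft is not None and elapsed_sec >= preferred_lft:
        return "deprecated"     # still valid, but not used for new connections
    return "preferred"


# Lifetimes taken from the example listing above.
print(address_state(0, 604746, None))
print(address_state(604746, 604746, None))
```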

IP Interface Primary and Secondary Addressing:

To explain the actual relationship between primary and secondary addresses we will run the following experiment.

ip addr add 10.1.1.1/24 dev dummy

ip addr add 10.1.1.2/24 dev dummy

Now look at the output:

ip addr list dummy


3: dummy: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop

link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

inet 10.1.1.1/24 scope global dummy

inet 10.1.1.2/24 scope global secondary dummy

Now add some addresses that still fall within that network but carry different prefix lengths (a /32 host address and a /25):

ip addr add 10.1.1.3/32 dev dummy

ip addr add 10.1.1.4/25 dev dummy

And run our list command:

ip addr list dummy


3: dummy: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop

link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

inet 10.1.1.1/24 scope global dummy

inet 10.1.1.3/32 scope global dummy

inet 10.1.1.4/25 scope global dummy

inet 10.1.1.2/24 scope global secondary dummy

And finally delete the primary address:

ip addr del 10.1.1.1/24 dev dummy

Run the list command:

ip addr list dummy


3: dummy: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop

link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

inet 10.1.1.3/32 scope global dummy

inet 10.1.1.4/25 scope global dummy

Note that the most important part of what we said above about secondary and primary addresses is the prefix (netmask) length. Even though technically you can consider the address 10.1.1.3 to belong within the network prefix 10.1.1.0/24, the actual prefix associated with the address is /32, so this address is treated independently of the initial primary address. If you are still uncertain why, sit down and calculate the networks and masks of the example above.

What we are showing here is that, unlike the horrid eth0:xx style aliasing of the 2.0 series kernels, multiple addresses on an interface are not necessarily related. So if you want to (and we will show an example in the howto section) you can enter all of your IP addresses without network masks and treat them completely independently.
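The experiment above can be replayed on paper with a short Python sketch. It is a model of the rule as described in this section, not the kernel's implementation: the function names classify and delete_address are hypothetical, and we assume that "same prefix" means the same network address with the same prefix length.

```python
import ipaddress


def classify(addresses):
    """Label each address 'primary' or 'secondary' in the order it was added."""
    seen_networks = set()
    roles = []
    for text in addresses:
        # 10.1.1.2/24 -> network 10.1.1.0/24; 10.1.1.3/32 -> network 10.1.1.3/32
        network = ipaddress.ip_interface(text).network
        if network in seen_networks:
            roles.append((text, "secondary"))
        else:
            seen_networks.add(network)
            roles.append((text, "primary"))
    return roles


def delete_address(addresses, victim):
    """Return the addresses left after deleting victim.

    Deleting a primary also purges every secondary in the same prefix.
    """
    roles = dict(classify(addresses))
    if roles[victim] == "secondary":
        return [a for a in addresses if a != victim]
    network = ipaddress.ip_interface(victim).network
    return [a for a in addresses
            if ipaddress.ip_interface(a).network != network]


added = ["10.1.1.1/24", "10.1.1.2/24", "10.1.1.3/32", "10.1.1.4/25"]
for address, role in classify(added):
    print(address, role)
print(delete_address(added, "10.1.1.1/24"))
```

Only 10.1.1.2/24 shares both network and prefix length with 10.1.1.1/24, so it is the only address that goes away together with the primary.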

ip address flush --- flush protocol addresses.

Abbreviations: flush, f

Arguments:

This command flushes protocol addresses selected by some criteria. It has the same arguments as show. The major difference is that this command will not run if no arguments are given; otherwise you could delete all of your addresses by mistake. This command (and the other flush commands described below) is very dangerous. If you make a mistake the command does not ask or forgive but really will cruelly purge all of your addresses. Be warned!

With the option -statistics the command becomes verbose and prints out the number of deleted addresses and number of processing rounds made in order to flush the address list. If the -statistics option is given twice then ip addr flush also dumps all of the deleted addresses in the full format as described in the ip addr list section.

Examples:

Delete all the addresses from private network 10.0.0.0/8:

netadm@amber~ # ip -stat -stat addr flush to 10/8

2: dummy inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy

3: eth0 inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0

4: eth1 inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1

***Round 1, deleting 3 addresses***

***Flush is complete after 1 round***



Another instructive example is deleting all IPv4 addresses from all Ethernet interfaces in the system:

netadm@amber~ # ip -4 addr flush label "eth*"

And the last example shows how to flush all the IPv6 addresses acquired by the host from stateless address autoconfiguration after enabling forwarding or disabling autoconfiguration.

netadm@amber~ # ip -6 addr flush dynamic
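Before running a flush it can help to work out which addresses a selector would hit. The following Python sketch models only the to and label selectors from the examples above. The function select_for_flush and the (label, address) pair layout are our own assumptions for illustration, and the prefix must be written out in full (10.0.0.0/8) because Python does not understand the 10/8 shorthand.

```python
import fnmatch
import ipaddress


def select_for_flush(addresses, to=None, label=None):
    """Return the (label, address) pairs that the given selectors would pick.

    Like the real command, refuse to do anything without a selector.
    """
    if to is None and label is None:
        raise ValueError("refusing to flush without a selector")
    selected = []
    for addr_label, text in addresses:
        ip = ipaddress.ip_interface(text).ip
        if to is not None and ip not in ipaddress.ip_network(to):
            continue
        # Shell-style pattern match on the address label, e.g. "eth*".
        if label is not None and not fnmatch.fnmatchcase(addr_label, label):
            continue
        selected.append((addr_label, text))
    return selected


configured = [
    ("dummy", "10.7.7.7/16"),
    ("eth0", "10.10.7.7/16"),
    ("eth1", "10.8.7.7/16"),
    ("eth0", "193.233.7.90/24"),
]
print(select_for_flush(configured, to="10.0.0.0/8"))
print(select_for_flush(configured, label="eth*"))
```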

ip neighbour --- neighbour/arp table management.

Abbreviations: neighbour, neighbor, neigh, n

The neighbour table objects establish bindings between protocol addresses and link layer addresses for hosts sharing the same physical link. Neighbour object entries are organized into tables. The IPv4 neighbour object table is known under another name as the ARP table. These commands allow you to look at the neighbour table bindings and their properties, to add new neighbour table entries, and to delete old ones.

Arguments:

add, change, replace, delete, flush and show (list)

ip neighbour add --- add new neighbour entry

ip neighbour change --- change existing entry

ip neighbour replace --- add new or change existing entry

Abbreviations: add, a; change, chg; replace, repl.

These commands create new neighbour records or update existing ones.

to ADDRESS (default) --- protocol address of the neighbour. It is either an IPv4 or IPv6 address.

dev NAME --- the interface to which this neighbour is attached

lladdr LLADDRESS --- link layer address of the neighbour. LLADDRESS can be null.

nud NUD_STATE --- state of the neighbour entry. nud is an abbreviation for "Neighbour Unreachability Detection". This state can take one of the following values:



permanent --- the neighbour entry is valid forever and can be removed only administratively.

noarp --- the neighbour entry is valid. No attempts to validate this entry will be made, but it can be removed when its lifetime expires.

reachable --- the neighbour entry is valid until reachability timeout expires.

stale --- the neighbour entry is valid, but suspicious. This option to ip neighbour does not change the neighbour state if the entry was valid and the address has not been changed by this command.

Examples:

ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm

--- add permanent ARP entry for neighbour 10.0.0.3 on the device eth0.

ip neigh chg 10.0.0.3 dev eth0 nud reachable

--- change its state to reachable.

ip neighbour delete --- delete neighbour entry.

Abbreviations: delete, del, d.

This command invalidates a neighbour entry.

The arguments are the same as with ip neigh add, except that lladdr and nud are ignored.

Example:

ip neigh del 10.0.0.3 dev eth0

--- invalidate ARP entry for neighbour 10.0.0.3 on the device eth0.

A deleted neighbour entry will not disappear from the tables immediately. If it is in use it cannot be deleted until the last client releases it; otherwise it will be destroyed during the next garbage collection.

WARNING!

Attempts to manually delete or change a noarp entry created by the kernel may result in unpredictable behaviour. More specifically, the kernel may start trying to resolve this address even on NOARP interfaces or change the address to multicast or broadcast.

ip neighbour show --- list neighbour entries.

Abbreviations: show, list, sh, ls.

This command displays the neighbour tables.

Arguments:

to ADDRESS (default) --- prefix selecting neighbours to list.

dev NAME --- list only neighbours attached to this device.

unused --- list only neighbours which are not currently in use.

nud NUD_STATE --- list only neighbour entries in this state. NUD_STATE takes values listed below after the example or the special value all, which means all the states. This option may occur more than once. If this option is absent, ip lists all the entries except for none and noarp.

Example:

kuznet@alisa~ $ ip neigh ls

:: dev lo lladdr 00:00:00:00:00:00 nud noarp

fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router nud stale

0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp

193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable

193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale

kuznet@alisa~ $

The first word of each line is the protocol address of the neighbour followed by the device name. The rest of the line describes the contents of neighbour entry identified by the pair (device, address).

lladdr is the link layer address of the neighbour.

nud is the state of "neighbour unreachability detection" for this entry. The full list of the possible nud states with minimal descriptions is:

none --- state of the neighbour is void.

incomplete --- the neighbour is in the process of resolution.

reachable --- the neighbour is valid and apparently reachable.

stale --- the neighbour is valid but is probably already unreachable, so the kernel will try to check it at the first transmission.

delay --- a packet has been sent to the stale neighbour and the kernel is waiting for confirmation.

probe --- the delay timer expired but no confirmation was received. The kernel has started to probe the neighbour with ARP/NDISC messages.

failed --- resolution has failed.

noarp --- the neighbour is valid, no attempts to check the entry will be made.

permanent --- this is a noarp entry, but only the administrator may remove the entry from the neighbour table.

Link layer address is valid in all the states except for none, failed and incomplete.

IPv6 neighbours can be marked with the additional flag router, which means that the neighbour introduced itself as an IPv6 router.

The option -statistics provides some usage statistics:

kuznet@alisa~ $ ip -s n ls 193.233.7.254

193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \

nud reachable

kuznet@alisa~ $

Here ref is the number of users of this entry, and used is a triplet of time intervals in seconds separated by slashes. The triplet of numbers is coded as {used/confirmed/updated}. In this example they show that

The entry was used 12 seconds ago.

The entry was confirmed 13 seconds ago.

The entry was updated 20 seconds ago.
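When such lines are processed by a script, the fields can be picked apart mechanically. The sketch below is a minimal Python parser for the line format exactly as printed in this section (address first, then keyword/value pairs, the bare flag router, and the used triplet); the function name parse_neigh and the dictionary keys are our own choices, and the output of other ip versions may differ from this layout.

```python
def parse_neigh(line):
    """Parse one neighbour line in the format shown in this section."""
    # A trailing backslash only marks a continued line in the listing.
    tokens = line.replace("\\", " ").split()
    entry = {"to": tokens[0], "router": False}
    i = 1
    while i < len(tokens):
        key = tokens[i]
        if key == "router":              # flag without a value
            entry["router"] = True
            i += 1
        elif key == "used":              # used/confirmed/updated, in seconds
            used, confirmed, updated = (int(n) for n in tokens[i + 1].split("/"))
            entry.update(used=used, confirmed=confirmed, updated=updated)
            i += 2
        elif key == "ref":
            entry["ref"] = int(tokens[i + 1])
            i += 2
        else:                            # dev, lladdr, nud
            entry[key] = tokens[i + 1]
            i += 2
    return entry


sample = ("193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 "
          "ref 5 used 12/13/20 \\ nud reachable")
print(parse_neigh(sample))
```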

ip neighbour flush --- flush neighbour entries.

Abbreviations: flush, f.

This command flushes the neighbour tables. Entries may be selected to flush by various criteria.

This command has the same arguments as show. Note that it will not run when no arguments are given, and that the default neighbour states to be flushed do not include permanent or noarp.

With the option -statistics the command becomes verbose and prints out the number of deleted neighbours and number of rounds made in flushing the neighbour table. If the option is given twice, ip neigh flush also dumps all the deleted neighbours in the format described in the previous subsection.

netadm@alisa~ # ip -s -s n f 193.233.7.254

193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \

nud reachable

***Round 1, deleting 1 entries***

***Flush is complete after 1 round***
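The selection rules for ip neigh flush can be modelled as follows. This Python sketch is only our reading of the description above: flush_candidates and the dictionary layout of an entry are hypothetical, and only the to, dev and nud selectors are modelled.

```python
import ipaddress

# States that a flush leaves alone unless they are asked for explicitly.
PROTECTED = {"permanent", "noarp"}


def flush_candidates(entries, to=None, dev=None, nud=None):
    """Return the neighbour entries that a flush with these selectors would hit."""
    if to is None and dev is None and nud is None:
        raise ValueError("flush needs at least one argument")
    wanted = set(nud or ())
    selected = []
    for entry in entries:
        if to is not None and (ipaddress.ip_address(entry["to"])
                               not in ipaddress.ip_network(to)):
            continue
        if dev is not None and entry["dev"] != dev:
            continue
        if wanted:
            # Explicit nud selectors; the special value "all" matches any state.
            if "all" not in wanted and entry["nud"] not in wanted:
                continue
        elif entry["nud"] in PROTECTED:
            continue
        selected.append(entry)
    return selected


table = [
    {"to": "193.233.7.254", "dev": "eth0", "nud": "reachable"},
    {"to": "193.233.7.85", "dev": "eth0", "nud": "stale"},
    {"to": "0.0.0.0", "dev": "lo", "nud": "noarp"},
]
print(flush_candidates(table, to="193.233.7.254"))
print(flush_candidates(table, dev="lo"))
```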



ip route - routing table management.

Abbreviations: route, ro, r.

This command manages the route entries within the kernel routing tables. The kernel routing tables keep information about protocol paths to other networked nodes.

Each route entry has a key consisting of the protocol prefix, which is the pairing of the network address and network mask length, and optionally the Type of Service (TOS) value. An IP packet matches the route if the highest bits of the packet's destination address are equal to the route prefix at least up to the prefix length and if the TOS of the route is zero or equal to the TOS of the packet.

If several routes match the packet, the following pruning rules are used to select the best one:

1. The longest matching prefix is selected, all shorter ones are dropped.

2. If the TOS of some route with the longest prefix is equal to TOS of the packet then routes with different TOS are dropped.

3. If no exact TOS match was found and routes with TOS=0 exist, the rest of the routes are pruned. Otherwise the route lookup fails.

4. If several routes remain after steps 1-3 have been tried then routes with the best preference value are selected.

5. If we still have several routes then the first of them is selected.

Note the ambiguity of action 5. Unfortunately, Linux historically allowed such a bizarre situation. The sense of the word "the first" depends on the literal order in which the routes were added to the routing table and it is practically impossible to maintain a bundle of such routes in any such order.

For simplicity we will limit ourselves to the case wherein such a situation is impossible and routes are uniquely identified by the triplet of {prefix, tos, preference}. Using the ip command for route creation and manipulation makes it impossible to create such non-unique routes.

One useful exception to this rule is the default route on non-forwarding hosts. It is "officially" allowed to have several fallback routes in cases when several routers are present on directly connected networks. In this case Linux performs "dead gateway detection", as controlled by neighbour unreachability detection and references from the transport protocols, to select the working router, so the ordering of the routes is not essential. However, in this specific case it is not recommended that you manually fiddle with default routes but instead use the Router Discovery protocol. Actually Linux IPv6 does not even allow user level applications access to default routes.

Of course the route selection steps above are not performed in exactly this sequence. The routing table in the kernel is kept in a data structure which allows achieving the final result with minimal cost. Without depending on any particular routing algorithm implemented in the kernel we can summarize the sequence above as: a route is identified by the key triplet {prefix,tos,preference}, which uniquely locates the route in the routing table.
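To make the pruning rules concrete, here is a literal transcription of the matching rule and steps 1-5 into Python. It is a sketch for reasoning about route tables on paper, not a description of the kernel's data structures. The Route tuple and select_route are hypothetical names, and we assume that the "best" preference is the numerically lowest one.

```python
import ipaddress
from collections import namedtuple

Route = namedtuple("Route", "prefix tos preference target")


def select_route(routes, dst, tos=0):
    """Pick a route for destination dst and packet TOS, following steps 1-5."""
    dst = ipaddress.ip_address(dst)
    # A route matches if dst lies within its prefix and the route TOS is
    # zero or equal to the packet TOS.
    matching = [r for r in routes
                if dst in ipaddress.ip_network(r.prefix) and r.tos in (0, tos)]
    if not matching:
        return None
    # 1. Keep only the longest matching prefix.
    longest = max(ipaddress.ip_network(r.prefix).prefixlen for r in matching)
    matching = [r for r in matching
                if ipaddress.ip_network(r.prefix).prefixlen == longest]
    # 2./3. Prefer an exact TOS match, otherwise fall back to TOS=0 routes.
    exact = [r for r in matching if r.tos == tos]
    matching = exact or [r for r in matching if r.tos == 0]
    if not matching:
        return None
    # 4. Keep the best preference (assumed here: the lowest number).
    best = min(r.preference for r in matching)
    matching = [r for r in matching if r.preference == best]
    # 5. If several routes still remain, the first one wins.
    return matching[0]


routes = [
    Route("0.0.0.0/0", 0, 0, "gw-default"),
    Route("10.0.0.0/8", 0, 0, "gw-a"),
    Route("10.1.0.0/16", 0, 10, "gw-b"),
    Route("10.1.0.0/16", 0, 5, "gw-c"),
    Route("10.1.0.0/16", 0x10, 20, "gw-d"),
]
print(select_route(routes, "10.1.2.3").target)
print(select_route(routes, "10.1.2.3", tos=0x10).target)
```

Note that the exact TOS match wins over a better preference value, because the TOS pruning happens before preferences are compared.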

Route attributes: Each route key refers to a routing information record. The routing information record contains the data required to deliver IP packets, such as output device and next hop router, and additional optional attributes, such as path MTU or the preferred source address for communicating to that destination.

Route types: It is important that the set of required and optional attributes depends on the route type. The most important route type is a unicast route, which describes real paths to other hosts. As a general rule, common routing tables only contain unicast routes. However, other route types with different semantics do exist. The full list of types understood by the Linux 2.2 kernel is:

unicast --- the route entry describes real paths to the destinations covered by route prefix.

unreachable --- these destinations are unreachable; packets are discarded and the ICMP message host unreachable (ICMP Type 3 Code 1) is generated. The local senders get error EHOSTUNREACH.

blackhole --- these destinations are unreachable; packets are silently discarded. The local senders get error EINVAL.

prohibit --- these destinations are unreachable; packets are discarded and the ICMP message communication administratively prohibited (ICMP Type 3 Code 13) is generated. The local senders get error EACCES.

local --- the destinations are assigned to this host, the packets are looped back and delivered locally.

broadcast --- the destinations are broadcast addresses, the packets are sent as link broadcasts.

throw --- special control route used together with policy rules. If a throw route is selected then lookup in this particular table is terminated pretending that no route was found. Without any policy routing it is equivalent to the absence of the route in the routing table, the packets are dropped and ICMP message net unreachable (ICMP Type 3 Code 0) is generated. The local senders get error ENETUNREACH.

nat --- special NAT route. Destinations covered by the prefix are considered as dummy (or external) addresses, which require translation to real (or internal) ones before forwarding. The addresses to translate to are selected with the attribute via.

anycast --- (not implemented) the destinations are anycast addresses assigned to this host. They are mainly equivalent to local addresses with the difference that such addresses are invalid to be used as the source address of any packet.

multicast --- special type, used for multicast routing. It is not present in normal routing tables.
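For quick reference, the error behaviour of the non-forwarding route types listed above can be collected into one small table. The Python sketch below merely restates the list; ROUTE_TYPE_ERRORS and local_error are our own names, and the throw entry applies only when no policy rule leads the lookup to another table.

```python
import errno

# route type -> ((ICMP type, ICMP code) sent to remote senders, errno for local senders)
ROUTE_TYPE_ERRORS = {
    "unreachable": ((3, 1), errno.EHOSTUNREACH),   # host unreachable
    "blackhole": (None, errno.EINVAL),             # silently discarded
    "prohibit": ((3, 13), errno.EACCES),           # administratively prohibited
    "throw": ((3, 0), errno.ENETUNREACH),          # net unreachable
}


def local_error(route_type):
    """Return the errno a local sender gets, or None for forwarding types."""
    entry = ROUTE_TYPE_ERRORS.get(route_type)
    return entry[1] if entry else None


for name, (icmp, err) in ROUTE_TYPE_ERRORS.items():
    print(name, icmp, errno.errorcode[err])
```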

Route tables: Linux can place routes within multiple routing tables identified by a number in the range from 1 to 255 or by a name taken from the file /etc/iproute2/rt_tables. By default all normal routes are inserted into the table main (ID 254) and the kernel uses only this table when calculating routes.

Actually another routing table always exists which is invisible but even more important. It is the local table (ID 255). This table consists of routes for local and broadcast addresses. The kernel maintains this table automatically and administrators should not modify it and do not even need to look at it in normal operation.

The multiple routing tables come into play when policy routing is used. In policy routing the routing table identifier effectively becomes one more parameter added to the key triplet {prefix,tos,preference}. Thus under policy routing the route is obtained by {tableid,key triplet}, identifying the route uniquely. So you can have several identical routes in different tables that will not conflict as we had mentioned above in the description of "the first" mechanism.
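One way to picture the extended key is as a dictionary per table, each indexed by the triplet. The Python sketch below is purely illustrative: the class RoutingTables and its method are hypothetical, and it models only the uniqueness property, not lookups.

```python
class RoutingTables:
    """Routes keyed by {tableid, {prefix, tos, preference}}."""

    def __init__(self):
        self.tables = {}          # table id -> {key triplet: route info}

    def add(self, table, prefix, tos, preference, info):
        routes = self.tables.setdefault(table, {})
        key = (prefix, tos, preference)
        if key in routes:
            # The same triplet may not appear twice within one table.
            raise ValueError("route already exists in table %s" % table)
        routes[key] = info


rt = RoutingTables()
rt.add(254, "10.0.0.0/24", 0, 0, "via 193.233.7.65")   # table main
rt.add(100, "10.0.0.0/24", 0, 0, "dev dummy")          # same key, other table
print(sorted(rt.tables))
```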

ip route add --- add new route

ip route change --- change route

ip route replace --- change route or add new one.

Abbreviations: add, a; change, chg; replace, repl.

Arguments:

to PREFIX or to TYPE PREFIX (default) --- destination prefix of the route. If TYPE is omitted, ip assumes type unicast. Other values of TYPE are listed above. PREFIX is an IPv4 or IPv6 address optionally followed by a slash and the prefix length. If the length of the prefix is missing, ip assumes a full-length host route. There is also one special PREFIX --- default --- which is equivalent to IP 0/0 or to IPv6 ::/0.

tos TOS or dsfield TOS --- Type Of Service (TOS) key. This key has no mask associated with it, and the longest match is understood as follows: first compare the TOS of the route and of the packet; if they are not equal, the packet may still match a route with zero TOS. TOS is either an 8-bit hexadecimal number or an identifier from /etc/iproute2/rt_dsfield.

metric NUMBER or preference NUMBER --- preference value of the route. NUMBER is an arbitrary 32-bit number.

table TABLEID --- the table to add this route to. TABLEID may be a number or a string from the file /etc/iproute2/rt_tables. If this parameter is omitted, ip assumes table main, with the exception of local, broadcast and nat routes, which are put into table local by default.

dev NAME --- the output device name.

via ADDRESS --- the address of the nexthop router. Actually, the sense of this field depends on the route type. For normal unicast routes it is either the true nexthop router or, if it is a direct route installed in BSD compatibility mode, a local address of the interface. For NAT routes it is the first address of the block of translated IP destinations.

src ADDRESS --- the source address to prefer using when sending to the destinations covered by route prefix. This address must be defined on a local machine interface. This will come into play when routes and rules are combined with the masquerade rules of the ipchains firewall we discuss later.

realm REALMID --- the realm which this route is assigned to. REALMID may be a number or a string from the file /etc/iproute2/rt_realms.

mtu MTU or mtu lock MTU --- the MTU along the path to destination. If modifier lock is not used, MTU may be updated by the kernel due to Path MTU Discovery. If the modifier lock is used then no path MTU discovery will be performed and all the packets will be sent without the DF bit set for the IPv4 case or fragmented to the MTU for the IPv6 case.

window NUMBER --- the maximal advertised window for TCP to these destinations measured in bytes. This parameter limits the maximal data bursts our TCP peers are allowed to send to us.

rtt NUMBER --- the initial RTT ("Round Trip Time") estimate. Actually, in Linux 2.2 and 2.0 it is not RTT but the initial TCP retransmission timeout. The kernel forgets it as soon as it receives the first valid ACK from the peer. Alas, this means that this attribute affects only the connection retry rate and is hence useless.

nexthop NEXTHOP --- nexthop of multipath route. NEXTHOP is a complex value with its own syntax as follows:

via ADDRESS is nexthop router.


dev NAME is output device.


weight NUMBER is the weight of this element of the multipath route, reflecting its relative bandwidth or quality.


scope SCOPE_VAL --- scope of the destinations covered by the route prefix. SCOPE_VAL may be a number or a string from the file /etc/iproute2/rt_scopes. If this parameter is omitted, ip assumes scope global for all gatewayed unicast routes, scope link for direct unicast routes and broadcasts and scope host for local routes.

protocol RTPROTO --- routing protocol identifier of this route. RTPROTO may be a number or a string from the file /etc/iproute2/rt_protos. If the routing protocol ID is not given, ip assumes the protocol is boot, i.e. it assumes that the route has been added by someone who does not understand what they are doing. Several of these protocol values have a fixed interpretation:

redirect --- route was installed due to ICMP redirect.


kernel --- route was installed by the kernel during autoconfiguration.


boot --- route was installed during the bootup sequence. If a routing daemon starts, it will purge all of them. This is the value assigned to manually inserted routes that do not have a protocol specified.


static --- route was installed by administrator to override dynamic routing. Routing daemon(s) will respect them and advertise them if it is so configured.


ra --- route was installed by Router Discovery protocol.


Note that the rest of the values are not reserved and the administrator is free to assign or not assign protocol tags. Routing daemons should at least take care to set unique protocol values for themselves, such as those assigned in rtnetlink.h or in the rt_protos database.


onlink --- pretend that the nexthop is directly attached to this link, even if it does not match any interface prefix. One application of this option may be found in ip tunnels between dissimilar addresses.

equalize --- allow packet by packet randomization on multipath routes. Without this modifier the route will be frozen to one selected nexthop, so that load splitting will occur only on a per-flow basis. Equalize works only if the appropriate kernel configuration option is chosen or if the kernel is patched.

Two more commands, prepend and append, also exist. Prepend does the same thing as the classic route add command, adding the route even if another route to the same destination already exists. The opposite case is append, which adds the route to the end of the list. We strongly recommend that you avoid using these commands.

Unfortunately, IPv6 currently only understands the append command correctly; all the rest of the set are translated to append. Certainly, this will change in the future.

Examples:

Add a plain route to network 10.0.0/24 via gateway 193.233.7.65

ip route add 10.0.0/24 via 193.233.7.65

Change it to a direct route via device dummy

ip ro chg 10.0.0/24 dev dummy

Add default multipath route splitting load between ppp0 and ppp1

ip route add default scope global nexthop dev ppp0 nexthop dev ppp1

Note the scope value which is not necessary but prompts the kernel that this route is gatewayed rather than direct. Actually, if you know the addresses of the remote endpoints it would be better to specify them using the parameter via.

NAT the address 192.203.80.142 to 193.233.7.83 before forwarding

ip route add nat 192.203.80.142 via 193.233.7.83

Note that the reverse NAT translation is set up with policy rules as described in the policy routing section.
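The examples above rely on the abbreviated PREFIX notation (10.0.0/24, default). The following Python sketch shows how we read that notation for IPv4 only. The function expand_prefix is hypothetical, and the padding of missing trailing octets with zeros is an assumption drawn from this document's own equivalence of 10/8 and 10.0.0.0/8.

```python
import ipaddress


def expand_prefix(text):
    """Expand an abbreviated IPv4 PREFIX into a full network (sketch only)."""
    if text == "default":
        return ipaddress.ip_network("0.0.0.0/0")
    address, slash, length = text.partition("/")
    octets = address.split(".")
    octets += ["0"] * (4 - len(octets))   # 10.0.0 -> 10.0.0.0 (assumed padding)
    if not slash:
        length = "32"                     # no length given: full-length host route
    return ipaddress.ip_network("%s/%s" % (".".join(octets), length))


for prefix in ("10.0.0/24", "10/8", "default", "193.233.7.65"):
    print(prefix, "->", expand_prefix(prefix))
```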

ip route delete

Abbreviations: delete, del, d.

ip route del has the same arguments as ip route add but their semantics are a bit different.

Key values (dest, tos, preference and table) select the route to delete. If any optional attributes are present, ip verifies that they coincide with attributes of the route to delete. If no route with given key and attributes is found then ip route del fails.

Linux kernel 2.0 had the ability to delete a route selected only by the prefix address while ignoring its netmask. This option does not exist anymore due to the ambiguous nature of the selection. If you wish to have such functionality then look at the ip route flush command which provides a richer set of capabilities.

Examples:

Delete the multipath route created in the earlier add example

ip route del default scope global nexthop dev ppp0 nexthop dev ppp1

ip route show

Abbreviations: show, list, sh, ls, l.

This format of the command allows viewing the routing tables contents and looking at route(s) as selected by some criteria.

Arguments:

to SELECTOR (default) --- select routes only from the given range of destinations. SELECTOR has optional modifiers (root, match or exact) and a prefix.

root PREFIX selects routes with prefixes not shorter than PREFIX. For example, root 0/0 selects the entire routing table.

match PREFIX selects routes with prefixes not longer than PREFIX. match 10.0/16 selects 10.0/16, 10/8 and 0/0, but it does not select 10.1/16 and 10.0.0/24.

exact PREFIX (or just PREFIX) selects routes exactly with this prefix.

If none of these options are present then the ip command assumes root 0/0 which lists the entire table.

tos TOS or dsfield TOS --- Select only routes with given TOS.

table TABLEID --- Show routes from this table(s). Default setting is to show table main (ID 254). TABLEID may be either ID of a real table or one of the special values:



all --- list all the tables.



cache --- dump routing cache.



IPv6 has only a single table, however splitting into main, local, and cache is emulated by the ip utility.



cloned or cached --- list cloned routes which are routes dynamically forked off of other routes because some route attribute (like MTU) was updated. It is equivalent to table cache.



from SELECTOR --- the same syntax as to SELECTOR but bounds the source address range rather than the destination. Note that the from option only works with cloned routes.

protocol RTPROTO --- list only routes of this protocol.

scope SCOPE_VAL --- list only routes with this scope.

type TYPE --- list only routes of this type.

dev NAME --- list only routes going via this device.

via PREFIX --- list only routes going via the nexthop routers selected by PREFIX.

src PREFIX --- list only routes with preferred source addresses selected by PREFIX.

realm REALMID or realms FROMREALM/TOREALM --- list only routes with these realms.
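The three modifiers of the to SELECTOR argument are easy to confuse, so here is a small Python model of them. It is a sketch of the definitions given above, with a hypothetical function name selects, and it needs Python 3.7 or later for subnet_of. Prefixes are written out in full because Python does not accept the 10.0/16 shorthand.

```python
import ipaddress


def selects(modifier, selector, route_prefix):
    """Tell whether 'to MODIFIER selector' would list a route with route_prefix."""
    sel = ipaddress.ip_network(selector)
    route = ipaddress.ip_network(route_prefix)
    if modifier == "exact":
        return route == sel
    if modifier == "root":
        # Route prefix lies within the selector: not shorter than it.
        return route.subnet_of(sel)
    if modifier == "match":
        # Route prefix covers the selector: not longer than it.
        return sel.subnet_of(route)
    raise ValueError("unknown modifier: %s" % modifier)


table = ["0.0.0.0/0", "10.0.0.0/8", "10.0.0.0/16", "10.1.0.0/16", "10.0.0.0/24"]
print([p for p in table if selects("match", "10.0.0.0/16", p)])
print([p for p in table if selects("root", "10.0.0.0/16", p)])
```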



Using this command is best explained by running through some examples.

Example: Let us count the routes of protocol gated/bgp on a router

kuznet@amber~ $ ip route list proto gated/bgp | wc

1413 9891 79010

kuznet@amber~ $

To count the size of the routing cache we have to use the option -o, because cached attributes can take more than one line of output

kuznet@amber~ $ ip -o route list cloned | wc

159 2543 18707

kuznet@amber~ $

The output of this command consists of per route records separated by line feeds. However, some records may consist of more than one line, particularly when the route is cloned or you have requested additional statistics. If the option -o is given, then line feeds separating lines inside records are replaced with a backslash sign.

The output has the same syntax as arguments given to ip route add, so that it can be understood easily.

kuznet@amber~ $ ip route list 193.233.7/24

193.233.7.0/24 dev eth0 proto gated/conn scope link \

src 193.233.7.65 realms inr.ac

kuznet@amber~ $

If you list cloned entries the output contains other attributes, which are evaluated during route calculation and updated during the route lifetime. An example of the output is:

kuznet@amber~ $ ip route list 193.233.7.82 table cache</