Linux implements the Internet Protocol, version 4,
described in RFC 791 and RFC 1122.
ip contains a level 2
multicasting implementation conforming to RFC 1112.
It also contains an IP router including a packet filter.
The programming interface is BSD sockets compatible.
For more information on sockets, see
socket(7).
An IP socket is created by calling the
socket(2)
function as
socket(PF_INET, socket_type, protocol). Valid socket types are
SOCK_STREAM to open a
tcp(7)
socket,
SOCK_DGRAM to open a
udp(7)
socket, or
SOCK_RAW to open a
raw(7)
socket to access the IP protocol directly.
protocol is the IP protocol in the IP header to be received or sent.
The only valid values for
protocol are
0 and
IPPROTO_TCP for TCP sockets and
0 and
IPPROTO_UDP for UDP sockets. For
SOCK_RAW you may specify
a valid IANA IP protocol defined in
RFC 1700
assigned numbers.
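The socket(2) call forms described above can be sketched as follows. This is a minimal illustration, not part of the original page; the helper names open_ipv4_socket(), open_tcp_socket() and socket_type_of() are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Open an IPv4 socket of the given type (SOCK_STREAM or SOCK_DGRAM).
 * A protocol of 0 lets the kernel pick the protocol matching the type. */
int open_ipv4_socket(int socket_type)
{
    return socket(PF_INET, socket_type, 0);
}

/* Same as above for TCP, but naming the protocol explicitly. */
int open_tcp_socket(void)
{
    return socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
}

/* Return the type of an open socket (e.g. SOCK_STREAM), or -1 on error. */
int socket_type_of(int fd)
{
    int type = 0;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}
```

Each helper returns a file descriptor or -1 with errno set, exactly as socket(2) does. SOCK_RAW is left out because it normally requires privileges.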
When a process wants to receive new incoming packets or connections, it
should bind a socket to a local interface address using
bind(2).
Only one IP socket may be bound to any given local (address, port) pair.
When
INADDR_ANY is specified in the bind call the socket will be bound to
all local interfaces. When
listen(2)
or
connect(2)
is called on an unbound socket, it is automatically bound to a
random free port with the local address set to
INADDR_ANY.
A TCP local socket address that has been bound is unavailable for
some time after closing,
unless the
SO_REUSEADDR flag has been set. Care should be taken when using this flag as it
makes TCP less reliable.
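As a rough sketch of the binding rules above, the following code binds a TCP socket to INADDR_ANY, optionally with SO_REUSEADDR, and reads back the port the kernel chose. The helper names bind_any() and local_port() are hypothetical.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to INADDR_ANY on the given port (host byte
 * order; 0 asks the kernel for a free port). Returns the fd, or -1. */
int bind_any(unsigned short port, int reuse_addr)
{
    struct sockaddr_in sa;
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    if (reuse_addr) {
        int on = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
            close(fd);
            return -1;
        }
    }

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the local port of a bound socket in host byte order, or -1. */
int local_port(int fd)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);

    if (getsockname(fd, (struct sockaddr *)&sa, &len) < 0)
        return -1;
    return ntohs(sa.sin_port);
}
```

Calling bind_any() a second time on a port that is already bound by a socket without SO_REUSEADDR is expected to fail with EADDRINUSE.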
An IP socket address is defined as a combination of an IP interface
address and a 16-bit port number.
The basic IP protocol does not supply port numbers; they
are implemented by higher-level protocols like
udp(7)
and
tcp(7).
On raw sockets
sin_port is set to the IP protocol.
struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    u_int16_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};
/* Internet address. */
struct in_addr {
    u_int32_t      s_addr;     /* address in network byte order */
};
sin_family is always set to
AF_INET. This is required; in Linux 2.2 most networking functions return
EINVAL when this setting is missing.
sin_port contains the port in network byte order.
The port numbers below 1024 are called
reserved ports. Only privileged processes (i.e., those having the
CAP_NET_BIND_SERVICE capability) may
bind(2)
to these ports.
Note that the raw IPv4 protocol as such has no concept of a
port; ports are implemented only by higher-level protocols like
tcp(7)
and
udp(7).
sin_addr is the IP host address.
The
s_addr member of
struct in_addr contains the host interface address in network byte order.
in_addr should be assigned one of the INADDR_* values (e.g., INADDR_ANY)
or set using the
inet_aton(3),
inet_addr(3),
inet_makeaddr(3)
library functions or directly with the name resolver (see
gethostbyname(3)).
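Filling in a struct sockaddr_in as described above might look like the sketch below. The helper name make_sockaddr_in() is hypothetical; the _DEFAULT_SOURCE define is only an assumption to keep inet_aton(3) visible under strict compiler modes.

```c
#define _DEFAULT_SOURCE         /* assumption: exposes inet_aton() in glibc */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Fill *sa from a dotted-quad string and a port in host byte order.
 * Returns 0 on success, -1 if the address string is not valid. */
int make_sockaddr_in(const char *dotted, unsigned short port,
                     struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;       /* always AF_INET for IPv4 */
    sa->sin_port = htons(port);     /* stored in network byte order */

    /* inet_aton() returns 0 when the string cannot be parsed. */
    if (inet_aton(dotted, &sa->sin_addr) == 0)
        return -1;
    return 0;
}
```

The resulting structure can be passed, cast to struct sockaddr *, to bind(2), connect(2) or sendto(2).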
IPv4 addresses are divided into unicast, broadcast
and multicast addresses.
Unicast addresses specify a single interface of a host,
broadcast addresses specify all hosts on a network and multicast
addresses address all hosts in a multicast group.
Datagrams to broadcast addresses can only be sent or received when the
SO_BROADCAST socket flag is set.
In the current implementation connection oriented sockets are only allowed
to use unicast addresses.
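A minimal sketch of toggling the SO_BROADCAST flag mentioned above follows. The helper names set_broadcast() and get_broadcast() are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Enable (non-zero) or disable (zero) SO_BROADCAST on a socket.
 * Returns 0 on success, -1 on error. */
int set_broadcast(int fd, int enable)
{
    return setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &enable, sizeof(enable));
}

/* Return 1 if SO_BROADCAST is set, 0 if it is clear, -1 on error. */
int get_broadcast(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, SO_BROADCAST, &value, &len) < 0)
        return -1;
    return value != 0;
}
```

The flag is a socket-level option (SOL_SOCKET), not an IPPROTO_IP option, and is off on a newly created socket.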
Note that the address and the port are always stored in
network byte order.
In particular, this means that you need to call
htons(3)
on the number that is assigned to a port. All address/port manipulation
functions in the standard library work in network byte order.
There are several special addresses:
INADDR_LOOPBACK (127.0.0.1)
always refers to the local host via the loopback device;
INADDR_ANY (0.0.0.0)
means any address for binding;
INADDR_BROADCAST (255.255.255.255)
means any host and has the same effect on bind as
INADDR_ANY for historical reasons.
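The address classes and special addresses above can be told apart roughly as in this sketch. The enum and the function classify_addr() are hypothetical names. Only the exact INADDR_LOOPBACK value is matched, and subnet-directed broadcast addresses are not detected because that would need the interface netmask.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

enum addr_kind {
    ADDR_ANY,        /* 0.0.0.0 */
    ADDR_LOOPBACK,   /* 127.0.0.1 */
    ADDR_BROADCAST,  /* 255.255.255.255 */
    ADDR_MULTICAST,  /* 224.0.0.0 through 239.255.255.255 */
    ADDR_OTHER       /* anything else, typically unicast */
};

/* Classify an address held in network byte order, as in sin_addr. */
enum addr_kind classify_addr(struct in_addr a)
{
    /* The INADDR_* constants and IN_MULTICAST() use host byte order. */
    uint32_t h = ntohl(a.s_addr);

    if (h == INADDR_ANY)
        return ADDR_ANY;
    if (h == INADDR_LOOPBACK)
        return ADDR_LOOPBACK;
    if (h == INADDR_BROADCAST)
        return ADDR_BROADCAST;
    if (IN_MULTICAST(h))
        return ADDR_MULTICAST;
    return ADDR_OTHER;
}
```

Note the explicit ntohl(3) conversion: comparing s_addr directly against an INADDR_* constant is only correct after one side has been converted.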
IP supports some protocol specific socket options that can be set with
setsockopt(2)
and read with
getsockopt(2).
The socket option level for IP is
IPPROTO_IP.
A boolean integer flag is zero when it is false, otherwise true.
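Integer-valued options at the IPPROTO_IP level can be set and read with a pair of small wrappers such as the following sketch. The names set_ip_option() and get_ip_option() are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Set an integer (or boolean integer) option at level IPPROTO_IP.
 * Returns 0 on success, -1 on error. */
int set_ip_option(int fd, int optname, int value)
{
    return setsockopt(fd, IPPROTO_IP, optname, &value, sizeof(value));
}

/* Read an integer option at level IPPROTO_IP into *value.
 * Returns 0 on success, -1 on error. */
int get_ip_option(int fd, int optname, int *value)
{
    socklen_t len = sizeof(*value);

    return getsockopt(fd, IPPROTO_IP, optname, value, &len);
}
```

For example, set_ip_option(fd, IP_TTL, 42) sets the time to live for outgoing packets, and any non-zero value passed for a boolean option such as IP_RECVTOS enables it.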
IP_OPTIONS
Sets or gets the IP options to be sent with every packet from this
socket.
The arguments are a pointer to a memory buffer containing the options
and the option length.
The
setsockopt(2)
call sets the IP options associated with a socket.
The maximum option size for IPv4 is 40 bytes. See RFC 791 for the allowed
options. When the initial connection request packet for a
SOCK_STREAM socket contains IP options, the IP options will be set automatically
to the options from the initial packet with routing headers reversed.
Incoming packets are not allowed to change options after the connection
is established.
The processing of all incoming source routing options
is disabled by default and can be enabled by using the
accept_source_route sysctl. Other options like timestamps are still handled.
For datagram sockets, IP options can only be set by the local user.
Calling
getsockopt(2)
with
IP_OPTIONS puts the current IP options used for sending into the supplied buffer.
IP_PKTINFO
Pass an
IP_PKTINFO ancillary message that contains a
pktinfo structure that supplies some information about the incoming packet.
This only works for datagram oriented sockets.
The argument is a flag that tells the socket whether the IP_PKTINFO
message should be passed or not.
The message itself can only be sent or retrieved
as a control message with a packet using
recvmsg(2)
or
sendmsg(2).
struct in_pktinfo {
    unsigned int   ipi_ifindex;  /* Interface index */
    struct in_addr ipi_spec_dst; /* Local address */
    struct in_addr ipi_addr;     /* Header Destination address */
};
ipi_ifindex is the unique index of the interface the packet was received on.
ipi_spec_dst is the local address of the packet and
ipi_addr is the destination address in the packet header.
If
IP_PKTINFO is passed to
sendmsg(2)
and
ipi_spec_dst is not zero, then it is used as the local source address for the routing
table lookup and for setting up IP source route options.
When
ipi_ifindex is not zero the primary local address of the interface specified by the
index overwrites
ipi_spec_dst for the routing table lookup.
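Receiving the IP_PKTINFO ancillary message could be sketched as below. The function names enable_pktinfo(), get_pktinfo() and recv_with_pktinfo() are hypothetical; the _GNU_SOURCE define is an assumption made so that glibc exposes struct in_pktinfo under strict compiler modes.

```c
#define _GNU_SOURCE             /* assumption: exposes struct in_pktinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask the kernel to attach an IP_PKTINFO message to each datagram. */
int enable_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Copy the IP_PKTINFO payload out of a msghdr filled in by recvmsg(2).
 * Returns 0 if one was found, -1 otherwise. */
int get_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}

/* Receive one datagram and its packet info. *info is zeroed when no
 * IP_PKTINFO message arrived. Returns the datagram length, or -1. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
                          struct in_pktinfo *info)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;   /* forces correct alignment */
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    if (get_pktinfo(&msg, info) < 0)
        memset(info, 0, sizeof(*info));
    return n;
}
```

After enable_pktinfo() on a bound datagram socket, recv_with_pktinfo() reports the receiving interface in ipi_ifindex and the header destination in ipi_addr.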
IP_RECVTOS
If enabled the
IP_TOS ancillary message is passed with incoming packets.
It contains a byte which specifies the Type of Service/Precedence
field of the packet header.
Expects a boolean integer flag.
IP_RECVTTL
When this flag is set,
pass an
IP_TTL control message with the time to live
field of the received packet as a byte. Not supported for
SOCK_STREAM sockets.
IP_RECVOPTS
Pass all incoming IP options to the user in an
IP_OPTIONS control message.
The routing header and other options are already filled in
for the local host. Not supported for
SOCK_STREAM sockets.
IP_RETOPTS
Identical to
IP_RECVOPTS but returns raw unprocessed options with timestamp and route record
options not filled in for this hop.
IP_TOS
Set or receive the Type-Of-Service (TOS) field that is sent
with every IP packet originating from this socket.
It is used to prioritize packets on the network.
TOS is a byte. There are some standard TOS flags defined:
IPTOS_LOWDELAY to minimize delays for interactive traffic,
IPTOS_THROUGHPUT to optimize throughput,
IPTOS_RELIABILITY to optimize for reliability,
IPTOS_MINCOST should be used for "filler data" where slow transmission doesn't matter.
At most one of these TOS values can be specified.
Other bits are invalid and shall be cleared.
Linux sends
IPTOS_LOWDELAY datagrams first by default,
but the exact behaviour depends on the configured queueing discipline.
Some high priority levels may require superuser privileges (the
CAP_NET_ADMIN capability).
The priority can also be set in a protocol independent way by the
(SOL_SOCKET, SO_PRIORITY) socket option (see
socket(7)).
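Setting one of the standard TOS flags might be done as in this sketch. The helper names set_tos() and get_tos() are hypothetical; the IPTOS_* constants are taken from <netinet/ip.h>.

```c
#include <netinet/in.h>
#include <netinet/ip.h>         /* IPTOS_LOWDELAY and friends */
#include <sys/socket.h>

/* Set the TOS byte used for packets sent from this socket.
 * Returns 0 on success, -1 on error. */
int set_tos(int fd, int tos)
{
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Return the TOS byte currently configured on the socket, or -1. */
int get_tos(int fd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);

    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
        return -1;
    return tos;
}
```

For interactive traffic on a datagram socket, set_tos(fd, IPTOS_LOWDELAY) is the typical call.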
IP_TTL
Set or retrieve the current time to live field that is used in every packet
sent from this socket.
IP_HDRINCL
If enabled
the user supplies an IP header in front of the user data.
Only valid for
SOCK_RAW sockets. See
raw(7)
for more information. When this flag is enabled the values set by
IP_OPTIONS,
IP_TTL and
IP_TOS are ignored.
IP_RECVERR (defined in <linux/errqueue.h>)
Enable extended reliable error message passing.
When enabled on a datagram socket all
generated errors will be queued in a per-socket error queue. When the user
receives an error from a socket operation the errors can
be received by calling
recvmsg(2)
with the
MSG_ERRQUEUE flag set. The
sock_extended_err structure describing the error will be passed in an ancillary message with
the type
IP_RECVERR and the level
IPPROTO_IP.
This is useful for reliable error handling on unconnected sockets.
The received data portion of the error queue
contains the error packet.
The
IP_RECVERR control message contains a
sock_extended_err structure:
struct sock_extended_err {
    u_int32_t ee_errno;   /* error number */
    u_int8_t  ee_origin;  /* where the error originated */
    u_int8_t  ee_type;    /* type */
    u_int8_t  ee_code;    /* code */
    u_int8_t  ee_pad;
    u_int32_t ee_info;    /* additional information */
    u_int32_t ee_data;    /* other data */
    /* More data may follow */
};
ee_errno contains the errno number of the queued error.
ee_origin is the origin code of where the error originated.
The other fields are protocol specific. The macro
SO_EE_OFFENDER returns a pointer to the address of the network object
where the error originated from given a pointer to the ancillary message.
If this address is not known, the
sa_family member of the
sockaddr contains
AF_UNSPEC and the other fields of the
sockaddr are undefined.
IP uses the
sock_extended_err structure as follows:
ee_origin is set to
SO_EE_ORIGIN_ICMP for errors received as an ICMP packet, or
SO_EE_ORIGIN_LOCAL for locally generated errors. Unknown values should be ignored.
ee_type and
ee_code are set from the type and code fields of the ICMP header.
ee_info contains the discovered MTU for
EMSGSIZE errors. The message also contains the
sockaddr_in of the node that caused the error, which can be accessed with the
SO_EE_OFFENDER macro. The
sin_family field of the SO_EE_OFFENDER address is
AF_UNSPEC when the source was unknown.
When the error originated from the network, all IP options
(IP_OPTIONS, IP_TTL, etc.) enabled on the socket and contained in the
error packet are passed as control messages. The payload of the packet
causing the error is returned as normal payload.
Note that TCP has no error queue;
MSG_ERRQUEUE is illegal on
SOCK_STREAM sockets.
IP_RECVERR is valid for TCP, but all errors are
returned by socket function return or
SO_ERROR only.
For raw sockets,
IP_RECVERR enables passing of all received ICMP errors to the
application; otherwise errors are reported only on connected sockets.
It sets or retrieves an integer boolean flag.
IP_RECVERR defaults to off.
IP_MTU_DISCOVER
Sets or receives the Path MTU Discovery setting
for a socket. When enabled, Linux will perform Path MTU Discovery
as defined in RFC 1191
on this socket. The don't-fragment flag is set on all outgoing datagrams.
The system-wide default is controlled by the
ip_no_pmtu_disc sysctl for
SOCK_STREAM sockets, and disabled on all others. For
non-SOCK_STREAM sockets it is the user's responsibility to packetize the data
in MTU-sized chunks and to do the retransmits if necessary.
The kernel will reject packets that are bigger than the known
path MTU if this flag is set (with
EMSGSIZE).
Path MTU discovery flags    Meaning
IP_PMTUDISC_WANT            Use per-route settings.
IP_PMTUDISC_DONT            Never do Path MTU Discovery.
IP_PMTUDISC_DO              Always do Path MTU Discovery.
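Selecting one of the flags listed above on a socket might be done as in this sketch. The helper names set_pmtu_mode() and get_pmtu_mode() are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Set the Path MTU Discovery mode: IP_PMTUDISC_WANT, IP_PMTUDISC_DONT
 * or IP_PMTUDISC_DO. Returns 0 on success, -1 on error. */
int set_pmtu_mode(int fd, int mode)
{
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}

/* Return the current Path MTU Discovery mode of the socket, or -1. */
int get_pmtu_mode(int fd)
{
    int mode = 0;
    socklen_t len = sizeof(mode);

    if (getsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, &len) < 0)
        return -1;
    return mode;
}
```

With IP_PMTUDISC_DO set on a datagram socket, a send larger than the known path MTU is expected to fail with EMSGSIZE instead of being fragmented.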
When PMTU discovery is enabled the kernel automatically keeps track of
the path MTU per destination host.
When a socket is connected to a specific peer with
connect(2),
the currently known path MTU can be retrieved conveniently using the
IP_MTU socket option (e.g., after an
EMSGSIZE error occurred). It may change over time.
For connectionless sockets with many destinations,
the new MTU for a given destination can also be accessed using the
error queue (see
IP_RECVERR).
A new error will be queued for every incoming MTU update.
While MTU discovery is in progress initial packets from datagram sockets
may be dropped. Applications using UDP should be aware of this and not
take it into account for their packet retransmit strategy.
To bootstrap the path MTU discovery process on unconnected sockets it
is possible to start with a big datagram size
(up to 64K-headers bytes long) and let it shrink by updates of the
path MTU.
To get an initial estimate of the
path MTU, connect a datagram socket to the destination address using
connect(2)
and retrieve the MTU by calling
getsockopt(2)
with the
IP_MTU option.
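The initial-estimate procedure just described can be sketched as follows. The function name path_mtu_estimate() is hypothetical, and the option queried is taken here to be IP_MTU as defined in <netinet/in.h>. No packet is sent: connecting a datagram socket only performs a route lookup.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the kernel's current path MTU estimate toward the given
 * dotted-quad address and port, or -1 on any error. */
int path_mtu_estimate(const char *dotted, unsigned short port)
{
    struct sockaddr_in sa;
    int mtu = -1;
    socklen_t len = sizeof(mtu);
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, dotted, &sa.sin_addr) != 1)
        return -1;

    fd = socket(PF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* IP_MTU is only valid on a connected socket. */
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        mtu = -1;

    close(fd);
    return mtu;
}
```

The value is only an estimate and may shrink later as path MTU updates arrive.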
IP_MTU
Retrieve the current known path MTU of the current socket.
Only valid when the socket has been connected. Returns an integer.
Only valid as a
getsockopt(2).
IP_ROUTER_ALERT
Pass all to-be forwarded packets with the
IP Router Alert
option
set to this socket. Only valid for raw sockets.
This is useful, for instance, for user
space RSVP daemons.
The tapped packets are not forwarded by the kernel; it is
the user's responsibility to send them out again.
Socket binding is ignored,
such packets are only filtered by protocol.
Expects an integer flag.
IP_MULTICAST_TTL
Sets or reads the time-to-live value of outgoing multicast packets for this
socket. It is
very important for multicast packets to set the smallest TTL possible.
The default is 1, which means that multicast packets don't leave the local
network unless the user program explicitly requests it. Argument is an
integer.
IP_MULTICAST_LOOP
Sets or reads a boolean integer argument that controls whether sent multicast
packets should be looped back to the local sockets.
IP_ADD_MEMBERSHIP
Join a multicast group. Argument is an
ip_mreqn structure.
struct ip_mreqn {
    struct in_addr imr_multiaddr; /* IP multicast group address */
    struct in_addr imr_address;   /* IP address of local interface */
    int            imr_ifindex;   /* interface index */
};
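Filling in struct ip_mreqn and applying the multicast settings described above might look like this sketch. All function names are hypothetical. The option names IP_ADD_MEMBERSHIP, IP_MULTICAST_TTL and IP_MULTICAST_LOOP are those defined in <netinet/in.h>, and the _GNU_SOURCE define is an assumption made so that glibc exposes struct ip_mreqn under strict compiler modes.

```c
#define _GNU_SOURCE             /* assumption: exposes struct ip_mreqn */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill *mreq for the given group address and interface index (0 lets
 * the kernel choose). Returns -1 if group is not a multicast address. */
int make_mreqn(const char *group, int ifindex, struct ip_mreqn *mreq)
{
    memset(mreq, 0, sizeof(*mreq));
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return -1;
    if (!IN_MULTICAST(ntohl(mreq->imr_multiaddr.s_addr)))
        return -1;
    mreq->imr_address.s_addr = htonl(INADDR_ANY);
    mreq->imr_ifindex = ifindex;
    return 0;
}

/* Join a multicast group on a socket. Returns 0 on success, -1 on error. */
int join_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    if (make_mreqn(group, ifindex, &mreq) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}

/* Set the TTL of outgoing multicast packets (the default is 1). */
int set_multicast_ttl(int fd, int ttl)
{
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
}

/* Control whether sent multicast packets are looped back locally. */
int set_multicast_loop(int fd, int on)
{
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &on, sizeof(on));
}
```

join_group() can fail on hosts without a multicast-capable interface or route, so callers should check its return value and errno.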
Linux Man Page
IP (7)
2001-06-19
Generated by OpenAsthra.com from man7/ip.7 using man macros with tbl support.