Saturday, July 30, 2011

Socket - Domains and Address Families

socket address family / socket domain type protocol:

In the first post of this tutorial (Introducing Sockets - Basics of the Socket), you read about the telephone analogy, where the caller gets in touch with another person by dialing that person's telephone number. In the telephone network, each person's telephone number is like a socket address. Sockets have addresses of their own so that they can be specifically identified. The socket address is the primary focus of this post.
In this post you will:
  • Understand address families
  • Learn how to form socket addresses
  • Understand the difference between big-endian and little-endian byte ordering
  • Learn what an abstract local address is and how to form one
  • Learn when socket addresses are not required
These topics are important because many programmers struggle with exactly this aspect of socket programming. A little extra effort spent here will reward you later.

Nameless Sockets

Sockets do not always need to have an address. The socketpair(2) function, for example, creates two sockets that are connected to each other, but without addresses. They are, in essence, "nameless" sockets. Imagine a red telephone between the U.S. president's office and the Soviet Union, during the Cold War. There is no need for a telephone number at either end, because they are directly connected. In the same way, the sockets created by socketpair(2) are directly connected and have no need for addresses.

Anonymous Calls
Sometimes in practice, one of the two sockets in a connection will have no address. For a remote socket to be contacted, it must have an address to identify it. However, the local socket that is "placing the call" can be anonymous. The connection that becomes established has one remote socket with an address and another socket without an address.

Generating Addresses
Sometimes you don't care what your local address is, but you need one to communicate. This is particularly true of programs that connect to a service, such as an RDBMS database server. The client's local address is only required for the duration of the communication. Allocating fixed addresses could be done, but this increases network administration work. Consequently, address generation is often used when it is available.

Understanding Domains

When the BSD socket interface was being conceived by the Berkeley team, the TCP/IP protocol was still under development. At the same time, a number of competing protocols, such as X.25, were in use by different organizations, and still other protocols were being researched.

The socketpair(2) function that you saw in the last post, and the socket(2) function, which has yet to be introduced, wisely allowed for the possibility that protocols other than TCP/IP might be used. The domain argument of the socketpair(2) function allows for this contingency. For ease of discussion, here is the socketpair(2) synopsis again:

#include <sys/types.h>
#include <sys/socket.h>
int socketpair(int domain, int type, int protocol, int sv[2]);

Here you will learn only about the domain and protocol arguments; discussion of the type argument is deferred until later. Normally, the protocol argument is specified as zero, which allows the operating system to choose the correct default protocol for the selected domain. There are exceptions to this rule, but they are beyond the scope of the present discussion. That leaves the domain argument to be explained. For the socketpair(2) function, this value must always be AF_LOCAL or AF_UNIX. As the last post pointed out, AF_UNIX is the older name for AF_LOCAL, and the two macros are equivalent.

What does AF_LOCAL mean however? What does it select?

The AF_ prefix of the constant indicates the address family. The domain argument selects the address family to be used.

Forming Socket Addresses

Each communication protocol specifies its own format for its networking address. Consequently, the address family is used to indicate which type of addressing is being used. The constant AF_LOCAL (AF_UNIX) specifies that the address will be formed according to local (UNIX) address rules. The constant AF_INET indicates that the address will conform to IP address rules, and so on. These are examples of address families. Within one address family, there can be variations. You will see an example of this when you learn how to form AF_LOCAL addresses. In the sections that follow, you will examine the format and the physical layout of various address families. This is an important section to master. Much of the difficulty that people experience with the BSD socket interface is related to address initialization.

Examining the Generic Socket Address
Because the BSD socket interface was developed before the ANSI C standard was adopted, there was no (void *) data pointer type to accept the address of an arbitrary structure. The BSD solution was instead to define a generic address structure, which you obtain by including:

#include <sys/socket.h>
Listing 2.1 illustrates how the structure is defined in C language terms.

Listing 2.1: The Generic Socket Address

struct sockaddr {
    sa_family_t sa_family;   /* Address Family */
    char        sa_data[14]; /* Address data. */
};

Presently the data type sa_family_t is an unsigned short integer, which is two bytes in length under Linux. The total structure size is 16 bytes. The structure element sa_data[14] represents 14 remaining bytes of address information.

Figure 2.1 provides a physical view of the generic socket address structure.

Figure 2.1: Here is a representation of the generic socket address layout.

The generic socket address structure itself is not that useful to the programmer. It does, however, provide a reference model to which all other address structures must conform. For example, you will learn that all addresses must define the sa_family member in exactly the same location in the structure, because this element determines how the remaining bytes of the address are interpreted.

Forming Local Addresses

This address format is used by sockets that are local to your host (your PC running Linux). For example, when you queue a file to be printed using the lpr(1) command, it uses a local socket to communicate with the spooling service on your PC. While it is also possible to use TCP/IP for local communication, this turns out to be less efficient. Traditionally, the local address family has been referred to as the AF_UNIX domain (hence the term UNIX socket address), because these addresses use local UNIX file names to act as the socket name. Linux kernels 2.2.0 and later also support abstract socket names, which you'll learn about shortly.

The structure name for AF_LOCAL or AF_UNIX addresses is sockaddr_un. This structure is defined by including the following statement in your C program:

#include <sys/un.h>

The sockaddr_un structure is shown in Listing 2.2.
Listing 2.2: The sockaddr_un Address Structure

struct sockaddr_un {
    sa_family_t sun_family;    /* Address Family */
    char        sun_path[108]; /* Pathname */
};

The structure member sun_family must have the value AF_LOCAL or AF_UNIX assigned to it (these macros represent the same value, though the use of AF_LOCAL is now encouraged). This value indicates that the structure is formatted according to the sockaddr_un rules. The member sun_path[108] contains a valid UNIX pathname. As you will find out, no null byte is required at the end of the character array.

Note that the total size of the sockaddr_un address is much larger than the 16 bytes of the generic address structure. If you are working with multiple address families within your code, make sure you allocate sufficient storage to accommodate an AF_LOCAL/AF_UNIX address. In the sections that follow, you will learn how to initialize an AF_LOCAL address and define its length.

Information about local socket addresses can be found in the unix(7) man page.

Forming Traditional Local Addresses
The address name space for traditional local addresses is file system path names. A process may name its local socket with any valid path name. To be valid, however, the process naming the socket must have access to every directory component of the path name and permission to create the final socket object in the named directory. Figure 2.2 shows the physical layout of the socket address /dev/printer, which may be active on your system: the lpd printer daemon listens on this local socket address.

Figure 2.2: Here is the AF_LOCAL/AF_UNIX socket address for /dev/printer.

Notice that the first two bytes indicate the address type of AF_LOCAL. The remaining bytes are the characters /dev/printer with no null byte present. Now turn your attention to the C code that initializes such an address. Some programmers like to initialize the address structure completely to zero before filling it in. This is often done using the memset(3) function and is probably a good idea:

struct sockaddr_un uaddr;
memset(&uaddr,0,sizeof uaddr);

This function call will zero out all bytes of the address structure for you.

Zeroing out the address structure is not required if you properly initialize the mandatory address elements. However, it does make debugging easier, because it eliminates any leftover data that might otherwise remain. In this post, memset(3) is used to zero the address structures as a demonstration of how it is done.

 * AF_UNIX Socket Example:

Forming Abstract Local Addresses

One of the annoyances of the traditional AF_UNIX socket name was that a file system object was always involved. This was often unnecessary and inconvenient. If the original file system object was not removed and the same name was used in a call to bind(2), the name assignment would fail. Linux kernel version 2.2 made it possible to create an abstract name for a local socket. The trick is to make the first byte of the path name a null byte. The bytes that follow that first null byte, up to the address length given to bind(2), then become the abstract name.

Forming Internet (IPv4) Socket Addresses

The most commonly used address family under Linux is the AF_INET family. It gives a socket an IPv4 socket address, allowing it to communicate with other hosts over a TCP/IP network. The structure sockaddr_in is defined by including:

#include <netinet/in.h>

Listing 2.7 shows the structure sockaddr_in, which is used for Internet addresses. An additional structure, in_addr, is also shown because the sockaddr_in structure uses it in its definition.

Listing 2.7: The sockaddr_in Structure

struct in_addr {
    uint32_t s_addr;            /* Internet address */
};

struct sockaddr_in {
    sa_family_t    sin_family;  /* Address Family */
    uint16_t       sin_port;    /* Port number */
    struct in_addr sin_addr;    /* Internet address */
    unsigned char  sin_zero[8]; /* Pad bytes */
};

Listing 2.7 can be described as follows:
  • The sin_family member occupies the same storage area that sa_family does in the generic socket definition. The value of sin_family is initialized to the value of AF_INET. 
  • The sin_port member defines the TCP/IP port number for the socket address. This value must be in network byte order (this will be elaborated upon later). 
  • The sin_addr member is defined as the structure in_addr, which holds the IP number in network byte order. If you examine the structure in_addr, you will see that it consists of one 32-bit unsigned integer.
  • Finally, the remainder of the structure is padded to 16 bytes by the member sin_zero[8] for 8 bytes. This member does not require any initialization and is not used.
Now turn your attention to Figure 2.3 to visualize the physical layout of the address.
Figure 2.3: Here is the physical layout of the structure sockaddr_in.

In Figure 2.3, you see that the sin_port member uses two bytes, whereas the sin_addr member uses four bytes. Both of these members show a tag on them indicating that these values must be in network byte order.

Information about IPv4 Internet addresses can be obtained by examining the ip(7) man page.

Understanding Network Byte Order

Different CPU architectures have different arrangements for grouping multiple bytes of data together to form integers of 16, 32, or more bits. The two most basic byte orderings are

• big-endian
• little-endian

Other combinations are possible, but they need not be considered here. Figure 2.4 shows a simple example of these two different byte orderings.

Figure 2.4: Here is an example of basic big- and little-endian byte ordering.

The value illustrated in Figure 2.4 is the decimal value 4660, which is 0x1234 in hexadecimal. Representing it requires 2 bytes. You can either place the most significant byte first (big-endian) or place the least significant byte first (little-endian). The choice is rather arbitrary and boils down to the design of the CPU.

You might already know that Intel CPUs use little-endian byte order. Other CPUs, like the Motorola 68000 series, use big-endian byte order. The important thing to realize is that CPUs of both persuasions exist in the world and are connected to a common Internet.

What happens if a Motorola CPU writes a 16-bit number to the network and an Intel CPU receives it? "Houston, we have a problem!" The Intel CPU will interpret the bytes in reverse order and see the value as 0x3412 in hexadecimal, which is 13330 in decimal instead of 4660! For agreement to exist over the network, it was decided that big-endian byte order would be used on the network. As long as every message communicated over the network obeys this sequence, all software will be able to communicate in harmony. This brings you back to AF_INET addresses. The TCP/IP port number (sin_port) and the IP number (sin_addr) must be in network byte order, and the BSD socket interface requires that you, as the programmer, take care of this when forming the address.

Performing Endian Conversions

A few functions have been provided to help simplify this business of endian conversions. There are two directions of conversion to be considered:

• Host ordering to network ordering
• Network ordering to host ordering

By "host order" what is meant is the byte ordering that your CPU uses. For Intel CPUs, this will mean little-endian byte order. Network order, as you learned earlier, is big-endian byte order. There are also two categories of conversion functions:

• Short (16-bit) integer conversion
• Long (32-bit) integer conversion

The following synopsis shows the conversion functions that you have at your disposal:

#include <arpa/inet.h>

uint32_t htonl(uint32_t hostlong);
uint16_t htons(uint16_t hostshort);
uint32_t ntohl(uint32_t netlong);
uint16_t ntohs(uint16_t netshort);

These functions are all described in the byteorder(3) man page.

In the context of these conversion functions, "short" refers to a 16-bit value and "long" refers to a 32-bit value. Do not confuse these terms with what might be different sizes of the C data types.

For example, the long data type is 64 bits in length on 64-bit Linux platforms such as x86-64.

Use of these functions is quite simple. For example, to convert a short integer to network order, the following code can be used:

uint16_t host_short = 0x1234;
uint16_t netw_short;
netw_short = htons(host_short);

The value netw_short will receive the appropriate value from the conversion to network order. To convert a value from network order back into host order is equally simple:

host_short = ntohs(netw_short);

The h in the function name refers to "host," whereas n refers to "network." Similarly, s refers to "short" and l refers to "long." Using these conventions, it is a simple matter to pick the name of the conversion function you need.

The byteorder(3) functions may be implemented as macros on some systems. Linux systems that run on CPUs using the big-endian byte ordering might provide a simple macro instead, because no conversion of the value is required.

Initializing a Wild Internet Address

Now you are ready to create an Internet address. The example shown here will request that the address be wild. This is often done when you are connecting to a remote service. The reason for doing this is that your host might have two or more network interface cards, each with a different IP number. Furthermore, Linux also permits the assignment of more than one IP number to each interface. When you specify a wild IP number, you allow the system to pick the route to the remote service. The kernel will then determine what your final local socket address will be at the time the connection is established.

There are also times when you want the kernel to assign a local port number for you. This is done by specifying sin_port as the value zero. The example code shown in Listing 2.8 demonstrates how to initialize an AF_INET address with both a wild port number and a wild IP number.


 * AF_INET Socket Example:

Specifying Other Address Families

The scope of this post does not permit full coverage of all the address families supported by Linux, and the list of supported protocols grows longer every year. If you are looking for a fast track to TCP/IP programming, you can skip ahead to the next section. In this section, you will read briefly about a few other protocols that might be of interest to you. It is intended as a roadmap to other places of interest, should you feel like some adventure. There are at least three more address families that Linux can support:

• AF_INET6: IPv6
• AF_AX25: Amateur Radio AX.25 protocol
• AF_APPLETALK: Linux AppleTalk protocol implementation

Each of these protocols requires that the corresponding support be compiled into your kernel. Some of these protocols may not be complete implementations, so programmer beware! Incomplete or experimental protocols can be buggy and sometimes even crash your system.

The AF_APPLETALK address family is documented in the ddp(7) man page.

Sample Code:
To see how what we have learned so far is used in practice, follow the sample code given below:

test port client
