Saturday, July 30, 2011

Socket - A Basic Introduction

Understanding Sockets

It is important that you have an understanding of some of the concepts behind the socket interface before you try to apply them. This section outlines some of the high-level concepts surrounding the sockets themselves.

Defining a Socket

To communicate with someone using a telephone, you must pick up the handset, dial the other party's telephone number, and wait for them to answer. While you speak to that other party, there are two endpoints of communication established:

• Your telephone, at your location
• The remote party's telephone, at his location

As long as both of you communicate, there are two endpoints involved, with a line of communication in between them. Figure 1.2 shows an illustration of two telephones as endpoints, each connected to the other, through the telephone network.


Figure 1.2: Two telephones as endpoints, each connected to the other through the telephone network.

Without the telephone network, each endpoint of a telephone line is nothing more than a plastic box. A socket under Linux is quite similar to a telephone. Sockets represent endpoints in a line of communication. In between the endpoints exists the data communications network.
Sockets are like telephones in another way. For you to telephone someone, you dial the telephone number of the party you want to contact. Sockets have network addresses instead of telephone numbers. By indicating the address of the remote socket, your program can establish a line of communication between your local socket and that remote endpoint. Socket addresses are discussed in Chapter 2, "Domains and Address Families." You can conclude, then, that a socket is merely an endpoint in communication. There are a number of Linux function calls that operate on sockets, and you learn about all of them in this book.

Using Sockets

You might think that Linux sockets are treated specially, because you've already learned that sockets have a collection of specific functions that operate on them. Although it is true that sockets have some special qualities, they are very similar to file descriptors that you should already be familiar with.

NOTE
Any reference to a function name like pipe(2) means that you should have online documentation (man pages) on your Linux system for that function. For information about pipe(2) for example, you can enter the command:

$ man 2 pipe

where the 2 represents the manual section number, and the function name can be used as the name of the manual page. Although the section number is often optional, there are many cases where you must specify it in order to obtain the correct information.
For example, when you open a file using the Linux open(2) call, you are returned a file descriptor if the open(2) function is successful. After you have this file descriptor, your program uses it to read(2), write(2), lseek(2), and close(2) the specific file that was opened. Similarly, a socket, when it is created, is just like a file descriptor. You can use the same file I/O functions to read, write, and close that socket. You learn in Chapter 15, "Using the inetd Daemon," that sockets can be used for standard input (file unit 0), standard output (file unit 1), or standard error (file unit 2).

NOTE
Sockets are referenced by file unit numbers in the same way that opened files are. These unit numbers share the same "number space": for example, you cannot have both a socket with unit number 4 and an open file on unit number 4 at the same time. There are some differences, however, between sockets and opened files. The following list highlights some of these differences:

• You cannot lseek(2) on a socket (this restriction also applies to pipes).
• Sockets can have addresses associated with them. Files and pipes do not have network addresses.
• Sockets have different option capabilities that can be queried and set using ioctl(2).
• Sockets must be in the correct state to perform input or output. By contrast, opened disk files can be read from or written to at any time.
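The first difference is easy to observe. The following sketch assumes a Linux system and borrows socketpair(2) (covered later in this chapter) only to obtain a socket. It then asks lseek(2) to reposition that socket. The helper name lseek_errno_on_socket() is hypothetical. On Linux the seek is expected to fail with ESPIPE ("Illegal seek"), which is the same error a pipe produces.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the errno that lseek(2) reports on a
 * socket, 0 if the seek unexpectedly succeeded, or -1 if no socket
 * could be created. */
int lseek_errno_on_socket(void)
{
    int sv[2];
    int err = 0;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
        return -1;

    if (lseek(sv[0], 0, SEEK_SET) == (off_t)-1)
        err = errno;            /* capture before close(2) can change it */

    close(sv[0]);
    close(sv[1]);
    return err;
}
```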

Referencing Sockets

When you open a new file using the open(2) function call, the Linux kernel returns the lowest available file descriptor. This file descriptor, or file unit number as it is often called, is a non-negative integer value that is used to refer to the file that was opened. This "handle" is used in all other functions that operate upon opened files. Now you know that file unit numbers can also refer to specific sockets.

NOTE
When a new file unit (or file descriptor) is needed by the kernel, the lowest available unit number is returned. For example, if you were to close standard input (file unit number 0), and then open a file successfully, the file unit number returned by the open(2) call will be zero. Assume for a moment that your program already has file units 0, 1, and 2 open (standard input, output, and error) and the following sequence of program operations is carried out. Notice how the file descriptors are allocated by the kernel:

1. The open(2) function is called to open a file.
2. File unit 3 is returned to reference the opened file. Because this unit is not currently in use, and is the lowest file unit presently available, the value 3 is chosen to be the file unit number for the file.
3. A new socket is created using an appropriate function call.
4. File unit 4 is returned to reference that new socket.
5. Yet, another file is opened by calling open(2).
6. File unit 5 is returned to reference the newly opened file.

Notice how the Linux kernel makes no distinction between files and sockets when allocating unit numbers. A file descriptor is used to refer to an opened file or a network socket. This means that you, as a programmer, will use sockets as if they were open files. Being able to reference files and sockets interchangeably by file unit number provides you with a great deal of flexibility. This also means that functions like read(2) and write(2) can operate upon both open files and sockets.
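The allocation sequence above can be sketched in C. This minimal sketch assumes a Linux system where /dev/null exists. It uses socket(2), which this chapter does not cover, only to create a single socket, and the helper name allocate_file_socket_file() is hypothetical. In a process that has only units 0, 1, and 2 open, the three descriptors would typically come back as 3, 4, and 5.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper reproducing the sequence above: open a file,
 * create a socket, open another file. The descriptors land in fd[0..2].
 * Returns 0 on success, or -1 (with anything opened closed again). */
int allocate_file_socket_file(int fd[3])
{
    fd[0] = open("/dev/null", O_RDONLY);          /* a file   */
    fd[1] = socket(AF_LOCAL, SOCK_STREAM, 0);     /* a socket */
    fd[2] = open("/dev/null", O_RDONLY);          /* a file   */

    if (fd[0] == -1 || fd[1] == -1 || fd[2] == -1) {
        for (int i = 0; i < 3; i++)
            if (fd[i] != -1)
                close(fd[i]);
        return -1;
    }
    return 0;   /* the kernel drew all three from one number space */
}
```

If you close the first descriptor and call open(2) again, you get that same unit number back, because it is once more the lowest one available.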

Comparing Sockets to Pipes

Before you are introduced to any socket functions, review the pipe(2) function call that you might already be familiar with, and see how the file descriptors it returns differ from a socket. The following is the function synopsis taken from the pipe(2) man page:

#include <unistd.h>
int pipe(int filedes[2]);

The pipe(2) function call returns two file descriptors when the call is successful. Array element filedes[0] contains the file descriptor number for the read end of the pipe. Element filedes[1] receives the file unit number of the write end of the pipe.

This arrangement of two file descriptors is suggestive of a communications link with file descriptors at each end, acting as sockets. How, then, does this differ from using sockets? The difference is that the pipe(2) function creates a line of communication in one direction only. Information can only be written to the file unit in filedes[1] and only read from unit filedes[0]. Any attempt to write data in the opposite direction results in the Linux kernel returning an error to your program.

Sockets, on the other hand, allow processes to communicate in both directions. A process is able to use a socket open on file unit 3, for example, to send data to a remote process. Unlike when using a pipe, the same local process can also receive information from file unit 3 that was sent by the remote process it is communicating with.
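The difference in direction can be demonstrated directly. In this sketch, whose helper names are hypothetical, the first function tries to write into the read end of a pipe. Linux rejects this with EBADF because that end is open for reading only. The second function sends one byte each way over a socketpair(2) pair, and both transfers succeed.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: write into the READ end of a pipe and return
 * the errno that results (0 if the write somehow succeeded). */
int write_to_pipe_read_end(void)
{
    int fd[2];
    int err = 0;

    if (pipe(fd) == -1)
        return -1;
    if (write(fd[0], "x", 1) == -1)   /* fd[0] is the read end */
        err = errno;                  /* Linux reports EBADF   */
    close(fd[0]);
    close(fd[1]);
    return err;
}

/* Hypothetical helper: send one byte in each direction over a socket
 * pair. Returns 0 if both bytes arrive intact, -1 otherwise. */
int two_way_over_socket_pair(void)
{
    int sv[2];
    char c1 = 0, c2 = 0;
    int ok;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
        return -1;
    ok = write(sv[0], "a", 1) == 1 && read(sv[1], &c1, 1) == 1   /* 0 -> 1 */
      && write(sv[1], "b", 1) == 1 && read(sv[0], &c2, 1) == 1;  /* 1 -> 0 */
    close(sv[0]);
    close(sv[1]);
    return (ok && c1 == 'a' && c2 == 'b') ? 0 : -1;
}
```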

Creating Sockets

In this section, you see that creating sockets can be almost as easy as creating a pipe. There are a few more function arguments, however, which you will learn about. These arguments must be supplied with suitable values for the call to succeed. The synopsis of the socketpair(2) function is as follows:

#include <sys/types.h>
#include <sys/socket.h>
int socketpair(int domain, int type, int protocol, int sv[2]);

The include file sys/types.h is required to define some C macro constants. The include file sys/socket.h is necessary to define the socketpair(2) function prototype. The socketpair(2) function takes four arguments. They are

• The domain of the socket.
• The type of the socket.
• The protocol to be used.
• The pointer to the array that will receive file descriptors that reference the created sockets.

The domain argument's explanation will be deferred until Chapter 2. For the purpose of the socketpair(2) function, however, always supply the C macro value AF_LOCAL. The type argument declares what type of socket you want to create. The choices for the socketpair(2) function are

• SOCK_STREAM
• SOCK_DGRAM

The implications of the socket type will be explored in Chapter 4, "Socket Types and Protocols." For this chapter, we'll simply use SOCK_STREAM for the type of the socket. For the socketpair(2) function, the protocol argument must be supplied as zero.
The argument sv[2] is a receiving array of two integer values that represent two sockets. Each file descriptor represents one socket (endpoint) and is otherwise indistinguishable from the other.
If the function is successful, the value zero is returned. Otherwise, a return value of -1 indicates that a failure has occurred, and that errno should be consulted for the specific reason.
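Putting the arguments together, a minimal sketch of creating a socket pair might look like the following. The wrapper name make_socket_pair() is hypothetical. It tests the return value first and consults errno only after a failure has been reported, in line with the caution that follows.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: create a connected pair of local stream sockets
 * in sv[0] and sv[1]. Returns 0 on success, -1 on failure. */
int make_socket_pair(int sv[2])
{
    int z = socketpair(AF_LOCAL, SOCK_STREAM, 0, sv);

    if (z == -1) {
        /* errno is meaningful here only because z reported a failure */
        fprintf(stderr, "socketpair(2): %s\n", strerror(errno));
        return -1;
    }
    printf("sv[0] = %d, sv[1] = %d\n", sv[0], sv[1]);
    return 0;
}
```

In a process that has only units 0, 1, and 2 open, the two sockets would typically be reported as units 3 and 4.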

CAUTION
Always test the function return value for success or failure. The variable errno should only be consulted after the function call has indicated that it failed. Only errors are posted to errno; it is never cleared to zero upon success.

Performing I/O on Sockets

You learned earlier that sockets can be written to and read from just like any opened file. In this section, you will see this demonstrated firsthand. For the sake of completeness, however, let's review the function synopses for read(2), write(2), and close(2) before we put them to work:

#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
int close(int fd);

These are Linux input/output functions you should already be familiar with. By way of review, the function read(2) reads up to count bytes of available input from the file descriptor fd into your supplied buffer buf. The return value is the number of bytes read, and a return count of zero represents end-of-file.
The write(2) function writes count bytes from your supplied buffer buf to the file descriptor fd. The returned value is the number of bytes actually written. Normally, this matches the supplied count argument.
However, there are some valid circumstances where it will be less than count, but you won't have to worry about that here.
Finally, close(2) returns zero if the unit was closed successfully. A return value of -1 for any of these functions indicates that an error occurred, and that the reason for the error is posted to the external variable errno. To make this value accessible, include the file errno.h within the source module that needs it.
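To see that sockets really do accept ordinary file I/O, the following sketch writes a message on one socket of a pair with write(2), reads it back from the other with read(2), and releases both with close(2). The helper name echo_through_pair() is hypothetical. The single read(2) relies on a short message arriving in one piece, which is typical for a local socket pair but not guaranteed in general.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send msg through a socket pair and copy what
 * arrives into buf. Returns the byte count read, or -1 on error. */
int echo_through_pair(const char *msg, char *buf, size_t size)
{
    int sv[2];
    ssize_t n;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1) {
        perror("socketpair(2)");
        return -1;
    }

    /* write(2) on one socket, exactly as if it were an open file */
    n = write(sv[0], msg, strlen(msg));
    if (n == -1) {
        perror("write(2)");
    } else {
        /* read(2) the data back from the other socket of the pair */
        n = read(sv[1], buf, size - 1);
        if (n == -1)
            perror("read(2)");
        else
            buf[n] = '\0';
    }

    /* close(2) releases each socket just like a file descriptor */
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```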

Closing Sockets

Previously, you saw how a pair of sockets could be easily created and how some elementary input and output can be performed using those sockets. You also saw that these sockets could be closed in the same manner as files, with the close(2) function call. It's now time that you learn what is implied by the closing of a socket. When reading from a pipe created by the pipe(2) function, the receiving end recognizes that there will be no more data when an end-of-file is received. The end-of-file condition is sent by the writing process when it closes the write end of the pipe.
This same procedure can be used with a pair of sockets. The receiving end will receive an end-of-file indication when the other endpoint (socket) has been closed. The problem develops when the local process wants to signal to the remote endpoint that there is no more data to be received. If the local process closes its socket, this much will be accomplished.
However, if it needs to receive a confirmation from the remote end, it cannot, because its socket is now closed. Situations like these require a means to half-close a socket.
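The end-of-file behavior is easy to confirm with a socket pair. In this sketch, where read_after_peer_close() is a hypothetical name, one socket writes three bytes and is then closed. The other socket first receives those three bytes and then sees end-of-file as a zero return from read(2).

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: write "bye" on sv[0], close sv[0], then read
 * twice from sv[1]. *first gets the data count, *second the EOF result. */
int read_after_peer_close(ssize_t *first, ssize_t *second)
{
    int sv[2];
    char buf[16];

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
        return -1;

    if (write(sv[0], "bye", 3) != 3) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[0]);                               /* the writer hangs up      */

    *first = read(sv[1], buf, sizeof buf);      /* data sent before close   */
    *second = read(sv[1], buf, sizeof buf);     /* then end-of-file (0)     */
    close(sv[1]);
    return 0;
}
```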

The shutdown(2) Function

The following shows the function synopsis of the shutdown(2) function:

#include <sys/socket.h>
int shutdown(int s, int how);

The function shutdown(2) requires two arguments. They are

• Socket descriptor s specifies the socket to be partially shut down.
• Argument how indicates how this socket should be shut down.
The returned value is zero if the function call succeeded. A failure is indicated by returning a value of -1, and the reason for the failure is posted to errno.

The permissible values for how are shown in Table 1.1.

Table 1.1: Permissible Values of the shutdown(2) how Argument

Value   Macro       Description
0       SHUT_RD     No further reads will be allowed on the specified socket.
1       SHUT_WR     No further writes will be allowed on the specified socket.
2       SHUT_RDWR   No further reads or writes will be allowed on the specified socket.

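Notice that when the how value is supplied as 2 (SHUT_RDWR), this function call becomes almost equivalent to a close(2) call. One difference is that shutdown(2) does not release the file descriptor itself, so close(2) must still be called afterward.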

Shutting down Writing to a Socket

The following code shows how to indicate that no further writes will be performed upon the local socket:
Example

int z;
int s; /* Socket */
z = shutdown(s, SHUT_WR);
if ( z == -1 )
    perror("shutdown()");
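The snippet above assumes s is already an open socket. The following self-contained sketch uses a socket pair to show the half-close in action. Side a sends a request and shuts down writing. Side b reads the request and then an end-of-file, and b can still send a reply that a reads. The helper name half_close_demo() and the message text are assumptions for illustration.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical demo of shutdown(a, SHUT_WR) on a socket pair.
 * Returns 0 and copies b's reply into reply on success, -1 on failure. */
int half_close_demo(char *reply, size_t size)
{
    int sv[2];
    char buf[64];
    ssize_t n;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
        return -1;
    int a = sv[0], b = sv[1];

    /* a: send the request, then declare "no more data" */
    if (write(a, "request", 7) != 7 || shutdown(a, SHUT_WR) == -1)
        goto fail;

    /* b: read the request, then the end-of-file that shutdown sent */
    n = read(b, buf, sizeof buf);
    if (n != 7 || memcmp(buf, "request", 7) != 0)
        goto fail;
    if (read(b, buf, sizeof buf) != 0)
        goto fail;

    /* b: the socket is only half closed, so b can still reply */
    if (write(b, "ack", 3) != 3)
        goto fail;

    /* a: reading is still allowed after SHUT_WR */
    n = read(a, reply, size - 1);
    if (n < 0)
        goto fail;
    reply[n] = '\0';

    close(a);
    close(b);
    return 0;

fail:
    close(a);
    close(b);
    return -1;
}
```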

Shutting down the writing end of a socket solves a number of thorny problems, because it:
• Flushes out the kernel buffers that contain any pending data to be sent. Data is buffered by the kernel networking software to improve performance.
• Sends an end-of-file indication to the remote socket. This tells the remote reading process that no more data will be sent to it on this socket.
• Leaves the partially shutdown socket open for reading. This makes it possible to receive confirmation messages after the end-of-file indication has been sent on the socket.
• Disregards the number of open references on the socket. By contrast, only the last close(2) on a socket will cause an end-of-file indication to be sent.

The last point requires a bit of explanation, which is provided in the next section.

Dealing with Duplicated Sockets

If a socket's file descriptor is duplicated with the help of a dup(2) or a dup2(2) function call, then only the last outstanding close(2) call actually closes down the socket. This happens because the other duplicated file descriptors are still considered to be in use. This is demonstrated in the following code:
Example

int s; /* Existing socket */
int d; /* Duplicated socket */
d = dup(s); /* duplicate this socket */
close(s); /* nothing happens yet */
close(d); /* last close, so shutdown socket */

In the example, the first close(2) call would have no effect. It would make no difference which socket was closed first. Closing either s or d first would still leave one outstanding file descriptor for the same socket. Only when closing the last surviving file descriptor for that socket would a close(2) call have any effect. In the example, the close of the d file descriptor closes down the socket.
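The following sketch makes this observable by probing the peer with recv(2) and the MSG_DONTWAIT flag after each close(2), so the probe never blocks. The helper names are hypothetical. After the first close(2), the probe is expected to report EAGAIN, meaning no end-of-file has arrived yet. After the last close(2), it returns zero.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical probe: 1 if the peer already sees end-of-file,
 * 0 if no EOF yet (the read would block), -1 on any other outcome. */
static int peer_sees_eof(int peer)
{
    char c;
    ssize_t n = recv(peer, &c, 1, MSG_DONTWAIT);

    if (n == 0)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}

/* Mirrors the dup(2) example: records what the peer sees after the
 * first close(2) and after the last close(2). */
int dup_close_demo(int *after_first, int *after_last)
{
    int sv[2];

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int s = sv[0];              /* existing socket   */
    int d = dup(s);             /* duplicated socket */
    if (d == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(s);                                /* nothing happens yet    */
    *after_first = peer_sees_eof(sv[1]);     /* expected: 0            */
    close(d);                                /* last close: shuts down */
    *after_last = peer_sees_eof(sv[1]);      /* expected: 1            */

    close(sv[1]);
    return 0;
}
```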
The shutdown(2) function avoids this difficulty. Repeating the example code, the problem is solved using the shutdown(2) function:

Example
int s; /* Existing socket */
int d; /* Duplicated socket */
d = dup(s); /* duplicate this socket */
shutdown(s,SHUT_RDWR); /* immediate shutdown */

Even though the socket s is also open on file unit d, the shutdown(2) function immediately causes the socket to perform its shutdown duties as requested. This naturally affects both the open file descriptors s and d, because they both refer to the same socket.
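A non-blocking probe can also confirm the shutdown(2) behavior. In this sketch, where dup_shutdown_demo() is a hypothetical name, the peer sees end-of-file immediately after shutdown(s, SHUT_RDWR), even though d still refers to the socket. Both descriptors must still be released with close(2) afterward.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical demo: returns 0 if the peer sees end-of-file right after
 * shutdown(2), despite a duplicate descriptor still being open. */
int dup_shutdown_demo(void)
{
    int sv[2];
    char c;
    ssize_t n;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1)
        return -1;

    int s = sv[0];                 /* existing socket   */
    int d = dup(s);                /* duplicated socket */
    if (d == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    shutdown(s, SHUT_RDWR);        /* immediate, although d is still open */

    /* Non-blocking read on the peer: 0 means end-of-file already arrived */
    n = recv(sv[1], &c, 1, MSG_DONTWAIT);

    /* shutdown(2) does not release descriptors; close(2) still must */
    close(s);
    close(d);
    close(sv[1]);
    return n == 0 ? 0 : -1;
}
```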
This problem also appears after a call to fork(2). Any sockets that existed prior to the fork operation are duplicated in the child process.

TIP
Use the shutdown(2) function instead of the close(2) function whenever immediate or partial shutdown action is required. Duplicated file descriptors from dup(2), dup2(2), or fork(2) operations can prevent a close(2) function from initiating any shutdown action until the last outstanding descriptor is closed.

Shutting down Reading from a Socket

Shutting down the read side of the socket causes any pending read data to be silently ignored. If more data is sent from the remote socket, it too is silently ignored. Any further attempt by the process to read from that socket is refused. Depending on the system, this is reported as an error; on Linux it typically shows up as an immediate end-of-file (a return value of zero). This is often done to enforce a protocol or to help debug code.
Writing a Client/Server Example

You have now looked at enough of the socket API set to start having some fun with it. In this section, you examine, compile, and test a simple client and server process that communicate with a pair of sockets.
To keep the programming code to a bare minimum, one program will start and then fork into a client process and a server process. The child process will assume the role of the client program, whereas the original parent process will perform the role of the server. Figure 1.3 illustrates the relationship of the parent and child processes and the sockets that will be used.


Figure 1.3: A client/server example using fork(2) and socketpair(2).

The parent process is the original starting process. It will immediately ask for a pair of sockets by calling socketpair(2) and then fork itself into two processes by calling fork(2).
The server will accept one request, act on that request, and then exit. Likewise, the client in this example will issue one request, report the server's response, and then exit. The request will take the form of the third argument to the strftime(3) function. This is a format string, which will be used to format a date and time string. The server will obtain the current date and time at the moment the request is received. The server will then use the client's request string to format the date and time into a final string, which is returned to the client.
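A minimal sketch of this design follows. It is an illustration, not the author's sample program. The parent acts as the server and the child as the client, and the two communicate over a socketpair(2) pair. The function names, the 256-byte buffers, and the assumption that each short message arrives in a single read(2) are all illustrative choices. After sending its request, the client calls shutdown(fd, SHUT_WR) to signal that no more requests will follow.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

/* Server role: read one strftime(3) format string from fd, format the
 * current local time with it, and write the result back. */
static int serve_one_request(int fd)
{
    char fmt[256], reply[256];
    ssize_t n = read(fd, fmt, sizeof fmt - 1);   /* assumes one read suffices */
    if (n <= 0)
        return -1;
    fmt[n] = '\0';

    time_t now = time(NULL);
    size_t len = strftime(reply, sizeof reply, fmt, localtime(&now));
    return write(fd, reply, len) == (ssize_t)len ? 0 : -1;
}

/* Client role: send the format string, signal "no more requests"
 * with shutdown(2), then read the server's reply. */
static int send_request(int fd, const char *fmt, char *reply, size_t size)
{
    if (write(fd, fmt, strlen(fmt)) == -1 || shutdown(fd, SHUT_WR) == -1)
        return -1;
    ssize_t n = read(fd, reply, size - 1);
    if (n < 0)
        return -1;
    reply[n] = '\0';
    return 0;
}

/* Parent = server, child = client. Returns 0 when both sides succeed. */
int run_client_server(const char *fmt)
{
    int sv[2], status;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, sv) == -1) {
        perror("socketpair(2)");
        return -1;
    }
    fflush(stdout);                   /* avoid duplicating buffered output */
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork(2)");
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                   /* child process: the client */
        char reply[256];
        close(sv[0]);
        if (send_request(sv[1], fmt, reply, sizeof reply) == -1)
            _exit(1);
        printf("Server returned: '%s'\n", reply);
        fflush(stdout);
        _exit(0);
    }

    close(sv[1]);                     /* parent process: the server */
    int rc = serve_one_request(sv[0]);
    close(sv[0]);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (rc == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

For example, run_client_server("%A %d-%b-%Y %H:%M:%S") would have the child print the current date and time in that format.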




