This is the first post in a series on Network Programming.
For part two of the series click here.
Have you ever wondered how data travels over the internet? Ever pondered how the cool technologies you use, like Dropbox, BitTorrent, email, and web servers, work at a lower level?
The answer is via User Datagram Protocol (UDP) and Transmission Control Protocol (TCP) sockets.
Honestly, the Wikipedia pages for UDP and TCP probably describe them better than I could, but the basic gist is this: UDP is connectionless and has no error correction or reliable transmission built in, while TCP has connections, error correction, and reliable transmission, all at the cost of speed.
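To make "connectionless" a little more concrete, here is a minimal sketch of a UDP exchange over the loopback interface. The function name udp_round_trip is made up for illustration, and the code assumes Python 3. Notice that there is no listen, accept, or connect step: the sender just addresses each datagram individually.

```python
import socket


def udp_round_trip(message=b'hello'):
    """Send one datagram to ourselves over loopback and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP socket
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP promises no delivery, so don't wait forever
        # No connection setup: each datagram carries its destination address.
        sender.sendto(message, receiver.getsockname())
        data, sender_address = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == '__main__':
    print(udp_round_trip())
```

On a real network a datagram like this could be lost or arrive out of order, which is exactly the bookkeeping TCP does for you.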
We are going to use the Python programming language to examine sockets in this series on network programming.
Consider the following program:
#!/usr/bin/env python
# server.py
from __future__ import print_function
import socket


def main(*args, **kwargs):
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.listen(5)
    print(server_socket.getsockname())
    server_socket.shutdown(socket.SHUT_RDWR)
    server_socket.close()


if __name__ == '__main__':
    main()
The call to socket.socket creates the server socket. The Python documentation for this function describes it in more detail, but basically we’re saying that we want an IPv4 (AF_INET) TCP (SOCK_STREAM) socket.
The call to server_socket.listen(5) then tells the socket to listen for connections and to allow a backlog queue of no more than five pending connections (see docs). This doesn’t mean that you can only have a maximum of 5 connections (more on that later).
Normally when you run a server you want to specify the IP address and port it listens on, but for now we don’t care about that. If you call socket.listen without first calling socket.bind, then your operating system will select an open port for you to use (at least on Linux; other platforms may behave differently).
The call to server_socket.getsockname() gives us the IP address and port number the operating system picked, and we print them out.
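If you would rather not depend on what listen does to an unbound socket, a more explicit way to ask for "any free port" is to bind to port 0 yourself. This is a minimal sketch and not part of server.py above; the function name listen_on_any_free_port and the choice of the loopback address are assumptions made for illustration.

```python
import socket


def listen_on_any_free_port():
    """Bind to port 0 so the OS assigns a free port, then report the address."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server_socket.bind(('127.0.0.1', 0))  # port 0 means "pick one for me"
        server_socket.listen(5)
        return server_socket.getsockname()    # (host, port) actually in use
    finally:
        server_socket.close()


if __name__ == '__main__':
    print(listen_on_any_free_port())
```

The port number will differ from run to run, which is why real servers usually bind to a fixed port instead.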
I imagine the last two calls in main probably look a bit redundant, but there are subtle differences between socket.shutdown and socket.close. This Stack Overflow post describes it in detail (skip over the answer which was marked as best and look at the one with the most upvotes instead; IMHO that is a better answer).
Essentially, server_socket.shutdown(socket.SHUT_RDWR) sends whatever is necessary to tell the other end of the connection that server_socket will no longer be receiving or sending any data over this connection, and server_socket.close() releases the socket itself. A common annoyance for people starting to do network programming using TCP is that when they “shut down” their servers they can’t restart them because the socket they requested is “already in use.” Generally, most people don’t realize how to correctly close a socket connection, so the connection “stays locked” after they close their program.
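Here is a small sketch of that difference, assuming Python 3 and a platform where socket.socketpair is available. It uses socket.SHUT_WR (only the sending direction) rather than SHUT_RDWR so that the effect is easy to observe from the other end, and the function name shutdown_vs_close is made up for this example.

```python
import socket


def shutdown_vs_close():
    # socketpair() returns two already-connected sockets, so no server is needed.
    ours, theirs = socket.socketpair()
    ours.sendall(b'bye')
    ours.shutdown(socket.SHUT_WR)       # announce: we will send nothing more
    last_data = theirs.recv(100)        # data sent before the shutdown still arrives
    end_marker = theirs.recv(100)       # b'' tells the peer we are done sending
    open_after_shutdown = ours.fileno() != -1  # shutdown did not release the socket
    ours.close()
    theirs.close()
    closed_after_close = ours.fileno() == -1   # close did (Python 3 reports -1)
    return last_data, end_marker, open_after_shutdown, closed_after_close


if __name__ == '__main__':
    print(shutdown_vs_close())
```

In other words, shutdown is a message to the peer, while close is housekeeping on your own side.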
So let’s make a simple, single-threaded chat server:
#!/usr/bin/env python
# server.py
from __future__ import print_function
import socket


def main(*args, **kwargs):
    print('creating server socket')
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print('done. now binding to address')
    server_socket.bind(('0.0.0.0', 54321))
    print('done. now listening for connections')
    server_socket.listen(20)
    try:
        while True:
            client_socket, client_address = server_socket.accept()
            print('new client connected from: ', client_address)
            content = client_socket.recv(100)
            print('#### new data received ####')
            print(content)
            print('#### finished receiving data ####')
            client_socket.shutdown(socket.SHUT_RDWR)
            client_socket.close()
    finally:
        server_socket.shutdown(socket.SHUT_RDWR)
        server_socket.close()


if __name__ == '__main__':
    main()
This time we do want a specific IP and port, so in the call to server_socket.bind we bind to all IP addresses on this machine (that’s why we use the IP address 0.0.0.0) and request port 54321. If some other program is already bound to this port, then server.py will error out at this point.
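To see what "error out" looks like without needing a second program, this sketch binds two sockets to the same address inside one process. The function name second_bind_errno is made up, and it uses an OS-assigned port on the loopback interface instead of 54321 so it cannot collide with anything you already have running.

```python
import errno
import socket


def second_bind_errno():
    """Try to bind two sockets to the same address; return the second one's errno."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(('127.0.0.1', 0))   # the OS picks a free port for the first socket
        first.listen(1)
        try:
            second.bind(first.getsockname())  # same address and port: should fail
        except socket.error as error:
            return error.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == '__main__':
    print(second_bind_errno() == errno.EADDRINUSE)
```

The errno value compared against here, errno.EADDRINUSE, is the "address already in use" error mentioned earlier.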
Notice that basically everything we do after we start listening on our server socket is within a try/finally block. This way, if anything bad happens within the try block, we will still gracefully shut down our server using socket.shutdown and socket.close.
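The guarantee that try/finally gives can be checked in isolation. In this sketch (Python 3, with made-up function names) a RuntimeError stands in for "anything bad", and only close is called in the finally block because there is no connected peer to notify.

```python
import socket


def serve_until_failure(server_socket):
    try:
        # Stand-in for something going wrong inside the accept loop.
        raise RuntimeError('simulated failure')
    finally:
        # Runs whether or not the try block raised.
        server_socket.close()


def demo():
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind(('127.0.0.1', 0))  # port 0: any free port will do
    server_socket.listen(1)
    message = None
    try:
        serve_until_failure(server_socket)
    except RuntimeError as error:
        message = str(error)
    # In Python 3 a closed socket reports a file descriptor of -1.
    return message, server_socket.fileno() == -1


if __name__ == '__main__':
    print(demo())
```

The exception still propagates to the caller, but the socket is closed first.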
Also note that we’re running an infinite loop here. That way we can keep accepting connections even after our first one is finished.
So within our infinite loop, all we do is accept a new connection (server_socket.accept), receive a maximum of 100 bytes of data from the connection (client_socket.recv(100)), and then close the connection (client_socket.shutdown and client_socket.close).
So go ahead and run this program. You’ll see that it prints everything up to “done. now listening for connections” and doesn’t appear to do anything after that.
This is because we are using blocking sockets (I’ll touch on that a bit later). Essentially our program is “stuck waiting” inside server_socket.accept(), or rather it’s waiting for something to connect to it. This program will literally wait forever for a new connection (because we did not specify any timeouts on this blocking socket). Don’t worry, I’ll get into timeouts later as well.
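As a small preview of what a timeout changes (the series covers timeouts properly later, and this sketch is not part of server.py), here is a listening socket that gives up on accept after a fraction of a second. The function name accept_with_timeout and the 0.2 second default are assumptions chosen for illustration.

```python
import socket


def accept_with_timeout(seconds=0.2):
    """Wait for a connection, but only for a limited time."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server_socket.bind(('127.0.0.1', 0))  # any free port; nobody will connect
        server_socket.listen(1)
        server_socket.settimeout(seconds)     # accept() no longer blocks forever
        try:
            client_socket, client_address = server_socket.accept()
        except socket.timeout:
            return 'timed out'
        client_socket.close()
        return 'got a connection'
    finally:
        server_socket.close()


if __name__ == '__main__':
    print(accept_with_timeout())
```

Without the settimeout call, this function would sit in accept indefinitely, just like server.py does.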
So let’s get a client together to talk to it:
#!/usr/bin/env python
# client.py
from __future__ import print_function
import socket


def main(*args, **kwargs):
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(('127.0.0.1', 54321))  # loopback; connecting to 0.0.0.0 is not portable
    client_socket.send(b'the rain in spain stays mainly in the plain')  # bytes literal works on Python 2 and 3
    client_socket.shutdown(socket.SHUT_RDWR)
    client_socket.close()


if __name__ == '__main__':
    main()
Notice that this looks very similar to the server but does a few things differently. With client_socket.connect we connect to server.py (it’ll throw an error if it can’t connect, say if you closed your server script). Then we send the text “the rain in spain stays mainly in the plain” and close the connection.
Go ahead and run it and you’ll see your server print out “the rain in spain stays mainly in the plain.”
Because of the infinite loop in server.py, you can run client.py over and over again and server.py will “serve” them all.
But remember from server.py that we only read 100 bytes, so what happens if you change up client.py to send more than 100 bytes?
#!/usr/bin/env python
# client.py
from __future__ import print_function
import socket


def main(*args, **kwargs):
    client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_socket.connect(('127.0.0.1', 54321))  # loopback; connecting to 0.0.0.0 is not portable
    client_socket.send(b'banana ' * 100)  # 700 bytes, far more than the server reads
    client_socket.shutdown(socket.SHUT_RDWR)
    client_socket.close()


if __name__ == '__main__':
    main()
When we sent the string “banana ” repeated 100 times we got an error:
Traceback (most recent call last):
  File "./client.py", line 16, in <module>
    main()
  File "./client.py", line 11, in main
    client_socket.shutdown(socket.SHUT_RDWR)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 107] Transport endpoint is not connected
The error occurs because our server read only the first 100 bytes we sent (about 14 bananas) and then closed the connection while the rest was still unread, so by the time the client called shutdown the connection was already gone. I’m sure you’re thinking “why don’t we just increase the number of bytes read?” We can, but eventually we’ll want to be able to read any amount of data.
But that’s for Part Two of this series.
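Until then, here is one possible shape for "read any amount of data", offered as a sketch under my own assumptions rather than as what Part Two does: keep calling recv until it returns an empty result, which happens once the sender has shut down its side. The helper names recv_all and demo are made up, and socket.socketpair (Python 3) stands in for a real client and server.

```python
import socket


def recv_all(sock, chunk_size=100):
    """Read chunks until the peer signals it has finished sending."""
    chunks = []
    while True:
        chunk = sock.recv(chunk_size)  # returns at most chunk_size bytes
        if not chunk:                  # b'' means the sender shut down
            break
        chunks.append(chunk)
    return b''.join(chunks)


def demo():
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b'banana ' * 100)   # 700 bytes, as in the failing client
        sender.shutdown(socket.SHUT_WR)    # tell the receiver no more data is coming
        return recv_all(receiver)
    finally:
        sender.close()
        receiver.close()


if __name__ == '__main__':
    print(len(demo()))
```

This only works when the sender signals the end of its data by shutting down or closing; protocols that keep the connection open need some other way to mark where a message ends.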