For anyone used to traditional desktop programming who switches to the web, one of the hardest things to wrap your head around is the lack of state. There’s no inherent way of keeping information around while you’re interacting with a user. Each page request starts with a blank slate; you don’t have in-memory variables that can keep track of useful information between requests.
If you’re working in PHP, this is where sessions look like a great solution. They’re a general-purpose mechanism built around cookies, and they let you store arbitrary variables that are remembered across all page requests from a particular user. Under the hood, they set a single session ID cookie (named PHPSESSID by default) on the user’s machine, which is sent along with every subsequent page request. That ID is used to load a file from the server’s disk containing the variable names and values stored for that user. Any changes or additions the server makes to the data are saved back into that same file.
From the programmer’s point of view, you call session_start() and then have access to a global associative array, $_SESSION. You set and read entries in this array, and they persist across page requests for a given user as long as they keep sending the cookie. This all looks like a very natural model for storing state, one that traditional app programmers would feel very comfortable with. You could do something similar by setting cookies directly, but then you’re exposing a lot of information to the user and opening the door to malicious tinkering with your internal server variables.
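As a minimal sketch of that model, assuming PHP's default file-based session handler and default settings, the snippet below counts page views for one user and prints the cookie name, the session ID, and the serialized data PHP will store. The `visits` key is just an illustrative name.

```php
<?php
// Must run before any output is sent, because it may set the session cookie.
session_start();

// Read-modify-write an entry; it persists for as long as the browser
// keeps sending the session cookie back.
$_SESSION['visits'] = ($_SESSION['visits'] ?? 0) + 1;

// The cookie name (PHPSESSID unless reconfigured) and the opaque ID it carries.
$cookieName = session_name();
$sessionId  = session_id();

// The serialized form of the data that PHP writes to this user's session store.
$stored = session_encode();

echo "Visit number {$_SESSION['visits']}\n";
echo "{$cookieName}={$sessionId}\n";
echo "Stored data: {$stored}\n";
```

Only the ID travels to the browser; the `visits` value itself never leaves the server, which is the advantage over setting cookies directly.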
As you might have guessed, there’s no such thing as a free lunch, and sessions have some significant drawbacks. The data is stored in a file on the server’s disk, which means that you’re tied to a single server and can’t load balance without replicating that file, and every change to it, across all machines. The file is locked so it can only be accessed by one request at a time, so simultaneous requests from the same user get serialized, which is a serious problem if one of them runs a long calculation. The locking also results in deadlocks if you make sub-requests within the main page request to get parts of the page and pass the session ID cookie along manually: the sub-request waits for a lock that the main request won’t release until the sub-request returns. In general, the behind-the-scenes nature of sessions makes it tough to tell who’s connected and to debug state problems.
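One common way to soften the serialization problem, though not something this post relies on, is to release the lock early with session_write_close() once you have read what you need. The sketch below assumes the default file handler; the `user_id` key and the loop standing in for a slow calculation are hypothetical.

```php
<?php
// Acquires the lock on this user's session file (default file handler).
session_start();

// Copy out whatever the slow part needs while the lock is held.
$userId = $_SESSION['user_id'] ?? null;

// Flush the data and release the lock, so other requests from the same
// user are no longer queued behind this one.
session_write_close();

// Stand-in for a long-running calculation (hypothetical workload).
$result = 0;
for ($i = 1; $i <= 1000; $i++) {
    $result += $i;
}

echo 'Done for user ' . ($userId ?? 'anonymous') . ": {$result}\n";
```

The catch is that anything assigned to $_SESSION after session_write_close() is not saved, so this only helps when the slow part doesn't need to update session state.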
Some of these issues are fixed if you write your own handler to back up the sessions to a database, rather than to a file. You still end up locking, though, and the database access makes the operation much more expensive. It also requires some planning ahead to know exactly what state you want to store, which abandons a lot of the flexibility that makes sessions so useful.
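For a sense of what such a handler involves, here is a rough sketch built on PHP's SessionHandlerInterface. It assumes PHP 8 and the pdo_sqlite extension; the class name, table layout, and database path are all made up for illustration. It deliberately leaves out locking, so concurrent requests would overwrite each other's data (last write wins); a real handler usually adds row-level locking, which brings back the cost described above.

```php
<?php
// Hypothetical database-backed session handler (PHP 8+, pdo_sqlite assumed).
final class PdoSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $db)
    {
        $this->db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $this->db->exec(
            'CREATE TABLE IF NOT EXISTS sessions (
                id TEXT PRIMARY KEY,
                data TEXT NOT NULL,
                updated_at INTEGER NOT NULL
            )'
        );
    }

    // Nothing to set up or tear down per request; the PDO connection is reused.
    public function open(string $path, string $name): bool { return true; }
    public function close(): bool { return true; }

    // Return the serialized session data, or an empty string for a new session.
    public function read(string $id): string|false
    {
        $stmt = $this->db->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        return $data === false ? '' : (string) $data;
    }

    // Insert or overwrite the row for this session ID (no locking: last write wins).
    public function write(string $id, string $data): bool
    {
        $stmt = $this->db->prepare(
            'INSERT OR REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)'
        );
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute([$id]);
    }

    // Remove sessions that have not been written to within $max_lifetime seconds.
    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}

// Register the handler before starting the session; $_SESSION then works as usual.
$db = new PDO('sqlite:' . sys_get_temp_dir() . '/sessions_demo.sqlite');
session_set_save_handler(new PdoSessionHandler($db), true);
session_start();
$_SESSION['cart_items'] = 3;
```

Because the data now lives in a shared database instead of a local file, any web server behind a load balancer can serve the request.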
I ended up with my own API for storing and reading information about each session in a database, using a special cookie ID as a key, generated once a user has logged in and been authenticated. I also have a convention where the ID is passed through POST or GET parameters, which makes sub-requests very easy. It isn’t that different from storing sessions in a database, but it does avoid the locking problem, and it makes the database cost explicit on the programming side. The fact that it’s associated with a particular user, and can only be created by logging in, makes it harder to spoof too, and lets you limit the number of connections for a single user.
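The API itself isn't shown here, so what follows is only a hypothetical sketch of the shape described above, assuming PHP 8 and pdo_sqlite. The class name `UserState`, the table layout, the `sid` parameter name, and the default limit of three connections per user are all assumptions made for illustration.

```php
<?php
// Hypothetical explicit state store keyed by a login token (PHP 8+, pdo_sqlite assumed).
final class UserState
{
    public function __construct(private PDO $db, private int $maxPerUser = 3)
    {
        $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
        $db->exec('CREATE TABLE IF NOT EXISTS logins (
            token TEXT PRIMARY KEY, user_id INTEGER NOT NULL)');
        $db->exec('CREATE TABLE IF NOT EXISTS state (
            token TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL,
            PRIMARY KEY (token, name))');
    }

    // Call only after the user's credentials have been verified.
    // Returns a new token, or null if the user already has too many connections.
    public function login(int $userId): ?string
    {
        $count = $this->db->prepare('SELECT COUNT(*) FROM logins WHERE user_id = ?');
        $count->execute([$userId]);
        if ((int) $count->fetchColumn() >= $this->maxPerUser) {
            return null;
        }
        $token = bin2hex(random_bytes(16)); // unguessable 128-bit ID
        $this->db->prepare('INSERT INTO logins (token, user_id) VALUES (?, ?)')
                 ->execute([$token, $userId]);
        return $token;
    }

    // Map a token back to its user, or null if the token was never issued.
    public function userFor(string $token): ?int
    {
        $stmt = $this->db->prepare('SELECT user_id FROM logins WHERE token = ?');
        $stmt->execute([$token]);
        $id = $stmt->fetchColumn();
        return $id === false ? null : (int) $id;
    }

    // Each call is one explicit database write; unknown tokens are rejected.
    public function set(string $token, string $name, string $value): bool
    {
        if ($this->userFor($token) === null) {
            return false;
        }
        return $this->db
            ->prepare('INSERT OR REPLACE INTO state (token, name, value) VALUES (?, ?, ?)')
            ->execute([$token, $name, $value]);
    }

    // Each call is one explicit database read; returns null if nothing is stored.
    public function get(string $token, string $name): ?string
    {
        $stmt = $this->db->prepare('SELECT value FROM state WHERE token = ? AND name = ?');
        $stmt->execute([$token, $name]);
        $value = $stmt->fetchColumn();
        return $value === false ? null : (string) $value;
    }
}

// The ID normally arrives as a cookie, but sub-requests can pass it via POST or GET.
function tokenFromRequest(): ?string
{
    $token = $_COOKIE['sid'] ?? $_POST['sid'] ?? $_GET['sid'] ?? null;
    return is_string($token) ? $token : null;
}

$state = new UserState(new PDO('sqlite::memory:'));
$token = $state->login(42);
$state->set($token, 'last_page', '/inbox');
echo $state->get($token, 'last_page'), "\n";
```

Nothing is held open between calls, so there is no per-user lock to deadlock on, and every read or write is visibly a database round trip in the calling code.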