This information was culled from http://www.dpo.uab.edu/Email/attach.html on 10-9-2002.

Anatomy of an Email Attachment
==============================

Ever wondered how your email program knows that an incoming message has an attachment? The following example illustrates the simplest case: the attachment is a plain text file that is to be stored on the hard disk under the (suggested) name "textfile.txt".

First, an email message with no attachments
-------------------------------------------

Email messages include more than just the text of the message you typed. Your email program begins each message with a "header": several lines of information that give the email addresses of both sender and recipient, the subject of the message, and other information.

There is a standard that applies to such headers, and indeed to the entire format of an email message. This standard is called MIME, which stands for Multipurpose Internet Mail Extensions. It is documented in a series of RFC documents (Request For Comments) that can be obtained at the Ohio State University Computer and Information Science Web site. The RFCs most relevant to the following discussion are RFC-1521 and RFC-1806 (since superseded by RFC 2045-2049 and RFC 2183, respectively).

Suppose you used your email program to create an email message to Joe Bologna (email address: joe@jbpizza.com) with the subject "Sicilian". Suppose also that your name is John Henry and your email address is jhenry@unw.edu, and that the body of the message was the phrase "I'm hungry, Joe. Cheers, John." After your email program had added the appropriate MIME header information, the email message would actually look something like the following:

    MIME-Version: 1.0
    From: John Henry <jhenry@unw.edu>
    To: Joe Bologna <joe@jbpizza.com>
    Subject: Sicilian
    Content-Type: text/plain

    I'm hungry, Joe. Cheers, John.

If the example message above were stored in a file called "mymail.txt" on a machine running the UNIX operating system, and the standard sendmail program resided on that machine in the directory /usr/lib, you could send this message as follows (assuming you were logged into the UNIX system):

    /usr/lib/sendmail -t -n < mymail.txt

Normally, of course, your email program does this work for you, and the operating system calls are invisible to the user. The '-t' command line option tells the sendmail program to take the recipients from the 'To:' field in the header of the message itself rather than from the command line. The '-n' option tells sendmail not to expand email aliases; that is, sendmail should expect the fully-qualified email address (in this case joe@jbpizza.com) to be present in the 'To:' field of the message header.

Now for an attachment
---------------------

To attach additional information to the email message, we would use the following format:

    MIME-Version: 1.0
    From: John Henry <jhenry@unw.edu>
    To: Joe Bologna <joe@jbpizza.com>
    Subject: Sicilian
    Content-Type: multipart/mixed; boundary=boundarystring

    --boundarystring
    Content-Type: text/plain

    I'm hungry, Joe. See the attachment for my pizza order. Cheers, John

    --boundarystring
    Content-Type: text/plain
    Content-Disposition: attachment; filename="textfile.txt"

    I want one large Sicilian-style pizza with mushrooms and black olives.

    --boundarystring--

The first "Content-Type:" field in the message header tells the receiving email program that this message has more than one component, and that the components are separated by the string of characters "boundarystring". Notice that wherever "boundarystring" is used as a separator in the message body it is prefixed with two hyphens, and that its last occurrence is also followed immediately by two hyphens. The receiving email program knows that the last component of the message has been read when it encounters the boundary string followed by two hyphens.

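The way a receiving program splits such a message at its boundary lines can be sketched with the email package from Python's standard library. This is only an illustration, not a description of how any particular email program works. The helper name describe_parts is hypothetical, and the sender and recipient addresses written in angle brackets are an assumption about the header layout.

```python
from email import message_from_string

# The multipart example message. The addresses in angle brackets are an
# assumption about how the From:/To: fields were originally written.
RAW_MESSAGE = """\
MIME-Version: 1.0
From: John Henry <jhenry@unw.edu>
To: Joe Bologna <joe@jbpizza.com>
Subject: Sicilian
Content-Type: multipart/mixed; boundary=boundarystring

--boundarystring
Content-Type: text/plain

I'm hungry, Joe. See the attachment for my pizza order. Cheers, John

--boundarystring
Content-Type: text/plain
Content-Disposition: attachment; filename="textfile.txt"

I want one large Sicilian-style pizza with mushrooms and black olives.

--boundarystring--
"""


def describe_parts(raw):
    """Return (content_type, suggested_filename, text) for each body part.

    Hypothetical helper for illustration only.
    """
    msg = message_from_string(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            # Skip the multipart/mixed container; only its children hold text.
            continue
        parts.append(
            (part.get_content_type(), part.get_filename(), part.get_payload())
        )
    return parts


if __name__ == "__main__":
    top = message_from_string(RAW_MESSAGE)
    # The top-level header announces the multipart type and the boundary.
    print(top.get_content_type(), top.get_boundary())
    for content_type, filename, text in describe_parts(RAW_MESSAGE):
        # filename is None for a part without a suggested file name.
        print(content_type, filename, repr(text))
```

The parser reports two parts. Only the second carries a suggested file name, which it takes from the "Content-Disposition:" field of that part.
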
The receiving email program also knows that the sender wished the attachment (the second part of the email message) to be saved to the recipient's hard disk because of the "Content-Disposition:" header field. This field instructs the recipient's email program to try saving the attachment text in the file "textfile.txt" in the standard attachment directory. If a file by that name already exists, the recipient's email program will modify the name slightly (adding a number, for example) until the name is unique.

The only other really important thing to note about this header structure is the mandatory blank line following the subheader of each part of the email message. If these blank lines are missing, the recipient's email program will have difficulty telling where the header information stops and the text of the message begins.

The attachment in this case was a text file. It is also common to send binary files (such as computer programs or compressed archives) as attachments. In such cases, the sender must first encode the binary file as plain text so that it can be sent safely over the Internet. One common encoding scheme is known as base64, which is described nicely in RFC 1521. Below is an abbreviated version of an email message containing a binary attachment encoded using the base64 method (note that the three dots indicate that a large portion of the encoded binary file has been left out):

    MIME-Version: 1.0
    From: root@mycomputer.edu
    To: plewis@uconnvm.uconn.edu
    Subject: compressed tar file
    Content-Type: multipart/mixed; boundary=boundarystring

    --boundarystring
    Content-Type: text/plain

    Attached is the file bstcvs.tar.Z, which has been base64 encoded.

    --boundarystring
    Content-Type: application/octet-stream; name="bstcvs.tar.Z"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment; filename="bstcvs.tar.Z"

    H52QLl6MsTNHzps3dF4AWMiwocOHECNKnEixYkUQNGDcuAEDBAAQIEF2/Ahyow2PIVOG
    7AiCIwwaGWPYkFEDZAwYNWzA+FjDos+fQIMKhVhnDp0wckACMIhwqEQxRp1KnUq1qtWr
    WKsGHFjwYMIhVqZIefKEisKsaKti1HijJsmWNtyGNIlSZUqWLmfcmFEDxowZNmX83Qmi
    . . .
    2GrADMF7fHdKKtZWRHrYg9gD1wSUy9ja2MrDV8/T2HDYJtNb0wu8UNh7vHbY6NPm2J3Y
    twyZNLB84MpcFpHAx72TwNLXlcAQxO/X+Nj52PrY+9j82P3Y/tj/2ADZAdkC2QPZBNkF
    2QbZ+NN=

    --boundarystring--

This information was volunteered by Paul O. Lewis. Please send any comments, corrections or suggestions about this page to me at plewis@uconnvm.uconn.edu.