Python Parse Emails And Attachments From POP3 Server Example

This example shows how to use Python to retrieve emails from a POP3 server and parse out details such as the from/to addresses, subject, text content, and attachment files.

In this example, I created a local SMTP and POP3 email server with Apache James. Refer to the article How To Connect Localhost Apache James With Thunderbird to learn how to set it up.

I created two email accounts, [email protected] and [email protected], and sent an email from [email protected] to [email protected]. The email has two image files and one PDF file attached.

The Python source code below retrieves the email from the Apache James POP3 server, parses out the sender address, recipient address, subject, and text content, and saves the three attached files (two image files and one PDF file) to the folder that contains the script. The example uses the poplib module and the email.parser.Parser class.

The poplib module connects to the POP3 server and retrieves the raw message data from the user's mailbox. The email.parser.Parser class parses that data into a message object (an instance of email.message.Message), from which the from/to addresses, subject, content, and attached files can be read.
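
The split between the two modules can be tried without a mail server. The sketch below is not part of the original script: `FakePop3` is a hypothetical stand-in that only mimics the `(response, lines, octets)` tuple returned by `poplib.POP3.retr()`, and the addresses, file name, and attachment bytes are made up. It also uses `email.parser.BytesParser` rather than `Parser`, an assumption chosen here so the raw bytes do not have to be decoded as UTF-8 before parsing.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


class FakePop3:
    """Hypothetical stand-in for poplib.POP3; only retr() is mimicked."""

    def __init__(self, raw_message):
        self._raw = raw_message

    def retr(self, index):
        # poplib returns (response, list of lines without CRLF, octet count).
        lines = self._raw.split(b'\r\n')
        return b'+OK Message follows', lines, len(self._raw)


def fetch_message(conn, index):
    """Retrieve one message and parse it into an email.message.Message."""
    _resp, lines, _octets = conn.retr(index)
    return BytesParser().parsebytes(b'\r\n'.join(lines))


def build_sample_message():
    """Build a small multipart message with one attachment (made-up data)."""
    msg = EmailMessage()
    msg['From'] = 'sender@example.com'
    msg['To'] = 'receiver@example.com'
    msg['Subject'] = 'test email'
    msg.set_content('This is a test email.')
    msg.add_attachment(b'%PDF-1.4 fake', maintype='application',
                       subtype='pdf', filename='example.pdf')
    # policy.SMTP serializes with CRLF line endings, as a server would send.
    return msg.as_bytes(policy=policy.SMTP)


if __name__ == '__main__':
    parsed = fetch_message(FakePop3(build_sample_message()), 1)
    print('Subject : ' + parsed['Subject'])
    # walk() visits the container and every nested part in order.
    for part in parsed.walk():
        print(part.get_content_type(), part.get_filename())
```

Because `fetch_message` only needs an object with a `retr()` method, a real `poplib.POP3` connection that has already logged in could be passed in place of `FakePop3`.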

'''
@author: zhaosong
'''
import poplib, os
from email.parser import Parser

# pop3 server domain.
pop3_server_domain = 'pop3.test.com'

# pop3 server connection object.
pop3_server_conn = None

'''
This method will connect to the global pop3 server 
and login with the provided user email and password.
'''
def connect_pop3_server(user_email, user_password):
    # use global pop3_server_conn variable in this function.
    global pop3_server_conn
    
    # if pop3 server connection object is null then create it.
    if(pop3_server_conn is None):
        print('********************************* start connect_pop3_server *********************************')
        # create pop3 server connection object.
        pop3_server_conn = poplib.POP3(pop3_server_domain)
        pop3_server_conn.set_debuglevel(1)
        
        # get pop3 server welcome message and print on console.
        welcome_message = pop3_server_conn.getwelcome()
        print('Below is pop3 server welcome messages : ')
        print(welcome_message)
        
        # send user email and password to pop3 server.
        pop3_server_conn.user(user_email)
        pop3_server_conn.pass_(user_password)
    
    return pop3_server_conn

'''
Close the pop3 server connection and release the connection object.
'''
def close_pop3_server_connection():
    global pop3_server_conn
    if pop3_server_conn is not None:
        pop3_server_conn.quit()
        pop3_server_conn = None

'''
Get email messages status of the given user.
'''
def get_user_email_status(user_email, user_password):
    
    # connect to pop3 server with the user account.
    connect_pop3_server(user_email, user_password)

    print('********************************* start get_user_email_status *********************************')
    
    # get user total email message count and email file size. 
    (messageCount, totalMessageSize) = pop3_server_conn.stat()
    print('Email message numbers : ' + str(messageCount))
    print('Total message size : ' + str(totalMessageSize) + ' bytes.')
    

'''
Get user email index info.
'''
def get_user_email_index(user_email, user_password):
    
    connect_pop3_server(user_email, user_password)
    print('********************************* start get_user_email_index *********************************')
    
    # get all user email list info from pop3 server.
    (resp_message, mails_list, octets) = pop3_server_conn.list()
    # print server response message.
    print('Server response message : ' + str(resp_message))
    # loop in the mail list.
    for mail in mails_list:
        # print each mail object info.
        print('Mail : ' + str(mail))
    
    print('Octets number : ' + str(octets))
    

'''
Get user account email by the provided email account and email index number.
'''
def get_email_by_index(user_email, user_password, email_index):
    
    connect_pop3_server(user_email, user_password)
    print('********************************* start get_email_by_index *********************************')

    # retrieve user email by email index. 
    (resp_message, lines, octets) = pop3_server_conn.retr(email_index)
    print('Server response message : ' + str(resp_message))
    print('Octets number : ' + str(octets))
   
    # join each line of email message content to create the email content and decode the data with utf-8 charset encoding.  
    msg_content = b'\r\n'.join(lines).decode('utf-8')
    # print out the email content string.
    # print('Mail content : ' + msg_content)
    
    # parse the email string to a MIMEMessage object.
    msg = Parser().parsestr(msg_content)
    parse_email_msg(msg)
    
 
# Parse email message.   
def parse_email_msg(msg):
    
    print('********************************* start parse_email_msg *********************************')
    
    parse_email_header(msg)
     
    parse_email_body(msg)    
    
# Delete user email by index.   
def delete_email_from_pop3_server(user_email, user_password, email_index):
    connect_pop3_server(user_email, user_password)   
    print('********************************* start delete_email_from_pop3_server *********************************')
    
    pop3_server_conn.dele(email_index)
    print('Delete email at index : ' + str(email_index))
    
    
# Parse email header data.    
def parse_email_header(msg):
    print('********************************* start parse_email_header *********************************')
    # just parse from, to, subject header value.
    header_list = ('From', 'To', 'Subject')
    
    # loop in the header list
    for header in header_list:
        # get each header value.
        header_value = msg.get(header, '')
        print(header + ' : ' + header_value)    
      
# Parse email body data.      
def parse_email_body(msg):
    print('********************************* start parse_email_body *********************************')
    
    # if the email contains multiple part.
    if (msg.is_multipart()):
        # get all email message parts.
        parts = msg.get_payload()
        # loop in above parts.
        for n, part in enumerate(parts):
            # get part content type.
            content_type = part.get_content_type()
            print('---------------------------Part ' + str(n) + ' content type : ' + content_type + '---------------------------------------')
            parse_email_content(part)
    else:
        parse_email_content(msg)

# Parse email message part data.            
def parse_email_content(msg):
    # get message content type.
    content_type = msg.get_content_type().lower()
    
    print('---------------------------------' + content_type + '------------------------------------------')
    # if the message part is text part.
    if content_type=='text/plain' or content_type=='text/html':
        # get text content.
        content = msg.get_payload(decode=True)
        # get the charset declared in the 'Content-Type' header (None if it is missing).
        charset = msg.get_content_charset()

        if charset:
            content = content.decode(charset)
                
        print(content)
    # if this message part is still multipart such as 'multipart/mixed','multipart/alternative','multipart/related'
    elif content_type.startswith('multipart'):
        # get multiple part list.
        body_msg_list = msg.get_payload()
        # loop in the multiple part list.
        for body_msg in body_msg_list:
            # parse each message part.
            parse_email_content(body_msg)
    # if this message part is an image or application part, treat it as an attached file.
    elif content_type.startswith('image') or content_type.startswith('application'):
        # get the attached file name from the 'Content-Disposition' header.
        attach_file_name = msg.get_filename()
        # fall back to a placeholder name when the part declares no file name.
        if not attach_file_name:
            attach_file_name = 'attachment.bin'
        # keep only the base name so the file is always written inside the target folder.
        attach_file_name = os.path.basename(attach_file_name)
        
        
        # get attached file content.
        attach_file_data = msg.get_payload(decode=True)
        # get current script execution directory path. 
        current_path = os.path.dirname(os.path.abspath(__file__))
        # get the attached file full path.
        attach_file_path = os.path.join(current_path, attach_file_name)
        # write attached file content to the file.
        with open(attach_file_path,'wb') as f:
            f.write(attach_file_data)
            
        print('attached file is saved in path ' + attach_file_path)    
                
    else:
        content = msg.as_string()
        print(content)         
    
if __name__ == '__main__':
    
    user_email = '[email protected]'
    
    user_password = 'jerry'
    
    get_user_email_status(user_email, user_password)
    
    get_user_email_index(user_email, user_password)
    
    get_email_by_index(user_email, user_password, 1)
    
    close_pop3_server_connection()

Running the script produces the following output.

********************************* start connect_pop3_server *********************************
Below is pop3 server welcome messages :
b'+OK <[email protected]> POP3 server (JAMES POP3 Server ) ready '
*cmd* 'USER [email protected]'
*cmd* 'PASS jerry'
********************************* start get_user_email_status *********************************
*cmd* 'STAT'
*stat* [b'+OK', b'1', b'157795']
Email message numbers : 1
Total message size : 157795 bytes.
********************************* start get_user_email_index *********************************
*cmd* 'LIST'
Server response message : b'+OK 1 157795'
Mail : b'1 157795'
Octets number : 10
********************************* start get_email_by_index *********************************
*cmd* 'RETR 1'
Server response message : b'+OK Message follows'
Octets number : 157795
********************************* start parse_email_msg *********************************
********************************* start parse_email_header *********************************
From : admin <[email protected]>
To : [email protected]
Subject : test email
********************************* start parse_email_body *********************************
---------------------------Part 0 content type : text/plain---------------------------------------
---------------------------------multipart/mixed------------------------------------------
---------------------------------text/plain------------------------------------------
This is a test email from localhost james server.
---------------------------------application/pdf------------------------------------------
attached file is saved in path /Users/zhaosong/Documents/WorkSpace/dev2qa.com-example-code/PythonExampleProject/com/dev2qa/example/email/example.pdf
---------------------------------image/jpeg------------------------------------------
attached file is saved in path /Users/zhaosong/Documents/WorkSpace/dev2qa.com-example-code/PythonExampleProject/com/dev2qa/example/email/python-docker.jpeg
---------------------------------image/png------------------------------------------
attached file is saved in path /Users/zhaosong/Documents/WorkSpace/dev2qa.com-example-code/PythonExampleProject/com/dev2qa/example/email/python-logo.png
*cmd* 'QUIT'
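
The script prints header values exactly as they arrive. That is fine for the plain ASCII subject in this run, but a subject containing non-ASCII characters is normally transmitted in RFC 2047 encoded-word form (for example `=?utf-8?q?...?=`) and would be printed in that raw form. A minimal sketch of decoding such values with the standard `email.header` module follows; `decode_header_value` is a hypothetical helper, not part of the original script, and the sample message and address are made up.

```python
from email.header import decode_header, make_header
from email.parser import Parser


def decode_header_value(raw_value):
    """Turn an RFC 2047 encoded header value into a readable string."""
    if raw_value is None:
        return ''
    # decode_header splits the value into (text, charset) chunks;
    # make_header joins them into a Header that str() renders as text.
    return str(make_header(decode_header(raw_value)))


if __name__ == '__main__':
    # Made-up message: the subject is the Q-encoded form of a non-ASCII word.
    raw = ('From: admin <admin@example.com>\r\n'
           'Subject: =?utf-8?q?caf=C3=A9?=\r\n'
           '\r\n'
           'body\r\n')
    msg = Parser().parsestr(raw)
    for name in ('From', 'To', 'Subject'):
        print(name + ' : ' + decode_header_value(msg.get(name)))
```

In the script above, the same helper could wrap the value returned by `msg.get(header, '')` in `parse_email_header`.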
