Python Pandas Read/Write CSV File And Convert To Excel File Example

Pandas is a third-party python module that can manipulate different format data files, such as csv, json, excel, clipboard, html etc. This example will tell you how to use Pandas to read / write csv file, and how to save the pandas.DataFrame object to an excel file.

1. How To Use Pandas In Python Application.

1.1 Install Pandas.

First you should make sure pandas module has been installed use pip list command in a terminal. If you can not find the pandas module in the popup list, you need to run pip install pandas command to install it.

Also make sure XlsxWriter has been installed, otherwise you should run pip install XlsxWriter to install it. Because this example will save data to excel file with pandas, so this python module should be installed also.

1. 2 Import Pandas.

This is very simple, just add import pandas command at the beginning of the python source file to import it, then you can use it’s various methods.

2. Read CSV File Use Pandas.

To read csv file use pandas is only one line code. The returned object is a pandas.DataFrame object. It represent whole data of the csv file, you can use it’s various method to manipulate the data such as order, query, change index, columns etc.

data_frame = pandas.read_csv(csv_file)

3. Pandas Write Data To CSV File.

After you edit the data in the pandas.DataFrame object, you can call it’s to_csv method to save the new data into a csv file.

data_frame.to_csv(csv_file_path)

4. Pandas Write Data To Excel File.

  1. Create a file writer use pandas.ExcelWriter method.
    excel_writer = pandas.ExcelWriter(excel_file_path, engine=xlsxwriter)

  2. Call DataFrame’s to_excel method to set the DataFrame data to special excel file sheet.
    data_frame.to_excel(excel_writer, ‘Employee Info’)

  3. Call the writer’s save method to save the data to excel file.
    excel_writer.save()

5. DataFrame Operation Methods.

5.1 Sort DataFrame Data By One Column.

Please note the data column name is case sensitive.

data_frame.sort_values(by=['Salary'], ascending=False)

5.2 Query DataFrame Data In A Range.

data_frame = data_frame.loc[(data_frame['Salary'] > 10000) & (data_frame['Salary'] < 20000)]

6. Python Pandas Read/Write CSV File And Save To Excel File Example.

python pandas read write csv file and save to excel file example source files

6.1 CSVExcelConvertionExample.py

'''
Created on Aug 26, 2018
@author: zhaosong
'''

import pandas
import os

# Read csv file use pandas module.
def read_csv_file_by_pandas(csv_file):
    data_frame = None
    if(os.path.exists(csv_file)):
        data_frame = pandas.read_csv(csv_file)
        
        print("------------------data frame all----------------------")
        print(data_frame)

        print("------------------data frame index----------------------")
        print(data_frame.index)

        data_frame = data_frame.set_index('Name')
        print("------------------set Name column as data frame index----------------------")
        print(data_frame.index)
        
        print("------------------data frame columns----------------------")
        print(data_frame.columns)
        
        print("------------------data frame values----------------------")
        print(data_frame.values)
        
        print("------------------data frame hire date series----------------------")
        print(data_frame['Hire Date'])
        
        print("------------------select multiple columns from data frame----------------------")
        print(data_frame[['Salary', 'Hire Date']])
    else:
        print(csv_file + " do not exist.")    

    return data_frame

# Write pandas.DataFrame object to a csv file.
def write_to_csv_file_by_pandas(csv_file_path, data_frame):
    data_frame.to_csv(csv_file_path)
    print(csv_file_path + ' has been created.')

# Write pandas.DataFrame object to an excel file.
def write_to_excel_file_by_pandas(excel_file_path, data_frame):
    excel_writer = pandas.ExcelWriter(excel_file_path, engine='xlsxwriter')
    data_frame.to_excel(excel_writer, 'Employee Info')
    excel_writer.save()
    print(excel_file_path + ' has been created.')    

# Sort the data in DataFrame object by name that data type is string.
def sort_data_frame_by_string_column(data_frame):
    data_frame = data_frame.sort_values(by=['Name'])
    print("--------------Sort data frame by string column---------------")
    print(data_frame)

# Sort DataFrame data by Hire Date that data type is datetime.
def sort_data_frame_by_datetime_column(data_frame):
    data_frame = data_frame.sort_values(by=['Hire Date'])
    print("--------------Sort data frame by datetime column---------------")
    print(data_frame)

# Sort DataFrame data by Salary that data type is number.    
def sort_data_frame_by_number_column(data_frame):
    data_frame = data_frame.sort_values(by=['Salary'], ascending=False)
    print("--------------Sort data frame by number column desc---------------")
    print(data_frame)

# Get DataFrame data list in salary range.    
def get_data_in_salary_range(data_frame):
    data_frame = data_frame.loc[(data_frame['Salary'] > 10000) & (data_frame['Salary'] < 20000)]
    data_frame = data_frame.sort_values(by=['Salary'])
    print("-------------- Employee info whose salary between 10000 and 20000---------------")
    print(data_frame)

# Get DataFrame data list in hire date range.       
def get_data_in_hire_date_range(data_frame):
    min_hire_date = '2010-01-01'
    max_hire_date = '2017-01-01'
    # Get data that Hire Date column in range min_hire_date and max_hire_date. The Hire Date value format had better be 'yyyy-mm-dd'
    data_frame = data_frame.loc[(data_frame['Hire Date'] > min_hire_date) & (data_frame['Hire Date'] < max_hire_date)]
    data_frame = data_frame.sort_values(by=['Hire Date'])
    print("-------------- Employee info whose Hire Date between 2010/01/01 and 2017/01/01---------------")
    print(data_frame)
  
# Get DataFrame data list in name range.       
def get_data_in_name_range(data_frame):
    start_name = 'jerry'
    end_name = 'kevin'
    # First sort the data in the data_frame by Name column.
    data_frame = data_frame.sort_values(by=['Name'])
    # Because the Name column is the index column, so use the value in loc directly.
    data_frame = data_frame.loc[start_name:end_name]
    print("-------------- Employee info whose Name value between jerry and kevin---------------")
    print(data_frame)
    
if __name__ == '__main__':
    data_frame = read_csv_file_by_pandas("./employee_info.csv")
    sort_data_frame_by_string_column(data_frame)
    sort_data_frame_by_datetime_column(data_frame)
    sort_data_frame_by_number_column(data_frame)
    get_data_in_salary_range(data_frame)
    get_data_in_hire_date_range(data_frame)
    get_data_in_name_range(data_frame)
    write_to_csv_file_by_pandas("./employee_info_new.csv", data_frame)
    write_to_excel_file_by_pandas("./employee_info_new.xlsx", data_frame)

6.2 employee_info.csv

This is the source csv file, the other two file are all generated by the two write method.

Name,Hire Date,Salary
jerry,2010-01-01,16000
tom,2011-08-19,6000
kevin,2009-02-08,13000
richard,2012-03-19,5000
jackie,2015-06-08,28000
steven,2008-02-01,36000
jack,2006-09-19,8000
gary,2018-01-16,19000
john,2017-10-01,16600

6.3 Example Code Execution Result.

------------------data frame all----------------------
      Name   Hire Date  Salary
0    jerry  2010-01-01   16000
1      tom  2011-08-19    6000
2    kevin  2009-02-08   13000
3  richard  2012-03-19    5000
4   jackie  2015-06-08   28000
5   steven  2008-02-01   36000
6     jack  2006-09-19    8000
7     gary  2018-01-16   19000
8     john  2017-10-01   16600

------------------data frame index----------------------
RangeIndex(start=0, stop=9, step=1)

------------------set Name column as data frame index----------------------
Index(['jerry', 'tom', 'kevin', 'richard', 'jackie', 'steven', 'jack', 'gary',
       'john'],
      dtype='object', name='Name')

------------------data frame columns----------------------
Index(['Hire Date', 'Salary'], dtype='object')

------------------data frame values----------------------
[['2010-01-01' 16000]
['2011-08-19' 6000]
['2009-02-08' 13000]
['2012-03-19' 5000]
['2015-06-08' 28000]
['2008-02-01' 36000]
['2006-09-19' 8000]
['2018-01-16' 19000]
['2017-10-01' 16600]]

------------------data frame hire date series----------------------
Name
jerry      2010-01-01
tom        2011-08-19
kevin      2009-02-08
richard    2012-03-19
jackie     2015-06-08
steven     2008-02-01
jack       2006-09-19
gary       2018-01-16
john       2017-10-01
Name: Hire Date, dtype: object

------------------select multiple columns from data frame----------------------
         Salary   Hire Date
Name                       
jerry     16000  2010-01-01
tom        6000  2011-08-19
kevin     13000  2009-02-08
richard    5000  2012-03-19
jackie    28000  2015-06-08
steven    36000  2008-02-01
jack       8000  2006-09-19
gary      19000  2018-01-16
john      16600  2017-10-01

--------------Sort data frame by string column---------------
          Hire Date  Salary
Name                       
gary     2018-01-16   19000
jack     2006-09-19    8000
jackie   2015-06-08   28000
jerry    2010-01-01   16000
john     2017-10-01   16600
kevin    2009-02-08   13000
richard  2012-03-19    5000
steven   2008-02-01   36000
tom      2011-08-19    6000

--------------Sort data frame by datetime column---------------
          Hire Date  Salary
Name                       
jack     2006-09-19    8000
steven   2008-02-01   36000
kevin    2009-02-08   13000
jerry    2010-01-01   16000
tom      2011-08-19    6000
richard  2012-03-19    5000
jackie   2015-06-08   28000
john     2017-10-01   16600
gary     2018-01-16   19000

--------------Sort data frame by number column desc---------------
          Hire Date  Salary
Name                       
steven   2008-02-01   36000
jackie   2015-06-08   28000
gary     2018-01-16   19000
john     2017-10-01   16600
jerry    2010-01-01   16000
kevin    2009-02-08   13000
jack     2006-09-19    8000
tom      2011-08-19    6000
richard  2012-03-19    5000

-------------- Employee info whose salary between 10000 and 20000---------------
        Hire Date  Salary
Name                     
kevin  2009-02-08   13000
jerry  2010-01-01   16000
john   2017-10-01   16600
gary   2018-01-16   19000

-------------- Employee info whose Hire Date between 2010/01/01 and 2017/01/01---------------
          Hire Date  Salary
Name                       
tom      2011-08-19    6000
richard  2012-03-19    5000
jackie   2015-06-08   28000

-------------- Employee info whose Name value between jerry and kevin---------------
        Hire Date  Salary
Name                     
jerry  2010-01-01   16000
john   2017-10-01   16600
kevin  2009-02-08   13000

./employee_info_new.csv has been created.

./employee_info_new.xlsx has been created.
(Visited 286 times, 17 visits today)
READ :   Copy Rows Between Excel Sheet Use Apache POI

1 Comment


  1. #Jerry Zhao#, Thank you very much, you really helped me with this tutorial very clear and explicit. I just saved thousands of hair on my head!:)

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.