Python Pandas Read/Write CSV File And Convert To Excel File Example

Pandas is a third-party python module that can manipulate different format data files, such as CSV, JSON, Excel, clipboard, HTML, etc. This example will tell you how to use Pandas to read/write CSV files, and how to save the pandas.DataFrame object to an excel file.

1. How To Use Pandas In Python Application.

1.1 Install Python Pandas Module.

  1. First, you should make sure the python pandas module has been installed using the pip show pandas command in a terminal. If it shows can not find the pandas module in the terminal, you need to run the pip install pandas command to install it.
    $ pip show pandas
    WARNING: Package(s) not found: pandas
    
    $ pip install pandas
    Collecting pandas
      Downloading pandas-1.2.3-cp37-cp37m-macosx_10_9_x86_64.whl (10.4 MB)
         |████████████████████████████████| 10.4 MB 135 kB/s 
    Collecting pytz>=2017.3
      Downloading pytz-2021.1-py2.py3-none-any.whl (510 kB)
         |████████████████████████████████| 510 kB 295 kB/s 
    Requirement already satisfied: numpy>=1.16.5 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas) (1.20.1)
    Requirement already satisfied: python-dateutil>=2.7.3 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas) (2.8.1)
    Requirement already satisfied: six>=1.5 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
    Installing collected packages: pytz, pandas
    Successfully installed pandas-1.2.3 pytz-2021.1
  2. Because this example will save data to an excel file with python pandas module, so it should install the python XlsxWriter module also. Run the command pip show XlsxWriter to see whether the python XlsxWriter module has been installed or not, if not you should run the pip install XlsxWriter to install it.
    $ pip show XlsxWriter
    WARNING: Package(s) not found: XlsxWriter
    
    $ pip install XlsxWriter
    Collecting XlsxWriter
      Downloading XlsxWriter-1.3.7-py2.py3-none-any.whl (144 kB)
         |████████████████████████████████| 144 kB 852 kB/s 
    Installing collected packages: XlsxWriter
    Successfully installed XlsxWriter-1.3.7

1. 2 Import Python Pandas Module In Python Source File.

This is very simple, just add import pandas command at the beginning of the python source file to import it, then you can use it’s various methods.

2. Read CSV File Use Pandas.

To read a CSV file use python pandas is very easy, you just need to invoke the pandas module’s read_csv method with the CSV file path. The returned object is a pandas.DataFrame object. It represents the whole data of the CSV file, you can use its various method to manipulate the data such as order, query, change index, columns, etc.

data_frame = pandas.read_csv(csv_file)

3. Pandas Write Data To CSV File.

After you edit the data in the pandas.DataFrame object, you can call its to_csv method to save the new data into a CSV file.

data_frame.to_csv(csv_file_path)

4. Pandas Write Data To Excel File.

  1. Create a file writer use pandas.ExcelWriter method.
    excel_writer = pandas.ExcelWriter(excel_file_path, engine='xlsxwriter')
  2. Call DataFrame’s to_excel method to set the DataFrame data to a special excel file sheet.
    data_frame.to_excel(excel_writer, 'Employee Info')
  3. Call the writer’s save method to save the data to an excel file.
    excel_writer.save()

5. Python Pandas DataFrame Operation Methods.

5.1 Sort DataFrame Data By One Column.

Please note the data column name is case sensitive.

data_frame.sort_values(by=['Salary'], ascending=False)

5.2 Query DataFrame Data In A Range.

data_frame = data_frame.loc[(data_frame['Salary'] > 10000) & (data_frame['Salary'] < 20000)]

6. Python Pandas Read/Write CSV File And Save To Excel File Example.

python-pandas-read-write-csv-file-and-save-to-excel-file-example-source-files

6.1 CSVExcelConvertionExample.py

'''
Created on Aug 26, 2018
@author: zhaosong
'''
import pandas
import os

# Read csv file use pandas module.
def read_csv_file_by_pandas(csv_file):
    data_frame = None
    if(os.path.exists(csv_file)):
        data_frame = pandas.read_csv(csv_file)
        
        print("------------------data frame all----------------------")
        print(data_frame)

        print("------------------data frame index----------------------")
        print(data_frame.index)

        data_frame = data_frame.set_index('Name')
        print("------------------set Name column as data frame index----------------------")
        print(data_frame.index)
        
        print("------------------data frame columns----------------------")
        print(data_frame.columns)
        
        print("------------------data frame values----------------------")
        print(data_frame.values)
        
        print("------------------data frame hire date series----------------------")
        print(data_frame['Hire Date'])
        
        print("------------------select multiple columns from data frame----------------------")
        print(data_frame[['Salary', 'Hire Date']])
    else:
        print(csv_file + " do not exist.")    

    return data_frame

# Write pandas.DataFrame object to a csv file.
def write_to_csv_file_by_pandas(csv_file_path, data_frame):
    
    data_frame.to_csv(csv_file_path)
    print(csv_file_path + ' has been created.')

# Write pandas.DataFrame object to an excel file.
def write_to_excel_file_by_pandas(excel_file_path, data_frame):
    
    excel_writer = pandas.ExcelWriter(excel_file_path, engine='xlsxwriter')
    data_frame.to_excel(excel_writer, 'Employee Info')
    
    excel_writer.save()
    print(excel_file_path + ' has been created.')    

# Sort the data in DataFrame object by name that data type is string.
def sort_data_frame_by_string_column(data_frame):
    
    data_frame = data_frame.sort_values(by=['Name'])
    print("--------------Sort data frame by string column---------------")
    print(data_frame)

# Sort DataFrame data by Hire Date that data type is datetime.
def sort_data_frame_by_datetime_column(data_frame):
    
    data_frame = data_frame.sort_values(by=['Hire Date'])
    print("--------------Sort data frame by datetime column---------------")
    print(data_frame)

# Sort DataFrame data by Salary that data type is number.    
def sort_data_frame_by_number_column(data_frame):
    
    data_frame = data_frame.sort_values(by=['Salary'], ascending=False)
    print("--------------Sort data frame by number column desc---------------")
    print(data_frame)

# Get DataFrame data list in salary range.    
def get_data_in_salary_range(data_frame):
    
    data_frame = data_frame.loc[(data_frame['Salary'] > 10000) & (data_frame['Salary'] < 20000)]
    data_frame = data_frame.sort_values(by=['Salary'])
    print("-------------- Employee info whose salary between 10000 and 20000---------------")
    print(data_frame)

# Get DataFrame data list in hire date range.       
def get_data_in_hire_date_range(data_frame):
    min_hire_date = '2010-01-01'
    max_hire_date = '2017-01-01'
    # Get data that Hire Date column in range min_hire_date and max_hire_date. The Hire Date value format had better be 'yyyy-mm-dd'
    data_frame = data_frame.loc[(data_frame['Hire Date'] > min_hire_date) & (data_frame['Hire Date'] < max_hire_date)]
    data_frame = data_frame.sort_values(by=['Hire Date'])
    print("-------------- Employee info whose Hire Date between 2010/01/01 and 2017/01/01---------------")
    print(data_frame)
  
# Get DataFrame data list in name range.       
def get_data_in_name_range(data_frame):
    start_name = 'jerry'
    end_name = 'kevin'
    # First sort the data in the data_frame by Name column.
    data_frame = data_frame.sort_values(by=['Name'])
    # Because the Name column is the index column, so use the value in loc directly.
    data_frame = data_frame.loc[start_name:end_name]
    print("-------------- Employee info whose Name value between jerry and kevin---------------")
    print(data_frame)
    
if __name__ == '__main__':
    data_frame = read_csv_file_by_pandas("./employee_info.csv")
    sort_data_frame_by_string_column(data_frame)
    
    sort_data_frame_by_datetime_column(data_frame)
    
    sort_data_frame_by_number_column(data_frame)
    
    get_data_in_salary_range(data_frame)
    
    get_data_in_hire_date_range(data_frame)
    
    get_data_in_name_range(data_frame)
    
    write_to_csv_file_by_pandas("./employee_info_new.csv", data_frame)
    
    write_to_excel_file_by_pandas("./employee_info_new.xlsx", data_frame)

6.2 employee_info.csv

This is the source CSV file, the other two files are all generated by the two write methods.

Name,Hire Date,Salary
jerry,2010-01-01,16000
tom,2011-08-19,6000
kevin,2009-02-08,13000
richard,2012-03-19,5000
jackie,2015-06-08,28000
steven,2008-02-01,36000
jack,2006-09-19,8000
gary,2018-01-16,19000
john,2017-10-01,16600

6.3 Example Code Execution Result.

------------------data frame all----------------------
      Name   Hire Date  Salary
0    jerry  2010-01-01   16000
1      tom  2011-08-19    6000
2    kevin  2009-02-08   13000
3  richard  2012-03-19    5000
4   jackie  2015-06-08   28000
5   steven  2008-02-01   36000
6     jack  2006-09-19    8000
7     gary  2018-01-16   19000
8     john  2017-10-01   16600

------------------data frame index----------------------
RangeIndex(start=0, stop=9, step=1)

------------------set Name column as data frame index----------------------
Index(['jerry', 'tom', 'kevin', 'richard', 'jackie', 'steven', 'jack', 'gary',
       'john'],
      dtype='object', name='Name')

------------------data frame columns----------------------
Index(['Hire Date', 'Salary'], dtype='object')

------------------data frame values----------------------
[['2010-01-01' 16000]
['2011-08-19' 6000]
['2009-02-08' 13000]
['2012-03-19' 5000]
['2015-06-08' 28000]
['2008-02-01' 36000]
['2006-09-19' 8000]
['2018-01-16' 19000]
['2017-10-01' 16600]]

------------------data frame hire date series----------------------
Name
jerry      2010-01-01
tom        2011-08-19
kevin      2009-02-08
richard    2012-03-19
jackie     2015-06-08
steven     2008-02-01
jack       2006-09-19
gary       2018-01-16
john       2017-10-01
Name: Hire Date, dtype: object

------------------select multiple columns from data frame----------------------
         Salary   Hire Date
Name                       
jerry     16000  2010-01-01
tom        6000  2011-08-19
kevin     13000  2009-02-08
richard    5000  2012-03-19
jackie    28000  2015-06-08
steven    36000  2008-02-01
jack       8000  2006-09-19
gary      19000  2018-01-16
john      16600  2017-10-01

--------------Sort data frame by string column---------------
          Hire Date  Salary
Name                       
gary     2018-01-16   19000
jack     2006-09-19    8000
jackie   2015-06-08   28000
jerry    2010-01-01   16000
john     2017-10-01   16600
kevin    2009-02-08   13000
richard  2012-03-19    5000
steven   2008-02-01   36000
tom      2011-08-19    6000

--------------Sort data frame by datetime column---------------
          Hire Date  Salary
Name                       
jack     2006-09-19    8000
steven   2008-02-01   36000
kevin    2009-02-08   13000
jerry    2010-01-01   16000
tom      2011-08-19    6000
richard  2012-03-19    5000
jackie   2015-06-08   28000
john     2017-10-01   16600
gary     2018-01-16   19000

--------------Sort data frame by number column desc---------------
          Hire Date  Salary
Name                       
steven   2008-02-01   36000
jackie   2015-06-08   28000
gary     2018-01-16   19000
john     2017-10-01   16600
jerry    2010-01-01   16000
kevin    2009-02-08   13000
jack     2006-09-19    8000
tom      2011-08-19    6000
richard  2012-03-19    5000

-------------- Employee info whose salary between 10000 and 20000---------------
        Hire Date  Salary
Name                     
kevin  2009-02-08   13000
jerry  2010-01-01   16000
john   2017-10-01   16600
gary   2018-01-16   19000

-------------- Employee info whose Hire Date between 2010/01/01 and 2017/01/01---------------
          Hire Date  Salary
Name                       
tom      2011-08-19    6000
richard  2012-03-19    5000
jackie   2015-06-08   28000

-------------- Employee info whose Name value between jerry and kevin---------------
        Hire Date  Salary
Name                     
jerry  2010-01-01   16000
john   2017-10-01   16600
kevin  2009-02-08   13000

./employee_info_new.csv has been created.

./employee_info_new.xlsx has been created.

7. How To Convert Multiple CSV Files In A Folder To Excel File.

The comment-95168 wants to convert some CSV files in a directory to Excel files automatically. Below is the example code which can implement this function.

import pandas

import os

import glob

'''
This function will convert all the CSV files in the folder_path to Excel files.
'''
def convert_csv_to_excel_in_folder(folder_path):
     
    # Loop all the CSV files under the path.
    for csv_file in glob.glob(os.path.join(folder_path, '*.csv')):

        # If the CSV file exist.
        if(os.path.exists(csv_file)):

            # Get the target excel file name and path.
            excel_file_path = csv_file.replace(".csv", ".xlsx")

            # Read the CSV file by python pandas module read_csv method. 
            data_frame = pandas.read_csv(csv_file)

            # Create an excel writer object.
            excel_writer = pandas.ExcelWriter(excel_file_path, engine='xlsxwriter')

            # Add a work sheet in the target excel file.
            data_frame.to_excel(excel_writer, 'sheet_1')

            # Save the target excel file.
            excel_writer.save()

            print(excel_file_path + ' has been created.')

        else:
            print(csv_file + " do not exist.")
0 0 votes
Article Rating
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

3 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments
soto
soto

I have a lot of .csv files saved in a directory. I want to convert all the .csv files in the folder to excel files with .xls or .xlsx extension programmatically. How can I do that? Thanks a lot.

Piken
Piken

#Jerry Zhao#, Thank you very much, you really helped me with this tutorial very clear and explicit. I just saved thousands of hair on my head!:)

3
0
Would love your thoughts, please comment.x
()
x