Python Pandas Read/Write CSV File And Convert To Excel File Example

Pandas is a third-party python module that can manipulate different format data files, such as CSV, JSON, Excel, Clipboard, HTML format, etc. This example will tell you how to use Pandas to read/write CSV files, and how to save the pandas.DataFrame object to an excel file.

1. How To Use Pandas In Python Application.

1.1 Install Python Pandas Module.

  1. First, you should make sure the python pandas module has been installed using the pip show pandas command in a terminal. If it shows can not find the pandas module in the terminal, you need to run the pip install pandas command to install it.
    $ pip show pandas
    WARNING: Package(s) not found: pandas
    
    $ pip install pandas
    Collecting pandas
      Downloading pandas-1.2.3-cp37-cp37m-macosx_10_9_x86_64.whl (10.4 MB)
         |████████████████████████████████| 10.4 MB 135 kB/s 
    Collecting pytz>=2017.3
      Downloading pytz-2021.1-py2.py3-none-any.whl (510 kB)
         |████████████████████████████████| 510 kB 295 kB/s 
    Requirement already satisfied: numpy>=1.16.5 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas) (1.20.1)
    Requirement already satisfied: python-dateutil>=2.7.3 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas) (2.8.1)
    Requirement already satisfied: six>=1.5 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
    Installing collected packages: pytz, pandas
    Successfully installed pandas-1.2.3 pytz-2021.1
  2. Because this example will save data to an excel file with the python pandas module, so it should install the python XlsxWriter module also. Run the command pip show XlsxWriter to see whether the python XlsxWriter module has been installed or not, if not you should run the pip install XlsxWriter to install it.
    $ pip show XlsxWriter
    WARNING: Package(s) not found: XlsxWriter
    
    $ pip install XlsxWriter
    Collecting XlsxWriter
      Downloading XlsxWriter-1.3.7-py2.py3-none-any.whl (144 kB)
         |████████████████████████████████| 144 kB 852 kB/s 
    Installing collected packages: XlsxWriter
    Successfully installed XlsxWriter-1.3.7

1. 2 Import Python Pandas Module In Python Source File.

  1. This is very simple, just add the import pandas command at the beginning of the python source file to import it, then you can use it’s various methods.

2. Read CSV File Use Pandas.

  1. To read a CSV file using python pandas is very easy, you just need to invoke the pandas module’s read_csv method with the CSV file path. The returned object is a pandas.DataFrame object. It represents the whole data of the CSV file, you can use its various method to manipulate the data such as order, query, change index, columns, etc.
    data_frame = pandas.read_csv(csv_file)
  2. You can pass an encoding parameter to the read_csv() method to specify the CSV file text character encoding.
    data_frame = pandas.read_csv(csv_file, encoding='gbk')
  3. Now you can call the returned DataFrame object’s head(n) method to get the first n rows of the text in the CSV file.
    data_frame.head(n)

3. Pandas Write Data To CSV File.

  1. After you edit the data in the pandas.DataFrame object, you can call its to_csv method to save the new data into a CSV file.
    data_frame.to_csv(csv_file_path)

4. Pandas Write Data To Excel File.

  1. Create a file writer using pandas.ExcelWriter method.
    excel_writer = pandas.ExcelWriter(excel_file_path, engine='xlsxwriter')
  2. Call DataFrame object’s to_excel method to set the DataFrame data to a special excel file sheet.
    data_frame.to_excel(excel_writer, 'Employee Info')
  3. Call the writer’s save method to save the data to an excel file.
    excel_writer.save()

5. Python Pandas DataFrame Operation Methods.

5.1 Sort DataFrame Data By One Column.

  1. Please note the data column name is case sensitive.
    data_frame.sort_values(by=['Salary'], ascending=False)

5.2 Query DataFrame Data In A Range.

  1. The below python code will query a range of data in the DataFrame object.
    data_frame = data_frame.loc[(data_frame['Salary'] > 10000) & (data_frame['Salary'] < 20000)]

6. Python Pandas Read/Write CSV File And Save To Excel File Example.

  1. Below is the content of this example used source CSV file, the file name is employee_info.csv.
    Name,Hire Date,Salary
    jerry,2010-01-01,16000
    tom,2011-08-19,6000
    kevin,2009-02-08,13000
    richard,2012-03-19,5000
    jackie,2015-06-08,28000
    steven,2008-02-01,36000
    jack,2006-09-19,8000
    gary,2018-01-16,19000
    john,2017-10-01,16600
  2. The example python file name is CSVExcelConvertionExample.py, it contains the below functions.
    python-pandas-read-write-csv-file-and-save-to-excel-file-example-python-source-code
  3. read_csv_file_by_pandas(csv_file).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
    
    if __name__ == '__main__':    
        
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
    
    ========================================================================================================
    Execution output:
    
    ------------------data frame all----------------------
          Name   Hire Date  Salary
    0    jerry  2010-01-01   16000
    1      tom  2011-08-19    6000
    2    kevin  2009-02-08   13000
    3  richard  2012-03-19    5000
    4   jackie  2015-06-08   28000
    5   steven  2008-02-01   36000
    6     jack  2006-09-19    8000
    7     gary  2018-01-16   19000
    8     john  2017-10-01   16600
    ------------------data frame index----------------------
    RangeIndex(start=0, stop=9, step=1)
    ------------------set Name column as data frame index----------------------
    Index(['jerry', 'tom', 'kevin', 'richard', 'jackie', 'steven', 'jack', 'gary',
           'john'],
          dtype='object', name='Name')
    ------------------data frame columns----------------------
    Index(['Hire Date', 'Salary'], dtype='object')
    ------------------data frame values----------------------
    [['2010-01-01' 16000]
     ['2011-08-19' 6000]
     ['2009-02-08' 13000]
     ['2012-03-19' 5000]
     ['2015-06-08' 28000]
     ['2008-02-01' 36000]
     ['2006-09-19' 8000]
     ['2018-01-16' 19000]
     ['2017-10-01' 16600]]
    ------------------data frame hire date series----------------------
    Name
    jerry      2010-01-01
    tom        2011-08-19
    kevin      2009-02-08
    richard    2012-03-19
    jackie     2015-06-08
    steven     2008-02-01
    jack       2006-09-19
    gary       2018-01-16
    john       2017-10-01
    Name: Hire Date, dtype: object
    ------------------select multiple columns from data frame----------------------
             Salary   Hire Date
    Name                       
    jerry     16000  2010-01-01
    tom        6000  2011-08-19
    kevin     13000  2009-02-08
    richard    5000  2012-03-19
    jackie    28000  2015-06-08
    steven    36000  2008-02-01
    jack       8000  2006-09-19
    gary      19000  2018-01-16
    john      16600  2017-10-01
    
  4. write_to_csv_file_by_pandas(csv_file_path, data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
    # Write pandas.DataFrame object to a csv file.
    def write_to_csv_file_by_pandas(csv_file_path, data_frame):
        data_frame.to_csv(csv_file_path)
        print(csv_file_path + ' has been created.')
    
    
    if __name__ == '__main__':
      
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
        
        write_to_csv_file_by_pandas("./employee_info_new.csv", data_frame)
    
    ================================================================================================================
    Execution output:
    
    ./employee_info_new.csv has been created.
    
  5. write_to_excel_file_by_pandas(excel_file_path, data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
    
    # Write pandas.DataFrame object to an excel file.
    def write_to_excel_file_by_pandas(excel_file_path, data_frame):
        excel_writer = pandas.ExcelWriter(excel_file_path, engine='xlsxwriter')
        data_frame.to_excel(excel_writer, 'Employee Info')
        excel_writer.save()
        print(excel_file_path + ' has been created.')
    
    if __name__ == '__main__':
        
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
        
        write_to_excel_file_by_pandas("./employee_info_new.xlsx", data_frame)
    
    ==========================================================================================
    Execution output:
    
    ./employee_info_new.xlsx has been created.
  6. sort_data_frame_by_string_column(data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
    
    # Sort the data in DataFrame object by name that data type is string.
    def sort_data_frame_by_string_column(data_frame):
        data_frame = data_frame.sort_values(by=['Name'])
        print("--------------Sort data format by string column---------------")
        print(data_frame)
    
    
    if __name__ == '__main__':
    
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
        
        sort_data_frame_by_string_column(data_frame)
    
    ====================================================================================================
    Execution output:
    
    --------------Sort data format by string column---------------
              Hire Date  Salary
    Name                       
    gary     2018-01-16   19000
    jack     2006-09-19    8000
    jackie   2015-06-08   28000
    jerry    2010-01-01   16000
    john     2017-10-01   16600
    kevin    2009-02-08   13000
    richard  2012-03-19    5000
    steven   2008-02-01   36000
    tom      2011-08-19    6000
  7. sort_data_frame_by_datetime_column(data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
    
    # Sort DataFrame data by Hire Date that data type is datetime. 
    def sort_data_frame_by_datetime_column(data_frame):
        data_frame = data_frame.sort_values(by=['Hire Date'])
        print("--------------Sort data format by date column---------------")
        print(data_frame)
    
    
    if __name__ == '__main__':
        
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
       
        sort_data_frame_by_datetime_column(data_frame)
    ===========================================================================================
    
    Execution output:
    
    --------------Sort data format by date column---------------
              Hire Date  Salary
    Name                       
    jack     2006-09-19    8000
    steven   2008-02-01   36000
    kevin    2009-02-08   13000
    jerry    2010-01-01   16000
    tom      2011-08-19    6000
    richard  2012-03-19    5000
    jackie   2015-06-08   28000
    john     2017-10-01   16600
    gary     2018-01-16   19000
    
    
  8. sort_data_frame_by_number_column(data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
        
    # Sort DataFrame data by Salary that data type is number.    
    def sort_data_frame_by_number_column(data_frame):
        data_frame = data_frame.sort_values(by=['Salary'], ascending=False)
        print("--------------Sort data format by number column desc---------------")
        print(data_frame) 
    
    
    if __name__ == '__main__':
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
    
        sort_data_frame_by_number_column(data_frame)
    
    ================================================================================
    
    Execution output:
    
    --------------Sort data format by number column desc---------------
              Hire Date  Salary
    Name                       
    steven   2008-02-01   36000
    jackie   2015-06-08   28000
    gary     2018-01-16   19000
    john     2017-10-01   16600
    jerry    2010-01-01   16000
    kevin    2009-02-08   13000
    jack     2006-09-19    8000
    tom      2011-08-19    6000
    richard  2012-03-19    5000
  9. get_data_in_salary_range(data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
    
    
    # Get DataFrame data list in salary range.    
    def get_data_in_salary_range(data_frame):
        data_frame = data_frame.loc[(data_frame['Salary'] > 10000) & (data_frame['Salary'] < 20000)]
        data_frame = data_frame.sort_values(by=['Salary'])
        
        print("-------------- Employee info whose salary between 10000 and 20000---------------")
        print(data_frame) 
    
    
    if __name__ == '__main__':
        
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
        
        get_data_in_salary_range(data_frame)
    
    ==============================================================================================================
    
    Execution output:
    
    -------------- Employee info whose salary between 10000 and 20000---------------
            Hire Date  Salary
    Name                     
    kevin  2009-02-08   13000
    jerry  2010-01-01   16000
    john   2017-10-01   16600
    gary   2018-01-16   19000
  10. get_data_in_hire_date_range(data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
     
    # Get DataFrame data list in hire date range.       
    def get_data_in_hire_date_range(data_frame):
        min_hire_date = '2010-01-01'
        max_hire_date = '2017-01-01'
        data_frame = data_frame.loc[(data_frame['Hire Date'] > min_hire_date) & (data_frame['Hire Date'] < max_hire_date)]
        data_frame = data_frame.sort_values(by=['Hire Date'])
        print("-------------- Employee info whose Hire Date between 2010/01/01 and 2017/01/01---------------")
        print(data_frame)
    
    
    if __name__ == '__main__':
        
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
        
        get_data_in_hire_date_range(data_frame)
    
    ====================================================================================================
    
    Execution output:
    
    -------------- Employee info whose Hire Date between 2010/01/01 and 2017/01/01---------------
              Hire Date  Salary
    Name                       
    tom      2011-08-19    6000
    richard  2012-03-19    5000
    jackie   2015-06-08   28000
  11. get_data_in_name_range(data_frame).
    import pandas
    
    import os
    
    import glob
    
    # Read csv file use pandas module.
    def read_csv_file_by_pandas(csv_file):
        data_frame = None
        if(os.path.exists(csv_file)):
            data_frame = pandas.read_csv(csv_file)
            
            print("------------------data frame all----------------------")
            print(data_frame)
            
            print("------------------data frame index----------------------")
            print(data_frame.index)
            
            data_frame = data_frame.set_index('Name')
            print("------------------set Name column as data frame index----------------------")
            print(data_frame.index)
            
            print("------------------data frame columns----------------------")
            print(data_frame.columns)
            
            print("------------------data frame values----------------------")
            print(data_frame.values)
            
            print("------------------data frame hire date series----------------------")
            print(data_frame['Hire Date'])
            
            print("------------------select multiple columns from data frame----------------------")
            print(data_frame[['Salary', 'Hire Date']])
        else:
            print(csv_file + " do not exist.")    
        return data_frame
    
      
    # Get DataFrame data list in name range.       
    def get_data_in_name_range(data_frame):
        start_name = 'jerry'
        end_name = 'kevin'
        # First sort the data in the data_frame by Name column.
        data_frame = data_frame.sort_values(by=['Name'])
        # Because the Name column is the index column, so use the value in loc directly. 
        data_frame = data_frame.loc[start_name:end_name]
        print("-------------- Employee info whose Name first character between jerry and kevin---------------")
        print(data_frame)
    
    
    if __name__ == '__main__':
        
        data_frame = read_csv_file_by_pandas("./employee_info.csv")
        
        get_data_in_name_range(data_frame)
    
    ====================================================================================================
    
    Execution output:
    
    -------------- Employee info whose Name first character between jerry and kevin---------------
            Hire Date  Salary
    Name                     
    jerry  2010-01-01   16000
    john   2017-10-01   16600
    kevin  2009-02-08   13000
  12. convert_csv_to_excel_in_folder(folder_path): You can see the below section 7 to see this function detail python source code.

7. How To Convert Multiple CSV Files In A Folder To Excel File.

  1. The comment-95168 wants to convert some CSV files in a directory to Excel files automatically. Below is the example code which can implement this function.
    import pandas
    
    import os
    
    import glob
    
    '''
    This function will convert all the CSV files in the folder_path to Excel files.
    '''
    def convert_csv_to_excel_in_folder(folder_path):
         
        # Loop all the CSV files under the path.
        for csv_file in glob.glob(os.path.join(folder_path, '*.csv')):
    
            # If the CSV file exist.
            if(os.path.exists(csv_file)):
    
                # Get the target excel file name and path.
                excel_file_path = csv_file.replace(".csv", ".xlsx")
    
                # Read the CSV file by python pandas module read_csv method. 
                data_frame = pandas.read_csv(csv_file)
    
                # Create an excel writer object.
                excel_writer = pandas.ExcelWriter(excel_file_path, engine='xlsxwriter')
    
                # Add a work sheet in the target excel file.
                data_frame.to_excel(excel_writer, 'sheet_1')
    
                # Save the target excel file.
                excel_writer.save()
    
                print(excel_file_path + ' has been created.')
    
            else:
                print(csv_file + " do not exist.")
    
    if __name__ == '__main__':
        
        convert_csv_to_excel_in_folder(".")
    
    ====================================================================================================
    
    Execution output:
    
    ./employee_info_new.xlsx has been created.
    ./employee_info.xlsx has been created.
    
    
0 0 votes
Article Rating
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

3 Comments
Newest
Oldest Most Voted
Inline Feedbacks
View all comments
soto
soto

I have a lot of .csv files saved in a directory. I want to convert all the .csv files in the folder to excel files with .xls or .xlsx extension programmatically. How can I do that? Thanks a lot.

Piken
Piken

#Jerry Zhao#, Thank you very much, you really helped me with this tutorial very clear and explicit. I just saved thousands of hair on my head!:)

Clicky
3
0
Would love your thoughts, please comment.x
()
x