How To Query A Pandas DataFrame With Examples?

This article will tell you how to use pandas DataFrame object’s loc, iloc, and iat attribute to query a DataFrame object’s rows, columns, and elements at a special index(row_index, column_index) with examples.

1. Example Data.

  1. In this example, we will create a DataFrame object that contains the below data.
  2. Below is the python source code that generates the DataFrame data.
    import pandas as pd
    
    '''
    This function create a python pandas DataFrame object with a 2 dimension array.
    '''
    def create_dataframe_from_2_dimension_array():
        
        pd.set_option('display.unicode.east_asian_width', True)
        
        ''' Define a 2 dimension array, each element in the array's first dimension is a list. 
            
            It contains the position number, programming language and operating system '''
        data = [[1, 'python', 'Windows'], [5, 'java', 'Linux'],[8, 'c++', 'macOS']]
        
        # Define the column list, each element in the list is the column label.
        columns = ['Position', 'Programming Language', 'Operating System']
        
        name = ['USA', 'UK', 'CA']
        
        df = pd.DataFrame(data=data, index=name, columns=columns)
        
        print(df, '\r\n')
        
        # Return the python pandas DataFrame object.
        return df
    
    
    if __name__ == '__main__':
        
        create_dataframe_from_2_dimension_array()
  3. Below is the DataFrame object data. We can see that there are 3 columns ( Position, Programming Language, Operating System ). The row index name is USAUK, CA.
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS

2. Pandas DataFrame loc, iloc, iat Attribute.

  1. loc(row_name_list, column_name_list): Get the DataFrame object’s data by row name label list, and column name label list.
  2. iloc(row_index_number_list, column_index_number_list): Get the DataFrame object’s data by row index number list, and column index number list.
  3. iat(element_row_index_number, element_column_index_number): Get the  DataFrame object located at the provided row index number and column index number ( not a number list).

3. PandasDataFrame loc, iloc, iat Attribute Examples.

  1. query_one_row().

    import pandas as pd
    
    def query_one_row():
        
        df = create_dataframe_from_2_dimension_array()
        
        # query the data row with index name 'USA'
        row_data_usa = df.loc['USA']
        
        print('row_data_usa = \r\n', row_data_usa, '\r\n')
        
        # query the data row with the row index number 2.
        row_data_number_2 = df.iloc[2]
        
        print('row_data_number_2 = \r\n', row_data_number_2, '\r\n')
    
    
    if __name__ == '__main__':
        
        query_one_row()
    
    ============================================================================
    Below is the execution output.
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS 
    
    
    row_data_usa = 
    
     Position                      1
    Programming Language     python
    Operating System        Windows
    Name: USA, dtype: object 
    
    
    row_data_number_2 = 
    
     Position                    8
    Programming Language      c++
    Operating System        macOS
    Name: CA, dtype: object 
    
  2. query_multiple_rows().
    import pandas as pd
    
    def query_multiple_rows():
        
        df = create_dataframe_from_2_dimension_array()
        
        # query 2 rows with the provided row index name list.
        row_data_usa_uk = df.loc[['USA', 'UK']]
        
        print('row_data_usa_uk = \r\n', row_data_usa_uk, '\r\n')
        
         # query 2 rows with the provided row index number list.
        row_data_number_1_2 = df.iloc[[1, 2]]
        
        print('row_data_number_1_2 = \r\n', row_data_number_1_2, '\r\n')    
        
        # query 1 row with the provided row index number list slice.
        row_data_number_1 = df.iloc[1:2]
        
        print('row_data_number_1 = \r\n', row_data_number_1, '\r\n')    
    
    
    if __name__ == '__main__':
        
        query_multiple_rows()
    
    =====================================================================================
    Below is the execution output.
    
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS 
    
    
    row_data_usa_uk = 
    
          Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux 
    
    
    row_data_number_1_2 = 
    
         Position Programming Language Operating System
    UK         5                 java            Linux
    CA         8                  c++            macOS 
    
    
    row_data_number_1 = 
    
         Position Programming Language Operating System
    UK         5                 java            Linux
  3. query_one_column().
    import pandas as pd
    
    def query_one_column():
        
        df = create_dataframe_from_2_dimension_array()
        
        
        # set the first arguments to : to query all rows's 'Programming Language' column.
        column_data_pl = df.loc[:, ['Programming Language']]
        
        print('column_data_pl = \r\n', column_data_pl, '\r\n')
        
        
        # the first argument is the row index number list( : means all), the second argument is the column index number.
        column_data_os = df.iloc[:, [2]]
        
        print('column_data_os = \r\n', column_data_os, '\r\n')
    
    
    if __name__ == '__main__':
        
        query_one_column()
    
    =================================================================================
    Below is the execution output.
    
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS 
    
    
    column_data_pl = 
    
         Programming Language
    USA               python
    UK                  java
    CA                   c++ 
    
    
    column_data_os = 
    
         Operating System
    USA          Windows
    UK             Linux
    CA             macOS 
    
    
  4. query_multiple_columns().
    import pandas as pd
    
    def query_multiple_columns():
        
        df = create_dataframe_from_2_dimension_array()
        
        # : means all rows, ['Programming Language', 'Operating System'] means the 2 columns.
        column_data_pl_os = df.loc[:, ['Programming Language', 'Operating System']]
        
        print('column_data_pl_os = \r\n', column_data_pl_os, '\r\n')
        
        # : means all rows, 1:2 means column index number 1 only not include column with index number 2.
        column_data_1 = df.iloc[:, 1:2]
        
        print('column_data_1 = \r\n', column_data_1, '\r\n') 
        
        
        column_data_pl_os = df.iloc[:,1:]
        
        print('column_data_pl_os = \r\n', column_data_pl_os, '\r\n')    
        
        column_data_pos_pl = df.iloc[:,[0, 1]]
        
        print('column_data_pos_pl = \r\n', column_data_pos_pl, '\r\n')   
    
    
    if __name__ == '__main__':
        
        query_multiple_columns()
    ========================================================================================
    Below is the above code execution result.
    
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS 
    
    
    column_data_pl_os = 
    
         Programming Language Operating System
    USA               python          Windows
    UK                  java            Linux
    CA                   c++            macOS 
    
    
    column_data_1 = 
    
         Programming Language
    USA               python
    UK                  java
    CA                   c++ 
    
    
    column_data_pl_os = 
    
         Programming Language Operating System
    USA               python          Windows
    UK                  java            Linux
    CA                   c++            macOS 
    
    
    column_data_pos_pl = 
    
          Position Programming Language
    USA         1               python
    UK          5                 java
    CA          8                  c++ 
    
  5. query_by_row_and_column().
    import pandas as pd
    
    def query_by_row_and_column():
         
        df = create_dataframe_from_2_dimension_array()
        
        # query DataFrame with both rows and columns specified by row name list and column name list.
        rows_and_columns = df.loc[['USA','UK'], ['Programming Language', 'Operating System']]
        
        print('rows_and_columns =\r\n ', rows_and_columns, '\r\n')
        
        # query DataFrame with both rows and columns specified by row index number and column index number list.
        rows_and_columns = df.iloc[2, :2]
        
        print('rows_and_columns =\r\n ', rows_and_columns)
    
    if __name__ == '__main__':
        
        query_by_row_and_column()
    ===============================================================================================
    Below is the above source code execution result.
    
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS 
    
    
    rows_and_columns =
    
          Programming Language Operating System
    USA               python          Windows
    UK                  java            Linux 
    
    
    rows_and_columns =
    
      Position                  8
    Programming Language    c++
    Name: CA, dtype: object
  6. query_at_index().
    import pandas as pd
    
    def query_at_index():
        
        df = create_dataframe_from_2_dimension_array()
        
        # query the DataFrame element at the specified row & column index number.
        column_data_at_index_1_1 = df.iat[1, 1]
        
        print('column_data_at_index = \r\n', column_data_at_index_1_1, '\r\n')
    
    if __name__ == '__main__':
        
        query_at_index()
    
    =====================================================================================
    Below is the above example source code execution result.
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS 
    
    
    column_data_at_index = 
    
     java 
    
  7. query_bigger_than().
    import pandas as pd
    
    def query_bigger_than():
        
        df = create_dataframe_from_2_dimension_array()
        
        # query the data at provided rows and column, and then compare the column value to an integer. 
        print(df.loc[:, 'Position'] > 5)
    
    
    if __name__ == '__main__':
        
        query_bigger_than()
    ====================================================================================
    Above example source code execution result.
    
         Position Programming Language Operating System
    USA         1               python          Windows
    UK          5                 java            Linux
    CA          8                  c++            macOS 
    
    
    USA    False
    UK     False
    CA      True
    Name: Position, dtype: bool
    

Leave a Comment

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Clicky