Python Save HTML File From URL Example

This article shows you, step by step, how to use the Python requests module to retrieve a web page's content by its URL and then save that content to a local file.

1. Steps To Use The Python Requests Module To Get Web Page Content By URL.

  1. Open a terminal and run the command pip show requests to make sure the Python requests module has been installed.
    $ pip show requests
    Name: requests
    Version: 2.22.0
    Summary: Python HTTP for Humans.
    Home-page: http://python-requests.org
    Author: Kenneth Reitz
    Author-email: [email protected]
    License: Apache 2.0
    Location: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
    Requires: chardet, idna, urllib3, certifi
    Required-by: 
    
  2. If your Python environment does not contain the Python requests module, you can run the command pip install requests to install it.
  3. Now run the command python in the terminal to enter the Python interactive console.
    $ python
    Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21) 
    [Clang 6.0 (clang-600.0.57)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    
  4. Import the requests module.
    >>> import requests
  5. Define a variable to contain a web page URL.
    >>> web_page_url = "http://www.google.com"
  6. Add a User-Agent header so that the request looks like it was sent by a real web browser. Store the header in a dictionary object, which is passed to requests in the next step.
    >>> headers = {
    ...     'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    ... }
  7. Send an HTTP GET request to the web server with the requests module's get() method, which returns a Response object.
    >>> response = requests.get(url=web_page_url, headers=headers)
  8. Get the text content of the requested web page from the response.text attribute.
    >>> page_content = response.text
  9. Print out the text to verify it is correct.
    >>> print(page_content)
  10. Write the web page content to a local file to save it.
    >>> with open('./google.html', 'w', encoding='utf8') as fp:
    ...     fp.write(page_content)
    ... 
    131502
    
    >>> print('Saved web page content from ' + web_page_url + ' successfully.')
    Saved web page content from http://www.google.com successfully.
  11. Read the local file back to verify that it contains the web page content.
    >>> # Open the local file in read mode.
    >>> with open('./google.html', 'r', encoding='utf8') as fp:
    ...     line = fp.readline()  # Read one line of text.
    ...     # readline() returns an empty string only at end of file, so the loop stops there.
    ...     while len(line) > 0:
    ...         print(line, end='')  # Each line already ends with a newline.
    ...         # Read the next line.
    ...         line = fp.readline()
    ... 
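Steps 1 and 2 check for the requests module from the terminal. If you prefer to run the same check from inside Python, a minimal sketch could use the standard library's importlib.util.find_spec(), which locates a module without importing it. The helper name is_module_installed is hypothetical and not part of any library.

```python
import importlib.util


def is_module_installed(name):
    """Return True if a top-level module called name can be found (hypothetical helper)."""
    # find_spec() searches the import path without executing the module
    # and returns None when no matching module is found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if is_module_installed("requests"):
        import requests
        print("requests", requests.__version__, "is installed.")
    else:
        print("requests is missing; run: pip install requests")
```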
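Steps 4 to 10 can also be collected into one reusable function. The sketch below is a variant under stated assumptions, not the article's exact code. The function name save_web_page, the 10-second timeout, and the optional session argument are hypothetical additions. The function calls raise_for_status() so that an HTTP error page (for example, a 404 response) is not silently saved. It returns the number of characters written, which is the kind of value fp.write() echoed in step 10.

```python
import requests

# Browser-like User-Agent copied from step 6; any realistic browser string should do.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
    )
}


def save_web_page(url, file_path, headers=None, timeout=10.0, session=None):
    """Fetch url and save its text to file_path; return the characters written.

    save_web_page, timeout and session are hypothetical names for this sketch.
    """
    # Use a caller-supplied requests.Session if given, else the module-level get().
    client = session if session is not None else requests
    response = client.get(url, headers=headers or DEFAULT_HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx status codes instead of saving an error page.
    response.raise_for_status()
    # response.text is already decoded to str, so write it in text mode as UTF-8.
    with open(file_path, "w", encoding="utf8") as fp:
        return fp.write(response.text)
```

With network access, a call such as save_web_page("http://www.google.com", "./google.html") mirrors steps 5 to 10. You may want to keep the exact bytes the server sent rather than text decoded with response.encoding. In that case, write response.content to a file opened in 'wb' mode instead.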
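The readline() loop in step 11 works. However, a Python file object is itself an iterator over its lines, so a for loop can replace the manual readline() calls and the empty-string check. Here is a minimal sketch, where print_file_lines is a hypothetical helper name. The line count it returns is added only for convenience.

```python
def print_file_lines(file_path):
    """Print a UTF-8 text file line by line and return the number of lines."""
    count = 0
    with open(file_path, "r", encoding="utf8") as fp:
        # Iterating over the file yields one line per loop (newline included)
        # and stops at end of file, so no explicit length check is needed.
        for line in fp:
            print(line, end="")  # The line already ends with "\n".
            count += 1
    return count
```

For the file saved above, print_file_lines('./google.html') prints the page and returns its line count.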
