Scrapy shell is useful for loading a web page and extracting its element data in an interactive console. Recently, however, when I used the Scrapy shell to extract a web page item with XPath, I got the error message -bash: syntax error near unexpected token. This article explains how to fix it.
1. The Symptoms Of The Response.xpath Syntax Error Near Unexpected Token.
- Below is the detailed error message. When we run the command
scrapy shell https://www.indeed.com/jobs?q=python+developer&l=&ts=1610810564897&rq=1&rsIdx=6
, the Scrapy shell prompt (>>>) is not displayed as it normally would be.
$ scrapy shell https://www.indeed.com/jobs?q=python+developer&l=&ts=1610810564897&rq=1&rsIdx=6
......
[s]   request    <GET https://www.indeed.com/jobs?q=python+developer>
[s]   response   <200 https://www.indeed.com/q-python-developer-jobs.html>
[s]   settings   <scrapy.settings.Settings object at 0x7f82254c23c8>
[s]   spider     <DefaultSpider 'default' at 0x7f82267634a8>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()                     Shell help (print this help)
[s]   view(response)              View response in a browser
- When we then run the command
response.xpath('/html/head/title/text()').extract()
in the terminal above, we get the error message -bash: syntax error near unexpected token `'/html/head/title/text()'', and the Scrapy shell job is reported as stopped. The -bash: prefix shows that the line was read by bash rather than by the Scrapy shell.
[s]   view(response)              View response in a browser
response.xpath('/html/head/title/text()').extract()
-bash: syntax error near unexpected token `'/html/head/title/text()''
[7]+  Stopped                 scrapy shell https://www.indeed.com/jobs?q=python+developer
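To see why the >>> prompt never appeared, it can help to approximate how a POSIX shell splits the command line into tokens. The sketch below uses Python's standard shlex module (Python 3.8 or later) as a stand-in for bash; the helper name shell_tokens is hypothetical, and shlex only approximates bash's real parser.

```python
import shlex


def shell_tokens(command_line):
    """Approximate how a POSIX shell tokenizes a command line.

    shell_tokens is a hypothetical helper for illustration only;
    shlex is an approximation of bash, not bash itself.
    """
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    # Split words only on whitespace and shell punctuation such as & ; |
    lexer.whitespace_split = True
    return list(lexer)


URL = "https://www.indeed.com/jobs?q=python+developer&l=&ts=1610810564897&rq=1&rsIdx=6"

# Unquoted: every & becomes a separate control token.
unquoted = shell_tokens("scrapy shell " + URL)
# Quoted: the whole URL stays one argument.
quoted = shell_tokens("scrapy shell '" + URL + "'")

print(unquoted[:4])
print(len(quoted))
```

In the unquoted case the third token ends at python+developer and the fourth token is &, which matches the shortened request URL in the log above. In bash, & ends a command and runs it in the background, so the next line typed goes to bash and not to the Scrapy shell.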
2. How To Fix -Bash: Syntax Error Near Unexpected Token.
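- In the Scrapy shell output above, we can see this log line:
response <200 https://www.indeed.com/q-python-developer-jobs.html>
- The request line in the same output shows only https://www.indeed.com/jobs?q=python+developer. Bash treats each unquoted & as a command separator, so it cut the original URL https://www.indeed.com/jobs?q=python+developer&l=&ts=1610810564897&rq=1&rsIdx=6 at the first & and ran scrapy shell in the background, which is why the >>> prompt never appeared. The shortened URL was then redirected to https://www.indeed.com/q-python-developer-jobs.html.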
- So I ran the command
scrapy shell https://www.indeed.com/q-python-developer-jobs.html
instead. This URL contains no & character for bash to interpret, and this time the prompt (>>>) is displayed.
$ scrapy shell https://www.indeed.com/q-python-developer-jobs.html
......
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f81733f1160>
[s]   item       {}
[s]   request    <GET https://www.indeed.com/q-python-developer-jobs.html>
[s]   response   <200 https://www.indeed.com/q-python-developer-jobs.html>
[s]   settings   <scrapy.settings.Settings object at 0x7f81733edb70>
[s]   spider     <DefaultSpider 'default' at 0x7f81737754e0>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()                     Shell help (print this help)
[s]   view(response)              View response in a browser
>>>
- Now when I run the command
response.xpath('/html/head/title/text()').extract()
in the Scrapy shell console, it returns the correct web element data.
>>> response.xpath('/html/head/title/text()').extract()
['Python Developer Jobs, Employment | Indeed.com']
>>>
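- So the error is caused by the web page URL: bash interprets the unquoted & characters in it before Scrapy ever sees them. Either use a URL that contains no &, or enclose the URL in quotes on the command line, for example scrapy shell 'https://www.indeed.com/jobs?q=python+developer&l=&ts=1610810564897&rq=1&rsIdx=6'.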
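If the scrapy shell command is assembled by a script, one way to avoid the problem is to let Python's standard shlex.quote add the quoting. This is a minimal sketch that only builds and checks the command string; it does not run Scrapy, and it does not verify what the site returns for the long URL.

```python
import shlex

URL = "https://www.indeed.com/jobs?q=python+developer&l=&ts=1610810564897&rq=1&rsIdx=6"

# shlex.quote wraps the URL in single quotes because it contains
# characters such as & and ? that are special to the shell.
command = "scrapy shell " + shlex.quote(URL)
print(command)

# Parsing the command back shows the URL survives as a single argument.
print(shlex.split(command))
```

The printed command can be pasted into a bash terminal as-is; the quoted URL reaches Scrapy in full, including everything after the first &.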