Python Scrapy Framework: A Simple Example of CrawlSpider, the Generic Spider

Views: 4 | Date: 2022-07-30 13:12:21

This article walks through a worked example of using CrawlSpider, the generic spider class in the Python Scrapy framework. The steps are as follows:

Step 01: Create the Scrapy project

scrapy startproject quotes

Step 02: Generate the spider from the crawl template

scrapy genspider -t crawl quotes quotes.toscrape.com

Step 03: Edit the spider file quotes.py

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class Quotes(CrawlSpider):
    # Spider name, used with "scrapy crawl"
    name = 'get_quotes'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    # Crawl rules
    rules = (
        # For quote listing pages, call parse_quotes,
        # and keep following the links extracted by this rule
        Rule(LinkExtractor(allow=r'/page/\d+'), callback='parse_quotes', follow=True),
        # For author pages, call parse_author to extract the data
        Rule(LinkExtractor(allow=r'/author/\w+'), callback='parse_author'),
    )

    # Extract the quotes from a listing page
    def parse_quotes(self, response):
        for quote in response.css('.quote'):
            yield {
                'content': quote.css('.text::text').extract_first(),
                'author': quote.css('.author::text').extract_first(),
                'tags': quote.css('.tag::text').extract(),
            }

    # Extract the author details from an author page
    def parse_author(self, response):
        name = response.css('.author-title::text').extract_first()
        author_born_date = response.css('.author-born-date::text').extract_first()
        author_born_location = response.css('.author-born-location::text').extract_first()
        author_description = response.css('.author-description::text').extract_first()
        return {
            'name': name,
            'author_born_date': author_born_date,
            'author_born_location': author_born_location,
            'author_description': author_description,
        }
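
The allow arguments passed to LinkExtractor are regular expressions that are tested against each extracted URL. The sketch below uses only the standard re module to illustrate which URLs patterns such as /page/\d+ and /author/\w+ would pick up, and why the backslash matters. The sample URLs and the classify helper are assumptions made for illustration; they are not part of Scrapy, and the sketch assumes the patterns are applied with a search rather than a full match.

```python
import re

# Patterns as they would be passed to LinkExtractor(allow=...)
PAGE_RE = re.compile(r'/page/\d+')
AUTHOR_RE = re.compile(r'/author/\w+')

# Hypothetical URLs, for illustration only
urls = [
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/author/Albert-Einstein',
    'http://quotes.toscrape.com/tag/love/',
]


def classify(url):
    """Return the callback name of the first pattern that matches, or None."""
    if PAGE_RE.search(url):
        return 'parse_quotes'
    if AUTHOR_RE.search(url):
        return 'parse_author'
    return None


results = {url: classify(url) for url in urls}

# Without the backslash, 'd+' matches the literal letter d, not digits,
# so a pagination URL is no longer matched.
BROKEN_PAGE_RE = re.compile(r'/page/d+')
broken_matches = bool(BROKEN_PAGE_RE.search('http://quotes.toscrape.com/page/2/'))

for url, callback in results.items():
    print(url, '->', callback)
print('pattern without backslash matches /page/2/:', broken_matches)
```

The helper checks the patterns in the same order as the rules tuple, since CrawlSpider applies the first rule that matches a given link.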

Step 04: Run the spider

scrapy crawl get_quotes

We hope this article is helpful for writing Python programs based on the Scrapy framework.
