{"id":21191,"date":"2023-06-10T18:05:21","date_gmt":"2023-06-10T09:05:21","guid":{"rendered":"http:\/\/www.code-magagine.com\/?p=21191"},"modified":"2023-06-10T21:07:05","modified_gmt":"2023-06-10T12:07:05","slug":"%e3%80%90python%e3%80%91scrapy%e3%81%aeimagepipeline%e3%81%ae%e4%bd%bf%e3%81%84%e6%96%b9","status":"publish","type":"post","link":"http:\/\/www.code-magagine.com\/?p=21191","title":{"rendered":"[Python] How to Use Scrapy's ImagesPipeline"},"content":{"rendered":"<h2>What is ImagesPipeline?<\/h2>\n<p>Scrapy includes a handy feature that, given image URLs, downloads the files and saves them to a designated folder. Out of the box, however, it behaves as follows, so you will usually want to customize it:<\/p>\n<ul>\n<li>By default, files are named after the SHA-1 hash of the image URL, so the names look random<\/li>\n<li>Files are always saved to the same place (a subdirectory named full under IMAGES_STORE)<\/li>\n<\/ul>\n<h2>Points to note<\/h2>\n<p>To download an image, ImagesPipeline needs a complete, absolute URL; it cannot download from a relative URL. If the site uses absolute URLs for its images you don't need to do anything special, but if the image paths are relative, you must convert them to absolute URLs in one of the following ways before passing them to ImagesPipeline.<\/p>\n<h3>Method 1<\/h3>\n<pre class=\"lang:default decode:true\">f'https:\/\/xxx.com\/{relative_path}'<\/pre>\n<h3>Method 2<\/h3>\n<pre class=\"lang:default decode:true\">response.urljoin(relative_url)<\/pre>\n<p>Method 2 is usually the safer choice: urljoin resolves the relative URL against the URL of the page being parsed, so you don't have to hard-code the domain.<\/p>\n<h2>scrapy\/projects\/project_name\/project_name\/items.py<\/h2>\n<p>First, define a field in items.py to hold the image URLs.<\/p>\n<pre class=\"lang:default decode:true\">import scrapy\r\n\r\n\r\nclass BookItem(scrapy.Item):\r\n    image_urls = scrapy.Field()  # list of absolute image URLs\r\n    images = scrapy.Field()  # optional: ImagesPipeline stores the download results here<\/pre>\n<h2>Spider code<\/h2>\n<pre class=\"lang:default decode:true\">from scrapy.loader import ItemLoader\r\nfrom project_name.items import BookItem  # replace project_name with your project's name\r\ndef parse_item(self, response):  # a method of your spider class\r\n    loader = ItemLoader(item=BookItem(), response=response)\r\n    src = response.xpath('XPath of the image').get()  # relative URL, e.g. img\/cover.jpg\r\n    if src:  # urljoin(None) would return the page URL itself\r\n        loader.add_value('image_urls', response.urljoin(src))\r\n    yield loader.load_item()<\/pre>\n<p>response.urljoin converts the relative URL into an absolute URL. urljoin expects the relative URL itself rather than an XPath expression, so the relative URL is extracted with response.xpath first.<\/p>\n<p>ItemLoader's add_value stores the given value in the item as-is, without evaluating an XPath (unlike add_xpath). ItemLoader collects values into lists, so image_urls ends up as a list, which is the format ImagesPipeline expects.<\/p>\n<h2>settings.py configuration<\/h2>\n<pre class=\"lang:default decode:true\">ITEM_PIPELINES = {\r\n    \"scrapy.pipelines.images.ImagesPipeline\": 400,\r\n}\r\n\r\n# any_path and project_name are placeholders\r\nIMAGES_STORE = r'any_path\/projects\/project_name\/images'\r\nIMAGES_URLS_FIELD = 'image_urls'<\/pre>\n<h3>ITEM_PIPELINES<\/h3>\n<p>Enables ImagesPipeline and sets its priority. Pipelines with lower values run first (the usual range is 0 to 1000). Unlike a pipeline you write yourself, ImagesPipeline is built into Scrapy, so its path starts with scrapy.<\/p>\n<h3>IMAGES_STORE<\/h3>\n<p>The directory where images are saved; ImagesPipeline is not enabled unless this setting is defined. Any absolute path on your PC will do. The r prefix makes the path a raw string, so that backslashes (shown as the yen sign on Japanese Windows) are not treated as escape sequences; it only matters when the path contains backslashes.<\/p>\n<h3>IMAGES_URLS_FIELD<\/h3>\n<p>The name of the item field that holds the image URLs. This article uses a field named image_urls, so that name is specified here. Since image_urls is also the default, this line can be omitted.<\/p>\n<h2>Overriding the ImagesPipeline code<\/h2>\n<p>With the settings above, images are saved under a subdirectory named full with hash-based file names. To avoid this, subclass ImagesPipeline and override its file_path method so that each image is saved directly under IMAGES_STORE, using only the file name from its URL.<\/p>\n<p>Add the following to pipelines.py.<\/p>\n<pre class=\"lang:default decode:true\">from scrapy.pipelines.images import ImagesPipeline\r\n\r\n\r\nclass CustomImagePipeline(ImagesPipeline):\r\n    def file_path(self, request, response=None, info=None, *, item=None):\r\n        # Use the last segment of the image URL (e.g. cover.jpg) as the file name\r\n        return request.url.split('\/')[-1]<\/pre>\n<p>Keep in mind that images whose URLs end in the same file name will overwrite each other, that any query string in the URL becomes part of the file name, and that ImagesPipeline converts every image to JPEG regardless of the extension in the name.<\/p>\n<h3>settings.py<\/h3>\n<p>Finally, replace the ITEM_PIPELINES entry so that Scrapy uses the subclass with the overridden method instead of the built-in ImagesPipeline.<\/p>\n<pre class=\"lang:default decode:true\">ITEM_PIPELINES = {\r\n    # replace project_name with your project's name\r\n    \"project_name.pipelines.CustomImagePipeline\": 400,\r\n}<\/pre>\n","protected":false},"excerpt":{"rendered":"What is ImagesPipeline? Scrapy includes a handy feature that, given image URLs, downloads the files and saves them to a designated folder. Out of the box, however, it behaves as follows, so you will usually want to customize it: By default, 
[&hellip;]","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[47],"tags":[],"_links":{"self":[{"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=\/wp\/v2\/posts\/21191"}],"collection":[{"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=21191"}],"version-history":[{"count":13,"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=\/wp\/v2\/posts\/21191\/revisions"}],"predecessor-version":[{"id":21204,"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=\/wp\/v2\/posts\/21191\/revisions\/21204"}],"wp:attachment":[{"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=21191"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=21191"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.code-magagine.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=21191"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}