152

I know the URL of an image on the Internet.

For example, http://www.digimouth.com/news/media/2011/09/google-logo.jpg, which contains Google's logo.

Now, how can I download this image using Python, without actually opening the URL in a browser and saving the file manually?

python web-scraping

— Pankaj Vatsa

1

Possible duplicate of "How do I download a file over HTTP using Python?"

— Jaydev

316

Python 2

Here is a more straightforward way if all you want to do is save it as a file:

import urllib

urllib.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

The second argument is the local path where the file should be saved.

Python 3

As SergO suggested, the code below should work with Python 3.

import urllib.request

urllib.request.urlretrieve("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")

— Liquid_Fire

55

A good way to get the filename from the link is filename = link.split('/')[-1]

— heltonbiker
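The split('/') approach above keeps any query string or percent-encoding in the name. A minimal sketch of a more defensive variant, using only the standard library; the helper name filename_from_url, the default value, and the example.com URL are illustrative assumptions, not part of the original thread:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of url, ignoring query and fragment."""
    path = urlsplit(url).path                 # drops ?query and #fragment
    name = posixpath.basename(unquote(path))  # decodes %20 and similar escapes
    return name or default                    # fall back when the path ends in "/"


print(filename_from_url("http://www.digimouth.com/news/media/2011/09/google-logo.jpg"))
print(filename_from_url("http://example.com/images/my%20cat.png?size=large"))
```

For plain URLs such as the Google logo example this gives the same result as link.split('/')[-1].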

2

With urlretrieve I just get a 1 KB file with a dict and 404 error text inside. Why? If I enter the URL into my browser, I can get the picture.

— Yebach

2

@Yebach: The site you are downloading from may be using cookies, the user-agent, or other headers to determine what content to serve you. These will differ between your browser and Python.

— Liquid_Fire
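One way to act on the comment above in Python 3 is to send a browser-like User-Agent header with the request. This is only a sketch: the USER_AGENT string and the helper names build_request and download are assumptions chosen for illustration, and whether a given site accepts the header depends on the site.

```python
from urllib.request import Request, urlopen

# Assumption: an illustrative browser-like value, not a required string.
USER_AGENT = "Mozilla/5.0 (compatible; example-downloader)"


def build_request(url, user_agent=USER_AGENT):
    """Create a request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def download(url, local_path):
    """Fetch url with the custom header and write the raw bytes to local_path."""
    with urlopen(build_request(url), timeout=10) as response:
        data = response.read()
    with open(local_path, "wb") as out_file:
        out_file.write(data)


# Example call (needs network access):
# download("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
```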

27

Python 3: import urllib.request and urllib.request.urlretrieve(), accordingly.

— SergO

1

@SergO - can you add the Python 3 part to the original answer?

— Sreejith Menon

27

import urllib
resource = urllib.urlopen("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")
output = open("file01.jpg","wb")
output.write(resource.read())
output.close()

file01.jpg will contain your image.

— Noufal Ibrahim

2

You should open the file in binary mode: open("file01.jpg", "wb"). Otherwise you may corrupt the image.

— Liquid_Fire
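A Python 3 sketch of the same idea, with the file opened in binary mode and both handles closed automatically by a with statement. The function name save_url is an assumption for illustration; shutil.copyfileobj copies in chunks, so the whole image does not have to fit in memory.

```python
import shutil
from urllib.request import urlopen


def save_url(url, local_path):
    """Stream the bytes at url into local_path without decoding them."""
    # "wb" keeps the data byte-for-byte identical; text mode could alter it.
    with urlopen(url) as resource, open(local_path, "wb") as output:
        shutil.copyfileobj(resource, output)


# Example call (needs network access):
# save_url("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "file01.jpg")
```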

2

urllib.urlretrieve can save the image directly.

— heltonbiker

17

I wrote a script that does just this, and it is available on my GitHub for your use.

I used BeautifulSoup to allow me to parse any website for images. If you will be doing a lot of web scraping (or intend to use my tool), I suggest you sudo pip install BeautifulSoup. Information on BeautifulSoup is available here.

For convenience, here is my code:

from bs4 import BeautifulSoup
from urllib2 import urlopen
import urllib

# use this image scraper from the location that 
#you want to save scraped images to

def make_soup(url):
    html = urlopen(url).read()
    return BeautifulSoup(html)

def get_images(url):
    soup = make_soup(url)
    #this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    #compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename=each.split('/')[-1]
        urllib.urlretrieve(each, filename)
    return image_links

#a standard call looks like this
#get_images('http://www.wookmark.com')

— Yup.

11

This can be done with requests. Load the page and dump the binary content to a file.

import os
import requests

url = 'https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg'
page = requests.get(url)

f_ext = os.path.splitext(url)[-1]
f_name = 'img{}'.format(f_ext)
with open(f_name, 'wb') as f:
    f.write(page.content)

— AlexG
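Taking the extension from the URL, as above, works when the URL ends in a real image name. A hedged sketch of a fallback that prefers the Content-Type reported by the server (with requests, the value of page.headers.get('Content-Type')); the helper name extension_for is an assumption for illustration:

```python
import mimetypes
import posixpath
from urllib.parse import urlsplit


def extension_for(url, content_type=None):
    """Pick a file extension from the Content-Type if known, else from the URL path."""
    if content_type:
        # Strip parameters such as "; charset=..." before the lookup.
        mime = content_type.split(";")[0].strip().lower()
        ext = mimetypes.guess_extension(mime)
        if ext:
            return ext
    # Fall back to the URL path, ignoring any query string.
    return posixpath.splitext(urlsplit(url).path)[1]


print(extension_for("https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg"))
```

This matters for URLs whose path ends in something like .aspx even though the response body is an image.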

1

If you get a bad request, use a user-agent header in requests :)

— 1UC1F3R616

8

Python 3

urllib.request - Extensible library for opening URLs

from urllib.error import HTTPError
from urllib.request import urlretrieve

try:
    urlretrieve(image_url, image_local_path)
except FileNotFoundError as err:
    print(err)   # something wrong with local path
except HTTPError as err:
    print(err)  # something wrong with url

— SergO
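The snippet above leaves image_url and image_local_path undefined and does not cover unreachable hosts. A sketch of a self-contained variant, assuming a hypothetical wrapper named try_download: HTTPError is a subclass of URLError, which is itself a subclass of OSError, so the handlers are ordered from most to least specific.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlretrieve


def try_download(image_url, image_local_path):
    """Return True if the download succeeded, False otherwise."""
    try:
        urlretrieve(image_url, image_local_path)
    except HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        print("server returned an error:", err.code, err.reason)
    except URLError as err:
        # The URL could not be opened at all (bad host, missing file, ...).
        print("could not open the URL:", err.reason)
    except OSError as err:
        # Something is wrong with the local path.
        print("could not write the local file:", err)
    else:
        return True
    return False


# Example call (needs network access):
# try_download("http://www.digimouth.com/news/media/2011/09/google-logo.jpg", "local-filename.jpg")
```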

6

A solution that works with Python 2 and Python 3:

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = "http://www.digimouth.com/news/media/2011/09/google-logo.jpg"
urlretrieve(url, "local-filename.jpg")

Or, if the additional requirement of requests is acceptable and if it is an http(s) URL:

def load_requests(source_url, sink_path):
    """
    Load a file from an URL (e.g. http).

    Parameters
    ----------
    source_url : str
        Where to load the file from.
    sink_path : str
        Where the loaded file is stored.
    """
    import requests
    r = requests.get(source_url, stream=True)
    if r.status_code == 200:
        with open(sink_path, 'wb') as f:
            for chunk in r:
                f.write(chunk)

— Martin Thoma

5

I made a script expanding on Yup.'s script. I fixed some things. It will now bypass 403: Forbidden problems. It won't crash when an image fails to be retrieved. It tries to avoid corrupted previews. It gets the right absolute URLs. It gives out more information. It can be run with an argument from the command line.

# getem.py
# python2 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib2
import shutil
import requests
from urlparse import urljoin
import sys
import time

def make_soup(url):
    req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib2.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print 'Downloading images to current working directory.'
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print 'Getting: ' + filename
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print '  An error occurred. Continuing.'
    print 'Done.'

if __name__ == '__main__':
    url = sys.argv[1]
    get_images(url)

— madprops
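The "right absolute URLs" step in the script above relies on urljoin. A small Python 3 sketch of how it resolves different kinds of src values against the page URL; the helper name absolute_image_urls and the example.com addresses are assumptions for illustration:

```python
from urllib.parse import urljoin


def absolute_image_urls(page_url, srcs):
    """Resolve each src against page_url, skipping missing or empty values."""
    return [urljoin(page_url, src.strip()) for src in srcs if src and src.strip()]


page_url = "http://example.com/gallery/index.html"
srcs = [
    "img/cat.jpg",                      # relative to the page's directory
    "/static/logo.png",                 # relative to the site root
    "../banner.gif",                    # one directory up
    "//cdn.example.com/a.png",          # scheme-relative
    "https://other.example.org/b.jpg",  # already absolute
    None,                               # an <img> tag without a src attribute
]
for link in absolute_image_urls(page_url, srcs):
    print(link)
```

Skipping None values avoids the exception that an img tag without a src attribute would otherwise trigger inside the download loop.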

3

Using the requests library

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')#saving images to Images folder

def ImageDl(url):
    attempts = 0
    while attempts < 5:#retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url,headers=headers,stream=True,timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path,filename),'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw,f)
            print(filename)
            break
        except Exception as e:
            attempts+=1
            print(e)


ImageDl(url)

— Sohan Das

It seems the header is really important in my case; I was getting 403 errors. It worked.

— Ishtiaque Hussain

2

This is a very short answer.

import urllib
urllib.urlretrieve("http://photogallery.sandesh.com/Picture.aspx?AlubumId=422040", "Abc.jpg")

— OO7

2

Version for Python 3

I adjusted the code of @madprops for Python 3

# getem.py
# python3 script to download all images in a given url
# use: python getem.py http://url.where.images.are

from bs4 import BeautifulSoup
import urllib.request
import shutil
import requests
from urllib.parse import urljoin
import sys
import time

def make_soup(url):
    req = urllib.request.Request(url, headers={'User-Agent' : "Magic Browser"}) 
    html = urllib.request.urlopen(req)
    return BeautifulSoup(html, 'html.parser')

def get_images(url):
    soup = make_soup(url)
    images = [img for img in soup.findAll('img')]
    print (str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    image_links = [each.get('src') for each in images]
    for each in image_links:
        try:
            filename = each.strip().split('/')[-1].strip()
            src = urljoin(url, each)
            print('Getting: ' + filename)
            response = requests.get(src, stream=True)
            # delay to avoid corrupted previews
            time.sleep(1)
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(response.raw, out_file)
        except:
            print('  An error occurred. Continuing.')
    print('Done.')

if __name__ == '__main__':
    get_images('http://www.wookmark.com')

— Giovanni G. PY

1

Something fresh for Python 3 using requests:

Comments are in the code. The function is ready to use.


import requests
from os import path

def get_image(image_url):
    """
    Get image based on url.
    :return: Image name if everything OK, False otherwise
    """
    image_name = path.split(image_url)[1]
    try:
        image = requests.get(image_url)
    except OSError:  # A little too wide, but works OK with no additional imports. Catches all connection problems
        return False
    if image.status_code == 200:  # we could have retrieved error page
        base_dir = path.join(path.dirname(path.realpath(__file__)), "images") # Use your own path or "" to use current working directory. Folder must exist.
        with open(path.join(base_dir, image_name), "wb") as f:
            f.write(image.content)
        return image_name

get_image("https://apod.nasddfda.gov/apod/image/2003/S106_Mishra_1947.jpg")

— Pavel Panocha

0

Late answer, but for python>=3.6 you can use dload, i.e.:

import dload
dload.save("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

If you need the image as bytes, use:

img_bytes = dload.bytes("http://www.digimouth.com/news/media/2011/09/google-logo.jpg")

Install using pip3 install dload

— CONvid19

-2

import requests

img_data = requests.get('https://apod.nasa.gov/apod/image/1701/potw1636aN159_HST_2048.jpg')

with open('file_name.jpg', 'wb') as handler:
    # as posted; this raises TypeError because write() needs bytes, not a Response
    handler.write(img_data)

— Lewis Mann

4

Welcome to Stack Overflow! While you may have solved this user's problem, code-only answers are not very helpful to users who come to this question in the future. Please edit your answer to explain why your code solves the original problem.

— Joe C

1

TypeError: a bytes-like object is required, not 'Response'. It must be handler.write(img_data.content)

— TitanFighter

It should be handler.write(img_data.read()).

— jdhao

How do I save an image locally using Python when I already know its URL?
