Python: Crawling Movie Rankings with BeautifulSoup (bs4)


Python crawling: using BeautifulSoup

'Make the request with requests, then sift through the result with BeautifulSoup'


Crawling movie rankings with BeautifulSoup

Crawling movie titles

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=pnt&date=20220127', headers=headers)

# Parse the downloaded HTML
soup = BeautifulSoup(data.text, 'html.parser')

# Selector copied from the browser for one title link:
# #old_content > table > tbody > tr:nth-child(2) > td.title > div > a

# Select every row of the ranking table
trs = soup.select('#old_content > table > tbody > tr')

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    # select_one returns None for rows without a title link, so skip those
    if a_tag is not None:
        title = a_tag.text
        print(title)

- Fetch the HTML from the target URL, then pass it to the BeautifulSoup library.

- The variable soup now holds the HTML in a form that is easy to parse.
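
The two steps above (fetch, then parse) can be sketched as separate functions. This is a minimal sketch, not the post's own code: the helper names fetch_html and make_soup, the 10-second timeout, and the sample HTML string are assumptions added for illustration. Calling raise_for_status() is one way to notice a failed request before trying to parse an error page.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, headers=None, timeout=10):
    """Hypothetical helper: download a page and return its HTML text."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text


def make_soup(html_text):
    """Hypothetical helper: turn raw HTML text into a BeautifulSoup object."""
    return BeautifulSoup(html_text, 'html.parser')


# Offline demonstration with a made-up HTML string (no network access needed)
sample_html = '<html><head><title>Sample</title></head><body><p>hello</p></body></html>'
soup = make_soup(sample_html)
print(soup.title.text)
print(soup.select_one('p').text)
```

With a real page, the call would look like make_soup(fetch_html(url, headers=headers)), using the same headers dictionary as in the code above.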

* Use select / select_one when crawling

- To print the text inside a tag → tag.text

- To print an attribute of a tag → tag['attribute']
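
A small offline sketch of the points above. The HTML fragment is made up for illustration and only loosely shaped like one ranking row; it is not the real page markup.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment, loosely shaped like one row of a ranking table
sample_html = """
<table>
  <tr>
    <td><img src="rank01.gif" alt="01"></td>
    <td class="title"><div><a href="/movie/1">Sample Movie</a></div></td>
  </tr>
</table>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

# select returns a list of every matching tag
rows = soup.select('tr')

# select_one returns the first matching tag, or None when nothing matches
a_tag = soup.select_one('td.title > div > a')
missing = soup.select_one('td.point')

link_text = a_tag.text                       # text inside the tag
link_href = a_tag['href']                    # value of the href attribute
img_alt = soup.select_one('img')['alt']      # value of the alt attribute

print(len(rows), link_text, link_href, img_alt, missing)
```

Because select_one can return None, checking the result before reading .text or an attribute avoids an error on rows that lack the tag.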

Crawling movie rank, title, and rating

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=pnt&date=20200303', headers=headers)

# Parse the downloaded HTML
soup = BeautifulSoup(data.text, 'html.parser')

# Selectors copied from the browser for the title, rank image, and rating:
# #old_content > table > tbody > tr:nth-child(2) > td.title > div > a
# #old_content > table > tbody > tr:nth-child(2) > td:nth-child(1) > img
# #old_content > table > tbody > tr:nth-child(2) > td.point

# Select every row of the ranking table
trs = soup.select('#old_content > table > tbody > tr')

for tr in trs:
    a_tag = tr.select_one('td.title > div > a')
    # select_one returns None for rows without a title link, so skip those
    if a_tag is not None:
        rank = tr.select_one('td:nth-child(1) > img')['alt']  # rank is stored in the image's alt attribute
        title = a_tag.text
        star = tr.select_one('td.point').text
        print(rank, title, star)
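
Instead of printing each row, the same loop can collect the values into a list of dictionaries. The following is a sketch under stated assumptions: the function name parse_rows and the sample HTML are hypothetical, and the fragment only mimics the selectors used above rather than reproducing the real page. Converting rank to int and rating to float is an optional extra step that the code above does not do.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure the selectors above expect
sample_html = """
<div id="old_content">
  <table>
    <tbody>
      <tr><td colspan="3"></td></tr>
      <tr>
        <td><img src="r1.gif" alt="01"></td>
        <td class="title"><div><a href="#">First Sample</a></div></td>
        <td class="point">9.50</td>
      </tr>
      <tr>
        <td><img src="r2.gif" alt="02"></td>
        <td class="title"><div><a href="#">Second Sample</a></div></td>
        <td class="point">9.10</td>
      </tr>
    </tbody>
  </table>
</div>
"""


def parse_rows(html_text):
    """Hypothetical helper: return a list of {'rank', 'title', 'star'} dicts."""
    soup = BeautifulSoup(html_text, 'html.parser')
    movies = []
    for tr in soup.select('#old_content > table > tbody > tr'):
        a_tag = tr.select_one('td.title > div > a')
        if a_tag is None:
            continue  # row without a title link
        img = tr.select_one('td:nth-child(1) > img')
        point = tr.select_one('td.point')
        movies.append({
            'rank': int(img['alt']) if img is not None else None,
            'title': a_tag.text.strip(),
            'star': float(point.text.strip()) if point is not None else None,
        })
    return movies


movies = parse_rows(sample_html)
for movie in movies:
    print(movie['rank'], movie['title'], movie['star'])
```

Keeping the parsing in a function that takes HTML text makes it possible to try the selectors on a saved or hand-written fragment before requesting the live page.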

* How to crawl movie rankings with BeautifulSoup

- Install the beautifulsoup4 package (pip install beautifulsoup4)

- Start from the basic crawling code

- Load the tr rows with a select selector

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.nhn?sel=pnt&date=20220127', headers=headers)

# Parse the downloaded HTML
soup = BeautifulSoup(data.text, 'html.parser')

# Select every row of the ranking table
movies = soup.select('#old_content > table > tbody > tr')


# Using selectors (copy selector)
soup.select('tag')
soup.select('.class_name')
soup.select('#id_name')

soup.select('parent_tag > child_tag > child_tag')
soup.select('parent_tag.class_name > child_tag.class_name')

# Tag with an attribute value
soup.select('tag[attribute="value"]')

# When you want only one element
soup.select_one('tag[attribute="value"]')
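
To see each selector form return something concrete, here is a short offline sketch. The HTML fragment and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment for trying out each selector form
sample_html = """
<div id="main">
  <ul class="menu">
    <li class="item"><a href="/a" target="_blank">A</a></li>
    <li class="item"><a href="/b">B</a></li>
  </ul>
  <p class="note">memo</p>
</div>
"""
soup = BeautifulSoup(sample_html, 'html.parser')

by_tag = soup.select('li')                          # by tag name
by_class = soup.select('.item')                     # by class name
by_id = soup.select('#main')                        # by id
by_path = soup.select('div > ul > li')              # parent > child > child
by_path_class = soup.select('ul.menu > li.item')    # tag.class > tag.class
by_attr = soup.select('a[target="_blank"]')         # tag with an attribute value
first_match = soup.select_one('a[href="/b"]')       # only the first match

print(len(by_tag), len(by_class), len(by_id), len(by_path),
      len(by_path_class), len(by_attr), first_match.text)
```

select always returns a list (empty when nothing matches), while select_one returns a single tag or None.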


License: Attribution, NonCommercial, NoDerivatives


Jann's World
