Python × Selenium | Extracting genre, address, and latitude/longitude from Tabelog detail pages

In the previous article, I showed how to use Python + Selenium to collect the name and detail-page URL of every store from Tabelog's store listing pages.

This article picks up from there. Starting from that list of store URLs, it visits each store's detail page and extracts the information listed below.

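What this article covers
1. Preparation: the listing-URL CSV
2. HTML structure to scrape
2.1 Genre
2.2 Address and coordinates (latitude/longitude)
3. Full script: tabelog_extract_details.py
4. How to run
5. Sample output
Troubleshooting tips
Summary
Related articles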

What this article covers

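Extract the following from each store's detail page:
Genre (izakaya, ramen, etc.)
Address (as a string that needs no regex clean-up)
Latitude and longitude (extracted from the Google Maps image URL)
Save the results as a CSV file
Optionally limit the number of stores processed (handy for testing)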

1. Preparation: the listing-URL CSV

The output file of the listing script (e.g. tabelog_higashimurayama_all.csv) is assumed to have the following format:

店名,URL
串かつ でんがな 秋津店,https://tabelog.com/tokyo/A1328/A132806/13155695/
...

For each row, the script opens the detail page at the address in the URL column and collects the information from it.
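If you want to check the input file before launching a browser, a minimal sketch using only the standard csv module might look like this. It assumes the header is exactly URL as shown above; the function name load_urls, the skipping of blank and duplicate URLs, and the limit parameter are illustrative additions, not part of the original script.

```python
import csv
import io

# One row in the format shown above (the real file would be opened with open())
SAMPLE_CSV = """店名,URL
串かつ でんがな 秋津店,https://tabelog.com/tokyo/A1328/A132806/13155695/
"""


def load_urls(f, limit=None):
    """Return the URL column as a list, skipping blank and duplicate URLs.

    limit=None reads every row; an integer keeps only the first `limit` URLs.
    """
    urls = []
    seen = set()
    for row in csv.DictReader(f):
        # Stop once enough URLs have been collected (limit=0 returns [])
        if limit is not None and len(urls) >= limit:
            break
        url = (row.get("URL") or "").strip()
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


urls = load_urls(io.StringIO(SAMPLE_CSV))
print(urls)
```

To read the real file, open it with `open("tabelog_higashimurayama_all.csv", newline="", encoding="utf-8-sig")`. The utf-8-sig encoding is an assumption here; it strips a BOM if one is present and also reads files without one.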

2. HTML structure to scrape

Genre

The genre is the text inside the <td> that follows the <th> labeled ジャンル ("genre"):

<tr>
  <th>ジャンル</th>
  <td>
    <span>居酒屋、焼き鳥、もつ焼き</span>
  </td>
</tr>
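To check the selector logic offline, here is a minimal sketch that uses the standard library's ElementTree instead of Selenium. ElementTree supports only a subset of XPath (no contains() or following-sibling), so this version matches the <th> text exactly rather than with contains() as the script in section 3 does. It also requires well-formed markup, so it suits short fragments like the one above rather than full Tabelog pages; extract_genre is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The genre row from the HTML above, wrapped in <table> so it parses as one XML document
GENRE_ROW = """<table>
<tr>
  <th>ジャンル</th>
  <td>
    <span>居酒屋、焼き鳥、もつ焼き</span>
  </td>
</tr>
</table>"""


def extract_genre(fragment):
    """Return the genre text from a well-formed table fragment, or "" if absent."""
    root = ET.fromstring(fragment)
    # [th='ジャンル'] keeps only <tr> elements whose <th> child has exactly that text
    span = root.find(".//tr[th='ジャンル']/td/span")
    if span is None or span.text is None:
        return ""
    return span.text.strip()


genre = extract_genre(GENRE_ROW)
print(genre)  # 居酒屋、焼き鳥、もつ焼き
```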

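Address and coordinates (latitude/longitude)

The address is read as plain text from the p.rstinfo-table__address element. The coordinates come from the map image: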

<img class="rstinfo-table__map-image"
     data-original="https://maps.googleapis.com/maps/api/staticmap?...&center=35.777271,139.493903&...">

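The latitude and longitude are taken from the center= parameter in the data-original attribute.
Because the map image is lazy-loaded, its src attribute holds only a placeholder image, so the script reads data-original instead.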
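The script in section 3 pulls the coordinates out with a regular expression. As an alternative, the query string can be parsed with urllib.parse, which also decodes a percent-encoded comma (%2C) if one ever appears. This is a sketch under that assumption; extract_center is a hypothetical name, and the sample URL keeps only the center= parameter from the example above.

```python
from urllib.parse import parse_qs, urlparse


def extract_center(map_url):
    """Return (latitude, longitude) from a static-map URL's center= parameter, or None."""
    query = parse_qs(urlparse(map_url).query)
    values = query.get("center")
    if not values:
        return None
    parts = values[0].split(",")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        # center= may also hold a place name instead of coordinates
        return None


# Hypothetical URL reduced to the center= parameter shown in the HTML above
sample_url = "https://maps.googleapis.com/maps/api/staticmap?center=35.777271,139.493903"
print(extract_center(sample_url))  # (35.777271, 139.493903)
```

Returning floats rather than strings makes later map or distance calculations simpler; the original script stores the matched strings, which pandas writes to CSV unchanged.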

3. Full script: tabelog_extract_details.py

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
import time
import re

# --- Settings ---
INPUT_CSV = "tabelog_higashimurayama_all.csv"
OUTPUT_CSV = "tabelog_higashimurayama_detailed.csv"
LIMIT = None  # None = all rows; 10 = first 10 rows only

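# --- Selenium options ---
options = Options()
options.add_argument("start-maximized")
options.add_argument("user-agent=Mozilla/5.0")
options.page_load_strategy = "eager"  # return from get() once the DOM is ready, without waiting for images

# Path to ChromeDriver; adjust for your environment
service = Service("/usr/local/bin/chromedriver")
driver = webdriver.Chrome(service=service, options=options)
driver.set_page_load_timeout(30)  # give up on a page after 30 seconds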

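# --- Load the CSV and add empty output columns ---
# Column names: ジャンル = genre, 住所 = address, 緯度 = latitude, 経度 = longitude
df = pd.read_csv(INPUT_CSV)
df["ジャンル"] = ""
df["住所"] = ""
df["緯度"] = ""
df["経度"] = ""

rows = df.iterrows()
if LIMIT:
    rows = list(rows)[:LIMIT]  # test run: only the first LIMIT rows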

# --- Process each detail page ---
for index, row in rows:
    url = row["URL"]
    print(f"{index+1}/{len(df)}: {url}")
    try:
        driver.get(url)
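        time.sleep(2)  # brief pause after loading; also keeps the request rate low

        # Genre: the <td> next to the <th> whose text contains "ジャンル"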
        try:
            genre_el = driver.find_element(By.XPATH, '//th[contains(text(), "ジャンル")]/following-sibling::td/span')
            df.at[index, "ジャンル"] = genre_el.text.strip()
        except NoSuchElementException:
            pass

        # Address
        try:
            addr_el = driver.find_element(By.CSS_SELECTOR, 'p.rstinfo-table__address')
            df.at[index, "住所"] = addr_el.text.strip()
        except NoSuchElementException:
            pass

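        # Latitude/longitude (read from data-original, not src)
        try:
            # Waits up to 10 seconds; raises TimeoutException if the map image never appears
            map_img = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, '.rstinfo-table__map img'))
            )
            map_url = map_img.get_attribute("data-original")
            if map_url:
                # center=<latitude>,<longitude>; the optional minus sign also covers negative values
                match = re.search(r'center=(-?[\d.]+),(-?[\d.]+)', map_url)
                if match:
                    df.at[index, "緯度"] = match.group(1)
                    df.at[index, "経度"] = match.group(2)
        except Exception as e:
            print(f"Failed to get coordinates: {e}")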

    except Exception as e:
        print(f"Page error: {e}")
        continue

# --- Save ---
df.to_csv(OUTPUT_CSV, index=False, encoding="utf-8-sig")  # utf-8-sig adds a BOM so Excel displays Japanese text correctly
print(f"Done. Saved to {OUTPUT_CSV}.")

driver.quit()

4. How to run

python3 tabelog_extract_details.py

Set LIMIT near the top of the script:
To process every row → LIMIT = None
To test with only the first 10 rows → LIMIT = 10
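Editing LIMIT in the file works fine. If you would rather pass it on the command line, one option is argparse. The flags --limit, --input, and --output below are hypothetical, not options of the original script, and args.limit would still need to be used in place of LIMIT.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for the detail-extraction script."""
    parser = argparse.ArgumentParser(description="Extract details from Tabelog store pages")
    parser.add_argument("--limit", type=int, default=None,
                        help="process only the first N rows (default: all rows)")
    parser.add_argument("--input", default="tabelog_higashimurayama_all.csv")
    parser.add_argument("--output", default="tabelog_higashimurayama_detailed.csv")
    # argv=None makes argparse read sys.argv; a list is handy for testing
    return parser.parse_args(argv)


args = parse_args(["--limit", "10"])
print(args.limit)  # 10
```

With this in place, a test run would look like `python3 tabelog_extract_details.py --limit 10`.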

5. Sample output

Columns that could not be extracted are left empty, as in the second row below.

店名,URL,ジャンル,住所,緯度,経度
新鮮な魚介類と地酒専門店 おやじの隠れ家 魚武,https://tabelog.com/tokyo/A1328/A132806/13196471/,居酒屋、海鮮,東京都東村山市栄町2-29-6 1F,35.74830623857909,139.4702341852696
串かつ でんがな 秋津店,https://tabelog.com/tokyo/A1328/A132806/13155695/,,,,
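A store whose details could not be extracted keeps the empty strings the script initialized, as the second row shows. To list those URLs for a re-run, a sketch along these lines could work; it reads the sample output above with the standard csv module, and find_incomplete is a hypothetical helper name.

```python
import csv
import io

# The sample output above (the real file would be opened with encoding="utf-8-sig")
SAMPLE_OUTPUT = """店名,URL,ジャンル,住所,緯度,経度
新鮮な魚介類と地酒専門店 おやじの隠れ家 魚武,https://tabelog.com/tokyo/A1328/A132806/13196471/,居酒屋、海鮮,東京都東村山市栄町2-29-6 1F,35.74830623857909,139.4702341852696
串かつ でんがな 秋津店,https://tabelog.com/tokyo/A1328/A132806/13155695/,,,,
"""


def find_incomplete(f, columns=("ジャンル", "住所", "緯度", "経度")):
    """Return the URLs of rows where any of the given columns is empty."""
    return [
        row["URL"]
        for row in csv.DictReader(f)
        if any(not (row.get(col) or "").strip() for col in columns)
    ]


missing = find_incomplete(io.StringIO(SAMPLE_OUTPUT))
print(missing)  # ['https://tabelog.com/tokyo/A1328/A132806/13155695/']
```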

Troubleshooting tips

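Symptom	Fix
Coordinates come out empty	Read `data-original` instead of `src` (already handled in this article)
An element cannot be found	Catch `NoSuchElementException` so the script moves on instead of stopping
Some pages have a different structure	Branch on the selector or support several selector patterns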
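For the last row of the table, one way to support several page layouts is to try a list of locators in order and keep the first match. The helper below is a hypothetical sketch, not part of the original script. It relies on driver.find_elements, which returns an empty list instead of raising NoSuchElementException when nothing matches. The strings "css selector" and "xpath" are the values of By.CSS_SELECTOR and By.XPATH, so either form can be passed; the fallback XPath in the docstring is an assumption about the page layout.

```python
def first_text(driver, locators):
    """Return the stripped text of the first non-empty element found by any locator.

    driver: a Selenium WebDriver (anything with find_elements(by, value)).
    locators: (by, value) pairs tried in order, for example
        [("css selector", "p.rstinfo-table__address"),
         ("xpath", "//th[contains(text(), '住所')]/following-sibling::td")]
    The second locator is a hypothetical fallback for pages laid out differently.
    Returns "" when no locator matches.
    """
    for by, value in locators:
        # find_elements returns [] instead of raising NoSuchElementException
        for element in driver.find_elements(by, value):
            text = element.text.strip()
            if text:
                return text
    return ""
```

In the script, `df.at[index, "住所"] = first_text(driver, [(By.CSS_SELECTOR, "p.rstinfo-table__address")])` could replace the try/except block for the address, with extra locators appended only after checking them against real pages.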

Summary

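This script takes the URL list collected from the listing pages and automatically extracts each store's genre, address, and coordinates.
The result is a useful base dataset for further work, such as plotting stores on a map or analyzing categories.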