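Beautiful Soup: Find Elements by ID

In an HTML document, an element can carry an id attribute whose value should be unique within the document. This lets front-end code, such as JavaScript functions, locate a specific element and read its value.

With Beautiful Soup, you can look up an element by its ID. Four methods can do this: find(), find_all(), select(), and select_one().

Using the `find()` method

The find() method searches a BeautifulSoup object and returns the first element that satisfies the criteria passed as arguments. If nothing matches, it returns None.

We will use the following HTML document (saved as index.html) for this purpose: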

<html>
   <head>
      <title>Yoagoa</title>
   </head>
   <body>
      <form>
         <input type = 'text' id = 'nm' name = 'name'>
         <input type = 'text' id = 'age' name = 'age'>
         <input type = 'text' id = 'marks' name = 'marks'>
      </form>
   </body>
</html>

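The following Python code finds the element whose id is nm:

Example

from bs4 import BeautifulSoup

# Open the file in a with block so it is closed after parsing
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find() returns the first element whose id attribute equals 'nm'
obj = soup.find(id='nm')
print(obj)

Output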

<input id="nm" name="name" type="text"/>
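
Since find() returns either a single tag or None, it can help to see how the result is typically used. The following is a minimal sketch, not part of the original example: it parses an inline copy of the form instead of reading index.html, and the id "email" is a hypothetical value chosen only because it does not exist in the form.

```python
from bs4 import BeautifulSoup

# Inline copy of part of the form from index.html, so the sketch runs without the file
html = """
<form>
   <input type="text" id="nm" name="name">
   <input type="text" id="age" name="age">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

field = soup.find(id="nm")

# Attributes of the returned tag are read with dictionary-style access
field_name = field["name"]
field_type = field.get("type")

# "email" is a hypothetical id that is not present, so find() returns None
missing = soup.find(id="email")
if missing is None:
    print("no element with id 'email'")
```

Checking for None before indexing into the result avoids a TypeError when the id is absent.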

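Using the `find_all()` method

The find_all() method accepts the same filter arguments. It returns a list of all elements with the given id. In a valid HTML document, an id value normally identifies a single element, so find() is usually preferable to find_all() when searching by id.

Example

from bs4 import BeautifulSoup

# Open the file in a with block so it is closed after parsing
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find_all() returns a list of every element whose id equals 'nm'
obj = soup.find_all(id='nm')
print(obj)

Output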

[<input id="nm" name="name" type="text"/>]

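Note that find_all() returns a list. It also accepts a limit parameter. Passing limit=1 makes find_all() stop at the first match, as find() does, but the result is still a list rather than a single element.

obj = soup.find_all(id='nm', limit=1)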
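
The difference in return types is easiest to see side by side. This is a small sketch under the assumption that the markup is parsed from an inline string rather than from index.html; the id "missing" is a hypothetical value used to show the no-match case.

```python
from bs4 import BeautifulSoup

html = '<form><input type="text" id="nm" name="name"></form>'
soup = BeautifulSoup(html, "html.parser")

as_list = soup.find_all(id="nm", limit=1)   # a list holding at most one tag
as_tag = soup.find(id="nm")                 # the tag itself, or None

# Both calls locate the same element in the parse tree
same_element = as_list[0] == as_tag

# With no match, find_all() gives an empty list while find() gives None
no_match_list = soup.find_all(id="missing", limit=1)
no_match_tag = soup.find(id="missing")
print(len(no_match_list), no_match_tag)
```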

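Using the `select()` method

The select() method of the BeautifulSoup class accepts a CSS selector as its argument. In CSS, the # symbol introduces an ID selector, so the string passed to select() is # followed by the required id value. Like find_all(), it returns a list of matching elements.

Example

from bs4 import BeautifulSoup

# Open the file in a with block so it is closed after parsing
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# "#nm" is the CSS ID selector for the element with id="nm"
obj = soup.select("#nm")
print(obj)

Output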

[<input id="nm" name="name" type="text"/>]
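
Because select() takes any CSS selector, the ID selector can be combined with other selector syntax. The sketch below illustrates a few variants on an inline copy of the form; the particular selectors are illustrative choices, not taken from the original example.

```python
from bs4 import BeautifulSoup

# Inline copy of the form from index.html, so the sketch runs without the file
html = """
<form>
   <input type="text" id="nm" name="name">
   <input type="text" id="age" name="age">
   <input type="text" id="marks" name="marks">
</form>
"""
soup = BeautifulSoup(html, "html.parser")

by_id = soup.select("#nm")                  # plain ID selector
by_tag_and_id = soup.select("input#age")    # tag name combined with ID
by_attr = soup.select('[id="marks"]')       # attribute selector on id

# A comma-separated selector list matches several ids in one call
names = [tag["name"] for tag in soup.select("#nm, #age")]
print(names)
```

Each call returns a list, so an individual tag is reached by indexing, for example by_id[0].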

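Using the `select_one()` method

Like find_all(), select() returns a list. The select_one() method returns only the first tag that matches the given selector.

Example

from bs4 import BeautifulSoup

# Open the file in a with block so it is closed after parsing
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# select_one() returns the first tag matching the selector
obj = soup.select_one("#nm")
print(obj)

Output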

<input id="nm" name="name" type="text"/>
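
When no element matches, select_one() returns None, so the result usually needs a guard. One possible way to wrap this is sketched below; attr_by_id is a hypothetical helper name introduced here for illustration, and it assumes the id is a simple CSS identifier (letters, digits, hyphens, underscores) that needs no escaping.

```python
from bs4 import BeautifulSoup


def attr_by_id(soup, element_id, attr):
    """Hypothetical helper: return one attribute of the element with the given id.

    Returns None when the element or the attribute is absent.
    Assumes element_id is a simple CSS identifier that needs no escaping.
    """
    tag = soup.select_one(f"#{element_id}")
    return tag.get(attr) if tag is not None else None


html = '<form><input type="text" id="nm" name="name"></form>'
soup = BeautifulSoup(html, "html.parser")

print(attr_by_id(soup, "nm", "name"))
print(attr_by_id(soup, "missing", "name"))   # "missing" is a hypothetical absent id
```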