Beautiful Soup find_all_previous() Method

1. Method Description

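In the Beautiful Soup library, find_all_previous() searches backward through the document from the current PageElement and collects every PageElement that matches the given criteria and appears before the current element. It returns the matches as a ResultSet, ordered from the nearest match to the farthest. With limit set to 1 it finds the same element as find_previous(), but still wraps it in a ResultSet, whereas find_previous() returns the element itself (or None if nothing matches).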
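The relationship between find_previous() and find_all_previous(limit=1) can be sketched as follows. The markup and variable names here are made up for illustration and are not part of index.html.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = "<div><p>one</p><p>two</p><span>three</span></div>"
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span")

# find_previous() returns a single element, or None if nothing matches.
single = span.find_previous("p")

# find_all_previous(limit=1) returns a ResultSet holding at most one element.
limited = span.find_all_previous("p", limit=1)

print(single)                 # <p>two</p>
print(limited)                # [<p>two</p>]
print(limited[0] is single)   # True

# With no match, one call gives None and the other an empty ResultSet.
missing_one = span.find_previous("table")
missing_all = span.find_all_previous("table", limit=1)
print(missing_one, len(missing_all))   # None 0
```

In practice, find_previous() is handy when exactly one element is wanted, while the ResultSet form avoids a None check when looping.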

2. Syntax

find_all_previous(name, attrs, string, limit, **kwargs)

3. Parameters

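name: A filter on tag names.
attrs: A dictionary of filters on attribute values.
string: A filter for NavigableString objects with specific text.
limit: Stop searching after this many results have been found.
kwargs: Additional filters on attribute values, passed as keyword arguments (for example, id="age").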
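The parameters above can be tried one at a time. This is a minimal sketch on a made-up form (the markup, ids, and variable names are assumptions for illustration only). Note that an HTML attribute literally called name cannot be passed as a keyword argument, because name is already the tag-name filter, so it goes through attrs instead.

```python
from bs4 import BeautifulSoup

# Hypothetical form markup, used only for illustration.
html = """
<form>
<label class="req">Name</label>
<input type="text" id="nm" name="name">
<label>Age</label>
<input type="text" id="age" name="age">
<input type="submit" id="go">
</form>
"""
soup = BeautifulSoup(html, "html.parser")
start = soup.find("input", id="go")

# name: only tags called <input>, nearest first
by_name = start.find_all_previous("input")

# attrs: dictionary of attribute filters
by_attrs = start.find_all_previous(attrs={"class": "req"})

# string: matches text nodes, so the results are NavigableString objects
by_string = start.find_all_previous(string="Age")

# limit: stop after the first (nearest) match
by_limit = start.find_all_previous("label", limit=1)

# kwargs: attribute filters given as keyword arguments
by_kwargs = start.find_all_previous(id="nm")

# The HTML attribute "name" has to be filtered through attrs
by_name_attr = start.find_all_previous(attrs={"name": "age"})

print([t["id"] for t in by_name])   # ['age', 'nm']
print(by_limit[0].get_text())       # Age
```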

4. Return Value

The find_all_previous() method returns a ResultSet containing Tag or NavigableString objects.
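Which of the two types ends up in the ResultSet depends on the filter. The sketch below, on made-up markup, shows a tag-name filter yielding Tag objects and string=True yielding NavigableString objects.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, ResultSet, Tag

# Hypothetical markup, used only for illustration.
html = "<p>first</p><p>second</p><b>end</b>"
soup = BeautifulSoup(html, "html.parser")
b = soup.find("b")

tags = b.find_all_previous("p")             # filter on tag name
strings = b.find_all_previous(string=True)  # match every preceding text node

print(type(tags).__name__)                  # ResultSet
print([t.get_text() for t in tags])         # ['second', 'first']
print([str(s) for s in strings])            # ['second', 'first']
```

A ResultSet behaves like a list, so indexing, len(), and iteration all work on it.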

5. Examples

Example 1

This example prints the name of every tag that appears before the first <input> tag.

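from bs4 import BeautifulSoup

# Parse the document; the with block closes the file afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

tag = soup.find('input')  # first <input> in the document
# Walk backward from that tag, nearest match first
for t in tag.find_all_previous():
    print(t.name)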

Output:

form
h1
body
title
head
html
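The contents of index.html are not shown here. To run the example without that file, the sketch below uses an assumed stand-in document whose structure is consistent with the output above. The titles and heading text are invented, and the real file may differ.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for index.html; the real file is not shown in this tutorial.
html = """<html>
<head>
<title>Sample Page</title>
</head>
<body>
<h1>Sample Page</h1>
<form>
<input type="text" id="nm" name="name">
<input type="text" id="age" name="age">
<input type="text" id="marks" name="marks">
</form>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input")

# Collect tag names instead of printing them one per line
names = [t.name for t in tag.find_all_previous()]
print(names)   # ['form', 'h1', 'body', 'title', 'head', 'html']
```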

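Example 2

The HTML document under consideration (index.html) contains three <input> elements. The following code prints the names of all tags that precede the <input> tag whose name attribute is marks. To tell the two preceding <input> tags apart, it also prints each tag's attrs property. Note that the other tags have no attributes.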

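from bs4 import BeautifulSoup

# Parse the document; the with block closes the file afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# "name" here is the HTML attribute, so it is passed in the attrs dictionary
tag = soup.find('input', {'name': 'marks'})
pretags = tag.find_all_previous()
for pretag in pretags:
    print(pretag.name, pretag.attrs)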

Output:

input {'type': 'text', 'id': 'age', 'name': 'age'}
input {'type': 'text', 'id': 'nm', 'name': 'name'}
form {}
h1 {}
body {}
title {}
head {}
html {}
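The output includes form, body, and html even though they enclose the starting tag: their opening tags come earlier in the document, so they count as previous elements. When enclosing tags are not wanted, one possible approach (a sketch on made-up markup, not the only way) is to drop anything found in tag.parents, or to use find_previous_siblings() when only tags at the same level matter.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = """<body>
<h1>Title</h1>
<form>
<input id="a">
<input id="b">
</form>
</body>"""
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("input", id="b")

# Every preceding tag, including the ancestors <form> and <body>
everything = tag.find_all_previous()

# Remove ancestors by identity so that only non-enclosing tags remain
ancestors = list(tag.parents)
non_ancestors = [t for t in everything if all(t is not p for p in ancestors)]

# Only preceding tags that share the same parent
siblings = tag.find_previous_siblings()

print([t.name for t in everything])      # ['input', 'form', 'h1', 'body']
print([t.name for t in non_ancestors])   # ['input', 'h1']
print([t.name for t in siblings])        # ['input']
```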

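Example 3

The BeautifulSoup object holds the tree structure of the whole document. Because it is the root of the tree, nothing precedes it, as the following example shows.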

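from bs4 import BeautifulSoup

# Parse the document; the with block closes the file afterwards
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# The root object has no previous elements, so the result is empty
tags = soup.find_all_previous()
print(tags)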

Output:

[]