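1. Method Description
In the BeautifulSoup library, the extract() method removes a tag or string from the document tree and returns the removed object. This is similar to how the pop() method works on a Python list.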
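The pop() analogy can be checked directly. The following is a minimal sketch, assuming a small inline HTML string chosen only for illustration; it shows that the object returned by extract() is the removed tag itself and that it is no longer attached to the tree.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>Hello Python</p></div>", "html.parser")

# extract() detaches the <p> tag from the tree and hands it back,
# much like list.pop() removes an item and returns it.
p = soup.find("p")
removed = p.extract()

print(removed is p)    # the returned object is the tag itself
print(removed.parent)  # the detached tag no longer has a parent
print(soup)            # what remains of the original tree
```

Because the removed tag is returned rather than destroyed, it can still be inspected or inserted somewhere else afterwards.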
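2. Syntax
extract()
3. Parameters
- None in normal use. Recent Beautiful Soup 4 releases define an optional internal argument, _self_index (default None), which gives the element's position in its parent's contents as a performance hint. It does not select which element is removed: extract() always removes the element it is called on.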
4. Return Type
The extract() method returns the element that was removed from the document tree.
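The type of the returned element depends on what was extracted. As a short sketch, assuming an illustrative one-line document, extracting a tag yields a Tag and extracting text yields a NavigableString:

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

soup = BeautifulSoup("<p>Hello <b>Python</b></p>", "html.parser")

# Extracting a tag returns a Tag object.
tag = soup.find("b").extract()

# After <b> is gone, <p> holds a single string; extracting it
# returns a NavigableString.
text = soup.p.string.extract()

print(type(tag).__name__, repr(str(tag)))
print(type(text).__name__, repr(str(text)))
print(soup)
```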
5. Examples
Example 1
html = '''
<div>
<p>Hello Python</p>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag1 = soup.find("div")
tag2 = tag1.find("p")
ret = tag2.extract()
print('Extracted:', ret)
print('Original:', soup)
Output:
Extracted: <p>Hello Python</p>
Original:
<div>

</div>
Example 2
Consider the following HTML markup, saved as index.html:
<html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
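The code:
from bs4 import BeautifulSoup

# Parse the file; the with block closes it afterwards
with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# find_all() with no arguments returns every tag in the document
tags = soup.find_all()
for tag in tags:
    obj = tag.extract()
    print("Extracted:", obj)
print(soup)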
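Output (the final print(soup) shows at most leftover whitespace, because every tag has been extracted):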
Extracted: <html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>
Extracted: <body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
Extracted: <p> The quick, brown fox jumps over a lazy dog.</p>
Extracted: <p> DJs flock by when MTV ax quiz prog.</p>
Extracted: <p> Junk MTV quiz graced by fox whelps.</p>
Extracted: <p> Bawds jog, flick quartz, vex nymphs.</p>
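Example 2 removes every tag. A more selective variant is sketched below, under two assumptions made only for illustration: the markup is given inline instead of being read from index.html, and the condition (paragraphs whose text mentions "MTV") is hypothetical.

```python
from bs4 import BeautifulSoup

html = """<html>
<body>
<p> The quick, brown fox jumps over a lazy dog.</p>
<p> DJs flock by when MTV ax quiz prog.</p>
<p> Junk MTV quiz graced by fox whelps.</p>
<p> Bawds jog, flick quartz, vex nymphs.</p>
</body>
</html>"""

soup = BeautifulSoup(html, "html.parser")

# find_all() builds its result list up front, so extracting tags
# while looping over it does not disturb the iteration.
removed = [p.extract() for p in soup.find_all("p") if "MTV" in p.get_text()]

for tag in removed:
    print("Extracted:", tag)
print("Remaining <p> tags:", len(soup.find_all("p")))
```

The extracted paragraphs are collected in a list, so they remain available after they have left the tree.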
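Example 3
You can also combine extract() with the find_next() and find_previous() methods and with the next_element and previous_element properties. In the code below, next_element of the first <b> tag is its text, "Hello", so that string is what gets extracted.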
html = '''
<div>
<p><b>Hello</b><b>Python</b></p>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
tag1 = soup.find("b")
ret = tag1.next_element.extract()
print('Extracted:', ret)
print('Original:', soup)
Output:
Extracted: Hello
Original:
<div>
<p><b></b><b>Python</b></p>
</div>
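Example 3 uses the next_element property. The same idea with the find_next() method might look like the sketch below, which assumes the same markup written on a single line; here the second <b> tag is located from the first and then extracted.

```python
from bs4 import BeautifulSoup

html = "<div><p><b>Hello</b><b>Python</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

first_b = soup.find("b")

# find_next("b") returns the first <b> tag that follows first_b
# in document order.
second_b = first_b.find_next("b")
ret = second_b.extract()

print("Extracted:", ret)
print("Original:", soup)
```

Unlike next_element, which returns whatever node comes next (here a string), find_next("b") skips ahead to the next matching tag.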