Beautiful Soup - Find All Comments - 优构网

from bs4 import BeautifulSoup

# The <b> tag contains only an HTML comment
markup = "<b><!--This is a comment text in HTML--></b>"
soup = BeautifulSoup(markup, 'html.parser')
comment = soup.b.string
print(comment, type(comment))

Output

This is a comment text in HTML <class 'bs4.element.Comment'>
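To make the type shown in that output more concrete: Comment is a subclass of NavigableString, so a comment sits in the parse tree next to ordinary text and is told apart only by its class. The following is a minimal sketch; the markup string and the variable names first and second are made up for illustration and are not part of the original example.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Illustrative markup (an assumption): one comment followed by plain text
markup = "<p><!--a hidden note-->visible text</p>"
soup = BeautifulSoup(markup, "html.parser")

# The <p> tag has two children: the comment and the text node
first, second = soup.p.contents

# A Comment is also a NavigableString ...
print(type(first).__name__, isinstance(first, NavigableString))
# ... but ordinary text is not a Comment
print(type(second).__name__, isinstance(second, Comment))
```

This is why an isinstance() check against Comment is enough to separate comments from the rest of the text in a document.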

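To find every comment in an HTML document, we use the find_all() method. Called without arguments, find_all() returns all the tags in the parsed HTML document. It also accepts a keyword argument, string, whose value may be a function that find_all() calls on each string in the document. We pass the iscomment() function itself as that value.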

comments = soup.find_all(string=iscomment)

The iscomment() function uses isinstance() to check whether a piece of text inside a tag is a Comment object.

from bs4 import Comment

# Return True only for strings that are HTML comments
def iscomment(elem):
    return isinstance(elem, Comment)
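To see how the predicate behaves on individual nodes, the sketch below applies iscomment() to every string in a small document and then passes the same function to find_all(). The one-line markup is an assumption chosen for illustration. The call find_all(string=True) is used here only to list all strings, comments included.

```python
from bs4 import BeautifulSoup, Comment

def iscomment(elem):
    return isinstance(elem, Comment)

# Illustrative markup (an assumption): one comment and one ordinary text node
soup = BeautifulSoup("<div><!--note--><p>text</p></div>", "html.parser")

# string=True matches every string in the tree, comments included
for node in soup.find_all(string=True):
    print(repr(str(node)), iscomment(node))

# Passing the function itself keeps only the strings for which it returns True
comments = soup.find_all(string=iscomment)
print(comments)
```

Because find_all() receives the function object rather than the result of calling it, the check runs once per string while the tree is searched.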

The comments variable will hold every comment found in the given HTML document. The example code uses the following index.html file:

<html>
   <head>
      <title>Yoagoa</title>
   </head>
   <body>
      <h2>Departmentwise Employees</h2>
      <ul id="dept">
         <li>Accounts</li>
         <ul id='acc'>
            <li>Anand</li>
            <li>Mahesh</li>
         </ul>
         <li>HR</li>
         <ul id="HR">
            <li>Rani</li>
            <li>Ankita</li>
         </ul>
      </ul>
   </body>
</html>

The following Python program parses the HTML document above and finds all the comments in it.

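from bs4 import BeautifulSoup, Comment

# Parse the file; the with block closes it afterwards
with open('index.html') as fp:
    soup = BeautifulSoup(fp, 'html.parser')

def iscomment(elem):
    return isinstance(elem, Comment)

# Collect every string in the document that is a Comment
comments = soup.find_all(string=iscomment)
print(comments)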
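The same search can be tried without a file on disk. The sketch below is a self-contained variant under stated assumptions: the inline html string and its three comment texts are hypothetical stand-ins, not the contents of the original index.html, and the lambda is simply an inline equivalent of iscomment(). It also pairs each comment with the name of its enclosing tag, which may help when deciding what a comment refers to.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: the comment texts are illustrative only
html = """
<html>
<head>
<!-- document title -->
<title>Yoagoa</title>
</head>
<body>
<!-- page heading -->
<h2>Departmentwise Employees</h2>
<ul id="dept">
<!-- department list -->
<li>Accounts</li>
<li>HR</li>
</ul>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Inline equivalent of the iscomment() predicate
comments = soup.find_all(string=lambda node: isinstance(node, Comment))

# Pair each comment's trimmed text with the name of the tag that contains it
found = [(c.strip(), c.parent.name) for c in comments]
for text, parent in found:
    print(f"{parent}: {text}")
```

Calling strip() removes the spaces that usually pad the text between the comment delimiters; the Comment objects in the result list are left unchanged.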