|

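Searching the document tree

1. find_all(name, attrs, recursive, text, limit, **kwargs)

1) The name parameter

The name parameter finds every Tag whose name matches name; string objects (text nodes) are ignored automatically.

a. Passing a string

The simplest filter is a string. Pass a string to a search method and Beautiful Soup returns a list of every tag whose name matches that string exactly.

#!/usr/bin/python3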
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
print(soup.find_all("b"))
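print(soup.find_all("a"))

Output:

[<b>The Dormouse's story</b>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

b. Passing a regular expression

If you pass a compiled regular expression, Beautiful Soup filters tag names with the pattern's search() method, so the pattern can match anywhere in the name unless it is anchored. The example below anchors the pattern with ^ to find every tag whose name starts with "b".

#!/usr/bin/python3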
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
import re
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)

Output:

body
b

c. Passing a list

If you pass a list, Beautiful Soup returns a list of every tag that matches any item in it.

#!/usr/bin/python3
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
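print(soup.find_all(['a', 'b']))

2) Keyword arguments

Any keyword argument that find_all() does not recognize as one of its own parameters is treated as a filter on the tag attribute of that name. The example below selects the tag whose id attribute is "link1".

#!/usr/bin/python3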
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
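print(soup.find_all(id="link1"))

Output:

[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

3) The text parameter

The text parameter searches the document's string content rather than its tags. Like name, it accepts a string, a regular expression, or a list. Beautiful Soup 4.4.0 and later call this parameter string; text is the older name.

#!/usr/bin/python3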
# -*- coding:utf-8 -*-
from bs4 import BeautifulSoup
import re
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
# Create the Beautiful Soup object, specifying the lxml parser
soup = BeautifulSoup(html, "lxml")
# String
print(soup.find_all(text=" Elsie "))
# List
print(soup.find_all(text=["Tillie", " Elsie ", "Lacie"]))
# Regular expression
print(soup.find_all(text=re.compile("Dormouse")))

Output:

[' Elsie ']
[' Elsie ', 'Lacie', 'Tillie']
["The Dormouse's story", "The Dormouse's story"] |
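
The text examples above all return strings, not tags. As a minimal sketch of how to get the enclosing tags instead, the filter can be combined with a tag name. This sketch makes three assumptions that are not in the original: it uses the newer string spelling of the parameter (Beautiful Soup 4.4.0+), it parses with the built-in html.parser so that it runs without lxml, and it trims the sample document to the three links. The variable names strings, tags, and ids are illustrative.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the sample document: only the three links are kept.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</p>
"""

# Assumption: html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")

# On its own, the string filter returns the matching text nodes.
strings = soup.find_all(string=re.compile("ie$"))

# Combined with a tag name, it returns the <a> tags whose .string matches.
tags = soup.find_all("a", string=re.compile("ie$"))
ids = [tag["id"] for tag in tags]

print(strings)
print(ids)
```

The comment " Elsie " ends with a space, so the pattern ie$ skips it and only the Lacie and Tillie links are selected.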