|
| | __init__ (self, Dict[str, Set[str]] multi_valued_attributes=USE_DEFAULT, Set[str] preserve_whitespace_tags=USE_DEFAULT, bool store_line_numbers=USE_DEFAULT, Dict[str, Type[NavigableString]] string_containers=USE_DEFAULT, Set[str] empty_element_tags=USE_DEFAULT, Type[AttributeDict] attribute_dict_class=AttributeDict, Type[AttributeValueList] attribute_value_list_class=AttributeValueList) |
| |
| None | initialize_soup (self, BeautifulSoup soup) |
| |
| None | reset (self) |
| |
| bool | can_be_empty_element (self, str tag_name) |
| |
| None | feed (self, _RawMarkup markup) |
| |
| Iterable[Tuple[_RawMarkup, Optional[_Encoding], Optional[_Encoding], bool]] | prepare_markup (self, _RawMarkup markup, Optional[_Encoding] user_specified_encoding=None, Optional[_Encoding] document_declared_encoding=None, Optional[_Encodings] exclude_encodings=None) |
| |
| str | test_fragment_to_document (self, str fragment) |
| |
| bool | set_up_substitutions (self, Tag tag) |
| |
|
|
Any | USE_DEFAULT = object() |
| |
|
str | NAME = "[Unknown tree builder]" |
| |
|
list | ALTERNATE_NAMES = [] |
| |
|
list | features = [] |
| |
|
bool | is_xml = False |
| |
|
bool | picklable = False |
| |
|
Optional | soup [BeautifulSoup] |
| |
|
Optional | empty_element_tags = None |
| |
|
Dict | cdata_list_attributes [str, Set[str]] |
| |
|
Set | preserve_whitespace_tags [str] |
| |
|
Dict | string_containers [str, Type[NavigableString]] |
| |
|
bool | tracks_line_numbers |
| |
|
Dict | DEFAULT_CDATA_LIST_ATTRIBUTES = defaultdict(set) |
| |
|
Set | DEFAULT_PRESERVE_WHITESPACE_TAGS = set() |
| |
|
dict | DEFAULT_STRING_CONTAINERS = {} |
| |
|
Optional | DEFAULT_EMPTY_ELEMENT_TAGS = None |
| |
|
bool | TRACKS_LINE_NUMBERS = False |
| |
Turn a textual document into a Beautiful Soup object tree.
This is an abstract superclass which smooths out the behavior of
different parser libraries into a single, unified interface.
:param multi_valued_attributes: If this is set to None, the
TreeBuilder will not turn any values for attributes like
'class' into lists. Setting this to a dictionary will
customize this behavior; look at :py:attr:`bs4.builder.HTMLTreeBuilder.DEFAULT_CDATA_LIST_ATTRIBUTES`
for an example.
Internally, these are called "CDATA list attributes", but that
probably doesn't make sense to an end-user, so the argument name
is ``multi_valued_attributes``.
:param preserve_whitespace_tags: A set of tags to treat
the way <pre> tags are treated in HTML. Tags in this set
are immune from pretty-printing; their contents will always be
output as-is.
:param string_containers: A dictionary mapping tag names to
the classes that should be instantiated to contain the textual
contents of those tags. The default is to use NavigableString
for every tag, no matter what the name. You can override the
default by changing :py:attr:`DEFAULT_STRING_CONTAINERS`.
:param store_line_numbers: If the parser keeps track of the line
numbers and positions of the original markup, that information
will, by default, be stored in each corresponding
:py:class:`bs4.element.Tag` object. You can turn this off by
passing store_line_numbers=False; then Tag.sourcepos and
Tag.sourceline will always be None. If the parser you're using
doesn't keep track of this information, then store_line_numbers
is irrelevant.
:param attribute_value_list_class: The value of a multi-valued attribute
(such as HTML's 'class') will be stored in an instance of this
class. The default is Beautiful Soup's built-in
`AttributeValueList`, which is a normal Python list, and you
will probably never need to change it.
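
The effect of ``multi_valued_attributes`` can be illustrated with a small standalone sketch. It does not call Beautiful Soup and is not the library's implementation: ``split_multi_valued`` is a hypothetical helper, and the mapping layout (tag name to attribute names, with ``"*"`` meaning "every tag") is an assumption modeled on ``DEFAULT_CDATA_LIST_ATTRIBUTES``.

```python
import re
from typing import Dict, List, Optional, Set, Union

# Hypothetical helper, not part of bs4: mimics how a tree builder might
# apply a multi_valued_attributes mapping to the attributes of one tag.
_NONWHITESPACE = re.compile(r"\S+")


def split_multi_valued(
    tag_name: str,
    attrs: Dict[str, str],
    multi_valued_attributes: Optional[Dict[str, Set[str]]],
) -> Dict[str, Union[str, List[str]]]:
    result: Dict[str, Union[str, List[str]]] = dict(attrs)
    if not multi_valued_attributes:
        # None (or an empty mapping) disables splitting entirely.
        return result
    # Assumption: "*" lists attributes that are multi-valued on every tag.
    universal = multi_valued_attributes.get("*", set())
    tag_specific = multi_valued_attributes.get(tag_name.lower(), set())
    for name, value in attrs.items():
        if name in universal or name in tag_specific:
            # Split on runs of whitespace, dropping empty pieces.
            result[name] = _NONWHITESPACE.findall(value)
    return result


# Illustrative mapping and attributes (made up for this example).
mapping = {"*": {"class"}, "a": {"rel"}}
attrs = {"class": "btn  primary", "rel": "nofollow noopener", "id": "x y"}

with_lists = split_multi_valued("a", attrs, mapping)
as_strings = split_multi_valued("a", attrs, None)
```

With the mapping, ``class`` and ``rel`` on an ``<a>`` tag become lists while ``id`` stays a string; with ``None``, every value is left as the original string.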
| Iterable[Tuple[_RawMarkup, Optional[_Encoding], Optional[_Encoding], bool]] | bs4.builder.TreeBuilder.prepare_markup (self, _RawMarkup markup, Optional[_Encoding] user_specified_encoding=None, Optional[_Encoding] document_declared_encoding=None, Optional[_Encodings] exclude_encodings=None) |
| |
Run any preliminary steps necessary to make incoming markup
acceptable to the parser.
:param markup: The markup that's about to be parsed.
:param user_specified_encoding: The user asked to try this encoding
to convert the markup into a Unicode string.
:param document_declared_encoding: The markup itself claims to be
in this encoding. NOTE: This argument is not used by the
calling code and can probably be removed.
:param exclude_encodings: The user asked *not* to try any of
these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding,
has undergone character replacement)
Each 4-tuple represents a strategy that the parser can try
to convert the document to Unicode and parse it. Each
strategy will be tried in turn.
By default, the only strategy is to parse the markup
as-is. See `LXMLTreeBuilderForXML` and
`HTMLParserTreeBuilder` for implementations that take into
account the quirks of particular parsers.
:meta private:
Reimplemented in bs4.builder._html5lib.HTML5TreeBuilder, bs4.builder._htmlparser.HTMLParserTreeBuilder, and bs4.builder._lxml.LXMLTreeBuilderForXML.
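
The default behavior described above (a single "parse the markup as-is" strategy) and the idea of trying each strategy in turn can be sketched as follows. This is a standalone illustration, not Beautiful Soup's code: ``prepare_markup_default``, ``parse_with_strategies``, the ``RawMarkup`` alias, and the use of ``ValueError`` to signal a rejected strategy are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, Union

RawMarkup = Union[str, bytes]  # stand-in for bs4's _RawMarkup type
# (markup, encoding, declared encoding, has undergone character replacement)
Strategy = Tuple[RawMarkup, Optional[str], Optional[str], bool]


def prepare_markup_default(
    markup: RawMarkup,
    user_specified_encoding: Optional[str] = None,
    document_declared_encoding: Optional[str] = None,
    exclude_encodings: Optional[Iterable[str]] = None,
) -> Iterator[Strategy]:
    # Only one strategy: hand the markup to the parser unchanged.
    yield (markup, None, None, False)


def parse_with_strategies(
    strategies: Iterable[Strategy],
    parse: Callable[[RawMarkup], object],
) -> object:
    # Hypothetical consumer: try each strategy until one parses cleanly.
    last_error: Optional[Exception] = None
    for markup, _encoding, _declared, _replaced in strategies:
        try:
            return parse(markup)
        except ValueError as exc:  # assumed "parser rejected markup" signal
            last_error = exc
    raise ValueError("no strategy succeeded") from last_error


strategies = list(prepare_markup_default("<p>hi</p>"))
result = parse_with_strategies(strategies, str.upper)
```

A subclass that knows about its parser's quirks would yield several tuples instead of one, for example one per candidate encoding, and the consuming loop would stay the same.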