|
| Iterable[Tuple[_RawMarkup, Optional[_Encoding], Optional[_Encoding], bool]] | prepare_markup (self, _RawMarkup markup, Optional[_Encoding] user_specified_encoding=None, Optional[_Encoding] document_declared_encoding=None, Optional[_Encodings] exclude_encodings=None) |
| |
| None | feed (self, _RawMarkup markup) |
| |
| "TreeBuilderForHtml5lib" | create_treebuilder (self, bool namespaceHTMLElements) |
| |
| str | test_fragment_to_document (self, str fragment) |
| |
| bool | set_up_substitutions (self, Tag tag) |
| |
| | __init__ (self, Dict[str, Set[str]] multi_valued_attributes=USE_DEFAULT, Set[str] preserve_whitespace_tags=USE_DEFAULT, bool store_line_numbers=USE_DEFAULT, Dict[str, Type[NavigableString]] string_containers=USE_DEFAULT, Set[str] empty_element_tags=USE_DEFAULT, Type[AttributeDict] attribute_dict_class=AttributeDict, Type[AttributeValueList] attribute_value_list_class=AttributeValueList) |
| |
| None | initialize_soup (self, BeautifulSoup soup) |
| |
| None | reset (self) |
| |
| bool | can_be_empty_element (self, str tag_name) |
| |
|
|
str | NAME = "html5lib" |
| |
|
list | features = [NAME, PERMISSIVE, HTML_5, HTML] |
| |
|
bool | TRACKS_LINE_NUMBERS = True |
| |
|
"TreeBuilderForHtml5lib" | underlying_builder |
| |
|
Optional[_Encoding] | user_specified_encoding |
| |
| Optional | DEFAULT_EMPTY_ELEMENT_TAGS |
| |
|
Set | DEFAULT_BLOCK_ELEMENTS |
| |
| dict | DEFAULT_STRING_CONTAINERS |
| |
| dict | DEFAULT_CDATA_LIST_ATTRIBUTES |
| |
|
set | DEFAULT_PRESERVE_WHITESPACE_TAGS = set(["pre", "textarea"]) |
| |
|
Any | USE_DEFAULT = object() |
| |
|
str | NAME = "[Unknown tree builder]" |
| |
|
list | ALTERNATE_NAMES = [] |
| |
|
list | features = [] |
| |
|
bool | is_xml = False |
| |
|
bool | picklable = False |
| |
|
Optional[BeautifulSoup] | soup |
| |
|
Optional | empty_element_tags = None |
| |
|
Dict[str, Set[str]] | cdata_list_attributes |
| |
|
Set[str] | preserve_whitespace_tags |
| |
|
Dict[str, Type[NavigableString]] | string_containers |
| |
|
bool | tracks_line_numbers |
| |
|
Dict | DEFAULT_CDATA_LIST_ATTRIBUTES = defaultdict(set) |
| |
|
Set | DEFAULT_PRESERVE_WHITESPACE_TAGS = set() |
| |
|
dict | DEFAULT_STRING_CONTAINERS = {} |
| |
|
Optional | DEFAULT_EMPTY_ELEMENT_TAGS = None |
| |
|
bool | TRACKS_LINE_NUMBERS = False |
| |
Use `html5lib <https://github.com/html5lib/html5lib-python>`_ to
build a tree.
Note that `HTML5TreeBuilder` does not support some common HTML
`TreeBuilder` features. Some of these could in theory be
implemented, but doing so would be quite difficult, because
html5lib rearranges the parse tree while it is being built.
Specifically:
* This `TreeBuilder` doesn't use different subclasses of
`NavigableString` (e.g. `Script`) based on the name of the tag
in which the string was found.
* You can't use a `SoupStrainer` to parse only part of a document.
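The first limitation above can be observed directly. The following is a minimal sketch, not part of the library: `script_string_class` is a hypothetical helper name, and the guards are there only because Beautiful Soup and html5lib are separate, optional installs. It reports which class is used for the text inside a `<script>` tag under a given parser.

```python
from typing import Optional

try:
    from bs4 import BeautifulSoup, FeatureNotFound
except ImportError:  # Beautiful Soup itself is not installed
    BeautifulSoup = None


def script_string_class(parser: str) -> Optional[str]:
    """Return the class name of the string found inside <script>.

    Returns None when Beautiful Soup or the requested parser is
    unavailable. (Hypothetical helper, for illustration only.)
    """
    if BeautifulSoup is None:
        return None
    try:
        soup = BeautifulSoup("<script>var x = 1;</script>", parser)
    except FeatureNotFound:
        # Raised when no installed tree builder matches the request.
        return None
    return type(soup.script.string).__name__


# Per the description above, html5lib should yield a plain NavigableString.
print(script_string_class("html5lib"))
# Other builders may report a more specific subclass, depending on version.
print(script_string_class("html.parser"))
```

Comparing the two printed names is a quick way to check whether code that relies on string subclasses such as `Script` will behave differently after switching parsers.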
| Iterable[Tuple[_RawMarkup, Optional[_Encoding], Optional[_Encoding], bool]] | bs4.builder._html5lib.HTML5TreeBuilder.prepare_markup (self, _RawMarkup markup, Optional[_Encoding] user_specified_encoding=None, Optional[_Encoding] document_declared_encoding=None, Optional[_Encodings] exclude_encodings=None) |
| |
Run any preliminary steps necessary to make incoming markup
acceptable to the parser.
:param markup: The markup that's about to be parsed.
:param user_specified_encoding: The user asked to try this encoding
to convert the markup into a Unicode string.
:param document_declared_encoding: The markup itself claims to be
in this encoding. NOTE: This argument is not used by the
calling code and can probably be removed.
:param exclude_encodings: The user asked *not* to try any of
these encodings.
:yield: A series of 4-tuples: (markup, encoding, declared encoding,
has undergone character replacement)
Each 4-tuple represents a strategy that the parser can try
to convert the document to Unicode and parse it. Each
strategy will be tried in turn.
By default, the only strategy is to parse the markup
as-is. See `LXMLTreeBuilderForXML` and
`HTMLParserTreeBuilder` for implementations that take into
account the quirks of particular parsers.
:meta private:
Reimplemented from bs4.builder.TreeBuilder.
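The "series of strategies" contract described above can be illustrated without Beautiful Soup at all. This is a hedged sketch using only the standard library: `RawMarkup`, `Strategy`, `as_is_strategy`, and `first_successful_parse` are hypothetical names, and `ValueError` stands in for whatever signal a real parser uses to reject markup. It is not the library's own calling code.

```python
from typing import Callable, Iterator, Optional, Tuple, Union

RawMarkup = Union[str, bytes]  # hypothetical stand-in for _RawMarkup
# (markup, encoding, declared encoding, has undergone character replacement)
Strategy = Tuple[RawMarkup, Optional[str], Optional[str], bool]


def as_is_strategy(markup: RawMarkup) -> Iterator[Strategy]:
    """Mimic the documented default: one strategy, markup untouched."""
    yield (markup, None, None, False)


def first_successful_parse(
    strategies: Iterator[Strategy],
    parse: Callable[[RawMarkup], object],
) -> object:
    """Try each strategy in turn; return the first parse that succeeds."""
    for markup, _encoding, _declared_encoding, _replaced in strategies:
        try:
            return parse(markup)
        except ValueError:  # assumed "parser rejected this markup" signal
            continue
    raise ValueError("every strategy was rejected")


# A toy "parser" that just measures the markup it is handed.
result = first_successful_parse(as_is_strategy("<p>hi</p>"), len)
print(result)
```

Because the generator is consumed lazily, a builder that yields several strategies only pays for the later ones if the earlier ones are rejected.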