Qucs-S S-parameter Viewer & RF Synthesis Tools
bs4.builder.TreeBuilder Class Reference

Public Member Functions

 __init__ (self, Dict[str, Set[str]] multi_valued_attributes=USE_DEFAULT, Set[str] preserve_whitespace_tags=USE_DEFAULT, bool store_line_numbers=USE_DEFAULT, Dict[str, Type[NavigableString]] string_containers=USE_DEFAULT, Set[str] empty_element_tags=USE_DEFAULT, Type[AttributeDict] attribute_dict_class=AttributeDict, Type[AttributeValueList] attribute_value_list_class=AttributeValueList)
 
None initialize_soup (self, BeautifulSoup soup)
 
None reset (self)
 
bool can_be_empty_element (self, str tag_name)
 
None feed (self, _RawMarkup markup)
 
Iterable[Tuple[_RawMarkup, Optional[_Encoding], Optional[_Encoding], bool]] prepare_markup (self, _RawMarkup markup, Optional[_Encoding] user_specified_encoding=None, Optional[_Encoding] document_declared_encoding=None, Optional[_Encodings] exclude_encodings=None)
 
str test_fragment_to_document (self, str fragment)
 
bool set_up_substitutions (self, Tag tag)
 

Public Attributes

 soup
 
 cdata_list_attributes
 
 preserve_whitespace_tags
 
 empty_element_tags
 
 store_line_numbers
 
 string_containers
 
 attribute_dict_class
 
 attribute_value_list_class
 

Static Public Attributes

Any USE_DEFAULT = object()
 
str NAME = "[Unknown tree builder]"
 
list ALTERNATE_NAMES = []
 
list features = []
 
bool is_xml = False
 
bool picklable = False
 
Optional[BeautifulSoup] soup
 
Optional empty_element_tags = None
 
Dict[str, Set[str]] cdata_list_attributes
 
Set[str] preserve_whitespace_tags
 
Dict[str, Type[NavigableString]] string_containers
 
bool tracks_line_numbers
 
Dict DEFAULT_CDATA_LIST_ATTRIBUTES = defaultdict(set)
 
Set DEFAULT_PRESERVE_WHITESPACE_TAGS = set()
 
dict DEFAULT_STRING_CONTAINERS = {}
 
Optional DEFAULT_EMPTY_ELEMENT_TAGS = None
 
bool TRACKS_LINE_NUMBERS = False
 

Protected Member Functions

_AttributeValues _replace_cdata_list_attribute_values (self, str tag_name, _RawOrProcessedAttributeValues attrs)
 

Detailed Description

Turn a textual document into a Beautiful Soup object tree.

This is an abstract superclass which smooths out the behavior of
different parser libraries into a single, unified interface.

:param multi_valued_attributes: If this is set to None, the
 TreeBuilder will not turn any values for attributes like
 'class' into lists. Setting this to a dictionary will
 customize this behavior; look at :py:attr:`bs4.builder.HTMLTreeBuilder.DEFAULT_CDATA_LIST_ATTRIBUTES`
 for an example.

 Internally, these are called "CDATA list attributes", but that
 probably doesn't make sense to an end-user, so the argument name
 is ``multi_valued_attributes``.

:param preserve_whitespace_tags: A set of tags to treat
 the way <pre> tags are treated in HTML. Tags in this set
 are immune from pretty-printing; their contents will always be
 output as-is.

:param string_containers: A dictionary mapping tag names to
 the classes that should be instantiated to contain the textual
 contents of those tags. The default is to use NavigableString
 for every tag, no matter what the name. You can override the
 default by changing :py:attr:`DEFAULT_STRING_CONTAINERS`.

:param store_line_numbers: If the parser keeps track of the line
 numbers and positions of the original markup, that information
 will, by default, be stored in each corresponding
 :py:class:`bs4.element.Tag` object. You can turn this off by
 passing store_line_numbers=False; then Tag.sourcepos and
 Tag.sourceline will always be None. If the parser you're using
 doesn't keep track of this information, then store_line_numbers
 is irrelevant.

:param attribute_value_list_class: The value of a multi-valued attribute
  (such as HTML's 'class') will be stored in an instance of this
  class.  The default is Beautiful Soup's built-in
  `AttributeValueList`, which is a normal Python list, and you
  will probably never need to change it.
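
The constructor parameters above lean on the `USE_DEFAULT` sentinel, because `None` is itself a meaningful value for `multi_valued_attributes`. The following is a minimal sketch of that pattern together with the `string_containers` lookup. `SketchBuilder`, `PlainString`, `ScriptString` and `container_for` are hypothetical names introduced for illustration, and the default mappings are placeholders; this is not the library's implementation.

```python
from typing import Any, Dict, Optional, Set, Type


class PlainString(str):
    """Hypothetical stand-in for NavigableString."""


class ScriptString(PlainString):
    """Hypothetical container for the text inside <script> tags."""


class SketchBuilder:
    """Hypothetical stand-in for TreeBuilder; not the library's implementation."""

    # A unique object that no caller can pass by accident.
    USE_DEFAULT: Any = object()

    # Illustrative defaults only.
    DEFAULT_CDATA_LIST_ATTRIBUTES: Dict[str, Set[str]] = {"*": {"class"}}
    DEFAULT_STRING_CONTAINERS: Dict[str, Type[PlainString]] = {}

    def __init__(
        self,
        multi_valued_attributes: Any = USE_DEFAULT,
        string_containers: Any = USE_DEFAULT,
    ) -> None:
        if multi_valued_attributes is self.USE_DEFAULT:
            # Argument omitted: fall back to the class-level default.
            multi_valued_attributes = self.DEFAULT_CDATA_LIST_ATTRIBUTES
        # An explicit None is kept and means "never split attribute values".
        self.cdata_list_attributes: Optional[Dict[str, Set[str]]] = multi_valued_attributes

        if string_containers is self.USE_DEFAULT:
            string_containers = self.DEFAULT_STRING_CONTAINERS
        self.string_containers: Dict[str, Type[PlainString]] = string_containers

    def container_for(self, tag_name: str) -> Type[PlainString]:
        # Tags without an entry use the general-purpose string class.
        return self.string_containers.get(tag_name, PlainString)


default_builder = SketchBuilder()
no_lists_builder = SketchBuilder(multi_valued_attributes=None)
script_builder = SketchBuilder(string_containers={"script": ScriptString})
```

Comparing against the sentinel with `is` lets the sketch tell "argument omitted" apart from "argument explicitly set to None", which a plain `None` default could not do.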

Constructor & Destructor Documentation

◆ __init__()

bs4.builder.TreeBuilder.__init__ (   self,
Dict[str, Set[str]]   multi_valued_attributes = USE_DEFAULT,
Set[str]   preserve_whitespace_tags = USE_DEFAULT,
bool   store_line_numbers = USE_DEFAULT,
Dict[str, Type[NavigableString]]   string_containers = USE_DEFAULT,
Set[str]   empty_element_tags = USE_DEFAULT,
Type[AttributeDict]   attribute_dict_class = AttributeDict,
Type[AttributeValueList]   attribute_value_list_class = AttributeValueList 
)

Member Function Documentation

◆ _replace_cdata_list_attribute_values()

_AttributeValues bs4.builder.TreeBuilder._replace_cdata_list_attribute_values (   self,
str  tag_name,
_RawOrProcessedAttributeValues   attrs 
)
protected
When an attribute value is associated with a tag that can
have multiple values for that attribute, convert the string
value to a list of strings.

Basically, replaces class="foo bar" with class=["foo", "bar"]

NOTE: This method modifies its input in place.

:param tag_name: The name of a tag.
:param attrs: A dictionary containing the tag's attributes.
   Any appropriate attribute values will be modified in place.
:return: The modified dictionary that was originally passed in.
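
As a rough illustration of the conversion described above, here is a standalone sketch. The function name `split_multi_valued`, the explicit `cdata_list_attributes` argument, and the use of a `"*"` key for attributes that apply to every tag are assumptions made for this example, not a copy of the method's actual code.

```python
import re
from typing import Dict, List, Set, Union

AttributeValue = Union[str, List[str]]


def split_multi_valued(
    tag_name: str,
    attrs: Dict[str, AttributeValue],
    cdata_list_attributes: Dict[str, Set[str]],
) -> Dict[str, AttributeValue]:
    """Split whitespace-separated values of multi-valued attributes, in place."""
    # Assumption: "*" lists attributes that are multi-valued on every tag.
    universal = cdata_list_attributes.get("*", set())
    tag_specific = cdata_list_attributes.get(tag_name.lower(), set())
    for name, value in list(attrs.items()):
        if name in universal or name in tag_specific:
            if isinstance(value, str):
                # "foo bar" becomes ["foo", "bar"]; other attributes are untouched.
                attrs[name] = re.findall(r"\S+", value)
    return attrs


attrs = {"class": "foo bar", "id": "main"}
result = split_multi_valued("p", attrs, {"*": {"class"}})
```

Because the dictionary is modified in place, `result` and `attrs` refer to the same object after the call.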

◆ can_be_empty_element()

bool bs4.builder.TreeBuilder.can_be_empty_element (   self,
str  tag_name 
)
Might a tag with this name be an empty-element tag?

The final markup may or may not actually present this tag as
self-closing.

For instance: an HTMLTreeBuilder does not consider a <p> tag to be
an empty-element tag (it's not in
HTMLTreeBuilder.empty_element_tags). This means an empty <p> tag
will be presented as "<p></p>", not "<p/>" or "<p>".

The default implementation has no opinion about which tags are
empty-element tags, so a tag will be presented as an
empty-element tag if and only if it has no children.
"<foo></foo>" will become "<foo/>", and "<foo>bar</foo>" will
be left alone.

:param tag_name: The name of a markup tag.
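
The rule described above can be sketched as a standalone function plus a toy serializer. The `render` helper and the `html_like` set are hypothetical and only illustrate how the answer might affect output; they are not part of Beautiful Soup.

```python
from typing import Optional, Set


def can_be_empty_element(tag_name: str, empty_element_tags: Optional[Set[str]]) -> bool:
    # None means "no opinion": any tag may be an empty-element tag.
    if empty_element_tags is None:
        return True
    return tag_name in empty_element_tags


def render(tag_name: str, contents: str, empty_element_tags: Optional[Set[str]]) -> str:
    """Hypothetical serializer: self-close only childless tags allowed to be empty."""
    if not contents and can_be_empty_element(tag_name, empty_element_tags):
        return f"<{tag_name}/>"
    return f"<{tag_name}>{contents}</{tag_name}>"


# Illustrative subset, not the library's full list of HTML void elements.
html_like = {"br", "hr", "img"}

no_opinion_empty = render("foo", "", None)
no_opinion_full = render("foo", "bar", None)
html_paragraph = render("p", "", html_like)
html_break = render("br", "", html_like)
```

With no opinion (`None`), an empty `foo` tag is self-closed; with an HTML-like set, an empty `p` keeps its explicit closing tag while `br` is self-closed.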

◆ feed()

None bs4.builder.TreeBuilder.feed (   self,
_RawMarkup  markup 
)

◆ initialize_soup()

None bs4.builder.TreeBuilder.initialize_soup (   self,
BeautifulSoup  soup 
)
The BeautifulSoup object has been initialized and is now
being associated with the TreeBuilder.

:param soup: A BeautifulSoup object.

Reimplemented in bs4.builder._lxml.LXMLTreeBuilderForXML.

◆ prepare_markup()

Iterable[Tuple[_RawMarkup, Optional[_Encoding], Optional[_Encoding], bool]] bs4.builder.TreeBuilder.prepare_markup (   self,
_RawMarkup  markup,
Optional[_Encoding]   user_specified_encoding = None,
Optional[_Encoding]   document_declared_encoding = None,
Optional[_Encodings]   exclude_encodings = None 
)
Run any preliminary steps necessary to make incoming markup
acceptable to the parser.

:param markup: The markup that's about to be parsed.
:param user_specified_encoding: The user asked to try this encoding
   to convert the markup into a Unicode string.
:param document_declared_encoding: The markup itself claims to be
    in this encoding. NOTE: This argument is not used by the
    calling code and can probably be removed.
:param exclude_encodings: The user asked *not* to try any of
    these encodings.

:yield: A series of 4-tuples: (markup, encoding, declared encoding,
    has undergone character replacement)

    Each 4-tuple represents a strategy that the parser can try
    to convert the document to Unicode and parse it. Each
    strategy will be tried in turn.

 By default, the only strategy is to parse the markup
 as-is. See `LXMLTreeBuilderForXML` and
 `HTMLParserTreeBuilder` for implementations that take into
 account the quirks of particular parsers.

:meta private:

Reimplemented in bs4.builder._html5lib.HTML5TreeBuilder, bs4.builder._htmlparser.HTMLParserTreeBuilder, and bs4.builder._lxml.LXMLTreeBuilderForXML.
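
To make the strategy idea concrete, the sketch below shows a generator that yields the single as-is strategy, a hypothetical encoding-aware variant, and a hypothetical caller that tries each strategy in turn. All function names and the candidate encoding list are assumptions chosen for illustration; the real overrides in the subclasses listed above may differ considerably.

```python
from typing import Iterable, Iterator, Optional, Tuple, Union

Markup = Union[str, bytes]
# (markup, encoding, declared encoding, has undergone character replacement)
Strategy = Tuple[Markup, Optional[str], Optional[str], bool]


def prepare_markup_default(markup: Markup) -> Iterator[Strategy]:
    # The only strategy: hand the markup to the parser untouched.
    yield markup, None, None, False


def prepare_markup_with_encodings(
    markup: bytes,
    user_specified_encoding: Optional[str] = None,
    exclude_encodings: Iterable[str] = (),
) -> Iterator[Strategy]:
    """Hypothetical override: propose one strategy per candidate encoding."""
    skip = {name.lower() for name in exclude_encodings}
    # The fallback encodings here are illustrative, not the library's list.
    for encoding in (user_specified_encoding, "utf-8", "windows-1252"):
        if encoding is None or encoding.lower() in skip:
            continue
        skip.add(encoding.lower())  # never propose the same encoding twice
        yield markup, encoding, None, False


def first_decodable(strategies: Iterable[Strategy]) -> Optional[str]:
    """Hypothetical caller: try each strategy in turn until one decodes."""
    for markup, encoding, _declared, _replaced in strategies:
        if isinstance(markup, str):
            return markup
        try:
            return markup.decode(encoding or "utf-8")
        except (UnicodeDecodeError, LookupError):
            continue  # this strategy failed; move on to the next one
    return None


raw = "caf\u00e9".encode("windows-1252")  # a lone 0xE9 byte is not valid UTF-8
text = first_decodable(prepare_markup_with_encodings(raw))
```

Yielding strategies lazily means later, more expensive guesses are only computed when the earlier ones have failed.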

◆ reset()

None bs4.builder.TreeBuilder.reset (   self)
Do any work necessary to reset the underlying parser
for a new document.

By default, this does nothing.

◆ set_up_substitutions()

bool bs4.builder.TreeBuilder.set_up_substitutions (   self,
Tag  tag 
)
Set up any substitutions that will need to be performed on
a `Tag` when it's output as a string.

By default, this does nothing. See `HTMLTreeBuilder` for a
case where this is used.

:return: Whether or not a substitution was performed.
:meta private:

Reimplemented in bs4.builder.HTMLTreeBuilder.

◆ test_fragment_to_document()

str bs4.builder.TreeBuilder.test_fragment_to_document (   self,
str  fragment 
)
Wrap an HTML fragment to make it look like a document.

Different parsers do this differently. For instance, lxml
introduces an empty <head> tag, and html5lib
doesn't. Abstracting this away lets us write simple tests
which run HTML fragments through the parser and compare the
results against other HTML fragments.

This method should not be used outside of unit tests.

:param fragment: A fragment of HTML.
:return: A full HTML document.
:meta private:

Reimplemented in bs4.builder._html5lib.HTML5TreeBuilder, bs4.builder._lxml.LXMLTreeBuilderForXML, and bs4.builder._lxml.LXMLTreeBuilder.
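
The idea of wrapping a fragment so tests can compare like with like can be sketched as follows. The three wrapper functions are hypothetical and are not tied to any particular parser; which real tree builder produces which shape is not asserted here.

```python
from typing import Callable


def wrap_as_is(fragment: str) -> str:
    # For a parser assumed to add no extra structure around the fragment.
    return fragment


def wrap_with_head(fragment: str) -> str:
    # For a parser assumed to add an empty <head> to every document.
    return f"<html><head></head><body>{fragment}</body></html>"


def wrap_without_head(fragment: str) -> str:
    # For a parser assumed to add only <html> and <body>.
    return f"<html><body>{fragment}</body></html>"


def expected_document(fragment: str, wrapper: Callable[[str], str]) -> str:
    """What a unit test would compare the parser's serialized output against."""
    return wrapper(fragment)


fragment = "<p>Hello</p>"
with_head = expected_document(fragment, wrap_with_head)
without_head = expected_document(fragment, wrap_without_head)
```

A test written against `fragment` alone can then run unchanged for each parser, with only the wrapper varying.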


The documentation for this class was generated from the following file: