| | __init__ (self, Optional[str] language=None, Optional[_EntitySubstitutionFunction] entity_substitution=None, str void_element_close_prefix="/", Optional[Set[str]] cdata_containing_tags=None, bool empty_attributes_are_booleans=False, Union[int, str] indent=1) |
| str | substitute (self, str ns) |
| str | attribute_value (self, str value) |
| Iterable[Tuple[str, Optional[_AttributeValue]]] | attributes (self, bs4.element.Tag tag) |
| str | quoted_attribute_value (cls, str value) |
| str | substitute_xml (cls, str value, bool make_quoted_attribute=False) |
| str | substitute_xml_containing_entities (cls, str value, bool make_quoted_attribute=False) |
| str | substitute_html (cls, str s) |
| str | substitute_html5 (cls, str s) |
| str | substitute_html5_raw (cls, str s) |
| str | HTML = "html" |
| str | XML = "xml" |
| Dict | HTML_DEFAULTS |
| Optional[str] | language |
| Optional[_EntitySubstitutionFunction] | entity_substitution |
| str | void_element_close_prefix |
| Set[str] | cdata_containing_tags |
| str | indent |
| bool | empty_attributes_are_booleans |
| Dict[str, str] | HTML_ENTITY_TO_CHARACTER |
| Dict[str, str] | CHARACTER_TO_HTML_ENTITY |
| Pattern[str] | CHARACTER_TO_HTML_ENTITY_RE |
| Pattern[str] | CHARACTER_TO_HTML_ENTITY_WITH_AMPERSAND_RE |
| dict | CHARACTER_TO_XML_ENTITY |
| | ANY_ENTITY_RE = re.compile("&(#\\d+|#x[0-9a-fA-F]+|\\w+);", re.I) |
| Pattern | BARE_AMPERSAND_OR_BRACKET |
| Pattern | AMPERSAND_OR_BRACKET = re.compile("([<>&])") |

Describes a strategy to use when outputting a parse tree to a string.

Some parts of this strategy come from the distinction between
HTML4, HTML5, and XML. Others are configurable by the user.

Formatters are passed in as the `formatter` argument to methods
like `bs4.element.Tag.encode`. Most people won't need to
think about formatters, and most people who need to think about
them can pass in one of these predefined strings as `formatter`
rather than making a new Formatter object:

For HTML documents:
 * 'html' - HTML entity substitution for generic HTML documents. (default)
 * 'html5' - HTML entity substitution for HTML5 documents, as
   well as some optimizations in the way tags are rendered.
 * 'html5-4.12.0' - The version of the 'html5' formatter used prior to
   Beautiful Soup 4.13.0.
 * 'minimal' - Only make the substitutions necessary to guarantee
   valid HTML.
 * None - Do not perform any substitution. This will be faster
   but may result in invalid markup.

For XML documents:
 * 'html' - Entity substitution for XHTML documents.
 * 'minimal' - Only make the substitutions necessary to guarantee
   valid XML. (default)
 * None - Do not perform any substitution. This will be faster
   but may result in invalid markup.
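
As a rough illustration of how the predefined strings differ, the sketch below renders the same tag with three of them. It assumes Beautiful Soup 4 is installed and uses the standard-library `html.parser`; the sample markup and variable names are invented for the example.

```python
from bs4 import BeautifulSoup

# Sample markup (invented for this sketch): one non-ASCII character
# plus the three characters that are special in markup.
soup = BeautifulSoup("<p>Caf\u00e9 &amp; bar &lt;1&gt;</p>", "html.parser")

# 'minimal' escapes only the characters needed for valid markup.
minimal = soup.p.decode(formatter="minimal")

# 'html' also converts characters to named HTML entities where one exists.
named = soup.p.decode(formatter="html")

# None performs no substitution, so the result may not be valid markup.
raw = soup.p.decode(formatter=None)

for label, text in (("minimal", minimal), ("html", named), ("None", raw)):
    print(f"{label:8} {text}")
```

Passing the formatter explicitly, as here, keeps the comparison independent of whichever default a given method uses.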
bs4.formatter.Formatter.__init__ (
    self,
    Optional[str] language = None,
    Optional[_EntitySubstitutionFunction] entity_substitution = None,
    str void_element_close_prefix = "/",
    Optional[Set[str]] cdata_containing_tags = None,
    bool empty_attributes_are_booleans = False,
    Union[int, str] indent = 1
)

Constructor.

:param language: This should be `Formatter.XML` if you are formatting
    XML markup and `Formatter.HTML` if you are formatting HTML markup.
:param entity_substitution: A function to call to replace special
    characters with XML/HTML entities. For examples, see
    `bs4.dammit.EntitySubstitution.substitute_html` and `substitute_xml`.
:param void_element_close_prefix: By default, void elements
    are represented as <tag/> (XML rules) rather than <tag>
    (HTML rules). To get <tag>, pass in the empty string.
:param cdata_containing_tags: The set of tags that are defined
    as containing CDATA in this dialect. For example, in HTML,
    <script> and <style> tags are defined as containing CDATA,
    and their contents should not be formatted.
:param empty_attributes_are_booleans: If this is set to true,
    then attributes whose values are set to the empty string
    will be treated as `HTML boolean
    attributes <https://dev.w3.org/html5/spec-LC/common-microsyntaxes.html#boolean-attributes>`_. (Attributes
    whose value is None are always rendered this way.)
:param indent: If indent is a non-negative integer or string,
    then the contents of elements will be indented
    appropriately when pretty-printing. An indent of 0, a
    negative number, or "" will only insert newlines. A
    positive integer indents that many spaces per
    level. If indent is a string (such as "\t"), that string
    is used to indent each level. The default behavior is to
    indent one space per level.
Reimplemented in bs4.formatter.HTMLFormatter and bs4.formatter.XMLFormatter.
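
In practice a custom `entity_substitution` function is usually supplied through one of these subclasses. The sketch below assumes Beautiful Soup 4 with `html.parser`; `shout` is a hypothetical substitution function written only for this example, and it deliberately does no real entity escaping.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


def shout(text: str) -> str:
    """Hypothetical substitution function: upper-case every string."""
    return text.upper()


soup = BeautifulSoup('<p title="note">hello <b>world</b></p>', "html.parser")

# The function is applied to text nodes and attribute values on output;
# tag and attribute names are left alone.
loud = HTMLFormatter(entity_substitution=shout)
result = soup.p.decode(formatter=loud)

print(result)
```

A real substitution function would normally escape special characters, for example by delegating to `EntitySubstitution.substitute_html` or `substitute_xml` before or after its own processing.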