Qucs-S S-parameter Viewer & RF Synthesis Tools
bs4.dammit.EncodingDetector Class Reference

Public Member Functions

 __init__ (self, bytes markup, Optional[_Encodings] known_definite_encodings=None, Optional[bool] is_html=False, Optional[_Encodings] exclude_encodings=None, Optional[_Encodings] user_encodings=None, Optional[_Encodings] override_encodings=None)
 
Iterator[_Encoding] encodings (self)
 
Tuple[bytes, Optional[_Encoding]] strip_byte_order_mark (cls, bytes data)
 
Optional[_Encoding] find_declared_encoding (cls, Union[bytes, str] markup, bool is_html=False, bool search_entire_document=False)
 

Public Attributes

 known_definite_encodings
 
 user_encodings
 
 exclude_encodings
 
 chardet_encoding
 
 is_html
 
 markup
 
 sniffed_encoding
 
 declared_encoding
 

Static Public Attributes

_Encodings known_definite_encodings
 
_Encodings user_encodings
 
_Encodings exclude_encodings
 
Optional[_Encoding] chardet_encoding
 
bool is_html
 
Optional[_Encoding] declared_encoding
 
bytes markup
 
Optional[_Encoding] sniffed_encoding
 

Protected Member Functions

bool _usable (self, Optional[_Encoding] encoding, Set[_Encoding] tried)
 

Detailed Description

This class is capable of guessing a number of possible encodings
for a bytestring.

Order of precedence:

1. Encodings you specifically tell EncodingDetector to try first
   (the ``known_definite_encodings`` argument to the constructor).

2. An encoding determined by sniffing the document's byte-order mark.

3. Encodings you specifically tell EncodingDetector to try if
   byte-order mark sniffing fails (the ``user_encodings`` argument to the
   constructor).

4. An encoding declared within the bytestring itself, either in an
   XML declaration (if the bytestring is to be interpreted as an XML
   document), or in a <meta> tag (if the bytestring is to be
   interpreted as an HTML document).

5. An encoding detected through textual analysis by chardet,
   cchardet, or a similar external library.

6. UTF-8.

7. Windows-1252.
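
The order of precedence above can be sketched as a plain generator. This is an illustration only, not the library's implementation: ``candidate_encodings`` and its parameter names are hypothetical, and the sniffed, declared, and detected values are passed in by the caller rather than computed from the markup.

```python
from itertools import chain
from typing import Iterable, Iterator, Optional


def candidate_encodings(
    known_definite: Iterable[str] = (),
    sniffed: Optional[str] = None,
    user: Iterable[str] = (),
    declared: Optional[str] = None,
    detected: Optional[str] = None,
) -> Iterator[str]:
    """Yield encoding names in the documented order, skipping gaps and repeats."""
    ordered = chain(
        known_definite,             # 1. caller-supplied, tried first
        [sniffed],                  # 2. implied by a byte-order mark
        user,                       # 3. caller-supplied fallbacks
        [declared],                 # 4. XML declaration or <meta> tag
        [detected],                 # 5. statistical detector such as chardet
        ["utf-8", "windows-1252"],  # 6 and 7. last resorts
    )
    seen = set()
    for name in ordered:
        if name is None:
            continue  # that step produced no candidate
        key = name.lower()
        if key not in seen:  # compare case-insensitively
            seen.add(key)
            yield name


print(list(candidate_encodings(["latin-1"], sniffed="utf-8", declared="UTF-8")))
# ['latin-1', 'utf-8', 'windows-1252']
```

In this sketch the declared ``UTF-8`` is dropped because the sniffed ``utf-8`` was already yielded, so each encoding is attempted at most once.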

:param markup: Some markup in an unknown encoding.

:param known_definite_encodings: When determining the encoding
    of ``markup``, these encodings will be tried first, in
    order. In HTML terms, this corresponds to the "known
    definite encoding" step defined in `section 13.2.3.1 of the HTML standard <https://html.spec.whatwg.org/multipage/parsing.html#parsing-with-a-known-character-encoding>`_.

:param user_encodings: These encodings will be tried after the
    ``known_definite_encodings`` have been tried and failed, and
    after an attempt to sniff the encoding by looking at a
    byte order mark has failed. In HTML terms, this
    corresponds to the step "user has explicitly instructed
    the user agent to override the document's character
    encoding", defined in `section 13.2.3.2 of the HTML standard <https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding>`_.

:param override_encodings: A **deprecated** alias for
    ``known_definite_encodings``. Any encodings here will be tried
    immediately after the encodings in
    ``known_definite_encodings``.

:param is_html: If True, this markup is considered to be
    HTML. Otherwise it's assumed to be XML.

:param exclude_encodings: These encodings will not be tried,
    even if they otherwise would be.

Member Function Documentation

◆ _usable()

bool bs4.dammit.EncodingDetector._usable (self, Optional[_Encoding] encoding, Set[_Encoding] tried)
protected
Should we even bother to try this encoding?

:param encoding: Name of an encoding.
:param tried: Encodings that have already been tried. This
    will be modified as a side effect.
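
A minimal sketch of what such a check might look like, assuming a case-insensitive comparison and an explicit exclusion set. ``usable`` and ``excluded`` are hypothetical names, not the library's code. The point it illustrates is the documented side effect: an accepted encoding is recorded in ``tried``.

```python
from typing import Optional, Set


def usable(encoding: Optional[str], tried: Set[str], excluded: Set[str]) -> bool:
    """Return True if `encoding` is worth trying, recording it in `tried`."""
    if encoding is None:
        return False
    key = encoding.lower()
    if key in excluded or key in tried:
        return False
    tried.add(key)  # side effect: the caller's set grows
    return True


tried: Set[str] = set()
print(usable("UTF-8", tried, excluded={"big5"}))  # True: first attempt
print(usable("utf-8", tried, excluded={"big5"}))  # False: already tried
print(usable("Big5", tried, excluded={"big5"}))   # False: excluded
```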

◆ encodings()

Iterator[_Encoding] bs4.dammit.EncodingDetector.encodings (self)
Yield a number of encodings that might work for this markup.

:yield: A sequence of strings. Each is the name of an encoding
   that *might* work to convert a bytestring into Unicode.
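
Because each yielded name only *might* work, a caller typically tries the candidates in turn until one decodes cleanly. The loop below is a sketch under that assumption; ``decode_first_match`` is a hypothetical helper, and the candidate list is written out by hand rather than produced by this class.

```python
from typing import Iterable, Optional, Tuple


def decode_first_match(
    data: bytes, candidates: Iterable[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, encoding) for the first candidate that decodes strictly."""
    for name in candidates:
        try:
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: wrong encoding for these bytes.
            # LookupError: Python does not know a codec by this name.
            continue
    return None, None


data = "café".encode("latin-1")  # b'caf\xe9' is not valid UTF-8
print(decode_first_match(data, ["utf-8", "windows-1252"]))
# ('café', 'windows-1252')
```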

◆ find_declared_encoding()

Optional[_Encoding] bs4.dammit.EncodingDetector.find_declared_encoding (cls, Union[bytes, str] markup, bool is_html=False, bool search_entire_document=False)
Given a document, tries to find an encoding declared within the
text of the document itself.

An XML encoding is declared at the beginning of the document.

An HTML encoding is declared in a <meta> tag, hopefully near the
beginning of the document.

:param markup: Some markup.
:param is_html: If True, this markup is considered to be HTML. Otherwise
    it's assumed to be XML.
:param search_entire_document: Since an encoding is supposed
    to be declared near the beginning of the document, most of
    the time it's only necessary to search a few kilobytes of
    data.  Set this to True to force this method to search the
    entire document.
:return: The declared encoding, if one is found.
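
The idea can be sketched with two regular expressions. This is a simplified illustration, not the library's implementation: ``find_declared``, both patterns, and the 4096-byte ``prefix_len`` default are assumptions chosen for the example, and the sketch handles bytes input only.

```python
import re
from typing import Optional

# Assumed patterns: an XML declaration at the very start of the document,
# and a charset attribute or parameter inside a <meta> tag.
XML_DECL = re.compile(
    rb"^\s*<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']", re.I
)
META_CHARSET = re.compile(
    rb"<\s*meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._-]+)", re.I
)


def find_declared(
    markup: bytes,
    is_html: bool = False,
    search_entire_document: bool = False,
    prefix_len: int = 4096,  # hypothetical size of the "near the beginning" window
) -> Optional[str]:
    """Return the lowercased declared encoding, or None if none is found."""
    window = markup if search_entire_document else markup[:prefix_len]
    match = XML_DECL.search(window)
    if match is None and is_html:
        match = META_CHARSET.search(window)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


print(find_declared(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(find_declared(b'<html><head><meta charset="utf-8"></head></html>', is_html=True))
```

Searching only a prefix keeps the common case cheap. A declaration buried deep in a large document is found only when ``search_entire_document`` is set.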

◆ strip_byte_order_mark()

Tuple[bytes, Optional[_Encoding]] bs4.dammit.EncodingDetector.strip_byte_order_mark (cls, bytes data)
If a byte-order mark is present, strip it and return the encoding it implies.

:param data: A bytestring that may or may not begin with a
   byte-order mark.

:return: A 2-tuple (data stripped of byte-order mark, encoding implied by byte-order mark)
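
Byte-order mark stripping can be sketched with the BOM constants from Python's standard ``codecs`` module. ``strip_bom`` and the ``BOMS`` table are hypothetical names for this example, and the returned encoding labels are an assumption; the library's own method may handle edge cases differently.

```python
import codecs
from typing import Optional, Tuple

# Longer marks come first: the UTF-32 LE mark begins with the UTF-16 LE mark,
# so testing UTF-16 first would misidentify UTF-32 LE data.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
]


def strip_bom(data: bytes) -> Tuple[bytes, Optional[str]]:
    """Return (data without its byte-order mark, encoding implied by the mark)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):], name
    return data, None  # no mark: data is returned unchanged


print(strip_bom(codecs.BOM_UTF8 + b"<a/>"))  # (b'<a/>', 'utf-8')
print(strip_bom(b"<a/>"))                    # (b'<a/>', None)
```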

The documentation for this class was generated from the following file: