Qucs-S S-parameter Viewer & RF Synthesis Tools
pip._vendor.chardet.charsetprober.CharSetProber Class Reference

Public Member Functions

None __init__ (self, LanguageFilter lang_filter=LanguageFilter.NONE)
 
None reset (self)
 
Optional[str] charset_name (self)
 
Optional[str] language (self)
 
ProbingState feed (self, Union[bytes, bytearray] byte_str)
 
ProbingState state (self)
 
float get_confidence (self)
 

Static Public Member Functions

bytes filter_high_byte_only (Union[bytes, bytearray] buf)
 
bytearray filter_international_words (Union[bytes, bytearray] buf)
 
bytes remove_xml_tags (Union[bytes, bytearray] buf)
 

Public Attributes

 active
 
 lang_filter
 
 logger
 

Static Public Attributes

float SHORTCUT_THRESHOLD = 0.95
 

Protected Attributes

 _state
 

Member Function Documentation

◆ filter_international_words()

bytearray pip._vendor.chardet.charsetprober.CharSetProber.filter_international_words ( Union[bytes, bytearray]  buf)
static
We define three types of bytes:
alphabet: English letters [a-zA-Z]
international: international characters [\x80-\xFF]
marker: everything else [^a-zA-Z\x80-\xFF]
The input buffer can be thought of as a series of words delimited
by markers. This function keeps only the words that contain at
least one international character and drops all other words. Each
contiguous sequence of markers is replaced by a single ASCII space.
This filter applies to all scripts that do not use English characters.
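
The word filter described above can be illustrated with a minimal standalone sketch. It is not the vendored implementation: the names WORD_WITH_HIGH_BYTE and keep_international_words are hypothetical, and the regular expression is an assumption derived from the three byte classes listed in the description.

```python
import re

# Assumed pattern (hypothetical name): optional ASCII letters, one or more
# high bytes, optional ASCII letters, then at most one trailing marker byte.
WORD_WITH_HIGH_BYTE = re.compile(
    rb"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


def keep_international_words(buf: bytes) -> bytearray:
    """Keep only words containing at least one byte in \\x80-\\xFF."""
    kept = bytearray()
    for word in WORD_WITH_HIGH_BYTE.findall(buf):
        last = word[-1:]
        if last < b"\x80" and not last.isalpha():
            # The word ends in a marker byte: emit a single space instead.
            kept.extend(word[:-1])
            kept.extend(b" ")
        else:
            # The word ends in a letter or a high byte: keep it whole.
            kept.extend(word)
    return kept


sample = b"hello caf\xe9, na\xefve!! ok"
result = keep_international_words(sample)
```

In this sketch the pure-ASCII words "hello" and "ok" never match the pattern, so they are dropped, and the marker that follows each kept word becomes one space.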

◆ remove_xml_tags()

bytes pip._vendor.chardet.charsetprober.CharSetProber.remove_xml_tags ( Union[bytes, bytearray]  buf)
static
Returns a copy of ``buf`` that retains only the sequences of English
letters and high-byte characters that lie outside <> delimited tags.
This filter can be applied to all scripts that contain both English
characters and extended ASCII characters, but is currently only used by
``Latin1Prober``.
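
As a rough illustration of the tag stripping described above, the following standalone sketch scans the buffer once and copies the runs that lie outside <> pairs. The name strip_tags is hypothetical, and the sketch assumes that every byte outside a tag is kept and that each run preceding a tag is followed by one space; the vendored method may differ in detail.

```python
def strip_tags(buf: bytes) -> bytes:
    """Return the bytes of buf that are not enclosed in <...>."""
    kept = bytearray()
    in_tag = False
    start = 0  # index where the current run of non-tag bytes begins

    for i in range(len(buf)):
        ch = buf[i:i + 1]
        if ch == b">":
            # Leaving a tag: the next run starts after this byte.
            start = i + 1
            in_tag = False
        elif ch == b"<":
            if i > start and not in_tag:
                # Copy the run before the tag, then a space as delimiter.
                kept.extend(buf[start:i])
                kept.extend(b" ")
            in_tag = True

    if not in_tag:
        # Copy whatever follows the last closed tag.
        kept.extend(buf[start:])
    return bytes(kept)


cleaned = strip_tags(b"<p>caf\xe9</p> ok")
```

Under these assumptions, text inside an unterminated tag at the end of the buffer is discarded, because the scan finishes while still inside the tag.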

The documentation for this class was generated from the following file: