None | __init__ (self, LanguageFilter lang_filter=LanguageFilter.NONE) |
None | reset (self) |
Optional[str] | charset_name (self) |
Optional[str] | language (self) |
ProbingState | feed (self, Union[bytes, bytearray] byte_str) |
ProbingState | state (self) |
float | get_confidence (self) |

| active |
| lang_filter |
| logger |

float | SHORTCUT_THRESHOLD = 0.95 |
◆ filter_international_words()
static bytearray pip._vendor.chardet.charsetprober.CharSetProber.filter_international_words (Union[bytes, bytearray] buf)
We define three types of bytes:
alphabet: English letters [a-zA-Z]
international: international characters [\x80-\xFF]
marker: everything else [^a-zA-Z\x80-\xFF]
The input buffer can be thought of as a series of words delimited
by markers. This function keeps only the words that contain at least
one international character and drops the rest. Each contiguous
sequence of markers is replaced by a single ASCII space.
This filter applies to all scripts that do not use English characters.
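The filtering rule described above can be illustrated with a small standalone sketch. It is not the vendored implementation: the name `filter_international_words_sketch` and the pattern `WORD_WITH_HIGH_BYTE` are hypothetical, and the regular expression is only one assumed way to express "a word with at least one byte in [\x80-\xFF], followed by at most one marker".

```python
import re

# Assumption: a "word" is optional ASCII letters, one or more high bytes,
# optional ASCII letters, then at most one trailing marker byte.
WORD_WITH_HIGH_BYTE = re.compile(
    rb"[a-zA-Z]*[\x80-\xFF]+[a-zA-Z]*[^a-zA-Z\x80-\xFF]?"
)


def filter_international_words_sketch(buf):
    """Keep only words that contain at least one byte in \\x80-\\xFF."""
    filtered = bytearray()
    for word in WORD_WITH_HIGH_BYTE.findall(bytes(buf)):
        body, last = word[:-1], word[-1:]
        # A trailing marker (neither an ASCII letter nor a high byte)
        # is normalised to a single space.
        if not last.isalpha() and last < b"\x80":
            last = b" "
        filtered += body + last
    return filtered
```

Under these assumptions, `filter_international_words_sketch(b"caf\xe9 au lait, na\xefve!")` yields `bytearray(b"caf\xe9 na\xefve ")`: the pure-ASCII words `au` and `lait` are dropped, and the `!` after `na\xefve` becomes a space.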
◆ remove_xml_tags()
static bytes pip._vendor.chardet.charsetprober.CharSetProber.remove_xml_tags (Union[bytes, bytearray] buf)
Returns a copy of ``buf`` that retains only the sequences of English
alphabet and high byte characters that are not between <> characters.
This filter can be applied to all scripts which contain both English
characters and extended ASCII characters, but is currently only used by
``Latin1Prober``.
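A minimal sketch of the tag-stripping idea follows. It is an illustration under stated assumptions, not the vendored code: the name `remove_xml_tags_sketch` is hypothetical, and this version simply drops every byte between `<` and `>` and keeps all other bytes, appending a space after each kept stretch that precedes a tag. It does not additionally filter out non-letter bytes outside tags.

```python
def remove_xml_tags_sketch(buf):
    """Return a copy of buf without the bytes enclosed in <...>."""
    data = bytes(buf)
    filtered = bytearray()
    in_tag = False
    prev = 0  # index where the current stretch outside any tag begins

    for curr in range(len(data)):
        ch = data[curr:curr + 1]
        if ch == b">":
            # Leaving a tag: the next kept stretch starts after '>'.
            prev = curr + 1
            in_tag = False
        elif ch == b"<":
            if curr > prev and not in_tag:
                # Keep the stretch before the tag and delimit it
                # with a single space.
                filtered += data[prev:curr] + b" "
            in_tag = True

    # Keep trailing bytes only if the buffer does not end inside a tag.
    if not in_tag:
        filtered += data[prev:]
    return bytes(filtered)
```

With this sketch, `remove_xml_tags_sketch(b"a<b>c")` returns `b"a c"`, and an unterminated tag such as `b"abc<def"` returns `b"abc "` because everything after the `<` is treated as tag content.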
The documentation for this class was generated from the following file:
- docs/help/help-venv/lib/python3.12/site-packages/pip/_vendor/chardet/charsetprober.py