3 Types Of ChatterBot’s Preprocessors

ChatterBot’s preprocessors are simple functions that modify the input statement a chat bot receives before the logic adapter processes it.

Here is an example of how to set preprocessors. The preprocessors parameter should be a list of strings of the import paths to your preprocessors.

from chatterbot import ChatBot

# Each entry is the dotted import path of a preprocessor function
chatbot = ChatBot(
    'Bob the Bot',
    preprocessors=[
        'chatterbot.preprocessors.clean_whitespace'
    ]
)

Preprocessor functions

ChatterBot comes with several built-in preprocessors.

Remove any consecutive whitespace characters from the statement text.

def clean_whitespace(chatbot, statement):
    """
    Remove any consecutive whitespace characters from the statement text.
    """
    import re

    # Replace linebreaks and tabs with spaces
    statement.text = statement.text.replace('\n', ' ').replace('\r', ' ').replace('\t', ' ')

    # Remove any leading or trailing whitespace
    statement.text = statement.text.strip()

    # Remove consecutive spaces
    statement.text = re.sub(' +', ' ', statement.text)

    return statement
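
To see what these three steps do without installing ChatterBot, the same operations can be applied to a plain string. This is only a sketch: `collapse_whitespace` is a hypothetical helper name used for illustration and is not part of ChatterBot.

```python
import re


def collapse_whitespace(text):
    """Apply the same three steps as clean_whitespace to a plain string."""
    # Replace linebreaks and tabs with spaces
    text = text.replace('\n', ' ').replace('\r', ' ').replace('\t', ' ')
    # Remove any leading or trailing whitespace
    text = text.strip()
    # Collapse runs of spaces into a single space
    return re.sub(' +', ' ', text)


print(collapse_whitespace('  Hello\t\tworld\n how  are   you  '))
# prints: Hello world how are you
```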

Convert escaped HTML characters into unescaped HTML characters. For example: “&lt;b&gt;” becomes “<b>”.

def unescape_html(chatbot, statement):
    """
    Convert escaped html characters into unescaped html characters.
    For example: "&lt;b&gt;" becomes "<b>".
    """
    import sys

    # Replace HTML escape characters
    if sys.version_info[0] < 3:
        from HTMLParser import HTMLParser
        html = HTMLParser()
    else:
        import html

    statement.text = html.unescape(statement.text)

    return statement
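
On Python 3 the work is done by the standard library's `html.unescape`. A short sketch of that call on a plain string follows; the sample text is made up for illustration.

```python
import html

# Named character references such as &lt;, &gt; and &amp; are converted
# back to the characters they stand for.
escaped = '&lt;b&gt;Hello&lt;/b&gt; &amp; welcome'
unescaped = html.unescape(escaped)

print(unescaped)
# prints: <b>Hello</b> & welcome
```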

Convert Unicode characters to their ASCII equivalents. For example: “på fédéral” becomes “pa federal”.

def convert_to_ascii(chatbot, statement):
    """
    Converts unicode characters to ASCII character equivalents.
    For example: "på fédéral" becomes "pa federal".
    """
    import unicodedata
    import sys

    # Normalize unicode characters
    if sys.version_info[0] < 3:
        statement.text = unicode(statement.text) # NOQA

    text = unicodedata.normalize('NFKD', statement.text)
    text = text.encode('ascii', 'ignore').decode('utf-8')

    statement.text = str(text)
    return statement
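
The conversion relies on NFKD normalization, which splits an accented letter into a base letter plus a combining mark; encoding to ASCII with `'ignore'` then drops the marks. The sketch below applies those two steps to plain strings on Python 3. `to_ascii` is a hypothetical helper name, not a ChatterBot function.

```python
import unicodedata


def to_ascii(text):
    """Decompose characters (NFKD), then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('på fédéral'))
# prints: pa federal

# Characters with no decomposition into an ASCII base letter are removed
# entirely rather than transliterated.
print(to_ascii('Straße'))
# prints: Strae
```

One consequence of this approach is that it silently discards characters that have no ASCII base form, so it suits accented Latin text better than other scripts.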

Creating new preprocessors

It is simple to create your own preprocessors. A preprocessor is just a function that meets two requirements.

  1. It must take two parameters: the first is a ChatBot instance and the second is a Statement instance.
  2. It must return a Statement instance.
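
A custom preprocessor that follows the two requirements above might look like the sketch below. The function name `lowercase_text` is hypothetical, and the `SimpleNamespace` object is only a stand-in for a real Statement so the example runs without ChatterBot installed.

```python
from types import SimpleNamespace


def lowercase_text(chatbot, statement):
    """Hypothetical custom preprocessor: lowercase the statement text."""
    statement.text = statement.text.lower()
    return statement


# Stand-in for a Statement; only the .text attribute is needed here.
fake_statement = SimpleNamespace(text='Hello THERE')

# The chatbot argument is unused by this preprocessor, so None is passed.
result = lowercase_text(None, fake_statement)
print(result.text)
# prints: hello there
```

To use such a function with a chat bot, place it in an importable module and add its dotted import path to the preprocessors list, in the same way as the built-in 'chatterbot.preprocessors.clean_whitespace'.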
