Preprocess
PREPROCESS
This class preprocesses text data: it can remove punctuation, numbers, and stopwords, lowercase the text, and lemmatize it.
Methods
forward: Preprocesses text data.
Source code in semantics/data/data_preprocessing.py
forward(text, remove_punctuation=True, remove_numbers=True, lowercase=True, lemmatize=True, remove_stopwords=True)
This function preprocesses text data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text to be preprocessed. | required |
remove_punctuation | bool | Whether to remove punctuation. Defaults to True. | True |
remove_numbers | bool | Whether to remove numbers. Defaults to True. | True |
lowercase | bool | Whether to lowercase. Defaults to True. | True |
lemmatize | bool | Whether to lemmatize. Defaults to True. | True |
remove_stopwords | bool | Whether to remove stopwords. Defaults to True. | True |
Returns:
Name | Type | Description |
---|---|---|
newtext | str | Preprocessed text. |
Examples:
>>> from semantics.data.data_preprocessing import PREPROCESS
>>> text = 'This is a test. 1234'
>>> print('Original text:', text)
Original text: This is a test. 1234
>>> print('Preprocessed text:', PREPROCESS().forward(text, remove_punctuation=True, remove_numbers=True, lowercase=True, lemmatize=True, remove_stopwords=True))
Preprocessed text: test
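
To make the effect of each flag concrete, here is a minimal standard-library sketch of a pipeline of this kind. It is not the code in semantics/data/data_preprocessing.py: the function name `preprocess_sketch`, the tiny `STOPWORDS` set, and the order of the steps are all assumptions made for illustration, and lemmatization is left out because it needs an external linguistic resource.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real pipeline would rely on a complete list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "this", "that", "of", "to", "and"}


def preprocess_sketch(text, remove_punctuation=True, remove_numbers=True,
                      lowercase=True, remove_stopwords=True):
    """Illustrative stand-in for a forward()-style method; lemmatization is omitted."""
    if remove_punctuation:
        # Delete every ASCII punctuation character.
        text = text.translate(str.maketrans("", "", string.punctuation))
    if remove_numbers:
        # Delete runs of digits.
        text = re.sub(r"\d+", "", text)
    if lowercase:
        text = text.lower()
    # Splitting on whitespace also collapses the gaps left by removed characters.
    tokens = text.split()
    if remove_stopwords:
        # Compare case-insensitively so the filter works when lowercase=False.
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return " ".join(tokens)


print(preprocess_sketch("This is a test. 1234"))                        # test
print(preprocess_sketch("This is a test. 1234", remove_numbers=False))  # test 1234
```

Each flag is independent in this sketch, so switching one off leaves the matching characters or tokens in place, as the second call shows. The real method may apply its steps in a different order or with different resources.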