Load
Loader
Class for loading data.
Methods
from_txt: Reads texts from a text file.
from_xml: Reads texts from an XML file.
forward: Filters the texts based on the target words and the maximum number of documents.
Source code in semantics/data/data_loader.py
__init__(texts)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
texts | List[str] | List of texts. | required |
Attributes:
Name | Type | Description |
---|---|---|
texts | List[str] | List of texts. |
forward(target_words=None, max_documents=None, shuffle=True, random_seed=None)
Filters the texts based on the target words and the maximum number of documents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_words | (List[str], str, None) | List of target words. Defaults to None. | None |
max_documents | (int, None) | Maximum number of documents. Defaults to None. | None |
shuffle | bool | Whether to shuffle the data. Defaults to True. | True |
random_seed | (int, None) | Random seed. Defaults to None. | None |
Returns:
Name | Type | Description |
---|---|---|
texts | List[str] | List of texts. |
Examples:
>>> from semantics.data.data_loader import Loader
>>> texts = ['This is a test.', 'This is another test.', 'This is a third test.']
>>> print('Original texts: ', texts)
>>> print('Filtered texts: ', Loader(texts).forward(target_words=['third'], max_documents=1, shuffle=False))
Original texts: ['This is a test.', 'This is another test.', 'This is a third test.']
Filtered texts: ['This is a third test.']
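The filtering described above can be sketched without the library. The function below, `filter_texts`, is a hypothetical stand-in and not the code in semantics/data/data_loader.py. It assumes that a text matches when it contains any target word as a substring, and that shuffling happens before the `max_documents` cut; the actual implementation may differ on both points.

```python
import random
from typing import List, Optional, Union


def filter_texts(
    texts: List[str],
    target_words: Union[List[str], str, None] = None,
    max_documents: Optional[int] = None,
    shuffle: bool = True,
    random_seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical stand-in for Loader.forward; not the library's code."""
    # Accept a single word as well as a list of words.
    if isinstance(target_words, str):
        target_words = [target_words]

    # Assumption: a text is kept if it contains any target word as a substring.
    if target_words is not None:
        selected = [t for t in texts if any(w in t for w in target_words)]
    else:
        selected = list(texts)

    # Assumption: shuffling happens before truncation, so max_documents
    # takes a random sample when shuffle=True.
    if shuffle:
        random.Random(random_seed).shuffle(selected)

    if max_documents is not None:
        selected = selected[:max_documents]
    return selected


texts = ['This is a test.', 'This is another test.', 'This is a third test.']
filtered = filter_texts(texts, target_words=['third'], max_documents=1, shuffle=False)
print(filtered)  # ['This is a third test.']
```

Passing a fixed `random_seed` makes the shuffled order reproducible across calls, which is useful when the same subset of documents has to be drawn more than once.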
from_txt(path)
classmethod
Class method to read texts from a text file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Union[str, Path] | Path to the text file. | required |
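This page does not say how the text file is split into texts. As a minimal sketch, the hypothetical helper `read_txt_lines` below assumes one text per line and skips blank lines; it illustrates the idea only and may not match what `from_txt` does.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def read_txt_lines(path: Union[str, Path]) -> List[str]:
    """Hypothetical helper: assumes one text per line, skipping blanks."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Write a small sample file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "corpus.txt"
    sample.write_text("This is a test.\n\nThis is another test.\n", encoding="utf-8")
    lines = read_txt_lines(sample)

print(lines)  # ['This is a test.', 'This is another test.']
```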
from_xml(path, tag)
classmethod
Class method to read texts from an XML file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Union[str, Path] | Path to the XML file. | required |
tag | str | Tag of the XML file to extract the texts from. | required |
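Extracting texts by tag can be illustrated with the standard library's `xml.etree.ElementTree`. The helper `extract_tag_texts` is hypothetical, and it parses a string rather than a file so that the sketch is self-contained; whether `from_xml` uses this parser, or skips empty elements as done here, is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_tag_texts(xml_string: str, tag: str) -> List[str]:
    """Hypothetical helper: collect the text of every element named `tag`."""
    root = ET.fromstring(xml_string)
    # iter(tag) walks the whole tree in document order.
    # Assumption: elements with no text or only whitespace are skipped.
    return [el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()]


sample_xml = """<corpus>
  <doc><p>This is a test.</p></doc>
  <doc><p>This is another test.</p><p>   </p></doc>
</corpus>"""

paragraphs = extract_tag_texts(sample_xml, "p")
print(paragraphs)  # ['This is a test.', 'This is another test.']
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(xml_string)`.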
split_xml(path, output_dir, max_children=1000)
Splits an XML file into multiple XML files with a maximum number of children.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the XML file. | required |
output_dir | str | Path to the output directory. | required |
max_children | int | Maximum number of children. Defaults to 1000. | 1000 |
Returns:
Name | Type | Description |
---|---|---|
paths | List[str] | List of paths to the new XML files. |
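One way such a split could work is sketched below with `xml.etree.ElementTree`. The function `split_xml_sketch` is a hypothetical stand-in, not the library's `split_xml`. It assumes that "children" means the direct children of the root element, that each output file repeats the root's tag and attributes, and that files are named `<stem>_<index>.xml`; none of these details are confirmed by this page.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import List


def split_xml_sketch(path: str, output_dir: str, max_children: int = 1000) -> List[str]:
    """Hypothetical stand-in for split_xml; not the library's code."""
    root = ET.parse(path).getroot()
    children = list(root)  # direct children of the root element
    os.makedirs(output_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]

    paths = []
    for start in range(0, len(children), max_children):
        # Assumption: each part reuses the root's tag and attributes.
        part_root = ET.Element(root.tag, root.attrib)
        part_root.extend(children[start:start + max_children])
        part_path = os.path.join(output_dir, f"{stem}_{len(paths)}.xml")
        ET.ElementTree(part_root).write(part_path, encoding="utf-8", xml_declaration=True)
        paths.append(part_path)
    return paths


# Build a five-child sample file and split it into parts of at most two children.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "corpus.xml")
    docs = "".join(f"<doc>{i}</doc>" for i in range(5))
    with open(source, "w", encoding="utf-8") as handle:
        handle.write(f"<corpus>{docs}</corpus>")

    parts = split_xml_sketch(source, os.path.join(tmp, "parts"), max_children=2)
    counts = [len(ET.parse(p).getroot()) for p in parts]

print(counts)  # [2, 2, 1]
```

This sketch loads the whole tree into memory before splitting. For files too large for that, an incremental parser such as `ET.iterparse` would be a more suitable starting point.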