Breaking text into sentences in Python using NLTK
To break a chunk of text into sentences in Python using NLTK, do the following.
Install NLTK:
pip install -U nltk
Download the Punkt tokenizer data in a Python REPL (newer NLTK releases look for punkt_tab, older ones for punkt):
import nltk
nltk.download("punkt")
nltk.download("punkt_tab")
And then break the text into sentences:
import nltk
with open("my-big-file.txt") as file:
    raw = file.read()
sentences = nltk.tokenize.sent_tokenize(raw)
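The steps above can be wrapped in a small helper. This is only a sketch: the function name split_file_into_sentences and its tokenize parameter are made up for illustration and are not part of NLTK. It assumes the file is UTF-8 and that you want each sentence on a single line, since sent_tokenize returns slices of the original text and a sentence that wrapped across lines keeps its newlines.

```python
from typing import Callable, List, Optional


def split_file_into_sentences(
    path: str,
    tokenize: Optional[Callable[[str], List[str]]] = None,
    encoding: str = "utf-8",  # assumption: the input file is UTF-8
) -> List[str]:
    """Read a text file and return its sentences as single-line strings."""
    if tokenize is None:
        # Default to NLTK's sentence tokenizer; imported lazily so a
        # different tokenizer function can be passed in instead.
        import nltk

        tokenize = nltk.tokenize.sent_tokenize

    with open(path, encoding=encoding) as file:
        raw = file.read()

    # Collapse runs of whitespace (including newlines from hard-wrapped
    # text) so that every sentence ends up on one line.
    return [" ".join(sentence.split()) for sentence in tokenize(raw)]
```

With the Punkt data downloaded, split_file_into_sentences("my-big-file.txt") should return the same sentences as the snippet above, with line breaks inside each sentence replaced by single spaces.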