The Martin Heidegger Corpora

Texts for Natural Language Processing

Just the text, reduced to sentences and stripped of footnotes, page headers and footers, translators' and editors' forewords and afterwords, glossaries, indexes, tables of contents, section titles, and other extraneous text.

I wrote some Python tools for creating the Heidegger corpora. The tools create the corpora files below from the latest pages on Removing footnotes remains a manual process. The Python code is on GitHub. The code may be of interest to anyone using the Natural Language Toolkit (NLTK) with Heidegger texts. Let me know.
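For readers curious what this kind of cleanup involves, here is a minimal sketch using only the Python standard library. It is not the code from the GitHub repository. The function name `to_sentences` and the three patterns are hypothetical, and they assume page headers are lines holding only a page number and footnote markers look like `[1]`.

```python
import re

# Hypothetical patterns: real page headers and footnote markers vary by edition.
PAGE_HEADER = re.compile(r"^\s*(\d+|Page \d+)\s*$")
FOOTNOTE_MARK = re.compile(r"\[\d+\]")
# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def to_sentences(raw_text):
    """Drop page-number lines and footnote markers, then split into sentences."""
    kept = []
    for line in raw_text.splitlines():
        if PAGE_HEADER.match(line):
            continue  # skip lines that hold only a page number
        kept.append(FOOTNOTE_MARK.sub("", line))
    # Rejoin wrapped lines and collapse runs of whitespace.
    text = re.sub(r"\s+", " ", " ".join(kept)).strip()
    return [s for s in SENTENCE_END.split(text) if s]
```

The naive splitter breaks on abbreviations such as "cf." or "i.e.", so a trained tokenizer like NLTK's `sent_tokenize` would likely be a better fit for real texts. As noted above, footnote bodies still need manual removal.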

The texts below are hosted on the Voyant text analysis web site.

[2022/1/8] The links below are broken.
Voyant changed their web site, and didn't migrate existing corpora.
I'll have to resubmit the texts to Voyant and generate new links when I have time.

[2022/1/15] Resubmitting the corpora URLs to Voyant produces internal write.lock error messages.


Send comments to info at

Ereignis .

Created 2021/12/20
Last updated 2022/1/15