Term frequency-inverse document frequency (TF-IDF)
Information density: compare a term's frequency in a document with its frequency across the whole corpus. If a term appears in all documents, it is probably not very distinctive.
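A minimal sketch of that idea, assuming whitespace tokenization and the plain unsmoothed weighting tf * log(N / df). The function name `tf_idf` and the toy corpus are made up for illustration; real libraries usually add smoothing and normalization on top of this.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = share of this document's tokens; idf = log(N / df).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the cat",
]
scores = tf_idf(docs)
```

With this weighting, a term that occurs in every document (here "the") gets idf = log(1) = 0, so it scores zero however often it appears, while a term confined to one document (here "mat") scores highest.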
TODO
- todo
- Look at how often the word "undefined" appears in internet texts. Prompting an LLM might be a nice way of doing this.
- BM25