Under the hood

cue.language (since version 2.0.0, compiled into WordCram.jar)

Created by Jonathan Feinberg

cue.language is a small library of Java code and resources that provides the following basic natural-language processing capabilities:
- Tokenizing natural language text into individual words
- Tokenizing natural language text into sentences
- Tokenizing natural language text into n-grams (sequences of 2 or more words that appear next to each other in a sentence)
- Counting strings
- Detecting which script (alphabet, writing system) is required to represent a text
- Guessing what language a text is in
- Customizable “stop word” detection for a variety of languages
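
A few of the capabilities listed above (word tokenizing, n-grams within a sentence, stop-word filtering, and counting) can be illustrated with a minimal sketch. It uses only the JDK's `java.text.BreakIterator` and is not cue.language's own API; the class name `WordCountSketch`, its methods, and the tiny stop-word list are assumptions made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative only: a JDK-based stand-in, not the cue.language classes.
public class WordCountSketch {

    // Hypothetical, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and"));

    // Split text into lower-cased words, dropping whitespace and punctuation tokens.
    static List<String> words(String text, Locale locale) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                result.add(token.toLowerCase(locale));
            }
        }
        return result;
    }

    // Build 2-grams from adjacent words, never crossing a sentence boundary.
    static List<String> bigrams(String text, Locale locale) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            List<String> w = words(text.substring(start, end), locale);
            for (int i = 0; i + 1 < w.size(); i++) {
                result.add(w.get(i) + " " + w.get(i + 1));
            }
        }
        return result;
    }

    // Count how often each non-stop word occurs.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            if (!STOP_WORDS.contains(w)) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "The cat and the hat. The cat sat.";
        System.out.println(count(words(text, Locale.ENGLISH)));
        System.out.println(bigrams(text, Locale.ENGLISH));
    }
}
```

Word frequencies of this kind are what a word cloud ultimately needs: each remaining word paired with a weight.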

WordCram.jar

Created by Dan Bernier

WordCram lets you generate word clouds in Processing. It does the heavy lifting – text analysis, collision detection – for you, so you can focus on making your word clouds as beautiful, as revealing, or as silly as you like. Since version 2.0.0, the Java code has been updated from Java 1.5 to Java 1.8, and the Java code from cue.language has also been included.

jsoup-1.10.2.jar (since version 2.0.0)

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

The build depends on Polyglot Maven. It needs access to the Processing core.jar; jsoup.jar is pulled from Maven Central.