
Thematic Clustering of Proverbs


Sebastian Mendoza · December 2025 · Experiment



The Book of Proverbs is packed with wisdom. If you've given it a read, though, you'll notice it's not a page-turner in the typical sense. There are some distinct sections, such as an introduction, the proverbs of Solomon, and the Sayings of the Wise, but themes and teachings can change verse by verse or chapter by chapter.

Inspired by a graphical representation of related Wikipedia topics, I wondered: what neighbourhoods exist within Proverbs?

I had just finished a few projects on clustering techniques, which gave me enough confidence to give it a shot. My goal was to build a visual representation of the themes in Proverbs.

What does Proverbs look like?

Proverbs is a book from the Bible's wisdom literature, and its authorship is generally attributed to King Solomon. A proverb is a short, sometimes formulaic saying that conveys a truth drawn from experience or common sense. The book collects these sayings in a not-so-obvious order. For example, following a common pattern of introducing a character's actions and then contrasting them with their antithesis, chapter 12, verse 19 reads:

"Truthful lips endure forever, but a lying tongue is but for a moment"

and soon after in verse 24 there is a sharp topic switch,

"The hand of the diligent will rule, while the slothful will be put to forced labor"

This might work better in Hebrew, but in English, this digressive pattern can make it feel a little choppy.

About the data

The translation used is the English Standard Version (ESV). It's easy to find online, considered relatively word-for-word, and required little text cleaning. Proverbs comprises 915 verses.

Other than stripping stray whitespace, I didn't remove punctuation, since transformer models can reasonably interpret it. There are a few duplicate verses in Proverbs, but I intentionally left them in the dataset: a well-generalized clustering algorithm should place identical verses in the same cluster.

Embeddings

The next step was to turn words into numbers. We do this because machine learning models don't understand words the way humans do; they need text transformed into long lists of numbers. In this case, each verse was turned into a list of 384 numbers (or dimensions).

[Figure: embedding example]

I was concerned that the brevity of verses (i.e., less context) would hurt the quality of the embeddings. For example, a seven-word sentence with a few key words is harder to place in a "category" than six sentences all discussing one topic.

I presumed a general pretrained text embedding model would be trained mostly on paragraphs, which have many more characters to express meaning. I ended up using the open-source embedding model sentence-transformers/all-MiniLM-L6-v2 because it was:

  1. Free
  2. Accessible through the SentenceTransformers library
  3. Trained specifically on short, general-knowledge sentences
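
Generating the embeddings takes only a few lines. Here's a minimal sketch, assuming the verses have already been loaded into a list of strings (the two verses shown are just placeholders; the real run uses all 915):

```python
from sentence_transformers import SentenceTransformer

# Load the pretrained MiniLM model (384-dimensional output).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# In the real run, `verses` holds all 915 verses of Proverbs.
verses = [
    "Truthful lips endure forever, but a lying tongue is but for a moment.",
    "The hand of the diligent will rule, while the slothful will be put to forced labor.",
]

# Normalizing makes cosine similarity a simple dot product later on.
embeddings = model.encode(verses, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384)
```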

What Can We Do With Embeddings?

We can make a similarity matrix that highlights (in yellow) the similarity between each pair of verses.

[Figure: verse-by-verse similarity matrix]

We can see "squares" of yellow marking runs of highly semantically similar verses. The diagonal is bright yellow with a similarity of exactly 1, because every verse is identical to itself. Sections with a green hue indicate verses that are somewhat related, while dark blue sections indicate pairs of verses that are not semantically related.
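
Computing and plotting the matrix is straightforward. A sketch, reusing the normalized embeddings from above (the viridis colormap gives the yellow-green-blue palette described):

```python
import matplotlib.pyplot as plt

# With normalized embeddings, cosine similarity is just a dot product.
sim = embeddings @ embeddings.T  # shape: (n_verses, n_verses)

plt.imshow(sim, cmap="viridis", vmin=-1, vmax=1)
plt.colorbar(label="cosine similarity")
plt.xlabel("verse index")
plt.ylabel("verse index")
plt.title("Verse-by-verse similarity in Proverbs")
plt.show()
```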

With different clustering techniques, we can use these embeddings to measure similarity and group communities of verses.

Searching For a Good Visualization

Community Graphs

The video that inspired this project used the Leiden algorithm to sort a network of nodes (articles) and edges (similarity scores) into clusters. The algorithm essentially finds groups of nodes that are more densely connected to each other than to the rest of the network; each such group forms a community. I constructed the graph by computing the cosine similarity between every pair of verses and connecting verses whose score exceeded a certain threshold. Luckily, there are Python libraries that implement the algorithm, so the computation was pretty easy. The communities it identified are shown below, after a quick sketch of the construction.
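
A rough sketch of the graph construction and community detection, using the igraph and leidenalg packages (the 0.5 threshold is illustrative, not the exact value I used):

```python
import numpy as np
import igraph as ig
import leidenalg

# Connect pairs of verses whose cosine similarity exceeds a threshold.
threshold = 0.5
sim = embeddings @ embeddings.T
# Upper triangle (k=1) avoids self-loops and duplicate edges.
edges = [(int(i), int(j)) for i, j in np.argwhere(np.triu(sim, k=1) > threshold)]

# Build the graph and run Leiden community detection.
g = ig.Graph(n=len(embeddings), edges=edges)
partition = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
print(f"Found {len(partition)} communities")
```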

[Interactive graph visualization]

This graph is fun to explore. Community 1 seems to focus on the wise vs. the fool and how each character behaves. Community 12 is much smaller and features one of my favourite characters: the sluggard.

When tuning the algorithm's parameters, I noticed there were consistently disproportionately large communities, such as Communities 1 and 2. The algorithm also only ran on the largest connected component and excluded the smaller ones, leaving out about 150 verses. This is a fun visualization, but let's move on to other ways of visualizing Proverbs' themes.

Hierarchical Clustering

A hierarchical graph is like a family tree. This kind of graph is called a dendrogram (Greek: dendron (tree) + gramma (drawing/writing)). Using the same embeddings, I took a bottom-up approach: first grouping together the most similar pairs, then repeatedly linking the sub-groups that are most similar (or rather, least different) in theme. This is Ward's method. Luckily, SciPy implements this algorithm, so it was really just plug-and-play to find a good visualization with enough sub-groups.
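
A minimal sketch with SciPy (the cut height of 10 is illustrative, not a tuned value):

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Ward's method merges, at each step, the pair of clusters whose union
# gives the smallest increase in within-cluster variance.
Z = linkage(embeddings, method="ward")

# Truncate the plot so 915 leaves don't become unreadable.
dendrogram(Z, truncate_mode="level", p=6)
plt.show()

# Cutting the tree at a given height yields flat cluster labels.
labels = fcluster(Z, t=10, criterion="distance")
```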


How to use the graph: zoom and pan to explore. Each colored branch represents a cluster of thematically similar verses. The specificity of these themes can be adjusted via the cut "Height", controlled by the slider. Click on any branch node to collapse or expand it. Hover over leaf nodes to see the verse text.

[Interactive dendrogram visualization]

I found this visualization and algorithm helpful for finding smaller subgroups within Proverbs. For example, there are subgroups of 5-8 verses each about scorners, rain, birds, and patience, respectively. However, for subgroups of 30+ verses, the thematic connection gets harder to see. There are exceptions: I found one major subgroup of 50+ verses with references to nature (plants, weather, animals, etc.). These are interesting, and with further investigation I could probably categorize some medium-sized subgroups. But for now, let's move on.

K-means Clustering

The standard and easiest clustering method I knew was k-means. With this algorithm, the number of clusters k is chosen in advance, and each verse is assigned to the nearest of k cluster centers so as to keep the clusters as dense as possible.

I tried 7 through 40 clusters and recorded metrics for each, including silhouette scores, which measure how well-separated the clusters are. A score near -1 implies a noisy cluster whose verses sit closer to other clusters than their own, while a score near +1 would indicate the verses in a cluster are nearly identical. Most clusters scored between 0.1 and 0.4, which can be explained by:

  • verses being relatively short, not giving much room for high variance
  • each verse in Proverbs fitting into the same niche subgenre of biblical proverbial text; compare this to a novel, which might cover dozens of characters, scenes, hopes, fears, etc.
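
The sweep itself is a short loop with scikit-learn. A sketch, reusing the embeddings from earlier (scoring with the cosine metric is my choice here, not necessarily the exact setup):

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Try k = 7..40 and record the overall silhouette score for each.
scores = {}
for k in range(7, 41):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(embeddings)
    scores[k] = silhouette_score(embeddings, labels, metric="cosine")

# Show the five best-scoring values of k.
for k, s in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"k={k}: silhouette={s:.3f}")
```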


I chose k=20, making 20 communities within Proverbs. This value gave one of the highest silhouette scores, and it's a nice round number. After inspection, the 20 clusters generally had consistent topics. Below is a graph of the size distribution of each community; they are roughly the same size, with a few natural outliers.

[Figure: cluster size distribution for k=20]

Titling Clusters

After running the kmeans algorithm, I had 20 groups of verses. I titled each group using its respective verses. Here are the methods I used and their results:

  • Most frequent words (extract key words from each cluster). Result: generally incoherent
  • KeyBERT keyword extraction (automatic keyword extraction). Result: not specific enough
  • Llama 3 with Ollama (open-source LLM). Result: coherent, but often vague and took WAY too long to query
  • Claude Sonnet 4.5 (foundation LLM). Result: quick, coherent, and easy to set up
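
For the Claude titling, here's a sketch of how the call might look with the anthropic Python SDK; the prompt and model name are illustrative, not my exact setup:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def title_cluster(verses: list[str]) -> str:
    """Ask Claude for a short thematic title for a cluster of verses."""
    prompt = (
        "Give a 3-5 word title summarizing the shared theme of these "
        "Bible verses:\n\n" + "\n".join(verses)
    )
    message = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=30,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text.strip()
```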


For a quick sanity check, let's see how each cluster's Claude title compares to the most frequent lexical words found in that cluster.

  0. Laziness leads to ruin (words: sluggard)
  1. Pursuing wisdom and understanding (words: wisdom, knowledge, wise)
  2. Righteousness blessed, wickedness cursed (words: righteous)
  3. Choosing righteous paths wisely (words: way, walk)
  4. Wisdom versus foolish women (words: her, she)
  5. Words' power and consequences (words: lips, tongue, mouth, words)
  6. Family relationships require wisdom (words: son)
  7. Wisdom versus foolish folly (words: fool, man)
  8. Father's instruction to sons (words: son, keep, heart)
  9. Avoid strife, choose wisdom (words: man, evil, strife, anger)
  10. Kingship, Righteousness, Divine Authority (words: king, kings, ruler)
  11. Temptation's deceptive temporary pleasures (words: eat, bread)
  12. Righteous contrasted with wicked (words: wicked, righteous, evil)
  13. Divine order in creation (words: little, earth, unequal, weights)
  14. Deceit destroys, truth protects (words: witness, false)
  15. Physical consequences of wisdom (words: heart, life)
  16. Wisdom versus folly contrasted (words: wise, fool, fools, gold)
  17. Wise living requires discernment (words: neighbor, house)
  18. Wealth, poverty, and diligence (words: poor, wealth, poverty, rich)
  19. Lord knows human hearts (words: lord, heart, fear, ways, eyes)


Next, I wanted to inspect them with a 3D visualization. I explored UMAP, which projects the 384-dimensional embeddings into 3 dimensions while trying to preserve local neighborhood structure.
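
The projection itself is only a couple of lines with the umap-learn package (using cosine as the distance metric is my assumption here):

```python
import umap

# Reduce the 384-d embeddings to 3-d for plotting.
reducer = umap.UMAP(n_components=3, metric="cosine", random_state=42)
coords = reducer.fit_transform(embeddings)  # shape: (n_verses, 3)
```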

Here’s the resulting 3D UMAP scatter plot:

[Interactive 3D UMAP visualization]

[Interactive cluster explorer]

Improvements

  • I wanted to try this with different English translations of the Bible, but this was a weekend project and I didn't find the time.
  • Near the center of Proverbs, roughly chapters 10-29, most verses follow the proverb format rather than general poetic reflections on wisdom. Clustering only within this subset might have produced better visual results.
  • Comparing the clustering algorithms across different embedding models.
  • Using a Hebrew version of Proverbs; Hebrew embedding models might pick up intricacies lost in translation.

Limitations

This was a weekend project, and the core goal was to learn some new things. If anyone reading this is inspired to dig deeper into this dataset, please feel free to reach out: @sebmen7