The Book of Proverbs is packed full of wisdom. If you've given it a read, you'll notice it's not a page turner in the typical sense. There are some distinct sections such as an Introduction, Proverbs by Solomon, the Sayings of the Wise, and more, but themes and teachings can change verse by verse or chapter by chapter.
Inspired by a graphical representation of related Wikipedia topics, I wondered: what neighbourhoods exist within Proverbs?
I had just finished a few projects on clustering techniques which spurred enough confidence to give it a shot. My goal was to find a visual representation of themes in Proverbs.
Proverbs is a book from the Bible's wisdom literature and its authorship is generally attributed to King Solomon. A proverb is a short, sometimes formulaic, saying that conveys some truth through experience or common sense. This book contains a collection of these sayings in a not-so-obvious order. For example, following a common pattern of introducing a character's actions and then contrasting with the antithesis, chapter 12, verse 19 goes:
"Truthful lips endure forever, but a lying tongue is but for a moment"
and soon after in verse 24 there is a sharp topic switch,
"The hand of the diligent will rule, while the slothful will be put to forced labor"
This might work better in Hebrew, but in English, this digressive pattern can make it feel a little choppy.
The translation used for this project is the English Standard Version (ESV). This translation is easy to find online, is considered more word-for-word, and required little text cleaning. Proverbs contains 915 verses.
Other than stripping stray whitespace, I did very little cleaning. I left punctuation in place since I would be using transformer models, which can reasonably understand punctuation. There are a few duplicate verses in Proverbs, but I intentionally left them in the dataset, since a sound clustering algorithm should place them in the same cluster.
The next step was to turn words into numbers. We do this because machine learning models don't understand words like humans do. They need words to be transformed into long lists of numbers. In this case, each verse was turned into a list of 384 numbers (or dimensions).
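As a rough sketch of this step, the snippet below strips stray whitespace and then hands the verses to the sentence-transformers library. The function names `clean_verse` and `embed_verses` are made up for illustration, and the model name is the one this article says it used; the actual pipeline may have looked different.

```python
import re


def clean_verse(text: str) -> str:
    """Collapse stray whitespace; punctuation is deliberately left alone."""
    return re.sub(r"\s+", " ", text).strip()


def embed_verses(verses, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Return one 384-dimensional vector per verse (downloads the model on first use)."""
    # Imported lazily so the cleaning helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # normalize_embeddings=True scales each vector to unit length,
    # which makes cosine similarity a plain dot product later on.
    return model.encode([clean_verse(v) for v in verses], normalize_embeddings=True)


verses = [
    "  Truthful lips endure forever,\n but a lying tongue is but for a moment ",
    "The hand of the diligent will rule, while the slothful will be put to forced labor",
]
cleaned = [clean_verse(v) for v in verses]
# embeddings = embed_verses(verses)  # would have shape (2, 384)
```

The embedding call is left commented out because it needs the library installed and a one-time model download.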

I was concerned about whether the brevity of the verses (i.e. less context) would hurt the accuracy of an embedding model's output. For example, a 7-word sentence with a few key words is more difficult to put in a "category" than 6 sentences talking about one topic.
I presumed a general pretrained text embedding model would be trained on paragraphs, which allow many more characters to express meaning. I ended up using the open source embedding model, sentence-transformers/all-MiniLM-L6-v2, because it was:
We can make a similarity matrix that highlights (in yellow) the similarity between verses.

We can see that there are "squares" of yellow that mark groups of highly semantically similar verses. The diagonal line has a similarity of 1, and it is bright yellow, because it compares each verse with itself. Sections with a green hue indicate that the verses are somewhat related. Meanwhile, dark blue sections indicate pairs of verses that are not semantically related.
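A similarity matrix like this can be computed in a few lines of NumPy. This is a minimal sketch that uses random vectors as a stand-in for the real 915 x 384 verse embeddings; `cosine_similarity_matrix` is an illustrative name, not something from the original project.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return an (n, n) matrix whose entry [i, j] is the cosine similarity of rows i and j."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms  # each row now has length 1
    return unit @ unit.T


# Stand-in data: random vectors in place of the real verse embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(5, 384))
sim = cosine_similarity_matrix(fake_embeddings)
```

Every verse compared with itself scores 1, which is why the diagonal of the heatmap is the brightest line.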
With different clustering techniques, we can use these embeddings to measure similarity and group communities of verses.
The video that inspired me to do this project used the Leiden algorithm to sort a network of nodes (articles) and edges (measures of similarity) into clusters. This algorithm essentially finds groups of nodes that are more densely connected to each other than to the rest of the network. Each such group forms a community. I constructed the following graph by computing the cosine similarity between each pair of verses and connecting verses whose score was over a certain threshold. Luckily, there are Python libraries that implement this algorithm, so the computation was pretty easy. The algorithm identified the following communities within Proverbs:
[Interactive graph visualization: Leiden communities within Proverbs]
This graph is fun to explore. Community 1 seems to be focused on the Wise vs. the Fool and how each character behaves. Community 12 is much smaller and features one of my favourite characters: the sluggard.
When tuning this algorithm's parameters, I noticed there were consistently disproportionately large communities, such as Communities 1 and 2. As well, it only ran on the largest connected network and excluded the other networks, leaving out about 150 verses. This is a fun visualization, but let's move on to other ways of visualizing Proverbs' themes.
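To see why a similarity threshold can leave verses out, here is a small sketch of the graph-building step on a toy similarity matrix. It only builds the thresholded graph and finds its connected pieces; it does not run the Leiden algorithm itself, and the helper names and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np


def threshold_edges(sim: np.ndarray, threshold: float):
    """List (i, j) pairs with i < j whose similarity is above the threshold."""
    upper = np.triu(np.ones_like(sim, dtype=bool), k=1)  # skip diagonal and mirrored pairs
    rows, cols = np.where(upper & (sim > threshold))
    return list(zip(rows.tolist(), cols.tolist()))


def connected_components(n_nodes, edges):
    """Group node indices into connected components using union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for node in range(n_nodes):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values(), key=len, reverse=True)


# Toy similarity matrix: verses 0-2 are mutually similar, 3-4 are a pair, 5 is isolated.
sim = np.eye(6)
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    sim[i, j] = sim[j, i] = 0.8

edges = threshold_edges(sim, threshold=0.5)
components = connected_components(len(sim), edges)
largest = components[0]             # the only part a largest-component run would see
left_out = len(sim) - len(largest)  # verses excluded from community detection
```

In this toy case half of the verses never reach the community detection step, which mirrors the roughly 150 verses left out above. Lowering the threshold connects more verses, at the cost of a denser graph with weaker edges.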
A hierarchical graph is like a family tree. This graph is called a dendrogram (Greek: dendron (tree) + gramma (drawing/writing)). Using the same embeddings, I took a bottom-up approach, first grouping together the most similar pairs, and continually linking sub-groups that are similar (or rather the least different) in theme. This is called Ward's method. Luckily, SciPy has functions for this algorithm, so it was really just plug-and-play to find a good visualization with enough sub-groups.
How to use the graph: Zoom and pan to explore. Each colored branch represents a cluster of thematically similar verses. The specificity of these themes is controlled by the "Height" of the cut, which can be adjusted with the slider. Click on any branch node to collapse or expand it. Hover over leaf nodes to see the verse text.
[Interactive dendrogram visualization]
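The bottom-up grouping described above can be sketched with SciPy's hierarchy module. The data here is a stand-in (three tight groups of 2D points rather than the verse embeddings), and the cut heights are arbitrary values chosen for the toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Stand-in data: three tight groups of points in place of the real verse embeddings.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.1, size=(4, 2)) for c in centers])

# Each row of Z records one merge: the two groups joined,
# the height of the merge, and the size of the new group.
Z = linkage(points, method="ward")

# Cutting the tree at a given height yields flat clusters;
# a lower cut gives more, smaller groups.
labels_coarse = fcluster(Z, t=5.0, criterion="distance")
labels_fine = fcluster(Z, t=0.01, criterion="distance")
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree, and the `t` argument plays the same role as a height cut on that drawing.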
I found this visualization and algorithm helpful for finding smaller subgroups within Proverbs. For example, there are subgroups with 5-8 verses each talking about scorners, rain, birds, and patience, respectively. However, when you look at subgroups of 30+ verses, it can be harder to see the thematic connection. There are exceptions. For example, I found a major subgroup of 50+ verses with references to nature (plants, weather, animals, etc.). These are interesting and maybe with further investigation I could categorize some medium-sized subgroups. But for now, let's move on.
The standard and easiest method of clustering I knew was k-means clustering. With this algorithm, the number of clusters is chosen in advance and the algorithm assigns each verse to a cluster so that the clusters are as dense as possible.
I tried 7 through 40 clusters and recorded metrics for each one. These metrics included silhouette scores, which measure how well each verse fits its own cluster compared with the nearest other cluster. A score near -1 implies that verses sit closer to a neighbouring cluster than to their own, while a score near +1 indicates tight, well-separated clusters. Most clusters had a score between 0.1 and 0.4. This can be explained by:
I chose a value of k=20, making 20 communities within Proverbs. This parameter gave one of the highest silhouette scores and it was a nice round number. After inspection, the 20 clusters generally had consistent topics.
Below is a graph of the size distribution of the communities. They seem to be roughly the same size, with a few natural outliers remaining.
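Scanning over the number of clusters and comparing silhouette scores can be sketched with scikit-learn as follows. The data is a stand-in (three well-separated blobs of 2D points), the range of k is shortened to suit it, and the mean silhouette score is only one possible way to pick k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in data: three well-separated blobs in place of the real verse embeddings.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for k in range(2, 7):  # the article scanned 7 through 40 on the real data
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)  # mean score over all points

best_k = max(scores, key=scores.get)
final_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(points)
cluster_sizes = np.bincount(final_labels)  # number of points in each cluster
```

On clean toy data the best score is close to 1. Short verses with overlapping themes are far less separable, so scores in the 0.1 to 0.4 range are not surprising there.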
After running the k-means algorithm, I had 20 groups of verses. I titled each group based on the verses it contained. Here are the methods I used and their results:
For a quick sanity check, let's see how each cluster's Claude title compares to the most frequent lexical words found in that cluster.
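A word-frequency check of this kind needs only the standard library. The sketch below counts the most common non-stopword words in a cluster. The stopword list is a tiny illustrative one and `top_words` is a made-up helper name; the three verses are taken from the cluster listed at the end of this post.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "of", "a", "an", "is", "are", "but", "and", "to",
             "in", "will", "be", "for", "his", "he", "with", "no"}


def top_words(verses, n=3):
    """Return the n most frequent non-stopword words across a cluster's verses."""
    counts = Counter()
    for verse in verses:
        words = re.findall(r"[a-z']+", verse.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


cluster = [
    "The memory of the righteous is a blessing, but the name of the wicked will rot.",
    "The hope of the righteous brings joy, but the expectation of the wicked will perish.",
    "No ill befalls the righteous, but the wicked are filled with trouble.",
]
print(top_words(cluster, n=2))  # ['righteous', 'wicked']
```

If the top words line up with the cluster's title, that is a cheap signal the title is not far off.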
Next, I wanted to inspect them with a 3D visualization. I explored UMAP, which projects the 384-dimensional embeddings into 3 dimensions while trying to preserve local neighborhood structure.
Here’s the resulting 3D UMAP scatter plot:
[Interactive 3D UMAP visualization]
[Interactive cluster explorer]
This was a weekend project and the core goal was to learn some new things. If anyone reading this is inspired to dig deeper into this dataset, please feel free to reach out to @sebmen7.
Verses shown in the cluster explorer:
who rejoice in doing evil and delight in the perverseness of evil,
but the wicked will be cut off from the land, and the treacherous will be rooted out of it.
Do not be afraid of sudden terror or of the ruin of the wicked, when it comes,
For they eat the bread of wickedness and drink the wine of violence.
But the path of the righteous is like the light of dawn, which shines brighter and brighter until full day.
The way of the wicked is like deep darkness; they do not know over what they stumble.
The iniquities of the wicked ensnare him, and he is held fast in the cords of his sin.
To do righteousness and justice is more acceptable to the Lord than sacrifice.
Haughty eyes and a proud heart, the lamp of the wicked, are sin.
The violence of the wicked will sweep them away, because they refuse to do what is just.
The Righteous One observes the house of the wicked; he throws the wicked down to ruin.
When justice is done, it is a joy to the righteous but terror to evildoers.
The wicked is a ransom for the righteous, and the traitor for the upright.
The sacrifice of the wicked is an abomination; how much more when he brings it with evil intent.
Condemnation is ready for scoffers, and beating for the backs of fools.
The devising of folly is sin, and the scoffer is an abomination to mankind.
for the righteous falls seven times and rises again, but the wicked stumble in times of calamity.
Fret not yourself because of evildoers, and be not envious of the wicked,
for the evil man has no future; the lamp of the wicked will be put out.
Whoever says to the wicked, “You are in the right,” will be cursed by peoples, abhorred by nations,
but those who rebuke the wicked will have delight, and a good blessing will come upon them.
When wickedness comes, contempt comes also, and with dishonor comes disgrace.
It is not good to be partial to the wicked or to deprive the righteous of justice.
In the house of the righteous there is much treasure, but trouble befalls the income of the wicked.
The sacrifice of the wicked is an abomination to the Lord, but the prayer of the upright is acceptable to him.
The way of the wicked is an abomination to the Lord, but he loves him who pursues righteousness.
The thoughts of the wicked are an abomination to the Lord, but gracious words are pure.
The heart of the righteous ponders how to answer, but the mouth of the wicked pours out evil things.
The Lord is far from the wicked, but he hears the prayer of the righteous.
When the righteous increase, the people rejoice, but when the wicked rule, the people groan.
When the wicked increase, transgression increases, but the righteous will look upon their downfall.
An unjust man is an abomination to the righteous, but one whose way is straight is an abomination to the wicked.
The wicked flee when no one pursues, but the righteous are bold as a lion.
Those who forsake the law praise the wicked, but those who keep the law strive against them.
Evil men do not understand justice, but those who seek the Lord understand it completely.
When the righteous triumph, there is great glory, but when the wicked rise, people hide themselves.
When the wicked rise, people hide themselves, but when they perish, the righteous increase.
The house of the wicked will be destroyed, but the tent of the upright will flourish.
The evil bow down before the good, the wicked at the gates of the righteous.
Do they not go astray who devise evil? Those who devise good meet steadfast love and faithfulness.
The wicked is overthrown through his evildoing, but the righteous finds refuge in his death.
Righteousness exalts a nation, but sin is a reproach to any people.
The Lord has made everything for its purpose, even the wicked for the day of trouble.
Better is a little with righteousness than great revenues with injustice.
He who justifies the wicked and he who condemns the righteous are both alike an abomination to the Lord.
The wicked accepts a bribe in secret to pervert the ways of justice.
To impose a fine on a righteous man is not good, nor to strike the noble for their uprightness.
The righteous hates falsehood, but the wicked brings shame and disgrace.
Righteousness guards him whose way is blameless, but sin overthrows the wicked.
The light of the righteous rejoices, but the lamp of the wicked will be put out.
A desire fulfilled is sweet to the soul, but to turn away from evil is an abomination to fools.
The righteous has enough to satisfy his appetite, but the belly of the wicked suffers want.
No one is established by wickedness, but the root of the righteous will never be moved.
The thoughts of the righteous are just; the counsels of the wicked are deceitful.
The words of the wicked lie in wait for blood, but the mouth of the upright delivers them.
The wicked are overthrown and are no more, but the house of the righteous will stand.
Whoever is righteous has regard for the life of his beast, but the mercy of the wicked is cruel.
Whoever is wicked covets the spoil of evildoers, but the root of the righteous bears fruit.
No ill befalls the righteous, but the wicked are filled with trouble.
One who is righteous is a guide to his neighbor, but the way of the wicked leads them astray.
Treasures gained by wickedness do not profit, but righteousness delivers from death.
The Lord does not let the righteous go hungry, but he thwarts the craving of the wicked.
Blessings are on the head of the righteous, but the mouth of the wicked conceals violence.
The memory of the righteous is a blessing, but the name of the wicked will rot.
The mouth of the righteous is a fountain of life, but the mouth of the wicked conceals violence.
The wage of the righteous leads to life, the gain of the wicked to sin.
The tongue of the righteous is choice silver; the heart of the wicked is of little worth.
What the wicked dreads will come upon him, but the desire of the righteous will be granted.
When the tempest passes, the wicked is no more, but the righteous is established forever.
The fear of the Lord prolongs life, but the years of the wicked will be short.
The hope of the righteous brings joy, but the expectation of the wicked will perish.
The righteous will never be removed, but the wicked will not dwell in the land.
The lips of the righteous know what is acceptable, but the mouth of the wicked, what is perverse.
The righteousness of the blameless keeps his way straight, but the wicked falls by his own wickedness.
The righteousness of the upright delivers them, but the treacherous are taken captive by their lust.
The righteous is delivered from trouble, and the wicked walks into it instead.
When it goes well with the righteous, the city rejoices, and when the wicked perish there are shouts of gladness.
By the blessing of the upright a city is exalted, but by the mouth of the wicked it is overthrown.
The wicked earns deceptive wages, but one who sows righteousness gets a sure reward.
Whoever is steadfast in righteousness will live, but he who pursues evil will die.
Be assured, an evil person will not go unpunished, but the offspring of the righteous will be delivered.
The desire of the righteous ends only in good, the expectation of the wicked in wrath.
If the righteous is repaid on earth, how much more the wicked and the sinner!
for my mouth will utter truth; wickedness is an abomination to my lips.
All the words of my mouth are righteous; there is nothing twisted or crooked in them.