How to Create Beautiful Word Clouds in Python


Create a Basic Word Cloud

To create a basic word cloud (or any word cloud in Python), you will need the following libraries:

Method 1: generate_from_text

There are two main ways to build a word cloud. The first and simplest method is to create the word cloud from a corpus of text, such as an article or a book. This corpus should be in the form of a string.

In the example below, I have taken a list of attractions from TripAdvisor in the city of Rome. I will group them all into one body of text (a corpus), and then create a basic word cloud.

Example attractions in Rome from TripAdvisor | image by Author

Before I grouped the attraction texts, I did a little cleaning by lowercasing, removing basic stop words (e.g. “a”, “the”, “is”), and lemmatizing. I did the same for a list of 12 cities (specifically, 12 of TripAdvisor’s Top World Destinations for 2021). Rome’s grouped corpus is highlighted in the DataFrame below.

Grouped, cleaned, and lemmatized attractions per city | image by Author

As I mentioned before, we will create the word cloud from the text corpus for Rome. To do that, we will isolate that one corpus by taking a slice of this grouped DataFrame:
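As a minimal sketch of that slice, assuming the grouped DataFrame is indexed by city and keeps the text in a column called "corpus" (both names, and the sample text, are hypothetical stand-ins for the real data):

```python
import pandas as pd

# Hypothetical stand-in for the grouped DataFrame: one row per city,
# with the cleaned attraction text joined into a single string.
grouped = pd.DataFrame(
    {
        "city": ["Rome", "Paris"],
        "corpus": [
            "colosseum private tour vatican museum skiptheline ticket",
            "eiffel tower louvre museum seine river cruise",
        ],
    }
).set_index("city")

# Slice out the single corpus for Rome; the result is a plain string.
rome_corpus = grouped.loc["Rome", "corpus"]
print(rome_corpus)
```

The important part is that rome_corpus ends up as one string, since that is what generate_from_text expects.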

Rome text corpus | image by Author

From here, we can create the basic word cloud. We will start by instantiating the WordCloud object from the wordcloud library, and use the generate_from_text method to feed it our text corpus. Finally, we will use plt.imshow() to display the WordCloud object. Be sure to use plt.axis('off') to make sure it only displays the word cloud, not axes and their values.

The resulting word cloud is below. We can see that by default, the word cloud includes bi-grams (pairs of words) alongside single words. If needed, we can turn this off when we instantiate the WordCloud object by setting the parameter collocations=False.

Basic Rome Word Cloud (from text) | image by Author

Method 2: generate_from_frequencies

The second method is to create a word cloud from a document term matrix. This is a commonly used matrix in NLP, which has a separate column for each word in the corpus vocabulary and a separate row for each document, with each cell holding the frequency of that word in that document. For example, below I have depicted the first few columns of a document term matrix (dtm) for the 12 cities I showed before. Notice that the dimensions of the dtm are 12 x 8676, indicating the 12 cities and the 8,676 words in the entire corpus vocabulary.

Document Term Matrix (dtm) using Count Vectorization | image by Author

If you have a document term matrix, you can easily feed this data into the word cloud object using the .generate_from_frequencies() method. First, we will need to isolate the data we want to use for Rome. We will need to transpose this matrix for it to be in the correct format for the word cloud. We also want to get the most frequent words, so we will sort the values in descending order. You can see an example of the data we want to use in the image below.

image by Author
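The isolate, transpose, and sort steps can be sketched like this. The miniature dtm and its counts are hypothetical, and I am assuming the cities are the row index.

```python
import pandas as pd

# Hypothetical miniature dtm: rows are cities, columns are vocabulary words.
dtm = pd.DataFrame(
    {"colosseum": [3, 0], "museum": [5, 4], "tour": [8, 2], "tower": [0, 6]},
    index=["Rome", "Paris"],
)

# Keep only the Rome row, transpose so that each word becomes a row,
# then sort so the most frequent words come first.
rome_freqs = dtm.loc[["Rome"]].T.sort_values(by="Rome", ascending=False)
print(rome_freqs.head())
```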

To actually create the word cloud, we will use pretty much the same code as above, but use the generate_from_frequencies method instead.

The resulting word cloud is below. Notice that because we isolated single words (not bi-grams) in the data, we did not need to tell the WordCloud object to turn off the ‘collocations’ parameter. It automatically had the vectorized words as single words instead of bi-grams.

Basic Rome Word Cloud (from frequencies) | image by Author

Finally, now that we understand how these word clouds are made, we can manipulate some of the parameters to create a nicer version of our basic word cloud. Let’s go back to our first example with the rome_corpus variable (generating a word cloud from text). Notice that words like ‘private tour’ and ‘skiptheline’ come up as some of the most frequent words. We can give our word cloud a custom stop word list to get rid of these. I will also customize the dimensions of the word cloud, and make the entire figure bigger with a figsize parameter. We will change the colormap and add a title. See all these changes in the function below.

Remember that you can always change the function above to use the generate_from_frequencies method instead! The resulting word cloud will look like this:

Final Basic Word Cloud | image by Author

Wow! Looks much better. It’s still very basic though. Let’s turn it up a notch by changing the word cloud shapes with masks.