I recently collected 28 state party platforms that were published in 2020. This is far fewer than the 100 there would be if every state party issued a platform, but around 30 state parties never issue platforms and many others issue platforms only every four years, during midterm elections. So, while 28 is fewer than usual, it still represents a good sample of the party debates that occurred during the last year.
To analyze the texts, I used Quanteda, which I highly recommend. It’s flexible, and I have found it slightly easier to use than tidytext. Of course, regardless of which package one uses, it is really important to clean the text first. In addition to the base set of words returned by stopwords, I created my own list of stopwords and removed both using the argument:
remove = c(stopwords("english"), mystopwords)
This is really important because when implementing Wordfish or analyzing differences in word usage, state names will greatly affect the calculations: the proper names will be flagged as highly unusual and mistaken for an ideological signal. My own list starts like this:
mystopwords <- c("california", "state", "nebraska", "democrat", …)
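For readers who want to try the cleaning step, here is a minimal sketch rather than the exact code behind this post. It assumes a recent Quanteda release, where (as I understand it) stopword removal is usually done on the tokens with tokens_remove instead of through a remove argument. The two toy texts and the contents of mystopwords are made up for illustration.

```r
library(quanteda)

# Toy stand-ins for platform texts; the real platforms are not reproduced here.
platforms <- c(
  ca_dem = "The California Democratic Party supports affordable housing and access to care.",
  ne_rep = "The Nebraska Republican Party opposes federal limits on property and liberty."
)

# Hypothetical custom list: state and party names that would otherwise
# look like an ideological signal.
mystopwords <- c("california", "nebraska", "state", "democrat",
                 "democratic", "republican", "party")

# Tokenize, lowercase, then drop both the base English list and the custom list.
toks <- tokens(platforms, remove_punct = TRUE, remove_numbers = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = c(stopwords("english"), mystopwords))

# Build the document-feature matrix used by the later analyses.
dfmat <- dfm(toks)
featnames(dfmat)
```

Checking featnames(dfmat) after each addition to the custom list is a quick way to see which proper names are still slipping through.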
First off, here are the obligatory word clouds (word clouds have been called the mullets of text analysis):


Looking closely, many words aren’t very meaningful: “support”, etc. This is why I think it’s important to really clean the text first, even for really basic analysis. One can think of this in two ways: frequent occurrence of neutral words can be a signal of a moderate platform filled with platitudes. On the other hand, as I have pointed out in some of the posts below, words that appear infrequently can communicate volumes, but they will be buried in a pile of noise.
Instead, I have used a keyness plot from Quanteda to highlight the greatest differences in vocabulary between the parties. This seems to communicate the usual differences between the parties. Democrats are focused on social welfare, as nearly all of these words focus on policies and/or beliefs that call for government intervention to reduce inequality: “access”, “afford”, “invest”, “care”, “insure” and “sustain” along with more specific policy words such as “disability”, “housing”, “climate” and “infrastructure.”

In contrast, Republicans focus on “freedom from”, using words such as “limit”, “resolve”, “liberty” and “oppose”, and policy words such as “illegal” and “property”. Interestingly, along with traditionally conservative words reflecting a belief in traditional authority (“god”, “parent”, and “authority”), Republicans also tend to focus on the legal structure of the state: “article”, “amend”, “federal”, “legislature”, “govern”, and “constitution.”
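A keyness comparison like the one described above can be sketched as follows. This is an illustration under assumptions, not the code behind the plots: the four toy texts and the party labels are invented, and it relies on textstat_keyness from the quanteda.textstats package, with the plot itself left to textplot_keyness from quanteda.textplots.

```r
library(quanteda)
library(quanteda.textstats)

# Invented mini-corpus: two "Democratic" and two "Republican" snippets.
texts <- c(
  d1 = "invest in housing and care access for all",
  d2 = "access to care and climate investment",
  r1 = "limit federal authority and defend liberty and property",
  r2 = "the constitution limits the legislature and protects liberty"
)
corp <- corpus(texts,
               docvars = data.frame(party = c("Dem", "Dem", "Rep", "Rep")))

# Clean, then collapse the documents into one row per party.
toks <- tokens_remove(tokens(corp, remove_punct = TRUE), stopwords("english"))
dfmat_party <- dfm_group(dfm(toks), groups = party)

# Keyness of each word for the Republican group against the Democratic group.
key <- textstat_keyness(dfmat_party, target = "Rep")
head(key)

# quanteda.textplots::textplot_keyness(key) would draw the two-sided bar chart.
```

On a real corpus, stemming the matrix first (for example with dfm_wordstem) would merge forms such as “limit” and “limits” before the comparison.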

The 2020 platforms were quite polarized, although this is nothing new compared to previous years (see posts below). I used Wordfish to estimate the ideology of each platform. These scores should be taken as rough estimates: the estimation is completely unsupervised, and further text cleaning is still necessary. But, not surprisingly, the gap between the parties is wide, as the most conservative Democratic platform (South Carolina) is still much more liberal than the most liberal Republican platform (Arkansas). The North Dakota platform stands out as unusually conservative; this is confirmed by reports over the summer that the ND GOP was largely disassociating itself from the platform due to fairly extreme planks.
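For anyone curious how a Wordfish estimate is produced, here is a minimal sketch, assuming the textmodel_wordfish function from the quanteda.textmodels package. The four toy documents are invented and far too small to say anything substantive; the point is only to show the shape of the call. The dir argument fixes the direction of the scale by naming two documents, the first of which should score lower than the second.

```r
library(quanteda)
library(quanteda.textmodels)

# Invented word bags: d1/d2 lean on "Democratic" words, r1/r2 on "Republican" ones.
texts <- c(
  d1 = "care care housing access climate invest liberty",
  d2 = "care housing housing access invest climate property",
  r1 = "liberty liberty property constitution federal care",
  r2 = "liberty property property constitution federal limit housing"
)
dfmat <- dfm(tokens(texts))

# Unsupervised scaling; dir = c(1, 3) asks for d1 to sit to the left of r1.
wf <- textmodel_wordfish(dfmat, dir = c(1, 3))

# One estimated position (theta) per document.
data.frame(doc = docnames(dfmat), theta = wf$theta)
```

Because the model only sees word counts, any state or party names left in the matrix will pull these positions around, which is why the stopword list matters so much here.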
Due to the pandemic, many state parties altered the process for drafting and approving their platforms. I haven’t had the chance yet to really explore the effects of this, but it may have affected parties that draft planks from the precincts or through in-person caucuses. Many state parties did not hold in-person caucuses this year (although caucus states had begun moving towards primaries after 2016), and few parties had in-person conventions (although Democratic state parties were more likely to hold virtual conventions than Republicans).