Intro


Selected songs: 15

Experts curated our playlist: 3

Participants: 119

LCA classes: 3


Our Research

Listen to this short music snippet:


Do you find this snippet beautiful? Are you familiar with the song? Do you think this song fits with your overall genre preferences?

We are curious to know which factors influence the perception of beauty the most. After listening to the song, can you guess what those factors are? This question is at the heart of our research. Click through the website to check whether your predictions are accurate!

Exploring Beauty Through Different Characteristics


Procedure

To compile suitable stimuli, a pre-test was conducted. After choosing 15 stimulus snippets, we created a Qualtrics survey. First, participants answered basic background questions, namely age, gender and country of origin. Next, they completed a short version of the Gold-MSI test. Then, participants were presented with the 15 songs in a random order. For each snippet, they had to choose whether they found it beautiful or not, and they were asked whether they were familiar with it. Participants could listen to each snippet as many times as they liked. Finally, they had to choose their three favourite genres from a STOMP selection. All in all, we collected data from 119 participants.

This procedure allows us to divide people into classes by their musical preferences, and then check whether the classes differ significantly in their characteristics.


Characteristics of the Sample

In this section, we will analyse the graphs to determine the characteristics of our sample.

Before we begin with the analysis, we would like to mention that our data might be somewhat biased, as our participants are predominantly women in their 20s.

Countries of Origin

Click on the tab Countries of Origin on the right to see the graph

As you can see from the graph, the largest share of our participants is from the Netherlands. Nevertheless, we are proud to mention that we gathered a diverse group of participants. The respondents come from 26 different countries all over the world, spanning Asia, Africa, Europe and America. Most of our participants are either European or Asian.

Genre Preferences

Click on the tab Genre Preferences on the right to see the graph

To assess genre preferences, we asked participants to pick their three favourite genres. The majority of our sample favours pop music. This might be due to the fact that our participants are predominantly in their 20s. Classical and rock are tied for second place. Bluegrass proved to be the least favourite genre. One plausible explanation is that bluegrass is less common overall than the other genres, and thus unfamiliar to the majority of participants.

STOMP

Click on the tab STOMP on the right to see the graph

The Short Test of Music Preference (STOMP) is designed to assess music preferences that are related to personality variables, self-views and cognitive abilities. The test consists of 4 categories:

  1. Reflective & Complex: classical, blues, folk, jazz, etc.

  2. Intense & Rebellious: alternative, rock, heavy metal, punk, etc.

  3. Upbeat & Conventional: country, religious, pop, soundtracks, etc.

  4. Energetic & Rhythmic: electronica, rap, soul, funk, etc.

The genre preferences were divided according to these groups.
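The grouping step described above can be sketched in a few lines of Python. This is only an illustration, not the analysis code used in the project: the mapping contains just the genres named in the list above (the full STOMP covers more), and the function name and example answers are hypothetical.

```python
from collections import Counter

# Genre-to-category mapping, limited to the genres named in the list above.
# Assumption: the real STOMP list is longer; unknown genres raise a KeyError here.
STOMP_CATEGORIES = {
    "classical": "Reflective & Complex",
    "blues": "Reflective & Complex",
    "folk": "Reflective & Complex",
    "jazz": "Reflective & Complex",
    "alternative": "Intense & Rebellious",
    "rock": "Intense & Rebellious",
    "heavy metal": "Intense & Rebellious",
    "punk": "Intense & Rebellious",
    "country": "Upbeat & Conventional",
    "religious": "Upbeat & Conventional",
    "pop": "Upbeat & Conventional",
    "soundtracks": "Upbeat & Conventional",
    "electronica": "Energetic & Rhythmic",
    "rap": "Energetic & Rhythmic",
    "soul": "Energetic & Rhythmic",
    "funk": "Energetic & Rhythmic",
}


def count_stomp_categories(favourite_genres):
    """Count how often each STOMP category occurs in a list of genre names."""
    return Counter(STOMP_CATEGORIES[genre.lower()] for genre in favourite_genres)


# Hypothetical answers from two participants (three favourite genres each).
example_answers = ["Pop", "Rock", "Classical", "Pop", "Jazz", "Soundtracks"]
counts = count_stomp_categories(example_answers)
```

Counting category occurrences in this way gives the kind of totals that a bar chart per STOMP category would display.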

As can be seen, the Upbeat & Conventional category is the most liked. Genres from the Energetic & Rhythmic category are the least preferred by the participants.

Musical Sophistication and Genre Preference

Click on the tab Musical Sophistication and Genre Preference on the right to see the graph

To test individual differences in musical sophistication, we used the short version of Goldsmiths Musical Sophistication Index (Gold-MSI). The following 5 aspects are measured using a self-report questionnaire:

  1. active musical engagement: the amount of time and resources spent on music;

  2. self-reported perceptual abilities: the accuracy of musical listening skills;

  3. musical training: the amount of formal musical training received;

  4. self-reported singing abilities: the accuracy of singing;

  5. sophisticated emotional engagement with music: the ability to talk about the emotions that music expresses.

According to the test, the higher the overall score (on a scale from 18 to 126), the more musically sophisticated the person is. To see whether musical sophistication affects genre preferences, we looked at the distribution of the Gold-MSI scores across the STOMP groups.

From the graph, we can infer that all 4 categories have very similar median scores. The highest Gold-MSI median is in the Reflective & Complex category, but the differences between categories are not significant.


Countries of Origin

Genre Preferences

STOMP

Musical Sophistication and Genre Preference

Background Information


Musical Sophistication and Beauty

The majority of people would agree that perception of beauty, especially in music, is a highly subjective phenomenon. Or is it? In our research, we aim to explore the relationship between beauty assessment and musical sophistication.

Previous research on aesthetics and music mainly focusses on personality traits and how they influence perception. For instance, awe is one of the most profound aesthetic experiences, often described as being touched, moved, fascinated and amazed. It was found that people who are more open to experience are more susceptible to awe-like states (Silvia et al., 2015). Interestingly, the study was conducted in 2 domains: both visual and auditory stimuli were used. Across both domains, openness to experience was the only factor that predicted a stronger experience of awe. One drawback of the methodology is that judgements were based on listening to only one song (‘Hoppípolla’ by Sigur Rós). Furthermore, although none of the participants understood Icelandic, the perceived ‘melodicity’, familiarity, etc. of the language could have affected the perception of the song (Jenkin, 2014). In the present study, we use only instrumental music to control for the language factor, and we use 15 music snippets as stimuli rather than a single song.

Usually, listening to music is an aesthetic experience that requires activation of not only affective, but also cognitive and evaluative processes. Studies have found that music expertise modulates the cortical processing of different aspects of music perception (e.g. Atienza et al., 2002; Bosnyak et al., 2004). Cognitive researchers (Müller et al., 2010) compared aesthetic judgements between experts and laypersons by using event-related potential (ERP) measurements. They found that when exposed to the same stimuli, experts’ and laypersons’ ERP measures systematically differed. We believe that if there is a difference in aesthetic judgements between experts and laypersons on the ‘brain’ level, then there should be an observable distinction on a more conscious level, too. As in the paper by Müller et al., we quantify beauty assessment by asking whether the piece is beautiful or not, rather than ‘do you like it?’, the question used in the majority of previous papers (e.g. Brattico & Jacobsen, 2009). In this way, the question becomes more linguistically sound and precise.

Sophistication is not the only factor that could potentially influence beauty perception. Genre preference might also have a profound impact on whether the participant finds a piece beautiful or not (Istók et al., 2013). The modernist view of music aesthetics (Burke & Gridley, 1990) supports the idea of a genre hierarchy. This theory states that complex music, such as jazz, is less popularly valued because of its high intellectual demand. Followers of the theory would argue that jazz is a genre that can be comprehended and appreciated only by musically sophisticated individuals. For this reason, our stimuli include a variety of genres, for instance jazz (Drama in Six Notes), bluegrass (Less is Moi) and electronic (syro u473t8+e [141.98]). In addition, we will check whether certain genre preferences correspond with higher musical sophistication scores.

Inspired by the aforementioned literature and also personal experiences, we would like to see if musical sophistication influences the perception of beauty. We hypothesize that higher scores for music sophistication will align with higher scores for beauty. Furthermore, we will analyze the potential correlation between said scores and genre preference.

Article Review: Are musicians particularly sensitive to beauty and goodness?

In the article “Are Musicians Particularly Sensitive to Beauty and Goodness?” (Güsewell & Ruch, 2014), the degree and form of participants’ musical practice is compared with their responsiveness to artistic, natural and non-aesthetic beauty and goodness. This was examined using self-report and stimulus-based instruments.

It was found that professional musicians had the highest scores in responsiveness to artistic beauty, experience seeking, and absorption compared to the other groups. The amateur musicians scored highest on overall responsiveness, responsiveness to non-aesthetic goodness and responsiveness to nature. This supports the hypothesis that there is a link between sensitivity to beauty, goodness and musical practice.

From the data, the researchers concluded that responsiveness to beauty was related to the degree of involvement in musical practice. They suggested that the opportunity to express oneself artistically was needed for a balanced responsiveness-to-beauty profile. The groups that scored highest in responsiveness to beauty were believed to have more opportunities to express themselves (amateurs and soloists) or to take part in musical activities where strong expressive and artistic involvement was needed (high-level orchestra musicians).

However, the participants were not grouped based on the opportunity for expression through music. Thus, based on the results of soloists and amateur musicians it is not yet possible to conclude that personal interpretation of music increases responsiveness to beauty. In our study, we focus on the musical sophistication of a larger sample, including participants with both low and high scores. Such a sample might provide a more in-depth look at the relationship between musical experience and perception of beauty.


Contact Information

This research project was conducted as part of the Honours course “The Data Science of Everyday Music Listening” at the University of Amsterdam (coordinated by dhr. dr. John Ashley Burgoyne).

Research team members:

  • Xiaoqing Li

  • Esther Liefting

  • Willem Pleiter

  • Denise Quek

  • Nikita van ’t Rood

  • Kristina Savickaja

If you have any questions or comments, please contact us at

Compiling the Playlist


Rating the Songs

To create the stimuli, we searched online for existing datasets of musical pieces that had been dissected according to their musical components. Since the search yielded no results, each member of the group supplied, via Spotify, 5 instrumental songs that they found beautiful. The chosen songs had to be instrumental to control for the influence of language. Afterwards, the compiled songs were evaluated by 3 musical experts (10+ years of formal musical training). The experts rated 30-second song snippets on a 10-point Likert scale on the following criteria:

  • Melody: overall presence and dominance of melody, very unmelodious (1) - very melodious (10)

  • Tonalness: overall tonalness of the composition, very atonal (1) - very tonal (10)

  • Articulation: the rhythmic articulation of the song, very staccato (1) - completely legato (10)

  • Intensity: overall loudness, crescendos and decrescendos in a song, pianissimo (1) - fortissimo (10)

  • Pitch: overall distribution of the pitch, all bass (1) - all treble (10)

  • Rhythmic Clarity: overall presence of a pulse, very vague (1) - very firm (10)

  • Tempo: the general pulse of the song, very slow (1) - very fast (10)

  • Rhythmic Complexity: the extent to which different meters, odd tempo or complex rhythmic patterns are utilized, very simple (1) - very complex (10)

  • Mode: overall mode and feel of the song, minor (1) - major (10)

This method is based on the evaluation system used by Aljanaki et al. (2016). The final 15 stimulus songs were selected based on A) Feature Representability and B) Reliability.


A) Feature Representability
The panel on the right is interactive: hover over a point with your mouse to find out more.

The combined boxplot and jitterplot shows the overall distribution of the characteristics of the songs. While the boxplot represents the feature values of all 30 songs, the jitterplot illustrates the feature values of the 15 chosen stimulus songs.

As can be seen from the jitterplot, our final selection covers a large part of the range for most parameters.

Expert Rating per Song


Reliability

To finalize our selection of stimulus songs, we first estimated the reliability of the expert ratings per song. To do this, we computed the distances between the experts’ evaluations. For example, each evaluator rated a song on Tempo. If the first rater gave it a 5, the second rater a 6 and the third a 7, we take the distance between the first and the second rater (6 - 5 = 1), between the second and the third rater (7 - 6 = 1) and between the first and the third rater (7 - 5 = 2). The sum of these differences (1 + 1 + 2 = 4) provides an estimate of the consensus for Tempo. This process was repeated for all components of each song. Then, the reliability scores of all components were summed to give an estimate of the overall reliability of the song. The table on the right shows these scores for all 30 songs.
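The distance computation above can be written as a small Python sketch. It is an illustration rather than the code used in the project: the function names are hypothetical, and it assumes the distance between two raters is the absolute difference of their ratings, as the Tempo example suggests.

```python
from itertools import combinations


def consensus_distance(ratings):
    """Sum of absolute pairwise differences between the experts' ratings."""
    return sum(abs(a - b) for a, b in combinations(ratings, 2))


def song_reliability(ratings_per_component):
    """Sum the consensus distances over all rated components of one song."""
    return sum(consensus_distance(r) for r in ratings_per_component.values())


# Tempo example from the text: three raters give 5, 6 and 7.
tempo_distance = consensus_distance([5, 6, 7])  # 1 + 1 + 2 = 4

# Hypothetical song with two components; perfect agreement on Mode adds 0.
example_song = {"Tempo": [5, 6, 7], "Mode": [3, 3, 3]}
example_reliability = song_reliability(example_song)
```

A lower total means the three experts agreed more closely on that song.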

As can be seen from the table, the reliability scores range between 26 and 64, with a lower score representing stronger consensus. Based on these scores, we set a cut-off point of below 45 and selected the final stimuli.

However, upon examining this initial selection, it became apparent that the distribution of melody ratings was skewed in favour of very melodious songs. Furthermore, there was a lack of atonal songs. To tackle these issues, we decided to discard The Kiss (reliability score of 42) and add Drama In Six Notes (reliability score of 46) instead.

Reliability Scores per Song

Songs


Snippets

Blueming

Bygone Bumps

Cia Pat

Decision (Price of Love)

Elysium

Firth Of Fifth

Less Is Moi

Married Life

Resolver

Scarface Theme

Single Petal Of A Rose

Song For A New Beginning

syro u473t8+e

Šešių Natų Drama (Drama In Six Notes)

USA III Rail

Full Songs

Splitting the Sample


The LCA

Latent Class Analysis (LCA) is a psychometric method in which participants are grouped based on how likely they are to respond positively to a certain survey item. In our case, the item is whether the participant finds a song snippet beautiful or not.

After running the data from 119 participants through the LCA, either a 2- or a 3-class model was the best fit. Since only the 3-class model performed well on absolute fit, the 3-class model was selected. The table shows, for each class, the probability that a person belonging to that class judges a specific snippet to be beautiful (yes/no). For instance, for “Married Life”, a person belonging to the ‘Likers’ group has a 100% chance of rating the song beautiful, ‘Indifferents’ - 86% and ‘Dislikers’ - 20%. These conditional probabilities are available for all classes, so you can hover over the table to see which songs were generally very liked and which were disliked.

As you may have figured out, the names of the classes correspond with the number of songs that their members find beautiful. Thus, ‘Likers’ find the majority of snippets beautiful, ‘Indifferents’ are the middle group and ‘Dislikers’ find most of the songs not beautiful. ‘Likers’ comprised 38% of our sample, ‘Indifferents’ - 48% and ‘Dislikers’ - 14%.
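To show how the class shares and the conditional probabilities work together, the sketch below applies Bayes' rule to the “Married Life” figures quoted above. It is a simplified, single-item illustration: a fitted LCA model combines the answers to all 15 snippets, and the function name is hypothetical.

```python
def posterior_class_probabilities(priors, conditionals):
    """Bayes' rule: P(class | 'beautiful') from class shares and P('beautiful' | class)."""
    joint = {name: priors[name] * conditionals[name] for name in priors}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Class shares reported in the text.
class_shares = {"Likers": 0.38, "Indifferents": 0.48, "Dislikers": 0.14}

# Probability of rating "Married Life" beautiful, per class (from the text).
p_beautiful = {"Likers": 1.00, "Indifferents": 0.86, "Dislikers": 0.20}

posterior = posterior_class_probabilities(class_shares, p_beautiful)
```

Under these numbers, a participant who rated only this one snippet as beautiful would most likely be an ‘Indifferent’ or a ‘Liker’ and very unlikely a ‘Disliker’, simply because ‘Dislikers’ are both rarer and less likely to give that answer.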


LCA Class Table

Final Results


ANOVA post-hoc

On the second tab, the post-hoc results from our ANOVA are visible. Analysis of variance (ANOVA) is a method in which the means and variances of groups (in our case, classes) are compared to see whether the differences in Gold-MSI scores are significant (that is, unlikely to be due to naturally occurring random variation alone). If an ANOVA is significant, the null hypothesis that the groups do not differ from each other can be rejected, but the ANOVA itself will not tell you which groups are different. The assumptions of normality and equal variance were checked with the Shapiro-Wilk test and Levene’s test, and both were met (i.e. the p-values were not significant for either test).
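For readers who want to see what the ANOVA compares, here is a minimal Python sketch of the F-statistic for a one-way design. The function name and the toy scores are hypothetical and are not our data; a statistics package would additionally report the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F-statistic: between-group variance divided by within-group variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy scores for three made-up groups (not the study's Gold-MSI data).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(toy_groups)
```

A large F-value means that the group means lie further apart than the spread within the groups would lead us to expect.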

To find out which groups are different, so-called post-hoc tests can be used, which compare the groups pair by pair instead of all at once as the ANOVA does. A conventional method for post-hoc analysis is Tukey’s test (the results are on the second tab of the right panel). The plot can be interpreted as follows: the y-axis shows which comparison has been made (‘Indifferents’ with ‘Likers’, ‘Likers’ with ‘Dislikers’ and so on). The x-axis tells you the size of the difference. The plot contains 3 intervals, each a 95% confidence interval for the difference between two classes. This means that if we were to repeat our research many times and construct a confidence interval each time, 95% of those intervals would contain the true value of the difference between the classes. The intervals shown as red lines do not contain 0, which indicates a significant difference between those two classes.
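The "repeat the research many times" reading of a confidence interval can be checked with a small simulation. The sketch below is a simplification under stated assumptions: it uses a plain interval for the difference between two group means with a known standard deviation, not Tukey's adjusted intervals, and every number in it is made up.

```python
import random
from statistics import mean


def simulate_ci_coverage(true_diff=10.0, sigma=15.0, n=30, repeats=2000, seed=1):
    """Share of simulated 95% intervals that contain the true difference."""
    rng = random.Random(seed)
    # Half-width of a 95% interval for a difference of two means (known sigma).
    half_width = 1.96 * sigma * (2 / n) ** 0.5
    hits = 0
    for _ in range(repeats):
        group_a = [rng.gauss(true_diff, sigma) for _ in range(n)]
        group_b = [rng.gauss(0.0, sigma) for _ in range(n)]
        observed_diff = mean(group_a) - mean(group_b)
        if observed_diff - half_width <= true_diff <= observed_diff + half_width:
            hits += 1
    return hits / repeats


coverage = simulate_ci_coverage()
```

The returned share should land close to 0.95, which is what the 95% in the name of the interval refers to.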

Phrased more clearly, the plot shows evidence that the ‘Likers’ tend to have higher Gold-MSI scores than both the ‘Dislikers’ and the ‘Indifferents’, but there is no evidence of a difference between the ‘Dislikers’ and the ‘Indifferents’ (since that confidence interval contains zero). This supports our hypothesis. In the next section, we will illustrate how big these differences are exactly and examine whether any other characteristics can explain the differences found between groups.

Gold-MSI and Class

The boxplot on the right is interactive: hover over the boxes to find out more.

The Gold-MSI and Class boxplot illustrates the differences between the classes. ‘Dislikers’ have the lowest median Gold-MSI score at 62.00, followed by ‘Indifferents’ at 69.00. ‘Likers’ have the highest median value - 82.50. Furthermore, the ‘Likers’ group has the largest interquartile range, from 66.00 to 96.00, whereas ‘Indifferents’ and ‘Dislikers’ have similar ranges, at 56.50 to 82.50 and 52.50 to 80.25 respectively.

Overall, the ‘Likers’ group scores higher in all factors, while ‘Dislikers’ and ‘Indifferents’ have relatively similar scores. As previously shown by ANOVA and Tukey’s test, ‘Likers’ differ from both ‘Indifferents’ and ‘Dislikers’ in their Gold-MSI scores.

STOMP and Class

The STOMP and Class graph shows the relation between genre and class (‘Likers’, ‘Indifferents’, ‘Dislikers’).

The main conclusion to draw from this graph is that Intense & Rebellious has the highest percentage of ‘Dislikers’ out of all genre groups. In contrast, the Reflective & Complex and Energetic & Rhythmic genres were more popular with the ‘Likers’ and ‘Indifferents’, but have the lowest percentages of ‘Dislikers’.

In general, Upbeat & Conventional is the most popular genre group and was roughly equally appreciated by all groups. After considering the age and gender of the participants, we found that there were no significant differences in their responses. As a result, these graphs are not shown here.


ANOVA Post-Hoc Results

Gold-MSI and Class

STOMP and Class

Conclusion and Discussion


Conclusion

People with different cultural backgrounds, different levels of music knowledge and different music experience tend to have different music tastes. In an effort to gain insight into these differences, our research investigated the relationship between musical sophistication and the perceived beauty of music. In answer to our main research question, individuals’ perception of beauty appears to be related to their musical sophistication. The investigation was conducted in three steps. First, we visualized the association between musical sophistication and the biggest possible confounder of our research, genre preference. The differences were minute, so musical sophistication is not likely to affect genre preferences.

Secondly, after sorting participants into the LCA classes based on their beauty perception of the song snippets, their genre preferences were compared. In general, “Upbeat & Conventional” music was relatively popular with all 3 classes. In addition, “Intense & Rebellious” music was favoured most by ‘Dislikers’, while about one-third of ‘Likers’ preferred genres in the “Reflective & Complex” category.

Finally and most importantly, among the three groups of participants, the group of ‘Likers’ is significantly different from the other two groups. People in this class perceived the majority of snippets as beautiful and obtained the highest scores on the Gold-MSI test, that is, higher median scores for overall musical sophistication. In other words, those who have more music-related experience are likely to categorize more music as beautiful. Previous research (Güsewell & Ruch, 2014) suggested that professional musicians are more sensitive to the artistic beauty of music. Similarly, those who achieved relatively high scores on the Gold-MSI test might be somewhat akin to the experts and thus more responsive to the beauty of the music as well. However, more research is needed to provide potential explanations for our results.

Altogether, with an international sample and diverse music snippets, we are glad to say that the research results supported our predictions. Indeed, musical sophistication is not likely to influence people’s music tastes in terms of genre preference, but could influence music perception in terms of aesthetic evaluation. Furthermore, our results are somewhat in line with the modernist view. Both ‘Likers’ and ‘Indifferents’, who have higher median Gold-MSI scores, are more likely to appreciate complex music (“Reflective & Complex”) than ‘Dislikers’ (lower median Gold-MSI score).


Discussion

This research has tried to move one step toward exploring the beauty of music. In the future, more research could be done to gain better insights into the diversified world of music aesthetics. In relation to some of the limitations of this study, several suggestions for future research are discussed.

Familiar vs Unfamiliar Songs
According to Pereira and his colleagues, familiarity could contribute to listeners’ engagement with the songs. As a result, this factor may play a crucial role in beauty evaluation. However, in our research, we strove to exclude the factor of familiarity by mainly selecting songs that are less popular. Therefore, the research does not compare evaluations of familiar and unfamiliar songs. Future research could explore the familiarity aspect and how it affects the perceived beauty of music.

More Intense & Rebellious Songs
It is worth mentioning that most of the songs used in this research belong to the “Reflective & Complex” STOMP group. There was an obvious lack of “Intense & Rebellious” snippets. This could explain why the genres in this category were largely preferred by the ‘Dislikers’. Therefore, it might be the case that genre preference had an influence on class composition. For future research, we suggest that the genres of the songs used should be even more diverse to rule out this possible confounder.

The Confounder of Instrument Learning Experience
Research on the relationship between musical sophistication and beauty perception is still insufficient. When it comes to evaluating instrumental music, those who have learned an instrument may be more sensitive than those who have not. At the same time, the former group is likely to obtain a higher musical sophistication score than the latter. Hence, the result of this research may be biased, as the ‘Likers’ class included many instrument learners. Accordingly, further research is needed to address the confounding influence of instrument learning experience.

Objective vs Subjective Beauty
As a latent concept, beauty deserves more in-depth investigations. Whilst this research simply uses a “yes or no” question to measure people’s beauty perception, we believe that a more interesting pattern could be found if a different measurement is adopted. For example, a 7-point Likert scale can be used to evaluate how beautiful the snippets are. Moreover, a qualitative interview during which participants would be asked to define beauty in music, could be conducted as a pre-test.

Correlation vs Causal Relationship
While this study tried to test the relationship between musical sophistication and the perception of beauty, only correlation could be addressed. In other words, evidence for a causal relationship between the two variables is still absent, and therefore, whether music training and experience lead to more music enjoyment and appreciation remains speculative.

In a nutshell, the field of music aesthetics remains largely undeveloped to date. Academic research on how musical sophistication contributes or relates to the perception of beauty is still scarce. Our study, despite the above-mentioned limitations, can serve as a starting point for further studies.