
Investigation: How Do We Find Identity?

The Digital World as Methodology

Working with an idea as abstract as national identity, and with the concepts that comprise it, gains a new dimension of expression and exploration through a digital medium. A digital medium takes an abstract idea such as national identity and gives it a space in which that abstractness finds concreteness. It is another avenue through which the anthropologist can convey concepts of national identity and share in the preservation of cultural heritage, as well as use the myriad digital analytical tools available to explore hypotheses and theories, such as those concerning development and change in national identity. A very important part of digitizing ethnography, however, lies in keeping the narrative of the project and the scholarly argument at the forefront; a digital ethnography is most likely to be taken seriously as a medium of scholarship if a scholarly argument is conveyed throughout the project. Digitizing allows one to see the complex web of interaction between historical events, nationalist ideology, and the ways these interactions manifest in a national identity, carrying Geertz’s discussion of thick description (1973) into the present day. The digital format of scholarship provides a dynamic fit for examining anthropological theories regarding questions of ethnography, literature, language, and countless other anthropological subjects of inquiry.


Visualizing Norwegian National Identity

The digital medium thus becomes an enriching ground on which ethnographic research becomes visual and more accessible, both in conveying scholarly arguments and in sharing and preserving cultural heritage. My argument centers on the idea that national identity and the concepts that describe it change over time from homogeneous and strongly Norwegian-focused to more multicultural and increasingly complex. Simply writing about such concepts is not enough to express, or even to examine, whether we are seeing such trends; this is where digital platforms help the scholar examine and display trends, and from there extrapolate the meaning-making and manifestations of national identity over time.

For this project, the website serves as the space in which Norwegian national identity is visualized and its meaning-making and evolution are explored over time. Since the examples of text that embody concepts of national identity are taken from literature, the website is designed, from the user’s perspective, to read like a book. The landing page serves as the title page and table of contents, from which users can see the overall theme of each chapter, that is, each individual page of the website. Each page examines one of the groups of concepts that I have chosen as manifestations or representations of national identity in the literature; at present these fall into five broad categories: national identity and ethnicity, tangible heritage, intangible heritage, values and ideologies, and nature and the environment. Within each category, specific words and phrases corresponding to the broader concept are examined for their place and time within Norwegian literature. The markers searched for in the literature are listed on each page and also shown on each visualization.


What does the data consist of?

The data consists of the absolute counts (absolute frequency) of the chosen markers (see each page for the specific markers used in each broad category) as they occur in the corpus of texts from 1814-2013, with zero smoothing, so that the figures are exact raw counts. This methodology lets us explore exactly when and how many times these markers are used throughout Norwegian literature, and get a sense of when, and in what contexts, the types of concepts used to describe Norwegianness change (if at all).
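To make "zero smoothing" concrete: the Google Ngram Viewer documentation describes smoothing as a moving average, where a smoothing value of 1 replaces each year's figure with the average of that year and one year on either side. The sketch below assumes the same scheme; Språkbanken's exact implementation is not described here, the edge handling is an assumption, and the counts are made-up toy values rather than project data. With a window of 0 the raw counts come back unchanged, which is why zero smoothing preserves the hard counts.

```python
def smooth(counts, window):
    """Centered moving average over yearly counts.

    counts: dict mapping year -> raw count (assumed to cover contiguous years).
    window: number of years on each side to average; 0 means no smoothing.
    Near the edges of the range only the available years are averaged
    (an assumption made for this sketch).
    """
    if window == 0:
        return dict(counts)  # zero smoothing: raw counts, untouched
    smoothed = {}
    for year in sorted(counts):
        neighbours = [
            counts[y]
            for y in range(year - window, year + window + 1)
            if y in counts
        ]
        smoothed[year] = sum(neighbours) / len(neighbours)
    return smoothed


# Toy counts for one hypothetical marker (not project data).
raw = {1905: 2, 1906: 10, 1907: 4}
print(smooth(raw, 0))  # {1905: 2, 1906: 10, 1907: 4}
print(smooth(raw, 1))
```

With a window of 1, the spike in 1906 is spread into 1905 and 1907, which blurs exactly the "when did this marker first appear or peak" question this project cares about.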


Where did the data come from?

The data is gathered from the corpus created by Språkbanken, a language technology resource repository available through Nasjonalbiblioteket (the National Library of Norway). The corpus consists of the books and newspapers digitized by the National Library through the year 2013, approximately 34 billion words in total. The data was gathered with the N-gram tool that Språkbanken created for searching its repository for historical trends in word use, much as this project aims to do. The tool is modeled on the Google Ngram Viewer: it lets the user pull occurrences of these concepts from the texts and chart them on a timeline, showing when and where the concepts occur and whether they correlate with historical events. In other words, these digital data mining tools allow us to get inside the literature, examine the underlying values and concepts of national identity in depth, and bring them to the surface in an accessible digital and visual presentation.
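At its simplest, an n-gram count for a marker is the number of times a sequence of words appears in the texts of each year. The sketch below illustrates only that idea on a made-up two-document corpus. It is not how Språkbanken's tool is implemented, and the corpus layout (a list of year/text pairs) and the tokenization rule (lowercasing and splitting on word characters) are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of word characters; \w is Unicode-aware in
    # Python 3, so Norwegian letters such as æ, ø and å stay inside words.
    return re.findall(r"\w+", text.lower())


def ngram_counts_by_year(docs, phrase):
    """Count exact occurrences of `phrase` per year.

    docs: iterable of (year, text) pairs (a hypothetical corpus layout).
    Returns a Counter mapping year -> absolute count.
    """
    target = tokenize(phrase)
    n = len(target)
    counts = Counter()
    for year, text in docs:
        tokens = tokenize(text)
        # Slide an n-word window across the text and compare it to the marker.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                counts[year] += 1
    return counts


toy_corpus = [
    (1814, "Norsk natur og norsk kultur."),
    (1905, "Den norske natur er vakker. Norsk natur!"),
]
counts = ngram_counts_by_year(toy_corpus, "norsk natur")
print(counts[1814], counts[1905])  # 1 1
```

In this sketch the inflected form "norske natur" is not counted: each spelling or inflection is a separate n-gram and would have to be searched as its own marker.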


How is the data visualized?

After running the N-grams I created for each broad category of national identity concepts, I downloaded the absolute frequencies of each marker (for 1814-2013) as a CSV file. I then uploaded that data into amCharts, a JavaScript visualization library with numerous templates and customization options for creating charts, maps, and other types of data visualizations. This type of visualization lets the user see the actual counts and where they fall historically. Below each temporal graph of the occurrences of these national identity concepts in the literature, a scholarly analysis of the trends offers reflections on and explanations of where the concepts occur, any correlations with specific historical events, and what these mean, informed by anthropological and historical theory. In essence, each page serves as a chapter in the book, showing the user where national identity has been developed throughout Norwegian literary history and what meaning, if any, this has in relation to Norwegian political, social, and cultural history.
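The step from a downloaded CSV file to chart-ready data can be sketched as a small conversion into a list of per-year records, the array-of-objects shape that amCharts charts take as data. This is an illustration, not a description of the project's own pipeline: the column layout (a `year` column followed by one column per marker) and the marker names `fjell` and `dugnad` are hypothetical, and the actual export from the Språkbanken tool may be laid out differently.

```python
import csv
import io
import json


def csv_to_chart_data(csv_text):
    """Turn 'year,marker1,marker2,...' CSV text into one dict per year.

    Blank or missing cells are treated as 0 (an assumption for this sketch).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {"year": int(row["year"])}
        for marker, value in row.items():
            if marker == "year":
                continue
            raw_value = (value or "").strip()
            record[marker] = int(raw_value) if raw_value else 0
        records.append(record)
    return records


# Hypothetical export with two made-up markers (not project data).
sample = "year,fjell,dugnad\n1814,3,0\n1815,5,1\n"
data = csv_to_chart_data(sample)

# ensure_ascii=False keeps Norwegian letters readable in the JSON output.
chart_json = json.dumps(data, ensure_ascii=False)
print(chart_json)
```

Keeping one record per year, with every marker as its own field, lets a single chart plot several markers against the same timeline.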


The problems of digital tools

So why choose absolute frequency over relative frequency for exploring how factors of identity emerge in Norwegian literature? When investigating the Google Ngram Viewer and the Språkbanken N-gram tool, I found myself faced with two sets of data: absolute frequency and relative frequency. According to the Culturomics and Google Ngram Viewer documentation, the relative frequency of an n-gram is the percentage of the corpus (for a given year) that the chosen n-gram makes up. I considered using relative frequency, which, according to the documentation, lets you see trends in the use of certain words and phrases over time, with the data normalized to account for increased publishing and other factors that could skew it. In my case, I had reservations about this type of frequency, since I was using a corpus that I did not build myself but that was openly available through the National Library of Norway. I therefore did not know exactly what the corpus comprised, only its approximate word count. Nor did I know whether Språkbanken’s repository and N-gram tool normalized the data (normalization itself remaining an unclear term). I therefore felt the best choice was to use absolute counts to get a sense of how the markers actually appeared in the literature. One caveat is that many peaks and increases in counts occur in recent years, because of increased publishing and the digitization of more recently published material. Even so, we can observe many interesting things through the absolute counts of the markers, including when they first appear in the literature and when their usage increases or decreases with respect to the historical record.
It is also interesting to see which markers are used most frequently, even in recent years; this in itself shows what is being written and published now in association with Norwegian national identity.
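The trade-off can be shown with a little arithmetic. Relative frequency divides a marker's count in a year by the total size of the corpus for that year, so it depends on exactly the per-year totals that could not be verified for this corpus. In the sketch below both the counts and the yearly totals are invented for illustration: a marker whose absolute count rises eightfold still falls in relative terms when the corpus grows twentyfold.

```python
def relative_frequency(absolute, totals):
    """Divide each year's absolute count by that year's corpus size.

    absolute: dict year -> count of the marker.
    totals:   dict year -> total words in the corpus that year (hypothetical).
    Years with a missing or zero total are skipped rather than guessed.
    """
    return {
        year: count / totals[year]
        for year, count in absolute.items()
        if totals.get(year)
    }


absolute = {1900: 50, 2000: 400}              # toy counts, not project data
totals = {1900: 1_000_000, 2000: 20_000_000}  # invented corpus sizes
rel = relative_frequency(absolute, totals)
print(rel)  # {1900: 5e-05, 2000: 2e-05}
```

Multiplying by 100 gives the percentage form in which the Google Ngram Viewer displays relative frequency. If the yearly totals are unknown or unreliable, the division cannot be trusted, which is the reasoning behind staying with absolute counts here.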

Another difficulty in using this type of tool to search a corpus of literature relates to language coverage. The tool is valuable for its access to both Bokmål and Nynorsk material, but it offers no way to examine Sami-language material. Many of the Sami terms I originally intended to search therefore did not show up, as digital material in that language does not seem, according to the documentation, to be part of the corpus at this time. As a result, many of my Sami-language markers had to be omitted from this project for now, unless they appeared in the Norwegian-language material. I did not perform manual counts in the National Library’s database, as that is a different corpus and thus would not be methodologically comparable to the Språkbanken corpus used for this project. However, as this project develops, I intend to find a way to incorporate those markers and hopefully gain more insight into the Sami aspect of Norwegian national identity.