Coding in qualitative research

Coding is a central part of qualitative data analysis, yet I often find that doctoral students struggle to know how to code their qualitative data. In today’s post, I want to share some foundational information about coding and the role it plays in qualitative data analysis.

In qualitative research, a researcher begins to understand and make sense of the data through coding. Thus, coding plays a critical role in the data analysis process (Miles, Huberman, & Saldana, 2014).

A code is an identified or highlighted section of text, frequently a word or short quotation, that helps illustrate the topic of the study. Saldana (2015) defines a code as “most often a word or short phrase that symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute for a portion of language-based or visual data” (p. 3).

Coding breaks down the data to the smallest unit, or idea, that can stand alone. Coding data “leads you from the data to the idea, and from the idea to all the data pertaining to that idea” (Richards & Morse, 2007, p. 137). Coding effectively indexes the data, and serves as a tool to help researchers build connections between different pieces of data.
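The indexing idea can be sketched in a few lines of Python. This is only an illustration, not a description of any particular qualitative analysis package: the segment IDs, excerpts, and code names below are invented, and `build_index` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical interview excerpts, invented for illustration only.
segments = {
    "int01-s1": "I never had time to meet with my advisor.",
    "int01-s2": "My cohort kept me going through the first year.",
    "int02-s1": "Between teaching and coursework, writing came last.",
}

# Codes assigned to each segment; one segment may carry several codes.
assignments = {
    "int01-s1": ["time pressure", "advisor relationship"],
    "int01-s2": ["peer support"],
    "int02-s1": ["time pressure"],
}

def build_index(assignments):
    """Invert segment -> codes into code -> segments."""
    index = defaultdict(list)
    for segment_id, codes in assignments.items():
        for code in codes:
            index[code].append(segment_id)
    return dict(index)

code_index = build_index(assignments)

# From the idea (a code) back to all the data pertaining to that idea.
for segment_id in code_index["time pressure"]:
    print(segment_id, "->", segments[segment_id])
```

Looking up a code returns every segment tagged with it, which is the "from the idea to all the data" direction that Richards and Morse describe.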

Coding breaks down data into easy-to-digest pieces (the codes themselves), which can then be organized and reorganized into patterns and ideas that answer the research questions (Bernard, Wutich, & Ryan, 2016; Grbich, 2007).

A code may be used only once or many times over the course of data analysis. You may also assign more than one code to a single statement from an interview, for example, to identify the significance of that passage of data (Miles et al., 2014).

Continuous refining and adjusting of the coding scheme is inherent to coding and data analysis. This refinement occurs through the expansion of ideas, the analysis of additional data, and the search for themes and patterns. While coding may seem like a precursor to actual data analysis, in reality coding represents a crucial step in the analytic process.

As novice qualitative researchers, students might wonder what data should be coded and how the data should be identified. Richards and Morse (2007) joke, “If it moves, code it” (p. 146).

First, you should code all of the data including all transcripts, documents, observations, notes, memos, visual evidence, and anything else gathered during data collection. Second, you should code what participants are doing or did in the past, including activities, strategies, and assumptions (Emerson, Fretz, & Shaw, 1995).

Ideas that directly relate to the literature, framework, and research questions should be coded, in addition to any ideas that seem potentially important or related to the overall goals of the study.

Also, coding ideas that were expected at the beginning of the dissertation as well as those that were unexpected can prove useful (Creswell, 2007). The number of codes that students should have after multiple rounds of data analysis varies, but basic guidelines can help determine an approximate figure.

For example, Lichtman (2006) suggests that education-related qualitative research studies should have between 80 and 100 codes that then get distilled into five to seven major categories or themes.

Although some students end up with more or fewer codes and major themes, aiming for 80 to 100 codes is a useful target at the start of data analysis.
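If you keep your codebook in a spreadsheet or a simple script, tracking how codes roll up into categories takes very little effort. The sketch below assumes a hypothetical codebook stored as a Python dictionary; the code and category names are invented, and a real study would have far more entries.

```python
from collections import Counter

# Hypothetical codebook: each code maps to the broader category it was folded into.
code_to_category = {
    "time pressure": "workload",
    "competing deadlines": "workload",
    "advisor feedback": "mentoring",
    "peer support": "community",
    "writing group": "community",
}

def summarize(code_to_category):
    """Count how many codes were distilled into each category."""
    return Counter(code_to_category.values())

summary = summarize(code_to_category)

print(len(code_to_category), "codes in", len(summary), "categories")
for category, count in summary.most_common():
    print(f"{category}: {count} codes")
```

Comparing the two totals against the guideline figures gives a quick sense of whether the codebook is still growing or is ready to be consolidated.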

Saldana (2015), in defining approaches to qualitative coding, identifies two types: “lumping” and “splitting.”

“Lumpers” begin analysis with a single, overarching code for an entire paragraph or passage of text, essentially “lumping” the data together to fit more data into fewer, broader codes.

In contrast, “splitters” initially break a given passage into component parts using six or eight more specific codes rather than one, “splitting” the passage up.

We find that doctoral students commonly fall into one of these two approaches, but we suggest that using a bit of both is likely a productive avenue.

After a document or two has been coded, take a step back and think about how you are engaging in the process of coding. Determine if you have adopted the lumper or the splitter approach, and then work purposefully to incorporate the other approach in subsequent coding efforts.
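One rough way to check your own tendency is to count how many codes you applied to each passage. The sketch below is only a heuristic: the passages and codes are invented, `coding_style` is a hypothetical helper, and the threshold of three codes per passage is an assumption rather than a figure from Saldana.

```python
from statistics import mean

# Hypothetical coded passages from the first document or two.
coded_passages = {
    "p1": ["barriers"],
    "p2": ["barriers"],
    "p3": ["support", "barriers"],
}

def coding_style(coded_passages, split_threshold=3.0):
    """Return the average codes per passage and a rough label.

    split_threshold is an assumed cutoff, not an established standard.
    """
    average = mean(len(codes) for codes in coded_passages.values())
    style = "splitter" if average >= split_threshold else "lumper"
    return average, style

average, style = coding_style(coded_passages)
print(f"{average:.2f} codes per passage, leaning {style}")
```

A low average suggests you are lumping and could try breaking a few passages into more specific codes; a high average suggests the opposite.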

By bridging the divide between lumping and splitting, you will have a more comprehensive set of codes that will better enable the next round of data analysis.

The most common types of codes are known as etic and emic, terms adapted from the linguistic terms phonetic and phonemic.

Etic codes come from the perspective of the researcher and the framework, literature, and research questions of the study.

In contrast, emic codes focus on the participant’s perspective and are not always bound to the aims or goals of the study.

Obviously, coding of both types provides valuable insights.

Either approach can be a good place to start with data analysis, but we recommend that novice qualitative researchers and doctoral students begin with etic coding before moving on to emic.

The fact that etic codes are derived from the research study provides more structure for initial coding than the comparatively unstructured and open-ended emic approach.
