This paper borrows from earlier versions presented at the iConf 2014, Berlin, Germany and Workshop on Information in Networks (WIN), New York, NY, 2013


Documentaries are meant to tell a story, that is, to create memory, spark the imagination and support sharing (Rose, 2012). Moreover, documentaries aim to change people’s knowledge and/or behavior (Barrett & Leddy, 2008). How can we know if a documentary has achieved these goals? We report on a research project in which we have been developing, applying and evaluating a theoretically grounded, empirical and computational solution for assessing the impact of social justice documentaries in a scalable, robust and rigorous fashion. For this purpose, we leverage cutting–edge methods from socio–technical data analytics, namely natural language processing and network analysis. We also built a publicly available software tool (ConText) that supports these routines. In this paper, we focus on the theoretical framework for this project, present our methodology, and provide an illustrative example of the proposed solution.

Keywords: impact assessment, social justice documentaries, social network analysis, natural language processing

Acknowledgements: This work is supported by the FORD Foundation, grants 0125–6162 and 0145–0558. The authors are grateful to the following UIUC graduate students for their help on this project: Shubhanshu Mishra, Amirhossein Aleyasen, Kiumars Soltani and Sean Wilner. We would also like to thank Joaquin Alvarado, Chief Strategy Officer of the Center for Investigative Reporting, for advice on this work.

1. Introduction

The need for the rigorous and scientific evaluation of the impact of social justice documentaries has been repeatedly pointed out by funding agencies, practitioners and researchers who are active in the field of documentaries in particular and media in general (Barrett & Leddy, 2008; Clark & Abrash, 2011; KnightFoundation, 2011). In these domains, impact assessment has high practical relevance: when funding agencies, e.g. the JustFilms Division at the Ford Foundation, the Sundance Documentary Fund or BritDoc, award a grant to a film maker, they want reliable and comprehensive information on the return on their investment, where the goal with these investments is to cause change in society. However, as will be explained in the background section, the amount and depth of prior literature and work on this topic is limited. In a nutshell, assessment in this domain has been typically done by using (a) traditional, scalable, quantitative metrics, such as the number of visitors of a screening or webpage, and/or (b) conventional, in depth, qualitative methods for studying the perception of a theme or product by a few people, such as focus group interviews. Overall, the quantitative metrics are typically used on the community or societal level (macro–level), while the qualitative methods are applied on the individual or small–group level (micro–level). We argue that these two layers have to be integrated to gain a more comprehensive understanding of the impact of films.

Another major shortcoming of prior impact assessment work in this field is that even though the outlined evaluation methods do consider the reaction of target audiences, they fail to take into account (a) relational information about audience members and other stakeholders as well as (b) the information produced or shared by these groups. This matters as prior work has shown that (a) social ties impact behavior (McPherson, Smith-Lovin, & Cook, 2001) and (b) without considering the information produced or shared by network participants, we are limited in our ability to understand the transformative role that language can play in networks and vice versa (Corman, Kuhn, McPhee, & Dooley, 2002; Danowski, 1993). In order to address these limitations, we have been developing a theoretically grounded, computational methodology and pertinent technology that support people in mapping, monitoring and analyzing (a) the social network of stakeholders involved with the main theme of a movie and (b) the content disseminated by these agents. We bring these two types of behavioral information, i.e. social relationships and content, together by constructing and analyzing socio–semantic networks of social agents (stakeholders, audiences) and information. We argue that this approach provides a more comprehensive window into the structure, functioning and dynamics of the interplay of social agents and information than prior approaches used in this domain (Diesner, 2013; Gloor & Zhao, 2006; Roth & Cointet, 2010).

This paper is structured as follows: Section two reviews prior work on documentary impact assessment and concludes with identifying missing pieces. Section three addresses these shortcomings by reporting on the development of a theoretically grounded, computational solution for mapping and assessing this type of impact. We put the proposed solution into an application context by providing an illustrative example. Section four summarizes the results of this work, open questions and next steps.

2. Background

In this section, we synthesize prior work on assessing the impact of documentaries. Basically, there are three families of prior studies: case studies of individual movies, proposed frameworks and academic research.

2.1 Individual Case Studies

One main approach to measuring the impact of documentaries is through case studies, i.e. collections of quantitative metrics and/or anecdotal reports about a single production. Two examples are the assessment of “Legacy” (Applied_Research_Consulting_LLC, 2002), and the Working Films’ evaluation of “Blue Vinyl” (Barrett & Leddy, 2008). Also, BritDoc has released a series of high quality impact assessment studies. Such evaluations approximate the influence of a documentary by considering (a combination of) the following indicators:

  • Cumulative counts of the number of screenings, video distributions, or people reached through campaign activities.
  • Comments from individual viewers; analyzed qualitatively on a case by case basis.
  • Lists of key organizations participating in the documentary–related campaign. Connections between these organizations are typically not considered.
  • A few instances of policy adoption.

Overall, case studies can be useful in highlighting the outcomes of a specific documentary. However, they do not generalize to other productions. In other words, this approach fails to ensure that the same methodology is applicable across productions and genres such that findings for multiple films could be compared.

A widely adopted technology in this field that supports people in visually exploring and analyzing the data they have gathered and entered about their production is Sparkwise, which offers a visually appealing and professional graphical user interface. Also, the Harmony Institute has recently released StoryPilot, an impact assessment technology for documentaries that allows for conducting assessments across productions.

2.2 Previously Proposed Frameworks

Various major media institutes and foundations, including the Center for Social Media, the Fledgling Fund, the Knight Foundation and the Rockefeller Foundation, have proposed systematic frameworks for impact assessment (Barrett & Leddy, 2008; Clark & Abrash, 2011; Figueroa, 2002; KnightFoundation, 2011). Each of these organizations has released such a framework, which typically involves the measurement of impact along five to seven dimensions that entail the following: the aforementioned quantitative metrics plus influence on the individual, community and societal level.

The main limitation with these frameworks is their normative and theoretical character, which means that applying them for practical purposes might require adaptations and changes to make them actionable. Furthermore, the indicators recommended in prior frameworks are highly similar to the anecdotal evidence mentioned in the case studies section. In terms of methodology, these frameworks typically combine simple cumulative frequency counts (number of screenings, viewers, website visitors and supportive organizations) with analyses of small samples of narrative descriptions from self–reports from participants.

Even more relevant with respect to our project is the fact that some of the proposed frameworks include indicators related to capturing social networks: for example, “interorganizational collaboration” (Fledgling Fund), “network building” (Center for Social Media), and “network cohesion” (Rockefeller Foundation) have been mentioned as key ingredients. However, there are no further details on how to collect, analyze, interpret and leverage such network data. Even where core network metrics, such as density and centrality, are mentioned (Rockefeller Foundation), these terms are simply introduced as possible metrics without providing information or practical guidance for how to use these metrics in an evaluation process.

2.3 Academic Research

The majority of scholarly work on this topic is confined to studying psychological effects of documentaries on individual viewers. Thus, most scholarly publications consider documentaries as a subcategory of mass media. A few exceptions exist: Whiteman (2004) uses a political science perspective to study several factors that affect a documentary’s impact. However, since his framework heavily depends on qualitative analysis such as observations and content analysis, it is highly similar to the first two groups of approaches.

Summarizing the reviewed families of assessment approaches, we conclude that although various types of approaches have been suggested and applied, most of them are similar in that they jointly consider traditional frequency counts on a large scale and qualitative indicators on a small scale. Several proposals have emphasized the importance of taking social networks and the content of information associated with network members into consideration. At least in the domain of assessing the impact of documentaries, these strategies are waiting to be put into action. The work presented herein is a step in this direction.

3. Method

Our solution is based on a theoretical framework that we developed by synthesizing indicators of impact based on empirically tested theories from the fields of media effects, diffusion research, social and semantic network analysis, and collective action. The resulting CoMTI (content, medium, target, and impact) framework also incorporates indicators specific to documentary evaluation that we identified in discussions with subject matter experts. This model is explained in detail below. Overall, the CoMTI framework considers a variety of stimuli that have been associated with cognitive, attitudinal, and behavioral change on the individual, communal and societal level over time. Starting from this model, our actual methodology involves three steps: building and analyzing a baseline model, a ground truth model, and change assessment. These steps are explained right after the clarification on the theoretical basis.

3.1 Theoretical Framework: CoMTI

There is little prior evidence that conclusively confirms or negates impact of media in general (Sparks, 2012). Even with advanced research designs, evidence for a causal relationship between media and impact (in that direction) remains vague. Several lab experiments have successfully shown short–term impacts. However, such highly controlled lab study settings limit the generalization of any findings to real–world situations. More importantly, the small–scale and typically point–wise nature of such studies often prevents longitudinal insights. Overall and despite many open questions about media impact, scholars agree that media content affects our perception and behavior in certain, maybe latent, ways (Bryant & Oliver, 2008; Laughey, 2007). A large common denominator of media effects research is the belief that humans can be affected by media stimuli. The holistic process of how stimuli influence people has been dissected into five categories, all of which were originally suggested by Lasswell in his model of communication (Johnson & Klare, 1961; Lasswell, 1948). Most theories of media effects fit into one or more of these categories (Laughey, 2007). We use the Lasswell model as a backbone for our theoretical framework by empirically identifying: What has been said (content) on which channel (medium) to whom (target) and with what effects (impact)? The Who dimension is partially entailed in the medium dimension, and will also be considered when we extract (groups of) stakeholders from network data, and by bringing text mining methods to the medium dimension. In the Lasswell formula, communication happens in order to influence a target audience. Thus, communication is conceptualized as a persuasive process (McQuail, 2010). This aligns with the goal of documentaries to lead to change in people’s knowledge and/or behavior.

Applying the provided definition of media use, we argue that a documentary is not a one–way communication in which some agent (seeks to) transfer ideas or messages to others in order to achieve certain effects, but rather a two–way process in which senders and receivers interact with each other: receivers’ responses and reactions to senders’ input form dynamic feedback loops. This inherently reciprocal and iterative process is represented in our framework as shown in Figure 1, and is essential to overcome Lasswell’s conceptualization, which has been criticized for its linear, one–way direction of communication flow. Such feedback loops have high practical implications as film producers and engagement workers can leverage them to model the landscape of stakeholders and discourse associated with the theme of a documentary prior to and during release in order to identify relevant social agents and themes to connect with. This helps to strategically allocate scarce resources.

Figure 1: CoMTI framework with feedback loops

We have synthesized the indicators of impact as suggested by prior work into a framework that we named CoMTI (content, medium, target, and impact). The framework is shown in Table 1. This model is organized along the main dimensions of impact assessment and respective methods as explained below:

  • Dimension: a component or process through which a documentary can achieve impact.
  • Level: a set of sub–categories of evaluation criteria per dimension.
  • Index: a set of evaluation factors per level.
  • Analytics: suitable methods for discovering meaningful results per index category.
  • Item: a set of specific features to be measured per index.
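
The nesting of these terms can be made concrete with a small data structure. The following is only a minimal sketch of one possible reading, in which a dimension contains levels, a level contains indices, and each index carries its analytics and items. The class names and the example entries ("Audience", "Audience size" and its items) are hypothetical placeholders for illustration and are not taken from the CoMTI table.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """A set of evaluation factors, with suitable analytics and measurable items."""
    name: str
    analytics: list = field(default_factory=list)
    items: list = field(default_factory=list)


@dataclass
class Level:
    """A sub-category of evaluation criteria within a dimension."""
    name: str
    indices: list = field(default_factory=list)


@dataclass
class Dimension:
    """A component or process through which a documentary can achieve impact."""
    name: str
    levels: list = field(default_factory=list)

    def all_items(self):
        """Flatten every measurable item below this dimension."""
        return [
            item
            for level in self.levels
            for index in level.indices
            for item in index.items
        ]


# Hypothetical entries, for illustration only.
target = Dimension(
    "Target",
    levels=[
        Level(
            "Audience",
            indices=[
                Index(
                    "Audience size",
                    analytics=["frequency counts"],
                    items=["screening attendees", "website visitors"],
                )
            ],
        )
    ],
)
items = target.all_items()
```

Keeping analytics attached to the index rather than to the item mirrors the definition above, where analytics are chosen per index category.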

The framework is grounded in a set of theories, and allows for large–scale, multi–level analysis by considering the following features:

  • Theoretical foundation: CoMTI is based on empirically and rigorously tested theories from domains including diffusion of innovation and information, media effects, marketing, social and semantic networks and collective action.
  • Domain expertise: CoMTI incorporates concepts specific to documentary evaluation that were suggested to us in consultations with subject matter experts from this domain.
  • Analytical comprehensiveness: CoMTI considers analytical methods and metrics originating from statistics, network analysis and text analysis.
  • Multi–modal units of analysis: CoMTI considers the entity types of people, organizations and information.
  • Integrated approach: CoMTI combines traditional strategies for measuring documentary impact (frequency counts and qualitative analysis) with additional methods (network analysis, text analysis).

This framework entails a variety of stimuli that have been associated with cognitive, attitudinal and behavioral changes over time on the individual, communal, societal and global level. In this context, we consider a documentary as a special kind of media product. When it comes to identifying the impact of media content on people, prior work can be divided into three categories (Laughey, 2007):

  • Direct impact: media content can have powerful influence on the knowledge and behavior of the audience.
  • Indirect impact: media content is one of several factors that affect people’s behavior and cognition.
  • Null impact: media content does not have any significant influence on people’s cognition and behavior.

The proposed framework assumes that (some portion of) the impact of a documentary can be measured; and that this impact can be direct, indirect or not evoked. Also, we conceptualize the entire process of making and distributing a documentary as a communication process, where participants exchange information and knowledge via behavioral signals, including natural language (Griffin & McClish, 2003). The CoMTI framework borrows elements from verified outcomes of media studies, but is also unique in the following three ways:

  • While most studies of media effects focus on one or two phases of Lasswell’s formula, our framework models the whole communication process around a documentary.
  • The proposed framework overcomes the linear, sender–driven, one–way flow of communication.
  • The proposed framework is tailored towards measuring the impact of documentaries by integrating dependent variables into measurable indices.
Table 1: CoMTI Framework for Impact Assessment

In the next section, we briefly detail every dimension of the CoMTI framework.

3.1.1 Content

Studies of media impact start from the presence or absence of certain kinds of content before measuring impact (Sparks, 2012). Taking the explicit and implicit content of a film and the communication related to (the theme of) the movie into account is essential for impact assessment and related strategic communication and interventions. The Content dimension of the CoMTI framework consists of the following levels of measurement:

  • Message: the main message that a film wants to convey. This can be elicited from filmmakers or in a more empirical fashion from film material such as the transcripts.
  • Expected Outcome: goals set by film makers for the scope of reach and intended changes.
  • Evaluation Priority: a ranked list of priorities with respect to intended outcomes, which can be elicited from producers. These rankings can be used to weight impact categories.
  • Resource: investment needed for a production, e.g. money, personnel, engagement work and follow–up activities. This information can be used to assess the effectiveness of a production: how much input is needed to move the needle how much?

The outlined levels of content are not limited to documentaries, but also applicable to other types of media data, and are related to each other throughout the data collection and evaluation process.

3.1.2 Medium

Some scholars argue that the medium or channel, which nowadays is often an information and communication technology, determines the characteristics of media products, content, and their political, economic, social and cultural usage (Innis, 2007; McLuhan, 1994). Acknowledging the importance of the medium, previous assessments of documentary impact typically report media statistics, such as the frequency of screenings, theatrical releases and broadcasts, and consider higher numbers as (proxies for) greater impact (Barrett & Leddy, 2008; Clark & Abrash, 2011; John & James, 2011). One limitation with this strategy is that exposure does not have uniform impact across recipients. Prior studies on the diffusion of innovation have shown that different types of adopters perceive information at different points in the life cycle of a production and with varying degrees of depth of impact (Rogers, 2003). Moreover, social networking effects, e.g. word of mouth, strongly impact this process (E. Katz & Lazarsfeld, 2006; M. L. Katz & Shapiro, 1986). Thus, the choice of media for a documentary is likely to shape the breadth and depth of potential impact on the public.

Another problem is that prior studies do not differentiate between first–hand (seeing the actual film) versus secondary ((social) media reactions, public discourse) media exposure. We argue that this distinction matters because a) first–hand exposure is easier to track for distributors and b) secondary exposure has the potential for greater networking effects. This separation goes hand in hand with the distinction between push versus pull models for media: mass media (push) implies that communicators transmit information to large and scattered audiences (Dominick, 2007; Luhmann & Cross, 2000), while social media (pull) is based on interactions between users, and has been found to be more influential than mass media in terms of credibility, speed of message transfer and potential to change behavior under certain conditions (Bessière, Kiesler, Kraut, & Boneva, 2008; Jenkins, 2006; Keen, 2007). Corresponding data can be collected from news archives and the participatory web, respectively.

Finally, face–to–face interaction between individuals is another important channel. Interpersonal contact has been identified as the most powerful channel of cognitive, attitudinal and behavioral change (Bass, 2004; Rogers, 2003). These data are more difficult to collect than (social) media data; with (partial) mappings being possible via surveys and interviews.

3.1.3 Target

In marketing, the size of the reachable target audience matters as it determines, for instance, the cost–per–person of an advertisement. However, for documentaries, this rationale does not apply, mainly because producers have no tangible metric for assessing effectiveness other than the number of pairs of eyes that have watched a film. Thus, the size of the audience can translate into impact, but needs to be complemented with additional factors (Barrett & Leddy, 2008; Clark & Abrash, 2011; Figueroa, 2002; John & James, 2011).

Another issue related to the target dimension is audience diversity: the more heterogeneous the audience, the broader the reach. Studies in risk communication, marketing, social influence and diffusion have shown that audiences who are homogeneous in terms of age, sex, income, education or physical proximity can limit the ripple effect of communication (Page, 2007; Prell, 2012; Rogers, 2003).
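
Audience heterogeneity along a single categorical attribute can be quantified in several ways. The sketch below uses Blau's index (one minus the sum of squared category shares), which is one common choice and an assumption on our part, not a measure prescribed by the framework. The age brackets are made-up example data.

```python
from collections import Counter


def blau_index(categories):
    """Blau's heterogeneity index: 0 for a fully homogeneous audience,
    approaching 1 as members spread evenly over many categories."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical age brackets of audience members.
homogeneous = ["18-29", "18-29", "18-29", "18-29"]
mixed = ["18-29", "30-49", "50+", "18-29"]

homogeneous_score = blau_index(homogeneous)
mixed_score = blau_index(mixed)
```

A higher score for the mixed audience would indicate broader potential reach on that attribute; the index would have to be computed separately for each attribute of interest (age, income, education, and so on).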

A classical finding from media effect studies is that ideas flow from media to opinion leaders to the rest of the world (E. Katz & Lazarsfeld, 2006; Lundgren & McMakin, 2011). In the CoMTI framework, formal opinion leaders, e.g. media editors and professional critics, are distinguished from informal opinion leaders, such as popular bloggers and grass–root organizations. The latter type of influencers can be identified from social media data via social network analysis (Hansen, Shneiderman, & Smith, 2010; Watts, 2007).

One common feature of previous efforts to measure documentary impact is the focus on advocacy (Barrett & Leddy, 2008; Clark & Abrash, 2011; John & James, 2011). Established communities of practice can be powerful change agents because members of tightly knit groups are subject to group norms (Drazin & Schoonhoven, 1996; Rogers, 2003). The importance of communities as change agents justifies their inclusion as a separate indicator in CoMTI.

Data for measuring the indices for the Target dimension mainly come from statistical reports by documentary producers, web analytics, surveys and archival records. For identifying informal opinion leaders, social network analysis can be used.
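
As a minimal illustration of how social network analysis can surface candidate informal opinion leaders, the following sketch ranks accounts by normalized degree centrality in an undirected interaction network. Degree centrality is only one of many possible measures and is assumed here for simplicity; the account names and ties are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_accounts(edges, k=3):
    """Return the k most central accounts; ties are broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Hypothetical reply/mention ties between social media accounts.
interactions = [
    ("blogger_a", "viewer_1"),
    ("blogger_a", "viewer_2"),
    ("blogger_a", "ngo_x"),
    ("ngo_x", "viewer_3"),
    ("viewer_1", "viewer_2"),
]
centrality = degree_centrality(interactions)
leaders = top_accounts(interactions, k=2)
```

In this toy network, the account with the most distinct interaction partners ranks first. In practice, other measures such as betweenness may identify different kinds of influence.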

3.1.4 Impact

In the CoMTI framework, impact is measured as a weighted function over four stimulus dimensions that are associated with cognitive, attitudinal and behavioral changes over time on the individual, communal, societal, and global level. Sometimes, a change might be clearly associated with a stimulus, e.g. the creation of a new piece of legislation or the adoption of a policy (Barrett & Leddy, 2008).
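
The exact form of the weighted function is not specified here. As a sketch under stated assumptions, the code below treats it as a weighted average of per-category scores in [0, 1], with weights derived linearly from a ranked list of priorities such as the one elicited from producers. The category names, scores and the linear weighting scheme are all hypothetical.

```python
def weights_from_ranking(ranked_categories):
    """Turn a priority ranking (most important first) into linear weights."""
    n = len(ranked_categories)
    return {category: n - i for i, category in enumerate(ranked_categories)}


def weighted_impact(scores, weights):
    """Weighted average of per-category scores (assumed to lie in [0, 1])."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same categories")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[c] * weights[c] for c in scores) / total_weight


# Hypothetical priority ranking and per-category scores.
ranking = ["behavioral", "attitudinal", "cognitive"]
weights = weights_from_ranking(ranking)
scores = {"cognitive": 0.6, "attitudinal": 0.3, "behavioral": 0.1}
impact = weighted_impact(scores, weights)
```

Because the weights come from the producers' own ranking, a film that scores high only on low-priority categories receives a lower overall value than its raw scores might suggest.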

Studies in diffusion, risk communication and social contagion generally list four levels of the range of impact: individual, communal, societal and global (Kasperson et al., 1988; Lundgren & McMakin, 2011; Marsden, 1998; Rogers, 2003). In prior conceptualization of range, impact is assumed to start on the individual level and branch out to the next larger level; implying a linear diffusion mechanism from small to large scale. We do not make this assumption, but acknowledge the fact that impact might diffuse between any of these layers, maybe in an iterative or reverse fashion.

Research on human perception and behavior has identified the following sequential processes through which individuals experience change: knowledge, persuasion and decision (Rogers, 2003; Slovic, Finucane, Peters, & MacGregor, 2004). Knowledge is generated when an individual is exposed to new stimuli or information and develops an understanding of them. Persuasion means that an individual forms a positive or negative opinion towards stimuli or information. Decision follows if an individual becomes engaged in activities that lead to accepting or rejecting the given inputs. There is no common agreement on how to collect data corresponding to each of these stages. KAP surveys have been used for several decades to provide information on the knowledge, attitudes and practices of health behavior and innovation adoption (Launiala, 2009).

The CoMTI framework conceptualizes the phase of potential documentary impact as involving cognitive, attitudinal and behavioral factors, and suggests corresponding indices. We choose the term cognitive because the mental activities related to knowledge acquisition are mainly of cognitive nature. Persuasion denotes the intent of communicators to induce attitudinal change in a direction desired by the senders. Attitudinal is neutral in that it does not imply any directionality of change. Behavior can be distinguished from cognition and attitude in that it represents tangible changes expressed in words or activities. We do not assume a strictly sequential order of these stages and allow for interaction effects.

In explaining changes in cognition, attitude and behavior, the network concept is vital. Numerous studies have shown that perceptions, feelings and behavior initiated by one member of a network can influence other network participants (Christakis & Fowler, 2007; De Nooy, Mrvar, & Batagelj, 2011; Marsden & Friedkin, 1993; Scherer & Cho, 2003). As discussed for the Medium dimension, social media and other forms of interpersonal interaction can be more influential for cognitive and behavioral changes than mass media exposure. Furthermore, empirical reports on measuring the impact of documentaries have listed the network of viewers or alliances of advocacy organizations as a sign of increased capacity (Barrett & Leddy, 2008; Clark & Abrash, 2011; John & James, 2011). For example, the degree of connectedness of the audience can be used to gauge the degree of cohesion of members for collective action. The sheer act of forming connections to others can be part of some behavioral change.
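
The degree of connectedness mentioned above is commonly operationalized as network density, i.e. the share of possible ties that are actually present. The sketch below computes density for an undirected network; the member labels and ties are hypothetical, and density is only one of several possible cohesion measures.

```python
def density(nodes, edges):
    """Share of possible undirected ties among `nodes` that are present."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Deduplicate ties and ignore self-loops or ties to unknown nodes.
    ties = {
        frozenset((a, b))
        for a, b in edges
        if a != b and a in nodes and b in nodes
    }
    return len(ties) / (n * (n - 1) / 2)


# Hypothetical audience members and their ties at two points in time.
members = ["a", "b", "c", "d"]
ties_before = [("a", "b")]
ties_after = [("a", "b"), ("b", "c"), ("c", "d")]

density_before = density(members, ties_before)
density_after = density(members, ties_after)
```

An increase in density between two observation points would be consistent with audience members forming new connections, although it does not by itself show that the film caused them.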

The temporal aspect of impact is an understudied issue. Many impact studies have relied on surveys and experiments from a single point in time, or use a survey with a before/after (watching a documentary) design (Bryant & Oliver, 2008; Sparks, 2012). The CoMTI framework incorporates the temporal aspect of impact by measuring indices at multiple points in time. In summary, the CoMTI framework considers spatial, temporal and phase–related aspects of change.

Data for measuring the Impact indices can be obtained through intensive mining of unstructured and semi–structured natural language text data, e.g. from the social web. Text mining and network analysis techniques need to be used to extract instances of relevant entity classes (including people, organizations and information) and to detect patterned relationships between them.

In summary, the CoMTI framework bridges the gap between theory and practice by offering a mapping from clearly defined, practically relevant and theoretically grounded indicators of impact to a) crucial dependent variables, i.e. relevant dimensions of impact and b) appropriate methods for capturing, representing and analyzing these signals based on real–world data.

3.2 Methodology

Based upon our review of prior work and the proposed theoretical framework, we conclude that enabling a reliable, efficient, broad and deep understanding of documentary impact requires the capturing and analysis of the web of stakeholders and content associated with (the theme of) a movie. This implies the combination of two types of techniques:

  • Social network analysis, which helps to map and assess the structure, functioning and dynamics of the web of stakeholders (Borgatti, Mehra, Brass, & Labianca, 2009; Wasserman & Faust, 1994).
  • Techniques from Natural Language Processing (NLP) and Text Mining, which help to identify (the valence of) salient concepts and themes originating from or shared by stakeholders (McCallum, 2005).

In order to account for these requirements, we have selected specific techniques from the abovementioned family of methods to develop the following three–step methodology for assessing documentary impact: creating and analyzing a baseline model, a ground truth model, and a change model, which ultimately get compared to each other. These steps are explained in detail below.

Baseline model: First, we map the public discourse around the main issue(s) addressed in a movie prior to a film’s initial public screenings or release. This results in a baseline model. The main purpose of this step is to understand the given ecosystem of people and themes associated with the main issue of a film. This helps to identify where impact would be possible. The main issues of a film can be identified in a data driven way, e.g. by conducting topic modeling on the film transcript, and/or be elicited from the film maker, producer or funder. Based on our initial experience, the outcomes from both strategies do not necessarily match. We decided to go with the issues identified by subject matter experts on the film since these are the topics on which they want to motivate some change. Once these issues are identified, we use ConText (Diesner, 2014), a tool we built for this project, to a) collect social media data (currently from Facebook, Twitter and YouTube) and b) process media coverage that we collect from external sources, currently e.g. via LexisNexis. The same tool is then used to construct a) social networks of agents mentioned in the bodies of unstructured, natural language text data, in structured meta–data and as social media account holders, and b) (semantic networks of) salient key terms, themes and sentiments that explicitly or implicitly occur in the text data and meta–data. The resulting networks are then used as input for social network analysis. This allows us to assess the structure of these networks, to reason about their underlying dynamics, and to identify (clusters of) key individuals and organizations that are relevant with respect to different dimensions of power and influence. We also inspect the retrieved semantic networks and key terms to identify main themes and trends in the discourse.
Practitioners from the film domain can utilize this step to understand the given opportunity space, which helps them to link their campaign work to relevant stakeholders and messages, supporting them in strategically allocating resources and tapping into existing social capital and public awareness. Scientifically, this model is necessary to be able to plot any change against a baseline.
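
To illustrate the idea of constructing a social network from agents mentioned in text, the sketch below links two agents whenever they are mentioned in the same document and counts how often this happens. It is not ConText's implementation: the agent list is assumed to be given, matching is naive case-insensitive substring search, and the organization names and documents are invented.

```python
from collections import Counter
from itertools import combinations


def co_mention_network(documents, agents):
    """Count how often each pair of agents appears in the same document."""
    edges = Counter()
    for text in documents:
        lowered = text.lower()
        # Naive matching: an agent is present if its name occurs as a substring.
        present = sorted(a for a in agents if a.lower() in lowered)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Hypothetical organizations and news snippets.
agents = ["Alpha Fund", "Beta Institute", "Gamma Media"]
documents = [
    "Alpha Fund and Beta Institute announced a joint program.",
    "Beta Institute was interviewed by Gamma Media.",
    "Gamma Media reported on a grant from Alpha Fund.",
    "A panel hosted by Alpha Fund featured speakers from Beta Institute.",
]
network = co_mention_network(documents, agents)
```

The resulting weighted edge list can serve as input to standard network measures. A production pipeline would typically replace the substring search with entity extraction and name disambiguation.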

Ground truth model: Second, we extract the film’s main message or themes from genuine material from the documentary, more specifically from the transcript. This helps to understand the issues that are actually addressed in the film. We apply the same text mining techniques as in step one, but to the film transcript. This results in a ground truth model, i.e. the message that a documentary can communicate. We understand that there is much more to a film than what is contained in the transcript, namely the cast, images, sound and other aesthetic elements, which are not yet considered in our methodology.
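
The specific text mining techniques are not spelled out at this point. As a simple stand-in, the sketch below extracts the most frequent non-stopword terms from a transcript. The stopword list is a tiny illustrative subset, the transcript excerpt is invented, and raw term frequency is an assumption rather than the method used in the project.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "are", "that",
    "it", "for", "on", "with", "we", "our", "not",
}


def key_terms(text, k=5):
    """Return the k most frequent terms, ties broken alphabetically."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical transcript excerpt.
transcript = (
    "The water in our town is not safe. We tested the water, "
    "and the water company ignored the town."
)
terms = key_terms(transcript, k=3)
```

Running the same function on the pre-release discourse and on the transcript yields term lists that can be compared directly.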

Change model: Third, we measure whether the film has moved the needle on a given issue, i.e. we try to capture the measurable impact of a film. For this purpose, we reassess the key players and public discourse related to, i.e. co–mentioned with, the film from its release onwards. For this step, the same types of data collection and analysis techniques as in step one are used. First, we compare these networks and salient themes to the baseline model to identify any changes in stakeholders and language use that co–occur with mentions of the movie. This is change associated with the film. Second, we compare the networks and terms constructed in this step to the ground truth model. The differences between both models indicate divergences in the perception of the movie by its makers versus the public. Step three can be repeated several times throughout the life–cycle of a film to monitor changes in impact.
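The two comparisons in this step can be sketched as simple set operations over the salient terms (or agents, or network edges) of two models. This is a minimal illustration and not the procedure implemented in ConText: the function name `compare_models`, the use of Jaccard similarity as an overlap score, and the toy term sets are all assumptions made for the example.

```python
def compare_models(reference, observed):
    """Compare two collections of items, e.g. salient terms, agents or edges."""
    reference, observed = set(reference), set(observed)
    union = reference | observed
    shared = reference & observed
    return {
        "retained": shared,               # present in both models
        "new": observed - reference,      # only present in the observed model
        "dropped": reference - observed,  # only present in the reference model
        # Jaccard similarity: 1.0 means identical sets, 0.0 means no overlap.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical salient terms; real input would come from the extracted networks.
baseline_terms = {"sentencing", "drugs", "prison", "justice department"}
post_release_terms = {"screening", "director", "drugs", "prison"}

change = compare_models(baseline_terms, post_release_terms)
```

The same function can be called with the ground truth model as the reference to approximate the divergence between the message of the film and its public perception.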

3.3 Technology

Conducting such analyses in a scalable and robust fashion requires an automated computational solution. To facilitate the outlined methodology, we have been developing ConText (Diesner, 2014). ConText is a publicly available tool that supports a) the construction of different types of network data based on unstructured, semi–structured and structured natural language text data (Diesner, Aleyasen, Mishra, Schecter, & Contractor, 2014) and b) the joint consideration of any such text data and network data.

ConText has a graphical user interface to ease adoption by non–technical people. We also provide sample data and training material so that non–technical users, e.g. film makers, impact producers and funders, can conduct this type of impact assessment on their own.

The application domains for ConText are not limited to impact assessment: we have designed ConText as a generally applicable tool for researchers and practitioners who conduct work related to the digital humanities and computational social sciences. The one limitation is that the evaluation criteria for impact from the CoMTI framework might not apply in other domains.

3.4 Illustrative Example

This section provides an illustrative example of the proposed computational solution to documentary impact assessment. The film we look at here is The House I Live In, a documentary by Eugene Jarecki first screened at Sundance in 2012.

Baseline model: For this assessment, the funder of the film informed us that the main issue upon which the movie aims to have an impact is “mandatory minimum sentence” (MMS). We collected the international press coverage on this topic prior to film release from LexisNexis (N=167 articles). We used ConText to parse, deduplicate and preprocess these data. This set of steps transforms the raw downloaded data into a curated corpus and metadata database.

Figure 2 shows a semantic network generated from the meta–data of the media coverage of MMS. This network was generated in ConText by linking any two index terms per article that occur within and across user–selected entity classes (in this case “subject”) and that meet or exceed a user–specified threshold on the relevance score that LexisNexis provides. The visualization was generated in Gephi. This image shows that the public discourse around MMS centers on legal issues including controlled substance crime, criminal offenses and justice departments.
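The linking rule described above can be sketched as follows. This is an illustration only and does not reproduce ConText's code or the LexisNexis export format: the record fields (`term`, `class`, `relevance`), the threshold value of 85 and the toy articles are hypothetical.

```python
from collections import Counter
from itertools import combinations


def index_term_network(articles, entity_classes=("subject",), min_relevance=85):
    """Link any two index terms of one article that pass the class and relevance filters."""
    edges = Counter()
    for index_terms in articles:
        kept = sorted({
            t["term"]
            for t in index_terms
            if t["class"] in entity_classes and t["relevance"] >= min_relevance
        })
        # Edge weight = number of articles in which both terms are indexed.
        for pair in combinations(kept, 2):
            edges[pair] += 1
    return edges


# Hypothetical meta-data for two articles (field names are assumptions).
articles = [
    [
        {"term": "SENTENCING", "class": "subject", "relevance": 90},
        {"term": "CONTROLLED SUBSTANCES CRIME", "class": "subject", "relevance": 88},
        {"term": "LEGISLATION", "class": "subject", "relevance": 60},
    ],
    [
        {"term": "SENTENCING", "class": "subject", "relevance": 95},
        {"term": "CONTROLLED SUBSTANCES CRIME", "class": "subject", "relevance": 92},
        {"term": "UNITED STATES", "class": "geographic", "relevance": 99},
    ],
]

edges = index_term_network(articles)
```

Passing several classes, e.g. `entity_classes=("subject", "geographic")`, would also create links across entity classes; the resulting weighted edge list could then be exported to a visualization tool.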

Figure 2: Media discourse on mandatory minimum sentencing prior to movie release (semantic networks of meta–data)

Figure 3 provides a summarizing visualization of the themes emerging from the bodies (as opposed to meta–data) of these news articles. This image was generated by applying topic modeling to the data and visualizing the main words for the main topics as a word cloud. These outcomes suggest that the media frame MMS as (a) a social issue centered on people and (b) a legal issue centered on drug abuse and sentencing.

Figure 3: Baseline model: Media discourse on mandatory minimum sentencing prior to movie release (visualization of topic modeling of text bodies of media coverage)

Ground truth model: Comparing the salient themes from the media coverage (Figure 3) to those prevalent in the film’s transcript (Figure 4) shows a large common denominator: both sets of texts portray MMS as a social issue. However, while the media coverage focuses more on prisons and violence, the film itself is more about politics related to drugs.

Figure 4: Ground-truth model: message that the documentary can convey (visualization of topic modeling of film transcript)

Change model: We assessed the media discourse on the actual movie after its release, again based on articles from LexisNexis (N = 167; we selected the same number of articles as for the baseline for comparability) that we converted into semantic networks based on the meta–data (Figure 5) and text bodies. Our results indicate that the press coverage is mainly centered on screening announcements and the director, but hardly addresses MMS, the main issue that the movie aims to have an impact on. While we as academics might consider this a limitation, we were informed at the Sundance Creative Producing Summit, where we presented our assessment of this film in 2013, that producers may aim to position a movie as a piece of art first and as a communication vehicle for some issue later. This calls for a more long–term cycle of evaluation, which is supported by our methodology and technology.

Figure 5: Media discourse on The House I Live In after movie release (semantic networks of meta–data)

To capture the public reaction to the movie, we also collected and analyzed social media data using ConText and NodeXL (Hansen et al., 2010). In the following images, accounts are displayed as nodes if they have more than 200 followers, and node size and hue increase with the number of followers. Mapping followers and followees of @DrugWarMovie (the handle for The House I Live In) shows that even though the film was successful in attracting a substantial number of followers (N = 2,804), many of them are not particularly important or influential themselves on Twitter (small number of accounts displayed, small node size) (Figure 6). In fact, the visually represented follower accounts are fewer and smaller in size than the accounts which the film account is following, even though the film is following fewer accounts (1,735) than it has followers. This indicates an asymmetry between following key players (successful) and attracting key players (less successful).

Figure 6: Twitter—sphere for @DrugWarMovie

Zooming in on the intersection of followers and followees (Figure 7) shows that most of these accounts are organizations involved with legalizing certain drugs. Only a few types of stakeholders that we consider relevant in this content domain are involved in the public discourse on Twitter: more precisely, one retired politician, two government workers, twelve small media companies and 33 NGOs.
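The display filter and the intersection used for Figures 6 and 7 amount to a threshold filter and a set intersection. The sketch below illustrates this with hypothetical account names and follower counts; it is not the NodeXL or ConText workflow, and the function name `visible_accounts` is an assumption made for the example.

```python
def visible_accounts(accounts, min_followers=200):
    """Return the accounts that have more than `min_followers` followers."""
    return {name for name, count in accounts.items() if count > min_followers}


# Hypothetical data: account name -> follower count of that account.
followers = {"ngo_a": 5400, "fan_1": 35, "media_b": 800, "fan_2": 120}
followees = {"ngo_a": 5400, "media_b": 800, "politician_c": 90000, "ngo_d": 2500}

shown_followers = visible_accounts(followers)   # followers that would be drawn
shown_followees = visible_accounts(followees)   # followees that would be drawn
mutual = set(followers) & set(followees)        # accounts in both groups

# Asymmetry check: fewer prominent accounts follow the film than are followed by it.
attracts_fewer_key_players = len(shown_followers) < len(shown_followees)
```

The accounts in `mutual` would then be coded by stakeholder type (e.g. NGO, media, government) to obtain a breakdown such as the one reported above; that coding step is manual and not shown here.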

Figure 7: Intersection of followers and followees (red = relevant types of account, purple = any other type of account)

We do not assume that all social media platforms lead to the same (impression of) impact. Thus, we also look at another social networking service, Facebook. The semantic networks built from co–occurring and highly salient terms (defined in terms of TF–IDF) that appear in the posts on the film’s fan page suggest that the person posting those notes mainly addresses “watching the movie”, “release of the movie” and “war on drugs” (Figure 8). This represents classic campaign work. However, the user base (comments and replies to posts) not only picks up on these topics, but also brings new ones to the table, mainly related to the prison system and people of color (Figure 9). This finding suggests that it takes an engaged campaign worker to get a discussion started (missing on Twitter for this particular movie). Once this has been achieved, one possible form of impact is the public engaging with this topic and taking it into new directions. Looking at only one social media platform would not have allowed us to gain this differentiated view.
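As an illustration of how salient terms and their co–occurrences can be derived, the sketch below weights terms by TF–IDF and links the top–weighted terms within each post. The specific TF–IDF variant (relative term frequency times the natural log of inverse document frequency), the cut–off of two terms per post and the toy posts are assumptions; the exact settings used in ConText are not specified here.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_scores(posts):
    """Return one {term: tf-idf weight} dict per tokenized post."""
    n = len(posts)
    doc_freq = Counter(term for post in posts for term in set(post))
    scores = []
    for post in posts:
        counts = Counter(post)
        scores.append({
            term: (count / len(post)) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def salient_cooccurrence(posts, top_k=2):
    """Link the top_k highest-weighted terms of each post with each other."""
    edges = Counter()
    for weights in tfidf_scores(posts):
        # Sort by descending weight; ties are broken alphabetically.
        top = sorted(weights, key=lambda t: (-weights[t], t))[:top_k]
        for pair in combinations(sorted(top), 2):
            edges[pair] += 1
    return edges


# Hypothetical, already tokenized and stop-word-filtered posts.
posts = [
    ["watch", "movie", "tonight"],
    ["movie", "release", "theaters"],
    ["movie", "war", "drugs"],
    ["war", "drugs", "prison"],
]

edges = salient_cooccurrence(posts)
```

Because "movie" occurs in most of the toy posts, its weight is low and it does not enter the network, which shows how this weighting favors terms that distinguish individual posts over terms that are common to all of them.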

Figure 8: Co–occurrence of salient terms from posts on Facebook fan page for The House I Live In
Figure 9: Co–occurrence of salient terms from comments on Facebook fan page for The House I Live In

4. Conclusions, Discussion and Next Steps

Films are produced, screened and perceived as part of larger and continuously changing ecosystems that involve multiple stakeholders and themes. We have presented a novel, theoretically grounded, practically employed and evaluated solution for mapping and assessing the impact of (social justice) documentaries by analyzing the web of stakeholders and information related to (the main topic of) a film in a systematic, empirical and scalable fashion. This solution overcomes some of the main shortcomings of prior approaches used or proposed for this purpose. The tool we built to facilitate this process (ConText) is also applicable for conducting text mining and network analysis on data from other domains.

With this work, we are bringing text mining and network analysis of the content of information disseminated by network participants to a domain where these methods have not yet been used for impact assessment. Interacting and collaborating closely with practitioners and project partners from the documentary and media domain, we have received valuable feedback that informs us about practical needs for additional capabilities of the developed methodology and tool. For example, when we presented our assessment of The House I Live In at the 2013 Sundance Creative Producing Summit, we had found that in classic media and on Twitter, the movie had not been successful in attracting attention related to the subject matter of the film, namely mandatory minimum sentencing, but was mainly discussed as an art product. A person involved with consulting on behalf of the makers of this film informed us that they often try to frame a movie as an art product first, and as a vehicle for communicating information about some issue second.

Several limitations are present in our current conceptualization and implementation. First, our ground truth model of a film considers only one dimension of a documentary, i.e. content as represented in the film script, while other key elements like visuals and sounds are neglected. While we do not incorporate these elements into the ground truth, reactions to them are being tracked. Second, we focus on public awareness as reflected in social media data, news coverage and interviews with focus groups. However, an additional or alternative impact goal might be political and/or corporate change. In the near future, we plan to expand our framework and data sources to cover these dimensions as well. Moreover, we will take more dimensions laid out in the CoMTI framework into consideration through operationalization and measurement. Finally, as we have been conducting a range of case studies, we will synthesize our findings into empirical insights and try to identify patterns from these results.

Jana Diesner is an assistant professor at the iSchool at the University of Illinois Urbana-Champaign (UIUC), and an affiliate at the Department of Computer Science and the Information Trust Institute. She received her PhD from Carnegie Mellon, School of Computer Science.

Jinseok Kim is a Graduate School of Library and Information Science PhD student at the University of Illinois Urbana-Champaign (UIUC).

Susie Pak is an Associate Professor of History at St. John's University.


  • Applied_Research_Consulting_LLC. (2002). Outreach Extensions: National Legacy Outreach Campaign Evaluation.
  • Barrett, D., & Leddy, S. (2008). Assessing creative media's social impact: The Fledgling Fund.
  • Bass, F. M. (2004). Comments on “A new product growth for model consumer durables: The Bass Model”. Management Science, 50(12 supplement), 1833-1840.
  • Bessière, K., Kiesler, S., Kraut, R., & Boneva, B. S. (2008). Effects of Internet use and social resources on changes in depression. Information, Community & Society, 11(1), 47-70.
  • Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892-895.
  • Bryant, J., & Oliver, M. B. (2008). Media effects: Advances in theory and research: Taylor & Francis US.
  • Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4), 370-379.
  • Clark, J., & Abrash, B. (2011). Social justice documentary: Designing for impact: Center for Social Media.
  • Corman, S. R., Kuhn, T., McPhee, R. D., & Dooley, K. J. (2002). Studying Complex Discursive Systems: Centering Resonance Analysis of Communication. Human Communication Research, 28(2), 157-206.
  • Danowski, J. A. (1993). Network Analysis of Message Content. Progress in Communication Sciences, 12, 198-221.
  • De Nooy, W., Mrvar, A., & Batagelj, V. (2011). Exploratory social network analysis with Pajek: Cambridge University Press.
  • Diesner, J. (2013). From Texts to Networks: Detecting and Managing the Impact of Methodological Choices for Extracting Network Data from Text Data. Künstliche Intelligenz/ Artificial Intelligence, 27(1), 75-78. doi: 10.1007/s13218-012-0225-0
  • Diesner, J. (2014). ConText: Software for the Integrated Analysis of Text Data and Network Data. Paper presented at the Social and Semantic Networks in Communication Research. Preconference at Conference of International Communication Association (ICA), Seattle, WA.
  • Diesner, J., Aleyasen, A., Mishra, S., Schecter, A., & Contractor, N. (2014). Comparison of Communication Networks built from explicit and implicit data. Paper presented at the Computational Approaches to Social Modeling (CHASM 2014), ACM WebScience Conference, Bloomington, IN.
  • Dominick, J. R. (2007). The dynamics of mass communication: Media in the digital age: Tata McGraw-Hill Education.
  • Drazin, R., & Schoonhoven, C. B. (1996). Community, population, and organization effects on innovation: a multilevel perspective. Academy of Management Journal, 39(5), 1065-1083.
  • Figueroa, M. E. (2002). Communication for social change: An integrated model for measuring the process and its outcomes: Rockefeller Foundation.
  • Gloor, P. A., & Zhao, Y. (2006, July). Analyzing actors and their discussion topics by semantic social network analysis. Paper presented at the 10th IEEE International Conference on Information Visualisation, London, UK.
  • Griffin, E. A., & McClish, G. A. (2003). A first look at communication theory (4th ed.). Boston, MA: McGraw-Hill.
  • Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing social media networks with NodeXL: Insights from a connected world. Burlington, MA: Morgan Kaufmann.
  • Innis, H. A. (2007). Empire and communications: Rowman & Littlefield Publishers.
  • Jenkins, H. (2006). Convergence culture: Where old and new media collide: NYU press.
  • John, S., & James, L. (2011). Impact: A practical guide for evaluating community information projects: Knight Foundation.
  • Johnson, C. F., & Klare, G. R. (1961). General models of communication research: A survey of developments of a decade. Journal of Communication, 11(1), 13-26.
  • Kasperson, R. E., Renn, O., Slovic, P., Brown, H. S., Emel, J., Goble, R., . . . Ratick, S. (1988). The social amplification of risk: A conceptual framework. Risk Analysis, 8(2), 177-187. doi: 10.1111/j.1539-6924.1988.tb01168.x
  • Katz, E., & Lazarsfeld, P. F. (2006). Personal influence: The part played by people in the flow of mass communications: Transaction Pub.
  • Katz, M. L., & Shapiro, C. (1986). Technology adoption in the presence of network externalities. The Journal of Political Economy, 94(4), 822.
  • Keen, A. (2007). The cult of the amateur: How blogs, MySpace, YouTube, and the rest of today's user-generated media are destroying our economy, our culture, and our values. New York, NY: Random House Digital, Inc.
  • KnightFoundation. (2011). Impact: A guide to evaluating community information projects.
  • Lasswell, H. D. (1948). The structure and function of communication in society. In L. Bryson (Ed.), The communication of ideas (pp. 37-51). New York, NY: Harper & Row.
  • Laughey, D. (2007). Key themes in media theory: Open University Press.
  • Launiala, A. (2009). How much can a KAP survey tell us about people's knowledge, attitudes and practices? Some observations from medical anthropology research on malaria in pregnancy in Malawi. Anthropology Matters, 11(1).
  • Luhmann, N., & Cross, K. (2000). The reality of the mass media: Stanford University Press Stanford, CA.
  • Lundgren, R. E., & McMakin, A. H. (2011). Risk communication: A handbook for communicating environmental, safety, and health risks: Wiley-IEEE Press.
  • Marsden, P. V. (1998). Memetics and social contagion: Two sides of the same coin. Journal of Memetics-Evolutionary Models of Information Transmission, 2(2), 171-185.
  • Marsden, P. V., & Friedkin, N. E. (1993). Network studies of social influence. Sociological Methods & Research, 22(1), 127-151. doi: 10.1177/0049124193022001006
  • McCallum, A. (2005). Information extraction: distilling structured data from unstructured text. ACM Queue, 3(9), 48-57.
  • McLuhan, M. (1994). Understanding media: The extensions of man: MIT press.
  • McPherson, M., Smith-Lovin, L., & Cook, J. (2001). Birds of a Feather: Homophily in Social Networks. Annual Review of Sociology, 27, 415-444.
  • McQuail, D. (2010). McQuail's mass communication theory: Sage Publications Limited.
  • Page, S. E. (2007). The difference: How the power of diversity creates better groups, firms, schools, and societies. Princeton, NJ: Princeton University Press.
  • Prell, C. (2012). Social network analysis: History, theory & methodology. Thousand Oaks, CA: SAGE.
  • Rogers, E. M. (2003). Diffusion of innovations: Simon and Schuster.
  • Roth, C., & Cointet, J. P. (2010). Social and semantic coevolution in knowledge networks. Social Networks, 32(1), 16-29.
  • Scherer, C. W., & Cho, H. C. (2003). A social network contagion theory of risk perception. Risk Analysis, 23(2), 261-267. doi: 10.1111/1539-6924.00306
  • Slovic, P., Finucane, M. L., Peters, E., & MacGregor, D. G. (2004). Risk as analysis and risk as feelings: Some thoughts about affect, reason, risk, and rationality. Risk Analysis, 24(2), 311-322.
  • Sparks, G. G. (2012). Media effects research: A basic overview: Wadsworth Publishing Company.
  • Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications: Cambridge University Press.
  • Watts, D. J. (2007). The accidental influentials. Harvard Business Review, 85(2), 22-23.
  • Whiteman, D. (2004). Out of the theaters and into the streets: A coalition model of the political impact of documentary film and video. Political Communication, 21(1), 51-69.