
    BIG DATA

    1. Unroll.me email tracking and data sale

    Do you get too much email but don’t want to miss announcements of your favorite stores’ sales? Do you wish your email system could sense all of your not-quite-junk but not-quite-useless email and put it all in one place for when you were ready to read it? That service is exactly what Unroll.me, owned by Slice Intelligence, said it would do — filter, organize, and remove your email subscriptions and present them to you in a single email, reducing email clutter. It sounded like a game-changer.

    Then, in April 2017, a New York Times piece on ride-sharing service Uber and its founder, Travis Kalanick, briefly described Uber’s efforts to gather intelligence on its competitors. One of those efforts involved buying anonymized receipts issued by Uber competitor Lyft that had been harvested from emails accessed by the service Unroll.me. Unroll.me promised to help users organize and remove subscriptions from their email inboxes. As it turned out, a clause buried in the terms of service all users agreed to allowed Slice to sell their data as it saw fit.

    This case study explores the tensions that surface when a company engages in large-scale data collection behind the scenes while marketing itself as a customer-facing solution.

    Resources

    Discussion questions

    1. What are the key issues and perspectives in this case?
    2. The Internet Archive has a long-standing project – called the Wayback Machine – to archive web pages over time. Go to http://web.archive.org and type unroll.me into the search bar. Find the version of the site archived before the New York Times story broke on April 24, 2017. Compare that home page to the current one. What is the same, and what is different? What language do you see about the benefits and risks of using the tool? Was Unroll.me’s marketing appropriately descriptive of the service? Explain your response.
    3. Permission to harvest data from emails was in the end user license agreement (EULA) that subscribers agreed to when they registered for Slice’s free Unroll.me service. Is what Slice/Unroll.me did OK? Why or why not?
    4. Did Slice/Unroll.me have an obligation to account for the fact that most people either skip reading EULAs or are put off by their length and legal language? Do you think consumers should actively take measures to prevent their data from being sold in this manner?
    5. Do you think the large public outcry against Unroll.me was warranted? Why or why not?
    6. Several of the articles above state some version of, “If you’re not paying for it, you’re the product,” meaning that you are “paying” for so-called free services by contributing your data and personal information. Do you think this is always true? If so, how much more thought do you think consumers should put into which services they use?
    7. Is it practical for users to avoid all services that might try to sell their data in this way? Explain your thinking.
    8. Google recently announced that Gmail would stop harvesting user data from emails sent and received, a practice it had engaged in since Gmail’s public release in order to customize ads served up to individual accountholders. Was what Slice Intelligence did substantially different from what Google was doing? Why or why not?
    9. Google has stopped Gmail from operating in this way, but all other Google services harvest user data for sale in some way. Should consumers stop using Google services? As a follow-up, if consumers did stop using Google services in large numbers, what would the consequences be?
    10. How can users work to be more aware of what services are doing with their data? Some argue that the simple answer to scenarios like these is to just always read the terms of service agreement. Is this a reasonable expectation? Why or why not?

    2. Big Data and discrimination

    Big Data seems to be invading our lives. From how it influences our car insurance premiums to the kinds of information we see online, Big Data is pervasive. Big Data is not just everywhere though. It’s constantly behind the scenes making decisions and drawing conclusions for us. And while it’s great, for example, that banks can notice when your banking behavior changes, it can be disturbing to consider that banks know where you are based on your purchases. It can be even more disturbing to realize that software and algorithms are making decisions about you without the benefit of human intervention.

    The use of Big Data can be both a benefit and a hindrance: it illuminates our actions and cultures in many ways, and it is then used to try to predict future behavior. For example, politicians can point to data about the United States prison population as evidence of the high rate of incarceration of the African American male population. The conclusions we draw about those high rates of incarceration may provide evidence of past discriminatory practices. Or consider this: what if judges decided what a convicted criminal’s sentence would be based on predictions of future, not past, crimes? Algorithms like that might have been designed to protect society, but should a current crime’s punishment be based on the possibility of future criminality without human intervention? An unintended consequence could be that the cycle of discrimination would continue.

    Americans seem to inherently value equality and place great faith in technology’s ability to level the playing field. Against that backdrop, the question of whether technology could perpetuate inequality is an urgent one. In this case study, we’ll consider how Big Data has the potential to both help us address current discriminatory practices and perpetuate past inequalities.

    Resources

    Discussion questions

    1. How is Big Data improving our lives now? What is the potential for improvement in the future?
    2. What are some examples of how Big Data has been used to discriminate? Do you see the discrimination as intentional or as an accidental byproduct of well-intentioned people? Why do you see it that way? If it is accidental, how can well-intentioned people address the issue?
    3. Who is likely to benefit from the use of Big Data? How does the identity of those beneficiaries inform issues around discrimination?
    4. Based on the readings, who has been most affected by discrimination in Big Data? Why do you think those populations are disproportionately impacted?
    5. Can you think of other examples when the advance of technology has had intended or unintended negative consequences? What can we learn from those examples when considering Big Data and discrimination today?
    6. From where does the Opportunity Project get its data? How has it helped to harness Big Data for positive impact? From what you see, what is the project doing to ensure that the data is not being used in a discriminatory fashion?
    7. After reading the Timm piece, browse a few other articles on the Insurance Thought Leadership website. What, if any, interests does the insurance industry have that might influence their thoughts on discrimination and the use of Big Data? Do you see other authors on this list whose perspective might be less objective? Why?
    8. Regardless of intent, how do we work toward minimizing Big Data’s potential for discrimination today?
    9. Do you think that Big Data will likely lead to discrimination in the future? Why or why not?

    3. Television sets collecting data without notifying consumers

    Beginning in 2014, Vizio manufactured internet-enabled televisions that continuously tracked and collected records of what consumers watched, using proprietary software that was turned on by default. Vizio earned revenue by selling data about consumers’ television viewing history to third parties for three uses:

    1. audience measurement;
    2. analyzing advertising effectiveness (in such cases data was used “in the aggregate”); and
    3. targeting advertising to particular consumers on their other digital devices based on their television viewing habits.

    No names were attached to the data sold to third parties; however, it did include details such as sex, age, income, education, and marital status. Along with these details, Vizio provided its clients with highly specific, second-by-second information about television viewing activity in each household.

    In February 2014, a remote update installed tracking software on older-model Vizio TVs. After the update, a pop-up notification appeared onscreen for one minute, stating that Smart Interactivity had been enabled and directing consumers to the Vizio website for more information. This notification provided no information about the collection or transmission of viewing data.

    Ultimately, the Federal Trade Commission fined Vizio $2.2 million for the unauthorized collection of this data, and Vizio was required to delete all of the data it collected under this program.

    Resources

    Discussion questions

    1. Some of the data about consumers that Vizio collected and sold was aggregated, showing trends in groups, while some was associated with individual users. In your opinion, is there a significant difference between these two ways of packaging consumer data for sale to third parties?
    2. How much specific information about an individual consumer would constitute a violation of privacy?
    3. Some invasions of privacy are egregious, like knowing that a hacker has posted your username and password online or discovering that someone has engaged in identity theft with your identity. In this case, the company would know when you watched television and what you watched. To what degree are you comfortable with this information being siphoned out of your television, potentially without your knowledge? Is TV viewing data worthy of protection? What kinds of conclusions, including possibly erroneous ones, could others draw about you based on TV data?
    4. Owners of Vizio TVs always had the option to go into their settings and turn off the software that was collecting their viewing data. Is the opportunity to opt out sufficient? Or should the law require that consumers opt in, effectively requiring default settings to prioritize consumer privacy?
    5. What kinds of anonymized data should companies be empowered to collect for the purpose of improving their product? What about for the purpose of building new profit areas by selling this data?
    6. Consumers’ viewing history is covered under certain privacy protections. As we see in the FTC filings linked above, “Practices are not permitted if they are likely to cause harm to consumers and that is not outweighed by countervailing benefits to consumers or competition” (see page 174). Do you believe there is potential injury to consumers in the gathering and selling of this data? What do you think would be the potential benefits to consumers or competition?
    7. The law also protects consumers from misrepresentations or deceptive omission of relevant information. Would you argue that setting the televisions to record and transmit details of viewing behavior by default constitutes intentional concealment? Or is it the consumer’s responsibility to discover what information her electronics are recording and transmitting about her behaviors? How would you define the consumer’s responsibilities with regards to protecting her own privacy?
    8. Vizio television sets tend to be priced lower than other television brands. Could you argue that Vizio’s practices were beneficial or harmful to consumers with lower incomes?
    9. Other electronics companies also collect this sort of data about customers, but only after customers have opted into the program. Further, other companies don’t seem to have sold data that can be used to link that viewing data to other devices a consumer owns. In Vizio’s business model, the data about a consumer’s viewing habits were associated with an IP address; because the IP address is linked to any internet-connected devices in the home, Vizio’s sale of this data enabled advertisers to target ads to a consumer’s other devices, such as a phone or tablet, based on what she had watched on television. When the data about TV viewing habits is used to reach the same consumers in other venues, does this constitute an important distinction with regard to privacy?

    4. Big Data and self-driving trucks

    Long-haul automated trucking is being tested by multiple companies aiming for benefits in productivity, health, and safety. Like self-driving cars, autonomous trucks (with back-up professional drivers) are equipped with sensors, lasers, video cameras, and guidance systems which gather and process massive amounts of data.

    The goal of Daimler, Otto, Peterbilt, Volvo, and other companies is to use this data to enable 40-ton (80,000-pound) behemoths to steer, accelerate, and brake as skillfully as experienced professional truckers do under challenging road conditions, in problematic weather, and surrounded by vehicles driven by less predictable human drivers.

    Recent events have raised questions of privacy and cybersecurity for all self-driving vehicles. The benefits of continual operation at optimal speed and the mitigation of errors made by sleep-deprived truckers must be weighed against concerns specific to long-haul trucking, such as the maneuverability and stability of massive vehicles, the regular transportation of hazardous cargo, and the potential loss of blue-collar jobs.

    The costs of testing self-driving trucks are much higher than for self-driving cars, so extensive highway trials of heavy vehicles have yet to occur. This interval offers us opportunities for thoughtful policies and regulations in advance of a general roll-out of long-haul trucking with varying levels of driver automation.

    Autonomous trucking has additional complexities beyond those of personal autonomous vehicles, given its unique economic pressures: salaried drivers and businesses’ expectations of rapid delivery. The constellation of articles presented below provides a broad landscape of the issues related to the trucking industry, including

    • safe working conditions for drivers;
    • potential impact on commerce and employment;
    • industry definitions of levels of autonomy;
    • planning for effective testing of autonomous or semi-autonomous trucks; and
    • permitted activities on a state-by-state level.

    If time allows, we encourage your participants to read most or all of these articles. If time is limited, consider asking participants to divide into small groups, each taking one or two articles, and then summarizing the key information for the larger group.

    Resources

    Discussion questions

    1. How are the challenges of developing self-driving trucks different from those for self-driving cars?
    2. What do major stakeholders – truckers, trucking companies, and government – agree are the most important issues to be resolved?
    3. Brainstorm how each of these issues could be addressed via incentives, voluntary policies, government regulation, and legislation.
    4. Which combination of ideas in Question 3 offers the most promising benefits in comprehensively addressing stakeholder concerns?
    5. If you prioritize these possible solutions, how would you order them from high to low?
    6. What criteria did you use to rank them? Why? Whose perspectives are most satisfied in your priorities?
    7. What stakeholder concerns are poorly represented? How might these be addressed by modifying your solutions?
    8. Does anything you have discussed in this case study change how you feel about self-driving personal vehicles? Why or why not?

    5. Predictive policing: The seduction of technology

    The 2002 sci-fi film Minority Report describes a future where policing is guided by troves of data that will predict where and by whom crimes will be committed. There are indications that predictive policing — using past data to predict where future crimes may occur — is gaining widespread acceptance today in some of the nation’s largest cities.

    Algorithms have been developed to use Big Data to help police focus on locations and people most likely to be associated with crime. While these tools are intended to reduce bias and racial profiling, they may instead reinforce and amplify the prejudices they seek to eliminate.

    While predictive policing could potentially be used to study officer behavior or help with mental health interventions, to date most initiatives serve to validate and reinforce unfair law enforcement practices, especially among people of color.

    How we reconcile civil rights law with the collection of this data, who should have access to that data, and the profit motives of the corporations that sell the body cameras and cloud storage used to collect and store it are all very real concerns.
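
    To make the feedback-loop concern concrete, here is a minimal, hypothetical sketch; the districts, numbers, and allocation rule are invented and do not represent any vendor’s actual algorithm. It shows how a “hotspot” model that sends patrols wherever the most incidents have been recorded can amplify an initial imbalance, because incidents are only recorded where officers are looking.

    ```python
    # Hypothetical illustration only -- not any vendor's actual algorithm.
    # A toy "hotspot" model allocates patrols in proportion to recorded
    # incidents, but new incidents are only recorded where patrols are sent.
    # Even with identical underlying rates, the district that starts with
    # more recorded incidents pulls further ahead each week.

    import random

    random.seed(0)

    districts = ["A", "B", "C", "D"]
    true_rate = {d: 0.3 for d in districts}          # identical everywhere
    recorded = {"A": 12, "B": 5, "C": 5, "D": 5}     # historical imbalance

    def allocate_patrols(counts, n_patrols=12):
        """Send patrols in proportion to recorded incident counts."""
        total = sum(counts.values())
        return {d: round(n_patrols * counts[d] / total) for d in counts}

    for week in range(10):
        patrols = allocate_patrols(recorded)
        for d in districts:
            for _ in range(patrols[d]):              # only patrolled areas
                if random.random() < true_rate[d]:   # can generate records
                    recorded[d] += 1

    print(recorded)  # district A's recorded count keeps growing fastest
    ```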

    Resources

    Discussion questions

    1. Which civic and industry groups benefit most from the use of predictive policing as a strategy and a tool?
    2. Can predictive policing be salvaged as a way to reduce crime? How could it be improved?
    3. How do we reconcile the Fourth Amendment requirement that police have “reasonable suspicion” as a condition of arrest with computer-generated probabilities based on location or social networks? Will the laws have to change to accommodate technology?
    4. How might local governments be incentivized to invest in predictive tools for use beyond criminal behavior, to instead distribute social services or combat police brutality?
    5. What businesses or industries might be accidentally impacted by predictive policing? For example, how might predictive policing indicate “less safe” areas that might discourage tourism or restaurant visits in an area? How might real estate values be impacted?
    6. With such tight security around the data these programs produce, who will audit the results and methodologies? Who should have access to the data?
    7. Are you personally more comfortable with geospatial predictions of crime than with identification of potential criminals? Why might that be? Are there any pitfalls in thinking that place is a better indicator than person?
    8. What are the pros and cons of using “deep learning” and AI when identifying patterns in policing practices from body cameras?
    9. Discuss the potential privacy concerns with developing a “heat map” of a city’s most dangerous people based on Big Data predictions. If, as described in the Verge article, these lists are not available under the Freedom of Information Act, as other civic documents are, what civil rights does a “potential” offender have?
    10. Is free speech violated if social network posts are added to the data used to identify the “most dangerous” people? Do these rights extend to online “speech?”
    11. Does the expansion of predictive policing into white-collar crime level the playing field, or does it extend self-reinforcing crime fighting into yet another area?

    6. Big Data in banking and loans

    Big Data is the collection of personal data, such as browsing history, address, age, gender, and purchasing habits. Personal data can be collected by companies that specialize in gathering information, or by companies like Facebook and Google, but the uses are largely the same. Companies rely on Big Data brokers to create user profiles, which are often used for marketing. A home insurance company will be interested in targeting people who have just bought homes. A car company that sells minivans will be interested in targeting families with more than one child. The collection of personal profiles allows companies to target particular segments of a population for the products and services they offer.

    Use of Big Data has increased at an exponential rate over the last few years, and with its growth, so have its applications. In the United States, there are no laws that specifically prohibit the use of Big Data. Instead, companies regulate their Big Data usage by comparing their proposed projects with privacy policies, contractual agreements, and laws related to consumer discrimination.

    In the banking industry, the two major limiting factors on the usage of Big Data are the Fair Credit Reporting Act and the Equal Credit Opportunity Act. Despite these regulations, there are concerns that Big Data is being used to discriminate against different groups in the case of loan applications, mortgages, and other bank-related services.

    In this case study, you’ll look at how Big Data is being used in marketing and banking — with sometimes unanticipated results.

    Resources

    Discussion questions

    1. What details about your life produce a digital footprint? What in your life would you prefer that banks and mortgage lenders not know about you?
    2. What is a proxy and how can it be used? What are some of the limitations to using a proxy for predictive algorithms?
    3. Why would banks find the use of Big Data helpful?
    4. A number of companies have written algorithms that significantly reduce the likelihood of credit fraud, and thus pre-emptively save banks money. However, what are some issues with determining the risk of an applicant by Big Data profiles and algorithms?
    5. What is the position the FTC report takes on the use of Big Data in targeting groups and assessing risks? Do you agree with it? What might they have overlooked? Underemphasized? Why should businesses have access to Big Data?
    6. What are some of the laws in place to protect users? Do you think they are enough? Why or why not?
    7. How do the perspectives of the resource authors affect their message? What are some of the limitations on the various views?
    8. Considering the article from MReport, do you think businesses are listening to the ethical position in the FTC report? How did you decide your answer?
    9. What is the ethical difference between using personal data to target people for ads and using personal data to rush people through mortgage approvals?

    7. Bias in student predictive analytics data: Does it help or hinder potential prospects/relationships?

    Predictive analytics is the practice of applying statistical methods to current and historical information to forecast what may happen in a particular area, or with respect to a particular behavior, in the future. It is increasingly used in education to assist teachers and administrators in assessing which interventions or programs will help students achieve higher rates of success or mastery of content. That sounds like progress and a real help to schools where needs are high but budgets for interventions are limited.

    However, student panel participants at the EduCon 2.9 conference expressed concern. A student at Macomb Community College in Michigan said, “We don’t know who is choosing [the data] and who is pulling the strings.” In summarizing that conference session, Dobo (2017a) condensed their concern, saying that students “worry that the data will be used to label them before they have a chance to make their own impressions on a teacher.”

    Advocates of these predictive programs say they help educators find and help students at risk of failure, but the students on the panel presented another side of the story. What happens if this information is used against us? Will a digital dossier — possibly with inaccurate, incomplete or out-of-context data — follow us forever (Dobo 2017b)?
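
    To ground the discussion, the sketch below shows the general shape of an “at-risk” prediction of the kind described above. Everything in it is invented for illustration: the features (GPA, attendance, learning-management-system logins), the tiny data set, and the 0.5 flagging threshold. Real campus systems use far more data and more elaborate models.

    ```python
    # Illustrative sketch only; features, data, and threshold are invented.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row: [high-school GPA, attendance rate, weekly LMS logins]
    X = np.array([
        [3.8, 0.95, 12],
        [2.1, 0.60,  2],
        [3.0, 0.85,  7],
        [2.5, 0.70,  3],
        [3.9, 0.98, 15],
        [2.0, 0.55,  1],
    ])
    y = np.array([1, 0, 1, 0, 1, 0])   # 1 = completed the course

    model = LogisticRegression().fit(X, y)

    # Score an incoming student and flag them if predicted success is low.
    new_student = np.array([[2.4, 0.65, 2]])
    p_success = model.predict_proba(new_student)[0, 1]
    print(f"Predicted probability of success: {p_success:.2f}")
    print("Flag for intervention:", p_success < 0.5)
    # The questions below ask what happens when such a flag follows a
    # student, or when the inputs are inaccurate or out of context.
    ```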

    Resources

    Discussion questions

    1. In the DeVaney article, how does the University of Michigan team frame predictive analytics as a positive? Which of their points feels most compelling? What concerns you?
    2. In the DeVaney reading, the Michigan team asserts that predictive analytics can better personalize the learning experience. How might it do that? How might it not?
    3. Consider the student voices in Dobo’s “Students Worry” article (2017a). Could the use of predictive analytics, which is meant to accelerate student growth, restrict students’ options, choices, and futures? Can you envision any scenarios in which a student’s past behavior might accurately or inaccurately predict future performance?
    4. What if one of the data points fed into the system was a student’s behavior record? Does that change how you feel about this issue?
    5. Is it likely that “digital dossiers” will be used to help or hinder students in seeking education or career opportunities?
    6. How do students who aren’t good “test takers” deal with the reality of predictive analytics?
    7. How will schools account for the fact that some of the instruments they rely on in predictive analytics (surveys, etc.) may have incorrect, misleading, or missing information?
    8. The report by the Education Commission of the States indicates that teacher perceptions of student performance can significantly impact actual performance, a phenomenon the report calls a “self-fulfilling prophecy.” How might this phenomenon help some students? Hinder others? How might teacher preparation programs and/or colleges and universities intervene to counter potential bias in teachers’ expectations of student performance?
    9. Does what you now know about the data being collected and analyzed on the University of Michigan campus (primarily in large undergraduate courses, and only in a fraction of all courses) change your idea of what college will be like for future college students?
    10. What can you do to ensure that your “dossier” portrays accurate information about you?

    8. Cross conversion tracking: Linking in-store purchases with online ads

    On Monday you buy a pair of socks through Amazon. Then on Tuesday, six ads parading a variety of socks suddenly march across your computer screen as soon as you open your browser window. The ability to track both browsing behavior and purchases is a hallmark of the modern world. Google, Facebook, and Amazon are able to create personalized profiles of your online browsing and shopping habits, down to what kind of items you buy online, how often you purchase them, when you purchase them, and how much you spend. Many companies now believe that the key to finding the perfect market lies in Big Data algorithms, and thus a considerable amount of money can be made in online advertising. This case study will look at how the Big Data mentality transforms business practices in marketing.

    Recently, Google unveiled a new program to track cross conversion purchases. Cross conversion tracking refers to a company’s ability to track multi-device purchases and thus build algorithms to predict purchasing habits. For example, an individual might be scrolling through scarves on her smartphone while waiting for the bus and see something she really likes in an ad, but not buy it. A week later, when she’s on her laptop, she might go back to the store’s website, or to the store itself, and buy the scarf. Google is able to connect the smartphone behavior to the laptop activity when the individual is logged into her Google account on both devices. By buying credit card purchasing data from a source like Acxiom, Google can now triangulate the data and know where purchases are being made at in-store locations (but only in terms of the store and total amount spent; it does not know the individual items purchased). By combining the two data sources, Google can often figure out whether your online browsing yielded a purchase at that store. The browsing/purchasing information is then anonymized and used as data to predict purchasing habits based on exposure to online advertising. Google’s cross conversion tracking service is very useful to stores that want more precise data to better plan customer advertising campaigns. Google announced the program to companies that buy advertising from it in May 2017. However, in July 2017, a privacy rights watchdog filed a legal complaint with the Federal Trade Commission.

    The privacy watchdog wants the government to review Google’s algorithm, wants Google to disclose which companies sell it the credit card information, and wants the opt-out options for Google tracking to be more accessible and transparent. The opt-out option matters because if an individual is not signed into Google, or is using Google’s Incognito Mode, then her behavior will not be attributed to her account and she cannot be tracked in this way. Similarly, if someone goes to a store but pays cash, that customer’s identifying information (e.g., credit card number) cannot be tracked. In this case study, participants will become familiar with customer conversion tracking practices and discuss their relevance, or lack of relevance, to their lives as consumers.
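
    To make the mechanics easier to discuss, here is a purely illustrative sketch of how an ad-exposure log and a purchase feed keyed to a pseudonymous identifier could be joined and then reported only in aggregate. It is not Google’s actual system or its “double-blind” method; the identifiers, hashing scheme, and data are invented.

    ```python
    # Hypothetical sketch: joining ad clicks with in-store purchase totals.
    import hashlib
    from collections import defaultdict

    def pseudonym(identifier: str) -> str:
        """Replace a raw identifier (e.g., an account email) with a hash."""
        return hashlib.sha256(identifier.encode()).hexdigest()[:12]

    # Ad-exposure log: which pseudonymous users clicked a scarf ad, and when.
    ad_clicks = [
        {"user": pseudonym("alice@example.com"), "ad": "scarf_spring", "day": 1},
        {"user": pseudonym("bob@example.com"),   "ad": "scarf_spring", "day": 2},
    ]

    # Purchase feed bought from a broker: store and total amount only,
    # keyed to the same pseudonymous user (no individual items listed).
    store_purchases = [
        {"user": pseudonym("alice@example.com"), "store": "Scarf Shop",
         "total": 24.99, "day": 7},
    ]

    # Join the two feeds and report conversions in aggregate per campaign.
    clicked = {c["user"]: c["ad"] for c in ad_clicks}
    report = defaultdict(lambda: {"clicks": 0, "purchases": 0})
    for c in ad_clicks:
        report[c["ad"]]["clicks"] += 1
    for p in store_purchases:
        if p["user"] in clicked:
            report[clicked[p["user"]]]["purchases"] += 1

    print(dict(report))
    # {'scarf_spring': {'clicks': 2, 'purchases': 1}} -- the advertiser sees
    # only that one of two ad clicks led to an in-store purchase, not who.
    ```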

    Resources

    Discussion questions

    1. How would you explain cross conversion tracking to a neighbor? To someone who already advertises online? To a brick-and-mortar store in your community that is looking for ways to maximize revenue?
    2. Why might Google have decided to create this technology in the first place? What might have motivated them to release this information in May 2017?
    3. What might be some limitations in the analytics algorithms, or in linking online ads with online and in-store purchases? Do you feel that your on- and offline shopping habits would be captured accurately? Why or why not?
    4. Why might Google be unwilling to share the algorithms for creating the “double-blind” method of anonymizing data?
    5. Some scholars have been able to take anonymized data and reconnect it to someone’s identity. Does that change how you feel about the issue?
    6. If the government — and not Google — were able to track your shopping behavior, would that change how you felt about the issue?
    7. Tracking your in-store purchases doesn’t just mean Google knows how much you spent at a store. It also means it is quite likely Google knows where you were at a particular time. Does that change how you feel about this issue?
    8. Google is able to connect online and offline behavior because purchasing with a credit card creates a trail of data. Cash transactions cannot be tracked (unless, perhaps, one uses a customer loyalty card or customized coupon sometime in the future). Do these articles make you think differently about using cash versus a credit or debit card for your own family’s purchases? Why or why not?
    9. Sometimes, the same story can be reported in different ways in different publications. What is the general tone of the Business Insider article? The one from the Washington Post? What is the significance of the differences? Does either make you more or less confident about the issue? More or less comfortable?
    10. If you had the power to address any ethical problems with this practice, what would you change? What kind of legislation would you like to see applied to the issue? Why?
    11. Considering that individuals’ personal spending and browsing data is already collected online, does linking that information up to in-store purchases make a considerable difference in personal privacy? Why or why not? Additionally, personal information comes from a large conglomeration of sources, making it difficult to determine where it came from. Do you think this influences Google’s willingness to disclose their information providers?
    12. Does reading about and discussing this issue make you want to take action of some kind, like changing your Google Privacy settings? Why or why not?

    9. The ethics of Mechanical Turk

    The original Mechanical Turk (public domain).

    Since 2005, Amazon has enabled access to a crowd labor exchange called Mechanical Turk (https://mturk.com), allowing anyone with Internet access to sign up to complete web-based microwork requiring human intelligence. Described as a “free market for digital labor,” Mechanical Turk allows as-needed recruitment of individuals to perform tasks including transcribing handwritten forms, rating and tagging content, completing surveys, and writing captions. As independent contractors, those working as Mechanical Turkers are not covered by federal labor laws and consequently frequently earn significantly less than the minimum wage.

    The wage for each task averages a few cents and, if tasks are performed quickly, can generate around one dollar an hour in aggregate, before the 10-20% commission taken by Amazon and the self-employment taxes Turkers are required to pay on their earnings. Despite the absence of geographic barriers to participation, 80% of the people working as Turkers live in the United States.
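
    As a rough back-of-the-envelope check of those figures, the per-task rate and pace below are assumptions consistent with “a few cents” per task adding up to about a dollar an hour; the 15.3% self-employment tax rate is the standard U.S. combined Social Security and Medicare rate, and the commission deduction is shown only hypothetically.

    ```python
    # Back-of-the-envelope arithmetic with assumed, illustrative numbers.
    rate_per_task = 0.05      # dollars per task (assumed)
    tasks_per_hour = 20       # assumed pace for quick, simple tasks

    gross_hourly = rate_per_task * tasks_per_hour          # $1.00
    net_of_se_tax = gross_hourly * (1 - 0.153)             # ~$0.85

    # If Amazon's 10-20% commission came out of the worker's share
    # (rather than being billed to the requester), at 20% it would be:
    after_commission = net_of_se_tax * (1 - 0.20)          # ~$0.68

    print(f"Gross: ${gross_hourly:.2f}/hr, after SE tax: ${net_of_se_tax:.2f}/hr,"
          f" after a hypothetical 20% commission: ${after_commission:.2f}/hr")
    ```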

    Resources

    Discussion questions

    1. Mechanical Turkers set their own hours, are under no obligation to accept any particular task, and can choose the tasks they find most interesting or best paid. Why might this be attractive to potential workers? Is it attractive to you as a part-time supplemental job or as full-time freelance employment?
    2. Mechanical Turk work requires available infrastructure in the form of computer access and connectivity. U.S. MTurkers have been found to be mostly female and white, and to be somewhat younger and more educated than the U.S. population overall. What are potential explanations for the demographic?
    3. Requesters seeking to recruit Mechanical Turkers must provide a billing address in the U.S., Australia, Canada, or the UK. Some 80% of Mechanical Turk workers are located in the U.S. Does this seem to violate the spirit of federal labor regulations?
    4. Approximately 30% of American workers are in the same category as MTurkers: independent contractors. Do you feel that is an accurate way to categorize gig economy opportunities like MTurkers, Uber drivers, and AirBnB hosts? Does it matter that five large companies post more than half of the tasks on the Mechanical Turk site?
    5. Professors and researchers must “publish or perish,” meaning that having new research about which to write academic articles is critical to how their career progress is evaluated. Today, there is greater competition for a shrinking pool of grant dollars. Some have found MTurk to be a cost-effective way to get a large quantity of survey data collected, images classified, or other feedback gathered. Might the platform’s necessary emphasis on speed affect data quality?
    6. Continuing on the research theme, discuss who the “average” MTurker is. If a researcher looks to this “average” user, will she get a representative sampling? What are the risks of MTurk as a survey or research population?
    7. Translation is one of the tasks performed by Mechanical Turkers. Do you see this as a threat to professional translators? Why or why not?
    8. The platform name is derived from a famous 18th century chess-playing automaton that appeared to play flawless chess or other board games. Nicknamed “the Turk,” the mechanical mannequin stood behind a desk-like piece of furniture and appeared to make the moves himself. The Turk is said to have bested both Napoleon Bonaparte and Ben Franklin but was later revealed as a hoax when it was discovered that inside the cabinet, a human chess master was controlling the Turk. What does the name “Mechanical Turk” indicate about the platform’s desire to obscure the human nature of the work in favor of the appearance of technical wizardry?

    10. The dark side of data: Using data as a means of stalking, surveilling, or preying on vulnerable populations

    Big Data, the umbrella term for harvesting numerous pieces of information about individuals and their personal preferences and collecting that information in huge repositories to be sorted by computer programs, has opened the door to the brokering or selling of that information for use in various ways. Data can be collected about personal factors, interests, and lifestyle choices. That information comes from a variety of sources, and increasingly, those sources are being mashed together to yield even more precise profiles of who we are and what we do. Most of us are generating significant quantities of data via e-commerce, email, electronic health records, government data, and social media (including presence on particular sites, those with whom we have social connections, and the content and timing of social media posts).

    In many instances, this data can be used for our benefit, such as in the case of health care professionals sharing information in order to assure us of the best care. Conversely, the same data can be deliberately harvested to target specific individuals or groups in order to sell products, as well as to scam or scare people into taking action that may not be in their best interests.

    In this case study, you’ll explore how the increasing quantity of collected data can shift from being helpful to harmful, frightening, or manipulative and consider what best steps forward might be.

    Resources

    Discussion questions

    1. Is there a need to establish norms for the responsible use of data? If so, who would administer and ensure compliance? Does your answer change depending on what type of organization might be managing your data (e.g., not-for-profit, government, corporation, school)?
    2. Should companies and organizations be more transparent about their data collection? What can companies and organizations do to ensure their customers and clients understand and are willing to participate in the use of their personal data (beyond providing standard privacy policy statements which people tend to accept without reading)?
    3. Is it important to ensure that both customers and clients understand the purpose of data capture and preservation, so that the information collected remains accurate?
    4. As people become more aware that companies/organizations are capturing and keeping data, should we be concerned that they will deliberately provide misleading or false information that may have negative impact? (See the Schneier post above).
    5. How much say do you think you should have in who has access to and may broker data about you?
    6. Some of the articles talk about dynamic pricing that adjusts depending on who the algorithms determine the customer is, using such indicators as zip code to determine income and level of education, etc. Do you think that is a fair practice for consumers? For businesses?
    7. Opting out of some data collection can minimize what companies know about you, but it also might mean you lose eligibility for preferred pricing, coupons, or other discounts. How do you feel about that?
    8. It’s easier to opt out when you know that information is being captured. In some of these articles, customers’ data (such as knowing a phone’s location because it is pinging and looking for WiFi hotspots) is being captured without their knowledge. How do you feel about that?