More ways to avoid getting tripped up by bad data in the news

Call it fake news, bad information, or merely a well-intentioned reporter whose nose for news outweighs his/her data skills … learning to comprehend data in the news is tougher than ever before.

In this June New Yorker article, Michelle Nijhuis shares some of the strategies from the University of Washington’s course “Calling Bullshit in the Era of Big Data,” taught by Jevin West and Carl Bergstrom.

Here are some tips they recommend:

  • Recognize that bullshitters are different from liars, and be alert for both. To paraphrase the philosopher Harry Frankfurt, the liar knows the truth and leads others away from it; the bullshitter either doesn’t know the truth or doesn’t care about it, and is most interested in showing off his or her advantages …
  • Upon encountering a piece of information, in any form, ask, “Who is telling me this? How does he or she know it? What is he or she trying to sell me?”
  • Remember that if a data-based claim seems too good to be true, it probably is. Conclusions that dramatically confirm your personal opinions or experiences should be especially suspect …
  • Use Enrico Fermi’s guesstimation techniques to check the plausibility of data-based claims …
  • Watch out for unfair comparisons. Claims that many more people watched the video stream of the Trump Inauguration than that of the first Obama Inauguration, for instance, failed to acknowledge the vastly greater availability of streaming video in 2017.
  • Remember that correlation doesn’t imply causation. A correlation between two variables (ice-cream consumption and shark attacks) may well be due to a third variable (summer weather). These days, spurious correlations often emerge from data mining, the increasingly common practice of trawling large amounts of information for possible relationships …
  • Beware of Big Data hubris. The Google Flu Trends project, which claimed, with much fanfare, to anticipate seasonal flu outbreaks by tracking user searches for flu-related terms, proved to be a less reliable predictor of outbreaks than a simple model of local temperatures … Like all data-based claims, if an algorithm’s abilities sound too good to be true, they probably are.
  • Know that machines can be racist (or sexist, or otherwise prejudiced) …
  • Mind the Bullshit Asymmetry Principle, articulated by the Italian software developer Alberto Brandolini in 2013: the amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.
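
To see the "third variable" idea from the correlation tip in action, here is a small Python simulation you could run with students. It is only a sketch under stated assumptions: every number in it (the temperature range, the coefficients, the noise levels) is made up for illustration, and the "shark" value is an abstract index, not real attack data. Both variables are generated from temperature alone, so any link between them comes entirely from the weather.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]

def simulate(days=1000, seed=1):
    """Generate made-up daily data where temperature drives both variables."""
    rng = random.Random(seed)
    temps = [rng.uniform(0, 30) for _ in range(days)]        # assumed range
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temps]   # assumed effect
    sharks = [0.1 * t + rng.gauss(0, 0.5) for t in temps]    # assumed effect
    raw = pearson(ice_cream, sharks)
    # Remove the part of each variable explained by temperature, then
    # correlate what remains.
    adjusted = pearson(residuals(ice_cream, temps), residuals(sharks, temps))
    return raw, adjusted

raw, adjusted = simulate()
print(f"ice cream vs. shark index: {raw:.2f}")
print(f"same, after accounting for temperature: {adjusted:.2f}")
```

The first number should come out strongly positive and the second close to zero, which makes a nice discussion starter: ice cream "predicts" sharks only until you account for summer.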

 

 

How does the government spend money? v. 2

A while back, we showed you retired Microsoft CEO Steve Ballmer’s USAFacts site, which he built with a team to help Americans discover how the U.S. budget is being spent.

Back in May, team member Connie Williams shared the beta site for USASpending.gov (slowly transferring over to the “actual” site over the summer), which is a federally-mandated site for viewing government data.

Why not take both for a spin and think about how a compare-and-contrast task like this could be beneficial to your students?

Civic Engagement with Data at Carnegie Library of Pittsburgh

Photo of green maple leaves in dappled sunlight

We caught up in May with Eleanor Tutt of Carnegie Library of Pittsburgh, who also has funding from IMLS to work on data but with a focus on citizen engagement with data. Here’s an excerpt from a blog post outlining some of their work:

Data seems to come up in all sorts of conversations these days, and they reach way beyond math class. For example, civic data—which includes information about our city and citizens—is a great way to engage with your community on a deeper level, and can be a powerful tool for change! Since civic data is about the people and places you see every day, it can be tough to notice. Based out of CLP-Main, the Civic Information Services team is helping to uncover and share the ways data fits into life at the Library and throughout Pittsburgh, and we have a lot of fun stuff in store.

The STEM Committee has been busy sowing the seeds for their Super Science Kits, and we just couldn’t wait to join them. Some of our favorite collaborations so far can be found in the Tree Kit. Two activities included in this kit feature data front and center: “Forest Logbook” and “Make a Tree Map.”

Have you ever kept a nature journal? Ever taken notes while walking in the woods? Surprise! You were actually collecting data. The “Forest Logbook” activity invites you to tame wild data with a pencil and paper. While on a short nature walk around the library, kids will keep a close eye on the plants and animals they encounter, making notes as they go. Collecting nature data is especially exciting because we can measure anything from the size of a tree trunk to the furriness of a squirrel’s tail. And what fun would our data be if we couldn’t share it with friends? The group is encouraged to share and compare data with each other, which gives us the chance to spot similarities and differences. This activity serves as an easy introduction to observation and collaboration, both of which are crucial steps in data collection. While the trees are busy making oxygen outside, do you ever wonder what’s up with the air in your own home? You can check out one of our Speck Air Quality Monitors for some super practical data collection.

Pittsburghers are really lucky when it comes to data, because the Western Pennsylvania Regional Data Center gives us access to a bunch of cool civic information, from playgrounds to bus stops. The dataset we use for our “Make a Tree Map” activity was created by the City of Pittsburgh. Taking a close look at data and comparing it to what we see outside is an important part of data literacy, as we can use that step to determine why and how data is collected. After that, it’s time to create our own tree maps! Because we can create our map using characteristics from climbability to circumference, each one will be a totally unique look at the same set of data.

Sound fun? Read the rest of the post here. Also, here’s some trivia: at the time of his retirement, my great-uncle was the longest-serving employee at this library system, having spent over 40 years as a bookbinder.

Image: Pixabay.com (public domain)

 

 

Sharing Bear Locations with Tourists … On a Delay

Photo of black bear standing in grass

Here’s a snip of a cool article from the Smithsonian that reminds us that while data collection and sharing can be great, sometimes data’s immediacy can cause new problems and it’s important to put the brakes on. Sure, park rangers at Yosemite want to help visitors learn what bears do and how they move, so why not share the GPS data of some bears? At the same time, some tourists to the park, armed with real-time data, might use it to find bears … and that disrupts things. From the article:

Hundreds of black bears amble through … Yosemite National Park in California … [N]ow, thanks to a new tracking system, fans of the furry animals can follow the creatures’ meandering paths—from the safety of their couch.

As Scott Smith of the Associated Press reports, the park recently launched a website called Keep Bears Wild. One of the site’s main features is the aptly-named “Bear Tracker,” which traces the steps of bears that have been fitted with GPS collars. But the animals’ locations are delayed, Ryan F. Mandelbaum reports for Gizmodo, so curious humans aren’t tempted to scout the bears out. Rangers can turn the data on and off, and tracks will be removed during fall and winter to ensure that the bears can hibernate peacefully.

The goal of the project is to educate the public and whet the appetite of bear enthusiasts, without putting anyone in danger …

These may seem like intuitive precautions, but bears are repeatedly threatened by their interactions with humans. More than 400 of Yosemite’s bears have been hit by cars since 1995, according to the Keep Bears Wild site. And bears that feast on human food can become aggressive, forcing rangers to kill them “in the interest of public safety,” the site explains.

While the Bear Tracker provides limited data to the general public, it is also useful to park rangers, who can view the bears’ steps in real time. For the past year, a team led by wildlife biologist Ryan Leahy has been using the technology to track bears on iPads and computers, according to Ezra David Romero of Valley Public Radio News. And as Smith reports, rangers can follow GPS signals and block bears before they reach campsites.

The tracking devices also help rangers learn more about black bears’ behavior. The animals can traverse more than 30 miles in two days, the data suggests, and easily scale the 5,000-foot walls of Yosemite’s canyons. The trackers have revealed that the bears begin mating in May—one month earlier than previously thought.

An interesting ethics reminder that there are times when better access to data could be unintentionally harmful …
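
For the technically curious, the "share it, but on a delay" idea can be sketched in a few lines of Python. This is not how the park's site works (the article doesn't describe the implementation); the function name, the 24-hour delay, and the coordinates below are all hypothetical, chosen only to show the principle of filtering out recent points before the public sees them.

```python
from datetime import datetime, timedelta

# Hypothetical delay; the article does not say how long the real one is.
PUBLIC_DELAY = timedelta(hours=24)

def public_view(fixes, now, delay=PUBLIC_DELAY, sharing_on=True):
    """Return only the GPS fixes that are old enough to show the public.

    fixes: list of (timestamp, latitude, longitude) tuples.
    sharing_on: a switch staff could flip to hide all tracks.
    """
    if not sharing_on:
        return []
    cutoff = now - delay
    return [fix for fix in fixes if fix[0] <= cutoff]

now = datetime(2017, 6, 1, 12, 0)
fixes = [
    (now - timedelta(hours=30), 37.75, -119.59),  # made-up coordinates
    (now - timedelta(hours=2), 37.74, -119.57),   # too recent to share
]
print(len(public_view(fixes, now)))                    # delayed public view
print(len(public_view(fixes, now, sharing_on=False)))  # sharing switched off
```

A staff view would simply skip the filter and read the full list, which mirrors the split the article describes between what the public sees and what rangers see.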

Image: Public domain from Pixabay.com

News Deserts

The Columbia Journalism Review has been assembling a national map showing “news deserts” around the United States. It’s utterly fascinating. In some states, like Nebraska, nearly half the state does not have a daily newspaper (or hasn’t reported one). How does that impact voting? Civic engagement? The sharing of local and beyond-local information among fellow citizens? What questions are raised when you look at it?

Screenshot of a detail of the Columbia Journalism Review’s map of “news deserts”

ALA poster: Data literacy strategies for addressing fake news

Angie Oehrli, Tyler Hoff, and I shared a poster at ALA in which we selected some of the many data literacy strategies we’ve been working on with our team and discussed their application in helping people gain comprehension strategies to suss out fake news.

A screenshot of our poster is below. You can view a low-resolution version of our poster (<1MB in file size) here or the whopping full-resolution version (79MB!) here.

As an added bonus, our side project of creating the new 8-book series Data Geek for Cherry Lake Publishing got to show off a bit. This series, inspired by the themes of this project (but not supported with grant funds), got its premiere at ALA!

In fact, we’ll be giving out a series set to someone after each of the eight sessions at our second 4T Virtual Conference on Data Literacy. This free event (free SCECH for Michigan educators) is coming up July 20-21. And we have another special prize for all attendees that we’re keeping under wraps for now, but we know you’ll want one! Visit our conference page to register.

Thanks to all who attended our session — we enjoyed the conversation!

Mapping Opioids

(Note: this post first appeared at the Active Learning blog.)

First things first. You may have heard of the opioid crisis, but what is an opioid? When I went looking for a list of which prescription drugs are classed as opioids, I was surprised that it was somewhat tricky to find. My hypothesis is that some people know there’s an opioid crisis but don’t know that drugs like Percocet, morphine, OxyContin, and Vicodin are opioids. That leads me to suspect that part of the problem is that some patients don’t realize that the drug they just got is an opioid.

Here’s what WebMD says:

OK, now that we’ve got that background knowledge, let’s look at how visualizations about opioid prescriptions and fatalities in Michigan can yield some fascinating (albeit sobering) insights.

Julie Mack, with some graphics by Scott Levin, has a sobering article in MLive showing how opioid deaths have spiked in past years. In many counties, there were more opioid prescriptions written in 2015 than there were residents. (Of course, if opioids are dosed one month at a time, one resident’s year-long prescribed use would count as 12, right?)
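
That parenthetical is worth working through with numbers. Here is a quick Python sketch using a made-up county (the 50,000 residents and 60,000 prescriptions are hypothetical, not figures from the MLive article) and the assumption that a year of continuous use means 12 monthly prescriptions.

```python
import math

def prescriptions_per_resident(prescriptions, residents):
    """Headline-style figure: prescriptions written per resident."""
    return prescriptions / residents

def min_people_needed(prescriptions, fills_per_person_per_year=12):
    """Fewest people who could account for all the prescriptions,
    assuming each of them filled one every month for a year."""
    return math.ceil(prescriptions / fills_per_person_per_year)

# Hypothetical county, for illustration only.
residents = 50_000
prescriptions = 60_000

per_resident = prescriptions_per_resident(prescriptions, residents)
fewest = min_people_needed(prescriptions)
share = fewest / residents

print(f"{per_resident:.1f} prescriptions per resident")
print(f"could be as few as {fewest:,} people ({share:.0%} of residents)")
```

In this invented county, "more prescriptions than residents" could be produced by as few as one resident in ten. The real share is somewhere between that floor and everyone, which is exactly the kind of question the headline number can't answer on its own.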

One thing that really jumped out to me was the power of visualization via the two state maps at the bottom of the article.

The first colors counties according to which have higher rates of opioid prescriptions being written. Keep an eye on the Detroit area (southeastern corner).

Michigan map showing which counties have higher Rx prescription rates (Detroit area shown as very light)

Now take a look at the second one, ranking counties according to number of deaths per 100K residents:

Michigan map color-coded according to least and most opioid-related deaths per 100,000 residents. Detroit area is very darkly colored, indicating the largest number of deaths

Are you still keeping an eye on Detroit? Notice how the death rate is highest in that area even though the prescription rate is among the lowest. (I do wish that, instead of only min/max, the legend marked intervals, perhaps tying the color scale to the national death rates from opioids or something.)
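
If you want students to try the "per 100K" calculation and the interval idea themselves, here is a minimal Python sketch. The two counties, their numbers, and the interval edges (5, 10, 20) are all invented for illustration; they are not taken from the MLive maps or from any national statistics.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 residents."""
    return deaths * 100_000 / population

def interval_label(rate, edges=(5, 10, 20)):
    """Name the fixed interval a rate falls into (edges are hypothetical)."""
    lower = 0
    for upper in edges:
        if rate < upper:
            return f"{lower}-{upper}"
        lower = upper
    return f"{lower}+"

# Made-up counties: (name, deaths, population).
counties = [
    ("Small County", 12, 40_000),
    ("Big County", 63, 900_000),
]

for name, deaths, population in counties:
    rate = rate_per_100k(deaths, population)
    print(f"{name}: {deaths} deaths, {rate:.1f} per 100K, "
          f"interval {interval_label(rate)}")
```

Notice that the invented big county has more deaths in total but a much lower rate, which is why maps like these use rates rather than raw counts. Fixed intervals also let a reader compare one map to another, something a bare min/max scale can't do.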

Comparing the two maps helps us instantly see that higher prescription rates don’t map neatly onto higher death rates. As a result, questions arise easily. Imagine discussing this with students:

  1. How is the death rate higher even if the prescription rate is lower? Where are the drugs coming from?
  2. Based on what you see here, which counties should the state of Michigan’s public health services target for interventions? Which kinds of interventions would be suitable given the prescription and death rate maps?
  3. What recommendations would you make for your own county?
  4. What additional information would you need to be able to answer these questions?