Most of us are not math people, but even the numerically challenged should question this:
But when such thoughtful and challenging speakers as Debbie Abilock and Kristin Fontichiaro are giving a presentation entitled “Slaying the Data Dragon,” it’s difficult to resist going. Trust me when I tell you they brought the awesome and then some, and at 8am, no less! Despite my “bed head” (as Deb called it), I managed to take copious notes…
The first thing to remember is that it’s not just about collecting data; it’s about interpreting the information, as well as being aware of what data is being collected (by whom? for what purposes?). Scientists and techies are increasingly required to submit not just their interpretations of their data but their full data sets, so that others can learn from and expand upon them. Big Data builds on past experiments, but we always need to question the data we didn’t collect ourselves.
(QUERY: if that’s the case, why do we blindly accept the data and interpretation provided by the Pew Internet & American Life surveys? Are their samples large enough for the findings to be statistically significant?)
It’s also important to remember that computers can unearth connections we don’t see (or don’t think to look for), but they can’t make a distinction between good data and bad data; humans still need to interpret the correlations, and they can’t assume they understand the causes behind them. Privacy may not be a concern our students share, but when our data is being tracked by politicians, sports teams, stores, financial institutions and others in addition to the NSA, we have to ask, “How will we weigh the trade-offs among privacy, consumerism and security?” What are the implications for the future, both immediate and longer term? Why do we share our data so freely? An extreme example of the downside is the ease with which the Nazis identified even assimilated Jews, using data that had been given freely to the government decades earlier.
Private browsing? Not so much. Acxiom is one data aggregator tracking your movements around the interwebs. Try downloading and using Ghostery to see how many others are using trackers to monitor your movements from site to site, feeding the data back to… whom? Don’t want to install anything? On a PC, try right-click / View Source / Ctrl+F “.gif” to see who has hidden trackers on the site. You can block and control who sees what you do!
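For the curious, the “view source and search for .gif” trick can be automated. What follows is a minimal Python sketch, not how Ghostery works: it assumes that any .gif image served from a domain other than the site’s own is worth a second look. That is only a rough heuristic (tracking pixels are often tiny images, but not all of them are GIFs, and not every third-party GIF is a tracker). The function name `third_party_gifs` and the sample page with its example.* domains are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class GifFinder(HTMLParser):
    """Collects the full URL of every <img> whose path ends in .gif."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.gifs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative paths against the page so every entry has a host.
        full = urljoin(self.page_url, src)
        # Check only the path, so "pixel.gif?id=123" still counts as a .gif.
        if urlparse(full).path.lower().endswith(".gif"):
            self.gifs.append(full)


def third_party_gifs(html, page_url):
    """Hypothetical helper: list hosts, other than the page's own, serving .gif images."""
    finder = GifFinder(page_url)
    finder.feed(html)
    page_host = urlparse(page_url).hostname
    hosts = []
    for url in finder.gifs:
        host = urlparse(url).hostname
        if host and host != page_host and host not in hosts:
            hosts.append(host)
    return hosts


# Hypothetical page: one local logo plus images from two outside domains.
sample = """
<html><body>
  <img src="/images/logo.gif">
  <img src="https://tracker.example.net/pixel.gif?id=123" width="1" height="1" />
  <img src="https://ads.example.org/p.gif">
  <img src="https://tracker.example.net/beacon.gif">
</body></html>
"""

print(third_party_gifs(sample, "https://www.example.com/article"))
# ['tracker.example.net', 'ads.example.org']
```

The site’s own logo is skipped, and each outside host is listed once even if it serves several images. A browser extension like Ghostery sees far more (scripts, cookies, iframes), so treat this only as a way to show students what “view source” is actually revealing.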
But what about apps and tools like Fitbit and Jawbone? The data they collect from you isn’t just kept in your profile; it’s shared with everyone else using those programs. Health data is protected, but what about our other data? Target can predict when you’re pregnant (assuming you use either an affinity card or your credit/debit card). Is that OK? Recommendations on shopping sites may be helpful, but aren’t they also a little creepy? Here’s a new term to learn: algorithmic regulation, which uses personalized “nudges” to help solve public problems without having to justify or explain them. Some nudges seem benign, like your doctor or dentist reminding you to come in for a checkup, but what about reminders to floss, take a walk, or buy milk? Not reminders you set yourself, but ones that come from “elsewhere,” based on data from you and others? Or what about glasses that can fool you into thinking that broccoli is really cake?
The problem is that Big Data isn’t neutral, mostly because it influences policy decisions: policies made by people who, like most of us, don’t know how to interpret the data they’re given. An example of this is inBloom, a Gates-funded organization collecting data on students without their permission or knowledge. Decision makers also need to look at both the macro and micro levels, as data provided for a neighborhood or town may look very different from data for larger areas. Infographics may be fun ways to represent data, but we need to learn how to read them. A good place to start is the ACRL visual literacy standards, which can be adapted for K-12. Working with teachers to create lessons that incorporate data interpretation also helps. We were left with a number of sites that have collected data or are still doing so, good places to start with both colleagues and students:
- Google Flu Trends
- HealthMap
- Social Explorer (both paid and free versions are available)
- Opportunity Index
- Learn Chemistry
- Google Correlate
- Google Ngram Viewer
- Outbreak (an infographic)
- DuckDuckGo (a search engine that does not filter or track; results are very different from those found in Google)
- NPR’s digital trail series
- Prey iPad app