Archive

Data Journalism

I’m spending the day at our first OpenDataNJ conference here at Montclair State. Lots of local government officials, developers and journalists are here to figure out what data should be public and the best way to publish it.

12:00 – Seth Wainer of the City of Newark talks about the practical headaches of publishing data. PDFs are a huge problem because they are not easy to convert to usable data. His suggestion – do what you can to get rid of the PDFs before they are created.

11:45 – Mike Magyar, a journalist with New Jersey Spotlight, points out that while we’re all talking about how to make government data more available, government officials have a bad habit of hiding it when it serves their purposes. He cites examples of some of his analysis of New Jersey property tax data and how the Christie administration has stopped publishing some data after he used it to show that the relative increases in property taxes between the Christie and Corzine administrations were not that different.

Dr. John Hasse of Rowan University.

11:30 – Dr. John Hasse, of Rowan University, talks about the value of GIS location data and a project he is working on at Rowan. In New Jersey there are 565 municipalities, so you have 565 decision-making bodies, and they often do not make smart decisions. New Jersey also has strong land use laws, so a lot of important decisions with a big impact on the economy and the environment are made at a very local level.

The goal of his project is to take local data and map it according to GIS. They have four themes posted so far, for all 565 towns in the state. The goal is to make the data modular so it can be used easily by other folks.

Their prime focus is environmental, so they’re looking at land use, watersheds and impervious surfaces, farmland preservation, wildlife habitat and urban growth.

Here’s a link to their site: http://njmap.rowan.edu/

10:45 – So far one obvious issue is that big cities have the resources to do it while smaller ones are way behind. One good suggestion – smaller towns should start by putting data online that they already have in usable form, such as business licenses or property tax assessments. One idea – In Philadelphia, a software company volunteered their time to build the first version of the city’s data web site in exchange for getting some visibility in local government.

11:00 – Matthew Clark, Director of Office of Records Management for Monmouth County, says one trick is to offload as much of the data entry as possible to the data submitter, i.e., the person who fills out the form. Public wins, because they can fill out applications online, from home. Town wins because they need less clerical help and the data can easily be put online.

This also makes it much easier and faster for government officials to know what is going on. For example, in Monmouth County, after Superstorm Sandy, it was a huge advantage for government officials to have real-time data of damage claims being filed to guide their response. After a flyover of the area, the damage did not appear to be that bad. But once managers began to see the claims, they realized that with a lot of the houses, the water had surged through the lower floor of the house and then receded, causing serious internal damage that was not visible externally.

I’ve seen some great data lately showing how folks are using the Internet in 2013 and also comparing that data to other online and general computer activities (remember, a HUGE amount of the stuff we do on our computers worldwide is NOT on the Web).

Anyway, great post here showing the top 30 Web sites in terms of monthly traffic. Bet you have never heard of a lot of them, unless you also read Chinese:

http://www.buzzfeed.com/charliewarzel/what-people-are-actually-doing-on-the-internet-in-2013

But that article is in part based on this really useful list over at Digital Marketing Ramblings of every social network the author can find:

http://expandedramblings.com/index.php/resource-how-many-people-use-the-top-social-media/

It’s a great place to check out some networks you haven’t heard of.

I also wanted to link to this chart over at Wired, however, that I discovered while researching info for a class last week on data visualization. It compares some really big data sets, including all of the business e-mail sent in a year and Google’s index of all of the content it has found on the web. Most surprising to me? The size of Kaiser-Permanente’s database of electronic health records of the folks it insures.

http://www.wired.com/magazine/2013/04/bigdata/

How many SD cards can a DC-10 hold?

The basic idea here is that I’ll list everything I’ve bookmarked in the past week that might be of use or interest to others. It also helps me when I’ve saved one item in Safari on the university computer and another in Firefox on the home machine, and I’m in class with my iPad.

This is by far my favorite. Which is faster, FedEx or the Internet? No, silly, I know that FedEx doesn’t own a wired network. But it does own machines (e.g., trucks and planes) that can haul around a large quantity of devices that store data. So if you have to get a few petabytes of data from here at Montclair State to Los Angeles, and it really, REALLY has to be there tomorrow morning, who you gonna call? Spoiler alert: FedEx wins, and will continue to do so for many years. Yes, the Internet will get faster, but storage devices will get smaller and hold more data.
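To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The one-petabyte load, the 1 Gbps connection and the 24-hour delivery window are my own illustrative assumptions, not figures from the linked piece, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(petabytes: float, link_gbps: float) -> float:
    """Days needed to push `petabytes` of data through a `link_gbps` connection."""
    bits = petabytes * 1e15 * 8            # decimal petabytes -> bits
    seconds = bits / (link_gbps * 1e9)     # gigabits per second -> bits per second
    return seconds / SECONDS_PER_DAY


def shipping_gbps(petabytes: float, hours_in_transit: float) -> float:
    """Effective bandwidth, in Gbps, of a box of drives that arrives in `hours_in_transit`."""
    bits = petabytes * 1e15 * 8
    return bits / (hours_in_transit * 3600) / 1e9


# Assumed scenario: 1 petabyte, a 1 Gbps link, overnight (24-hour) delivery.
print(f"Over the wire: {transfer_days(1, 1):.1f} days")
print(f"In the truck:  {shipping_gbps(1, 24):.1f} Gbps effective")
```

Under those assumptions the wire needs about three months, while the overnight box behaves like a connection of roughly 90 Gbps. Time spent copying the data onto and off the drives is ignored here, which flatters the truck.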

A great map here of the undersea cables that carry data for the Web and other devices around the world. A little-appreciated historical fact: In the late Victorian era, Britain’s vast lead in ownership of undersea telegraph cables gave London a big commercial advantage. When the volcano Krakatoa exploded in 1883, for example, sending an immense dust cloud around the world, Lloyd’s of London (and readers of the Times of London) learned of the news in only three hours, giving them a leg up in trading stocks affected by the supply of rubber, spices and other products from Java (Simon Winchester, Krakatoa).

Kudos to my buddies at NBCNews on this one — A look at gun deaths across the USA over one weekend. The criticism of the media for ignoring gun violence seems to be having an impact. It’s obvious that journalists everywhere are giving greater play to stories about folks who should never have had access to a gun committing horrendous crimes.

I’m fascinated by the progress in 3-D printing. Assuming I don’t get run down by a bus, I’m certainly going to own one, but I have no idea what I’ll do with it. Sort of like how I felt in 1982 when I paid $3,000 for a Mac and an Imagewriter printer. According to this piece, I could use it to “print” food for astronauts.

Some tips on using Pinterest in the classroom.

And TechCrunch has the latest data on social media: Facebook is still way ahead, but Pinterest has caught up with Twitter. Two other surprising data points: Blacks are more likely than whites to use Twitter, and the more college education you have, the less likely you are to use social media.

There’s been a fair amount of handwringing about NBCNews’ decision to shut down Everyblock.com. Everyblock was touted as a pioneering information Web site that posted local data on such topics as garbage collection, snow removal and graffiti eradication.

But to me, Everyblock never made much sense, and for a fundamental reason that dogs a lot of data journalism: readers don’t care about the raw data. They want the story within the data.

That thought came up a couple of weeks ago as I watched the presentations during our hack-a-thon here at Montclair State. The winning project analyzed traffic accident data on the Garden State Parkway and concluded that the most dangerous stretch of road was between exits 130 and 140, and the most deadly time to be on the roadway was during a snowstorm.

Yes, as a regular parkway driver, I found the raw data interesting, but it was the conclusion drawn from the data that was the real headline.
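For the coders in the room, here is a minimal sketch of the step from raw rows to a headline. It is not the winning team’s code. The handful of records below are invented placeholders, not real Parkway data, and the field layout (nearest exit, weather) is my own assumption.

```python
from collections import Counter

# Invented placeholder records, NOT real accident data:
# (nearest exit number, weather at the time of the crash)
accidents = [
    (131, "snow"), (135, "clear"), (138, "snow"),
    (98, "clear"), (105, "rain"), (136, "snow"),
]


def stretch(exit_number: int, width: int = 10) -> str:
    """Label the `width`-exit stretch of road that an exit number falls in."""
    start = (exit_number // width) * width
    return f"exits {start}-{start + width}"


# Count crashes per stretch and per weather condition.
by_stretch = Counter(stretch(exit_no) for exit_no, _ in accidents)
by_weather = Counter(weather for _, weather in accidents)

# The "headline" is the top row of each tally, not the table itself.
worst_stretch, stretch_count = by_stretch.most_common(1)[0]
worst_weather, weather_count = by_weather.most_common(1)[0]
print(f"Most crashes: {worst_stretch} ({stretch_count})")
print(f"Worst weather: {worst_weather} ({weather_count})")
```

A real analysis would also have to account for how much traffic each stretch carries and how often it snows, or the tallies will just point to the busiest road and the most common weather.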

Everyblock never figured out a way to write those headlines. I remember checking their site in the early days to see what data they had on lower Manhattan. There were reports on what graffiti the city said it had erased each month, by neighborhoods. But what was missing was context, and photos. If I’m a reporter doing a story on graffiti, I want to show before and after photos, AND, more importantly, I want to know whether the city is successfully fighting the graffiti artists, i.e., who is winning. The raw data didn’t provide that.

Here’s hoping someone else picks up the Everyblock staff and code and finds a way to better integrate them. The original decision to buy them was driven by the folks at msnbc.com, as some poorly-thought-out way to gain traction in the hyperlocal market (Newsvine was another example). A much better partner would be AOL’s Patch, which already has 850 reporters and editors writing about local communities who could pull headlines from the data, or the Journal-Register folks (digitalfirstmedia.com), who seem to be in the forefront of figuring out how to do local.

We held our first Hack Jersey event this weekend here at Montclair State, with about 60 journalists and coders participating. Among the speakers were Matthew Ericson, the deputy graphics director at the New York Times; Tom Meagher, data editor at Digital First Media; Stephen Engelberg and Jeff Larson of ProPublica; and New Jersey data expert Marc Pfeiffer.

Here are links to the video we streamed of their addresses. I’ll have more in a bit on how the video streaming went. We tried to get fancy with Google Hangouts and use separate logins for multiple cameras. It’s worth trying, but there are some headaches to consider.

The links:

Matt Ericson (NYTimes):

Tom Meagher (Digital First Media):

Marc Pfeiffer (former N.J. official):

Stephen Engelberg and Jeff Larson (ProPublica)

The participants present their apps:

Awards presentation:
