MuckRock API

Looking at how to analyze basic health metrics of the FOIA request system using the MuckRock API.

Speaking of early explorations for Astoria Digital projects, we’ve been looking into whether there’s something we can contribute to the availability, transparency, and accountability of government information.

That’s a huge topic as a whole, so we started by looking specifically into the state of FOIA requests related to police disciplinary action. There are a lot of layers to this, and luckily there are organizations out there that have been working on parts of the problem.

MuckRock

One organization producing work we’ve been looking at is MuckRock. MuckRock lets users file, track, and share U.S. public records requests. From their about page:

MuckRock is a non-profit, collaborative news site that brings together journalists, researchers, activists, and regular citizens to request, analyze, and share government documents, making politics more transparent and democracies more informed.

The site provides a repository of hundreds of thousands of pages of original government materials, information on how to file requests, and tools to make the requesting process easier. In addition, MuckRock staff and outside contributors are using these primary source documents received through the site to create original investigative reporting and analysis.

And they have an API! It provides all kinds of interesting data.

Exploring MuckRock API data

I've been wondering if MuckRock can help shine a light on the following for FOIA requests:

  • Turnaround time
  • Completion/success rate
  • Cost

Because it’s one thing to have the right to ask for something, but it’s another to successfully get what you asked for. We should be able to know these basic health metrics of the request system.

I’m not there yet, but here’s the output of a quick exploration (code on GitHub):

{
  no_docs: { totalCount: 286 },
  done: { totalCount: 966, totalDays: 22211, tatAvg: 22.992753623188406 },
  processed: { totalCount: 150 },
  rejected: { totalCount: 133 },
  appealing: { totalCount: 13 },
  abandoned: { totalCount: 61 },
  partial: { totalCount: 20 },
  ack: { totalCount: 174 },
  payment: { totalCount: 20 },
  fix: { totalCount: 173 },
  submitted: { totalCount: 3 },
  lawsuit: { totalCount: 1 }
}

Still a long way to go but it's a start.

Looking at the output above, we can say that across the 966 FOIA requests marked as "done", the average turnaround time was just under 23 days (22,211 total days divided by 966 requests).
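For illustration, here's a minimal sketch of how a per-status tally like this could be computed. This is not the code from the GitHub repo. The record shape (`status`, `submitted`, `done`) is an assumption I'm making for the example, and the real API field names may differ. The average is taken only over "done" requests that have both dates, tracked in a hypothetical `timedCount` field.

```javascript
// Sketch only: the record shape { status, submitted, done } is assumed,
// not taken from the MuckRock API documentation.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function summarizeByStatus(requests) {
  const summary = {};
  for (const { status, submitted, done } of requests) {
    if (!summary[status]) summary[status] = { totalCount: 0 };
    const bucket = summary[status];
    bucket.totalCount += 1;

    // Turnaround is only measured for "done" requests that have both dates.
    if (status === "done" && submitted && done) {
      const days = (Date.parse(done) - Date.parse(submitted)) / MS_PER_DAY;
      bucket.totalDays = (bucket.totalDays || 0) + days;
      bucket.timedCount = (bucket.timedCount || 0) + 1;
    }
  }
  // Average turnaround = total days / number of requests that were timed.
  for (const bucket of Object.values(summary)) {
    if (bucket.timedCount) bucket.tatAvg = bucket.totalDays / bucket.timedCount;
  }
  return summary;
}

// Made-up sample records, not real MuckRock data.
const sampleRequests = [
  { status: "done", submitted: "2020-01-01T00:00:00Z", done: "2020-01-11T00:00:00Z" },
  { status: "done", submitted: "2020-02-01T00:00:00Z", done: "2020-03-02T00:00:00Z" },
  { status: "rejected", submitted: "2020-01-05T00:00:00Z", done: null },
];
const sampleSummary = summarizeByStatus(sampleRequests);
console.log(sampleSummary);
```

Dividing by the number of timed requests rather than the full count keeps the average honest if some "done" records turn out to be missing a date.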

We need to find out what "done" means here. We probably also want to know what date range these requests cover.
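The date range question is easy to sketch once the records are in hand. The snippet below assumes each record carries a `submitted` timestamp (a hypothetical field name for this example) and simply finds the earliest and latest values, skipping records without a parseable date.

```javascript
// Sketch only: the `submitted` field name is an assumption for illustration.
function submittedRange(requests) {
  let earliest = Infinity;
  let latest = -Infinity;
  for (const request of requests) {
    const time = Date.parse(request.submitted);
    if (Number.isNaN(time)) continue; // skip missing or unparseable dates
    if (time < earliest) earliest = time;
    if (time > latest) latest = time;
  }
  if (earliest === Infinity) return null; // no usable dates at all
  return {
    earliest: new Date(earliest).toISOString(),
    latest: new Date(latest).toISOString(),
  };
}

// Made-up sample records, not real MuckRock data.
const rangeSample = [
  { submitted: "2020-02-01T00:00:00Z" },
  { submitted: "2019-11-15T00:00:00Z" },
  { submitted: null },
];
const sampleRange = submittedRange(rangeSample);
console.log(sampleRange);
```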

And then there are all of the other statuses I haven't even touched on yet.

Back to those basic health metrics

Thinking back to that quick list of health metrics to analyze:

  • Turnaround time
  • Completion/success rate
  • Cost

Turnaround time for all outcome categories feels within reach. The data needed to calculate the success rate of FOIA requests is probably already in the output too, but first we need a better understanding of the categories (what counts as "success"? what counts as "failure"?).
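To make the open question concrete, here's a sketch that plugs the counts from the output above into one possible definition. The grouping is purely my assumption for now: "done" and "partial" count as success, "no_docs", "rejected", and "abandoned" count as failure, and everything else is treated as still in progress. Changing those two lists changes the answer, which is exactly why the categories need to be understood first.

```javascript
// Counts copied from the exploration output above.
const statusCounts = {
  no_docs: 286, done: 966, processed: 150, rejected: 133,
  appealing: 13, abandoned: 61, partial: 20, ack: 174,
  payment: 20, fix: 173, submitted: 3, lawsuit: 1,
};

// ASSUMPTION: these groupings are a guess, not MuckRock's definitions.
const ASSUMED_SUCCESS = ["done", "partial"];
const ASSUMED_FAILURE = ["no_docs", "rejected", "abandoned"];

function sumCounts(counts, statuses) {
  return statuses.reduce((total, status) => total + (counts[status] || 0), 0);
}

const totalRequests = sumCounts(statusCounts, Object.keys(statusCounts));
const succeeded = sumCounts(statusCounts, ASSUMED_SUCCESS);
const failed = sumCounts(statusCounts, ASSUMED_FAILURE);
const resolved = succeeded + failed;

// Success rate among resolved requests only; open requests are excluded.
const successRate = succeeded / resolved;

console.log({ totalRequests, succeeded, failed, resolved, successRate });
```

Under these assumed groupings, 986 of the 1,466 resolved requests count as successful, or roughly 67%. The remaining 534 of the 2,000 requests are still open and left out of the rate.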

Tracking cost for FOIA requests will require new data processing logic. Helpfully, the data itself already comes back with the API response so it's there for the taking.
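As a rough idea of what that processing logic might look like, here's a sketch that totals a per-request cost. I'm assuming each record exposes the cost in a `price` field that may arrive as a string, a number, or be missing. Both the field name and its format are assumptions to verify against the actual API response.

```javascript
// Sketch only: the `price` field name and its string format are assumptions.
function summarizeCost(requests) {
  let totalCost = 0;
  let pricedCount = 0;
  for (const request of requests) {
    const price = Number(request.price); // "25.50" -> 25.5, undefined -> NaN
    if (Number.isFinite(price) && price > 0) {
      totalCost += price;
      pricedCount += 1;
    }
  }
  return {
    totalCount: requests.length,
    pricedCount, // requests that carried a non-zero price
    totalCost,
    avgCostWhenPriced: pricedCount ? totalCost / pricedCount : 0,
  };
}

// Made-up sample records, not real MuckRock data.
const costSample = [
  { price: "0.00" },
  { price: "25.50" },
  { price: "10.00" },
  { price: null },
];
const sampleCost = summarizeCost(costSample);
console.log(sampleCost);
```

Averaging only over requests with a non-zero price is a design choice. Averaging over all requests would answer a different question (expected cost per request filed), so it may be worth reporting both.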

We'll see where this goes.