Project with Research Alliance for NYC Schools

The Research Alliance conducts rigorous studies on topics that matter to the city’s public schools. We worked with them to uncover insights in their data and visualize them for education advocates and policymakers. We chose to explore the theme of geography and schools through interactive maps built with the D3 library.

An interactive map helped communicate multivariate analysis quickly.


Data Research

Some questions stood out during our research: Where are most of the schools located? Which neighborhoods are seeing more school closures than others? How are the closures affecting student performance and class sizes? We set out to answer these questions, starting with the data the Research Alliance had already collected. Their data spanned 17 years, from 1996 to 2012, which effectively set the time boundary for our analysis.

After analyzing the datasets in RStudio, we found several data quality problems. First, there were gaps in the data: many schools had no data recorded for a few years, so we interpolated the missing values. Second, school coordinates were not always listed, so we had to look up the correct locations in public records. Third, both the names and IDs of schools changed over time, due to a combination of school closures and renaming. We did not have a good solution for this, so the visualization ended up highlighting the discrepancy.
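The gap-filling step can be illustrated with a small sketch. The write-up does not say which interpolation method was used, and the original work was done in RStudio, so this is only an assumption: it uses simple linear interpolation between the nearest known years, written in JavaScript to match the D3 front end. The function name and the sample enrollment numbers are hypothetical.

```javascript
// Fill gaps (null entries) in a yearly series by linear interpolation
// between the nearest known values on either side.
// Assumption: linear interpolation; the original method is not stated.
// Leading and trailing gaps stay null, since there is nothing to
// interpolate between.
function interpolateGaps(values) {
  const filled = values.slice();
  let prev = -1; // index of the most recent known value
  for (let i = 0; i < values.length; i++) {
    if (values[i] == null) continue;
    if (prev >= 0 && i - prev > 1) {
      const step = (values[i] - values[prev]) / (i - prev);
      for (let j = prev + 1; j < i; j++) {
        filled[j] = values[prev] + step * (j - prev);
      }
    }
    prev = i;
  }
  return filled;
}

// Hypothetical enrollment for one school, one entry per year
const enrollment = [400, null, null, 460, 470, null];
console.log(interpolateGaps(enrollment));
```

Leaving the edges empty is a deliberate choice here: extrapolating before a school's first record or after its last could invent data for years in which the school did not exist.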

This visualization provided an insight into school openings across time and geography. However, the dots moved around when an ID was reused for a new school.


Design concerns

Maps are useful for exploring geographic data, but they come with their own design concerns. The first challenge was displaying quantitative information using bubbles of different sizes. In dense school districts, the bubbles would overlap, making it difficult to tell them apart. We had to add custom code to move colliding bubbles away from each other. The second challenge was communicating quantitative changes over time. Animating bubble size was meant to address that. However, watching bubbles shrink and expand is not an ideal way to see patterns over time. We introduced sparklines to overcome this limitation of bubble animations. They worked well for observing changes over time in a single data stream.
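The collision handling described above can be sketched roughly as follows. This is not the project's actual code, which is not shown in the write-up; it is a minimal pairwise relaxation written under assumptions, with a hypothetical function name and hypothetical `x`, `y`, `r` fields for each bubble. D3's d3-force module also provides `forceCollide` for the same purpose.

```javascript
// Push overlapping bubbles apart with a simple pairwise relaxation.
// Each bubble is { x, y, r }. Assumption: this is an illustrative
// approach, not the custom code used in the project.
function separateBubbles(bubbles, iterations = 50, padding = 0) {
  const out = bubbles.map(b => ({ ...b })); // do not mutate the input
  for (let k = 0; k < iterations; k++) {
    let moved = false;
    for (let i = 0; i < out.length; i++) {
      for (let j = i + 1; j < out.length; j++) {
        const a = out[i];
        const b = out[j];
        const dx = b.x - a.x;
        const dy = b.y - a.y;
        const dist = Math.hypot(dx, dy);
        const minDist = a.r + b.r + padding;
        if (dist >= minDist) continue; // no overlap
        // Unit vector from a to b; pick an arbitrary direction
        // when the two centers coincide.
        const ux = dist > 0 ? dx / dist : 1;
        const uy = dist > 0 ? dy / dist : 0;
        const shift = (minDist - dist) / 2; // each bubble moves half the overlap
        a.x -= ux * shift;
        a.y -= uy * shift;
        b.x += ux * shift;
        b.y += uy * shift;
        moved = true;
      }
    }
    if (!moved) break; // nothing overlapped in this pass
  }
  return out;
}

// Two hypothetical schools plotted almost on top of each other
const schools = [{ x: 0, y: 0, r: 5 }, { x: 4, y: 0, r: 5 }];
console.log(separateBubbles(schools));
```

The trade-off is that separated bubbles no longer sit exactly on the school's coordinates, and with many bubbles a fixed number of passes may not remove every overlap.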

The bubble map helped convey the relative size of schools across geography. You can see the bubbles overlapping in the Bronx, towards the top left.


Performance issues

A website is the preferred mode of delivery for interactive visualizations, since it is readily accessible on desktops, tablets, and phones. However, web delivery brings up some interesting performance issues. The first challenge was loading a large shapefile of NYC school districts. By converting the shapefile to TopoJSON format, we improved our load time by 3x. The second challenge was the large dataset. We used RStudio to slice and aggregate the relevant features and export them to a flat CSV file. In hindsight, many of these performance issues could have been avoided by using a cloud-based GIS platform like ArcGIS or Carto, but that would have meant giving up some design control.
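To give a sense of what "slice and aggregate" means in practice, here is a minimal sketch of rolling per-school rows up into one row per district and year before export. The project did this step in RStudio; the JavaScript below is only an illustration, and the function name and the `district`, `year`, and `enrollment` fields are hypothetical.

```javascript
// Aggregate per-school rows into one row per district and year,
// then serialize the result as a flat CSV string.
// Assumption: field names are hypothetical, chosen for illustration.
function aggregateToCsv(rows) {
  const totals = new Map(); // "district|year" -> running totals
  for (const { district, year, enrollment } of rows) {
    const key = `${district}|${year}`;
    const t = totals.get(key) || { district, year, schools: 0, enrollment: 0 };
    t.schools += 1;
    t.enrollment += enrollment;
    totals.set(key, t);
  }
  const lines = ["district,year,schools,enrollment"];
  for (const t of totals.values()) {
    lines.push([t.district, t.year, t.schools, t.enrollment].join(","));
  }
  return lines.join("\n");
}

// Hypothetical per-school records
const records = [
  { district: 10, year: 2010, enrollment: 300 },
  { district: 10, year: 2010, enrollment: 200 },
  { district: 11, year: 2010, enrollment: 150 },
];
console.log(aggregateToCsv(records));
```

Shipping only the aggregated columns the map needs keeps the file the browser downloads far smaller than the raw dataset.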