Part IV · Methods and Research Design

Chapter 20. Quantitative Data Methods

A practical guide to working with numeric data in Community Mapping—from census and administrative records to surveys, spatial data, and indicators—with emphasis on validation, ethical use, and honest acknowledgment of what numbers can and cannot tell us.




Chapter Overview

Quantitative data—numeric information about populations, services, environments, economies, and spaces—is foundational to Community Mapping. This chapter examines the major sources of quantitative data available to community mappers, from national censuses and municipal open data to surveys, service use records, and spatial datasets. It covers not just where to find numbers, but how to assess their quality, understand their limitations, and use them ethically. Numbers alone do not tell the truth—they tell a truth, shaped by what was counted, who was counted, and who decided what mattered.


Learning Outcomes

By the end of this chapter, you will be able to:

  1. Identify the major sources of quantitative data for Community Mapping and assess their strengths and limitations
  2. Explain how census data is collected, what it measures, and where it systematically undercounts populations
  3. Apply data cleaning and validation techniques to ensure accuracy and transparency in analysis
  4. Recognize the ethical implications of working with administrative data, service use records, and spatial datasets
  5. Evaluate indicators and metrics for their relevance, validity, and potential to mask complexity
  6. Articulate what quantitative data can and cannot tell us about community life
  7. Use the Data Source Inventory template to systematically document data provenance, gaps, and governance

Key Terms

  • Census Data: Population data collected by national or regional governments, typically every 5-10 years, covering demographics, housing, income, education, and employment.
  • Administrative Data: Records generated by institutions (governments, schools, hospitals, social services) as part of their normal operations—not created for research but often repurposed for analysis.
  • Open Data: Data made publicly available by governments or organizations, typically under licenses that permit reuse, redistribution, and analysis.
  • Indicator: A measurable variable used to represent or track a complex concept (e.g., poverty rate as an indicator of economic hardship).
  • Data Provenance: The documented history of where data came from, how it was collected, who collected it, and what transformations it has undergone.
  • Undercount: The systematic failure to include certain populations in data collection, leading to their invisibility in analysis and policy.

20.1 Census Data

Census data is the bedrock of most quantitative Community Mapping. In Canada, Statistics Canada conducts a national census every five years, collecting information on population size, age, sex, household composition, income, education, employment, housing, language, immigration status, Indigenous identity, and more. The United States Census Bureau operates on a similar model with a decennial census and rolling American Community Survey (ACS) data. Most countries with statistical infrastructure conduct some form of national census.

For community mappers, census data offers unmatched geographic granularity. In Canada, data is available at multiple spatial scales: census tracts (roughly 2,500-8,000 people), dissemination areas (400-700 people), and in some cases dissemination blocks (the smallest unit, approximating a city block). This granularity makes it possible to analyze patterns at the neighborhood level, not just citywide averages.

Census data also offers consistency over time. Because core questions are asked in a broadly consistent format from cycle to cycle, it is possible to track change: population growth, aging, income shifts, housing affordability trends, and changes in ethnocultural composition. Longitudinal analysis, comparing results across census cycles, is one of the most powerful uses of census data.

But census data has profound limitations, and community mappers must be honest about them.

Undercounts are real and patterned. Census enumeration systematically undercounts certain populations: people experiencing homelessness, undocumented immigrants, highly mobile populations, people living in overcrowded or informal housing, some Indigenous communities (especially remote or on-reserve populations), and people who distrust government or fear how their data will be used. A community mapper working in a neighborhood with a significant homeless population or widespread informal housing cannot assume census totals are accurate. The undercount is not random; it is structural, reflecting who is easiest to reach with the tools the census uses.

Long-form vs. short-form matters. In Canada, the census has two versions: a short form sent to all households (basic demographics) and a long form sent to a sample (detailed socioeconomic data). In 2011, the Canadian government replaced the mandatory long-form census with a voluntary National Household Survey (NHS). The result was a data disaster: lower response rates, especially in high-need communities, made the data unreliable for small-area analysis. The mandatory long form was reinstated in 2016, but the 2011 gap remains a cautionary tale. When governments weaken data collection in the name of reducing burden, the cost is borne by the communities who become invisible.

Census categories shape what we can see. The census asks questions in specific ways, with specific categories. If the census does not ask about a dimension of identity or experience, it becomes invisible in the data. The Canadian census long recorded gender only as male or female; advocates fought for years before non-binary and transgender categories were added. The census asks about "visible minority" status but uses categories that aggregate diverse populations in ways that obscure important differences. Community mappers must understand that census categories are political choices, not neutral facts.

Privacy suppression hides small populations. To protect privacy, Statistics Canada suppresses data for geographic areas or demographic groups with very small populations. A dissemination area where the count in a category falls below a minimum threshold may show up as suppressed, or flagged "F" (too unreliable to release). This is an ethical necessity, but it means that some communities, especially small, rural, or marginalized populations, are analytically invisible.

Despite these limitations, census data remains essential. A community mapper working on housing affordability, aging in place, ethnocultural diversity, or transit access will start with census data. The key is to use it with humility: acknowledge the gaps, triangulate with other sources, and recognize that what the census shows is a starting point, not the full picture.

In practice: Use census data from Statistics Canada's publicly available Census Profile tool, which provides pre-aggregated tables by geography. For spatial analysis, use census boundary files from the Open Government portal and join them to census variables in GIS. For detailed work, access full census microdata files (anonymized individual records) through Statistics Canada's Research Data Centres, though this requires formal application and secure access.
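A minimal sketch of one step in this workflow, using invented values: when reading Census Profile extracts, cells flagged "F" (too unreliable to release) or "x" (suppressed for confidentiality) should be treated as missing, never as zeros, before any summary statistic is computed.

```python
import statistics

# Hypothetical rows as they might appear in a Census Profile extract:
# a dissemination area ID and median household income, where "F" marks
# values too unreliable to release and "x" marks confidentiality
# suppression (both are Statistics Canada symbols). All values invented.
rows = [
    ("35200001", "61,000"),
    ("35200002", "F"),
    ("35200003", "58,500"),
    ("35200004", "x"),
    ("35200005", "72,000"),
]

SUPPRESSION_FLAGS = {"F", "f", "x", "X", "..", ""}

def parse_value(raw):
    """Return a number, or None if the cell carries a suppression flag."""
    if raw.strip() in SUPPRESSION_FLAGS:
        return None
    return float(raw.replace(",", ""))

parsed = {da: parse_value(v) for da, v in rows}
available = [v for v in parsed.values() if v is not None]

print(f"{len(available)} of {len(parsed)} areas reported a value")
print(f"median across reporting areas: {statistics.median(available):,.0f}")
```

Reporting how many areas were suppressed, not just the median of the rest, keeps the analytically invisible populations visible in the write-up.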


20.2 Administrative Data

Administrative data is data generated by institutions as part of their normal operations. It includes:

  • Health system records (hospital admissions, emergency department visits, prescription data)
  • Education system records (enrollment, attendance, test scores, suspensions)
  • Social service records (child welfare cases, income assistance applications, shelter use)
  • Criminal justice records (arrests, charges, incarcerations, parole)
  • Municipal service records (building permits, business licenses, bylaw complaints, 311 calls)

Administrative data has major advantages. It is collected continuously, not once every five years. It often has precise temporal and spatial detail. It reflects actual service use, not self-reported estimates. And because it is generated for operational purposes, it often already exists—no new data collection is required.

But administrative data comes with profound ethical, methodological, and interpretive challenges.

Access is restricted, and for good reason. Health records, child welfare cases, and criminal justice data are rightly protected by privacy laws. In Canada, access to health administrative data typically requires ethics approval, a formal research agreement with a data steward (such as a provincial health ministry or ICES in Ontario), and analysis within secure environments. Casual community mappers do not get access. Even researchers with legitimate purposes face long timelines and strict oversight. This is necessary—these records contain sensitive, identifiable information—but it means administrative data is not equally available to all.

Administrative data reflects system contact, not population reality. A dataset showing emergency department visits by neighborhood tells you where people who went to emergency departments live. It does not tell you about people who needed care but did not go, could not go, or chose not to go. A dataset showing child welfare apprehensions by neighborhood reflects apprehension patterns—which are shaped by surveillance intensity, reporting biases, and system racism—not the distribution of child maltreatment. Administrative data is always shaped by the system that produced it, and systems are not neutral.

Different systems have different access regimes. Health data is highly protected. Education data is moderately protected. Municipal permit data is often public. Criminal justice data sits somewhere in between, with some summary data available publicly and individual records restricted. The governance patchwork means that assembling a multi-system picture of a community is difficult, time-consuming, and often requires multiple ethics approvals and data-sharing agreements.

Administrative data can reinforce surveillance and harm. Mapping where child welfare cases are concentrated can support resource allocation—or it can justify increased surveillance of already over-policed communities. Mapping where arrests occur can inform harm reduction—or it can rationalize predictive policing tools that perpetuate racial profiling. Community mappers working with administrative data must ask: Who benefits from this analysis? Who could be harmed? What safeguards are in place?

In practice: If you have legitimate research access to administrative data through a university, government partnership, or data trust arrangement, treat it with care. Document provenance. Aggregate to protect privacy. Be explicit about what the data measures and what it does not. Do not conflate "system contact" with "prevalence" or "need." And when publishing or sharing findings, consult with affected communities about how results are framed and whether the analysis could be weaponized.


20.3 Municipal Open Data

Many cities and regional governments now publish open data—datasets made freely available for public use, often under licenses that permit reuse, redistribution, and analysis. Municipal open data portals offer datasets on:

  • Infrastructure (roads, bike lanes, sidewalks, transit routes, parks)
  • Services (recreation programs, libraries, community centers, fire stations)
  • Business and economy (business licenses, development permits, property values)
  • Environment (tree canopy, air quality, water quality, waste collection)
  • 311 service requests (potholes, graffiti, noise complaints, abandoned vehicles)

Municipal open data is a gift to community mappers. It is free, often updated regularly, frequently includes spatial coordinates (latitude/longitude or addresses), and typically comes with metadata describing what variables mean, when data was collected, and who to contact with questions.

Toronto's Open Data portal (open.toronto.ca) is one of the most comprehensive in Canada, with over 400 datasets covering everything from public art to building permits to COVID-19 case counts. Vancouver, Halifax, Edmonton, Calgary, and Ottawa all have robust open data portals. In the United States, Data.gov aggregates federal datasets, and cities like New York, Chicago, and San Francisco maintain extensive municipal portals.

But municipal open data is not neutral, not complete, and not always usable.

What gets published reflects municipal priorities. Cities tend to publish data about things they manage (infrastructure, services, permits) and are less likely to publish data about things they would prefer not to highlight (complaints, enforcement failures, budget shortfalls). Open data portals rarely include income, health, or detailed socioeconomic data—that comes from census or health authorities, not municipalities.

Data quality varies wildly. Some datasets are meticulously maintained, updated weekly, and accompanied by clear documentation. Others are one-time dumps of old data with no metadata, inconsistent formatting, and missing values. A community mapper cannot assume that because a dataset exists on an open data portal, it is accurate or current.

Spatial precision is inconsistent. Some datasets include precise latitude/longitude coordinates. Others provide only postal codes, ward boundaries, or neighborhood names. Some include addresses but fail to geocode them. A community mapper planning to map a dataset must first assess whether it has usable spatial information.

Licensing matters. Most municipal open data in Canada is published under the Open Government License, which permits free use with attribution. Some datasets, especially those sourced from third parties, may have more restrictive licenses. Always check the license before using data in a public-facing project or publication.

In practice: Start with the Open Data Canada portal (open.canada.ca) to find federal datasets and links to municipal portals. For city-specific data, go directly to the city's open data portal. Download datasets in machine-readable formats (CSV, GeoJSON, Shapefile) rather than PDFs or images. Read the metadata. Check the last update date. If a dataset has not been updated in years, question whether it is still relevant.
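As a quick first screen of dataset quality, count the blank cells in each column before mapping anything. A minimal sketch using an invented tree-inventory extract (in real use you would open the CSV downloaded from the portal):

```python
import csv
import io

# In-memory stand-in for a downloaded open-data CSV; all rows invented.
raw = """tree_id,species,latitude,longitude,planted_year
1,Norway Maple,43.6532,-79.3832,1998
2,,43.6541,,2005
3,Honey Locust,,,
4,Red Oak,43.6550,-79.3841,2012
"""

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Count blank cells per column: a first pass at assessing whether the
# dataset has usable spatial information and complete attributes.
missing = {col: 0 for col in reader.fieldnames}
for row in rows:
    for col, val in row.items():
        if not (val or "").strip():
            missing[col] += 1

for col, n in missing.items():
    print(f"{col}: {n} of {len(rows)} missing")
```

Here two of four records lack coordinates, which would immediately flag that this dataset needs geocoding or filtering before it can be mapped.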


20.4 Survey Data

Surveys are structured instruments for collecting data from a sample of people through questions with pre-defined response options. Surveys are widely used in Community Mapping to measure attitudes, behaviors, needs, and satisfaction when existing data sources do not provide answers.

When surveys are useful: You need to know how residents perceive safety, but crime statistics and police data do not measure perception. You need to understand barriers to accessing a service, but service use data does not tell you who tried and failed. You need to measure social connection, but census data does not ask about it. Surveys fill gaps.

When surveys are not useful: The population you want to study is very small, highly mobile, or hard to reach—response rates will be too low for reliable analysis. The question you need answered requires nuance, context, or storytelling that fixed-choice questions cannot capture—qualitative interviews would work better. You do not have the resources to reach a representative sample—a convenience sample will produce biased, non-generalizable results.

Sampling matters as much as the questions. A survey is only as good as the sample it reaches. A truly random sample—where every person in the population has an equal chance of being selected—is the gold standard, but it is expensive and difficult to achieve in community settings. More common are stratified samples (ensuring representation of key subgroups), convenience samples (surveying whoever is easiest to reach), or purposive samples (targeting specific populations). Each has trade-offs. A convenience sample of people attending a community meeting will over-represent engaged, English-speaking residents and miss people who work evenings, lack childcare, distrust institutions, or speak other languages.

Response rates shape validity. A survey with a 10% response rate is not representative—it tells you what the 10% who responded think, and that 10% is likely different from the 90% who did not. Best-practice surveys aim for response rates above 50%, though this is increasingly difficult. Low response rates require honest acknowledgment in analysis and reporting: "This survey reflects the views of X residents who chose to respond. It may not represent the broader population."
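For a simple random sample, the uncertainty around a survey proportion can be quantified with a standard margin of error. A sketch with illustrative numbers; note that this formula assumes random sampling and does not correct for the non-response bias described above, so it understates the real uncertainty of a low-response survey:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p observed in a simple
    random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Invented example: 120 of 400 respondents report feeling unsafe
# walking in their neighborhood at night.
p = 120 / 400
moe = margin_of_error(p, 400)
print(f"estimate: {p:.0%} plus or minus {moe:.1%}")
```

Reporting the estimate with its margin ("30%, plus or minus about 4.5 points") is more honest than a bare percentage, and the caveat about sampling assumptions belongs right beside it.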

Survey design is a craft. Good surveys use clear, jargon-free language. They avoid leading questions ("Do you agree that our parks are wonderful?"). They offer balanced response scales (not just "agree" options, but also "disagree"). They are short enough that people will complete them—15 minutes is a reasonable maximum. They are tested with a small pilot group before full rollout. And they are translated into the languages spoken in the community, not just distributed in English.

In practice: If you are designing a survey for Community Mapping, use validated question sets where possible (e.g., Statistics Canada's General Social Survey questions, WHO quality-of-life instruments, or neighborhood cohesion scales from academic research). Do not reinvent the wheel—validated questions allow comparison across studies and strengthen credibility. Use online survey tools (LimeSurvey, SurveyMonkey) for digital distribution, but supplement with paper surveys, phone interviews, or in-person intercepts if your population has low digital access. And always pilot test.


20.5 Service Use Data

Service use data tracks who accesses services, when, where, and how often. Examples include:

  • Food bank visits (client counts, household size, visit frequency)
  • Library usage (circulation, program attendance, computer use)
  • Transit ridership (boardings by route, stop, and time of day)
  • Recreation program enrollment (which programs, which neighborhoods)
  • Health clinic visits (patient counts, services provided, wait times)

Service use data is operationally generated—it exists because organizations need to track demand, justify budgets, and manage operations. For community mappers, it provides real-time insight into how services are being used, who is using them, and where gaps exist.

But service use data has significant limitations and ethical risks.

Service use reflects access, not need. A map showing low food bank use in a neighborhood does not mean there is no food insecurity—it may mean the food bank is inaccessible, culturally unwelcoming, or unknown to residents. Service use data systematically undercounts people who need services but do not access them due to distance, eligibility barriers, stigma, language, documentation status, or distrust.

Different sectors have vastly different privacy regimes. Library circulation data is relatively low-sensitivity (though still protected by professional ethics). Health system service use data is highly sensitive and tightly regulated. Social service data—especially child welfare, mental health, and income assistance—is both sensitive and fraught with surveillance implications. Criminal justice service data (arrests, incarcerations, parole contacts) is the most ethically complex: it reflects system behavior as much as individual behavior, and mapping it can reinforce punitive narratives.

Aggregation matters. Service use data must be aggregated to protect individual privacy. Reporting that "15 households from postal code M5X visited the food bank in March" is acceptable. Reporting names, addresses, or identifiable details is not. In small communities, even aggregated data can be identifying—if a dataset shows "one Indigenous woman aged 70+ accessed mental health services," that person may be identifiable to neighbors.

Service use data can enable or undermine equity. A nonprofit using service use data to identify underserved neighborhoods and open a satellite location is using data for equity. A municipality using emergency shelter data to map where homeless people congregate in order to justify encampment evictions is weaponizing data. Context, intent, and governance determine whether service use analysis supports or harms communities.

In practice: If you are working with service use data, document informed consent practices. Confirm that clients were told their anonymized data might be used for analysis. Aggregate to the coarsest spatial and demographic resolution that still allows meaningful analysis—postal code is better than street address. Report results in ways that do not stigmatize users or justify punitive responses. And when possible, involve service users in interpreting findings.
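A minimal sketch of small-cell suppression on invented visit records: counts below a threshold are reported as a range rather than an exact number. The threshold of 5 here is an illustrative choice; many data custodians require 5 or 10, and the right value depends on your data-sharing agreement.

```python
from collections import Counter

# Hypothetical visit records already reduced to forward sortation area
# (the first three characters of the postal code). All values invented.
visits = ["M5X", "M5X", "M5X", "M5X", "M5X", "M5X",
          "M4C", "M4C", "M4C",
          "M6K"]

THRESHOLD = 5  # cells below this are reported as "<5", never exactly

counts = Counter(visits)
for area, n in sorted(counts.items()):
    shown = str(n) if n >= THRESHOLD else f"<{THRESHOLD}"
    print(f"{area}: {shown}")
```

Suppression should happen before results leave the secure environment, not at publication time, so that exact small counts are never circulated in drafts.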


20.6 Economic Data

Economic data describes employment, income, business activity, investment, property values, and economic change. In Community Mapping, economic data is used to understand economic opportunity, vulnerability, and the distribution of wealth and resources.

Common sources include:

  • Census data: Income, employment rates, occupation, industry, poverty rates (low-income measure or low-income cut-off).
  • Business registries: Lists of registered businesses by sector and location (often available through municipal open data or chambers of commerce).
  • Labour force surveys: Statistics Canada's Labour Force Survey provides monthly employment, unemployment, and labour force participation data, though only at large geographic scales.
  • Property assessment data: Municipal property assessments provide information on property values, ownership, land use, and tax status—sometimes available as open data.
  • Commercial real estate data: Vacancy rates, rental rates, and commercial property sales (typically from private real estate databases, not freely available).

Economic data is particularly useful for analyzing gentrification, business concentration, economic resilience, and the local multiplier effect (how much money stays in the community). A community mapper working with a Business Improvement Area (BIA) might map local businesses, categorize them by sector, and analyze clustering patterns to inform economic development strategy.

But economic data has blind spots.

The informal economy is invisible. Official business registries and tax data capture registered, formal businesses. They do not capture informal work: cash-based services, gig work, bartering, unpaid care work, or grey-market activity. In communities where the informal economy is significant—immigrant neighborhoods, low-income areas, rural communities—official economic data radically undercounts economic activity.

Income data lags and aggregates. Census income data is collected every five years and reflects income from two years prior (the 2021 Census asked about 2020 income). By the time it is released, it is already outdated. Income data is also aggregated to protect privacy—you can see median household income by dissemination area, but not individual household incomes.

Property data reflects value, not access or stability. High property values in a neighborhood may signal investment and amenities, or they may signal displacement pressure and unaffordability. Property assessment data tells you what land is worth, not who can afford it, who is being pushed out, or what the human cost of rising values is.

In practice: Use census income and employment data as a baseline, but supplement with qualitative knowledge about informal economies, precarious work, and economic stressors that numbers alone miss. When mapping businesses, distinguish between locally owned and chain/franchise operations—the local multiplier is much higher for local businesses. And when analyzing property values, pair the data with tenant organizing reports, eviction data, and resident narratives about affordability and displacement.


20.7 Environmental Data

Environmental data describes the physical and natural conditions that shape health, safety, and quality of life. In Community Mapping, environmental data is used to assess risks, identify inequities, and inform planning and resilience work.

Common sources include:

  • Climate data: Temperature, precipitation, extreme weather events (Environment and Climate Change Canada; NOAA in the U.S.).
  • Air quality data: Pollutant concentrations (PM2.5, ozone, nitrogen dioxide) by monitoring station (Environment Canada; provincial environment ministries; community air monitoring networks).
  • Water quality data: Drinking water safety, surface water contamination, beach closures (municipal water utilities; provincial ministries).
  • Flood and hazard maps: Floodplains, wildfire risk zones, earthquake hazard zones (federal and provincial emergency management agencies; FEMA in the U.S.).
  • Green space and tree canopy: Park locations, tree cover percentage, land use classification (municipal open data; derived from satellite imagery).
  • Noise pollution: Traffic noise, industrial noise, airport noise (specialized studies; rarely available as open data).

Environmental data is essential for understanding environmental justice—the pattern where low-income communities and racialized communities disproportionately live near highways, industrial sites, and polluted areas, and have less access to green space. A community mapper working on climate adaptation might map heat vulnerability by overlaying census data (seniors, low-income households, renters) with tree canopy data and summer temperature data to identify neighborhoods at highest risk during heat waves.
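The overlay described above can be sketched as a simple composite index: rescale each input to a 0-1 range, invert protective factors such as tree canopy, and average. All values below are invented, and the equal weighting is an assumption a real project would need to justify with community input.

```python
def minmax(values):
    """Rescale a list of numbers to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative neighborhood-level inputs (all values invented):
# share of seniors, share of low-income households, tree canopy share.
names       = ["Riverside", "Hillcrest", "Parkdale"]
pct_seniors = [0.22, 0.10, 0.15]
pct_lowinc  = [0.30, 0.12, 0.25]
pct_canopy  = [0.08, 0.35, 0.20]

s, l, c = minmax(pct_seniors), minmax(pct_lowinc), minmax(pct_canopy)

# Equal weights; canopy is inverted because more canopy lowers heat risk.
index = [(si + li + (1 - ci)) / 3 for si, li, ci in zip(s, l, c)]
for name, score in sorted(zip(names, index), key=lambda t: -t[1]):
    print(f"{name}: {score:.2f}")
```

The invented low-canopy, high-vulnerability neighborhood ranks highest, which is the pattern such an overlay is designed to surface; a real analysis would validate the weights and pair the index with resident knowledge.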

But environmental data has significant gaps and quality issues.

Monitoring is sparse and uneven. Air quality monitoring stations are expensive and rare—a city might have five stations for a population of half a million. Spatial interpolation techniques can estimate pollution levels between stations, but these are models, not measurements. Rural and remote areas often have no monitoring at all.

Environmental data often lacks social context. A map showing air pollution hotspots is useful, but it becomes meaningful only when overlaid with demographic data showing who lives there, who is most vulnerable, and what the health impacts are. Environmental data alone is just geography—it becomes an equity issue when paired with social data.

Hazard maps reflect past risk models, not future conditions. Floodplain maps are based on historical flood records and hydrological models. As climate change alters precipitation patterns, these maps become outdated. A community mapper using a floodplain map from 2010 is using a snapshot of past risk, not current or future risk.

In practice: Use Environment and Climate Change Canada's open data portal for climate and air quality data. For local environmental data, check municipal open data portals, conservation authority maps, and provincial environment ministry websites. For environmental justice analysis, always pair environmental data with census or health data to show who is affected. And acknowledge uncertainty—environmental models are useful, but they are not perfect predictions.


20.8 Spatial Data

Spatial data is data with a geographic component—points (locations), lines (roads, rivers, transit routes), or polygons (neighborhoods, parks, flood zones). Spatial data is foundational to mapping, but "spatial" is not synonymous with "geographic base maps." Spatial data includes any dataset where location is a key attribute.

Common spatial datasets for Community Mapping:

  • Boundaries: Census tracts, dissemination areas, wards, postal code areas, school catchments, health regions (Statistics Canada; municipal GIS departments).
  • Infrastructure: Roads, sidewalks, bike lanes, transit routes and stops, water mains, electricity grids (municipal open data; OpenStreetMap).
  • Points of interest: Schools, hospitals, libraries, parks, community centers, grocery stores, social services (map.ca; municipal open data; OpenStreetMap).
  • Parcels: Property boundaries, ownership, land use, zoning (municipal GIS; sometimes restricted access).
  • Imagery: Satellite imagery, aerial photography, street-level imagery (Sentinel Hub, Mapillary; government GIS portals; for satellite needs see Chapter 30's discussion of aerial-surveillance ethics).

Spatial data powers proximity analysis (Who lives within 500 meters of a grocery store?), accessibility analysis (Which neighborhoods lack transit access?), and overlay analysis (Where do high-risk flood zones overlap with vulnerable populations?).
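The proximity question above can be sketched with the haversine formula, which gives great-circle distance between WGS84 latitude/longitude points without any GIS software. Coordinates below are invented.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical household location and two grocery-store locations.
home = (43.6510, -79.3470)
stores = {"store_a": (43.6525, -79.3490),
          "store_b": (43.6700, -79.4000)}

# Keep only stores within 500 metres of the household.
nearby = {name: round(haversine_m(*home, *loc))
          for name, loc in stores.items()
          if haversine_m(*home, *loc) <= 500}
print(nearby)
```

Note that straight-line distance is only a proxy: walking distance along the street network is always longer, so full accessibility analysis routes along roads rather than drawing circles.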

But spatial data requires care in use.

Accuracy varies. Official government datasets (census boundaries, municipal infrastructure) are typically high-accuracy. Crowdsourced data (OpenStreetMap) ranges from excellent in urban areas with active contributors to incomplete or outdated in rural or less-mapped areas. Community-sourced platforms like map.ca are updated frequently but completeness depends on local participation and curation.

Spatial data can be out of date. A road network file from 2015 will not include recent developments. A park boundary file may not reflect new acquisitions. Always check the "last updated" date on spatial datasets and treat old data with skepticism.

Coordinate systems and projections matter. Spatial data uses coordinate reference systems (CRS) to define location. The most common is WGS84 (latitude/longitude), used by GPS and web maps. But some datasets use projected coordinate systems optimized for accuracy in a specific region. If you mix datasets with different CRS without reprojecting them, they will not align. Most GIS software (QGIS, ArcGIS) handles this automatically, but manual checks are essential.

In practice: Start with map.ca for community-curated points of interest and asset mapping, or OpenStreetMap for base maps—both are free and often more current than government datasets. For official boundaries and infrastructure, use government open data portals. For high-resolution imagery, use Sentinel Hub or provincial GIS portals (many provinces offer free orthophotos). And always visually inspect spatial data after loading it—look for misalignments, missing areas, or obvious errors before relying on it for analysis.


20.9 Indicators and Metrics

An indicator is a measurable variable used to represent a complex, abstract concept. Poverty rate is an indicator of economic hardship. Life expectancy is an indicator of population health. Tree canopy percentage is an indicator of environmental quality. Indicators reduce complexity into numbers that can be tracked, compared, and communicated.

Indicators are essential for Community Mapping. They allow us to compare neighborhoods, track change over time, and set targets. A city working toward equity goals might track indicators like "percentage of residents within a 10-minute walk of a park" or "median household income gap between neighborhoods." Indicators make the invisible measurable.

But indicators are also reductive. They simplify. They hide. And they can mislead.

Indicators measure proxies, not the thing itself. Poverty rate (percentage of households below a low-income threshold) is a proxy for economic hardship, but it misses non-income forms of wealth (social networks, home ownership, access to informal economies). Life expectancy is a proxy for health, but it misses quality of life, disability, and chronic pain. Indicators are useful, but they are not the full story.

Indicators can mask inequality. Averages hide distribution. A neighborhood with a median household income of $60,000 might sound middle-class, but if half the households earn $20,000 and half earn $100,000, the median hides deep inequality. Aggregate indicators at the city or regional level can mask neighborhood-level disparities. Always disaggregate when possible.
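The $60,000 example above can be made concrete: report quartiles alongside the median and the hidden inequality becomes visible. A sketch with invented incomes:

```python
import statistics

# Two invented neighborhoods with the same median household income
# but very different income distributions.
mixed   = [20_000] * 50 + [100_000] * 50           # deeply divided
uniform = [58_000, 59_000, 60_000, 61_000, 62_000] * 20  # genuinely mid-income

for name, incomes in [("mixed", mixed), ("uniform", uniform)]:
    med = statistics.median(incomes)
    q = statistics.quantiles(incomes, n=4)  # three quartile cut points
    print(f"{name}: median={med:,.0f}, quartiles={q}")
```

Both neighborhoods report a median of $60,000, but the quartile spread in the divided neighborhood is forty times wider, which is exactly the kind of disaggregation this section recommends.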

Indicators reflect choices about what matters. The decision to track one indicator and not another is political. A city that tracks business density but not living wage jobs, park acreage but not park accessibility, crime rates but not community safety perceptions—these are choices that shape what gets prioritized. Community mappers must ask: Who chose these indicators? What do they reveal? What do they hide?

Indicators can drive the wrong behavior. When an indicator becomes a target, it can distort priorities. The UK's use of emergency department wait-time targets led hospitals to focus on meeting the metric rather than improving care quality—patients were sometimes held in ambulances to avoid starting the wait-time clock. Indicators must be used as tools for understanding, not as rigid targets divorced from context.

In practice: Use indicators, but pair them with qualitative data and lived experience. Report not just averages but also ranges, distributions, and outliers. Be transparent about what the indicator measures and what it does not. And when possible, involve communities in choosing which indicators matter—residents often have different priorities than planners or researchers.


20.10 Data Cleaning and Validation

Data is messy. Addresses are misspelled. Dates are in inconsistent formats. Fields are blank. Duplicate records exist. Outliers are present (a household income of $10 million in a low-income census tract—data entry error or real?). Before any quantitative analysis, data must be cleaned and validated.

Data cleaning is not a neutral technical step. Every cleaning decision is a methodological choice that shapes findings. Removing outliers makes analysis cleaner but may delete real, important cases. Filling in missing values with averages (imputation) smooths the data but introduces assumptions. Excluding incomplete records reduces noise but may introduce bias if missingness is patterned.

Common data cleaning tasks:

  • Standardizing formats: Converting dates to a consistent format (YYYY-MM-DD). Standardizing address formats. Ensuring numeric fields are stored as numbers, not text.
  • Handling missing data: Deciding whether to delete records with missing values, impute values, or analyze only complete cases. Document the decision and report how much data was missing.
  • Removing duplicates: Identifying and merging duplicate records (e.g., the same community organization listed twice with slightly different names).
  • Validating ranges: Checking that values fall within plausible ranges (age 0-120, income > 0, percentages 0-100). Flagging implausible values for review.
  • Geocoding addresses: Converting street addresses to latitude/longitude coordinates using geocoding tools (map.ca's built-in geocoding for community assets, or for advanced needs: OpenStreetMap's Nominatim or ESRI's ArcGIS World Geocoding Service). Reviewing geocoding match rates and manually correcting failures.
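The tasks above can be sketched in a short script. This is a minimal, illustrative example using only Python's standard library—the field names, formats, and records are hypothetical, not from any dataset discussed in this chapter—but it shows date standardization, range validation, deduplication, and the cleaning log in one pass:

```python
# A minimal cleaning sketch. All field names and records are hypothetical.
from datetime import datetime

raw = [
    {"org": "Hillside Food Bank",  "visited": "03/15/2024", "age": "47"},
    {"org": "Hillside Food Bank ", "visited": "2024-03-15", "age": "47"},   # near-duplicate
    {"org": "Riverview Clinic",    "visited": "2024-04-02", "age": "250"},  # implausible age
]

cleaning_log = []   # document every decision, per the chapter's advice

def parse_date(value):
    """Standardize dates to YYYY-MM-DD, trying known input formats."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None  # flag for manual review rather than guessing

cleaned, seen = [], set()
for row in raw:
    org = " ".join(row["org"].split())          # collapse stray whitespace
    date = parse_date(row["visited"])
    age = int(row["age"]) if row["age"].isdigit() else None
    if age is not None and not (0 <= age <= 120):
        cleaning_log.append(f"flagged implausible age {age} for {org}")
        age = None                              # treat as missing, not deleted
    key = (org.lower(), date)
    if key in seen:
        cleaning_log.append(f"dropped duplicate record for {org} on {date}")
        continue
    seen.add(key)
    cleaned.append({"org": org, "visited": date, "age": age})

print(len(cleaned), cleaning_log)
```

Note that the implausible age is converted to missing rather than silently corrected, and every change lands in the log—both choices the chapter's guidance on transparency calls for.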

Garbage in, garbage out. A map built on uncleaned data is not just inaccurate—it is misleading. A poorly geocoded dataset where 30% of addresses failed to match and were assigned to default locations (city center, post office) will produce a map showing false clustering at those default points. A dataset with duplicate records will inflate counts and distort analysis. Cleaning is not optional.
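The false-clustering problem described above has a quick diagnostic: count how many records share exactly the same coordinates. A large pile-up at a single point is often a sign that failed geocodes were assigned a default location. A sketch, with hypothetical coordinates:

```python
# Diagnostic sketch: detect suspicious pile-ups at one coordinate pair,
# which often indicate geocoding failures assigned to a default point.
# All coordinates below are hypothetical.
from collections import Counter

records = [
    (49.2827, -123.1207),  # repeated "city center" point — suspicious
    (49.2827, -123.1207),
    (49.2827, -123.1207),
    (49.2610, -123.1139),
    (49.2488, -123.1050),
]

counts = Counter(records)
suspicious = {pt: n for pt, n in counts.items() if n >= 3}
print(suspicious)  # {(49.2827, -123.1207): 3}
```

Any point flagged this way warrants a look at the underlying addresses before the map is trusted.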

Document what you did. Keep a cleaning log: what steps you took, what you changed, what you deleted, and why. If you removed outliers, document the threshold and the number removed. If you imputed missing values, document the method. Transparency in cleaning is as important as transparency in analysis. Future users of your data—or reviewers of your findings—need to know what transformations occurred.

Validation means checking against reality. After cleaning and analysis, validate findings against ground truth. If your map shows 50 social service organizations in a neighborhood, call a few and confirm they exist, are at those addresses, and provide the services listed. If your analysis shows a neighborhood has zero grocery stores, do a site visit or check satellite imagery. Validation catches errors that cleaning alone misses.

In practice: Use scripting tools (R, Python) for repeatable, documented cleaning rather than manual edits in Excel—scripts leave an audit trail. Use data validation rules (e.g., flag any address that geocodes with low confidence). And budget time for cleaning—it often takes longer than analysis itself. The rule of thumb in data science is that 80% of the work is cleaning, 20% is analysis. Community Mapping is no different.


20.11 Synthesis and Implications

Quantitative data is powerful, but it is not truth—it is evidence shaped by what was measured, who was measured, how it was measured, and what choices were made in cleaning and analysis. This chapter has examined the major sources of quantitative data available to community mappers: census data with its undercounts and categorical limitations, administrative data with its access restrictions and surveillance risks, municipal open data with its variable quality, survey data with its sampling challenges, service use data that reflects access not need, economic data that misses informal economies, environmental data with sparse monitoring, spatial data that requires accuracy checks, and indicators that simplify and sometimes mislead.

The implication for practice is clear: numbers must be paired with stories, validated against lived experience, and used with humility.

A census tract showing low income does not tell you what poverty feels like, what informal supports exist, or what residents' priorities are. A map showing food banks does not tell you why people need them, what barriers they face, or what would reduce that need. A pollution map does not tell you the health impacts, the community organizing response, or the political economy of why that factory was sited there. Quantitative data sets the stage. Qualitative data and community voice tell the story.

Quantitative methods also require ethical vigilance. Data about vulnerable populations—service use, health records, child welfare involvement, arrests—can be used to support or to harm. A map showing where people experiencing homelessness sleep can inform outreach and housing support, or it can justify encampment sweeps and criminalization. Intent, governance, and transparency determine whether quantitative Community Mapping advances equity or reinforces control.

Finally, quantitative data is never complete. Every dataset has gaps: populations not counted, variables not measured, contexts not captured. A responsible community mapper documents those gaps, reports uncertainty, and resists the temptation to overstate what the numbers show. The phrase "suggested: further research on…" should appear often in your work. It is not a weakness. It is honesty.

Chapter 19 (Research Design) provided the conceptual foundation for designing ethical, rigorous, community-engaged research. This chapter provided the practical toolkit for working with numbers. Chapter 21 will turn to qualitative methods—the interviews, focus groups, observations, and stories that provide the texture, meaning, and human truth that numbers alone cannot capture. Together, these chapters form the methodological core of Community Mapping research.


20.12 Data Source Inventory

The Data Source Inventory is a structured template for documenting every quantitative data source used in a Community Mapping project. Use this template to maintain a master record of data provenance, licensing, quality, gaps, and governance. Each data source gets one entry.

Template fields:

  1. Source Name: Official name of the dataset or data source.
  2. Owner/Publisher: Organization or agency that owns or publishes the data.
  3. Date of Data Collection: When the data was collected (not when you downloaded it).
  4. Last Updated: Most recent update to the dataset.
  5. Update Frequency: How often the data is refreshed (e.g., annually, monthly, one-time).
  6. Geographic Coverage: What area the data covers (neighborhood, city, province, country).
  7. Geographic Granularity: Smallest spatial unit available (e.g., address, postal code, dissemination area, census tract).
  8. Key Variables: List of key variables or indicators included.
  9. Format: File format (CSV, Shapefile, GeoJSON, API, PDF, etc.).
  10. License: Data license or terms of use (e.g., Open Government License, Creative Commons, proprietary).
  11. Access Method: Where and how you obtained the data (URL, portal, direct request, partnership agreement).
  12. Known Limitations/Gaps: Documented issues with completeness, accuracy, bias, or coverage.
  13. Cleaning Steps Taken: Summary of any transformations, filtering, or cleaning you performed.
  14. Contact: Contact person or organization for questions or corrections.
  15. Notes: Any additional context or caveats.

Example entry:

  1. Source Name: 2021 Census of Population, Income and Demographic Data
  2. Owner/Publisher: Statistics Canada
  3. Date of Data Collection: May 2021
  4. Last Updated: February 2022 (initial release)
  5. Update Frequency: Every 5 years
  6. Geographic Coverage: Canada (national)
  7. Geographic Granularity: Dissemination area (DA)
  8. Key Variables: Median household income, low-income measure (LIM), age distribution, household composition, Indigenous identity, visible minority status, immigration status, housing tenure.
  9. Format: CSV tables downloaded from Census Profile tool; boundary files as Shapefile
  10. License: Open Government License - Canada
  11. Access Method: https://www12.statcan.gc.ca/census-recensement/index-eng.cfm
  12. Known Limitations/Gaps: Undercounts homeless populations, highly mobile populations, some remote Indigenous communities. Privacy suppression for small population counts. Income data reflects 2020, released in 2022.
  13. Cleaning Steps Taken: Joined census variables to DA boundary file using DAUID. Suppressed values (marked "F" or "x") treated as missing. Created derived variable: % of population in low income.
  14. Contact: Statistics Canada general inquiries: 1-800-263-1136
  15. Notes: Long-form census reinstated in 2016 after 2011 voluntary NHS. Comparisons with 2011 NHS data are problematic due to methodology change.

Maintain this inventory as a living document throughout your Community Mapping project. It serves as both a reference for your team and a transparency record for anyone reviewing or replicating your work.
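One way to keep the inventory maintainable is to store each entry as a machine-readable record rather than free text. The sketch below—an assumption about workflow, not a requirement of the template—writes one JSON line per source, with field names mirroring the template and values abbreviated from the example entry above:

```python
# Sketch: the inventory as a JSON-lines file, one record per data source.
# Field names mirror the template; values abbreviate the example entry.
import json

entry = {
    "source_name": "2021 Census of Population, Income and Demographic Data",
    "owner_publisher": "Statistics Canada",
    "date_of_collection": "2021-05",
    "update_frequency": "Every 5 years",
    "geographic_granularity": "Dissemination area (DA)",
    "format": "CSV tables; Shapefile boundaries",
    "license": "Open Government License - Canada",
    "known_limitations": ("Undercounts homeless and highly mobile populations; "
                          "privacy suppression for small counts"),
}

# Append, so each source stays one record in a growing living document.
with open("data_source_inventory.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(entry) + "\n")
```

A structured file like this can be diffed, shared with partners, and queried when a reviewer asks where a number came from.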


Discussion Questions

  1. The chapter argues that census undercounts are "not random—they are structural." What does this mean? Who is systematically undercounted, and why? What are the implications for policy and planning when the most vulnerable populations are invisible in the data?

  2. Consider a scenario where a municipality uses administrative data on 311 complaints to map "problem areas" for increased bylaw enforcement. What are the ethical risks of this analysis? How might complaint patterns reflect wealth, language access, and familiarity with city systems rather than actual bylaw violations?

  3. You are working with a community organization that wants to survey residents about their priorities for a new community center. The organization distributes an online survey through its email list. What are the sampling limitations? Who is likely over-represented? Who is likely excluded? How could the survey design be improved?

  4. The chapter states that service use data "reflects access, not need." Give a concrete example where low service use might indicate barriers rather than low need. How can community mappers avoid conflating the two?

  5. Why is the informal economy "invisible" in official economic data? What are the implications for understanding economic activity in immigrant neighborhoods, low-income communities, or rural areas? How can qualitative methods help fill this gap?

  6. Environmental justice research shows that low-income and racialized communities disproportionately live near pollution sources and have less access to green space. If you were mapping environmental justice in a city, what quantitative datasets would you combine? What indicators would you use? What would numbers alone miss?

  7. The chapter warns that "indicators can drive the wrong behavior" when they become rigid targets. Can you think of examples (from Community Mapping or other fields) where focusing on an indicator led to distorted priorities or unintended consequences?

  8. Reflect on the phrase "garbage in, garbage out." Why is data cleaning a methodological choice, not just a technical step? How might decisions to remove outliers, impute missing values, or exclude incomplete records introduce bias?


Field Exercise: Data Source Audit

Purpose: This exercise develops skills in locating, evaluating, and documenting quantitative data sources for Community Mapping. You will practice critical assessment of data quality, provenance, and limitations.

Materials Needed:

  • Computer with internet access
  • Data Source Inventory template (Section 20.12)
  • Spreadsheet or word processing software for documentation

Steps:

  1. Choose a community or topic. Select a specific geographic area (your neighborhood, a nearby town, or a municipal ward) and a Community Mapping topic (housing affordability, food access, aging in place, youth services, climate vulnerability, etc.).

  2. Identify three quantitative data sources. Find three different types of data that could inform your chosen topic. For example:

    • Census data (e.g., age distribution, income, housing tenure)
    • Municipal open data (e.g., park locations, transit routes, business licenses)
    • A third source of your choice (survey data, environmental data, spatial data, etc.)
  3. Complete a Data Source Inventory entry for each. Use the template in Section 20.12 to document each data source. Fill in all fields. Where information is missing or unclear, note that as a limitation.

  4. Assess quality and usability. For each data source, write a 1-paragraph assessment addressing:

    • How current is the data?
    • What populations or areas might be undercounted or excluded?
    • What key variables are present or missing?
    • Is the data available in a usable format (machine-readable, spatial)?
    • What are the biggest limitations for your chosen topic?
  5. Reflect on gaps. Write a 1-page reflection:

    • What could these three data sources tell you about your topic?
    • What critical information is missing from quantitative data alone?
    • What qualitative methods (interviews, observations, participatory mapping) would you pair with this data to get a fuller picture?
    • If you had unlimited resources, what additional data would you collect?

Deliverable: A completed Data Source Inventory (3 entries) and a 1-page reflection.

Time Estimate: 90-120 minutes

Safety and Ethics Notes: Do not download or work with any data that includes personal identifiers (names, addresses of individuals, health records). Use only publicly available, aggregate data. If a dataset requires ethics approval or a data-sharing agreement, document that requirement in the inventory but do not attempt to access the data without proper authorization.


Key Takeaways

  • Quantitative data—census, administrative, open data, surveys, service use, economic, environmental, spatial, and indicators—provides essential evidence for Community Mapping, but every source has limitations, biases, and gaps.
  • Census data offers unmatched granularity and consistency, but systematically undercounts homeless, mobile, undocumented, and some Indigenous populations; the 2011 voluntary NHS in Canada was a data disaster that underscores the importance of mandatory long-form collection.
  • Administrative data reflects system contact, not population reality; access is restricted for good ethical reasons, and analysis must not conflate service use with prevalence or need.
  • Municipal open data is a gift, but quality varies; not all datasets are current, complete, or accurately geocoded—always check metadata and last update dates.
  • Service use data, economic data, and environmental data all require triangulation with other sources and qualitative context to avoid misleading conclusions.
  • Data cleaning is a methodological choice, not a neutral technical step; every decision to remove, impute, or aggregate data shapes findings and must be documented transparently.
  • Indicators simplify complexity and enable comparison, but they also mask inequality, reflect political choices about what matters, and can drive distorted priorities when treated as rigid targets.

Recommended Further Reading

Foundational:

  • Statistics Canada. (2021). 2021 Census of Population. https://www12.statcan.gc.ca/census-recensement/index-eng.cfm — The authoritative source for Canadian census data, with detailed methodology, data quality notes, and user guides.
  • Suggested: Foundational texts on survey methodology, indicator development, and data ethics in community research.

Academic Research:

  • Suggested: Research on census undercounts and their implications for equity; environmental justice and spatial analysis; the politics of indicators and metrics; and ethical frameworks for working with administrative data.

Practical Guides:

  • Open Government Portal (Canada). Open Data Inventory. https://open.canada.ca — Links to federal and municipal open data portals across Canada.
  • Suggested: Practitioner guides on data cleaning workflows, geocoding techniques, and community-engaged data governance.

Case Studies:

  • Statistics Canada. (2016). Report on the 2011 National Household Survey. — A post-mortem on the voluntary NHS and its data quality failures, illustrating the consequences of weakening mandatory data collection.
  • First Nations Information Governance Centre. OCAP Principles. https://fnigc.ca/ocap-training/ — Ownership, Control, Access, and Possession principles for Indigenous data sovereignty.
  • Suggested: Case studies of environmental justice mapping, food access analysis, and community-led data projects that integrate quantitative and qualitative methods.

Plain-Language Summary

Quantitative data—numbers about populations, services, environments, and places—is a core tool in Community Mapping. This chapter covers the main sources: census data (population, income, housing), open data from governments (parks, transit, permits), surveys, service use records (who accesses food banks, libraries, health clinics), economic data (businesses, income, property values), environmental data (pollution, climate, green space), and spatial data (maps and boundaries).

Each source is useful but incomplete. Census data misses people who are homeless, highly mobile, or distrustful of government. Service use data shows who accesses services, not who needs them but can't access them. Survey results depend on who responds—and low response rates mean the findings might not represent everyone. Environmental data comes from monitoring stations that are sparse and unevenly distributed.

The chapter emphasizes that numbers alone don't tell the full story. They need to be paired with stories, validated against lived experience, and used carefully. Data cleaning—fixing errors, handling missing information, checking for duplicates—is not just a technical task; every cleaning choice affects the findings and must be documented.

Quantitative data is essential for Community Mapping, but it must be used with honesty about its gaps, humility about its limitations, and a commitment to pairing it with qualitative methods that capture what numbers miss.


End of Chapter 20.