Part X · Building the Knowledgebase

Chapter 53. Tagging, Linking, and the Knowledge Graph

How tagging, cross-references, and graph structures transform isolated data into connected knowledge. Covers tag discipline, provenance, hierarchies vs networks, and AI-assisted tagging done right.




Chapter Overview

This chapter explores how tagging, cross-references, and graph structures transform isolated data entries into connected, queryable knowledge. A tag is metadata that describes what something is, where it belongs, who created it, or when it matters. A link is a relationship that connects one entity to another. Together, tags and links create a knowledge graph — a network of interconnected information that supports discovery, synthesis, and intelligence. This chapter covers tag discipline, provenance, spatial and temporal tagging, AI-assisted workflows, and long-term maintenance.


Learning Outcomes

By the end of this chapter, you will be able to:

  1. Define what a knowledge graph adds beyond simple databases or file systems
  2. Apply tagging discipline to ensure consistency, discoverability, and long-term value
  3. Distinguish between hierarchical taxonomies and networked knowledge structures
  4. Identify how spatial, temporal, and provenance tags support community mapping research
  5. Evaluate AI-assisted tagging workflows and recognize failure modes like tag drift
  6. Articulate maintenance strategies to prevent tag soup and knowledge decay
  7. Design a tag audit process to sustain knowledgebase quality over time

Key Terms

  • Knowledge Graph: A network of interconnected entities (places, people, organizations, events) linked by typed relationships, enabling complex queries and pattern discovery.
  • Tag: Structured metadata attached to a knowledge object (document, place, person, event) to describe category, topic, time, place, or provenance.
  • Provenance: Information about the origin of a piece of knowledge — who entered it, when, from what source, with what confidence.
  • Tag Soup: A degraded state where tags proliferate without discipline, resulting in duplicates, inconsistencies, and loss of discoverability.
  • Backlink: A reverse reference showing all other entities that link to the current entity, making relationships bidirectional and supporting serendipitous discovery.

53.1 What a Knowledge Graph Adds

A knowledge graph is not a new concept. Tim Berners-Lee articulated the vision of linked data in the early web — a semantic web where information connects not just through hyperlinks, but through structured, typed relationships. Wikidata builds on this foundation with a property-item model that links entities (people, places, organizations, concepts) through relationships like "located in," "founded by," or "member of." Google's Knowledge Graph powers search results by answering not just "show pages about X" but "tell me X's founding year, headquarters, and CEO."

In the context of a community mapping knowledgebase, a knowledge graph does something simpler and more powerful: it transforms isolated facts into a queryable web of understanding.

Without a graph, a knowledgebase is a collection of files or database rows. "Document A describes Organization X." "Document B describes Event Y." "Place Z is mentioned in Document C." These facts sit separately. You can search for keywords, but you cannot ask relational questions like "show me all events organized by X," "what organizations operate in Place Z," or "which documents share the same topic tags."

A knowledge graph makes those questions answerable. When every entity (place, person, organization, event, document) has an identifier and relationships are explicitly recorded, you can traverse the graph: from an organization to the events it runs, to the places those events occur, to the documents that describe them, to the people who validated those documents. Queries that would require manual cross-referencing across dozens of files become automated.
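This traversal can be sketched with a small in-memory graph. The sketch below is a minimal illustration, not a production schema; all entity IDs and relationship names are invented for the example.

```python
from collections import defaultdict

# Toy knowledge graph: typed, directed edges between entity IDs.
# All IDs and relationship names here are invented for illustration.
edges = [
    ("org:x", "organizes", "event:fair"),
    ("event:fair", "occurs_at", "place:z"),
    ("doc:a", "documents", "org:x"),
    ("doc:c", "documents", "place:z"),
]

# Forward index: source -> relationship type -> targets.
graph = defaultdict(lambda: defaultdict(list))
for src, rel, dst in edges:
    graph[src][rel].append(dst)

def traverse(start, *relations):
    """Follow a chain of typed relationships from a starting entity."""
    frontier = [start]
    for rel in relations:
        frontier = [dst for node in frontier for dst in graph[node][rel]]
    return frontier

# From an organization to its events, then to the places those events occur.
print(traverse("org:x", "organizes", "occurs_at"))  # ['place:z']
```

A real system would back this with a database rather than in-memory dictionaries, but the principle is the same: once relationships are explicit and typed, multi-hop questions become one query instead of manual cross-referencing.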

This matters most when the knowledgebase grows beyond human memory. A small project with 20 organizations and 15 documents can be managed in a spreadsheet or folder structure. A knowledgebase with 500 organizations, 3,000 documents, 200 places, and 1,200 events cannot. At scale, the knowledge graph is what keeps the system usable.

The graph also enables discovery. A researcher exploring food security might search for "food bank" tags and discover linked organizations, related policy documents, and overlapping service areas. A planner might query "all youth programs within 2 km of this school" and get not just a list, but context: who runs them, when they meet, what capacity they have, and what gaps exist. The graph surfaces relationships that single-document search cannot.

Finally, the graph supports validation and quality control. If an organization is mentioned in five documents but has no knowledgebase entry, that's a gap. If a place is tagged in ten events but has no coordinates, that's incomplete data. Graph queries can detect these issues systematically, guiding maintenance work that would otherwise rely on random discovery.


53.2 Tagging Discipline

Tagging is easy to start and hard to sustain. Anyone can add a tag. The challenge is ensuring that tags remain consistent, meaningful, and useful over time.

Tagging discipline begins with a controlled vocabulary — a defined list of acceptable tags within each domain. For topics, this might be a taxonomy of categories: "housing," "health," "food security," "youth services." For status, it might be a fixed set: "verified," "draft," "outdated," "archived." For confidence, it might be: "high," "medium," "low," "unverified."

Without a controlled vocabulary, tagging drifts. One contributor tags an organization "health." Another tags it "healthcare." A third tags it "public health." A fourth tags it "medical services." These are all semantically similar, but in a database or file system, they are four separate tags. Queries for "health" miss entries tagged "healthcare." Aggregation becomes impossible.

Controlled vocabularies prevent this. They constrain choice. They make contributors choose from a predefined list rather than inventing new tags on the fly. This feels restrictive at first, but over time, it is what keeps the knowledgebase navigable.

Controlled vocabularies must be documented and accessible. If contributors don't know the vocabulary exists or can't find it, they will invent their own tags. The vocabulary should live in a shared location — the knowledgebase documentation, a pinned guide, or the tagging interface itself (dropdown menus, autocomplete fields, validation rules).
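Enforcement can be as simple as a lookup at entry time. A minimal sketch, using an invented vocabulary drawn from the examples above:

```python
# Hypothetical controlled vocabulary, keyed by tag domain.
VOCABULARY = {
    "topic": {"housing", "health", "food security", "youth services"},
    "status": {"verified", "draft", "outdated", "archived"},
    "confidence": {"high", "medium", "low", "unverified"},
}

def validate_tags(domain, tags):
    """Split proposed tags into (accepted, rejected) for one domain."""
    allowed = VOCABULARY.get(domain, set())
    accepted = [t for t in tags if t in allowed]
    rejected = [t for t in tags if t not in allowed]
    return accepted, rejected

# 'health' is in the vocabulary; 'healthcare' is flagged for review.
accepted, rejected = validate_tags("topic", ["health", "healthcare"])
print(accepted, rejected)  # ['health'] ['healthcare']
```

Rejected tags should go to a review queue rather than being silently discarded: recurring rejections are exactly the signal that the vocabulary needs expanding.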

Controlled vocabularies also need governance (Chapter 35). Who decides which tags are allowed? How are new tags added when the vocabulary is incomplete? How are deprecated tags handled when they are no longer relevant? Without governance, vocabulary drift is inevitable. Someone adds "COVID-19" as a tag in 2020. By 2025, should that tag remain active, or should it be archived? Should entries tagged "COVID-19" be retagged with something broader like "public health crisis" or "pandemic response"? These are judgment calls, and they require a process.

Tag discipline also means clarity of purpose. What is this tag for? Is "youth" a tag describing the topic of a document (a policy about youth), the audience (a resource for youth), or the subject (an organization serving youth)? If the purpose is ambiguous, different contributors will interpret tags differently, and consistency breaks down.

Some knowledgebases solve this with tag namespaces — prefixes that clarify domain. Instead of "youth," the tags might be "topic:youth," "audience:youth," "demographic:youth." This removes ambiguity and supports more precise queries.
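Namespaced tags are cheap to parse. A sketch, assuming a convention where bare tags default to a "topic" namespace:

```python
def parse_tag(tag, default_namespace="topic"):
    """Split a namespaced tag like 'audience:youth' into (namespace, value).
    Bare tags fall back to a default namespace."""
    if ":" in tag:
        namespace, value = tag.split(":", 1)
        return namespace, value
    return default_namespace, tag

print(parse_tag("audience:youth"))  # ('audience', 'youth')
print(parse_tag("youth"))           # ('topic', 'youth')
```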

Finally, tagging discipline requires feedback loops. When contributors tag inconsistently, someone must notice and correct it. When new patterns emerge — recurring free-text tags that should become part of the controlled vocabulary — someone must formalize them. Without these loops, the vocabulary stagnates or the system degrades into tag soup.


53.3 Cross-References and Backlinks

Cross-references are explicit links from one entity to another. A document about an organization references that organization's ID. An event entry references the place where it occurs. A person entry references the organizations they belong to. These forward links are the foundation of the knowledge graph.

But forward links alone are incomplete. If Document A links to Organization X, you can navigate from A to X. But if you are viewing Organization X, can you see that Document A references it? Not without a backlink — a reverse index that shows all entities linking to the current one.

Backlinks are what make knowledge graphs bidirectional. They support discovery in both directions. A researcher viewing an organization's entry sees not only basic info (name, address, contact) but also every document, event, and place that references it. This is powerful for validation, context, and completeness. If an organization is mentioned in ten documents but only linked in three, the backlinks reveal the gap.

Backlinks also support serendipitous discovery. A planner researching youth programs might stumble upon a community center's entry and notice, via backlinks, that it is also tagged in housing advocacy documents. This cross-domain connection might not have been obvious from the youth programs search alone, but the graph reveals it.

Some tools automate backlinks. Obsidian, Roam Research, and Logseq — all note-taking systems built on linked knowledge — automatically generate backlink lists on every page. If Note A links to Note B, then Note B's page shows "Linked mentions: Note A." This happens without manual effort. The same principle applies in a community mapping knowledgebase: if the database schema supports it, backlinks can be computed automatically from forward references.
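Computing backlinks from forward references is a one-pass inversion. A minimal sketch with invented IDs:

```python
from collections import defaultdict

# Forward links only: (source, target) pairs, e.g. a document
# referencing an organization. IDs are invented for illustration.
forward_links = [
    ("doc:a", "org:x"),
    ("doc:b", "org:x"),
    ("event:fair", "org:x"),
]

# One pass inverts the forward index into a backlink index.
backlinks = defaultdict(list)
for src, dst in forward_links:
    backlinks[dst].append(src)

# Viewing org:x now shows every entity that references it.
print(backlinks["org:x"])  # ['doc:a', 'doc:b', 'event:fair']
```

Because the backlink index is derived entirely from forward links, it never needs manual maintenance; it can be recomputed or incrementally updated whenever a forward link changes.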

Backlinks also make maintenance easier. When an organization merges, splits, or closes, you can query all documents, events, and places that reference it. Instead of hoping you found every mention through search, the graph tells you exactly where updates are needed.

Cross-references must be typed to be fully useful. It is not enough to know that Entity A links to Entity B. You need to know how they are related. Is B the location of A? The organizer of A? The funder of A? The parent organization of A? Typed relationships make queries precise. "Show me events organized by this group" is different from "show me events located at this place" — even though both are links.

Wikidata uses this model extensively. A Wikidata item for "Toronto" has dozens of relationship types: "located in country: Canada," "twinned with: Kyiv," "head of government: Olivia Chow," "founded: 1793." Each relationship is typed, bidirectional, and supports queries in both directions.

For a community mapping knowledgebase, typical relationship types might include:

  • "organizes" (organization → event)
  • "occurs at" (event → place)
  • "serves" (organization → demographic group)
  • "documents" (document → organization/event/place)
  • "authored by" (document → person/organization)
  • "validated by" (any entity → person/organization)
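Relationship types like these can be stored as plain typed edges and filtered by type. A sketch with invented IDs:

```python
# Typed edges: (source, relationship, target). Everything here is
# an invented example, not a real schema.
edges = [
    ("org:collective", "organizes", "event:cleanup"),
    ("org:collective", "serves", "group:youth"),
    ("event:cleanup", "occurs_at", "place:park"),
    ("doc:report", "documents", "event:cleanup"),
]

def by_relation(relation):
    """All (source, target) pairs connected by one relationship type."""
    return [(s, t) for s, r, t in edges if r == relation]

# 'organized by' and 'located at' are different questions,
# even though both are links touching the same event.
print(by_relation("organizes"))  # [('org:collective', 'event:cleanup')]
print(by_relation("occurs_at"))  # [('event:cleanup', 'place:park')]
```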

Typed relationships also support graph traversal algorithms. Want to find all events within 5 km of a school that are organized by youth-serving nonprofits with high trust scores? That query requires traversing multiple relationship types and combining spatial, categorical, and quality filters. Untyped links cannot support this.


53.4 Hierarchies vs Networks

Traditional knowledge organization favors hierarchies — tree structures where every item belongs to one category, which belongs to a parent category, and so on. Dewey Decimal Classification, organizational charts, and file-folder systems are all hierarchical.

Hierarchies are simple and familiar. They answer the question "where does this belong?" with a single path: Health > Public Health > Disease Prevention > Vaccination Programs. Browsing is intuitive — start at the top, drill down.

But hierarchies force single-category assignment, and reality is messier. A community health fair might belong under "health," but it is also a "community event," an "outreach strategy," and a "partnership initiative." Forcing it into one category loses context. Duplication (placing it in multiple branches) creates maintenance headaches.

This is where network structures — knowledge graphs — excel. In a network, an entity can have multiple tags, multiple relationships, and multiple pathways to discovery. The health fair is tagged "health," "community event," "outreach," and "partnership." It links to the organizations that run it, the place where it occurs, and the documents that describe it. There is no single "correct" location. The entity exists in a web of relationships, discoverable from any relevant angle.

Stack Overflow's tagging model demonstrates this well. A question can have multiple tags: [python], [web-scraping], [beautifulsoup]. There is no hierarchy forcing the question into a single category. Tags are flat, combinable, and support multi-faceted discovery. A user searching "python web-scraping" finds all questions tagged with both, regardless of which tag was added first or considered "primary."
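Flat, combinable tags make this kind of query a simple set intersection. A toy sketch with invented entries:

```python
# Flat tag sets per entry, in the Stack Overflow style; no hierarchy.
questions = {
    "q1": {"python", "web-scraping", "beautifulsoup"},
    "q2": {"python", "pandas"},
    "q3": {"web-scraping", "javascript"},
}

def search(*tags):
    """Entries carrying every requested tag, regardless of order added."""
    wanted = set(tags)
    return sorted(q for q, qtags in questions.items() if wanted <= qtags)

print(search("python", "web-scraping"))  # ['q1']
print(search("python"))                  # ['q1', 'q2']
```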

Networks also support emergent structure. In a hierarchy, structure is imposed top-down. Someone designs the categories before any content exists. In a network, structure emerges from usage. If many entities link to the same organization, that organization becomes a hub. If a place is tagged in dozens of events, it becomes visible as a key community node. The graph reveals centrality, clustering, and patterns that a predefined hierarchy would obscure.

But networks are harder to browse than hierarchies. Without a single tree to navigate, users need other discovery mechanisms: search, tag filtering, graph visualization, or curated entry points. A well-designed knowledgebase balances both: a lightweight hierarchy (top-level domains like "organizations," "places," "events") for orientation, and a rich network of tags and links for depth.

Some systems combine hierarchical taxonomies with networked tags. A document might sit in a folder structure (hierarchy) but also carry tags and cross-references (network). The folder provides basic organization; the tags and links provide discoverability and context. This hybrid model is common in digital asset management and content management systems.

The key lesson: don't force community knowledge into rigid hierarchies. Communities are networks. Organizations collaborate. People belong to multiple groups. Events occur in overlapping domains. A knowledge graph that reflects this networked reality will serve users better than a forced tree structure.


53.5 Spatial Tags and Place

Community mapping knowledgebases are inherently spatial. Almost every entity — organizations, events, services, assets, needs — has a location. Spatial tags make that location queryable.

The simplest spatial tag is a place reference — linking an entity to a named location in the knowledgebase. An event links to "Riverside Community Centre." A service links to "Downtown Public Library." These references enable queries like "show me all events at this location" or "list all services in this neighborhood."

More powerful are coordinate-based tags — latitude and longitude that support precise spatial queries. With coordinates, you can query "all organizations within 2 km of this school," "events along this transit corridor," or "service gaps in this census tract." Coordinate tagging also enables map visualization, distance calculations, and proximity analysis.
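A "within 2 km" query is a distance calculation plus a filter. The sketch below uses the standard haversine formula; the coordinates are invented for illustration, and a real system would use a spatial index (such as PostGIS) rather than scanning every entry.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Invented coordinates for illustration.
orgs = {
    "org:foodbank": (43.6532, -79.3832),
    "org:clinic": (43.7000, -79.4700),
}
school = (43.6550, -79.3800)

within_2km = sorted(
    name for name, (lat, lon) in orgs.items()
    if haversine_km(school[0], school[1], lat, lon) <= 2.0
)
print(within_2km)  # ['org:foodbank']
```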

But coordinates alone are not enough. A point on a map doesn't tell you whether that location is a fixed site (a building) or a service area (a mobile clinic covering multiple neighborhoods). Spatial tags need context: Is this the headquarters? The primary service location? One of several sites? The administrative boundary? The service catchment area?

Some knowledgebases handle this with spatial relationship types:

  • "headquarters_at" (point)
  • "serves_area" (polygon or named region)
  • "operates_at" (list of points)
  • "covers_region" (administrative boundary reference)

Spatial tags also need temporal scope (see 53.6). An organization's location in 2024 might not be its location in 2020. A farmers' market occurs at a specific place, but only seasonally. Spatial + temporal tagging together capture the full context.

Place-based tagging supports map-first navigation. Instead of searching text or browsing categories, users start with a map. They zoom to their neighborhood and see what's there: organizations, events, services, assets. This is intuitive for community members, planners, and advocates. The map becomes the interface, and tags become the query language.

Finally, spatial tags enable gap analysis. If every organization in a category has coordinates, you can visualize their distribution. Clustering reveals where services are concentrated. Empty areas reveal gaps. Overlay demographic data (Chapter 52 covered this) and the map shows not just where services are, but where they are needed most.

Practical implementation note: not every entity needs coordinates. A policy document doesn't have a location. An advocacy coalition might operate city-wide. Forcing spatial tags where they don't apply creates clutter. Tag what is meaningfully spatial; leave the rest untagged.


53.6 Temporal Tags

Time is as important as place in community knowledge. An organization's funding status changes. A service opens, closes, or shifts hours. An event happens once, or recurs. A document describes a moment in time, but may be reviewed and updated.

Temporal tags capture this. At minimum, every knowledge entity should have:

  • Created date: When was this entry added to the knowledgebase?
  • Updated date: When was it last modified?
  • Valid-from / valid-to dates: When does this information apply? (For time-bound entities like events, contracts, or seasonal services.)

These tags support time-scoped queries. "Show me organizations that were active in 2020 but are no longer operating." "List events occurring in the next 30 days." "Find documents updated since my last visit."

Temporal tags also support version history and provenance (53.7). If an organization's entry has been updated five times, when did each change occur? Who made it? What was changed? Without temporal tracking, the knowledgebase becomes a snapshot with no memory of its evolution.

For recurring entities, temporal tagging gets more complex. A weekly food bank operates every Thursday. A monthly community meeting happens the first Tuesday of each month. A seasonal farmers' market runs May through October. These patterns require structured temporal metadata: recurrence rules, start/end dates, and exceptions (e.g., "closed on holidays").

Some systems use iCalendar recurrence rules (RRULE) for this. An event tagged with RRULE:FREQ=WEEKLY;BYDAY=TH recurs every Thursday. An event tagged with RRULE:FREQ=MONTHLY;BYDAY=1TU recurs on the first Tuesday of each month. This is a standardized, machine-readable format that calendaring systems already understand.
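As a stdlib-only illustration of the weekly case, recurrence expansion looks like the sketch below. Real deployments would parse full RRULE strings with an iCalendar library rather than hand-rolling the logic, since RRULE also covers monthly patterns, exceptions, and end conditions.

```python
from datetime import date, timedelta

def weekly_occurrences(start, weekday, count):
    """Expand a FREQ=WEEKLY;BYDAY=... style rule into concrete dates.
    weekday uses Python's convention: 0 = Monday ... 6 = Sunday."""
    # Advance to the first matching weekday on or after the start date.
    first = start + timedelta(days=(weekday - start.weekday()) % 7)
    return [first + timedelta(weeks=i) for i in range(count)]

# Every Thursday (weekday 3), four occurrences from 1 May 2025.
dates = weekly_occurrences(date(2025, 5, 1), 3, 4)
print(dates[0], dates[-1])  # 2025-05-01 2025-05-22
```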

Temporal tags also enable decay detection. If a service listing hasn't been validated in two years, it is probably outdated. A query can flag all entities with "last_validated" dates older than a threshold, guiding maintenance work (53.9).
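Decay detection is a date comparison over the whole collection. A sketch with invented entries; the two-year threshold is a policy choice, not a standard:

```python
from datetime import date, timedelta

# Hypothetical last-validated dates per entity.
last_validated = {
    "org:foodbank": date(2025, 3, 1),
    "org:clinic": date(2022, 6, 15),
}

def stale_entries(validated, today, max_age_days=730):
    """Entities not validated within the threshold (default: ~2 years)."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, checked in validated.items() if checked < cutoff)

print(stale_entries(last_validated, date(2025, 6, 1)))  # ['org:clinic']
```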

Finally, temporal tags support historical research and longitudinal analysis. A knowledgebase tracking a community over decades becomes a historical record. Researchers can query "how many youth programs existed in 2015 vs 2025?" or "which organizations survived the pandemic, and which did not?" This requires consistent temporal tagging from the start.


53.7 Provenance Tags

Provenance is the answer to three questions: Who entered this? When? From what source?

Without provenance, knowledge entries are orphaned. You see a claim that an organization serves 500 clients annually. Is that self-reported? From a funder's audit? From a news article? Who added it to the knowledgebase? When? Can you trust it?

Provenance tags make knowledge accountable and verifiable. At minimum, every entry should include:

  • Author: Who created this entry? (Person, organization, or system.)
  • Source: Where did this information come from? (Interview, website, government registry, annual report.)
  • Date added: When was this entry created?
  • Confidence level: How certain is this information? (High, medium, low, unverified.)

Some knowledgebases add:

  • Validator: Who reviewed and confirmed this entry?
  • Date validated: When was it last checked?
  • Validation method: How was it confirmed? (Site visit, phone call, cross-reference with official data.)
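The minimum and extended fields above can be captured in a single record type. A sketch; the field names and values are illustrative, not a prescribed schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Provenance:
    """Minimal provenance record; field names are illustrative."""
    author: str
    source: str
    date_added: str                  # ISO 8601 date string
    confidence: str = "unverified"   # high / medium / low / unverified
    validator: Optional[str] = None
    date_validated: Optional[str] = None
    validation_method: Optional[str] = None

entry = Provenance(
    author="j.rivera",
    source="site visit",
    date_added="2025-04-02",
    confidence="high",
    validator="m.chen",
    date_validated="2025-04-10",
    validation_method="phone call",
)
print(entry.confidence)  # high
```

Defaulting confidence to "unverified" is deliberate: a new entry should have to earn a higher rating through validation, not start with one.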

Provenance is what makes a community knowledgebase trustworthy. A knowledgebase without provenance is just a collection of claims. A knowledgebase with provenance is an auditable, improvable record.

Provenance also supports quality filtering. A query can prioritize entries with high confidence, recent validation dates, and trusted sources. Entries flagged as "unverified" can be surfaced for review. Entries sourced from official registries can be treated as authoritative; entries sourced from community tips can be flagged for follow-up.

Provenance tags also support credit and accountability. If a community organization contributed knowledge — conducting interviews, validating data, or documenting local assets — provenance records their contribution. This is both ethical (credit where credit is due) and practical (if questions arise, you know who to ask).

In collaborative knowledgebases, provenance prevents edit wars and disputes. If two contributors disagree about an organization's service area, the provenance record shows the sources each relied on. This shifts the conversation from "I think X" vs "I think Y" to "Source A says X; Source B says Y; how do we resolve this?"

Finally, provenance enables change tracking over time. When an organization's entry is updated, the old version is archived with its provenance. The new version has its own provenance. Researchers can trace the history: what changed, when, why, and based on what source. This is essential for longitudinal research and knowledgebase integrity.


53.8 AI-Assisted Tagging Done Right

AI can accelerate tagging work — extracting topics from text, suggesting categories, identifying named entities (people, places, organizations), and proposing cross-references. But AI-assisted tagging must be human-supervised to avoid tag drift, hallucinations, and loss of quality.

The safe workflow is:

  1. AI suggests tags based on document content, entity extraction, or similarity to existing entries.
  2. Human reviews suggestions — approving, rejecting, or editing before they are committed.
  3. Approved tags are logged with provenance — marked as "AI-suggested, human-approved" with reviewer name and date.

This keeps the human in the loop. AI is a tool for acceleration, not autonomy.
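The suggest-review-commit loop can be sketched as a small review queue. Everything here is invented for illustration; the point is that every decision, including rejections, is recorded with reviewer and date:

```python
# Hypothetical queue of AI-suggested tags awaiting human review.
suggestions = [
    {"entry": "doc:a", "tag": "food access", "suggested_by": "ai-model"},
    {"entry": "doc:a", "tag": "sports", "suggested_by": "ai-model"},
]

def review(suggestion, approved, reviewer, reviewed_on):
    """Record the human decision, keeping provenance either way."""
    return {
        **suggestion,
        "status": "approved" if approved else "rejected",
        "provenance": "AI-suggested, human-reviewed",
        "reviewer": reviewer,
        "reviewed_on": reviewed_on,
    }

decisions = [
    review(suggestions[0], True, "m.chen", "2025-04-10"),
    review(suggestions[1], False, "m.chen", "2025-04-10"),
]

# Only approved suggestions are committed to the knowledgebase.
committed = [d["tag"] for d in decisions if d["status"] == "approved"]
print(committed)  # ['food access']
```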

AI is particularly good at entity extraction. A document mentions "Riverside Community Centre" and "Youth Empowerment Collective." An AI model trained on the knowledgebase can recognize these as known entities and suggest links. The human reviewer confirms that the references are correct and meaningful, then commits the links.

AI can also suggest topic tags by analyzing text. A document about food security might be auto-tagged "food access," "poverty," "health equity." The human reviewer checks: Are these tags accurate? Are they in the controlled vocabulary? Are they the most relevant tags, or did the AI pick up incidental mentions?

The failure mode is unchecked AI tagging. If AI suggestions are auto-committed without review, errors accumulate. The AI hallucinates an organization name that doesn't exist, and it gets tagged. The AI misinterprets a metaphorical use of "network" as referring to a social network, and tags it incorrectly. The AI suggests a deprecated tag because its training data was outdated. Over time, the knowledgebase degrades.

This is tag drift — the gradual decay of tagging quality when discipline breaks down. AI-assisted tagging without human oversight accelerates drift.

Another risk is over-tagging. AI might suggest ten tags for a document when three are sufficient. More tags are not always better. Too many tags dilute meaning and make filtering harder. Human reviewers must prune AI suggestions to the most relevant.

Finally, AI tagging must respect provenance (53.7). If an AI suggests a cross-reference, that suggestion is logged: "AI-extracted reference to Organization X, reviewed and approved by [Reviewer], [Date]." If the suggestion turns out to be wrong later, the provenance record shows how the error was introduced and who reviewed it.

Done right, AI-assisted tagging is a powerful multiplier. It surfaces patterns humans might miss, speeds up tedious work, and frees contributors to focus on validation and quality. Done carelessly, it is a quality liability.


53.9 Tag Maintenance Over Time

Tags are not static. Controlled vocabularies evolve. Tags become obsolete. New domains emerge. Tag soup accumulates when contributors ignore the vocabulary or invent their own.

Maintenance strategies include:

Regular tag audits (53.11 covers this in detail). Periodically, someone queries the knowledgebase for tag usage. Which tags are used most? Which are used rarely? Are there duplicates or near-duplicates? Are there free-text tags that should be formalized into the controlled vocabulary?

Tag consolidation. If "healthcare," "health," and "public health" are all in use but mean the same thing in this context, consolidate them. Choose the canonical term, update all entries, and deprecate the others.

Tag deprecation. Some tags outlive their usefulness. A "COVID-19 response" tag might be retired after the acute phase ends, with entries retagged to broader categories like "public health crisis" or archived.

Vocabulary expansion. When new topics emerge repeatedly in free-text tags, formalize them. If contributors keep tagging entries "climate adaptation" but it's not in the controlled vocabulary, add it.

Automated tag suggestions for review. If an entry has no topic tags, or its tags are all low-confidence, flag it for manual review. Queries can surface these gaps systematically.

Tag usage metrics. Track which tags are applied to how many entries. A tag used on only two entries might be too specific. A tag used on 500 entries might need to be split into sub-categories.
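Usage metrics fall out of a simple count over all entries. A sketch with an invented sample:

```python
from collections import Counter

# Illustrative sample of tags applied across entries.
tags_per_entry = {
    "doc:a": ["health", "youth services"],
    "doc:b": ["health", "housing"],
    "doc:c": ["healthcare"],
}

usage = Counter(tag for tags in tags_per_entry.values() for tag in tags)

# Rarely used tags are consolidation candidates ('healthcare' here
# duplicates 'health'); heavily used ones may need splitting.
rare = sorted(tag for tag, n in usage.items() if n == 1)
print(usage.most_common(1))  # [('health', 2)]
print(rare)                  # ['healthcare', 'housing', 'youth services']
```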

Maintenance also requires governance (Chapter 35). Who is responsible for tag audits? How often do they happen? Who has authority to consolidate, deprecate, or add tags? Without clear ownership, maintenance doesn't happen.

Tag maintenance is unglamorous but essential. It is the difference between a knowledgebase that remains useful over years and one that devolves into an unusable mess.


53.10 Synthesis and Implications

This chapter has traced how tagging, linking, and graph structures transform isolated data into connected, queryable knowledge. The core insight is simple: knowledge is relational. A fact in isolation has limited value. A fact embedded in a network of related facts — cross-referenced, tagged, time-stamped, sourced, and validated — becomes useful intelligence.

The practical implications:

  1. Invest in tagging discipline early. Controlled vocabularies, tag governance, and provenance tracking are easier to establish at the start than to retrofit later. A small knowledgebase with messy tags becomes a large knowledgebase with unusable tags.

  2. Design for networks, not hierarchies. Communities are not trees. Organizations collaborate across domains. Events span multiple categories. A knowledge graph that reflects this networked reality will serve users better than rigid folder structures.

  3. Make provenance non-negotiable. Every entry should answer: Who added this? When? From what source? This is what makes community knowledge trustworthy and improvable.

  4. Use AI to assist, not replace, human judgment. AI can suggest tags, extract entities, and surface patterns. But humans must review, approve, and maintain quality. Unchecked AI tagging is a path to tag drift and knowledge decay.

  5. Build maintenance into the process. Tags degrade without care. Regular audits, tag consolidation, and vocabulary updates must be planned work, not afterthoughts.

  6. Combine spatial, temporal, and categorical tagging. Community knowledge lives in place and time. An organization exists at an address, but also in a service area, a time period, and a category. Full context requires all three dimensions.

The broader implication connects to Chapter 52's discussion of knowledgebase architecture. A well-tagged, well-linked knowledge graph is not just a technical achievement — it is a governance and coordination tool. It makes community knowledge visible, queryable, and improvable. It supports research, planning, advocacy, and coordination. It ensures that the knowledge captured is not lost when individual contributors move on.

But it requires commitment. Tagging and linking are detail work. They don't have the visible impact of a published report or a public map. Yet without them, the knowledgebase becomes a pile of documents rather than a coherent intelligence system.


53.11 Tag Audit Workshop

Purpose: This exercise teaches you to audit an existing tag system, identify quality issues, and propose improvements — the core maintenance skill for sustaining a knowledge graph over time.

Materials Needed:

  • Access to a knowledgebase or document collection with existing tags (can be simulated with a spreadsheet)
  • Spreadsheet or database tool for tag analysis
  • Controlled vocabulary document (if one exists)

Steps:

  1. Export tag usage data. Generate a list of all tags currently in use, with counts showing how many entries use each tag. If your system doesn't support this, manually compile a sample of 50-100 entries.

  2. Identify duplicates and near-duplicates. Look for tags that are semantically similar but spelled differently (e.g., "youth," "youth services," "young people"). List consolidation candidates.

  3. Check vocabulary compliance. If a controlled vocabulary exists, identify tags that are not in the approved list. Are these legitimate gaps in the vocabulary, or are they contributor errors?

  4. Assess tag specificity. Are there tags that are too broad (e.g., "community") or too narrow (used on only one or two entries)? Propose splits or consolidations.

  5. Review temporal relevance. Are there tags tied to specific events or time periods that are no longer active (e.g., "pandemic response," "2020 election")? Should these be archived, consolidated, or kept active?

  6. Check for orphaned tags. Are there tags with zero usage (perhaps deprecated but not removed)? Propose cleanup.

  7. Propose vocabulary updates. Based on your audit, recommend:

    • Tags to consolidate
    • Tags to deprecate
    • New tags to add to the controlled vocabulary
    • Tags that need clearer definitions or usage guidance
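Step 2's hunt for near-duplicates can be partially automated with a string-similarity pass before manual review. A rough sketch using Python's difflib; the 0.5 threshold is a starting point to tune against your own tag list, not a standard, and spelling similarity will miss semantic duplicates like "health" vs "medical services":

```python
from difflib import SequenceMatcher
from itertools import combinations

tags = ["youth", "youth services", "young people", "housing"]

def near_duplicates(tags, threshold=0.5):
    """Tag pairs whose spelling similarity meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(tags, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]

print(near_duplicates(tags))  # [('youth', 'youth services')]
```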

Deliverable: A 2-3 page tag audit report with:

  • Summary statistics (total tags, most-used, least-used, duplicates found)
  • List of recommended consolidations, deprecations, and additions
  • Proposed updates to the controlled vocabulary

Time Estimate: 2-3 hours

Safety and Ethics Notes: If auditing a real community knowledgebase, do not unilaterally change tags without consulting the governance process (Chapter 35). This exercise is diagnostic; actual tag changes require approval.


Key Takeaways

  • A knowledge graph transforms isolated data into connected, queryable knowledge by linking entities through typed relationships and structured tags.
  • Tagging discipline — controlled vocabularies, governance, and provenance — prevents tag soup and ensures long-term discoverability.
  • Spatial, temporal, and provenance tags provide the context necessary to make community knowledge trustworthy and useful.
  • AI-assisted tagging can accelerate work, but human review is essential to prevent tag drift and maintain quality.
  • Tag maintenance — audits, consolidation, deprecation, and vocabulary updates — is unglamorous but essential to sustaining knowledgebase quality.
  • Networks, not rigid hierarchies, are the natural structure for community knowledge.

Recommended Further Reading

Foundational:

  • Berners-Lee, T., Hendler, J., & Lassila, O. (2001). "The Semantic Web." Scientific American, 284(5), 34-43. (Foundational vision of linked data.)
  • Suggested: Introductory texts on knowledge graphs, linked data principles, and graph databases.

Academic Research:

  • Suggested: Research on Wikidata's property-item model, Stack Overflow's tagging and reputation systems, and graph-based knowledge representation in library and information science.

Practical Guides:

  • Suggested: Best-practice guides for taxonomy design, controlled vocabulary maintenance, and metadata standards (e.g., Dublin Core for digital libraries).

Case Studies:

  • Suggested: Case studies of real-world knowledge graph implementations — e.g., Google Knowledge Graph, Wikidata curation workflows, Obsidian or Roam Research for personal knowledge management, and enterprise knowledge management systems.

Plain-Language Summary

Tags and links turn a pile of files into connected knowledge. A tag says what something is — a topic, a place, a time period, or who created it. A link connects one thing to another — this document describes that organization, this event happens at that location. Together, tags and links create a knowledge graph: a web of information you can explore, search, and analyze in powerful ways.

Good tagging requires discipline. You need a list of approved tags so everyone uses the same words. You need to track where information came from — who added it, when, and from what source. You need to update tags over time as things change. Without this care, tags become a mess, and the knowledgebase stops being useful.

AI can help by suggesting tags, but humans need to review the suggestions. AI makes mistakes, and unchecked errors pile up fast. Done right, AI speeds up the work. Done wrong, it creates more problems than it solves.

The payoff is a knowledgebase that gets smarter as it grows. You can ask complex questions — "show me youth programs near this school run by trusted organizations" — and get answers. You can see patterns across hundreds of entries that no human could track manually. You can trust what you find because every entry shows where it came from and when it was checked.


End of Chapter 53.