📖 Risk vs Issue — Critical Distinction

🇺🇸 English

Risk: An uncertain event or condition that may occur in the future. Managed proactively. Lives in the Risk Register.

Issue: A problem or obstacle that is currently affecting the project. Requires immediate action. Lives in the Issue Log.

Relationship: When a risk occurs, it becomes an issue. When an issue is resolved, its resolution may create new risks. The PM manages both simultaneously.

Root Cause Analysis (RCA): Identifying the fundamental cause of an issue, not just the symptom. Tools: 5 Whys, Fishbone diagram. Fix root cause to prevent recurrence.

Reference: PMI — Issue Management

🇻🇳 Tiếng Việt

Rủi ro (Risk): Sự kiện không chắc chắn có thể xảy ra trong tương lai. Quản lý proactively. Ghi trong Risk Register.

Vấn đề (Issue): Sự cố hoặc trở ngại đang ảnh hưởng đến dự án ngay bây giờ. Cần hành động ngay. Ghi trong Issue Log.

Mối quan hệ: Khi rủi ro xảy ra, nó trở thành vấn đề. PM quản lý cả hai đồng thời.

Root Cause Analysis (RCA): Xác định nguyên nhân gốc của vấn đề, không chỉ triệu chứng. Công cụ: 5 Whys, Fishbone diagram.

🔧 Issue Management Process

Issue Log & Management
ID   | Description                   | Impact | Priority | Owner | Due | Status    | Resolution
I-01 | Partner Bank A sandbox down   | HIGH   | P1       | PM    | D+2 | ACTIVE    | Mock server built; partner escalated
I-02 | Dev environment network issue | MED    | P2       | Tech  | D+1 | RESOLVED  | IT fixed firewall rule; post-mortem done
I-03 | PH team member visa delay     | MED    | P2       | PM    | D+5 | ACTIVE    | Reassigning tasks; remote onboarding
I-04 | Budget overrun in Sprint 8    | HIGH   | P1       | PM    | D+3 | ESCALATED | Escalated to sponsor; analysis in progress

── PRIORITY LEVELS ─────────────────────────────────────
P1 (Critical): Blocks delivery, immediate action, escalate if not resolved in 24h
P2 (High):     Impacts quality/schedule, resolve within 3 days
P3 (Medium):   Workaround available, resolve within sprint
P4 (Low):      Minor impact, resolve at next opportunity

── 5 WHYS EXAMPLE (Issue I-04: Budget overrun) ──────────
Why #1: Sprint 8 cost 20% over budget
Why #2: Team spent more hours than estimated
Why #3: API integration testing took 3x longer than planned
Why #4: Partner sandbox had intermittent failures requiring repeated testing
Why #5: No SLA for sandbox uptime was included in partner agreement
→ Root Cause: Inadequate SLA definition for partner sandbox environment
Fix: Negotiate sandbox SLA; add uptime clause to all future partner contracts
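
The priority windows above lend themselves to an automated overdue check. What follows is a minimal Python sketch under stated assumptions, not a feature of Jira or a PMI-defined procedure: `LoggedIssue`, `is_overdue`, and the `opened_at` field are hypothetical names introduced for illustration. Only P1 (24h) and P2 (3 days) get a deadline, because the text gives no fixed duration for P3 ("within sprint") or P4 ("next opportunity").

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Resolution windows copied from the priority levels above.
# P3 and P4 are omitted on purpose: the source gives them no fixed duration.
RESOLUTION_WINDOW = {
    "P1": timedelta(hours=24),
    "P2": timedelta(days=3),
}

@dataclass
class LoggedIssue:
    issue_id: str
    priority: str          # "P1" .. "P4"
    opened_at: datetime    # assumed field; the sample log only shows relative due dates
    status: str            # "ACTIVE", "ESCALATED" or "RESOLVED"

def is_overdue(issue: LoggedIssue, now: datetime) -> bool:
    """Return True when an active issue has outlived its resolution window."""
    window = RESOLUTION_WINDOW.get(issue.priority)
    if window is None or issue.status != "ACTIVE":
        return False
    return now - issue.opened_at > window
```

For a P1 issue, an overdue result corresponds to the "escalate if not resolved in 24h" rule; for P2 it only signals that the 3-day target was missed.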

Issue Resolution Strategies

Strategy               | When to Use                                        | Example
Resolve directly       | PM has authority and resources to fix              | Team member conflict resolved through mediation
Implement workaround   | Can't fix root cause now, need to continue         | Partner API down → build mock server to unblock dev
Escalate               | Beyond PM authority, needs organizational decision | Budget overrun → escalate to sponsor for approval
Accept impact          | Issue cannot be resolved, adjust baselines         | Key developer sick for 2 weeks → revise schedule baseline
Root cause elimination | Prevent recurrence                                 | Fix the process that allowed the issue to arise
Exam Tips — Issue Management
  • Risk ≠ Issue: Risk = future, uncertain. Issue = present, happening now. This distinction is tested frequently.
  • When a risk becomes an issue: move it from risk register → issue log. Update the risk entry to show it has occurred (or close it), open an issue entry, and carry out the planned risk response.
  • Always log issues — even minor ones. The log provides traceability for lessons learned and audits.
  • Root cause analysis: fix the CAUSE, not just the symptom. 5 Whys is the simplest exam-acceptable method.
  • Unresolved issues should be escalated, not ignored. Escalation is a professional responsibility.
  • In Agile: issues = impediments. Removing impediments to the team's progress is a core Scrum Master responsibility.
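
The register-to-log handoff described in the tips above can be sketched as a small data operation. This is an illustrative Python sketch only: `RiskEntry`, `IssueEntry`, `promote_risk_to_issue`, the `OCCURRED` status, and the ID scheme are all hypothetical choices, not fields mandated by PMI or any tool.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RiskEntry:
    risk_id: str
    description: str
    response_plan: str
    status: str = "OPEN"

@dataclass
class IssueEntry:
    issue_id: str
    description: str
    source_risk_id: str      # keeps traceability back to the risk register
    planned_response: str
    status: str = "ACTIVE"

def promote_risk_to_issue(risk_register: Dict[str, RiskEntry],
                          issue_log: List[IssueEntry],
                          risk_id: str) -> IssueEntry:
    """Mark a risk as occurred and open a matching issue in the log."""
    risk = risk_register[risk_id]
    risk.status = "OCCURRED"  # the entry stays in the register for audits
    issue = IssueEntry(
        issue_id=f"I-{len(issue_log) + 1:02d}",  # assumed sequential numbering
        description=risk.description,
        source_risk_id=risk.risk_id,
        planned_response=risk.response_plan,
    )
    issue_log.append(issue)
    return issue
```

Carrying `response_plan` over to the new issue reflects the point that the previously prepared risk response is what the PM executes once the risk has occurred.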

💼 In Practice / Scenario

FinTech Company X — Issue Resolution Under Pressure

Situation: Two days before Sprint 10 review with Bank Partner A. A critical bug is discovered: loan applications submitted between 11pm-12am are failing silently (no error shown to user, no application saved). Estimated 47 affected applications over 5 days.

Immediate response (Issue I-07, P1):

  1. Log: I-07 created in Jira with severity Critical, P1. Owner: Tech Lead + PM.
  2. Workaround: Temporarily disable 11pm-12am submission window (maintenance page). This stops new occurrences within 30 minutes of discovery.
  3. Communication: PM notifies partner PM and sponsor within 1 hour with: what happened, what data is affected, immediate action taken, resolution timeline.
  4. Root cause (5 Whys): The technical cause is a timezone handling bug: the server uses UTC while the application form uses local time, so the midnight crossover produces an invalid timestamp that fails a DB constraint silently. The root cause is a process gap: the test suite had no timezone normalization test.
  5. Fix: Timezone normalization fix deployed in 4 hours. Affected applications manually recovered. Automated timezone test added to test suite (prevents recurrence).
  6. Post-mortem: RCA documented. Lessons learned: all time-sensitive operations need timezone test coverage.
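
The kind of timezone normalization test named in steps 4 to 6 might look like the sketch below. The scenario does not say which zone the form uses or how the real fix was written, so everything here is an assumption for illustration: the fixed UTC+7 offset, the `to_utc` helper, and the test name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the form's local zone is a fixed UTC+7 offset.
LOCAL_TZ = timezone(timedelta(hours=7))

def to_utc(local_naive: datetime, tz: timezone = LOCAL_TZ) -> datetime:
    """Attach the form's zone to a naive local timestamp and convert it to UTC."""
    if local_naive.tzinfo is not None:
        raise ValueError("expected a naive local timestamp")
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)

def test_late_evening_submission_is_normalized():
    submitted = datetime(2024, 3, 10, 23, 30)  # 11:30pm local time
    # With UTC+7 the stored instant falls on the same calendar day in UTC.
    assert to_utc(submitted) == datetime(2024, 3, 10, 16, 30, tzinfo=timezone.utc)
    # With a zone behind UTC the same wall-clock time crosses midnight in UTC.
    behind_utc = timezone(timedelta(hours=-1))
    assert to_utc(submitted, behind_utc) == datetime(2024, 3, 11, 0, 30, tzinfo=timezone.utc)
```

Rejecting timestamps that already carry a zone keeps the helper from silently converting a value twice, which is one way such bugs stay hidden.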

PMP lesson: For P1 issues, response is measured in hours, not days. Workaround first (stop the bleeding), then fix the root cause, then prevent recurrence. Transparency with stakeholders is non-negotiable.

✏️ Practice Questions

Question 1
A project team member reports that a key vendor is two weeks late delivering a component. The PM had identified this as a risk earlier. What should the PM do FIRST?
  • A. Create a new risk entry in the risk register
  • B. Close the risk entry and open a new issue in the issue log, then implement the risk response plan
  • C. Escalate directly to the sponsor since the vendor is late
  • D. Wait another week to see if the vendor catches up before taking action
✅ Answer: B — When a risk event occurs, the risk becomes an issue. The PM should: (1) close or update the risk register entry to show it has occurred, (2) open an issue in the issue log, and (3) implement the previously prepared risk response plan. Option A is wrong (it was already a risk). Option C is premature — implement the response plan first. Option D delays resolution unnecessarily when a planned response already exists.
Question 2
Three bugs have been found in production after launch: Bug A (critical, crashes for 5% of users), Bug B (high, wrong calculation in reports), Bug C (low, UI alignment issue). How should the PM prioritize resolution?
  • A. Resolve in order they were found (A, B, C)
  • B. Bug A first (critical user impact), then B (business accuracy), then C (cosmetic)
  • C. Resolve all simultaneously with the full team
  • D. Resolve C first since it's quickest and clears the list
✅ Answer: B — Issues should be prioritized by business impact and severity, not discovery order or effort level. Bug A is critical because it causes crashes for real users — immediate business harm. Bug B affects data accuracy in reports — financial and compliance risk. Bug C is cosmetic — low business impact. Resolving the easiest item first (D) is a common trap: "quick wins" that ignore urgency create the illusion of progress while critical issues fester. At FinTech Company X, a wrong calculation in reports (Bug B) could also carry regulatory risk, so context matters.
Question 3
The PM's root cause analysis reveals that multiple issues this sprint were caused by insufficient code review — developers were skipping reviews to meet sprint commitments. This is BEST addressed by:
  • A. Removing the code review requirement to reduce pressure
  • B. Addressing the underlying process problem: update Definition of Done to require code review, and have a team discussion about quality vs. speed trade-offs
  • C. Reporting the developers to HR
  • D. Adding more testers to catch errors downstream
✅ Answer: B — Root cause analysis points to a process gap (code review not enforced) and a team norm problem (quality vs. speed trade-off). The fix is to address the root cause: reinforce the Definition of Done to explicitly require code review, and facilitate a team retrospective to align on quality standards and realistic sprint commitments. Removing the requirement (A) accepts technical debt. Adding testers (D) is a downstream band-aid that doesn't prevent defects from being written. HR escalation (C) is disproportionate for a process/norms issue.

🤖 AI Tools for PMs

How AI Augments This Process

AI helps PMs analyze issue root causes, generate structured resolution plans, draft escalation communications, and identify recurring issue patterns across sprints.

Sample Claude Prompts

Issue resolution plan generation

I have a project issue that needs a structured resolution plan.

Issue: [describe the problem: specific, observable]
Severity: [Critical / High / Medium / Low]
Impact: [what is blocked or degraded]
Discovery: [how and when it was found]
Duration: [how long it has existed]
Stakeholders affected: [who is impacted]
What's been tried: [attempts to resolve]
Root cause hypothesis: [what I think caused it]

Generate a structured Issue Resolution Plan:
1. Root cause confirmation approach (how to validate the hypothesis)
2. Immediate containment actions (stop the bleeding now)
3. Short-term corrective action (fix the issue within [X] days)
4. Long-term preventive action (prevent recurrence)
5. Owner assignments for each action
6. Communication plan: who needs to know, how, by when
7. Resolution verification criteria (how do we know it's actually fixed?)
8. Issue closure checklist

Sprint issue pattern analysis

I want to analyze recurring issues across my last [N] sprints to find systemic problems.

Issue data:
Sprint 1 issues: [list or describe]
Sprint 2 issues: [list or describe]
Sprint 3 issues: [list or describe]
[continue]

Issue types (if categorized): [technical debt / dependency / process / communication / quality / resource]
Average resolution time: [days]
Escaped to production: [count]

Analyze and:
1. Identify the top 3 recurring issue themes
2. Categorize issues by root cause type (people / process / technology / external)
3. Calculate the "issue cost" (sprint capacity consumed by issue resolution)
4. Identify which issues signal systemic risk (not just one-offs)
5. Recommend process changes to reduce issue frequency by category
6. Suggest 2 items to add to the Definition of Done to catch issues earlier

Escalation decision and message

I have an issue that may need to be escalated. Help me decide and prepare.

Issue: [description]
My attempts to resolve: [what I've tried]
Blockers to resolution at my level: [what I can't fix myself]
Time sensitivity: [how urgent is this]
Business impact if unresolved: [cost in time, money, or risk]
Who I'm considering escalating to: [role/name]
My concern about escalating: [worried about looking incompetent? political risk? don't want to alarm stakeholders?]

Help me:
1. Decide: is escalation necessary? (criteria: impact, time, authority needed)
2. What level to escalate to
3. When to escalate (now vs. give it 24-48h more)
4. Draft the escalation message (email or Slack): professional, concise, solution-oriented
5. What NOT to do (common escalation mistakes that damage credibility)
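
Two of the figures requested in the sprint issue pattern analysis prompt, recurring themes and "issue cost", can also be tallied locally before asking an AI tool to interpret them. The following is a rough Python sketch: the sample records, the 400-hour sprint capacity, and the function names are hypothetical values made up for illustration, not data from this guide.

```python
# Hypothetical sample records: (sprint number, issue category, hours spent resolving).
issues = [
    (1, "dependency", 6.0),
    (1, "process", 2.0),
    (2, "dependency", 8.0),
    (3, "dependency", 5.0),
    (3, "quality", 3.0),
]

SPRINT_CAPACITY_HOURS = 400.0  # assumed team capacity per sprint

def recurring_themes(records, min_sprints=2):
    """Return categories that show up in at least `min_sprints` different sprints."""
    sprints_by_category = {}
    for sprint, category, _hours in records:
        sprints_by_category.setdefault(category, set()).add(sprint)
    return sorted(c for c, s in sprints_by_category.items() if len(s) >= min_sprints)

def issue_cost(records, sprint, capacity=SPRINT_CAPACITY_HOURS):
    """Return the share of one sprint's capacity consumed by issue resolution."""
    spent = sum(hours for s, _category, hours in records if s == sprint)
    return spent / capacity
```

A category that recurs across sprints is a candidate for root cause analysis rather than another one-off fix, which is the distinction the prompt's item 4 asks about.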

Jira / Confluence Template

Jira — Issue Log
── JIRA: ISSUE LOG TEMPLATE ──────────────────────────────
Issue Type: Issue (distinct from Bug: a project-level problem)
Summary:    [ISSUE-###] [Category] - [Short description]
Priority:   Critical / High / Medium / Low
Labels:     issue | category:[dependency/process/resource/technical/external]

── ISSUE DESCRIPTION ─────────────────────────────────────
What happened: [Observable problem statement, specific]
Impact:        [What is blocked or degraded, in business terms]
Severity:
  [ ] Critical - stops work
  [ ] High - major impact
  [ ] Medium - workaround available
  [ ] Low - minor
First identified: [YYYY-MM-DD] by [name/role]
Duration:         [How long has this existed]

── ROOT CAUSE ────────────────────────────────────────────
Root cause: [5 Whys result or initial hypothesis]
Category:   [ ] Process  [ ] People  [ ] Technology  [ ] External

── RESOLUTION ────────────────────────────────────────────
Owner:       [Name / role]
Target date: [YYYY-MM-DD]
Actions:
  1. [Immediate action - owner - due]
  2. [Long-term fix - owner - due]
Status: [ ] Open  [ ] In Progress  [ ] Escalated  [ ] Resolved