Decision Analytics: Measuring Leadership Under Pressure
How After Action quantifies participant decision quality in crisis exercises
After Action | Version 1.0 | April 2026
Executive Summary
Crisis response isn't won by having the right playbook — it's won by the team executing under pressure. Two organizations with identical plans will produce dramatically different outcomes during an incident because of how their people think, coordinate, and commit to action when things are unclear and time is short.
Measuring this is hard. Traditional cyber assessments score controls and technology, not human behavior. Survey-based maturity assessments score policy documents. None of them capture whether the people actually responsible for responding can do it.
The After Action Decision Analytics Engine solves this. During every facilitated exercise, participants submit structured decisions (text + rationale + confidence). The engine analyzes those decisions across five dimensions — specificity, leadership, consistency, fatigue, and coordination — to produce behavioral profiles of individual participants and the team as a whole.
This whitepaper documents the methodology.
1. What Gets Captured
During a live exercise, each inject triggers a decision prompt. Participants type their response and provide:
- Decision text — free-form, what they would do
- Rationale — why (optional)
- Confidence — 1–5 scale, how certain they are
- Category — technical response / strategic decision / communication / escalation / recovery
- Impact areas — which business functions their decision affects
This data is richer than a questionnaire and messier than a log. It looks like real crisis decision-making because it is.
2. The Five Analysis Dimensions
2.1 Specificity Score (0–100)
What it measures: How detailed and action-oriented a decision is. A decision like "I would escalate this" scores low. A decision like "I would immediately notify our CISO and activate the IR team to isolate affected endpoints via EDR, while our comms lead drafts a holding statement for the CEO" scores high.
Algorithm:
specificity_score = 0
+15 if word_count >= 10
+15 if word_count >= 25
+10 if word_count >= 50
+20 if contains any action keyword
+15 if mentions any stakeholder
+10 if contains escalation language
+15 if rationale field is > 20 chars
capped at 100
Action keyword list: notify, escalate, isolate, contain, communicate, activate, deploy, implement, initiate, engage, brief, convene, document, preserve, assess, monitor, disconnect, block, investigate, restore
Stakeholder keyword list: CISO, CEO, board, legal, counsel, HR, media, regulator, customer, vendor, partner, team, management, executive
Escalation keyword list: escalate, priority, urgent, immediate, critical, alert
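The scoring steps above can be sketched in TypeScript. Names are illustrative rather than the shipped `decision-analytics.ts` API, and the keyword lists here are abbreviated versions of the dictionaries listed above:

```typescript
// Abbreviated dictionaries; the shipped lists are larger.
const ACTION = ["notify", "escalate", "isolate", "contain", "activate", "deploy"];
const STAKEHOLDER = ["ciso", "ceo", "board", "legal", "regulator", "vendor"];
const ESCALATION = ["escalate", "priority", "urgent", "immediate", "critical", "alert"];

// True if any keyword appears as a substring of the (lowercased) text.
function hasAny(text: string, keywords: string[]): boolean {
  return keywords.some((k) => text.indexOf(k) >= 0);
}

function specificityScore(text: string, rationale: string = ""): number {
  const lower = text.toLowerCase();
  const wordCount = lower.split(/\s+/).filter((w) => w.length > 0).length;
  let score = 0;
  if (wordCount >= 10) score += 15;
  if (wordCount >= 25) score += 15;
  if (wordCount >= 50) score += 10;
  if (hasAny(lower, ACTION)) score += 20;      // any action keyword
  if (hasAny(lower, STAKEHOLDER)) score += 15; // any stakeholder mention
  if (hasAny(lower, ESCALATION)) score += 10;  // escalation language
  if (rationale.length > 20) score += 15;      // substantive rationale
  return Math.min(score, 100);
}
```

With these abbreviated lists, "I would escalate this" scores 30 (action + escalation keywords only), while the detailed CISO/EDR decision quoted earlier scores 75, matching the low/high contrast described above.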
Why pattern matching, not NLP: The vocabulary of incident response is narrow. A dictionary of 30–50 terms covers 95% of real decisions. Using pattern matching is deterministic, fast, and auditable in a way that transformer-based NLP is not.
2.2 Leadership Score (0–100)
What it measures: How much a decision reflects leadership behavior vs. individual contributor behavior. Leaders coordinate, explain, and escalate. ICs act.
Formula:
leadership_score =
(action_rate × 25) // did they take action, not just observe?
+ (stakeholder_rate × 25) // did they coordinate with others?
+ (escalation_rate × 15) // did they escalate when warranted?
+ (rationale_rate × 15) // did they explain their reasoning?
+ (specificity_score/100 × 20) // how detailed?
Each *_rate is scored 0 or 1 per decision (behavior absent or present), then averaged across all of the participant's decisions, giving a value between 0 and 1.
A participant who writes "do X" on every decision will score high on action_rate but low on stakeholder_rate — they behave like an IC. A participant who writes "I'll notify X, then Y, because Z" on every decision will score high on all dimensions — they behave like a leader.
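A minimal sketch of the formula, assuming each decision has already been flagged by the keyword matchers (interface and function names are illustrative, not the shipped API):

```typescript
interface ScoredDecision {
  hasAction: boolean;      // matched an action keyword
  hasStakeholder: boolean; // mentioned a stakeholder
  hasEscalation: boolean;  // used escalation language
  rationale?: string;
  specificity: number;     // 0-100, from the specificity scorer
}

function leadershipScore(decisions: ScoredDecision[]): number {
  const n = decisions.length;
  // Fraction of decisions where a given behavior is present (0..1).
  const rate = (pick: (d: ScoredDecision) => boolean) =>
    decisions.filter(pick).length / n;
  const avgSpecificity = decisions.reduce((s, d) => s + d.specificity, 0) / n;
  return (
    rate((d) => d.hasAction) * 25 +
    rate((d) => d.hasStakeholder) * 25 +
    rate((d) => d.hasEscalation) * 15 +
    rate((d) => (d.rationale || "").length > 0) * 15 +
    (avgSpecificity / 100) * 20
  );
}
```

An IC-style decision set (action only, low specificity) lands in the 20s–30s; a leader-style set (action, stakeholders, escalation, rationale, high specificity) lands in the 90s.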
2.3 Consistency Score (0–100)
What it measures: How calibrated a participant's confidence is across decisions. Are they consistently confident, consistently uncertain, or wildly variable?
Formula:
stddev = standard deviation of (confidence values)
consistency_score = max(0, 100 - stddev × 20)
A participant with perfectly flat confidence (always 4) scores 100. A participant alternating between 2 and 5 (stddev 1.5) scores 70. A participant swinging between 1 and 5 (stddev 2, the largest spread possible on a 1–5 scale) scores 60, the practical floor.
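The formula in a few lines of TypeScript. The whitepaper does not specify population vs. sample standard deviation; this sketch assumes population stddev:

```typescript
// confidences: the participant's 1-5 ratings, one per decision.
function consistencyScore(confidences: number[]): number {
  const mean = confidences.reduce((a, b) => a + b, 0) / confidences.length;
  const variance =
    confidences.reduce((s, c) => s + Math.pow(c - mean, 2), 0) /
    confidences.length;
  return Math.max(0, 100 - Math.sqrt(variance) * 20);
}
```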
Why consistency matters: In real crisis response, leaders who are "confidently wrong" are worse than leaders who are "uncertainly right." A high-variance confidence profile suggests someone who isn't calibrated to the situation — a red flag.
2.4 Fatigue Indicator
What it measures: Whether response quality degrades over the course of an exercise.
Method:
sort decisions by inject order
compute time-between-decisions for each pair
fatigue_indicator = true IF:
mean(last 3 intervals) > 2 × mean(first 3 intervals)
If later decisions take twice as long as early decisions, the participant is fatiguing. This is the most predictive behavioral signal for real incident response failure — teams that degrade under cognitive load lose containment in real incidents.
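The check above can be sketched as follows. The minimum of six intervals before the rule fires is an assumption added here for safety; the whitepaper does not state one:

```typescript
// timestampsMinutes: decision submission times in inject order.
function fatigueIndicator(timestampsMinutes: number[]): boolean {
  // Time between consecutive decisions.
  const intervals = timestampsMinutes
    .slice(1)
    .map((t, i) => t - timestampsMinutes[i]);
  if (intervals.length < 6) return false; // too little data to judge
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  return mean(intervals.slice(-3)) > 2 * mean(intervals.slice(0, 3));
}
```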
2.5 Strength/Weakness Areas by Category
What it measures: Which inject categories a participant performs well in, and which they struggle with.
Method:
for each decision:
bucket by inject_category
track confidence and specificity per bucket
for each category:
avg_confidence = mean of bucket confidence values
avg_specificity = mean of bucket specificity scores
IF avg_confidence >= 4 AND avg_specificity >= 60:
mark as STRENGTH
ELIF avg_confidence < 3 OR avg_specificity < 40:
mark as IMPROVEMENT AREA
A participant might be a strength in technical response (high confidence, detailed decisions) but an improvement area in communications (low confidence, vague decisions). This lets facilitators provide targeted coaching.
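A sketch of the bucketing logic, using the thresholds given above (names are illustrative; a "neutral" bucket is assumed for categories that hit neither rule):

```typescript
type Verdict = "strength" | "improvement" | "neutral";

function categoryVerdicts(
  decisions: { category: string; confidence: number; specificity: number }[]
): { [category: string]: Verdict } {
  // Group confidence and specificity values by inject category.
  const buckets: { [c: string]: { conf: number[]; spec: number[] } } = {};
  for (const d of decisions) {
    const b = (buckets[d.category] = buckets[d.category] || { conf: [], spec: [] });
    b.conf.push(d.confidence);
    b.spec.push(d.specificity);
  }
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const out: { [c: string]: Verdict } = {};
  for (const cat in buckets) {
    const c = mean(buckets[cat].conf);
    const s = mean(buckets[cat].spec);
    out[cat] =
      c >= 4 && s >= 60 ? "strength" : c < 3 || s < 40 ? "improvement" : "neutral";
  }
  return out;
}
```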
3. Team-Level Analysis
Individual profiles are useful. Team-level patterns are where the real insight lives.
3.1 Confidence Trend by Inject Order
Tracks average team confidence across the sequence of injects:
for inject in sorted injects:
avg_confidence[inject.order] = mean of all participant confidences on that inject
A healthy team has a flat or slightly rising curve — they gain confidence as they work through the scenario. A struggling team has a steeply falling curve — they lose composure as pressure mounts.
Graph this trend and show it to the team. It's the single most persuasive visualization in a post-exercise debrief.
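The trend computation is a simple group-and-average; a sketch (field names assumed):

```typescript
// Average team confidence per inject, keyed by inject order.
function confidenceTrend(
  decisions: { injectOrder: number; confidence: number }[]
): { [order: number]: number } {
  const buckets: { [order: number]: number[] } = {};
  for (const d of decisions) {
    (buckets[d.injectOrder] = buckets[d.injectOrder] || []).push(d.confidence);
  }
  const trend: { [order: number]: number } = {};
  for (const order in buckets) {
    const vals = buckets[order];
    trend[order] = vals.reduce((a, b) => a + b, 0) / vals.length;
  }
  return trend;
}
```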
3.2 Category Performance
Per-category team averages:
category_performance[category] = {
decision_count: N,
avg_confidence: mean,
avg_quality: mean_specificity,
gap_correlation: how well the category decisions correlate with identified gaps
}
A team might be strong in technical response categories but weak in communications. This maps directly to the capability areas in the readiness score.
3.3 Team Alignment Score
What it measures: How tightly the team's decisions cluster around each other. High alignment = coherent response. Low alignment = siloed thinking.
Method: For each inject, compute the pairwise similarity of decisions (using shared keyword overlap). Average across all injects.
A team with consistently low alignment may have a communication problem — they're not talking to each other during the exercise, so they arrive at inconsistent conclusions.
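"Shared keyword overlap" can be read several ways; one reasonable sketch is Jaccard overlap of word sets per inject, scaled to 0–100 (the word-length filter and scaling here are assumptions, not the shipped formula):

```typescript
// Unique words of 4+ characters, as a crude keyword proxy.
function uniqueWords(text: string): string[] {
  const words = text.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Average pairwise Jaccard overlap of decisions on one inject, 0-100.
function injectAlignment(decisionTexts: string[]): number {
  const sets = decisionTexts.map(uniqueWords);
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < sets.length; i++) {
    for (let j = i + 1; j < sets.length; j++) {
      const shared = sets[i].filter((w) => sets[j].indexOf(w) >= 0).length;
      const union = sets[i].length + sets[j].length - shared;
      total += union === 0 ? 0 : shared / union;
      pairs++;
    }
  }
  return pairs === 0 ? 0 : (total / pairs) * 100;
}
```

Identical decisions score 100; decisions with no vocabulary in common score 0. The exercise-level alignment score is the mean of this value across injects.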
3.4 Decision Velocity
decision_velocity = {
avg_time_between_decisions_minutes: mean,
fastest_response_minutes: min,
slowest_response_minutes: max,
fatigue_indicator: boolean,
}
Slow decision velocity on the first few injects → team isn't warmed up. Slow velocity on the last few → fatigue. Both are actionable findings for the facilitator's debrief.
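The velocity fields reduce to interval statistics over submission times; a sketch (the fatigue flag would come from the section 2.4 check):

```typescript
// timestampsMinutes: one submission time per decision, in inject order.
function decisionVelocity(timestampsMinutes: number[]) {
  const intervals = timestampsMinutes
    .slice(1)
    .map((t, i) => t - timestampsMinutes[i]);
  const avg = intervals.reduce((a, b) => a + b, 0) / intervals.length;
  return {
    avg_time_between_decisions_minutes: avg,
    fastest_response_minutes: Math.min.apply(null, intervals),
    slowest_response_minutes: Math.max.apply(null, intervals),
  };
}
```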
4. Predictive Insights
The engine uses these metrics to generate predictive insights — rule-based classifications of patterns into four types:
4.1 Strength insights
Examples:
- "Strong leadership signal from Sarah Chen — leadership score 87, consistently high across technical and communications categories"
- "Team maintained confidence throughout the exercise — no fatigue indicator, sustained engagement"
4.2 Risk insights
Examples:
- "Team alignment dropped from 78 to 42 after inject 5 — the team is losing coordination under pressure"
- "Marcus Wright showed high-variance confidence (stddev 1.8) — may need additional playbook training before the next exercise"
4.3 Trend insights
Examples:
- "Average team confidence trended downward from 4.1 to 2.8 across the exercise — classic fatigue pattern"
- "Decision specificity improved in later injects — team found their rhythm"
4.4 Recommendation insights
Examples:
- "Schedule a communications-focused tabletop within 90 days — current category average is 47"
- "Pair participant X with participant Y for next exercise — complementary strengths"
Each insight has a confidence level (0–100) based on how many data points support it.
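One rule from the risk family might look like the following sketch. The 1.5 stddev threshold and the 10-points-per-decision confidence scaling are illustrative assumptions, not the shipped rule set:

```typescript
interface PredictiveInsight {
  type: "strength" | "risk" | "trend" | "recommendation";
  message: string;
  confidence: number; // 0-100, grows with supporting data points
}

// Hypothetical rule: flag high-variance confidence as a risk insight.
function varianceRiskInsight(
  name: string,
  confidences: number[]
): PredictiveInsight | null {
  const mean = confidences.reduce((a, b) => a + b, 0) / confidences.length;
  const stddev = Math.sqrt(
    confidences.reduce((s, c) => s + Math.pow(c - mean, 2), 0) /
      confidences.length
  );
  if (stddev <= 1.5) return null;
  return {
    type: "risk",
    message: name + " showed high-variance confidence (stddev " + stddev.toFixed(1) + ")",
    confidence: Math.min(100, confidences.length * 10), // more decisions, more support
  };
}
```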
5. Integration with Scoring
The decision analytics engine is tightly integrated with the readiness scoring engine:
- Confidence data feeds into the executive_alignment bonus (+5 points if avg confidence >= 70/100)
- Decision speed data feeds into the decision_speed capability area
- Category performance informs the "strength areas" section of the AAR
- Fatigue indicator is a direct input to the coaching insights in the post-exercise debrief
This integration means the analytics aren't a side channel — they're part of the main scoring model.
6. Privacy & Ethics
6.1 What's stored
- Decision text (the participant's own words)
- Rationale (optional)
- Confidence rating
- Category and impact areas (tagged by participant)
- Timestamp
6.2 What's not stored
- Audio recordings
- Facial analysis
- Typing biometrics
- Any data the participant didn't explicitly submit
6.3 Who can see individual decisions
- The facilitator (during and after the exercise)
- The participant themselves (via their post-exercise debrief screen)
- Org administrators (via the AAR)
Individual decisions are not exposed to other participants during the exercise to prevent groupthink. Only after the exercise, in the AAR, do participants see each other's decisions alongside the facilitator's observations.
6.4 Anonymization in exports
Any decision data included in the carrier field intake export is aggregated only — individual decision text is never shared outside the client's own org.
7. Why Not Use an LLM?
LLMs could analyze decision text with more linguistic nuance than pattern matching. Here is why we don't use one:
- Determinism — same input, same score, every time. Critical for audit.
- Speed — 50ms pattern match vs. 3–15s LLM call
- Cost — zero marginal cost per decision, vs. per-token LLM billing
- Privacy — decision text never leaves the platform; nothing sent to a vendor
- Auditability — every keyword in the dictionary is in git history
LLM augmentation is layered as an optional enhancement for AAR prose and coaching narratives, but the quantitative scoring is deterministic.
8. Implementation
The engine is a single 635-line file: src/lib/decision-analytics.ts. Public API:
analyzeDecisionQuality(text, rationale): DecisionQualityMetrics
buildParticipantProfile(id, decisions): ParticipantProfile
computeExerciseAnalytics(exerciseId, decisions): ExerciseAnalytics
generatePredictiveInsights(org, exercises): PredictiveInsight[]
aggregateQualityMetrics(metrics): DecisionQualityMetrics
Every function is pure. No DB, no LLM, no network. Fully tested (100+ unit tests in the repo).
9. Licensing
The engine's pattern dictionaries, scoring weights, and insight generation rules are proprietary trade secret. The source is in src/lib/decision-analytics.ts and licensed via licensing@afteraction.dev.
© 2024-2026 After Action. Decision analytics methodology is proprietary. Contact licensing@afteraction.dev for commercial terms.