Single Points of Failure¶
An analysis of the three critical components whose failure causes a major system breakdown.
Overview¶
A Single Point of Failure (SPOF) is a component whose failure causes an entire subsystem or the whole system to stop functioning.
Expensis has 3 SPOFs:
| Component | Impact | Affected Workflows |
|---|---|---|
| MessageConsumedHandler | ALL command chains break | 99% of workflows |
| SyncProfileHandler | ALL provider syncs stop | 100% of automated syncs |
| sync:manager cron | NO automated syncs run | 100% of automation |
SPOF #1: MessageConsumedHandler¶
What It Does¶
Enables ALL command chaining by listening for completed commands and automatically dispatching chained commands.
Location: src/MessageBus/MessageConsumedHandler.php
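use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\Messenger\Event\WorkerMessageHandledEvent;
use Symfony\Component\Messenger\MessageBusInterface;

class MessageConsumedHandler implements EventSubscriberInterface
{
    // Assumed: the bus is injected through the constructor (not shown in the original excerpt).
    public function __construct(private MessageBusInterface $messageBus)
    {
    }

    public static function getSubscribedEvents(): array
    {
        return [
            WorkerMessageHandledEvent::class => 'onWorkerMessageHandled',
        ];
    }

    public function onWorkerMessageHandled(WorkerMessageHandledEvent $event): void
    {
        $message = $event->getEnvelope()->getMessage();

        // Only chainable commands carry follow-up commands to dispatch.
        if ($message instanceof ChainableCommandInterface) {
            foreach ($message->getChain() as $chainedCommand) {
                $this->messageBus->dispatch($chainedCommand);
            }
        }
    }
}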
Impact if It Fails¶
graph LR
    A[Command A] --> B[Handler A completes]
    B --> C[Chain stops here]
    style C fill:#ff6b6b
Without MessageConsumedHandler:
- First command executes
- Handler completes
- Chained commands NEVER execute

Affected Workflows:
- KPN SP16 sync (4-command chain)
- Vodafone sync (2-command chain)
- T-Mobile sync (2-command chain)
- All provider syncs with totals
- Routit CDR processing (6-phase chain)
- Manual totals recalculation
- 99% of all workflows
Failure Symptoms¶
-- Syncs complete but totals not calculated
SELECT
    st.id,
    st.status AS sync_status,
    st.customer_id,
    t.id AS totals_id
FROM sync_task st
LEFT JOIN total_usage_new_table t
    ON st.customer_id = t.customer_id
    AND st.cycle_start_date = t.cycle_start_date
WHERE st.status = 'completed'
    AND st.created_at > NOW() - INTERVAL 1 HOUR
    AND t.id IS NULL;
-- If results found: chains not executing
Detection¶
# 1. Check if handler exists
ls -la src/MessageBus/MessageConsumedHandler.php
# 2. Verify event subscription
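# The subscriber registers under the event's class name, so pass that name as the event argument
php bin/console debug:event-dispatcher 'Symfony\Component\Messenger\Event\WorkerMessageHandledEvent'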
# Should show MessageConsumedHandler subscribed
# 3. Test chain execution
php bin/console totals:new --customer=123 --cycle=2025-01-01
-- Then check in the database whether a backup row was created (part of the chain)
SELECT * FROM total_usage_new_backup_table
WHERE customer_id = 123
AND cycle_start_date = '2025-01-01';
Recovery¶
# 1. Verify file exists and has no syntax errors
php -l src/MessageBus/MessageConsumedHandler.php
# 2. Check services configuration
grep -A 10 "MessageConsumedHandler" config/services.yaml
# 3. Clear Symfony cache
php bin/console cache:clear --env=prod
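# 4. Stop messenger workers gracefully; this command only signals them to exit,
#    so a process manager (e.g. Supervisor or systemd), if configured, must start fresh workers
php bin/console messenger:stop-workers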
# 5. Test execution
php bin/console totals:new --customer=123 --cycle=2025-01-01 -vvv
Prevention¶
- Monitoring: Alert if chains not executing
- Tests: Verify chain execution in CI/CD
- Backups: Keep backup of working file
- Documentation: This page!
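The "alert if chains not executing" item could be wired to the Failure Symptoms query. The following is a minimal sketch, not an existing Expensis script: `check_chain_health` is a hypothetical helper that takes the number of completed syncs without a totals row and converts it into a status line plus an exit code. The exit codes follow the common monitoring-plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN); the `mysql` invocation in the comment assumes a configured client and a database named `expensis`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the "completed syncs without totals" count to an alert status.
# Usage: check_chain_health COUNT
check_chain_health() {
    local orphaned="$1"

    # Reject empty or non-numeric input instead of guessing.
    case "$orphaned" in
        ''|*[!0-9]*)
            echo "UNKNOWN: expected a non-negative integer, got '$orphaned'"
            return 3
            ;;
    esac

    if [ "$orphaned" -gt 0 ]; then
        echo "CRITICAL: $orphaned completed syncs have no totals row"
        return 2
    fi

    echo "OK: every completed sync has a totals row"
    return 0
}

# Example wiring (assumption: mysql client configured, database named expensis):
#   count=$(mysql -N -B expensis -e "
#       SELECT COUNT(*) FROM sync_task st
#       LEFT JOIN total_usage_new_table t
#           ON st.customer_id = t.customer_id
#           AND st.cycle_start_date = t.cycle_start_date
#       WHERE st.status = 'completed'
#           AND st.created_at > NOW() - INTERVAL 1 HOUR
#           AND t.id IS NULL")
#   check_chain_health "$count"
```

A sync that completed moments ago may not have its totals yet, so an alerting rule would probably want the check to fail on two consecutive runs before paging anyone.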
SPOF #2: SyncProfileHandler¶
What It Does¶
Master router: maps every SyncProfileCommand to the provider-specific sync command for its sync type and dispatches it.
Location: src/MessageBus/AsynchronousHandler/SyncProfileHandler.php:153
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
class SyncProfileHandler
{
    public function __invoke(SyncProfileCommand $command)
    {
        $syncType = $command->getSyncProfile()->getSyncType();

        // Map the sync type to its provider-specific command.
        // "(...)" marks constructor arguments omitted from this excerpt.
        $syncCommand = match ($syncType) {
            SyncType::KPN_SP16, SyncType::GRIP
                => new KpnSp16SyncCommand(...),
            SyncType::MY_VODAFONE
                => new VodafoneSyncCommand(...),
            SyncType::T_MOBILE
                => new TMobileCalviSyncCommand(...),
            SyncType::KPN_EEN
                => new KpnEenSyncCommand(...),
            SyncType::YIELDER
                => new YielderSyncCommand(...),
            default
                => throw new \Exception("Unknown sync type: $syncType")
        };

        $this->messageBus->dispatch($syncCommand);
    }
}
Impact if It Fails¶
graph TB
    SM[sync:manager] --> SPC[SyncProfileCommand]
    SPC --> SPH[SyncProfileHandler<br/>BROKEN]
    SPH --> NONE[Nothing happens]
    style SPH fill:#ff6b6b
    style NONE fill:#ff6b6b
Without SyncProfileHandler:
- sync:manager runs
- SyncProfileCommand dispatched
- NO provider handlers called
- ZERO syncs execute

Affected Workflows:
- ALL KPN syncs (SP16, EEN, GRIP)
- ALL Vodafone syncs
- ALL T-Mobile syncs
- ALL Yielder syncs
- 100% of automated provider syncs
Failure Symptoms¶
-- Cron runs but no sync tasks created
SELECT COUNT(*) FROM sync_task
WHERE created_at > NOW() - INTERVAL 15 MINUTE;
-- Expected: > 0 (sync:manager runs every 5 minutes, so a 15-minute window covers three runs)
-- If 0: SyncProfileHandler or sync:manager broken
Detection¶
# 1. Check if handler exists
ls -la src/MessageBus/AsynchronousHandler/SyncProfileHandler.php
# 2. Verify handler registration
php bin/console debug:messenger | grep SyncProfileHandler
# 3. Test manual sync
php bin/console sync:manager --customer=123 -vvv
# Check logs for routing errors
tail -50 /var/www/expensis/var/log/prod.log | grep -i "syncprofile\|routing"
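The log check above can be narrowed to the one message the handler is known to produce: the `default` arm of the `match` throws an exception whose text starts with `Unknown sync type:`. The sketch below counts those lines in a log file. `count_unknown_sync_types` is a hypothetical helper, and it assumes the exception message reaches `prod.log` unchanged, which depends on the logging configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines reporting an unroutable sync type.
# Usage: count_unknown_sync_types LOGFILE
count_unknown_sync_types() {
    local logfile="$1"

    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi

    # grep -c prints the number of matching lines. It exits 1 when that
    # number is 0, which is not an error here, so the status is ignored.
    grep -c "Unknown sync type:" "$logfile" || true
}

# Example (log path taken from the detection step above):
#   count_unknown_sync_types /var/www/expensis/var/log/prod.log
```

A non-zero count points at a sync profile whose type has no arm in the `match`, rather than at a broken handler file.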
Recovery¶
# 1. Verify file integrity
php -l src/MessageBus/AsynchronousHandler/SyncProfileHandler.php
# 2. Check for routing configuration
grep -A 5 SyncProfileCommand config/packages/messenger.yaml
# 3. Clear cache
php bin/console cache:clear --env=prod
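# 4. Stop workers gracefully; this command only signals them to exit,
#    so a process manager (e.g. Supervisor or systemd), if configured, must start fresh workers
php bin/console messenger:stop-workers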
# 5. Test routing
php bin/console sync:manager --customer=123 --force -vvv
Prevention¶
- Monitoring: Alert if no sync tasks created in 15 minutes
- Tests: Verify routing for all sync types
- Logging: Log all routing decisions
- Alerts: Email on routing failures
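The 15-minute rule can be expressed as a small age comparison. This is a sketch under assumptions: `is_stale` is a hypothetical helper, and the timestamp is expected as Unix epoch seconds, for example from `SELECT UNIX_TIMESTAMP(MAX(created_at)) FROM sync_task`. An empty table makes the `mysql` client print `NULL` in batch mode, which the helper treats as stale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the newest sync task is too old.
# Usage: is_stale LAST_EPOCH NOW_EPOCH [MAX_AGE_SECONDS]   (default: 900 = 15 minutes)
is_stale() {
    local last="$1" now="$2" max_age="${3:-900}"

    # No usable timestamp (empty, NULL, or non-numeric) counts as stale.
    case "$last" in
        ''|NULL|*[!0-9]*) return 0 ;;
    esac

    [ $((now - last)) -gt "$max_age" ]
}

# Example wiring (assumption: mysql client configured, database named expensis):
#   last=$(mysql -N -B expensis -e "SELECT UNIX_TIMESTAMP(MAX(created_at)) FROM sync_task")
#   if is_stale "$last" "$(date +%s)"; then
#       echo "ALERT: no sync tasks created in the last 15 minutes" >&2
#   fi
```

Passing the current time as an argument instead of calling `date` inside the function keeps the check easy to test with fixed values.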
SPOF #3: sync:manager Cron Job¶
What It Does¶
Master orchestrator that triggers ALL automated provider syncs every 5 minutes.
Schedule: */5 * * * *
Cron Entry:
*/5 * * * * cd /var/www/expensis && php bin/console sync:manager >> /var/log/expensis/sync-manager.log 2>&1
Impact if It Fails¶
graph TB
CRON[Cron stopped or<br/>entry missing]
CRON --> NOTHING[NO automation runs]
style CRON fill:#ff6b6b
style NOTHING fill:#ff6b6b
Without sync:manager cron:
- Cron daemon may still be running
- sync:manager never executes
- NO SyncProfileCommands dispatched
- ZERO automated syncs occur

Affected Workflows:
- ALL automated provider syncs
- 100% of automation

Manual syncs still work: the command can be run by hand.
Failure Symptoms¶
-- No sync tasks created recently
SELECT MAX(created_at) as last_sync_task
FROM sync_task;
-- If > 15 minutes ago: cron not running
# No recent cron execution
grep "sync:manager" /var/log/syslog | tail -20
# Should see entries every 5 minutes
Detection¶
# 1. Check cron daemon status
sudo systemctl status cron
# 2. Verify cron entry exists
crontab -l | grep -v '^[[:space:]]*#' | grep sync:manager   # ignore commented-out entries
# 3. Check execution logs
grep "sync:manager" /var/log/syslog | tail -20
# 4. Test manual execution
cd /var/www/expensis
php bin/console sync:manager -vvv
Recovery¶
# 1. Check if cron daemon running
sudo systemctl status cron
# If stopped:
sudo systemctl start cron
sudo systemctl enable cron
# 2. Verify cron entry
crontab -e
# Add if missing:
*/5 * * * * cd /var/www/expensis && php bin/console sync:manager >> /var/log/expensis/sync-manager.log 2>&1
# 3. Check file permissions
ls -la /var/www/expensis/bin/console
# Should be executable
# 4. Test execution
cd /var/www/expensis
php bin/console sync:manager -vvv
# 5. Monitor logs
tail -f /var/log/syslog | grep sync:manager
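Step 2 above can be made repeatable without opening an editor. The sketch below is one possible approach, not an existing Expensis script: `ensure_cron_entry` is a hypothetical filter that reads a crontab on stdin and writes it back to stdout, appending the documented entry only when no active (uncommented) `sync:manager` line is present. Running it twice gives the same result as running it once.

```shell
#!/usr/bin/env bash
# Cron entry exactly as documented on this page.
CRON_ENTRY='*/5 * * * * cd /var/www/expensis && php bin/console sync:manager >> /var/log/expensis/sync-manager.log 2>&1'

# Hypothetical filter: crontab text in on stdin, crontab text out on stdout.
ensure_cron_entry() {
    local current
    current=$(cat)

    # An active entry is a line mentioning sync:manager whose first
    # non-blank character is not '#'.
    if printf '%s\n' "$current" | grep -Eq '^[[:space:]]*[^#[:space:]].*sync:manager'; then
        printf '%s\n' "$current"
    elif [ -n "$current" ]; then
        printf '%s\n%s\n' "$current" "$CRON_ENTRY"
    else
        printf '%s\n' "$CRON_ENTRY"
    fi
}

# Example: write the result to a file, review it, then install it.
#   crontab -l 2>/dev/null | ensure_cron_entry > /tmp/crontab.new
#   crontab /tmp/crontab.new
```

Going through a temporary file rather than piping straight into `crontab -` leaves room to inspect the result, because installing a crontab replaces the previous one entirely.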
Prevention¶
- Monitoring: Alert if no execution in 15 minutes
- Logging: Log every execution
- Health Check: Automated script checks cron status
- Backup: Document cron entry
- Alerts: Email + Slack on failures
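One way to implement the "no execution in 15 minutes" check without querying the database is a heartbeat file. The following is a sketch under assumptions: both function names and the heartbeat path in the comments are hypothetical. `run_with_heartbeat` touches the file only when the wrapped command succeeds, and `heartbeat_is_fresh` reports whether the file was modified recently enough. The age test relies on `find -mmin`, which GNU and BSD `find` both provide although POSIX does not require it.

```shell
#!/usr/bin/env bash
# Run a command and record a heartbeat only if it succeeded.
# Usage: run_with_heartbeat HEARTBEAT_FILE COMMAND [ARGS...]
run_with_heartbeat() {
    local heartbeat="$1"
    shift
    "$@" || return $?      # propagate the command's failure, leave the heartbeat untouched
    touch "$heartbeat"
}

# Succeed when the heartbeat exists and is at most MAX_MINUTES old.
# Usage: heartbeat_is_fresh HEARTBEAT_FILE [MAX_MINUTES]   (default: 15)
heartbeat_is_fresh() {
    local heartbeat="$1" max_minutes="${2:-15}"
    [ -f "$heartbeat" ] || return 1
    # find prints the path only when the file is OLDER than max_minutes.
    [ -z "$(find "$heartbeat" -mmin +"$max_minutes")" ]
}

# Example (hypothetical heartbeat path), called from a wrapper script run by cron:
#   cd /var/www/expensis &&
#       run_with_heartbeat /var/www/expensis/var/sync-manager.heartbeat php bin/console sync:manager
# Example check, run from a separate monitoring job:
#   heartbeat_is_fresh /var/www/expensis/var/sync-manager.heartbeat 15 ||
#       echo "ALERT: sync:manager has not completed in 15 minutes" >&2
```

Because the heartbeat is written only after a successful run, this also catches the case where cron fires on schedule but `sync:manager` exits with an error.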
SPOF Comparison¶
| Metric | MessageConsumedHandler | SyncProfileHandler | sync:manager Cron |
|---|---|---|---|
| Type | Event Subscriber | Command Handler | Scheduled Task |
| Impact | 99% workflows | 100% provider syncs | 100% automation |
| Failure Rate | Very Low | Very Low | Low |
| Detection Time | Minutes | Minutes | 15+ minutes |
| Recovery Time | Seconds | Seconds | Minutes |
| Severity | CRITICAL | CRITICAL | CRITICAL |
SPOF Detection Dashboard¶
-- Combined health check query
SELECT
    'MessageConsumedHandler' AS component,
    CASE
        WHEN EXISTS (
            SELECT 1 FROM sync_task st
            LEFT JOIN total_usage_new_table t
                ON st.customer_id = t.customer_id
                AND st.cycle_start_date = t.cycle_start_date
            WHERE st.status = 'completed'
                AND st.created_at > NOW() - INTERVAL 1 HOUR
                AND t.id IS NULL
        ) THEN 'FAILING'
        ELSE 'OK'
    END AS status
UNION ALL
SELECT
    'SyncProfileHandler' AS component,
    CASE
        WHEN (SELECT COUNT(*) FROM sync_task
              WHERE created_at > NOW() - INTERVAL 15 MINUTE) = 0
        THEN 'FAILING'
        ELSE 'OK'
    END AS status
UNION ALL
SELECT
    'sync:manager Cron' AS component,
    CASE
        -- An empty sync_task table yields NULL, which must also count as failing
        WHEN (SELECT MAX(created_at) FROM sync_task) IS NULL
            OR (SELECT MAX(created_at) FROM sync_task) < NOW() - INTERVAL 15 MINUTE
        THEN 'FAILING'
        ELSE 'OK'
    END AS status;
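In the query above, the SyncProfileHandler and sync:manager rows are both derived from `sync_task.created_at`, so they turn FAILING together and cannot tell the two components apart. Combining them with an independent cron signal (for example the syslog check from the Detection steps) allows a rough triage. The sketch below encodes that reasoning. `triage` is a hypothetical helper, and its verdict is a first suspect rather than a diagnosis: zero recent tasks can also mean that no sync profile was due.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name the most likely failing SPOF from three signals.
# Usage: triage CRON_FRESH RECENT_TASKS ORPHANED_SYNCS
#   CRON_FRESH      1 if sync:manager ran within the last 15 minutes, else 0
#   RECENT_TASKS    sync_task rows created in the last 15 minutes
#   ORPHANED_SYNCS  completed syncs from the last hour without a totals row
triage() {
    local cron_fresh="$1" recent_tasks="$2" orphaned="$3"

    if [ "$cron_fresh" -ne 1 ]; then
        # Nothing downstream can run if the scheduler never fires.
        echo "sync:manager cron"
    elif [ "$recent_tasks" -eq 0 ]; then
        # Cron ran, yet no sync tasks appeared: routing is the next suspect.
        echo "SyncProfileHandler"
    elif [ "$orphaned" -gt 0 ]; then
        # Syncs complete but their chained totals step never ran.
        echo "MessageConsumedHandler"
    else
        echo "none"
    fi
}

# Example: cron is running, tasks are created, but 2 syncs lack totals.
#   triage 1 5 2
```

The checks run upstream to downstream, matching the order in which a failure would hide the symptoms of the components after it.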
Mitigation Strategies¶
For MessageConsumedHandler¶
- Version Control: Never modify without backup
- Testing: Test chain execution in staging first
- Monitoring: Alert on failed chain execution
- Documentation: Keep this page updated
For SyncProfileHandler¶
- Routing Tests: Test all sync types
- Error Handling: Catch and log unknown types
- Monitoring: Alert on routing failures
- Fallback: Log to database if routing fails
For sync:manager Cron¶
- Redundancy: Consider multiple cron servers
- Monitoring: External monitoring service
- Alerts: Multiple alert channels (email, Slack, PagerDuty)
- Health Endpoint: HTTP endpoint to check cron health
Related Documentation¶
- Critical Paths - Dependency path analysis
- Dependency Graph - Complete system dependencies
- Common Issues - Troubleshooting SPOFs
- Master Sync Workflow - See SPOFs in action