
Single Points of Failure

An analysis of the three critical components whose failure causes a major system breakdown.


Overview

A Single Point of Failure (SPOF) is a component whose failure causes an entire subsystem or the whole system to stop functioning.

Expensis has 3 SPOFs:

| Component | Impact if it fails | Affected workflows |
|---|---|---|
| MessageConsumedHandler | ALL command chains break | 99% of workflows |
| SyncProfileHandler | ALL provider syncs stop | 100% of automated syncs |
| sync:manager cron | NO automated syncs run | 100% of automation |

SPOF #1: MessageConsumedHandler

What It Does

Enables ALL command chaining: it listens for commands that a messenger worker has finished handling (WorkerMessageHandledEvent) and automatically dispatches each command's chained commands.

Location: src/MessageBus/MessageConsumedHandler.php

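use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\Messenger\Event\WorkerMessageHandledEvent;
use Symfony\Component\Messenger\MessageBusInterface;

class MessageConsumedHandler implements EventSubscriberInterface
{
    // Bus used to dispatch the chained commands
    public function __construct(private MessageBusInterface $messageBus)
    {
    }

    public static function getSubscribedEvents(): array
    {
        // Fired by the messenger worker after a handler finishes successfully
        return [
            WorkerMessageHandledEvent::class => 'onWorkerMessageHandled',
        ];
    }

    public function onWorkerMessageHandled(WorkerMessageHandledEvent $event): void
    {
        $message = $event->getEnvelope()->getMessage();

        // Only chainable commands carry follow-up commands
        if ($message instanceof ChainableCommandInterface) {
            foreach ($message->getChain() as $chainedCommand) {
                $this->messageBus->dispatch($chainedCommand);
            }
        }
    }
}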

Impact if It Fails

graph LR
    A[Command A] --> B[Handler A completes]
    B --> C[Chain stops here]

    style C fill:#ff6b6b

Without MessageConsumedHandler:

- The first command executes
- Its handler completes
- Chained commands NEVER execute

Affected workflows:

- KPN SP16 sync (4-command chain)
- Vodafone sync (2-command chain)
- T-Mobile sync (2-command chain)
- All provider syncs with totals
- Routit CDR processing (6-phase chain)
- Manual totals recalculation
- 99% of all workflows

Failure Symptoms

-- Syncs complete but totals not calculated
SELECT
    st.id,
    st.status as sync_status,
    st.customer_id,
    t.id as totals_id
FROM sync_task st
LEFT JOIN total_usage_new_table t ON
    st.customer_id = t.customer_id
    AND st.cycle_start_date = t.cycle_start_date
WHERE st.status = 'completed'
AND st.created_at > NOW() - INTERVAL 1 HOUR
AND t.id IS NULL;

-- If results found: chains not executing
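The symptom query can be wrapped so that cron or a monitoring agent gets a plain OK/FAILING result. A minimal sketch, assuming the `mysql` command-line client, credentials in `~/.my.cnf`, and a database named `expensis` (all assumptions); `count_orphaned_syncs` and `chain_status` are hypothetical helper names, not part of the codebase.

```shell
#!/bin/sh
# Sketch: turn the "syncs completed but no totals" query into an OK/FAILING status.
# Assumptions: mysql CLI, credentials in ~/.my.cnf, database "expensis" (hypothetical).

# Prints the number of completed sync tasks from the last hour that have no totals row.
count_orphaned_syncs() {
    mysql --batch --skip-column-names expensis -e "
        SELECT COUNT(*)
        FROM sync_task st
        LEFT JOIN total_usage_new_table t
            ON st.customer_id = t.customer_id
            AND st.cycle_start_date = t.cycle_start_date
        WHERE st.status = 'completed'
          AND st.created_at > NOW() - INTERVAL 1 HOUR
          AND t.id IS NULL;"
}

# $1: orphaned-sync count. Prints FAILING and returns 1 if any chain stalled, else prints OK.
chain_status() {
    if [ "$1" -gt 0 ]; then
        echo "FAILING"
        return 1
    fi
    echo "OK"
}
```

Usage: `chain_status "$(count_orphaned_syncs)"`. A sync that finished seconds ago may still have its totals chain in flight, so it is safer to alert only when the check fails on consecutive runs.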

Detection

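# 1. Check that the handler file exists
ls -la src/MessageBus/MessageConsumedHandler.php

# 2. Verify the event subscription (the event is registered under its class name)
php bin/console debug:event-dispatcher 'Symfony\Component\Messenger\Event\WorkerMessageHandledEvent'

# MessageConsumedHandler::onWorkerMessageHandled should be listed

# 3. Test chain execution (a messenger worker must be consuming: chains only advance on WorkerMessageHandledEvent)
php bin/console totals:new --customer=123 --cycle=2025-01-01

-- Then, in the database, check that the backup row (created later in the chain) exists
SELECT * FROM total_usage_new_backup_table
WHERE customer_id = 123
AND cycle_start_date = '2025-01-01';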

Recovery

# 1. Verify file exists and has no syntax errors
php -l src/MessageBus/MessageConsumedHandler.php

# 2. Check services configuration
grep -A 10 "MessageConsumedHandler" config/services.yaml

# 3. Clear Symfony cache
php bin/console cache:clear --env=prod

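# 4. Restart messenger workers
# (stop-workers only tells running workers to exit after their current message;
#  a process manager such as Supervisor or systemd must start them again)
php bin/console messenger:stop-workers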

# 5. Test execution
php bin/console totals:new --customer=123 --cycle=2025-01-01 -vvv

Prevention

- Monitoring: Alert if chains are not executing
- Tests: Verify chain execution in CI/CD
- Backups: Keep a backup of the working file
- Documentation: This page!
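For the Tests item, a cheap CI guard is to fail the build when the subscriber no longer appears as a listener. A minimal sketch that reads `debug:event-dispatcher` output on stdin; `require_listener` is a hypothetical helper, and it assumes the Symfony console can boot in the CI environment.

```shell
#!/bin/sh
# Sketch of a CI guard: fail if a given class is not listed in the
# debug:event-dispatcher output supplied on stdin.

# $1: listener class name to look for (matched as a fixed string).
require_listener() {
    if grep -qF "$1"; then
        echo "OK: $1 is subscribed"
    else
        echo "FAILING: $1 is not subscribed" >&2
        return 1
    fi
}
```

Usage: `php bin/console debug:event-dispatcher 'Symfony\Component\Messenger\Event\WorkerMessageHandledEvent' | require_listener MessageConsumedHandler`. The pipeline's exit status is that of `require_listener`, so a missing subscriber fails the CI step.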


SPOF #2: SyncProfileHandler

What It Does

Master router that routes ALL provider sync commands to their specific handlers.

Location: src/MessageBus/AsynchronousHandler/SyncProfileHandler.php:153

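use Symfony\Component\Messenger\Attribute\AsMessageHandler;
use Symfony\Component\Messenger\MessageBusInterface;

#[AsMessageHandler]
class SyncProfileHandler
{
    // Bus used to dispatch the provider-specific command
    public function __construct(private MessageBusInterface $messageBus)
    {
    }

    public function __invoke(SyncProfileCommand $command): void
    {
        $syncType = $command->getSyncProfile()->getSyncType();

        // Route each sync type to its provider-specific command
        $syncCommand = match($syncType) {
            SyncType::KPN_SP16, SyncType::GRIP
                => new KpnSp16SyncCommand(...),
            SyncType::MY_VODAFONE
                => new VodafoneSyncCommand(...),
            SyncType::T_MOBILE
                => new TMobileCalviSyncCommand(...),
            SyncType::KPN_EEN
                => new KpnEenSyncCommand(...),
            SyncType::YIELDER
                => new YielderSyncCommand(...),
            // A type without a mapping fails loudly instead of being skipped
            default
                => throw new \Exception("Unknown sync type: $syncType")
        };

        // Hand the provider command to its own handler
        $this->messageBus->dispatch($syncCommand);
    }
}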

Impact if It Fails

graph TB
    SM[sync:manager] --> SPC[SyncProfileCommand]
    SPC --> SPH[SyncProfileHandler<br/>BROKEN]
    SPH --> NONE[Nothing happens]

    style SPH fill:#ff6b6b
    style NONE fill:#ff6b6b

Without SyncProfileHandler:

- sync:manager runs
- SyncProfileCommand is dispatched
- NO provider handlers are called
- ZERO syncs execute

Affected workflows:

- ALL KPN syncs (SP16, EEN, GRIP)
- ALL Vodafone syncs
- ALL T-Mobile syncs
- ALL Yielder syncs
- 100% of automated provider syncs

Failure Symptoms

-- Cron runs but no sync tasks created
SELECT COUNT(*) FROM sync_task
WHERE created_at > NOW() - INTERVAL 15 MINUTE;

-- sync:manager runs every 5 minutes, so this 15-minute count should always be > 0
-- If 0: SyncProfileHandler or sync:manager is broken
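The "> 0" rule can be turned into an alerting check. A minimal sketch, assuming the `mysql` client, credentials in `~/.my.cnf`, and a database named `expensis` (all assumptions); `recent_sync_tasks` and `sync_routing_status` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: alert when no sync tasks were created in the last 15 minutes.
# Assumptions: mysql CLI, credentials in ~/.my.cnf, database "expensis" (hypothetical).

# Prints how many sync tasks were created in the last 15 minutes.
recent_sync_tasks() {
    mysql --batch --skip-column-names expensis -e \
        "SELECT COUNT(*) FROM sync_task WHERE created_at > NOW() - INTERVAL 15 MINUTE;"
}

# $1: recent task count. With a 5-minute schedule, 0 means three consecutive
# runs produced no work, so report FAILING and return 1.
sync_routing_status() {
    if [ "$1" -eq 0 ]; then
        echo "FAILING: no sync tasks in 15 minutes"
        return 1
    fi
    echo "OK: $1 sync task(s) in 15 minutes"
}
```

Usage: `sync_routing_status "$(recent_sync_tasks)"`. On its own this check cannot tell a SyncProfileHandler failure from a stopped sync:manager cron; the cron checks under SPOF #3 separate the two.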

Detection

# 1. Check if handler exists
ls -la src/MessageBus/AsynchronousHandler/SyncProfileHandler.php

# 2. Verify handler registration
php bin/console debug:messenger | grep SyncProfileHandler

# 3. Test manual sync
php bin/console sync:manager --customer=123 -vvv

# Check logs for routing errors
tail -50 /var/www/expensis/var/log/prod.log | grep -i "syncprofile\|routing"
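The router's `default` arm throws `Unknown sync type: ...`, so an unmapped sync type should surface as that message in the log, assuming failed messages are logged with their exception message to `prod.log`. A minimal sketch that counts those lines; `count_unknown_sync_types` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: count routing failures caused by unmapped sync types.
# Assumes the exception message from SyncProfileHandler's default arm reaches the log file.

# $1: log file. Prints the number of lines containing the router's exception message.
count_unknown_sync_types() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # grep -c prints 0 but exits 1 when nothing matches; keep the count, ignore that status
    grep -c "Unknown sync type" "$1" || true
}
```

Usage: `count_unknown_sync_types /var/www/expensis/var/log/prod.log`. Any non-zero result points at a sync profile whose type has no arm in the `match` expression.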

Recovery

# 1. Verify file integrity
php -l src/MessageBus/AsynchronousHandler/SyncProfileHandler.php

# 2. Check for routing configuration
grep -A 5 SyncProfileCommand config/packages/messenger.yaml

# 3. Clear cache
php bin/console cache:clear --env=prod

# 4. Restart workers (stop-workers only asks them to exit; a process manager must start them again)
php bin/console messenger:stop-workers

# 5. Test routing
php bin/console sync:manager --customer=123 --force -vvv

Prevention

- Monitoring: Alert if no sync tasks are created in 15 minutes
- Tests: Verify routing for all sync types
- Logging: Log all routing decisions
- Alerts: Email on routing failures


SPOF #3: sync:manager Cron Job

What It Does

Master orchestrator that triggers ALL automated provider syncs every 5 minutes.

Schedule: */5 * * * *

Cron Entry:

*/5 * * * * cd /var/www/expensis && php bin/console sync:manager >> /var/log/expensis/sync-manager.log 2>&1

Impact if It Fails

graph TB
    CRON[Cron stopped or<br/>entry missing]
    CRON --> NOTHING[NO automation runs]

    style CRON fill:#ff6b6b
    style NOTHING fill:#ff6b6b

Without the sync:manager cron entry:

- The cron daemon may still be running
- sync:manager never executes
- NO SyncProfileCommands are dispatched
- ZERO automated syncs occur

Affected workflows:

- ALL automated provider syncs
- 100% of automation

Manual syncs still work: sync:manager can still be run by hand.

Failure Symptoms

-- No sync tasks created recently
SELECT MAX(created_at) as last_sync_task
FROM sync_task;

-- If > 15 minutes ago: cron not running

# Check for recent cron executions
grep "sync:manager" /var/log/syslog | tail -20

# Should see an entry every 5 minutes
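For a scripted version of the 15-minute rule, it is simpler to fetch the last sync task as a Unix timestamp (MySQL's `UNIX_TIMESTAMP()`) and compare it with `date +%s`. A minimal sketch, assuming the `mysql` client, credentials in `~/.my.cnf`, a database named `expensis`, and a MySQL session time zone that matches the one `created_at` was written in; `last_sync_epoch` and `is_stale` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: decide whether the newest sync task is too old.
# Assumptions: mysql CLI, credentials in ~/.my.cnf, database "expensis" (hypothetical).

# Prints the newest sync_task.created_at as whole Unix seconds (NULL if the table is empty).
last_sync_epoch() {
    mysql --batch --skip-column-names expensis -e \
        "SELECT FLOOR(UNIX_TIMESTAMP(MAX(created_at))) FROM sync_task;"
}

# $1: last sync epoch, $2: current epoch, $3: max age in seconds (default 900 = 15 min).
# Returns 0 (stale) when there is no sync task at all or the newest one is too old.
is_stale() {
    case "$1" in
        ''|NULL) return 0 ;;
    esac
    [ $(( $2 - $1 )) -gt "${3:-900}" ]
}
```

Usage: `is_stale "$(last_sync_epoch)" "$(date +%s)" && echo "sync:manager cron looks dead" >&2`.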

Detection

# 1. Check cron daemon status
sudo systemctl status cron

# 2. Verify the cron entry exists (run as the user that owns it, or use `sudo crontab -u <user> -l`)
crontab -l | grep sync:manager

# 3. Check execution logs
grep "sync:manager" /var/log/syslog | tail -20

# 4. Test manual execution
cd /var/www/expensis
php bin/console sync:manager -vvv
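A stricter version of step 2 also catches an entry that was commented out or rescheduled. A minimal sketch that reads crontab text on stdin and requires an active `*/5 * * * *` line calling `sync:manager`; `check_sync_manager_entry` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: verify that an active */5 crontab line for sync:manager exists.
# Reads crontab text on stdin (e.g. from `crontab -l`).

check_sync_manager_entry() {
    # Drop comment lines, then require the every-5-minutes schedule followed by the command
    if grep -v '^[[:space:]]*#' \
        | grep -Eq '^\*/5[[:space:]]+\*[[:space:]]+\*[[:space:]]+\*[[:space:]]+\*[[:space:]].*sync:manager'; then
        echo "OK: sync:manager cron entry present"
    else
        echo "FAILING: no active */5 sync:manager entry" >&2
        return 1
    fi
}
```

Usage: `crontab -l | check_sync_manager_entry`. System crontabs in `/etc/cron.d` add a user column after the schedule, which the `.*` in the pattern still tolerates.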

Recovery

# 1. Check if cron daemon running
sudo systemctl status cron

# If stopped:
sudo systemctl start cron
sudo systemctl enable cron

# 2. Verify cron entry
crontab -e
# Add if missing:
*/5 * * * * cd /var/www/expensis && php bin/console sync:manager >> /var/log/expensis/sync-manager.log 2>&1

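# 3. Check file permissions
ls -la /var/www/expensis/bin/console
# Must be readable by the cron user (it is run via `php`, so the executable bit is not required)
ls -ld /var/log/expensis
# Must exist and be writable: if the `>>` redirect fails, the shell never runs sync:manager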

# 4. Test execution
cd /var/www/expensis
php bin/console sync:manager -vvv

# 5. Monitor logs
tail -f /var/log/syslog | grep sync:manager
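Cron starts `*/5` jobs at the next minute that is a multiple of 5, so after a restart the first run can be up to five minutes away. A small helper tells you how long to keep watching the log; it assumes the system time zone offset is a whole multiple of 5 minutes (which holds for today's time zones), and `seconds_to_next_run` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: seconds until the next */5 cron run, i.e. the next 5-minute boundary.

# $1: current Unix time (defaults to now). Prints seconds until the next boundary.
seconds_to_next_run() {
    now=${1:-$(date +%s)}
    echo $(( 300 - now % 300 ))
}
```

Usage: `sleep $(( $(seconds_to_next_run) + 30 )); grep sync:manager /var/log/syslog | tail -1`. The extra 30 seconds gives the run time to be logged.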

Prevention

- Monitoring: Alert if there is no execution in 15 minutes
- Logging: Log every execution
- Health Check: An automated script checks cron status
- Backup: Document the cron entry
- Alerts: Email + Slack on failures
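One way to implement "alert if there is no execution in 15 minutes" without going through the database is a heartbeat file that the cron line touches after each successful run, checked by a separate job. A minimal sketch; the heartbeat path `/var/tmp/expensis-sync-manager.heartbeat` and the modified cron line are assumptions, not the current setup. It relies on `find -mmin`, which GNU and BSD `find` support but POSIX does not require.

```shell
#!/bin/sh
# Sketch: dead man's switch for the sync:manager cron job.
# Assumes the cron line ends with `&& touch <heartbeat file>` (hypothetical change).

# $1: heartbeat file, $2: max age in minutes (default 15).
# Prints OK if the file was modified within the window; otherwise reports FAILING and returns 1.
check_heartbeat() {
    file=$1
    max_age=${2:-15}
    if [ -e "$file" ] && [ -n "$(find "$file" -mmin "-$max_age")" ]; then
        echo "OK"
    else
        echo "FAILING: $file missing or older than $max_age minutes" >&2
        return 1
    fi
}
```

A cron line with the heartbeat appended (hypothetical) would be `*/5 * * * * cd /var/www/expensis && php bin/console sync:manager >> /var/log/expensis/sync-manager.log 2>&1 && touch /var/tmp/expensis-sync-manager.heartbeat`. Run `check_heartbeat /var/tmp/expensis-sync-manager.heartbeat` from an external monitor or another scheduler, so the check does not depend on the same cron daemon it is watching.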


SPOF Comparison

| Metric | MessageConsumedHandler | SyncProfileHandler | sync:manager cron |
|---|---|---|---|
| Type | Event subscriber | Command handler | Scheduled task |
| Impact | 99% of workflows | 100% of provider syncs | 100% of automation |
| Failure rate | Very low | Very low | Low |
| Detection time | Minutes | Minutes | 15+ minutes |
| Recovery time | Seconds | Seconds | Minutes |
| Severity | CRITICAL | CRITICAL | CRITICAL |

SPOF Detection Dashboard

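-- Combined health check query
-- Note: the SyncProfileHandler and sync:manager checks read the same signal
-- (no recent sync_task rows), so they fail together; use the cron checks
-- under SPOF #3 to tell them apart.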
SELECT
    'MessageConsumedHandler' as component,
    CASE
        WHEN EXISTS (
            SELECT 1 FROM sync_task st
            LEFT JOIN total_usage_new_table t ON st.customer_id = t.customer_id
                AND st.cycle_start_date = t.cycle_start_date
            WHERE st.status = 'completed'
            AND st.created_at > NOW() - INTERVAL 1 HOUR
            AND t.id IS NULL
        ) THEN 'FAILING'
        ELSE 'OK'
    END as status

UNION ALL

SELECT
    'SyncProfileHandler' as component,
    CASE
        WHEN (SELECT COUNT(*) FROM sync_task
              WHERE created_at > NOW() - INTERVAL 15 MINUTE) = 0
        THEN 'FAILING'
        ELSE 'OK'
    END as status

UNION ALL

SELECT
    'sync:manager Cron' as component,
    CASE
        WHEN (SELECT MAX(created_at) FROM sync_task) < NOW() - INTERVAL 15 MINUTE
        THEN 'FAILING'
        ELSE 'OK'
    END as status;
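To feed the dashboard query into an alerting channel, run it in batch mode and exit non-zero when any row reports FAILING. A minimal sketch, assuming the query is saved as `spof_health.sql` (hypothetical file) and run with `mysql --batch --skip-column-names`, which prints tab-separated `component<TAB>status` rows; `spof_gate` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: gate on the SPOF dashboard output.
# Input: tab-separated "component<TAB>status" rows on stdin.
# Prints each failing component to stderr and returns 1 if any row reports FAILING.

spof_gate() {
    awk -F '\t' '
        $2 ~ /FAILING/ { print "FAILING: " $1 > "/dev/stderr"; failed = 1 }
        END { exit failed }
    '
}
```

Usage: `mysql --batch --skip-column-names expensis < spof_health.sql | spof_gate` (database name assumed). Matching `FAILING` anywhere in the status column keeps the gate working even if the labels carry extra characters.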

Mitigation Strategies

For MessageConsumedHandler

  1. Version Control: Never modify without backup
  2. Testing: Test chain execution in staging first
  3. Monitoring: Alert on failed chain execution
  4. Documentation: Keep this page updated

For SyncProfileHandler

  1. Routing Tests: Test all sync types
  2. Error Handling: Catch and log unknown types
  3. Monitoring: Alert on routing failures
  4. Fallback: Log to database if routing fails

For sync:manager Cron

  1. Redundancy: Consider multiple cron servers
  2. Monitoring: External monitoring service
  3. Alerts: Multiple alert channels (email, Slack, PagerDuty)
  4. Health Endpoint: HTTP endpoint to check cron health