Version: 4.6.1

Monitoring Best Practices

This guide explains how to monitor metrics in real time and act on what they show. It covers how to respond to common issues and how to adjust capacity.

Quick Reference

Key Metrics to Watch:

  • Queue + Wait Time: Indicates whether Limited Inflow is set appropriately
  • Inflow vs Outflow: Indicates integration health
  • Process Time: Indicates server load state

Healthy Indicators:

  • Inflow ≈ Outflow (within 10-20%)
  • Process Time: Stable or decreasing
  • Queue Size: Low with short wait times
  • Outflow Rate: >80%

Warning Signs:

  • Process Time increasing steadily → Server stress or performance degradation
  • Outflow Rate <70% → Missing explicit exits or integration issues
  • Queue Size growing with increasing wait time → Demand exceeding capacity

Critical Issues (Immediate Action Required):

  • Process Time spiking dramatically → Reduce Limited Inflow immediately
  • Outflow Rate <50% → Critical integration failure; reduce Timeout and investigate
  • Queue Size growing rapidly with high wait time → Capacity exhausted; assess server state
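
The Outflow Rate bands above can be expressed as a small check. This is a minimal sketch, not part of NetFUNNEL: it assumes Outflow Rate is outflow divided by inflow, expressed as a percentage, and the function names are hypothetical. The guide does not classify the 70-80% range, so the sketch labels it "watch" as an assumption.

```python
def outflow_rate(inflow_tps: float, outflow_tps: float) -> float:
    """Outflow as a percentage of inflow (assumed definition)."""
    if inflow_tps <= 0:
        raise ValueError("inflow_tps must be positive")
    return outflow_tps / inflow_tps * 100


def classify_outflow_rate(rate_pct: float) -> str:
    """Map an Outflow Rate percentage to the bands in the Quick Reference."""
    if rate_pct < 50:
        return "critical"  # reduce Timeout and investigate
    if rate_pct < 70:
        return "warning"   # missing explicit exits or integration issues
    if rate_pct > 80:
        return "healthy"
    return "watch"         # 70-80%: not classified by the guide (assumption)


# Example: 100 TPS entering, 45 TPS explicitly exiting
print(classify_outflow_rate(outflow_rate(100, 45)))
```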

Common Scenarios and Responses

Scenario 1: High Queue, High Wait Time, Server Has Capacity

Symptoms:

  • Queue Size: High
  • Average Wait Time: High
  • Server Resources: Underutilized

What it means: Limited Inflow is set too low. Your server can handle more traffic, but NetFUNNEL is restricting entry too much.

Immediate Actions:

  1. Check server resource utilization (CPU, memory, I/O)
  2. If resources are underutilized, increase Limited Inflow by 10-20%
  3. Monitor Queue Size and Wait Time for 5-10 minutes; both should decrease
  4. If improved, consider another incremental increase

Example:

Current situation:
- Limited Inflow: 100 TPS
- Queue Size: 200 users
- Wait Time: 20 seconds
- Server CPU: 50% (has capacity)

Action: Increase Limited Inflow to 110-120 TPS
Monitor: Queue Size and Wait Time should decrease
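
The arithmetic behind this example can be sketched as follows. The helper name is hypothetical; it only computes the 10-20% target range and does not change any NetFUNNEL setting.

```python
def incremental_increase(
    current_tps: float, low_pct: int = 10, high_pct: int = 20
) -> tuple[float, float]:
    """Return the (low, high) target range for a cautious Limited Inflow increase."""
    low = current_tps * (100 + low_pct) / 100
    high = current_tps * (100 + high_pct) / 100
    return (low, high)


low, high = incremental_increase(100)
print(f"Raise Limited Inflow to {low:.0f}-{high:.0f} TPS, then monitor for 5-10 minutes")
```

Applying the same step again after each monitoring window gives the incremental approach described in the actions above.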

Scenario 2: Low Outflow Rate with High Queue

Symptoms:

  • Inflow: 100 TPS
  • Outflow: 50 TPS (or lower)
  • Outflow Rate: <50%
  • Queue Size: Growing
  • Wait Time: Increasing

What it means: Explicit service exits are not happening properly. Users are entering but not explicitly returning keys, causing capacity to be held unnecessarily.

Immediate Actions:

  1. Reduce Timeout values immediately:
    • If Process Time is 1-2 seconds, set Timeout minimum to 1s and maximum to 2s
    • This frees up capacity quickly for new users
    • Timeout settings can be adjusted in segment Advanced - Timing (Basic Control) or Advanced - Timing (Section Control)
  2. Monitor Queue Size - it should start decreasing
  3. If queue remains long after Timeout adjustment, consider increasing Limited Inflow (if server resources allow)

Long-term Actions:

  1. Investigate root cause:
    • Check if nfStop() calls are missing in code
    • Verify integration implementation
    • Review error logs for integration failures
  2. Fix integration issues:
    • Add nfStop() calls in all appropriate places
    • Ensure error handling includes key return
  3. Optimize Timeout settings:
    • Set Timeout based on actual Process Time + buffer
    • Example: If Process Time averages 5 seconds, set Timeout to 6-7 seconds

Missing Explicit Exits

If Outflow Rate is consistently below 70%, treat it as a serious integration issue. Reducing Timeout mitigates the immediate problem, but you must still fix the root cause.

Scenario 3: Process Time Increasing

Symptoms:

  • Process Time: Gradually increasing
  • Queue Size: May or may not be increasing

What it means: Server load is increasing or performance is degrading. This is an indirect indicator of server stress.

Immediate Actions:

  1. Check server resources (CPU, memory via APM or server monitoring)
  2. If server is overloaded: Reduce Limited Inflow by 40-50% immediately
  3. If server has capacity: The cause may lie elsewhere (network, database, etc.); investigate
  4. Monitor Process Time trends - if it continues increasing, reduce Limited Inflow further

Example:

Current situation:
- Limited Inflow: 100 TPS
- Process Time: Increasing from 2s to 5s
- Server CPU: 90% (overloaded)

Action: Reduce Limited Inflow to 50-60 TPS immediately
Monitor: Process Time should stabilize
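
The decision flow for this scenario can be sketched as below. This is an illustration under assumptions: "steadily increasing" is taken to mean every Process Time sample is higher than the previous one, the overload flag comes from your own server monitoring, and all names are hypothetical.

```python
def is_steadily_increasing(samples: list[float]) -> bool:
    """True if every Process Time sample is higher than the one before it."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def scenario3_action(
    limited_inflow_tps: float, process_times_s: list[float], server_overloaded: bool
) -> str:
    """Suggest a response to rising Process Time, following the steps above."""
    if not is_steadily_increasing(process_times_s):
        return "Process Time is not rising steadily; keep monitoring"
    if server_overloaded:
        # Protective cut of 40-50%, so the target is 50-60% of the current value
        low = limited_inflow_tps * 50 / 100
        high = limited_inflow_tps * 60 / 100
        return f"Reduce Limited Inflow to {low:.0f}-{high:.0f} TPS immediately"
    return "Server has capacity; investigate network, database, or other causes"


print(scenario3_action(100, [2.0, 3.0, 4.0, 5.0], server_overloaded=True))
```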

Long-term Actions:

  1. Performance optimization:
    • Profile application to identify slow components
    • Optimize database queries
    • Scale server resources if needed
  2. Capacity planning:
    • Determine optimal Limited Inflow based on Process Time thresholds
    • Set up alerts for Process Time exceeding thresholds

Timeout Optimization

Timeout settings determine how long NetFUNNEL waits before automatically returning keys when explicit exits don't occur.

Default Range: 6-20 seconds (minimum-maximum)

How It Works:

  • NetFUNNEL uses the maximum value initially
  • Keys are automatically returned after timeout if nfStop() isn't called

Setting Optimal Timeout:

  1. Monitor Process Time over time to identify typical range
  2. Set minimum to typical minimum Process Time
  3. Set maximum to typical maximum Process Time + 20-30% buffer
  4. Example: If Process Time is 8-12s, set Timeout to 8-15s
  5. If Outflow Rate is low (<70%), reduce Timeout to free capacity faster

Timeout vs Process Time

If Process Time regularly exceeds Timeout, users will be forcibly exited before service completion. Always set Timeout above typical Process Time with a safety buffer.
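
The rule for choosing a Timeout range from observed Process Time can be sketched as a small calculation. The function name and the 25% default buffer are assumptions chosen from the 20-30% range given above; the result still has to be entered manually in the segment's Timing settings.

```python
def timeout_range(
    min_process_s: float, max_process_s: float, buffer_pct: int = 25
) -> tuple[float, float]:
    """Return (minimum, maximum) Timeout in seconds from the typical Process Time range."""
    if not 0 < min_process_s <= max_process_s:
        raise ValueError("expected 0 < min_process_s <= max_process_s")
    # Minimum: typical minimum Process Time. Maximum: typical maximum plus buffer.
    return (min_process_s, max_process_s * (100 + buffer_pct) / 100)


# Process Time observed between 8 and 12 seconds
minimum, maximum = timeout_range(8, 12)
print(f"Set Timeout to {minimum:.0f}-{maximum:.0f}s")
```

With the guide's example of an 8-12s Process Time, a 25% buffer reproduces the suggested 8-15s Timeout range.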

Important Notes

About Limited Inflow adjustments:

  • Increase: 10-20% incrementally, monitor for 5-10 minutes
  • Decrease: 40-50% aggressively when protecting server

About Process Time:

  • NetFUNNEL doesn't directly monitor server CPU/memory
  • Process Time is an indirect indicator of server load
  • Always cross-reference with your server monitoring tools (APM, etc.)