Monitoring Best Practices
This guide explains how to monitor metrics in real time and act on what they show. It covers how to respond to common issues and when to adjust capacity.
Quick Reference
Key Metrics to Watch:
- Queue Size + Wait Time: Indicates whether Limited Inflow is set appropriately
- Inflow vs Outflow: Indicates integration health
- Process Time: Indicates server load state
Healthy Indicators:
- Inflow ≈ Outflow (within 10-20%)
- Process Time: Stable or decreasing
- Queue Size: Low with short wait times
- Outflow Rate: >80%
Warning Signs:
- Process Time increasing steadily → Server stress or performance degradation
- Outflow Rate <70% → Missing explicit exits or integration issues
- Queue Size growing with increasing wait time → Demand exceeding capacity
Critical Issues (Immediate Action Required):
- Process Time spiking dramatically → Reduce Limited Inflow immediately
- Outflow Rate <50% → Critical integration failure; reduce Timeout and investigate
- Queue Size growing rapidly with high wait time → Capacity exhausted; assess server state
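The Outflow Rate thresholds above can be sketched as a small classifier. This is a minimal illustration, not part of NetFUNNEL: it assumes Outflow Rate means Outflow divided by Inflow (inferred from Scenario 2), and the function names and the "watch" label for the unclassified 70-80% band are hypothetical.

```python
def outflow_rate(inflow_tps: float, outflow_tps: float) -> float:
    """Outflow as a fraction of Inflow (assumed definition of Outflow Rate)."""
    if inflow_tps <= 0:
        raise ValueError("inflow_tps must be positive")
    return outflow_tps / inflow_tps


def classify_outflow_rate(rate: float) -> str:
    """Map an Outflow Rate (fraction, e.g. 0.65) to the bands in this guide."""
    if rate < 0.50:
        return "critical"  # critical integration failure
    if rate < 0.70:
        return "warning"   # missing explicit exits or integration issues
    if rate > 0.80:
        return "healthy"
    return "watch"         # 70-80% is not classified by the guide; assumed label
```

For example, an Inflow of 100 TPS with an Outflow of 65 TPS gives a rate of 0.65, which falls in the warning band.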
Common Scenarios and Responses
Scenario 1: High Queue, High Wait Time, Server Has Capacity
Symptoms:
- Queue Size: High
- Average Wait Time: High
- Server Resources: Underutilized
What it means: Limited Inflow is set too low. Your server can handle more traffic, but NetFUNNEL is restricting entry too much.
Immediate Actions:
- Check server resource utilization (CPU, memory, I/O)
- If resources are underutilized, increase Limited Inflow by 10-20%
- Monitor Queue Size and Wait Time for 5-10 minutes - they should decrease
- If improved, consider another incremental increase
Example:
Current situation:
- Limited Inflow: 100 TPS
- Queue Size: 200 users
- Wait Time: 20 seconds
- Server CPU: 50% (has capacity)
Action: Increase Limited Inflow to 110-120 TPS
Monitor: Queue Size and Wait Time should decrease
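The arithmetic behind this step can be sketched as follows. The helper name and the 70% CPU cutoff for "underutilized" are assumptions for illustration; use the threshold that matches your own server monitoring.

```python
def suggest_increase(current_tps: float, cpu_percent: float,
                     cpu_headroom_limit: float = 70.0):
    """Return (low, high) candidate Limited Inflow values, or None.

    Applies the 10-20% incremental increase only when the server looks
    underutilized. cpu_headroom_limit is an assumed cutoff, not a
    NetFUNNEL setting.
    """
    if cpu_percent >= cpu_headroom_limit:
        return None  # no clear headroom: do not raise Limited Inflow
    return (round(current_tps * 1.10), round(current_tps * 1.20))
```

With the values from the example, suggest_increase(100, 50) yields the range 110-120 TPS. Apply one step, watch Queue Size and Wait Time for 5-10 minutes, then re-evaluate before increasing again.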
Scenario 2: Low Outflow Rate with High Queue
Symptoms:
- Inflow: 100 TPS
- Outflow: 50 TPS (or lower)
- Outflow Rate: <50%
- Queue Size: Growing
- Wait Time: Increasing
What it means: Explicit service exits are not happening properly. Users are entering but not explicitly returning keys, causing capacity to be held unnecessarily.
Immediate Actions:
- Reduce Timeout values immediately:
- If Process Time is 1-2 seconds, set Timeout minimum to 1s and maximum to 2s
- This frees up capacity quickly for new users
- Timeout settings can be adjusted in the segment settings under Advanced - Timing (Basic Control) or Advanced - Timing (Section Control)
- Monitor Queue Size - it should start decreasing
- If queue remains long after Timeout adjustment, consider increasing Limited Inflow (if server resources allow)
Long-term Actions:
- Investigate root cause:
- Check if nfStop() calls are missing in code
- Verify integration implementation
- Review error logs for integration failures
- Fix integration issues:
- Add nfStop() calls in all appropriate places
- Ensure error handling includes key return
- Optimize Timeout settings:
- Set Timeout based on actual Process Time + buffer
- Example: If Process Time averages 5 seconds, set Timeout to 6-7 seconds
If Outflow Rate is consistently below 70%, treat it as a serious integration issue. Reducing Timeout mitigates the immediate problem, but you must still fix the root cause.
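One way to make sure error handling includes key return is to put the return call in a cleanup path that runs whether the work succeeds or fails. The sketch below shows the pattern only: return_key is a hypothetical callable standing in for whatever returns the key in your integration (for example, a wrapper around nfStop()), and the same structure applies in whichever language your integration uses.

```python
from contextlib import contextmanager


@contextmanager
def held_key(return_key):
    """Run the protected work, then always call return_key().

    return_key is a hypothetical stand-in for the real key-return call;
    it is invoked exactly once, even if the protected work raises.
    """
    try:
        yield
    finally:
        return_key()  # runs on success and on error
```

Wrapping the protected work in "with held_key(...):" means an exception in the middle of the work still returns the key instead of leaving capacity held until Timeout.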
Scenario 3: Process Time Increasing
Symptoms:
- Process Time: Gradually increasing
- Queue Size: May or may not be increasing
What it means: Server load is increasing or performance is degrading. This is an indirect indicator of server stress.
Immediate Actions:
- Check server resources (CPU, memory via APM or server monitoring)
- If server is overloaded: Reduce Limited Inflow by 40-50% immediately
- If server has capacity: The cause may lie elsewhere (network, database, etc.); investigate those components
- Monitor Process Time trends - if it continues increasing, reduce Limited Inflow further
Example:
Current situation:
- Limited Inflow: 100 TPS
- Process Time: Increasing from 2s to 5s
- Server CPU: 90% (overloaded)
Action: Reduce Limited Inflow to 50-60 TPS immediately
Monitor: Process Time should stabilize
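The decision in this scenario can be sketched as a small function. It is illustrative only: the trend test (recent half of the samples averaging at least 25% above the older half), the 85% CPU overload cutoff, and all names are assumptions, not NetFUNNEL behavior.

```python
from statistics import mean


def process_time_rising(samples, min_ratio=1.25):
    """True if the recent half of samples averages >= min_ratio x the older half."""
    if len(samples) < 4:
        return False  # too few points to call it a trend
    half = len(samples) // 2
    older, recent = samples[:half], samples[half:]
    return mean(recent) >= mean(older) * min_ratio


def scenario3_action(current_tps, samples, cpu_percent, overload_cpu=85.0):
    """Return (action, low_tps, high_tps) following the Scenario 3 steps."""
    if not process_time_rising(samples):
        return ("hold", current_tps, current_tps)
    if cpu_percent >= overload_cpu:
        # Overloaded: cut Limited Inflow by 40-50%
        return ("reduce", round(current_tps * 0.50), round(current_tps * 0.60))
    # Rising Process Time with spare CPU: look at network, database, etc.
    return ("investigate", current_tps, current_tps)
```

With Process Time samples of 2.0, 2.5, 3.5, and 5.0 seconds at 90% CPU and a Limited Inflow of 100 TPS, the function recommends reducing to 50-60 TPS, matching the example above.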
Long-term Actions:
- Performance optimization:
- Profile application to identify slow components
- Optimize database queries
- Scale server resources if needed
- Capacity planning:
- Determine optimal Limited Inflow based on Process Time thresholds
- Set up alerts for Process Time exceeding thresholds
Timeout Optimization
Timeout settings determine how long NetFUNNEL waits before automatically returning keys when explicit exits don't occur.
Default Range: 6-20 seconds (minimum-maximum)
How It Works:
- NetFUNNEL uses the maximum value initially
- Keys are automatically returned after timeout if nfStop() isn't called
Setting Optimal Timeout:
- Monitor Process Time over time to identify typical range
- Set minimum to typical minimum Process Time
- Set maximum to typical maximum Process Time + 20-30% buffer
- Example: If Process Time is 8-12s, set Timeout to 8-15s
- If Outflow Rate is low (<70%), reduce Timeout to free capacity faster
If Process Time regularly exceeds Timeout, users will be forcibly exited before service completion. Always set Timeout above typical Process Time with a safety buffer.
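The sizing rule above can be sketched in a few lines. This is a rough illustration under stated assumptions: it treats the smallest and largest observed Process Time as the "typical" range, uses a 25% buffer (the midpoint of 20-30%), and rounds up to whole seconds. If your data contains outliers, percentiles are likely a better choice than raw minimum and maximum.

```python
import math


def timeout_range(process_times_s, buffer=0.25):
    """Return (min_timeout, max_timeout) in whole seconds.

    min_timeout: smallest observed Process Time, rounded up.
    max_timeout: largest observed Process Time plus buffer, rounded up.
    Both the rounding and the default buffer are assumptions.
    """
    if not process_times_s:
        raise ValueError("need at least one Process Time sample")
    low = min(process_times_s)
    high = max(process_times_s)
    return (math.ceil(low), math.ceil(high * (1 + buffer)))
```

For Process Times between 8 and 12 seconds, this gives a Timeout range of 8-15 seconds, the same as the example above.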
Important Notes
About Limited Inflow adjustments:
- Increase: 10-20% incrementally, monitor for 5-10 minutes
- Decrease: 40-50% aggressively when protecting server
About Process Time:
- NetFUNNEL doesn't directly monitor server CPU/memory
- Process Time is an indirect indicator of server load
- Always cross-reference with your server monitoring tools (APM, etc.)