Statistics Best Practices
This guide explains how to analyze statistics data and use it to make operational decisions, such as setting Limited Inflow and planning capacity adjustments.
Regular Monitoring Checklist
Weekly Review:
- Check Outflow Rate (should be 80% or higher)
- Check Queue Size and Wait Time during peak hours
- Compare Process Time between normal hours and peak hours
Monthly Review:
- Review peak Inflow patterns over the past month
- Analyze capacity utilization (identify periods when Queue/Wait Time increased)
- Check integration health (identify segments with consistently low Outflow Rate)
When Planning Capacity Changes:
- Analyze 3-6 months of historical data (use Month view)
- Identify peak periods and check server resources (refer to APM records)
- Monitor for 1-2 weeks after changes (use Day view)
Determining Optimal Limited Inflow
Step 1: Understand Normal and Peak Periods
What to check:
- Inflow: The rate of initial requests (Entry Requests) arriving during normal hours and peak hours
- Queue conditions: The number of waiting users (Queue Size) and the average wait time (Wait Time) during the same periods
Example Pattern:
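| Time | Inflow (TPS) | Queue Size | Wait Time | Limited Inflow | Interpretation |
| --- | --- | --- | --- | --- | --- |
| 09:00 | 80 | 20 | 3s | 100 | Normal hours |
| 10:00 | 120 | 50 | 8s | 100 | Peak starting |
| 11:00 | 150 | 200 | 20s | 100 | Peak (waiting occurs) |
| 12:00 | 130 | 180 | 18s | 100 | Peak continues |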
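As a rough illustration of reading this pattern, the sketch below marks the hours in which Inflow exceeded Limited Inflow and picks the hour with the longest Wait Time. The `Sample` type and function names are hypothetical, and the figures are simply the example rows above typed in by hand, not the output of any export API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hourly row of the example pattern (hypothetical structure)."""
    time: str
    inflow_tps: int
    queue_size: int
    wait_time_s: int
    limited_inflow: int


# Values copied from the example pattern in this guide.
SAMPLES = [
    Sample("09:00", 80, 20, 3, 100),
    Sample("10:00", 120, 50, 8, 100),
    Sample("11:00", 150, 200, 20, 100),
    Sample("12:00", 130, 180, 18, 100),
]


def over_limit_hours(samples):
    """Return the times at which Inflow exceeded Limited Inflow."""
    return [s.time for s in samples if s.inflow_tps > s.limited_inflow]


def longest_wait(samples):
    """Return the sample with the longest average Wait Time."""
    return max(samples, key=lambda s: s.wait_time_s)


if __name__ == "__main__":
    print("Over limit:", over_limit_hours(SAMPLES))
    print("Longest wait at:", longest_wait(SAMPLES).time)
```

With the example rows, the hours from 10:00 onward exceed the limit and 11:00 has the longest wait, which matches the Interpretation column.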
Step 2: Evaluate Limited Inflow Appropriateness During Peak
Evaluation Criteria:
- Check server resources: Check WAS server CPU and other computing resource usage during peak hours by referring to APM records
- Decision:
  - Server has available capacity but wait time is long → Consider increasing Limited Inflow
  - Server is overloaded and wait time is long → Maintain or decrease Limited Inflow
  - Server has available capacity and wait time is low → Keep current settings
What to check:
- Queue Size and Wait Time during peak hours
- Server CPU usage during peak hours (refer to APM records)
- Whether the server has available capacity or is overloaded
Example Evaluation:
Peak hour situation:
- Inflow: 150 TPS
- Queue Size: 200 users
- Wait Time: 20 seconds
- Server CPU usage: 50% (APM records)
Decision: Server has available capacity but wait time is long → Consider increasing Limited Inflow
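The decision rules in this step can be written down as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and the 80% CPU and 10-second Wait Time thresholds are assumptions rather than values from this guide, so replace them with limits that fit your service.

```python
def recommend_limited_inflow(cpu_pct, wait_time_s,
                             cpu_overload_pct=80.0, long_wait_s=10.0):
    """Map server load and Wait Time to the guide's three decisions.

    cpu_overload_pct and long_wait_s are assumed thresholds.
    """
    overloaded = cpu_pct >= cpu_overload_pct
    long_wait = wait_time_s >= long_wait_s

    if long_wait and not overloaded:
        return "consider increasing Limited Inflow"
    if long_wait and overloaded:
        return "maintain or decrease Limited Inflow"
    if not long_wait and not overloaded:
        return "keep current settings"
    # Overloaded with a short wait is not covered by the guide's rules.
    return "not covered: investigate server load"


if __name__ == "__main__":
    # Example evaluation from this step: CPU 50%, Wait Time 20 seconds.
    print(recommend_limited_inflow(cpu_pct=50, wait_time_s=20))
```

For the example evaluation (CPU 50%, Wait Time 20 seconds), the sketch returns the "consider increasing" branch.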
Step 3: Check Performance Degradation via Process Time
Important Principle:
- Ideal situation: For Basic Segments, normal hour processing time (Process Time) and peak hour processing time should be nearly identical.
Pattern Analysis:
- Are normal hour and peak hour Process Time similar?
  - Similar → Normal, adjust Limited Inflow based on Queue/Wait Time and server resources
- Has Process Time increased only during peak hours?
  - Increased → Server may be responding slowly
  - If server resources are still available (check APM records), consider increasing Limited Inflow even if queue size increases
  - If server is overloaded, do not increase Limited Inflow; investigate and fix the performance issue first
Example Pattern:
| Time | Process Time | Queue Size | Interpretation |
| --- | --- | --- | --- |
| 09:00 | 2.5s | 20 | Normal hours (normal) |
| 10:00 | 2.6s | 50 | Peak starting (normal) |
| 11:00 | 4.5s | 200 | Peak - Process Time increased (server response delay) |
| 12:00 | 4.2s | 180 | Peak continues - Process Time increased |
Decision: Process Time increased only during peak hours → Possible server response delay
→ If server resources are available (check APM), consider increasing Limited Inflow even if queue size increases
Action:
- If Process Time during peak hours has increased significantly compared to normal hours, it may indicate a server performance issue
- First, check server resources (CPU, memory via APM records):
  - If server has available capacity: Consider increasing Limited Inflow to allow more concurrent requests, which may help if the delay is due to queuing rather than server overload
  - If server is overloaded: Do not increase Limited Inflow; investigate and resolve the performance bottleneck first
- In parallel, investigate the cause of the server response delay using server logs or APM
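One way to make "increased significantly" concrete is to compare peak Process Time with the normal-hour baseline as a ratio. The following is a minimal sketch: the function names are hypothetical, and the 1.5x cutoff is an assumption for illustration, not a threshold defined by this guide.

```python
def process_time_ratio(normal_s, peak_s):
    """Return peak Process Time as a multiple of the normal-hour value."""
    if normal_s <= 0:
        raise ValueError("normal_s must be positive")
    return peak_s / normal_s


def next_step(normal_s, peak_s, server_overloaded, degraded_ratio=1.5):
    """Suggest an action following Step 3; degraded_ratio is assumed."""
    if process_time_ratio(normal_s, peak_s) < degraded_ratio:
        return "similar: tune using Queue/Wait Time and server resources"
    if server_overloaded:
        return "degraded: fix the bottleneck before raising Limited Inflow"
    return "degraded: server has headroom, consider raising Limited Inflow"


if __name__ == "__main__":
    # Example pattern: 2.5s during normal hours, 4.5s at the 11:00 peak.
    print(round(process_time_ratio(2.5, 4.5), 2))
    print(next_step(2.5, 4.5, server_overloaded=False))
```

In the example pattern, 10:00 (2.6s against a 2.5s baseline) stays well under the assumed cutoff, while 11:00 (4.5s) exceeds it.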
Detecting Integration Issues
Pattern: Inflow vs Outflow Divergence
What it means:
- Inflow consistently higher than Outflow = Users are entering but not completing service properly
- Low Outflow Rate (<80%) = Missing nfStop() calls or integration problems
Example Pattern:
| Time | Inflow (TPS) | Outflow (TPS) | Outflow Rate | Interpretation |
| --- | --- | --- | --- | --- |
| 09:00 | 100 | 95 | 95% | Healthy |
| 10:00 | 100 | 80 | 80% | Healthy |
| 11:00 | 100 | 60 | 60% | Issue - missing exit calls |
| 12:00 | 100 | 55 | 55% | Issue - code review needed |
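The Outflow Rate column is Outflow divided by Inflow, expressed as a percentage. As a small sketch, the check against the 80% guideline might look like the following; the function names and the hand-typed rows are illustrative and do not come from a statistics export.

```python
def outflow_rate(inflow_tps, outflow_tps):
    """Return Outflow as a percentage of Inflow, or None with no Inflow."""
    if inflow_tps == 0:
        return None
    return outflow_tps * 100 / inflow_tps


def low_outflow_times(rows, threshold_pct=80.0):
    """Return times whose Outflow Rate falls below the threshold."""
    flagged = []
    for time, inflow, outflow in rows:
        rate = outflow_rate(inflow, outflow)
        if rate is not None and rate < threshold_pct:
            flagged.append(time)
    return flagged


# (time, Inflow TPS, Outflow TPS) copied from the example pattern.
ROWS = [
    ("09:00", 100, 95),
    ("10:00", 100, 80),
    ("11:00", 100, 60),
    ("12:00", 100, 55),
]

if __name__ == "__main__":
    print(low_outflow_times(ROWS))
```

A rate of exactly 80% is treated as healthy here, consistent with the 10:00 row.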
Action:
- Immediate: Check which segments have low Outflow Rate (use Segment view)
- Temporary measure: If keys are not being returned, adjust key return timeout to force automatic key return
  - Lower the timeout (e.g., minimum 6s → 3-4s) to make automatic return happen faster
  - This won't be reflected in Outflow Rate, but capacity will be released faster so other users can enter
  - Timeout settings can be adjusted in segment Advanced - Timing (Basic Control) or Advanced - Timing (Section Control)
- Root cause investigation: Review recent code changes, find missing nfStop() calls
- Root cause fix: Add explicit exits to all code paths, ensure error handling includes key return
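The root cause fix amounts to guaranteeing that the key is returned on every exit path, including errors. The sketch below shows that pattern in general terms using a Python context manager. It does not use the NetFUNNEL agent API: `acquire` and `release` are hypothetical stand-ins for whatever calls your integration makes to enter and to return the key (the role nfStop() plays in this guide).

```python
from contextlib import contextmanager


@contextmanager
def held_key(acquire, release):
    """Run a block while holding a key and always return it afterwards.

    acquire and release are hypothetical callables supplied by the
    caller; they are not part of any real agent API.
    """
    acquire()
    try:
        yield
    finally:
        # Runs on normal completion, early return, and exceptions alike.
        release()


def demo():
    """Show that the key is returned even when the work fails."""
    events = []
    try:
        with held_key(lambda: events.append("enter"),
                      lambda: events.append("key returned")):
            raise RuntimeError("simulated failure in protected work")
    except RuntimeError:
        events.append("error handled")
    return events


if __name__ == "__main__":
    print(demo())
```

The same idea applies in any language: place the key return in the construct that runs regardless of outcome (for example a `finally` block) rather than only at the end of the success path.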
Important Considerations When Changing Capacity
Post-Change Monitoring
After changing Limited Inflow, always monitor:
- Immediately after change: Monitor for 1-2 weeks using Day view
- What to check:
  - Whether Queue/Wait Time has improved
  - Whether expected effects have appeared
- Adjust: Make additional adjustments based on actual results
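To judge whether Wait Time has improved, it can help to compare averages over matching periods before and after the change. This is a minimal sketch with a hypothetical function name; the daily peak Wait Time figures are made-up sample values, not measurements.

```python
from statistics import mean


def wait_time_change_pct(before_s, after_s):
    """Return the percentage change in mean Wait Time after a change.

    Negative values mean Wait Time went down (improved).
    """
    baseline = mean(before_s)
    if baseline == 0:
        raise ValueError("baseline Wait Time is zero")
    return (mean(after_s) - baseline) / baseline * 100


if __name__ == "__main__":
    # Hypothetical daily peak Wait Times in seconds, read from Day view.
    before = [20, 18, 22]
    after = [10, 12, 8]
    print(wait_time_change_pct(before, after))
```

Compare like with like (for example the same weekdays and peak hours), since a quieter week can look like an improvement on its own.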
Incremental Change Principles
- Increase: Increase by 10-20% at a time, monitor, then repeat
- Decrease: If server protection is urgent, decrease by 40-50% immediately; otherwise, decrease gradually
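The step sizes above translate into simple arithmetic. The sketch below computes the candidate range for the next setting, using integer math to avoid rounding surprises; the function names are hypothetical.

```python
def increase_range(current):
    """Return (low, high) for a 10-20% increase of Limited Inflow."""
    return current * 110 // 100, current * 120 // 100


def urgent_decrease_range(current):
    """Return (low, high) for an urgent 40-50% decrease."""
    return current * 50 // 100, current * 60 // 100


if __name__ == "__main__":
    # Starting from the Limited Inflow of 100 used in the examples.
    print(increase_range(100))
    print(urgent_decrease_range(100))
```

Starting from a Limited Inflow of 100, a gradual increase lands between 110 and 120, and an urgent decrease lands between 50 and 60.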