Version: 4.6.1

Statistics Best Practices

This guide explains how to analyze statistics data and turn it into practical decisions, such as how to set Limited Inflow and how to plan capacity adjustments.

Regular Monitoring Checklist

Weekly Review:

Check Outflow Rate (should be 80% or higher)
Check Queue Size and Wait Time during peak hours
Compare Process Time between normal hours and peak hours

Monthly Review:

Review peak Inflow patterns over the past month
Analyze capacity utilization (identify periods when Queue/Wait Time increased)
Check integration health (identify segments with consistently low Outflow Rate)

When Planning Capacity Changes:

Analyze 3-6 months of historical data (use Month view)
Identify peak periods and check server resources (refer to APM records)
Monitor for 1-2 weeks after changes (use Day view)

Determining Optimal Limited Inflow

Step 1: Understand Normal and Peak Periods

What to check:

Check Inflow: Compare the rate of initial requests (Entry Requests) during normal hours and peak hours
Check queue conditions: Note how many users are waiting (Queue Size) and the average wait time (Wait Time) at those times

Example Pattern:

Time    Inflow (TPS)    Queue Size    Wait Time    Limited Inflow    Interpretation
00      80              20            3s           100               Normal hours
00      120             50            8s           100               Peak starting
00      150             200           20s          100               Peak (waiting occurs)
00      130             180           18s          100               Peak continues

Step 2: Evaluate Limited Inflow Appropriateness During Peak

Evaluation Criteria:

Check server resources: Review WAS server CPU and other resource usage during peak hours in the APM records
Decision:
- Server has available capacity but wait time is long → Consider increasing Limited Inflow
- Server is overloaded and wait time is long → Maintain or decrease Limited Inflow
- Server has available capacity and wait time is low → Keep current settings

What to check:

Queue Size and Wait Time during peak hours
Server CPU usage during peak hours (refer to APM records)
Whether the server has available capacity or is overloaded

Example Evaluation:

Peak hour situation:
- Inflow: 150 TPS
- Queue Size: 200 users
- Wait Time: 20 seconds
- Server CPU usage: 50% (APM records)

Decision: Server has available capacity but wait time is long → Consider increasing Limited Inflow
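The three decision rules in this step can be sketched as a small function. This is only an illustration: the guide gives no numeric cutoffs for "overloaded" or "long wait", so `cpu_overload_pct` and `long_wait_s` below are assumed values that should be replaced with thresholds suited to your own service.

```python
def evaluate_limited_inflow(cpu_pct, wait_s, cpu_overload_pct=80.0, long_wait_s=10.0):
    """Apply the Step 2 decision rules.

    cpu_overload_pct and long_wait_s are illustrative assumptions,
    not values defined by the guide.
    """
    overloaded = cpu_pct >= cpu_overload_pct
    long_wait = wait_s >= long_wait_s
    if not overloaded and long_wait:
        return "Consider increasing Limited Inflow"
    if overloaded and long_wait:
        return "Maintain or decrease Limited Inflow"
    if not overloaded and not long_wait:
        return "Keep current settings"
    # Overloaded with a short wait is not covered by the three rules.
    return "Not covered by the rules above; review server load manually"


# Values from the example evaluation: CPU 50%, Wait Time 20 seconds.
print(evaluate_limited_inflow(cpu_pct=50, wait_s=20))
```

With the example values, the function returns the same decision as the evaluation above.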

Step 3: Check Performance Degradation via Process Time

Important Principle:

Ideal situation: For Basic Segments, normal hour processing time (Process Time) and peak hour processing time should be nearly identical.

Pattern Analysis:

Are normal hour and peak hour Process Time similar?
- Similar → Normal, adjust Limited Inflow based on Queue/Wait Time and server resources
Has Process Time increased only during peak hours?
- Increased → Server may be responding slowly
- If server resources are still available (check APM records), consider increasing Limited Inflow even if queue size increases
- If server is overloaded, do not increase Limited Inflow; investigate and fix the performance issue first

Example Pattern:

Time    Process Time    Queue Size    Interpretation
09:00   2.5s            20            Normal hours (normal)
10:00   2.6s            50            Peak starting (normal)
11:00   4.5s            200           Peak - Process Time increased (server response delay)
12:00   4.2s            180           Peak continues - Process Time increased

Decision: Process Time increased only during peak hours → Possible server response delay
→ If server resources are available (check APM), consider increasing Limited Inflow even if queue size increases
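As a rough sketch, the comparison between normal-hours and peak-hours Process Time could be automated as follows. The guide does not say how large an increase counts as significant, so `max_ratio` (peak divided by normal) is an assumed cutoff, and `server_overloaded` is expected to come from your own APM review.

```python
def classify_process_time(normal_s, peak_s, server_overloaded, max_ratio=1.5):
    """Compare peak Process Time with the normal-hours baseline.

    max_ratio is an assumed cutoff for "increased"; the guide gives no number.
    """
    if normal_s <= 0:
        raise ValueError("normal_s must be positive")
    if peak_s / normal_s < max_ratio:
        return "Similar: adjust based on Queue/Wait Time and server resources"
    if server_overloaded:
        return "Increased, server overloaded: fix the performance issue first"
    return "Increased, capacity available: consider increasing Limited Inflow"


# Rows from the example pattern above: (time, Process Time in seconds).
rows = [("09:00", 2.5), ("10:00", 2.6), ("11:00", 4.5), ("12:00", 4.2)]
baseline_s = rows[0][1]
for time, process_s in rows:
    print(time, classify_process_time(baseline_s, process_s, server_overloaded=False))
```

With the assumed cutoff of 1.5, the 09:00 and 10:00 rows are classified as similar, while 11:00 and 12:00 are flagged as increased.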

Action:

If Process Time during peak hours has increased significantly compared to normal hours, it may indicate a server performance issue
First, check server resources (CPU, memory via APM records):
- If server has available capacity: Consider increasing Limited Inflow to allow more concurrent requests, which may help if the delay is due to queuing rather than server overload
- If server is overloaded: Do not increase Limited Inflow; investigate and resolve the performance bottleneck first
At the same time, investigate the cause of the slow server responses through server logs or APM

Detecting Integration Issues

Pattern: Inflow vs Outflow Divergence

What it means:

Inflow consistently higher than Outflow = Users are entering but not completing service properly
Low Outflow Rate (below 80%) = Likely missing nfStop() calls or other integration problems

Example Pattern:

Time    Inflow (TPS)    Outflow (TPS)    Outflow Rate    Interpretation
00      100             95               95%             Healthy
00      100             80               80%             Healthy
00      100             60               60%             Issue - missing exit calls
00      100             55               55%             Issue - code review needed
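The table suggests that Outflow Rate is Outflow divided by Inflow, expressed as a percentage. Assuming that reading is correct, a minimal check against the 80% guideline might look like this; the function name and sample list are illustrative only.

```python
def outflow_rate(inflow_tps, outflow_tps):
    """Return Outflow Rate as a percentage, or None when there is no inflow."""
    if inflow_tps <= 0:
        return None
    return outflow_tps * 100.0 / inflow_tps


# (Inflow TPS, Outflow TPS) pairs taken from the example pattern above.
samples = [(100, 95), (100, 80), (100, 60), (100, 55)]
for inflow, outflow in samples:
    rate = outflow_rate(inflow, outflow)
    status = "Healthy" if rate >= 80 else "Issue"
    print(f"{inflow} -> {outflow}: {rate:.0f}% {status}")
```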

Action:

Immediate: Check which segments have low Outflow Rate (use Segment view)
Temporary measure: If keys are not being returned, adjust key return timeout to force automatic key return
- Lower the timeout (e.g., minimum 6s → 3-4s) to make automatic return happen faster
- This won't be reflected in Outflow Rate, but capacity will be released faster so other users can enter
- Timeout settings can be adjusted in segment Advanced - Timing (Basic Control) or Advanced - Timing (Section Control)
Root cause investigation: Review recent code changes, find missing nfStop() calls
Root cause fix: Add explicit exits to all code paths, ensure error handling includes key return
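One general way to make sure a key is returned on every code path is a try/finally wrapper. The sketch below shows that pattern in Python rather than the NetFUNNEL agent API: `return_key` is a hypothetical stand-in for whatever call returns the key in your integration (for example, the place where nfStop() is invoked), and `handle_request` stands for your own service logic.

```python
def run_with_key_return(handle_request, return_key):
    """Run handle_request and always call return_key afterwards.

    return_key is a hypothetical stand-in for the key return call;
    it is not part of any real agent API.
    """
    try:
        return handle_request()
    finally:
        # Runs on success, on early return, and on exceptions alike.
        return_key()


returned = []


def failing_request():
    raise RuntimeError("simulated failure")


try:
    run_with_key_return(failing_request, lambda: returned.append("key"))
except RuntimeError:
    pass

print(returned)
```

Even though the request fails, the key return callback still runs once, which is the behavior the root cause fix aims for.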

Important Considerations When Changing Capacity

Post-Change Monitoring

After changing Limited Inflow, always monitor:

Immediately after change: Monitor for 1-2 weeks using Day view
What to check:
- Whether Queue/Wait Time has improved
- Whether expected effects have appeared
Adjust: Make additional adjustments based on actual results

Incremental Change Principles

Increase: Increase by 10-20% at a time, monitor, then repeat
Decrease: If server protection is urgent, decrease by 40-50% immediately; otherwise, decrease gradually
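The step sizes above can be turned into a small calculation. This sketch assumes Limited Inflow is set as a whole number and rounds accordingly; the function names are illustrative, and gradual decreases are left out because the guide gives no step size for them.

```python
def increase_step(current, pct=10):
    """Next Limited Inflow after one increase of 10-20%."""
    if not 10 <= pct <= 20:
        raise ValueError("increase should stay within 10-20% per step")
    return round(current * (100 + pct) / 100)


def urgent_decrease(current, pct=40):
    """Limited Inflow after an urgent decrease of 40-50%."""
    if not 40 <= pct <= 50:
        raise ValueError("urgent decrease should stay within 40-50%")
    return round(current * (100 - pct) / 100)


current = 100
print(increase_step(current))            # one 10% increase
print(increase_step(current, pct=20))    # one 20% increase
print(urgent_decrease(current, pct=50))  # urgent 50% decrease
```

After each step, monitor the results as described under Post-Change Monitoring before applying the next change.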
