From Surprises to Awareness: Forecasting R4HA in Near Real-Time

By Frank Tidemand, Capacity & Performance Consultant at SMT Data
A mainframe system runs many workloads with different characteristics, and each of them influences the 4-hour Rolling Average in its own way. Sometimes the combined behavior results in a surprise peak that can end up breaking your IT budget.
This blog discusses what 4-hour Rolling Average MSU usage means for pricing, how to predict where it is heading, and how to mitigate a predicted new peak before it hits you.
What is 4-Hour Rolling Average in IBM Pricing?
In the world of IBM mainframe software (especially z/OS), MLC pricing is based on CPU usage, and the 4-hour rolling average MSU usage is a critical metric used to determine your peak usage, which affects your monthly charges.
IBM and other software vendors have traditionally charged based on the highest 4-hour rolling average during the month, also known as the R4HA peak. That peak determines your Monthly License Charges (MLC). While this metric is less critical for customers that have moved to IBM’s Tailored Fit Pricing (TFP), it is still important to understand and react to peaks in a timely manner as they can affect performance and stability.
Even a short spike in CPU consumption can increase your MLC costs if it causes a sustained increase in the R4HA.
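To make the effect of a short spike concrete, here is a minimal sketch of a rolling 4-hour average. It assumes MSU samples arrive at a fixed 5-minute interval and that the first windows are averaged over whatever samples exist so far; both are illustrative assumptions, not a description of how IBM's reporting tools calculate the figure. The function name and the sample values are made up for the example.

```python
from collections import deque


def rolling_4h_average(msu_samples, interval_minutes=5):
    """Return the rolling 4-hour average after each sample.

    Assumption: samples are evenly spaced, interval_minutes apart.
    Until 4 hours of data exist, the average covers the samples seen so far.
    """
    window_size = (4 * 60) // interval_minutes  # 48 samples at 5 minutes
    window = deque(maxlen=window_size)
    averages = []
    for msu in msu_samples:
        window.append(msu)  # the oldest sample drops out automatically
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical data: 4 hours at 100 MSU, a 30-minute spike at 500 MSU,
# then 4 more hours at 100 MSU.
samples = [100] * 48 + [500] * 6 + [100] * 48
r4ha = rolling_4h_average(samples)
monthly_peak = max(r4ha)
print(monthly_peak)
```

In this made-up series the 30-minute spike lifts the rolling average from 100 to 150 MSU, and the average stays at that level until the spike ages out of the window.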
Figure: Example of an R4HA MSU usage and soft limit graph in ITBI
So, what’s the optimal use of CPU under R4HA?
Answer: Run as much workload as possible when your current R4HA is below your monthly peak, and throttle or cap workloads when you’re at or near the peak. If you manage to control your workload and keep it 1 MSU under the monthly peak, you will get the most cycles for the same payment. In other words, any time you avoid creating a new peak, you are saving money (and potentially avoiding performance or operational problems).
Keeping the usage as flat as possible through the month gives you the most service units for your money.
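The two ideas above, throttling near the peak and keeping usage flat, can be sketched in a few lines. The function names, the 1 MSU margin taken from the text, and the two usage profiles are illustrative assumptions only.

```python
def should_throttle(current_r4ha, monthly_peak, margin_msu=1):
    """True when the current R4HA is within margin_msu of the monthly peak."""
    return current_r4ha >= monthly_peak - margin_msu


def peak_utilization(r4ha_values):
    """Share of the paid-for peak that was actually used (1.0 = perfectly flat)."""
    peak = max(r4ha_values)
    return sum(r4ha_values) / (peak * len(r4ha_values))


# Hypothetical profiles with the same peak of 99 MSU.
flat = [95, 98, 99, 99, 98, 97]
spiky = [40, 45, 99, 50, 42, 40]

print(should_throttle(98, 100))            # below the margin: keep running
print(should_throttle(99, 100))            # at the margin: throttle
print(round(peak_utilization(flat), 2))
print(round(peak_utilization(spiky), 2))
```

Both profiles would set the same peak, but the flat one gets far more work done under it.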
But can’t I just set a max capacity on my system?
Answer: Yes, that is possible. The downside is that a max capacity, whether it is defined for an LPAR or for a capacity group on the hardware, leads to capping of the workload when it hits the ceiling. This may hurt your business if it happens during prime time, when customers are active. You would not want to throw your customers out of your store.
So, controlling this is the key to being on top of the price/performance numbers in your system. When you reach the capping point, processor efficiency is likely to go down in a shared environment. This will increase the MSU usage even further and challenge your desired 4-hour Rolling Average.
How to monitor the 4-hour Rolling Average
Monitoring the R4HA effectively is critical if you’re trying to optimize IBM MLC costs – and the good news is there are ways to do that. ITBI is able to give you the data in close to real time. SMF type 70 records have the current 4-hour Rolling Average on your systems. Simply collecting those numbers and adding them up for your environment will give great insight into your current usage.
But what’s coming? Being able to see ahead of time that you are about to break your desired maximum and pay extra charges, or that you are about to reach your cap, is crucial if you want to act before your shop slowly closes.
How to predict and prepare?
One way to predict is to look at the underlying numbers behind the current average and then extend them with the current usage to see where you will be in 30 minutes, in one hour, in two hours, and so on. This provides guidance on what to expect.
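A minimal sketch of that idea: take the samples behind the current 4-hour window, drop the oldest ones, and fill the gap with an assumed future usage. Here the assumed usage is the mean of the last few samples, which is only one possible choice and not necessarily how ITBI does it. The 5-minute interval, the function name and the sample values are assumptions for illustration.

```python
def forecast_r4ha(window, minutes_ahead, interval_minutes=5, recent_samples=3):
    """Project the rolling average forward, assuming recent usage continues.

    window: MSU samples covering the last 4 hours, oldest first.
    """
    steps = min(minutes_ahead // interval_minutes, len(window))
    recent = window[-recent_samples:]
    assumed_usage = sum(recent) / len(recent)  # "current use" carried forward
    # Slide the window: the oldest samples leave, assumed samples enter.
    future_window = list(window[steps:]) + [assumed_usage] * steps
    return sum(future_window) / len(future_window)


# Hypothetical window: 3.5 hours at 100 MSU, the last 30 minutes at 200 MSU.
window = [100] * 42 + [200] * 6

current = sum(window) / len(window)
after_30 = forecast_r4ha(window, 30)
after_60 = forecast_r4ha(window, 60)
print(current, after_30, after_60)
```

With this made-up window the average climbs from 112.5 to 125.0 and then 137.5 MSU if the higher usage continues.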
In the table below, you can see an example of an R4HA prediction:
| System | Calc MSU current | After 30 min | After 60 min |
|--------|------------------|--------------|--------------|
| SYS1   | 491              | 520          | 555          |
| SYS2   | 186              | 199          | 215          |
| SYS3   | 69               | 95           | 122          |
| SYS4   | 24               | 24           | 25           |
| SYS5   | 229              | 234          | 249          |
| SYS6   | 24               | 24           | 23           |
| TOTAL  | 1023             | 1096         | 1189         |
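As a sketch of how such a prediction could be turned into an early warning, the per-system figures from the table can be summed and compared with a limit. The limit of 1100 MSU below is a hypothetical value chosen for the example, as are the function names.

```python
HORIZONS = ("current", "after 30 min", "after 60 min")

# Per-system predictions taken from the table above.
predictions = {
    "SYS1": (491, 520, 555),
    "SYS2": (186, 199, 215),
    "SYS3": (69, 95, 122),
    "SYS4": (24, 24, 25),
    "SYS5": (229, 234, 249),
    "SYS6": (24, 24, 23),
}


def total_per_horizon(per_system):
    """Sum the per-system values for each horizon."""
    return tuple(
        sum(values[i] for values in per_system.values())
        for i in range(len(HORIZONS))
    )


def first_breach(totals, limit_msu):
    """Return the first horizon whose total exceeds the limit, or None."""
    for label, total in zip(HORIZONS, totals):
        if total > limit_msu:
            return label
    return None


def growth_by_system(per_system):
    """Rank systems by predicted MSU growth over the full horizon."""
    growth = {name: values[-1] - values[0] for name, values in per_system.items()}
    return sorted(growth.items(), key=lambda item: item[1], reverse=True)


totals = total_per_horizon(predictions)
breach = first_breach(totals, limit_msu=1100)  # hypothetical limit
drivers = growth_by_system(predictions)
print(totals, breach, drivers[:2])
```

With that assumed limit, the total first exceeds it at the 60-minute horizon, and SYS1 and SYS3 account for most of the predicted growth.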
Being on top of your 4-hour Rolling Average will save you money, save you sleepless nights and keep your business open.
What is the plan when you have the knowledge?
But a plan to mitigate needs to be in place. What are your choices of mitigating actions? Close a sandbox LPAR? Stop monitors? Limit the number of batch initiators? Add processors to the configuration? There isn’t much point in knowing you are about to crash unless you have a plan to do something about it.
Creating a plan before the peak builds up in front of you will allow you to react in a timely manner. Collect the possible actions that will work for you and prioritize them for the situation. Consider whether some or all of them can be automated.
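One way to capture such a plan in a form that could later be automated is a prioritized list of actions, each with a rough estimate of how much MSU it frees up. The sketch below is purely illustrative: the priorities and relief estimates are invented numbers, and the `Action` class and `select_actions` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    priority: int              # lower number = try first
    estimated_relief_msu: int  # hypothetical estimate, site-specific


def select_actions(actions, overshoot_msu):
    """Pick actions in priority order until the predicted overshoot is covered."""
    selected = []
    remaining = overshoot_msu
    for action in sorted(actions, key=lambda a: a.priority):
        if remaining <= 0:
            break
        selected.append(action)
        remaining -= action.estimated_relief_msu
    return selected


# Invented example plan; the real values depend entirely on your installation.
plan = [
    Action("Limit number of batch initiators", priority=2, estimated_relief_msu=40),
    Action("Stop monitors", priority=1, estimated_relief_msu=10),
    Action("Close sandbox LPAR", priority=3, estimated_relief_msu=25),
]

chosen = [a.name for a in select_actions(plan, overshoot_msu=45)]
print(chosen)
```

Keeping the plan as data rather than as a document makes it easier to review the priorities regularly and to hand the same list to an automation tool later.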
Would this work for you?
Would the above be of help to you? Let us know if you have different thoughts on this topic or if you would like to see this on your systems. If you have further questions, do not hesitate to contact me at fti@smtdata.com.
We are looking into enhancing ITBI with near real-time insights. The above is the first actionable item. Let us know if you see other areas where near real-time insights into mainframe capacity and performance would bring value to your organization.
Frank Tidemand, Senior ITBI Consultant