Virtual CICS User Group Sponsors
Virtual CICS User Group | July 2023
Problem Analysis and Performance Tuning for CICS
Ezriel Gross, Principal Solutions Advisor
Rocket Software
[00:00:12] – Host – Amanda Hendley
Well, thanks everyone for joining us today for our Virtual CICS user group session. Unfortunately, Trevor couldn’t be here this week, so you’re stuck with me. My name is Amanda Hendley and I’m working with Planet Mainframe, who is partnering as a new sponsor partner for the Virtual CICS user groups. And I’m excited because I’ve had the opportunity to co-host these sessions, and I’m really excited to welcome you today for this session with Ezriel Gross from Rocket. Our agenda today is pretty straightforward. We’re going to get right into our presentation in a couple of minutes. Afterwards, we’ve got time for Q&A and then we’ll talk a little bit about some news and articles and announce the next session. So thanks again and let me give a shout out to our partners for this user group. So Broadcom Mainframe Software, IntelliMagic and, as I mentioned, Planet Mainframe. You can find out a lot more about these organizations and what they do in the CICS space if you go to the iTech-Ed virtual user group page for CICS. Give them a shout, thank them, tell them how much you’re enjoying the program.
[00:01:54] – Host – Amanda Hendley
And if you know of another organization that should be participating, you can reach out to Trevor or me and let us know. So, as I said real quick, we’re ready to get started. So I want to welcome Ezriel Gross from Rocket Software here today. He’s a Principal Solutions Advisor specializing in IBM CICS tools. He was formerly the CEO of Circle Software, which was acquired in 2019, where he specialized in hands-on classes and consulting in CICS, Db2, and MQSeries. He’s been a gold consultant and IBM champion for many years and has a lot of specialties, including web services, web support, performance tuning, DevOps, and Liberty as they relate to CICS. And he might tell us more about it, but he recently co-architected the C\Prof product that captures CICS trace without running in a CICS region. Sorry. So, Ezriel, thank you so much for joining us today.
[00:03:02] – Speaker – Ezriel Gross – Principal Solutions Advisor – Rocket Software
Thanks, Amanda.
[00:03:27] – Speaker – Ezriel Gross
(Slide 1 – Title) Perfect. Well, thank you very much, Amanda. Again, my name is Ezriel Gross. I’m a Principal Solutions Advisor at Rocket Software. And today we’re going to talk a little bit about problem analysis and performance tuning for CICS.
[00:03:40] – Speaker – Ezriel Gross
(Slide 2 – Agenda) But this presentation has a slightly different approach to the subject. Effectively, we’re going to start out with what are the challenges facing some CICS customers in various different organizations today? Things like complex code and skill shortages that we’re all probably aware of by now. And part of the problem I ran into, and one of the reasons I built this presentation, is that many people have come to me over the last couple of years and they have no idea where to start. If they wanted to do problem analysis or performance tuning for that matter, they have some trouble figuring out “How do I go about it?”. So this presentation is a little bit about how I go about problem analysis and performance tuning. I actually have a method that I call Detect, Verify, and Solve, which I’m going to describe for you guys today. It requires a couple of pieces of software, and I describe why you need the individual pieces of software.
[00:04:36] – Speaker – Ezriel Gross
You need some sort of a monitor. I happen to use IBM OMEGAMON for CICS. It’s a product that we write for IBM. It’s what I have readily accessible to me. Again, almost any monitor you can use, as long as it has certain functions and features. I’ll talk a little bit about CICS Performance Analyzer and how I use that as part of the verification of the problem that occurs, or what I actually want to tune within an environment. And then I’ll finish that off with a newer product, which is Rocket C\Prof, which Amanda mentioned a little bit earlier. It collects CICS trace, but doesn’t run in a CICS region.
[00:05:12] – Speaker – Ezriel Gross
(Slide 3 – Challenges across the Organization) So just to get started, let me just point out that when you’re taking a look at what are the challenges that we’re stuck with kind of today is the fact that we have these applications. They’ve been written many, many years ago in various different languages. And you might think that they’re basically stable, and a lot of them may be very stable, but if you’re in banking or brokerage, you might find that you get requirements set forth by the government where you have to physically make changes to these applications in any case.
So you might continually change these applications, expand different markets, and every time you change an application, you can introduce problems or you can introduce performance issues associated with it. So these are just some of the things and some of the people that might be interested in ensuring that the applications are working properly. For example, performance problems seem to appear without warning. Deep dive skills are hard to find. That seems to be the number one concern that I run into when I do problem analysis and performance tuning. And then again, in the world of CICS, you have these dynamic workloads. In the old days, maybe we had nine to five type businesses. Today we’re running more or less 24/7 – “I guess it’s 9AM somewhere in the world, right?” And so it depends when your periods are, especially if you’re trying to do some tuning to ensure the applications are running properly. So again, these are some of the things you can run into.
[00:06:51] – Speaker – Ezriel Gross
(Slide 4 – CICS applications can be complex systems…) However, when you take a look at the systems themselves and this is again a problem when you run applications for many, many years. You start out with something that might have been simple at one point, but then it gets very complicated as the years go on. We keep adding and adding pieces to it, which makes it extremely complicated to work with. And then suddenly you want to make a change to the application and you’re wondering, “Wait, is anyone else using this particular program?”. So if you just wanted to figure out how many people invoke one particular program in CICS, it would be a very difficult thing to do, because suddenly, somewhere in an application over here decides, “Well, I can use this function that’s provided by this program, which I happen to know about. And I can go in there and I can invoke that program, get the results back.” And needless to say, it might not be a program that they’re responsible for or in control of. And then next thing you know, somebody makes a change and their application breaks down. So trying to change, or even fix, problems can be difficult. And then optimization is always difficult to do as well, because what we’re doing today is we’re changing much of the user interface.
In the old days it used to be 3270 based. Today it’s more driven via the web. Some people are trying to combine more and more application logic together and the next thing you know, the application takes minutes to run instead of sub-second. Then there are always newer terms popping up, like modernization. What does it mean to modernize an application? We’re not actually changing the language. We’re still running, let’s say, COBOL or C or PL/I, or Assembler, maybe some of it we’re rewriting in Java. But what does it mean to modernize an application or API-enable an application? So we get all these new terms which require us to change applications and then suddenly the whole paradigm changes within the environment. Maybe suddenly the application is using more CPU. Or maybe, for example, it suddenly creates problems that didn’t exist before.
[00:09:02] – Speaker – Ezriel Gross
(Slide 5 – CICS Problem Analysis / Performance Tuning) So when you’re dealing with problem analysis and performance tuning in the world of CICS, a lot of the time I get questions as to where I start. So if you’ve been around CICS for a long period of time, you probably know that there is a Problem Determination Manual. It was renamed the CICS Troubleshooting Guide as of CICS v5.4. I’ve provided some links on this slide. If you’re interested in the online version, you can click here. If you need a PDF to download, if you’re like me and you like to read the manuals, that link is here too. And again, this one is at v5.6. But you can obviously find whatever release that you’re running. Because, again, the currently supported releases are v5.4 up until v6.1, and v5.4 ends support at the end of this year. And so you can look and get started. The Troubleshooting Guide is kind of really interesting because it kind of gives you items to look at. Things like WAIT problems and other stuff that you can actually drill down on and get some advice as to what to actually look at.
But when we’re taking a look at a problem, we have to look from two different areas. One is on the system side of the house. We have a group of system programmers, a lot fewer than we had in the past, unfortunately, but nonetheless, we have a group of system programmers that are responsible for ensuring that the system is up and running. And that includes ensuring that we don’t have any system outages or we keep those to a minimum. They deal with WAITs and loops and hangs, but they generally deal with the entire system, not individual users. Now, don’t get me wrong, they’ll give you some help if you have a problem as an individual user, but nonetheless, they’re more interested in ensuring that the region is up and running properly. Now, poor performance, possibly due to application design or some changes to an application, can affect an entire system. One application (and I’ll tell some stories as I go through this) can actually drive your system to its knees, because suddenly it’s drawing or using more CPU than it had in the past by a long shot. And so, nonetheless, if the system is affected, usually the system programmer will be involved. However, there could also be application related issues, right? So from an application point of view, is it isolated to an individual application or is it so bad that it’s affecting the system?
And what types of things can you see? Things like transaction ABENDs or deadly embraces, an application suspended for excessive amounts of time, WAIT or hang problems, or response times being erratic. We like consistent response times.
[00:11:45] – Speaker – Ezriel Gross
(Slide 6 – CICS tasks and programs) So this is just a reminder of what we’re actually looking at. So when we deal with applications running in CICS, there is some terminology that you want to be familiar with, and I’m sure most of you probably are. The overall transaction is how we define an application. A lot of people refer to it by the four character transaction code. But as we know, in a pseudo-conversational environment, that means we run multiple tasks. Each task is a logical unit of work. So it starts up and then it ends. And the point is that whatever occurs within a logical unit of work is completely recoverable. That’s the idea behind CICS. So if you take out $300 from your checking account and you move it to your savings account and you took the money out and something fails in between, you’d be pretty upset if they don’t put the money back into checking. The point is that these individual tasks will run programs themselves and those could be quite varied. You could run one set of programs. I just saw one recently where a single logical unit of work went through 66 individual programs.
So we never know how many programs we go through. Each one is the functional code that we’re running. Now, keep in mind, CICS itself of course, kind of processes tasks concurrently. Now there’s two forms of this. One is it could be truly concurrent. In a threadsafe environment you could be running on multiple TCBs, which means you could all be getting CPU at the same time. However, in the old days of CICS, and it’s still true today and most applications will run a bit like this, there’s one main TCB known as the quasi-reentrant (QR) TCB. And the way he does his work is he does it through task switching. So he will switch between different units of work and give them a chance to possibly run.
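To illustrate the task switching described above, here is a minimal Python sketch of a single dispatcher that runs one task at a time and moves to the next whenever the running task gives up control. It is a toy model only and does not reflect how the CICS dispatcher is actually implemented; the names `task` and `run_on_single_tcb` are made up for this illustration.

```python
from collections import deque


def task(name, steps):
    """Toy task: each yield marks a point where the task gives up control."""
    for i in range(steps):
        yield f"{name} step {i + 1}"


def run_on_single_tcb(tasks):
    """Round-robin dispatcher: only one task runs at any moment."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # run until the task yields
            ready.append(current)        # then send it to the back of the queue
        except StopIteration:
            pass                         # task has ended; drop it
    return trace


trace = run_on_single_tcb([task("A", 2), task("B", 3)])
print(trace)
```

The trace interleaves the two tasks, which is the point: on a single TCB, nobody runs in parallel, and a task that never gives up control would hold everyone else up.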
[00:13:40] – Speaker – Ezriel Gross
(Slide 7 – EXEC interface) Now, in terms of the applications themselves, they look just like batch COBOL, PL/I, or Assembler programs. What makes them special is we have a sprinkling of EXEC CICS commands within those programs. And every time we hit an EXEC CICS command we leave the boundaries of our application program and we end up on the system side of the house. That means we go through DFHEIP (the EXEC Interface Program) and then he gives control to the module that’s going to actually process our request.
So it could be ETC for Terminal Control, EIFC for File Control. And notice I even stuck a little word there called “WAIT?”. Because if the file I/O needs to complete then you can’t run, you get suspended, and somebody else gets a chance to run if you’re on the QR TCB. If you’re on the L8 TCB maybe you just sit there and you wait and you control that TCB but you’re not affecting others in the environment. So basically this is how CICS applications physically run.
[00:14:40] – Speaker – Ezriel Gross
(Slide 8 – Methodology for Solving Problems) So, in terms of doing problem analysis and performance tuning, over the years I’ve kind of built this Detect, Verify and Solve, which are the ways I approach performance tuning and problem analysis. Detection, as you’re going to learn, is something about monitoring the situation, that is monitoring the CICS regions and even monitoring the applications themselves. “Is the problem occurring all the time or is it just a one off? Maybe this problem doesn’t occur very often. What are the odds it’ll occur again? Maybe it was exactly just the right transaction mix that the odds of it ever happening again are slim to none.”
So what I use is I use a product to verify what I’ve seen. “Is it every Monday at 10:00 a.m. that I have this particular problem?” So I will use or look at SMF data to verify that. And again, I have a product for that. And then finally, you need something that gives you that forensic analysis of the application itself. If you want to do any kind of sort of deep dive into the application and you want to do it from outside the world of the monitors and SMF data, you need to be able to deep dive in to see what an application is physically doing.
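The verification question above ("is it every Monday at 10:00 a.m.?") can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are hypothetical (start time, response time in seconds) pairs standing in for data already extracted from SMF, and the function name and threshold are invented for the example.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def slow_task_buckets(records, threshold_seconds):
    """Count tasks slower than the threshold, grouped by (weekday, hour)."""
    buckets = Counter()
    for start, response_seconds in records:
        if response_seconds > threshold_seconds:
            buckets[(WEEKDAYS[start.weekday()], start.hour)] += 1
    return buckets


# Hypothetical sample: two slow tasks on consecutive Mondays around 10 a.m.
records = [
    (datetime(2023, 7, 3, 10, 5), 4.2),
    (datetime(2023, 7, 10, 10, 40), 3.8),
    (datetime(2023, 7, 4, 14, 0), 0.3),
    (datetime(2023, 7, 4, 10, 15), 0.2),
]
buckets = slow_task_buckets(records, threshold_seconds=1.0)
print(buckets.most_common(1))
```

If one weekday and hour bucket dominates the counts over several weeks, the problem is recurring rather than a one off.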
[00:16:01] – Speaker – Ezriel Gross
(Slide 9 – CICS Performance / Problem Analysis Tasks) So that’s what I’m going to talk about here. I’m going to talk about products for monitoring and alerting. And again, you may have a different monitor than the one I’m using. As long as the monitor has the same features or similar features to the monitor I use, then you should be okay. You should be able to do the same things I’m doing on this page. Then I’m going to talk a little bit about problem analysis. And again, I analyze SMF data for that, right? So, long term analysis, if I’m trying to figure out whether or not I’m approaching a peak period and we might need more CPU over time, I’m going to use SMF data, and I use CICS Performance Analyzer, as you’re going to see, for that. But if you use SAS or MICS, you’re just as good. You can probably run reports based on that as well. Then you need something that will give you this deep dive capability to see what an application is physically doing. Unless you want to actually step or look through all the source code available for a particular application, you need to be able to take a look and see what’s running within your environment.
[00:17:01] – Speaker – Ezriel Gross
(Slide 10 – Detect: Verify (Analyze) and Solve) Now, the first question you might have is “OK Ezra, why do I need all these products? I mean, I got a monitor. It does basically what I need it to do. I can take a look at applications running. I can even remediate problems using a particular monitor.” Well, what I’ve found over the years, regardless of the monitor that I’m working with, is I can use it to avoid delays and slowdowns. I can keep a system up and running by canceling tasks. I can even set it up so I’m alerted, I’m told that there’s a problem, and in some cases it can be automatically remediated.
However, again, when you’re using a monitor, access to the historical data is limited. It depends on how much storage you want to associate with the monitor. For example, in my particular situation, I can keep tasks sometimes for a few hours. But in a really large environment, it’s hard to keep that kind of data for long periods of time because generally we don’t have the DASD to support it. So it is really good. I do use a monitor, but again, the data doesn’t last forever.
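The trade-off described above, where history is kept only as long as storage allows, behaves like a fixed-size buffer in which the newest records push out the oldest. The following Python sketch shows that behavior; the `TaskHistory` class and its capacity are hypothetical and are not how any particular monitor stores its data.

```python
from collections import deque


class TaskHistory:
    """Keeps only the most recent `capacity` task records."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self._records = deque(maxlen=capacity)

    def record(self, task_number, tran_id):
        self._records.append((task_number, tran_id))

    def oldest(self):
        return self._records[0] if self._records else None

    def __len__(self):
        return len(self._records)


history = TaskHistory(capacity=3)
for task_number in range(1, 6):
    history.record(task_number, "SSP3")
print(len(history), history.oldest())
```

With a busy region the buffer wraps quickly, which is why the window for looking at an individual task can be hours rather than days.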
[00:18:10] – Speaker – Ezriel Gross
(Slide 11 – Detect: Verify (Analyze) and Solve) Then, even when it comes to the SMF data, we have some problems over time. First of all, SMF data will show us things associated with the performance of an individual transaction running in our system, if we’re doing performance monitoring, and it’ll even give us some statistical data associated with our regions, right? We can look at end-of-day stats, we can look at interval related statistics. But part of the problem is, again, the data is limited. It only gives you kind of a summary. I can see how much CPU is used. I can see how long the transaction waited. In some cases, I can even see why the transaction waited, but I can’t see the physical commands and the individual items that they were waiting for. Now, there is some ability to turn on more levels of SMF, but again, the cost can be fairly prohibitive.
[00:19:03] – Speaker – Ezriel Gross
(Slide 12 – Detect: Verify (Analyze) and Solve) So I actually use another tool called Rocket C\Prof, which I’ll go over a little bit, but effectively it collects CICS trace without running in a CICS system. Now, if you’ve worked with CICS long enough, and if you’ve ever had a problem in CICS, one of the things you’ll notice is that if you send a problem off to IBM, half the time they’ll tell you, “You know what, can you run trace for us? Collect that data and send it back?” So obviously that’s the data that they would use to see what’s actually running in your system. And it can be quite helpful. Without that level of data, sometimes it’s really difficult to figure out what’s going on in the system. I’m not going to say it’s impossible, but what I will say is it could take a lot longer to permanently solve a problem without having access to that level of detail.
[00:19:52] – Speaker – Ezriel Gross
(Slide 13 – CICS Performance Management Tools) So that’s how I’m going to start. Again, these are the products I use. I’m not saying you can’t use the products you have in house, but you need something at the high level for monitoring, right? Being able to monitor and remediate problems quickly. It’s some sort of a monitor that you need to run in the world of CICS. You need something that can do analysis over periods of time. If I want to see whether or not the individual transaction rate within my CICS environment is going up over the last six months or a year. You need to be able to look at data and summarize that information and have a look see to see if it’s a problem, for example, if it occurs at a particular period of time. And then you need the ability to try and solve that problem.
[00:20:37] – Speaker – Ezriel Gross
(Slide 14 – IBM OMEGAMON for CICS) So I’m going to start with IBM OMEGAMON for CICS. I use it for the detection level.
[00:20:55] – Speaker – Ezriel Gross
(Slide 15 – IBM OMEGAMON for CICS Overview) And again, it’s pretty similar to a lot of the different monitors out in the marketplace today. It just happens to be the one that I use. It’s been around since the 80s. And there are various different user interfaces as you can see on the right hand side.
The original one was called the Classic. And today we use the E3270 interface, which gives you access to a lot more information even at the CICSplex-wide level. So most customers, I won’t say all, but most customers at some point in time have built a CICSplex. And if you have a CICSplex, it’s just a way to monitor CICS systems together in groups such that the application flow of one or another particular application goes through a different set of regions. So it’s nice to be able to monitor it at the CICSplex-wide level. It should give you, whatever monitor you’re using, real-time and historical data collection and reporting. And then, for example, OMEGAMON has a small application trace facility. He’ll give you things like bottleneck analysis so you can look individually at the problems that could occur. You have the ability to limit resources. So this is what I was describing before. If one particular transaction uses too much CPU, you should be able to shut that puppy down. Proactive alerting, at least it should put out messages to tell you that there’s a problem going on. And I’ll show you there’s some other things that we can see within the monitors that will show you that a process or a problem is occurring.
Now, I’ll focus a little on task history collection. If you can get a history of tasks and see what they were doing at the time that they were running, that will help you in terms of even solving problems in a lot of the situations. Again, that is the data that I think is limited, you can’t collect that forever. Now, in terms of OMEGAMON itself, it has some new CICS metrics, has additional user interfaces. The current release is version 5.6. It came out in June 2022 with the v6.1 release of CICS.
[00:22:55] – Speaker – Ezriel Gross
(Slide 16 – New Features to date in v 5.6.0) Now, the other important thing about a monitor that you should look out for is they should be upgrading it. So I’ve actually included a couple of pages here just to show you that in terms of OMEGAMON for CICS, there was a Fixpack that came out in February of this year, which added a bunch of new facilities that didn’t appear at the time that the product was announced. So, it’s important to me that whatever monitor you’re running, you’re not just paying maintenance and running the same old monitor and you haven’t had an upgrade in two or three years. Things change all the time, and so therefore you want to see new types of stuff. For example, background tasks, the ability to hide the background tasks. So you’re not looking at the long running tasks when you’re trying to solve problems inside of a CICS system. Or in this particular example, the ability to import CPSM groups into OMEGAMON and get them to recognize those. So those are kind of nice new things. They’re doing a lot of work with the FIND command so that you can issue FINDs across an entire CICSplex, if you want to look for things like TCP/IP services. One of the other things that came out was the correlation between CICS and Db2. As we know, a lot of the workload today, especially if you’re a Db2 user, goes from CICS to Db2 and back again. And is the problem in CICS? Or is the problem in Db2? And in fact, for a number of years now, for a live running transaction, you could actually select the transaction, click on a link and it would take you right to OMEGAMON for Db2. Problem is it didn’t work with historical data. As of October 2022, you could click on a link in Task History and find yourself over in OMEGAMON for Db2’s history to actually see the thread and how it operated within the world of Db2.
[00:24:45] – Speaker – Ezriel Gross
(Slide 17 – New Features to date in v 5.6.0) Now, one of the biggest things that they changed, so I’m going to highlight it on this slide, is the ability to do resource limiting of the CPU at the millisecond level. And the other enhancements were associated with programs. So we can get details at the program level within OMEGAMON for CICS. So that’s quite important to me, because before I described a scenario where I saw an application with 66 different programs running, the last thing I want to do is focus on the wrong programs when I’m trying to do performance tuning.
[00:25:24] – Speaker – Ezriel Gross
(Slide 18 – Checking Overall System Health) So there are two ways to focus on monitoring a CICS system. One is obviously at the system level itself, the other is at the application level. So let’s start with the system level to check the overall system health. You want to be able to start at some particular starting point and then drill down all the way to an individual region to see how it’s operating.
[00:25:48] – Speaker – Ezriel Gross
(Slide 19 – CICSplex Summary Screen) Now, again, these are examples using OMEGAMON for CICS. You can see right here that I have two CICSplexes at the top, CCVPLEXH and FUWPLEX. I have two LPARs, RSO1 and RSO2. And I have a separate plex for my WUIs. It’ll show you the number of regions. And then what it’ll do is it’ll give you some generic information at the top so you know where to drill down where a problem occurs. Now, unfortunately, in my system, I haven’t set the thresholds properly. A threshold is what you’re seeing here when you see colorization. So if you wanted to, you could set up, I think it’s six or so different colors. Obviously red or green are really easy to see, and no color at all is really easy to see as well. And the idea is that if you set a particular threshold, if you expect a particular Plex to have a certain transaction rate or a certain CPU utilization, you plug it in on this screen and then, should a problem occur, you colorize it, right? So you say, I know exactly where to go when a problem occurs.
And then you want some generic information. Like, are any of the regions short on storage? You want to see if anyone’s waiting on buffers, ENQ-type WAITs, string WAITs, right? So these are the types of things that would give you the idea that you want to drill down further into the individual Plex to see what regions could be waiting or having problems. There is a way to set service level agreements, and therefore you can look at the performance index of a set of CICS regions as well. Now, this bottom screen is just me hitting PF11. You’ll notice this was showing me columns 2-12 of 19. So I wanted to show you some of the other columns over on the right-hand side. Highest MAXT percentage could be quite interesting, right? And then if you wanted to modify these workspaces, you could actually move the columns around as you see fit, but you could see storage violations and other things there.
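The threshold idea described above comes down to comparing a metric against one or more limits and picking a highlight color. Here is a minimal Python sketch of that logic. The two-level scheme, the limit values, and the CPU readings are all assumptions made for illustration, and this is not how OMEGAMON thresholds are actually configured.

```python
def colorize(value, warning, critical):
    """Return the highlight color for a metric given two threshold levels."""
    if value >= critical:
        return "red"
    if value >= warning:
        return "yellow"
    return "green"


# Hypothetical CPU utilization readings (percent) for the two CICSplexes.
cpu_percent = {"CCVPLEXH": 42.0, "FUWPLEX": 93.5}

colors = {
    plex: colorize(cpu, warning=70.0, critical=90.0)
    for plex, cpu in cpu_percent.items()
}
print(colors)
```

The value of the scheme depends entirely on choosing limits that match what each plex is expected to do. With the wrong limits, everything is red or nothing is.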
[00:27:47] – Speaker – Ezriel Gross
(Slide 20 – CICSplex Regions Summary Screen) Now, this is me actually drilling down in FUWPLEX, and I can actually see the transaction rates of my individual regions. I can see the max task percentage. And then again, more details. And you can also shift to the right. You’ll notice in this particular example, there’s over 30 columns that I can take a look at from this perspective. And again, you just hit PF11 or use a wider screen and you can see much of the details.
[00:28:14] – Speaker – Ezriel Gross
(Slide 21 – Region Overview Screen) Now finally, you can drill down. I just selected one of the regions. And so this will give you details of the physical region itself. It’ll give you things like transaction rate, CPU utilization, and each one of these is a separate screen. I could blow it up to full screen. So you can even see a little bit about the tasks that are currently running in this particular system. It’ll give you information on DSA usage, on the connections and so on. So again, if you use thresholds properly and you set up the proper alerts, you should be notified and be able to go in and take a look at this directly.
[00:28:57] – Speaker – Ezriel Gross
(Slide 22 – Analyzing Individual Transactions) So now let’s talk about analyzing individual transactions, right? And this is really about using task history. Why task history? Let me go back a slide (to Slide 21) and point out that even though I can see a number of transactions running here, the OS transactions are the OMEGAMON agents. But I see a couple of SSP3 and SSC1. By the time you go in and select the transaction (and they’re in Db2 right now as you can see) they may go away, and so that’s why I kind of focus on task history, because task history should be around for a period of time, and therefore you have a little bit more time to analyze that situation.
[00:29:38] – Speaker – Ezriel Gross
(Slide 23 – Task History Detail Screen – tabs) So this is me actually drilling down to task history. Now, you can see there’s a number of tabs at the top. There’s the overall Details tab of the transaction that was running specifically. And then two of my favorite tabs are Related and Programs. Start with the Related tab. Rarely do you have a transaction that runs in a single CICS region and nowhere else in today’s world. We’ve spread our workload all over the place. We use DPL, we use transaction routing. And so really, one transaction is the anchor point within an individual CICS system. However, if you end up running a transaction across three or four different systems, then each one of them will have a transaction ID.
Clicking on the Related tab will show you all the workload related to this individual transaction. Right, so this is the task detail of an individual data transaction running in a particular system. And here are all his subsequent transactions that he uses. So this is probably from the AOR. I can see the original request came in through a TOR, and I went through three other regions using mirror transactions to process the request. Now, the cool part is you can see things like CPU time of the individual transactions here, the Elapsed Time, Total Wait Time, Dispatch Time, and so on. The Task Number can help you relate it to other reports you might run down the road. And then I also wanted to show the Programs tab (although I’ll show a little bit more detail later on) and that will actually show you details at the program level, how many times the program was invoked, the CPU Time of the program, the Elapsed Time, Dispatch Time, and so on. Number of CICS commands, number of ABENDs, number of TCB switches. All helpful when you’re trying to tune an individual transaction.
[00:31:25] – Speaker – Ezriel Gross
(Slide 24 – Program Tracking Feature – New) So this is just some notes associated with the program tracking feature, and I can show you some other examples of this.
[00:31:33] – Speaker – Ezriel Gross
(Back to Slide 23) The one tab I didn’t highlight here in the previous slide (Slide 23) is the I/O tab. If you’ve done any Db2 processing, at the bottom of the I/O tab there’ll be a link that will take you right into OMEGAMON for Db2. And you can find more information about the Db2 thread and what it was doing in the world of Db2. But I didn’t have enough room on the page to highlight it.
[00:31:55] – Speaker – Ezriel Gross
(Back to Slide 24) Now, again, this program feature has to be turned on. It’s not completely free. There’s a worst case of a 0.25% increase based on 1ms per transaction. But nonetheless, you can get details at the program level.
[00:32:10] – Speaker – Ezriel Gross
(Slide 25 – Task Program Details) This is just an example of more or less the Program screen. And you can see the individual programs associated with a transaction that was running. It’ll tell you how many times it was invoked. And then, interestingly enough, I’m going to focus on things like CPU Time, and then you can see the Elapsed Time is a lot longer, right? And so if I was dispatched for a period of time and I can see that I’ve used an amount of CPU, then why did it take so long to run?
So this might be something I might want to deep dive and take a look at the details of that particular program running, especially since the number of CICS calls does not apparently seem to be that particularly large. So nonetheless, you can see this level of detail at the program level. Gives you a feel for if you want to do application tuning, where to start in terms of applications.
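The reasoning above (little CPU used, long elapsed time, few CICS commands) can be sketched as a simple filter. This is only an illustration: the record layout, the program names, the numbers, and the 90% threshold are assumptions, not OMEGAMON output or fields.

```python
from dataclasses import dataclass


@dataclass
class ProgramSample:
    name: str               # hypothetical program name
    invocations: int
    cpu_seconds: float
    elapsed_seconds: float
    cics_commands: int


def non_cpu_share(p: ProgramSample) -> float:
    """Fraction of the program's elapsed time that was not CPU time."""
    if p.elapsed_seconds <= 0:
        return 0.0
    return 1.0 - p.cpu_seconds / p.elapsed_seconds


def candidates(programs, min_share=0.9):
    """Programs whose elapsed time is mostly not CPU, worst first."""
    flagged = [p for p in programs if non_cpu_share(p) >= min_share]
    return sorted(flagged, key=non_cpu_share, reverse=True)


# Hypothetical values, for illustration only.
samples = [
    ProgramSample("PROGA", 1, 0.004, 0.950, 12),
    ProgramSample("PROGB", 3, 0.020, 0.030, 40),
]

for p in candidates(samples):
    print(p.name, round(non_cpu_share(p), 3))
```

A program flagged this way is a candidate for a closer look at what it was waiting on, not proof of a problem.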
[00:33:02] – Speaker – Ezriel Gross
(Slide 26 – Task Program Details) Not only that, but obviously if you’re collecting task history and it’s running for, I don’t know, maybe you have an hour’s worth of data and you run I don’t know how many thousands of transactions per second, the next problem you’re going to run into is, “How do I find where I want to go?” So within OMEGAMON, the FIND command has been updated, so you can actually put in the program name and then click OK and it’ll only pull up the transactions that went through that particular program. And then you could see how that program operated across multiple different transactions. Not necessarily the same TRAN ID, because remember, under the covers you could link to a program from a completely different transaction because you want to run that particular sub function. And so if you wanted to figure out how many different tasks invoked a particular program, that wouldn’t be a bad place to start.
[00:33:52] – Speaker – Ezriel Gross
(Slide 27 – Program Aggregation – Region Level) Now, as long as we have details on programs, we can also get a program summary by region. And we have a tab called Used, and that might give you a feel for how often a particular program is used. Obviously, I’m probably going to look at this MICSTRS, right, because his invoked count was 12,003. And so he was invoked quite a bit of time while this region was running. And you can see the CPU Time Total, the Average CPU Time for a transaction, the Elapsed Time and so on. So you can get this level of detail at the region level itself. And I just provided all the additional tabs by hitting PF11 and showing you this bottom screen. So it’s pretty much what you saw before, but instead of looking at the transaction level, you can look at an aggregate at the region level.
[00:34:40] – Speaker – Ezriel Gross
(Slide 28 – Program Aggregation – Region Level) And then if you select one of those programs, it’ll give you a lot more detail. It’ll give you some program statistics for the program itself. It’ll give you the usage data for that particular program. So this is kind of the same data, but it’s not tabular. It’s more or less on this particular page and then some information about the installation of that particular program.
[00:35:03] – Speaker – Ezriel Gross
(Slide 29 – Task History Collection – timespan) Okay, now this is all really fine, but again, most of the time I’m going to use a monitor to do initial analysis of an application running in my system. I might allow the monitor to remediate the problem. I might have alerts set up to notify me. I might even have it do an action and kill a task so I don’t have to go into that sort of headless chicken mode where I’m running around trying to solve the problem while all the users are being impacted in the system. However, when it comes to how long the data can be kept, I’m just showing you in the world of OMEGAMON, there’s the task history data space. This is not a really great example because I’m using a data space. I tend to use a file today for that, and I generally give it about, I don’t know, 70MB or so. And I can keep about 100,000 transactions in my history.
You’ll notice in this particular case, it says “Your time span is two days and 9 hours.” But keep in mind, I only have one record collected. So if you actually ran any workload based on the default, you wouldn’t be able to keep more than 50 or so transactions. So the amount of data and the size of this file is directly related to how much history actually appears in your environment. And therefore to try and sit there and use a monitor to do any sort of deep dive analysis is difficult to do. Because you’re hoping, “Well, okay, I got 30 minutes before my data disappears.” It’s going to make it hard for you to figure out what you might want to do next. Now, don’t get me wrong. It’ll give you a good starting point if you have history within a monitor to be able to see what’s going on within a transaction. But generally you want some more information. Like, I want to know if this problem occurs every Monday at 10:00, or is it the first Monday of the month, how much data do I have to be able to analyze this application over time?
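To make the relationship between history size and time span concrete, here is a rough back-of-the-envelope sketch. It assumes the history holds a fixed number of task records and wraps when full; the 100,000 figure is the one mentioned above, while the transaction rates are hypothetical.

```python
def retention_seconds(capacity_records: int, transactions_per_second: float) -> float:
    """Approximate time span covered by a wrap-around history of fixed capacity."""
    if transactions_per_second <= 0:
        raise ValueError("transaction rate must be positive")
    return capacity_records / transactions_per_second


capacity = 100_000  # records, the figure quoted in the talk

for tps in (10, 100, 1000):  # hypothetical workload rates
    minutes = retention_seconds(capacity, tps) / 60
    print(f"{tps:>5} tps -> about {minutes:.1f} minutes of history")
```

At a thousand transactions per second the same file covers under two minutes, which is why a monitor's history is a starting point rather than a long-term record.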
[00:37:00] – Speaker – Ezriel Gross
(Slide 30 – CICS Performance Management Tools) Well, generally you need something else to do that, but this is the detection piece, right? So this is about monitoring.
[00:37:08] – Speaker – Ezriel Gross
(Slide 31 – IBM CICS Performance Analyzer – CICS PA) So let’s now talk a little bit about performance analysis. And I use for verification CICS Performance Analyzer. It’s a tool that basically gives you the ability to analyze the SMF data that you collect (that assumes you are collecting SMF data). There’s two different types and we’ll go through it just a little bit here.
[00:38:12] – Speaker – Ezriel Gross
(Slide 32 – What is IBM CICS Performance Analyzer) So, first of all, you should be collecting SMF data in my opinion because otherwise you don’t have a feel for how your systems are running over a period of time. And then, whether or not you’re using CICS Performance Analyzer or some other tool to analyze that SMF data, you need to be able to run reports and in some cases, visualize the data so you can see exactly how things are running at a particular period of time. So CICS PA uses SMF data as input. One of the things I like specifically about CICS PA is it comes with about 250 supplied report forms. So that means you don’t have to start from scratch and figure out what fields you might want to look at.
We give you some ideas as these are the reports you’d run if you’re doing CPU analysis. These are the reports you’d run if you were doing WAIT type analysis and so on. So it comes with a bunch of report forms. It also has something called report sets, which I just want to point out for a minute. A report set is a combination of report forms, and that is your ability to actually run one job and produce 5/10/15 different reports at the same time with one pass through your data. So that also saves you some time in CPU associated with printing the reports and looking at the data.
[00:38:50] – Speaker – Ezriel Gross
(Slide 33 – CICS Monitoring Facility) But in terms of collection of data, CICS supports obviously CMF data, which is SMF 110, Type 1 and Type 2 records. And again, this data is written to SMF for later processing offline, although you can use log streams today and get immediate access to your data. Now there are four different classes of data. I tend to collect exception data. Because you hope you don’t have that many exceptions in your CICS environment. And therefore the information provided within the exception data will show you problems that are occurring in your system. So that pays to look at. There is Identity Class. There’s Transaction Resource Identity. If that’s something you really need. I haven’t used it much over the years. Performance Data is important to me. That’s one record per transaction. I can see the performance of individual transactions running in my system. I have no issue with transaction resource data. The problem is it can be so voluminous and expensive to run that I try and avoid it unless I absolutely need it. So anyway, CICS will compress the data by default, so hopefully it won’t use that much space. And you could use a monitoring control table to eliminate the fields you don’t want. I find with compression, most people don’t eliminate fields anymore, but they add additional application-related fields so they can get some idea of how an application is running. And I’ll show you an example of one a little later on. Now there are other products, including a sample program from IBM, that actually produce reports based on the SMF data. I just happen to use CICS PA.
[00:40:28] – Speaker – Ezriel Gross
(Slide 34 – CMF data types – Performance and Exception) So again, Performance class, why would you want to collect this? Because it provides detailed transaction information, gives you things like processor elapsed time, time spent waiting for I/O (you get one record per transaction). Exception class will just show you about problems that are encountered, like resource shortages, queuing for file strings, wait on temporary storage buffers, highlights problems in the CICS system, ones that you might want to look at and remediate permanently. I saw one within an environment just recently where there were a ton of temp storage buffer WAITs, which I thought was really interesting considering the customer had no recoverable temp storage in the environment.
[00:41:14] – Speaker – Ezriel Gross
(Slide 35 – CICS Statistics) So you do have to understand a little bit about the system to effectively do performance tuning. But nonetheless, at least you have an idea of where to start. Now, that is really about performance level information. There are also statistics. Statistics can be collected at various intervals during the day. You could have interval statistics, or you could do on-demand statistics. By default, you generally get end-of-day statistics. And at least if you have end-of-day statistics, it’ll provide you with counts and wait times for resources, processor usage and so on. So if you’re collecting the monitoring data, which are the performance records, it makes sense to collect statistics. At the very least, end-of-day stats shouldn’t be that expensive. I tend to see interval statistics. IBM has reduced the interval from 3 hours down to, I think, 1 hour. But the finer you can collect statistics, the better you know how you are operating within an individual interval at very little additional cost.
[00:42:18] – Speaker – Ezriel Gross
(Slide 36 – When does CICS collect statistics?) So again, for interval statistics, you need statistics recording turned on in the SIT, and then you could set the interval at which those statistics are collected. End-of-day stats, if you run 24/7, by default run at midnight.
[00:42:33] – Speaker – Ezriel Gross
(Slide 37 – Response time structure of CICS transaction) So, what is it that you’re actually looking for when you look at this data? Now we have to analyze an individual transaction and point out a few things for you. Generally, most of the problems with CICS transactions deal with, A: they’re using too much CPU or, B: they’re waiting too long to actually run. So this is where you could tend to focus. So this is a total transaction, and you’ll see that when a transaction starts up, generally there’s this first dispatch delay. That’s usually dictated by what’s running in your system – I just showed a customer the other day the fact that their transactions were waiting a quarter of a second just to get that initial start. That usually generally means the QRTCB is overloaded because their workload is all starting on QR. Now they do have thread-safe-based applications, but nonetheless, because QR is so overloaded during a period of time, that means everybody waits, right, because everybody has to get their first start on QR unless you use quasi-reentrant required, for example. Now, you could be running, you get your dispatch, you run, you’re using CPU time and then inevitably, you’ll issue some command that will force you to wait. Most of the commands are associated with I/O (Journal control, temp storage, terminal control, file control I/O). But you can get other types of delays that deal with inter-system communication. Maybe you’re running a transaction and you do a DPL to a program in another region. So suddenly the transaction in the first region is waiting for a response from the transaction in the second region. So you could see WAITs all over the place. Nonetheless, while you’re dispatched, you’re hoping you’re getting CPU, but that’s not always the case either. There may not be enough CPUs in the environment to always give you CPU. So Dispatch time is the total amount of time you could get CPU, and then CPU time is some number less than that amount. 
Now, the minute you run and get suspended, guess what? Now you go into a dispatch wait, because you have to get control again, especially if you’re waiting for QR back to the QRTCB.
[00:44:41] – Speaker – Ezriel Gross
(Slide 38 – Response time) Now, in general, you tend to see that again, when you’re suspended, you’re not running, you’re waiting for something to finish so that you can run. When you’re dispatched, hopefully you’re getting CPU time most of the time, but you could also be waiting because the CPU is not available for you. What we like to see, and it’s produced even with statistics, is the CPU to dispatch ratio of the QR TCB should be 80% or higher. In today’s world, I like to see 90% to 95% for a production CICS region.
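The CPU-to-dispatch ratio described here is simply CPU time divided by dispatch time for the TCB. A minimal sketch follows; the function names and sample values are assumptions, and only the 80% and 90% thresholds come from the talk.

```python
def cpu_to_dispatch_ratio(cpu_seconds: float, dispatch_seconds: float) -> float:
    """Share of dispatched time during which the TCB actually received CPU."""
    if dispatch_seconds <= 0:
        raise ValueError("dispatch time must be positive")
    return cpu_seconds / dispatch_seconds


def assess(ratio: float, floor: float = 0.80, target: float = 0.90) -> str:
    """Classify a ratio against the 80% floor and 90% target quoted in the talk."""
    if ratio >= target:
        return "good"
    if ratio >= floor:
        return "acceptable"
    return "investigate"


# Hypothetical interval: 46 s of CPU over 50 s of dispatch time on the QR TCB.
ratio = cpu_to_dispatch_ratio(46.0, 50.0)
print(f"{ratio:.0%} -> {assess(ratio)}")
```

A low ratio suggests the TCB was dispatched but waiting for a processor, which points at the LPAR rather than the application.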
[00:45:14] – Speaker – Ezriel Gross
(Slide 39 – Suspend Time Breakdown) Now, these are just some of the different reasons you can wait. I wanted to list a whole slew of them for you here. And so, notice suspend time, the total time you were suspended, is made up of the first dispatch delay, plus any I/O wait time, any other accounted wait times, and unaccounted wait time, which can occur as well associated with the system. And again, you could see various different WAITs. Many of them deal with I/O, because obviously we have to wait for the data to be pulled off a disk or wherever else it might live and returned to you. You could see various WAITs because of the lock manager. So we want to ensure integrity. So because of integrity, two people can’t update the same record at the same time. And so you could see that type of a WAIT, you could see ENQ type delays ensuring that we’re single-threading access to a particular resource. A lot of people have used that to avoid affinity type issues so that they can mark their applications thread-safe. So these are the different reasons that you can wait inside of an environment.
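The breakdown just described can be written as a small calculation: whatever part of the total suspend time is not explained by the first dispatch delay, I/O waits, or other accounted waits is the unaccounted wait. The wait names and values below are hypothetical and do not correspond to actual CMF field names.

```python
def unaccounted_wait(suspend_total, first_dispatch_delay, io_waits, other_waits):
    """Suspend time left over after subtracting every accounted component."""
    accounted = (
        first_dispatch_delay
        + sum(io_waits.values())
        + sum(other_waits.values())
    )
    # Clamp at zero in case of rounding in the inputs.
    return max(suspend_total - accounted, 0.0)


# Hypothetical task, times in seconds.
leftover = unaccounted_wait(
    suspend_total=0.300,
    first_dispatch_delay=0.250,
    io_waits={"file control": 0.030, "temporary storage": 0.005},
    other_waits={"enqueue": 0.010},
)
print(f"unaccounted wait: {leftover:.3f} s")
```

In this made-up task the first dispatch delay dominates, which would point at the region's dispatching rather than at I/O.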
[00:46:19] – Speaker – Ezriel Gross
(Slide 40 – Why Analyze SMF data) So, again, CICS PA, I find it quite useful. You can analyze CICS application performance at the application level, you can improve overall CICS resource usage. And then this is a big one for me, and I like to point it out to people. A lot of times you’ll see a problem, it seems intuitively obvious as to what you need to do to try and improve the performance of the application. And then what people will do is they’ll make that change and then they’ll never go back and look to see how it operated. It’s important that whenever you make a change that you evaluate the effects of your CICS tuning. Sometimes the effects will not be as good as you thought it would be and it can actually be worse. You could use it for improving transaction performance, increased availability and so on. It could aid in capacity planning and tuning and you can get some deep dive into what an application is doing. So one of the examples I’m going to show you is a report where you can see quite a number of I/Os that are occurring within a particular individual application.
[00:47:27] – Speaker – Ezriel Gross
(Slide 41 – Benefits of using CICS PA) So one of the other things about the product is it’s relatively easy to use. It comes with these pre-built reports. You drive it through an ISPF dialog. Effectively, what that dialog does is build JCL that you can submit and run your reports. And again, you can do things like trend and capacity planning, statistics reporting, or even some level of transaction profiling.
[00:47:49] – Speaker – Ezriel Gross
(Slide 42 – Performance Summary Screen) So here’s a report, it’s an example of a report that I worked on a number of years ago for an individual customer. And it’s just an interesting example as to how far you can get this thing to work. So you’ll notice over on the left-hand side, all the transactions are the same. These are all Web Services type transactions and they’re about 180 plus transactions, or I should say, Web Services that were available to this customer. But what they were noticing was and again, (yes, the names have been changed to protect the innocent here) they were noticing that the Tran ID was always the same for all 180 Web Services and they thought about splitting them out into individual Tran IDs. That would have been the easy way to do it, but unfortunately, the way they were doing their billing made it impossible for them to create additional transactions to run each of these Web Services under. So we came up with this idea. We added our own user field into the monitoring control table and because everything was coming in via the web, we could invoke an analyzer program that could actually put out what that particular web request was. And then suddenly we were able to highlight the individual web requests. Now, obviously I didn’t show you all 180 over here, but I gave you an idea of the individual web descriptions of each of the services. Now, if I were going to go and do any tuning, which I did, for this particular environment, what do you think I’m going to focus on? Well, if the complaint is CPU time, then what I’m going to do is look at the largest number of transactions (44,309). The response time is pretty darn good over there. The CPU time is pretty darn good over there. I’m not going to spend time on this, because I want a big bang for my buck. So let’s see, let’s go down to wdecision where 31,634 transactions ran. It was a third of a second in terms of response time, and then the average user CPU time was an 8th of a second. 
Now, I don’t know about you guys, but whenever I don’t see a zero starting in my CPU time, I got a problem here, right? To me, that is WAY too much CPU for an individual logical unit of work. So that kind of bothers me. So I was kind of focusing on the ones that didn’t start with one “0”. I actually prefer the ones that start with two zeros, but you don’t always see those – sometimes these applications are pretty busy. Now in this particular case, I modified this summary report to show me things like dispatch wait time, file control wait time, and then total file control requests. I don’t know about you guys, but unless I’m really doing some massive processing, I find it hard to believe that the file control requests, on average, should be thousands of requests. Something is kind of wrong here.
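The "bang for the buck" selection described here can be sketched as ranking services by total CPU (count times average CPU) and flagging any whose average CPU does not start with a zero after the decimal point. The two transaction counts and the eighth-of-a-second figure come from the talk; the first service name and its CPU value are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class ServiceRow:
    service: str
    count: int
    avg_cpu_seconds: float

    @property
    def total_cpu_seconds(self) -> float:
        return self.count * self.avg_cpu_seconds


def rank_by_total_cpu(rows):
    """Services ordered by total CPU consumed, largest first."""
    return sorted(rows, key=lambda r: r.total_cpu_seconds, reverse=True)


def cpu_heavy(rows, threshold=0.1):
    """Services whose average CPU per transaction is 0.1 s or more."""
    return [r for r in rows if r.avg_cpu_seconds >= threshold]


rows = [
    ServiceRow("wservice1", 44_309, 0.002),  # count from the talk; name and CPU hypothetical
    ServiceRow("wdecision", 31_634, 0.125),  # count and CPU as quoted in the talk
]

for r in rank_by_total_cpu(rows):
    print(f"{r.service:<10} {r.total_cpu_seconds:10.2f} CPU seconds in total")
```

Ranking by total rather than by count is what moves wdecision ahead of the busier but cheaper service.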
Now, I will tell you, I’ll give you the end result of this application, so I won’t leave you hanging here, so to speak. But in this particular case, one of the problems they ran into was even completely different than the application running itself. Part of the problem was that the customers that were running the web-based front-end, they tended to click the submit button too many times. And the way the application was designed, it would actually submit multiple same requests into the world of CICS. And the solution to the problem was, instead of running the whole transaction, which could do thousands of I/Os, under the covers we did a little bit of mainframe-based caching to figure out if it was exactly the same request as we saw before. And we returned the web response that we did in the past. As long as the request was made within – I forget how many seconds it was – we were actually able to save 20%. You save a lot more in terms of CPU and response time if you can modify an individual application and improve the performance there. So again, this gave me an idea of where to start to do some of that deep dive sort of analysis.
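The caching idea can be illustrated with a small time-limited cache keyed on the request. This is a generic Python sketch of the technique, not the customer's implementation, which ran inside CICS; the class name, the five-second window, and the request key are all assumptions.

```python
import time


class RecentResponseCache:
    """Replays the earlier response when an identical request repeats within a window."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can control time
        self._entries = {}           # request key -> (stored_at, response)

    def get_or_compute(self, request_key, compute):
        now = self._clock()
        hit = self._entries.get(request_key)
        if hit is not None and now - hit[0] <= self._ttl:
            return hit[1]            # duplicate click: skip the expensive work
        response = compute()
        self._entries[request_key] = (now, response)
        return response


calls = []


def expensive():
    calls.append(1)                  # stands in for thousands of file requests
    return "response"


fake_now = [0.0]
cache = RecentResponseCache(ttl_seconds=5.0, clock=lambda: fake_now[0])
cache.get_or_compute("request-1", expensive)   # first click runs the work
fake_now[0] = 2.0
cache.get_or_compute("request-1", expensive)   # repeat within 5 s is replayed
fake_now[0] = 10.0
cache.get_or_compute("request-1", expensive)   # window expired, runs again
print(len(calls))
```

A real version would also need to bound the cache size and decide which requests are safe to replay.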
[00:51:59] – Speaker – Ezriel Gross
(Slide 43 – Visualize the Data using an Analytics Engine) So, one of the other things you’ll run into today is a lot of people don’t like tabular reports anymore. So this is just an example: CICS Performance Analyzer has this SupportPac called CA10, which allows you to provide the data not just in tabular format, but in JSON format. And we provide some sample dashboards that you can look at even if you don’t have CICS PA. You can go to that link, you can download a sample copy of the dashboards as well as some sample data to load into the dashboards. So you can actually take a look at the information yourselves and see the same sort of information that I’m looking at here.
[00:52:36] – Speaker – Ezriel Gross
(Slide 44 – Visualization with Splunk or Elastic Dashboards) So this was the sample Splunk application, and this was me running it with the sample data. And I was just taking a look at some of the individual visual reports that were available to me. You notice transactions per second – I can see that over my entire environment, over individual APPLIDs for individual transactions – some overall metrics for the entire system. And then again, what I’m looking at is individual tables, reports. This is average response time by APPLID, this one’s average CPU time by APPLID. And obviously I could change out the various different APPLIDs and look at them individually. So again, this is reporting over a longer period of time. And this is the assumption that you don’t want to keep the SMF data forever, and maybe you just want to keep a subset of it and you want to port that down to an analytics engine to do long-term reporting across.
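As a rough picture of what an analytics engine does with JSON-formatted performance records, the sketch below averages a metric by APPLID. The field names (`applid`, `tran`, `response`, `cpu`) and all values are invented for illustration and are not the actual CICS PA JSON layout.

```python
import json
from collections import defaultdict

# Hypothetical records, one JSON object per line.
SAMPLE_LINES = """\
{"applid": "CICSAOR1", "tran": "ABCD", "response": 0.25, "cpu": 0.125}
{"applid": "CICSAOR1", "tran": "ABCD", "response": 0.75, "cpu": 0.125}
{"applid": "CICSAOR2", "tran": "EFGH", "response": 0.5, "cpu": 0.25}
"""


def parse(lines: str):
    """Turn newline-delimited JSON text into a list of dictionaries."""
    return [json.loads(line) for line in lines.splitlines() if line.strip()]


def average_by(records, key, metric):
    """Average of `metric` for each distinct value of `key`."""
    totals = defaultdict(lambda: [0.0, 0])
    for record in records:
        bucket = totals[record[key]]
        bucket[0] += record[metric]
        bucket[1] += 1
    return {k: total / n for k, (total, n) in totals.items()}


records = parse(SAMPLE_LINES)
print(average_by(records, "applid", "response"))
print(average_by(records, "applid", "cpu"))
```

The same grouping by transaction ID instead of APPLID would give the per-transaction view shown on the next slide.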
[00:53:32] – Speaker – Ezriel Gross
(Slide 45 – Visualization with Splunk or Elastic Dashboards) And so these are just a number of other reports. Average response time by transaction. Here’s one where you can actually decide how you want to graph it. You can graph response time, user CPU time, and dispatch time across the screen. And obviously those lines are so flat you can’t even see them here. So obviously the left hand side of this needs to be changed from seconds to something else. But again, these are just sample dashboards, so you can build all the ones that you might want yourselves. You can even do your own WAIT analysis report down here and see over a period of time how long you were waiting and for what period of time and based on what types of WAITs that were occurring in your system. So sometimes it’s a lot easier to visualize the data, especially if you want to report it to upper management – sometimes they love to see these pretty little graphs.
[00:54:22] – Speaker – Ezriel Gross
(Slide 46 – CICS Performance Management Tools) So that is what I use for verification.
[00:54:38] – Speaker – Ezriel Gross
(Slide 47 – Rocket C\Prof) So let’s talk about the last product. The last product is Rocket C\Prof. It’s fairly new, although it’s been around for a number of years. It was originally a Fundi Circle product, and when we got bought out by Rocket, it became a Rocket product. And it’s gotten a lot of interest lately again, especially because the resource levels that are available in terms of people and time to actually do problem analysis is less and less. So if you can have something to do some deep dive analysis, this is just one of the tools that you can use for deep dive analysis within the world of CICS.
[00:55:16] – Speaker – Ezriel Gross
(Slide 48 – What is C\Prof) So what makes it different than any other tool I’ve used in the past is it’s a completely different approach that uses less CPU. Obviously you need some elements of Trace enabled – you don’t need all of Trace enabled. You can limit the number of trace elements that are running. You can actually have C\Prof turn on and turn off the trace elements at will, when needed, to actually take a look at the problems that are occurring. But hopefully it makes it inexpensive to capture Trace. And it doesn’t require that you snap a dump to get a dump of the physical environment itself. You don’t need a transaction ABEND, you don’t need to run AUX Trace. This runs completely external to the CICS system.
[00:56:00] – Speaker – Ezriel Gross
(Slide 49 – How does it work?) So just a couple of highlights for the product itself. It allows you to peek inside CICS, but CICS doesn’t know it’s occurring. It uses cross-memory services to collect the information in the Trace table. And that’s quite useful because it doesn’t add to the overhead of running CICS itself – that could be quite important. It uses internal trace, it can format it in pure auxiliary trace format so you can ship that off to IBM for problem analysis. And again, how often you collect Trace, how much Trace you collect, is really up to you. Another big feature is – notice I have highlighted supports regions using MRO just like you saw within OMEGAMON for CICS – I can map a transaction at its starting point, and then I can see all the other transactions started in supporting regions. The way C\Prof runs is he collects that data from multiple regions at the same time, collects that data and aggregates it together so you can see it from the top down. You can see where the transaction started, where it started another transaction in another region, what it did over there, when it came back, when it accessed the next application. And you can drill down all the way to the CICS command level.
[00:57:10] – Speaker – Ezriel Gross
(Slide 50 – Highlights: C\Prof) The last piece of this that I find is really interesting is the fact that we give you this information in a higher level format. If you’ve ever read Trace, it’s difficult to learn. I’ve been doing this for, I don’t know, 30 plus years, and I will tell you that it took me quite a number of years to actually figure out what the individual elements of Trace mean. And so depending on the experience level, the hope is at the highest level, an application programmer or a junior Sysprog should be able to look at an element of Trace and know exactly what it’s doing, where it’s occurring and in what program. Then you could drill down to the underlying trace. And so the Trace is still there, but what we show you to start out is not the individual items within the trace level. Now, what levels of trace you collect, what you might need in an environment may be more or less depending on what you’re running in the system.
[00:58:06] – Speaker – Ezriel Gross
(Slide 51 – Multiple Trace Capture Modes) So there’s a couple of different ways to run this. There’s Record, which allows you to record the application trace moving forward. Think of it like AUX Trace but not running in CICS, and it’ll give you the application perspective of trace. Now, we keep the data in our own layouts and format. Sometimes you want the data in an AUX format so you can actually record to AUX Trace data sets. Not CICS AUX Trace data sets, but basically data sets that look like AUX Trace, which means you can run the standard IBM utility, DFHTU followed by the version, release, and modification level, to format and look at that trace data. If you need to ship it to a vendor, this is the perfect way to collect it. There’s also a way to take that data out of AUX Trace and load it into your viewer. Whether you’re using the trace from a system dump or an AUX Trace you collected separately, you can always look at it using our particular viewer. Then there’s this concept of a history of what happened in the world of CICS, right? So, for example, I get this all the time. A customer tells me “Somewhere around 10:00 during the day, we get this application message, or we get this system message like a short on storage condition, and we’d love to know what was going on in the environment at that particular time.” So we have this thing called Snap, which says dump whatever’s currently in the trace table. If your trace table is large enough, which you can make it as large as you want, it lives above the bar, so who cares. As long as you’re not seeing excessive paging, I would make it pretty big. And then when you dump what’s in the trace table, you get an idea of what was running when a problem occurred, which is quite nice, right? But it has to be keyed off of some sort of a message, because in truth, if you go in there and try and snap it yourself, which you could do, chances are pretty good the problem will be long gone by the time you get to dump what’s in the trace table.
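The Snap idea, a wrap-around trace table whose current contents are dumped when a particular message appears, can be pictured with a bounded buffer. This is a conceptual Python sketch only; the class, the trigger text, and the event strings are invented and say nothing about how C\Prof or the CICS internal trace table are implemented.

```python
from collections import deque


class TraceTable:
    """Fixed-size wrap-around table: the oldest entries are overwritten first."""

    def __init__(self, capacity: int):
        self._entries = deque(maxlen=capacity)

    def write(self, entry: str) -> None:
        self._entries.append(entry)

    def snap(self):
        """Copy whatever is in the table right now."""
        return list(self._entries)


def run(events, table: TraceTable, trigger_text: str):
    """Write each event to the table and snap it when the trigger text is seen."""
    snapshots = []
    for event in events:
        table.write(event)
        if trigger_text in event:
            snapshots.append(table.snap())
    return snapshots


# Hypothetical event stream; a small capacity keeps the wrap-around visible.
events = ["entry 1", "entry 2", "entry 3", "SHORT ON STORAGE message", "entry 5"]
snapshots = run(events, TraceTable(capacity=3), trigger_text="SHORT ON STORAGE")
print(snapshots)
```

With a capacity of three, "entry 1" has already been overwritten by the time the trigger fires, which is the reason for making the real table as large as paging allows.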
[00:59:01] – Speaker – Ezriel Gross
(Slide 52 – C\Prof translates this…) So this is what standard full trace looks like. And again, if you’ve looked at this before, you probably want to throw up, right? It’s really difficult. You have to look up each individual trace point ID to figure out what these individual elements are and it makes it really hard. If you’re looking at full, or even abbreviated trace, which is one line per entry, it’s really difficult to analyze and figure out what CICS is doing.
[01:00:22] – Speaker – Ezriel Gross
(Slide 53 – So…Into a consolidated form like this) What we’ve done is we’ve kind of built this application event screen, which summarizes the trace for you. Now, let me just point out a few things here because this is kind of interesting to see. You’ll notice that this is the relative response time of the transaction from start to finish so you could see how it ran. But then you could see the APPLID and you could see that this particular transaction went across multiple different CICS regions. In fact, you can even see the task numbers in the individual CICS regions where parts of this application ran. You can see the program that was in control at the time you issued the request, the elapsed time of this individual command, and what the call type was, any individual resource associated with the call, and then the response time, I’m sorry, the EIBRESP (and RESP2 is available), and the TCB it ran under. And then this is kind of cool. It’ll give you the statement number or the offset depending on which one you’d prefer to look at, inside of this program where this command was issued. Now, should you decide to look at the physical trace, you could drill down to the individual trace just by selecting one of these, or selecting a number of these, and looking at the underlying trace. For example, if you took a look at a REWRITE command, even though we’re summarizing it into one line here, chances are if you go down and select it, it’s probably made up of a couple of hundred different trace entries. Which you might want to look at. But at least this gives you a summary and from here you could drill down to the physical trace. And again, this runs all external to CICS. So other than the cost of running trace, it will not impact the running of your CICS system.
[01:02:00] – Speaker – Ezriel Gross
(Slide 54 – Or this…) And then for those people that do all their development via the web, there’s certainly a web user interface that you can do the same sorts of things that you could do on a 3270 terminal. And again, you can also drill down to the trace itself by clicking the down arrows from here.
[01:02:15] – Speaker – Ezriel Gross
(Slide 55 – CICS Performance Management Tools) So this is what I kind of use for my deep dive analysis. Because programs run in multiple different regions – one application could be running, like I said, 60 or 70 different programs. And what I usually have to do is drill down, find the individual program, I have to see if it’s using too much CPU, I have to see if its response time is really terrible. Maybe it’s doing, I don’t know, a couple of thousand VSAM reads, and that would be the place that I would start for performance tuning. In terms of problem analysis, this is really good because it’ll tell you where your ASRAs are. It’ll show you within which programs they may occur. And then problem analysis is not always associated with a failure of an application. Sometimes it’s associated with how long the application is physically running. So that is a little bit about how I do problem analysis and performance tuning using my Detect, Verify, and Solve method. I have used this quite frequently, even within the last couple of weeks for a number of customers, and it works really well on finding performance-type problems. And certainly if you have ABEND situations or problem analysis type work that you need to do, you can use this process for that as well. So now I’ll open it up to any questions.
[01:03:52] – Host – Amanda Hendley
(Slide 56 – Summary. Questions?) We do have a couple of questions that came in.
Yeah, I’m reading one from Alan. “Under the quasi-reentrant TCB, hard operating system WAITs lock out all other users until the WAIT is satisfied and are to be avoided. In a thread-safe environment, are operating system WAITs acceptable from a performance standpoint since only the current thread is in WAIT?” So, Alan, the answer to that is probably mostly yes. If that’s the case and your application is totally thread-safe, then you probably want to make it CONCURRENCY(REQUIRED). It starts its processing on the L8 TCB, which is quite important even when you have a thread-safe application. If it issues any non-thread-safe commands, it has to queue up for the QR TCB. So I’ve seen an environment where an application is physically thread-safe, but because it’s doing so much TCB switching, it’s always waiting on QR, and that becomes another WAIT that can occur. And so nonetheless, yes, if you’re completely thread-safe, the application is allowed to WAIT. CONCURRENCY(REQUIRED) is specifically designed for that. So I would take a look at that, not CONCURRENCY(THREADSAFE), but CONCURRENCY(REQUIRED) for a particular program. If you make it required, then that application is supposed to be able to go and do operating system type WAITs and hang up that TCB. Now, keep in mind, there is still a limited number of L8 TCBs within a CICS environment as well. And so depending on how popular that application is, you can run into other system type issues. So it’s something you can look at. But nonetheless, I’d be careful. Anyway.
Next Question: “How are we able to collect trace information without an increase in overhead in the CICS region?” Well, trace is granular, so I will tell you that if you run the minimum levels of trace that I use, for example, in terms of C\Prof, there’s probably a 5% hit, and that is only while trace is turned on. So C\Prof can turn it on, collect the trace, and turn it off. There are other ways to do that as well. For example, there’s standard trace and there’s special tracing, so I can set it up so I’m only collecting trace for a particular set of transactions. So now, instead of seeing an overall hit for the entire region at, let’s say, 5%, I could say I only want to trace these three transactions. And since they’re only 10% of my workload those are the ones that are going to see the increase in overhead of, let’s say, 5% at the time. So what is 5% of 10%? It’s not even measurable. So, you do need to be able to run trace for a short period of time. IBM will require it from time to time anyway, so nonetheless, sometimes taking a small hit is worth it if it’s able to solve a problem for you.
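Editor's sketch of the arithmetic in that answer: if trace costs roughly 5% and special tracing limits it to transactions that make up 10% of the workload, the region-wide cost is the product of the two. The function name `region_overhead` is hypothetical, and the model assumes the overhead scales linearly with the traced share of the workload, which is a simplification.

```python
def region_overhead(trace_cost: float, traced_share: float) -> float:
    """Approximate region-wide overhead when only part of the workload is traced.

    trace_cost:   fractional overhead on traced work (0.05 means 5%)
    traced_share: fraction of the region's workload being traced (0.0 to 1.0)
    """
    if not 0.0 <= traced_share <= 1.0:
        raise ValueError("traced_share must be between 0 and 1")
    return trace_cost * traced_share


if __name__ == "__main__":
    standard = region_overhead(0.05, 1.0)   # standard trace: whole region
    special = region_overhead(0.05, 0.10)   # special trace: 10% of workload
    print(f"standard trace: {standard:.1%} of region capacity")
    print(f"special trace:  {special:.1%} of region capacity")
```

Under these assumptions the special-trace case works out to about half a percent of the region, which is the sense in which the speaker calls it not even measurable.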
Next Question: “OMEGAMON CICS history is presumably recorded on two or more collector files that are reused as they fill. Can full collector files be offloaded for analysis possibly days later?” Not at this time. And that’s why I collect the SMF data and that’s why I look at the trace data for further analysis. That’s the problem with OMEGAMON history right now is it can’t be offloaded for longer periods of time. It’s really meant for task history. For short-term analysis, the best you can do is make it as large as possible, and hopefully within your time-frame, you can look at that level of data. Otherwise the SMF data has some level of detail, even with the new program interface, which will give you details on the program level information. That data can be cut to SMF using OMEGAMON SMF 112 records – type 202. And so if you wanted to keep much of what you see in CICS task history within OMEGAMON for longer periods of time, I would look at recording it off to SMF.
Next Question: “Do the files trace should have a specific size for this tool?” Not specific, but obviously as large as possible. The bigger problem you run into is if your system is so busy and the trace table is wrapping very quickly, sometimes it’s hard for us to keep up with the recording of it, but there are various different tools we use to ensure that we can collect that data. Personally, I’ll be honest with you, if I were making the recommendation today, I would tell customers that their trace table, especially if they have trace turned on, should be 100MB or higher, assuming you’re not seeing any paging, so you have enough real storage to support whatever you’re using in terms of memory. The more data you get in trace, the better off you are if a problem occurs, because you can solve them relatively quickly. The smaller the trace table size, the faster it wraps and the less long-term detail you get. IBM has increased the size, I think, to 22MB. I tend not to run ever with less than 32MB. And like I said, it’s above the bar. It doesn’t affect or impact the region size so I don’t see any reason why you shouldn’t be using a trace table of 100 to 300MB. The more data in there, the better off you are. The only impact that there is is paging. And I should also say system ABENDs: if a CICS system ABENDs and comes down, obviously taking a dump takes a little bit longer, because it now has to record all that trace information out to DASD. But other than that, there’s no impact on the size. There’s more or less an impact when you’re actually collecting the trace itself.
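Editor's sketch: the relationship between trace table size and how quickly it wraps can be estimated with simple arithmetic. The average entry size and the entry rate below are hypothetical placeholders, not measured CICS values, and `seconds_before_wrap` is an illustrative name. Real values would have to come from your own region.

```python
MB = 1024 * 1024


def seconds_before_wrap(table_mb: float, avg_entry_bytes: float,
                        entries_per_sec: float) -> float:
    """Estimate how many seconds of history a trace table holds before wrapping."""
    if avg_entry_bytes <= 0 or entries_per_sec <= 0:
        raise ValueError("entry size and rate must be positive")
    capacity_entries = (table_mb * MB) / avg_entry_bytes
    return capacity_entries / entries_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 256-byte entries written 50,000 times a second.
    for size_mb in (32, 100, 300):
        held = seconds_before_wrap(size_mb, avg_entry_bytes=256,
                                   entries_per_sec=50_000)
        print(f"{size_mb:>3} MB table holds about {held:.1f} seconds of trace")
```

Whatever the real numbers are, retention grows linearly with table size in this model, so going from 100MB to 300MB triples the window available after a problem occurs.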
So any other questions?
[01:09:53] – Host – Amanda Hendley
I did just share one more question from Alan.
[01:10:00] – Speaker – Ezriel Gross
Yeah, we could take that offline, Alan, if you want to see how to add fields to an MCT table. Because there’s a couple of different steps. You have to create a user field, right, and then within the application program itself, you have to issue a command to start collecting that particular field as the transaction runs. So I could show you that. My email address should be available. Send me an email, happy to go through it with you individually.
[01:10:31] – Host – Amanda Hendley
Great, well, thank you so much for the presentation and for covering some Q and A with us. We do make these videos and the deck available on the website shortly, so you can access those and go through the session at your own pace. For today, a couple more items for you. So I’ve got a couple of articles about the mainframe market in general that I thought were interesting. These are QR codes, so if you want to snap a picture of them from your phone, you should be able to go to these websites. But we can also share these in the LinkedIn group as well. And then I wanted to let you know that we’re doing a call for contributors over at Planet Mainframe. So if you’re interested in getting something you’ve written published and visible to the entire mainframe community, consider sending it to Planetmainframe.com. And I wanted to also bring your attention, if you haven’t gone there, to the Broadcom Community area, which is just a wealth of resources. It’s places for you to post questions and answer questions and really meet more of your fellow mainframers at Community.Broadcom.com. So that’s a group that’s available to you as a resource as well and other ways to get involved.
So we’ve got a couple of announcements as far as our Twitter and our YouTube. We’ve really streamlined some things on Twitter and YouTube so that you can access all of the different user groups’ information in one place. So if you’re familiar with the CICS group, but you’ve also got some interest in the Db2 or the IMS area, you can check those out. And on Twitter it’s all going to fall under one Twitter name at this point. I mentioned our LinkedIn group. Please go join the CICS LinkedIn group so that we can get you any of these updates and information. You should be on the list, but we do a newsletter. So every other month when we’re not having this live meeting, we’re going to send you a newsletter that’s got a recap of the meeting and additional news and articles for you. So let’s see, with that, I’d want to thank our sponsors again, Broadcom Mainframe Software, IntelliMagic and Planet Mainframe, and announce our next meeting. So we’re going to be meeting on September 12 for “Using CICS Artifacts to Build Web Service APIs.” You can register at the QR code right there on your screen, but also it’s on the website already.
So we’re looking forward to seeing you in two months. Ezriel, again, thank you so much for joining us today.
[01:13:26] – Speaker – Ezriel Gross
You’re welcome. Thank you for having me.
[01:13:28] – Host – Amanda Hendley
All right, everyone, have a great rest of the week. Bye.