Virtual IMS User Group | April 2024
How to Help IBM and YOU Quickly Resolve IMS Problems
Steve Nathan
IBM
We all know IMS is perfect. But on the rare occasion when there is a problem, it must be solved quickly. This presentation will document how to make that happen.
Steve Nathan, IBM
Stephen P. Nathan has 38 years of experience as an IMS developer, application analyst, DBA, systems programmer, and performance tuner. He has worked for IBM in IMS Level 2 Support since 2003.
Read the Transcription
Amanda Hendley (00:00)
Well, thank you everyone for joining us for today’s virtual IMS User Group session. I am excited to have Steve Nathan here with us today. We’re going to have a great talk.
Amanda Hendley (00:12)
If you’re new here, we typically start our meeting with a pretty simple agenda, an introduction and welcome. Hi, my name is Amanda Hendley. Glad to have you here. I’m based in Atlanta, Georgia, where it is pretty overcast and rainy today, but we had a nice clear day yesterday for the eclipse, which was fun. We are going to have a couple of introductory things, then our presentation. There’ll be plenty of time for Q&A afterwards. I’ve got a couple of news items and articles for you to check out, and then we’ll talk about what’s next. With no further ado, let’s go ahead and get started. Along the way, if you have questions or comments, you can put them into the meeting chat. And I know a question that we typically get is, is the presentation going to be available, is the recording going to be available? The answer is yes across the board. And if you are not already signed up for our newsletter, I encourage you to get the newsletter. If you want to drop me a message in chat to sign you up, I can do that. Otherwise, you can sign up at virtualusergroups.com. The newsletter is going to give the announcements for the next sessions. It’s going to give you the recap articles and any other resources that are available. So let’s get going.
Amanda Hendley (01:39)
Before we move too much further, I want to thank our sponsor for this series, BMC. Obviously, they have a lot of great resources and tools, so please check them out. Tell them I sent you.
Amanda Hendley (01:56)
And then after today’s session, we have an exit survey. It’s going to come up quickly when you start to close out, but I would appreciate you doing our exit survey. It’s two questions, one about the session and one about future sessions. Just take a moment for us there, please.
Amanda Hendley (02:17)
Now we’re ready to get started. So, a pretty easy, painless intro. I’m going to stop my share so that Steve can start his as I introduce him. You should be able to go ahead and take over, Steve. While he’s doing that, I’ll let you know that he has 38 years of experience as an IMS developer, application analyst, DBA, systems programmer, and performance tuner, and has worked for IBM in IMS Level 2 Support since 2003. Welcome. Thanks for being here, and I’m excited for the session. Take it away.
Steve Nathan (02:56)
Hello, everybody. Hope you can all hear me. Hello to all IMS friends out there. This is going to be my yelling session. In IMS level 2, in IMS support, we’re trying to resolve the problems as quickly as possible, and hopefully this session will help us do that.
Steve Nathan (03:21)
We have our standard disclaimer.
Steve Nathan (03:25)
Here’s an agenda. There’s a lot of things you can set up before we ever get a problem. While the problem is happening is a very key part of it. After the problem, what you can do to resolve it. And then as a last resort, you can come to IBM. Then I have an appendix on the IMS Connect recorder trace.
Steve Nathan (03:51)
And we do know that IMS is perfect. We do, once in a while, have an ABEND or something like that. Most of the problems that you encounter and that you give to us are that things just aren’t working right, performance, things like that. So we’ll talk about that also.
Steve Nathan (04:11)
Okay. First of all, I want to thank Jeff Maddox, who has since retired from IBM, but he did several sessions called Making it Difficult for IMS Problems to Hide, and I’ve certainly taken a lot of bullets from him. And also Kevin Stuart, a wonderful IMS expert, who explained TIMEOUT and LOCKTIME and DEADLOK, which we’re going to talk about later.
Steve Nathan (04:40)
Okay, this is what I said before.
Steve Nathan (04:45)
So before the problem occurs, there’s a lot of things you can set up in your zOS and IMS and TCP/IP environment.
Steve Nathan (04:55)
The first thing you’re going to need, and you could take this back to management if necessary to show them, you’re going to need some tools. You need an IMS monitor, whether it’s MainView from BMC or OMEGAMON for IMS or TMON. You should have one of those tools. You should have a tool to analyze CPU. STROBE now belongs to BMC, and IBM has the Application Performance Analyzer. But you should have one of those two.
Steve Nathan (05:29)
You want to analyze logs and traces, not just IMS, but Db2, whatever. So BMC has the AMI Log Analyzer for IMS. And IBM has two products, the IMS Performance Analyzer and the IMS Problem Investigator. And you should have one of that set. You must have one of that set. If you’re using IMS Connect, you must have either the BMC Energizer or IMS Connect Extensions. So you can take this to your management if necessary.
Steve Nathan (06:15)
There are a couple of tools that I’m going to be referencing here. I have a link to the documentation. The first one is DFSERA10, if you’re not familiar with that. It’s an IBM tool. It comes with IMS. And it lets you go to your log and select and format log records. You can select them with a string or based on bit settings or data in fields.
Steve Nathan (06:46)
You can print the log. You can also use it to copy selected records. I use this for extracting certain records from a log and creating a mini log. And it can be used on any file, not just an IMS log. I’ve used this to browse image copies and try and find things. And the selection works on it, too. The selection is just offset and value. So you can do this with any file, not just an IMS log. It can be very useful. There’s a start after and a stop after parameter. So you could just look at part of a file. You don’t have to process the whole thing. The only little trick is if you don’t say STOPAFT=EOF and you have more than 16 million records, it’s going to stop because that’s the default. So always code STOPAFT=EOF whenever you’re using DFSERA10.
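The offset-and-value selection described above can be pictured with a small Python sketch. This is not DFSERA10 and does not reproduce its control statements; the function name, the zero-based offset, and the way `skip` and `stop_after` are counted are all assumptions made for illustration. The default limit is a rounded stand-in for the "16 million" mentioned in the talk.

```python
from typing import Iterable, Iterator, Optional

# Rounded stand-in for the default record limit mentioned in the talk
# ("more than 16 million records"); the exact utility default is not stated.
DEFAULT_STOP_AFTER = 16_000_000


def select_records(
    records: Iterable[bytes],
    offset: int,
    value: bytes,
    skip: int = 0,
    stop_after: Optional[int] = DEFAULT_STOP_AFTER,
) -> Iterator[bytes]:
    """Yield records whose bytes at `offset` (0-based here) equal `value`.

    `skip` records are passed over first, then at most `stop_after`
    records are examined. `stop_after=None` plays the role of "EOF":
    keep going until the input is exhausted.
    """
    for index, record in enumerate(records):
        if index < skip:
            continue
        if stop_after is not None and index - skip >= stop_after:
            break
        if record[offset:offset + len(value)] == value:
            yield record


if __name__ == "__main__":
    sample = [b"\x01AAA", b"\x50BBB", b"\x50CCC", b"\x07DDD"]
    # Select every record whose first byte is X'50', reading to end of input.
    for rec in select_records(sample, 0, b"\x50", stop_after=None):
        print(rec)
```

The point of the sketch is the silent cutoff: with a finite default limit, records past that limit are never examined, which is why the advice is to ask for end-of-file explicitly.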
Steve Nathan (07:53)
There is an exit routine called DFSERA70. You can look at the documentation. And with what I’ve listed here, you can select records based on a lot of different values, all the records for a PST or region ID, all the records for a PSB or a DBD. If you’re looking at a database, you can look at an RBA or a block.
Steve Nathan (08:22)
There’s more values here, all the records for a user ID. You can see all these values in here. It’s a very, very handy tool.
Steve Nathan (08:35)
The other program to look at that IBM provides is DFSDDLT0, the DL/I test program, and there’s a link to the documentation there. You can run it as BMP or batch, and it makes DB and DC calls from SYSIN control cards. You can code SSAs, you can code AIBs, you can repeat calls, you can compare results, and you can print the data only when the output is useful. For example, only print out the data if there’s a G status code or something like that.
Steve Nathan (09:15)
You can DUMP DL/I control blocks. You can punch selected control statements to create new tests. You can merge tests. And one other thing which is very handy, you can do WTOs and WTORs. So you can start one DFSDDLT0, issue a WTOR to hold it, start another DFSDDLT0 and have it run while the first one is waiting, and then reply to the WTOR and have the first one take off again. So this is really good for testing conflicts between programs. And as you’re going to see later on, it’s also good for analyzing database calls if there’s a problem.
Steve Nathan (10:05)
There are some zOS things that your zOS system programmer should set up to make it easier to find problems. Number one is make sure the system trace table is large enough to capture events. When we get an SVC DUMP, there is the system trace table, which I’ll show you an example of. It’s a table that zOS keeps in core. The default size is only 1M per CPU, and we really recommend 3M per CPU. With today’s large computers, that is not a problem. The way to set this up is in the COMMNDxx PARMLIB member, you just have to put this command “TRACE ST,3M” and that’ll get the system trace table to have enough entries that we can try and find data.
Steve Nathan (11:07)
Now, if you want to look at a SYSTRACE, and I recommend that you do that, just to play with it, if nothing else, go into an SVC DUMP and enter the command IP SYSTRACE. You can say TIME(LOCAL). And you can look at it. Now, there’s a lot of stuff in it. I’ve given you two links for documents that will help you decipher all the things in a system trace. And just for the fun of it, I have an example here of a system trace table. And you can see SVCs, you can see I/Os, you can see dispatches. There’s PSWs in there, so you can see who’s calling it. Somebody did a GETMAIN. Somebody did an EXCP. You can get down to really very, very fine detail using the SYSTRACE. So I recommend playing with it.
Steve Nathan (12:08)
The next thing that we have to make large enough is something called the master trace table. This keeps part of the current MVS SYSLOG in core. So when we get an SVC DUMP, we can see the latest job log entries and SYSLOG entries. The default is only 24K, and we want to make it 1000K. So in the SCHEDxx PARMLIB member, you should set this parameter, MT SIZE(1000K).
Steve Nathan (12:47)
Another thing which is very important is something called the Common Storage Tracker. Common storage, which is SQA and CSA, everybody can put storage in there, and we’re constantly analyzing CSA and SQA, and we need to keep track of who has allocated what. In your DIAGxx PARMLIB member, there is a parameter, VSM TRACK CSA(ON) SQA(ON). Make sure that is set. There’s negligible performance impact. The table is stored in the SQA, and we use this all the time in support. Not just IMS, everybody.
Steve Nathan (13:37)
Now, if you have an SVC DUMP and you have turned this parameter on, you can go in and display storage in various ways. You can ask for a summary or a detail and sort it in a lot of ways, by address, by ASID, etc. Now, when you have a private address space and it ABENDs or it ends, zOS cleans up the storage. But if you’ve allocated CSA or SQA and you haven’t explicitly freed that and your program ends, now you’ve orphaned that common storage. You could enter the command here. You see where the keyword is OG, meaning “owner gone”. And you can find all of the programs that have ended and left CSA and SQA dirty. Very important.
Steve Nathan (14:47)
This just shows, if I did IPCS VERBX ‘OWNCOMM SUMMARY’, the first thing you see is a grand total of how much is allocated in each one, SQA, CSA, ESQA, and ECSA. And a whole line for owner gone. So you can see a lot of people have left CSA dirty here.
Steve Nathan (15:14)
And then when you page down, you’re going to see one line per ASID. And AC means it’s active and OG means it’s owner gone. And then you can see how much CSA and SQA each address space currently owns or used to own. And now, if you do it and you say, I want detail, and in this case, I said by ASID and address, and I chose the IMS ASID, it’ll show you each piece of allocated storage in CSA and SQA. This address here, it starts with the zero, zero, so this is below the bar. So this is just CSA, and it’ll show you the return address of who got that storage. In IMS, it’s usually just one place. And then it shows you the first few bytes of what that storage looks like. So you can see a lot of what IMS or anybody else has allocated in common storage.
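To make the idea of the summary concrete, here is a minimal Python sketch of rolling up common-storage allocations by address space and by owner status. It does not parse real IPCS output; the `CommonAllocation` record and its fields are hypothetical, chosen only to mirror the columns described in the talk (ASID, storage area, length, and whether the owner is still active).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class CommonAllocation:
    """Hypothetical record for one piece of tracked common storage."""
    asid: int
    area: str           # "CSA", "ECSA", "SQA" or "ESQA"
    length: int         # bytes
    owner_active: bool  # False models the "owner gone" case


def summarize_by_owner(
    allocations: Iterable[CommonAllocation],
) -> Dict[Tuple[int, str], Dict[str, int]]:
    """Total bytes per (ASID, status) and storage area."""
    totals: Dict[Tuple[int, str], Dict[str, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for alloc in allocations:
        status = "ACTIVE" if alloc.owner_active else "OWNER GONE"
        totals[(alloc.asid, status)][alloc.area] += alloc.length
    return {key: dict(areas) for key, areas in totals.items()}


def orphaned_bytes(allocations: Iterable[CommonAllocation]) -> int:
    """Bytes still allocated whose owning address space has ended."""
    return sum(a.length for a in allocations if not a.owner_active)


if __name__ == "__main__":
    sample = [
        CommonAllocation(0x0041, "ECSA", 4096, True),
        CommonAllocation(0x0041, "CSA", 512, True),
        CommonAllocation(0x0052, "ECSA", 8192, False),
    ]
    print(summarize_by_owner(sample))
    print("orphaned:", orphaned_bytes(sample))
```

The owner-gone total is the number to watch: it is storage nobody will free until the next IPL.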
Steve Nathan (16:37)
Speaking of common storage, there was a new parameter in zOS a few releases ago called BESTFITCSA(YES). What happened was in a previous release of zOS, they changed how CSA and SQA were allocated, and they just chose the first chunk of storage that fit, as opposed to trying to look for a good spot to put it. And that became the default, and a lot of people were fragmenting the CSA and ECSA. So they added this parameter to go back to the old way, to use best fit. But for historical reasons, they didn’t want to change the default. So the default is NO. So please make sure that in your DIAGxx PARMLIB member, you set VSM BESTFITCSA(YES). Very important.
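The difference between first fit and best fit is easy to see in a toy model. The Python sketch below is only an illustration of the two placement policies on a list of free blocks; it says nothing about how zOS actually manages CSA internally, and the block addresses and sizes are made up.

```python
from typing import List, Optional, Tuple

# A free block is modeled as (start address, length in bytes).
FreeBlock = Tuple[int, int]


def first_fit(free_blocks: List[FreeBlock], size: int) -> Optional[FreeBlock]:
    """Return the first free block that is large enough."""
    for block in free_blocks:
        if block[1] >= size:
            return block
    return None


def best_fit(free_blocks: List[FreeBlock], size: int) -> Optional[FreeBlock]:
    """Return the smallest free block that is still large enough."""
    candidates = [block for block in free_blocks if block[1] >= size]
    if not candidates:
        return None
    return min(candidates, key=lambda block: block[1])


if __name__ == "__main__":
    free = [(0x1000, 4096), (0x3000, 256), (0x5000, 512)]
    # A 200-byte request: first fit carves up the big 4096-byte block,
    # best fit uses the 256-byte block and leaves the big one whole.
    print("first fit:", first_fit(free, 200))
    print("best fit: ", best_fit(free, 200))
```

In the toy example, first fit breaks up the largest free block for a small request, which is the fragmentation pattern the talk is warning about.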
Steve Nathan (17:43)
Okay, these are just some of the other parameters you can set in the DIAGxx PARMLIB member.
Steve Nathan (17:56)
Okay, we’re going to be taking SVC DUMPs, and we have to make sure there’s enough room for them. In the COMMNDxx PARMLIB member, there’s this command to allocate the amount of space. The default is only 500M, and with today’s address spaces, that will not work. We want to set that to 5,000M.
Steve Nathan (18:28)
Now, SVC DUMPs are extremely important, and you can get them two different ways. You can get them manually or you can set a SLIP. The commands to get these DUMPs can be prepared ahead of time. In the IEADMCxx PARMLIB member, you can pre-set up some commands for taking manual DUMPs. And then in the IEASLPxx PARMLIB member, you can set some SLIPs, have them ready to go, and then activate them without having to type them in. And you can specify wild cards for JOBNAMEs. And we’re going to look at some of these.
Steve Nathan (19:16)
Okay. Now, if you want to take a manual DUMP, the first thing is you would go to your zOS console and say DUMP COMM=(comments), and then you can give us some comments. And it would come back with an outstanding reply. And then you have to reply with all of the parameters for the SVC DUMP. You have to say “JOBNAME=”, and you have to say “SDATA=”. You have to type all this data in every time. Or you could put a member into the IEADMCxx PARMLIB member. You can say JOBNAME with IMS and a wild card, others with a wild card, and pre-code all the SDATA parameters. And then when you want to take the manual DUMP, just say DUMP COMM=(comments), PARMLIB=xx, and it’ll create the manual DUMP with the parameters that are in PARMLIB. So we recommend that you take the time to set up some DUMPs that you know you’re going to be taking.
Steve Nathan (20:27)
SLIPs. We’re going to ask you to set SLIPs. You should be able to set your own SLIPs and know sometimes that you want them without us telling you. You can put some of these skeleton members into the IEASLPxx PARMLIB member, then edit the PARMLIB member, and then say SET SLIP=xx and it’ll enable it. I’ll give you some examples. You can set up a SLIP on a system ABEND. So here’s an example. You say ID=, and ID is your own ID. COMP is the completion code, like 0C4 or 80A or 0C1. And you can have this set up in advance. And so if you get a bunch of 0C1s, say in an MPR, you can quickly edit and set the SLIP and get an SVC DUMP for us to look at or for you to look at. You can do the same thing on a user ABEND. The COMP just has a U in it. Or if you’re starting to get a bunch of error messages and you want to get us an SVC DUMP, you can set a SLIP on a message ID. So have this member in PARMLIB, edit it for the message you’re interested in, and then set the SLIP and get an SVC DUMP. These are all very important.
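One way to keep such skeletons consistent is to generate the command text from a small helper. The Python sketch below builds SLIP command strings for the two cases described, a completion code and a message ID. The helper names and the validation rules are assumptions for illustration, the message ID in the example is a made-up placeholder, and the exact operands on the presenter’s slides are not reproduced here, so check any generated text against the zOS SLIP command documentation before using it.

```python
import re
from typing import Optional


def _check_id(slip_id: str) -> None:
    # Simplified rule for this sketch: 1 to 4 uppercase letters or digits.
    if not re.fullmatch(r"[A-Z0-9]{1,4}", slip_id):
        raise ValueError(f"SLIP ID must be 1-4 alphanumerics: {slip_id!r}")


def slip_for_abend(slip_id: str, comp: str, jobname: Optional[str] = None) -> str:
    """Build a SLIP that takes an SVC dump on a completion code.

    `comp` is three hex digits for a system ABEND (for example 0C4 or 80A)
    or U plus four decimal digits for a user ABEND; X acts as a wildcard.
    """
    _check_id(slip_id)
    if not re.fullmatch(r"[0-9A-FX]{3}|U[0-9X]{4}", comp):
        raise ValueError(f"unexpected completion code: {comp!r}")
    parts = ["SLIP SET", f"COMP={comp}", f"ID={slip_id}"]
    if jobname:
        parts.append(f"JOBNAME={jobname}")
    parts += ["ACTION=SVCD", "END"]
    return ",".join(parts)


def slip_for_message(slip_id: str, msgid: str) -> str:
    """Build a SLIP that takes an SVC dump when a message is issued."""
    _check_id(slip_id)
    if not re.fullmatch(r"[A-Z0-9$#@]{1,10}", msgid):
        raise ValueError(f"unexpected message ID: {msgid!r}")
    return ",".join(["SLIP SET", f"MSGID={msgid}", f"ID={slip_id}",
                     "ACTION=SVCD", "END"])


if __name__ == "__main__":
    print(slip_for_abend("S0C4", "0C4", jobname="IMS*"))
    print(slip_for_abend("U113", "U0113"))
    print(slip_for_message("MSG1", "ABC123I"))  # placeholder message ID
```

Generating the text this way only saves typing; the commands still have to be placed in the PARMLIB member or entered at the console by someone authorized to do so.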
Steve Nathan (22:08)
Collecting the proper SMF records. We look at SMF records all the time, and this is a minimum that you should be collecting. I talked to the zOS performance support people, and they agreed to this list also. I’m sure you’re all collecting SMF 30 and SMF 7x, which are your RMF records. But make sure you’re also collecting SMF 79-15. These are IRLM long lock records. We’re going to talk more about them later. But please make sure you have these turned on for when you start getting database locking problems. SMF 98s are new. They’re called high frequency throughput statistics or workload interaction correlator. Names don’t mean anything. But what this collects is very detailed data, and zOS performance support can use these records to get extreme detail of what’s happening inside your system. So they’re new, but please turn them on. And then SMF 99 and 113 are things you should also have on all the time.
Steve Nathan (23:43)
Now, you want to avoid an ABEND S40D. When storage becomes full, zOS is going to try and create a DUMP, like for an 878 or an 80A. But zOS needs some storage of its own to create that DUMP. If the DUMP is successful, then the data is preserved and then IMS cleanup will get control, and IMS will clean things up. If there is not enough storage to create the DUMP, you’re going to get an ABEND S40D, meaning it terminated at end of memory. You don’t get the data in the SVC DUMP, and the IMS cleanup routines might not get control, and you may end up having to IPL your system. You may have things dirty. But you can reserve storage, so that zOS can take a DUMP, in a JES exit called IEFUSI. Make sure that your system programmer has done this.
Steve Nathan (24:53)
Also, if you set these two SLIPs in IEASLPxx, then they’ll catch the 80A or the 878 before the system tries to take the DUMP and gets the 40D. We recommend that these two SLIPs be set all the time. And you notice the keyword there is enabled. So that means these will be enabled when you IPL.
Steve Nathan (25:30)
We want to avoid 322 and 522 ABENDs, especially in MPRs, because when you get these, IMS may not clean up resources. Now, JES has an exit called IEFUTL, so that instead of giving a 322 or 522, it temporarily expands those limits and it avoids the bad ABEND. Now, IMS ships in its sample library something called DFSUTL, which is an IMS-specific sample exit. We recommend very strongly that you take this sample and implement it in your zOS system to avoid 322s and 522s for IMS regions.
Steve Nathan (26:29)
The same is true for a 722 ABEND, lines exceeded. This is fairly new, but IMS has supported this. JES has the IEFUSO exit, and now IMS has a new sample exit called DFSUSO that we ship in our sample library. Please make sure your zOS systems programmer implements this exit also to avoid 722 ABENDs, for which IMS may not clean up resources.
Steve Nathan (27:07)
In the IMS PARMs, there is a format option, FMTO. Make sure that is set to D. That way, IMS will produce a system DUMP if it has any error. And if for some reason the system DUMP can’t be taken, it will use the SYSMDUMP card as a backup. And I have a link here to the DUMP formatting options, where you really want to use D.
Steve Nathan (27:45)
You should have SYSMDUMP cards in your IMS regions, control, DLISAS, and DBRC, as a backup in case the system DUMP doesn’t work. You must specify DISP=MOD, and you have to scratch it and reallocate it after each use. In the IEADMR00 PARMLIB member, you can specify what the SDATA is for the system DUMPs. And if you want to send them to a GDG, we have an example here of what you could put into your regions. Getting a DUMP is very important.
Steve Nathan (28:37)
For the dependent regions, make sure there’s a SYSUDUMP. And in the IEADMP00 PARMLIB member, you can also put in some parameters, and these are the ones that we recommend.
Steve Nathan (28:56)
Please make sure you install the IMS DUMP Formatter. I want you to look at IMS DUMPs, too. The IMS DUMP Formatter is amazing. It’s got extensive options for formatting IMS and related address spaces in SVC DUMPs, including IMS Connect and ODBM, lots of things. I have a link here to how to install the IMS DUMP Formatter, and I recommend that you play with it in a sandbox system. Get an SVC DUMP of IMS and its regions and look at all the wonderful things in there. I’ll be showing you some of them later on in the presentation.
Steve Nathan (29:43)
IMS has table traces. And there are some of them that we recommend should be on at all times. They’re internal. They’re just going to trace to internal tables. There is very low overhead. But if these traces are on all the time and all of a sudden there is a problem, we could look at these internal traces and maybe help to analyze what the problem is. So in your DFSVSMxx PROCLIB member, we recommend that you specify these four traces with the keyword set to ON so that they’ll be on for internal tracing. Now, you can put them to external data sets by issuing this command and then saying OPTION LOG. There are other table traces we may ask you to turn on in special circumstances, but these four should always be on and internal.
Steve Nathan (30:50)
There are external trace data sets, DFSTRA01 and 02. You can allocate them via DD cards or you can have DFSMDA dynamic allocation members. Please make sure that you have these set up. If they do not exist and then you turn on a trace and say OPTION LOG, it’s going to write it to the OLDS, and this could affect performance. You really want these records to go to the trace data sets.
Steve Nathan (31:25)
Okay, let’s talk about LOCKTIME and DEADLOK. LOCKTIME is an IMS parameter. It’s specified in the DFSVSMxx PROCLIB member for online or DFSVSAMP for batch. DEADLOK is an IRLM parameter. Now, LOCKTIME in IMS corresponds to TIMEOUT in IRLM. Okay, what does this all mean? Well, thanks to Kevin Stuart, I think I can explain it.
Steve Nathan (32:08)
First of all, here’s a link to the documentation for enabling the IRLM lock timeout feature. This is the parameter, LOCKTIME=, and you have one for online and one for batch. And what this says is, how long should an online region or batch region be waiting for a lock before IRLM comes back and tells IMS, this guy has waited a long time for a lock.
Steve Nathan (32:40)
You’re setting a parameter in here to pass to IRLM to tell him how long to wait for a lock before he comes back. And what you can say is, if this guy has been waiting on the lock for too long, what do you want me to do with this address space? Do you want me to ABEND it or do you want me to give him a BD status code? And you can specify a LOCKTIME between 1 and 32,767 seconds. How long do you want any region to wait on a lock?
Steve Nathan (33:19)
Okay. Now, IMS will pass a timeout parameter to IRLM when it starts up. So you’re establishing the timeout value in IRLM. If LOCKTIME is not coded, the default is 300 seconds. If LOCKTIME is coded, IMS is going to pass the smaller of the online or batch values. Now, if a lock in IRLM waits longer than this LOCKTIME or this timeout value, IRLM just calls our lock timeout exit. And this is where we’ll check online or batch and how long it’s been waiting and if we really have waited too long. We can tell IRLM to reject the lock, and IRLM is going to issue this message.
Steve Nathan (34:22)
We’re also going to write these SMF 79-15 records, which are going to be invaluable for analyzing long locks. That’s why we want these collected. You can change the IRLM timeout value by command. And that’s going to help. And you can change the IMS LOCKTIME parameter. But IMS passes the timeout value to IRLM only when it starts up. So if you update the LOCKTIME value via IMS command, it only changes it in IMS, but it hasn’t notified IRLM of the change. So the IRLM timeout value is what was set when IMS came up or what you changed with the Modify IRLM command. So if you change the IMS one, also change the IRLM one at the same time.
Steve Nathan (35:33)
Okay, DEADLOK, that’s an IRLM parameter. IMS doesn’t tell IRLM what it is. It’s only in IRLM. IRLM checks for a timeout during DEADLOK cycles. So DEADLOK is usually much smaller than timeout or LOCKTIME. And IRLM checks for both deadlocks and timeouts, regardless of any IMS parameter.
Steve Nathan (36:10)
For many systems, a DEADLOK time of one second is good. You don’t want to wait too long. And IRLM only checks after two cycles. So even if you say DEADLOK one, he’s only going to look for it after two seconds. And another recommendation is that the IRLM timeout value be an integer factor of the online or batch timeout value in IMS. I know this is complicated. You can study it, and then you can send me questions, and then I’ll go ask Kevin.
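The relationships among these values can be written down as a few small checks. The Python sketch below only restates the rules as described in the talk (a 300-second default, a 1 to 32,767 second range, the smaller of the online and batch values being passed to IRLM, checking after two DEADLOK cycles, and the integer-factor recommendation). The function names are invented for illustration, and none of this is IMS or IRLM code.

```python
from typing import Optional

DEFAULT_LOCKTIME = 300   # seconds, used when LOCKTIME is not coded
MIN_LOCKTIME = 1
MAX_LOCKTIME = 32767


def timeout_passed_to_irlm(online: Optional[int] = None,
                           batch: Optional[int] = None) -> int:
    """Value handed to IRLM at startup: the smaller coded LOCKTIME,
    or the default when neither is coded."""
    coded = [v for v in (online, batch) if v is not None]
    for value in coded:
        if not MIN_LOCKTIME <= value <= MAX_LOCKTIME:
            raise ValueError(f"LOCKTIME out of range: {value}")
    return min(coded) if coded else DEFAULT_LOCKTIME


def earliest_deadlock_check(deadlok_seconds: int) -> int:
    """IRLM is described as checking only after two DEADLOK cycles."""
    return 2 * deadlok_seconds


def is_integer_factor(irlm_timeout: int, ims_locktime: int) -> bool:
    """True when the IRLM timeout divides the IMS LOCKTIME evenly,
    as the recommendation in the talk suggests it should."""
    return ims_locktime % irlm_timeout == 0


if __name__ == "__main__":
    print(timeout_passed_to_irlm(online=60, batch=120))   # smaller value wins
    print(timeout_passed_to_irlm())                       # nothing coded
    print(earliest_deadlock_check(1))                     # DEADLOK of one second
    print(is_integer_factor(30, 60), is_integer_factor(45, 60))
```

Because the value is only passed at startup, a check like `is_integer_factor` is most useful whenever either side is changed by command, so the IMS and IRLM values do not drift apart.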
Steve Nathan (36:52)
Okay. There are seven Common Service Layer (CSL) address spaces, and I’ve listed them here. DBRC using BPE is optional, but these are the seven CSL address spaces.
Steve Nathan (37:16)
They’re built on a set of common services called BPE, which stands for Base Primitive Environment. Now, BPE provides internal tracing for itself and for the CSL address spaces. And we use these internal trace tables all the time. They’re extremely useful. Now, the size of these trace tables is controlled by control cards. You have a BPECFG PROCLIB member, which is read any time any one of these address spaces starts up.
Steve Nathan (37:59)
And I’m highly recommending, if not yelling or screaming, that these trace level statements should be in the BPECFGxx member for every CSL address space, the same one. If you start up a DBRC address space and it reads a control card for CQS, it’ll just ignore it. So it’s easier just to have one set of values that turns the trace on high and has 300 pages. I will tell you that it will not affect performance and that you won’t run out of storage. So please have this set in all your CSL address spaces so that we can look at these traces if we need them. Very important. And when I run my report against them, I print out what the pages are and I can tell if this hasn’t been set.
Steve Nathan (38:59)
These are all the things that you can do before we’ve even gotten a problem to help minimize problems or collect the proper data when there is a problem. Now we’ve got a problem. If it’s just that IMS abended, that’s standard. We’ve probably taken care of that with the previous things. But most of the time what we get is, oh, things are running slow. We’re getting these error messages. So what happens when you have a problem?
Steve Nathan (39:38)
The first thing we highly recommend is that you have documented procedures for gathering the documentation. Make sure your operations know what to do when there’s a problem. Have a written checklist and set up automation to do as many things as possible.
Steve Nathan (40:00)
That can be very helpful. The first thing I yell at when there’s a problem is, Turn on the DC monitor. I say that all the time. And here’s the command. You can set it on or you can set it on and say stop automatically after a certain number of seconds. I would like at least two minutes. Okay, 60 seconds is okay. I’ll settle for 30 seconds. The DC monitor has a lot of good data, which is not in the IMS log. The IMS log is for integrity and for recovery. It’s not for performance. It’s not for problem analysis. We use it, but we really need that monitor on.
Steve Nathan (40:55)
The next thing is look at your online monitors. Look at your OMEGAMON for IMS or for Db2 or CICS or any other online monitors that you have. Do you see a bunch of regions waiting on locks? Look at the online monitors. And, and I’ve emphasized this in red, have your zOS team look at RMF Monitor 3 or whatever equivalent you have in your shop. Because it may show that IMS regions are waiting for CPU or waiting for DASD or other resource constraints. You can see this in real time, and it can be very, very helpful. Because if you don’t look at them, then when you open up a case, I’m going to ask you to send me the RMF data for when the problem happened, the RMF Monitor 3. But if you look at it right then, you can solve the problem without having to open a case.
Steve Nathan (42:03)
Okay, set a SLIP. Don’t wait for us to tell you. If you see a bunch of error messages constantly coming out, set a SLIP on the error message. If you see MPRs constantly abending on something, set a SLIP, get an SVC DUMP. We like SVC DUMPs. And you’ve seen how to do that.
Steve Nathan (42:35)
Okay, take an SVC DUMP. Don’t wait for us to tell you. Use the DUMP COMM command. I showed you how to do that. Okay, get all the IMS address spaces, the control region, DLISAS, DBRC, IRLM, any suspect BMPs, MPRs, IFPs, and other address spaces as necessary, Db2, MQ, RRS, things like that. And they should all be in one DUMP. You have a JOBNAME parameter and you have a job list parameter, JL=. Include all of these in the one SVC DUMP. It’s much easier for us to look at it all in one DUMP.
Steve Nathan (43:27)
If there’s a problem with the OTMA interface from IMS Connect or MQ, these are two OTMA traces that we’d like to have on. TRACE SET ON TMEMBER, and you give it the TMEMBER name, like the IMS Connect or MQ member name. This is going to write 6701 log records to the IMS log. And then TRACE SET ON TABLE OTMT, OPTION LOG, VOLUME HIGH. This is a trace table, and this is going to write them to the DFSTRAxx data sets. And also take an SVC DUMP of IMS, and make sure it’s got the control region and IMS Connect or MQ so I can look at all the OTMA control blocks for what’s happening at the time. This is very important.
Steve Nathan (44:23)
Okay, if you’re having problems with the application talking to databases, getting back bad status codes, whatever, turn on this trace. TRACE SET ON PSB. You give it a PSB name and the keyword COMP. And we’ll talk a lot more about that later.
Steve Nathan (44:44)
This is while the problem is happening. If there’s a problem with the application’s IMS DC calls, like gets to the IOPCB, or change, insert, purge to the IOPCB or ALT PCBs, and things aren’t working the way you want, or you are getting unexpected status codes, turn on this trace. TRACE SET ON PROGRAM, and you give it the PSB name. And we’ll talk a lot more about this trace later. I’m just trying to get you to do these things while the problem is happening without us telling you to do them.
Steve Nathan (45:28)
If there’s a problem with CPU or long waits, activate STROBE, activate APA.
Steve Nathan (45:36)
If things are going on in IMS Connect, turn on the IMS Connect recorder trace. There’s two of them. There’s the standard one or the BPE one. And all the details for this are in the appendix. So all I’m telling you here is, Turn on the trace in the appendix. It’ll tell you how and how to look at it.
Steve Nathan (46:13)
Okay, IMS Connect Extensions has traces. And it has what’s called a collection level and a trace level. If you’re trying to get detailed information, make sure the collection level is set to four. Make sure the trace level is set to two. And there are online commands to change this dynamically. And I’m sure BMC Energizer has a similar function. These can be very useful. TCP/IP packet traces. If you’re using IMS Connect and you’re seeing messages like HWSP 1415 or 1485, and we’ll talk more about this later, start a TCP/IP packet trace for the port that IMS Connect is listening on. I know at least one TCP/IP system programmer is listening, Perry, so turn it on. Okay, make sure your TCP/IP team is ready to do this. We’ll talk more about it later.
Steve Nathan (47:25)
LE ABENDs. If you’re getting user 40xx ABENDs, like 4038 and 4039, these are LE ABENDs, and what they’re trapping is real ABENDs, such as 0C1 or 0C4 or 0C7. And these 40xx ABEND DUMPs are usually not helpful. What we really want is the real ABENDs, 0C1s or 0C4s. So these LE options should be turned on in your applications. You want to say TRAP(ON,NOSPIE). You don’t want LE to intercept these ABENDs. And then you want to say TERMTHDACT(UADUMP). You want to get a DUMP when they happen. Or you can set a SLIP on the 0Cx ABENDs. Now, if you have an SVC DUMP and you’re trying to find out from the LE information what the real application ABENDs and registers were, I have a link here to the documentation for this APAR. It’s an informational APAR. That’s what the II stands for. And it’ll tell you, if you get one of these LE ABENDs and you have an SVC DUMP, how to find all the real DUMP information. This is very useful.
Steve Nathan (48:59)
Okay, you may want to stop an IMS region. Before you do that, please always take an SVC DUMP before stopping or canceling IMS. Now, there is this command, “IMSxxx,DUMP”, that gets you a user 20 ABEND. Please don’t do that, because IMS cleans up some stuff before it takes the ABEND, and then we lose it. For IMS dependent regions, you can do “/STOP REGION xxx”, which will just stop it, or “/STOP REGION xxx ABDUMP” to get a DUMP. Sometimes that will hang, depending on where you are with IMS. If you do “/STOP REGION xxx CANCEL”, it’ll get rid of the region. But if your region happens to be talking to IMS at the time, it can also bring down the IMS control region with a user 113. So in a test environment, maybe, but otherwise try not to use /STOP REGION CANCEL.
Steve Nathan (50:17)
As a last resort, take an SVC DUMP, and then you can cancel. Always cancel IMS with a DUMP, or use your OMEGAMON kill or the equivalent. But be ready to IPL.
Steve Nathan (50:32)
Hopefully, you never get here. Okay, now we want to get ready to send us the documentation, or for you to look at the documentation. Turn off all the traces. Switch the OLDS so that we have a fresh OLDS and we know all the data is on the previous OLDS. Copy the monitor data sets. Copy the recorder trace data set. Save all the information. Save the trace data sets. Save the RMF Monitor III data. We want to know we have all this when we go to look at the problem.
Steve Nathan (51:23)
The problem is over, and now we’re trying to analyze it. But first of all, you should be trying to analyze it. I’m going to be using IMSPA and IMSPI for some of these examples because that’s what I happen to use. I’m sure that BMC has similar reports and could show you how to use those. It’s the method that counts, not the tool. My motto is A-G-F: always Google first. Google knows about all the IMS manuals. So if you get an IMS ABEND, go into Google and search IMS and, for example, 0402, and it’s going to come back and take you to the documentation in the IMS manuals for that ABEND.
Steve Nathan (52:29)
Or if you get a message number, Google knows that. If you search on IMS and DFS554A, you’re going to get a link directly to the IMS manual for that message. So again, my motto: always Google first. That’s what I do, even in support. We have our own tools for finding things, but I like Google. Okay, Google knows about closed APARs. So suppose you see an 0C4 in DFSASK00. Go into Google and search on ABEND 0C4 DFSASK00. It’s going to come back and give you a link to this APAR. It doesn’t know about open APARs, but as soon as an APAR is closed, Google knows about it. Or, if necessary, read the DUMP directly. IMS has a dedicated team to maintain the IMS documentation. They are the best in the business, without a doubt. Every day, we’re updating these manuals. And here’s a link to the IMS documentation. Okay.
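The "always Google first" habit can be scripted if you want to paste a ready-made search link into a ticket or chat. The sketch below only builds the search URL from the terms mentioned in the talk; the helper name `google_search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def google_search_url(*terms: str) -> str:
    """Build a Google search URL from search terms such as "IMS" and "DFS554A"."""
    # Skip empty terms and join the rest with blanks, as you would type them.
    query = " ".join(t.strip() for t in terms if t.strip())
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_search_url("IMS", "ABEND", "0402"))
    print(google_search_url("IMS", "DFS554A"))
    print(google_search_url("ABEND", "0C4", "DFSASK00"))
```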
Steve Nathan (54:06)
We want to analyze the DC monitor data. You were very nice. You turned on the monitor. What are we going to do with it? Well, first of all, IMS provides this program, DFSUTR20, to process the data. Unfortunately, it doesn’t produce anything useful. Okay. I use IMSPA, or you can use the BMC tool to produce reports. And what I do is I produce all the reports in one execution of IMSPA, and I provided you the JCL and the control cards to do that.
Steve Nathan (54:50)
Here’s the JCL, and then these are the control cards. If you use these control cards, all of the reports are going to come to one DDName, the DDName on Reports, and then you can have everything in one place. And then you can scan through the report, you can find a PSB, you can find a database name, everything is there for you. Very important for analysis.
Steve Nathan (55:24)
If you had a problem with your database calls and you turned on the program trace... Okay, I’m sorry, with the DC calls and you turned on the program trace, you can use DFSERA10 to extract all those records. Okay, there’s the 6701 LA3A, which is the record written before the call, and the LA3B, which is the record written after the call. Unfortunately, it doesn’t format these calls very well.
Steve Nathan (56:05)
I recommend that you use IMSPI to format these records. In IMSPI, you can filter on records. So you can go to IMSPI and say, show me all the 6701s where the ID is LA3A or LA3B. And it’ll come back with a list of all of these 6701 records. And if you select any one of them, it’s going to show you in detail all of the IMS control blocks, the call function, the PCB, the status code, the I/O area, all sorts of good stuff. So you can see, all right, your program did a change call to something you weren’t expecting.
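To make the before/after idea concrete, here is a minimal sketch of the filtering logic in Python. It assumes the log records have already been extracted into simple objects with a record code and a trace ID; the `LogRecord` class and its fields are hypothetical and are not an IMSPI or DFSERA10 format. It also assumes the records come from a single region in log order.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    code: str        # log record type, e.g. "6701" (hypothetical field)
    trace_id: str    # trace ID, e.g. "LA3A" or "LA3B" (hypothetical field)
    detail: str = "" # anything else you extracted, e.g. call function


def dc_call_pairs(records):
    """Pair each LA3A (before the call) with the next LA3B (after the call)."""
    pairs = []
    pending = None
    for rec in records:
        if rec.code != "6701" or rec.trace_id not in ("LA3A", "LA3B"):
            continue  # same filter as described in the talk
        if rec.trace_id == "LA3A":
            pending = rec
        elif pending is not None:
            pairs.append((pending, rec))
            pending = None
    return pairs


if __name__ == "__main__":
    sample = [
        LogRecord("6701", "LA3A", "CHNG"),
        LogRecord("01", "", "input message"),
        LogRecord("6701", "LA3B", "status blank"),
    ]
    for before, after in dc_call_pairs(sample):
        print(before.detail, "->", after.detail)
```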
Steve Nathan (56:58)
Okay, if you had the PSB trace on, and your program was doing database calls and you didn’t understand why or how, you can use DFSERA10 with the exit DFSERA50, and that will format these trace records very nicely. Here’s the JCL for it. You select the 5F records. You use an exit routine of DFSERA50. And they’re going to go to the TRCPUNCH DD card. And what you get is every single DL/I database call that your program issued, in DLT0 control card format. It did a replace to relative PCB51. It did an insert. It gives the data. It did a get unique. You can see the SSA there. Your programmer says, I did this, and you can look at the cards and say, No, you didn’t. Very useful for analyzing database calls.
Steve Nathan (58:22)
One other report that is extremely useful is the IMSPA program trace report against the DC monitor. For every region, it’s going to trace every DL/I call, the time, the status, the I/Os. You can select by PSB or tran code or region against the monitor.
Steve Nathan (58:47)
And this is the start of what that looks like. It shows I did a get unique to the IOPCB. 302 milliseconds later, I did a get unique to this database, and I did one OSAM I/O. Okay? Later on, I did a get unique to a database. I got a GE status code. But to do that, I got 1, 2, 3, 4, 5, 6 VSAM I/Os and one OSAM I/O. Now, this doesn’t have the SSAs, but you can match this up to the program trace, or the programmer should know what he’s doing. There’s more from the program trace. It’s extremely useful for looking in detail at what your program is doing.
Steve Nathan (59:48)
Okay, deadlocks. Whenever there’s a deadlock, IMS writes a 67FF deadlock record to the IMS log. You can format them using DFSERA10 with these control cards or IMSPA with these control cards. And then you can look at this and you can figure out what these deadlocks are. The IMS manuals have a whole section on how to read a deadlock report, and I have a link to it here. Okay. Rich Lewis wrote an IMS Orange Book on IMS locking in incredible detail. It’s still available online, and I have here a link to that. So with these two links, you should be able to analyze your own deadlocks. Of course, if you have a problem, you can open a case, but play with it first.
Steve Nathan (01:00:56)
Okay, if you’re having CPU problems or waits and you’ve run STROBE or APA, run those reports or use the interactive panels for looking at those sample data sets. If you see something that implicates IMS, save that sample data set. If you are using APA, you can do what’s called an extract of that APA sample file and send that to the case, and we can look at it here. If you’re using STROBE, you’re going to have to send us the STROBE reports. We can’t look at the sample file. But nine times out of 10, the CPU is in the application anyway.
Steve Nathan (01:01:47)
If you’re looking at OTMA synchronous callout, ICAL, many customers are using this now. Without turning on any traces, IMS writes all of these detailed records for every ICAL. It shows you the RESUME TPIPE that the application did to get the ICAL messages, or the cancel RESUME TPIPE if it timed out in IMS Connect. It shows you that the message was sent. It shows you that the client ACKed or NAKed. It shows you that the client sent a response, and then it shows you that OTMA ACKed or NAKed that response. These records can be very handy for analyzing ICALs. You could format these with IMSPI; that’s what I use.
Steve Nathan (01:02:52)
And this shows the control cards to filter in IMSPI. There’s also a link to the IMS manual. We’ve documented in detail what these synchronous callout log records mean. So there’s a link to look at that.
Steve Nathan (01:03:18)
Analyzing a TCP/IP packet trace. If you want to look at it, you can use IPCS. First of all, you’re going to need an IPCS directory. And I just give you a job here to set that up for yourself. Everybody’s going to need a directory.
Steve Nathan (01:03:38)
And then this is the job to format the TCP/IP packet trace. These are some of the important fields that you’re going to see. There’s a source IP address and a destination IP address for this particular packet. And there’s the source port and the destination port; 5000 is probably IMS Connect. And then there are flags that say, what did this particular packet do? And these are what some of the flags are.
Steve Nathan (01:04:19)
If you see an ACK, this is a TCP/IP internal ACK, not an application ACK. If you see a SYN, that says, I’m starting a socket connection. A PSH says, I’m sending data. A FIN says, I’m ending a socket connection. And then a RST says the socket was reset, which is very bad if you’re trying to talk to IMS Connect. And these flags can be combined. An ACK SYN says, I’m ACKing this socket startup, or an ACK PSH says, I’m ACKing this data internally for TCP/IP. If you’re using AT-TLS, the data itself will be encrypted, but we’re really interested in the flow, not necessarily the data. You can get the data elsewhere in other traces.
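The flag combinations described above follow the standard TCP header flag bits, so a short Python sketch can show how one flags byte decodes into the names used in the talk. The formatted packet trace already prints the flag names for you; this is only an illustration of how they combine, and the `decode_flags` helper is hypothetical.

```python
# Standard TCP header flag bits paired with the meanings given in the talk.
TCP_FLAGS = [
    (0x01, "FIN", "ending a socket connection"),
    (0x02, "SYN", "starting a socket connection"),
    (0x04, "RST", "socket was reset"),
    (0x08, "PSH", "sending data"),
    (0x10, "ACK", "TCP/IP internal acknowledgment"),
]


def decode_flags(flags_byte: int) -> list:
    """Return the names of the flags that are set in a TCP flags byte."""
    return [name for bit, name, _ in TCP_FLAGS if flags_byte & bit]


def describe_flags(flags_byte: int) -> str:
    """Return a readable description of every flag that is set."""
    return "; ".join(f"{name}: {meaning}"
                     for bit, name, meaning in TCP_FLAGS if flags_byte & bit)


if __name__ == "__main__":
    for value in (0x02, 0x12, 0x18, 0x04):
        print(f"0x{value:02X}", decode_flags(value), "-", describe_flags(value))
```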
Steve Nathan (01:05:19)
If you see these messages, HWSP1415E or sometimes 1485E. Okay. IMS Connect is just the messenger. It did a TCP/IP read or write and got back a minus one return code from TCP/IP. It’s not IMS Connect’s fault. What it usually means is that your IMS Connect client ended the socket connection prematurely. So it should be investigated by your TCP/IP system programmer and the application team. And if you get stuck, you can open a case, but you can look at it yourself. The IP address is the IP address of the client, and then there’s an error code. The most common ones are 1121, which says that the IMS Connect client reset the socket or it did a time out. But you can Google the others, what these E return codes are. The ID is the client ID of the IMS Connect client that sent the message or that we tried to read from. If the ID is DELDUMMY, it means that the client opened up a socket and never sent in a message. IMS Connect accepted the socket open, turned around and did a read for the initial message so we could figure out who you are, and got the minus one return code immediately. And sometimes you can see hundreds and thousands of these because some misbehaving IMS Connect client is doing a connect and then immediately stopping. Okay, well, you have the TCP/IP address, go yell at them. If the ID starts with HWS, that’s the IMS TM Resource Adapter. If the ID starts with GMP, that’s the IMS TM service provider for z/OS Connect. Otherwise, it’s probably a roll-your-own IMS TM Connect application. But you should be able to work on these yourselves.
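The client ID rules above are simple enough to sketch in Python. The example assumes you have already pulled (IP address, client ID) pairs out of the messages yourself; that tuple shape, the function names, and the threshold are hypothetical, and only the DELDUMMY, HWS, and GMP rules come from the talk.

```python
from collections import Counter


def classify_client_id(client_id: str) -> str:
    """Map an IMS Connect client ID to the kind of client described in the talk."""
    cid = client_id.strip().upper()
    if cid == "DELDUMMY":
        return "socket opened but no message sent"
    if cid.startswith("HWS"):
        return "IMS TM Resource Adapter"
    if cid.startswith("GMP"):
        return "IMS TM service provider for z/OS Connect"
    return "probably a roll-your-own IMS Connect client application"


def noisy_clients(events, threshold=100):
    """List IP addresses with at least `threshold` DELDUMMY errors, busiest first.

    `events` is an iterable of (ip_address, client_id) tuples (assumed shape).
    """
    counts = Counter(ip for ip, cid in events
                     if cid.strip().upper() == "DELDUMMY")
    return [ip for ip, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    events = [("10.0.0.1", "DELDUMMY")] * 3 + [("10.0.0.2", "HWSABC01")]
    print(classify_client_id("HWSABC01"))
    print(noisy_clients(events, threshold=2))
```

The resulting list of addresses is what you would hand to the TCP/IP system programmer and the application team.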
Steve Nathan (01:07:44)
Analyzing virtual storage. I just threw some stuff in here. It’s very important if you’re running out of storage, or if you want to increase pools or buffers. Okay. You can use the General Storage Statistics report, which is part of the IMSPA IRUR report. It shows real storage and virtual storage and control region and DLI storage. Or you can use the EDA option of the IMS DUMP Formatter. Okay, I’ll show you those.
Steve Nathan (01:08:25)
If you’re using the DUMP Formatter, there’s an option called EDA. And then there’s a suboption called 5, which says SYS. And then there’s another parameter under that which says STATS. And it’ll show you, for example, the number of CPs and how much real storage you have.
Steve Nathan (01:08:52)
It’ll show you common storage right now. Okay? What the limit is, how much is currently allocated, and what percentage is allocated. You can see very quickly if you’re running out of common storage.
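The percentage on that display is simply allocated storage divided by the limit. As a small worked example in Python, with made-up numbers and a warning threshold that is an assumption rather than an IMS or z/OS recommendation:

```python
def percent_allocated(allocated_bytes: int, limit_bytes: int) -> float:
    """Return allocated storage as a percentage of the limit."""
    if limit_bytes <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * allocated_bytes / limit_bytes


def storage_warning(allocated_bytes: int, limit_bytes: int,
                    threshold_pct: float = 80.0) -> bool:
    """True when usage is at or above the (assumed) warning threshold."""
    return percent_allocated(allocated_bytes, limit_bytes) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical figures: 300 MB allocated out of a 400 MB limit.
    mb = 1024 * 1024
    print(percent_allocated(300 * mb, 400 * mb))
    print(storage_warning(300 * mb, 400 * mb))
```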
Steve Nathan (01:09:11)
It shows you the control region private storage.
Steve Nathan (01:09:18)
It shows you the DLI private storage. You can also see all of the CSA and SQA that IMS is using, because we create a CDE or an SDE for every piece of storage that we allocate. You can use the EDA option and say cdecom. And it’s going to create a list of every piece of storage that we’ve allocated in CSA and SQA, and it has a name. LSCD is a control block, or it’s a module name, or RDS or DCB. We have a DCB for the RDS data set. So you’re going to see every piece of storage that we allocated. And then at the bottom, there’s a summary of all of the storage that we’ve allocated in common. Now, IMS overall performance. You can produce many reports with one execution of IMSPA. The output should go to a data set, not to SYSOUT. The IRUR report has the most data. And browse all of these reports. Now, detailed analysis of these reports, that’s a topic for another day. I’m not going to go over them, but I will show you how I collect them.
Steve Nathan (01:10:58)
I do all the reports in one execution of IMSPA. You have your log input, you have your output data sets, and then you have one DD name for each one of the reports. So these are going to data sets. And then in the IPICMD, you put all these commands, so you’re producing all of these reports at once.
Steve Nathan (01:11:26)
And they’re worth browsing. Okay, well, now you’ve already solved like 90% of the problems yourself. And so for the 10% that you have to come to us for, here’s what you have to do. First of all, if you’re opening a case, make the explanation as detailed as possible, especially in the initial entry. We can start work faster and we can route it to the proper team. Include recent changes in the environment, exact time, exact resources, this tran code, this PSB name. Show the system built, if there was a DUMP. Make that initial entry as detailed as possible. Send all the documentation tersed, even text files, job log, SYSLOG, et cetera. If you don’t send it to us tersed, then we have to go through a couple of steps to get it to our z/OS system so that we can browse it.
Steve Nathan (01:12:42)
Send all these files without us having to ask for them: the DUMPs, the MVS SYSLOGs, SYS1.LOGREC, the IMS logs. Now, these logs must contain at least two IMS checkpoints. Otherwise, we can’t run some of our reports. All of the job logs, the trace data sets, not the reports, the data sets. Send files with very descriptive names, not IMSLOG1, IMSLOG2, or IMSLOG.T0102, T0105. We spend a lot of time looking through log data sets, trying to get them into order and figure out which IMS they belong to. So make those file names very descriptive, please.
Steve Nathan (01:13:40)
Okay, that’s the end of my story. Here’s the appendix. I’m not going to go through this. This is for your information. I go into great detail on collecting the IMS Connect recorder trace, and I show you exactly what’s in it, in case you want to interpret it yourself. Okay. And of course, I’ll take questions. And we’ll leave it up to the moderator to do that. How do I get out of this?
Amanda Hendley (01:14:21)
You’re welcome to drop questions in the chat, or you can come off mute and ask them directly.
Steve Nathan (01:14:29)
Close this.
Amanda Hendley (01:14:30)
Can you all hear me?
Steve Nathan (01:14:33)
Close this. I can hear you. Yes, we can hear you. Hi, Karen. Hi, Steve. I’m here.
Speaker 3 (01:14:51)
Hey, Steve, just saying hi.
Steve Nathan (01:14:53)
Hi, Paul. Hi, John. Good to see you again. See, I told you these are all my friends out there. I’ve been with IMS for 50 years, not 38, actually. So I know most of you.
Amanda Hendley (01:15:08)
That’s awesome.
Steve Nathan (01:15:09)
IMS, it’s a family. It really is.
Amanda Hendley (01:15:17)
I definitely recognize some names from the IMS user group for sure, and then also from the Influencer program that Planet Mainframe is running.
Steve Nathan (01:15:31)
Oh, yes, John, sushi. For those of you that go to SHARE, the IMS project goes out for sushi one night during SHARE. And I wish I could invite you all. We have a very good sushi place right here in Ithaca, New York, where I am. So if you get to Ithaca, I will take you out for sushi. John, you’re invited.
Speaker 3 (01:15:58)
Next time I’m up to visit my son in Buffalo, Steve, I’ll take you up on that.
Steve Nathan (01:16:02)
Good idea.
Amanda Hendley (01:16:04)
How was the eclipse?
Steve Nathan (01:16:07)
Oh, unfortunately, right now it’s beautiful. It’s bright and sunny. Yesterday, it was overcast, and for literally one second, there was a hole in the cloud. We saw part of the eclipse, and then it closed up again. Oh, dear. It got a little bit dark, and that was it.
Amanda Hendley (01:16:31)
Was anyone in a good pattern for it? Did anyone see it in almost totality yesterday?
Speaker 3 (01:16:43)
I have a good news story to tell, which Zee, who I see on the phone, might especially appreciate. For personal reasons, I happened to be in Manhattan yesterday. Between three o’clock and four o’clock, no one was working in Manhattan. Everybody was out on the street. Everybody was friendly. Nobody was pushing and shoving and arguing, as is common in Manhattan. Everybody was sharing eclipse glasses if somebody didn’t have them. I haven’t seen that much camaraderie in Manhattan among busy, busy people in a long, long time.
Steve Nathan (01:17:45)
Good to hear.
Amanda Hendley (01:17:57)
Well, I’ve got just a couple of slides. If we don’t have any questions, we can keep talking about whatever else, too. But I did want to share these. Again, our sponsor BMC, they have the BMC AMI data tool that you can check out. These are QR codes that are very scannable for you. We are running our most influential mainframers program over at Planet Mainframe, and it’s an opportunity for us to profile people who have been really influential in the space and influential to other people. It’s not a big competition or anything, but a chance for us to really take a look at some great folks out there. I was telling Dusty earlier, I’ve had such a good time reading everyone’s profiles and getting a chance to write about people through this program. Those are being released every Monday through Friday from Planet Mainframe, multiple profiles at a time, just because we had so many to do.
Steve Nathan (01:19:06)
You say if you’re talking to Dusty, good luck.
Amanda Hendley (01:19:11)
Then on the job board, there are a couple of opportunities, but I just threw up one opportunity where they’re looking for some IMS talent in the Chicago area. At Planet Mainframe, we are, of course, always looking for contributors that want to write or do podcasts or videos for the community. Just the same, I am looking for speakers for IMS. If you’ve got a presentation or tips or tricks you want to share, let me know. I’m just ahendley@planetmainframe.com, or you can find me on LinkedIn. Then we’ve got really great growing social media groups. I think our most popular is probably LinkedIn. You can go check us out at VUG LinkedIn for our virtual user group there. Again, thank you, BMC. Then the exit survey, again, I’ll just plug it. It’s easy to miss if you close out really fast, but it’s just two questions, one of those being, what is a future session you’d like to see? Last time we met, I asked you all what you thought about a session with AI and IMS, and there’s some interest there, so that’s something we’re pursuing, too. But any session that you’re looking for, let me know, and we’ll try to get one on the calendar.
Amanda Hendley (01:20:52)
Did we have any remaining questions or comments from today?
Steve Nathan (01:20:56)
Yeah. When will we have access to the recording of this session, and what about the presentation itself?
Amanda Hendley (01:21:05)
It’ll all be posted in about a week.
Steve Nathan (01:21:09)
All right, thanks.
Speaker 3 (01:21:11)
Yeah. Amanda, one other comment. Like Steve did, if you could actually put real links instead of the things that we have to use our phones for, that would be helpful. It looked like on one view graph, you actually had real links, but the one before that was the ones that we had. I find sometimes my phone works with those, and other times it doesn’t always.
Amanda Hendley (01:21:46)
Yeah, especially in the newsletter when we share these things, you’ll get real links, because you can’t just copy and paste off my screen, unfortunately. Actually, we should get on the phone with the Zoom folks, right? Because that’d be a pretty cool trick. But these will all be in the newsletter. And if you’re not signed up for the newsletter yet, you can drop your email address in chat and I’ll grab it, or shoot me a note, or just go to the website and sign up. Because when we don’t have a meeting that month, we do a newsletter that’s got a recap article, links for the video and everything, other news, announcements, and links like these. Some good resources there. I think overall, between meetings, you might get four emails from us in the course of two months. It’s not a lot of stuff, and it’s not a spam thing. All right. Well, great. Thanks, Larry. If there’s nothing else, Steve, thank you so much.
Steve Nathan (01:23:03)
My pleasure. I always like it.
Speaker 3 (01:23:04)
Yeah, thanks very much.