[00:00:00] – Amanda Hendley, Host
Again, as people are getting settled, thank you for coming to our virtual user group. My name is Amanda Hendley. I’m excited to have you here. I’m excited that Craig Mullins is here as well to talk with you today. We have a quick agenda. So this session is sponsored by IntelliMagic. I guess it’s IBM IntelliMagic now, and they are a partner. Please check out their solutions. I’ll share a link later where you can check out some recent news that they’ve released. And if you know of anyone that would like to be a part of this user group, we have a partnership opportunity open to be a virtual user group sponsor. And in addition to our Db2 group, we do have an IMS and a CICS user group. Over at Planet Mainframe, we’re celebrating COBOL all month this month. So I believe it’s next week. This week will be the third or fourth release of our series on COBOL. And then we’ve really enjoyed receiving a lot of anecdotes and personal stories about people’s experience with COBOL. There’s just such a love for the language. So check us out. And then next month is all about security.
[00:01:23] – Amanda Hendley, Host
So we’ve got some great stuff lined up there for you. If you want to contribute to Planet Mainframe, shoot me an email. Check us out at planetmainframe.com, and I’d love to publish what you’ve got.
[00:01:44] – Amanda Hendley, Host
I took care of that one. After today’s session, we have an exit survey. It’s quick; when you close out your browser or app, it’s just going to ask you two questions. I’d really appreciate your feedback. And with that… Oh, I have one promo. If you are planning to attend GSE UK, or you’re thinking about it, we do have a promotional code for your registration. You can save 10% with PlanetMF2024 when you register. So I’d love to see you there. You can come visit the Planet Mainframe booth. We’re also going to do some on-site video recordings while we’re there. And if you would like to be recorded as a part of our podcast and video series, just give us a shout. We’ll set something up. All right, Craig, now I’m going to stop my screen share and let you take it over.
[00:02:39] – Craig Mullins, Presenter
Okay. All right.
[00:02:42] – Amanda Hendley, Host
And Craig is the President and Principal Consultant of Mullins Consulting, and he is an in-demand analyst, author, and speaker with over three decades of experience in database systems development. He teaches, he writes, he does everything. I think I’ve seen Craig present several times. So, Craig, I’m going to let you take it away. I think this will be a great… I’m just really excited about your session, getting your first-person perspective.
[00:03:13] – Craig Mullins, Presenter
Well, thank you, Amanda. Everybody can see the screen. Can you see it there, Amanda?
[00:03:17] – Amanda Hendley, Host
Looks great.
[00:03:18] – Craig Mullins, Presenter
Great. So I want to thank everybody for attending this session today, and thank Planet Mainframe for inviting me. The name of the session today is Oh, the Things I’ve Seen: Db2 Stories and Best Practices. It’s a Dr. Seuss type of view into some of the things that I’ve seen during my days as a Db2 consultant. Now, hopefully, you’ll find this journey to be an interesting one. Along the way, we’ll try to have some fun, and we’ll talk about some actual Db2 activities and issues and bring it all back around to the best practices that can eliminate those things I’ve seen. If you take a look at the agenda here, you might be able to guess at some of the things that I’ve seen and we’ll be talking about, but maybe not all of them. Join me and we’ll start our travels by discussing Db2 RUNSTATS, or to put it in a Dr. Seuss way: one stats, two stats, old stats, new stats, dealing with outdated RUNSTATS. What have I seen with RUNSTATS? The Dr. Seuss way of saying it is: RUNSTATS aren’t run. Now RUNSTATS are old, but RUNSTATS are needed, or so we’ve been told.
[00:04:45] – Craig Mullins, Presenter
A lot of times, what I see out there is that RUNSTATS are not being run. Why is that? Well, there are a number of different reasons. Sometimes I see tables and tablespaces out there with the dreaded negative one. What that means is that RUNSTATS were never run. I see this not just on newer objects; recently at a site, there were tablespaces that were over 10 years old and still had negative-one stats. This is just a case of neglect. Sometimes RUNSTATS aren’t run because the team had been burned in the past. They were doing REORG, RUNSTATS, rebind, and got some bad performance. They get to a point where things are running okay, and they just stop running RUNSTATS. That’s really not a good idea because your data is changing and your environment’s changing. It’s better to keep running RUNSTATS, even if you only rebind sparingly or rarely. That way, at least the dynamic SQL has the current stats and a better chance of being optimized correctly. Another issue that I see more frequently than you might expect is when RUNSTATS are run, then some manual statistics get applied to get the access paths that the organization wants.
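For reference, a minimal sketch of a catalog query that could be used to spot the dreaded negative-one stats described above. The catalog table and columns are standard Db2 for z/OS objects, but the exact filtering is just one possible approach:

```
-- Find base tables whose statistics have never been collected (CARDF = -1)
SELECT CREATOR, NAME, DBNAME, TSNAME, STATSTIME
  FROM SYSIBM.SYSTABLES
 WHERE CARDF = -1
   AND TYPE  = 'T'
 ORDER BY DBNAME, TSNAME;
```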
[00:06:25] – Craig Mullins, Presenter
This may have happened a while ago, 5, 10, 15, even more years ago. They manually changed the stats and then put up the old stop sign saying, Never RUNSTATS again. We got this manually corrected. That’s not a good practice either. If you want to do that, a reasonable implementation would be to schedule RUNSTATS regularly, just as you did before, but then add a step to that RUNSTATS job, after the stats are collected, to make those manual changes. A bad implementation, which I, again, just recently saw, was to just stop running RUNSTATS, then lose the list of table spaces that had the manual stats applied. This breeds fear, uncertainty, and doubt, the FUD, so that RUNSTATS don’t get run again because we don’t want to go back to that problem we had in 1995. All of these are really unrealistic, unreasonable approaches to RUNSTATS, but they’re common out there. Okay, so maybe I got ahead of myself. What is RUNSTATS? Probably most of you here know what this is, but it’s a utility. It scans your data and updates metadata in the Db2 catalog about your database objects. It includes things like number of rows, number of pages, record length, and so on.
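Circling back to the reasonable implementation just described, that is, scheduling RUNSTATS as usual and then reapplying the manual overrides as a follow-on step: a hedged sketch of what that follow-on step might look like. The object names and the hand-tuned value are hypothetical, not from the presentation:

```
-- Step 1 (utility job): refresh statistics as usual, e.g.
--   RUNSTATS TABLESPACE MYDB.MYTS TABLE(ALL) INDEX(ALL)

-- Step 2 (SQL, run immediately after RUNSTATS): reapply the manual override
UPDATE SYSIBM.SYSCOLUMNS
   SET COLCARDF = 50000                -- hypothetical hand-tuned cardinality
 WHERE TBCREATOR = 'PRODSCHEMA'
   AND TBNAME    = 'ACCOUNT'
   AND NAME      = 'ACCT_TYPE';
```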
[00:07:58] – Craig Mullins, Presenter
The Db2 optimizer uses the statistics captured by RUNSTATS to determine the access paths to the data for your SQL statements, whether they’re dynamic or static. Real-time stats also exist, and they’re different. They’re the ones that should be used by DBAs and probably your automated tools, and they’re used to automate tasks, but they’re not used by the Db2 optimizer. Let’s quickly take a look at some of the stats out there. Some provide stats on tables, like cardinality, which is the count of the number of rows, plus the number of pages, the number of active pages, and compression information. Some stats are on indexes, things like the number of leaf pages, index levels, cluster ratio, and cardinality for the first column of the key and for the full key. Still other stats are going to provide data on your columns, such as the number of distinct values and high and low values. You can capture information on groups of correlated columns using RUNSTATS, too. If you’ve got two or more columns that are correlated, RUNSTATS can capture information for the group, not just a single column. Then there are the stats for special situations. You’ve got the non-uniform distribution statistics, sometimes called frequent value stats.
[00:09:43] – Craig Mullins, Presenter
That’s one of the special conditions. They’re useful for recording how data is skewed. If you don’t collect distribution stats like this, then Db2 is going to assume that your data is uniformly distributed. But that’s rarely the actual case. One example of non-uniformly distributed data would be US population by state. States like California and Texas and New York have much larger populations than Wyoming and North Dakota and Alaska. Another special situation is histogram stats. A histogram is a way of summarizing data that’s measured on an interval scale. A histogram is particularly helpful when you want to highlight how data is distributed, to determine if data is symmetrical or skewed, and to indicate whether or not outliers exist. A histogram is appropriate only for numerical variables because, as I said, it’s measured on an interval scale. An interval is a set of real numbers between two numbers that can include or exclude one or both of them. Histograms are generally useful with larger data sets. Unlike the frequency statistics, which are collected for only a subset of the data, histogram stats are collected over all values in a table space, so sometimes Db2 can improve your access path selection by estimating predicate selectivity from a histogram.
[00:11:41] – Craig Mullins, Presenter
You should consider collecting histogram stats to improve access paths for your troublesome queries that have range, LIKE, and BETWEEN predicates. In some instances, they can also help with equality, IS NULL, IN list, and other predicates. We reviewed there what types of stats you can collect. What’s a good rule of thumb for what to collect? Well, if we start at the beginning, you should be running RUNSTATS regularly if your data is changing, and most of your data changes. If it isn’t, maybe you’ve got a table with all the states in the US in it. That doesn’t change very frequently. If you run RUNSTATS on that once and leave it, you’re going to be fine. But most of your data is going to be changing. We want to be running RUNSTATS based on an interval of how frequently it changes. Always collect basic stats for tables and indexes. Then collect column stats for important predicates and ORDER BY clauses. Collect the non-uniform distribution statistics when your data is skewed. For example, our US state population, or maybe an easier example to remember is that cigar smokers skew male. That is, more men than women smoke cigars.
[00:13:20] – Craig Mullins, Presenter
Then collect correlation stats when two or more columns are highly correlated. The example here: city, state, and ZIP code. Then consider collecting histogram stats when data skews by range over a large amount of data, for things like a lunch rush period or the Christmas shopping season for a retailer. Here we’ll look at some quick examples of collecting what I just recommended. The first example shows RUNSTATS on all indexes of a table in the identified table space. Then we’re going to collect stats for named columns in sensitive WHERE clauses in a particular table. Here we see an example where the job column is skewed. That is, there are going to be more worker bees out there than there are going to be managers or CXOs. We capture that information using frequent value statistics. The image here is to remind us of that example I mentioned earlier, cigar smokers skew male. Here, we capture information on correlated data, city, state, and ZIP, and we use COLGROUP and FREQVAL in order to collect that. The example here, Chicago, Illinois, is a much more frequent occurrence than Chicago, Texas, if there even is a Chicago, Texas.
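A hedged sketch of RUNSTATS control statements along the lines of the examples just described: all indexes, named columns, frequent-value stats for a skewed column, and correlation stats for a column group. The database, table space, table, and column names are hypothetical:

```
-- Collect stats for all indexes on the table space
RUNSTATS TABLESPACE MYDB.MYTS INDEX(ALL)

-- Collect column stats for sensitive WHERE-clause columns, plus
-- frequent-value stats for the skewed JOB column and
-- correlation stats for the CITY/STATE/ZIP column group
RUNSTATS TABLESPACE MYDB.MYTS
  TABLE(PRODSCHEMA.EMP)
    COLUMN(JOB, DEPT)
    COLGROUP(JOB) FREQVAL COUNT 10
    COLGROUP(CITY, STATE, ZIP) FREQVAL COUNT 10
```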
[00:15:06] – Craig Mullins, Presenter
On the other hand, I do know that there is a Cleveland, Texas, and its population is around 7,500. So Cleveland, Ohio, with a population of about 375,000, is going to be much more likely to occur frequently than Cleveland, Texas, because the Ohio city is a lot bigger. So in those instances, we want to capture correlated data information. Here’s an example of collecting histogram stats. You can tell RUNSTATS to collect histogram stats by coding the HISTOGRAM keyword along with column group information. That way you can collect histogram stats for a group of columns. You also have to tell Db2 the number of quantiles to collect by specifying the NUMQUANTILES parameter. You can also specify it using the INDEX parameter. In that case, it collects histogram stats for the columns of the index, in this case a date/timestamp column. We’re collecting information here for the lunch rush example. To summarize, I want to be sure that you have a plan for keeping your RUNSTATS up to date. But what’s next? Back to the Dr. Seuss: Outdated statistics, a silent foe, caused the RID pool to overflow. Queries stumbled, indexes failed, as SQL’s prowess became impaled.
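Before moving on to the RID discussion, a hedged sketch of the histogram example just described, with hypothetical object names; NUMQUANTILES sets how many value ranges Db2 summarizes:

```
-- Histogram stats over a column group, e.g. to expose a lunch-rush skew
RUNSTATS TABLESPACE MYDB.MYTS
  TABLE(PRODSCHEMA.ORDERS)
    COLGROUP(ORDER_DATE, ORDER_TIME)
      HISTOGRAM NUMQUANTILES 12
```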
[00:16:49] – Craig Mullins, Presenter
That brings us to the next thing I’ve seen, which is RID failures. Access paths using RIDs can cause problems, especially when coupled with a lack of RUNSTATS. Also, a thing to keep in mind is that as of Db2 12, and we’re all there, right? Probably a lot of us are on 13 already. But as of 12, RID usage in access paths has increased. If you’re interested in the details of why RID usage increases, you want to take a look at the blog post in the URL here at the bottom of the screen. I’ll send a PDF to Amanda that she can share with everybody here, so you don’t have to be taking notes on these screens as I go through them. You can get the PDF from her after the presentation. What is a RID? Well, in Db2 for z/OS, a RID is a record identifier, a unique identifier that’s assigned to each row in a table, and it consists of a page number and a slot number within that page. The RID acts as a physical address that allows Db2 to locate and access specific rows very efficiently. RIDs are used in Db2’s internal processing for navigating and retrieving data from tables for certain access paths.
[00:18:27] – Craig Mullins, Presenter
The RID pool is a dedicated memory area for caching RIDs during Db2 SQL processing. The RID pool is used for all RID processing, and that includes enforcing unique keys for multiple-row updates, list prefetch, multiple index access paths, and hybrid joins. What happens is Db2 collects RIDs that match the selection criteria and places them in a list in the RID pool. That list is sorted by page number, which is contained in the RID. Db2 then uses that sorted list to access the table by reading up to 32 pages per I/O and attempting to read ahead one block of 32 pages before it’s used. The RID pool gets allocated dynamically as it’s needed, and its maximum size is set in the DSNZPARM MAXRBLK. Additionally, the work file database can be used to store a RID list when RID pool storage can’t contain all the RIDs that are in the list. When a RID pool storage overflow occurs for a RID list, Db2 attempts to store the RID list in work file storage first, instead of falling back to a scan as it would have to do when it runs out of memory.
[00:20:04] – Craig Mullins, Presenter
The maximum number of RIDs, measured in RID blocks, that Db2 is allowed to store in the work file database is determined by MAXTEMPS_RID. If you set it to no limit, that can help to prevent reverting to tablespace scans when some arbitrary limit for work file usage is reached. Okay, I mentioned list prefetch on the previous slide, but what is that? Well, list prefetch reads a set of data pages determined by a list of RIDs gathered from an index. List prefetch access paths are well suited for queries where the qualified rows, as determined in index key sequence, are not sequential, or are skip-sequential but sparse, or where the value of the DATAREPEATFACTORF statistic is large. List prefetch is a way of enabling RIDs to be used to speed up certain types of access paths. It can be used in the circumstances that are outlined in this list, and this list comes directly from the IBM documentation, which, again, here’s a URL for you to read if you want to see it later, but let’s quickly summarize what’s on the screen here. Okay, list prefetch may be used with a single index that has a cluster ratio lower than 80%, or it may be used on indexes with a high cluster ratio
[00:21:48] – Craig Mullins, Presenter
if the estimated amount of data to be accessed is too small to make sequential prefetch efficient, but large enough to require more than one read. List prefetch is always used to access data by multiple index access, and it’s always used to access data from the inner table during a hybrid join. Also, it can be used for updatable cursors when the index contains columns that might be updated. List prefetch can be used when IN-list predicates are used via an in-memory table as matching predicates. So there are several cases where you’ll see list prefetch. But although it can be helpful at times, sometimes list prefetch can cause some problems. There are four types of RID issues, some more troublesome than others. You’ve got the RDS limit exceeded. In this case, the number of qualifying RIDs is greater than 25% of the rows in the table. When this happens, your index access gets abandoned and Db2 reverts to a table space scan. When the DM limit is exceeded, the number of RID entries is greater than the physical limit, which is about 26 million RIDs, and again, index access is abandoned and replaced with a table space scan.
[00:23:29] – Craig Mullins, Presenter
Then there’s the process limit exceeded. This is RID pool storage exceeded; you just ran out of RID pool storage. Again, index access is abandoned and replaced with a tablespace scan. Those three are all pretty bad problems that you want to try to avoid. Then there’s the overflow, which we talked about, where it overflows to the work file. This is where you run out of RID pool storage, but it goes to the work file. This is the least worrisome because it’s just using more memory, but you retain indexed access. You’re not going back to that scan. Let’s take a look at an example. This is from a site where I consulted, and I used a monitor. In this case, it was MainView, or AMI Ops as I guess BMC is calling it now. We used that to identify RID issues, and there were an awful lot of RID failures occurring. That’s shown here in the red circled area. This is since the last time we looked and since the last time we recycled. We’re seeing a lot of RID failures. What I did was drill down in the monitor to identify the packages and statements that were causing the RDS failures. You’ll notice that I blurred some of the details here to hide the identity of the client and the packages and such that they were using.
[00:24:59] – Craig Mullins, Presenter
Here we can identify RDS limit failures with RID FAIL equal to M and RID USED equal to NO. We have a reversion to a scan, and that can be a big performance problem if it happens a lot, or even once on a big table. This particular shop had a day’s worth of data available to the online monitor, so I had to scroll through this RID list to review it; you have to PF11 over to the right to get the statement number. After scrolling, we see the statement numbers here that are associated with each entry. Once we get here, we’re armed with the collection, package, and statement number. You can go to the Db2 catalog and find the offending statement. The next step was to check the plan table to see what the access path is and to verify the statement was indeed using list prefetch in the access path for the identified statement number. Now, even though this site had a best practice to bind everything that went into production with EXPLAIN(YES), there were still times that access paths were missing from the plan table.
[00:26:39] – Craig Mullins, Presenter
The solution was to run EXPLAIN PACKAGE to get access paths into the plan table, but at first I didn’t have the authority. Nobody there had ever done this or even heard of this ability. What EXPLAIN PACKAGE does is capture the access paths from the existing package, instead of binding with EXPLAIN, which generates new access paths. That’s what we want; I want the access path that’s out there that’s causing the RDS failure. Here’s a case of making sure you keep up with new features of Db2 as they’re released, like EXPLAIN PACKAGE. Anyway, I helped walk them through the issues to get the access paths captured for the offending packages using EXPLAIN PACKAGE. When we got the explain information into the plan table, the next step was to verify that the statement number was indeed requiring a RID access path and list prefetch, and then to work on a plan to remediate the issue in the SQL. In some cases the statement is an OPEN, or it could be a FETCH, and you may need to track it back to the cursor declaration for the SQL. There’s some work you need to do in the catalog, and it’s helpful to have a catalog visibility tool as well.
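A hedged sketch of the EXPLAIN PACKAGE step and plan table check just described; the collection name, package name, and statement number are hypothetical placeholders for the ones identified in the monitor:

```
-- Capture the access paths already in use by the package (no rebind,
-- so no new access paths are generated)
EXPLAIN PACKAGE COLLECTION 'MYCOLL' PACKAGE 'MYPKG' COPY 'CURRENT';

-- Then check whether the identified statement is using list prefetch ('L')
SELECT QUERYNO, TNAME, ACCESSTYPE, PREFETCH
  FROM PLAN_TABLE
 WHERE COLLID   = 'MYCOLL'
   AND PROGNAME = 'MYPKG'
   AND QUERYNO  = 12345;      -- hypothetical statement number
```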
[00:28:12] – Craig Mullins, Presenter
In this case, they had the admin tool from IBM, so that’s what we used. Why were they seeing so many RID issues? Well, back to our original thing I’ve seen: one of the most common reasons for RID pool failures is not running RUNSTATS. Yes, this particular client had that problem. Manual stats were added decades ago. RUNSTATS were not run for everything, to keep the manual stats in place. But again, the list of tables that this was done for couldn’t be found. They said it was documented. I’ve been waiting since last November for that documentation, and I doubt it’s ever going to be found. Some RUNSTATS were being done, but nobody wanted to run anything additional or do mass RUNSTATS on everything for fear of messing up 30-year-old tuning efforts. If the client is afraid of running RUNSTATS, that’s off the table, at least until a lot of reviewing is done so that they can feel more comfortable. But we wanted to eliminate those RDS failures because there were a lot of them and they were chewing up CPU with scans. The other options on this list were considered here. Well, at least some of them, and we’ll get to that.
[00:29:43] – Craig Mullins, Presenter
What did I do? Well, I monitored for failures daily and captured the packages and statements that had RDS failures. Then we reviewed the access paths, again, which were not always there. Then we analyzed the data and considered fixes from the list of options that was on that previous slide. Sometimes it was easy to identify a new index that would remediate the problem. Another tactic that was useful a lot more frequently than the index was to add OPTIMIZE FOR 1 ROW. If you add OPTIMIZE FOR 1 ROW, a lot of times the optimizer will forgo the list prefetch option. But this wasn’t always possible. The reason for that is you can’t add OPTIMIZE FOR 1 ROW to an UPDATE or a DELETE statement. You can’t add it to a singleton SELECT, either. There were cases where we actually did change the singleton SELECT into a cursor just so we could add OPTIMIZE FOR 1 ROW, but that was not something we wanted to do extensively. In other cases, a simple rebind helped, for those areas where RUNSTATS were being kept up to date and it had been a long time since the last rebind had been run.
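As a hedged illustration of the OPTIMIZE FOR 1 ROW tactic described above, here is what a cursor declaration with that clause might look like; the table, column, and host variable names are made up:

```
-- Declaring the query as a cursor so OPTIMIZE FOR 1 ROW can be added,
-- which often discourages the optimizer from choosing list prefetch
DECLARE ACCT_CSR CURSOR FOR
  SELECT ACCT_BALANCE
    FROM PRODSCHEMA.ACCOUNT
   WHERE ACCT_ID = :HV_ACCT_ID
   OPTIMIZE FOR 1 ROW;
```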
[00:31:20] – Craig Mullins, Presenter
Interestingly, this client also had IDAA, and in some circumstances, this helped. A few packages were accessing data already on IDAA, so we just rebound with the QUERYACCELERATION option set to ELIGIBLE. In other cases, a table or two was missing from the accelerator, so we reviewed the situation, and if it was possible, loaded the table to IDAA and then rebound with QUERYACCELERATION ELIGIBLE. This helped in some cases. There was a battery of things that we looked at to eliminate the RDS failures when the ability to just run RUNSTATS and fix a lot of them wasn’t really available to us. As Dr. Seuss would say, let us heed this cautionary tale and keep our statistics fresh without fail. For in the world of data, chaos can reign, and success relies on keeping your stats sane. Okay, next up is a situation where there was overreliance on the DBA team. I mean, DBAs are the center of the universe, right? They’re going to review everything and make sure everything is okay. Well, not so fast there. Can the DBA do everything all the time? Is that the right thing to make the DBA team do?
[00:32:54] – Craig Mullins, Presenter
With scrutinizing eyes and a meticulous hand, the DBA reviews all, a task ever grand. All queries and updates, each column and row, no badness escapes, this we all know. This particular situation was at a site where I was helping to augment the DBA team, and the developer users periodically would create and send data-fix SQL. They’d send it to the DBA team asking them to review it and give their blessing that the SQL was accurate and wouldn’t cause any problems. This was being done instead of coding one-off programs to correct the data problems. This approach can work fine. It’s easier to write some SQL than it is to write a program for a few statements, but not for pages and pages of SQL. And yes, this was being done in production. I was getting these few pages of updates and inserts, and I would look at them, and they’d run them over the weekend. Then I got a request with hundreds and hundreds of insert and update statements. I’m not a human compiler. Neither is any DBA. It’s not a reasonable request for a DBA to manually review hundreds or thousands of statements in a short period of time and say, Yeah, these are all exactly fine.
[00:34:32] – Craig Mullins, Presenter
The simple truth is that a DBA or any reviewer is going to find 10 problems in a 12-line program, but nothing in a 7,500-line program. It’s just human nature, right? What did I do? I didn’t want to take responsibility for the accuracy of hundreds of SQL statements. Then if any of them failed, they could say, The DBA reviewed it. It’s their fault. I sent it back to the development team. I told them, You’re responsible for the accuracy of your SQL code. I did take an image copy of the tables that were going to be modified before leaving for the weekend. That way, I knew I’d have something to fall back on if the SQL somehow caused problems after they ran it. I didn’t completely abdicate DBA responsibility. Additionally, since I was a consultant there, I recommended to the DBA team that this is really not a good plan in a moving-forward capacity. Why that is: if the DBA reviews it and misses something, as I said, it’s the DBA’s fault. Also, it’s better for the developers to write a program to implement changes because that can be easier to track and then to back out problems.
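A minimal sketch of the kind of safety-net image copy mentioned above, taken before the developers’ SQL runs over the weekend; the object names are hypothetical:

```
-- Full image copy of the table space holding the tables to be modified,
-- so there is a recovery point if the weekend SQL goes wrong
COPY TABLESPACE MYDB.MYTS
  COPYDDN(SYSCOPY)
  FULL YES
  SHRLEVEL REFERENCE
```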
[00:36:00] – Craig Mullins, Presenter
Finally, if the data is constantly needing to be modified like this, then there’s some root-cause problem, and you need to analyze and fix that problem instead of just constantly modifying production data with these insert and update statements on a weekly or bi-weekly basis. So, according to Dr. Seuss: in the realm of data where things are not right lies a daunting task, a DBA’s great fight. Hundreds of statements awaiting review, but why the DBA? What did he do? Line by line, the DBA delves through the pages of statements where the data is shelved. But as the pile grows taller and the hours they wane, efficiency falters under the strain. Instead, let us honor the programmer’s art in crafting a program to play its true part. To change so much data correctly, it seems, the program solves all the DBA’s dreams. Moving on. Another situation I’ve seen at a number of sites is a ton of lock escalations. According to Dr. Seuss: in Db2 land, so big and so wide, lived apps that locked. Oh, how they’d collide. When queries ran all eager and quick, locks would escalate, creating quite the stick. Okay, so what is lock escalation?
[00:37:29] – Craig Mullins, Presenter
Well, when a threshold is reached, and that’s either a DSNZPARM setting or a tablespace parameter, all the page locks or row locks that are held by an application on a single table or tablespace get released. And instead, the lock is promoted to a tablespace lock or a partition lock or a set of partition locks. When lock escalation occurs, Db2 issues the DSNI031I message, and that identifies the table space where lock escalation occurred and some information to help identify the plan or package that was running when that escalation occurred. Finding these lock escalations can be tedious. There are tools out there that can help, but a lot of organizations don’t have such a tool. Then you can go out to the Db2 master log and search for DSNI031I messages, or just search for escalation. That’s a lot easier to spell. When you find the messages, and here again some details are blurred out so you don’t know the source, the messages are going to have the table space that is impacted. They’re going to have the package and collection as well as the statement number, so you can identify the SQL that’s causing the escalation.
[00:38:59] – Craig Mullins, Presenter
Now, the recurring theme at this particular shop was that lock escalations were occurring on table spaces that had LOCKSIZE ROW. I did a bit of digging there, and it seemed that long ago there were problems, so some of the table spaces were modified from page locking to row locking. Okay, but at the same time, they really should have re-evaluated the number of locks for the space by setting the LOCKMAX parameter to something other than SYSTEM. If you use SYSTEM, LOCKMAX is the same as the NUMLKTS DSNZPARM. But there are more locks taken when you go from page locks to row locks, at least most of the time, because there are multiple rows per page. But really, there was no analysis done here, and all the table spaces in the system were set to LOCKMAX SYSTEM, regardless of whether it was LOCKSIZE ROW or LOCKSIZE PAGE. The first thing we did, which is really the first thing you should always do when you encounter something like this, is throw things over to the application teams and make sure that all the programs that were getting lock escalations had commit logic in place.
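Coming back to LOCKMAX for a moment, a hedged sketch of the kind of adjustment being discussed, with hypothetical names and a hypothetical threshold; the right value depends on NUMLKTS, rows per page, and compression at your site:

```
-- Give the row-locking table space its own escalation threshold instead of
-- inheriting LOCKMAX SYSTEM (which ties it to the NUMLKTS DSNZPARM)
ALTER TABLESPACE MYDB.MYTS
  LOCKSIZE ROW
  LOCKMAX 100000;
```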
[00:40:21] – Craig Mullins, Presenter
And of course, some of them didn’t. Those programs had to be changed to add commit logic. When you commit on a regular basis, those page and row locks get released and the need to escalate doesn’t arise. Then there were programs that did have commit logic, and we had to look at them and change some of those to commit more frequently. Committing, and/or committing more frequently, helped to alleviate many of these issues. Another thing was that some of these programs had single-statement deletes that impacted more rows than the NUMLKTS value. I told them about piecewise delete, which is a newer feature in Db2 that allows you to break up a delete using the FETCH FIRST clause. That lets you run a multiple-row delete statement, issue a commit, and then pick up again. They weren’t familiar with that, but we implemented it in some of the programs as well. Another thing, as I had alluded to, is LOCKMAX settings. If NUMLKTS is, let’s say, 10,000, then setting LOCKMAX to 10,000 times the number of rows per page, or at least some higher value, would be a simple approach to at least consider when you go to row locking.
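A hedged sketch of the piecewise-delete pattern just described; the table name, predicate, and batch size are hypothetical, and the loop driving it would live in the application program:

```
-- Repeat this pair until no more qualifying rows remain (SQLCODE +100)
DELETE FROM PRODSCHEMA.ORDER_HISTORY
  WHERE ORDER_DATE < '2020-01-01'
  FETCH FIRST 5000 ROWS ONLY;
COMMIT;
```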
[00:42:04] – Craig Mullins, Presenter
But you also have to take compression into account because that can increase the number of rows per page, too. Even so, such a simple approach doesn’t solve everything, because you may run into NUMLKUS issues now that a user is taking a lot more locks, and they did run into that several times, too. One way to help with this is taking a look at the new features in function level 507, which gives you built-in global variables that you can set in your program. These should only be set under DBA supervision, and the DBA should understand what they are. They can be used to set the number of locks per table space and the number of locks per user to different values for just that one program. Unfortunately, this client was slow to migrate through the function levels. They were still at 505, so we couldn’t use the global variables for these programs and had to fine-tune the commit frequency a couple of times before we got it right. It can take time, but with a team effort, lock escalations can be tackled. Dr. Seuss tells us: first it starts small, just a row or two, but as things heated up, it grew into view.
[00:43:35] – Craig Mullins, Presenter
Shared locks turn exclusive, escalating high, blocking other queries as they pass by. So programmers and admins dug right in to smooth things out, keeping problems to a min. So remember, dear friend, in Db2’s domain, lock escalation’s a dance, a delicate game. With finesse and precision, it keeps things in line in the wondrous world of Db2 so fine. Then, there’s the problem that’s not a Db2 problem at all. Dr. Seuss tells us: in the realm of the mainframe, a mystery did brew, a problem perplexing with no clue what to do. Db2, they suspected, the culprit for sure, but beneath the surface dwelled a different cure. Okay, so what was the situation here? In this case, I was hired to help with a significant performance problem. This was in a system that was probably best described as a big old mess of stuff. The application had components running in batch. There were CICS transactions. It was using Db2 and VSAM and flat files. Many programs from an old app had been converted to Db2 from VSAM, but not all of them. Other portions were rewritten in .NET using Oracle. Other parts were still using the old app with flat files and VSAM.
[00:45:10] – Craig Mullins, Presenter
In a situation like this, it can take a lot of time and effort to dig through and make sense of it all. It can require a lot of different experts because really nobody’s going to have all that knowledge. I mean, really, who knows VSAM and Oracle and Db2 and CICS and is an expert in all of those? That’s really tough. A few more details here. It was a long-running project, and I came in in the middle of it. The client had hired a lot of experts for the other parts of the application. IBM was in there, and so were a lot of other local consultants. I was brought in to focus only on the Db2 SQL. They wanted me to look there for potential performance problems. I dug in and I reviewed access paths, and along the way, I tuned a lot of SQL statements. In some cases that meant adding indexes, and in others, making code changes. I didn’t make the code changes myself; I had to give directions to the programming team, and they made the changes. Along the way, they’d have regular tests with the entire application running, usually on Friday around noon. Each time it ran, they would see small but measurable performance gains, but there were still overall issues that they needed to tackle.
[00:46:50] – Craig Mullins, Presenter
Each of the specific domain experts was tuning their individual area, and I think that was what was causing the small gains, but there was still something lurking behind the scenes. Again, I was part of a large battalion of folks, and at some point, one of the old guys they brought out of retirement had an idea. He said, Have you looked at the VSAM files? It might make sense to look at them if you haven’t looked at them lately. Remember, VSAM files were still part of the overall application, but they had not been looked at in literally years, actually decades. It turns out the VSAM files were really a mess. So someone suggested that REPRO be used to reorganize them. Now, if you’re not a VSAM person, REPRO is a VSAM utility program. Here’s the diagram from the manual. It can perform a lot of different functions, but one of them is similar to a reorg. Now, when REPRO is not run on a VSAM data set, it can become disorganized and several consequences can happen. You can get data integrity issues. If it’s not used to copy or reorganize, the data in the VSAM file can become fragmented or disordered, and that can lead to incomplete or incorrect records being retrieved when you’re reading.
[00:48:39] – Craig Mullins, Presenter
You can get performance degradation. It can impact reading the data, because retrieving records from a disorganized data set requires additional I/O operations, resulting in slower access times. You can get index inconsistencies; the indexes may no longer accurately reflect the actual location of the data, so you can get incorrect record retrieval or even data loss. You can increase space usage, because disorganization can leave unused space within the data set. Inefficient storage utilization, again, brings us back to inefficient reads, because sometimes you’re reading pages without a lot of data on them. And you risk duplicate records if the REPRO command is not used to handle them, which again can lead to data inconsistencies. There are inconsistency issues and performance issues. It’s really essential to run REPRO on your VSAM data sets, and if you don’t do it, it can result in inefficiencies. For those of us who are Db2 folks, the parallels to a Db2 reorg here are pretty obvious. They REPROed the VSAM files, and the next system test ran a lot better. There were still things that could be tuned on the Db2 side, but they got the performance problem resolved. Unfortunately for me, at least, that contract ended early, but fortunately for them, they got the performance they wanted.
[00:50:16] – Craig Mullins, Presenter
As Dr. Seuss tells us: Oh, the lesson learned in this database tale, to look beyond the obvious without fail. For sometimes the problem, though it may seem, is lurking elsewhere in a different scheme. With all these tales of woe behind us, let’s take a look at a few high-level best practices you can embrace to help you avoid these types of problems and things I’ve seen. First of all, keep your maintenance up to date, your Db2 maintenance. I was called into another site that was planning on migrating off of the mainframe, off of Db2 to another system, but they had a long ramp-up time. Unfortunately, they moved a lot of the Db2 team to the new system and were limping along supporting that existing system. After converting to Db2 version 12, they failed to expand the RBAs of all their table spaces, so they got a production outage there. You don’t ever want something like that to happen. You want to keep up to date on your Db2 maintenance, but also make sure that you follow the instructions as you move along in Db2 versions and Db2 maintenance, and make sure that you’re doing all the recommended procedures along the way.
[00:51:47] – Craig Mullins, Presenter
Next, ensure that appropriate RUNSTATS are being run. Understand all the options that we looked at for running RUNSTATS, and keep your RUNSTATS up to date, either by scheduling jobs or by using an automation tool, which brings us to automating. Automating is always preferable. This might require using vendor tools, but automating can help you avoid outages and help you avoid performance problems. So automation tools can be a great investment. Make sure backups are taken for all of your database objects. Be sure to have RTOs, recovery time objectives, and then make sure that you’re taking backups frequently enough to be able to meet your RTOs. You can really only do that by testing your backup and recovery jobs. I mean, the worst time to discover that your backup plan isn’t sufficient is during a production recovery situation, right? So plan for your RTOs, do your backups, and test your backup and recovery. Also, make sure that all your programs have a commit strategy in place. When you can, make it parameter-driven so that you can easily change it in reaction to changing modification patterns. This goes back to our lock escalation story.
[00:53:19] – Craig Mullins, Presenter
If every program has a commit strategy and a parameter in place, then you’re not going in and modifying code; you’re just modifying parameters to make your commits more frequent. Documentation. Be sure to document everything, and make sure that there’s a standard location where all that documentation is stored. I think I mentioned that I’m still waiting for some documentation at one client that we requested last November; it probably doesn’t exist anymore because there’s no central place where all the documentation is kept. Finally, be proactive. Being proactive can eliminate future problems. The earlier in the development cycle that required changes can be identified and implemented, the lower the overall cost of a change is going to be. The earlier you identify a performance issue and the quicker you resolve it, the less CPU you’re going to chew up, and the less impact it’s going to have on the overall system and on whatever performance problems you might be experiencing. So I hope this has been a useful session for you all. And I want to leave you with this quote from Dr. Seuss himself. If you didn’t realize it, the poems I put in here weren’t written by Dr. Seuss; they were written by me.
[00:54:49] – Craig Mullins, Presenter
But this one was written by Dr. Seuss: The more that you read, the more things you will know. The more that you learn, the more places you’ll go. So keep that in mind. Thank you for your time. Thank you to Planet Mainframe and the virtual Db2 user group for the invitation to talk. Also, my contact information is here. If you need to get in touch with me about anything in this presentation, or really anything Db2 or data related, there’s my website, there’s my email address, and there’s some information about the books I’ve written, which are all shown here. I guess I’ll turn it back over to Amanda in case there are any questions, or to lead us out. Amanda?
[00:55:40] – Amanda Hendley, Host
Thanks, Craig. If anyone has any questions, you can drop them in chat, or you’re welcome to come off mute. Now, earlier, someone asked about the presentation deck, so we’ll have the deck, the recording, and the transcript ready within a couple of weeks on the website. And if you’re not already subscribed to the newsletter, please do that so that you’ll get all of the updates. I’ll give you just another second. I’m going to share my screen if I can find it. Okay, great. If you do have any questions, just feel free to drop them in chat, but I’m going to cover my last couple of slides. So, our articles and news, and there are a couple of pieces. I’ve got a Forrester report about IntelliMagic Vision for you to check out, and an article I found interesting on Db2; I think that’s published on the Register website. The schedule for IDUG EMEA is available now, so I hope to see you there. I’ll be there for that event. And lastly, on our job board at jobs.planetmainframe.com, a Db2 database administrator position is one of several jobs that I saw posted today looking for Db2 skills, if you are in the market for a new opportunity.
[00:57:18] – Amanda Hendley, Host
To get involved with our user groups, I think the most likely place to get really good conversations is LinkedIn, but we are on Facebook, and videos get published on YouTube as well. I do want to thank IntelliMagic again for partnering with us on this Db2 user group. They’ve been a sponsor since the beginning, January 2023. Again, that opportunity is available for partnership if you have a company that you think should join up. And save the date, 11/19/24. Same time, same place. We’ll see you in two months. Craig, thank you so much for today’s session. I can’t believe you wrote all that… I mean, poetry, rhymes.
[00:58:13] – Craig Mullins, Presenter
Thank you. I’m an aspiring poet, have been for years, but Db2 pays better.
[00:58:22] – Amanda Hendley, Host
Well, great. So there are a couple of questions that just popped up. I see one about certification.
[00:58:45] – Craig Mullins, Presenter
Oh, certification. The question is: I’m a programmer. I’d like to know what certification helps to improve my skills on the Db2 application programming side. Yeah, there are certification tests that IBM offers. As Amanda mentioned, IDUG is coming up. One of the benefits there is that IDUG offers attendees free access to certification exams. That’s one thing you can do. You can also go to some of the education sites; one of the good ones is Interskill Learning, and they offer courses that can then give you badges that are similar to certifications. I also did some training sessions for Db2 programmers on performance and programming Db2 that are accessible via Interskill Learning. So you might want to look into that as well.
[00:59:54] – Amanda Hendley, Host
Someone said, If I leave RUNSTATS out of the basic jobs, will I miss anything if I run online statistics with a reorg, for example?
[01:00:04] – Craig Mullins, Presenter
So you’re talking about inline stats there. And yes, inline stats are another way of running RUNSTATS. You just have to be careful that the stats you’re collecting inline are the ones that you want to be collecting, and that you get all of the parameters right there. So I did neglect to mention that collecting stats with a reorg is a possibility. Thank you. Sure. It looks like Roman has another question about the books. And yes, some of them are for Db2. I think there are three of them there that are specifically for Db2. Some of them are for heterogeneous DBA and heterogeneous performance-type issues. So they’re all across the board there. If Db2 is in the title, they’re specifically for Db2. The Db2 Developer’s Guide is specifically for z/OS. The performance one is not specifically for z/OS; it’s for both z/OS and LUW. And there’s The Tao of Db2, which is high level, for any Db2 user.
[01:01:35] – Amanda Hendley, Host
There’s a question: I’m following the IBM manuals, reading them there, for version 12. Would you advise if I need to follow the books as well?
[01:01:48] – Craig Mullins, Presenter
Well, I always would advise that you need to buy my books. But yeah, the manuals are definitely great for the latest and greatest in terms of version information. The books are more cross-version, with general high-level and sometimes in-depth information on things you can do to tune Db2 and things you can do to better administer Db2. So a combination of manuals and books is always a good approach, in my opinion.
[01:02:23] – Amanda Hendley, Host
Did you catch Soledad showing a book, Craig?
[01:02:27] – Craig Mullins, Presenter
I’m sorry, what?
[01:02:29] – Amanda Hendley, Host
Someone was showing here.
[01:02:33] – Craig Mullins, Presenter
Oh, I didn’t see that. No. I guess I don’t see what is being shared on the screen, unfortunately.
[01:02:47] – Amanda Hendley, Host
The Db2 Developer’s Guide?
[01:02:50] – Craig Mullins, Presenter
Oh, yeah. That’s the one that’s been out there for 30-plus years. Wrote that way back when; the sixth edition is the latest and greatest version of that. And again, you can always get all these books on Amazon.
[01:03:10] – Amanda Hendley, Host
Well, I want to thank everyone for attending today and being active participants. And Craig, again, thank you so much for presenting.
[01:03:22] – Craig Mullins, Presenter
Thank you for asking me. Happy to have done it.
[01:03:24] – Amanda Hendley, Host
You all have a great rest of the day.