YAPP 发布后的十年

因为某些原因,国外的很多 Blog 站点在国内都是看不到的,包括 Blogspot.BlogSpot 上有很多不错的 Oracle 专家的 Blog .比如 Thomas Kyte. 此外还有 Anjo KolkOraperf. Oraperf 的三位大牛Anjo Kolk、Shari Yamaguchi、Jim Viscusi曾经在10年前提出 Oracle 数据库优化的 YAPP(Yet Another Performance Profiling Method)方法。在这个Blog第一篇就是回顾YAPP发布后的十年(YAPP Ten years later: What has changed? ).以下内容为引用:

In 1995 I started to work on the wait events paper based on Oracle 7.3, and that formed the basis for the YAPP white paper. That in turn formed the basis for the wave of response time tuning books and presentations. Now it is funny to see how the same thing is rehashed over and over again and nothing really new is being added. In fact I believe that all the attention on wait events and response time or resource tuning in the Oracle RDBMS, is taking away the way the focus of performance problems that actually originate outside the database. The word “over exposure” comes to mind. Does this mean that wait events are no longer important? Well yes and no. Believe it or not, but most databases suffer from the same performance problems. They differ in the symptoms that they show. For example, many databases suffer from I/O performance problems and Oracle has quite a number of wait events that are directly and indirectly associated with I/O. So instead of approaching each of these events individualy, they could be grouped together to just show the symptom. In fact, Oracle 10g has finally done this and has introduced wait event classes (not new, a company Precise (now part of Veritas) did that first in their product back in 1998). So Oracle is still expanding the number of wait events, but at the same it is grouping them together again.

Before the response time tuning (and even today) people actually based on best practices (BP). Each best practice has a ratio associated with it. For example the Buffer Cache Hit Ratio, basically is the Best Practice that tells people to cache frequently used data (which is a good thing in principle). The problem with tuning today is that many Best Practices exists and they all have some kind of ratio associated with them. So if a performance problem occurs DBAs starting working on this list of Best Practices to check if something applies to their problem. There are a couple of problems with that:

1) Not every body may have the same list of Best Practices or the same threshold for the ratios.
2) The list of Best Practices may not be sorted in the same way for different people

The result is that the problem finding process takes a long time and is not really repeatable by different people (different lists, different ratios). So starting the problem finding process with Best Practices is like shooting a gun and hooping that you will hit some thing.

The Response Time tuning process is basically telling you what Best Practice to use. For example if the Response Time consists mostly of I/O relatated waits we should start looking at the Best Practices for I/O. If CPU is the most common resource consumption, we should start looking at the Logical I/Os .

In this approach the different wait event groups play an important role, because each group is basically assoicated with one or more Best Practices. We still do care about what wait events are actually in the group, but for selecting the right Best Practice it doesn’t matter. For solving the problem, it may be important.

I believe that people should start thinking this way instead of worrying about the individual events. I am not saying that the individual events are not important, but keep an eye on the complete picture before diving into the indivual events.

So one of these days (may this year) I should write an update to the YAPP paper and hopefully it can be basisses for a wave in the response time tuning. Oh yeah, I don’t see my self as the inventor of all this. As far as I am concerned, Response time Tuning has always been there, it was just not really accepted in the Oracle world as the way of tuning your Oracle database. So may be I started that, but I just wrote the paper(s) and other people like Mogens Norgaard made it popular. It is kind of funny that I actually still use the same method(s) almost 10 years later (and it doesn’t matter if they are on instance, session or SQL statement level it is the same methodology, with different result but still solving the same problems; think about that one …..)

提到了所谓的 BP(best practices) 方法.想到曾经在一个站点上看到所谓的 HP 专家最欣赏”最佳实践”之类的说法不由得有点好笑.

还在 Oraperf 上看到了一则关于 Outer Edge of Disk 的帖子。
Oraperf’s Blog: Outer Edge of Disk

The question is why one should place data on the outside of the disk and if that helps performance. I just like to add my view on that discussion. The number of physical I/Os per Disk is limited by the (full/average) seek time. The seek time is the biggest component of the I/O time (with normal Oracle blocksize). The Full Seek Time (seeking from the outermost track to the innermost track or vice versa) can be greatly reduced by only using the outermost tracks on a disk. Why the outermost tracks? Well there is this statistic that on the 1/3 of the outermost tracks there is around 50 percent of the disk capacity stored. If I only have to seek 1/3 of the distance to reach 50 percent of the data all the time, the average seek time will be greatly reduced. If the Seek time is greatly reduced, the total I/O time is greatly reduced and we can do more random I/O operations per second. So to summarize:
1) Use around 50 percent of the disk capacity (outer 1/3 of the tracks)
2) That will increase the number of random I/Os greatly.

此文作者:, 位于 Database 分类 标签: on .
转载须以超链接形式标明文章原始出处和作者信息及版权声明.

Leave a Reply

Your email address will not be published. Required fields are marked *