A Simple Approach to Tuning Wait Events

2009. 2. 3. 13:17

This document summarizes how to measure wait events, what they mean, and some simple tuning techniques for each of them.

Whenever an Oracle session is not consuming CPU or waiting for CPU, it is waiting on one of a number of wait events. For example, the session may be waiting for an I/O request to complete, for free space in the SGA, or for an internal resource such as a latch.

Collecting wait event information

Three performance views provide detailed information about Oracle wait events.

·V$SYSTEM_EVENT - summarizes the type, number and elapsed time of the wait events experienced by all sessions since the Oracle database was started.

·V$SESSION_WAIT - shows the waits that sessions are currently experiencing. This view also includes details about each wait, such as its location and address.

·V$SESSION_EVENT - holds the same kind of summary (elapsed time, type and number of waits) on a per-process basis, so that individual sessions can be examined.

V$SESSION_EVENT lets you drill into the waits of an individual process, and V$SESSION_WAIT lets you examine each wait in more detail, but V$SYSTEM_EVENT is the most useful of the three for an overall view when tuning an Oracle instance. V$SYSTEM_EVENT contains the following columns.

Column           Description
===================================================
EVENT            Name of the event
TOTAL_WAITS      Total number of times this event has been waited for since the instance was started
TOTAL_TIMEOUTS   Number of times the wait for the event was terminated by a timeout
TIME_WAITED      Total time, in 1/100ths of a second, spent waiting for this event by all sessions since the instance started
AVERAGE_WAIT     Average time spent waiting on this event, in 1/100ths of a second

The report.txt produced by the Oracle-supplied utlbstat/utlestat scripts includes the contents of V$SYSTEM_EVENT. That report is not easy to read, however, so consider using Guy Harrison's scripts wait_stA.sql and wait_stB.sql instead. The first summarizes waits since database startup; the second summarizes waits over a specific interval.
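For a quick look without either set of scripts, V$SYSTEM_EVENT can be queried directly. The following is only a minimal sketch, not a replacement for the scripts mentioned above. It assumes TIMED_STATISTICS is enabled (otherwise the time columns stay at zero) and a release that supports analytic functions such as RATIO_TO_REPORT. The idle-event filter is a short subset of the "safe to ignore" list later in this article; check the exact spelling of event names against your own instance.

```sql
-- Non-idle wait events since instance startup, largest total wait time first.
-- TIME_WAITED and AVERAGE_WAIT are in 1/100ths of a second.
SELECT event,
       total_waits,
       total_timeouts,
       time_waited,
       average_wait,
       ROUND(100 * RATIO_TO_REPORT(time_waited) OVER (), 1) AS pct_of_wait_time
FROM   v$system_event
WHERE  event NOT IN ('Null event',
                     'SQL*Net message from client',
                     'rdbms ipc message',
                     'pmon timer',
                     'smon timer')   -- extend with the other idle events as needed
ORDER  BY time_waited DESC;
```

Because these figures are cumulative since startup, comparing two runs taken some time apart gives a better picture of a specific workload period.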

A brief description of each event

"db file" waits
Events whose names begin with "db file" (db file parallel write, db file scattered read, db file sequential read, db file single write) arise from I/O performed by Oracle sessions against Oracle datafiles.
Writes to datafiles are performed by the database writer, so "db file" write waits are never experienced by user sessions. User sessions do, however, read directly from database files, so they will almost always experience waits on the datafile read events.
Unless the database is completely cached in the SGA, waiting for datafile I/O is unavoidable, and the presence of these wait events says nothing bad about database performance. In most healthy databases, "db file" waits account for 80%-90% of all non-idle wait time.

log file sync/write waits
Just as Oracle sessions cannot avoid waiting for db file I/O, they cannot avoid waiting for log file I/O. These waits occur whenever a COMMIT causes redo to be written to the redo log: the session that issued the COMMIT waits on the log file sync event, and when the log writer process issues the I/O request it waits on the log file parallel write event.
Both wait events are unavoidable and typically account for 10%-20% of non-idle wait time.
The average wait time for log file parallel write is a very important measurement. It indicates how quickly the log writer process can flush the redo log buffer and is a key indicator of the efficiency of the redo log device. A value below 1 (1/100th of a second) is very good; values above about 5 are unusual and may indicate contention for the redo log device.
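The two redo-related events can be pulled out on their own so that AVERAGE_WAIT can be compared with the rough thresholds above. This is a minimal sketch that uses only the V$SYSTEM_EVENT columns listed earlier.

```sql
-- Average wait (1/100ths of a second) for the commit-related redo events.
-- Compare AVERAGE_WAIT for 'log file parallel write' with the thresholds above.
SELECT event,
       total_waits,
       time_waited,
       average_wait
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');
```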

log file space/switch
These events occur when a redo log entry cannot be made, either because there is no free space in the redo log buffer or because the redo log cannot be written to while a log file switch is taking place. In a well-tuned database their frequency should be negligible. Before version 7.3 they were reported as a single combined event, "log file space/switch"; from 7.3 onward they are reported as the following separate events.

·log buffer space: a wait for free space in the redo log buffer. It can be reduced by increasing the LOG_BUFFER parameter or by optimizing log writer performance.

·log file switch (checkpoint incomplete): occurs when the next redo log cannot be used because the checkpoint that started when that log file was last switched has not yet completed.

·log file switch (archiving needed): occurs when the next redo log cannot be used because it has not yet been archived.

These waits indicate that the redo logs are on very slow devices, that LOG_BUFFER is set too low, or that there are too few redo logs or the redo logs are too small. Consider placing the redo logs on multiple devices to reduce contention between the log writer and the archiver process.
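To see whether any of these events matter on a given instance, and how the redo logs are currently laid out, something like the following can be used. It is a sketch only: the LIKE pattern assumes the 7.3-and-later event names given above, and V$LOG is queried just for group count, size and status.

```sql
-- Redo space and log switch waits since instance startup.
SELECT event,
       total_waits,
       time_waited,
       average_wait
FROM   v$system_event
WHERE  event = 'log buffer space'
   OR  event LIKE 'log file switch%';

-- Current redo log groups: how many there are, how large, and their state.
SELECT group#,
       members,
       bytes,
       archived,
       status
FROM   v$log
ORDER  BY group#;
```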

Buffer busy waits
This wait event occurs when a session cannot access a block it needs in the SGA because the block is in use by another session. The two most common causes are insufficient free lists for a table and too few rollback segments.

V$WAITSTAT can be used to tell these two causes apart. If waits for "data block" or "free list" are high, the table (typically one receiving many concurrent inserts) probably needs multiple free lists. If waits for "undo header" or "undo block" are high, additional rollback segments are probably needed.
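A minimal query against V$WAITSTAT for this check might look like the following. It only lists the block classes that have actually been waited on, leaving the interpretation to the rule of thumb above.

```sql
-- Buffer busy waits broken down by block class, most frequent first.
-- Look for 'data block' / 'free list' versus 'undo header' / 'undo block'.
SELECT class,
       count,
       time
FROM   v$waitstat
WHERE  count > 0
ORDER  BY count DESC;
```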

Free buffer waits/write complete waits
These wait events occur when a session tries to insert into or modify a block in the SGA but cannot do so.

The "write complete waits" event occurs when the block a session wants to modify is currently being written to disk by the database writer process. It is especially common during checkpoints.

Free buffer waits occur when a session needs to read a block from a datafile on disk into the buffer cache. If there are no clean (unmodified) buffers in the buffer cache, the session must wait while the database writer process writes dirty (modified) blocks to disk. Because the database writer normally writes dirty blocks to disk continuously, this situation should rarely arise.

If these wait events account for a large share of total wait time, database writer performance can be improved in the following ways.

· Use asynchronous I/O or multiple database writers (DB_WRITERS parameter). Asynchronous I/O is enabled by default under NT. In UNIX you need to set ASYNC_IO=TRUE in your server parameter file and possibly enable asynchronous I/O at the operating system level. Use asynchronous or list I/O in preference to multiple database writers if it is available.
· Stripe your datafiles across multiple disk devices. The number of devices across which the datafiles are housed and the evenness of the spread determine the ultimate I/O limit of your system. Use operating system striping if possible (RAID-0), but if this is not available, alternate datafiles across multiple disk devices.
· Avoid RAID-5. RAID-5 can be attractive because it spreads I/O across multiple devices and enables fault tolerance more economically than mirroring (RAID-1). However, RAID-5 can more than halve the write capability of your disks, since each logical write will require at least two physical writes (additional I/Os are required to read and write the parity block). Many RAID-5 vendors provide large non-volatile disk caches in an attempt to avoid this write penalty. Free buffer or write complete waits may be a sign that these efforts are unsuccessful.
· Consider raw devices/partitions. The use of raw devices (unformatted disk devices without an overlying filesystem) is somewhat controversial and may not suit all applications. However, the database writer process will usually benefit from raw devices.

enqueue waits
Enqueue waits occur when a session waits to obtain a lock. In most cases the lock is on a row or table that the session wants to modify. In some circumstances the lock involved is an Oracle internal lock (e.g. the Space Transaction enqueue). Common causes of excessive enqueue waits are as follows.

· Contention for a specific row in the database, where many processes simultaneously try to lock or update the same row.

· Table locks caused by unindexed foreign keys. If an un-indexed foreign key is updated then the parent table will be subjected to a table lock until the transaction is complete.

· "Old-style" temporary tablespaces. If the tablespace nominated as the temporary tablespace has not been created with the TEMPORARY clause (introduced in ORACLE 7.3) , then sessions may contend for the "space transaction" lock.

· The space reserved for transactions within a data block is too small. By default, only one transaction slot for tables or two for indexes is allocated when the table or index is created. The number of transaction slots is determined by the INITRANS clause in the CREATE TABLE or INDEX statement. If additional transaction slots are required they are created - providing there is free space in the block. However, if all transaction slots are in use and there is no free space in the block then a session wishing to lock a row in the block will encounter an enqueue wait, even if the row in question is not actually locked by another process. This phenomenon can occur if both PCTFREE and INITRANS were set too low.
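To see which sessions are involved in enqueue waits at a given moment, V$LOCK can be queried for lock requests and the locks that block them. The sketch below is illustrative only. The table, column and index names in the two DDL statements (orders, customer_id, orders_customer_fk_ix) are hypothetical, chosen to show the unindexed foreign key and INITRANS cases from the list above.

```sql
-- Sessions waiting for a lock (REQUEST > 0) and sessions blocking others (BLOCK = 1).
-- Rows sharing the same TYPE, ID1 and ID2 refer to the same resource.
SELECT sid,
       type,
       id1,
       id2,
       lmode,
       request,
       block
FROM   v$lock
WHERE  request > 0
   OR  block = 1
ORDER  BY id1, id2, request;

-- Hypothetical example: index a foreign key column to avoid table locks on the parent.
CREATE INDEX orders_customer_fk_ix ON orders (customer_id);

-- Hypothetical example: reserve more transaction slots per block.
-- This applies to newly formatted blocks only; existing blocks are not changed.
ALTER TABLE orders INITRANS 4;
```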

latch free waits
A latch is an internal locking mechanism that prevents multiple sessions from simultaneously updating the same item in the SGA. If a session requests a latch that is already held by another session, it experiences a latch free wait.
A large number of latch free waits can indicate a bottleneck within the SGA. A detailed treatment of latch contention is left for another time; only simple guidelines are given here.

· Use the V$LATCH view to find which latches account for the most sleeps. Each sleep can be regarded as a latch free wait.

· Contention for the cache buffers lru chain and cache buffers chains latches can occur when the database sustains high physical or logical I/O rates. These I/O rates can be reduced by tuning SQL or by increasing the size of the buffer cache. Increasing DB_BLOCK_LRU_LATCHES or DB_BLOCK_HASH_BUCKETS may also help.

· Contention for the library cache and library cache pin latches can occur under heavy parsing or very high SQL execution rates. Misses on the library cache latch usually indicate excessive reparsing of non-sharable SQL. Prefer bind variables to literals in SQL statements.

· Contention for the redo allocation latch can be reduced by lowering LOG_SMALL_ENTRY_MAX_SIZE. Contention for the redo copy latch can be addressed by increasing LOG_SIMULTANEOUS_COPIES.

· If you face latch contention and have spare CPU capacity, consider increasing SPIN_COUNT. If CPU is already fully utilized, consider decreasing SPIN_COUNT instead.
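The first guideline can be sketched as a simple query on V$LATCH, followed by an illustration of the bind variable advice. The customers table and the :cust_id bind name are hypothetical and only serve to contrast the two statement forms.

```sql
-- Latches with the most sleeps; each sleep corresponds to a latch free wait.
SELECT name,
       gets,
       misses,
       sleeps,
       immediate_gets,
       immediate_misses
FROM   v$latch
WHERE  sleeps > 0
ORDER  BY sleeps DESC;

-- Literal: every distinct value yields a distinct statement text that must be parsed.
SELECT * FROM customers WHERE customer_id = 1001;

-- Bind variable: one sharable statement text, whatever value is supplied.
SELECT * FROM customers WHERE customer_id = :cust_id;
```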

SQL*Net waits
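The Oracle server records the time it spends waiting for SQL*Net operations to complete; these wait events have names beginning with "SQL*Net". It is difficult to tell whether they are caused by a busy client process or by network load. In particular, the "SQL*Net message from client" event means that the server process is idle while it waits for the next request from the client process, so this event can be ignored. If the other SQL*Net waits account for a large share of the wait statistics, the network should be investigated.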
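The SQL*Net events can be listed on their own to see how much of the wait time falls outside the idle "message from client" event. This is a minimal sketch that assumes the event names on your release start with the prefix "SQL*Net" in exactly this case.

```sql
-- All SQL*Net wait events, largest total wait time first.
-- 'SQL*Net message from client' is idle time and can be ignored.
SELECT event,
       total_waits,
       time_waited,
       average_wait
FROM   v$system_event
WHERE  event LIKE 'SQL*Net%'
ORDER  BY time_waited DESC;
```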

Events which are safe to ignore
· Null event
· SQL*Net message from client
· SQL*Net more data from client
· Parallel query dequeue wait
· client message
· smon timer
· rdbms ipc message
· pmon timer
· WMON goes to sleep
· virtual circuit status
· dispatcher timer
· pipe get

Where to from here?
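Sometimes you need to know more than the duration or frequency of a wait event. For example, if buffer busy waits are caused by insufficient free lists, it helps a great deal to know which table or index is involved. There are several ways to find out.
V$SESSION_WAIT shows which sessions are currently waiting, and its P1, P2 and P3 columns carry more detail about the wait. For "db file sequential read", for example, P1 is the number of the file being read and P2 is the block number, so the segment involved can be found through the DBA_EXTENTS view.

However, V$SESSION_WAIT only reflects the instant at which it is queried, so it is not easy to catch information about the particular event you are interested in. In that case, enable tracing as follows.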
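The lookup described above can be sketched in two steps: read the wait parameters for the sessions currently in a datafile read wait, then map a file number and block number to a segment. The bind names :file_no and :block_no are placeholders introduced here for the P1 and P2 values taken from the first query.

```sql
-- Step 1: sessions currently waiting on a datafile read, with their wait parameters.
SELECT sid,
       event,
       p1text, p1,
       p2text, p2,
       p3text, p3
FROM   v$session_wait
WHERE  event LIKE 'db file%read';

-- Step 2: find the segment that contains a given file number (P1) and block number (P2).
SELECT owner,
       segment_name,
       segment_type
FROM   dba_extents
WHERE  file_id = :file_no
AND    :block_no BETWEEN block_id AND block_id + blocks - 1;
```

DBA_EXTENTS can be slow to query on databases with many extents, so it is best to run the second step only for the blocks of interest.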
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL n';
Once this command has been issued, wait event information is written to the SQL trace file in the following form.
WAIT #2: nam='db file sequential read' ela= 0 p1=1 p2=1135 p3=1
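As a concrete illustration of the command above, the following sketch uses level 8, which is the level that adds wait events to the trace (level 4 adds bind values and level 12 adds both). The choice of level 8 here is an assumption for the example; the article itself leaves the level as n.

```sql
-- Start tracing the current session, including wait events (level 8).
ALTER SESSION SET EVENTS '10046 trace name context forever, level 8';

-- ... run the statements to be analyzed here ...

-- Stop tracing the current session.
ALTER SESSION SET EVENTS '10046 trace name context off';
```

The trace file is written to the directory named by the USER_DUMP_DEST parameter.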

Comparing wait times with CPU utilisation
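When a session is not waiting on an event, it is either using CPU or waiting for CPU. For this reason it can be useful to compare the CPU time used with the time spent waiting on the various events.
CPU utilization for the Oracle instance as a whole can be obtained by querying V$SYSSTAT, for example the statistic named "CPU used by this session" (statistic# = 12 here; the number differs between releases, so looking it up by name is safer).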
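One way to put the two figures side by side is sketched below. Both values are cumulative since instance startup and both are in 1/100ths of a second, so they can be compared directly. The idle-event filter is again a short subset of the "safe to ignore" list and should be extended to suit the instance.

```sql
-- CPU time versus total non-idle wait time since instance startup.
SELECT 'CPU used' AS category,
       value      AS centiseconds
FROM   v$sysstat
WHERE  name = 'CPU used by this session'
UNION ALL
SELECT 'Non-idle waits',
       SUM(time_waited)
FROM   v$system_event
WHERE  event NOT IN ('Null event',
                     'SQL*Net message from client',
                     'rdbms ipc message',
                     'pmon timer',
                     'smon timer');
```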


전산쟁이
