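During the May Day holiday the site showed error pages on two consecutive days, and at fairly regular times. The Tomcat logs pointed to database connection errors, so I called the Oracle DBA's mobile phone: switched off. With no other option I had to handle it myself. The alert log showed the following errors: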
Process m000 died, see its trace file
Fri May 2 10:06:10 2008
ksvcreate: Process(m000) creation failed
Fri May 2 10:07:11 2008
Process m000 died, see its trace file
Fri May 2 10:07:11 2008
ksvcreate: Process(m000) creation failed
Fri May 2 10:08:12 2008
Process m000 died, see its trace file
Fri May 2 10:08:12 2008
ksvcreate: Process(m000) creation failed
Fri May 2 10:09:15 2008
Process m000 died, see its trace file
Fri May 2 10:09:15 2008
ksvcreate: Process(m000) creation failed
..............
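To see how regular the failures are, the excerpt can be scanned with a short script. This is only a minimal sketch, assuming the 10g alert log layout shown here (a timestamp line followed by one or more message lines); `failure_times` and `gaps_in_seconds` are names made up for illustration.

```python
from datetime import datetime

# The alert log lines quoted above, pasted in as-is.
ALERT_EXCERPT = """\
Process m000 died, see its trace file
Fri May 2 10:06:10 2008
ksvcreate: Process(m000) creation failed
Fri May 2 10:07:11 2008
Process m000 died, see its trace file
Fri May 2 10:07:11 2008
ksvcreate: Process(m000) creation failed
Fri May 2 10:08:12 2008
Process m000 died, see its trace file
Fri May 2 10:08:12 2008
ksvcreate: Process(m000) creation failed
Fri May 2 10:09:15 2008
Process m000 died, see its trace file
Fri May 2 10:09:15 2008
ksvcreate: Process(m000) creation failed
"""

# Assumed timestamp layout, e.g. "Fri May 2 10:06:10 2008".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
FAILURE = "ksvcreate: Process(m000) creation failed"


def failure_times(text):
    """Return the timestamp in effect for each creation-failure line."""
    current = None
    hits = []
    for line in text.splitlines():
        line = line.strip()
        try:
            # A line that parses as a timestamp applies to the lines after it.
            current = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass
        if FAILURE in line and current is not None:
            hits.append(current)
    return hits


def gaps_in_seconds(times):
    """Seconds between consecutive failures."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    times = failure_times(ALERT_EXCERPT)
    print(len(times), "failures; gaps in seconds:", gaps_in_seconds(times))
```

On this excerpt the script finds four failures with gaps of 61, 61 and 63 seconds, so the m000 process is failing roughly once a minute.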
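It looked like a process problem. There are four Tomcat nodes on the back end, so I stopped one of them; the site returned to normal and the error stopped appearing. Had the processes limit really been reached, so that no more processes could be created? That seemed unlikely, because everything had been running normally until then.
So I Googled it. Some say it is a matter of the processes setting being too small, others say it is a 10g bug.
The processes parameter has always been 150 and, as far as I know, has never been changed. (I don't know whether the DBA has modified any parameters; that still needs checking.)
As for the 10g bug, many people seem to agree on it, and Metalink appears to explain it the same way: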
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2 to 10.2
This problem can occur on any platform.
Symptoms
Switching a Physical Standby Database multiple to READ ONLY Mode will report the following Errors in the ALERT.LOG:
ksvcreate: Process(m000) creation failed
Changes
Switch Physical Standby from READ ONLY to apply and back to READ ONLY.
Cause
The Cause of this Problem has been identified in Bug 5583049.
Solution
There are two Workarounds available:
Restart the Instance..
or
Disable ADDM - Should be re-enabled if Standby takes up the Primary Role
* Set SGA_TARGET=0 and set shared_pool_size, db_cache_size, etc if using Automatic SGA Memory Management (ASMM)
* Set STATISTICS_LEVEL=BASIC to disable statistics gathering
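According to the author of http://zhang41082.itpub.net/post/7167/456170 , restarting the instance solved the problem.
So I tried it, and noticed a difference between before and after the restart:
Before the restart: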
SQL> show parameter processes;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------
aq_tm_processes                      integer     0
db_writer_processes                  integer     1
gcs_server_processes                 integer     2
job_queue_processes                  integer     10
log_archive_max_processes            integer     2
processes                            integer     150
After the restart:
SQL> show parameter processes;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------
_trace_flush_processes               string      ALL
_trace_processes                     string      ALL
aq_tm_processes                      integer     0
db_writer_processes                  integer     1
gcs_server_processes                 integer     2
job_queue_processes                  integer     10
log_archive_max_processes            integer     2
processes                            integer     150
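The two listings can also be compared mechanically rather than by eye. The sketch below assumes the data rows (without the header and dashed line) are pasted in as whitespace-separated text; `parse_params` and `diff_params` are hypothetical helpers written for this post, not part of any Oracle tool.

```python
# Data rows of "show parameter processes" before the restart.
BEFORE = """\
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 2
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
"""

# Data rows of the same command after the restart.
AFTER = """\
_trace_flush_processes string ALL
_trace_processes string ALL
aq_tm_processes integer 0
db_writer_processes integer 1
gcs_server_processes integer 2
job_queue_processes integer 10
log_archive_max_processes integer 2
processes integer 150
"""


def parse_params(listing):
    """Map parameter name -> (type, value) for each three-column row."""
    params = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 3:
            name, type_, value = parts
            params[name] = (type_, value)
    return params


def diff_params(before, after):
    """Return (added, removed, changed) between two parsed listings."""
    added = {k: v for k, v in after.items() if k not in before}
    removed = {k: v for k, v in before.items() if k not in after}
    changed = {
        k: (before[k], after[k])
        for k in before
        if k in after and before[k] != after[k]
    }
    return added, removed, changed


if __name__ == "__main__":
    added, removed, changed = diff_params(parse_params(BEFORE), parse_params(AFTER))
    print("added:  ", sorted(added))
    print("removed:", sorted(removed))
    print("changed:", sorted(changed))
```

For these listings the only difference is the two added `_trace_*` parameters; nothing was removed and no existing value changed, including `processes`, which stays at 150.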
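I'm not sure whether the two string parameters, _trace_flush_processes and _trace_processes, have anything to do with the problem (Process m000 died, see its trace file).
I don't know yet whether the restart has actually helped; all I can do is watch it for a while.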