Oracle 11g Database Failure: Unexpected Reboots of a Red Hat 6 Host

Database, 2024-02-23 12:12

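Hi everyone, I'm 编程小6. I write up problems and ideas I have run into over years of development work. This post covers repeated unexpected reboots of a Red Hat 6 host running an Oracle 11g database; I hope it helps.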

I. Network Topology

[Figure 1: network topology diagram]

II. Problem Description

A Red Hat 6 host running an Oracle 11g database rebooted unexpectedly several times.

III. Analysis

1. Before every unexpected reboot of the database host, the logs show abnormal FC link state and abnormal multipath path state.

... ...
Jan  1 00:20:52 zs2 kernel: rport-0:0-6: blocked FC remote port time out: removing rport
Jan  1 00:20:52 zs2 kernel: rport-2:0-6: blocked FC remote port time out: removing rport
Jan  1 00:20:52 zs2 kernel: EXT4-fs (sdak1): mounted filesystem with ordered data mode. Opts: 
Jan  1 00:20:52 zs2 kernel: EXT4-fs (dm-20): mounted filesystem with ordered data mode. Opts: 
... ... 
Jan  1 00:21:50 zs2 multipathd: asm!.asm_ctl_vbg6: add path (uevent)
Jan  1 00:21:50 zs2 multipathd: asm!.asm_ctl_vbg6: failed to get path uid
Jan  1 00:21:50 zs2 multipathd: uevent trigger error
Jan  1 00:21:50 zs2 kernel: [Oracle ACFS] FCB hash size 2000000
Jan  1 00:21:50 zs2 kernel: [Oracle ACFS] buffer cache size 36184MB (5514771 buckets)
Jan  1 00:21:50 zs2 kernel: [Oracle ACFS] DLM hash size 2000000
Jan  1 00:21:50 zs2 kernel: ACFSK-0037: Module load succeeded. Build information: (LOW DEBUG) USM_11.2.0.4.0 ...
Jan  1 00:21:50 zs2 multipathd: ofsctl: add path (uevent)
Jan  1 00:21:50 zs2 multipathd: ofsctl: failed to get path uid
Jan  1 00:21:50 zs2 multipathd: uevent trigger error
Jan  1 00:21:50 zs2 kernel: OKSK-00010: Persistent OKS log opened at /u01/app/11.2.0/grid/log/zs2/acfs/kernel/acfs.log.0.
Jan  1 00:37:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.
Jan  1 00:37:02 zs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  1 00:37:02 zs2 kernel: multipathd    D 0000000000000013     0 10201      1 0x00000000
Jan  1 00:37:02 zs2 kernel: ffff88804f79b968 0000000000000086 0000000000000000 ffffffff811666bc
Jan  1 00:37:02 zs2 kernel: ffff8800375103f0 ffff88204e22ae68 ffff88804f79b8e8 ffffffff8107c93b
Jan  1 00:37:02 zs2 kernel: ffff88804ab80638 ffff88804f79bfd8 000000000000fb88 ffff88804ab80638
Jan  1 00:37:02 zs2 kernel: Call Trace:
Jan  1 00:37:02 zs2 kernel: [<ffffffff811666bc>] ? transfer_objects+0x5c/0x80
Jan  1 00:37:02 zs2 kernel: [<ffffffff8107c93b>] ? round_jiffies_up+0x1b/0x20
... ... 
Jan  1 00:37:02 zs2 kernel: [<ffffffff811955f1>] sys_ioctl+0x81/0xa0
Jan  1 00:37:02 zs2 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Jan  1 00:38:49 zs2 multipathd: sdbh: couln't get asymmetric access state
Jan  1 00:38:49 zs2 multipathd: mpath05: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:39:29 zs2 multipathd: sdao: couln't get asymmetric access state
Jan  1 00:41:40 zs2 multipathd: mpath06: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:41:40 zs2 multipathd: sdau: couldn't get target port group
Jan  1 00:41:40 zs2 multipathd: mpath12: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:41:40 zs2 multipathd: mpath05: load table [0 4294967296 multipath 1 queue_if_no_path 0 2 1 round-robin ...]
Jan  1 00:41:40 zs2 multipathd: sdbm: couldn't get target port group
Jan  1 00:41:41 zs2 multipathd: mpath12: load table [0 4294967296 multipath 1 queue_if_no_path 0 3 2 round-robin ...]
Jan  1 00:45:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.
Jan  1 00:45:02 zs2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  1 00:45:02 zs2 kernel: multipathd    D 0000000000000004     0 10201      1 0x00000000
Jan  1 00:45:02 zs2 kernel: ffff88804f79b968 0000000000000086 0000000000000000 ffff88000001bfc0
Jan  1 00:45:02 zs2 kernel: ffff8800375a7018 ffff884051116038 ffff88804f79b8e8 ffffffff8107c93b
Jan  1 00:45:02 zs2 kernel: ffff88804ab80638 ffff88804f79bfd8 000000000000fb88 ffff88804ab80638
Jan  1 00:45:02 zs2 kernel: Call Trace:
Jan  1 00:45:02 zs2 kernel: [<ffffffff8107c93b>] ? round_jiffies_up+0x1b/0x20
... ...
Jan  1 00:54:15 zs2 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Jan  1 00:54:15 zs2 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="10768" x-info="http://www.rsyslog.com"] start
Jan  1 00:54:15 zs2 kernel: Initializing cgroup subsys cpuset
Jan  1 00:54:15 zs2 kernel: Initializing cgroup subsys cpu
Jan  1 00:54:15 zs2 kernel: Linux version 2.6.32-358.el6.x86_64 (mockbuild@x86-022.build.eng.bos.redhat.com) ... ...
Jan  1 00:54:15 zs2 kernel: Command line: ro root=/dev/mapper/VolGroup-LogVol02 rd_NO_LUKS LANG=en_US.UTF-8 ... ...
Jan  1 00:54:15 zs2 kernel: KERNEL supported cpus:
Jan  1 00:54:15 zs2 kernel:  Intel GenuineIntel
Jan  1 00:54:15 zs2 kernel:  AMD AuthenticAMD
Jan  1 00:54:15 zs2 kernel:  Centaur CentaurHauls
Jan  1 00:54:15 zs2 kernel: BIOS-provided physical RAM map:
Jan  1 00:54:15 zs2 kernel: BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
... ...
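The sequence in the excerpt above (FC remote port timeout, then a hung multipathd task, then a fresh kernel boot banner) can be pulled out of a large messages file mechanically. The following is a minimal Python sketch, not the tooling used in this investigation. The event names and the `classify` helper are hypothetical, and the patterns only cover the three message types quoted above.

```python
import re

# Patterns copied from messages that appear in the excerpt above.
# The event names on the left are made up for this sketch.
PATTERNS = {
    "fc_rport_timeout": re.compile(r"blocked FC remote port time out"),
    "hung_task": re.compile(r"INFO: task (\S+):\d+ blocked for more than \d+ seconds"),
    "boot": re.compile(r"kernel: Linux version "),
}

# Classic syslog prefix, e.g. "Jan  1 00:20:52 zs2 kernel: ..."
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")

SAMPLE = [
    "Jan  1 00:20:52 zs2 kernel: rport-0:0-6: blocked FC remote port time out: removing rport",
    "Jan  1 00:21:50 zs2 multipathd: uevent trigger error",
    "Jan  1 00:37:02 zs2 kernel: INFO: task multipathd:10201 blocked for more than 120 seconds.",
    "Jan  1 00:54:15 zs2 kernel: Linux version 2.6.32-358.el6.x86_64",
]


def classify(lines):
    """Return (timestamp, host, event_name) for every line matching a known pattern."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a syslog-formatted line, e.g. the "... ..." elisions
        timestamp, host, rest = m.groups()
        for name, pattern in PATTERNS.items():
            if pattern.search(rest):
                events.append((timestamp, host, name))
                break
    return events


if __name__ == "__main__":
    for event in classify(SAMPLE):
        print(event)
```

Run against the full log, a "boot" event that follows "fc_rport_timeout" and "hung_task" events marks a reboot that matches the pattern described in this step.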

2. On the FC switch, the FC link to host zs2 (port 14) shows a large number of enc_out errors.

Index Port Address Media Speed       State   Proto                                                                                                                                                                                                                            
==================================================                                   
  14  14   010e00   id    N8       Online      FC  F-Port  10:00:00:10:9b:1a:ad:d8 # zs2_port1
... ...   
  20  20   011400   id    N8       Online      FC  F-Port  50:0b:34:20:0f:5b:e0:02                                                                                                                                                                                            
  21  21   011500   id    N8       Online      FC  F-Port  50:0b:34:20:0f:5b:e6:02
  
SNS2124:admin> porterrshow                                                                                                                                                                                                                                                    
          frames      enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy    c3timeout    pcs                                                                                                                                             
       tx     rx      in    err    g_eof  shrt   long   eof     out   c3    fail    sync   sig                   tx    rx     err                                                                                                                                             
... ...                                                                                                                                       
 14:    1.8g   4.2g   0      0      0      0      0      0      5.9m   0      0      0     28      0      0      0      0      0                                                                                                                                                                                                                                                                                          
... ...                                                                                                                                       
 20:    2.3g   1.5g   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
 21:    4.1g 712.3m   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   
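To compare ports without reading the table by eye, the `porterrshow` rows can be parsed into named counters. The sketch below is an illustration only. The column names are derived from the two header lines shown above, and treating the `k`/`m`/`g` suffixes as decimal multiples is an assumption. `parse_porterrshow` and `ports_with_enc_out` are hypothetical helper names.

```python
import re

# Column order follows the two-line header of the porterrshow output above.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_short",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
]

# Assumption: the suffixes are decimal (thousand, million, billion).
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}

SAMPLE = """\
 14:    1.8g   4.2g   0      0      0      0      0      0      5.9m   0      0      0     28      0      0      0      0      0
 20:    2.3g   1.5g   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
 21:    4.1g 712.3m   0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0
"""


def to_count(token):
    """Convert a counter such as '5.9m' or '28' to an approximate integer."""
    token = token.lower()
    if token[-1] in SUFFIX:
        return round(float(token[:-1]) * SUFFIX[token[-1]])
    return int(token)


def parse_porterrshow(text):
    """Map port number -> {column name: count} for each well-formed data row."""
    ports = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+):\s+(.*)$", line)
        if not m:
            continue  # header, prompt, or elided line
        values = m.group(2).split()
        if len(values) != len(COLUMNS):
            continue  # unexpected layout; skip rather than guess
        ports[int(m.group(1))] = dict(zip(COLUMNS, map(to_count, values)))
    return ports


def ports_with_enc_out(ports, threshold=0):
    """Return the ports whose enc_out counter exceeds the threshold."""
    return sorted(p for p, counters in ports.items() if counters["enc_out"] > threshold)


if __name__ == "__main__":
    print(ports_with_enc_out(parse_porterrshow(SAMPLE)))
```

On the rows quoted above, only port 14 has a non-zero enc_out counter, which agrees with the observation in this step.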

3. Red Hat Bugzilla confirms that the multipathd errors are caused by a system bug.

[Figures 2 and 3: screenshots of the Red Hat Bugzilla report]

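IV. Root Cause

Poor fiber quality on the FC1 port of host zs2 caused the host to detect link timeouts. The timeouts triggered a multipath bug in the host operating system, which led to timeouts in FC multipath path discovery and in Red Hat disk reads and writes. This ultimately caused the database to restart abnormally several times.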

V. Solution

Upgrade multipath to version 75, as recommended by Red Hat's official guidance.
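Before and after the upgrade, it can be useful to check the installed package string against the recommended version. The sketch below assumes that "version 75" refers to the release field of the RPM package string (the number after the last hyphen). That reading is an assumption, not something stated in the Red Hat guidance quoted here. The package strings in the example are illustrative and were not taken from host zs2.

```python
import re

# Assumption: "75" is the RPM release field, not the upstream version.
TARGET_RELEASE = 75


def release_number(package):
    """Extract the leading integer of the release field from a
    name-version-release string; return None if it cannot be found."""
    m = re.search(r"-(\d+)(?:\.[^-]*)?$", package)
    return int(m.group(1)) if m else None


def needs_upgrade(package, target=TARGET_RELEASE):
    """True if the release field is below the target release."""
    release = release_number(package)
    if release is None:
        raise ValueError(f"cannot parse release from {package!r}")
    return release < target


if __name__ == "__main__":
    # Hypothetical package strings, for illustration only.
    for pkg in (
        "device-mapper-multipath-0.4.9-64.el6.x86_64",
        "device-mapper-multipath-0.4.9-75.el6.x86_64",
    ):
        print(pkg, needs_upgrade(pkg))
```

The input string would normally come from the package manager's query output on the host; that step is left out so the sketch stays self-contained.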

VI. Risk Notice

The multipath upgrade should be carried out by the customer, or by Red Hat engineers engaged through the customer.
