marcelo.duarte
(uses: Other)
Posted on 30/05/2012 - 21:29
Good evening, everyone,
I'm new here and only started working with Linux a short while ago, so first of all, sorry if this isn't the right section for discussing heartbeat.
Here is my problem: I'm setting up a high-availability cluster with Apache and heartbeat. When the primary server stops working, the secondary automatically takes over. So far, so good. The problem is that when I start the primary again, the services do not move back to it; they stay active on the secondary, even though I set the auto_failback on option. Before posting, I searched for a similar case on the internet and here on the forum, but couldn't find anything specifically about this problem.
Below are my configuration files and the log:
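To make the expected behavior concrete, here is a minimal Python sketch of the failover/failback rule described above. It is a simplified two-node model, not heartbeat's actual logic: the function name resource_owner and the ("down"/"up", node) event format are hypothetical, and it assumes the preferred node is the one listed first in haresources.

```python
from typing import List, Optional, Tuple


def resource_owner(preferred: str, other: str, auto_failback: bool,
                   events: List[Tuple[str, str]]) -> Optional[str]:
    """Return which node holds the resource group after a series of events.

    Simplified model (an assumption, not heartbeat's code): both nodes start
    up, the preferred node starts as owner, and each event is
    ("down", node) or ("up", node).
    """
    up = {preferred, other}
    owner: Optional[str] = preferred
    for kind, node in events:
        if kind == "down":
            up.discard(node)
            if owner == node:
                # Failover: a surviving node (if any) takes the resources.
                owner = next(iter(up), None)
        elif kind == "up":
            up.add(node)
            if owner is None:
                owner = node
            elif auto_failback and node == preferred and owner != preferred:
                # Failback: the preferred node reclaims its resources.
                owner = preferred
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return owner


primary, secondary = "cluster01.labredes.com.br", "cluster02.labredes.com.br"
crash_and_return = [("down", primary), ("up", primary)]
print(resource_owner(primary, secondary, True, crash_and_return))   # cluster01.labredes.com.br
print(resource_owner(primary, secondary, False, crash_and_return))  # cluster02.labredes.com.br
```

With auto_failback enabled, the model hands the resources back to the primary once it returns; with it disabled, they stay on the secondary, which matches the symptom described in this post.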
ha.cf:
# Interval in seconds between heartbeat pings
keepalive 1
# Time in seconds before declaring a machine dead
deadtime 4
# Time in seconds before a late-heartbeat warning is logged
warntime 10
# Time in seconds to allow all nodes to start up
initdead 60
# Interface used to broadcast the heartbeat checks
bcast eth0
# Cluster nodes
node cluster01.labredes.com.br
node cluster02.labredes.com.br
# Whether to use an external resource manager (CRM)
crm off
# Communication port
udpport 694
# Move resources back to the primary node when it recovers from a failure
auto_failback on
nice_failback off
# Path to the debug file
debugfile /var/log/ha-debug.log
# Path to the log file
logfile /var/log/ha-log.log
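A hedged sketch for sanity-checking the timing values in ha.cf. It parses "directive value" lines and applies a few rules of thumb, which are assumptions drawn from general heartbeat guidance and not checks heartbeat itself performs: deadtime larger than keepalive, warntime between keepalive and deadtime, initdead at least twice deadtime, and a flag when nice_failback appears next to auto_failback (nice_failback is the older option that auto_failback replaced). The function names are hypothetical, and values are assumed to be plain seconds with no "ms" suffix.

```python
from typing import Dict, List

# The directives posted above (log paths omitted).
HA_CF = """\
# Interval in seconds between heartbeat pings
keepalive 1
deadtime 4
warntime 10
initdead 60
bcast eth0
node cluster01.labredes.com.br
node cluster02.labredes.com.br
crm off
udpport 694
auto_failback on
nice_failback off
"""


def parse_ha_cf(text: str) -> Dict[str, List[str]]:
    """Map each directive to the values it was given (e.g. several 'node' lines)."""
    directives: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def check_timing(d: Dict[str, List[str]]) -> List[str]:
    """Return warnings from rule-of-thumb checks (assumes the keys are present)."""
    def num(key: str) -> float:
        return float(d[key][0])

    warnings = []
    if num("deadtime") <= num("keepalive"):
        warnings.append("deadtime should be larger than keepalive")
    if not (num("keepalive") < num("warntime") < num("deadtime")):
        warnings.append("warntime should lie between keepalive and deadtime")
    if num("initdead") < 2 * num("deadtime"):
        warnings.append("initdead should be at least twice deadtime")
    if "auto_failback" in d and "nice_failback" in d:
        warnings.append("both auto_failback and nice_failback are set")
    return warnings


for w in check_timing(parse_ha_cf(HA_CF)):
    print("WARN:", w)
# WARN: warntime should lie between keepalive and deadtime
# WARN: both auto_failback and nice_failback are set
```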
haresources:
cluster01.labredes.com.br IPaddr::192.168.0.3/24/eth0:1 apache2
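The haresources line reads as: the preferred node first, then the resources, where "::" separates a resource script from its arguments. The log is consistent with resources being started left to right (IPaddr, then apache2) and stopped in reverse order. A minimal Python sketch of that parsing; the names Resource and parse_haresources_line are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Resource(NamedTuple):
    name: str               # script name, e.g. IPaddr or apache2
    args: Tuple[str, ...]   # arguments that followed "::", if any


def parse_haresources_line(line: str) -> Tuple[str, List[Resource]]:
    """Split a haresources line into (preferred node, resources in start order)."""
    fields = line.split()
    node, specs = fields[0], fields[1:]
    resources = []
    for spec in specs:
        name, *args = spec.split("::")
        resources.append(Resource(name, tuple(args)))
    return node, resources


node, resources = parse_haresources_line(
    "cluster01.labredes.com.br IPaddr::192.168.0.3/24/eth0:1 apache2")
start_order = [r.name for r in resources]
stop_order = list(reversed(start_order))
print(node)         # cluster01.labredes.com.br
print(start_order)  # ['IPaddr', 'apache2']
print(stop_order)   # ['apache2', 'IPaddr']
```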
log file:
harc[6174]: 2012/05/30_20:37:27 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp[6174]: 2012/05/30_20:37:28 received ip-request-resp IPaddr::192.168.0.3/24/eth0:1 OK yes
ResourceManager[6193]: 2012/05/30_20:37:28 info: Acquiring resource group: cluster01.labredes.com.br IPaddr::192.168.0.3/24/eth0:1 apache2
IPaddr[6219]: 2012/05/30_20:37:28 INFO: Resource is stopped
ResourceManager[6193]: 2012/05/30_20:37:28 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.3/24/eth0:1 start
IPaddr[6297]: 2012/05/30_20:37:28 INFO: Using calculated netmask for 192.168.0.3: 255.255.255.0
IPaddr[6297]: 2012/05/30_20:37:28 INFO: eval ifconfig eth0:0 192.168.0.3 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[6273]: 2012/05/30_20:37:28 INFO: Success
ResourceManager[6193]: 2012/05/30_20:37:28 info: Running /etc/init.d/apache2 start
May 30 20:37:38 cluster01.labredes.com.br heartbeat: [5970]: info: Local Resource acquisition completed. (none)
May 30 20:37:38 cluster01.labredes.com.br heartbeat: [5970]: info: local resource transition completed.
May 30 20:48:01 cluster01.labredes.com.br heartbeat: [5970]: info: Heartbeat shutdown in progress. (5970)
May 30 20:48:01 cluster01.labredes.com.br heartbeat: [7122]: info: Giving up all HA resources.
ResourceManager[7136]: 2012/05/30_20:48:01 info: Releasing resource group: cluster01.labredes.com.br IPaddr::192.168.0.3/24/eth0:1 apache2
ResourceManager[7136]: 2012/05/30_20:48:01 info: Running /etc/init.d/apache2 stop
ResourceManager[7136]: 2012/05/30_20:48:02 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.3/24/eth0:1 stop
IPaddr[7200]: 2012/05/30_20:48:02 INFO: Success
May 30 20:48:02 cluster01.labredes.com.br heartbeat: [7122]: info: All HA resources relinquished.
May 30 20:48:04 cluster01.labredes.com.br heartbeat: [5970]: info: killing HBFIFO process 5973 with signal 15
May 30 20:48:04 cluster01.labredes.com.br heartbeat: [5970]: info: killing HBWRITE process 5974 with signal 15
May 30 20:48:04 cluster01.labredes.com.br heartbeat: [5970]: info: killing HBREAD process 5975 with signal 15
May 30 20:48:04 cluster01.labredes.com.br heartbeat: [5970]: info: Core process 5973 exited. 3 remaining
May 30 20:48:04 cluster01.labredes.com.br heartbeat: [5970]: info: Core process 5975 exited. 2 remaining
May 30 20:48:04 cluster01.labredes.com.br heartbeat: [5970]: info: Core process 5974 exited. 1 remaining
May 30 20:48:04 cluster01.labredes.com.br heartbeat: [5970]: info: cluster01.labredes.com.br Heartbeat shutdown complete.
May 30 20:48:19 cluster01.labredes.com.br heartbeat: [7321]: ERROR: Heartbeat not started: configuration error.
May 30 20:48:19 cluster01.labredes.com.br heartbeat: [7321]: ERROR: Configuration error, heartbeat not started.
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7421]: WARN: Core dumps could be lost if multiple dumps occur.
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7421]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7421]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7421]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7421]: info: **************************
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7421]: info: Configuration validated. Starting heartbeat 3.0.2
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: heartbeat: version 3.0.2
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: Heartbeat generation: 1337730318
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: G_main_add_TriggerHandler: Added signal manual handler
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: G_main_add_TriggerHandler: Added signal manual handler
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: Local status now set to: 'up'
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: Link cluster01.labredes.com.br:eth0 up.
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: WARN: node cluster02.labredes.com.br: is dead
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: info: Comm_now_up(): updating status to active
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: info: Local status now set to: 'active'
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: WARN: No STONITH device configured.
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: WARN: Shared disks are not protected.
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: info: Resources being acquired from cluster02.labredes.com.br.
harc[7429]: 2012/05/30_20:52:41 info: Running /etc/ha.d//rc.d/status status
mach_down[7472]: 2012/05/30_20:52:41 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[7472]: 2012/05/30_20:52:41 info: mach_down takeover complete for node cluster02.labredes.com.br.
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: info: Initial resource acquisition complete (T_RESOURCES(us))
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: info: mach_down takeover complete.
IPaddr[7480]: 2012/05/30_20:52:41 INFO: Resource is stopped
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7430]: info: Local Resource acquisition completed.
harc[7549]: 2012/05/30_20:52:41 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp[7549]: 2012/05/30_20:52:41 received ip-request-resp IPaddr::192.168.0.3/24/eth0:1 OK yes
ResourceManager[7568]: 2012/05/30_20:52:41 info: Acquiring resource group: cluster01.labredes.com.br IPaddr::192.168.0.3/24/eth0:1 apache2
IPaddr[7594]: 2012/05/30_20:52:41 INFO: Resource is stopped
ResourceManager[7568]: 2012/05/30_20:52:41 info: Running /etc/ha.d/resource.d/IPaddr 192.168.0.3/24/eth0:1 start
IPaddr[7672]: 2012/05/30_20:52:41 INFO: Using calculated netmask for 192.168.0.3: 255.255.255.0
IPaddr[7672]: 2012/05/30_20:52:41 INFO: eval ifconfig eth0:0 192.168.0.3 netmask 255.255.255.0 broadcast 192.168.0.255
IPaddr[7648]: 2012/05/30_20:52:41 INFO: Success
ResourceManager[7568]: 2012/05/30_20:52:41 info: Running /etc/init.d/apache2 start
May 30 20:52:51 cluster01.labredes.com.br heartbeat: [7422]: info: Local Resource acquisition completed. (none)
May 30 20:52:51 cluster01.labredes.com.br heartbeat: [7422]: info: local resource transition completed.
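To read a log like this more easily, a small Python sketch can pull out the key state changes. It assumes the syslog-style "Mon DD HH:MM:SS host heartbeat: [pid]: level: message" format seen above, skips the harc/IPaddr/ResourceManager lines, and assumes the year 2012 because syslog omits it. The function name timeline and the keyword list are hypothetical choices; the sample lines are copied from the log.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches lines such as:
# "May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: WARN: node ...: is dead"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ heartbeat: "
    r"\[\d+\]: (?P<level>\w+): (?P<msg>.*)$"
)
KEYWORDS = ("shutdown", "Configuration error", "Local status now set to",
            "is dead", "Resources being acquired")


def timeline(log_text: str, year: int = 2012) -> List[Tuple[datetime, str, str]]:
    """Return (timestamp, level, message) for heartbeat lines containing a keyword."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or not any(k in m["msg"] for k in KEYWORDS):
            continue  # other log formats or routine messages
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["level"], m["msg"]))
    return events


SAMPLE = """\
May 30 20:48:01 cluster01.labredes.com.br heartbeat: [5970]: info: Heartbeat shutdown in progress. (5970)
May 30 20:48:19 cluster01.labredes.com.br heartbeat: [7321]: ERROR: Configuration error, heartbeat not started.
May 30 20:51:40 cluster01.labredes.com.br heartbeat: [7422]: info: Local status now set to: 'up'
May 30 20:52:41 cluster01.labredes.com.br heartbeat: [7422]: WARN: node cluster02.labredes.com.br: is dead
"""

events = timeline(SAMPLE)
for ts, level, msg in events:
    print(ts.strftime("%H:%M:%S"), level, msg)

# Seconds between local startup and the peer being declared dead.
up = next(ts for ts, _, msg in events if "set to: 'up'" in msg)
dead = next(ts for ts, _, msg in events if "is dead" in msg)
print((dead - up).total_seconds())  # 61.0
```

The 61-second gap between "Local status now set to: 'up'" and "cluster02 ... is dead" is consistent with the initdead 60 setting in ha.cf.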
I look forward to your help. Thanks!