Monday, September 22, 2014

Numerous pbs_server errors in /var/log/messages

The problem:

On our supercomputer's management node we see numerous errors such as:
pbs_server: LOG_ERROR::is_request, bad attempt to connect from 10.10.0.254:1023 
(address not trusted - check entry in server_priv/nodes)
Nearly every minute they are followed by this one:
last message repeated 16 times
where the repeat count varies from time to time.
The address 10.10.0.254 is one of the management node's own addresses. According to "netstat -pa | grep 1023", port 1023 belongs to pbs_mom.
So the management node tries to connect to itself several times per minute and is refused. The advice in the error text doesn't help much: as far as I understand, the management node should not be in the "nodes" file.
Could anybody suggest how to solve this problem?
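
To see which addresses pbs_server is rejecting and how often, the log can be summarized with a small shell helper. This is only a sketch: the function name untrusted_sources is made up for illustration, and it assumes each error is written to the log as a single line in the format quoted above.

```shell
# Summarize rejected source addresses from syslog lines read on stdin.
# Prints "<address> <count>" per address. Entries collapsed by syslog
# ("last message repeated N times") are not included in the count.
untrusted_sources() {
    sed -n 's/.*bad attempt to connect from \([0-9.]*\):[0-9]*.*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}
```

It could be run as "untrusted_sources < /var/log/messages". If the only address listed is one of the management node's own, the rejected connections come from the node itself rather than from a compute node.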

Answer:
Your management node is not defined as a node in PBS, but pbs_mom is running on it and keeps trying to report to pbs_server. Open up qmgr and run "create node [hostname]" (without the brackets). The other option is to stop pbs_mom, since you probably don't want to run compute jobs on your head node.

# service pbs_mom stop
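
The two options from the answer might be scripted roughly as follows, run as root on the management node. The function names and the hostname mgmt01 in the usage note are hypothetical, and the chkconfig line is an extra step not mentioned in the answer: it assumes a SysV-init distribution that uses chkconfig, and is there so pbs_mom does not come back after a reboot.

```shell
# Option 1: register the management node with pbs_server so that
# connections from its pbs_mom are trusted.
# $1 is the management node's hostname as pbs_server resolves it.
register_head_node() {
    qmgr -c "create node $1"
}

# Option 2: stop pbs_mom on the head node and keep it from starting at boot.
# Assumption: chkconfig manages boot-time services on this system.
disable_head_node_mom() {
    service pbs_mom stop && chkconfig pbs_mom off
}
```

For example, "register_head_node mgmt01" for the first option, or "disable_head_node_mom" for the second. Only one of the two is needed. With the first, the head node becomes a node that jobs can be scheduled on, which is why the second is usually the better fit for a management node.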

