System environment
OpsCenter 5.2
CentOS 6.6
Cassandra 2.0.x
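Problem
After monitoring the Cassandra cluster for a while (about one day), the OpsCenter dashboard always stops displaying data.
However, the datastax-agent process on the Cassandra nodes is still running normally.
Checking the DataStax agent log then revealed: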
WARN [Thread-10] .... operations dropped so far.
WARN [Thread-10] .... Cassandra operation queue is full, discarding cassandra operation
Error when proccessing cassandra call com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /192.168.47.222:9042 (com.datastax.driver.core.TransportException: [/192.168.47.222:9042] Connection has been closed))
ERROR [Reconnection-0] 2015-08-05 16:06:39,841 Unknown error during reconnection to /192.168.47.222:9042, scheduling retry in 8000 milliseconds
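Initial diagnosis: too many Cassandra requests. The agent's Cassandra operation queue fills up and further operations are discarded, so metrics stop reaching the dashboard even though the agent process itself keeps running.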
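The diagnosis rests on the agent log. Below is a minimal sketch for tallying these symptoms across a whole log, so that queue-full drops can be told apart from plain connection failures. The symptom names and the function classify_agent_log are hypothetical; the substrings are copied from the log excerpt above and may differ in other agent versions.

```python
from collections import Counter

# Symptom substrings copied from the agent log excerpt above.
# Hypothetical mapping: adjust if your agent version words these differently.
SYMPTOMS = {
    "queue_full": "Cassandra operation queue is full",
    "no_host": "NoHostAvailableException",
    "reconnect_error": "Unknown error during reconnection",
}


def classify_agent_log(lines):
    """Count how many log lines contain each known symptom substring."""
    counts = Counter()
    for line in lines:
        for name, needle in SYMPTOMS.items():
            if needle in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "WARN [Thread-10] .... Cassandra operation queue is full, discarding cassandra operation",
        "ERROR [Reconnection-0] 2015-08-05 16:06:39,841 Unknown error during reconnection "
        "to /192.168.47.222:9042, scheduling retry in 8000 milliseconds",
    ]
    print(dict(classify_agent_log(sample)))
```

To scan a real log, pass an open file object for the agent log to classify_agent_log; the log location is not given in these notes.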
Solution
Add the following parameters to /var/lib/datastax-agent/conf/address.yaml:
stomp_interface: opscenterIP
use_ssl: 0
async_pool_size: 200
thrift_max_conns: 200
async_queue_size: 20000
hosts: the cluster IPs, in the form ["host1","host2"]
local_interface: localhost
cassandra_conf: /xxx/apache-cassandra-2.0.15/conf/cassandra.yaml
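Because address.yaml has to be changed on every agent node, it can help to generate it from one place. The following is a minimal sketch under stated assumptions: render_address_yaml is a hypothetical helper, the tuning values are the ones listed above, the key is spelled thrift_max_conns as in the parameter list below, and the OpsCenter IP, host list and cassandra.yaml path are placeholders you must supply.

```python
import json

# Tuning values from the notes above (starting points, not universal settings).
TUNING = {
    "use_ssl": 0,
    "async_pool_size": 200,
    "thrift_max_conns": 200,
    "async_queue_size": 20000,
}


def render_address_yaml(opscenter_ip, hosts, cassandra_conf, local_interface="localhost"):
    """Build address.yaml text for one agent node (hypothetical helper)."""
    lines = [f"stomp_interface: {opscenter_ip}"]
    lines += [f"{key}: {value}" for key, value in TUNING.items()]
    # json.dumps renders a flow sequence such as ["host1", "host2"], which YAML accepts.
    lines.append(f"hosts: {json.dumps(list(hosts))}")
    lines.append(f"local_interface: {local_interface}")
    lines.append(f"cassandra_conf: {cassandra_conf}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; substitute your OpsCenter IP, node IPs and cassandra.yaml path.
    print(render_address_yaml("OPSCENTER_IP", ["host1", "host2"], "/path/to/cassandra.yaml"), end="")
```

Writing plain text avoids a third-party YAML dependency; the trade-off is that values are not escaped, which is fine for the simple scalars used here.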
Modify the following in $CASSANDRA_HOME/conf/clusters/cluster_name.conf:
[stomp]
batch_size = 10000
push_interval = 10
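The cluster .conf file uses INI-style sections, so the [stomp] change can be scripted. A minimal sketch, assuming the file parses with Python's configparser; set_stomp_options is a hypothetical helper, and note that configparser drops comments when it writes the file back.

```python
import configparser
import io


def set_stomp_options(conf_text, batch_size=10000, push_interval=10):
    """Return conf_text with [stomp] batch_size and push_interval set.

    Caveat: configparser does not preserve comments when writing the file back.
    """
    # interpolation=None so that '%' characters in other values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("stomp"):
        parser.add_section("stomp")
    parser.set("stomp", "batch_size", str(batch_size))
    parser.set("stomp", "push_interval", str(push_interval))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(set_stomp_options("[stomp]\nbatch_size = 100\n"), end="")
```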
Parameter notes
#address.yaml parameters
thrift_max_conns - the max number of concurrent connections to make to the local node
async_pool_size - the size of the thread pool pulling from a queue of inserts and inserting into cassandra
async_queue_size - the size of the queue of inserts to send to cassandra; if the queue fills up, additional operations will be dropped
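The interplay of async_queue_size and async_pool_size can be pictured as a bounded producer/consumer queue. The toy model below is an assumption that mirrors the described behavior (drop when the queue is full), not the agent's actual code; the class AsyncWriter and its write callback are hypothetical.

```python
import queue
import threading


class AsyncWriter:
    """Toy model: a bounded queue of inserts drained by a fixed worker pool.

    Once async_queue_size items are waiting, further operations are dropped
    instead of blocking the caller. Assumes a single producer thread calls submit.
    """

    def __init__(self, async_queue_size, async_pool_size, write):
        self.queue = queue.Queue(maxsize=async_queue_size)
        self.dropped = 0
        self._write = write  # callback standing in for "insert into Cassandra"
        for _ in range(async_pool_size):
            threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, op):
        """Enqueue op without blocking; count it as dropped if the queue is full."""
        try:
            self.queue.put_nowait(op)
        except queue.Full:
            self.dropped += 1

    def _drain(self):
        # Each worker takes one queued insert at a time and writes it.
        while True:
            op = self.queue.get()
            try:
                self._write(op)
            finally:
                self.queue.task_done()


if __name__ == "__main__":
    # With no workers draining (as if Cassandra were unreachable), everything
    # past the queue size is dropped.
    stalled = AsyncWriter(async_queue_size=10, async_pool_size=0, write=print)
    for i in range(15):
        stalled.submit(i)
    print("dropped:", stalled.dropped)  # prints "dropped: 5"
```

In this model, a writer that cannot reach Cassandra (as with the NoHostAvailableException in the log) lets the queue fill; a larger async_queue_size gives more headroom before drops, and a larger async_pool_size drains the queue faster while Cassandra is reachable.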
#stomp parameters
batch_size - The number of request updates OpsCenter will push out at once. The default value is 100. This is used to avoid overloading the browser.
push_interval - How often OpsCenter will push out updates to requests. The default value is 3 seconds. This is used to avoid overloading the browser.
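A rough way to compare the default and tuned [stomp] settings is to treat delivery as at most batch_size updates every push_interval seconds. This is a simplified model assumed for illustration, not OpsCenter's actual scheduling; plan_pushes and max_updates_per_minute are hypothetical helpers.

```python
def plan_pushes(pending, batch_size=100, push_interval=3):
    """Split pending updates into pushes of at most batch_size items,
    one push every push_interval seconds (simplified model)."""
    return [
        (n * push_interval, pending[start:start + batch_size])
        for n, start in enumerate(range(0, len(pending), batch_size))
    ]


def max_updates_per_minute(batch_size, push_interval):
    """Upper bound on updates delivered per minute under this model."""
    return batch_size * 60 / push_interval


if __name__ == "__main__":
    print(max_updates_per_minute(100, 3))     # defaults: 2000.0
    print(max_updates_per_minute(10000, 10))  # batch_size = 10000, push_interval = 10: 60000.0
```

Under this model the defaults cap delivery at 2,000 updates per minute, while batch_size = 10000 with push_interval = 10 raises the cap to 60,000, at the cost of larger and less frequent pushes to the browser.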