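SkyWalking
An application performance monitoring (APM) tool for distributed systems, designed for microservices, cloud-native, and container-based (Docker, K8s, Mesos) architectures.
Background
As microservice architectures become more popular, some of their inherent problems become more prominent. A single request may pass through several services, and each of those services may depend on others, so the full request path forms a mesh-like call chain. If any node in that chain fails, the stability of the whole chain is affected. There is no "silver bullet": every architecture has its strengths and weaknesses.
In this situation we need tools that help us understand system behavior and analyze performance problems, so that when a failure occurs we can locate and resolve it quickly. This is where APM (application performance management) tools come in.
Installation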
JDK 1.8
wget https://mirrors.huaweicloud.com/java/jdk/8u151-b12/jdk-8u151-linux-x64.tar.gz
tar zxvf jdk-8u151-linux-x64.tar.gz -C /usr/local
mv /usr/local/jdk1.8.0_151/ /usr/local/jdk
cat <<'EOF'> /etc/profile.d/java_home.sh
export JAVA_HOME=/usr/local/jdk
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile
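Before moving on, it can help to confirm that JAVA_HOME really points at a usable JDK. The following is a minimal sketch; `check_java_home` is a hypothetical helper name and not part of any tool used here.

```shell
check_java_home() {
    # $1: candidate JAVA_HOME; succeeds only if it contains an executable bin/java.
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

if check_java_home "${JAVA_HOME:-}"; then
    # Print the version of the JDK that JAVA_HOME points at.
    "$JAVA_HOME/bin/java" -version
else
    echo "JAVA_HOME is unset or has no bin/java: ${JAVA_HOME:-<unset>}" >&2
fi
```

If the check fails in a shell that was already open, opening a new login shell (so that /etc/profile.d/java_home.sh is read) is usually enough.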
Elasticsearch (6.3+ required)
wget https://mirrors.tuna.tsinghua.edu.cn/elasticstack/6.x/yum/6.7.0/elasticsearch-6.7.0.rpm
rpm -ivh elasticsearch-6.7.0.rpm
Edit the configuration file
[root@elk ~]# egrep -v "^#|^$" /etc/elasticsearch/elasticsearch.yml
cluster.name: my-application
node.name: node-1
path.data: /home/work/elk
path.logs: /home/work/elk/logs
network.host: 0.0.0.0
http.port: 9200
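Because path.data and path.logs point outside the RPM's default locations, those directories need to exist and be writable by the account the service runs as. The sketch below assumes the `elasticsearch` user and group that the RPM package creates; `prepare_es_dirs` is a hypothetical helper name.

```shell
prepare_es_dirs() {
    # $1: data directory, $2: log directory, $3: owner as user:group.
    # Creates both directories and hands them to the given owner.
    mkdir -p "$1" "$2" && chown -R "$3" "$1" "$2"
}

# On the Elasticsearch host, as root, with the paths from the config above:
#   prepare_es_dirs /home/work/elk /home/work/elk/logs elasticsearch:elasticsearch
```

If the service still fails to start, the files under the configured log directory are the first place to look.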
Start the service
systemctl daemon-reload
systemctl enable elasticsearch.service
systemctl start elasticsearch.service
systemctl status elasticsearch.service
curl localhost:9200
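The JSON returned by curl localhost:9200 contains a version "number" field, which can be compared against the 6.3+ requirement. This is a minimal sketch, assuming GNU sort (for the -V option); both function names are hypothetical.

```shell
es_version_from_json() {
    # Reads the Elasticsearch root response on stdin and prints the first "number" value.
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

version_at_least() {
    # Succeeds when version $1 is greater than or equal to version $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on the Elasticsearch host:
#   v=$(curl -s localhost:9200 | es_version_from_json)
#   version_at_least "$v" 6.3 && echo "Elasticsearch $v meets the 6.3+ requirement"
```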
SkyWalking 6.1 server
wget http://mirrors.tuna.tsinghua.edu.cn/apache/skywalking/6.1.0/apache-skywalking-apm-6.1.0.tar.gz
tar zxvf apache-skywalking-apm-6.1.0.tar.gz -C /usr/local
Directory layout
[root@elk apache-skywalking-apm-bin]# ls -l
total 80
drwxrwxr-x 7 1001 1002 118 Apr 30 17:00 agent
drwxr-xr-x 2 root root 241 May 9 14:59 bin
drwxr-xr-x 2 root root 175 May 17 11:30 config
-rw-rw-r-- 1 1001 1002 27549 Apr 30 16:48 LICENSE
drwxrwxr-x 3 1001 1002 4096 May 9 14:59 licenses
drwxr-xr-x 2 root root 98 May 9 15:09 logs
drwxr-xr-x 2 root root 78 May 9 15:55 mesh-buffer
-rw-rw-r-- 1 1001 1002 29638 Apr 30 16:48 NOTICE
drwxrwxr-x 2 1001 1002 8192 Apr 30 17:07 oap-libs
-rw-rw-r-- 1 1001 1002 1978 Apr 30 16:48 README.txt
drwxr-xr-x 3 root root 88 May 9 15:55 trace-buffer
drwxr-xr-x 2 root root 53 May 9 14:59 webapp
Configuration file
[root@elk apache-skywalking-apm-bin]# egrep -v "^#|^$" config/application.yml
cluster:
  standalone:
  # Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+
  # library the oap-libs folder with your ZooKeeper 3.4.x library.
core:
  default:
    # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate
    # Receiver: Receive agent data, Level 1 aggregate
    # Aggregator: Level 2 aggregate
    role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
    restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
    gRPCHost: ${SW_CORE_GRPC_HOST:0.0.0.0}
    gRPCPort: ${SW_CORE_GRPC_PORT:13800}
    downsampling:
      - Hour
      - Day
      - Month
    # Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
    minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
    hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
    dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
storage:
  elasticsearch:
    nameSpace: ${SW_NAMESPACE:""}
    # Set this to the IP and port of the Elasticsearch (ELK) host
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:172.16.103.64:9200}
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
receiver-sharing-server:
  default:
receiver-register:
  default:
receiver-trace:
  default:
    bufferPath: ${SW_RECEIVER_BUFFER_PATH:../trace-buffer/} # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
    sampleRate: ${SW_TRACE_SAMPLE_RATE:10000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
    slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100} # The slow database access thresholds. Unit ms.
receiver-jvm:
  default:
receiver-clr:
  default:
service-mesh:
  default:
    bufferPath: ${SW_SERVICE_MESH_BUFFER_PATH:../mesh-buffer/} # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_SERVICE_MESH_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_SERVICE_MESH_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
istio-telemetry:
  default:
envoy-metric:
  default:
query:
  graphql:
    path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
alarm:
  default:
telemetry:
  none:
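Most values in application.yml are written as ${NAME:default} placeholders: if the environment variable NAME is set when the server starts, its value is used, otherwise the text after the first colon applies. The function below only mimics that lookup for illustration and is not SkyWalking's own code; `resolve_placeholder` is a hypothetical name, and treating an empty variable as unset is an assumption.

```shell
resolve_placeholder() {
    # $1 is a placeholder such as '${SW_CORE_REST_PORT:12800}'.
    # Split on the first ':' so defaults that themselves contain ':' stay intact.
    name=$(printf '%s\n' "$1" | sed 's/^\${\([^:}]*\):.*}$/\1/')
    default=$(printf '%s\n' "$1" | sed 's/^\${[^:}]*:\(.*\)}$/\1/')
    # printenv succeeds only for variables that are exported in the environment.
    if value=$(printenv "$name") && [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf '%s\n' "$default"
    fi
}

# Prints the environment value if SW_STORAGE_ES_CLUSTER_NODES is exported,
# otherwise the default written in the file.
resolve_placeholder '${SW_STORAGE_ES_CLUSTER_NODES:172.16.103.64:9200}'
```

In practice this means the Elasticsearch address can be changed without editing the file, for example by exporting SW_STORAGE_ES_CLUSTER_NODES before running bin/startup.sh.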
Start the server
bin/startup.sh
Open ip:8080 in a browser and log in with username/password admin/admin.
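The backend and the web UI may take a little while to come up after startup.sh returns, so the page on port 8080 might not respond right away. A small polling loop such as the following can wait for it; this is a sketch that assumes curl is installed, and `wait_for_http` is a hypothetical helper name.

```shell
wait_for_http() {
    # $1: URL to poll, $2: maximum attempts (default 30), $3: seconds between attempts (default 2).
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f makes curl fail on HTTP error codes; output is discarded.
        if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage on the SkyWalking host:
#   wait_for_http http://localhost:8080/ && echo "UI is reachable"
```

If the loop times out, the files in the logs directory shown in the directory listing are a reasonable next place to check.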
SkyWalking 6.1 client (agent)
[root@elk apache-skywalking-apm-bin]# egrep -v "^#|^$" agent/config/agent.config
agent.service_name=${SW_AGENT_NAME:Micros}
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:172.16.103.64:13800}
logging.level=${SW_LOGGING_LEVEL:DEBUG}
Copy the agent directory to each machine you want to monitor.
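For a Java service, the agent is attached with the JVM's -javaagent option pointing at skywalking-agent.jar inside the agent directory, and the SW_AGENT_NAME and SW_AGENT_COLLECTOR_BACKEND_SERVICES variables from agent.config can override the file's defaults per service. The sketch below only builds and prints the command line; AGENT_DIR, APP_JAR, the service name, and `build_java_cmd` are hypothetical and need adjusting to the actual deployment.

```shell
build_java_cmd() {
    # $1: directory holding skywalking-agent.jar, $2: application jar.
    printf 'java -javaagent:%s/skywalking-agent.jar -jar %s\n' "$1" "$2"
}

# Hypothetical locations; adjust to where the agent and the service live.
AGENT_DIR=${AGENT_DIR:-/usr/local/agent}
APP_JAR=${APP_JAR:-/opt/demo/demo.jar}

# Per-service overrides for the placeholders in agent.config.
export SW_AGENT_NAME=demo-service                                # hypothetical service name
export SW_AGENT_COLLECTOR_BACKEND_SERVICES=172.16.103.64:13800   # backend address from agent.config

# Print the command that would start the service with the agent attached.
build_java_cmd "$AGENT_DIR" "$APP_JAR"
```

Giving each service its own SW_AGENT_NAME keeps them distinguishable in the UI while sharing one copy of agent.config.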
- Java
https://github.com/apache/skywalking/blob/master/docs/en/setup/service-agent/java-agent/README.md
- nodejs