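1. Introduction to Call-Chain Tracing
In distributed, microservice, and k8s-based environments, tracing an application's request path is essential. Distributed tracing is a core capability of APM (Application Performance Management) tools. Put simply, it records the complete behavior of every hop a request passes through, from the moment traffic reaches the front end down to the database at the very back, and presents it visually so that you can query traces, inspect dependencies, analyze performance, and view the topology. A tracing system makes problems much easier to locate, which is hard to achieve with conventional monitoring alone.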
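Common commercial products:
- Tingyun (听云)
- Bonree (博睿宏远)
Common open-source projects:
- SkyWalking: China; started as an individual's open-source project and now belongs to the Apache Software Foundation. Its author was recently elected as the first Chinese member of the Apache board.
- Pinpoint: South Korea; open-sourced by Naver
- Zipkin: USA; open-sourced by Twitter
- CAT: China; open-sourced by Meituan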
2. Environment
- K8S v1.22.5 cluster
Host | IP |
---|---|
master | 192.168.10.100 |
node01 | 192.168.10.101 |
node02 | 192.168.10.102 |
- Elasticsearch v7.12.0
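- SkyWalking components:
  - skywalking-oap-server: the backend service
  - skywalking-ui: the web UI
  - skywalking-es-init: initializes the SkyWalking data in the ES cluster
  - elasticsearch: stores SkyWalking's data and metrics
Latest versions at the time of writing:
- SkyWalking 9.1.0
- Elasticsearch 8.3.2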
3. Deploying the NFS Environment for the K8S Cluster
3.1 Create the namespace
[root@master ~]# kubectl create ns efk
namespace/efk created
[root@master ~]# kubectl get ns
NAME STATUS AGE
default Active 22h
efk Active 2s
ingress-nginx Active 22h
istio-system Active 19h
kube-node-lease Active 22h
kube-public Active 22h
kube-system Active 22h
metallb-system Active 22h
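3.2 Set up NFS
### The NFS server is installed on the master node here
# Install the nfs-utils and rpcbind packages (=== on all nodes ===)
yum -y install nfs-utils rpcbind
# Create the shared directory
sudo mkdir -p /nfsdata
# Open up its permissions
sudo chmod 777 -R /nfsdata
# Edit /etc/exports and add the following line
sudo vim /etc/exports
/nfsdata 192.168.10.0/24(rw,no_root_squash,sync)
# Start the services and enable them at boot
systemctl start rpcbind && systemctl enable rpcbind
# Run this one on all nodes
systemctl start nfs && systemctl enable nfs
# Apply the export configuration
exportfs -rv
# List the shared directories
sudo showmount -e 192.168.10.100
# Output like the following means the export works
Export list for 192.168.10.100:
/nfsdata 192.168.10.0/24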
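The showmount check can also be scripted rather than read by eye. The following is a minimal sketch, assuming the `showmount -e` output has the two-column layout shown above; `export_listed` is a hypothetical helper name, not part of nfs-utils.

```shell
#!/usr/bin/env bash
# export_listed PATH: reads `showmount -e` output on stdin and succeeds
# only if PATH appears as an exported directory (first column).
export_listed() {
  local want="$1"
  # Skip the "Export list for ..." header line, then compare the first field.
  awk -v want="$want" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Example (run on any node that has nfs-utils installed):
#   showmount -e 192.168.10.100 | export_listed /nfsdata && echo "export OK"
```

Running this from node01 and node02 as well confirms that the worker nodes can see the export before the provisioner tries to mount it.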
3.3 Create the StorageClass
The manifests below are adapted from nfs-client/deploy in the kubernetes-retired/external-storage repository (github.com).
rbac.yaml: creates the ServiceAccount and its RBAC rules
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
prvisor-deployment.yaml: creates the nfs-client-provisioner; it must be in the same namespace as the RBAC objects
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: wuchang-nfs-storage
            - name: NFS_SERVER
              value: 192.168.10.100 # NFS server IP address
            - name: NFS_PATH
              value: /nfsdata # NFS export path
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.10.100 # NFS server IP address
            path: /nfsdata # NFS export path
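storageclass.yaml: creates the StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: wuchang-nfs-storage # must match PROVISIONER_NAME in the Deployment
# Allow PVCs to be expanded after creation
allowVolumeExpansion: true
parameters:
  # Deletion policy: with "false" the data directory is deleted together with the PVC;
  # with "true" it is kept and renamed with an "archived-" prefix
  archiveOnDelete: "false"
Apply them in order: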
kubectl apply -f rbac.yaml
kubectl get serviceaccount
kubectl apply -f prvisor-deployment.yaml
kubectl get deploy
kubectl apply -f storageclass.yaml
kubectl get sc
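Before moving on, it can help to smoke-test the StorageClass with a throwaway claim. The sketch below only renders a small PVC manifest; the helper name `render_test_pvc`, the claim name `sc-smoke-test`, and the 1Mi size are illustrative assumptions, not part of the original setup.

```shell
#!/usr/bin/env bash
# render_test_pvc NAME STORAGE_CLASS: prints a minimal PVC manifest to stdout.
# NAME and the 1Mi request are illustrative values chosen for this sketch.
render_test_pvc() {
  local name="$1" sc="$2"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: default
spec:
  storageClassName: ${sc}
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
}

# Example:
#   render_test_pvc sc-smoke-test managed-nfs-storage | kubectl apply -f -
#   kubectl get pvc sc-smoke-test      # STATUS should become Bound
#   kubectl delete pvc sc-smoke-test
```

If the claim stays Pending, the provisioner logs (`kubectl logs deploy/nfs-client-provisioner`) are the first place to look.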
4. Installing ES on K8S
Optionally pull the images in advance:
docker pull elasticsearch:x.x.x
docker pull apache/skywalking-oap-server:9.0.0
docker pull apache/skywalking-ui:9.0.0
es-pvc.yaml: the ServiceAccount and the provisioner Deployment must live in default, but the PVC can be created in any namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-pvc
  namespace: default
spec:
  storageClassName: "managed-nfs-storage" # StorageClass used for dynamic provisioning
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
es-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-single
  namespace: default
spec:
  ports:
    - port: 9200
      protocol: TCP
      targetPort: 9200
  selector:
    k8s-app: elasticsearch-single
elasticsearch-single.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch-single
  namespace: default
  labels:
    k8s-app: elasticsearch-single
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: elasticsearch-single
  template:
    metadata:
      labels:
        k8s-app: elasticsearch-single
    spec:
      containers:
        - image: elasticsearch:7.12.0 # the latest release is currently 8.3.2
          name: elasticsearch-single
          resources:
            limits:
              cpu: 2
              memory: 3Gi
            requests:
              cpu: 0.5
              memory: 500Mi
          env:
            - name: "discovery.type"
              value: "single-node"
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx2g"
          ports:
            - containerPort: 9200
              name: db
              protocol: TCP
          volumeMounts:
            - name: elasticsearch-data
              mountPath: /usr/share/elasticsearch/data
      volumes:
        - name: elasticsearch-data
          persistentVolumeClaim:
            claimName: es-pvc
Apply them:
kubectl apply -f es-pvc.yaml
kubectl get pv,pvc
kubectl apply -f es-svc.yaml
kubectl get svc
kubectl apply -f elasticsearch-single.yaml
kubectl get pod
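Once the pod is Running, Elasticsearch's `/_cluster/health` endpoint gives a quick readiness signal. The helper below is a minimal sketch that pulls the `status` field out of that JSON response with sed; `es_health_status` is a hypothetical name, and a real JSON parser such as jq would be more robust if it is available.

```shell
#!/usr/bin/env bash
# es_health_status: reads a /_cluster/health JSON response on stdin and
# prints the value of its "status" field (green, yellow or red).
es_health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example, run from a pod inside the cluster (the Service is ClusterIP only):
#   curl -s http://elasticsearch-single.default:9200/_cluster/health | es_health_status
```

On a single-node cluster a `yellow` status is common, because replica shards have no second node to be assigned to.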
Troubleshooting:
The PVC is not created successfully:
kubectl get pvc -n default
It stays Pending. The cause is the Kubernetes version: from v1.20 on, selfLink is disabled by default, and this provisioner image still depends on it. (selfLink is the URL through which a resource is reached via the API; for a Pod it might be /api/v1/namespaces/ns36aa8455/pods/sc-cluster-test-1-6bc58d44d6-r8hld.) The fix is to turn the RemoveSelfLink feature gate off on kube-apiserver. Note that this gate can only be set to false up to v1.23; from v1.24 on it is locked on.
[root@k8sm storage]# vi /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
···
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
- --feature-gates=RemoveSelfLink=false # add this line
Then restart kube-apiserver:
# If k8s was installed from binaries
systemctl restart kube-apiserver
# If k8s was installed with kubeadm
[root@k8sm manifests]# ps aux|grep kube-apiserver
[root@k8sm manifests]# kill -9 [Pid] # it may restart on its own
[root@k8sm manifests]# kubectl apply -f /etc/kubernetes/manifests/kube-apiserver.yaml
...
[root@master ~]# kubectl get pods -A | grep kube-apiserver-master # check the AGE column
......
[root@k8sm storage]# kubectl get pvc # the PVC now shows Bound
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
my-pvc Bound pvc-ae9f6d4b-fc4c-4e19-8854-7bfa259a3a04 1Gi RWX example-nfs 13m
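Whether the RemoveSelfLink workaround is usable depends on the cluster version: the gate defaults to on from v1.20 and, to the best of my knowledge, can no longer be switched off from v1.24. The sketch below encodes that range check; `selflink_workaround_applies` is a hypothetical helper, and it assumes a plain `vMAJOR.MINOR.PATCH` version string.

```shell
#!/usr/bin/env bash
# selflink_workaround_applies VERSION: succeeds when VERSION (e.g. v1.22.5)
# falls in the range where RemoveSelfLink=false is both needed and still
# accepted by kube-apiserver: 1.20 <= version < 1.24.
selflink_workaround_applies() {
  local v="${1#v}"           # strip a leading "v"
  local major="${v%%.*}"     # text before the first dot
  local rest="${v#*.}"       # text after the first dot
  local minor="${rest%%.*}"  # text before the next dot
  [ "$major" -eq 1 ] && [ "$minor" -ge 20 ] && [ "$minor" -lt 24 ]
}

# Example (uses the first node's kubelet version as an approximation of
# the cluster version):
#   ver=$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')
#   selflink_workaround_applies "$ver" && echo "feature gate workaround is usable"
```

For the v1.22.5 cluster used here the check passes. On newer clusters the usual route is a provisioner that does not rely on selfLink, such as the successor project nfs-subdir-external-provisioner.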
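5. Installing SkyWalking
5.1 Install Helm
# Use one of the following install scripts (the transcript below runs the first, official one)
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
curl http://49.232.8.65/shell/helm/helm-v3.5.0_install.sh | bash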
------------------------------------------------------------------------------------
[root@master ~]# curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 11156 100 11156 0 0 17974 0 --:--:-- --:--:-- --:--:-- 17964
Downloading https://get.helm.sh/helm-v3.9.0-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
5.2 Initialize the SkyWalking chart
Clone the Helm chart repository:
git clone https://github.com/apache/skywalking-kubernetes
cd skywalking-kubernetes/chart && ls
Add the ES repo (required even when you use an external ES; otherwise the dependency update fails):
helm repo add elastic https://helm.elastic.co
helm dep up skywalking
export SKYWALKING_RELEASE_NAME=skywalking
export SKYWALKING_RELEASE_NAMESPACE=skywalking
-------------------------------------------------------
[root@master ~/skywalking-kubernetes/chart]# ls
skywalking
[root@master ~/skywalking-kubernetes/chart]# helm dep up skywalking
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "elastic" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 1 charts
Downloading elasticsearch from repo https://helm.elastic.co/
Deleting outdated charts
[root@master ~/skywalking-kubernetes/chart]# export SKYWALKING_RELEASE_NAME=skywalking
[root@master ~/skywalking-kubernetes/chart]# export SKYWALKING_RELEASE_NAMESPACE=skywalking
Create the namespace for SkyWalking:
[root@master ~]# kubectl create namespace skywalking
namespace/skywalking created
[root@master ~]# kubectl get ns
NAME STATUS AGE
default Active 2d1h
efk Active 124m
ingress-nginx Active 2d
istio-system Active 46h
kube-node-lease Active 2d1h
kube-public Active 2d1h
kube-system Active 2d1h
metallb-system Active 2d
skywalking Active 4s
5.3 Configure the SkyWalking values
After initialization, adjust the configuration yourself. There are two options:
1️⃣ Point oap-server at the external ES with values-my-es-01.yaml (create this file yourself)
2️⃣ Use the chart's bundled ES, following the sample values-my-es.yaml
Pull the images in advance:
docker pull skywalking.docker.scarf.sh/apache/skywalking-oap-server:9.1.0
docker pull skywalking.docker.scarf.sh/apache/skywalking-ui:9.1.0
Edit values.yaml:
[root@master ~/skywalking-kubernetes/chart/skywalking]# vim values.yaml
----------------------------------------
......
image:
repository: skywalking.docker.scarf.sh/apache/skywalking-oap-server
change it to:
repository: docker.mirrors.ustc.edu.cn/apache/skywalking-oap-server
......
If you use an external ES:
# The latest SkyWalking release is currently 9.1.0
[root@master ~/skywalking-kubernetes/chart/skywalking]# cat values-my-es-01.yaml
oap:
  image:
    tag: 8.4.0-es7
  storageType: elasticsearch7
ui:
  image:
    tag: 8.4.0
  service:
    type: NodePort
    externalPort: 80
    internalPort: 8080
    nodePort: 30008
elasticsearch:
  enabled: false
  config:
    # {SERVICE_NAME}.{NAMESPACE_NAME}.svc.cluster.local
    host: elasticsearch-single.default # elasticsearch-single is the Service name, default is the namespace
    port:
      http: 9200
    # user: "elastic" # [optional]
    # password: "elastic" # [optional]
    # this ES does not use a username or password
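The `host` value follows the `{SERVICE_NAME}.{NAMESPACE_NAME}.svc.cluster.local` pattern. A small sketch of composing the fully qualified name is shown below; `svc_fqdn` is a hypothetical helper, and `cluster.local` is assumed as the cluster domain because it is the usual default.

```shell
#!/usr/bin/env bash
# svc_fqdn SERVICE NAMESPACE [CLUSTER_DOMAIN]: prints the in-cluster DNS
# name of a Service. CLUSTER_DOMAIN defaults to cluster.local.
svc_fqdn() {
  local svc="$1" ns="$2" domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$svc" "$ns" "$domain"
}

# Example:
#   svc_fqdn elasticsearch-single default
```

The shorter `elasticsearch-single.default` form also resolves from pods in other namespaces, because the pod's DNS search list appends `svc.cluster.local` automatically.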
5.4 Install SkyWalking 8.4.0 with Helm
cd /root/skywalking-kubernetes/chart/
--- Option 1: set the versions on the command line, without an external ES
helm install "${SKYWALKING_RELEASE_NAME}" skywalking -n "${SKYWALKING_RELEASE_NAMESPACE}" --set oap.image.tag=8.4.0-es7 --set oap.storageType=elasticsearch7 --set ui.image.tag=8.4.0 --set elasticsearch.imageTag=7.12.0
--- Option 2: use the external ES (the environment variables are somewhat redundant here)
helm install "${SKYWALKING_RELEASE_NAME}" skywalking -n "${SKYWALKING_RELEASE_NAMESPACE}" -f ./skywalking/values-my-es-01.yaml
-----------------------------------------------------------------------------------------
[root@master ~/skywalking-kubernetes/chart]# helm install skywalking skywalking -n skywalking -f ./skywalking/values-my-es-01.yaml
NAME: skywalking
LAST DEPLOYED: Sat Jul 9 15:05:23 2022
NAMESPACE: skywalking
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
************************************************************************
* *
* SkyWalking Helm Chart by SkyWalking Team *
* *
************************************************************************
Thank you for installing skywalking.
Your release is named skywalking.
Learn more, please visit https://skywalking.apache.org/
Get the UI URL by running these commands:
export NODE_PORT=$(kubectl get --namespace skywalking -o jsonpath="{.spec.ports[0].nodePort}" services skywalking-ui)
export NODE_IP=$(kubectl get nodes --namespace skywalking -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
Uninstall SkyWalking:
helm uninstall skywalking -n skywalking
Watch the pods as they start:
[root@master ~/skywalking-kubernetes/chart]# kubectl get pod -n skywalking -w
NAME READY STATUS RESTARTS AGE
skywalking-es-init--1-fnnbn 0/1 PodInitializing 0 42s
skywalking-oap-7596f94959-97ccs 0/1 PodInitializing 0 42s
skywalking-oap-7596f94959-t5sxv 0/1 PodInitializing 0 42s
skywalking-ui-7957d9fb5f-drpwg 1/1 Running 0 42s
......
To expose the SkyWalking UI temporarily you can use port-forward. Here the port is opened with a NodePort; in production an ingress also works.
export POD_NAME=$(kubectl get pods --namespace skywalking -l "app=skywalking,release=skywalking,component=ui" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward $POD_NAME 8080:8080 --namespace skywalking
Get the SkyWalking URL (any k8s master/node IP plus the NodePort):
export NODE_PORT=$(kubectl get --namespace skywalking -o jsonpath="{.spec.ports[0].nodePort}" services skywalking-ui)
export NODE_IP=$(kubectl get nodes --namespace skywalking -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
-----------------------------------------------------------------------------------------
[root@master ~]#kubectl get --namespace skywalking -o jsonpath="{.spec.ports[0].nodePort}" services skywalking-ui
30008
[root@master ~]#kubectl get nodes --namespace skywalking -o jsonpath="{.items[0].status.addresses[0].address}"
192.168.10.65
Check the running state
# docker.mirrors.ustc.edu.cn/apache/skywalking-oap-server 8.4.0-es7 # best to pull this image in advance
# The PodInitializing status here was caused by the image not having finished downloading
[root@master ~]#kubectl get pods,svc -o wide -n skywalking
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/elasticsearch-single-6768b6454b-l5tds 1/1 Running 0 31m 10.244.2.17 node02 <none> <none>
pod/skywalking-es-init--1-kqkz2 0/1 Completed 0 28m 10.244.1.32 node01 <none> <none>
pod/skywalking-oap-666d7ffb45-bd2bl 0/1 PodInitializing 0 28m 10.244.2.18 node02 <none> <none>
pod/skywalking-oap-666d7ffb45-kpl7p 1/1 Running 1 (24m ago) 28m 10.244.1.31 node01 <none> <none>
pod/skywalking-ui-9c4c5f495-vwnrz 1/1 Running 0 28m 10.244.1.30 node01 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/elasticsearch-single ClusterIP 10.102.62.187 <none> 9200/TCP 31m k8s-app=elasticsearch-single
service/skywalking-oap ClusterIP 10.101.90.191 <none> 11800/TCP,12800/TCP 28m app=skywalking,component=oap,release=skywalking
service/skywalking-ui NodePort 10.111.254.170 <none> 80:30008/TCP 28m app=skywalking,component=ui,release=skywalking
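Sometimes the init and oap pods fail to start and report that they are waiting for the es container. This means oap cannot reach the ES storage, most likely because the connection settings are wrong. The cause is not always obvious and needs case-by-case investigation.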
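One way to narrow this down is to probe ES repeatedly from inside the skywalking namespace before (re)installing the chart. The sketch below is a generic retry helper; `retry` is a hypothetical function, and the probe in the comment assumes the `curlimages/curl` image and a throwaway pod name `es-probe`, neither of which comes from the original setup.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS CMD [ARGS...]: runs CMD until it succeeds,
# at most ATTEMPTS times, sleeping DELAY_SECONDS after each failure.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while true; do
    "$@" && return 0                        # success: stop retrying
    [ "$i" -ge "$attempts" ] && return 1    # out of attempts: give up
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example: probe ES from a throwaway pod in the skywalking namespace.
#   retry 10 6 kubectl run es-probe -n skywalking --rm -i --restart=Never \
#     --image=curlimages/curl -- curl -sf http://elasticsearch-single.default:9200
```

If the probe never succeeds, the problem lies with the ES Service or DNS name rather than with SkyWalking itself.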
Access: open http://<node IP>:30008 in a browser, using any master or node IP plus the NodePort.