
Cluster Network Monitoring with KubeNurse

Published: 2023-03-16 15:04:38 | Tech Observation

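Preface

In Kubernetes, networking is provided by third-party network plugins. These plugins are complex enough that troubleshooting network problems often hits a wall. Is there a way to monitor all the network connections in a cluster? kubenurse is such a project: it monitors the network connections in a cluster and exposes metrics for Prometheus to scrape.

Kubenurse

kubenurse is simple to deploy. It runs as a DaemonSet on the cluster nodes, and the YAML files are in the project's examples directory. Once deployed, a check request is sent to /alive every 5 seconds, which triggers a set of internal checks that examine the cluster network from several angles. To keep network traffic down, check results are cached for 3 seconds.

[Figure: kubenurse check mechanism]

As the figure shows, kubenurse checks the network paths through ingress, DNS, the apiserver, and kube-proxy. All checks produce public metrics that can be used to detect:

- SDN network latency and errors
- Network latency and errors between kubelets
- Pod-to-apiserver communication problems
- Ingress round-trip network latency and errors
- Service round-trip network latency and errors (kube-proxy)
- kube-apiserver problems
- kube-dns (CoreDNS) errors
- External DNS resolution errors (ingress URL resolution)

This data is exposed mainly through two metrics:

- kubenurse_errors_total: error counter, by error type
- kubenurse_request_duration: request time distribution, by type

Both metrics carry a type label that identifies the check target:

- api_server_direct: checks the API server directly from the node
- api_server_dns: checks the API server from the node through DNS
- me_ingress: checks the service through the Ingress
- me_service: checks the service through the Service
- path_$KUBELET_HOSTNAME: checks between nodes

Looking at these metrics at the P50, P90, and P99 quantiles lets you judge the state of the cluster network in different situations.

Installation and deployment

The official deployment files are used here, with a few changes.

(1) Clone the code locally.

    git clone https://github.com/postfinance/kubenurse.git

(2) Go to the examples directory and edit ingress.yaml, mainly to set the domain name, as follows. Note that this manifest uses the extensions/v1beta1 Ingress API, which was removed in Kubernetes 1.22; newer clusters need networking.k8s.io/v1 instead.

    ---
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx
      name: kubenurse
      namespace: kube-system
    spec:
      rules:
      - host: kubenurse-test.cooolops.cn
        http:
          paths:
          - backend:
              serviceName: kubenurse
              servicePort: 8080

(3) Update daemonset.yaml, mainly to change the ingress domain name, as follows.

    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      labels:
        app: kubenurse
      name: kubenurse
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: kubenurse
      template:
        metadata:
          labels:
            app: kubenurse
          annotations:
            prometheus.io/path: "/metrics"
            prometheus.io/port: "8080"
            prometheus.io/scheme: "http"
            prometheus.io/scrape: "true"
        spec:
          serviceAccountName: nurse
          containers:
          - name: kubenurse
            env:
            - name: KUBENURSE_INGRESS_URL
              value: kubenurse-test.cooolops.cn  # the value to change
            - name: KUBENURSE_SERVICE_URL
              value: http://kubenurse.kube-system.svc.cluster.local:8080
            - name: KUBENURSE_NAMESPACE
              value: kube-system
            - name: KUBENURSE_NEIGHBOUR_FILTER
              value: "app=kubenurse"
            image: "postfinance/kubenurse:v1.2.0"
            ports:
            - containerPort: 8080
              protocol: TCP
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
            operator: Equal
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane
            operator: Equal

(4) Create a new ServiceMonitor to collect the metrics, as follows.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: kubenurse
      namespace: monitoring
      labels:
        k8s-app: kubenurse
    spec:
      jobLabel: k8s-app
      endpoints:
      - port: "8080-8080"
        interval: 30s
        scheme: http
      selector:
        matchLabels:
          app: kubenurse
      namespaceSelector:
        matchNames:
        - kube-system

(5) Deploy the application by running the following command in the examples directory.

    kubectl apply -f .

(6) Wait for all the pods to start running, as shown below.

    # kubectl get all -n kube-system -l app=kubenurse
    NAME                  READY   STATUS    RESTARTS   AGE
    pod/kubenurse-fznsw   1/1     Running   0          17h
    pod/kubenurse-n52rq   1/1     Running   0          17h
    pod/kubenurse-nwtl4   1/1     Running   0          17h
    pod/kubenurse-xp92p   1/1     Running   0          17h
    pod/kubenurse-z2ksz   1/1     Running   0          17h

    NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    service/kubenurse   ClusterIP   10.96.229.244   <none>        8080/TCP   17h

    NAME                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/kubenurse   5         5         5       5            5           <none>          17h

(7) Go to Prometheus and confirm that the data is being collected and that each metric looks normal.

(8) At this point the monitoring data can be charted in Grafana.

[Figure: Grafana dashboard of the kubenurse metrics]

References

[1] https://github.com/postfinance/kubenurse
[2] https://github.com/postfinance/kubenurse/tree/master/examples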
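
As a companion to the Prometheus check in step (7), here is a minimal sketch of a quick sanity check on scraped metrics. It parses text in the Prometheus exposition format and sums kubenurse_errors_total per value of the type label. The function name errors_by_type and the sample text are hypothetical and made up for illustration; they are not output from a real cluster, and real kubenurse output may carry additional labels.

```python
import re
from collections import defaultdict

# Matches sample lines such as: kubenurse_errors_total{type="me_ingress"} 3
_SAMPLE_LINE = re.compile(r'^kubenurse_errors_total\{([^}]*)\}\s+(\S+)')
# Picks the `type` label out of the label set (and not, say, `subtype`).
_TYPE_LABEL = re.compile(r'(?:^|,)\s*type="([^"]*)"')


def errors_by_type(metrics_text):
    """Sum kubenurse_errors_total samples per value of the `type` label."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        sample = _SAMPLE_LINE.match(line)
        if sample is None:
            continue  # some other metric
        label = _TYPE_LABEL.search(sample.group(1))
        if label is None:
            continue  # sample without a type label
        totals[label.group(1)] += float(sample.group(2))
    return dict(totals)


# Hypothetical text, as if the /metrics output of two pods were concatenated.
SAMPLE = """\
# HELP kubenurse_errors_total error counter by error type
# TYPE kubenurse_errors_total counter
kubenurse_errors_total{type="api_server_dns"} 0
kubenurse_errors_total{type="me_ingress"} 3
kubenurse_errors_total{type="me_ingress"} 2
kubenurse_request_duration_sum{type="me_service"} 0.42
"""

if __name__ == "__main__":
    for check_type, count in sorted(errors_by_type(SAMPLE).items()):
        print(f"{check_type}: {count:g}")
```

In practice the input text would come from a pod's /metrics path on port 8080, the path and port named in the DaemonSet annotations. A non-zero total for a given type points to the check target worth investigating first.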