Basic operations
- Configuring Knative per InferenceService
Add annotations with the autoscaling.knative.dev prefix to the isvc's metadata.annotations (see the source code location).
Example:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    autoscaling.knative.dev/scaleToZeroPodRetentionPeriod: 3600s
  name: zwy-tensorflow-test
  namespace: default
spec:
  predictor:
    canaryTrafficPercent: 100
    maxReplicas: 1
    minReplicas: 0
    nodeSelector:
      device: cpu
    tensorflow:
      name: kserve-container
      resources:
        limits:
          cpu: "2"
          memory: 2Gi
        requests:
          cpu: 100m
          memory: 100Mi
      runtimeVersion: 1.14.0
      storageUri: http://wenyangchou/model.zip
Common issues
- Installation failure caused by clusterDomain
Creating service 'hello' in namespace 'default':

  0.028s The Route is still working to reflect the latest desired specification.
  0.067s Configuration "hello" is waiting for a Revision to become ready.
  2.736s ...
  2.736s Ingress has not yet been reconciled.
  2.788s Waiting for load balancer to be ready
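During the knative-serving installation, a test service hangs indefinitely at "Waiting for load balancer to be ready". The cause is that the cluster's clusterDomain was changed from the default. Details: https://github.com/knative/serving/issues/12371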
- Changing the domain
Edit the config-domain ConfigMap to set the domain Knative uses for Route URLs:
kubectl edit cm config-domain -n knative-serving
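The sketch below shows what the data section of config-domain might look like after editing. Knative treats each key as a domain suffix for Route URLs. A key with an empty value becomes the default domain for all services. example.com is a placeholder, not the domain used in this setup.

```yaml
# Sketch of the config-domain ConfigMap after editing.
# "example.com" is a placeholder domain; replace it with your own.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  # An empty value makes this the default domain for all Knative Services.
  example.com: ""
```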
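- Error from server (BadRequest): error when creating "service.yaml": admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.nodeSelector, spec.template.spec.tolerations
KServe depends on Knative, and Knative does not allow nodeSelector or tolerations in the pod spec by default. This means node-label selection, such as the nodeSelector in the isvc example above, is rejected. Reference: https://github.com/knative/serving/issues/11388
Enable these fields by editing config-features: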
kubectl -n knative-serving edit cm config-features
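The relevant feature flags in config-features are kubernetes.podspec-nodeselector and kubernetes.podspec-tolerations. They correspond to the two fields rejected in the error above. This is a sketch of the entries to set under data, assuming a Knative version that exposes these flags (both are disabled by default):

```yaml
# Sketch: entries to set under data: in knative-serving/config-features.
# Leave the other existing keys in the ConfigMap unchanged.
data:
  # Allow spec.template.spec.nodeSelector in Knative Services.
  kubernetes.podspec-nodeselector: "enabled"
  # Allow spec.template.spec.tolerations in Knative Services.
  kubernetes.podspec-tolerations: "enabled"
```

After saving, re-apply the InferenceService so the webhook validates it against the new settings.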
- failed to resolve image to digest: Get https:/xxx.xxx.com/v2/: ....
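When an isvc runs, Knative resolves the image tag to a digest. By default it contacts the private registry over HTTPS, which fails for a registry served over plain HTTP.
Fix: kubectl -n knative-serving edit configmap config-deployment
Then add the private registry to registries-skipping-tag-resolving:
registries-skipping-tag-resolving: kind.local,ko.local,dev.local,harbor.wenyangchou.com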
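This is a sketch of the resulting config-deployment entry. Knative skips tag-to-digest resolution for images from the listed registries, so it no longer queries harbor.wenyangchou.com. kind.local, ko.local and dev.local are Knative's default values for this key, so they are kept.

```yaml
# Sketch: the entry to change under data: in knative-serving/config-deployment.
# Keep the other existing keys (e.g. the queue-proxy image setting) unchanged.
data:
  # Comma-separated registries for which tag-to-digest resolution is skipped.
  registries-skipping-tag-resolving: "kind.local,ko.local,dev.local,harbor.wenyangchou.com"
```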
- Publishing a new version brings the previous version back up
This is a KServe bug; see https://github.com/kserve/kserve/pull/2097
Fix:
kubectl edit sts kserve-controller-manager -n kserve
Change the image of the first container to wenyangchou/kserve-controller:v0.7.1
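The same image change can also be made non-interactively as a strategic merge patch. This is a sketch that assumes the first container of the StatefulSet is named manager; check the name with kubectl -n kserve get sts kserve-controller-manager -o yaml. A strategic merge patch matches containers by name, so a wrong name would add a new container instead of changing the image.

```yaml
# Sketch of a strategic merge patch for the kserve-controller-manager StatefulSet.
# ASSUMPTION: the first container is named "manager"; verify before applying.
spec:
  template:
    spec:
      containers:
        - name: manager
          image: wenyangchou/kserve-controller:v0.7.1
```

If this is saved as patch.yaml (a hypothetical file name), it could be applied with kubectl -n kserve patch sts kserve-controller-manager --patch "$(cat patch.yaml)". Strategic merge is kubectl patch's default patch type.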