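Background:
Our project uses Gateway to split traffic across microservices, that is, to control what proportion of requests goes to each instance of a given microservice, so the gateway contains many methods that call the Nacos API.
When we deployed to a new environment, the gateway reported the error below. Our servers run on k8s, and every environment uses the same image.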
2021-11-23 16:53:54.568 ERROR [***-gateway,,,] 1 --- [ main] com.alibaba.nacos.client.naming : [NA] failed to write cache for dom:DEFAULT_GROUP@@***-****
java.lang.IllegalStateException: failed to create cache dir: /root/nacos/naming/753378b3-d4ad-4f1a-859b-f9d57df33c9f
    at com.alibaba.nacos.client.naming.cache.DiskCache.makeSureCacheDirExists(DiskCache.java:154) ~[nacos-client-1.1.4.jar:na]
    at com.alibaba.nacos.client.naming.cache.DiskCache.write(DiskCache.java:45) ~[nacos-client-1.1.4.jar:na]
    at com.alibaba.nacos.client.naming.core.HostReactor.processServiceJSON(HostReactor.java:184) [nacos-client-1.1.4.jar:na]
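Troubleshooting:
The error is clear: the client tried to write a cache file to the server's disk and failed. Following the stack trace, we located the class that throws the exception in nacos-client-1.1.4.jar: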
package com.alibaba.nacos.client.naming.cache;

public class DiskCache {

    public static void write(ServiceInfo dom, String dir) {
        try {
            makeSureCacheDirExists(dir);

            File file = new File(dir, dom.getKeyEncoded());
            if (!file.exists()) {
                // add another !file.exists() to avoid conflicted creating-new-file from multi-instances
                if (!file.createNewFile() && !file.exists()) {
                    throw new IllegalStateException("failed to create cache file");
                }
            }

            StringBuilder keyContentBuffer = new StringBuilder("");
            String json = dom.getJsonFromServer();
            if (StringUtils.isEmpty(json)) {
                json = JSON.toJSONString(dom);
            }
            keyContentBuffer.append(json);

            //Use the concurrent API to ensure the consistency.
            ConcurrentDiskUtil.writeFileContent(file, keyContentBuffer.toString(), Charset.defaultCharset().toString());
        } catch (Throwable e) {
            NAMING_LOGGER.error("[NA] failed to write cache for dom:" + dom.getName(), e);
        }
    }

    *******

    private static File makeSureCacheDirExists(String dir) {
        File cacheDir = new File(dir);
        if (!cacheDir.exists() && !cacheDir.mkdirs()) {
            throw new IllegalStateException("failed to create cache dir: " + dir);
        }
        return cacheDir;
    }
}
The write method calls makeSureCacheDirExists, which throws this exception when the cache directory does not exist and mkdirs() fails to create it.
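To see why mkdirs() fails on a particular machine, the same check can be reproduced outside the Nacos client. The following is only a minimal sketch: the class CacheDirCheck and its diagnose method are hypothetical names made up for this illustration and are not part of nacos-client. It uses nothing but java.io.File.

```java
import java.io.File;

// Hypothetical helper, not part of nacos-client: repeats the
// "exists or mkdirs" check from DiskCache and explains a failure.
final class CacheDirCheck {

    /** Returns null when the directory is usable, otherwise a short explanation. */
    static String diagnose(String dir) {
        File cacheDir = new File(dir);
        if (cacheDir.isDirectory()) {
            return cacheDir.canWrite() ? null : "exists but is not writable: " + dir;
        }
        if (cacheDir.exists()) {
            return "exists but is not a directory: " + dir;
        }
        if (cacheDir.mkdirs()) {
            return null; // created successfully
        }
        // mkdirs() failed: walk up to the nearest ancestor that exists,
        // since that is where the permission or type problem usually sits.
        File ancestor = cacheDir.getAbsoluteFile().getParentFile();
        while (ancestor != null && !ancestor.exists()) {
            ancestor = ancestor.getParentFile();
        }
        if (ancestor == null) {
            return "mkdirs() failed and no existing ancestor was found for: " + dir;
        }
        return "mkdirs() failed; nearest existing ancestor is " + ancestor
                + " (directory=" + ancestor.isDirectory()
                + ", writable=" + ancestor.canWrite() + ")";
    }
}
```

Calling CacheDirCheck.diagnose with the path from the error message, as the same OS user that runs the gateway, should point at the ancestor directory that blocks the creation.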
Next we followed the call chain to find out who calls DiskCache.write. That led to HostReactor, which receives the cache path cacheDir through its constructor.
package com.alibaba.nacos.client.naming.core;

public class HostReactor {

    public HostReactor(EventDispatcher eventDispatcher, NamingProxy serverProxy, String cacheDir,
                       boolean loadCacheAtStart, int pollingThreadCount) {
        ......
    }
}
Going one step further back, HostReactor is instantiated when NacosNamingService is initialized:
package com.alibaba.nacos.client.naming;

@SuppressWarnings("PMD.ServiceOrDaoClassShouldEndWithImplRule")
public class NacosNamingService implements NamingService {

    private HostReactor hostReactor;

    public NacosNamingService(String serverList) {
        Properties properties = new Properties();
        properties.setProperty(PropertyKeyConst.SERVER_ADDR, serverList);
        init(properties);
    }

    public NacosNamingService(Properties properties) {
        init(properties);
    }

    private void init(Properties properties) {
        namespace = InitUtils.initNamespaceForNaming(properties);
        initServerAddr(properties);
        InitUtils.initWebRootContext();
        initCacheDir();
        initLogName(properties);

        eventDispatcher = new EventDispatcher();
        serverProxy = new NamingProxy(namespace, endpoint, serverList);
        serverProxy.setProperties(properties);
        beatReactor = new BeatReactor(serverProxy, initClientBeatThreadCount(properties));
        hostReactor = new HostReactor(eventDispatcher, serverProxy, cacheDir,
                isLoadCacheAtStart(properties), initPollingThreadCount(properties));
    }

    private void initCacheDir() {
        cacheDir = System.getProperty("com.alibaba.nacos.naming.cache.dir");
        if (StringUtils.isEmpty(cacheDir)) {
            cacheDir = System.getProperty("user.home") + "/nacos/naming/" + namespace;
        }
    }

    ......
}
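Both constructors of NacosNamingService call init, and init calls initCacheDir() to assign the cacheDir field before it constructs the HostReactor.
Once you read initCacheDir, the picture is clear. The Nacos cache path is determined in one of two ways: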
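1. Explicitly, through the parameter com.alibaba.nacos.naming.cache.dir. The code reads it with System.getProperty, so the value has to reach the JVM as a system property, and it is used as the cache directory as-is.
2. By default, as the home directory of the user running the process (user.home) + /nacos/naming/ + the namespace. Our process ran as root, which is why the path in the error is /root/nacos/naming/753378b3-d4ad-4f1a-859b-f9d57df33c9f.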
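The two rules above can be sketched as a small pure function. This is an illustration that mirrors initCacheDir, not the library's code: CacheDirResolver and resolve are hypothetical names, and the properties are passed in as an argument (instead of being read from System) only so the logic can be tried without touching the running JVM.

```java
import java.util.Properties;

// Hypothetical illustration of the lookup order in initCacheDir.
final class CacheDirResolver {

    static final String CACHE_DIR_KEY = "com.alibaba.nacos.naming.cache.dir";

    /** Explicit property wins; otherwise user.home + /nacos/naming/ + namespace. */
    static String resolve(Properties systemProps, String namespace) {
        String cacheDir = systemProps.getProperty(CACHE_DIR_KEY);
        if (cacheDir == null || cacheDir.isEmpty()) {
            cacheDir = systemProps.getProperty("user.home") + "/nacos/naming/" + namespace;
        }
        return cacheDir;
    }
}
```

With user.home set to /root and the namespace from our log, resolve yields the exact path that appears in the exception message; passing System.getProperties() shows what the current JVM would use.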
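Solutions:
1. If root is the only account on the server, ask the ops team to grant write permission on /root/nacos/naming/.
2. Writing under root's home directory is usually restricted, so alternatively start the application under a different account and grant write permission on the nacos/naming/ directory under that account's home (for example /user/nacos/naming/).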
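3. Set com.alibaba.nacos.naming.cache.dir so that the cache is written to a directory the process is allowed to write to. Judging by the initCacheDir code shown above, the value is read with System.getProperty, so it should be passed as a JVM system property (for example a -Dcom.alibaba.nacos.naming.cache.dir=... startup argument); a key in the application's yml file only takes effect if something copies it into the system properties before NacosNamingService is created.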
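When changing the startup command is inconvenient, option 3 can also be applied in code before the Nacos client is created. The following is a rough sketch under that assumption: NacosCacheDirBootstrap, configure, and the fallback directory are hypothetical names chosen for this example; only the property key comes from the Nacos source shown earlier.

```java
import java.io.File;

// Hypothetical bootstrap helper: call it at the very start of main(),
// before anything creates a NacosNamingService.
final class NacosCacheDirBootstrap {

    static final String CACHE_DIR_KEY = "com.alibaba.nacos.naming.cache.dir";

    /**
     * Keeps a value already supplied with -D, otherwise sets the fallback.
     * Fails fast if the resulting directory cannot be created.
     */
    static String configure(String fallbackDir) {
        String dir = System.getProperty(CACHE_DIR_KEY);
        if (dir == null || dir.isEmpty()) {
            dir = fallbackDir;
            System.setProperty(CACHE_DIR_KEY, dir);
        }
        File cacheDir = new File(dir);
        if (!cacheDir.isDirectory() && !cacheDir.mkdirs()) {
            throw new IllegalStateException("cache dir is not usable: " + dir);
        }
        return dir;
    }
}
```

Failing at startup is a deliberate choice here: DiskCache.write catches the exception and only logs it, so a misconfigured directory would otherwise show up only as repeated error lines in the log.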