• Heritrix 3.1.0 Source Code Analysis (Part 9)


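    In Heritrix 3.1.0, the Frontier component manages its URL queues with a BDB database (Berkeley DB Java Edition, "JE"), which stores the CrawlURI objects. Let us first look at how Heritrix 3.1.0 implements its BDB module.

    Creating a BDB database starts with building a database environment. In the Heritrix 3.1.0 BDB module, the EnhancedEnvironment class wraps the BDB environment (it extends JE's Environment). If you are not familiar with BDB, it is worth looking it up first.

    The source of the EnhancedEnvironment class is as follows: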

    /**
     * Version of BDB_JE Environment with additional convenience features, such as
     * a shared, cached StoredClassCatalog. (Additional convenience caching of 
     * Databases and StoredCollections may be added later.)
     * 
     * @author gojomo
     */
    public class EnhancedEnvironment extends Environment {
        StoredClassCatalog classCatalog; 
        Database classCatalogDB;
        
        /**
         * Constructor
         * 
         * @param envHome directory in which to open environment
         * @param envConfig config options
         * @throws DatabaseException
         */
        public EnhancedEnvironment(File envHome, EnvironmentConfig envConfig) throws DatabaseException {
            super(envHome, envConfig);
        }
    
        /**
         * Return a StoredClassCatalog backed by a Database in this environment,
         * either pre-existing or created (and cached) if necessary.
         * 
         * @return the cached class catalog
         */
        public StoredClassCatalog getClassCatalog() {
            if(classCatalog == null) {
                DatabaseConfig dbConfig = new DatabaseConfig();
                dbConfig.setAllowCreate(true);
                dbConfig.setReadOnly(this.getConfig().getReadOnly());
                try {
                    classCatalogDB = openDatabase(null, "classCatalog", dbConfig);
                    classCatalog = new StoredClassCatalog(classCatalogDB);
                } catch (DatabaseException e) {
                    // TODO Auto-generated catch block
                    throw new RuntimeException(e);
                }
            }
            return classCatalog;
        }
    
        @Override
        public synchronized void close() throws DatabaseException {
            if(classCatalogDB!=null) {
                classCatalogDB.close();
            }
            super.close();
        }
    
        /**
         * Create a temporary test environment in the given directory.
         * @param dir target directory
         * @return EnhancedEnvironment
         */
        public static EnhancedEnvironment getTestEnvironment(File dir) {
            EnvironmentConfig envConfig = new EnvironmentConfig();
            envConfig.setAllowCreate(true);
            envConfig.setTransactional(false);
            EnhancedEnvironment env;
            try {
                env = new EnhancedEnvironment(dir, envConfig);
            } catch (DatabaseException e) {
                throw new RuntimeException(e);
            } 
            return env;
        }
    }

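    As the source shows, besides inheriting the functionality of JE's Environment, the class adds a StoredClassCatalog getClassCatalog() method, which BDB needs in order to store custom objects. On first use the method also opens the "classCatalog" database (classCatalogDB) that backs the StoredClassCatalog object, and caches the result.

    So where are BDB databases actually created and operated on? That is the job of the BdbModule class, analyzed next (BdbModule implements a number of interfaces, which are not explained in detail here).

    The source of BdbModule is rather long, so I will not paste it in full; only the relevant code is shown during the analysis.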

        private static class DatabasePlusConfig implements Serializable {
            private static final long serialVersionUID = 1L;
            public transient Database database;
            public BdbConfig config;
        }
        
        
        /**
         * Configuration object for databases.  Needed because 
         * {@link DatabaseConfig} is not serializable.  Also it prevents invalid
         * configurations.  (All databases opened through this module must be
         * deferred-write, because otherwise they can't sync(), and you can't
         * run a checkpoint without doing sync() first.)
         * 
         * @author pjack
         *
         */
        public static class BdbConfig implements Serializable {
            private static final long serialVersionUID = 1L;
    
            boolean allowCreate;
            boolean sortedDuplicates;
            boolean transactional;
            boolean deferredWrite = true; 
    
            public BdbConfig() {
            }
    
    
            public boolean isAllowCreate() {
                return allowCreate;
            }
    
    
            public void setAllowCreate(boolean allowCreate) {
                this.allowCreate = allowCreate;
            }
    
    
            public boolean getSortedDuplicates() {
                return sortedDuplicates;
            }
    
    
            public void setSortedDuplicates(boolean sortedDuplicates) {
                this.sortedDuplicates = sortedDuplicates;
            }
    
            public DatabaseConfig toDatabaseConfig() {
                DatabaseConfig result = new DatabaseConfig();
                result.setDeferredWrite(deferredWrite);
                result.setTransactional(transactional);
                result.setAllowCreate(allowCreate);
                result.setSortedDuplicates(sortedDuplicates);
                return result;
            }
    
    
            public boolean isTransactional() {
                return transactional;
            }
    
    
            public void setTransactional(boolean transactional) {
                this.transactional = transactional;
            }
    
    
            public void setDeferredWrite(boolean b) {
                this.deferredWrite = true; 
            }
        }

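    The code above shows the static nested classes DatabasePlusConfig and BdbConfig. The former is private and can only be instantiated inside BdbModule; the latter is public and can be created from outside.

    Besides its Database database member, DatabasePlusConfig also holds a BdbConfig config member. The database field is transient, so only the config is written out when the object is serialized.

    BdbConfig wraps the BDB database configuration. After its properties are set, its DatabaseConfig toDatabaseConfig() method returns the corresponding BDB configuration object. Note that setDeferredWrite(boolean b) ignores its argument and always sets deferredWrite to true, which matches the Javadoc's requirement that every database opened through this module be deferred-write.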

        public DatabaseConfig toDatabaseConfig() {
            DatabaseConfig result = new DatabaseConfig();
            result.setDeferredWrite(deferredWrite);
            result.setTransactional(transactional);
            result.setAllowCreate(allowCreate);
            result.setSortedDuplicates(sortedDuplicates);
            return result;
        }
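
    To make the "transient handle plus serializable config" idea concrete, here is a minimal sketch using only the JDK. FakeHandle, Settings and HandlePlusSettings are hypothetical stand-ins that I made up for Database, BdbConfig and DatabasePlusConfig; this is not Heritrix code, only an illustration of how Java serialization treats the two kinds of field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientHandleSketch {
    /** Hypothetical stand-in for a live, non-serializable database handle. */
    static class FakeHandle {
        final String name;
        FakeHandle(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for BdbConfig: plain serializable settings. */
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        boolean allowCreate;
        boolean deferredWrite = true;
    }

    /** Same shape as DatabasePlusConfig: transient handle, persistent settings. */
    static class HandlePlusSettings implements Serializable {
        private static final long serialVersionUID = 1L;
        transient FakeHandle handle; // skipped by Java serialization
        Settings settings;           // written to the stream
    }

    /** Serialize and deserialize in memory; checked exceptions are wrapped. */
    static HandlePlusSettings roundTrip(HandlePlusSettings in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HandlePlusSettings) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HandlePlusSettings before = new HandlePlusSettings();
        before.handle = new FakeHandle("uris");
        before.settings = new Settings();
        before.settings.allowCreate = true;

        HandlePlusSettings after = roundTrip(before);
        System.out.println("handle restored? " + (after.handle != null));
        System.out.println("allowCreate restored? " + after.settings.allowCreate);
    }
}
```

    After the round trip the handle is null and has to be reopened from the surviving settings, which is presumably why the real class keeps the BdbConfig next to the Database.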

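    The next part of the BdbModule source holds the BDB environment settings; these parameters are used later by the method that instantiates the BDB environment.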

    protected ConfigPath dir = new ConfigPath("bdbmodule subdirectory","state");
        public ConfigPath getDir() {
            return dir;
        }
        public void setDir(ConfigPath dir) {
            this.dir = dir;
        }
        
        int cachePercent = -1;
        public int getCachePercent() {
            return cachePercent;
        }
        public void setCachePercent(int cachePercent) {
            this.cachePercent = cachePercent;
        }
    
        boolean useSharedCache = true; 
        public boolean getUseSharedCache() {
            return useSharedCache;
        }
        public void setUseSharedCache(boolean useSharedCache) {
            this.useSharedCache = useSharedCache;
        }
        
        /**
         * Expected number of concurrent threads; used to tune nLockTables
         * according to JE FAQ
         * http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
         */
        int expectedConcurrency = 64;
        public int getExpectedConcurrency() {
            return expectedConcurrency;
        }
        public void setExpectedConcurrency(int expectedConcurrency) {
            this.expectedConcurrency = expectedConcurrency;
        }
        
        /**
         * Whether to use hard-links to log files to collect/retain
         * the BDB log files needed for a checkpoint. Default is true. 
         * May not work on Windows (especially on pre-NTFS filesystems). 
         * If false, the BDB 'je.cleaner.expunge' value will be set to 
         * 'false', as well, meaning BDB will *not* delete obsolete JDB
         * files, but only rename the '.DEL'. They will have to be 
         * manually deleted to free disk space, but .DEL files referenced
         * in any checkpoint's 'jdbfiles.manifest' should be retained to
         * keep the checkpoint valid. 
         */
        boolean useHardLinkCheckpoints = true;
        public boolean getUseHardLinkCheckpoints() {
            return useHardLinkCheckpoints;
        }
        public void setUseHardLinkCheckpoints(boolean useHardLinkCheckpoints) {
            this.useHardLinkCheckpoints = useHardLinkCheckpoints;
        }
        
        private transient EnhancedEnvironment bdbEnvironment;
            
        private transient StoredClassCatalog classCatalog;

    The following two member variables deserve particular attention:

    @SuppressWarnings("rawtypes")
        private Map<String,ObjectIdentityCache> oiCaches = 
            new ConcurrentHashMap<String,ObjectIdentityCache>();
    
        private Map<String,DatabasePlusConfig> databases =
            new ConcurrentHashMap<String,DatabasePlusConfig>();

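    Both are maps. The former holds cache-management objects (the BdbFrontier module uses them to manage the work-queue cache); the latter holds DatabasePlusConfig objects, through which BDB database instances are handed out to callers.

    Now look at the initialization method start() (a method of the Spring framework's Lifecycle interface, which BdbModule implements):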

    public synchronized void start() {
            if (isRunning()) {
                return;
            }
            
            isRunning = true;
            
            try {
                boolean isRecovery = false; 
                if(recoveryCheckpoint!=null) {
                    isRecovery = true; 
                    doRecover(); 
                }
       
                setup(getDir().getFile(), !isRecovery);
            } catch (DatabaseException e) {
                throw new IllegalStateException(e);
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }

    doRecover() restores state from a checkpoint. setup(getDir().getFile(), !isRecovery) then initializes the EnhancedEnvironment object that wraps the database environment, together with the StoredClassCatalog object:

    protected void setup(File f, boolean create) 
        throws DatabaseException, IOException {
            EnvironmentConfig config = new EnvironmentConfig();
            config.setAllowCreate(create);
            config.setLockTimeout(75, TimeUnit.MINUTES); // set to max
            if(getCachePercent()>0) {
                config.setCachePercent(getCachePercent());
            }
            config.setSharedCache(getUseSharedCache());
            
            // we take the advice literally from...
            // http://www.oracle.com/technology/products/berkeley-db/faq/je_faq.html#33
            long nLockTables = getExpectedConcurrency()-1;
            while(!BigInteger.valueOf(nLockTables).isProbablePrime(Integer.MAX_VALUE)) {
                nLockTables--;
            }
            config.setConfigParam("je.lock.nLockTables", Long.toString(nLockTables));
            
            // triple this value to 6K because stats show many faults
            config.setConfigParam("je.log.faultReadSize", "6144"); 
    
            if(!getUseHardLinkCheckpoints()) {
                // to support checkpoints by textual manifest only, 
                // prevent BDB's cleaner from deleting log files
                config.setConfigParam("je.cleaner.expunge", "false");
            } // else leave whatever other setting was already in place
    
            org.archive.util.FileUtils.ensureWriteableDirectory(f);
            this.bdbEnvironment = new EnhancedEnvironment(f, config);
            this.classCatalog = this.bdbEnvironment.getClassCatalog();
            if(!create) {
                // freeze last log file -- so that originating checkpoint isn't fouled
                DbBackup dbBackup = new DbBackup(bdbEnvironment);
                dbBackup.startBackup();
                dbBackup.endBackup();
            }
        }
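
    The nLockTables loop in setup() simply walks downward from expectedConcurrency - 1 until it reaches a prime. Below is a small standalone sketch of that calculation. The class and method names are my own, the lower-bound guard is an addition that the original does not have, and I use a certainty of 50 instead of Integer.MAX_VALUE, so treat it as an illustration rather than the Heritrix implementation.

```java
import java.math.BigInteger;

class LockTableSizing {
    /**
     * Returns the largest prime strictly below expectedConcurrency.
     * The guard is an assumption of this sketch: without it, very small
     * inputs would make the loop run past zero.
     */
    static long largestPrimeBelow(int expectedConcurrency) {
        if (expectedConcurrency < 3) {
            throw new IllegalArgumentException("expectedConcurrency must be >= 3");
        }
        long candidate = expectedConcurrency - 1L;
        // isProbablePrime(50) is more than enough certainty for small values
        while (!BigInteger.valueOf(candidate).isProbablePrime(50)) {
            candidate--;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // With the default expectedConcurrency of 64: 63 and 62 are composite, 61 is prime
        System.out.println(largestPrimeBelow(64)); // prints 61
    }
}
```

    So with the default setting, je.lock.nLockTables ends up as "61".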

    Databases are opened with openDatabase(String name, BdbConfig config, boolean usePriorData):

    /**
         * Open a Database inside this BdbModule's environment, and 
         * remember it for automatic close-at-module-stop. 
         * 
         * @param name
         * @param config
         * @param usePriorData
         * @return
         * @throws DatabaseException
         */
        public Database openDatabase(String name, BdbConfig config, boolean usePriorData) 
        throws DatabaseException {
            if (bdbEnvironment == null) {
                // proper initialization hasn't occurred
                throw new IllegalStateException("BdbModule not started");
            }
            if (databases.containsKey(name)) {
                DatabasePlusConfig dpc = databases.get(name);
                if(dpc.config == config) {
                    // object-identical configs: OK to share DB
                    return dpc.database;
                }
                // unshared config object: might be name collision; error
                throw new IllegalStateException("Database already exists: " +name);
            }
            
            DatabasePlusConfig dpc = new DatabasePlusConfig();
            if (!usePriorData) {
                try {
                    bdbEnvironment.truncateDatabase(null, name, false);
                } catch (DatabaseNotFoundException e) {
                    // Ignored
                }
            }
            dpc.database = bdbEnvironment.openDatabase(null, name, config.toDatabaseConfig());
            dpc.config = config;
            databases.put(name, dpc);
            return dpc.database;
        }

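    When called, the method first checks whether the name is already recorded in the Map<String,DatabasePlusConfig> databases member. If it is, the existing database is returned only when the very same config object is passed in; any other config object is treated as a name collision and an IllegalStateException is thrown. Only when there is no entry does the method open the database (truncating it first if usePriorData is false) and record it in the map.

    The following method returns a StoredQueue. The element type of the queue is given by the Class<K> clazz parameter, and the database configuration comes from StoredQueue's own StoredQueue.databaseConfig():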
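
    The share-or-reject rule can be sketched without BDB at all. In the sketch below Config, Handle and HandleRegistrySketch are hypothetical names of mine, and marking open() as synchronized is my own choice for the sketch (the real openDatabase is not declared synchronized).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HandleRegistrySketch {
    /** Hypothetical stand-in for BdbConfig. */
    static class Config { }

    /** Hypothetical stand-in for an open Database. */
    static class Handle {
        final String name;
        Handle(String name) { this.name = name; }
    }

    /** Same role as DatabasePlusConfig: a handle and the config it was opened with. */
    static class Entry {
        final Handle handle;
        final Config config;
        Entry(Handle handle, Config config) {
            this.handle = handle;
            this.config = config;
        }
    }

    private final Map<String, Entry> open = new ConcurrentHashMap<>();

    /** Open by name, sharing an existing handle only for the identical config object. */
    synchronized Handle open(String name, Config config) {
        Entry existing = open.get(name);
        if (existing != null) {
            if (existing.config == config) { // identity comparison, not equals()
                return existing.handle;
            }
            throw new IllegalStateException("Database already exists: " + name);
        }
        Entry created = new Entry(new Handle(name), config);
        open.put(name, created);
        return created.handle;
    }

    public static void main(String[] args) {
        HandleRegistrySketch registry = new HandleRegistrySketch();
        Config shared = new Config();
        Handle first = registry.open("uris", shared);
        System.out.println("shared handle? " + (registry.open("uris", shared) == first));
        try {
            registry.open("uris", new Config());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

    Comparing configs by identity is a cheap way to tell "the same caller asking again" apart from "a second caller that happens to pick the same name".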

     public <K extends Serializable> StoredQueue<K> getStoredQueue(String dbname, Class<K> clazz, boolean usePriorData) {
            try {
                Database queueDb;
                queueDb = openDatabase(dbname,
                        StoredQueue.databaseConfig(), usePriorData);
                return new StoredQueue<K>(queueDb, clazz, getClassCatalog());
            } catch (DatabaseException e) {
                throw new RuntimeException(e);
            }
            
        }

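    When the StoredQueue is instantiated, the StoredClassCatalog passed in is used to create an EntryBinding<E> object (Heritrix also has a KryoBinding<K> type, for example). The binding converts between a serializable class and BDB's data representation; K is a serializable type, <K extends Serializable>.

    A short detour is worthwhile here: let us look at the source of StoredQueue. It extends AbstractQueue<E> and implements queue operations whose elements are stored in a BDB database.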

    /**
     * Queue backed by a JE Collections StoredSortedMap. 
     * 
     * @author gojomo
     *
     * @param <E>
     */
    public class StoredQueue<E extends Serializable> extends AbstractQueue<E>  {
        @SuppressWarnings("unused")
        private static final Logger logger =
            Logger.getLogger(StoredQueue.class.getName());
    
        transient StoredSortedMap<Long,E> queueMap; // Long -> E
        transient Database queueDb; // Database
        AtomicLong tailIndex; // next spot for insert
        transient volatile E peekItem = null;
        
        /**
         * Create a StoredQueue backed by the given Database. 
         * 
         * The Class of values to be queued may be provided; there is only a 
         * benefit when a primitive type is specified. A StoredClassCatalog
         * must be provided if a primitive type is not supplied. 
         * 
         * @param db
         * @param clsOrNull 
         * @param classCatalog
         */
        public StoredQueue(Database db, Class<E> clsOrNull, StoredClassCatalog classCatalog) {
            hookupDatabase(db, clsOrNull, classCatalog);
            tailIndex = new AtomicLong(queueMap.isEmpty() ? 0L : queueMap.lastKey()+1);
        }
    
        /**
         * @param db
         * @param clsOrNull
         * @param classCatalog
         */
        public void hookupDatabase(Database db, Class<E> clsOrNull, StoredClassCatalog classCatalog) {
            EntryBinding<E> valueBinding = TupleBinding.getPrimitiveBinding(clsOrNull);
            if(valueBinding == null) {
                valueBinding = new SerialBinding<E>(classCatalog, clsOrNull);
            }
            queueDb = db;
            queueMap = new StoredSortedMap<Long,E>(
                    db,
                    TupleBinding.getPrimitiveBinding(Long.class),
                    valueBinding,
                    true);
        }
    
        @Override
        public Iterator<E> iterator() {
            return queueMap.values().iterator();
        }
    
        @Override
        public int size() {
            try {
                return Math.max(0, 
                        (int)(tailIndex.get() 
                              - queueMap.firstKey())); 
            } catch (IllegalStateException ise) {
                return 0; 
            } catch (NoSuchElementException nse) {
                return 0;
            } catch (NullPointerException npe) {
                return 0;
            }
        }
        
        @Override
        public boolean isEmpty() {
            if(peekItem!=null) {
                return false;
            }
            try {
                return queueMap.isEmpty();
            } catch (IllegalStateException de) {
                return true;
            }
        }
    
        public boolean offer(E o) {
            long targetIndex = tailIndex.getAndIncrement();
            queueMap.put(targetIndex, o);
            return true;
        }
    
        public synchronized E peek() {
            if(peekItem == null) {
                if(queueMap.isEmpty()) {
                    return null; 
                }
                peekItem = queueMap.remove(queueMap.firstKey());
            }
            return peekItem; 
        }
    
        public synchronized E poll() {
            E head = peek();
            peekItem = null;
            return head; 
        }
    
        /**
         * A suitable DatabaseConfig for the Database backing a StoredQueue. 
         * (However, it is not necessary to use these config options.)
         * 
         * @return DatabaseConfig suitable for queue
         */
        public static BdbModule.BdbConfig databaseConfig() {
            BdbModule.BdbConfig dbConfig = new BdbModule.BdbConfig();
            dbConfig.setTransactional(false);
            dbConfig.setAllowCreate(true);
            return dbConfig;
        }
        
        public void close() {
            try {
                queueDb.sync();
                queueDb.close();
            } catch (DatabaseException e) {
                throw new RuntimeException(e);
            }
        }
    }

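    JE provides the StoredSortedMap<Long,E> class for operating on and managing the data in a BDB database. At this point we can think of a StoredQueue as a queue whose data lives in a BDB database, accessed through a StoredSortedMap wrapper.

    The next part is cache management. It manages caches of objects that implement the IdentityCacheable interface; for example, BdbWorkQueue implements this interface indirectly, which is how the work-queue objects are cached. The ObjectIdentityBdbManualCache is itself also backed by a BDB database.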
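
    The queue-on-a-sorted-map idea is easy to try out in memory. The sketch below swaps the BDB-backed StoredSortedMap for a JDK ConcurrentSkipListMap, so nothing is persisted; MapBackedQueue is a hypothetical name of mine. Unlike the original, size() here also counts an element that has been peeked but not yet polled.

```java
import java.util.AbstractQueue;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

class MapBackedQueue<E> extends AbstractQueue<E> {
    // In-memory stand-in for StoredSortedMap<Long,E>: keys are insertion indexes
    private final ConcurrentSkipListMap<Long, E> map = new ConcurrentSkipListMap<>();
    private final AtomicLong tailIndex = new AtomicLong(0L); // next spot for insert
    private E peekItem; // head already removed from the map but not yet polled

    @Override
    public boolean offer(E e) {
        if (e == null) {
            throw new NullPointerException("null elements are not allowed");
        }
        map.put(tailIndex.getAndIncrement(), e);
        return true;
    }

    @Override
    public synchronized E peek() {
        if (peekItem == null) {
            Map.Entry<Long, E> first = map.pollFirstEntry(); // remove smallest key
            if (first == null) {
                return null;
            }
            peekItem = first.getValue();
        }
        return peekItem;
    }

    @Override
    public synchronized E poll() {
        E head = peek();
        peekItem = null;
        return head;
    }

    @Override
    public synchronized int size() {
        return map.size() + (peekItem != null ? 1 : 0);
    }

    @Override
    public Iterator<E> iterator() {
        // like the original, iteration does not include a peeked element
        return map.values().iterator();
    }

    public static void main(String[] args) {
        MapBackedQueue<String> queue = new MapBackedQueue<>();
        queue.offer("a");
        queue.offer("b");
        System.out.println(queue.poll()); // prints a
        System.out.println(queue.poll()); // prints b
        System.out.println(queue.poll()); // prints null
    }
}
```

    Because the keys only ever grow, the smallest key is always the oldest element, which is what gives the map FIFO behaviour.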

     /**
         * Get an ObjectIdentityBdbCache, backed by a BDB Database of the 
         * given name, with the given value class type. If 'recycle' is true,
         * reuse values already in the database; otherwise start with an 
         * empty cache. 
         *  
         * @param <V>
         * @param dbName
         * @param recycle
         * @param valueClass
         * @return
         * @throws DatabaseException
         */
        public <V extends IdentityCacheable> ObjectIdentityBdbManualCache<V> getOIBCCache(String dbName, boolean recycle,
                Class<? extends V> valueClass) 
        throws DatabaseException {
            if (!recycle) {
                try {
                    bdbEnvironment.truncateDatabase(null, dbName, false);
                } catch (DatabaseNotFoundException e) {
                    // ignored
                }
            }
            ObjectIdentityBdbManualCache<V> oic = new ObjectIdentityBdbManualCache<V>();
            oic.initialize(bdbEnvironment, dbName, valueClass, classCatalog);
            oiCaches.put(dbName, oic);
            return oic;
        }
      
        public <V extends IdentityCacheable> ObjectIdentityCache<V> getObjectCache(String dbName, boolean recycle,
                Class<V> valueClass) 
        throws DatabaseException {
            return getObjectCache(dbName, recycle, valueClass, valueClass);
        }
        
        /**
         * Get an ObjectIdentityCache, backed by a BDB Database of the given 
         * name, with objects of the given valueClass type. If 'recycle' is
         * true, reuse values already in the database; otherwise start with 
         * an empty cache. 
         * 
         * @param <V>
         * @param dbName
         * @param recycle
         * @param valueClass
         * @return
         * @throws DatabaseException
         */
        public <V extends IdentityCacheable> ObjectIdentityCache<V> getObjectCache(String dbName, boolean recycle,
                Class<V> declaredClass, Class<? extends V> valueClass) 
        throws DatabaseException {
            @SuppressWarnings("unchecked")
            ObjectIdentityCache<V> oic = oiCaches.get(dbName);
            if(oic!=null) {
                return oic; 
            }
            oic =  getOIBCCache(dbName, recycle, valueClass);
            return oic; 
        }

    The part after that creates checkpoints and recovers from them:

    public void doCheckpoint(Checkpoint checkpointInProgress) throws IOException {
            // First sync objectCaches
            for (@SuppressWarnings("rawtypes") ObjectIdentityCache oic : oiCaches.values()) {
                oic.sync();
            }
    
            try {
                // sync all databases
                for (DatabasePlusConfig dbc: databases.values()) {
                    dbc.database.sync();
                }
            
                // Do a force checkpoint.  Thats what a sync does (i.e. doSync).
                CheckpointConfig chkptConfig = new CheckpointConfig();
                chkptConfig.setForce(true);
                
                // Mark Hayes of sleepycat says:
                // "The default for this property is false, which gives the current
                // behavior (allow deltas).  If this property is true, deltas are
                // prohibited -- full versions of internal nodes are always logged
                // during the checkpoint. When a full version of an internal node
                // is logged during a checkpoint, recovery does not need to process
                // it at all.  It is only fetched if needed by the application,
                // during normal DB operations after recovery. When a delta of an
                // internal node is logged during a checkpoint, recovery must
                // process it by fetching the full version of the node from earlier
                // in the log, and then applying the delta to it.  This can be
                // pretty slow, since it is potentially a large amount of
                // random I/O."
                // chkptConfig.setMinimizeRecoveryTime(true);
                bdbEnvironment.checkpoint(chkptConfig);
                LOGGER.fine("Finished bdb checkpoint.");
            
                DbBackup dbBackup = new DbBackup(bdbEnvironment);
                try {
                    dbBackup.startBackup();
                    
                    File envCpDir = new File(dir.getFile(),checkpointInProgress.getName());
                    org.archive.util.FileUtils.ensureWriteableDirectory(envCpDir);
                    File logfilesList = new File(envCpDir,"jdbfiles.manifest");
                    String[] filedata = dbBackup.getLogFilesInBackupSet();
                    for (int i=0; i<filedata.length;i++) {
                        File f = new File(dir.getFile(),filedata[i]);
                        filedata[i] += ","+f.length();
                        if(getUseHardLinkCheckpoints()) {
                            File hardLink = new File(envCpDir,filedata[i]);
                            if (!FilesystemLinkMaker.makeHardLink(f.getAbsolutePath(), hardLink.getAbsolutePath())) {
                                LOGGER.log(Level.SEVERE, "unable to create required checkpoint link "+hardLink); 
                            }
                        }
                    }
                    FileUtils.writeLines(logfilesList,Arrays.asList(filedata));
                    LOGGER.fine("Finished processing bdb log files.");
                } finally {
                    dbBackup.endBackup();
                }
            } catch (DatabaseException e) {
                throw new IOException(e);
            }
        }
        
        @SuppressWarnings("unchecked")
        protected void doRecover() throws IOException {
            File cpDir = new File(dir.getFile(),recoveryCheckpoint.getName());
            File logfilesList = new File(cpDir,"jdbfiles.manifest");
            List<String> filesAndLengths = FileUtils.readLines(logfilesList);
            HashMap<String,Long> retainLogfiles = new HashMap<String,Long>();
            for(String line : filesAndLengths) {
                String[] fileAndLength = line.split(",");
                long expectedLength = Long.valueOf(fileAndLength[1]);
                retainLogfiles.put(fileAndLength[0],expectedLength);
                
                // check for files in checkpoint directory; relink to environment as necessary
                File cpFile = new File(cpDir, line);
                File destFile = new File(dir.getFile(), fileAndLength[0]);
                if(cpFile.exists()) {
                    if(cpFile.length()!=expectedLength) {
                        LOGGER.warning(cpFile.getName()+" expected "+expectedLength+" actual "+cpFile.length());
                        // TODO: is truncation necessary? 
                    }
                    if(destFile.exists()) {
                        if(!destFile.delete()) {
                            LOGGER.log(Level.SEVERE, "unable to delete obstructing file "+destFile);  
                        }
                    }
                    int status = CLibrary.INSTANCE.link(cpFile.getAbsolutePath(), destFile.getAbsolutePath());
                    if (status!=0) {
                        LOGGER.log(Level.SEVERE, "unable to create required restore link "+destFile); 
                    }
                }
                
            }
            
            IOFileFilter filter = FileFilterUtils.orFileFilter(
                    FileFilterUtils.suffixFileFilter(".jdb"), 
                    FileFilterUtils.suffixFileFilter(".del"));
            filter = FileFilterUtils.makeFileOnly(filter);
            
            // reverify environment directory is as it was at checkpoint time, 
            // deleting any extra files
            for(File f : dir.getFile().listFiles((FileFilter)filter)) {
                if(retainLogfiles.containsKey(f.getName())) {
                    // named file still exists under original name
                    long expectedLength = retainLogfiles.get(f.getName());
                    if(f.length()!=expectedLength) {
                        LOGGER.warning(f.getName()+" expected "+expectedLength+" actual "+f.length());
                        // TODO: truncate? this unexpected length mismatch
                        // probably only happens if there was already a recovery
                        // where the affected file was the last of the set, in 
                        // which case BDB appends a small amount of (harmless?) data
                        // to the previously-undersized file
                    }
                    retainLogfiles.remove(f.getName()); 
                    continue;
                }
                // file as now-named not in restore set; check if un-".DEL" renaming needed
                String undelName = f.getName().replace(".del", ".jdb");
                if(retainLogfiles.containsKey(undelName)) {
                    // file if renamed matches desired file name
                    long expectedLength = retainLogfiles.get(undelName);
                    if(f.length()!=expectedLength) {
                        LOGGER.warning(f.getName()+" expected "+expectedLength+" actual "+f.length());
                        // TODO: truncate to expected size?
                    }
                    if(!f.renameTo(new File(f.getParentFile(),undelName))) {
                        throw new IOException("Unable to rename " + f + " to " +
                                undelName);
                    }
                    retainLogfiles.remove(undelName); 
                }
                // file not needed; delete/move-aside
                if(!f.delete()) {
                    LOGGER.warning("unable to delete "+f);
                    org.archive.util.FileUtils.moveAsideIfExists(f);
                }
                // TODO: log/warn of ruined later checkpoints? 
            }
            if(retainLogfiles.size()>0) {
                // some needed files weren't present
                LOGGER.severe("Checkpoint corrupt, needed log files missing: "+retainLogfiles);
            }
            
        }
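
    Both methods communicate through the jdbfiles.manifest file, in which each line is a log file name and its length separated by a comma. Here is a minimal sketch of writing and reading that line format, in memory only. ManifestSketch and its method names are mine, and the hard-link and file-system handling of the real code is left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ManifestSketch {
    /** Build manifest lines of the form "name,length", as doCheckpoint does. */
    static List<String> toLines(Map<String, Long> fileLengths) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : fileLengths.entrySet()) {
            lines.add(e.getKey() + "," + e.getValue());
        }
        return lines;
    }

    /** Parse manifest lines back into name -> expected length, as doRecover does. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fileAndLength = line.split(",");
            result.put(fileAndLength[0], Long.parseLong(fileAndLength[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("00000000.jdb", 1024L); // example names and sizes, made up
        files.put("00000001.jdb", 2048L);
        List<String> lines = toLines(files);
        System.out.println(lines);
        System.out.println(parse(lines).equals(files));
    }
}
```

    During recovery the expected lengths are only compared against the files on disk and mismatches are logged; as the TODO comments in doRecover() show, nothing is truncated.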

    Finally there is the getStoredMap(String dbName, Class<K> keyClass, Class<V> valueClass, boolean allowDuplicates, boolean usePriorData) method. It creates a temporary DisposableStoredSortedMap<K,V> object (which extends JE's StoredSortedMap; think of it as a temporary map stored in a BDB database behind the StoredSortedMap wrapper). The Class<K> keyClass and Class<V> valueClass parameters give the key and value types.

    /**
         * Creates a database-backed TempStoredSortedMap for transient 
         * reporting requirements. Calling the returned map's destroy()
         * method when done discards the associated Database. 
         * 
         * @param <K>
         * @param <V>
         * @param dbName Database name to use; if null a name will be synthesized
         * @param keyClass Class of keys; should be a Java primitive type
         * @param valueClass Class of values; may be any serializable type
         * @param allowDuplicates whether duplicate keys allowed
         * @return
         */
        public <K,V> DisposableStoredSortedMap<K, V> getStoredMap(String dbName, Class<K> keyClass, Class<V> valueClass, boolean allowDuplicates, boolean usePriorData) {
            BdbConfig config = new BdbConfig(); 
            config.setSortedDuplicates(allowDuplicates);
            config.setAllowCreate(!usePriorData); 
            Database mapDb;
            if(dbName==null) {
                dbName = "tempMap-"+System.identityHashCode(this)+"-"+sn;
                sn++;
            }
            final String openName = dbName; 
            try {
                mapDb = openDatabase(openName,config,usePriorData);
            } catch (DatabaseException e) {
                throw new RuntimeException(e); 
            } 
            EntryBinding<V> valueBinding = TupleBinding.getPrimitiveBinding(valueClass);
            if(valueBinding == null) {
                valueBinding = new SerialBinding<V>(classCatalog, valueClass);
            }
            DisposableStoredSortedMap<K,V> storedMap = new DisposableStoredSortedMap<K, V>(
                    mapDb,
                    TupleBinding.getPrimitiveBinding(keyClass),
                    valueBinding,
                    true) {
                        @Override
                        public void dispose() {
                            super.dispose();
                            DatabasePlusConfig dpc = BdbModule.this.databases.remove(openName);
                            if (dpc == null) {
                                BdbModule.LOGGER.log(Level.WARNING,"No such database: " + openName);
                            }
                        }
            };
            return storedMap; 
        }
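
    The interesting detail in getStoredMap() is the anonymous subclass: it captures the final openName so that dispose() can also remove the entry from the module's databases map. A JDK-only sketch of the same pattern follows. DisposableMapSketch and DisposableMap are hypothetical names, and a TreeMap replaces the BDB-backed map.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

class DisposableMapSketch {
    /** Hypothetical stand-in for DisposableStoredSortedMap. */
    static class DisposableMap<K, V> {
        final TreeMap<K, V> data = new TreeMap<>();
        void dispose() {
            data.clear(); // the real class discards the backing Database
        }
    }

    private final Map<String, DisposableMap<?, ?>> registry = new ConcurrentHashMap<>();
    private int sn = 0; // serial number for synthesized names

    /** Create and register a temporary map; a null name gets a synthesized one. */
    synchronized <K, V> DisposableMap<K, V> open(String nameOrNull) {
        final String openName = (nameOrNull != null)
                ? nameOrNull
                : "tempMap-" + System.identityHashCode(this) + "-" + (sn++);
        DisposableMap<K, V> map = new DisposableMap<K, V>() {
            @Override
            void dispose() {
                super.dispose();
                registry.remove(openName); // deregister under the captured name
            }
        };
        registry.put(openName, map);
        return map;
    }

    boolean isRegistered(String name) {
        return registry.containsKey(name);
    }

    public static void main(String[] args) {
        DisposableMapSketch module = new DisposableMapSketch();
        DisposableMap<String, Integer> report = module.open("report");
        report.data.put("fetched", 3);
        System.out.println(module.isRegistered("report"));
        report.dispose();
        System.out.println(module.isRegistered("report"));
    }
}
```

    The caller only needs to remember to call dispose(); the bookkeeping in the owning module takes care of itself.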
        

    This analysis still leaves many questions open; they will be taken up in later posts.

    ---------------------------------------------------------------------------

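    This Heritrix 3.1.0 source code analysis series is my own original work.

    If you repost it, please credit the source: 博客园 (cnblogs), 刺猬的温驯

    Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/14/3019757.html 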

  • Original article: https://www.cnblogs.com/chenying99/p/3019757.html