Exporting cnblogs (博客园) content and generating Markdown files

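cnblogs provides a backup feature.

Backups can only be run during these hours:

on weekdays after 18:00 or before 08:00, or at any time on Saturday and Sunday.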

Click Backup and choose a time range. The export is an XML file; a sample looks like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
    <channel>
        <title>cnblogs-xxx</title>
        <link>https://www.cnblogs.com/xxx</link>
        <description>xxxx</description>
        <language>zh-cn</language>
        <lastBuildDate>Mon, 28 Jun 2021 12:48:06 GMT</lastBuildDate>
        <pubDate>Mon, 28 Jun 2021 12:48:06 GMT</pubDate>
        <ttl>60</ttl>
        <item>
            <title>Post 3</title>
            <link>http://www.baidu.com</link>
            <dc:creator>Author name</dc:creator>
            <author>Author name</author>
            <pubDate>Mon, 21 Jun 2021 14:01:00 GMT</pubDate>
            <guid>http://www.baidu.com</guid>
            <description><![CDATA[The post body goes here]]></description>
        </item>
        <item>
            <title>Post 1</title>
            <link>http://www.baidu.com</link>
            <dc:creator>Author name</dc:creator>
            <author>Author name</author>
            <pubDate>Mon, 21 Jun 2021 14:01:00 GMT</pubDate>
            <guid>http://www.baidu.com</guid>
            <description><![CDATA[The post body goes here]]></description>
        </item>
        <item>
            <title>Post 2</title>
            <link>http://www.baidu.com</link>
            <dc:creator>Author name</dc:creator>
            <author>Author name</author>
            <pubDate>Mon, 21 Jun 2021 14:01:00 GMT</pubDate>
            <guid>http://www.baidu.com</guid>
            <description><![CDATA[The post body goes here]]></description>
        </item>
    </channel>
</rss>
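
To make the structure concrete, here is a minimal sketch that reads the title and body of every item from a file shaped like the sample. It uses the JDK's built-in DOM parser instead of dom4j so that it runs without extra dependencies; the class name RssItems and the inline XML string are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class RssItems {
    /** Returns "title: body" for every <item> element in the given RSS text. */
    static List<String> titlesAndBodies(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse the backup XML", e);
        }
        List<String> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // Searching inside the item skips the channel-level <title> and <description>
            String title = item.getElementsByTagName("title").item(0).getTextContent().trim();
            // getTextContent() returns the text wrapped in the CDATA section
            String body = item.getElementsByTagName("description").item(0).getTextContent();
            result.add(title + ": " + body);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<rss version=\"2.0\"><channel><title>cnblogs-xxx</title>"
                + "<item><title>Post 1</title>"
                + "<description><![CDATA[body one]]></description></item>"
                + "<item><title>Post 2</title>"
                + "<description><![CDATA[body two]]></description></item>"
                + "</channel></rss>";
        titlesAndBodies(xml).forEach(System.out::println);
    }
}
```

Each item carries everything needed for one Markdown file: title supplies the file name, pubDate the date, and description the body.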

Parsing the XML yields the title and content of each post, from which the Markdown files are generated. The code is as follows:

import cn.hutool.core.date.DateUtil;
import cn.hutool.core.io.FileUtil;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Iterator;

/**
 * Converts a cnblogs XML backup into Markdown files.
 */
public class App {
    // Directory the Markdown files are written to
    static final String path = "C:\\Users\\zhuiz\\OneDrive\\blogs";

    public static void main(String[] args) throws DocumentException, IOException {
        SAXReader reader = new SAXReader();
        // sample.xml is loaded from the classpath (the resources folder)
        URL url = App.class.getClassLoader().getResource("sample.xml");
        Document document = reader.read(url);
        Element root = document.getRootElement();
        // <channel> is the only child of <rss> and holds all <item> elements
        Element channel = root.element("channel");
        for (Iterator<Element> it = channel.elementIterator("item"); it.hasNext(); ) {
            Element item = it.next();
            String title = item.element("title").getTextTrim();
            String link = item.element("link").getTextTrim();
            // Front matter (title and date) followed by a no-referrer meta tag
            String prefix = "---\n" +
                    "title: '" + title + "'\n" +
                    "date: " + format(item.element("pubDate").getTextTrim()) + "\n" +
                    "---\n\n" +
                    "<meta name=\"referrer\" content=\"no-referrer\" />\n\n";
            // Each file holds the original link, the prefix, then the post body
            inputFile(title(title), link + "\n\n" + prefix + item.element("description").getText());
        }
    }

    // Remove characters that are illegal in Windows file names: \ / : * ? " < > |
    public static String title(String invalidTitle) {
        return invalidTitle.replaceAll("[\\\\/:*?\"<>|]", "") + ".md";
    }

    // Create the file if needed and append the content as UTF-8
    public static void inputFile(String title, String content) throws IOException {
        File file = new File(path, title);
        file.createNewFile();
        FileUtil.appendString(content, file, "UTF-8");
    }

    // Convert an RSS pubDate such as "Mon, 21 Jun 2021 14:01:00 GMT" to "yyyy-MM-dd HH:mm:ss"
    static String format(String stringDate) {
        // Replaces the deprecated new Date(String) constructor
        ZonedDateTime parsed = ZonedDateTime.parse(stringDate, DateTimeFormatter.RFC_1123_DATE_TIME);
        return DateUtil.format(Date.from(parsed.toInstant()), "yyyy-MM-dd HH:mm:ss");
    }
}
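
The two helper steps, cleaning the title into a file name and converting pubDate, can be tried in isolation. The following is a minimal sketch using only the JDK; the class name PostNames and its method names are hypothetical, and unlike App, which formats the date in the JVM's default time zone, this sketch keeps the date in UTC so that the result does not depend on the machine it runs on.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class PostNames {
    private static final DateTimeFormatter FRONT_MATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Drops the characters Windows forbids in file names and appends ".md". */
    static String fileName(String title) {
        return title.replaceAll("[\\\\/:*?\"<>|]", "") + ".md";
    }

    /** Converts an RSS pubDate (RFC 1123) to "yyyy-MM-dd HH:mm:ss", kept in UTC. */
    static String frontMatterDate(String pubDate) {
        ZonedDateTime parsed = ZonedDateTime.parse(pubDate, DateTimeFormatter.RFC_1123_DATE_TIME);
        return parsed.withZoneSameInstant(ZoneOffset.UTC).format(FRONT_MATTER);
    }

    public static void main(String[] args) {
        System.out.println(fileName("What is C:\\temp? A/B test"));
        System.out.println(frontMatterDate("Mon, 21 Jun 2021 14:01:00 GMT"));
    }
}
```

Removing characters rather than replacing them keeps names short, but two different titles can collapse to the same file name, which ties into the duplicate-title item in the to-do list.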

Usage:

Set path to the directory where the generated Markdown files should be written.
Rename the exported XML file to sample.xml and put it in the resources folder.
Run App.java.

Source code:

xml-handler

To do:

Handle posts with duplicate titles.
Refactor the code.
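
One possible way to handle duplicate titles is to remember every file name already handed out and add a numeric suffix on a clash. This is only a sketch of that idea, not part of the project; the class UniqueNames and the "-2", "-3" suffix scheme are assumptions made for the example. Names are compared case-insensitively because Windows file names are.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UniqueNames {
    // Lower-cased file names that have already been handed out
    private final Set<String> used = new HashSet<>();

    /** Returns baseName.md, or baseName-2.md, baseName-3.md, ... if that name is taken. */
    String next(String baseName) {
        String candidate = baseName + ".md";
        // Set.add returns false when the name is already present, so keep trying suffixes
        for (int n = 2; !used.add(candidate.toLowerCase(Locale.ROOT)); n++) {
            candidate = baseName + "-" + n + ".md";
        }
        return candidate;
    }

    public static void main(String[] args) {
        UniqueNames names = new UniqueNames();
        System.out.println(names.next("Post 1"));
        System.out.println(names.next("Post 1"));
        System.out.println(names.next("Post 2"));
    }
}
```

In App, the sanitized title (without the ".md" extension) would be passed through such a helper before the file is created, so that a second post with the same title is written to its own file instead of being appended to the first.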

Original post: https://www.cnblogs.com/greyzeng/p/14949545.html