Things to watch out for when developing in Python

1. Watch for the places where a dynamic language behaves differently from Java, as in the code below

2. When inserting rows with SQLAlchemy, I hit a problem: inserts into one table worked, but inserts into another table did not

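    # Insert into or update crawl_items
    def insert_or_update(self, cls, table_item, **kwargs):
        self.logger.info("kwargs = " + str(kwargs))

        # Look for an existing row that matches the given filter columns
        exist = self.session.query(cls).filter_by(**kwargs).first()
        self.logger.info("exist = " + str(exist))
        if not exist:
            self.logger.info("inserting new ")
            self.logger.info("table_item = " + str(table_item))
            self.session.add(table_item)

        else:
            # Copy every attribute of table_item onto the existing row,
            # skipping SQLAlchemy's internal state attribute
            for key in table_item.__dict__:
                if key == '_sa_instance_state':
                    continue
                if hasattr(exist, key):
                    setattr(exist, key, getattr(table_item, key))
            self.logger.info("updating exist")

        try:
            self.session.commit()
        except Exception:
            # Log the failure instead of swallowing it silently, then roll back
            self.logger.exception("commit failed, rolling back")
            self.session.rollback()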

The cause is unknown; it may be a bug in the framework itself. The workaround is to add the following right after the add() call:

self.session.flush()
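
When a commit fails inside a try/except that only rolls back, nothing reaches the log and the row simply seems to vanish, which makes a problem like this hard to diagnose. Below is a minimal sketch of logging the failure before reporting it. `FailingSession` and `commit_or_rollback` are hypothetical names invented for this illustration; `FailingSession` is a stand-in object, not a real SQLAlchemy session, and the simulated error is made up.

```python
import logging

logger = logging.getLogger("db")


class FailingSession:
    """Hypothetical stand-in for a session whose commit() always fails."""

    def __init__(self):
        self.rolled_back = False

    def commit(self):
        raise RuntimeError("simulated database error")

    def rollback(self):
        self.rolled_back = True


def commit_or_rollback(session):
    """Commit; on failure roll back, log the traceback, and return False."""
    try:
        session.commit()
        return True
    except Exception:
        session.rollback()
        # logger.exception records the message together with the traceback
        logger.exception("commit failed, rolled back")
        return False


session = FailingSession()
ok = commit_or_rollback(session)
print(ok, session.rolled_back)
```

With the traceback in the log, it should at least be possible to see whether the second table's insert was rejected by the database or never attempted.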

3. Characters that show nothing when printed with print or logging, yet are not whitespace

            logger.info("len = " + str(len(title.strip())))
            logger.info("title = " + title.strip())

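While writing a crawler, I scraped a string for which `if title.strip()` was true and `len(title.strip())` was 4, yet the log showed only 'title = ' with no visible characters after it. When I placed the cursor after 'title = ' and pressed the left arrow key, it really did take four presses, although the cursor did not visibly move. I then converted the string to bytes: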

b2 = bytes(title.strip(), encoding='utf8')

logger.info("b2 = " + str(b2))

The output was:

b2 = b'\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b\xe2\x80\x8b'

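So the characters really are there: four copies of the same three-byte sequence. strip() alone is therefore not enough; these invisible characters have to be removed as well.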
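
The checks above can be collected into one small diagnostic. This is a sketch only: `describe_chars` is a helper name made up here, and the sample `title` is an assumed value that reproduces the scraped string rather than the real crawled data.

```python
import unicodedata


def describe_chars(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


title = "\u200b" * 4  # assumed sample reproducing the scraped string

print(len(title.strip()))     # strip() leaves all four characters in place
print(ascii(title))           # ascii() escapes every non-ASCII character
print(title.encode("utf-8"))  # the raw UTF-8 bytes
print(describe_chars(title))  # code point and name per character
```

Logging `ascii(title)` or `repr(title)` instead of the bare string is usually enough to make such characters show up in the log.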

After some searching, I found that Unicode has several zero-width characters, such as Zero-width space (U+200B), Zero-width non-joiner (U+200C), Zero-width joiner (U+200D), Word joiner (U+2060), and Zero-width no-break space (U+FEFF).
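
To see why strip() ignores these characters, it helps to look at how Python classifies them. The sketch below uses the standard `unicodedata` module; `char_report` and `ZERO_WIDTH_CHARS` are names invented for this example.

```python
import unicodedata

# The five zero-width characters listed above
ZERO_WIDTH_CHARS = ["\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"]


def char_report(ch):
    """Summarize one character: code point, name, category, isspace()."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "isspace": ch.isspace(),
    }


for ch in ZERO_WIDTH_CHARS:
    print(char_report(ch))
```

All five fall in the general category Cf (format characters) and `isspace()` is false for each of them, so whitespace stripping leaves them untouched.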

I then converted title to a list and printed the list:

filter_result = [f for f in title]
print("filter_result = " + str(filter_result))

The result (the long runs of plain spaces are abbreviated with ... here):

filter_result = ['\n', ' ', ' ', ' ', ...]
filter_result = ['【', '5', '0', '款', '排', '骨', '做', '法', '】', '撑', '爆', '整', '个', '夏', '天', '的', '无', '肉', '不', '欢', '，', '可', '以', '收', '藏', '了', '，', '一', '道', '一', '道', '的', '学', '着', '做', '！']
filter_result = [' ', '\u200b', '\u200b', '\u200b', '\u200b', ' ', ' ', ' ', ...]

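As the last list shows, some of the characters in title are the Unicode zero-width space (U+200B). This character occupies no visible width, although len() still counts it.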

For now I simply strip out \u200b and leave it at that:

title = title.replace('\u200b', '')
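
If the other zero-width characters listed earlier turn up as well, the same idea extends to all of them at once. This is a sketch under that assumption, not a general-purpose sanitizer: `clean_title` and `ZERO_WIDTH_CHARS` are hypothetical names, and removing U+200C or U+200D can change how some scripts and emoji sequences render, so the character set may need trimming for other data.

```python
# The five zero-width characters mentioned in this note
ZERO_WIDTH_CHARS = "\u200b\u200c\u200d\u2060\ufeff"

# Translation table that deletes every character in ZERO_WIDTH_CHARS
_REMOVE_TABLE = str.maketrans("", "", ZERO_WIDTH_CHARS)


def clean_title(title):
    """Delete zero-width characters, then strip ordinary whitespace."""
    return title.translate(_REMOVE_TABLE).strip()


print(repr(clean_title(" \u200b\u200b\u200b\u200b   ")))
print(repr(clean_title("\ufeff hello\u200b world \u2060")))
```

Deleting the zero-width characters before calling strip() matters: it lets strip() reach whitespace that was previously shielded by an invisible character at either end.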

A programmer who loves art

Original post: https://www.cnblogs.com/zjhgx/p/13123553.html