sklearn bag-of-words: CountVectorizer

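from sklearn.feature_extraction.text import CountVectorizer

texts = ["dog cat fish", "dog cat cat", "fish bird", "bird"]
cv = CountVectorizer()
# Learn the vocabulary and build the sparse document-term count matrix.
cv_fit = cv.fit_transform(texts)

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
print(cv.get_feature_names_out())
print(cv_fit.toarray())
# ['bird' 'cat' 'dog' 'fish']
# [[0 1 1 1]
#  [0 2 1 0]
#  [1 0 0 1]
#  [1 0 0 0]]

# Column sums: total occurrences of each word across all documents.
print(cv_fit.toarray().sum(axis=0))
# [2 3 2 2]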
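
The column order above comes from the fitted vocabulary. As a minimal sketch, the fitted vectorizer can also be applied to unseen text. The sentence "cat horse cat bird" is made up for illustration and is not part of the original post. Words that were not in the training texts ("horse" here) are simply dropped.

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ["dog cat fish", "dog cat cat", "fish bird", "bird"]
cv = CountVectorizer()
cv.fit(texts)

# vocabulary_ maps each token to its column index in the count matrix.
vocab = {token: int(idx) for token, idx in cv.vocabulary_.items()}
print(sorted(vocab.items()))
# [('bird', 0), ('cat', 1), ('dog', 2), ('fish', 3)]

# transform() reuses the fitted vocabulary; the unseen word "horse" is ignored.
new_counts = cv.transform(["cat horse cat bird"]).toarray()
print(new_counts)
# [[1 2 0 0]]
```

Calling fit once and transform afterwards keeps the columns consistent between training data and new data, which is usually what a downstream classifier needs.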

Original post: https://www.cnblogs.com/bonelee/p/7808700.html
