• Python: csr_python (saving/loading a scipy sparse csr_matrix)


    [Reposted from]: python 的csr_python - Saving/loading a scipy sparse csr_matrix in a portable data format, weixin_39974223's blog, CSDN

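    Below is a performance comparison of the three most upvoted answers, run in a Jupyter notebook. The input is a 1M x 100K random sparse matrix with density 0.001, containing 100M nonzero values: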

    from scipy.sparse import random

    matrix = random(1000000, 100000, density=0.001, format='csr')

    matrix

    <1000000x100000 sparse matrix of type '<type 'numpy.float64'>'

    with 100000000 stored elements in Compressed Sparse Row format>
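The stored-element count follows from the shape and the density: 1,000,000 x 100,000 x 0.001 = 100,000,000. As a rough sketch, the same relationship can be checked on a scaled-down matrix; the smaller shape and the `random_state` seed are assumptions chosen here so the example runs quickly, and the import alias `sparse_random` is only used to avoid shadowing the standard `random` module.

```python
from scipy.sparse import random as sparse_random

# Scaled-down stand-in for the 1M x 100K benchmark matrix (illustrative sizes).
rows, cols, density = 1000, 100, 0.001
small = sparse_random(rows, cols, density=density, format='csr', random_state=0)

# The number of stored entries is the rounded product density * rows * cols.
expected_nnz = int(round(density * rows * cols))
print(small.shape, small.format, small.nnz, expected_nnz)
```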

    io.mmwrite / io.mmread

    from scipy import io

    %time io.mmwrite('test_io.mtx', matrix)

    CPU times: user 4min 37s, sys: 2.37 s, total: 4min 39s

    Wall time: 4min 39s

    %time matrix = io.mmread('test_io.mtx')

    CPU times: user 2min 41s, sys: 1.63 s, total: 2min 43s

    Wall time: 2min 43s

    matrix

    <1000000x100000 sparse matrix of type '<type 'numpy.float64'>'

    with 100000000 stored elements in COOrdinate format>

    Filesize: 3.0G.

    (Note that the format has changed from csr to coo.)
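If the Matrix Market route is used anyway, the CSR format can be recovered with an explicit conversion after loading. A minimal sketch on a small matrix, assuming a temporary directory and an illustrative file name `small.mtx` (neither is from the original benchmark):

```python
import os
import tempfile

import numpy as np
from scipy import io
from scipy.sparse import random as sparse_random

# Small illustrative matrix; the benchmark above used a far larger one.
original = sparse_random(200, 50, density=0.05, format='csr', random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'small.mtx')
    io.mmwrite(path, original)
    loaded = io.mmread(path)  # sparse input comes back in COO format

restored = loaded.tocsr()  # convert back to CSR explicitly
print(loaded.format, restored.format)
```

The conversion adds its own cost on top of the load time measured above, so it does not change the overall ranking.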

    np.savez / np.load

    import numpy as np

    from scipy.sparse import csr_matrix

    def save_sparse_csr(filename, array):
        # note that .npz extension is added automatically
        np.savez(filename, data=array.data, indices=array.indices,
                 indptr=array.indptr, shape=array.shape)

    def load_sparse_csr(filename):
        # here we need to add .npz extension manually
        loader = np.load(filename + '.npz')
        return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                          shape=loader['shape'])

    %time save_sparse_csr('test_savez', matrix)

    CPU times: user 1.26 s, sys: 1.48 s, total: 2.74 s

    Wall time: 2.74 s

    %time matrix = load_sparse_csr('test_savez')

    CPU times: user 1.18 s, sys: 548 ms, total: 1.73 s

    Wall time: 1.73 s

    matrix

    <1000000x100000 sparse matrix of type '<type 'numpy.float64'>'

    with 100000000 stored elements in Compressed Sparse Row format>

    Filesize: 1.1G.
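Recent SciPy releases (0.19 or later, as far as I know) ship `scipy.sparse.save_npz` and `scipy.sparse.load_npz`, which store the same component arrays in an `.npz` file. The sketch below is an alternative to the hand-written helpers above, not the method that was benchmarked; `save_npz` compresses by default, so its timings and file sizes would likely differ from the numbers reported here. The file name and matrix size are illustrative.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import load_npz, save_npz, random as sparse_random

# Small illustrative matrix, not the benchmark input.
original = sparse_random(200, 50, density=0.05, format='csr', random_state=0)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'small.npz')
    save_npz(path, original)   # compressed=True is the default
    restored = load_npz(path)  # the sparse format is stored in the file

print(restored.format, restored.shape, restored.nnz)
```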

    cPickle

    # cPickle is the Python 2 module; on Python 3 use "import pickle" instead
    import cPickle as pickle

    def save_pickle(matrix, filename):
        with open(filename, 'wb') as outfile:
            pickle.dump(matrix, outfile, pickle.HIGHEST_PROTOCOL)

    def load_pickle(filename):
        with open(filename, 'rb') as infile:
            matrix = pickle.load(infile)
        return matrix

    %time save_pickle(matrix, 'test_pickle.mtx')

    CPU times: user 260 ms, sys: 888 ms, total: 1.15 s

    Wall time: 1.15 s

    %time matrix = load_pickle('test_pickle.mtx')

    CPU times: user 376 ms, sys: 988 ms, total: 1.36 s

    Wall time: 1.37 s

    matrix

    <1000000x100000 sparse matrix of type '<type 'numpy.float64'>'

    with 100000000 stored elements in Compressed Sparse Row format>

    Filesize: 1.1G.

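    Note: cPickle does not work with very large objects (see the answer linked in the original post). In my experience, it did not work for a 2.7M x 50k matrix with 270M nonzero values. The np.savez solution worked well.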
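The limitation above was observed with Python 2's cPickle. On Python 3, the standard `pickle` module offers protocol 4 (added in Python 3.4), which is documented as supporting very large objects. The sketch below only shows the round trip on a small matrix; it has not been re-run at the 270M-nonzero scale mentioned above, so whether it handles that case in practice is an assumption to verify.

```python
import pickle

import numpy as np
from scipy.sparse import random as sparse_random

# Small illustrative matrix; the failing case above was far larger.
original = sparse_random(200, 50, density=0.05, format='csr', random_state=0)

# Protocol 4 is requested explicitly instead of relying on the default.
payload = pickle.dumps(original, protocol=4)
restored = pickle.loads(payload)

print(restored.format, restored.shape, restored.nnz)
```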

    Conclusion

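    (Based on this simple test with CSR matrices) cPickle is the fastest method, but it does not work with very large matrices; np.savez is only slightly slower, while io.mmwrite is much slower, produces a larger file, and restores to the wrong format. So np.savez is the winner.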
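To repeat the comparison on other hardware or library versions, a small harness along the following lines could be used. It is a sketch under several assumptions: it targets Python 3 (so `pickle` stands in for cPickle), it uses a much smaller matrix than the benchmark, and the helper names `timed`, `save_savez`, `load_savez`, `save_pickle`, and `load_pickle` are hypothetical. Absolute numbers will differ from those reported above.

```python
import os
import pickle
import tempfile
import time

import numpy as np
from scipy import io
from scipy.sparse import csr_matrix, random as sparse_random


def timed(func, *args):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def save_savez(path, m):
    # Store the three CSR component arrays plus the shape.
    np.savez(path, data=m.data, indices=m.indices, indptr=m.indptr, shape=m.shape)


def load_savez(path):
    with np.load(path) as z:
        return csr_matrix((z['data'], z['indices'], z['indptr']),
                          shape=tuple(z['shape']))


def save_pickle(path, m):
    with open(path, 'wb') as outfile:
        pickle.dump(m, outfile, pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    with open(path, 'rb') as infile:
        return pickle.load(infile)


# Illustrative size only; increase it to approach the original benchmark.
matrix = sparse_random(2000, 500, density=0.01, format='csr', random_state=0)

methods = {
    'io.mmwrite': ('m.mtx', io.mmwrite, io.mmread),
    'np.savez': ('m.npz', save_savez, load_savez),
    'pickle': ('m.pkl', save_pickle, load_pickle),
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, (fname, save, load) in methods.items():
        path = os.path.join(tmp, fname)
        _, t_save = timed(save, path, matrix)
        loaded, t_load = timed(load, path)
        results[name] = (t_save, t_load, os.path.getsize(path), loaded.format)

for name, (t_save, t_load, size, fmt) in results.items():
    print(f'{name:12s} save={t_save:.4f}s load={t_load:.4f}s '
          f'size={size}B format={fmt}')
```

A single timed call per method is noisy; averaging several runs would give steadier figures.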
    ————————————————
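    Copyright notice: This is an original article by CSDN blogger "weixin_39974223", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
    Original link: https://blog.csdn.net/weixin_39974223/article/details/111766769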

  • Original address: https://www.cnblogs.com/huixinquan/p/15221996.html