• pandas_series04


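    1. How to compute the Euclidean distance between two series
          import numpy as np
          import pandas as pd

          p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
          q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
          
          # Method 1: square root of the sum of squared differences
          sum((p - q)**2)**.5
          
          # Method 2: L2 norm of the difference vector
          np.linalg.norm(p-q)
         
          #>    18.16590212458495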
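Both methods above subtract the series first, and pandas aligns that subtraction by index label. The sketch below is one possible way to make the comparison purely positional; the helper name `euclidean_distance` is hypothetical and not part of the original post.

```python
import numpy as np
import pandas as pd

def euclidean_distance(a: pd.Series, b: pd.Series) -> float:
    """Euclidean distance between two equally long Series, compared by position."""
    if len(a) != len(b):
        raise ValueError("both Series must have the same length")
    # Work on the raw values so that differing index labels cannot introduce NaN
    diff = a.to_numpy(dtype=float) - b.to_numpy(dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

p = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
q = pd.Series([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
dist = euclidean_distance(p, q)
print(dist)
```

For these two series the squared differences sum to 330, so the result is the square root of 330, matching the value shown above.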
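    2. How to find the local maxima in a numeric series
      A local maximum is a point where the slope switches from rising to falling, i.e. where the difference of the signs of the first differences equals -2.
          ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])
          
          # Second difference: the change in the sign of the first difference
          dd = np.diff(np.sign(np.diff(ser)))
          # A value of -2 marks a peak; adding 1 turns its position into the index of the peak in ser
          peak_locs = np.where(dd == -2)[0] + 1
          peak_locs
          
          #>    array([1, 5, 7], dtype=int64)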
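To make the sign trick easier to follow, here is a minimal sketch that keeps each intermediate array under its own name. The variable names are illustrative assumptions, not taken from the original post.

```python
import numpy as np
import pandas as pd

ser = pd.Series([2, 10, 3, 4, 9, 10, 2, 7, 3])

first_diff = np.diff(ser.to_numpy())   # change between neighbouring values
slope_sign = np.sign(first_diff)       # +1 where rising, -1 where falling
sign_change = np.diff(slope_sign)      # -2 where rising turns into falling

# Position k in sign_change describes the element at position k + 1 in ser
peak_locs = np.where(sign_change == -2)[0] + 1
peak_values = ser.iloc[peak_locs].tolist()

print(slope_sign.tolist())    # [1, -1, 1, 1, 1, -1, 1, -1]
print(peak_locs.tolist())     # [1, 5, 7]
print(peak_values)            # [10, 10, 7]
```

Note that this test only fires on a strict rise followed by a strict fall, so the first and last elements and flat plateaus (where the sign is 0) are never reported as peaks.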
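    3. How to replace spaces with the least frequent character
      my_str = 'dbc deb abed gade'
      
      # Solution
      ser = pd.Series(list(my_str))
      # Count how often each character occurs
      freq = ser.value_counts()
      print(freq)
      # Take the last entry, i.e. a character with the lowest count
      # (c and g are tied here; which one comes last depends on how value_counts orders ties)
      least_freq = freq.dropna().index[-1]
      # Replace the spaces
      "".join(ser.replace(' ', least_freq))
      
      #>    d    4
               3
          b    3
          e    3
          a    2
          c    1
          g    1
          dtype: int64
      
      #>    'dbcgdebgabedggade'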
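Because `c` and `g` are tied in the example above, the result depends on the order of the frequency table. The sketch below is one possible variant with an explicit tie-break; choosing the alphabetically first character is an assumption made here for determinism, so it yields `c` rather than the `g` shown above.

```python
import pandas as pd

my_str = 'dbc deb abed gade'

chars = pd.Series(list(my_str))
# Count only the non-space characters
freq = chars[chars != ' '].value_counts()
# Among the characters sharing the minimum count, take the alphabetically first one
least_freq = freq[freq == freq.min()].index.sort_values()[0]
result = my_str.replace(' ', least_freq)

print(least_freq)   # c
print(result)       # dbccdebcabedcgade
```

Excluding the space before counting also guards against the case where the space itself is the rarest character.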

      How to compute the autocorrelation of a numeric series

      ser = pd.Series(np.arange(20) + np.random.normal(1, 10, 20))
      
      # Autocorrelation of the series for each lag i
      autocorrelations = [ser.autocorr(i).round(2) for i in range(11)]
      print(autocorrelations[1:])
      # Lag (excluding lag 0) with the largest absolute autocorrelation
      print('Lag having highest correlation: ', np.argmax(np.abs(autocorrelations[1:]))+1)
      
      # Sample output (the data is random, so the values differ from run to run)
      #>    [0.33, 0.41, 0.48, 0.01, 0.21, 0.16, -0.11, 0.05, 0.34, -0.24]
      #>    Lag having highest correlation:  3
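As a sanity check on what `autocorr` measures, the sketch below compares it with the Pearson correlation between the series and a copy of itself shifted by the same lag. The fixed seed and the choice of lag 3 are assumptions made here for reproducibility and are not part of the original post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, assumed here for reproducibility
ser = pd.Series(np.arange(20) + rng.normal(1, 10, 20))

lag = 3
builtin = ser.autocorr(lag)

# Same quantity via a shifted copy; corr ignores the NaN pairs created by the shift
manual = ser.corr(ser.shift(lag))

# Same quantity again on plain arrays: values from position lag onward vs. values lag steps earlier
values = ser.to_numpy()
by_hand = np.corrcoef(values[lag:], values[:-lag])[0, 1]

print(builtin, manual, by_hand)
```

All three numbers should agree up to floating-point error. At lag 0 the series is compared with itself, so the autocorrelation is 1, which is why the example above skips the first entry.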
    4. How to perform arithmetic operations on series
      # Arithmetic between two series
      import pandas as pd
      series1 = pd.Series([3,4,4,4],['index1','index2','index3','index4'])
      series2 = pd.Series([2,2,2,2],['index1','index2','index33','index44'])
      # Addition
      series_add = series1 + series2
      print(series_add)
      # Subtraction
      series_minus = series1 - series2
      # series_minus
      # Multiplication
      series_multi = series1 * series2
      # series_multi
      # Division
      series_div = series1/series2
      series_div
      Arithmetic on series is index-based: pandas aligns the two series by index label before operating, and any label that appears in only one of them yields NaN in the result. The results are:
      # Addition:
      index1     5.0
      index2     6.0
      index3     NaN
      index33    NaN
      index4     NaN
      index44    NaN
      dtype: float64
      # Division:
      index1     1.5
      index2     2.0
      index3     NaN
      index33    NaN
      index4     NaN
      index44    NaN
      dtype: float64
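If the NaN values for unmatched labels are not wanted, one option is the method form of the operators, which accepts a `fill_value`. The following is a small sketch; using 0 as the fill value is an assumption that suits addition and would need to be reconsidered for other operations.

```python
import pandas as pd

series1 = pd.Series([3, 4, 4, 4], index=['index1', 'index2', 'index3', 'index4'])
series2 = pd.Series([2, 2, 2, 2], index=['index1', 'index2', 'index33', 'index44'])

# Operator form: labels present in only one series produce NaN
plain = series1 + series2

# Method form: a label missing from one series is treated as 0 before adding
filled = series1.add(series2, fill_value=0)
print(filled)
```

With `fill_value=0`, `index3` and `index4` keep their values from `series1` (4.0) and `index33` and `index44` keep theirs from `series2` (2.0), while the shared labels are summed as before.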
  • Original article: https://www.cnblogs.com/huaobin/p/15687038.html