• pyspark AttributeError: 'NoneType' object has no attribute 'setCallSite'


    pyspark:

    AttributeError: 'NoneType' object has no attribute 'setCallSite'

    Damn, this turns out to be a PySpark bug. Workaround:

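    import time
    from pyspark.ml.linalg import Vectors
    from pyspark.sql.functions import col

    # `model` (a fitted LSH model) and `imsi_proc_df` come from earlier code not shown.
    print("Approximately joining on distance smaller than 0.6:")
    # Note: the threshold actually passed below is 1e6, not 0.6.
    # The parentheses let the chained .select() continue across lines.
    distance_min = (
        model.approxSimilarityJoin(imsi_proc_df, imsi_proc_df, 1e6, distCol="JaccardDistance")
        .select(col("datasetA.id").alias("idA"),
                col("datasetB.id").alias("idB"),
                col("JaccardDistance"))  # .filter("idA=idB")
    )
    distance_min.show()  # show() prints the table itself and returns None
    print("*" * 88)
    imsi_proc_df.show()

    key = Vectors.sparse(53, [1, 3], [1.0, 1.0])
    model.approxNearestNeighbors(imsi_proc_df, key, 2).show()
    print("start calculate find botnet!")
    print("*" * 99)
    print("time start:", time.time())
    print(type(distance_min), dir(distance_min))
    print(dir(distance_min.toLocalIterator))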

    # Add these two lines to fix the error (spark_app is the SparkSession in use).
    distance_min.sql_ctx.sparkSession._jsparkSession = spark_app._jsparkSession
    distance_min._sc = spark_app._sc

    similarity_val_rdd = distance_min.toLocalIterator  # .collect()
    print("time end:", time.time())
    print(similarity_val_rdd)
    print("*" * 99)
    try:
        G = ConnectedGraph()
        ddos_ue_list = []
        for item in similarity_val_rdd():
            imsi, imsi2, jacard_similarity_val = item["idA"], item["idB"], item["JaccardDistance"]
            print("???", imsi, imsi2, jacard_similarity_val)

    Description

    Reproducing the bug with the example from the documentation:

    import pyspark
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.stat import Correlation
    spark = pyspark.sql.SparkSession.builder.getOrCreate()
    dataset = [[Vectors.dense([1, 0, 0, -2])],
     [Vectors.dense([4, 5, 0, 3])],
     [Vectors.dense([6, 7, 0, 8])],
     [Vectors.dense([9, 0, 0, 1])]]
    dataset = spark.createDataFrame(dataset, ['features'])
    df = Correlation.corr(dataset, 'features', 'pearson')
    df.collect()
     
    

    This produces the following stack trace:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-92-e7889fa5d198> in <module>()
         11 dataset = spark.createDataFrame(dataset, ['features'])
         12 df = Correlation.corr(dataset, 'features', 'pearson')
    ---> 13 df.collect()
    
    /opt/spark/python/pyspark/sql/dataframe.py in collect(self)
        530         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
        531         """
    --> 532         with SCCallSiteSync(self._sc) as css:
        533             sock_info = self._jdf.collectToPython()
        534         return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
    
    /opt/spark/python/pyspark/traceback_utils.py in __enter__(self)
         70     def __enter__(self):
         71         if SCCallSiteSync._spark_stack_depth == 0:
    ---> 72             self._context._jsc.setCallSite(self._call_site)
         73         SCCallSiteSync._spark_stack_depth += 1
         74 
    
    AttributeError: 'NoneType' object has no attribute 'setCallSite'

    Analysis:

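    Somehow the DataFrame's handles, `df.sql_ctx.sparkSession._jsparkSession` and `df._sc`, do not match the ones held by the active Spark session (`spark._jsparkSession` and `spark._sc`). In the trace above, `self._context._jsc` is `None`, which is why the call to `setCallSite` fails.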
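To make the failure mode concrete, here is a minimal sketch in plain Python that needs no Spark installation. It uses stand-in objects rather than real PySpark classes, and the helper name `jvm_handle_missing` is hypothetical (it is not part of PySpark). It only shows that calling `setCallSite` on a `_jsc` that is `None` produces the message from the trace, and how one might check for that state before collecting.

```python
from types import SimpleNamespace


def jvm_handle_missing(df):
    """Return True when df._sc is absent or its _jsc attribute is None."""
    sc = getattr(df, "_sc", None)
    return sc is None or getattr(sc, "_jsc", None) is None


# Stand-in for a DataFrame whose context has no JVM handle (not a real DataFrame).
broken_df = SimpleNamespace(_sc=SimpleNamespace(_jsc=None))

message = None
try:
    # Same attribute chain as the failing line in traceback_utils.py.
    broken_df._sc._jsc.setCallSite("collect")
except AttributeError as exc:
    message = str(exc)

print(jvm_handle_missing(broken_df))
print(message)
```

With a real DataFrame the same check would read the private `_sc` and `_jsc` attributes, which is an assumption about PySpark internals that may not hold across versions.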

    The following code fixes the problem (I hope this helps you narrow down the root cause):

    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    df._sc = spark._sc
    
    df.collect()
    
    >>> [Row(pearson(features)=DenseMatrix(4, 4, [1.0, 0.0556, nan, 0.4005, 0.0556, 1.0, nan, 0.9136, nan, nan, 1.0, nan, 0.4005, 0.9136, nan, 1.0], False))]
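If the workaround is needed in several places, it could be wrapped in a small helper that applies it only when this specific error shows up. The sketch below is one possible shape, not an official API: `collect_with_rebind` is a hypothetical name, and `FakeDataFrame`, `FakeJsc` and `fake_spark` are stand-ins so the example runs without Spark. The two assignments inside are the ones from the fix above and rely on private attributes that may differ between PySpark versions.

```python
from types import SimpleNamespace


def collect_with_rebind(df, spark):
    """Collect df; on the setCallSite error, rebind df to `spark` and retry once."""
    try:
        return df.collect()
    except AttributeError as exc:
        if "setCallSite" not in str(exc):
            raise  # unrelated AttributeError: do not mask it
        # The two assignments from the workaround (private attributes).
        df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
        df._sc = spark._sc
        return df.collect()


class FakeJsc:
    """Stand-in for the JVM context handle."""

    def setCallSite(self, site):
        pass


class FakeDataFrame:
    """Stand-in, not a real DataFrame: collect() fails while _sc._jsc is None."""

    def __init__(self):
        self._sc = SimpleNamespace(_jsc=None)
        self.sql_ctx = SimpleNamespace(
            sparkSession=SimpleNamespace(_jsparkSession=None)
        )

    def collect(self):
        self._sc._jsc.setCallSite("collect")  # raises while _jsc is None
        return ["row"]


fake_spark = SimpleNamespace(
    _jsparkSession=object(), _sc=SimpleNamespace(_jsc=FakeJsc())
)
rows = collect_with_rebind(FakeDataFrame(), fake_spark)
print(rows)
```

With real objects the call would be `collect_with_rebind(df, spark)` in place of `df.collect()`. Retrying only once and re-raising any other `AttributeError` keeps the helper from hiding unrelated bugs.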
  • Original article: https://www.cnblogs.com/bonelee/p/10976253.html