• pyspark AttributeError: 'NoneType' object has no attribute 'setCallSite'


    pyspark:

    AttributeError: 'NoneType' object has no attribute 'setCallSite'

    Damn, it turns out to be a PySpark bug. The workaround:

    import time
    from pyspark.ml.linalg import Vectors
    from pyspark.sql.functions import col

    # model (presumably a fitted MinHashLSH model), imsi_proc_df, spark_app (the SparkSession)
    # and ConnectedGraph come from the author's surrounding code, which is not shown.
    # Note: the threshold passed is 1e6, not 0.6; since Jaccard distance is at most 1.0,
    # every candidate pair found by LSH is kept.
    print("Approximately joining on distance smaller than 0.6:")
    distance_min = (model.approxSimilarityJoin(imsi_proc_df, imsi_proc_df, 1e6, distCol="JaccardDistance")
                    .select(col("datasetA.id").alias("idA"),
                            col("datasetB.id").alias("idB"),
                            col("JaccardDistance")))  # .filter("idA=idB")
    distance_min.show()  # show() prints the table itself and returns None
    print("*" * 88)
    imsi_proc_df.show()

    key = Vectors.sparse(53, [1, 3], [1.0, 1.0])
    model.approxNearestNeighbors(imsi_proc_df, key, 2).show()
    print("start calculate find botnet!")
    print("*" * 99)
    print("time start:", time.time())
    print(type(distance_min), dir(distance_min))
    print(dir(distance_min.toLocalIterator))

    # Workaround: add these two lines to re-attach the DataFrame to the live session
    distance_min.sql_ctx.sparkSession._jsparkSession = spark_app._jsparkSession
    distance_min._sc = spark_app._sc

    similarity_val_rdd = distance_min.toLocalIterator  # .collect(); the method is called below
    print("time end:", time.time())
    print(similarity_val_rdd)
    print("*" * 99)
    try:
        G = ConnectedGraph()
        ddos_ue_list = []
        for item in similarity_val_rdd():
            imsi, imsi2, jacard_similarity_val = item["idA"], item["idB"], item["JaccardDistance"]
            print("???", imsi, imsi2, jacard_similarity_val)
            # ... (the original snippet breaks off here, inside the try block)
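    The two-line workaround above can be wrapped in a helper so that it lives in one place. This is a minimal sketch, not a PySpark API. The name `reattach_session` is hypothetical. It assumes a PySpark 2.4-era DataFrame that still exposes the private attributes `sql_ctx` and `_sc`, which may change between releases.

```python
def reattach_session(df, spark):
    """Hypothetical helper: point a DataFrame's private session/context
    references back at the given, still-running SparkSession.

    Mirrors the two-line workaround. It relies on private PySpark attributes
    (`sql_ctx.sparkSession._jsparkSession` and `_sc`), which is an assumption
    about PySpark 2.4-era internals, not a public API.
    """
    # Re-point the Java-side session used by the DataFrame's SQL context.
    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    # Re-point the SparkContext that collect()/toLocalIterator() pass to SCCallSiteSync.
    df._sc = spark._sc
    # Return the same DataFrame so the call can be chained.
    return df
```

    For example, call `distance_min = reattach_session(distance_min, spark_app)` right before `toLocalIterator()`, or use `reattach_session(df, spark).collect()` in the reproduction below.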

    Description

    Reproducing the bug with the example from the documentation:

    import pyspark
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.stat import Correlation
    spark = pyspark.sql.SparkSession.builder.getOrCreate()
    dataset = [[Vectors.dense([1, 0, 0, -2])],
     [Vectors.dense([4, 5, 0, 3])],
     [Vectors.dense([6, 7, 0, 8])],
     [Vectors.dense([9, 0, 0, 1])]]
    dataset = spark.createDataFrame(dataset, ['features'])
    df = Correlation.corr(dataset, 'features', 'pearson')
    df.collect()
     
    

    This produces the following stack trace:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-92-e7889fa5d198> in <module>()
         11 dataset = spark.createDataFrame(dataset, ['features'])
         12 df = Correlation.corr(dataset, 'features', 'pearson')
    ---> 13 df.collect()
    
    /opt/spark/python/pyspark/sql/dataframe.py in collect(self)
        530         [Row(age=2, name=u'Alice'), Row(age=5, name=u'Bob')]
        531         """
    --> 532         with SCCallSiteSync(self._sc) as css:
        533             sock_info = self._jdf.collectToPython()
        534         return list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
    
    /opt/spark/python/pyspark/traceback_utils.py in __enter__(self)
         70     def __enter__(self):
         71         if SCCallSiteSync._spark_stack_depth == 0:
    ---> 72             self._context._jsc.setCallSite(self._call_site)
         73         SCCallSiteSync._spark_stack_depth += 1
         74 
    
    AttributeError: 'NoneType' object has no attribute 'setCallSite'
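    The traceback shows exactly where this fails. `collect()` enters `SCCallSiteSync(self._sc)`, and its `__enter__` calls `self._context._jsc.setCallSite(...)`, so the DataFrame's `_sc` has a `_jsc` of `None`. A quick check for that state before running an action might look like the sketch below. The function name `has_detached_context` is hypothetical, and it reads the same private attribute the traceback touches. As far as I know, PySpark's `SparkContext.stop()` also sets `_jsc` to `None`. A DataFrame that still points at a stopped context is therefore one plausible (unconfirmed) way to reach this state.

```python
def has_detached_context(df):
    """Hypothetical check for the state behind
    "'NoneType' object has no attribute 'setCallSite'".

    Returns True when the DataFrame's SparkContext reference is missing or has
    no Java context (`_jsc is None`), which is what SCCallSiteSync dereferences.
    Relies on private PySpark attributes (an assumption about 2.4-era internals).
    """
    sc = getattr(df, "_sc", None)
    return sc is None or getattr(sc, "_jsc", None) is None
```

    Called just before `df.collect()`, it indicates whether the `sql_ctx`/`_sc` re-assignment workaround is needed.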

    Analysis:

    Somehow the DataFrame's references `df.sql_ctx.sparkSession._jsparkSession` and `df._sc` no longer match those of the active Spark session (`spark._jsparkSession` and `spark._sc`).

    The following code works around the problem (I hope this helps in narrowing down the root cause):

    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    df._sc = spark._sc
    
    df.collect()
    
    >>> [Row(pearson(features)=DenseMatrix(4, 4, [1.0, 0.0556, nan, 0.4005, 0.0556, 1.0, nan, 0.9136, nan, nan, 1.0, nan, 0.4005, 0.9136, nan, 1.0], False))]
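    As a sanity check that the workaround returns real Pearson coefficients, the matrix can be recomputed in plain Python from the four example vectors. This is an illustrative sketch: the `pearson` helper is my own and is not Spark's implementation. It returns NaN whenever a column has zero variance, which is the case for the all-zero third column.

```python
import math

# Rows from the reproduction example; each row is one 4-dimensional feature vector.
rows = [
    [1, 0, 0, -2],
    [4, 5, 0, 3],
    [6, 7, 0, 8],
    [9, 0, 0, 1],
]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


cols = list(zip(*rows))  # transpose rows into columns (one tuple per feature)
matrix = [[pearson(cols[i], cols[j]) for j in range(4)] for i in range(4)]
for r in matrix:
    print([round(v, 4) for v in r])
```

    The off-diagonal values agree with Spark's output: 0.0556, 0.4005 and 0.9136, plus NaN wherever the constant column is involved. There is one difference. Spark reports 1.0 on the diagonal even for the constant column, while this sketch gives NaN there.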
  • Original post: https://www.cnblogs.com/bonelee/p/10976253.html