Spark ML: Fixing Out-of-Memory Errors in ALS

Original post: https://blog.csdn.net/Damonhaus/article/details/76572971

Problem: while testing the ALS collaborative filtering algorithm, the job failed with an out-of-memory error.

Solution 1: reduce the number of iterations from 20 to 10

val model = new ALS().setRank(10).setIterations(20).setLambda(0.01).setImplicitPrefs(false) .run(alldata)

Change the call above to use .setIterations(10).

Solution 2: use the checkpoint mechanism

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    /**
     * Delete the intermediate data left behind by earlier checkpoints
     */
    val path = new Path(HDFSConnection.paramMap("hadoop_url")+"/checkpoint") // HDFS path to operate on (delete)
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    val hdfs = FileSystem.get(new URI(HDFSConnection.paramMap("hadoop_url")+"/checkpoint"), hadoopConf)
    if (hdfs.exists(path)) {
      // Pass true for a recursive delete, false otherwise
      hdfs.delete(path, true) // this is only intermediate data, so a recursive delete is fine
    }

  /**
   * Set the checkpoint directory
   */
    spark.sparkContext.setCheckpointDir(HDFSConnection.paramMap("hadoop_url")+"/checkpoint")

 /**
   * Set period (in iterations) between checkpoints (default = 10). Checkpointing helps with
   * recovery (when nodes fail) and StackOverflow exceptions caused by long lineage. It also helps
   * with eliminating temporary shuffle files on disk, which can be important when there are many
   * ALS iterations. If the checkpoint directory is not set in [[org.apache.spark.SparkContext]],
   * this setting is ignored.
   */

val model = new ALS().setCheckpointInterval(2).setRank(10).setIterations(20).setLambda(0.01).setImplicitPrefs(false)
      .run(alldata)
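
The two steps above (setting a checkpoint directory, then training with a short checkpoint interval) can be tried end to end with a small local sketch. This is only an illustration under stated assumptions, not the original job: the object name AlsCheckpointSketch, the local[2] master, the temporary local directory standing in for the HDFS checkpoint path, and the toy ratings are all made up for the example. The ALS hyperparameters are copied from the call above.

```scala
import java.nio.file.Files

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object AlsCheckpointSketch {

  def trainAndPredict(): Double = {
    val spark = SparkSession.builder()
      .appName("als-checkpoint-sketch")
      .master("local[2]") // assumption: local run for illustration only
      .getOrCreate()
    val sc = spark.sparkContext

    try {
      // Assumption: a local temp directory stands in for the HDFS "/checkpoint" path.
      val checkpointDir = Files.createTempDirectory("als-checkpoint").toString
      // Must be set before training; otherwise the checkpoint interval is ignored.
      sc.setCheckpointDir(checkpointDir)

      // Made-up toy ratings: (user, product, rating).
      val ratings = sc.parallelize(Seq(
        Rating(1, 10, 5.0), Rating(1, 11, 1.0),
        Rating(2, 10, 4.0), Rating(2, 12, 2.0),
        Rating(3, 11, 3.0), Rating(3, 12, 5.0)
      ))

      // Same settings as the call above, with a checkpoint interval of 2
      // so the lineage is cut regularly during the 20 iterations.
      val model = new ALS()
        .setCheckpointInterval(2)
        .setRank(10)
        .setIterations(20)
        .setLambda(0.01)
        .setImplicitPrefs(false)
        .run(ratings)

      // Predicted rating of product 12 for user 1.
      model.predict(1, 12)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(trainAndPredict())
}
```

On a real cluster the checkpoint directory would normally be an HDFS path, as in the snippets above, so that every executor can write to it.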


Source: https://www.cnblogs.com/sabertobih/p/13863214.html