• shell操作文件的几条命令:删除最后一列、删除第一行、diff等


    删除文件第一行: sed '1d' filename

    删除文件最后一列: awk '{print $NF}' filename

    awk删除重复行的命令:awk '{if (!seen[$0]++) {print $0;}}' filename

    比较文件的两种方法:

    1)comm -3 --nocheck-order file1 file2

    2) grep -v -f file1 file2 :输出file2中有file1中没有的行

    当然还有diff file1 file2

    贴一段昨天写的shell脚本~

    #!/bin/bash
    date_time=`date +'%H_%M_%S'`
    yesterday=`date -d"-1 day" +'%Y_%m_%d'`
    today=`date +'%Y_%m_%d'`
    date_day_time=`date +'%Y_%m_%d_%H_%M_%S'`
    
    mkdir /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/same_similiar_log/$today
    
    # begin to get input files which haven't been deal with
    today_input=/home/crawler/petabyte/crawllog/news_data/$today
    yesterday_input=/home/crawler/petabyte/crawllog/news_data/$yesterday
    
    /opt/hadoop/program/bin/hadoop fs -ls $yesterday_input/ > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get
    /opt/hadoop/program/bin/hadoop fs -ls $today_input/ >> /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get
    
    sed '1d' /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get_without_first_line
    
    awk '{print $NF}' /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_get_without_first_line > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input
    
    #comm -3 --nocheck-order /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_diff
    
    grep -v -f /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_diff
    
    awk '{print $NF}' /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_diff > /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/today_new_input
    
    mv /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/all_input /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done
    
    
    # begin to compute same_similary_news
    inputfile1=""
    while read line
    do
      inputfile1=$inputfile1,${line}
    done < /home/spamdetect/changxiaojia/workspace/finance/same_similar_news_mining/mid_files/input_done
    echo $inputfile1
  • 相关阅读:
    01 React快速入门(一)——使用循环时对于‘key’报错处理
    01 div实现浮动效果
    17 制作热力图
    16 ArcGIS Server 10.6.1发布影像服务
    虚拟机上有关于Apache服务基于主机名@4域名访问网页
    虚拟机上有关于Apache服务基于IP地址@3IP访问网站
    Apache有关个人用户主页以及强制访问安全系统功能介绍@2
    Apache服务的安装以及服务文件参数内容的配置 @1
    WVware虚拟机linux环境下使用ssh服务以安全密钥的形式远程控制服务(本地客户端登录远程服务端)
    WVware虚拟机linux环境下RAID5 五块磁盘操作管理实例
  • 原文地址:https://www.cnblogs.com/changxiaoxiao/p/3161279.html
Copyright © 2020-2023  润新知