• Salesforce Data Cleansing


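    After a new system goes live, historical data has to be imported, but the legacy data rarely fits: its formats, missing values, erroneous values, outliers, and attribute categories can differ widely from what the new system expects. We therefore need a set of dynamic data-cleansing rules in Salesforce that clean imported data automatically. The business functions can configure these rules themselves, without relying on IT.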

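    The example below walks through data processing in Salesforce in detail. The cleansing work breaks down into five steps:

    1. Create two related data objects, plus one object that holds the cleansing settings.
    2. Build a page that imports data from a CSV file.
    3. Define the range and attributes of each field; a value that violates them is automatically reassigned or moved to the nearest valid value.
    4. Clean and merge the data.
    5. Export the faulty data to Excel.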
     
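    Step 1: create two related objects, Recruit and Recruit Department, plus an object for the cleansing rules. After data is imported, we read the configured rules and use them to clean the imported data.
    Step 2: only one cleansing rule may be active at a time, so we add a trigger on Data_Washing_Setting__c that enforces this whenever a rule is inserted or updated.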
     
    trigger IsActiveChecking on Data_Washing_Setting__c (before insert, before update) {

        // Collect the records in this transaction that ask to be active.
        List<Data_Washing_Setting__c> activating = new List<Data_Washing_Setting__c>();
        for (Data_Washing_Setting__c dws : Trigger.new) {
            if (dws.Active_this_Rule__c) {
                activating.add(dws);
            }
        }
        if (activating.isEmpty()) {
            return;
        }

        // On update, leave the records being saved out of the database count
        // so that they are not counted twice. On insert there is nothing to exclude.
        Set<Id> currentIds = new Set<Id>();
        if (Trigger.isUpdate) {
            currentIds = Trigger.newMap.keySet();
        }
        Integer existingActive = [SELECT COUNT()
                                  FROM Data_Washing_Setting__c
                                  WHERE Active_this_Rule__c = true
                                  AND Id NOT IN :currentIds];

        // Reject the save if it would leave more than one active rule.
        if (activating.size() + existingActive > 1) {
            for (Data_Washing_Setting__c dws : activating) {
                dws.addError('Only one rule can be active at a time. Deactivate the existing active rule and try again.');
            }
        }
    }
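
    A test class along the following lines could confirm that the trigger behaves as described. It is only a sketch: the class name `IsActiveCheckingTest` and the helper `newRule` are made up for illustration, and it assumes Data_Washing_Setting__c has no other required fields.

```apex
@isTest
private class IsActiveCheckingTest {

    // Assumption: Active_this_Rule__c is the only field that must be set.
    private static Data_Washing_Setting__c newRule(Boolean active) {
        return new Data_Washing_Setting__c(Active_this_Rule__c = active);
    }

    @isTest
    static void secondActiveRuleIsRejected() {
        insert newRule(true);

        // allOrNone = false returns the failure instead of throwing.
        Database.SaveResult result = Database.insert(newRule(true), false);

        System.assert(!result.isSuccess(), 'A second active rule should be blocked');
        System.assertEquals(1, [SELECT COUNT() FROM Data_Washing_Setting__c
                                WHERE Active_this_Rule__c = true]);
    }

    @isTest
    static void activeRuleCanStillBeEdited() {
        Data_Washing_Setting__c rule = newRule(true);
        insert rule;

        // Saving the only active rule again must not count it twice.
        Database.SaveResult result = Database.update(rule, false);

        System.assert(result.isSuccess(), 'Editing the single active rule should be allowed');
    }
}
```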

    Step 3: build the import page and add the buttons that run the validation.

    The Visualforce markup:

    <apex:page controller="BatchInsertByCsvController">
        <apex:form>
            <apex:sectionHeader title="Upload Recruit Data"/>
            <apex:pageMessages />
            <apex:pageBlock>
                <center>
                    <apex:inputFile value="{!contentFile}" fileName="{!fileName}" />
                    <apex:commandButton action="{!LoadData}" value="Batch Insert"/>
                    <apex:commandButton action="{!LoadBlankList}" value="Filter Blank Data"/>
                    <apex:commandButton action="{!ExportBlankToCSV}" value="Export CSV"/>
                </center>
            </apex:pageBlock>
            <apex:pageBlock title="Import Data">
                <apex:pageBlockTable value="{!RecruitList}" var="ReList">
                    <apex:column value="{!ReList.Name}" />
                    <apex:column value="{!ReList.Position_Name__c}" />
                    <apex:column value="{!ReList.Recruit_Department__c}" />
                    <apex:column value="{!ReList.Recruit_Type__c}" />
                    <apex:column value="{!ReList.Recruit_Number__c}" />
                </apex:pageBlockTable>
            </apex:pageBlock>
            <apex:pageBlock title="Blank Data">
                <apex:pageBlockTable value="{!BlankList}" var="BList">
                    <apex:column value="{!BList.Name}" />
                    <apex:column value="{!BList.Position_Name__c}" />
                    <apex:column value="{!BList.Recruit_Department__c}" />
                    <apex:column value="{!BList.Recruit_Type__c}" />
                    <apex:column value="{!BList.Recruit_Number__c}" />
                </apex:pageBlockTable>
            </apex:pageBlock>
        </apex:form>
    </apex:page>

    The Apex controller behind the import page:

    public class BatchInsertByCsvController {

        public String fileName { get; set; }
        // apex:inputFile hands the selected file to the controller as a Blob.
        public Blob contentFile { get; set; }
        // Raw CSV lines, kept between requests so LoadBlankList can reuse them.
        public String[] filelines = new String[]{};
        public List<Recruit__c> RecruitList { get; set; }
        public List<Recruit__c> BlankList { get; set; }
        public List<Recruit__c> invaildList { get; set; }

        // Parse the uploaded CSV into RecruitList for preview.
        public PageReference LoadData() {
            try {
                String content = bitToString(contentFile, 'ISO-8859-1');
                contentFile = null; // keep the Blob out of the view state
                filelines = content.split('\n');
                RecruitList = new List<Recruit__c>();
                Map<String, Id> deptIds = loadDepartmentIds();

                // Row 0 is the header, so start at 1.
                for (Integer i = 1; i < filelines.size(); i++) {
                    if (String.isBlank(filelines[i])) {
                        continue;
                    }
                    RecruitList.add(toRecruit(filelines[i], deptIds));
                }
                // The insert is left disabled, as in the original listing:
                // insert RecruitList;
            } catch (Exception e) {
                ApexPages.addMessage(new ApexPages.Message(ApexPages.Severity.ERROR,
                    'An error has occurred reading the CSV file: ' + e.getMessage()));
            }
            return null;
        }

        // Load all department Ids once, keyed by lower-cased name,
        // instead of running one SOQL query per CSV row.
        private Map<String, Id> loadDepartmentIds() {
            Map<String, Id> ids = new Map<String, Id>();
            for (Recruit_Department__c d : [SELECT Id, Name FROM Recruit_Department__c]) {
                ids.put(d.Name.toLowerCase(), d.Id);
            }
            return ids;
        }

        // Build one Recruit__c from a CSV row. Missing or blank fields stay empty.
        private Recruit__c toRecruit(String line, Map<String, Id> deptIds) {
            // A limit of -1 keeps trailing empty fields, e.g. in "a,b,c,d,".
            String[] v = line.split(',', -1);
            while (v.size() < 5) {
                v.add('');
            }
            Recruit__c r = new Recruit__c();
            r.Name = v[0].trim();
            r.Position_Name__c = v[1].trim();
            // An unknown or blank department name leaves the lookup empty.
            r.Recruit_Department__c = deptIds.get(v[2].trim().toLowerCase());
            r.Recruit_Type__c = v[3].trim();
            String num = v[4].trim();
            if (String.isNotBlank(num)) {
                r.Recruit_Number__c = Decimal.valueOf(num);
            }
            return r;
        }

        // Decode a Blob into a String using the given charset.
        // The bytes are hex-encoded, rewritten as %XX escapes, and URL-decoded.
        // Pass 'UTF-8' instead if the file is UTF-8 encoded (for example, Chinese text).
        private String bitToString(Blob input, String inCharset) {
            String hex = EncodingUtil.convertToHex(input);
            // Each byte is two hex characters, so the byte count is length / 2.
            final Integer bytesCount = hex.length() >> 1;
            String[] bytes = new String[bytesCount];
            for (Integer i = 0; i < bytesCount; ++i) {
                // Take the two hex characters that make up byte i.
                bytes[i] = hex.mid(i << 1, 2);
            }
            return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), inCharset);
        }

        // Show the rows that contain blank fields.
        public PageReference LoadBlankList() {
            try {
                BlankList = new List<Recruit__c>();
                DataWashingSetting dws = new DataWashingSetting();
                // AddQuestionsData already skips the header row.
                String[] flines = dws.AddQuestionsData(filelines);
                Map<String, Id> deptIds = loadDepartmentIds();
                for (String line : flines) {
                    BlankList.add(toRecruit(line, deptIds));
                }
                ApexPages.addMessage(new ApexPages.Message(ApexPages.Severity.INFO,
                    'blank num:' + BlankList.size()));
            } catch (Exception e) {
                ApexPages.addMessage(new ApexPages.Message(ApexPages.Severity.ERROR,
                    'An error has occurred reading the CSV file: ' + e.getMessage()));
            }
            return null;
        }

        public PageReference ExportBlankToCSV() {
            return new PageReference('/apex/ExportCSV');
        }
    }
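
    The controller splits each row on commas, which breaks as soon as a field contains a comma inside double quotes. The sketch below shows one way to split a single line while honoring quotes. The class `CsvLineParser` and its `parseLine` method are hypothetical names, not part of the original article, and the sketch does not handle quoted fields that span several lines.

```apex
public class CsvLineParser {

    // Split one CSV line into fields. A field wrapped in double quotes may
    // contain commas, and "" inside quotes stands for one literal quote.
    public static List<String> parseLine(String line) {
        List<String> fields = new List<String>();
        String current = '';
        Boolean inQuotes = false;
        Integer i = 0;

        while (i < line.length()) {
            String ch = line.substring(i, i + 1);
            if (inQuotes) {
                if (ch == '"') {
                    // A doubled quote is an escaped quote; a single one closes the field.
                    if (i + 1 < line.length() && line.substring(i + 1, i + 2) == '"') {
                        current += '"';
                        i++;
                    } else {
                        inQuotes = false;
                    }
                } else {
                    current += ch;
                }
            } else if (ch == '"') {
                inQuotes = true;
            } else if (ch == ',') {
                // An unquoted comma ends the current field.
                fields.add(current);
                current = '';
            } else {
                current += ch;
            }
            i++;
        }
        // The last field has no trailing comma, so add it here.
        // This also keeps a trailing empty field.
        fields.add(current);
        return fields;
    }
}
```

    With this helper, a row such as `Dev,"Shanghai, R&D",3` yields three fields instead of four, so the department name keeps its embedded comma.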

    The validation and cleansing class called by the controller; add further checks as needed:

    public class DataWashingSetting {

        // Remove duplicate records. A Set of sObjects treats records
        // with identical field values as equal.
        public List<Recruit__c> DelDuplicateData(List<Recruit__c> OriginalList) {
            Set<Recruit__c> myset = new Set<Recruit__c>();
            List<Recruit__c> result = new List<Recruit__c>();

            myset.addAll(OriginalList);
            result.addAll(myset);

            return result;
        }

        // Collect the rows that have at least one empty field.
        public String[] AddQuestionsData(String[] filelines) {
            String[] result = new String[]{};
            // Row 0 is the header, so start at 1.
            for (Integer i = 1; i < filelines.size(); i++) {
                if (String.isBlank(filelines[i])) {
                    continue; // skip empty lines, such as the one after a final newline
                }
                // A limit of -1 keeps trailing empty fields, e.g. in "a,b,c,d,".
                String[] inputvalues = filelines[i].split(',', -1);
                Boolean hasBlank = inputvalues.size() < 5;
                for (Integer j = 0; j < 5 && !hasBlank; j++) {
                    hasBlank = String.isBlank(inputvalues[j]);
                }
                if (hasBlank) {
                    result.add(filelines[i]);
                }
            }
            return result;
        }

        // Check each field against the active rule.
        public String[] CheckFiled(String[] filelines) {
            // Load the active rule; the trigger allows at most one.
            // The query throws if no rule is active.
            Data_Washing_Setting__c dws = [SELECT Position_Name_Rule__c,
                                                  Recruit_Start_Number__c,
                                                  Recruit_End_Number__c,
                                                  Recruit_Department_Rule__c
                                           FROM Data_Washing_Setting__c
                                           WHERE Active_this_Rule__c = true
                                           LIMIT 1];
            String PositionNameRule = dws.Position_Name_Rule__c;  // position-name rule (whether duplicates are allowed)
            Decimal startNumber = dws.Recruit_Start_Number__c;    // lower bound for the recruit number
            Decimal endNumber = dws.Recruit_End_Number__c;        // upper bound for the recruit number
            String department = dws.Recruit_Department_Rule__c;   // department restriction

            String[] result = new String[]{};
            String[] inputvalues;
            for (Integer i = 1; i < filelines.size(); i++) {
                inputvalues = filelines[i].split(',', -1);
                // Add the validation code here.
            }
            return result; // the rows that failed validation
        }
    }
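
    The loop in CheckFiled is left as a placeholder. As one possible way to fill it in for the recruit number, the sketch below checks a raw CSV value against the configured range and moves an out-of-range value to the nearest bound, which is the "nearest valid value" behavior described in step 3 of the overview. The class `RecruitNumberRule` and both method names are hypothetical; the original article does not show this code.

```apex
public class RecruitNumberRule {

    // Return the value moved to the nearest bound when it falls outside
    // [startNumber, endNumber]; values already in range come back unchanged.
    public static Decimal clampToRange(Decimal value, Decimal startNumber, Decimal endNumber) {
        if (value < startNumber) {
            return startNumber;
        }
        if (value > endNumber) {
            return endNumber;
        }
        return value;
    }

    // True when the raw CSV field parses as a number inside the range.
    // Blank or non-numeric input counts as out of range.
    public static Boolean isInRange(String rawValue, Decimal startNumber, Decimal endNumber) {
        if (String.isBlank(rawValue)) {
            return false;
        }
        try {
            Decimal value = Decimal.valueOf(rawValue.trim());
            return value >= startNumber && value <= endNumber;
        } catch (Exception e) {
            // Decimal.valueOf rejects text that is not a number.
            return false;
        }
    }
}
```

    Inside the loop, a row could then be added to `result` whenever `RecruitNumberRule.isInRange(inputvalues[4], startNumber, endNumber)` returns false. Whether to reject such rows or correct them with `clampToRange` is a decision for the business function that owns the rule.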

    When problem rows are found, export them straight to Excel, fix them by hand, and import them again.

    <apex:page controller="BatchInsertByCsvController" cache="true" contentType="application/vnd.ms-excel#BlankList.xls" showHeader="false">
        <head>
            <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
        </head>
        <apex:pageBlock>
            <apex:pageBlockTable value="{!BlankList}" var="BList">
                <apex:column value="{!BList.Name}" />
                <apex:column value="{!BList.Position_Name__c}" />
                <apex:column value="{!BList.Recruit_Department__c}" />
                <apex:column value="{!BList.Recruit_Type__c}" />
                <apex:column value="{!BList.Recruit_Number__c}" />
            </apex:pageBlockTable>
        </apex:pageBlock>
    </apex:page>

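    The final result:

    1. Import the data, automatically filter out the rows with missing values, and export them to Excel.

    2. Configure the cleansing rules in the back end.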

     
  • Original article: https://www.cnblogs.com/guoyinghai/p/6081281.html