• A Mappable CSV Reading Engine


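    Many projects store their data in CSV files, so here I'd like to present a solution I designed myself. If anything in it looks unreasonable, please point it out.

    Reading a CSV file is usually tedious: split the text into rows and columns by the separator (a comma by default), then read the data row by row and field by field in a fixed order. There is nothing wrong with this in itself, but as soon as the column order in the CSV changes, or a row is added or removed, the change ripples through all of the reading code. And when there are many fields, writing that code is downright painful.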
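
    For illustration, the positional style described above might look roughly like the sketch below. ResourceRow and PositionalCsv are hypothetical names invented for this example; they are not part of the project. The hard-coded indices are the weak point: swap two columns in the file and the method quietly reads the wrong data or throws.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical record type, used only for this illustration.
public class ResourceRow
{
    public int Id;
    public string Path;
    public float Ratio;
}

public static class PositionalCsv
{
    // Reads every non-empty line and picks fields by hard-coded index.
    public static List<ResourceRow> Parse(string content)
    {
        var rows = new List<ResourceRow>();
        foreach (string line in content.Split('\n'))
        {
            string trimmed = line.Trim();
            if (trimmed.Length == 0) continue;   // skip blank lines

            string[] cells = trimmed.Split(',');
            rows.Add(new ResourceRow
            {
                Id = int.Parse(cells[0], CultureInfo.InvariantCulture),      // column 0 must be the ID
                Path = cells[1],                                             // column 1 must be the path
                Ratio = float.Parse(cells[2], CultureInfo.InvariantCulture)  // column 2 must be the ratio
            });
        }
        return rows;
    }
}
```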

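    Our project started out with this primitive approach, and we did run into exactly the awkward situation described above. Then it occurred to me: if I could map the CSV table structure the way it is done for JSON or XML, wouldn't that make things far more efficient?

    That led me to the following design:

    1. Preparation

    First, define two attributes: one describes the table as a whole, the other maps individual fields.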

    using System;

    /// <summary>
    /// CSV column mapping
    /// </summary>
    [AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
    public class CSVColumnAttribute : Attribute
    {
        /// <summary>
        /// Name of this property/field in csv file(default is property name)
        /// </summary>
        public string Key { get; set; }

        /// <summary>
        /// Column of this property/field in csv file(if column is assigned, key will be ignored)
        /// </summary>
        public int Column { get; set; }

        /// <summary>
        /// Default value(if reading NULL or failed; default: -1 for number value, null for class, false for bool)
        /// </summary>
        public object DefaultValue { get; set; }

        /// <summary>
        /// Separator for parsing if it's an array('#' by default)
        /// </summary>
        public char ArraySeparator { get; set; }


        public CSVColumnAttribute()
        {
            Column = -1;
            ArraySeparator = '#';
        }

        public CSVColumnAttribute(string key)
        {
            Key = key;
            Column = -1;
            ArraySeparator = '#';
        }

        public CSVColumnAttribute(int column)
        {
            Column = column;
            ArraySeparator = '#';
        }
    }

    /// <summary>
    /// CSV Mapping class or struct(Avoid struct where possible. Struct is boxed then unboxed in reflection)
    /// </summary>
    [AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct)]
    public class CSVMapperAttribute : Attribute
    {
        /// <summary>
        /// Path of the CSV file(without file extension). Base directory is Assets/Resources/
        /// </summary>
        public string Path { get; set; }

        /// <summary>
        /// Mapping key row(0 by default)
        /// </summary>
        public int KeyRow { get; set; }

        /// <summary>
        /// Description row(1 by default. Will be skipped in decoding. If no desc in file, assign -1)
        /// </summary>
        public int DescRow { get; set; }

        /// <summary>
        /// Separator for csv parsing(',' by default)
        /// </summary>
        public char Separator { get; set; }

        /// <summary>
        /// Starting index of data rows(0 by default)
        /// </summary>
        public int StartRow { get; set; }

        public CSVMapperAttribute()
        {
            KeyRow = 0;
            DescRow = 1;
            Separator = ',';
        }

        public CSVMapperAttribute(string name)
        {
            Path = name;
            KeyRow = 0;
            DescRow = 1;
            Separator = ',';
        }
    }

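    The table-level attribute has these properties: the path of the CSV file (optional), the row holding the keys (intended for tables whose headers are not in English), the row holding descriptions (optional), the separator (a comma ',' by default), and the start row (optional; rows before it are skipped during parsing).

    The field-level attribute has these properties: the key, the column index (set either the key or the column index; if neither is set, the key defaults to the member name), a default value (optional; returned when parsing the field fails), and an array separator (optional, '#' by default, used to split array values).

    CSVMapperAttribute can be applied to a class or struct; CSVColumnAttribute can be applied to a property or field.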
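
    To make the idea of attribute-driven mapping concrete before the full engine, here is a deliberately small, standalone sketch. ColAttribute, Item and MiniMapper are hypothetical names made up for this example; the sketch only handles public fields mapped by key and leans on Convert.ChangeType, so treat it as an illustration of the principle rather than the engine's actual code.

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Hypothetical, simplified stand-in for a column attribute: it carries a key only.
[AttributeUsage(AttributeTargets.Field)]
public class ColAttribute : Attribute
{
    public string Key { get; private set; }
    public ColAttribute(string key) { Key = key; }
}

// Hypothetical row type for the illustration.
public class Item
{
    [Col("id")] public int Id;
    [Col("name")] public string Name;
}

public static class MiniMapper
{
    // Creates a T and fills each attributed field from the cell whose header matches its key.
    public static T Map<T>(string[] keys, string[] fields) where T : new()
    {
        object target = new T();
        foreach (FieldInfo field in typeof(T).GetFields())
        {
            var attr = (ColAttribute) Attribute.GetCustomAttribute(field, typeof(ColAttribute));
            if (attr == null) continue;                          // field is not mapped

            int index = Array.IndexOf(keys, attr.Key);
            if (index < 0 || index >= fields.Length) continue;   // key not present in the header row

            // Convert.ChangeType handles string and the primitive numeric types.
            object value = Convert.ChangeType(fields[index], field.FieldType, CultureInfo.InvariantCulture);
            field.SetValue(target, value);
        }
        return (T) target;
    }
}
```

    Because the lookup goes through the header row, the column order in the data no longer matters: the same Item is produced whether the file lists "id,name" or "name,id".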

    2. Reading and parsing

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;
    using UnityEngine;

    public class CSVEngine
    {
        private List<List<string>> _records;

        /// <summary>
        /// Get column count
        /// </summary>
        public int ColumnCount { get; private set; }

        /// <summary>
        /// Get row count
        /// </summary>
        public int RowCount { get; private set; }

        /// <summary>
        /// Get separator
        /// </summary>
        public char Separator { get; private set; }

        private int _keyRow = -1;
        private int _descRow = -1;
        private int _startRow = -1;

        /// <summary>
        /// Decode the loaded CSV to the target mapped type. Call the generic Load first.
        /// </summary>
        public IEnumerable<T> Decode<T>() where T : new()
        {
            //_descRow is not checked: -1 is a valid value meaning "no description row"
            if (_records == null || _keyRow < 0 || _keyRow >= _records.Count || _startRow < 0)
            {
                Debug.LogError(string.Format("Decoding Failed: {0}", typeof (T)));
                yield break;
            }

            //Decode each row
            for (int i = _startRow; i < _records.Count; i++)
            {
                if (i == _keyRow || i == _descRow)
                    continue;
                yield return DecodeRow<T>(_records[i], _records[_keyRow]);
            }
        }

        /// <summary>
        /// Decode single row
        /// </summary>
        private T DecodeRow<T>(List<string> fields, List<string> keys) where T : new()
        {
            T result = new T();
            IEnumerable<MemberInfo> members =
                typeof (T).GetMembers()
                    .Where(m => m.MemberType == MemberTypes.Property || m.MemberType == MemberTypes.Field)
                    .Where(m => Attribute.IsDefined(m, typeof (CSVColumnAttribute), false));

            if (typeof (T).IsValueType)
            {
                //Structs must be boxed once so that reflection writes to the same instance
                object boxed = result;
                foreach (MemberInfo member in members)
                {
                    CSVColumnAttribute attribute =
                        member.GetCustomAttributes(typeof (CSVColumnAttribute), false).First() as CSVColumnAttribute;
                    string field = GetRawValue(attribute, fields, keys, member.Name);
                    //GetRawValue returns the very same name instance when the mapping failed
                    if (ReferenceEquals(field, member.Name))
                        return result;
                    SetValue(member, boxed, field, attribute.DefaultValue, attribute.ArraySeparator);
                }
                return (T) boxed;
            }

            foreach (MemberInfo member in members)
            {
                CSVColumnAttribute attribute =
                    member.GetCustomAttributes(typeof (CSVColumnAttribute), false).First() as CSVColumnAttribute;
                string field = GetRawValue(attribute, fields, keys, member.Name);
                if (ReferenceEquals(field, member.Name))
                    return result;
                SetValue(member, result, field, attribute.DefaultValue, attribute.ArraySeparator);
            }
            return result;
        }

        /// <summary>
        /// Get raw value by CSVColumnAttribute or name
        /// </summary>
        private string GetRawValue(CSVColumnAttribute attribute, List<string> fields, List<string> keys, string name)
        {
            if (attribute.Column >= 0 && fields.Count > attribute.Column)
            {
                return fields[attribute.Column];
            }
            if (!string.IsNullOrEmpty(attribute.Key) && keys.Contains(attribute.Key))
            {
                return fields[keys.IndexOf(attribute.Key)];
            }
            if (keys.Contains(name))
            {
                return fields[keys.IndexOf(name)];
            }
            Debug.LogError(string.Format("Mapping Error! Column: {0}, Key: {1}, Name:{2}", attribute.Column,
                attribute.Key ?? "NULL", name));
            return name;
        }

        /// <summary>
        /// Parse and set raw value
        /// </summary>
        private void SetValue(MemberInfo member, object obj, string value, object defaultValue, char arraySeparator)
        {
            if (member.MemberType == MemberTypes.Property)
            {
                (member as PropertyInfo).SetValue(obj,
                    ParseRawValue(value, (member as PropertyInfo).PropertyType, defaultValue, arraySeparator),
                    null);
            }
            else
            {
                (member as FieldInfo).SetValue(obj,
                    ParseRawValue(value, (member as FieldInfo).FieldType, defaultValue, arraySeparator));
            }
        }

        /// <summary>
        /// Parse string value to specified type
        /// </summary>
        /// <param name="type">If type is collection, use array only(e.g. int[])</param>
        /// <param name="defaultValue">If type is collection, use element default(e.g. 0 for int[])</param>
        private object ParseRawValue(string field, Type type, object defaultValue, char arraySeparator)
        {
            try
            {
                if (type.IsArray)
                {
                    IEnumerable<object> result =
                        field.Split(arraySeparator)
                            .Select(f => ParseRawValue(f, type.GetElementType(), defaultValue, arraySeparator));
                    if (type.GetElementType() == typeof (string))
                    {
                        return result.Cast<string>().ToArray();
                    }
                    if (type.GetElementType() == typeof (int))
                    {
                        return result.Cast<int>().ToArray();
                    }
                    if (type.GetElementType() == typeof (float))
                    {
                        return result.Cast<float>().ToArray();
                    }
                    if (type.GetElementType() == typeof (double))
                    {
                        return result.Cast<double>().ToArray();
                    }
                    if (type.GetElementType() == typeof (bool))
                    {
                        return result.Cast<bool>().ToArray();
                    }
                    return null;
                }
                if (type == typeof (string))
                {
                    return field;
                }
                if (type == typeof (int))
                {
                    return Convert.ToInt32(field);
                }
                if (type == typeof (float))
                {
                    return Convert.ToSingle(field);
                }
                if (type == typeof (double))
                {
                    return Convert.ToDouble(field);
                }
                if (type == typeof (bool))
                {
                    if (field == null)
                    {
                        return false;
                    }
                    field = field.Trim();
                    return field.Equals("true", StringComparison.CurrentCultureIgnoreCase) || field.Equals("1");
                }
            }
            catch (FormatException ex)
            {
                Debug.LogWarning(string.Format("{0}: {1} -> {2}", ex.Message, field, type));

                //In case default value is null but the property/field is not a reference type
                if (defaultValue == null)
                {
                    if (type == typeof (int) || type == typeof (float) || type == typeof (double))
                    {
                        //Box -1 as the target type; a boxed int cannot be unboxed as float or double
                        defaultValue = Convert.ChangeType(-1, type);
                    }
                    else if (type == typeof (bool))
                    {
                        defaultValue = false;
                    }
                }
            }

            return defaultValue;
        }

        /// <summary>
        /// Load CSV into record list. If you need to decode records, use the generic Load followed by Decode instead.
        /// </summary>
        public bool Load(string path, char separator = ',')
        {
            //Dispose records
            ClearRecord();

            if (string.IsNullOrEmpty(path))
            {
                Debug.LogError(string.Format("CSV path not found: {0}", path));
                return false;
            }

            //Read text
            TextAsset asset = Resources.Load<TextAsset>(path);

            if (asset == null)
            {
                Debug.LogError(string.Format("CSV file not found: {0}", path));
                return false;
            }

            string content = asset.text;
            if (string.IsNullOrEmpty(content))
            {
                Debug.LogError(string.Format("CSV file content empty: {0}", path));
                return false;
            }

            Separator = separator;
            _records = new List<List<string>>();
            string[] lines = content.Split('\n');
            for (int i = 0; i < lines.Length; i++)
            {
                //Skip blank lines(Trim also removes a trailing '\r')
                if (string.IsNullOrEmpty(lines[i].Trim()))
                    continue;
                List<string> columns = lines[i].Split(separator).Select(s => s.Trim()).ToList();
                //Check each row's column count. They must match
                if (ColumnCount != 0 && columns.Count != ColumnCount)
                {
                    Debug.LogError(
                        string.Format("CSV parsing error in {0} at line {1} : column counts do not match! Separator: '{2}'", path,
                            i + 1, separator));
                    return false;
                }
                ColumnCount = columns.Count;
                _records.Add(columns);
            }
            RowCount = _records.Count;

            if (_records == null || !_records.Any())
            {
                Debug.LogWarning(string.Format("CSV file parsing failed(empty records): {0}", path));
                return false;
            }

            return true;
        }

        public bool Load<T>()
        {
            ClearRecord();

            //Check mapping
            if (!Attribute.IsDefined(typeof (T), typeof (CSVMapperAttribute), false))
            {
                Debug.LogError(string.Format("CSV mapping not found in type: {0}", typeof (T)));
                return false;
            }

            CSVMapperAttribute mapper =
                Attribute.GetCustomAttribute(typeof (T), typeof (CSVMapperAttribute), false) as CSVMapperAttribute;
            _keyRow = mapper.KeyRow;
            _descRow = mapper.DescRow;
            _startRow = mapper.StartRow;

            bool result = Load(mapper.Path, mapper.Separator);
            if (result)
            {
                if (_keyRow < 0 || _keyRow >= _records.Count || _records[_keyRow].Any(string.IsNullOrEmpty))
                {
                    Debug.LogError(
                        string.Format("Encoding Error! No key column found. Make sure target file is in UTF-8 format. Path: {0}",
                            mapper.Path));
                    return false;
                }
            }
            return result;
        }

        /// <summary>
        /// Get string value at specified row and column. If record empty or position not found, NULL will be returned. Row/Column starts at 0
        /// </summary>
        public string this[int row, int column]
        {
            get
            {
                if (_records == null || _records.Count <= row || _records[row].Count <= column)
                {
                    return null;
                }
                return _records[row][column];
            }
        }

        /// <summary>
        /// Get a converted value at specified row and column. If record empty or position not found or conversion failed, defaultValue will be returned. Row/Column starts at 0
        /// </summary>
        /// <typeparam name="T">If T is collection, use array only(e.g. int[])</typeparam>
        /// <param name="defaultValue">If T is collection, use element default(e.g. 0 for int[])</param>
        public T Read<T>(int row, int column, object defaultValue, char arraySeparator = '#')
        {
            string field = this[row, column];
            if (field == null)
            {
                Debug.LogWarning("Field is null. Make sure csv is loaded and field has content.");
                //A null default cannot be unboxed to a value type, so fall back to default(T)
                return typeof (T).IsArray || defaultValue == null ? default(T) : (T) defaultValue;
            }

            return (T) ParseRawValue(field, typeof (T), defaultValue, arraySeparator);
        }


        /// <summary>
        /// Remove all records.
        /// </summary>
        public void ClearRecord()
        {
            _records = null;
            //Reset counts so that a file with a different column count can be loaded next
            ColumnCount = 0;
            RowCount = 0;
        }
    }

    Looks complicated? Let's walk through it with an example:

    Add a class that describes the table structure:

    [CSVMapper("Configs/Resource")]
    public class ResourceData : Data
    {
        [CSVColumn(0)] public int ID;
        [CSVColumn(1)] public string Path;
        [CSVColumn(2)] public float Ratio;
        [CSVColumn(3)] public string Desc;
    }

    Add a method that reads a table based on the structure class:

    /// <summary>
    /// Get table
    /// </summary>
    private IEnumerable<T> GetTable<T>() where T : Data, new()
    {
        CSVEngine reader = new CSVEngine();
        if (reader.Load<T>())
        {
            Debug.Log(string.Format("{0} Loaded", typeof (T)));
            return reader.Decode<T>();
        }

        return null;
    }

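    Note that having ResourceData inherit from Data, together with the generic constraint in GetTable, only serves to keep usage consistent; it has no other purpose.

    Data is defined as follows: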

    /// <summary>
    /// All table class must inherit this for constraint
    /// </summary>
    public abstract class Data
    {
    }

    Resource.csv contains the following:

    资源ID,资源路径,缩放比例,说明
    int,string,float,string
    10001,Model/a,1,
    10002,Model/b,1,
    10003,Model/c,1,
    10004,Model/d,1,
    10005,Model/e,1,
    10006,Model/f,1,
    10007,Model/g,1,

    You can also map by key instead of by column index:

    [CSVMapper("Configs/Resource")]
    public class ResourceData : Data
    {
        [CSVColumn("资源ID")] public int ID;
        [CSVColumn("资源路径")] public string Path;
        [CSVColumn("缩放比例")] public float Ratio;
        [CSVColumn("说明")] public string Desc;
    }

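    The second row (int,string,float,string) carries no real data, so it is treated as the Desc row (description row) and skipped; it sits at row index 1, which matches the default DescRow.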
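
    How the key row, description row and start row combine can be shown in isolation. The helper below is a hypothetical sketch (RowSelector is not part of the engine) that follows the same rule as the Decode loop above: walk from the start row to the end and skip the key row and the description row.

```csharp
using System.Collections.Generic;

public static class RowSelector
{
    // Returns the indices of the rows that would be decoded as data rows.
    // Pass descRow = -1 when the file has no description row.
    public static List<int> DataRowIndices(int rowCount, int keyRow, int descRow, int startRow)
    {
        var indices = new List<int>();
        for (int i = startRow; i < rowCount; i++)
        {
            if (i == keyRow || i == descRow) continue;   // header rows are never data
            indices.Add(i);
        }
        return indices;
    }
}
```

    For the Resource.csv shown earlier (9 rows, key row 0, description row 1, start row 0), this yields indices 2 through 8, i.e. the seven data rows.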

    Load the table lazily and store it as a dictionary, and you can then look rows up by key:

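    private Dictionary<int, ResourceData> _resourceDict;

    public Dictionary<int, ResourceData> ResourceDict
    {
        get
        {
            return _resourceDict ?? (_resourceDict = GetTable<ResourceData>().ToDictionary(k => k.ID));
        }
    }

    //Look up a row by its ID(10001 is the first ID in Resource.csv)
    var data = ResourceDict[10001];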

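    The above is what you get from automatic loading once the table structure has been mapped.

    I also provide an interface for manual parsing:

    Manual Load: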

    public bool Load(string path, char separator = ',');

    Manual Read:

    public T Read<T>(int row, int column, object defaultValue, char arraySeparator = '#');

    Or get the string value through the indexer and parse it yourself:

    CSVEngine reader = new CSVEngine();

    reader.Load("Path");
    int val = reader.Read<int>(0, 0, 0);
    int[] vals = reader.Read<int[]>(0, 0, null);
    string raw = reader[0, 0];

    Note that rows and columns are both counted from 0.
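
    If you take the raw string from the indexer and parse it yourself, a TryParse-based helper avoids exceptions on bad cells. The sketch below is only a suggestion: RawParse is a hypothetical helper, and parsing with the invariant culture is my assumption so that "1.5" reads the same on every machine.

```csharp
using System.Globalization;

public static class RawParse
{
    // Parses an int; returns fallback for null, empty or malformed input.
    public static int ToInt(string raw, int fallback)
    {
        int value;
        return int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out value)
            ? value
            : fallback;
    }

    // Parses a float; returns fallback for null, empty or malformed input.
    public static float ToFloat(string raw, float fallback)
    {
        float value;
        return float.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out value)
            ? value
            : fallback;
    }
}
```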

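    About paths: since this is a Unity3D project, the mapped path is relative to the Resources folder and has no file extension, and the Load method reads the asset with Resources.Load. For projects on other platforms, adjust this part accordingly.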
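
    As one possible adjustment for a non-Unity project, the text could be read straight from the file system. The sketch below is an assumption about how that replacement might look, not code from the project; FileCsvLoader is a hypothetical name, and here the path is an ordinary file path including its extension.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileCsvLoader
{
    // Reads a CSV file from disk into rows of trimmed cells, skipping blank lines.
    public static List<List<string>> Load(string filePath, char separator = ',')
    {
        return File.ReadAllLines(filePath)
            .Where(line => line.Trim().Length > 0)
            .Select(line => line.Split(separator).Select(cell => cell.Trim()).ToList())
            .ToList();
    }
}
```

    Like the engine itself, this splits on the separator only, so it does not understand quoted cells that contain the separator.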

    Collection fields must use a separator other than the comma (the default is '#'), and they must be declared as array types:

    [CSVMapper("Configs/Skill")]
    public class SkillData : Data
    {
        [CSVColumn(0)] public int ID;
        [CSVColumn(1)] public int Name;
        [CSVColumn(2)] public int[] SkillIDs;
    }
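
    To show what happens to a cell such as "1#2#3", here is a small standalone sketch of array parsing. It is an alternative formulation, not the engine's code: ArrayField is a hypothetical helper that builds the typed array with Array.CreateInstance instead of one branch per element type, and it throws on a malformed element rather than substituting a default.

```csharp
using System;
using System.Globalization;

public static class ArrayField
{
    // Splits a cell on the separator and converts every part to elementType.
    public static Array Parse(string cell, Type elementType, char separator = '#')
    {
        string[] parts = cell.Split(separator);
        Array result = Array.CreateInstance(elementType, parts.Length);
        for (int i = 0; i < parts.Length; i++)
        {
            // Convert.ChangeType throws FormatException if a part is not valid for elementType.
            object value = Convert.ChangeType(parts[i].Trim(), elementType, CultureInfo.InvariantCulture);
            result.SetValue(value, i);
        }
        return result;
    }
}
```

    A call such as (int[]) ArrayField.Parse("1#2#3", typeof(int)) gives an int array with three elements, which is the shape a field like SkillIDs expects.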

    Questions and discussion are welcome.

    The source code is on my GitHub:

    https://github.com/theoxuan/MTGeek/blob/master/Assets/Scripts/CSVReaderX.cs

  • Original post: https://www.cnblogs.com/seancheung/p/4184582.html