• Classify structured data using Keras Preprocessing Layers


    When the input contains both numeric and categorical features, use Keras preprocessing layers to transform them.

    https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/structured_data/preprocessing_layers.ipynb?hl=ar-bh#scrollTo=sMYQvJuBi7MS

    This tutorial demonstrates how to classify structured data (e.g. tabular data in a CSV). You will use Keras to define the model, and preprocessing layers as a bridge to map from columns in a CSV to features used to train the model. This tutorial contains complete code to:

    • Load a CSV file using Pandas.
    • Build an input pipeline to batch and shuffle the rows using tf.data.
    • Map from columns in the CSV to features used to train the model using Keras Preprocessing layers.
    • Build, train, and evaluate a model using Keras.
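
    The input-pipeline step in the list above can be pictured with a minimal pure-Python sketch of what shuffling and batching rows amounts to. It is only a conceptual stand-in for tf.data, not its implementation; the helper name `shuffled_batches` and the sample rows are hypothetical.

```python
import random

def shuffled_batches(rows, batch_size, seed=None):
    """Yield lists of at most `batch_size` rows in a random order."""
    rows = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)    # full shuffle over all rows
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# Hypothetical rows: (features, label) pairs, like CSV rows turned into dicts.
rows = [({"PhotoAmt": float(i), "Type": "Dog" if i % 2 else "Cat"}, i % 2)
        for i in range(7)]

batches = list(shuffled_batches(rows, batch_size=5, seed=42))
for batch in batches:
    print(len(batch), [features["Type"] for features, _ in batch])
```

    A batch size of 5 is used here only because the sample outputs later in this post have five rows.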
     

    Note: This tutorial is similar to Classify structured data with feature columns. This version uses new experimental Keras Preprocessing Layers instead of tf.feature_column. Keras Preprocessing Layers are more intuitive, and can be easily included inside your model to simplify deployment.

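    What the preprocessing does

    Numeric features: define an input, then apply a normalization transform.

    Categorical features: map each value to an integer index, then category-encode the indices (one-hot encoding).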

    CategoryEncoding

    https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/preprocessing/CategoryEncoding

    This layer provides options for condensing data into a categorical encoding. It accepts integer values as inputs and outputs a dense representation (one sample = 1-index tensor of float values representing data about the sample's tokens) of those inputs.

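    Code for category encoding

    index: a lookup table that maps each string (or integer) value to its integer index.

    encoder: produces the one-hot code from the integer returned by index.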

    import tensorflow as tf
    from tensorflow.keras.layers.experimental import preprocessing

    def get_category_encoding_layer(name, dataset, dtype, max_tokens=None):
      # Create a StringLookup layer which will turn strings into integer indices
      if dtype == 'string':
        index = preprocessing.StringLookup(max_tokens=max_tokens)
      else:
        index = preprocessing.IntegerLookup(max_values=max_tokens)
    
      # Prepare a Dataset that only yields our feature
      feature_ds = dataset.map(lambda x, y: x[name])
    
      # Learn the set of possible values and assign them a fixed integer index.
      index.adapt(feature_ds)
    
      # Create a CategoryEncoding layer sized to the learned vocabulary.
      encoder = preprocessing.CategoryEncoding(max_tokens=index.vocab_size())
    
      # Apply one-hot encoding to our indices. The lambda function captures both
      # layers so we can call them now or include them in the functional model later.
      return lambda feature: encoder(index(feature))

    Test output

    type_col = train_features['Type']
    layer = get_category_encoding_layer('Type', train_ds, 'string')
    layer(type_col)
    <tf.Tensor: shape=(5, 4), dtype=float32, numpy=
    array([[0., 0., 1., 0.],
           [0., 0., 1., 0.],
           [0., 0., 1., 0.],
           [0., 0., 1., 0.],
           [0., 0., 0., 1.]], dtype=float32)>
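
    The lookup-then-encode idea can be sketched in plain Python. This is an illustration only, not the Keras implementation. It assumes, as the width-4 output above suggests, that two slots are reserved ahead of the learned tokens (one for a mask value and one for out-of-vocabulary strings) and that learned tokens are ordered by descending frequency. The training values and helper names are hypothetical.

```python
from collections import Counter

def build_vocabulary(values, reserved=("", "[UNK]")):
    """Reserved slots first, then tokens by descending frequency (ties by name)."""
    counts = Counter(values)
    tokens = sorted(counts, key=lambda token: (-counts[token], token))
    return list(reserved) + tokens

def lookup(vocab, value, oov_index=1):
    """Map a value to its index, falling back to the out-of-vocabulary slot."""
    return vocab.index(value) if value in vocab else oov_index

def one_hot(index, depth):
    """Return a list of `depth` floats with a single 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]

# Hypothetical training column; "adapt" corresponds to building the vocabulary.
train_values = ["Dog", "Dog", "Dog", "Cat", "Dog", "Cat"]
vocab = build_vocabulary(train_values)

batch = ["Dog", "Dog", "Dog", "Dog", "Cat"]
encoded = [one_hot(lookup(vocab, value), len(vocab)) for value in batch]
for row in encoded:
    print(row)
```

    With this hypothetical data, "Dog" lands in column 2 and "Cat" in column 3, the same pattern as the tensor printed above.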

    Handling numeric features

    Create a normalization layer.

    def get_normalization_layer(name, dataset):
      # Create a Normalization layer for our feature.
      normalizer = preprocessing.Normalization()
    
      # Prepare a Dataset that only yields our feature.
      feature_ds = dataset.map(lambda x, y: x[name])
    
      # Learn the statistics of the data.
      normalizer.adapt(feature_ds)
    
      return normalizer

    Example: normalize the corresponding feature.

    photo_count_col = train_features['PhotoAmt']
    layer = get_normalization_layer('PhotoAmt', train_ds)
    layer(photo_count_col)
    <tf.Tensor: shape=(5, 1), dtype=float32, numpy=
    array([[ 1.3705449 ],
           [ 0.74395925],
           [-0.19591942],
           [-0.8225052 ],
           [-0.8225052 ]], dtype=float32)>
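
    As a rough sketch of what such a normalization step computes, the plain-Python code below learns a mean and variance from the data and then rescales each value to (x - mean) / sqrt(variance). The helper names, the epsilon guard, and the sample values are assumptions made for illustration; they are not taken from the tutorial.

```python
import math

def fit_normalizer(values):
    """Learn the mean and (population) variance, like an adapt() step."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

def normalize(values, mean, variance, epsilon=1e-7):
    """Shift to zero mean and scale to unit variance."""
    std = max(math.sqrt(variance), epsilon)  # guard against division by zero
    return [(v - mean) / std for v in values]

# Hypothetical photo counts for five pets.
photo_counts = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, variance = fit_normalizer(photo_counts)
normalized = normalize(photo_counts, mean, variance)
print(mean, variance)
print(normalized)
```

    After the transform the values have mean 0 and variance 1, which is why the tensor above mixes positive and negative numbers.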

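    Defining the preprocessing pipeline

    all_inputs stores every input layer.

    encoded_features stores the result of passing each input through its preprocessing step, for example the normalized tensor or the one-hot encoded tensor.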

    all_inputs = []
    encoded_features = []
    
    # Numeric features.
    for header in ['PhotoAmt', 'Fee']:
      numeric_col = tf.keras.Input(shape=(1,), name=header)
      normalization_layer = get_normalization_layer(header, train_ds)
      encoded_numeric_col = normalization_layer(numeric_col)
      all_inputs.append(numeric_col)
      encoded_features.append(encoded_numeric_col)

    # Categorical features encoded as string.
    categorical_cols = ['Type', 'Color1', 'Color2', 'Gender', 'MaturitySize',
                        'FurLength', 'Vaccinated', 'Sterilized', 'Health', 'Breed1']
    for header in categorical_cols:
      categorical_col = tf.keras.Input(shape=(1,), name=header, dtype='string')
      encoding_layer = get_category_encoding_layer(header, train_ds, dtype='string',
                                                   max_tokens=5)
      encoded_categorical_col = encoding_layer(categorical_col)
      all_inputs.append(categorical_col)
      encoded_features.append(encoded_categorical_col)

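    The concatenate function joins all the encoded features into a single feature vector.

    That feature vector is fed to the first Dense layer, whose output passes through Dropout and then a second Dense layer.

    The second Dense layer is the model output.

    The first argument to tf.keras.Model specifies all of the model's inputs;

    the second argument specifies the model's output, that is, the result of the whole pipeline of preprocessing layers and the model's own layers.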

    all_features = tf.keras.layers.concatenate(encoded_features)

    x = tf.keras.layers.Dense(32, activation="relu")(all_features)
    x = tf.keras.layers.Dropout(0.5)(x)
    output = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(all_inputs, output)
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=["accuracy"])
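
    The final Dense(1) layer has no activation, so it emits a raw score (a logit), and the loss is told so with from_logits=True. The plain-Python sketch below illustrates the idea under that reading: binary cross-entropy computed directly from a logit, using a common numerically stable form, agrees with applying a sigmoid first and then taking the usual cross-entropy. The function names are hypothetical, and this is not the Keras implementation.

```python
import math

def sigmoid(z):
    """Squash a logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_logits(label, logit):
    """Stable form: max(z, 0) - z * y + log(1 + exp(-|z|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def bce_from_probability(label, p):
    """Textbook form: -(y * log(p) + (1 - y) * log(1 - p))."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

logit = 2.0  # hypothetical raw model output for one example
loss_a = bce_from_logits(1.0, logit)
loss_b = bce_from_probability(1.0, sigmoid(logit))
print(loss_a, loss_b)
```

    To turn such a model's output into a probability at prediction time, a sigmoid would be applied to the logit.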

    tf.keras.layers.concatenate

    https://www.tensorflow.org/api_docs/python/tf/keras/layers/concatenate
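
    A small plain-Python sketch of what concatenating encoded features along the last axis means: each sample keeps its own row, and the per-feature columns are laid side by side. The helper name and all values are hypothetical, and nested lists stand in for tensors.

```python
def concatenate_features(encoded_features):
    """Join per-feature rows along the last axis, one output row per sample."""
    batch_size = len(encoded_features[0])
    return [
        [value for feature in encoded_features for value in feature[i]]
        for i in range(batch_size)
    ]

# Two samples: two normalized numeric features and one one-hot feature.
photo_amt = [[1.37], [-0.82]]                           # shape (2, 1)
fee = [[0.5], [-0.5]]                                   # shape (2, 1)
pet_type = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # shape (2, 4)

all_features = concatenate_features([photo_amt, fee, pet_type])
for row in all_features:
    print(row)
```

    The result has shape (2, 6): the batch dimension is unchanged and the feature widths add up (1 + 1 + 4).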

    tensor

    https://www.tensorflow.org/guide/tensor

    rank_4_tensor = tf.zeros([3, 2, 4, 5])
    print("Type of every element:", rank_4_tensor.dtype)
    print("Number of axes:", rank_4_tensor.ndim)
    print("Shape of tensor:", rank_4_tensor.shape)
    print("Elements along axis 0 of tensor:", rank_4_tensor.shape[0])
    print("Elements along the last axis of tensor:", rank_4_tensor.shape[-1])
    print("Total number of elements (3*2*4*5): ", tf.size(rank_4_tensor).numpy())
    Type of every element: <dtype: 'float32'>
    Number of axes: 4
    Shape of tensor: (3, 2, 4, 5)
    Elements along axis 0 of tensor: 3
    Elements along the last axis of tensor: 5
    Total number of elements (3*2*4*5):  120
    
    [Figure: a rank-4 tensor of shape [3, 2, 4, 5]. A tensor's shape is like a vector, shown here for a 4-axis tensor.]
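
    The relationship between shape, number of axes, and element count can be checked without TensorFlow. In the sketch below, nested Python lists stand in for a tensor, and the helper `shape_of` is hypothetical; it assumes the nesting is regular (every sub-list at a given depth has the same length).

```python
import math

def shape_of(nested):
    """Walk down the first element of each level to read off the shape."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

# A rank-4 "tensor" of zeros with shape (3, 2, 4, 5), built from nested lists.
rank_4 = [[[[0.0] * 5 for _ in range(4)] for _ in range(2)] for _ in range(3)]

shape = shape_of(rank_4)
rank = len(shape)            # number of axes
size = math.prod(shape)      # total number of elements
print(shape, rank, size)
```

    The element count is simply the product of the entries of the shape, which is where the 3*2*4*5 = 120 in the output above comes from.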
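    Source: http://www.cnblogs.com/lightsong/ The copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be shown prominently on the page.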
  • Original article: https://www.cnblogs.com/lightsong/p/14750273.html