• CNN autoencoder: reduce dimensionality first, then cluster images with k-means. Could the reduced features also be fed to iforest (Isolation Forest)?


    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Activation, Flatten
    from keras.layers import Conv2D, MaxPooling2D, UpSampling2D
    import matplotlib.pyplot as plt
    from keras import backend as K
    import numpy as np
    
    # (x_train, y_train), (x_test, y_test) = mnist.load_data()
    
    f = np.load("mnist.npz")
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()
    
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1) #transform 2D 28x28 matrix to 3D (28x28x1) matrix
    x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
    
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    
    x_train /= 255 #inputs have to be between [0, 1]
    x_test /= 255
    
    
    model = Sequential()
    
    #1st convolution layer
    model.add(Conv2D(16, (3, 3) #16 is number of filters and (3, 3) is the size of the filter.
        , padding='same', input_shape=(28,28,1)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), padding='same'))
    
    #2nd convolution layer
    model.add(Conv2D(2,(3, 3), padding='same')) # apply 2 filters sized of (3x3)
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), padding='same'))
    
    #-------------------------
    
    #3rd convolution layer
    model.add(Conv2D(2,(3, 3), padding='same')) # apply 2 filters sized of (3x3)
    model.add(Activation('relu'))
    model.add(UpSampling2D((2, 2)))
    
    #4th convolution layer
    model.add(Conv2D(16,(3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(UpSampling2D((2, 2)))
    
    #-------------------------
    
    model.add(Conv2D(1,(3, 3), padding='same'))
    model.add(Activation('sigmoid'))
    
    model.summary()  # summary() prints by itself; wrapping it in print() also prints "None"
    
    model.compile(optimizer='adadelta', loss='binary_crossentropy')
    
    model.fit(x_train, x_train
        , epochs=3
        , validation_data=(x_test, x_test)
    )
    
    restored_imgs = model.predict(x_test)
    for i in range(5):
        plt.imshow(x_test[i].reshape(28, 28))
        plt.gray()
        plt.show()
    
        plt.imshow(restored_imgs[i].reshape(28, 28))
        plt.gray()
        plt.show()
    
        print("----------------------------")
    
    layers = len(model.layers)
    
    for i in range(layers):
        print(i, ". ", model.layers[i].output.get_shape())
    
    """
    0 .  (?, 28, 28, 16)
    1 .  (?, 28, 28, 16)
    2 .  (?, 14, 14, 16)
    3 .  (?, 14, 14, 2)
    4 .  (?, 14, 14, 2)
    5 .  (?, 7, 7, 2)
    6 .  (?, 7, 7, 2)
    7 .  (?, 7, 7, 2)
    8 .  (?, 14, 14, 2)
    9 .  (?, 14, 14, 16)
    10 .  (?, 14, 14, 16)
    11 .  (?, 28, 28, 16)
    12 .  (?, 28, 28, 1)
    13 .  (?, 28, 28, 1)
    """
    
    # layers[7] is activation_3 (Activation); its (None, 7, 7, 2) output is used as the compressed representation
    get_3rd_layer_output = K.function([model.layers[0].input], [model.layers[7].output])
    compressed = get_3rd_layer_output([x_test])[0]
    # each sample is two 7x7 matrices; flatten them to one vector of 7*7*2 = 98 values
    compressed = compressed.reshape(compressed.shape[0], 7*7*2)
    
    # clustering (tf.contrib exists only in TensorFlow 1.x; it was removed in TensorFlow 2.x)
    from tensorflow.contrib.factorization.python.ops import clustering_ops
    import tensorflow as tf
    unsupervised_model = tf.contrib.learn.KMeansClustering(
        10 #num of clusters
        , distance_metric = clustering_ops.SQUARED_EUCLIDEAN_DISTANCE
        , initial_clusters=tf.contrib.learn.KMeansClustering.RANDOM_INIT
    )
    
    def train_input_fn():
        data = tf.constant(compressed, tf.float32)
        return (data, None)
    
    print(compressed[:3])
    unsupervised_model.fit(input_fn=train_input_fn, steps=1000)
    clusters = unsupervised_model.predict(input_fn=train_input_fn)
    
    # show the images among the first 200 test samples that were assigned to cluster 5
    for index, item in enumerate(clusters):
        current_cluster = item['cluster_idx']

        if index < 200 and current_cluster == 5:
            plt.imshow(x_test[index].reshape(28, 28))
            plt.gray()
            plt.show()
    

    The code above is what I excerpted.

    Original article: https://sefiks.com/2018/03/21/autoencoder-neural-networks-for-unsupervised-learning/

    Previously, we applied a conventional autoencoder to the handwritten digit database (MNIST). That approach worked well. The same model can be applied to non-image problems such as fraud or anomaly detection. For pixel-based problems, you might remember that convolutional neural networks are more successful than conventional ones. However, we only tested them on labeled, supervised learning problems. The question is: can convolutional neural networks be adapted to unlabeled images for clustering? Absolutely yes! This customized form of CNN is the convolutional autoencoder.

    Remember the autoencoder post. The network design is symmetric about the centroid: the number of nodes decreases from the left to the centroid and increases from the centroid to the right. The centroid layer is the compressed representation. We will apply the same procedure to a CNN, additionally using convolution, activation and pooling layers for the convolutional autoencoder.


    Figure: Convolutional autoencoder

    We can call the left-to-centroid side convolution and the centroid-to-right side deconvolution. The deconvolution side is also known as upsampling or transposed convolution. We've mentioned how the pooling operation works: it is a basic reduction operation. How can we apply its reverse? That might be a little confusing. I've found an excellent animation for upsampling. An input matrix of size 2×2 (the blue one) is deconvolved to a matrix of size 4×4 (the cyan one). To do this, we add imaginary elements (e.g. 0 values) around the base matrix, which turns it into a 6×6 matrix.

    Figure: Upsampling
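The zero-padding idea described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the original post: the 3×3 kernel of ones, the stride of 1 and the helper name `transposed_conv2d_via_padding` are all hypothetical choices. Note that the Keras `UpSampling2D` layer used in this model does something simpler (it repeats rows and columns), which the last lines show with `np.repeat`.

```python
import numpy as np

def transposed_conv2d_via_padding(x, kernel):
    """Pad x with zeros, then slide the kernel over it (stride 1)."""
    kh, kw = kernel.shape
    # (k - 1) zeros on every side: a 2x2 input with a 3x3 kernel becomes 6x6
    padded = np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    out_h = padded.shape[0] - kh + 1
    out_w = padded.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

x = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2x2 input (the "blue" matrix)
kernel = np.ones((3, 3))                  # assumed kernel, chosen for readability
upsampled = transposed_conv2d_via_padding(x, kernel)
print(upsampled.shape)                    # (4, 4)

# What Keras UpSampling2D((2, 2)) does instead: repeat each row and column twice
repeated = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
print(repeated.shape)                     # (4, 4)
```

Both routes turn a 2×2 matrix into a 4×4 one; the difference is that the padded version mixes neighbouring values through the kernel, while plain repetition has no weights at all, so in this model the learning happens in the `Conv2D` layers that follow each `UpSampling2D`.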

    We will work on the handwritten digit database again. We'll design the structure of the convolutional autoencoder as illustrated above.

    model = Sequential()
     
    #1st convolution layer
    model.add(Conv2D(16, (3, 3) #16 is number of filters and (3, 3) is the size of the filter.
        , padding='same', input_shape=(28,28,1)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), padding='same'))
     
    #2nd convolution layer
    model.add(Conv2D(2,(3, 3), padding='same')) # apply 2 filters sized of (3x3)
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2), padding='same'))
     
    #here compressed version
     
    #3rd convolution layer
    model.add(Conv2D(2,(3, 3), padding='same')) # apply 2 filters sized of (3x3)
    model.add(Activation('relu'))
    model.add(UpSampling2D((2, 2)))
     
    #4th convolution layer
    model.add(Conv2D(16,(3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(UpSampling2D((2, 2)))
     
    model.add(Conv2D(1,(3, 3), padding='same'))
    model.add(Activation('sigmoid'))

    You can summarize the constructed network structure.

    model.summary()

    This command prints the following output. The input is 28×28 at the beginning. The first two convolution blocks are responsible for reduction, and the following two are in charge of restoration. The final layer restores the original input size, as seen.

    _________________________________________________________________
    Layer (type)                     Output Shape            Param #
    =================================================================
    conv2d_1 (Conv2D)                (None, 28, 28, 16)      160
    activation_1 (Activation)        (None, 28, 28, 16)      0
    max_pooling2d_1 (MaxPooling2D)   (None, 14, 14, 16)      0
    conv2d_2 (Conv2D)                (None, 14, 14, 2)       290
    activation_2 (Activation)        (None, 14, 14, 2)       0
    max_pooling2d_2 (MaxPooling2D)   (None, 7, 7, 2)         0
    conv2d_3 (Conv2D)                (None, 7, 7, 2)         38
    activation_3 (Activation)        (None, 7, 7, 2)         0
    up_sampling2d_1 (UpSampling2D)   (None, 14, 14, 2)       0
    conv2d_4 (Conv2D)                (None, 14, 14, 16)      304
    activation_4 (Activation)        (None, 14, 14, 16)      0
    up_sampling2d_2 (UpSampling2D)   (None, 28, 28, 16)      0
    conv2d_5 (Conv2D)                (None, 28, 28, 1)       145
    activation_5 (Activation)        (None, 28, 28, 1)       0
    =================================================================
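The Param # column in the summary can be checked by hand: a `Conv2D` layer holds one weight per kernel cell per input channel, plus one bias, for each filter. A small sketch of that arithmetic follows; the helper name `conv2d_params` is made up for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (input channels, filters) for the five Conv2D layers, all with 3x3 kernels
layers = [(1, 16), (16, 2), (2, 2), (2, 16), (16, 1)]
param_counts = [conv2d_params(3, 3, c_in, f) for c_in, f in layers]
print(param_counts)  # [160, 290, 38, 304, 145]
```

Activation, pooling and upsampling layers have no trainable weights, which is why they show 0 in the summary.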

    Here, we can start training.

    model.compile(optimizer='adadelta', loss='binary_crossentropy')
    model.fit(x_train, x_train, epochs=3, validation_data=(x_test, x_test))

    Loss values for both training set and test set are satisfactory.

    loss: 0.0968 – val_loss: 0.0926

    Let’s visualize some restorations.

    restored_imgs = model.predict(x_test)

    for i in range(5):
        # original image
        plt.imshow(x_test[i].reshape(28, 28))
        plt.gray()
        plt.show()

        # image restored from the compressed representation
        plt.imshow(restored_imgs[i].reshape(28, 28))
        plt.gray()
        plt.show()

    Testing

    The restorations look really satisfactory. Images on the left side are the originals, whereas images on the right side are restored from the compressed representation.

    Figure: Some restorations of the convolutional autoencoder

    Notice that the layer at index 5, named max_pooling2d_2, holds the compressed representation, and its size is (None, 7, 7, 2). This shows that we can restore a 28×28 pixel image from a 7×7×2 matrix with a little loss. In other words, the compressed representation takes 8 times less space than the original image (98 values instead of 784).
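The 7×7×2 size and the factor of 8 follow from the two pooling layers. A short sketch of that arithmetic, assuming `MaxPooling2D` with `padding='same'` and a stride equal to the pool size (the Keras default); the helper name `same_pool_size` is hypothetical.

```python
import math

def same_pool_size(size, pool=2):
    """Spatial size after MaxPooling2D with padding='same' and stride equal to the pool size."""
    return math.ceil(size / pool)

side = 28
for _ in range(2):                      # the two pooling layers in the encoder
    side = same_pool_size(side)

original_values = 28 * 28 * 1           # 784 pixels per image
compressed_values = side * side * 2     # 2 channels after conv2d_2
ratio = original_values / compressed_values
print(side, compressed_values, ratio)   # 7 98 8.0
```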

    Compressed Representations

    You might wonder how to extract compressed representations.

    compressed_layer = 5
    get_3rd_layer_output = K.function([model.layers[0].input], [model.layers[compressed_layer].output])
    compressed = get_3rd_layer_output([x_test])[0]
     
    #flatten compressed representation to 1 dimensional array
    compressed = compressed.reshape(10000,7*7*2)

    Now we can apply clustering to the compressed representation. I would like to apply k-means clustering.

    from tensorflow.contrib.factorization.python.ops import clustering_ops
    import tensorflow as tf
     
    def train_input_fn():
        data = tf.constant(compressed, tf.float32)
        return (data, None)

    unsupervised_model = tf.contrib.learn.KMeansClustering(
        10 #num of clusters
        , distance_metric = clustering_ops.SQUARED_EUCLIDEAN_DISTANCE
        , initial_clusters=tf.contrib.learn.KMeansClustering.RANDOM_INIT
    )
     
    unsupervised_model.fit(input_fn=train_input_fn, steps=1000)
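The `tf.contrib` module was removed in TensorFlow 2.x, so the block above only runs on TensorFlow 1.x. As a hedged alternative that is not part of the original post, here is a minimal NumPy sketch of Lloyd's k-means with the same settings (10 clusters, random initial centers, squared Euclidean distance). The function name `kmeans` and the random `fake_compressed` array, which stands in for the real `compressed` matrix, are assumptions for illustration.

```python
import numpy as np

def kmeans(data, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers and squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        # squared distance from every point to every center, shape (n, k)
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = data[labels == c]
            if len(members) > 0:        # keep the old center if a cluster goes empty
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                       # centers stopped moving
        centers = new_centers
    return labels, centers

# Stand-in for `compressed`: random vectors with 7*7*2 = 98 features
rng = np.random.default_rng(42)
fake_compressed = rng.random((1000, 98)).astype("float32")
labels, centers = kmeans(fake_compressed, k=10)
print(labels.shape, centers.shape)      # (1000,) (10, 98)
```

With the real data, `kmeans(compressed, k=10)` would return one cluster index per test image, playing the role of `cluster_idx` in the original code.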

    Training is over. Now we can check the cluster assigned to each item in the test set.

    clusters = unsupervised_model.predict(input_fn=train_input_fn)
     
    index = 0
    for i in clusters:
        current_cluster = i['cluster_idx']
        features = x_test[index]
        index = index + 1

    For example, the 6th cluster consists of 46 items. Its distribution is as follows: 22 items are 4s, 14 items are 9s, 7 items are 7s, and 1 item is a 5. Mostly 4s and 9s were put in this cluster.
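A per-cluster breakdown like the one above can be tallied with `collections.Counter`. This is a sketch, not the author's code: the function name `cluster_distribution` is hypothetical, and the two short lists are placeholder values for illustration only, not results from the post.

```python
from collections import Counter, defaultdict

def cluster_distribution(cluster_ids, true_labels):
    """Map each cluster index to a Counter of the true labels assigned to it."""
    dist = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        dist[int(cluster)][int(label)] += 1
    return dist

# Placeholder values for illustration only, not results from the post
cluster_ids = [5, 5, 2, 5, 2, 5]
true_labels = [4, 9, 1, 4, 1, 7]

dist = cluster_distribution(cluster_ids, true_labels)
for cluster in sorted(dist):
    print(cluster, dist[cluster].most_common())
```

With the code from the post, the inputs would be `[c['cluster_idx'] for c in clusters]` and `y_test`.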

    So, we've combined the ideas of convolutional neural networks and autoencoders to reduce the information in image-based data. This serves as a pre-processing step for clustering: we can apply k-means with 98 features instead of 784. That could speed up the labeling process for unlabeled data. Of course, with autoencoding comes great speed. The source code of this post has been pushed to GitHub.
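On the question in the title, whether the reduced features can also be handed to iforest (Isolation Forest): in principle yes, since the encoder output is just a feature matrix. Below is a minimal sketch using scikit-learn's `IsolationForest`, which the original post does not use. The synthetic `features` array stands in for the (n_samples, 98) `compressed` matrix, and the planted outliers and the `contamination` value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Stand-in for `compressed`: 500 "normal" rows with 98 features ...
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 98))
# ... plus 5 planted outliers shifted far away from the rest
outliers = rng.normal(loc=6.0, scale=1.0, size=(5, 98))
features = np.vstack([normal, outliers])

forest = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
forest.fit(features)

pred = forest.predict(features)          # +1 = inlier, -1 = anomaly
scores = forest.score_samples(features)  # lower score = more anomalous

print("flagged:", int((pred == -1).sum()))
print("mean score, normal rows:", scores[:500].mean())
print("mean score, planted outliers:", scores[500:].mean())
```

Whether the flagged MNIST images would be meaningful anomalies depends on how well the autoencoder features separate unusual digits, which this sketch does not test.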

  • Original URL: https://www.cnblogs.com/bonelee/p/9940154.html