Mnist / Neural Net / Relu by jupyter

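This post trains a neural net on the MNIST dataset.

It is based on the Edwith Boostcourse "Basics of Deep Learning Starting with TensorFlow" (텐서플로우로 시작하는 딥러닝 기초).

Advice and corrections are always welcome.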

1. import

import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import mnist

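Since TensorFlow 2.0, eager mode is recommended and is enabled by default, so no explicit call is needed. Only TensorFlow 1.x has to turn it on at program start.

# tf.enable_eager_execution()  # TensorFlow 1.x only; already on in 2.x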

2. Load data & pre-processing

def load_mnist() :
    (train_data, train_labels),(test_data, test_labels) = mnist.load_data()
    train_data = np.expand_dims(train_data, axis = -1) # [N, 28, 28] -> [N, 28, 28, 1]
    test_data = np.expand_dims(test_data, axis = -1) # [N, 28, 28] -> [N, 28, 28, 1]
    
    train_data, test_data = normalize(train_data, test_data)
    
    train_labels = to_categorical(train_labels, 10) # [N,] -> [N,10]
    test_labels = to_categorical(test_labels, 10) # [N,] -> [N,10]
    
    return train_data, train_labels, test_data, test_labels

mnist.load_data() -> output : (train_images, train_labels), (test_images, test_labels)

Data shape

train : 60,000 images / test : 10,000 images.

TensorFlow expects image input of shape [batch_size, height, width, channel], so the output of load_data() cannot be used as-is.

mnist : grayscale images with the channel dimension omitted, so the shape is [ N , 28 , 28 ].

So np.expand_dims(data, axis=-1) is used to change the shape of the data to [ N , 28 , 28 , 1 ].

(axis=-1 or axis=3 adds the new dimension at the end.)
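The shape change can be checked without downloading MNIST. The sketch below uses a dummy array of zeros as a stand-in for the images (the array and its size of 5 are assumptions for illustration only).

```python
import numpy as np

# Stand-in for MNIST images: 5 grayscale images of 28x28 pixels (dummy zeros).
images = np.zeros((5, 28, 28), dtype=np.uint8)

# axis=-1 appends a channel dimension at the end.
with_channel = np.expand_dims(images, axis=-1)
print(images.shape)        # (5, 28, 28)
print(with_channel.shape)  # (5, 28, 28, 1)

# For a 3-D input, axis=3 addresses the same (last) position.
same_result = np.expand_dims(images, axis=3)
print(same_result.shape == with_channel.shape)  # True
```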

Pre-processing

Next, the image data is normalized.

Each grayscale pixel takes a value from 0 to 255, so dividing by 255.0 rescales the values to the range 0 to 1.

def normalize(train_data, test_data):
    train_data = train_data.astype(np.float32) / 255.0
    test_data = test_data.astype(np.float32) / 255.0
    
    return train_data, test_data

to_categorical(labels, 10) is, simply put, one-hot encoding.

The 10 digits 0 to 9 are one-hot encoded.

For example, the digit 3 becomes 0001000000.
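The same encoding can be reproduced with plain NumPy. This is a sketch of the idea rather than the Keras implementation: indexing the identity matrix with np.eye is a stand-in for to_categorical, and the example labels are made up.

```python
import numpy as np

labels = np.array([3, 0, 9])   # example digit labels (illustrative)

# Row i of the 10x10 identity matrix is the one-hot vector for digit i.
one_hot = np.eye(10, dtype=np.float32)[labels]

print(one_hot[0])     # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(one_hot.shape)  # (3, 10)
```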

3. Create Network

Model Function

Before building the network, we need to decide which layer functions to use.

def flatten() :
    return tf.keras.layers.Flatten()

def dense(label_dim, weight_init) :
    return tf.keras.layers.Dense(units=label_dim, use_bias=True, kernel_initializer=weight_init)

def relu() :
    return tf.keras.layers.Activation(tf.keras.activations.relu)

keras.layers.Dense : a fully-connected layer is used.

units=label_dim : the number of output units (channels) of the layer.

use_bias=True : a bias term is used.

kernel_initializer=weight_init : the method used to initialize the weights.

relu is used as the activation.
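What these two layers compute can be written in a few lines of NumPy. This is a minimal sketch of the math only, not the Keras implementation; the helper names dense_forward and relu_forward and the dummy input are assumptions made for illustration.

```python
import numpy as np

def dense_forward(x, weights, bias):
    """Fully-connected layer: every input feature feeds every output unit."""
    return x @ weights + bias

def relu_forward(x):
    """ReLU keeps positive values and replaces negative values with 0."""
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 784)).astype(np.float32)   # 4 flattened images (dummy data)
weights = rng.normal(scale=0.05, size=(784, 256)).astype(np.float32)
bias = np.zeros(256, dtype=np.float32)             # the term that use_bias=True adds

hidden = relu_forward(dense_forward(x, weights, bias))
print(hidden.shape)         # (4, 256)
print(hidden.min() >= 0.0)  # True
```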

Create model (Class version)

class create_model_class(tf.keras.Model):
    def __init__(self, label_dim):
        super(create_model_class, self).__init__()
        weight_init = tf.keras.initializers.RandomNormal()

        self.model = tf.keras.Sequential()
        self.model.add(flatten())

        for i in range(2):
            self.model.add(dense(256, weight_init))
            self.model.add(relu())

        self.model.add(dense(label_dim, weight_init))

    def call(self, x, training=None, mask=None):
        x = self.model(x)

        return x

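Note: when creating the model as a class, the class must inherit from tf.keras.Model, as in class model(tf.keras.Model):.

weight_init = tf.keras.initializers.RandomNormal()

- With the default arguments, the weights are drawn from a Gaussian (normal) distribution with mean 0 and standard deviation 0.05.

- In Keras, the kernel_initializer argument selects the weight initialization method. Possible values include the following.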

random_uniform
random_normal
glorot_uniform
glorot_normal
lecun_uniform
lecun_normal
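To get a feel for how two of these initializers differ, the sketch below samples weights with NumPy instead of Keras. It assumes the Keras defaults for random_normal (mean 0.0, stddev 0.05) and the usual Glorot uniform limit sqrt(6 / (fan_in + fan_out)); the layer sizes are taken from the first dense layer of this post.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 256   # input and output sizes of the first dense layer

# random_normal with the Keras defaults: mean 0.0, stddev 0.05
w_normal = rng.normal(loc=0.0, scale=0.05, size=(fan_in, fan_out))

# glorot_uniform: uniform in [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))
w_glorot = rng.uniform(-limit, limit, size=(fan_in, fan_out))

print(float(w_normal.std()))                   # close to 0.05
print(float(np.abs(w_glorot).max()) <= limit)  # True
```

The Glorot limit shrinks as the layer gets wider, while random_normal uses the same spread regardless of layer size.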

self.model = tf.keras.Sequential()

- A network is built by stacking layers one on top of another, which amounts to repeatedly appending to a list.

Sequential can be viewed as a kind of list-like data structure.

self.model.add(flatten()) # [ N , 28 , 28 , 1 ] -> [ N , 784 ]

- This is done so that the fully-connected layers that follow can be applied.

for i in range(2): # [ N , 784 ] -> [ N , 256 ] -> [ N , 256 ]
    self.model.add(dense(256, weight_init))
    self.model.add(relu())

- Each dense layer outputs 256 channels, and relu is applied to the result.

self.model.add(dense(label_dim, weight_init)) # [ N , 256] -> [ N , 10 ]

- The final layer produces 10 outputs, one per digit class.
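The shape comments above can be traced end to end with dummy arrays. This sketch uses zero-filled NumPy matrices in place of the real Keras layers and an assumed batch size of 32, and it also counts how many trainable parameters the three dense layers hold.

```python
import numpy as np

layer_sizes = [784, 256, 256, 10]   # flatten output, two hidden layers, label_dim
x = np.zeros((32, 28, 28, 1), dtype=np.float32)   # dummy batch with N = 32

x = x.reshape(len(x), -1)           # flatten: [N, 28, 28, 1] -> [N, 784]
shapes = [x.shape]
total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = np.zeros((n_in, n_out), dtype=np.float32)
    bias = np.zeros(n_out, dtype=np.float32)
    x = x @ weights + bias          # ReLU omitted: it does not change the shape
    shapes.append(x.shape)
    total_params += weights.size + bias.size

print(shapes)        # [(32, 784), (32, 256), (32, 256), (32, 10)]
print(total_params)  # 269322
```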

def call(self, x, training=None, mask=None):
        x = self.model(x)

        return x

- call defines the output the model produces when it is invoked on an input.

Create Model (function version)

def create_model_function(label_dim) :
    weight_init = tf.keras.initializers.RandomNormal()

    model = tf.keras.Sequential()
    model.add(flatten())

    for i in range(2) :
        model.add(dense(256, weight_init))
        model.add(relu())

    model.add(dense(label_dim, weight_init))

    return model

There is no functional difference between the class version and the function version.

4. Define loss

loss function

def loss_fn(model, images, labels):
    logits = model(images, training=True)
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_pred=logits, 
    y_true=labels, from_logits=True))
    return loss
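Because from_logits=True is set, the softmax is applied inside the loss rather than in the model. The NumPy sketch below shows what that computation amounts to; the function name softmax_cross_entropy and the example logits are assumptions for illustration, not the Keras code.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy computed directly from raw logits."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(one_hot_labels * log_probs).sum(axis=-1).mean())

labels = np.eye(10)[[3, 7]]           # two one-hot labels: digits 3 and 7
uniform_logits = np.zeros((2, 10))    # the model has no preference yet
confident_logits = 10.0 * labels      # large logit on the correct class

print(softmax_cross_entropy(uniform_logits, labels))    # log(10), about 2.3026
print(softmax_cross_entropy(confident_logits, labels))  # close to 0
```

A network that has not learned anything yet sits near log(10), which is consistent with the first train_loss of about 2.18 logged in the experiment section.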

Accuracy function

def accuracy_fn(model, images, labels):
    logits = model(images, training=False)
    prediction = tf.equal(tf.argmax(logits, -1), tf.argmax(labels, -1))
    accuracy = tf.reduce_mean(tf.cast(prediction, tf.float32))
    return accuracy

logits and labels both have shape [batch_size, label_dim]; here label_dim is 10.

tf.argmax(X,-1) returns the index of the largest value along the last axis.

tf.equal returns True where the two indices are equal and False where they differ.

tf.cast converts the result to numbers. (True -> 1, False -> 0)

The mean of these values is the accuracy.
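The argmax, equal, cast, and mean steps can be followed on a tiny example. The sketch below mirrors them in NumPy with made-up logits for 4 samples and 3 classes (3 instead of 10 only to keep the arrays readable).

```python
import numpy as np

logits = np.array([[0.1, 2.0, 0.3],
                   [1.5, 0.2, 0.1],
                   [0.0, 0.1, 3.0],
                   [0.9, 0.8, 0.1]])
labels = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0]])

predicted = logits.argmax(axis=-1)   # [1, 0, 2, 0]
actual = labels.argmax(axis=-1)      # [1, 0, 1, 0]
correct = predicted == actual        # [True, True, False, True]
accuracy = correct.astype(np.float32).mean()
print(accuracy)  # 0.75
```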

Gradient

def grad(model, images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(model, images, labels)
    return tape.gradient(loss, model.variables)
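tf.GradientTape records the operations inside the with block and then returns the derivative of the loss with respect to each variable. To show what such a derivative means, the sketch below works through a toy case by hand. Everything in it is an assumption for illustration: a one-weight linear model with mean squared error stands in for loss_fn, and a plain gradient-descent step stands in for the Adam optimizer.

```python
import numpy as np

def loss(w, x, y):
    """Mean squared error of a one-weight linear model (toy stand-in for loss_fn)."""
    return float(np.mean((w * x - y) ** 2))

def analytic_grad(w, x, y):
    """d(loss)/dw derived by hand: mean(2 * (w*x - y) * x)."""
    return float(np.mean(2.0 * (w * x - y) * x))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # data generated with w = 2
w = 0.5

# A central finite difference should agree with the hand-derived gradient.
eps = 1e-6
numeric_grad = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
print(analytic_grad(w, x, y), numeric_grad)

# One plain gradient-descent step moves w toward 2 and lowers the loss.
w_new = w - 0.1 * analytic_grad(w, x, y)
print(loss(w_new, x, y) < loss(w, x, y))   # True
```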

5. Experiments

parameters

""" dataset """
train_x, train_y, test_x, test_y = load_mnist()

""" parameters """
learning_rate = 0.001
batch_size = 128

training_epochs = 1
training_iterations = len(train_x) // batch_size

label_dim = 10

""" Graph Input using Dataset API """
train_dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y)).\
    shuffle(buffer_size=100000).\
    prefetch(buffer_size=batch_size).\
    batch(batch_size, drop_remainder=True)

test_dataset = tf.data.Dataset.from_tensor_slices((test_x, test_y)).\
    shuffle(buffer_size=100000).\
    prefetch(buffer_size=len(test_x)).\
    batch(len(test_x))

If buffer_size is at least as large as the dataset, the whole dataset is shuffled uniformly at random.

prefetch : while the network is training on the current batch, the next batch_size elements are loaded into memory in advance.
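The effect of a full shuffle followed by batch(batch_size, drop_remainder=True) can be imitated on plain indices. This NumPy sketch is only an analogy for the tf.data pipeline, using the 60,000 training samples and batch size of 128 from this post.

```python
import numpy as np

num_samples, batch_size = 60000, 128   # values used in this post
rng = np.random.default_rng(0)

indices = rng.permutation(num_samples)    # full shuffle of the sample indices
num_batches = num_samples // batch_size   # drop_remainder=True discards the tail
batches = indices[:num_batches * batch_size].reshape(num_batches, batch_size)

print(num_batches)                             # 468
print(num_samples - num_batches * batch_size)  # 96
print(batches.shape)                           # (468, 128)
```

With 468 batches per epoch, the iteration index in the training log runs from 0 to 467.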

Model

""" Model """
network = create_model_function(label_dim)

""" Training """
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

Eager mode


start_epoch = 0
start_iteration = 0


# train phase
for epoch in range(start_epoch, training_epochs):
    for idx, (train_input, train_label) in enumerate(train_dataset):
        grads = grad(network, train_input, train_label)
        optimizer.apply_gradients(grads_and_vars=zip(grads, network.variables))

        train_loss = loss_fn(network, train_input, train_label)
        train_accuracy = accuracy_fn(network, train_input, train_label)

        # test_dataset yields one batch that contains the whole test set
        for test_input, test_label in test_dataset:
            test_accuracy = accuracy_fn(network, test_input, test_label)
            print("Epoch: {:2d}  |  iter: {:5d}  |  train_loss: {:.8f}  "
                  "|  train_accuracy: {:.4f}  |  test_Accuracy: {:.4f}".format(
                      epoch, idx, train_loss, train_accuracy, test_accuracy))

Epoch: 0 | iter: 0 | train_loss: 2.17855787 | train_accuracy: 0.3438 | test_Accuracy: 0.2358

Epoch: 0 | iter: 1 | train_loss: 2.14122343 | train_accuracy: 0.4141 | test_Accuracy: 0.4052

Epoch: 0 | iter: 2 | train_loss: 2.06397033 | train_accuracy: 0.5312 | test_Accuracy: 0.5200

Epoch: 0 | iter: 3 | train_loss: 1.96134591 | train_accuracy: 0.6641 | test_Accuracy: 0.5788

Epoch: 0 | iter: 465 | train_loss: 0.17224988 | train_accuracy: 0.9766 | test_Accuracy: 0.9598

Epoch: 0 | iter: 466 | train_loss: 0.14738023 | train_accuracy: 0.9609 | test_Accuracy: 0.9590

Epoch: 0 | iter: 467 | train_loss: 0.12661503 | train_accuracy: 0.9609 | test_Accuracy: 0.9598

The train loss decreases steadily, and the test accuracy is reported at every iteration.

<Source>

https://github.com/deeplearningzerotoall/TensorFlow/blob/master/tf_2.x/lab-10-1-2-mnist_nn_relu.ipynb


SuperMemi's Study
