With the rise of deep learning in recent years, neural networks have become widely used to solve a wide variety of problems. This article introduces neural networks in detail, using a binary classification neural network as the example.
The complete code for this chapter can be found in .
Neural Networks
A neural network is built by connecting a large number of neurons. It consists of three kinds of layers: an input layer that receives the data, an output layer that produces the result, and hidden layers composed of a large number of neurons in between, as shown below.
The input layer and the output layer each appear only once, while there can be several hidden layers. In addition, when we say that the network in the figure below is a three-layer neural network, we are counting the hidden layers plus the output layer; the input layer is not included.
Each neuron has an input vector $\mathbf{x}$, a weight vector $\mathbf{w}$, a scalar bias $b$, and a non-linear function $g$. The neuron computes the inner product of $\mathbf{w}$ and $\mathbf{x}$ and adds $b$ to get $z = \mathbf{w} \cdot \mathbf{x} + b$, and then feeds $z$ into $g$ to get an output value $a = g(z)$. This output value $a$ becomes one of the input values $x_i$ of the neurons in the next layer. The non-linear function $g$ is called the activation function, and $a$ is called the activation value.
A group of neurons, from a few to a great many, forms a layer. Each layer takes the output of the previous layer as its input, and the values computed by its neurons become the input of the next layer. Connected layer by layer in this way, these layers form the hidden part of the network.
Activation Functions
If a neuron does not contain a non-linear function, it only performs linear operations, so even a large number of connected neurons amounts to nothing more than a multiple linear regression. Linear regression models cannot solve complex real-world problems, but such problems can be approximated with non-linear functions. The non-linear functions in neurons are called activation functions.
Below we introduce four commonly used activation functions. We also show how to compute their derivatives, because we will need them later in backpropagation.
Sigmoid Function and its Derivatives
The sigmoid function converts the input value into a value between 0 and 1, as shown below. It is often used as the activation function of the output layer in neural networks for binary classification. We can think of this output value as a probability. For example, suppose we want a neural network to determine whether there is a cat in a picture, where 1 means it is a cat and 0 means it is not. When the output value is greater than 0.5, the prediction is 1; when it is less than or equal to 0.5, the prediction is 0.
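For reference, the sigmoid function described above can be written as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$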
The implementation of the sigmoid function is as follows.
import numpy as np


def sigmoid(Z):
    """
    Implements the sigmoid function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the sigmoid function

    Returns
    -------
    A: (ndarray of same shape as Z) or (scalar) - output from the sigmoid function
    """
    A = 1 / (1 + np.exp(-Z))
    return A
The derivative of the sigmoid function is as follows.
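Applying the chain rule to $\sigma(z) = (1 + e^{-z})^{-1}$ gives the well-known result that sigmoid_derivative() below computes:

$$\sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)$$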
The derivative of the sigmoid function is implemented as follows.
def sigmoid_derivative(Z):
    """
    Implements the derivative of the sigmoid function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the sigmoid function

    Returns
    -------
    dZ: (ndarray of the same shape as Z) or (scalar) - derivative of the sigmoid function with respect to Z
    """
    g = 1 / (1 + np.exp(-Z))
    dZ = g * (1 - g)
    return dZ
Tanh Function and its Derivatives
Tanh function is very similar to the sigmoid function, but the output value of the tanh function is between -1 and 1, as shown below.
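For reference, the tanh function can be written as:

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$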
The implementation of the tanh function is as follows.
def tanh(Z):
    """
    Implements the tanh function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the tanh function

    Returns
    -------
    A: (ndarray of same shape as Z) or (scalar) - output from the tanh function
    """
    A = (np.exp(Z) - np.exp(-Z)) / (np.exp(Z) + np.exp(-Z))
    return A
The derivative of the tanh function is derived as follows:
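Differentiating the quotient above gives the result that tanh_derivative() below computes:

$$\tanh'(z) = 1 - \tanh^2(z)$$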
The derivative of the tanh function is implemented as follows.
def tanh_derivative(Z):
    """
    Implements the derivative of the tanh function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the tanh function

    Returns
    -------
    dZ: (ndarray of the same shape as Z) or (scalar) - derivative of the tanh function with respect to Z
    """
    g = (np.exp(Z) - np.exp(-Z)) / (np.exp(Z) + np.exp(-Z))
    dZ = 1 - g ** 2
    return dZ
ReLU Function and its Derivatives
The ReLU (rectified linear unit) function is widely used in neural networks. When $z$ is less than or equal to 0, it outputs 0; when $z$ is greater than 0, it outputs $z$. Because it only involves a comparison, ReLU is very fast to compute.
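For reference, the ReLU function can be written as:

$$\mathrm{ReLU}(z) = \max(0, z)$$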
The implementation of the ReLU function is as follows.
def relu(Z):
    """
    Implements the ReLU function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the ReLU function

    Returns
    -------
    A: (ndarray of same shape as Z) or (scalar) - output from the ReLU function
    """
    A = np.maximum(0, Z)
    return A
The derivative of ReLU is as follows. When $z$ is less than 0, the derivative is 0; when $z$ is greater than 0, the derivative is 1; when $z$ equals 0, the derivative is undefined. In practice, by convention, the derivative is set to 1 when $z$ equals 0.
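Written as a piecewise function, with the convention above applied at $z = 0$:

$$\mathrm{ReLU}'(z) = \begin{cases} 0 & z < 0 \\ 1 & z \ge 0 \end{cases}$$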
The derivative of the ReLU function is implemented as follows.
def relu_derivative(Z):
    """
    Implements the derivative of the ReLU function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the ReLU function

    Returns
    -------
    dZ: (ndarray of the same shape as Z) or (scalar) - derivative of the ReLU function with respect to Z
    """
    # The derivative is 1 where Z >= 0 and 0 where Z < 0.
    dZ = np.ones_like(Z, dtype=float)
    dZ[Z < 0] = 0
    return dZ
Leaky ReLU Function and its Derivatives
The leaky ReLU function is a variant of the ReLU function. When $z$ is less than 0, it outputs $\alpha z$, where $\alpha$ is a small value between 0 and 1 (the negative_slope parameter in the code below); when $z$ is greater than or equal to 0, it outputs $z$.
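Written as a formula, this is equivalent to the expression used in the code below:

$$\mathrm{LeakyReLU}(z) = \begin{cases} z & z \ge 0 \\ \alpha z & z < 0 \end{cases} = \max(0, z) + \alpha \min(0, z)$$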
The implementation of the leaky ReLU function is as follows.
def leaky_relu(Z, negative_slope=0.01):
    """
    Implements the leaky ReLU function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the leaky ReLU function
    negative_slope: (float) - the slope for negative values

    Returns
    -------
    A: (ndarray of same shape as Z) or (scalar) - output from the leaky ReLU function
    """
    A = np.maximum(0, Z) + negative_slope * np.minimum(0, Z)
    return A
The derivative of leaky ReLU is as follows. When $z$ is less than 0, the derivative is $\alpha$; when $z$ is greater than 0, the derivative is 1; when $z$ equals 0, the derivative is undefined. In practice, by convention, the derivative is set to 1 when $z$ equals 0.
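Written as a piecewise function, with the convention above applied at $z = 0$:

$$\mathrm{LeakyReLU}'(z) = \begin{cases} \alpha & z < 0 \\ 1 & z \ge 0 \end{cases}$$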
The derivative of the Leaky ReLU function is implemented as follows.
def leaky_relu_derivative(Z, negative_slope=0.01):
    """
    Implements the derivative of the leaky ReLU function.

    Parameters
    ----------
    Z: (ndarray of any shape) or (scalar) - input to the leaky ReLU function
    negative_slope: (float) - the slope for negative values

    Returns
    -------
    dZ: (ndarray of the same shape as Z) or (scalar) - derivative of the leaky ReLU function with respect to Z
    """
    # The derivative is 1 where Z >= 0 and negative_slope where Z < 0.
    dZ = np.ones_like(Z, dtype=float)
    dZ[Z < 0] = negative_slope
    return dZ
Binary Classification
The figure below shows a neural network for binary classification. Because it is a binary classification problem, the activation function of its output layer is the sigmoid function $\sigma$. There are quite a few variables in the figure: each layer has its own activation function, and each neuron has its own parameters $w$ and $b$. Vectorizing these variables and representing them as matrices greatly simplifies the formulas, as shown in the yellow part of the figure.
Therefore, the vectorized formulas for each layer are $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ and $A^{[l]} = g^{[l]}(Z^{[l]})$, and the dimensions of the arrays are as follows, where $n^{[l]}$ is the number of neurons in layer $l$, $n^{[0]}$ is the input size, $A^{[0]} = X$, and $m$ is the number of examples.

Layer | Shape of W | Shape of X (= $A^{[l-1]}$) | Shape of b | Shape of Z | Shape of A
---|---|---|---|---|---
1 | $(n^{[1]}, n^{[0]})$ | $(n^{[0]}, m)$ | $(n^{[1]}, 1)$ | $(n^{[1]}, m)$ | $(n^{[1]}, m)$
2 | $(n^{[2]}, n^{[1]})$ | $(n^{[1]}, m)$ | $(n^{[2]}, 1)$ | $(n^{[2]}, m)$ | $(n^{[2]}, m)$
L | $(n^{[L]}, n^{[L-1]})$ | $(n^{[L-1]}, m)$ | $(n^{[L]}, 1)$ | $(n^{[L]}, m)$ | $(n^{[L]}, m)$
Gradient Descent
The gradient descent of a neural network is as follows. Since each layer has its own parameters $W^{[l]}$ and $b^{[l]}$, we must calculate the partial derivatives of $J$ with respect to the $W^{[l]}$ and $b^{[l]}$ of every layer.
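For reference, each parameter is then updated with the standard gradient descent rule, where $\alpha$ is the learning rate; this is what the update_parameters() function later in the article implements:

$$W^{[l]} := W^{[l]} - \alpha \frac{\partial J}{\partial W^{[l]}}, \qquad b^{[l]} := b^{[l]} - \alpha \frac{\partial J}{\partial b^{[l]}}$$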
The process of gradient descent is as shown below.
1. Initialize all parameters $W$ and $b$.
2. Calculate the output $A^{[L]}$, and store $W$, $b$, $A$, and $Z$ along the way, because we will need these values when calculating the partial derivatives in the next step. This part is forward propagation.
3. Compute the partial derivatives of $J$ with respect to each $W$ and $b$. This part is backward propagation.
4. Update all parameters $W$ and $b$.
5. Repeat steps 2 to 4 for a total of num_iterations times.
Cost Function
In a binary classification neural network, the activation function of the output layer is the sigmoid function. Therefore, we can use the cost function of logistic regression, the cross-entropy loss, as the cost function of the binary classification neural network.
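For reference, with $m$ examples, predictions $a^{[L](i)}$, and labels $y^{(i)}$, the cross-entropy cost that compute_cost() below implements is:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log a^{[L](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[L](i)}\right) \right]$$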
The following code implements the cost function.
def compute_cost(AL, Y):
    """
    Computes the cross-entropy loss.

    Parameters
    ----------
    AL: (ndarray (1, number of examples)) - the output of the last layer
    Y: (ndarray (1, number of examples)) - true labels

    Returns
    -------
    cost: (float) - the cross-entropy cost
    """
    m = Y.shape[1]
    cost = -(1 / m) * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL), axis=1, keepdims=True)
    cost = np.squeeze(cost)
    return cost
Parameter Initialization
We have previously listed the dimensions of all parameters $W$ and $b$. Their dimensions depend on the number of neurons in each layer, so when initializing the parameters, we must first decide the number of neurons in each layer.
In the following code, we initialize the parameters $W$ with scaled random numbers and the parameters $b$ with zeros.
def initialize_parameters(layer_dims):
    """
    Initializes parameters for a deep neural network.

    Parameters
    ----------
    layer_dims: (list) - the number of units of each layer in the network.

    Returns
    -------
    (dict) with keys where 1 <= l <= len(layer_dims) - 1:
        Wl: (ndarray (layer_dims[l], layer_dims[l-1])) - weight matrix for layer l
        bl: (ndarray (layer_dims[l], 1)) - bias vector for layer l
    """
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters[f'W{l}'] = np.random.randn(layer_dims[l], layer_dims[l - 1]) / np.sqrt(layer_dims[l - 1])
        parameters[f'b{l}'] = np.zeros((layer_dims[l], 1))
    return parameters
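As a quick sanity check, we can verify that the returned shapes match the table above. The layer sizes here are made up purely for illustration and are not part of the later example:

# A tiny network: 4 inputs, one hidden layer with 3 neurons, 1 output neuron.
parameters = initialize_parameters([4, 3, 1])
print(parameters['W1'].shape, parameters['b1'].shape)  # (3, 4) (3, 1)
print(parameters['W2'].shape, parameters['b2'].shape)  # (1, 3) (1, 1)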
Forward Propagation
Forward propagation is the first half of gradient descent in a neural network. The output $A^{[l]}$ of each layer is the input of the next layer, so $A$ is passed along layer by layer, and each layer transforms its value. Finally, $A^{[L]}$ is the prediction $\hat{Y}$. Each layer stores the values it computes in caches, because they will be needed for backpropagation in the second half.
After we execute the entire gradient descent, we obtain the final parameters $W_{final}$ and $b_{final}$. Suppose we want to use this model to predict on new data $X_{new}$: we substitute the input $X_{new}$ and the parameters $W_{final}$ and $b_{final}$ into forward propagation, and the final result is the prediction for $X_{new}$.
In the following code, linear_forward() implements the linear forward part of each layer in the process. It not only returns $Z$, but also returns $A_{prev}$, $W$, and $b$ to the caller, which stores them in caches.
def linear_forward(A_prev, W, b):
    """
    Implements the linear part of a layer's forward propagation.

    Parameters
    ----------
    A_prev: (ndarray (size of previous layer, number of examples)) - activations from previous layer
    W: (ndarray (size of current layer, size of previous layer)) - weight matrix
    b: (ndarray (size of current layer, 1)) - bias vector

    Returns
    -------
    Z: (ndarray (size of current layer, number of examples)) - the input to the activation function
    cache: (tuple) - containing A_prev, W, b for backpropagation
    """
    Z = W @ A_prev + b
    cache = (A_prev, W, b)
    return Z, cache
In the following code, we implement the four activation functions again. These implementations are almost the same as the ones at the beginning of the article; the difference is that $Z$ is also returned to the caller, which stores it in caches.
def sigmoid(Z):
    """
    Implements the sigmoid activation.

    Parameters
    ----------
    Z: (ndarray of any shape) - input to the activation function

    Returns
    -------
    A: (ndarray of same shape as Z) - output of the activation function
    cache: (ndarray) - returning Z for backpropagation
    """
    A = 1 / (1 + np.exp(-Z))
    cache = Z
    return A, cache


def tanh(Z):
    """
    Implements the tanh activation.

    Parameters
    ----------
    Z: (ndarray of any shape) - input to the activation function

    Returns
    -------
    A: (ndarray of same shape as Z) - output of the activation function
    cache: (ndarray) - returning Z for backpropagation
    """
    A = (np.exp(Z) - np.exp(-Z)) / (np.exp(Z) + np.exp(-Z))
    cache = Z
    return A, cache


def relu(Z):
    """
    Implements the ReLU activation.

    Parameters
    ----------
    Z: (ndarray of any shape) - input to the activation function

    Returns
    -------
    A: (ndarray of same shape as Z) - output of the activation function
    cache: (ndarray) - returning Z for backpropagation
    """
    A = np.maximum(0, Z)
    cache = Z
    return A, cache


def leaky_relu(Z, negative_slope=0.01):
    """
    Implements the Leaky ReLU activation.

    Parameters
    ----------
    Z: (ndarray of any shape) - input to the activation function
    negative_slope: (float) - the slope for negative values

    Returns
    -------
    A: (ndarray of same shape as Z) - output of the activation function
    cache: (ndarray) - returning Z for backpropagation
    """
    A = np.maximum(0, Z) + negative_slope * np.minimum(0, Z)
    cache = Z
    return A, cache
In the following code, linear_activation_forward() implements one layer in the figure above. It first calls linear_forward() to obtain $Z$, and then passes $Z$ to an activation function to obtain $A$. Finally, $A$ and the cache are returned to the caller.
def linear_activation_forward(A_prev, W, b, activation_function):
    """
    Implements the forward propagation for the linear and activation layer.

    Parameters
    ----------
    A_prev: (ndarray (size of previous layer, number of examples)) - activations from previous layer
    W: (ndarray (size of current layer, size of previous layer)) - weight matrix
    b: (ndarray (size of current layer, 1)) - bias vector
    activation_function: (str) - the activation function to be used

    Returns
    -------
    A: (ndarray (size of current layer, number of examples)) - the output of the activation function
    cache: (tuple) - containing linear_cache (A_prev, W, b) and activation_cache (Z) for backpropagation
    """
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation_function == 'sigmoid':
        A, activation_cache = sigmoid(Z)
    elif activation_function == 'tanh':
        A, activation_cache = tanh(Z)
    elif activation_function == 'relu':
        A, activation_cache = relu(Z)
    elif activation_function == 'leaky_relu':
        A, activation_cache = leaky_relu(Z)
    else:
        raise ValueError(f'Activation function {activation_function} not supported.')
    cache = (linear_cache, activation_cache)
    return A, cache
In the following code, model_forward() implements the entire forward propagation. Finally, it returns $A^{[L]}$ and all the caches.
def model_forward(X, parameters, activation_functions):
    """
    Implements forward propagation for the entire network.

    Parameters
    ----------
    X: (ndarray (input size, number of examples)) - input data
    parameters: (dict) - output of initialize_parameters()
    activation_functions: (list) - the activation function for each layer. The first element is unused.

    Returns
    -------
    AL: (ndarray (output size, number of examples)) - the output of the last layer
    caches: (list of tuples) - containing caches for each layer
    """
    caches = []
    A = X
    L = len(activation_functions)
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters[f'W{l}'], parameters[f'b{l}'], activation_functions[l])
        caches.append(cache)
    return A, caches
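As a small illustration of how these pieces fit together, the following sketch runs a forward pass through a tiny two-layer network. The sizes and the random input here are made up for demonstration and are not part of the cat example later:

X = np.random.randn(4, 5)                  # 4 features, 5 examples
params = initialize_parameters([4, 3, 1])
activations = ['none', 'relu', 'sigmoid']  # the first entry is unused
AL, caches = model_forward(X, params, activations)
print(AL.shape)     # (1, 5), one prediction per example
print(len(caches))  # 2, one cache per layer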
Backpropagation or Backward Propagation
In gradient descent, we must calculate the partial derivatives of $J(W, b)$ with respect to the $W$ and $b$ of each layer in order to update them. When a neural network has many layers, calculating these partial derivatives takes a lot of time. Backpropagation speeds up this calculation. When computing the partial derivatives of a layer, we need some values that have already been computed in the following layer. If we computed from the first layer onwards, many values would have to be recomputed. If we instead compute from the last layer backwards, each layer can pass the values it has computed to the previous layer, which can use them directly without recalculating them, as shown below.
In fact, backpropagation is just the chain rule of differentiation.
According to the figure above, we first calculate $dA^{[L]} = \frac{\partial J}{\partial A^{[L]}}$ and then $dZ^{[L]} = \frac{\partial J}{\partial Z^{[L]}}$, where the activation function of the last layer is the sigmoid function $\sigma$:

$$dA^{[L]} = -\frac{1}{m}\left(\frac{Y}{A^{[L]}} - \frac{1 - Y}{1 - A^{[L]}}\right), \qquad dZ^{[L]} = dA^{[L]} \odot \sigma'\!\left(Z^{[L]}\right)$$

We can then use the above results to calculate $dW^{[L]} = \frac{\partial J}{\partial W^{[L]}}$ and $db^{[L]} = \frac{\partial J}{\partial b^{[L]}}$.

Finally, all the partial derivatives for a layer $l$ are calculated as follows:

$$dZ^{[l]} = dA^{[l]} \odot g^{[l]\prime}\!\left(Z^{[l]}\right), \quad dW^{[l]} = dZ^{[l]} A^{[l-1]T}, \quad db^{[l]} = \sum_{i=1}^{m} dZ^{[l](i)}, \quad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$
In the following code, linear_backward() implements the linear backward part of the figure.
def linear_backward(dZ, cache):
    """
    Implements the linear portion of backward propagation for a single layer.

    Parameters
    ----------
    dZ: (ndarray (size of current layer, number of examples)) - gradient of the cost with respect to the linear output
    cache: (tuple) - containing A_prev, W, b from the forward propagation

    Returns
    -------
    dA_prev: (ndarray (size of previous layer, number of examples)) - gradient of the cost with respect to the activation from the previous layer
    dW: (ndarray (size of current layer, size of previous layer)) - gradient of the cost with respect to W
    db: (ndarray (size of current layer, 1)) - gradient of the cost with respect to b
    """
    A_prev, W, b = cache
    dW = dZ @ A_prev.T
    db = np.sum(dZ, axis=1, keepdims=True)
    dA_prev = W.T @ dZ
    return dA_prev, dW, db
The following code implements the derivatives of the four activation functions. Each one multiplies $dA^{[l]}$ by $g^{[l]\prime}(Z^{[l]})$ and returns $dZ^{[l]}$ to the caller, which is the activation backward part of the figure.
def sigmoid_backward(dA, cache):
    """
    Implements the backward propagation for a single sigmoid unit.

    Parameters
    ----------
    dA: (ndarray of any shape) - post-activation gradient
    cache: (ndarray) - Z from the forward propagation

    Returns
    -------
    dZ: (ndarray of the same shape as A) - gradient of the cost with respect to Z
    """
    Z = cache
    g = 1 / (1 + np.exp(-Z))
    g_prime = g * (1 - g)
    dZ = dA * g_prime
    return dZ


def tanh_backward(dA, cache):
    """
    Implements the backward propagation for a single tanh unit.

    Parameters
    ----------
    dA: (ndarray of any shape) - post-activation gradient
    cache: (ndarray) - Z from the forward propagation

    Returns
    -------
    dZ: (ndarray of the same shape as A) - gradient of the cost with respect to Z
    """
    Z = cache
    g = (np.exp(Z) - np.exp(-Z)) / (np.exp(Z) + np.exp(-Z))
    g_prime = 1 - g ** 2
    dZ = dA * g_prime
    return dZ


def relu_backward(dA, cache):
    """
    Implements the backward propagation for a single ReLU unit.

    Parameters
    ----------
    dA: (ndarray of any shape) - post-activation gradient
    cache: (ndarray) - Z from the forward propagation

    Returns
    -------
    dZ: (ndarray of the same shape as A) - gradient of the cost with respect to Z
    """
    Z = cache
    # The derivative of ReLU is 1 where Z >= 0, so dA passes through unchanged there.
    dZ = np.array(dA, copy=True)
    dZ[Z < 0] = 0
    return dZ


def leaky_relu_backward(dA, cache, negative_slope=0.01):
    """
    Implements the backward propagation for a single Leaky ReLU unit.

    Parameters
    ----------
    dA: (ndarray of any shape) - post-activation gradient
    cache: (ndarray) - Z from the forward propagation
    negative_slope: (float) - the slope for negative values

    Returns
    -------
    dZ: (ndarray of the same shape as A) - gradient of the cost with respect to Z
    """
    Z = cache
    # The derivative is negative_slope where Z < 0, so scale dA there instead of replacing it.
    dZ = np.array(dA, copy=True)
    dZ[Z < 0] *= negative_slope
    return dZ
The linear_activation_backward() in the following code implements the one-layer part of the diagram.
def linear_activation_backward(dA, cache, activation_function):
    """
    Implements the backward propagation for the linear and activation layer.

    Parameters
    ----------
    dA: (ndarray (size of current layer, number of examples)) - post-activation gradient for current layer
    cache: (tuple) - containing linear_cache (A_prev, W, b) and activation_cache (Z) for backpropagation
    activation_function: (str) - the activation function to be used

    Returns
    -------
    dA_prev: (ndarray (size of previous layer, number of examples)) - gradient of the cost with respect to the activation from the previous layer
    dW: (ndarray (size of current layer, size of previous layer)) - gradient of the cost with respect to W
    db: (ndarray (size of current layer, 1)) - gradient of the cost with respect to b
    """
    linear_cache, activation_cache = cache
    if activation_function == 'sigmoid':
        dZ = sigmoid_backward(dA, activation_cache)
    elif activation_function == 'tanh':
        dZ = tanh_backward(dA, activation_cache)
    elif activation_function == 'relu':
        dZ = relu_backward(dA, activation_cache)
    elif activation_function == 'leaky_relu':
        dZ = leaky_relu_backward(dA, activation_cache)
    else:
        raise ValueError(f'Activation function {activation_function} not supported.')
    dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db
Finally, model_backward() in the following code implements the entire backpropagation.
def model_backward(AL, Y, caches, activation_functions):
    """
    Implements the backward propagation for the entire network.

    Parameters
    ----------
    AL: (ndarray (output size, number of examples)) - the output of the last layer
    Y: (ndarray (output size, number of examples)) - true labels
    caches: (list of tuples) - containing linear_cache (A_prev, W, b) and activation_cache (Z) for each layer
    activation_functions: (list) - the activation function for each layer. The first element is unused.

    Returns
    -------
    gradients: (dict) with keys where 0 <= l <= len(activation_functions) - 1:
        dA{l-1}: (ndarray (size of previous layer, number of examples)) - gradient of the cost with respect to the activation for previous layer l - 1
        dWl: (ndarray (size of current layer, size of previous layer)) - gradient of the cost with respect to W for layer l
        dbl: (ndarray (size of current layer, 1)) - gradient of the cost with respect to b for layer l
    """
    gradients = {}
    L = len(activation_functions)
    m = AL.shape[1]
    dAL = -(1 / m) * (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    dA_prev = dAL
    for l in reversed(range(1, L)):
        current_cache = caches[l - 1]
        dA_prev, dW, db = linear_activation_backward(dA_prev, current_cache, activation_functions[l])
        gradients[f'dA{l - 1}'] = dA_prev
        gradients[f'dW{l}'] = dW
        gradients[f'db{l}'] = db
    return gradients
Putting It All Together
After performing backpropagation, we obtain the partial derivatives of $J$ with respect to all parameters $W$ and $b$. Then we can call the following code to update all $W$ and $b$.
def update_parameters(parameters, gradients, learning_rate):
    """
    Updates parameters using the gradient descent update rule.

    Parameters
    ----------
    parameters: (dict) - containing the parameters
    gradients: (dict) - containing the gradients
    learning_rate: (float) - the learning rate

    Returns
    -------
    updated_parameters: (dict) - containing the updated parameters
    """
    updated_parameters = parameters.copy()
    L = len(updated_parameters) // 2
    for l in range(L):
        updated_parameters[f'W{l + 1}'] = parameters[f'W{l + 1}'] - learning_rate * gradients[f'dW{l + 1}']
        updated_parameters[f'b{l + 1}'] = parameters[f'b{l + 1}'] - learning_rate * gradients[f'db{l + 1}']
    return updated_parameters
nn_model() in the following code implements the entire model. In each iteration it performs forward propagation, computes the cost, performs backpropagation, and finally updates the parameters.
def nn_model(X, Y, init_parameters, layer_activation_functions, learning_rate, num_iterations):
    """
    Implements a neural network.

    Parameters
    ----------
    X: (ndarray (input size, number of examples)) - input data
    Y: (ndarray (output size, number of examples)) - true labels
    init_parameters: (dict) - the initial parameters for the network
    layer_activation_functions: (list) - the activation function for each layer. The first element is unused.
    learning_rate: (float) - the learning rate
    num_iterations: (int) - the number of iterations

    Returns
    -------
    parameters: (dict) - the learned parameters
    costs: (list) - the costs at every 100th iteration
    """
    costs = []
    parameters = init_parameters.copy()
    for i in range(num_iterations):
        AL, caches = model_forward(X, parameters, layer_activation_functions)
        cost = compute_cost(AL, Y)
        gradients = model_backward(AL, Y, caches, layer_activation_functions)
        parameters = update_parameters(parameters, gradients, learning_rate)
        # Record the cost every 100 iterations and at the last iteration.
        if i % 100 == 0 or i == num_iterations - 1:
            costs.append(cost)
    return parameters, costs
After training the parameters, we can use the following nn_model_predict() to make predictions.
def nn_model_predict(X, parameters, activation_functions):
    """
    Predicts the output of the neural network.

    Parameters
    ----------
    X: (ndarray (input size, number of examples)) - input data
    parameters: (dict) - the learned parameters
    activation_functions: (list) - the activation function for each layer. The first element is unused.

    Returns
    -------
    predictions: (ndarray (1, number of examples)) - the predicted labels
    """
    probabilities, _ = model_forward(X, parameters, activation_functions)
    predictions = probabilities.copy()
    predictions[predictions > 0.5] = 1
    predictions[predictions <= 0.5] = 0
    return predictions
Example
We will use an example to show how to use our model. First, we load the training data x_orig and y. x_orig is an array containing 100 images; each image is 64 x 64 pixels with three channels. y is an array of 0s and 1s, where 1 means there is a cat in the picture and 0 means there is not.
x_orig, y = load_data()
print(f'x_orig shape: {x_orig.shape}')
print(f'y shape: {y.shape}')

# Output
# x_orig shape: (100, 64, 64, 3)
# y shape: (1, 100)
Previously we listed the dimensions of $X$ as $(n^{[0]}, m)$, so each picture must be flattened into a column vector. Below we reshape x_orig accordingly and convert the values from the range 0 to 255 into values from 0 to 1. We do not need to reshape y, because its dimensions are already $(1, m)$.
x_flatten = x_orig.reshape(x_orig.shape[0], -1).T
x = x_flatten / 255.
print("x shape: " + str(x.shape))

# Output
# x shape: (12288, 100)
First, we need to decide the number of layers in the model and the number of neurons in each layer. Below we set up the model with an input layer, three hidden layers, and an output layer. We also need to decide the activation function of each layer; layer_activation_functions[0] corresponds to the input layer, so it is not used.
After these decisions are made, we can initialize all parameters $W$ and $b$, and then call nn_model() to train the model and obtain the trained parameters.
layer_dims = [12288, 20, 7, 10, 1]
init_parameters = initialize_parameters(layer_dims)
layer_activation_functions = ['none', 'relu', 'relu', 'relu', 'sigmoid']
learning_rate = 0.0075
parameters, costs = nn_model(x, y, init_parameters, layer_activation_functions, learning_rate, 3000)
With the trained parameters, we can use them to predict other pictures.
x_new_orig = load_new_data()
x_new_flatten = x_new_orig.reshape(x_new_orig.shape[0], -1).T
x_new = x_new_flatten / 255.
y_new = nn_model_predict(x_new, parameters, layer_activation_functions)
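If the new pictures also come with ground-truth labels, we can compare the predictions against them to estimate accuracy. The variable y_new_true below is hypothetical and not part of the original example; it stands for labels with the same (1, m) shape as y_new:

# y_new_true is an assumed (1, m) array of true labels for the new pictures.
accuracy = np.mean(y_new == y_new_true)
print(f'accuracy: {accuracy:.2%}')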
Multi-class Classification
For information about multi-class classification neural networks, please refer to the following article.
Conclusion
The backpropagation of a neural network involves calculating partial derivatives, so it is difficult to understand. Nowadays we no longer need to implement backpropagation ourselves; instead we use machine learning libraries such as PyTorch or TensorFlow. However, understanding these details helps us better understand how neural networks work.