Introduction
- Min-max normalization is an operation which rescales a set of data.
- This can be useful when:
- Comparing data from two different scales
- Converting data to a new scale
- In most situations, data is normalized to fit a target range of [0, 1]:
- The smallest value in the original set would be mapped to 0
- The largest value in the original set would be mapped to 1
- Every other value would be mapped to a value somewhere between these two bounds
- It is also called:
- Feature Scaling
- Min-Max Scaling
- Rescaling
- Normalization
Normalizing to [0, 1]
- A set of numbers will have:
- A smallest value:
- This is also called the lower bound or least element
- It is denoted:
min(x)
- A largest value:
- This is also called the upper bound or greatest element
- It is denoted:
max(x)
- A range:
- The difference between the smallest and largest values
- It is denoted:
max(x) - min(x)
- Normalization is the process of changing the lower and upper bounds to be 0 and 1 respectively
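As a quick illustration, the three quantities above can be computed with Python's built-in functions. This is only a sketch: the sample values are the ones used in the Python code on this page, and the variable names are illustrative.

```python
x = [7, 21, 13, 15]

lower = min(x)                 # smallest value, min(x)
upper = max(x)                 # largest value, max(x)
value_range = upper - lower    # range, max(x) - min(x)

print(lower, upper, value_range)  # prints: 7 21 14
```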
Algorithm
- First we modify the data to have a lower bound of 0. To do this we subtract the minimum value from each value:
x' = x - min(x)
- Then we modify the data to have an upper bound of 1. We do this by dividing each value by the original range:
x'' = x' / (max(x) - min(x))
- Finally, if we combine these two steps we get:
x'' = (x - min(x)) / (max(x) - min(x))
Example
Normalize the following data:
x = [7, 21, 13, 15]
- First we calculate the lower bound, upper bound, and range:
- min(x) = 7
- max(x) = 21
- max(x) - min(x) = 14
- Next we subtract the lower bound from each value:
[0, 14, 6, 8]
- Finally, we divide by the range:
[0, 1, 3/7, 4/7] ≈ [0, 1, 0.4286, 0.5714]
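The arithmetic in this example can be checked with a short sketch. It assumes the data set is [7, 21, 13, 15], the list used in the Python code on this page, which matches the bounds 7 and 21 given above. Exact fractions from the standard library are used here only to keep the intermediate values free of rounding; the variable names are illustrative.

```python
from fractions import Fraction

x = [7, 21, 13, 15]

lower = min(x)                  # 7
value_range = max(x) - lower    # 21 - 7 = 14

# Step 1: subtract the lower bound so the smallest value becomes 0
shifted = [a - lower for a in x]

# Step 2: divide by the range so the largest value becomes 1
scaled = [Fraction(s, value_range) for s in shifted]

print(shifted)                   # prints: [0, 14, 6, 8]
print([str(f) for f in scaled])  # prints: ['0', '1', '3/7', '4/7']
```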
Code (Python)
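import numpy as np

def normalize(x):
    # Names chosen so the built-ins min, max, and range are not shadowed
    lower = np.min(x)
    upper = np.max(x)
    span = upper - lower
    # float() converts NumPy scalars to plain Python floats for printing
    return [float((a - lower) / span) for a in x]

x = [7, 21, 13, 15]
normalizedX = normalize(x)
print(normalizedX)  # prints: [0.0, 1.0, 0.42857142857142855, 0.5714285714285714]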
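One case the formula does not cover is a data set in which every value is the same: the range is then 0 and the division fails. A minimal sketch of one way to guard against this is shown here. The name normalize_safe is hypothetical, and mapping constant data to 0.0 is an assumed convention rather than part of the method described above.

```python
def normalize_safe(x):
    """Hypothetical variant of normalize that tolerates constant data."""
    lower = min(x)
    span = max(x) - lower
    if span == 0:
        # All values are identical, so there is nothing to spread out.
        # Returning 0.0 for each value is an assumed convention.
        return [0.0 for _ in x]
    return [(a - lower) / span for a in x]

print(normalize_safe([5, 5, 5]))  # prints: [0.0, 0.0, 0.0]
print(normalize_safe([7, 21]))    # prints: [0.0, 1.0]
```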
Normalizing from [0, 1]
- If our numbers are in the range [0, 1] then we can scale them to have a different lower and upper bound
- To achieve this, we simply do the reverse of normalization:
- Find the new range by subtracting the new lower bound from the new upper bound
- Multiply each value by the new range
- Add the new lower bound to each value:
x' = x * (newUpperBound - newLowerBound) + newLowerBound
Example
Normalize the following data to have a lower bound of 3 and an upper bound of 24:
x = [0, 1, 3/7, 4/7]
- First we calculate the range:
24 - 3 = 21
- Then we multiply each value by the range:
[0, 21, 9, 12]
- Finally, we add the lower bound to each value:
[3, 24, 12, 15]
Code (Python)
def normalize(normalizedX, newLowerBound, newUpperBound):
    # newRange avoids shadowing the built-in range
    newRange = newUpperBound - newLowerBound
    return [a * newRange + newLowerBound for a in normalizedX]

normalizedX = [0.0, 1.0, 3/7, 4/7]
x = normalize(normalizedX, 3, 24)
print(x)  # prints: [3.0, 24.0, 12.0, 15.0]
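Because this step is the reverse of normalizing to [0, 1], applying both in sequence with the original bounds should give back the original data, up to floating-point rounding. The sketch below checks this; the function names to_unit_range and from_unit_range are hypothetical and were chosen so the two directions do not share a name.

```python
def to_unit_range(x):
    """Rescale x so its smallest value is 0 and its largest is 1."""
    lower = min(x)
    span = max(x) - lower
    return [(a - lower) / span for a in x]

def from_unit_range(unit_values, new_lower, new_upper):
    """Rescale values in [0, 1] to the range [new_lower, new_upper]."""
    new_span = new_upper - new_lower
    return [u * new_span + new_lower for u in unit_values]

x = [7, 21, 13, 15]
unit = to_unit_range(x)
restored = from_unit_range(unit, min(x), max(x))

# Rounding hides any tiny floating-point error from the round trip
print([round(v, 6) for v in restored])  # prints: [7.0, 21.0, 13.0, 15.0]
```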
Normalizing from one range to another
- Sometimes we need to normalize data in which neither the source range nor the target range is [0, 1]
- In these situations, we first normalize the data to the range [0, 1], and then normalize it again to the true target range.
- These two steps can be combined:
x' = (x - min(x)) / (max(x) - min(x)) * (newUpperBound - newLowerBound) + newLowerBound
Code (Python)
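import numpy as np

def normalize(x, newLowerBound, newUpperBound):
    # Names chosen so the built-ins min, max, and range are not shadowed
    lower = np.min(x)
    upper = np.max(x)
    oldRange = upper - lower
    newRange = newUpperBound - newLowerBound
    # float() converts NumPy scalars to plain Python floats for printing
    return [float((a - lower) / oldRange * newRange + newLowerBound) for a in x]

x = [7, 21, 13, 15]
y = normalize(x, 3, 24)
print(y)  # prints: [3.0, 24.0, 12.0, 15.0]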
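To illustrate that the combined formula matches the two separate steps, the following sketch rescales the same data to a target range of [-1, 1] both ways and compares the results. The target range and the function names rescale and rescale_two_step are assumptions made for this example only.

```python
def rescale(x, new_lower, new_upper):
    """Illustrative one-pass version of the combined formula."""
    lower = min(x)
    span = max(x) - lower
    new_span = new_upper - new_lower
    return [(a - lower) / span * new_span + new_lower for a in x]

def rescale_two_step(x, new_lower, new_upper):
    """The same mapping done as two passes, going through [0, 1]."""
    lower = min(x)
    span = max(x) - lower
    unit = [(a - lower) / span for a in x]
    return [u * (new_upper - new_lower) + new_lower for u in unit]

x = [7, 21, 13, 15]
one_pass = rescale(x, -1, 1)
two_pass = rescale_two_step(x, -1, 1)

print([round(v, 4) for v in one_pass])  # prints: [-1.0, 1.0, -0.1429, 0.1429]
print(all(abs(a - b) < 1e-12 for a, b in zip(one_pass, two_pass)))  # prints: True
```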