2. KNN (K-Nearest Neighbors)

Advantages

2.1 Distance Calculation

<aside> 💡 Euclidean distance is influenced more by a single large difference in one feature than by many small differences spread across a set of features.

</aside>
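A quick numeric illustration of this point (the values below are chosen for illustration and are not from the original notes): four small differences of 1 each produce a smaller Euclidean distance than a single difference of 3.

```python
import numpy as np

a = np.zeros(4)
many_small = np.array([1.0, 1.0, 1.0, 1.0])  # four differences of 1
one_large = np.array([3.0, 0.0, 0.0, 0.0])   # a single difference of 3

# sqrt(1+1+1+1) = 2.0 vs sqrt(9) = 3.0: the one large gap dominates
d_small = np.sqrt(np.sum((many_small - a) ** 2))
d_large = np.sqrt(np.sum((one_large - a) ** 2))
print(d_small, d_large)  # 2.0 3.0
```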

2.1.1 Euclidean Distance

$$ d(p,q) = \sqrt {\sum \limits_{i=1}^{n} (q_{i} - p_{i})^{2}} $$

'''
Euclidean distance between test_instance and each row of instances
'''
import numpy as np

instances = np.array([ [5, 2.5, 3],
                       [2.75, 7.50, 4],
                       [9.10, 4.5, 4],
                       [8.9, 2.3, 6]])
test_instance = instances[0]
distances = []

for instance in instances:
    distance = np.sqrt(np.sum((instance - test_instance)**2))
    distances.append(distance)
print(distances)
'''
Euclidean distance between column vectors
'''
import numpy as np

instances = np.array([ [5, 2.5, 3],
                       [2.75, 7.50, 4],
                       [9.10, 4.5, 4],
                       [8.9, 2.3, 6]])
print(instances)
test_instance = instances[:, 0]

n_cols = instances.shape[1]  # number of columns
# instances.shape is the tuple (4, 3): 4 rows by 3 columns
distances = []

for col_idx in range(n_cols):
    instance = instances[:, col_idx]  # extract the column vector
    distance = np.sqrt(np.sum((instance - test_instance)**2))
    distances.append(distance)
print(distances)
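The explicit loops above can also be collapsed with NumPy broadcasting; this vectorized sketch (not in the original notes) reproduces the row-wise distances from the first example in a single expression:

```python
import numpy as np

instances = np.array([[5, 2.5, 3],
                      [2.75, 7.50, 4],
                      [9.10, 4.5, 4],
                      [8.9, 2.3, 6]])
test_instance = instances[0]

# broadcasting subtracts test_instance from every row at once;
# axis=1 sums the squared differences within each row
distances = np.sqrt(np.sum((instances - test_instance) ** 2, axis=1))
print(distances)
```

Since `test_instance` is the first row, the first distance is 0, and the result matches the loop version element for element.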

2.1.2 Manhattan Distance

$$ d_{L_1}(\vec{v}, \vec{u}) = \sum \limits_{i=1}^{n} |v_{i} - u_{i}| $$
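As a minimal sketch of the formula (reusing the first two rows of the `instances` array from the Euclidean examples), Manhattan distance sums the absolute differences per feature instead of squaring them:

```python
import numpy as np

v = np.array([5, 2.5, 3])
u = np.array([2.75, 7.50, 4])

# |5 - 2.75| + |2.5 - 7.5| + |3 - 4| = 2.25 + 5.0 + 1.0
manhattan = np.sum(np.abs(v - u))
print(manhattan)  # 8.25
```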