k
: Choosing an appropriate value of k is important.
Advantages
The model can be updated when new labeled instances become available.
<aside>
💡 Euclidean distance is influenced more by a single large difference in one feature
than by many small differences spread across a set of features, because each difference is squared before summing.
</aside>
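A small sketch of the point in the callout above. The vectors `single_large` and `many_small` are made-up illustrations, not data from these notes; both differ from `origin` by the same total absolute amount (4), but the Euclidean distance is larger when the difference is concentrated in one feature.

```python
import numpy as np

origin = np.zeros(4)

# Hypothetical example vectors (assumed for illustration only).
# One large difference of 4 in a single feature.
single_large = np.array([4.0, 0.0, 0.0, 0.0])
# Four small differences of 1 each: same total absolute difference (4).
many_small = np.array([1.0, 1.0, 1.0, 1.0])

d_single = np.sqrt(np.sum((single_large - origin) ** 2))
d_small = np.sqrt(np.sum((many_small - origin) ** 2))

print(d_single)  # 4.0
print(d_small)   # 2.0
```

Squaring turns the single difference of 4 into 16, while the four differences of 1 only add up to 4, which is why the first distance is twice the second.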
$$ d(p,q) = \sqrt {\sum \limits_{i=1}^{n} (q_{i} - p_{i})^{2}} $$
'''
Euclidean distance between test_instance and each instance
'''
import numpy as np
instances = np.array([[5, 2.5, 3],
                      [2.75, 7.50, 4],
                      [9.10, 4.5, 4],
                      [8.9, 2.3, 6]])
test_instance = instances[0]  # use the first row as the query point
distances = []
for instance in instances:
    # square the per-feature differences, sum them, then take the square root
    distance = np.sqrt(np.sum((instance - test_instance)**2))
    distances.append(distance)
print(distances)
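The row-wise loop can also be written without an explicit `for`, as a sketch relying on NumPy broadcasting. It uses the same `instances` array; `distances_norm` is just an illustrative name for the result of the equivalent `np.linalg.norm` call.

```python
import numpy as np

instances = np.array([[5, 2.5, 3],
                      [2.75, 7.50, 4],
                      [9.10, 4.5, 4],
                      [8.9, 2.3, 6]])
test_instance = instances[0]

# Broadcasting subtracts test_instance from every row;
# axis=1 sums the squared differences across the features of each row.
distances = np.sqrt(np.sum((instances - test_instance) ** 2, axis=1))

# Equivalent: the Euclidean (L2) norm of each row of differences.
distances_norm = np.linalg.norm(instances - test_instance, axis=1)

print(distances)
print(np.allclose(distances, distances_norm))
```

The result is a NumPy array with one distance per row rather than a Python list, and the first entry is 0 because `test_instance` is the first row itself.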
'''
Euclidean distance between column vectors
'''
import numpy as np
instances = np.array([[5, 2.5, 3],
                      [2.75, 7.50, 4],
                      [9.10, 4.5, 4],
                      [8.9, 2.3, 6]])
print(instances)
test_instance = instances[:, 0]  # use the first column as the query vector
n_cols = instances.shape[1]  # number of columns
# instances.shape is the tuple (4, 3): 4 rows by 3 columns
distances = []
for col_idx in range(n_cols):
    instance = instances[:, col_idx]  # extract the column vector
    distance = np.sqrt(np.sum((instance - test_instance)**2))
    distances.append(distance)
print(distances)
$$ d(\vec{v}, \vec{u})_{L1} = \sum \limits_{i=1}^{n} |v_{i} - u_{i}| $$
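A minimal sketch of the L1 (Manhattan) distance, reusing the `instances` array from the Euclidean snippets and taking the first row as the query point. The name `manhattan` is an assumption made here for illustration.

```python
import numpy as np

instances = np.array([[5, 2.5, 3],
                      [2.75, 7.50, 4],
                      [9.10, 4.5, 4],
                      [8.9, 2.3, 6]])
test_instance = instances[0]

# L1 distance: sum of absolute per-feature differences for each row.
manhattan = np.sum(np.abs(instances - test_instance), axis=1)

print(manhattan)
```

Because the differences are not squared, a single large difference in one feature counts the same as several small ones with the same total, unlike the Euclidean distance.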