Here is a short illustration of how SVM works. Suppose there are two groups of data distributed as follows:
Given this distribution, a question arises: how do we separate the two groups (that is, classify the data)? By dividing the plane, of course. Here are a few ways the groups could be split with a straight line.
As the picture shows, there are many possible ways to split the data. The next task is to find the best separator for this distribution.
Of the candidate dividing lines above (the figure shows 3 candidate hyperplanes), you would probably pick the one in the following figure as the best. Yes, exactly: that is the line a Support Vector Machine chooses.
Why is this separator the best? Let us analyze the figure below.
The auxiliary figure shows that this line runs down the middle of "the widest street separating the two groups". Compared with the earlier candidates, it is the best choice.
The separator in the figure above is clearly worse, because "one side is narrow and the other side is wider".
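The "widest street" comparison can be sketched numerically. The points and the three candidate lines below are made up for illustration (they are not the data from the figures), and `street_width` is a hypothetical helper name. Each candidate is written as w[0]*x + w[1]*y + b = 0, and its street width is taken as the smallest distance from any point to the line.

```python
import numpy as np

# Made-up 2-D points for illustration only (not the data from the figures).
group_a = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]])
group_b = np.array([[4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])

# Hypothetical candidate separators, each written as w[0]*x + w[1]*y + b = 0.
candidates = {
    "steep": (np.array([1.0, 0.2]), -3.0),
    "flat": (np.array([0.2, 1.0]), -3.2),
    "diagonal": (np.array([1.0, 1.0]), -6.0),
}


def street_width(w, b):
    """Smallest point-to-line distance, or 0.0 if the line does not separate the groups."""
    side_a = group_a @ w + b
    side_b = group_b @ w + b
    if not ((side_a < 0).all() and (side_b > 0).all()):
        return 0.0
    distances = np.abs(np.concatenate([side_a, side_b])) / np.linalg.norm(w)
    return float(distances.min())


widths = {name: street_width(w, b) for name, (w, b) in candidates.items()}
best = max(widths, key=widths.get)
print(widths)
print("widest street:", best)
```

All three candidates separate the groups, but they leave different amounts of room. SVM searches over every possible line rather than a short list like this one.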
The distance between the separating line and the nearest data points of each group is called the margin. SVM looks for the line that makes this distance as large as possible.
So keep in mind: the gray band is the margin, the data points nearest the line (those lying on the edges of the band) are the support vectors, and the center line that separates the two groups is the hyperplane.
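These three terms map onto attributes of a fitted scikit-learn model. The sketch below uses four made-up points (not the recipe data) and a large `C`, chosen here as an assumption so the fit behaves like a hard-margin SVM. For a linear SVM with weight vector w, the full width of the band is 2 / ||w||, so each side is 1 / ||w||.

```python
import numpy as np
from sklearn import svm

# Made-up points: class 0 sits on x = 0, class 1 sits on x = 2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A large C approximates a hard margin (an assumption for this toy example).
toy_model = svm.SVC(kernel='linear', C=1000)
toy_model.fit(X, y)

w = toy_model.coef_[0]                      # normal vector of the hyperplane
margin_width = 2 / np.linalg.norm(w)        # full width of the band
support_points = toy_model.support_vectors_  # points on the edges of the band

print("margin width:", margin_width)
print("support vectors:\n", support_points)
```

Here the two groups are 2 units apart along the x axis, so the fitted band should come out close to 2 units wide.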
Now let us apply SVM to a real-world problem.
We will classify cupcakes versus muffins. The two share nearly the same basic ingredients; apart from that, they differ in their toppings.
Cupcakes vs Muffins
Challenge
Classify a recipe as a cupcake or a muffin: given a recipe, predict which of the two it is.
Steps
1. Find the data
2. Apply a data science model, in this case SVM
3. Review the results
1. Finding the Recipe Data
Recipes can be found on Google; record the ingredient amounts from each one.
From Google we take the top 10 muffin recipes and the top 10 cupcake recipes, as shown in the data above. A problem arises here: "each recipe yields a different amount of batter". The solution is to normalize the data by converting each ingredient amount to a percentage, so that every row sums to 100% (or close to it).
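The normalization described above can be sketched with pandas. The amounts below are made up for illustration (they are not taken from the collected recipes), and the sketch assumes every ingredient in a row is measured in the same unit.

```python
import pandas as pd

# Made-up amounts in one common unit; NOT the recipes collected for this post.
amounts = pd.DataFrame(
    {"Flour": [2.0, 3.0], "Milk": [1.0, 1.0], "Sugar": [1.0, 2.0]},
    index=["recipe_1", "recipe_2"],
)

# Divide every row by its own total, then scale to percent.
percent = amounts.div(amounts.sum(axis=1), axis=0) * 100

print(percent.round(1))
print(percent.sum(axis=1))  # each row should total 100
```

After this step, a recipe that makes 6 muffins and one that makes 24 become directly comparable, because only the proportions remain.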
Converting amount-based values to percent-based values:
Once tabulated, the data is represented as follows:
2. Applying the Data Science Model (SVM)
Now we move on to the Python script, with the following steps:
- Import Libraries
- Import Data
- Prepare the Data
- Fit the Model
- Visualize the Results
- Predict New Cases
STEP 1: Import Libraries
# Packages for analysis
import pandas as pd
import numpy as np
from sklearn import svm
# Packages for visuals
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(font_scale=1.2)
# Allows charts to appear in the notebook
%matplotlib inline
# Pickle package
import pickle
STEP 2: Import Data
# Read in muffin and cupcake ingredient data
recipes = pd.read_csv('recipes_muffins_cupcakes.csv')
recipes
STEP 3: Prepare the Data
# Plot two ingredients
# (x and y are passed by keyword; recent seaborn versions no longer accept them positionally)
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type',
           palette='Set1', fit_reg=False, scatter_kws={"s": 70});
# Specify inputs for the model
# ingredients = recipes[['Flour', 'Milk', 'Sugar', 'Butter', 'Egg', 'Baking Powder', 'Vanilla', 'Salt']].to_numpy()
ingredients = recipes[['Flour', 'Sugar']].to_numpy()  # as_matrix() was removed in pandas 1.0
# Encode the labels: 0 = muffin, 1 = cupcake
type_label = np.where(recipes['Type'] == 'Muffin', 0, 1)
# Feature names
recipe_features = recipes.columns.values[1:].tolist()
recipe_features
STEP 4: Fit the Model
# Fit the SVM model
model = svm.SVC(kernel='linear')
model.fit(ingredients, type_label)
STEP 5: Visualize the Results
# Get the separating hyperplane
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(30, 60)
yy = a * xx - (model.intercept_[0]) / w[1]
# Plot the parallels to the separating hyperplane that pass through the support vectors
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])
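The slope and intercept computed above come from rearranging the hyperplane equation w[0]*x + w[1]*y + b = 0 into y = -(w[0]/w[1])*x - b/w[1]. The sketch below checks that rearrangement on made-up points (not the recipe data): every point generated from the line should get a decision value of roughly zero. The names `demo_model`, `slope`, and `offset` are illustrative, and the large `C` is an assumption that makes the toy fit behave like a hard-margin SVM.

```python
import numpy as np
from sklearn import svm

# Made-up, clearly separable points for illustration only.
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
demo_model = svm.SVC(kernel='linear', C=1000).fit(X, y)

w = demo_model.coef_[0]
b = demo_model.intercept_[0]

# Rearranged hyperplane: y = slope * x + offset
slope = -w[0] / w[1]
offset = -b / w[1]

# Points generated from the line should sit exactly on the boundary.
xs = np.linspace(0, 6, 5)
ys = slope * xs + offset
on_line = demo_model.decision_function(np.column_stack([xs, ys]))

print("slope:", slope, "offset:", offset)
print("decision values on the line:", on_line)
```

The dashed lines in the plot use the same slope but pass through a support vector, which is why the code above only needs to recompute the intercept for `yy_down` and `yy_up`.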
# Plot the hyperplane
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black');
# Look at the margins and support vectors
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=80, facecolors='none');
STEP 6: Predict New Cases
# Create a function to guess when a recipe is a muffin or a cupcake
def muffin_or_cupcake(flour, sugar):
    # predict() returns an array with one label per input row
    if model.predict([[flour, sugar]])[0] == 0:
        print('You\'re looking at a muffin recipe!')
    else:
        print('You\'re looking at a cupcake recipe!')
# Predict if 50 parts flour and 20 parts sugar
muffin_or_cupcake(50, 20)
# Plot the point to visually see where the point lies
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(50, 20, 'yo', markersize=9);
# Predict if 40 parts flour and 20 parts sugar
muffin_or_cupcake(40,20)
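A variant that returns the label instead of printing it is easier to reuse, for example when classifying several recipes at once. The sketch below is self-contained, so it trains on made-up (flour %, sugar %) pairs rather than on the CSV from this post; `toy_model` and `classify_recipe` are hypothetical names. It also shows `decision_function`, whose sign says which side of the hyperplane a recipe falls on.

```python
import numpy as np
from sklearn import svm

# Made-up (flour %, sugar %) pairs for illustration; NOT the recipe data from the post.
toy_ingredients = np.array(
    [[55, 10], [50, 12], [52, 8], [38, 25], [35, 28], [40, 22]], dtype=float
)
toy_labels = np.array([0, 0, 0, 1, 1, 1])  # same encoding as above: 0 = muffin, 1 = cupcake

toy_model = svm.SVC(kernel='linear').fit(toy_ingredients, toy_labels)


def classify_recipe(model, flour, sugar):
    """Return 'Muffin' or 'Cupcake' for one recipe instead of printing."""
    label = model.predict([[flour, sugar]])[0]
    return 'Muffin' if label == 0 else 'Cupcake'


# Classify several recipes in one call.
batch = np.array([[54, 9], [36, 27]], dtype=float)
batch_labels = toy_model.predict(batch)
# Negative score = muffin side, positive score = cupcake side.
batch_scores = toy_model.decision_function(batch)

print(classify_recipe(toy_model, 54, 9))
print(batch_labels, batch_scores)
```

A score close to zero means the recipe sits near the hyperplane, so its predicted label deserves less trust than one far from the line.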
Here is the complete program:
# Packages for analysis
import pandas as pd
import numpy as np
from sklearn import svm
# Packages for visuals
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(font_scale=1.2)
# Allows charts to appear in the notebook
%matplotlib inline
# Pickle package
import pickle
# Read in muffin and cupcake ingredient data
recipes = pd.read_csv('recipes_muffins_cupcakes.csv')
recipes
# Plot two ingredients
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type',
           palette='Set1', fit_reg=False, scatter_kws={"s": 70});
# Specify inputs for the model
# ingredients = recipes[['Flour', 'Milk', 'Sugar', 'Butter', 'Egg', 'Baking Powder', 'Vanilla', 'Salt']].to_numpy()
ingredients = recipes[['Flour', 'Sugar']].to_numpy()  # as_matrix() was removed in pandas 1.0
# Encode the labels: 0 = muffin, 1 = cupcake
type_label = np.where(recipes['Type'] == 'Muffin', 0, 1)
# Feature names
recipe_features = recipes.columns.values[1:].tolist()
recipe_features
# Fit the SVM model
model = svm.SVC(kernel='linear')
model.fit(ingredients, type_label)
# Get the separating hyperplane
w = model.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(30, 60)
yy = a * xx - (model.intercept_[0]) / w[1]
# Plot the parallels to the separating hyperplane that pass through the support vectors
b = model.support_vectors_[0]
yy_down = a * xx + (b[1] - a * b[0])
b = model.support_vectors_[-1]
yy_up = a * xx + (b[1] - a * b[0])
# Plot the hyperplane
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black');
# Look at the margins and support vectors
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(xx, yy_down, 'k--')
plt.plot(xx, yy_up, 'k--')
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
            s=80, facecolors='none');
# Create a function to guess when a recipe is a muffin or a cupcake
def muffin_or_cupcake(flour, sugar):
    # predict() returns an array with one label per input row
    if model.predict([[flour, sugar]])[0] == 0:
        print('You\'re looking at a muffin recipe!')
    else:
        print('You\'re looking at a cupcake recipe!')
# Predict if 50 parts flour and 20 parts sugar
muffin_or_cupcake(50, 20)
# Plot the point to visually see where the point lies
sns.lmplot(x='Flour', y='Sugar', data=recipes, hue='Type', palette='Set1', fit_reg=False, scatter_kws={"s": 70})
plt.plot(xx, yy, linewidth=2, color='black')
plt.plot(50, 20, 'yo', markersize=9);
# Predict if 40 parts flour and 20 parts sugar
muffin_or_cupcake(40, 20)
The complete Python script can be downloaded from Resep.ipynb.
___---HAPPY EXPERIMENTING, HOPE THIS HELPS---___
Could you explain each part of the program, please?
Different data means a different case, so an explanation would help readers understand what each piece of the code is for.
Apologies in advance.
What software did you use to write this code?
Reply: Jupyter Notebook, Google Colab
Which folder does the CSV file need to go in?
Reply: Just click the link to Google Drive and I will approve the request.
Sorry, the Drive link for the .csv asks for an access code.
I would love to have the file, please.
Reply: Please download it from the link provided and I will approve the request.
Why isn't the line between the 2 classes vertical? A vertical line would also split them in two, and wouldn't its margin be larger?
Please approve my request, thank you.
I am requesting access to the file, please.
Please approve, thank you.
Sorry, please approve, thank you.