Related to ML
Explanations, examples, code snippets, and other useful info related to Machine Learning in Spark
Standardization - Standard Scaler
import numpy as np
import matplotlib.pyplot as plt
#create sample data
lst=[3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,8,9,9,10,11,12,13,14,15,16,17]
#compute the standardized sample data: (x - mu) / sigma
lst2= [(i-np.mean(lst))/np.std(lst) for i in lst]
#plot them
_,ax=plt.subplots(nrows=1,ncols=2)
ax[0].hist(lst)
ax[0].set_title("data as-is")
ax[1].hist(lst2)
ax[1].set_title("standardized data")
plt.show()
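As a quick sanity check on the transformation above (a small sketch added here, not part of the original snippet): after subtracting the mean and dividing by the standard deviation, the data should have mean ≈ 0 and standard deviation ≈ 1, while keeping the same shape.

```python
import numpy as np

# same sample data as above
lst = [3]*26 + [4]*14 + [5]*8 + [6]*5 + [7]*4 + [8]*4 + [9]*2 + \
      [10, 11, 12, 13, 14, 15, 16, 17]

# standardize: (x - mu) / sigma
lst2 = [(i - np.mean(lst)) / np.std(lst) for i in lst]

# standardization centers the data at 0 with unit variance,
# but does not change the shape of the distribution
print(round(float(np.mean(lst2)), 6))  # ≈ 0.0
print(round(float(np.std(lst2)), 6))   # ≈ 1.0
```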
#create the logarithm of original data
lst3=[np.log(i) for i in lst]
#plot them
_,ax=plt.subplots(nrows=1,ncols=2)
ax[0].hist(lst)
ax[0].set_title("data as-is")
ax[1].hist(lst3)
ax[1].set_title("log data")
plt.show()
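One caveat on the log transform (an added note, not from the original): `np.log` requires strictly positive values, which holds for this sample since it starts at 3. For data that can contain zeros, `np.log1p` (i.e. log(1 + x)) is a common alternative worth knowing:

```python
import numpy as np

# np.log(0) is -inf, so for data with zeros use log(1 + x) instead;
# the sample data above starts at 3, so plain np.log is fine there
data = [0, 3, 9, 27]
shifted_logs = [float(np.log1p(x)) for x in data]
print([round(v, 4) for v in shifted_logs])
```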

Creating Categories, and Their Labels, Manually
How to include zip codes in a ML model