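sklearn already provides many transformers. To write your own, define a new class that implements three methods: fit(), transform(), and fit_transform().
Adding TransformerMixin as a base class gives you fit_transform() for free;
adding BaseEstimator as a base class gives you two methods that are useful for automatic hyperparameter tuning: get_params() and set_params().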
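As a quick illustration of what the two base classes contribute, here is a minimal sketch. The ColumnScaler class and its factor parameter are hypothetical names made up for this example; only BaseEstimator, TransformerMixin, and the three inherited methods come from sklearn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnScaler(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that multiplies every value by `factor`."""

    def __init__(self, factor=1.0):
        # Store each constructor argument under the same attribute name,
        # unmodified, so get_params()/set_params() can discover it.
        self.factor = factor

    def fit(self, X, y=None):
        # Nothing to learn from the data in this toy example.
        return self

    def transform(self, X):
        return np.asarray(X) * self.factor


scaler = ColumnScaler(factor=2.0)
X = np.array([[1.0, 2.0], [3.0, 4.0]])

params = scaler.get_params()        # inherited from BaseEstimator
scaler.set_params(factor=10.0)      # inherited from BaseEstimator
X_scaled = scaler.fit_transform(X)  # inherited from TransformerMixin
```

Note that the class itself only defines fit() and transform(); fit_transform(), get_params(), and set_params() are never written by hand.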
# Custom transformer that adds new combined attributes
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

# Column positions of the attributes used below
rooms_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6

class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        rooms_per_household = X[:, rooms_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household, bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]

attr_adder = CombinedAttributesAdder(add_bedrooms_per_room=True)
housing_extra_attribs = attr_adder.transform(housing.values)
pd.DataFrame(housing_extra_attribs, columns=['longitude', 'latitude', 'housing_median_age', 'total_rooms',
                                             'total_bedrooms', 'population', 'households', 'median_income',
                                             'ocean_proximity', 'rooms_per_household', 'population_per_household',
                                             'bedrooms_per_room']).head()
Compared with the original training set, the output has three additional attributes: rooms_per_household, population_per_household, and bedrooms_per_room.
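To see the effect of the add_bedrooms_per_room hyperparameter without loading the housing data, the transformer can be tried on a small array. This is only a sketch: the two rows in X_toy are made-up numbers, and it assumes the first eight numeric columns in the same order as above (ocean_proximity is left out).

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

rooms_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6


class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X, y=None):
        rooms_per_household = X[:, rooms_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household, bedrooms_per_room]
        return np.c_[X, rooms_per_household, population_per_household]


# Made-up rows: longitude, latitude, housing_median_age, total_rooms,
# total_bedrooms, population, households, median_income
X_toy = np.array([
    [-122.0, 37.0, 30.0, 1000.0, 200.0, 500.0, 250.0, 5.0],
    [-118.0, 34.0, 15.0, 3000.0, 600.0, 1800.0, 600.0, 3.0],
])

# Default setting: three extra columns are appended.
with_ratio = CombinedAttributesAdder(add_bedrooms_per_room=True).fit_transform(X_toy)

# Switch the hyperparameter off through set_params(): two extra columns.
adder = CombinedAttributesAdder()
adder.set_params(add_bedrooms_per_room=False)
without_ratio = adder.fit_transform(X_toy)

print(with_ratio.shape, without_ratio.shape)
```

Because the option is an ordinary constructor argument, it can be switched with set_params() in the same way a hyperparameter search would do it.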