The following Python 3 code snippet demonstrates how to implement a simple Random Forest classification model that predicts an output value from a set of input values.
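A Random Forest is an ensemble of decision trees, each trained on a bootstrap sample of the training rows. In scikit-learn's RandomForestClassifier the trees are combined by averaging their predicted class probabilities rather than by a hard majority vote. The minimal sketch below checks this on synthetic data from make_classification (illustrative only, not the CSV used later); it relies on the fitted trees that scikit-learn exposes in forest.estimators_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (illustrative, not the article's file).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Collect the class probabilities predicted by every individual tree.
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)

# The forest's probabilities are the mean of the trees' probabilities ...
same_proba = np.allclose(averaged, forest.predict_proba(X))
# ... and the predicted class is the one with the highest averaged probability.
same_labels = np.array_equal(forest.classes_[averaged.argmax(axis=1)], forest.predict(X))
print(same_proba, same_labels)
```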

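In the example, a tab-separated CSV file is first loaded. It contains several input columns (X) and a column with the value (Y) that the model is to predict; the Transactions column is converted into a binary label (0 = no transactions, 1 = at least one transaction). The data is then split into a training set and a test set, and the model is trained on the training set. The call model.score(X_test, Y_test) measures the quality of the trained model on the test data; for a classifier it returns the mean accuracy, i.e. the fraction of test rows whose label is predicted correctly. The attribute model.feature_importances_ holds an array with one importance value per input column.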
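To make the meaning of model.score and model.feature_importances_ concrete, the following sketch trains the same kind of model on synthetic data (generated with make_classification as an assumed stand-in for the article's file) and compares score() with an explicit accuracy computation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data standing in for the article's CSV file.
X, y = make_classification(n_samples=300, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=77)

model = RandomForestClassifier(n_estimators=100, random_state=0, max_depth=10)
model.fit(X_train, y_train)

# For classifiers, score() is the mean accuracy on the given data ...
score = model.score(X_test, y_test)
# ... i.e. the fraction of test rows whose predicted label matches the true label.
manual = accuracy_score(y_test, model.predict(X_test))
print(score, manual)

# feature_importances_ holds one value per input column; the values sum to 1.
importances = model.feature_importances_
print(importances.round(3), importances.sum())
```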

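The example code requires the scikit-learn and pandas libraries (pip install scikit-learn pandas); NumPy is installed automatically as a dependency of pandas. Note that the old package name sklearn is deprecated on PyPI and should not be used with pip.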


from sklearn.ensemble import RandomForestClassifier
from sklearn import model_selection
import pandas as pd
import numpy as np

# load csv data
input_data_1 = pd.read_csv("prediction-input-1.txt", sep="\t")

# prepare column to be predicted: binarize Transactions
# (a count of 0 stays 0, any non-zero count becomes 1)
input_data_1['Transactions'] = np.where(input_data_1['Transactions'] == 0, 0, 1)

Y = np.array(input_data_1['Transactions'])

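# select features (= columns used as input for prediction)
features = list(input_data_1.columns)

# drop the columns at positions 5 and 0 (presumably the target 'Transactions'
# and an identifier column; adjust to the layout of your file). The higher
# index is deleted first so that the lower index still refers to the right column.
del features[5]
del features[0]
X = input_data_1[features]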

# split data into test and train set
test_size = 0.33

seed = 77
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(X, Y, test_size=test_size, random_state=seed)

# build and train model
model = RandomForestClassifier(n_estimators=100, n_jobs=2, random_state=0, max_depth=10)

model.fit(X_train, Y_train)

# test model quality (mean accuracy on the test set)
result = model.score(X_test, Y_test)

print(result)

# print vector with feature importances (one value per input column)
print(model.feature_importances_)
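Because prediction-input-1.txt is not included, the following self-contained sketch runs the same pipeline on a small synthetic tab-separated file held in memory. The column names (ID, Visits, PageViews, Duration, Bounces) and all values are hypothetical; they are chosen so that, as assumed in the code above, position 0 is an identifier and position 5 is the Transactions target. It also selects features by name and pairs each importance with its column name.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for prediction-input-1.txt (all values are synthetic).
rng = np.random.default_rng(0)
n = 300
frame = pd.DataFrame({
    "ID": np.arange(n),
    "Visits": rng.integers(0, 50, n),
    "PageViews": rng.integers(0, 200, n),
    "Duration": rng.normal(120.0, 30.0, n),
    "Bounces": rng.integers(0, 10, n),
})
# Make transactions depend on Visits so the forest has a pattern to learn.
frame["Transactions"] = np.where(frame["Visits"] > 25, rng.integers(1, 5, n), 0)
tsv_text = frame.to_csv(sep="\t", index=False)

# Load the tab-separated data as the article does, but from memory.
data = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Binarize the target: any non-zero transaction count becomes 1.
y = (data["Transactions"] != 0).astype(int).to_numpy()

# Select features by name instead of by position.
X = data.drop(columns=["ID", "Transactions"])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=77)

model = RandomForestClassifier(n_estimators=100, n_jobs=2, random_state=0, max_depth=10)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Pair each importance with its column name, most important first.
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(f"accuracy: {accuracy:.3f}")
print(importances)
```

Dropping columns by name with DataFrame.drop keeps the feature selection correct even if the column order of the input file changes, which the positional del features[...] approach does not.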
