Ensemble learning is a machine learning technique wherein multiple models, often called experts or base classifiers, are combined into an ensemble to improve predictive performance. Once trained, each individual model in the ensemble produces a prediction for an unseen data point, and these predictions are then aggregated in some way. For regression problems, the aggregation is typically the arithmetic mean of the predictions, while for classification problems it is usually the mode, i.e. the most frequently predicted class. The quintessential example of an ensemble model is the random forest: a multitude of decision tree classifiers are trained on different subsets of the training data (and typically on random subsets of the features, so that the trees' split points differ from model to model) and combined into a single ensemble to make predictions.
Ensemble methods provide an effective way to learn in environments with too much data as well as too little data. The 'majority vote' technique used to aggregate ensemble member predictions is also effective at reducing generalization error, because it is unlikely that every member of the ensemble will make the same errors on the test set or on unseen data.
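As a quick illustration of the two aggregation rules mentioned above, the sketch below (plain NumPy, independent of any framework, using made-up member predictions) averages the members' outputs for a regression ensemble and takes a majority vote for a classification ensemble.

import numpy as np
from collections import Counter

# predictions from three hypothetical ensemble members for four data points
regression_preds = np.array([[1.0, 2.0, 3.0, 4.0],
                             [1.2, 1.8, 3.1, 4.3],
                             [0.9, 2.1, 2.8, 3.9]])
classification_preds = np.array([[0, 1, 1, 0],
                                 [0, 1, 0, 0],
                                 [1, 1, 1, 0]])

# regression: aggregate with the arithmetic mean across members
mean_agg = regression_preds.mean(axis=0)

# classification: aggregate with the mode (majority vote) across members
mode_agg = [Counter(col).most_common(1)[0][0] for col in classification_preds.T]

print(mean_agg)   # approximately [1.03 1.97 2.97 4.07]
print(mode_agg)   # [0, 1, 1, 0]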
PyML-Ensemble
This post introduces a very basic framework called PyML-Ensemble. PyML-Ensemble provides the scaffolding to create adaptable machine learning ensembles. The ensembles are adaptable in that models can be added to or removed from the ensemble at any time, even after training is complete. This allows the creation of real-time ensembles that are retrained or otherwise modified as more data become available. Some examples of adaptability in ensemble methods are provided below.
The code for this project can be found on GitHub and is open to contributions to extend or fix its functionality.
Inspiration and Overview
Creating and using ensembles is fairly straightforward: create n subsets of the training (and test) data, train n different models, collect the models into an assembly (e.g. a list of models in Python), pass each data point into every model and record its prediction, and finally aggregate the predictions in some way to obtain the ensemble's prediction.
In a past research project, Adaptive Deep Learning Ensembles (ADLE), I used ensemble learning in an attempt to improve the forecasting accuracy of neural networks trained on non-stationary time-series data. To do this, I used the Python programming language along with the typical suite of data science and machine learning libraries (e.g. Keras, NumPy, Pandas). Recently, I took another look at the project and found that managing the ensembles with plain Python, although straightforward, had turned into a bit of a mess. Because of this, I decided to create an easy-to-use framework for building ensembles of any type of learning algorithm in Python. The framework provides some basic functionality for working with ensembles and a few abstract base classes (ABCs) that can be used to create custom aggregation techniques and base models.
The two ABCs provided by the framework are the aggregator and model classes.
import abc

class Aggregator(abc.ABC):
    def __init__(self):
        pass

    @abc.abstractmethod
    def combine(self, predictions):
        pass
As seen here, the aggregator ABC requires only that any derived aggregator class provide a combine() method, which takes as input the predictions from each ensemble member and outputs the actual (combined, or aggregated) prediction.
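For example, a hypothetical MedianAggregator (not part of the framework; the import path for the Aggregator ABC is assumed here to sit alongside the built-in aggregators) only needs to subclass Aggregator and implement combine():

import numpy as np

# assumption: the Aggregator ABC is importable alongside the built-in aggregators
from pyml_ensemble.aggregator import Aggregator

class MedianAggregator(Aggregator):
    # hypothetical aggregator: combine member predictions with the median
    def combine(self, predictions):
        # predictions holds one prediction (or array of predictions) per member
        return np.median(np.asarray(predictions), axis=0)

Such an aggregator would then be attached to an ensemble in the same way as the built-in ones, e.g. ensemble.set_aggregator(MedianAggregator()).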
import abc

class Model(abc.ABC):
    def __init__(self):
        pass

    @abc.abstractmethod
    def train(self, x, y):
        pass

    @abc.abstractmethod
    def get_prediction(self, x):
        pass
Similarly, the model ABC forces any subclasses to provide methods to create and manage a model. The classes must be able to train the underlying model given some training inputs (x) and targets (y) and to produce a prediction given additional input data.
Although simple, these two classes provide a basic framework that keeps the development of an ensemble focused on the task at hand. Furthermore, basic implementations of both the model and aggregator classes are provided by the framework. The implementations of the aggregator ABC are the MeanAggregator, which averages the predictions (best suited to regression tasks), and the ModeAggregator, which uses the most frequently predicted value as the ensemble's prediction (created with classification tasks in mind). The framework provides one built-in model: a decision tree. This model is built on top of the scikit-learn decision tree which, unfortunately, makes scikit-learn a required dependency. To keep dependencies limited, no other models were built into the framework.
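To give an idea of what such a wrapper looks like, here is a minimal sketch of a decision-tree model conforming to the model ABC. This is an illustration written for this post, not the framework's actual TreeModel source, whose details may differ.

from sklearn.tree import DecisionTreeClassifier

from pyml_ensemble.model import Model

class SimpleTreeModel(Model):
    # hypothetical tree wrapper conforming to the model ABC
    def __init__(self, **tree_kwargs):
        super().__init__()
        # forward any keyword arguments to the underlying scikit-learn tree
        self.tree = DecisionTreeClassifier(**tree_kwargs)

    def train(self, x, y):
        self.tree.fit(x, y)

    def get_prediction(self, x):
        return self.tree.predict(x)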
Below, examples of using these base classes, the built-in implementations, and the ensemble class are provided.
Creating a Basic Ensemble
Random Forest
Perhaps the most popular ensemble is the random forest: a collection (ensemble) of classification or regression trees, each trained on a different subset of the data, which together typically yield more predictive power than a single decision tree. Although random forests are already implemented in popular machine learning libraries such as scikit-learn, creating one using only the built-in TreeModel from pyml_ensemble makes a good first example. The breast cancer dataset, which is built into scikit-learn, will be used for training and testing.
For starters, the classes from pyml_ensemble and some helper methods from scikit-learn need to be imported.
# methods to manage the ensemble
from pyml_ensemble import Ensemble
# classification task aggregator
from pyml_ensemble.aggregator import ModeAggregator
# the tree model built from sklearn's DecisionTreeClassifier
from pyml_ensemble.model import TreeModel

# the built-in breast cancer dataset
from sklearn.datasets import load_breast_cancer
# easily split train and test data
from sklearn.model_selection import train_test_split
# determine how well the ensemble performs
from sklearn import metrics
With the required classes imported, the ensemble can be built and trained and predictions can be found for the out-of-sample test data.
if __name__ == '__main__':
    bc = load_breast_cancer()  # load the dataset

    # split the data into test and train sets
    bc_trainx, bc_testx, bc_trainy, bc_testy = \
        train_test_split(bc.data, bc.target, test_size=0.33)

    # create an ensemble and set the aggregator to use the
    # most predicted class
    ensemble = Ensemble()
    ensemble.set_aggregator(ModeAggregator())

    # add 10 models to the ensemble
    number_models = 10
    for i in range(number_models):
        ensemble.add_model(TreeModel())

    # train the models, all on the same data for now
    ensemble.train([bc_trainx for _ in range(number_models)],
                   [bc_trainy for _ in range(number_models)])

    # get the predictions from the ensemble, this method
    # uses the aggregator previously set on the ensemble
    y_hat = ensemble.predict(bc_testx)

    # print the accuracy of the ensemble
    print(metrics.accuracy_score(bc_testy, y_hat))
As seen here, in just a few lines of code an ensemble of 10 decision trees can be created to make predictions on the dataset. In the following sections, more intricate ensembles and their implementations in this framework will be explored. One caveat: because every tree above is trained on the same data, this is not quite a true random forest; in a random forest, each tree is fit on its own bootstrap sample of the training data, as sketched below.
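A minimal way to do that, reusing the variable names from the listing above, is to resample the training set with replacement once per tree before calling ensemble.train():

import numpy as np

# draw a bootstrap sample (sampling with replacement) for each tree
bootstrap_x, bootstrap_y = [], []
for _ in range(number_models):
    idx = np.random.choice(bc_trainx.shape[0], size=bc_trainx.shape[0], replace=True)
    bootstrap_x.append(bc_trainx[idx])
    bootstrap_y.append(bc_trainy[idx])

# train each tree on its own resampled view of the training data
ensemble.train(bootstrap_x, bootstrap_y)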
Neural Network Ensemble
Ensembles of more expressive base models are sometimes necessary for complicated machine learning tasks. In this section, a neural network model is created with the framework's model ABC. This model is then used to create several ensembles, each with a different number of neural network members, and the results of these ensembles are compared.
To begin, the artificial neural network model (ANNModel) is created. Keras is used to build this model, which is why it is not included in the framework: it would pull in a slew of other large dependencies (e.g. TensorFlow and NumPy).
from keras.models import Sequential
from keras.layers import Dense, Activation

from pyml_ensemble.model import Model

class ANNModel(Model):  # "extends" the ABC
    def __init__(self, input_size, num_hidden_layers, hidden_layer_sizes,
                 output_size, epochs=50, batch_size=1, fit_verbose=2,
                 variables=None, weight_file=''):
        super().__init__()
        self.input_size = input_size
        self.num_hidden_layers = num_hidden_layers
        self.hidden_layer_sizes = hidden_layer_sizes
        self.output_size = output_size
        self.epochs = epochs
        self.batch_size = batch_size
        self.verbose = fit_verbose
        self.weight_file = weight_file
        self.build_model()

    def build_model(self):
        self.model = Sequential()
        self.model.add(Dense(self.hidden_layer_sizes[0],
                             input_shape=(self.input_size, ),
                             activation='sigmoid'))
        for i in range(1, self.num_hidden_layers - 1):
            self.model.add(Dense(self.hidden_layer_sizes[i], activation='sigmoid'))
        self.model.add(Dense(self.hidden_layer_sizes[len(self.hidden_layer_sizes) - 1],
                             activation='sigmoid'))
        self.model.add(Dense(self.output_size, activation='sigmoid'))
        self.model.compile(loss='mean_squared_error', optimizer='adam')

    def train(self, x, y):
        self.history = self.model.fit(x, y, epochs=self.epochs,
                                      batch_size=self.batch_size,
                                      verbose=self.verbose, shuffle=False)

    def get_prediction(self, x):
        return self.model.predict(x)

    def load_weights(self):
        # load previously saved weights from the weight file
        self.model.load_weights(self.weight_file)

    def save_weights(self):
        self.model.save_weights(self.weight_file)

    def set_weight_filename(self, filename):
        self.weight_file = filename
The model is very straightforward. The input size, number of hidden layers, number of neurons per hidden layer, and output size are required as input and are used to build the neural network. In this case, the hidden layers and the output layer all use sigmoid activations to clamp the values between 0 and 1. This is probably not the best way to build the network for a classification task (especially the output layer and loss function), but it works well enough for this quick example.
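For reference, a more conventional setup for binary classification pairs a single sigmoid output with a binary cross-entropy loss (and often ReLU hidden layers). The sketch below shows a hypothetical stand-alone build function along those lines; it is not part of ANNModel as written above.

from keras.models import Sequential
from keras.layers import Dense

def build_binary_classifier(input_size, hidden_layer_sizes):
    # sketch: ReLU hidden layers, sigmoid output, binary cross-entropy loss
    model = Sequential()
    model.add(Dense(hidden_layer_sizes[0], input_shape=(input_size,), activation='relu'))
    for size in hidden_layer_sizes[1:]:
        model.add(Dense(size, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model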
Next, the ensemble is created using this model:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn import metrics
import numpy as np

from pyml_ensemble import Ensemble
from pyml_ensemble.aggregator import MeanAggregator

from ann_model import ANNModel

ensemble = Ensemble()
aggregator = MeanAggregator()
ensemble.set_aggregator(aggregator)

bc = load_breast_cancer()
trainx, testx, trainy, testy = train_test_split(bc.data, bc.target, test_size=0.33)

num_models = 5
input_size = trainx.shape[1]
output_size = 1  # class
num_hidden_layers = 5    # 5 hidden layers
hidden_layer_sizes = 10  # 10 nodes per hidden layer
for i in range(num_models):
    ann = ANNModel(input_size, num_hidden_layers,
                   [hidden_layer_sizes for _ in range(num_hidden_layers)],
                   output_size, epochs=512, batch_size=8, fit_verbose=0)
    ensemble.add_model(ann)

ensemble.train([trainx for _ in range(num_models)], [trainy for _ in range(num_models)])

y_hat = ensemble.predict(testx)  # get predictions
y_hat = np.round(y_hat)          # set class to 0 or 1
print(metrics.accuracy_score(testy, y_hat))
With the ensemble created, the num_models parameter is set to 1, 5, or 10 and, for each setting, the ensemble is trained, predictions are made, and the accuracy is measured. A comparison of the results is shown below, and the main advantage of ensemble learning is apparent: adding more networks increases predictive accuracy, even on this small dataset, although the improvement diminishes as further networks are added. Note that a larger single network (more hidden layers with more nodes) could also have been used, but it may not have given results as good and could potentially take longer to train.
Number of Networks | Accuracy (% predicted correctly)
1 | 88.83%
5 | 92.71%
10 | 92.77%
Adaptive Ensembles
For some applications, e.g. complex time-series forecasting, it is beneficial, and in some cases necessary, to have an adaptive ensemble. Two examples of adaptive ensembles are i) an ensemble that is retrained as new training data become available and ii) an ensemble to which new models, trained on the new data, are added. These two types of ensembles allow the models to remain strong predictors as, for example, the statistical properties of the underlying data-generating function change, as is the case with non-stationary time-series data. PyML-Ensemble was created for just these types of ensembles, and examples of both are shown below. Note that, although these are the only types of dynamic ensembles being shown, the real purpose of this section is to show how this framework can be used to create any type of dynamic ensemble rather than being stuck with a static set of trained models.
For simplicity’s sake, the ensembles are still trained using the built-in breast cancer dataset.
In the first adaptive ensemble, the training and test data are split into smaller subsets. Each network in the ensemble is trained on a subset of the data. The test data is predicted in chunks rather than en masse and, as new data becomes available (i.e. old test data is used as new training data, as could be done in practice), new networks are added to the ensemble in an attempt to improve predictive accuracy. Comparisons of the performance are not presented here due to the length of this post and because this setup is strictly for demonstrative purposes; it probably shouldn't be done on this dataset in the real world.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn import metrics
import numpy as np

from pyml_ensemble import Ensemble
from pyml_ensemble.aggregator import MeanAggregator

from ann_model import ANNModel

# returns the model to be added to the ensemble
def get_ann_model(input_size, num_hidden_layers, hidden_layer_sizes, output_size):
    return ANNModel(input_size, num_hidden_layers,
                    [hidden_layer_sizes for _ in range(num_hidden_layers)],
                    output_size, epochs=1024, batch_size=4, fit_verbose=0)

ensemble = Ensemble()
aggregator = MeanAggregator()
ensemble.set_aggregator(aggregator)

bc = load_breast_cancer()
trainx, testx, trainy, testy = train_test_split(bc.data, bc.target, test_size=0.33)

num_models = 5
chunk_size = 80  # ~80 rows of data per network (all except last network)
trainx_chunks = [trainx[(i*chunk_size):((i+1)*chunk_size)] for i in range(num_models)]
trainy_chunks = [trainy[(i*chunk_size):((i+1)*chunk_size)] for i in range(num_models)]

input_size = trainx.shape[1]
output_size = 1  # class
num_hidden_layers = 5
hidden_layer_sizes = 10
for i in range(num_models):
    ann = get_ann_model(input_size, num_hidden_layers, hidden_layer_sizes, output_size)
    ensemble.add_model(ann)

# train on individual chunks rather than on the same data
ensemble.train(trainx_chunks, trainy_chunks)

# Iterate the test data. Get predictions and use the new chunks of data to add
# members to the ensemble.
#
# Unfortunately, the entire ensemble has to be re-trained when adding a
# new member. Training individual members is slated for a future release.
y_hat = None
for i in range(0, testx.shape[0], chunk_size):
    # get the current data subsets
    testx_chunk = testx[i:(i+chunk_size)]
    testy_chunk = testy[i:(i+chunk_size)]

    preds = ensemble.predict(testx_chunk)  # get predictions
    y_hat = preds if y_hat is None else np.vstack([y_hat, preds])

    # create, add, and train the new ensemble member
    ann = get_ann_model(input_size, num_hidden_layers, hidden_layer_sizes, output_size)
    ensemble.add_model(ann)
    trainx_chunks.append(testx_chunk)
    trainy_chunks.append(testy_chunk)
    ensemble.train(trainx_chunks, trainy_chunks)

y_hat = np.round(y_hat)
print(metrics.accuracy_score(testy, y_hat))
In the next example of an adaptive ensemble, all of the neural networks are trained on the entirety of the training dataset. Subsets of the test dataset are then predicted, and this newly seen data is added to the training dataset. All of the ensemble members are then re-trained using the newly available data combined with the pre-existing training data.
Essentially, this type of adaptation just updates the models as new training data becomes available. For example, an ensemble could be trained on historic stock price data and, as time passes with the ensemble in production, new price data can be recorded. This new price data can then be used to retrain the networks in the ensemble, potentially improving their predictive capabilities.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn import metrics
import numpy as np

from pyml_ensemble import Ensemble
from pyml_ensemble.aggregator import MeanAggregator

from ann_model import ANNModel

# get model for ensemble
def get_ann_model(input_size, num_hidden_layers, hidden_layer_sizes, output_size):
    return ANNModel(input_size, num_hidden_layers,
                    [hidden_layer_sizes for _ in range(num_hidden_layers)],
                    output_size, epochs=1024, batch_size=4, fit_verbose=0)

# create ensemble structure
ensemble = Ensemble()
aggregator = MeanAggregator()
ensemble.set_aggregator(aggregator)

# load and split data
bc = load_breast_cancer()
trainx, testx, trainy, testy = train_test_split(bc.data, bc.target, test_size=0.33)

# build the ensemble by adding neural network models
num_models = 3
input_size = trainx.shape[1]
output_size = 1  # class
num_hidden_layers = 5
hidden_layer_sizes = 10
for i in range(num_models):
    ann = get_ann_model(input_size, num_hidden_layers, hidden_layer_sizes, output_size)
    ensemble.add_model(ann)

# train on all available data
ensemble.train([trainx for _ in range(num_models)], [trainy for _ in range(num_models)])

# retrain on old + new data every 50 test data points
chunk_size = 50
y_hat = None
for i in range(0, testx.shape[0], chunk_size):
    # get the predictions and test data chunks
    testx_chunk = testx[i:(i+chunk_size)]
    testy_chunk = testy[i:(i+chunk_size)]

    preds = ensemble.predict(testx_chunk)
    y_hat = preds if y_hat is None else np.vstack([y_hat, preds])

    # append the newly available data to the training sets
    trainx = np.concatenate([trainx, testx_chunk])
    trainy = np.concatenate([trainy, testy_chunk])

    # re-train the existing members on old + new data
    ensemble.train([trainx for _ in range(num_models)], [trainy for _ in range(num_models)])

y_hat = np.round(y_hat)
print(metrics.accuracy_score(testy, y_hat))
Chimera Ensembles
This section demonstrates the inclusion of different types of models in a single ensemble, which is aptly named the chimera ensemble. I'm not familiar with any good reasons to use ensembles in this way, but it can still easily be done using the PyML-Ensemble framework if need be.
The familiar TreeModel and ANNModel from above are used in the chimera ensemble. One small modification is made to the ANNModel to get the predictions into the right format (to match the other models) when predicting with the ensemble. This modification is shown, in short, below.
...
import numpy as np
...

class ANNModel(Model):
    ...
    def get_prediction(self, x):
        preds = np.round(self.model.predict(x)).tolist()
        return [pred[0] for pred in preds]
The prediction from the neural network is now rounded prior to being returned to the ensemble's aggregator. Due to how the ModeAggregator works, the list of predictions also has to be flattened into a plain list of class labels before being returned. Again, this isn't the best way to do classification with neural networks, but it keeps the example simple.
A new classifier is also created to be added alongside the neural network and decision tree models:
from pyml_ensemble.model import Model

import random

class UselessClassifier(Model):
    def __init__(self, num_classes):
        super().__init__()
        self.num_classes = num_classes

    def train(self, x, y):
        pass  # no training needed

    def get_prediction(self, x):
        # Randomly select a class. Assumes classes are labeled 0 through n-1 and
        # a list or numpy array is used as input.
        return [random.randint(0, self.num_classes-1) for _ in range(len(x))]
Clearly, as indicated by the name, the UselessClassifier doesn’t really have a purpose. However, it is included to demonstrate how any classifier conforming to the model ABC can easily be used in conjunction with other models in the ensemble. It is also introduced to illustrate how, when using the framework provided, a machine learning practitioner can stray from the beaten path (i.e. neural networks, trees, etc.) to create and use any type of classifier they deem fit.
With this modification and the new model in place, the ensemble is created:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn import metrics
import numpy as np

from pyml_ensemble import Ensemble
from pyml_ensemble.aggregator import ModeAggregator
from pyml_ensemble.model import TreeModel

from ann_model import ANNModel
from useless import UselessClassifier

# split train/test data
bc = load_breast_cancer()
trainx, testx, trainy, testy = \
    train_test_split(bc.data, bc.target, test_size=0.33)

# set up the ensemble
ensemble = Ensemble()
ensemble.set_aggregator(ModeAggregator())

num_models = 3
ensemble.add_model(TreeModel())  # add a decision tree classifier
# add a neural network
ensemble.add_model(ANNModel(trainx.shape[1], 5, [10 for _ in range(5)],
                            1, epochs=512, batch_size=8, fit_verbose=0))
ensemble.add_model(UselessClassifier(2))  # add a useless classifier: 2 = two classes

# train the ensemble
ensemble.train([trainx for _ in range(num_models)],
               [trainy for _ in range(num_models)])

# get predictions
y_hat = ensemble.predict(testx)
print(metrics.accuracy_score(testy, y_hat))
Conclusion
In this post, the PyML-Ensemble framework has been introduced and described, and links to the PyPI page and the GitHub repository have been provided for those interested. Some examples of the framework's usage were given, but they are really only the tip of the iceberg for such an open-ended framework. The goal of this project is to provide a basic framework that handles the busy-work behind building and managing machine learning ensembles. In future releases, I plan to extend this functionality by providing additional built-in models and aggregators while keeping dependencies to a minimum. There are also a few conveniences that I would like to add to ensemble prediction and training.