Has anyone done work on adaptive data collection to reduce the number of redundant data samples? We have a situation where the data is fairly well grouped, so each new sample is usually similar to samples already taken. Collecting enough samples to cover the sample space results in a roughly Gaussian distribution, with most of the samples near the "average" of the data space and only a few data points covering the far-away regions.
As a result, the network converges very quickly for new samples near the average, but can take some time to learn the outliers. What I am planning to do is start with a small training set and train the network. This network will probably perform poorly. I will then collect only those new samples that the network gets wrong or that fail some threshold criterion, add them to my original set, and retrain the network. Repeat this cycle until the network performs at the desired level. I expect this will create a data set that is evenly distributed over the data space, without clumping, and as a result the network should train faster with fewer samples. A rough sketch of the loop I have in mind follows.
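Here is a minimal Python sketch of that loop, just to make the idea concrete. The sample_source function, the error threshold, the batch sizes, and the scikit-learn MLPRegressor are all placeholders I picked for illustration; any trainable model and any pass/fail criterion would do.

import numpy as np
from sklearn.neural_network import MLPRegressor  # stand-in model; any trainable estimator works

def adaptive_collect(sample_source, initial_n=100, batch_n=500,
                     error_threshold=0.1, max_rounds=20):
    """Grow the training set with only the samples the current
    network gets wrong (error above threshold)."""
    # sample_source(n) is assumed to return (X, y) for n fresh samples
    X_train, y_train = sample_source(initial_n)           # small seed set
    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)

    for _ in range(max_rounds):
        model.fit(X_train, y_train)                        # (re)train on current set

        X_new, y_new = sample_source(batch_n)              # draw candidate samples
        errors = np.abs(model.predict(X_new) - y_new)      # per-sample error
        hard = errors > error_threshold                    # keep only the ones it misses

        if not hard.any():                                 # passes the criterion everywhere
            break
        X_train = np.vstack([X_train, X_new[hard]])        # add only the hard samples
        y_train = np.concatenate([y_train, y_new[hard]])

    return model, X_train, y_train

The point of structuring it this way is that dense regions of the data space stop contributing new training points once the network already covers them, so the stored set ends up spread more evenly over the space.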