The proposal has numerous flaws. Here is perhaps the biggest.
Suppose you are gathering data, and you see these values:
2,3,1
The mean so far is 6/3 = 2.
Then comes an outlier:
2,3,1,1000
So you replace it with the mean:
2,3,1,2
The next number is good:
2,3,1,2,7
Now the mean is 15/5 = 3. Wait a minute: the mean is now 3, but we replaced 1000 with a mean of 2, just because it happened to occur as the fourth value. What if we change the order of the samples?
2,3,1,7,1000
Now the mean prior to the 1000 is (2+3+1+7)/4 = 13/4 = 3.25. So should we replace 1000 with that mean instead?
The problem is that the false datum we are substituting in place of 1000 is dependent on the other data. That's an epistemological problem if the samples are supposed to represent independent measurements.
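The order dependence is easy to demonstrate. Here is a minimal sketch of the scheme being criticized; the function name and the `threshold` cutoff are hypothetical, chosen only to reproduce the two walkthroughs above:

```python
def substitute_outliers(samples, threshold=100):
    """Replace any sample above `threshold` with the running mean of
    the values accepted so far (the scheme being criticized)."""
    accepted = []
    for x in samples:
        if accepted and x > threshold:
            x = sum(accepted) / len(accepted)  # substitute running mean
        accepted.append(x)
    return accepted

print(substitute_outliers([2, 3, 1, 1000, 7]))  # 1000 becomes 2.0
print(substitute_outliers([2, 3, 1, 7, 1000]))  # same 1000 becomes 3.25
```

The same bad datum, 1000, is replaced by two different fabricated values depending purely on where it lands in the sequence.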
Then you have the obvious problem that you are not merely withholding data that don't fit your assumptions; you are falsifying them. When some unwanted result occurs, you increment n and substitute a fake value. This is wrong because n is supposed to be the count of samples. Now n represents the number of real samples plus the number of fudge values added to the data. It destroys the validity of every calculation involving n, even those which do not use the fudge values. Your n is a fudge value too!
Basically, trimming away results that don't fit is one thing (and it can be justified if it is done consistently, according to an algorithm, rather than according to the changing moods of the experimenter).
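A consistent, algorithmic trim looks something like this sketch (the 25% proportion and the function name are arbitrary choices for illustration): extremes are discarded by a fixed rule, no fake values are invented, and the divisor counts only real retained measurements.

```python
def trimmed_mean(samples, proportion=0.25):
    """Discard the lowest and highest `proportion` of the samples,
    then average what remains. The divisor counts only real,
    retained measurements -- nothing is fabricated."""
    ordered = sorted(samples)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)

print(trimmed_mean([2, 3, 1, 7, 1000]))  # drops 1 and 1000 -> 4.0
```

Crucially, the result is the same whatever order the samples arrive in, since the rule operates on the sorted set.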
Outright falsifying results is objectionable on philosophical, epistemological and ethical grounds.
There may be some extenuating circumstances, which have to do with how the results are used. For instance, say that this substitution of outliers by the current mean is part of some embedded computer's algorithm which implements a closed-loop control system. (It samples some system outputs, then adjusts inputs in order to achieve control.) Everything is real time, so something must be supplied for a given time period in the place of missing data. If this fudging helps to overcome glitches and ensures smooth operation, then all is good.
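As a sketch of that legitimate use (the function name, the glitch test, and the smoothing constant are all hypothetical): a real-time loop must emit an estimate every tick, so a glitched sample is replaced by the current running average rather than stalling the controller.

```python
def smooth_sensor(read_sample, steps, alpha=0.2, glitch_limit=100):
    """Real-time loop: the controller needs a value every tick.
    A glitched reading (here: |x| > glitch_limit) is replaced by
    the current exponential moving average so the loop never sees
    a wild spike and never stalls waiting for good data."""
    ema = None
    out = []
    for _ in range(steps):
        x = read_sample()
        if ema is not None and abs(x) > glitch_limit:
            x = ema  # substitute the running estimate for the glitch
        ema = x if ema is None else alpha * x + (1 - alpha) * ema
        out.append(x)
    return out

readings = iter([2, 3, 1000, 4])
print(smooth_sensor(readings.__next__, 4))  # the 1000 spike is smoothed away
```

Here the substituted value never re-enters any offline statistic; it only keeps the loop running, which is the distinction the paragraph above is drawing.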
Here is another example, from digital telephony: PLC (packet loss concealment). Crap happens, and packets get lost, yet communication is real time. PLC synthesizes fake pieces of voice based on recent pitch information from correctly received packets. So if a speaker was saying the vowel "aaa" and then a packet was lost, PLC can pad the missing packet by extrapolating the "aaa" for the frame duration (say 5 or 10 milliseconds or whatever). The "aaa" is such that it resembles the speaker's voice. This is analogous to using a "mean" to substitute for values regarded as bad. It's a good thing; it's better than the sound cutting in and out, and helps intelligibility.
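A crude sketch of the concealment idea (real PLC extrapolates using pitch information; this sketch just repeats the last good frame, and all names are hypothetical):

```python
def conceal(frames, frame_len=80):
    """frames: a list of audio frames (lists of samples), with None
    marking a lost packet. Each lost frame is replaced by repeating
    the last good frame; silence is used if nothing has arrived yet."""
    last = None
    out = []
    for f in frames:
        if f is None:
            f = last if last is not None else [0] * frame_len
        else:
            last = f
        out.append(f)
    return out

print(conceal([[1, 2], [3, 4], None, [5, 6]]))
```

The synthesized frame is plausible rather than true, and everyone involved knows it: the goal is continuity of the signal, not an accurate record of what was received.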
If the fudging of data is part of a program of lying to people to cover up failing work, that's something else.
So, we cannot think about it independently of the application: how are the statistics being used? Will the substitutions lead to invalid conclusions? Are there ethical implications?