7/21/2017

Do some data preprocessing to improve the model

Today, I did not figure out how to improve the model itself. I only removed 66 of the 4359 data points as outliers, which gave a slightly better result.

Summary:

First, I changed the order in which the 8 outputs are predicted. I swapped perm 4 and perm 3 so that both cond and perm follow the order 2, 1, 3, 4, which is more reasonable. The 8 revised models are as follows.

ANN1: log data   to   cond 2;
ANN2: log data + cond 2   to   cond 1;
ANN3: log data + cond 2 1   to   cond 3;
ANN4: log data + cond 2 1 3   to   cond 4;
ANN5: log data + cond 2 1 3 4   to   perm 2;
ANN6: log data + cond 2 1 3 4 + perm 2   to   perm 1;
ANN7: log data + cond 2 1 3 4 + perm 2 1   to   perm 3;
ANN8: log data + cond 2 1 3 4 + perm 2 1 3   to   perm 4.
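
The chain above can be sketched in a few lines of Python. This is only a minimal illustration of the data flow, not my actual ANN code: `fit_mean` is a hypothetical placeholder trainer that always predicts the mean, and `train_chain` / `predict_chain` are made-up names. Any real trainer with the same signature (rows and targets in, predictor out) could be swapped in.

```python
from typing import Callable, Dict, List

Row = List[float]
Model = Callable[[Row], float]
# Assumed trainer signature: feature rows + targets -> predictor.
Fit = Callable[[List[Row], List[float]], Model]

# Prediction order taken from the ANN1..ANN8 list above.
ORDER = ["cond 2", "cond 1", "cond 3", "cond 4",
         "perm 2", "perm 1", "perm 3", "perm 4"]


def fit_mean(rows: List[Row], y: List[float]) -> Model:
    """Placeholder trainer (NOT an ANN): always predicts the mean target."""
    mean = sum(y) / len(y)
    return lambda row: mean


def train_chain(logs: List[Row], targets: Dict[str, List[float]],
                fit: Fit = fit_mean, order: List[str] = ORDER) -> List[Model]:
    """Train one model per output; each model also sees the
    predictions of all earlier models as extra input columns."""
    features = [list(row) for row in logs]
    models = []
    for name in order:
        model = fit(features, targets[name])
        models.append(model)
        preds = [model(row) for row in features]
        # Append this model's predictions as a new input column.
        features = [row + [p] for row, p in zip(features, preds)]
    return models


def predict_chain(models: List[Model], logs: List[Row]) -> List[List[float]]:
    """Run the trained models in order, feeding predictions forward."""
    features = [list(row) for row in logs]
    outputs = []
    for model in models:
        preds = [model(row) for row in features]
        outputs.append(preds)
        features = [row + [p] for row, p in zip(features, preds)]
    return outputs
```

With this layout, ANN1 sees only the log data and ANN8 sees the log data plus seven extra columns.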

The results improved. The 8 R2 values on the testing data, in the above order, are 0.91, 0.92, 0.88, 0.87, 0.71, 0.70, 0.65, 0.61.
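
For reference, here is a small sketch of how an R2 value can be computed, assuming R2 means the usual coefficient of determination (1 minus the residual sum of squares over the total sum of squares). The function name `r2_score` is just illustrative.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.
    Assumes y_true is not constant (otherwise SS_tot is zero)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and always predicting the mean of the true values gives 0.0.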

5 of them are higher, 2 are the same, and only 1 (perm 4) is lower.

For now, I predict one output and add the predicted values to the next model's inputs, both to train and to test that model. Next week, I plan to first build all eight models with the original data and then use the predicted data to calculate the outputs directly. I think it may help.
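
A rough sketch of that planned variant, under my current understanding of it: each model is trained on the original (measured) values of the earlier outputs, and predicted values are fed forward only when calculating outputs. The names `train_on_true_values`, `predict_with_predictions` and the placeholder trainer `fit_mean` are hypothetical, not my actual code.

```python
from typing import Callable, Dict, List

Row = List[float]
Model = Callable[[Row], float]
# Assumed trainer signature: feature rows + targets -> predictor.
Fit = Callable[[List[Row], List[float]], Model]

ORDER = ["cond 2", "cond 1", "cond 3", "cond 4",
         "perm 2", "perm 1", "perm 3", "perm 4"]


def fit_mean(rows: List[Row], y: List[float]) -> Model:
    """Placeholder trainer (NOT an ANN): always predicts the mean target."""
    mean = sum(y) / len(y)
    return lambda row: mean


def train_on_true_values(logs: List[Row], targets: Dict[str, List[float]],
                         fit: Fit = fit_mean,
                         order: List[str] = ORDER) -> List[Model]:
    """Train every model with the original values of earlier outputs."""
    features = [list(row) for row in logs]
    models = []
    for name in order:
        models.append(fit(features, targets[name]))
        # Append the original (true) values, not predictions.
        features = [row + [t] for row, t in zip(features, targets[name])]
    return models


def predict_with_predictions(models: List[Model],
                             logs: List[Row]) -> List[List[float]]:
    """Calculate outputs in order, feeding each prediction to the next model."""
    features = [list(row) for row in logs]
    outputs = []
    for model in models:
        preds = [model(row) for row in features]
        outputs.append(preds)
        features = [row + [p] for row, p in zip(features, preds)]
    return outputs
```

The only difference from the current approach is which column gets appended during training: the original value instead of the prediction.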

Next week, I plan to try the idea above.



