4/28/2017
Write some of the case study
Today, I wrote some of the case study.
Summary:
https://1drv.ms/w/s!Ao543UQvyvOWjFGXAiwi6homt05n
Next week, I will finish writing the draft.
4/27/2017
Finish theory and methodology and start case study
Today, I finished the theory and methodology sections and started the case study.
Summary:
The following is my draft link:
https://1drv.ms/w/s!Ao543UQvyvOWjFGXAiwi6homt05n
Tomorrow, I will continue to write the paper.
4/26/2017
Write theory and methodology
Today, I wrote the theory and methodology sections.
Summary:
The following is my draft link:
https://1drv.ms/w/s!Ao543UQvyvOWjFGXAiwi6homt05n
Tomorrow, I will continue to write the paper.
4/24/2017
change the introduction and write some of the theory and methodology
Today, I changed my introduction and wrote parts of the theory and methodology.
Summary:
The following is my draft link:
https://1drv.ms/w/s!Ao543UQvyvOWjFCNBYyyIdsU-5_a
Tomorrow, I will continue to write the paper.
4/21/2017
Talk with Sang and change the abstract
Today, I talked with Sang and changed the abstract of my paper.
Summary:
Today, I talked with Sang. He gave me a lot of suggestions: he suggested that I read papers and learn from them, and he also showed me how to revise my sentences into more professional ones.
I reread some of my reference papers and revised the abstract of my paper.
The following is the link of my draft:
https://1drv.ms/w/s!Ao543UQvyvOWjE-TCxlFy5Jj_fbx
Next week, I will continue to change and write my paper.
4/20/2017
abstract, introduction and references
Today, I worked on the paper and finished the abstract, introduction, and references.
Summary:
The following is the link to the draft:
https://1drv.ms/w/s!Ao543UQvyvOWjE41weBd3unCLBpg
Tomorrow, I will continue to write the paper.
4/19/2017
Finish writing the outline and start to write the draft of the paper
Today, I finished writing the outline and started to write the draft of the paper.
Summary:
The following is the link to the draft:
https://1drv.ms/w/s!Ao543UQvyvOWjE0PxDnTE7kpCSqB
Tomorrow, I will continue to write the paper.
4/18/2017
find all related papers and write parts of the outline
Today, I found all the related papers and wrote parts of the outline.
Summary:
First, I found all the related papers and reviewed their abstracts again.
Second, I wrote two parts of the outline: introduction and methods.
The following is the outline I wrote today.
Outline
Top-level outline (sections)
1. Introduction – nobody has predicted these 6 parameters before. Some use 6 or 9 parameters to predict permeability or pore types.
2. Methods – build an ANN model to predict the 6 parameters.
Second-level outline (paragraphs)
1. Introduction – nobody has predicted these 6 parameters before.
1.1. Predicting the NMR T2 distribution can save time and money and reveals the characteristics of the reservoir.
1.2. Some use ANN and other models to predict porosity, permeability, water saturation, TOC, etc., but they did not predict NMR.
1.3. Some use ANN and other models to predict data related to NMR, such as free fluid, irreducible water, and effective porosity obtained from NMR.
1.4. Some use ANN and other models to predict bin porosities and T2 logarithmic mean values instead of the T2 distribution.
1.5. In this paper, 6 parameters are predicted; they closely reproduce the T2 distribution.
2. Methods – build an ANN model to predict the 6 parameters.
2.1. Select target depths in the Hess data; there are 7 different lithologies in the target depths.
2.2. Select 12 different logging data.
2.3. Select 10 different inversion data (QElan).
2.4. Set one category for the different lithologies (7 in total).
2.5. Set four different categories according to the shape of the T2 distribution; each category has 2 or 3 levels.
2.6. Use the k-nearest neighbor method to predict the four categories, with best k = 3.
2.7. Fit the T2 distribution with normal distributions. Each normal distribution corresponds to 3 parameters, so there are 6 parameters for each T2 distribution at each depth.
2.8. Show the accuracy of the fitting.
2.9. Preprocess the data.
2.10. Build an ANN model with 2 hidden layers.
2.11. Use R and MATLAB to build the model; the results are similar, but R takes much more time than MATLAB, so MATLAB is recommended for building and testing the model.
2.12. Build an ANN model with 2 hidden layers using MATLAB.
2.13. Show the accuracy of the model.
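The fitting step in 2.7 can be sketched as follows. This is a minimal Python illustration (my actual work uses MATLAB/R) that fits a synthetic two-peak T2 distribution with a sum of two normal components, 3 parameters each, giving 6 parameters total; the T2 axis and "true" parameters are made up for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical T2 axis: 64 logarithmically spaced bins, fit in log10(T2) space.
t2_axis = np.logspace(-1, 4, 64)   # ms (illustrative range)
x = np.log10(t2_axis)

def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
    """Sum of two normal components; each contributes 3 parameters
    (amplitude alpha, mean mu, std sigma) -> 6 parameters per depth."""
    g1 = a1 * np.exp(-0.5 * ((x - mu1) / s1) ** 2)
    g2 = a2 * np.exp(-0.5 * ((x - mu2) / s2) ** 2)
    return g1 + g2

# Synthetic two-peak distribution standing in for one measured depth.
true_params = (1.0, 0.5, 0.3, 0.6, 2.5, 0.4)
y = two_gaussians(x, *true_params)

popt, _ = curve_fit(two_gaussians, x, y, p0=(1, 0, 1, 1, 2, 1), maxfev=10000)
r2 = 1 - np.sum((y - two_gaussians(x, *popt)) ** 2) / np.sum((y - y.mean()) ** 2)
```

On clean synthetic data the fit is essentially exact, which mirrors the high fitting accuracy reported below (median R2 of 0.983 on 4/12).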
Tomorrow, I will try to finish the outline.
4/17/2017
Combine 6 parameters and 64 bins
Today, I combined the 6 parameters and the 64 bins to try to improve the prediction accuracy, but it did not show an improvement.
Summary:
I compared the predictions of both the 6 parameters and the 64 bins.
The first figure is the T2 distribution comparison for the 64 bins.
The second figure is the T2 distribution comparison for the 6 parameters.
The third is the 3-line comparison figure.
For the 6-parameter line, the main problem is that the predicted mu may be quite different from the actual one.
For the 64-bin line, the main problem is that the peak may be quite different from the actual one.
After moving the larger mu of the 6 parameters to the more accurate one from the 64-bin data, the result is a little better than the original 6-parameter prediction, but not better than the original 64-bin prediction. The reason is that although I adjust mu for the 6 parameters, this does not change alpha and sigma, both of which also affect the accuracy of the T2 distribution.
In my opinion, I should divide my findings into two parts. Maybe later, I can divide them into two papers.
First, I choose logging data, set different kinds of categories, predict these categories, fit the T2 distribution with the 6 normal-distribution parameters, and adapt the ANN model to predict the 6 parameters in order to predict the T2 distribution.
Second, I choose logging data, set different kinds of categories, predict these categories, and adapt the ANN model to predict the T2 distribution directly.
Although the median R2 of the 6 parameters is a little smaller than that of the 64 bins, it is also more stable: the median R2 of the 6 parameters drops from 0.7553 (training) to 0.7220 (testing), a drop of 0.0333, while the median R2 of the 64 bins drops from 0.8445 (training) to 0.7662 (testing), a drop of 0.0783.
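The stability comparison can be checked as simple arithmetic: the train-to-test drop in median R2 measures how much each model overfits (values copied from this entry).

```python
# Median R2 values reported in this entry.
r2_6param = {"train": 0.7553, "test": 0.7220}
r2_64bins = {"train": 0.8445, "test": 0.7662}

# Smaller drop = more stable model under the same train/test split.
drop_6  = round(r2_6param["train"] - r2_6param["test"], 4)   # 0.0333
drop_64 = round(r2_64bins["train"] - r2_64bins["test"], 4)   # 0.0783
```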
Tomorrow, I want to discuss with you and start to write the paper.
4/14/2017
Change my ANN models and improve the prediction
Today, I changed my ANN models slightly to set the training and testing data manually and record them. The prediction results are a little better than before.
Summary:
The following is the comparison for predicting the 6 parameters. The training part is a little better. 85% of the data (354 depths) are used for training and 15% (62 depths) for testing.
The median R2 on the training data is 0.7553, and the median NRMSE is 0.1575.
The following are four examples of the T2 distribution with different R2, along with the R2 distribution and the NRMSE distribution.
The median R2 on the testing data is 0.7220, and the median NRMSE is 0.1685.
I also changed the ANN model for the 64 bins and plotted these distributions.
The median R2 on the training data is 0.8445, and the median NRMSE is 0.1311.
The median R2 on the testing data is 0.7662, and the median NRMSE is 0.1512.
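The two metrics above can be sketched as follows. Note that the normalization used for NRMSE here (dividing RMSE by the data range) is my assumption, since the entry does not spell it out, and the arrays are made-up examples rather than the real T2 data.

```python
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 is perfect, can go below 0.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

def nrmse(y_true, y_pred):
    # RMSE normalized by the data range (assumed normalization),
    # so 0 is perfect and values are comparable across depths.
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (np.max(y_true) - np.min(y_true))

# Illustrative arrays, not the real T2 data.
y_true = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_pred = np.array([0.1, 0.9, 2.2, 2.8, 4.1])
```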
Next week, I will try to see if I can improve the prediction accuracy more and start to write the paper.
4/12/2017
Plot some figures
Today, I plotted some figures as we discussed.
Summary:
The following is the median R2 vs. the delete number. The y-axis is the median R2 value and the x-axis is the delete number: 1 represents using all 64 bins to calculate R2, 2 represents deleting 1 bin at each end and using 62 bins, and 10 represents deleting 9 bins at each end and using 46 bins. From the figure we can see that we actually do not need to delete any bins to calculate R2: the median R2 with all 64 bins is the highest.
The following is the T2 fit accuracy. The median R2 is 0.983, which is very accurate.
The following are four examples of the T2 distribution with different R2.
The following is the R2 and NRMSE figure for the training data (80%, chosen randomly). I set all R2 values below 0 to 0 because they all represent poor predictions. The median R2 is 0.72 and the median NRMSE is 0.16.
The following is the R2 and NRMSE figure for the testing data (20%, chosen randomly). I set all R2 values below 0 to 0 because they all represent poor predictions. The median R2 is 0.67 and the median NRMSE is 0.18.
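The delete-number experiment can be sketched like this, assuming (as described above) that delete number d drops d-1 bins from each end of the 64 bins; the arrays are synthetic stand-ins for the actual and predicted T2 distributions.

```python
import numpy as np

def r2(y, p):
    return 1 - np.sum((y - p) ** 2) / np.sum((y - np.mean(y)) ** 2)

def median_r2_vs_delete(actual, predicted, max_delete=10):
    """Delete number d keeps bins [d-1 : 64-(d-1)]: d=1 uses all 64 bins,
    d=2 drops 1 bin at each end (62 bins), d=10 drops 9 (46 bins)."""
    medians = []
    for d in range(1, max_delete + 1):
        k = d - 1
        sl = slice(k, actual.shape[1] - k) if k else slice(None)
        medians.append(np.median([r2(a[sl], p[sl])
                                  for a, p in zip(actual, predicted)]))
    return medians

# Synthetic stand-ins: 20 depths x 64 bins, predictions close to actual.
rng = np.random.default_rng(0)
actual = rng.random((20, 64))
predicted = actual + 0.02 * rng.standard_normal((20, 64))
medians = median_r2_vs_delete(actual, predicted)
```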
Tomorrow, I will try to improve the model by combining the 6-parameter and 64-bin results.
4/11/2017
Plot T2 with different R2
Today, I plotted the T2 distribution for different R2 values.
Summary:
The following are the 6-parameter T2 plots for different R2 values. The lower R2 is, the poorer the prediction; there is nothing else special about negative R2 values, so I just plot the maximum, 0, and minimum R2.
R2 = -4.3:
R2 = 0:
R2 = 0.9972:
The following are the 64-bin T2 plots for different R2 values.
R2 = -3.6:
R2 = 0:
R2 = 0.9934:
The following is the RMSE distribution of the T2 fit with 6 parameters. Since it is not a percentage-like value, I prefer not to use it.
The following is the RMSRE distribution of the T2 fit with 6 parameters. Its range is too big (from 0.2 to 1.8*10^37), so I prefer not to use it.
The following is the NRMSE distribution of the T2 fit with 6 parameters. Its range is from 0 to 1, and smaller is better, so I prefer to use it. Its median value is 0.17, which is good for prediction.
MAE is also an absolute value, so I prefer not to use it.
In conclusion, I prefer R2 and NRMSE (normalized RMSE). Both are relative values: an R2 close to 1 means a good prediction, and an NRMSE close to 0 means a good prediction.
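The reason I prefer relative metrics can be shown in a few lines: RMSE carries the data's units, so doubling the scale doubles the error, while NRMSE (normalized by the data range, my assumed normalization) is unchanged.

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def nrmse(a, b):
    # Assumed normalization: RMSE divided by the data range.
    return rmse(a, b) / float(np.ptp(a))

# Illustrative data; scaling by 10x mimics changing the data's units.
y = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1.1, 1.9, 3.2, 3.8])

scaled_rmse = rmse(10 * y, 10 * p)    # 10x larger than rmse(y, p)
scaled_nrmse = nrmse(10 * y, 10 * p)  # same as nrmse(y, p)
```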
I spent about an hour looking for the terminology for my method of dividing categories, but I did not find it. Maybe there is no standard term for it.
You can read it in detail once I upload it. I will put tonight's work in tomorrow's blog.
Tonight and tomorrow, I will plot what you suggested and continue to think about combining the 64 bins and the 6 parameters to improve the prediction.
4/10/2017
divide categories
Today, I divided the 25th category into 3 different categories.
Summary:
The 23rd category defines lithologies (1 to 7).
The 24th category defines one or two peaks (0 or 1).
The 25th category defines small or big mu (-1, 0, or 1).
The 26th category defines small peaks (0 or 1).
The 27th category defines big deviations (0 or 1).
The following is the T2 comparison for 2667 depths with all 11 lithologies. The accuracy is not good.
The following is the T2 comparison for 416 depths with the middle 7 lithologies. The accuracy is good.
The median R2 is 0.71 (last week it was 0.65 with 25 inputs).
Then I used the k-nearest neighbor method to predict these categories from the 23 input data.
The 24th category's testing data (20%) is 86% correctly predicted.
The 25th category's testing data (20%) is 83% correctly predicted.
The 26th category's testing data (20%) is 80% correctly predicted.
The 27th category's testing data (20%) is 86% correctly predicted.
The following is the 416-depth T2 comparison with 64 bins predicted from the 27 input data. The accuracy is good.
The median R2 is 0.77.
RMSE:
It can be used to measure the accuracy, but it is not a ratio, so I think it is better to use R2 to measure the prediction accuracy.
R2:
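The k-nearest neighbor step used for the category predictions can be sketched as a minimal hand-rolled classifier; the toy 1-D data below stands in for the real 23 logging inputs.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Minimal k-nearest-neighbor classifier (best k=3, as in the outline):
    each test depth takes the majority label of its k closest training
    depths under Euclidean distance over the inputs."""
    preds = []
    for x in np.atleast_2d(X_test):
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = np.asarray(y_train)[np.argsort(dist)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Toy stand-in: two well-separated groups for a 0/1 category.
X_tr = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y_tr = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X_tr, y_tr, np.array([[0.05], [5.05]]))
```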
Tomorrow, I will continue to improve the prediction of the categories and try to combine the results of the 64-bin prediction and the 6-parameter prediction.
4/07/2017
try different methods to predict 2 categories
Today, I tried different methods to predict 2 of the categories, but none of them performed very well.
Summary:
Uwater and Uoil mean unflushed water and unflushed oil.
I separated the data into training (80%) and testing (20%) sets.
1. The first is the k-nearest neighbor model. The accuracy on the testing data is only about 60% for the 2nd category (which is more complicated).
2. The second is the SVM model. It can only predict 2 classes, so it cannot be applied to the 2nd category prediction.
3. The third is the discriminant analysis classification model. The accuracy on the testing data is only about 50% for the 2nd category.
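The 80/20 evaluation protocol above can be sketched with a nearest-centroid classifier, a very simple discriminant-style stand-in (not the exact model I used) on synthetic two-cluster data; the real logging inputs are not reproduced here.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # One centroid per class label.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_centroid_predict(centroids, X):
    labels = list(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in labels])
    return np.array(labels)[np.argmin(d, axis=0)]

# Synthetic, well-separated two-class data (illustrative only).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (100, 4)), rng.normal(4, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)

idx = rng.permutation(200)
train, test = idx[:160], idx[160:]          # 80% / 20% split
cents = nearest_centroid_fit(X[train], y[train])
acc = (nearest_centroid_predict(cents, X[test]) == y[test]).mean()
```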
Next week, I will try to find ways to improve the prediction of 2 categories.
4/06/2017
predict categories
Today, I predicted two categories and added them to the 23 input data from IP. With all 25 inputs, the model predicts the T2 distribution reasonably well.
Summary:
Since my 24th category is about one peak or two peaks, I set it to 0 or 1. My 25th category is about the total permeability at the depth, so I set it to -2, -1, 0, 1, or 2.
Since both come from the T2 distribution, they must be predicted first. I used the k-nearest neighbor method to predict them from the 23 logging data.
The accuracy for the 24th category is 97% and for the 25th category is 79%. The new ANN model performance is shown below:
The first is the R2 distribution of the prediction performance. The median value is 0.5, which is lower than yesterday's (0.65).
The second is the comparison plots.
Tomorrow, I will try new methods to improve the prediction accuracy of two categories.
4/05/2017
delete some depths and get better results
Today, I deleted some depths and got better results.
Summary:
There are 11 lithologies in the target depths. After looking at the IP plots, I decided to delete the first 3 and the last 1. The first lithology is very thick and not very important, while the fourth through the tenth are comparatively more important than the others. So I decided to use the middle 7 lithologies.
The following is the performance without the categories. The median R2 is about 0.25 and the median R is about 0.5.
The following is the performance with the categories. The median R2 is about 0.65 and the median R is about 0.8.
These results are much better than before.
Tomorrow, I will continue to see if there are other methods to improve the performance of the ANN model prediction.
4/04/2017
Add new inputs and results are better
Today, I added several inputs to get better ANN model performance.
Summary:
1. There are 11 kinds of lithologies in my target depths, so I added one input that takes values from 1 to 11, each representing one lithology.
2. For now I predict depths that have just one or two peaks, so I added another input that takes values 0 and 1, representing one peak and two peaks.
3. Different combinations of the 6 parameters have different physical meanings (though some combinations may have no meaning). I separated them into 5 categories based on different kinds of reservoir permeability.
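Appending the category codes as extra input columns can be sketched like this: a lithology code (1-11) and a peak-count flag (0/1) joined to the continuous logging inputs. The shapes and values are illustrative, not the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
logs = rng.normal(size=(6, 23))            # 23 continuous logging inputs
lithology = np.array([1, 3, 3, 7, 11, 2])  # lithology code 1..11 per depth
peaks = np.array([0, 1, 1, 0, 1, 0])       # 0 = one peak, 1 = two peaks

# Join categorical codes to the continuous inputs -> 25 inputs per depth.
inputs = np.column_stack([logs, lithology, peaks])
```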
The following is the performance with 25 inputs; it is better than with 22, and some depths match better than before. The median value of R increased by about 0.1.
The following is the direct prediction of the 64 bins. Its performance also became better. (I predicted 32 and 16 bins before, with similar results.)
Tomorrow, I will try to find more categories that have physical meanings. I find that categories with physical meanings help improve the ANN model's performance in my research.
4/03/2017
some trials
Today, I tried some new methods to improve the performance of the ANN model prediction, but none of them showed good results.
Summary:
What I do today:
1. I set regularization in the performance function to avoid overfitting, but the performance did not improve.
2. I set up 10-fold cross validation for the training function to select the best model, but all 10 folds showed similar results.
3. I deleted outliers from every logging data; the number of depths dropped from 3030 to 2667 in total. Also, I deleted one logging input (from 23 to 22). But the performance of the model did not improve much.
4. I discussed with Gawtham today. He has some data that are categories, such as whether a depth belongs to an anticline (1 for yes and 0 for no), so he can create qualitative inputs. For now I do not have such categories, so I cannot apply this method. (Tomorrow I will check IP to see if I can add more logging data like that.)
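The outlier-deletion step can be sketched as below, assuming a per-log 3-sigma rule; the actual threshold that reduced the depths from 3030 to 2667 may well have been different.

```python
import numpy as np

def drop_outlier_depths(X, n_sigma=3.0):
    """Drop any depth (row) where at least one logging input (column)
    lies more than n_sigma standard deviations from that log's mean."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    keep = np.all(np.abs(X - mu) <= n_sigma * sd, axis=1)
    return X[keep], keep

# Synthetic logging table with one planted outlier depth.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X[0, 0] = 50.0                 # obvious outlier in the first log
clean, keep = drop_outlier_depths(X)
```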
The above is the comparison plot. It does not improve much compared with the results from days before.
Tomorrow, I plan to think about how to handle the 6 parameters. I think the ANN model cannot recognize the physical meaning of the 6 parameters, so it cannot perform well.
In addition, I will try to check IP to add some logging data, but it may not help a lot.