Exploration Insights December 2019/January 2020 | Page 12

12 | Halliburton Landmark Exploration Insights | 13

Figure 1> Location of unconventional resource plays used in this project. (Source: EIA, 2018)
[Map: Lower 48 States Shale Plays. Legend: current plays; prospective plays; basins; stacked plays (shallowest/youngest, intermediate depth/age, deepest/oldest). * Mixed shale & chalk play (Niobrara). ** Mixed shale & limestone play (Heath). *** Mixed shale & light dolostone-siltstone-sandstone (Bakken).]

DATA PREPARATION

“Data is the new oil” is often quoted when big data are used in machine learning (ML) models to convey the value of large datasets. The metaphor extends further, however: like hydrocarbons, raw data need to be broken down and refined to have commercial or scientific value (Flender, 2019) (Figure 2). Methodical cleaning of data enables the underlying relationships between production and geology to emerge. To decrease bias, missing values were imputed with the median of each attribute where less than 50% of that attribute's values were unavailable. The data were then normalized and scaled so that wells could be compared. Using Principal Component Analysis (PCA), the number of components onto which these data could be projected while retaining approximately 95% of the explained variance was reduced to six (Figure 3). Feature extraction was also conducted to assess the correlation of all nine geological input features to initial production. As shown in Table 1, features such as gas-to-oil ratio (GOR) and porosity can be excluded from model input to reduce noise.

Figure 2> Machine Learning model data preparation and project workflow.
[Workflow: Input Data; 1) Data Cleaning and Preprocessing; 2) Train and Test Algorithm (Regression); 3) Classify Data; 4) Train and Test Algorithm (Classifier: RF, DT, SGD); 5) Output Accuracy and Confusion Matrix Analysis. Low accuracy: modify hyperparameters to improve accuracy. High accuracy: maximum accuracy model.]

Figure 3> Graph of principal component analysis (PCA) explained variance within the dataset used in this study, showing approximately 95% of the variance is explained through six principal components.
[Plot: PCA Explained Variance of Shale Plays. x-axis: Number of Components (0 to 10). Annotation: elbow point at approximately 95% explained variance.]

Table 1> Feature weighting of all input parameters with respect to normalized initial production.

Feature Weighting - All Input Parameters
Feature                       Weighting
Average Thickness             0.543
Pore Pressure                 0.185
TVD                           0.156
Resource Concentration        0.0487
Geothermal Gradient           0.0303
Reservoir Pressure            0.0204
Maximum Burial Temperature    0.0152
GOR                           0.00127
Porosity                      0.000807

OVERVIEW OF ALGORITHMS AND PERFORMANCE EVALUATION

Three algorithms were used to assess the accuracy of predicting success in shale plays: Support Vector Machine classifier, Decision Tree classifier, and Random Forest classifier.
A model was built using a Stochastic Gradient Descent (SGD) classifier with a Support Vector Machine (SVM) optimizer, which

Figure 4> Stochastic Gradient Descent (SGD) classifier using Support Vector Machine (SVM) Linear classifier. Modified from Patel (2017).
[Panels: A) SVM Linear Classification; B) SVM Non-Linear Classification with Kernel Transformation; C) SVM with Overlapping Data Points; D) SVM with Regularisation with Hyperparameter. Each panel separates Class 1 from Class 2.]
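
The data preparation steps described under DATA PREPARATION can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The 50% missing-value threshold and the feature weightings come from the text and Table 1. Everything else is an assumption made for the example: the function names, the choice of min-max scaling (the article does not say which normalization was used), the 0.01 weighting cutoff, and the sample thickness values.

```python
from statistics import median

# Feature weightings copied from Table 1 (relative to normalized initial production).
FEATURE_WEIGHTS = {
    "Average Thickness": 0.543,
    "Pore Pressure": 0.185,
    "TVD": 0.156,
    "Resource Concentration": 0.0487,
    "Geothermal Gradient": 0.0303,
    "Reservoir Pressure": 0.0204,
    "Maximum Burial Temperature": 0.0152,
    "GOR": 0.00127,
    "Porosity": 0.000807,
}


def impute_median(values, max_missing_fraction=0.5):
    """Fill None entries with the median of the observed values.

    Returns None when the share of missing values is not below
    max_missing_fraction, i.e. the attribute is too sparse to impute.
    """
    observed = [v for v in values if v is not None]
    missing_fraction = 1 - len(observed) / len(values)
    if missing_fraction >= max_missing_fraction:
        return None
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (assumed scaling method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_features(weights, min_weight=0.01):
    """Keep features whose weighting is at least min_weight (hypothetical cutoff)."""
    return [name for name, w in weights.items() if w >= min_weight]


# Hypothetical attribute values for five wells; None marks a missing value.
thickness = [30.0, None, 50.0, 40.0, None]
filled = impute_median(thickness)   # two of five missing (40%), so imputed
scaled = min_max_scale(filled)
kept = select_features(FEATURE_WEIGHTS)
```

With the assumed 0.01 cutoff, GOR and porosity are the only two features dropped, which matches the exclusion suggested in the text. A different cutoff would change the selection, so the value should be treated as a tuning choice rather than a result from the study.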