w4h.classify module

The classify module contains functions for assigning geological intervals to a preset subset of geologic interpretations.

w4h.classify.depth_define(df, top_col='TOP', thresh=550.0, parallel_processing=False, verbose=False, log=False)

Function to classify all intervals deeper than thresh as bedrock

Parameters

df : pandas.DataFrame

Dataframe to classify

top_col : str, default = 'TOP'

Name of column that contains the depth information, likely of the top of the well interval, by default 'TOP'

thresh : float, default = 550.0

Depth (in units used in df[top_col]) below which all intervals will be classified as bedrock, by default 550.0.

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns

df : pandas.DataFrame

Dataframe containing intervals classified as bedrock due to depth
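
The depth rule can be illustrated with plain pandas. This is a minimal sketch of the idea, not the w4h implementation: the sample data, the strict `>` comparison, and the `IS_BEDROCK` column name are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical well intervals; TOP is the depth to the top of each interval.
df = pd.DataFrame({
    "API_NUMBER": [1, 1, 2],
    "TOP": [10.0, 600.0, 550.0],
})

thresh = 550.0

# Flag every interval whose top lies deeper than the threshold.
# Assumption: the comparison is strict, so an interval exactly at thresh is not flagged.
# IS_BEDROCK is an illustrative column name, not one w4h is known to produce.
df["IS_BEDROCK"] = df["TOP"] > thresh
```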

w4h.classify.export_undefined(df, outdir)

Function to export terms that still need to be defined.

Parameters

df : pandas.DataFrame

Dataframe containing at least some unclassified data

outdir : str or pathlib.Path

Directory in which to save the file. The filename is generated automatically from today’s date.

Returns

stillNeededDF : pandas.DataFrame

Dataframe containing only unclassified terms, and the number of times they occur
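
The counting and export steps described above might look like the following in plain pandas. This is a sketch under stated assumptions, not the library's code: treating a missing `CLASS_FLAG` as "unclassified", the `COUNT` column, and the `undefined_<date>.csv` filename pattern are all hypothetical.

```python
import datetime
import pathlib
import tempfile

import pandas as pd

df = pd.DataFrame({
    "FORMATION": ["sand", "blue clay", "blue clay", "till"],
    "CLASS_FLAG": [1.0, None, None, None],
})

# Assumption: a row counts as unclassified when its flag is missing.
unclassified = df[df["CLASS_FLAG"].isna()]

# Count how often each unclassified description occurs.
still_needed = unclassified.groupby("FORMATION").size().reset_index(name="COUNT")

# Build a date-stamped filename (hypothetical naming scheme) and write a CSV
# into a temporary directory standing in for outdir.
outdir = pathlib.Path(tempfile.mkdtemp())
out_path = outdir / f"undefined_{datetime.date.today():%Y-%m-%d}.csv"
still_needed.to_csv(out_path, index=False)
```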

w4h.classify.fill_unclassified(df, classification_col='CLASS_FLAG')

Fills unclassified rows in the classification_col column (by default 'CLASS_FLAG') with np.nan

Parameters

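df : pandas.DataFrame

Dataframe on which to perform operation

classification_col : str, default = 'CLASS_FLAG'

Name of column containing the classification flag, by default 'CLASS_FLAG'

Returns

df : pandas.DataFrame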

Dataframe on which operation has been performed

w4h.classify.get_unique_wells(df, wellid_col='API_NUMBER', verbose=False, log=False)

Gets unique wells as a dataframe based on a given column name.

Parameters

df : pandas.DataFrame

Dataframe containing all wells and/or well intervals of interest

wellid_col : str, default = 'API_NUMBER'

Name of column in df containing a unique identifier for each well, by default 'API_NUMBER'. .unique() will be run on this column to get the unique values.

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns

wellsDF : pandas.DataFrame

DataFrame containing only the unique well IDs
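
As a rough illustration of the `.unique()` step mentioned above, the snippet below builds a one-column dataframe of well IDs from made-up data. It is a sketch only; the actual function may return additional columns or a different layout.

```python
import pandas as pd

# Hypothetical interval table: several rows can share one well ID.
df = pd.DataFrame({
    "API_NUMBER": [101, 101, 202, 303, 202],
    "TOP": [0, 15, 0, 0, 30],
})

wellid_col = "API_NUMBER"

# .unique() keeps the first occurrence of each ID, in order of appearance.
unique_ids = df[wellid_col].unique()

# Wrap the IDs back into a dataframe (the layout is an assumption).
wells_df = pd.DataFrame({wellid_col: unique_ids})
```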

w4h.classify.merge_lithologies(well_data_df, targinterps_df, interp_col='INTERPRETATION', target_col='TARGET', target_class='bool')

Function to merge lithologies and target booleans based on classifications

Parameters

well_data_df : pandas.DataFrame

Dataframe containing classified well data

targinterps_df : pandas.DataFrame

Dataframe containing lithologies and their target interpretations, depending on what the target is for this analysis (often, coarse materials=1, fine=0)

target_col : str, default = 'TARGET'

Name of column in targinterps_df containing the target interpretations

target_class : str, default = 'bool'

Whether the input column is using boolean values as its target indicator

Returns

df_targ : pandas.DataFrame

Dataframe containing merged lithologies/targets
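
A merge of this kind can be sketched with `pandas.DataFrame.merge`. The example assumes both dataframes share an `INTERPRETATION` column and that a left join is appropriate; neither detail is confirmed by the documentation above, and the data are invented.

```python
import pandas as pd

# Hypothetical classified well intervals.
well_data_df = pd.DataFrame({
    "API_NUMBER": [1, 1, 2],
    "INTERPRETATION": ["SAND", "CLAY", "GRAVEL"],
})

# Hypothetical lookup of lithologies to targets (coarse = 1, fine = 0).
targinterps_df = pd.DataFrame({
    "INTERPRETATION": ["SAND", "CLAY", "GRAVEL"],
    "TARGET": [1, 0, 1],
})

# A left join keeps every well interval and attaches its target value.
df_targ = well_data_df.merge(targinterps_df, how="left", on="INTERPRETATION")
```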

w4h.classify.remerge_data(classifieddf, searchdf, parallel_processing=False)

Function to merge newly-classified (or not) and previously classified data

Parameters

classifieddf : pandas.DataFrame

Dataframe that had already been classified previously

searchdf : pandas.DataFrame

Dataframe with new classifications

Returns

remergeDF : pandas.DataFrame

Dataframe containing all the data, merged back together
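
One way to picture the re-merge is a simple concatenation that restores the original row order. This sketch assumes the two dataframes have the same columns and non-overlapping indexes; it is not taken from the w4h source.

```python
import pandas as pd

# Rows classified in an earlier pass (index values come from the original table).
classified_df = pd.DataFrame(
    {"FORMATION": ["sand", "clay"], "CLASS_FLAG": [1.0, 1.0]},
    index=[0, 2],
)

# Rows that went through a later search; one is still unclassified.
search_df = pd.DataFrame(
    {"FORMATION": ["till", "shale"], "CLASS_FLAG": [4.0, None]},
    index=[1, 3],
)

# Stack both pieces and sort by index to recover the original ordering.
remerge_df = pd.concat([classified_df, search_df]).sort_index()
```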

w4h.classify.sort_dataframe(df, sort_cols=['API_NUMBER', 'TOP'], remove_nans=True)

Function to sort dataframe by one or more columns.

Parameters

df : pandas.DataFrame

Dataframe to be sorted

sort_cols : str or list of str, default = ['API_NUMBER', 'TOP']

Name(s) of columns by which to sort dataframe, by default ['API_NUMBER', 'TOP']

remove_nans : bool, default = True

Whether to remove NaN values while sorting, by default True

Returns

df_sorted : pandas.DataFrame

Sorted dataframe
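
The sorting behavior can be approximated with `dropna` and `sort_values`. In this sketch, "removing nans" is assumed to mean dropping rows with a missing value in any sort column; the function may define it differently.

```python
import pandas as pd

# Hypothetical intervals, one of which has no well ID.
df = pd.DataFrame({
    "API_NUMBER": [2, 1, 1, None],
    "TOP": [0.0, 20.0, 5.0, 3.0],
})

sort_cols = ["API_NUMBER", "TOP"]

# Drop rows with missing sort keys (assumed meaning of remove_nans=True),
# then sort by well ID and by depth within each well.
df_sorted = (
    df.dropna(subset=sort_cols)
      .sort_values(by=sort_cols)
      .reset_index(drop=True)
)
```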

w4h.classify.specific_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)

Function to classify terms that have been specifically defined in the terms_df.

Parameters

df : pandas.DataFrame

Input dataframe with unclassified well descriptions.

terms_df : pandas.DataFrame

Dataframe containing the classifications

description_col : str, default = 'FORMATION'

Column name in df containing the well descriptions, by default 'FORMATION'.

terms_col : str, default = 'DESCRIPTION'

Column name in terms_df containing the classified descriptions, by default 'DESCRIPTION'.

verbose : bool, default = False

Whether to print results, by default False.

log : bool, default = False

Whether to log results to log file, by default False.

Returns

df_Interps : pandas.DataFrame

Dataframe containing the well descriptions and their matched classifications.
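
Exact-term classification of this kind resembles a left join between the description column and the terms column. The sketch below uses invented data and an `INTERPRETATION` column in `terms_df`, both of which are assumptions rather than documented behavior.

```python
import pandas as pd

# Hypothetical well descriptions awaiting classification.
df = pd.DataFrame({"FORMATION": ["sand", "blue clay", "unknown rock"]})

# Hypothetical dictionary of exactly defined terms.
terms_df = pd.DataFrame({
    "DESCRIPTION": ["sand", "blue clay"],
    "INTERPRETATION": ["SAND", "CLAY"],
})

# Descriptions that match a term exactly pick up its interpretation;
# unmatched descriptions are left with a missing value.
df_interps = df.merge(
    terms_df, how="left", left_on="FORMATION", right_on="DESCRIPTION"
)
```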

w4h.classify.split_defined(df, classification_col='CLASS_FLAG', verbose=False, log=False)

Function to split dataframe with well descriptions into two dataframes based on whether a row has been classified.

Parameters

df : pandas.DataFrame

Dataframe containing all the well descriptions

classification_col : str, default = 'CLASS_FLAG'

Name of column containing the classification flag, by default 'CLASS_FLAG'

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns

Two-item tuple of pandas.DataFrame

tuple[0] is dataframe containing classified data, tuple[1] is dataframe containing unclassified data.
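
The split can be pictured as a boolean mask applied twice. This sketch assumes that a row is "classified" when its flag is not missing, which the documentation above does not state explicitly.

```python
import pandas as pd

# Hypothetical descriptions; the middle row has no classification flag yet.
df = pd.DataFrame({
    "FORMATION": ["sand", "mystery layer", "clay"],
    "CLASS_FLAG": [1.0, None, 1.0],
})

# Assumption: a non-missing flag means the row has been classified.
is_classified = df["CLASS_FLAG"].notna()

classified_df = df[is_classified]
unclassified_df = df[~is_classified]
```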

w4h.classify.start_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', parallel_processing=False, verbose=False, log=False)

Function to classify descriptions according to starting substring.

Parameters

df : pandas.DataFrame

Dataframe containing all the well descriptions

terms_df : pandas.DataFrame

Dataframe containing all the startswith substrings to use for searching

description_col : str, default = 'FORMATION'

Name of column in df containing descriptions, by default 'FORMATION'

terms_col : str, default = 'DESCRIPTION'

Name of column in terms_df containing startswith substring to match with description_col, by default 'DESCRIPTION'

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns

df : pandas.DataFrame

Dataframe containing the original data and new classifications
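
Prefix matching can be illustrated with `Series.str.startswith`. The loop below is a sketch, not the library's algorithm; the `INTERPRETATION` column and the first-match-wins rule are assumptions.

```python
import pandas as pd

# Hypothetical well descriptions.
df = pd.DataFrame({"FORMATION": ["sand and gravel", "clay, blue", "topsoil"]})

# Hypothetical prefixes and the interpretation each one implies.
terms_df = pd.DataFrame({
    "DESCRIPTION": ["sand", "clay"],
    "INTERPRETATION": ["SAND", "CLAY"],
})

# Start with an empty interpretation column.
df["INTERPRETATION"] = pd.Series([None] * len(df), index=df.index, dtype=object)

for term, interp in zip(terms_df["DESCRIPTION"], terms_df["INTERPRETATION"]):
    # Match descriptions that begin with the term and are not yet classified
    # (first match wins, which is an assumption of this sketch).
    mask = df["FORMATION"].str.startswith(term) & df["INTERPRETATION"].isna()
    df.loc[mask, "INTERPRETATION"] = interp
```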

w4h.classify.wildcard_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)

Function to classify descriptions according to any substring.

Parameters

df : pandas.DataFrame

Dataframe containing all the well descriptions

terms_df : pandas.DataFrame

Dataframe containing all the substrings to use for searching

description_col : str, default = 'FORMATION'

Name of column in df containing descriptions, by default 'FORMATION'

terms_col : str, default = 'DESCRIPTION'

Name of column in terms_df containing the substring to match with description_col, by default 'DESCRIPTION'

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns

df : pandas.DataFrame

Dataframe containing the original data and new classifications
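
Matching on any substring can be sketched with `Series.str.contains`, using `regex=False` so the terms are treated literally. As with the other examples, the data, the `INTERPRETATION` column, and the first-match-wins rule are assumptions, not documented w4h behavior.

```python
import pandas as pd

# Hypothetical well descriptions.
df = pd.DataFrame({"FORMATION": ["coarse gravel", "gray shale, hard", "topsoil"]})

# Hypothetical substrings and the interpretation each one implies.
terms_df = pd.DataFrame({
    "DESCRIPTION": ["gravel", "shale"],
    "INTERPRETATION": ["GRAVEL", "SHALE"],
})

# Start with an empty interpretation column.
df["INTERPRETATION"] = pd.Series([None] * len(df), index=df.index, dtype=object)

for term, interp in zip(terms_df["DESCRIPTION"], terms_df["INTERPRETATION"]):
    # Match the term anywhere in the description, as a literal string,
    # and only fill rows that are still unclassified.
    mask = (
        df["FORMATION"].str.contains(term, regex=False)
        & df["INTERPRETATION"].isna()
    )
    df.loc[mask, "INTERPRETATION"] = interp
```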