w4h package

This is the wells4hydrogeology package.

It contains the functions needed to convert raw well descriptions into usable (hydro)geologic data.

w4h.add_control_points(df_without_control, df_control=None, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT', controlpoints_crs='EPSG:4269', output_crs='EPSG:5070', description_col='FORMATION', interp_col='INTERPRETATION', target_col='TARGET', verbose=False, log=False, **kwargs)[source]

Function to add control points, primarily to aid in interpolation. This may be useful when conditions are known but are not represented in the input well database

Parameters:
df_without_control : pandas.DataFrame

Dataframe with current working data

df_control : str, pathlib.PurePath, or pandas.DataFrame

Pandas dataframe with control points

well_key : str, optional

The column containing the “key” (unique identifier) for each well, by default ‘API_NUMBER’

xcol : str, optional

The column in df_control containing the x coordinates for each control point, by default ‘LONGITUDE’

ycol : str, optional

The column in df_control containing the y coordinates for each control point, by default ‘LATITUDE’

zcol : str, optional

The column in df_control containing the z coordinates for each control point, by default ‘ELEV_FT’

controlpoints_crs : str, optional

The coordinate reference system of the points in df_control, by default ‘EPSG:4269’

output_crs : str, optional

The output coordinate system, by default ‘EPSG:5070’

description_col : str, optional

The column in df_control with the description (if this is used), by default ‘FORMATION’

interp_col : str, optional

The column in df_control with the interpretation (if this is used), by default ‘INTERPRETATION’

target_col : str, optional

The column in df_control with the target code (if this is used), by default ‘TARGET’

verbose : bool, optional

Whether to print information to terminal, by default False

log : bool, optional

Whether to log information in log file, by default False

**kwargs

Keyword arguments that will be passed to pandas.concat() or pandas.read_csv(), except for objs, which is set to df_without_control and df_control

Returns:
pandas.DataFrame

Pandas DataFrame with original data and control points formatted the same way and concatenated together
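A minimal usage sketch (the control-point file path and example values below are hypothetical):

>>> import pandas as pd
>>> import w4h
>>> wells_df = pd.DataFrame({'LONGITUDE': [-88.2], 'LATITUDE': [40.1],
...                          'ELEV_FT': [720.0], 'FORMATION': ['sand']})
>>> combined_df = w4h.add_control_points(wells_df, df_control='control_points.csv',
...                                      xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT')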

w4h.align_rasters(grids_unaligned=None, model_grid=None, no_data_val_grid=0, verbose=False, log=False)[source]

Reprojects two rasters and aligns their pixels

Parameters:
grids_unaligned : list or xarray.DataArray

Contains a list of grids or one unaligned grid

model_grid : xarray.DataArray

Contains model grid

no_data_val_grid : int, default=0

Sets value of no data pixels

log : bool, default = False

Whether to log results to log file, by default False

Returns:
alignedGrids : list or xarray.DataArray

Contains aligned grids

w4h.clip_gdf2study_area(study_area, gdf, log=False, verbose=False)[source]

Clips dataframe to only include things within study area.

Parameters:
study_area : geopandas.GeoDataFrame

Inputs study area polygon

gdf : geopandas.GeoDataFrame

Inputs point data

log : bool, default = False

Whether to log results to log file, by default False

Returns:
gdfClip : geopandas.GeoDataFrame

Contains only points within the study area

w4h.combine_dataset(layer_dataset, surface_elev, bedrock_elev, layer_thick, log=False)[source]

Function to combine xarray Datasets or DataArrays into a single xr.Dataset. Useful for adding surface, bedrock, layer-thickness, and layer datasets into one variable, for pickling, for example.

Parameters:
layer_dataset : xr.DataArray

DataArray containing all the interpolated layer information.

surface_elev : xr.DataArray

DataArray containing surface elevation data

bedrock_elev : xr.DataArray

DataArray containing bedrock elevation data

layer_thick : xr.DataArray

DataArray containing the layer thickness at each point in the model grid

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
xr.Dataset

Dataset with all input arrays set to different variables within the dataset.

w4h.coords2geometry(df_no_geometry, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEV_FT', input_coords_crs='EPSG:4269', output_crs='EPSG:5070', use_z=False, wkt_col='WKT', geometry_source='coords', verbose=False, log=False)[source]

Adds geometry to points with xy coordinates in the specified coordinate reference system.

Parameters:
df_no_geometry : pandas.DataFrame

a Pandas dataframe containing points

xcol : str, default=’LONGITUDE’

Name of column holding x coordinate data in df_no_geometry

ycol : str, default=’LATITUDE’

Name of column holding y coordinate data in df_no_geometry

zcol : str, default=’ELEV_FT’

Name of column holding z coordinate data in df_no_geometry

input_coords_crs : str, default=’EPSG:4269’

Name of crs used for geometry

use_z : bool, default=False

Whether to use z column in calculation

geometry_source : str, {‘coords’, ‘wkt’, ‘geometry’}

log : bool, default = False

Whether to log results to log file, by default False

Returns:
gdf : geopandas.GeoDataFrame

Geopandas dataframe with points and their geometry values
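A minimal usage sketch (the example coordinates are hypothetical):

>>> import pandas as pd
>>> import w4h
>>> df = pd.DataFrame({'LONGITUDE': [-88.2, -88.3], 'LATITUDE': [40.1, 40.2],
...                    'ELEV_FT': [720.0, 715.0]})
>>> gdf = w4h.coords2geometry(df, xcol='LONGITUDE', ycol='LATITUDE',
...                           input_coords_crs='EPSG:4269', output_crs='EPSG:5070')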

w4h.define_dtypes(undefined_df, datatypes=None, verbose=False, log=False)[source]

Function to define datatypes of a dataframe, especially with file-indicated dtypes

Parameters:
undefined_df : pd.DataFrame

Pandas dataframe with columns whose datatypes need to be (re)defined

datatypes : dict, str, pathlib.PurePath() object, or None, default = None

Dictionary containing datatypes, to be used in pandas.DataFrame.astype() function. If None, will read from file indicated by dtype_file (which must be defined, along with dtype_dir), by default None

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
dfout : pandas.DataFrame

Pandas dataframe containing redefined columns

w4h.depth_define(df, top_col='TOP', thresh=550.0, verbose=False, log=False)[source]

Function to define all intervals lower than thresh as bedrock

Parameters:
df : pandas.DataFrame

Dataframe to classify

top_col : str, default = ‘TOP’

Name of column that contains the depth information, likely of the top of the well interval, by default ‘TOP’

thresh : float, default = 550.0

Depth (in units used in df[‘top_col’]) below which all intervals will be classified as bedrock, by default 550.0.

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
df : pandas.DataFrame

Dataframe containing intervals classified as bedrock due to depth
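A minimal usage sketch (assumes df is a well-interval dataframe with a ‘TOP’ depth column, as described above):

>>> import w4h
>>> classified_df = w4h.depth_define(df, top_col='TOP', thresh=550.0, verbose=True)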

w4h.export_dataframe(df, out_dir, filename, date_stamp=True, log=False)[source]

Function to export dataframes

Parameters:
df : pandas.DataFrame or list of pandas.DataFrame

Data frame or list of dataframes to be exported

out_dir : string or pathlib.Path object

Directory to which to export dataframe object(s) as .csv

filename : str or list of strings

Filename(s) of output files

date_stamp : bool, default=True

Whether to include a datestamp in the filename. If true, file ends with _yyyy-mm-dd.csv of current date, by default True.

log : bool, default = False

Whether to log inputs and outputs to log file.

w4h.export_grids(grid_data, out_path, file_id='', filetype='tif', variable_sep=True, date_stamp=True, verbose=False, log=False)[source]

Function to export grids to files.

Parameters:
grid_data : xarray DataArray or xarray Dataset

Dataset or dataarray to be exported

out_path : str or pathlib.Path object

Output location for data export. If variable_sep=True, this should be a directory. Otherwise, this should also include the filename. The file extension should not be included here.

file_id : str, optional

If specified, will add this after ‘LayerXX’ or ‘AllLayers’ in the filename, just before datestamp, if used. Example filename for file_id=’Coarse’: Layer1_Coarse_2023-04-18.tif.

filetype : str, optional

Output filetype. Can either be pickle or any file extension supported by rioxarray.rio.to_raster(). Can be specified with or without the leading period, by default ‘tif’

variable_sep : bool, optional

If grid_data is an xarray Dataset, this will export each variable in the dataset as a separate file, including the variable name in the filename, by default True

date_stamp : bool, optional

Whether to include a date stamp in the file name., by default True

log : bool, default = False

Whether to log inputs and outputs to log file.

w4h.export_undefined(df, outdir)[source]

Function to export terms that still need to be defined.

Parameters:
df : pandas.DataFrame

Dataframe containing at least some unclassified data

outdir : str or pathlib.Path

Directory to save file. Filename will be generated automatically based on today’s date.

Returns:
stillNeededDF : pandas.DataFrame

Dataframe containing only unclassified terms, and the number of times they occur

w4h.file_setup(well_data, metadata=None, data_filename='*ISGS_DOWNHOLE_DATA*.txt', metadata_filename='*ISGS_HEADER*.txt', log_dir=None, verbose=False, log=False)[source]

Function to set up files, assuming data, metadata, and elevation/location are in separate files (there should be one “key”/identifying column consistent across all files to join/merge them later)

This function may not be useful if files are organized differently than this structure. If that is the case, it is recommended to use the get_most_recent() function for each individual file if needed. It may also be of use to simply skip this function altogether and directly define each filepath in a manner that can be used by pandas.read_csv()

Parameters:
well_data : str or pathlib.Path object

Str or pathlib.Path to directory containing input files, by default str(repoDir)+’/resources’

metadata : str or pathlib.Path object, optional

Str or pathlib.Path to directory containing input metadata files, by default str(repoDir)+’/resources’

data_filename : str, optional

Pattern used by pathlib.glob() to get the most recent data file, by default ‘*ISGS_DOWNHOLE_DATA*.txt’

metadata_filename : str, optional

Pattern used by pathlib.glob() to get the most recent metadata file, by default ‘*ISGS_HEADER*.txt’

log_dir : str or pathlib.PurePath() or None, default=None

Directory to place log file in. This is not read directly, but is used indirectly by w4h.logger_function()

verbose : bool, default = False

Whether to print name of files to terminal, by default False

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
tuple

Tuple with paths to (well_data, metadata)
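A minimal usage sketch (the data directory below is hypothetical):

>>> import w4h
>>> well_data_path, metadata_path = w4h.file_setup(well_data='/data/wells',
...                                                metadata='/data/wells', verbose=True)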

w4h.fill_unclassified(df, classification_col='CLASS_FLAG')[source]

Fills unclassified rows in ‘CLASS_FLAG’ column with np.nan

Parameters:
df : pandas.DataFrame

Dataframe on which to perform operation

Returns:
df : pandas.DataFrame

Dataframe on which operation has been performed

w4h.get_current_date()[source]

Gets the current date to help with finding the most recent file

Parameters:

None

Returns:
dateSuffix : str

String to use for naming output files, based on the current date

w4h.get_drift_thick(surface_elev=None, bedrock_elev=None, layers=9, plot=False, verbose=False, log=False)[source]

Finds the distance from surface_elev to bedrock_elev and then divides by number of layers to get layer thickness.

Parameters:
surface_elev : rioxarray.DataArray

array holding surface elevation

bedrock_elev : rioxarray.DataArray

array holding bedrock elevation

layers : int, default=9

number of layers needed to calculate thickness for

plot : bool, default=False

tells function to either plot the data or not

Returns:
driftThick : rioxarray.DataArray

Data array containing depth to bedrock at each point

layerThick : rioxarray.DataArray

Data array containing layer thickness at each point

w4h.get_layer_depths(df_with_depths, surface_elev_col='SURFACE_ELEV', layer_thick_col='LAYER_THICK', layers=9, log=False)[source]

Function to calculate depths and elevations of each model layer at each well based on surface elevation, bedrock elevation, and number of layers/layer thickness

Parameters:
df_with_depths : pandas.DataFrame

Dataframe containing well metadata

layers : int, default=9

Number of layers. This should correlate with get_drift_thick() input parameter, if drift thickness was calculated using that function, by default 9.

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
pandas.DataFrame

Dataframe containing new columns for depth to layers and elevation of layers.

w4h.get_most_recent(dir=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/wells4hydrogeology/envs/latest/lib/python3.12/site-packages/w4h/resources'), glob_pattern='*', verbose=False)[source]

Function to find the most recent file with the indicated pattern, using pathlib.glob function.

Parameters:
dir : str or pathlib.Path object, optional

Directory in which to find the most recent file, by default str(repoDir)+’/resources’

glob_pattern : str, optional

String used by the pathlib.glob() function/method for searching, by default ‘*’

Returns:
pathlib.Path object

Pathlib Path object of the most recent file fitting the glob pattern indicated in the glob_pattern parameter.
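A minimal usage sketch (the directory and glob pattern below are hypothetical):

>>> import w4h
>>> newest_file = w4h.get_most_recent(dir='/data/wells', glob_pattern='*ISGS_DOWNHOLE_DATA*.txt')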

w4h.get_resources(resource_type='filepaths', scope='local', verbose=False)[source]

Function to get filepaths for resources included with package

Parameters:
resource_type : str, {‘filepaths’, ‘data’}

If filepaths, will return dictionary with filepaths to sample data. If data, returns dictionary with data objects.

scope : str, {‘local’, ‘statewide’}

If ‘local’, will read in sample data for a local (around county sized) project. If ‘statewide’, will read in sample data for a statewide project (Illinois)

verbose : bool, optional

Whether to print results to terminal, by default False

Returns:
resources_dict : dict

Dictionary containing key, value pairs with filepaths to resources that may be of interest.

w4h.get_search_terms(spec_path='/home/docs/checkouts/readthedocs.org/user_builds/wells4hydrogeology/checkouts/latest/docs/resources/', spec_glob_pattern='*SearchTerms-Specific*', start_path=None, start_glob_pattern='*SearchTerms-Start*', wildcard_path=None, wildcard_glob_pattern='*SearchTerms-Wildcard', verbose=False, log=False)[source]

Read in dictionary files for downhole data

Parameters:
spec_path : str or pathlib.Path, optional

Directory where the file containing the specific search terms is located, by default str(repoDir)+’/resources/’

spec_glob_pattern : str, optional

Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Specific*’

start_path : str or None, optional

Directory where the file containing the start search terms is located, by default None

start_glob_pattern : str, optional

Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Start*’

wildcard_path : str or pathlib.Path, default = None

Directory where the file containing the wildcard search terms is located, by default None

wildcard_glob_pattern : str, default = ‘*SearchTerms-Wildcard’

Search string used by pathlib.glob() to find the most recent file of interest, uses get_most_recent() function, by default ‘*SearchTerms-Wildcard’

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
(specTermsPath, startTermsPath, wildcardTermsPath) : tuple

Tuple containing the pandas dataframes with specific search terms, with start search terms, and with wildcard search terms

w4h.get_unique_wells(df, wellid_col='API_NUMBER', verbose=False, log=False)[source]

Gets unique wells as a dataframe based on a given column name.

Parameters:
df : pandas.DataFrame

Dataframe containing all wells and/or well intervals of interest

wellid_col : str, default=”API_NUMBER”

Name of column in df containing a unique identifier for each well, by default ‘API_NUMBER’. .unique() will be run on this column to get the unique values.

log : bool, default = False

Whether to log results to log file

Returns:
wellsDF

DataFrame containing only the unique well IDs

w4h.grid2study_area(study_area, grid, output_crs='EPSG:5070', verbose=False, log=False)[source]

Clips grid to study area.

Parameters:
study_area : geopandas.GeoDataFrame

inputs study area polygon

grid : xarray.DataArray

inputs grid array

output_crs : str, default=’EPSG:5070’

inputs the coordinate reference system for the study area

log : bool, default = False

Whether to log results to log file, by default False

Returns:
grid : xarray.DataArray

returns xarray containing grid clipped only to area within study area

w4h.layer_interp(points, grid, layers=None, interp_kind='nearest', return_type='dataarray', export_dir=None, target_col='TARG_THICK_PER', layer_col='LAYER', xcol=None, ycol=None, xcoord='x', ycoord='y', log=False, verbose=False, **kwargs)[source]

Function to interpolate results, going from points to grid data. Uses scipy.interpolate module.

Parameters:
points : list

List containing pandas dataframes or geopandas geodataframes containing the point data. Should be resDF_list output from layer_target_thick().

grid : xr.DataArray or xr.Dataset

Xarray DataArray or DataSet with the coordinates/spatial reference of the output grid to interpolate to

layers : int, default=None

Number of layers for interpolation. If None, uses the length of the points list to determine number of layers. By default None.

interp_kind : str, {‘nearest’, ‘interp2d’, ‘linear’, ‘cloughtocher’, ‘radial basis function’}

Type of interpolation to use. See scipy.interpolate N-D scattered. Values can be any of the following (also shown in “kind” column of N-D scattered section of table here: https://docs.scipy.org/doc/scipy/tutorial/interpolate.html). By default ‘nearest’

return_type : str, {‘dataarray’, ‘dataset’}

Type of xarray object to return, either xr.DataArray or xr.Dataset, by default ‘dataarray.’

export_dir : str or pathlib.Path, default=None

Export directory for interpolated grids, using w4h.export_grids(). If None, does not export, by default None.

target_col : str, default = ‘TARG_THICK_PER’

Name of column in points containing data to be interpolated, by default ‘TARG_THICK_PER’.

layer_col : str, default = ‘LAYER’

Name of column containing layer number. Not currently used, by default ‘LAYER’

xcol : str, default = None

Name of column containing x coordinates. If None, will look for ‘geometry’ column, as in a geopandas.GeoDataframe. By default None

ycol : str, default = None

Name of column containing y coordinates. If None, will look for ‘geometry’ column, as in a geopandas.GeoDataframe. By default None

xcoord : str, default=’x’

Name of x coordinate in grid, used to extract x values of grid, by default ‘x’

ycoord : str, default=’y’

Name of y coordinate in grid, used to extract y values of grid, by default ‘y’

log : bool, default = False

Whether to log inputs and outputs to log file.

**kwargs

Keyword arguments to be read directly into whichever scipy.interpolate function is designated by the interp_kind parameter.

Returns:
interp_data : xr.DataArray or xr.Dataset, depending on return_type

By default, returns an xr.DataArray object with the layers added as a new dimension called Layer. Can also specify return_type=’dataset’ to return an xr.Dataset with each layer as a separate variable.
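A minimal usage sketch (assumes resDF_list comes from layer_target_thick() and model_grid from read_model_grid(), as described above):

>>> import w4h
>>> layer_grids = w4h.layer_interp(points=resDF_list, grid=model_grid,
...                                interp_kind='nearest', return_type='dataarray')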

w4h.layer_target_thick(df, layers=9, return_all=False, export_dir=None, outfile_prefix=None, depth_top_col='TOP', depth_bot_col='BOTTOM', log=False)[source]

Function to calculate thickness of target material in each layer at each well point

Parameters:
df : geopandas.GeoDataFrame

Geodataframe containing classified data, surface elevation, bedrock elevation, layer depths, geometry.

layers : int, default=9

Number of layers in model, by default 9

return_all : bool, default=False

If True, return list of original geodataframes with extra column added for target thick for each layer. If False, return list of geopandas.geodataframes with only essential information for each layer.

export_dir : str or pathlib.Path, default=None

If str or pathlib.Path, should be directory to which to export dataframes built in function.

outfile_prefix : str, default=None

Only used if export_dir is set. Will be used at the start of the exported filenames

depth_top_col : str, default=’TOP’

Name of column containing data for depth to top of described well intervals

depth_bot_col : str, default=’BOTTOM’

Name of column containing data for depth to bottom of described well intervals

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
res_df or res : geopandas.GeoDataFrame

Geopandas geodataframe containing only important information needed for next stage of analysis.
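A minimal usage sketch (assumes classified_gdf is a geodataframe with classified intervals, layer depths, and geometry, as described above):

>>> import w4h
>>> resDF_list = w4h.layer_target_thick(classified_gdf, layers=9, return_all=False)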

w4h.logger_function(logtocommence, parameters, func_name)[source]

Function to log other functions, to be called from within other functions

Parameters:
logtocommence : bool

Whether to perform logging steps

parameters : dict

Dictionary containing parameters and their values, from function

func_name : str

Name of function within which this is called

w4h.merge_lithologies(well_data_df, targinterps_df, interp_col='INTERPRETATION', target_col='TARGET', target_class='bool')[source]

Function to merge lithologies and target booleans based on classifications

Parameters:
well_data_df : pandas.DataFrame

Dataframe containing classified well data

targinterps_df : pandas.DataFrame

Dataframe containing lithologies and their target interpretations, depending on what the target is for this analysis (often, coarse materials=1, fine=0)

target_col : str, default = ‘TARGET’

Name of column in targinterps_df containing the target interpretations

target_class : str, default = ‘bool’

Whether the input column is using boolean values as its target indicator

Returns:
df_targ : pandas.DataFrame

Dataframe containing merged lithologies/targets

w4h.merge_metadata(data_df, header_df, data_cols=None, header_cols=None, auto_pick_cols=False, drop_duplicate_cols=True, log=False, verbose=False, **kwargs)[source]

Function to merge tables, intended for merging metadata table with data table

Parameters:
data_df : pandas.DataFrame

“Left” dataframe, intended for this purpose to be dataframe with main data, but can be anything

header_df : pandas.DataFrame

“Right” dataframe, intended for this purpose to be dataframe with metadata, but can be anything

data_cols : list, optional

List of strings of column names, for columns to be included after join from “left” table (data table). If None, all columns are kept, by default None

header_cols : list, optional

List of strings of columns names, for columns to be included in merged table after merge from “right” table (metadata). If None, all columns are kept, by default None

auto_pick_cols : bool, default = False

Whether to autopick the columns from the metadata table. If True, the following column names are kept:[‘API_NUMBER’, ‘LATITUDE’, ‘LONGITUDE’, ‘BEDROCK_ELEV’, ‘SURFACE_ELEV’, ‘BEDROCK_DEPTH’, ‘LAYER_THICK’], by default False

drop_duplicate_cols : bool, optional

If True, drops duplicate columns from the tables so that columns do not get renamed upon merge, by default True

log : bool, default = False

Whether to log inputs and outputs to log file.

**kwargs

kwargs that are passed directly to pd.merge(). By default, the ‘on’ and ‘how’ parameters are defined as on=’API_NUMBER’ and how=’inner’

Returns:
mergedTable : pandas.DataFrame

Merged dataframe
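A minimal usage sketch (assumes well_data_df and metadata_df share an ‘API_NUMBER’ column; the on and how arguments are passed through to pd.merge()):

>>> import w4h
>>> merged_df = w4h.merge_metadata(data_df=well_data_df, header_df=metadata_df,
...                                on='API_NUMBER', how='inner')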

w4h.read_dict(file, keytype='np')[source]

Function to read a text file with a dictionary in it into a python dictionary

Parameters:
file : str or pathlib.Path object

Filepath to the file of interest containing the dictionary text

keytype : str, optional

String indicating the datatypes used in the text, currently only ‘np’ is implemented, by default ‘np’

Returns:
dict

Dictionary translated from text file.

w4h.read_dictionary_terms(dict_file=None, id_col='ID', search_col='DESCRIPTION', definition_col='LITHOLOGY', class_flag_col='CLASS_FLAG', dictionary_type=None, class_flag=6, rem_extra_cols=True, verbose=False, log=False)[source]

Function to read dictionary terms from file into pandas dataframe

Parameters:
dict_file : str or pathlib.Path object, or list of these

File or list of files to be read

search_col : str, default = ‘DESCRIPTION’

Name of column containing search terms (geologic formations)

definition_col : str, default = ‘LITHOLOGY’

Name of column containing interpretations of search terms (lithologies)

dictionary_type : str or None, {None, ‘exact’, ‘start’, ‘wildcard’}
Indicator of which kind of dictionary terms to be read in: None, ‘exact’, ‘start’, or ‘wildcard’, by default None.
  • If None, uses name of file to try to determine. If it cannot, it will default to using the classification flag from class_flag

  • If ‘exact’, will be used to search for exact matches to geologic descriptions

  • If ‘start’, will be used as with the .startswith() string method to find inexact matches to geologic descriptions

  • If ‘wildcard’, will be used to find any matching substring for inexact geologic matches

class_flag : int, default = 6

Classification flag to be used if dictionary_type is None and cannot be otherwise determined, by default 6

rem_extra_cols : bool, default = True

Whether to remove the extra columns from the input file after it is read in as a pandas dataframe, by default True

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
dict_terms : pandas.DataFrame

Pandas dataframe with formatting ready to be used in the classification steps of this package
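A minimal usage sketch (the dictionary filename below is hypothetical):

>>> import w4h
>>> exact_terms_df = w4h.read_dictionary_terms(dict_file='SearchTerms-Specific.csv',
...                                            dictionary_type='exact')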

w4h.read_grid(grid_path=None, grid_type='model', no_data_val_grid=0, use_service=False, study_area=None, grid_crs=None, output_crs='EPSG:5070', verbose=False, log=False, **kwargs)[source]

Reads in grid

Parameters:
grid_path : str or pathlib.Path, default=None

Path to a grid file

grid_type : str, default=’model’

Sets what type of grid to load in

no_data_val_grid : int, default=0

Sets the no data value of the grid

use_service : str, default=False

Sets which service the function uses

study_area : geopandas.GeoDataFrame, default=None

Dataframe containing study area polygon

grid_crs : str, default=None

Sets crs to use if clipping to study area

log : bool, default = False

Whether to log results to log file, by default False

Returns:
gridIN : xarray.DataArray

Returns grid

w4h.read_lithologies(lith_file=None, interp_col='LITHOLOGY', target_col='CODE', use_cols=None, verbose=False, log=False)[source]

Function to read lithology file into pandas dataframe

Parameters:
lith_file : str or pathlib.Path object, default = None

Filename of lithology file. If None, default is contained within repository, by default None

interp_col : str, default = ‘LITHOLOGY’

Column used to match interpretations

target_col : str, default = ‘CODE’

Column to be used as target code

use_cols : list, default = None

Which columns to use when reading in dataframe. If None, defaults to [‘LITHOLOGY’, ‘CODE’].

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
pandas.DataFrame

Pandas dataframe with lithology information

w4h.read_model_grid(model_grid_path, study_area=None, no_data_val_grid=0, read_grid=True, node_byspace=True, grid_crs=None, output_crs='EPSG:5070', verbose=False, log=False)[source]

Reads in model grid to xarray data array

Parameters:
model_grid_path : str

Path to model grid file

study_area : geopandas.GeoDataFrame, default=None

Dataframe containing study area polygon

no_data_val_grid : int, default=0

value assigned to areas with no data

read_grid : bool, default=True

Whether the function reads in an existing grid or creates one

node_byspace : bool, default=True

Denotes how to create grid

output_crs : str, default=’EPSG:5070’

Output coordinate reference system

grid_crs : str, default=None

Inputs grid crs

log : bool, default = False

Whether to log results to log file, by default False

Returns:
modelGrid : xarray.DataArray

Data array containing model grid

w4h.read_raw_csv(data_filepath, metadata_filepath, data_cols=None, metadata_cols=None, xcol='LONGITUDE', ycol='LATITUDE', well_key='API_NUMBER', encoding='latin-1', verbose=False, log=False, **read_csv_kwargs)[source]

Easy function to read raw .txt files output from, for example, an Access database

Parameters:
data_filepath : str

Filename of the file containing data, including the extension.

metadata_filepath : str

Filename of the file containing metadata, including the extension.

data_cols : list, default = None

List with strings with names of columns from txt file to keep after reading. If None, [“API_NUMBER”,”TABLE_NAME”,”FORMATION”,”THICKNESS”,”TOP”,”BOTTOM”], by default None.

metadata_cols : list, default = None

List with strings with names of columns from txt file to keep after reading. If None, [‘API_NUMBER’,”TOTAL_DEPTH”,”SECTION”,”TWP”,”TDIR”,”RNG”,”RDIR”,”MERIDIAN”,”QUARTERS”,”ELEVATION”,”ELEVREF”,”COUNTY_CODE”,”LATITUDE”,”LONGITUDE”,”ELEVSOURCE”], by default None

xcol : str, default = ‘LONGITUDE’

Name of column in metadata file indicating the x-location of the well, by default ‘LONGITUDE’

ycol : str, default = ‘LATITUDE’

Name of the column in metadata file indicating the y-location of the well, by default ‘LATITUDE’

well_key : str, default = ‘API_NUMBER’

Name of the column with the key/identifier that will be used to merge data later, by default ‘API_NUMBER’

encoding : str, default = ‘latin-1’

Encoding of the data in the input files, by default ‘latin-1’

verbose : bool, default = False

Whether to print the number of rows in the input columns, by default False

log : bool, default = False

Whether to log inputs and outputs to log file.

**read_csv_kwargs

**kwargs that get passed to pd.read_csv()

Returns:
(pandas.DataFrame, pandas.DataFrame/None)

Tuple with two pandas dataframes: (well_data, metadata). metadata is None if only well_data is used
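A minimal usage sketch (the filenames below are hypothetical):

>>> import w4h
>>> well_data_df, metadata_df = w4h.read_raw_csv(data_filepath='ISGS_DOWNHOLE_DATA.txt',
...                                              metadata_filepath='ISGS_HEADER.txt',
...                                              verbose=True)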

w4h.read_study_area(study_area=None, output_crs='EPSG:5070', buffer=None, return_original=False, log=False, verbose=False, **read_file_kwargs)[source]

Read study area geospatial file into geopandas

Parameters:
study_area : str, pathlib.Path, geopandas.GeoDataFrame, or shapely.Geometry

Filepath to any geospatial file readable by geopandas. Polygon is best, but may work with other types if extent is correct.

output_crs : str, tuple, dict, optional

CRS designation readable by geopandas/pyproj

buffer : None or numeric, default=None

If None, no buffer created. If a numeric value is given (float or int, for example), a buffer will be created at that distance in the unit of the output_crs.

return_original : bool, default=False

Whether to return the (reprojected) study area as well as the (reprojected) buffered study area. Study area is only used for clipping data, so usually return_original=False is sufficient.

log : bool, default = False

Whether to log results to log file, by default False

verbose : bool, default=False

Whether to print status and results to terminal

Returns:
studyAreaIN : geopandas.GeoDataFrame

Geopandas dataframe with polygon geometry.
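A minimal usage sketch (the shapefile path and buffer distance below are hypothetical):

>>> import w4h
>>> study_area_gdf = w4h.read_study_area('study_area.shp', output_crs='EPSG:5070', buffer=1000)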

w4h.read_wcs(study_area, wcs_url='https://data.isgs.illinois.edu/arcgis/services/Elevation/IL_Statewide_Lidar_DEM_WGS/ImageServer/WCSServer?request=GetCapabilities&service=WCS', res_x=30, res_y=30, verbose=False, log=False, **kwargs)[source]

Reads a WebCoverageService from a url and returns a rioxarray dataset containing it.

Parameters:
study_area : geopandas.GeoDataFrame

Dataframe containing study area polygon

wcs_url : str, default=lidarURL

Represents the url for the WCS
res_x : int, default=30

Sets resolution for x axis

res_y : int, default=30

Sets resolution for y axis

log : bool, default = False

Whether to log results to log file, by default False

**kwargs
Returns:
wcsData_rxr : xarray.DataArray

An xarray DataArray holding the image from the WebCoverageService

w4h.read_wms(study_area, layer_name='IL_Statewide_Lidar_DEM_WGS:None', wms_url='https://data.isgs.illinois.edu/arcgis/services/Elevation/IL_Statewide_Lidar_DEM_WGS/ImageServer/WCSServer?request=GetCapabilities&service=WCS', srs='EPSG:3857', clip_to_studyarea=True, bbox=[-9889002.6155, 5134541.069716, -9737541.607038, 5239029.6274], res_x=30, res_y=30, size_x=512, size_y=512, format='image/tiff', verbose=False, log=False, **kwargs)[source]

Reads a WebMapService from a url and returns a rioxarray dataset containing it.

Parameters:
study_area : geopandas.GeoDataFrame

Dataframe containing study area polygon

layer_name : str, default=’IL_Statewide_Lidar_DEM_WGS:None’

Represents the layer name in the WMS

wms_url : str, default=lidarURL

Represents the url for the WMS

srs : str, default=’EPSG:3857’

Sets the srs

clip_to_studyarea : bool, default=True

Whether to clip to study area or not

res_x : int, default=30

Sets resolution for x axis

res_y : int, default=30

Sets resolution for y axis

size_x : int, default=512

Sets width of result

size_y : int, default=512

Sets height of result

log : bool, default = False

Whether to log results to log file, by default False

Returns:
wmsData_rxr : xarray.DataArray

Holds the image from the WebMapService

w4h.read_xyz(xyzpath, datatypes=None, verbose=False, log=False)[source]

Function to read file containing xyz data (elevation/location)

Parameters:
xyzpath : str or pathlib.Path

Filepath of the xyz file, including extension

datatypes : dict, default = None

Dictionary containing the datatypes for the columns in the xyz file. If None, {‘ID’:np.uint32,’API_NUMBER’:np.uint64,’LATITUDE’:np.float64,’LONGITUDE’:np.float64,’ELEV_FT’:np.float64}, by default None

verbose : bool, default = False

Whether to print the number of xyz records to the terminal, by default False

log : bool, default = False

Whether to log inputs and outputs to log file.

Returns:
pandas.DataFrame

Pandas dataframe containing the elevation and location data

w4h.remerge_data(classifieddf, searchdf)[source]

Function to merge newly-classified (or not) and previously classified data

Parameters:
classifieddf : pandas.DataFrame

Dataframe that had already been classified previously

searchdf : pandas.DataFrame

Dataframe with new classifications

Returns:
remergeDF : pandas.DataFrame

Dataframe containing all the data, merged back together

w4h.remove_bad_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', depth_type='depth', verbose=False, log=False)[source]

Function to remove all records in the dataframe with well interpretations where the depth information is bad (i.e., where the bottom of the record is nearer to the surface than the top)

Parameters:
df_with_depth : pandas.DataFrame

Pandas dataframe containing the well records and descriptions for each interval

top_col : str, default=’TOP’

The name of the column containing the depth or elevation for the top of the interval, by default ‘TOP’

bottom_col : str, default=’BOTTOM’

The name of the column containing the depth or elevation for the bottom of each interval, by default ‘BOTTOM’

depth_type : str, {‘depth’, ‘elevation’}

Whether the table is organized by depth or elevation. If depth, the top column will have smaller values than the bottom column. If elevation, the top column will have higher values than the bottom column, by default ‘depth’

verbose : bool, default = False

Whether to print results to the terminal, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with the records removed where the top is indicated to be below the bottom.
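A minimal usage sketch (assumes df is a well-interval dataframe with ‘TOP’ and ‘BOTTOM’ depth columns, as described above):

>>> import w4h
>>> cleaned_df = w4h.remove_bad_depth(df, top_col='TOP', bottom_col='BOTTOM',
...                                   depth_type='depth', verbose=True)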

w4h.remove_no_depth(df_with_depth, top_col='TOP', bottom_col='BOTTOM', no_data_val_table='', verbose=False, log=False)[source]

Function to remove well intervals with no depth information

Parameters:
df_with_depth : pandas.DataFrame

Dataframe containing well descriptions

top_col : str, optional

Name of column containing information on the top of the well intervals, by default ‘TOP’

bottom_col : str, optional

Name of column containing information on the bottom of the well intervals, by default ‘BOTTOM’

no_data_val_table : any, optional

No data value in the input data, used by this function to indicate that depth data is not there, to be replaced by np.nan, by default ‘’

verbose : bool, optional

Whether to print results to console, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns:
df_with_depth : pandas.DataFrame

Dataframe with intervals lacking depth information removed

w4h.remove_no_description(df_with_descriptions, description_col='FORMATION', no_data_val_table='', verbose=False, log=False)[source]

Function that removes all records in the dataframe containing the well descriptions where no description is given.

Parameters:
df_with_descriptions : pandas.DataFrame

Pandas dataframe containing the well records with their individual descriptions

description_col : str, optional

Name of the column containing the geologic description of each interval, by default ‘FORMATION’

no_data_val_table : str, optional

The value expected if the column is empty or there is no data. These will be replaced by np.nan before being removed, by default ‘’

verbose : bool, optional

Whether to print the results of this step to the terminal, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with records with no description removed.

w4h.remove_no_topo(df_with_topo, zcol='ELEVATION', no_data_val_table='', verbose=False, log=False)[source]

Function to remove wells that do not have topography data (needed for layer selection later).

This function is intended to be run on the metadata table after elevations have been added (or an attempt has been made to add them).

Parameters:
df_with_topo : pandas.DataFrame

Pandas dataframe containing elevation information.

zcol : str

Name of elevation column

no_data_val_table : any

Value in dataset that indicates no data is present (replaced with np.nan)

verbose : bool, optional

Whether to print outputs, by default False

log : bool, default = False

Whether to log results to log file, by default False

Returns:
pandas.DataFrame

Pandas dataframe with intervals with no topography removed.

w4h.remove_nonlocated(df_with_locations, xcol='LONGITUDE', ycol='LATITUDE', no_data_val_table='', verbose=False, log=False)[source]

Function to remove wells and well intervals where there is no location information

Parameters:
df_with_locations : pandas.DataFrame

Pandas dataframe containing well descriptions

metadata_DF : pandas.DataFrame

Pandas dataframe containing metadata, including well locations (e.g., Latitude/Longitude)

log : bool, default = False

Whether to log results to log file, by default False

Returns:
df_with_locations : pandas.DataFrame

Pandas dataframe containing only data with location information

w4h.run(well_data, surf_elev_grid, bedrock_elev_grid, model_grid=None, metadata=None, layers=9, well_data_cols=None, well_metadata_cols=None, description_col='FORMATION', top_col='TOP', bottom_col='BOTTOM', depth_type='depth', study_area=None, xcol='LONGITUDE', ycol='LATITUDE', zcol='ELEVATION', well_id_col='API_NUMBER', lith_dict=None, lith_dict_start=None, lith_dict_wildcard=None, target_dict=None, target_name='', export_dir=None, verbose=False, log=False, **kw_params)[source]

w4h.run() is a function that runs the intended workflow of the wells4hydrogeology (w4h) package. This means that it runs several constituent functions. The workflow that this follows is provided in the package wiki. It accepts the parameters of the constituent functions. To see a list of these functions and parameters, use help(w4h.run).

The following functions used in w4h.run() are listed below, along with their parameters and default values for those parameters. See the documentation for each of the individual functions for more information on a specific parameter:

file_setup

well_data | default = ‘<no default>’

metadata | default = None

data_filename | default = ‘*ISGS_DOWNHOLE_DATA*.txt’

metadata_filename | default = ‘*ISGS_HEADER*.txt’

log_dir | default = None

verbose | default = False

log | default = False

read_raw_csv

data_filepath | default = ‘<output of previous function>’

metadata_filepath | default = ‘<output of previous function>’

data_cols | default = None

metadata_cols | default = None

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

well_key | default = ‘API_NUMBER’

encoding | default = ‘latin-1’

verbose | default = False

log | default = False

read_csv_kwargs | default = {}

define_dtypes

undefined_df | default = ‘<output of previous function>’

datatypes | default = None

verbose | default = False

log | default = False

merge_metadata

data_df | default = ‘<output of previous function>’

header_df | default = ‘<output of previous function>’

data_cols | default = None

header_cols | default = None

auto_pick_cols | default = False

drop_duplicate_cols | default = True

log | default = False

verbose | default = False

kwargs | default = {}

coords2geometry

df_no_geometry | default = ‘<output of previous function>’

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

zcol | default = ‘ELEV_FT’

input_coords_crs | default = ‘EPSG:4269’

output_crs | default = ‘EPSG:5070’

use_z | default = False

wkt_col | default = ‘WKT’

geometry_source | default = ‘coords’

verbose | default = False

log | default = False

read_study_area

study_area | default = None

output_crs | default = ‘EPSG:5070’

buffer | default = None

return_original | default = False

log | default = False

verbose | default = False

read_file_kwargs | default = {}

clip_gdf2study_area

study_area | default = ‘<output of previous function>’

gdf | default = ‘<output of previous function>’

log | default = False

verbose | default = False

read_grid

grid_path | default = None

grid_type | default = ‘model’

no_data_val_grid | default = 0

use_service | default = False

study_area | default = None

grid_crs | default = None

output_crs | default = ‘EPSG:5070’

verbose | default = False

log | default = False

kwargs | default = {}

add_control_points

df_without_control | default = ‘<output of previous function>’

df_control | default = None

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

zcol | default = ‘ELEV_FT’

controlpoints_crs | default = ‘EPSG:4269’

output_crs | default = ‘EPSG:5070’

description_col | default = ‘FORMATION’

interp_col | default = ‘INTERPRETATION’

target_col | default = ‘TARGET’

verbose | default = False

log | default = False

kwargs | default = {}

remove_nonlocated

df_with_locations | default = ‘<output of previous function>’

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

remove_no_topo

df_with_topo | default = ‘<output of previous function>’

zcol | default = ‘ELEVATION’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

remove_no_depth

df_with_depth | default = ‘<output of previous function>’

top_col | default = ‘TOP’

bottom_col | default = ‘BOTTOM’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

remove_bad_depth

df_with_depth | default = ‘<output of previous function>’

top_col | default = ‘TOP’

bottom_col | default = ‘BOTTOM’

depth_type | default = ‘depth’

verbose | default = False

log | default = False

remove_no_description

df_with_descriptions | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

no_data_val_table | default = ‘’

verbose | default = False

log | default = False

get_search_terms

spec_path | default = ‘/home/docs/checkouts/readthedocs.org/user_builds/wells4hydrogeology/checkouts/latest/docs/resources/’

spec_glob_pattern | default = ‘*SearchTerms-Specific*’

start_path | default = None

start_glob_pattern | default = ‘*SearchTerms-Start*’

wildcard_path | default = None

wildcard_glob_pattern | default = ‘*SearchTerms-Wildcard’

verbose | default = False

log | default = False

read_dictionary_terms

dict_file | default = None

id_col | default = ‘ID’

search_col | default = ‘DESCRIPTION’

definition_col | default = ‘LITHOLOGY’

class_flag_col | default = ‘CLASS_FLAG’

dictionary_type | default = None

class_flag | default = 6

rem_extra_cols | default = True

verbose | default = False

log | default = False

specific_define

df | default = ‘<output of previous function>’

terms_df | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

terms_col | default = ‘DESCRIPTION’

verbose | default = False

log | default = False

split_defined

df | default = ‘<output of previous function>’

classification_col | default = ‘CLASS_FLAG’

verbose | default = False

log | default = False

start_define

df | default = ‘<output of previous function>’

terms_df | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

terms_col | default = ‘DESCRIPTION’

verbose | default = False

log | default = False

wildcard_define

df | default = ‘<output of previous function>’

terms_df | default = ‘<output of previous function>’

description_col | default = ‘FORMATION’

terms_col | default = ‘DESCRIPTION’

verbose | default = False

log | default = False

remerge_data

classifieddf | default = ‘<output of previous function>’

searchdf | default = ‘<output of previous function>’

fill_unclassified

df | default = ‘<output of previous function>’

classification_col | default = ‘CLASS_FLAG’

read_lithologies

lith_file | default = None

interp_col | default = ‘LITHOLOGY’

target_col | default = ‘CODE’

use_cols | default = None

verbose | default = False

log | default = False

merge_lithologies

well_data_df | default = ‘<output of previous function>’

targinterps_df | default = ‘<output of previous function>’

interp_col | default = ‘INTERPRETATION’

target_col | default = ‘TARGET’

target_class | default = ‘bool’

align_rasters

grids_unaligned | default = None

model_grid | default = None

no_data_val_grid | default = 0

verbose | default = False

log | default = False

get_drift_thick

surface_elev | default = None

bedrock_elev | default = None

layers | default = 9

plot | default = False

verbose | default = False

log | default = False

sample_raster_points

raster | default = None

points_df | default = None

well_id_col | default = ‘API_NUMBER’

xcol | default = ‘LONGITUDE’

ycol | default = ‘LATITUDE’

new_col | default = ‘SAMPLED’

verbose | default = False

log | default = False

get_layer_depths

df_with_depths | default = ‘<output of previous function>’

surface_elev_col | default = ‘SURFACE_ELEV’

layer_thick_col | default = ‘LAYER_THICK’

layers | default = 9

log | default = False

layer_target_thick

df | default = ‘<output of previous function>’

layers | default = 9

return_all | default = False

export_dir | default = None

outfile_prefix | default = None

depth_top_col | default = ‘TOP’

depth_bot_col | default = ‘BOTTOM’

log | default = False

layer_interp

points | default = ‘<no default>’

grid | default = ‘<no default>’

layers | default = None

interp_kind | default = ‘nearest’

return_type | default = ‘dataarray’

export_dir | default = None

target_col | default = ‘TARG_THICK_PER’

layer_col | default = ‘LAYER’

xcol | default = None

ycol | default = None

xcoord | default = ‘x’

ycoord | default = ‘y’

log | default = False

verbose | default = False

kwargs | default = {}

export_grids

grid_data | default = ‘<no default>’

out_path | default = ‘<no default>’

file_id | default = ‘’

filetype | default = ‘tif’

variable_sep | default = True

date_stamp | default = True

verbose | default = False

log | default = False
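A minimal end-to-end sketch of the workflow above (all file paths below are hypothetical; the grids may be anything accepted by read_grid()):

>>> import w4h
>>> w4h.run(well_data='/data/wells',
...         surf_elev_grid='surface_dem.tif',
...         bedrock_elev_grid='bedrock_elev.tif',
...         model_grid='model_grid.tif',
...         study_area='study_area.shp',
...         layers=9, verbose=True)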

w4h.sample_raster_points(raster=None, points_df=None, well_id_col='API_NUMBER', xcol='LONGITUDE', ycol='LATITUDE', new_col='SAMPLED', verbose=False, log=False)[source]

Sample raster values to points from geopandas geodataframe.

Parameters:
raster : rioxarray data array

Raster containing values to be sampled.

points_df : geopandas.GeoDataFrame

Geopandas dataframe with geometry column containing point values to sample.

well_id_col : str, default=”API_NUMBER”

Column that uniquely identifies each well so multiple sampling points are not taken per well

xcol : str, default=’LONGITUDE’

Column containing name for x-column, by default ‘LONGITUDE.’ This is used to output (potentially) reprojected point coordinates so as not to overwrite the original.

ycol : str, default=’LATITUDE’

Column containing name for y-column, by default ‘LATITUDE’. This is used to output (potentially) reprojected point coordinates so as not to overwrite the original.

new_col : str, default=’SAMPLED’

Name of new column containing points sampled from the raster, by default ‘SAMPLED’.

verbose : bool, default=False

Whether to send to print() information about progress of function, by default False.

log : bool, default = False

Whether to log results to log file, by default False

Returns:
points_df : geopandas.GeoDataFrame

Same as points_df, but with sampled values and potentially with reprojected coordinates.

w4h.sort_dataframe(df, sort_cols=['API_NUMBER', 'TOP'], remove_nans=True)[source]

Function to sort dataframe by one or more columns.

Parameters:
df : pandas.DataFrame

Dataframe to be sorted

sort_cols : str or list of str, default = [‘API_NUMBER’,’TOP’]

Name(s) of columns by which to sort dataframe, by default [‘API_NUMBER’,’TOP’]

remove_nans : bool, default = True

Whether or not to remove nans in the process, by default True

Returns:
df_sorted : pandas.DataFrame

Sorted dataframe

w4h.specific_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)[source]

Function to classify terms that have been specifically defined in the terms_df.

Parameters:
df : pandas.DataFrame

Input dataframe with unclassified well descriptions.

terms_df : pandas.DataFrame

Dataframe containing the classifications

description_col : str, default=’FORMATION’

Column name in df containing the well descriptions, by default ‘FORMATION’.

terms_col : str, default=’DESCRIPTION’

Column name in terms_df containing the classified descriptions, by default ‘DESCRIPTION’.

verbose : bool, default=False

Whether to print results, by default False.

Returns:
df_Interps : pandas.DataFrame

Dataframe containing the well descriptions and their matched classifications.

w4h.split_defined(df, classification_col='CLASS_FLAG', verbose=False, log=False)[source]

Function to split dataframe with well descriptions into two dataframes based on whether a row has been classified.

Parameters:
df : pandas.DataFrame

Dataframe containing all the well descriptions

classification_col : str, default = ‘CLASS_FLAG’

Name of column containing the classification flag, by default ‘CLASS_FLAG’

verbose : bool, default = False

Whether to print results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
Two-item tuple of pandas.Dataframe

tuple[0] is dataframe containing classified data, tuple[1] is dataframe containing unclassified data.
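A minimal usage sketch (assumes df is a well-description dataframe with a ‘CLASS_FLAG’ column, as described above):

>>> import w4h
>>> classified_df, unclassified_df = w4h.split_defined(df, classification_col='CLASS_FLAG')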

w4h.start_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)[source]

Function to classify descriptions according to starting substring.

Parameters:
df : pandas.DataFrame

Dataframe containing all the well descriptions

terms_df : pandas.DataFrame

Dataframe containing all the startswith substrings to use for searching

description_col : str, default = ‘FORMATION’

Name of column in df containing descriptions, by default ‘FORMATION’

terms_col : str, default = ‘DESCRIPTION’

Name of column in terms_df containing startswith substring to match with description_col, by default ‘DESCRIPTION’

verbose : bool, default = False

Whether to print out results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
df : pandas.DataFrame

Dataframe containing the original data and new classifications

w4h.verbose_print(func, local_variables, exclude_params=[])[source]
w4h.wildcard_define(df, terms_df, description_col='FORMATION', terms_col='DESCRIPTION', verbose=False, log=False)[source]

Function to classify descriptions according to any substring.

Parameters:
df : pandas.DataFrame

Dataframe containing all the well descriptions

terms_df : pandas.DataFrame

Dataframe containing all the wildcard substrings to use for searching

description_col : str, default = ‘FORMATION’

Name of column in df containing descriptions, by default ‘FORMATION’

terms_col : str, default = ‘DESCRIPTION’

Name of column in terms_df containing the wildcard substring to match with description_col, by default ‘DESCRIPTION’

verbose : bool, default = False

Whether to print out results, by default False

log : bool, default = False

Whether to log results to log file

Returns:
df : pandas.DataFrame

Dataframe containing the original data and new classifications

w4h.xyz_metadata_merge(xyz, metadata, verbose=False, log=False)[source]

Add elevation to header data file.

Parameters:
xyz : pandas.DataFrame

Contains elevation for the points

metadata : pandas.DataFrame

Header data file

log : bool, default = False

Whether to log results to log file, by default False

Returns:
headerXYZData : pandas.DataFrame

Header dataset merged to get elevation values

Submodules