Beyond Mapping III

Topic 30:  A Math/Stat Framework for Grid-based Map Analysis and Modeling

 

Map Analysis book with companion CD-ROM for hands-on exercises and further reading

 

SpatialSTEM Has Deep Mathematical Roots  provides a conceptual framework for a map-ematical treatment of mapped data  

Map-ematically Messing with Mapped Data  discusses the nature of grid-based mapped data and Spatial Analysis operations

Paint by Numbers Outside the Traditional Statistics Box  discusses the nature of Spatial Statistics operations  

Simultaneously Trivializing and Complicating GIS  describes a mathematical structure for spatial analysis operations

Infusing Spatial Character into Statistics  describes a statistical structure for spatial statistics operations

Depending on Where is What  develops an organizational structure for spatial statistics

Spatially Evaluating the T-test  illustrates the expansion of traditional math/stat procedures to operate on map variables to spatially solve traditional non-spatial equations

Organizing Geographic Space for Effective Analysis  provides an overview of data organization for grid-based map analysis

To Boldly Go Where No Map Has Gone Before  identifies Lat/Lon as a Universal Spatial Key for joining database tables

The Spatial Key to Seeing the Big Picture  describes a five-step process for generating grid map layers from spatially tagged data

Laying the Foundation for SpatialSTEM: Spatial Mathematics, Map Algebra and Map Analysis  discusses the conceptual foundation and intellectual shifts needed for SpatialSTEM  

Recasting Map Analysis Operations for General Consumption  reorganizes ArcGIS’s Spatial Analyst tools into the SpatialSTEM framework that extends traditional math/stat procedures

 

Note: The processing and figures discussed in this topic were derived using MapCalc™ software.  See www.innovativegis.com to download a free MapCalc Learner version with tutorial materials for classroom and self-learning map analysis concepts and procedures. 

 

______________________________


SpatialSTEM Has Deep Mathematical Roots

    
(GeoWorld, January 2012)  


 

Recently my interest has been captured by a new arena and expression for the contention that “maps are data”—spatialSTEM (or sSTEM for short)—as a means for redirecting education in general, and GIS education in particular.  I suspect you have heard of STEM (Science, Technology, Engineering and Mathematics) and the educational crisis that puts U.S. students well behind many other nations in these quantitatively-based disciplines. 

 

While Googling around the globe makes for great homework in cultural geography, it doesn’t advance quantitative proficiency, nor does it stimulate the spatial reasoning skills needed for problem solving.  Lots of folks from Fareed Zakaria to Bill Gates to President Obama are looking for ways that we can recapture our leadership in the quantitative fields.  That’s the premise of spatialSTEM—that “maps are numbers first, pictures later” and we do mathematical things to mapped data for insight and better understanding of spatial patterns and relationships within decision-making contexts. 

 

This contention suggests that there is a map-ematics that can be employed to solve problems that go beyond mapping, geo-query, visualization and GPS navigation.  This column’s discussion about the quantitative nature of maps is the first part of a three-part series that sets the stage to fully develop this thesis— that grid-based Spatial Analysis Operations are extensions of traditional mathematics (Part 2 investigating map math, algebra, calculus, plane and solid geometry, etc.) and that grid-based Spatial Statistics Operations are extensions of traditional statistics (Part 3 looking at map descriptive statistics, normalization, comparison, classification, surface modeling, predictive statistics, etc.). 

 


 

Figure 1. Conceptual overview of the SpatialSTEM framework.

 

Figure 1 outlines the important components of map analysis and modeling within a mathematical structure that has been in play since the 1980s (see author’s note).  Of the three disciplines forming Geotechnology (Remote Sensing, Geographic Information Systems and Global Positioning System), GIS is at the heart of converting mapped data into spatial information.  There are two primary approaches used in generating this information—Mapping/Geo-query and Map Analysis/Modeling.

 

The major difference between the two approaches lies in the structuring of mapped data and their intended use.  Mapping and geo-query utilize a data structure akin to manual mapping in which discrete spatial objects (points, lines and polygons) form a collection of independent, irregular features to characterize geographic space.  For example, a Water map might contain categories of Spring (points), Stream (lines) and Lake (polygons) with the features scattered throughout a landscape. 

 

Map analysis and modeling procedures, on the other hand, operate on continuous map variables (termed map surfaces) composed of thousands upon thousands of map values stored in geo-registered matrices.  Within this context, a Water map no longer contains separate and distinct features but is a collection of adjoining grid cells with a map value indicating the characteristic at each location (e.g., Spring = 1, Stream = 2 and Lake = 3). 

 


 

Figure 2. Basic data structure for Vector and Raster map types.

 

Figure 2 illustrates two broad types of digital maps, formally termed Vector for storing discrete spatial objects and Raster for storing continuous map surfaces.  In vector format, spatial data is stored as two linked data tables.  A “spatial table” contains all of the X,Y coordinates defining a set of spatial objects that are grouped by object identification numbers.  For example, the location of the Forest polygon identified on the left side of the figure is stored as ID#32 followed by an ordered series of X,Y coordinate pairs delineating its border (connect-the-dots). 

 

In a similar manner, the ID#s and X,Y coordinates defining the other cover type polygons are sequentially listed in the table.  The ID#s link the spatial table (Where) to a corresponding “attribute table” (What) containing information about each spatial object as a separate record.  For example, polygon ID#31 is characterized as a mature 60-year-old Ponderosa Pine (PP) Forest stand.

 

The right side of figure 2 depicts raster storage of the same cover type information.  Each grid space is assigned a number corresponding to the dominant cover type present— the “cell position” in the matrix determines the location (Where) and the “cell value” determines the characteristic/condition (What).  It is important to note that the raster representation stores information about the interior of polygons and “pre-conditions geographic space” for analysis by applying a consistent grid configuration to each grid map.  Since each map’s underlying data structure is the same, the computer simply “hits disk” to get information and does not have to calculate whether irregular sets of points, lines or polygons on different maps intersect.  
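
To make the “cell position = Where, cell value = What” idea concrete, here is a minimal sketch in Python with numpy (all map values are hypothetical).  Because geo-registered layers share the same analysis frame, looking up the conditions at a location is a direct matrix index; no geometric intersection is ever computed.

  import numpy as np

  # Two geo-registered map layers sharing one analysis frame (values hypothetical)
  water = np.array([[0, 0, 2, 0],          # 0=Dry, 1=Spring, 2=Stream, 3=Lake
                    [1, 0, 2, 0],
                    [0, 0, 2, 2],
                    [0, 3, 3, 3]])
  elev = np.array([[820, 815, 790, 800],   # elevation in feet
                   [830, 818, 785, 795],
                   [825, 812, 780, 778],
                   [810, 776, 772, 770]])

  row, col = 2, 2                          # Where: a cell position in the frame
  stack = {"water": water, "elev": elev}
  # "Drilling" down through the map stack is just indexing every layer
  print({name: int(layer[row, col]) for name, layer in stack.items()})
  # -> {'water': 2, 'elev': 780}  ...What: a stream cell at 780 feet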

 


 

Figure 3. Organizational considerations and terminology for grid-based mapped data.

 

Figure 3 depicts the fundamental concepts supporting raster data.  As a comparison between vector and raster data structures, consider how the two approaches represent an Elevation surface.  In vector, contour lines are used to identify lines of constant elevation and contour interval polygons are used to identify specified ranges of elevation.  While contour lines are exacting, they fail to describe the intervening surface configuration. 

 

Contour intervals describe the interiors but overly generalize the actual “ups and downs” of the terrain into broad ranges that form an unrealistic stair-step configuration (center-left portion of figure 3).  As depicted in the figure, rock climbers would need to summit each of the contour interval “200-foot cliffs” rising from presumed flat mesas.  Similarly, surface water flow presumably would cascade like waterfalls from each contour interval “lake” like a Spanish multi-tiered fountain. 

 

The upshot is that within a mathematical context, vector maps are ineffective representations of real-world gradients and actual movements and flows over these surfaces— while contour line/interval maps have formed colorful and comfortable visualizations for generations, the data structure is too limited for modern map analysis and modeling. 

 

The remainder of figure 3 depicts the basic Raster/Grid organizational structure.  Each grid map is termed a Map Layer and a set of geo-registered layers constitutes a Map Stack.  All of the map layers in a project conform to a common Analysis Frame with a fixed number of rows and columns at a specified cell size that can be positioned anywhere in geographic space.  As in the case of the Elevation surface in the lower-left portion of figure 3, a continuous gradient is formed with subtle elevation differences that allow hikers to step from cell to cell while considering relative steepness, or surface water to sequentially stream from a location to its steepest downhill neighbor, thereby identifying a flow-path (see the sketch below).  
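
The steepest-downhill idea sketches easily in Python with numpy (hypothetical elevation values).  Repeatedly stepping to the lowest of the eight surrounding cells traces a flow-path until a pit or flat spot is reached, a toy version of grid-based flow routing.

  import numpy as np

  elev = np.array([[95, 90, 88, 86],       # hypothetical elevation surface
                   [92, 85, 80, 78],
                   [90, 82, 70, 65],
                   [88, 75, 60, 50]], dtype=float)

  def flow_path(elev, row, col):
      """Always step to the steepest downhill neighbor until none is lower."""
      path, (nrows, ncols) = [(row, col)], elev.shape
      while True:
          best = None
          for dr in (-1, 0, 1):
              for dc in (-1, 0, 1):
                  r, c = row + dr, col + dc
                  if (dr, dc) != (0, 0) and 0 <= r < nrows and 0 <= c < ncols \
                          and elev[r, c] < elev[row, col] \
                          and (best is None or elev[r, c] < elev[best]):
                      best = (r, c)
          if best is None:                 # a pit or flat area stops the flow
              return path
          row, col = best
          path.append(best)

  print(flow_path(elev, 0, 0))             # [(0, 0), (1, 1), (2, 2), (3, 3)]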

 

The underlying concept of this data structure is that grid cells for all of the map layers precisely coincide, and by simply accessing map values at a row, column location a computer can “drill” down through the map layers noting their characteristics.  Similarly, noting the map values of surrounding cells identifies the characteristics within a location’s vicinity on a given map layer, or set of map layers. 

 

Keep in mind that while terrain elevation is the most common example of a map surface, it is by no means the only one.  In natural systems, temperature, barometric pressure, air pollution concentration, soil chemistry and water turbidity are but a few examples of continuous mapped data gradients.  In human systems, population density, income level, life style concentration, crime occurrence and disease incidence rates all form continuous map surfaces.  In economic systems, home values, sales activity and travel-time to/from stores form map variables that track spatial patterns.   

 

In fact, the preponderance of spatial data is easily and best represented as grid-based continuous map surfaces that are preconditioned for use in map analysis and modeling.  The computer does the heavy lifting of the computation …what is needed is a new generation of creative minds that goes beyond mapping to “thinking with maps” within this less familiar, quantitative framework— a SpatialSTEM environment.

 _____________________________

Author’s Notes:  My involvement in map analysis/modeling began in the 1970s with doctoral work in computer-assisted analysis of remotely sensed data a couple of years before we had civilian satellites.  The extension from digital imagery classification using multivariate statistics and pattern recognition algorithms in the 70s to a comprehensive grid-based mathematical structure for all forms of mapped data in the 80s was a natural evolution.  See www.innovativegis.com, select “Online Papers” for a link to a 1986 paper on “A Mathematical Structure for Analyzing Maps” that serves as an early introduction to a comprehensive framework for map analysis/modeling.

 


Map-ematically Messing with Mapped Data

    
(GeoWorld, February 2012)  


 

The last section introduced the idea of spatialSTEM for teaching map analysis and modeling fundamentals within a mathematical context that resonates with science, technology, engineering and math/stat communities.  The discussion established a general framework and grid-based data structure needed for quantitative analysis of spatial patterns and relationships.  This section focuses on the nature of mapped data, an example of a grid-math/algebra application and discussion of extended spatial analysis operations.       

 


 

Figure 1. Spatial Data Perspectives—Where is What.

 

Figure 1 identifies the two primary perspectives of spatial data—1) Numeric that indicates how numbers are distributed in “number space” (What condition) and 2) Geographic that indicates how numbers are distributed in “geographic space” (Where condition).  The numeric perspective can be grouped into categories of Qualitative numbers that deal with general descriptions based on perceived “quality” and Quantitative numbers that deal with measured characteristics or “quantity.” 

 

Further classification identifies the familiar numeric data types of Nominal, Ordinal, Interval, Ratio and Binary.  It is generally well known that very few math/stat operations can be performed using qualitative data (Nominal, Ordinal), whereas a wealth of operations can be used with quantitative data (Interval, Ratio).  Only a specialized few operations utilize Binary data. 

 

Less familiar are the two geographic data types.  Choropleth numbers form sharp and unpredictable boundaries in space, such as the values assigned to the discrete map features on a road or cover type map.  Isopleth numbers, on the other hand, form continuous and often predictable gradients in geographic space, such as the values on an elevation or temperature surface. 

 

Putting the Where and What perspectives of spatial data together, Discrete Maps identify mapped data with spatially independent numbers (qualitative or quantitative) forming sharp abrupt boundaries (choropleth), such as a cover type map.  Discrete maps generally provide limited footholds for quantitative map analysis.  On the other hand, Continuous Maps contain a range of values (quantitative only) that form spatial gradients (isopleth), such as an elevation surface.  They provide a wealth of analytics from basic grid math to map algebra, calculus and geometry.  

 


 

Figure 2. Basic Grid Math and Algebra example.

 

Site-specific farming provides a good example of basic grid math and map algebra using continuous maps (figure 2).  Yield Mapping involves simultaneously recording yield flow and GPS position as a combine harvests a crop resulting in a grid map of thousands of geo-registered numbers that track crop yield throughout a field.   Grid Math can be used to calculate the mathematical difference in yield at each location between two years by simply subtracting the respective yield maps.  Map Algebra extends the processing by spatially evaluating the full algebraic percent change equation. 

 

The paradigm shift in this map-ematical approach is that map variables, comprised of thousands of geo-registered numbers, are substituted for traditional variables defined by only a single value.  Map algebra’s continuous map solution shows localized variation, rather than a single “typical” value being calculated (i.e., 37.3% increase in the example) and assumed everywhere the same in non-spatial analysis.  
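
As a rough sketch of the contrast, the following Python/numpy fragment (made-up yield numbers, not the actual field data of figure 2) evaluates the percent-change equation at every grid cell, then collapses the same data to the single value a non-spatial analysis would report.

  import numpy as np

  yield_97 = np.array([[120., 135., 150.],     # hypothetical bushels/acre
                       [110., 140., 160.],
                       [100., 130., 155.]])
  yield_98 = np.array([[150., 160., 170.],
                       [140., 185., 200.],
                       [130., 175., 190.]])

  difference = yield_98 - yield_97                     # Grid Math: cell-by-cell
  pct_change = (yield_98 - yield_97) / yield_97 * 100  # Map Algebra: full equation

  print(pct_change.round(1))                 # localized answer: a map of % change
  print(round(float(pct_change.mean()), 1))  # non-spatial answer: one "typical" value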

 

Figure 3 expands basic Grid Math and Map Algebra into other mathematical arenas.  Advanced Grid Math includes most of the buttons on a scientific calculator, including trigonometric functions.  For example, dividing the planimetric surface area of a grid cell by the cosine of a slope map expressed in degrees calculates the surface area of the “inclined plane” at each grid location.  The difference between planimetric area represented by traditional maps and surface area based on terrain steepness can be dramatic and greatly affect the characterization of “catchment areas” in environmental and engineering models of surface runoff. 
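
A hedged sketch of that calculation in Python/numpy (hypothetical slope values and a 50-foot cell size) shows how quickly surface area departs from planimetric area as terrain steepens.

  import numpy as np

  cell_size = 50.0                          # feet per grid cell (hypothetical)
  planimetric = cell_size ** 2              # 2,500 sq ft regardless of terrain

  slope_deg = np.array([[ 0., 10., 25.],    # hypothetical slope map in degrees
                        [ 5., 30., 45.],
                        [15., 35., 60.]])

  # Inclined-plane area at each cell: planimetric area / cosine(slope)
  surface_area = planimetric / np.cos(np.radians(slope_deg))

  print(surface_area.round(0))              # a 60-degree cell doubles to 5,000 sq ft
  print(planimetric * slope_deg.size, "vs", round(float(surface_area.sum())))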

 


 

Figure 3.  Spatial Analysis operations.

 

A Map Calculus expresses such functions as the derivative and integral within a spatial context.  The derivative traditionally identifies a measure of how a mathematical function changes as its input changes by assessing the slope along a curve in 2-dimensional abstract space. 

 

The spatial equivalent calculates a “slope map” depicting the rate of change in a continuous map variable in 3-dimensional geographic space.  For an elevation surface, slope depicts the rate of change in elevation.  For an accumulation cost surface, its slope map represents the rate of change in cost (i.e., a marginal cost map).  For a travel-time accumulation surface, its slope map indicates the relative change in speed and its aspect map identifies the direction of optimal movement at each location.  Also, the slope map of an existing topographic slope map (i.e., second derivative) will characterize surface roughness (i.e., areas where slope itself is changing).

 

Traditional calculus identifies an integral as the net signed area of a region along a curve expressing a mathematical function.  In a somewhat analogous procedure, areas under portions of continuous map surfaces can be characterized.  For example, the total area (planimetric or surface) within a series of watersheds can be calculated; or the total tax revenue for various neighborhoods; or the total carbon emissions along major highways; or the net difference in crop yield for various soil types in a field.  In the spatial integral, the net sum of the numeric values for portions of a continuous map surface (3D) is calculated in a manner comparable to calculating the area under a curve (2D).

 

Traditional geometry defines Distance as “the shortest straight line between two points” and routinely measures it with a ruler or calculates it using the Pythagorean Theorem.  Map Geometry extends the concept of distance to Simple Proximity by relaxing the requirement of just “two points” to calculate distances to all locations surrounding a point or other map feature, such as a road. 

 

A further extension involves Effective Proximity that relaxes “straight line” to consider absolute and relative barriers to movement.  For example, effective proximity might consider just uphill locations along a road or a complex set of variable hiking conditions that impede movement from a road as a function of slope, cover type and water barriers.  

 

The result is that the “shortest but not necessarily straight distance” is assigned to each grid location.  Because a straight line connection cannot be assumed, optimal path routines in Plane Geometry Connectivity (2D space) are needed to identify the actual shortest routes.  Solid Geometry Connectivity (3D space) involves line-of-sight connections that identify visual exposure among locations.  A final class of operations involves Unique Map Analytics, such as size, shape, intactness and contiguity of map features.     
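
For readers who want to see the accumulation idea behind effective proximity, here is a minimal sketch using Dijkstra's algorithm over a hypothetical "friction" grid (Python; it illustrates the concept, not any particular GIS package's implementation).  Each cell ends up with the shortest, but not necessarily straight, accumulated cost from the starting location.

  import heapq
  import numpy as np

  def cost_distance(friction, start):
      """Accumulate least total cost from start across a friction grid (Dijkstra)."""
      nrows, ncols = friction.shape
      acc = np.full(friction.shape, np.inf)
      acc[start] = 0.0
      heap = [(0.0, start)]
      while heap:
          cost, (r, c) = heapq.heappop(heap)
          if cost > acc[r, c]:
              continue                      # stale entry already improved upon
          for dr in (-1, 0, 1):
              for dc in (-1, 0, 1):
                  rr, cc = r + dr, c + dc
                  if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                      step = np.hypot(dr, dc)          # 1.0 or 1.414 cell widths
                      new = cost + step * (friction[r, c] + friction[rr, cc]) / 2
                      if new < acc[rr, cc]:
                          acc[rr, cc] = new
                          heapq.heappush(heap, (new, (rr, cc)))
      return acc

  friction = np.ones((5, 5))                # 1 = easy going everywhere...
  friction[:4, 2] = 25.0                    # ...except a costly relative barrier
  print(cost_distance(friction, (2, 0)).round(1))   # costs bend around the barrier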

 

Grid-based map analysis takes us well beyond traditional mapping …as well as taking us well beyond traditional procedures and paradigms of mathematics.  The next installment of spatialSTEM discussion considers the extension of traditional statistics to spatial statistics. 

_____________________________

Author’s Notes:  a table of URL links to further readings on the grid-based map analysis/modeling concepts, terminology, considerations and procedures described in this three-part series on spatialSTEM is posted at www.innovativegis.com/basis/MapAnalysis/Topic30/sSTEM/sSTEMreading.htm.

 

 

Paint by Numbers Outside the Traditional Statistics Box

    
(GeoWorld, March 2012)  


 

The two previous sections described a general framework and approach for teaching spatial analysis within a mathematical context that resonates with science, technology, engineering and math/stat communities (spatialSTEM).  The following discussion focuses on extending traditional statistics to spatial statistics for understanding geographic-based patterns and relationships. 

 

Whereas Spatial analysis focuses on “contextual relationships” in geographic space (such as effective proximity and visual exposure), Spatial statistics focuses on “numerical relationships” within and among mapped data (figure 1).  From a spatial statistics perspective there are three primary analytical arenas— Summaries, Comparisons and Correlations.

 


 

Figure 1. Spatial Statistics uses numerical analysis to uncover spatial relationships and patterns.

 

Statistical summaries provide generalizations of the grid values comprising a single map layer (within), or set of map layers (among).  Most common is a tabular summary included in a discrete map’s legend that identifies the area and proportion of occurrence for each map category, such as extremely steep terrain comprising 286 acres (19 percent) of a project area.  Or for a continuous map surface of slope values, the generalization might identify the data range as from 0 to 65% and note that the average slope is 24.4 with a standard deviation of 16.7. 

 

Summaries among two or more discrete maps generate cross-tabular tables that “count” the joint occurrence of all categorical combinations of the map layers.  For example, the coincidence of steepness and cover maps might identify that there are 242 acres of forest cover on extremely steep slopes (16 percent), a particularly hazardous wildfire joint condition.

 

Map comparison and correlation techniques only apply to continuous mapped data.  Comparisons within a single map surface involve normalization techniques.  For example, a Standard Normal Variable (SNV) map can be generated to identify “how unusual” (above or below) each map location is compared to the typical value in a project area.
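
A minimal sketch of the SNV idea in Python/numpy (hypothetical surface values): each cell is re-expressed as the number of standard deviations it sits above or below the project-area average.

  import numpy as np

  surface = np.array([[22., 28., 40.],      # hypothetical map surface values
                      [18., 25., 55.],
                      [10., 21., 70.]])

  snv = (surface - surface.mean()) / surface.std()   # standardize every cell

  print(snv.round(2))                       # >0 above the typical value, <0 below
  print(np.argwhere(snv > 1.0))             # "unusually high" locations (> +1 SD)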

 

Direct comparisons among continuous map surfaces include appropriate statistical tests (e.g., F-test), difference maps and surface configuration differences based on variations in surface slope and orientation at each grid location.

 

Map correlations provide a foothold for advanced inferential spatial statistics.  Spatial autocorrelation within a single map surface identifies the similarity among nearby values for each grid location.  It is most often associated with surface modeling techniques that employ the assumption that “nearby things are more alike than distant things”—high spatial autocorrelation—for distance-based weight averaging of discrete point samples to derive a continuous map surface.

 

Spatial correlation, on the other hand, identifies the degree of geographic dependence among two or more map layers and is the foundation of spatial data mining.  For example, a map surface of a bank’s existing concentration of home equity loans within a city can be regressed against a map surface of home values.  If a high level of spatial dependence exists, the derived regression equation can be used on home value data for another city.  The resulting map surface of estimated loan concentration proves useful in locating branch offices. 

 

In practice, many geo-business applications utilize numerous independent map layers including demographics, life style information and sales records from credit card swipes in developing spatially consistent multivariate models with very high R-squared values.  Like most things from ecology to economics to environmental considerations, spatial expression of variable dependence echoes niche theory with grid-based spatial statistics serving as a powerful tool for understanding geographic patterns and relationships.

 

Figure 2 describes an example of basic surface modeling and the linkage between numeric space and geographic space representations using environmentally-oriented mapped data.  Soil samples are collected and analyzed, assuring that geographic coordinates accompany the field samples.  The resulting discrete point map of the field soil chemistry data is spatially interpolated into a continuous map surface characterizing the data set’s geographic distribution. 

 

The bottom portion of figure 2 depicts the linkage between Data Space and Geographic Space representations of the mapped data.  In data space, a standard normal curve is fitted to the data as a means to characterize its overall “typical value” (Average= 22.9) and “typical dispersion” (StDev= 18.7) without regard for the data’s spatial distribution. 

 

In geographic space, the Average forms a flat plane, implying that this value is assumed to be everywhere (within +/- 1 Standard Deviation about two-thirds of the time) and offering no information about where values are likely more or less than the typical value.  The fitted continuous map surface, on the other hand, details the spatial variation inherent in the field collected samples. 

       


 

Figure 2. An example of Surface Modeling that derives a continuous map surface from set of discrete point data.

 

Nonspatial statistics identifies the “central tendency” of the data, whereas surface modeling maps the “spatial variation” of the data.  Like a Rorschach ink blot, the histogram and the map surface provide two different perspectives.  Clicking a histogram pillar identifies all of the grid cells within that range; clicking on a grid location identifies which histogram range contains it. 

 

This direct linkage between the numerical and spatial characteristics of mapped data provides the foundation for the spatial statistics operations outlined in figure 3.  The first four classes of operations are fairly self-explanatory with the exception of “Roving Window” summaries.  This technique first identifies the grid values surrounding a location, then mathematically/statistically summarizes the values, assigns the summary to that location and then moves to the next location and repeats the process. 
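
The mechanics sketch easily in Python/numpy (hypothetical grid values; a square window stands in for the circular windows typical of GIS packages, and production code would lean on a library routine such as scipy.ndimage rather than explicit loops):

  import numpy as np

  def focal_mean(grid, radius=1):
      """Roving window: summarize the neighbors, assign to the focus, move on."""
      nrows, ncols = grid.shape
      out = np.empty(grid.shape)
      for r in range(nrows):
          for c in range(ncols):
              window = grid[max(r - radius, 0):r + radius + 1,
                            max(c - radius, 0):c + radius + 1]
              out[r, c] = window.mean()     # any statistic could be used here
      return out

  grid = np.arange(16, dtype=float).reshape(4, 4)    # hypothetical map layer
  print(focal_mean(grid).round(1))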

 

Another specialized use of roving windows is for Surface Modeling.  As described in figure 2, inverse-distance weighted spatial interpolation (IDW) is the weighted average of samples based on their relative distances from the focal location.  For qualitative data, the total number of occurrences within a window reach can be summed for a density surface.

 

In figure 3 for example, a map identifying customer locations can be summed to identify the total number of customers within a roving window to generate a continuous map surface of customer density.  In turn, the average and standard deviation can be used to identify “pockets” of unusually high customer density. 

     


 

Figure 3. Classes of Spatial Statistics operations.

 

Standard multivariate techniques using “data distance,” such as Maximum Likelihood and Clustering, can be used to classify sets of map variables.  Map Similarity, for example, can be used to compare each map location’s pattern of values with a comparison location’s pattern to create a continuous map surface of the relative degree of similarity at each map location.

 

Statistical techniques, such as Regression, can be used to develop mathematical functions between dependent and independent map variables.  The difference between spatial and non-spatial approaches is that the map variables are spatially consistent and yield a prediction map that shows where high and low estimates are to be expected.  

 

The bottom line in spatial statistics (as well as spatial analysis) is that the spatial character within and among map layers is taken into account.  The grid-based representation of mapped data provides the consistent framework needed for these analyses.  Each database record contains geographic coordinates (X,Y= Where) and value fields identifying the characteristics/conditions at that location (Vi= What). 

 

From this map-ematical view, traditional math/stat procedures can be extended into geographic space.  The paradigm shift from our paper map legacy to “maps as data first, pictures later” propels us beyond mapping to map analysis and modeling.  In addition, it defines a comprehensive and common spatialSTEM educational environment that stimulates students with diverse backgrounds and interests to “think analytically with maps” in solving complex problems. 

_____________________________

Author’s Notes:  a table of URL links to further readings on the grid-based map analysis/modeling concepts, terminology, considerations and procedures described in this three-part series on spatialSTEM is posted at www.innovativegis.com/basis/MapAnalysis/Topic30/sSTEM/sSTEMreading.htm.

 

 

Simultaneously Trivializing and Complicating GIS   

(GeoWorld, April 2012)


 

Several things seem to be coalescing in my mind (or maybe colliding is a better word).  GIS has moved up the technology adoption curve from Innovators in the 1970s to Early Adopters in the 80s, to Early Majority in the 90s, to Late Majority in the 00s and is poised to capture the Laggards this decade.  Somewhere along this progression, however, the field seems to have bifurcated along technical and analytical lines. 

 

The lion’s share of this growth has been GIS’s ever expanding capabilities as a “technical tool” for corralling vast amounts of spatial data and providing near instantaneous access to remote sensing images, GPS navigation, interactive maps, asset management records, geo-queries and awesome displays.  In just forty years GIS has morphed from boxes of cards passed through a window to a megabuck mainframe that generated page-printer maps, to today’s sizzle of a 3D fly-through rendering of terrain anywhere in the world with back-dropped imagery and semi-transparent map layers draped on top—all pushed from the cloud to a GPS enabled tablet or smart phone.  What a ride!  

 

However, GIS as an “analytical tool” hasn’t experienced the same meteoric rise—in fact it might be argued that the analytic side of GIS has somewhat stalled over the last decade.  I suspect that in large part this is due to the interests, backgrounds, education and excitement of the ever enlarging GIS tent.  Several years ago (see figure 1 and author’s note 1) I described the changes in breadth and depth of the community as flattening from the 1970s through the 2000s.  By sheer numbers, the balance point has been shifting to the right toward general and public users with commercial systems responding to market demand for more technological advancements. 

 

The 2010s will likely see billions of general and public users with the average depth of science and technology knowledge supporting GIS nearly “flatlining.”  Success stories in quantitative map analysis and modeling applications have been all but lost in the glitz n' flash of the technological whirlwind.  The vast potential of GIS to change how society perceives maps, mapped data and their use in spatial reasoning and problem solving seems relatively derailed.

 

In a recent editorial in Science entitled Trivializing Science Education, Editor-in-Chief Bruce Alberts laments that “Tragically, we have managed to simultaneously trivialize and complicate science education” (author’s note 2).  A similar assessment might be made for GIS education.  For most students and faculty on campus, GIS technology is simply a set of highly useful apps on their smart phone that can direct them to the cheapest gas for tomorrow’s ski trip and locate the nearest pizza pub when they arrive.  Or it is a Google fly-by of the beaches around Cancun.  Or a means to screen grab a map for a paper on community-based conservation of howler monkeys in Belize.   

 

 

Figure 1.  Changes in breadth and depth of the community.

 

To a smaller contingent on campus, it is a career path that requires mastery of the mechanics, procedures and buttons of extremely complex commercial software systems for acquiring, storing, processing and displaying spatial information.  Both perspectives are valid.  However, neither fully grasps the radical nature of the digital map and how it can drastically change how we perceive and infuse spatial information and reasoning into science, policy formation and decision-making—in essence, how we can “think with maps.”    

 

A large part of missing the mark on GIS’s full potential is our lack of “reaching out” to the larger science, technology, engineering and math (STEM) communities on campus by insisting 1) that non-GIS students interested in understanding map analysis and modeling must be tracked into general GIS courses that are designed for GIS specialists, and 2) that the material presented primarily focuses on commercial GIS software mechanics that GIS-specialists need to know to function in the workplace.

 

Much of the earlier effort in structuring a framework for quantitative map analysis has focused on how the analytical operations work within the context of the Local, Focal and Zonal classification by Tomlin, or even my own Reclassify, Overlay, Distance and Neighbors classification scheme (see top portion of figure 2 and author’s note 3).  The problem with these structuring approaches is that most STEM folks just want to understand and use the analytical operations properly—not appreciate the theoretical geographic-related elegance, or code the algorithm. 

 


 

Figure 2.  Alternative frameworks for quantitative map analysis.

 

The bottom portion of figure 2 outlines restructuring of the basic spatial analysis operations to align with traditional mathematical concepts and operations (author’s note 4).  This provides a means for the STEM community to jump right into map analysis without learning a whole new lexicon or an alternative GIS-centric mindset.  For example, the GIS concept/operation of Slope = spatial “derivative”; Zonal functions = spatial “integral”; Eucdistance = extension of “planimetric distance” and the Pythagorean Theorem to proximity; Costdistance = extension of distance to effective proximity considering absolute and relative barriers, which is not possible in non-spatial mathematics; and Viewshed = “solid geometry connectivity.” 

 

Figure 3 outlines the conceptual development of three of these operations.  The top set of graphics identifies the Calculus Derivative as a measure of how a mathematical function changes as its input changes by assessing the slope along a curve in 2-dimensional abstract space—calculated as the “slope of the tangent line” at any location along the curve.  In an equivalent manner the Spatial Derivative creates a slope map depicting the rate of change of a continuous map variable in 3-dimensional geographic space—calculated as the slope of the “best fitted plane” at any location along the map surface.
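
A sketch of the spatial derivative in Python/numpy (hypothetical elevation values): the gradient function approximates the slope of the locally fitted plane at every cell, and applying it a second time yields the "slope of the slope" roughness measure noted earlier.

  import numpy as np

  elev = np.array([[100., 101., 103., 106.],   # hypothetical elevation surface
                   [101., 103., 106., 110.],
                   [103., 106., 110., 115.],
                   [106., 110., 115., 121.]])
  cell = 30.0                                  # meters per grid cell

  dz_dy, dz_dx = np.gradient(elev, cell)       # rate of change in each direction
  slope_pct = np.hypot(dz_dx, dz_dy) * 100     # steepness of the fitted plane (%)

  print(slope_pct.round(1))
  roughness = np.hypot(*np.gradient(slope_pct, cell))   # slope of the slope map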

    

Advanced Grid Math includes most of the buttons on a scientific calculator, including trigonometric functions.  For example, calculating the “cosine of the slope values” along a terrain surface and then dividing the planimetric surface area of a grid cell by those values solves for the increased real-world surface area of the “inclined plane” at each grid location.

 

The Calculus Integral is identified as the “area of a region under a curve” expressing a mathematical function.  The Spatial Integral counterpart “summarizes map surface values within specified geographic regions.”  The data summaries are not limited to just a total but can be extended to most statistical metrics.  For example, the average map surface value can be calculated for each district in a project area.  Similarly, the coefficient of variation ((Stdev / Average) * 100) can be calculated to assess data dispersion about the average for each of the regions.
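
A sketch of the spatial integral in Python/numpy (hypothetical surface and district template): the zone map gathers the cells of each region, and any statistic (total, average, coefficient of variation) can be computed and assigned region-wide.

  import numpy as np

  surface = np.array([[10., 12., 30., 32.],   # hypothetical map surface
                      [11., 13., 31., 35.],
                      [14., 15., 36., 40.],
                      [15., 16., 38., 42.]])
  zones = np.array([[1, 1, 2, 2],             # hypothetical district template
                    [1, 1, 2, 2],
                    [1, 3, 3, 2],
                    [3, 3, 3, 2]])

  for z in np.unique(zones):
      vals = surface[zones == z]              # gather one region's cells
      cv = vals.std() / vals.mean() * 100     # coefficient of variation
      print(f"zone {z}: sum={vals.sum():6.1f} avg={vals.mean():5.1f} cv={cv:5.1f}%")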

 


 

Figure 3.  Conceptual extension of derivative, trigonometric functions and integral to mapped data and map analysis operations.

 

By recasting GIS concepts and operations of map analysis within the general scientific language of math/stat, we can more easily educate tomorrow’s movers and shakers in other fields in “spatial reasoning”—to think of maps as “mapped data” and apply the wealth of quantitative analysis thinking they already understand to spatial variables. 

 

Innovation and creativity in spatial problem solving is being held hostage to a trivial mindset of maps as pictures and a non-spatial mathematics that presupposes mapped data can be collapsed to a single central tendency value that ignores the spatial variability inherent in the data.  Simultaneously, the “build it (GIS) and they will come (and take our existing courses)” educational paradigm is not working, as it requires potential users to become GIS’perts in complicated software systems. 

 

GIS must take an active leadership role in “leading” the STEM community to the similarities/differences and advantages/disadvantages in the quantitative analysis of mapped data—there is little hope that the STEM folks will make the move on their own.  Next month we’ll consider recasting spatial statistics concepts and operations into a traditional statistics framework.   

_____________________________

Author’s Notes:  1) see “A Multifaceted GIS Community” in Topic 27, GIS Evolution and Future Trends in the online book Beyond Mapping III, posted at www.innovativegis.com.  2) Bruce Alberts in Science, 20 January 2012:Vol. 335 no. 6066 p. 263.  3) see “An Analytical Framework for GIS Modeling” posted at www.innovativegis.com/basis/Papers/Other/GISmodelingFramework/.  4) see “SpatialSTEM: Extending Traditional Mathematics and Statistics to Grid-based Map Analysis and Modeling” posted at www.innovativegis.com/basis/Papers/Other/SpatialSTEM/.

 

 

Infusing Spatial Character into Statistics   

(GeoWorld, May 2012)


 

The previous section discussed the assertion that we might be simultaneously trivializing and complicating GIS.  At the root of the argument was the contention that “innovation and creativity in spatial problem solving is being held hostage to a trivial mindset of maps as pictures and a non-spatial mathematics that presupposes mapped data can be collapsed into a single central-tendency value that ignores the spatial variability inherent in data.” 

 

The discussion described a mathematical framework that organizes the spatial analysis toolbox into commonly understood mathematical concepts and procedures.  For example, the GIS concept/operation of Slope = spatial “derivative”; Zonal functions = spatial “integral”; Eucdistance = extension of “planimetric distance” and the Pythagorean Theorem to proximity; Costdistance = extension of distance to effective proximity considering absolute and relative barriers, which is not possible in non-spatial mathematics; and Viewshed = “solid geometry connectivity.”

 

This section does a similar translation to describe a statistical framework for organizing the spatial statistics toolbox into commonly understood statistical concepts and procedures.  But first we need to clarify the differences between spatial analysis and spatial statistics.  Spatial analysis can be thought of as an extension of traditional mathematics involving the “contextual” relationships within and among mapped data layers.  It focuses on geographic associations and connections, such as relative positioning, configurations and patterns among map locations.

 

Spatial statistics, on the other hand, can be thought of as an extension of traditional statistics involving the “numerical” relationships within and among mapped data layers.  It focuses on mapping the variation inherent in a data set rather than characterizing its central tendency (e.g., average, standard deviation) and then summarizing the coincidence and correlation of the spatial distributions.  

 

The top portion of figure 1 identifies the two dominant GIS perspectives of spatial statistics— Surface Modeling that derives a continuous spatial distribution of a map variable from point sampled data and Spatial Data Mining that investigates numerical relationships of map variables. 

 

The bottom portion of the figure outlines restructuring of the basic spatial statistic operations to align with traditional non-spatial statistical concepts and operations (see author’s note).  The first three groupings are associated with general descriptive statistics, the middle two involve unique spatial statistics operations and the final two identify classification and predictive statistics.

 


 

Figure 1.  Alternative frameworks for quantitative map analysis.

 

Figure 2 depicts the non-spatial and spatial approaches for characterizing the distribution of mapped data and the direct link between the two representations.  The left side of the figure illustrates non-spatial statistics analysis of an example set of data as fitting a standard normal curve in “data space” to assess the central tendency of the data as its average and standard deviation.  In processing, the geographic coordinates are ignored and the typical value and its dispersion are assumed to be uniformly (or randomly) distributed in “geographic space.” 

 

The top portion of figure 2 illustrates the derivation of a continuous map surface from geo-registered point data involving spatial autocorrelation.  The discrete point map locates each sample point on the XY coordinate plane and extends these points to their relative values (higher values in the NE; lowest in the NW).  A roving window is moved throughout the area that weight-averages the point data as an inverse function of distance—closer samples are more influential than distant samples.  The effect is to fit a surface that represents the geographic distribution of the data in a manner that is analogous to fitting a SNV curve to characterize the data’s numeric distribution.  Underlying this process is the nature of the sampled data which must be numerically quantitative (measurable as continuous numbers) and geographically isopleth (numbers form continuous gradients in space).

 

The lower-right portion of figure 2 shows the direct linkage between the numerical distribution and the geographic distribution views of the data.  In geographic space, the “typical value” (average) forms a horizontal plane implying that the average is everywhere.  In reality, the average is hardly anywhere and the geographic distribution denotes where values tend to be higher or lower than the average. 

 

 


 

Figure 2.  Comparison and linkage between spatial and non-spatial statistics

 

In data space, a histogram represents the relative occurrence of each map value.  By clicking anywhere on the map, the corresponding histogram interval is highlighted; conversely, clicking anywhere on the histogram highlights all of the corresponding map values within the interval.  By selecting all locations with values greater than +1 SD, areas of unusually high values are located—a technique requiring the direct linkage of both numerical and geographic distributions. 

 

Figure 3 outlines two of the advanced spatial statistics operations involving spatial correlation among two or more map layers.  The top portion of the figure uses map clustering to identify the location of inherent groupings of elevation and slope data by assigning pairs of values into groups (called clusters) so that the value pairs in the same cluster are more similar to each other than to those in other clusters.  
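
The clustering step can be sketched with a bare-bones k-means in Python/numpy (synthetic elevation/slope values purely to show the mechanics; real applications would use a statistics package).  Note how the cell-by-cell cluster assignments reshape directly back into a map.

  import numpy as np

  rng = np.random.default_rng(0)
  elev = rng.uniform(500, 2500, (25, 25))     # synthetic geo-registered layers
  slope = rng.uniform(0, 65, (25, 25))

  data = np.column_stack([elev.ravel(), slope.ravel()])
  data = (data - data.mean(axis=0)) / data.std(axis=0)  # comparable "data distance"

  k = 2
  centers = data[rng.choice(len(data), k, replace=False)]
  for _ in range(20):                                   # basic k-means iterations
      dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
      labels = dist.argmin(axis=1)                      # nearest cluster center
      centers = np.array([data[labels == j].mean(axis=0) if (labels == j).any()
                          else centers[j] for j in range(k)])

  cluster_map = labels.reshape(elev.shape)              # assignments map back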

 

The bottom portion of the figure assesses map correlation by calculating the degree of dependency among the same maps of elevation and slope.  Spatially “aggregated” correlation involves solving the standard correlation equation for the entire set of paired values to represent the overall relationship as a single metric.  Like the statistical average, this value is assumed to be uniformly (or randomly) distributed in “geographic space” forming a horizontal plane.

 

“Localized” correlation, on the other hand, maps the degree of dependency between the two map variables by successively solving the standard correlation equation within a roving window to generate a continuous map surface.  The result is a map representing the geographic distribution of the spatial dependency throughout a project area indicating where the two map variables are highly correlated (both positively, red tones; and negatively, green tones) and where they have minimal correlation (yellow tones). 
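
A sketch of the two solutions in Python/numpy (synthetic surfaces constructed so the layers co-vary): the aggregated answer is one number for the whole area, while the localized answer is a map produced by solving the same correlation equation within a roving window.

  import numpy as np

  def local_correlation(a, b, radius=2):
      """Pearson correlation of two layers within a roving window at every cell."""
      out = np.full(a.shape, np.nan)
      nrows, ncols = a.shape
      for r in range(nrows):
          for c in range(ncols):
              sl = (slice(max(r - radius, 0), r + radius + 1),
                    slice(max(c - radius, 0), c + radius + 1))
              wa, wb = a[sl].ravel(), b[sl].ravel()
              if wa.std() > 0 and wb.std() > 0:
                  out[r, c] = np.corrcoef(wa, wb)[0, 1]
      return out

  rng = np.random.default_rng(1)
  elev = rng.normal(1000, 100, (20, 20))           # synthetic co-varying surfaces
  slope = 0.5 * elev + rng.normal(0, 40, (20, 20))

  print(np.corrcoef(elev.ravel(), slope.ravel())[0, 1])  # one aggregated number
  local = local_correlation(elev, slope)                 # versus a map that
  print(np.nanmin(local), np.nanmax(local))              # varies place to place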

 

With the exception of unique Map Descriptive Statistics and Surface Modeling classes of operations, the grid-based map analysis/modeling system simply acts as a mechanism to spatially organize the data.  The alignment of the geo-registered grid cells is used to partition and arrange the map values into a format amenable for executing commonly used statistical equations.  The critical difference is that the answer often is in map form indicating where the statistical relationship is more or less than typical.   

 


 

Figure 3.  Conceptual extension of clustering and correlation to mapped data and analysis.

 

While the technological applications of GIS have soared over the last decade, the analytical applications seem to have flat-lined.  The seduction of near instantaneous geo-queries and awesome graphics seems to be masking the underlying character of mapped data— that maps are numbers first, pictures later.  However, grid-based map analysis and modeling involving Spatial Analysis and Spatial Statistics are, for the larger part, simply extensions of traditional mathematics and statistics.  The recognition by the GIS community that quantitative analysis of maps is a reality and the recognition by the STEM community that spatial relationships exist and are quantifiable should be the glue that binds the two perspectives.  That reminds me of a very wise observation about technology evolution—  

 

“Once a new technology rolls over you, if you're not part of the steamroller, you're part of the road.”  ~Stewart Brand, editor of the Whole Earth Catalog

 

_____________________________

Author’s Notes:   for a more detailed discussion, see “SpatialSTEM: Extending Traditional Mathematics and Statistics to Grid-based Map Analysis and Modeling” posted at www.innovativegis.com/basis/Papers/Other/SpatialSTEM/.

 

 

Depending on Where is What   

(GeoWorld, March 2013)


 

Early procedures in spatial statistics were largely focused on the characterization of spatial patterns formed by the relative positioning of discrete spatial objects—points, lines, and polygons.  The “area, density, edge, shape, core-area, neighbors, diversity and arrangement” of map features are summarized by numerous landscape analysis indices, such as Simpson's Diversity and Shannon's Evenness diversity metrics; Contagion and Interspersion/Juxtaposition arrangement metrics; and Convexity and Edge Contrast shape metrics (see Author’s Note 1).  Most of these techniques are direct extensions of manual procedures using paper maps and subsequently coded for digital maps. 

 

Grid-based map analysis, however, expands this classical view by the direct application of advanced statistical techniques in analyzing spatial relationships that consider continuous geographic space.  Some of the earliest applications (circa 1960) were in climatology and used map surfaces to generate isotherms of temperature and isobars of barometric pressure.  

 

In the 1970s, the analysis of remotely sensed data (raster images) began employing traditional statistical techniques, such as Maximum Likelihood Classification, Principal Component Analysis and Clustering, that had been used in analyzing non-spatial data for decades.  By the 1990s, these classification-oriented procedures operating on spectral bands were extended to include the full wealth of statistical operations, such as Correlation and Regression, utilizing diverse sets of geo-registered map variables (grid-based map layers).  

 

It is the historical distinction between “Spatial Pattern characterization of discrete objects” and “Spatial Relationship analysis of continuous map surfaces” that identifies the primary conceptual branches in spatial statistics.  The spatial relationship analysis branch can be further grouped by two types of spatial dependency driving the relationships— Spatial Autocorrelation involving spatial relationships within a single map layer, and Spatial Correlation involving spatial relationships among multiple map layers (see figure 1).

 


 

Figure 1. Spatial Dependency involves relationships within a single map layer (Spatial Autocorrelation) or among multiple map layers (Spatial Correlation).

 

Spatial Autocorrelation follows Tobler’s first law of geography— that “…near things are more alike than distant things.”  This condition provides the foundation for Surface Modeling used to identify the continuous spatial distribution implied in a set of discrete point data based on one of four fundamental approaches (see figure 2 and Author’s Note 2).  The first two approaches—Map Generalization and Geometric Facets—consider the entire set of point values in determining the “best fit” of a polynomial equation, or a set of 3-dimensional geographic shapes. 

 

For example, a 1st order polynomial (tilted plane) fitted to a set of data points indicates its spatial trend with decreasing values aligning with the direction cosines of the plane.  Or, a complex set of abutting tilted triangular planes can be fitted to the data values to capture significant changes in surface form (triangular tessellation).       

 

The lower two approaches—Density Analysis and Spatial Interpolation—are based on localized summaries of the point data utilizing “roving windows.”  Density Analysis counts the number of data points in the window (e.g., number of crime incidents within half a kilometer) or computes the sum of the values (e.g., total loan value within half a kilometer).

 

However, the most frequently used surface modeling approach is Spatial Interpolation that “weight-averages” data values within a roving window based on some function of distance.  For example, Inverse Distance Weighting (IDW) interpolation uses the geometric equation 1/D^Power to greatly diminish the influence of distant data values in computing the weighted average. 
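
A minimal IDW sketch in Python/numpy (the sample points, reach and power are hypothetical) makes the weight-averaging explicit: each grid cell becomes the 1/D^Power weighted average of the samples within its window reach.

  import numpy as np

  samples = np.array([[ 5.,  5., 20.],      # hypothetical (x, y, value) samples
                      [20., 10., 45.],
                      [12., 22., 70.],
                      [25., 25., 55.]])

  def idw(samples, nrows=30, ncols=30, power=2.0, reach=15.0):
      """Weight-average sample values by 1/D^power within a window reach."""
      surface = np.full((nrows, ncols), np.nan)
      for r in range(nrows):
          for c in range(ncols):
              d = np.hypot(samples[:, 0] - c, samples[:, 1] - r)
              inside = d < reach            # only samples within the reach count
              if inside.any():
                  w = 1.0 / np.maximum(d[inside], 1e-6) ** power
                  surface[r, c] = (w * samples[inside, 2]).sum() / w.sum()
      return surface

  surface = idw(samples)
  print(round(float(np.nanmin(surface)), 1), round(float(np.nanmax(surface)), 1))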

 


 

Figure 2. Surface Modeling involves generating map surfaces that portray the continuous spatial distribution implied in a set of discrete point data.

 

The bottom portion of figure 2 encapsulates the basis for Kriging which derives the weighting equation from the point data values themselves, instead of assuming a fixed geometric equation.  A variogram plot of the joint variation among the data values (blue curve) shows the differences in the values as a function of distance.  The inverse of this derived equation (red curve) is used to calculate the distance affected weights used in weight-averaging the data values.   

 

The other type of spatial dependency—Spatial Correlation—provides the foundation for analyzing spatial relationships among map layers.  It involves spatially evaluating traditional statistical procedures using one of four ways to access the geo-registered data— Local, Focal, Zonal and Global (see figure 3 and Author’s Notes 3 and 4).  Once the spatially coincident data is collected and compatibly formatted, it can be directly passed to standard multivariate statistics packages or to more advanced statistical engines (CART, Induction or Neural Net).  Also, a growing number of GIS systems have incorporated many of the most frequently used statistical operations.

 


 

Figure 3. Statistical Analysis of mapped data involves repackaging mapped data for processing by standard multivariate statistics or more advanced statistical operations.

 

The majority of the Statistical Analysis operations simply “repackage” the map values for processing by traditional statistics procedures.  For example, “Local” processing of map layers is analogous to what you see when two maps are overlaid on a light-table.  As your eye moves around, you note the spatial coincidence at each spot.  In grid-based map analysis, the cell-by-cell collection of data for two or more grid layers accomplishes the same thing by “spearing” the map values at a location, creating a summary (e.g., simple or weighted-average), storing the new value and repeating the process for the next location. 

 

“Focal” processing, on the other hand, “funnels” the map layer data surrounding a location (roving window), creates a summary (e.g., correlation coefficient), stores the new value and then repeats the process.   Note that both local and focal procedures store the results on a cell-by-cell basis. 

 

The other two techniques (right side of figure 3) generate entirely different summary results.  “Zonal” processing uses a predefined template (termed a map region) to “lace” together the map values for a region-wide summary.  For example, a wildlife habitat unit might serve as a template map to retrieve slope values from a data map of terrain steepness, compute the average of the values, and then store the result for all of the locations defining the region.  Or maps of animal activity for two time periods could be accessed and a paired t-test performed to determine if a significant difference exists within the habitat unit.  The interpretation of the resultant map value assigned to all of the template locations is that each cell is an “element of a spatial entity having that overall summary statistic.”

 

“Global” processing isn’t much different from the other techniques in terms of mechanics, but is radically different in terms of the numerical rigor implied.  In map-wide statistical analysis, the entire map is considered a variable, each cell a case and each value a measurement (or instance) in mathematical/statistical modeling terminology.  Within this context, the processing has “all of the rights, privileges and responsibilities” afforded non-spatial quantitative analysis.  For example, a regression could be spatially evaluated by “plunging” the equation through a set of independent map variables to generate a dependent variable map on cell-by-cell basis, or reported as an overall map-wide value. 
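
A sketch of the "Global" view in Python/numpy (synthetic map layers with a known relationship planted in them): each cell is a case and each layer a variable, so an ordinary least-squares fit applies unchanged, and the fitted equation can then be "plunged" back through the layers to produce a prediction map.

  import numpy as np

  rng = np.random.default_rng(2)
  x1 = rng.uniform(0, 1, (15, 15))          # independent map variables (synthetic)
  x2 = rng.uniform(0, 1, (15, 15))
  y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 0.1, (15, 15))   # dependent map

  # Each grid cell is a case: flatten the layers into a standard design matrix
  X = np.column_stack([np.ones(x1.size), x1.ravel(), x2.ravel()])
  coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)

  print(coef.round(2))                      # recovers roughly [3.0, 2.0, -1.5]
  y_hat = coef[0] + coef[1] * x1 + coef[2] * x2   # prediction map, cell by cell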

 

So what’s the take-home from all this discussion?  It is that maps are “numbers first, pictures later” and we can spatially discover and subsequently evaluate the spatial relationships inherent in sets of grid-based mapped data as true map-ematical expressions.  All that is needed is a new perspective of what a map is (and isn’t).    

_____________________________

Author’s Notes: in the online book Beyond Mapping III at www.innovativegis.com/basis/MapAnalysis/, 1) see Topic 9, “Analyzing Landscape Patterns”; 2) see Topics 2, “Spatial Interpolation Procedures and Assessment” and 8, “Investigating Spatial Dependency”; 3) refers to C. Dana Tomlin’s four data acquisition classes; 4) for more discussion on data acquisition techniques, see Topic 22, “Reclassifying and Overlaying Maps,” Section 2 “Getting the Numbers Right.”

 

 

Spatially Evaluating the T-test  

(GeoWorld, April 2013)


 

The previous section provided everything you ever wanted (or maybe never wanted) to know about the map-ematical framework for modern Spatial Statistics.  Its historical roots are in characterizing spatial patterns formed by the relative positioning of discrete spatial objects—points, lines, and polygons.  However, Spatial Data Mining has expanded the focus to the direct application of advanced statistical techniques in the quantitative analysis of spatial relationships that consider continuous geographic space. 

 

From this perspective, grid-based data is viewed as characterizing the spatial distribution of map variables, as well as the data’s numerical distribution.  For example, in precision agriculture GPS and yield monitors are used to record the position of a harvester and the current yield volume every second as it moves through a field (figure 1).  These data are mapped into the grid cells comprising the analysis frame geo-registered to the field to generate the 1997 Yield and 1998 Yield maps shown in the figure (3,289 50-foot grid cells covering a central-pivot field in Colorado). 

 

The deeper green appearance of the 1998 map indicates greater crop yield over the 1997 harvest—but how different is the yield between the two years?  …where are the greatest differences?  …are the differences statistically significant?     

 


 

Figure 1. Precision Agriculture yield maps identify the yield volume harvested from each grid location throughout a field.  These data can be extracted using a “roving window” to form a localized subset of paired values surrounding a focal location.

 

Each grid cell location identifies the paired yield volumes for the two years.  The simplest comparison would be to generate a Difference map by simply subtracting them.  The calculated difference at each location would tell you how different the yield is between the two years and where the greatest differences occur.  But it doesn’t go far enough to determine if the differences are “significantly different” within a statistical context. 

 

An often-used procedure for evaluating significant difference is the paired T-test that assesses whether the means of two groups are statistically different.  Traditionally, an agricultural scientist would sample several locations in the field and apply the T-test to the sampled data.  But the yield maps in essence form a continuous set of geo-registered sample plots covering the entire field.  A T-test could be evaluated for the entire set of 3,289 paired yield values (or a sampled sub-set) for an overall statistical assessment of the difference.

 

However, the following discussion suggests a different strategy enabling the T-test concept to be spatially evaluated to identify 1) a continuous map of localized T-statistic metrics and 2) a binary map of the T-test results.  Instead of a single scalar value determining whether to accept or reject the null hypothesis for an entire field, the spatially extended statistical procedure identifies where it can be accepted or rejected—valuable information for directing attention to specific areas.    

 

The key to spatially evaluating the T-test is an often-used procedure that statistically summarizes the values within a specified distance of a focal location, termed a “roving window.”  The lower portion of figure 1 depicts a 5-cell roving window (73 total cells) centered on column 33, row 53 in the analysis frame.  The pairs of yield values within the window are shown in the Excel spreadsheet (columns A and B) on the right side of figure 1.

 

Figure 2 shows these same data and the procedures used to solve for the T-statistic within the localized window.  They involve the ratio of the “Mean of the differences” to a normalized “Standard Deviation of the differences.”  The equation and solution steps are—

 

TStatistic = dMean / ( dStdev / Sqrt(n) )

 

Step 1. Calculate the difference (di = yi − xi) between the two values for each pair.

Step 2. Calculate the mean of the differences for the paired observations, dMean.

Step 3. Calculate the standard deviation of the differences, dStdev.

Step 4. Calculate the T-statistic by dividing the mean of the differences by the standard deviation of the differences divided by the square root of the number of paired values— TStatistic = dMean / ( dStdev / Sqrt(n) ).
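These four steps can be sketched directly in Python, assuming the paired yield values within a single roving window are held in two lists mirroring columns A and B of the spreadsheet (the function name and sample values are illustrative):

    import math

    def paired_t_statistic(x, y):
        n = len(x)
        d = [yi - xi for xi, yi in zip(x, y)]      # Step 1: pairwise differences
        d_mean = sum(d) / n                        # Step 2: mean of the differences
        d_stdev = math.sqrt(sum((di - d_mean) ** 2 for di in d) / (n - 1))  # Step 3: sample std dev
        return d_mean / (d_stdev / math.sqrt(n))   # Step 4: T-statistic

    print(paired_t_statistic([120, 135, 110], [150, 160, 140]))  # paired yields for two years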

 

One way to conceptualize the spatial T-statistic solution is to visualize the Excel spreadsheet moving throughout the field (roving window), stopping for an instant at a location, collecting the paired yield volume values within its vicinity (5-cell radius reach), pasting these values into columns A and B, and automatically computing the “differences” in column C and the other calculations.  The computed T-statistic is then stored at the focal location of the window and the procedure moves to the next cell location, thereby calculating the “localized T-statistic” for every location in the field.

 

Topic30_7b_Apr13

 

Figure 2. The T-statistic for the set of paired map values within a roving window is calculated by dividing the Mean of the Differences by the Standard Deviation of the Differences divided by the square root of the number of paired values.

 

However, what really happens in the grid-based map analysis solution is shown in figure 3.  Instead of a roving Excel solution, steps 1 through 3 are derived as separate map layers using fundamental map analysis operations.  The two yield maps are subtracted on a cell-by-cell basis and the result is stored as a new map of the Difference (step 1).  Then a neighborhood analysis operation is used to calculate and store a map of the “average of the differences” within a roving 5-cell window (step 2).  The same operation is used to calculate and store a map of the localized “standard deviation of the differences” (step 3).   

 

The bottom-left portion of figure 3 puts it all together to derive the localized T-statistics (step 4).  Map variables of the Mean and StDev of the differences (both comprised of 3,289 geo-registered values) are retrieved from storage and the map algebra equation in the lower-left is solved 3,289 times— once for each map location in the field.  The resultant T-statistic map displayed in the bottom-right portion shows the spatial distribution of the T-statistic with darker tones indicating larger computed values (see author’s notes 1 and 2). 

 

The T-test map is then derived by simply assigning the value 0 = no significant difference (yellow) to locations having computed values less than the critical value from a T-table, and assigning 1 = significant difference (black) to locations with larger computed values.
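A minimal sketch of this grid-based solution, assuming the two yield maps are geo-registered NumPy arrays; it approximates the 73-cell circular window with a square focal window (scipy’s uniform_filter has no circular option), uses the population standard deviation within each window, and takes a hypothetical critical value of 1.99 from a T-table:

    import numpy as np
    from scipy.ndimage import uniform_filter

    def t_test_maps(yield_a, yield_b, window=11, t_critical=1.99):
        diff = yield_b - yield_a                       # Step 1: Difference map
        n = window * window                            # paired values per focal window
        d_mean = uniform_filter(diff, size=window)     # Step 2: focal mean map
        d_var = uniform_filter(diff**2, size=window) - d_mean**2
        d_std = np.sqrt(np.maximum(d_var, 0.0))        # Step 3: focal std dev map
        t_stat = np.where(d_std > 0, d_mean / (d_std / np.sqrt(n)), 0.0)  # Step 4
        t_test = (np.abs(t_stat) > t_critical).astype(int)  # 1 = significant (black)
        return t_stat, t_test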

 

Topic30_7c_Apr13

 

Figure 3. The grid-based map analysis solution for T-statistic and T-test maps involves sequential processing of map analysis operations on geo-registered map variables, analogous to traditional, non-spatial algebraic solutions.   

 

The idea of a T-test map at first encounter might seem strange.  It concurrently considers the spatial distribution of data, as well as its numerical distribution in generating a new perspective of quantitative data analysis (dare I say a paradigm shift?).  While the procedure itself has significant utility in its application, it serves to illustrate a much broader conceptual point— the direct extension of the structure of traditional math/stat to map analysis and modeling.

 

Flexibly combining fundamental map analysis operations requires that each procedure accepts input and generates output in the same gridded format.  This is achieved by the geo-registered grid-based data structure and by requiring that each analytic step involve—

-        Retrieval of one or more map layers from the geo-registered map stack;

-        Manipulation of the retrieved values by a map analysis operation;

-        Creation of a new map layer holding the derived values; and

-        Storage of the new layer back into the map stack for subsequent processing.

 

 

The cyclical nature of the retrieval-manipulation-creation-storage processing structure is analogous to the evaluation of “nested parentheticals” in traditional algebra.  The logical sequencing of primitive map analysis operations on a set of map layers (a geo-registered “map stack”) forms the map analysis and modeling required in quantitative analysis of mapped data (see author’s note 3).  As with traditional algebra, fundamental techniques involving several basic operations can be identified, such as T-statistic and T-test maps, which are applicable to numerous research and applied endeavors.

 

The use of fundamental map analysis operations in a generalized map-ematical context accommodates a variety of analyses in a common, flexible and intuitive manner.  Also, it provides a familiar mathematical context for conceptualizing, understanding and communicating the principles of map analysis and modeling— the SpatialSTEM framework. 

_____________________________

Author’s Notes: 1) Darian Krieter with DTSgis has developed an ArcGIS Python script calculating the localized T-statistic, available at www.innovativegis.com/basis/MapAnalysis/Topic30/PythonT/; 2) for an animated slide communicating the spatial T-test concept, see www.innovativegis.com/basis/MapAnalysis/Topic30/Spatial_Ttest.ppt; 3) see www.innovativegis.com/basis/Papers/Online_Papers.htm for a link to an early paper “A Mathematical Structure for Analyzing Maps.”

 

 

Organizing Geographic Space for Effective Analysis   

(GeoWorld, September 2012)

  (return to top of Topic)

 

A basic familiarity with the two fundamental data types supporting geotechnology—vector and raster—is important for understanding map analysis procedures and capabilities (see author’s note).  Vector data is closest to our manual mapping heritage and is familiar to most users, as it characterizes geographic space as a collection of discrete spatial objects (points, lines and polygons) that are easy to draw.  Raster data, on the other hand, describes geographic space as a continuum of grid cell values (surfaces) that, while easy to conceptualize, requires a computer to implement.

 

Generally speaking, vector data is best for traditional map display and geo-query—“where is what” applications that identify existing conditions and characteristics, such as “where are the existing gas pipelines in Colorado” (a descriptive query of existing information).  Raster data is best for advanced graphics and map analysis—“why, so what and what if” applications that analyze spatial relationships and patterns, such as “where is the best location for a new pipeline” (a prescriptive model deriving new information).     

     

Topic24_new6a_Sep12

 

Figure 1. A raster image is composed of thousands of numbers identifying different colors for the “pixel” locations in a rectangular matrix supporting visual interpretation.

 

Most vector applications involve the extension of manual mapping and inventory procedures that take advantage of modern computers’ storage, speed and Internet capabilities (better ways to do things).  Raster applications, however, tend to involve entirely new paradigms and procedures for visualizing and analyzing mapped data that advances innovative science (entirely new ways to do things).   

 

On the advanced graphics front, the lower-left portion of figure 1 depicts an interactive Google Earth display of an area in northern Wyoming’s Bighorn Mountains showing local roads superimposed on an aerial image draped over a 3D terrain perspective.  The roads are stored in vector format as an interconnecting set of line features (vector).  The aerial image and elevation relief are stored as numbers in geo-referenced matrices (raster).    

 

The positions in a raster image matrix are referred to as “pixels,” short for picture elements.  The value stored at each pixel corresponds to a displayed color as a combination of red, green and blue hues.  For example, the green tone for some of the pixels portraying the individual tree in the figure is coded as red= 116, green= 146 and blue= 24.  Your eye detects a greenish tone with more green than red and blue.  In the tree’s shadow toward the northwest the red, green and blue levels are fairly equally low (dark grey).  In a raster image the objective is to generate a visual graphic of a landscape for visual interpretation. 

 

A raster grid is a different type of raster format where the values indicate characteristics or conditions at each location in the matrix designed for quantitative map analysis (spatial analysis and statistics).  The elevation surface used to construct a tilted relief perspective in a Google Earth display is composed of thousands of matrix values indicating the undulating terrain gradient.

 

Topic24_new6b_Sep12

 

Figure 2. A raster grid contains a map value for each “grid cell” identifying the characteristic/condition at that location supporting quantitative analysis.

 

Figure 2 depicts a raster grid of the vegetation in the Bighorn area by assigning unique classification values to each of the cover types.  The upper portion of the figure depicts isolating just the Lodgepole Pine cover type by assigning 0 to all of the other cover types and displaying the stored matrix values for a small portion of the project area.  While you see the assigned color in the grid map display (green in this example), keep in mind that the computer “sees” the stored matrix of map values.

 

The lower portion of figure 2 identifies the underlying organizational structure of geo-registered map data.  An “analysis frame” delineates the geographic extent of the area of interest and, in the case of raster data, the size of each pixel/grid element.  In the example, the image pixel size for the visual backdrop is less than a foot, comprising well over four million values, and the grid cell size for analysis is 30 meters, stored as a matrix with 99 columns and 99 rows totaling nearly 10,000 individual cell locations. 

 

For geo-referencing, the lower-left grid cell is identified as the matrix’s origin (column 1, row 1) and is stored in decimal degrees of latitude and longitude along with other configuration parameters as a few header lines in the file containing the matrix of numbers.  In most instances, the huge matrix of numbers is compressed to minimize storage but uncompressed on-the-fly for display and analytical processing.  
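A minimal sketch of the positioning arithmetic implied by this header information, assuming a lower-left origin in decimal degrees and a fixed cell size (function and argument names are illustrative):

    def latlon_to_cell(lat, lon, origin_lat, origin_lon, cell_deg):
        """Return the (column, row) of the grid cell containing a Lat/Lon point,
        counting from the lower-left origin cell (1-based, as described above)."""
        col = int((lon - origin_lon) / cell_deg) + 1
        row = int((lat - origin_lat) / cell_deg) + 1
        return col, row

    print(latlon_to_cell(40.05, -105.28, origin_lat=40.0, origin_lon=-105.4,
                         cell_deg=0.01))   # -> (13, 6)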

 

 

Figure 3. A set of geo-registered map layers forms a “map stack” organized as thousands upon thousands of numbers within a common “analysis frame.”

 

Figure 3 illustrates a broader level of organization for grid-based data.  Within this construct, each grid map layer in a geographically registered analysis frame forms a separate theme, such as roads, cover type, image and elevation.  Each point, line and polygon map feature is identified as a grid cell grouping having a unique value stored in the implied matrix, characterizing a discrete spatial variable.  A surface gradient, on the other hand, is composed of fluctuating values that track the uninterrupted increases/decreases of a continuous spatial variable. 

 

The entire set of grid layers available in a database is termed a map stack.  In map analysis, the appropriate grid layers are retrieved, their values map-ematically processed and the resulting matrix stored in the stack as a new layer— in the same manner as one solves an algebraic equation, except that the variables are entire grid maps composed of thousands upon thousands of geographically organized numbers.  

 

The major advantages of grid-based maps are their inherently uncomplicated data structure and consistent parsing within a holistic characterization of geographic space—just the way computers and math/stat mindsets like it.  No sets of irregular spatial objects scattered about an area that are assumed to be completely uniform within their interiors… rather, continuously defined spatial features and gradients that better align with geographic reality and, for the most part, with our traditional math/stat legacy.

 

The next section’s discussion builds on this point by extending grid maps and map analysis to “a universal key” for unlocking spatial relationships and patterns within standard database and quantitative analysis approaches and procedures.

_____________________________

Author’s Notes:  For a more detailed discussion of vector and raster data types and important considerations, see Topic 18, “Understanding Grid-based Data” in the online book Beyond Mapping III posted at www.innovativegis.com/basis/MapAnalysis/.

 

 

To Boldly Go Where No Map Has Gone Before   

(GeoWorld, October 2012)

  (return to top of Topic)

 

Previous sections have described a mathematical framework (dare I say a “map-ematical” framework?) for quantitative analysis of mapped data.  Recall that Spatial Analysis operations investigate the “contextual” relationships within and among maps, such as variable-width buffers that account for intervening conditions.  Spatial Statistics operations, on the other hand, examine the “numerical” relationships, such as map clustering to uncover inherent geographic patterns in the data. 

 

The cornerstone of these capabilities lies in the grid-based nature of the data that treats geographic space as continuous map surfaces composed of thousands upon thousands of cells with each containing data values that identify the characteristics/conditions occurring at each location.  This simple matrix structure provides a detailed account of the unique spatial distribution of each map variable and a geo-registered stack of map layers provides the foothold to quantitatively explore their spatial patterns and relationships. 

 

The most fundamental and ubiquitous grid form is the Latitude/Longitude coordinate system that enables every location on the Earth to be specified by a pair of numbers.  The upper portion of figure 1 depicts a 2.5° Lat/Lon grid forming a matrix of 73 rows by 144 columns= 10,512 cells in total, with each cell having an area of about 18,735 mi².

 

The lower portion of the figure shows that the data could be stored in Excel with each spreadsheet cell directly corresponding to a geographic grid cell.  In turn, additional map layers could be stored as separate spreadsheet pages to form a map stack for analysis. 

 

Of course this resolution is far too coarse for most map analysis applications, but it doesn’t have to be.  Using the standard single-precision floating point storage of Lat/Lon coordinates expressed in decimal degrees, the precision tightens to less than half a foot anywhere in the world (365,214 ft/degree * 0.000001 = 0.365214 ft; * 12 = 4.38257 inches or 0.11132 meters).  However, current grid-based technology limits the practical resolution to about 1m (e.g., Ikonos satellite images) to 10m (e.g., Google Earth) due to the massive amounts of data storage required. 

  

For example, to store a 10m grid for the state of Colorado it would take over two and a half billion grid cells (269,601 km² = 269,601,000,000 m² / 100 m² per cell = 2,696,010,000 cells).   To store the entire earth surface it would take nearly a trillion and a half cells (148,300,000 km² = 148,300,000,000,000 m² / 100 m² per cell = 1,483,000,000,000 cells). 
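These storage figures are simple arithmetic that can be checked directly (areas as quoted above):

    colorado_m2 = 269_601 * 1_000_000     # 269,601 km² expressed in m²
    earth_m2 = 148_300_000 * 1_000_000    # 148.3 million km² expressed in m²
    cell_m2 = 10 * 10                     # one 10m x 10m grid cell
    print(colorado_m2 // cell_m2)         # 2,696,010,000 cells
    print(earth_m2 // cell_m2)            # 1,483,000,000,000 cells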

 

Topic24_new7a_Oct12

 

Figure 1. Latitude and Longitude coordinates provide a universal framework for parsing the earth’s surface into a standardized set of grid cells. 

 

At first these storage loads seem outrageous but with distributed cloud computing the massive grid can be “easily” broken into manageable mouthfuls.  A user selects an area of interest and data for that area is downloaded and stitched together.  For example, Google Earth responds to your screen interactions to nearly instantaneously download millions of pixels, allowing you to pan/zoom and turn on/off map layers that are just a drop in the bucket of the trillions upon trillions of pixels and grid data available in the cloud.

 

Figure 2 identifies another, more practical mechanism for storage using a relational database.  In essence, each of the conceptual grid map spreadsheets can be converted to an interlaced format with a long string of numbers forming the columns (data fields); the rows (records) identify the information available for each of the individual grid cells that form the reference grid.  
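A minimal sketch of the interlaced layout, assuming pandas: each record is one grid cell keyed by its row/column position, each map layer is a data field, and layers join on the shared key (values are illustrative):

    import pandas as pd

    elevation = pd.DataFrame({"row": [1, 1, 2], "col": [1, 2, 1],
                              "elev": [2400, 2410, 2395]})
    cover = pd.DataFrame({"row": [1, 1, 2], "col": [1, 2, 1],
                          "cover": ["pine", "pine", "meadow"]})
    stack = elevation.merge(cover, on=["row", "col"])  # the spatial key joins the layers
    print(stack)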

 

Topic24_new7b_Oct12

 

Figure 2. Within a relational database, Lat/Lon forms a Universal DBMS Key for joining tables.

 

For fairly small areas of up to a million or so cells this is an excellent way to store grid maps, as their spatial coincidence is inherent in the organization and the robust standard set of database query and processing operations is available.  Larger grids use more advanced, specialized mechanisms of storage to facilitate data compression and virtual paging of fully configured grid layers. 

 

But the move to a relational database structure is far more important than simply corralling mega-gulps of map values.  It provides a “Universal DBMS Key” that can link seemingly disparate database tables (see Author’s Note).  The process is similar to a date/time stamp, except the “where” information provides a spatial context for joining data sets.  Demographic records can be linked to resource records that in turn can be linked to business records, health records, etc.— all sharing a common Lat/Lon address. 

 

All that is necessary is to tag your data with its Lat/Lon coordinates (“where” it was collected) just as you do with the date/time (“when” it was collected) …not a problem with the ubiquitous availability and increasing precision of GPS that puts a real-time tool for handling detailed spatial data right in your pocket.  With today’s technology, most GPS-enabled smartphones are accurate to a few meters and specialized data collection devices are precise to a few centimeters. 

 

Once your data is stamped with its “spatial key,” it can be linked to any other database table with spatially tagged records without the explicit storage of a fully expanded grid layer.  All of the spatial relationships are implicit in the relative positioning of the Lat/Lon coordinates. 
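A minimal sketch of the spatial-key linkage, assuming two independently collected tables tagged with GPS Lat/Lon; rounding the coordinates to a common grid resolution creates the shared key that joins them (all values and the roughly 100m resolution are illustrative):

    import pandas as pd

    def add_spatial_key(df):
        # round Lat/Lon to 0.001 degree (~100m) to form a shared grid-cell key
        df["key"] = df["lat"].round(3).astype(str) + "_" + df["lon"].round(3).astype(str)
        return df

    health = add_spatial_key(pd.DataFrame({"lat": [40.1234], "lon": [-105.2718],
                                           "cases": [3]}))
    soils = add_spatial_key(pd.DataFrame({"lat": [40.1231], "lon": [-105.2716],
                                          "lead_ppm": [220]}))
    linked = health.merge(soils, on="key")   # records sharing a Lat/Lon cell are joined
    print(linked)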

 

For example, a selection operation might be to identify all health records jointly occurring within half a kilometer of locations that have high lead concentrations in the top soil.  Or, locate all of the customer records within five miles of my store; better yet, within a ten-minute drive from a store. 

 

Geotechnology is truly a mega-technology that will forever change how we perceive and process spatial information.  Gone are the days of manual measurements and specialized data formats that have driven our mapping legacy.  Lat/Lon coordinates move from cross-hairs for precise navigation (intersecting lines) to a continuous matrix of spaces covering the globe for consistent data storage (grid cells).  The recognition of a universal spatial key coupled with spatial analysis/statistics procedures and GPS/RS technologies provides a firm foothold “to boldly go where no map has gone before.”

_____________________________

Author’s Note:  See the online book Beyond Mapping III posted at www.innovativegis.com/basis/MapAnalysis/, Topic 28, “Spatial Data Mining in Geo-Business,” section on The Universal Key for Unlocking GIS’s Full Potential (October 2011 column).

 

 

The Spatial Key to Seeing the Big Picture  

(GeoWorld, September 2013)

  (return to top of Topic)

 

The previous section described the standard Latitude/Longitude grid as a “Universal Spatial dB Key” that is comparable to the date/time tagging of records in most database systems.  With general availability of GPS coordinates on most data collection devices, cameras, smartphones and tablets, earth position can be easily stamped with each data record.  Couple that with geo-coding by street address and most data collected today has a triplet of numbers indicating location (where), as well as characteristic/condition (what)—XY and Value designating “where is what.” 

 

Data flowing from a “spatially aware database” can be thought of as a faucet spewing data that meets a query (figure 1).  In turn, each value flows to the appropriate grid cell based on its Lat/Lon tag.  The process can be conceptualized as the “what” attributes aligning within an analysis frame (matrix of numbers) that characterizes the spatial pattern/distribution inherent in a set of data. 

 

While the long history of quantitative data analysis focused on the numerical distribution of data, quantitative analysis of the spatial distribution of geospatial data provides a new frontier for understanding spatial patterns and relationships influencing most physical, biological, environmental, economic, political and cultural systems.  The recognition, development and application of this fresh math/stat paradigm (sort of a “map-ematics”) promises to revolutionize how we extract and utilize information from field collected data (see Author’s Note 1).

 

Topic30_new10a_Sep13

Figure 1. Steps in generating a grid map layer from spatially tagged data.

 

Converting spatially tagged data into grid maps is outlined on the right side of figure 1 as a five-step process.  The user first identifies the “geographic extent” of an area of interest by interactively dragging a box on a map or by entering Lat/Lon coordinates for the boundary (Step 1). 

 

An appropriate “cell size” for analysis is then entered as the length of a side of an individual grid cell (Step 2).  The smaller the cell size, the higher the spatial resolution, affording greater detail in positioning but requiring much larger matrices for storage (halving the cell size quadruples the number of cells).  User judgment is applied to balance the precision (correct placement), accuracy (correct characterization) and storage/performance demands (see Author’s Note 2).

 

In Step 3, the computer divides the lengths of the NS and EW sides of the project area extent by the cell size to determine the number of rows and columns of a matrix (termed the Analysis Frame) used to store grid layer information (map variables).  This establishes an algorithm for determining the Lat/Lon ranges defining each grid cell and its centroid position.  Considerations and implications surrounding this technically tricky step (3D curved earth to 2D flat matrix) are reserved for later discussion.

 

Based on the positioning algorithm’s calculations, each geo-tagged value flowing from the database can be placed in the appropriate row/column position in the analysis frame’s matrix (Step 4).  The processing is repeated for all of the selected dB records.  If more than one value “falls” into a grid cell, the values are summarized on-the-fly (Step 5).    
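A minimal sketch of Steps 3 through 5, assuming geo-tagged records arrive as (lat, lon, value) triplets and that co-occurring quantitative values are averaged (names, extents and values are illustrative):

    import numpy as np

    def grid_layer(records, south, west, north, east, cell_deg):
        rows = int(np.ceil((north - south) / cell_deg))   # Step 3: analysis frame size
        cols = int(np.ceil((east - west) / cell_deg))
        total = np.zeros((rows, cols))
        count = np.zeros((rows, cols))
        for lat, lon, value in records:                   # Step 4: place each value
            r = int((lat - south) / cell_deg)
            c = int((lon - west) / cell_deg)
            total[r, c] += value
            count[r, c] += 1
        # Step 5: average the values sharing a cell; empty cells become NaN
        return np.where(count > 0, total / np.maximum(count, 1), np.nan)

    layer = grid_layer([(40.015, -105.385, 120.0), (40.017, -105.386, 140.0)],
                       south=40.0, west=-105.4, north=40.1, east=-105.3, cell_deg=0.01)
    print(layer[1, 1])   # 130.0: the two co-occurring values averaged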

 

Figure 2 depicts the considerations surrounding the summary of multiple data values sharing a single grid cell.  The condition can be conceptualized as a “shish kebab of numbers” that needs to be reduced to an overall value that best typifies the actual characteristic/condition at that location.

 

Topic30_new10b_Sep13

 

Figure 2. Summarizing multiple data values falling in a single grid cell.  

 

The data type of the numbers determines the summary techniques available.  Most often quantitative values are averaged, as shown in the figure, but other statistical metrics can be used depending on the application.  Qualitative values are typically assigned the maximum or minimum value encountered in the string.  Binary values, such as crime occurrence, are usually summed to identify the total count of instances at each grid location. 

 

The five-step procedure creates a grid map layer identifying the “discrete” spatial pattern of the data, which is analogous to a histogram in non-spatial statistics.  In most applications, spatial interpolation or density analysis techniques are then used to derive a continuous grid map layer characterizing the spatial distribution of the data, which is analogous to fitting a standard normal curve to a histogram (see Author’s Note 3).  Once in this generalized form, most traditional quantitative analysis techniques (plus some spatially unique techniques) can be applied to investigate the spatial distribution, as well as the numerical distribution, of the data.

 

The muddling concern in applying the Lat/Lon grid as a Universal Spatial dB Key lies in representing curved 3D earth positions as flat 2D cells of a matrix.  Figure 3 shows the reality of the grid cell shape that morphs from squares to stretched rectangles to elongated trapezoids with north/south movement away from the equator (see Author’s Note 4). 

 

Relatively small changes in the length of a degree of “latitude parallels” occur because of polar flattening— the earth is an oblate spheroid instead of a perfect sphere due to centrifugal force as the earth spins.  However, huge changes occur for “longitude meridians” as the lines converge at the poles— a degree of longitude is widest at the equator and gradually shrinks to zero at the poles.
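The convergence is easy to quantify, assuming a spherical earth (a simplification that is close enough to show the trend the figure depicts):

    import math

    def lon_degree_km(lat_deg):
        # km per degree of longitude shrinks with the cosine of latitude
        return math.cos(math.radians(lat_deg)) * 111.32

    for lat in (0, 30, 60, 90):
        print(lat, round(lon_degree_km(lat), 1))  # 111.3, 96.4, 55.7, 0.0 km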

 

Topic30_new10c_Sep13

 

Figure 3. The area and shape of Lat/Lon grid cells vary with increasing latitude.       

 

The bottom line is that directly representing the Lat/Lon grid as a two-dimensional matrix can be unreliable for large project areas at the higher latitudes.  However, two caveats are in play.  One is that projection algorithms can be applied on-the-fly to transform the curved 3D coordinates to a planar representation and then back to Lat/Lon. 

 

The other is that for many applications involving relatively small project areas at low or mid latitudes, the positional imprecision is tolerable.  The notion of “tolerable” precision is what most differentiates “mapping” from “map analysis.”  While neighbors and armies fight over inches in the placement of borders, most data analysts are more accommodating and satisfied knowing things are much higher (or lower) over there as compared to here—a few inches or feet (or even miles in some cases) of misplacement doesn’t obscure the big picture of the spatial distribution and relationships.  

_____________________________

Author’s Notes: 1) See Topic 30, “A Math/Stat Framework for Grid-based Map Analysis and Modeling;” 2) see Introduction, section 2, “Determining Exactly Where Is What;” 3) see Topic 2, “Spatial Interpolation Procedures and Assessment” and Topic 7, “Linking Data Space and Geographic Space” in the online book Beyond Mapping III posted at www.innovativegis.com/basis/; and 4) for a detailed discussion of latitude and longitude considerations see www.ncgia.ucsb.edu/giscc/units/u014/u014.html in the NCGIA Core Curriculum in Geographic Information Science, by Anthony P. Kirvan and edited by Kenneth Foote.

 

 

Laying the Foundation for SpatialSTEM: Spatial Mathematics, Map Algebra and Map Analysis

(GeoWorld, October 2013)

  (return to top of Topic)

 

Mathematics in general and geometry and trigonometry in particular have long been the keystone to mapping—from Spatial Mathematics that enables the development of mapped data; to a generalized Map Algebra for expressing math/stat relationships among map variables; to a comprehensive Map Analysis toolbox that extends traditional quantitative data analysis procedures by considering the spatial distribution and interaction of mapped data layers. 

 

Several years ago, Nigel Waters wrote a short synopsis on “The Most Beautiful Formulae in GIS” where he described the ten most useful Spatial Formulae and the ten most useful Attribute-related Formulae chosen for their elegance, simplicity, and generality, as well as their wide applicability and power (see author’s note 1).  More recently, the book “Spatial Mathematics: Theory and Practice through Mapping” by Arlinghaus and Kerski further develops the wealth of enabling Spatial Mathematics equations and techniques (see author’s note 2). 

 

These and a host of similar treatises provide a comfortable conceptual springboard for STEM disciplines to extend traditional scalar mathematics into the spatial realm.  The digital map expressed as an organized set of numbers fuels this transition— today “maps are numbers first, pictures later.”  The result is a generalized Map Algebra (see author’s note 3) enabling a user to add, subtract, divide, raise to a power, root, log and even differentiate and integrate digital maps— all of the functionality of a pocket calculator (and then some) operating on geo-registered stacks of digital maps. 
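A minimal sketch of this “pocket calculator” behavior, assuming two small geo-registered map layers as NumPy arrays; every scalar operation applies cell-by-cell across entire maps:

    import numpy as np

    elev = np.array([[100.0, 110.0], [120.0, 130.0]])   # geo-registered layer 1
    rain = np.array([[ 20.0,  22.0], [ 25.0,  27.0]])   # geo-registered layer 2

    ratio = rain / elev                # divide one map layer by another
    combo = np.sqrt(elev) + rain**2    # root, power and add in a single expression
    logged = np.log(elev)              # log of an entire map layer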

 

This algebraic framework provides a comprehensive toolbox of primitive mathematical operations transitioning traditional quantitative data analysis into Map Analysis that infuses the consideration of spatial patterns and relationships into the analysis.  From this perspective, the spatial distribution of data is as important as its numerical distribution in analyzing map variables.        

 

Topic30_new12a_Oct13

 

Figure 1. GIS can be viewed as both a “Technological Tool” and an “Analytical Tool.”

 

Figure 1 provides a 40,000-foot overview of the evolving field of Geotechnology, one of the three mega-technologies for the 21st century as identified by the U.S. Department of Labor (the other two are Biotechnology and Nanotechnology).  The left side of the figure depicts the “spatial triad” of technologies (GPS, GIS and RS) comprising Geotechnology that collects, stores, retrieves, processes, and displays digital mapped data.  The mapping and analysis capabilities of GIS can be characterized as both a “Technological Tool” involving mapping, display and geo-query and an “Analytical Tool” involving spatial mathematics, analysis and statistics. 

 

As a technological tool, GIS greatly extends traditional mapping and inventory techniques involving laborious, inefficient and generally ineffective manual procedures employed just a few decades ago.  Today it is commonplace to get real-time routing directions, superimposed on an interactive map with a satellite image backdrop and a street view of your destination; all from a smartphone that rivals the computing power of a mainframe computer a few decades ago.  For the most part, static paper maps have given way to dynamic digital mapped data that can be interactively viewed and processed in radically new ways—a revolution that is simply amazing for anyone over thirty, yet commonplace for those who are younger.

 

The meteoric rise in the technical expressions of Geotechnology is in large part due to the easily envisioned extension of its manual mapping and inventory legacies.  Database systems replaced the walls of file cabinets (attribute data) and digital maps replaced paper maps (spatial data).  Linking the two data set perspectives spawned a radically new paradigm of what a map is and isn’t and catapulted mapping to “mega-technology” status.  

 

Is a similar canonic step and radically changed paradigm in the future for traditional quantitative data analysis concepts, procedures and applications?  What are the impediments holding back GIS as an analytical tool?  What are the inducements needed for advancing spatially-aware quantitative data analysis?   

 

Topic30_new12b_Oct13

 

Figure 2. Types of GIS data, users and applications.

 

Figure 2 outlines the data, users and application approaches that are fueling this transformation.  A major hurdle is the historical perspective of maps as being comprised of discrete spatial objects (point, line and areal patterns) as depicted in the 2D vector-based map in the upper-left portion of the figure.  While this vector data format is comfortable and ideal for human visual interpretation, it lacks the spatial specificity and consistency required by the advanced analysis procedures needed by most STEM research and applications. 

 

Alternatively, raster data depicted in the lower-left portion of the figure provides a continuous and consistent data form that is preconditioned for quantitative data analysis.  A grid-based map surface tracks subtle spatial variations of a map variable as an uninterrupted gradient instead of aggregating the detailed data into discrete ranges (i.e., contour intervals). 

 

In addition, the matrix structuring provides a consistent “analysis frame” for a geo-registered stack of map layers for a project area.  Within this grid structure the row, column locators implicitly carry all of the necessary spatial topology relating each grid location to the positioning of all other locations within a single map layer and among multiple layers in a geo-registered map stack.           
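A minimal sketch of that implicit topology: the neighbors of any grid cell fall out of simple row/column offsets, with no explicitly stored topology (the function is illustrative):

    def neighbors(r, c, rows, cols):
        # adjacency is implicit in the row/column locators of the analysis frame
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]
        return [(r + dr, c + dc) for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols]

    print(neighbors(0, 0, 99, 99))   # a corner cell of a 99x99 frame has 3 neighbors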

 

The right side of figure 2 identifies several types of GIS users.  Currently, most of the GIS community is comprised of Data Providers, GIS Specialists, and General Users who are primarily involved with the technical aspects of GIS and their vector processing expressions— creating, maintaining and accessing mapped data and then executing standardized processing routines.  These users can be thought of as “of the technology.”     

 

The Power Users, Developers and Modelers, on the other hand, are more “of the application.”  Within this context, domain expertise identifies the scope of a problem and the map variables involved, and then map analysis capabilities are used to uncover the spatial relationships that form a spatially-aware solution.  It is in this arena that a “newly developing niche for SpatialSTEM” is poised to take hold (see author’s note 4).   

 

Einstein noted that “we cannot solve our problems with the same level of thinking that created them” and that “the formulation of the problem is often more essential than its solution, which may be merely a matter of mathematical or experimental skill.”  This thinking suggests that the STEM disciplines need to be actively engaged and leading the search for spatially-aware solutions to today’s complex spatial problems.  Also, it recognizes that geospatial technologists need to fully recognize the quantitative nature of mapped data and embrace its analytical potential, as well as its technical application.      

 

However when it comes to Map Analysis (grid-based Spatial Analysis and Spatial Statistics operations), the old adage that “they who know not, know not they know not” takes center stage and the status quo paradigms of science and technology continue to dominate education, research and application development.  As long as a conceptual chasm exists between the mapping and quantitative analysis communities, spatially-aware solutions to complex problems will continue to be mostly side-lined.    

_____________________________

Author’s Notes: 1) See “The Most Beautiful Formulae in GIS” by Nigel Waters (1995) posted at www.innovativegis.com/basis/MapAnalysis/Topic30/Beautiful_Formulae.pdf.   2) See “Spatial Mathematics: Theory and Practice through Mapping” by Sandra Arlinghaus and Joseph Kerski (2013, www.crcpress.com/product/isbn/9781466505322).  3) The concepts and procedures behind Spatial Mathematics were introduced by David Unwin with the University of London (Introductory Spatial Analysis, 1981, Methuen, New York) and subsequently developed as a set-based Map Algebra for manipulating raster map layers by Dana Tomlin as a doctoral student at Yale University (Geographic Information Systems and Cartographic Modeling, 1990, Prentice-Hall, Englewood Cliffs, New Jersey).  4) A twelve-part compilation of Beyond Mapping columns describing the math/stat framework, classification of procedures and future directions of SpatialSTEM is posted at www.innovativegis.com/basis/MapAnalysis/Topic30/Topic30.htm. 

 

 

Recasting Map Analysis Operations for General Consumption  

(GeoWorld, February 2013)

  (return to top of Topic)

 

Earlier discussions have suggested that there is “a fundamental mathematical structure underlying grid-based map analysis and modeling that aligns with traditional non-spatial quantitative data analysis” (see Author’s Note 1).  This conceptual framework provides a common foothold for understanding, communicating and teaching basic concepts, procedures and considerations in spatial reasoning and analysis resonating with both GIS and non-GIS communities—a SpatialSTEM schema—that can be applied to any grid-based map analysis system (see Author’s Note 2). 

 

Topic24_new9a_Feb13

 

Figure 1. Grid-based map analysis operations in any GIS system, such as Spatial Analyst, can be reorganized into commonly understood classes of traditional quantitative data analysis.

 

For example, the top portion of figure 1 identifies the 22 map analysis “toolsets” containing over 170 individual “tools” in the Spatial Analyst module (ArcGIS by Esri).  The organization of the classes of operations involves a mixture of—

 

-        Traditional math/stat procedures (Conditional, Map Algebra, Math General, Math Bitwise, Math Logical, Math Trigonometric, Multivariate, Reclass);

-        Extensions of traditional math/stat procedures (Distance, Interpolation, Surface);

-        Unique map analysis procedures (Density, Local, Neighborhood, Overlay, Zonal);

-        Application-specific procedures (Groundwater, Hydrology, Solar Radiation); and

-        Housekeeping tasks (Extraction, Generalization, Raster Creation).

 

In large part, this toolset structuring is the result of the module’s development over time in response to “business case” demands from clients, rather than a comprehensive conceptual organization.  In contrast, Tomlin’s “Local, Focal, Zonal and Global” classes characterize the analytical operations by how the input data is obtained for processing, while my earlier groupings of “Reclassify, Overlay, Distance, Neighbors and Statistical” reflect the characteristics of the mapped data generated by the processing.

 

However, all three of these GIS-based schemas are foreign and confusing to the vast majority of potential map analysis users (all STEM disciplines) as they do not align with their traditional quantitative data analysis experiences.  This conceptual disconnect keeps GIS on the sidelines of the much larger quantitative analysis communities and reinforces the idea that GIS is a “technical tool” (mapping and geoquery) not a full-fledged “analytical tool” (spatial analysis and statistics).

 

The bottom portion of figure 1 identifies the two broad categories of traditional data analysis— Mathematics and Statistics—broken into seven major groupings that resonate with non-GIS communities.  All of Spatial Analyst’s 117 analytical operations (the other 53 are “reporting/housekeeping”) can be reorganized into these commonly recognized quantitative analysis categories.  

 

Figures 2 and 3 at the end of this section show my initial attempts at the reorganization (see Author’s Note 3).

 

The bottom line is that the SpatialSTEM framework recasts map analysis concepts and procedures into a more generally understood organization.  Within this general schema, map analysis is recognized as a set of natural extensions to familiar non-spatial math/stat operations.  For example—

 

-   A high school math teacher might follow a discussion of the Pythagorean Theorem with “…but what if there is an impassable barrier between the two points?  The distance is no longer a straight line but some sort of a ‘bendy-twisty’ route around the barrier.  How would you calculate the not-necessarily-straight distance?  The ‘Splash Algorithm’ does that by…” (you know the rest of the story; a minimal grid sketch appears after this list).

 

-   Or a statistics instructor might follow a lecture on the derivation of the Standard Normal Curve for characterizing the ‘numerical distribution’ of a data set with “…but what about the ‘spatial distribution’ of the data?  Is data always uniform or randomly distributed in geographic space?  How could you characterize/visualize the spatial distribution?  ‘Spatial Interpolation’ does that by…” (you know the rest of the story). 

 

-   Or an environmental science teacher might follow a lecture on the use of riparian buffers with “…but are all ‘buffer-feet the same’?  What about the slope of the surrounding terrain?  …and the type of soil?  …and the density of vegetation?  Wouldn’t an area along a stream that is steep with an unstable soil and minimal vegetation require a much larger setback than an area that is flat with stable soils and dense vegetation?  How could you create a variable-width buffer around streams that considers the intervening erosion conditions?  A simple ‘sediment loading’ model does that by…” (you know the rest of the story).  

 

-   Or a crop scientist who historically calculated the increase (or decrease) in yield over a previous year for a new genetic variety as the percent change in the total “weigh-wagon” records for an entire trial field.  But with GPS-enabled yield maps that automatically collect on-the-fly yield measurements as a harvester moves through a field, a detailed map of the percent change can be generated by spatially evaluating the standard algebraic equation… (you know the rest of the story).

 

-   Or a sales manager can use ‘address geo-coding’ to sprinkle sales data onto a grid map and then compute ‘roving window’ totals to generate a sales density surface showing where sales are high (or low) throughout each of several sales territories.  The map analysis can be extended to calculate areas of unusually high (or low) sales by identifying locations that are more than one standard deviation above (or below) the average sales density… (you know the rest of the story). 
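As promised in the first item above, here is a minimal grid sketch of not-necessarily-straight distance.  It uses a standard Dijkstra spread over a tiny grid with an impassable barrier, which is not MapCalc’s actual Splash Algorithm but illustrates the same idea:

    import heapq, math

    def spread(grid, start):
        rows, cols = len(grid), len(grid[0])
        dist = {start: 0.0}
        heap = [(0.0, start)]
        while heap:
            d, (r, c) = heapq.heappop(heap)
            if d > dist.get((r, c), math.inf):
                continue
            for dr, dc in [(-1,-1), (-1,0), (-1,1), (0,-1),
                           (0,1), (1,-1), (1,0), (1,1)]:
                nr, nc = r + dr, c + dc
                step = math.hypot(dr, dc)   # 1 for orthogonal, sqrt(2) for diagonal
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                    nd = d + step
                    if nd < dist.get((nr, nc), math.inf):
                        dist[(nr, nc)] = nd
                        heapq.heappush(heap, (nd, (nr, nc)))
        return dist   # effective (around-the-barrier) distance to every reachable cell

    barrier_grid = [[0, 1, 0],   # 1 marks the impassable barrier cells
                    [0, 1, 0],
                    [0, 0, 0]]
    d = spread(barrier_grid, (0, 0))
    print(d[(0, 2)])   # about 4.83, well beyond the straight-line distance of 2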

 

Dovetailing map analysis with traditional quantitative analysis thinking moves GIS from a “specialty discipline down the hall and to the right” for mapping and geoquery, to an integrated and active role in the spatial reasoning needed by tomorrow’s scientists, technologists, decision-makers and other professionals in solving increasingly complex and gnarly real-world problems.  From this perspective, “thinking with maps” becomes part of the true fabric of society, thus fulfilling GIS’s mega-technology promise.

 

Topic24_new9b_Feb13

 

Figure 2. Reorganization of Spatial Analyst’s analytical “tools” into traditional mathematical categories.

 

 

Topic24_new9c_Feb13

 

Figure 3. Reorganization of Spatial Analyst’s analytical “tools” into traditional statistical categories.

 

_____________________________

Author’s Notes: 1) see the Chronological Listing of Beyond Mapping columns posted at www.innovativegis.com/basis/MapAnalysis/ChronList/ChronologicalListing.htm; 2) for numerous links to papers, PowerPoint slide sets and other materials describing the SpatialSTEM framework, see www.innovativegis.com/Basis/Courses/SpatialSTEM/; 3) at the same SpatialSTEM posting, see the white paper entitled “Math/Stat Classification of Spatial Analysis and Spatial Statistics Tools (Spatial Analyst by Esri)” for a more detailed description of the recasting of Spatial Analyst’s operations into traditional non-spatial mathematics and statistics categories. 

 

(return to top of Topic)

 

(Back to the Table of Contents)