Topic 3 – Assessing Neighborhood Characteristics using Roving Windows
Beyond Mapping book
Imagination is More Important than Information -- describes procedures for characterizing surface configuration (slope, aspect and profile)
It’s Like the New Math, I am Just Too Old -- discusses the concept of calculating a “map derivative” and its use
Torture Numbers, They’ll Tell you Anything -- discusses the underlying theory and basic considerations of spatial interpolation
I Don’t Do Windows -- describes procedures for summarizing weighted roving windows
______________________________
Imagination is more Important than
Information (Einstein)
(GIS World, June/July 1990)
...but
directed imagination needs the best information it can get.
When
viewing a map, the human mind nearly explodes with ideas about the
landscape. Although the ideas are
limitless, our ability to process the detailed spatial data is limited. When the computer 'views' a map, it sees an
organized set of spatial data ripe for processing, but has no idea of its
significance. Think about it: when was the last time you took your computer for
a walk in the woods?
That's
the beauty of the man/machine bonding in GIS.
The imagination of the user is magnified many-fold by the machine's
ability to assemble detail as directed.
Ian McHarg vividly makes this point in his
lectures on GIS-- 'it is a tool that extends the mind.' We easily conceptualize scenarios for a
landscape, but lack the facility to effectively evaluate their relative
merits. That's why we need our little
silicon subordinate... to take care of the details. From this perspective GIS is less computer
mapping and spatial data base management, than it is a Decision Support System
(DSS) for modeling and evaluating alternative land uses.
The
foundation of this 'thinking with maps' is rooted in the analytic capabilities
of GIS. The last series of articles
described how our simple concept of distance has been extended by the
computer's ability to calculate proximity, movement and connectivity. This is the first of a few articles
investigating a related set of analytic tools concerned with vicinity. Or, more technically stated, the analysis
of spatially defined neighborhoods by considering a map location within the
context of its neighboring locations. As
with all GIS processing, new values are computed as a function of the values on
another map. In neighborhood analysis
two steps are involved. First, establish
the neighborhood and its values, then summarize the
values.
Determination
of neighborhood membership is like a 'roving window' moving about a map. Picture a window with nine window panes
looking straight down onto a piece of the landscape (sort of makes you feel all
powerful, doesn't it). Now, like a nosey
neighbor, systematically move it around to check out the action. Suppose your concern was surface
configuration. You would note the nine
elevation values within the window, then summarize the
3-dimensional surface they form. If all
of the values were the same, say 100 feet elevation, you would say it was a
boring flat area and move your window slightly to one side. Some larger values appear on the side window
panes. Move it another couple of notches
and the window is full of different elevation values.
Imagine
the nine values become balls floating at their respective elevation. Drape a sheet over them like the magician
places a sheet over his suspended assistant (who says GIS isn't at least part
magic). There it is-- surface
configuration... now numerically summarize the lumps and bumps formed by the
ghostly sheet. That means reducing the
nine values to a single value characterizing the surface. How about its general steepness? You could compute the eight individual slopes
formed by the center value and its eight neighbors (change in elevation divided
by the change in horizontal distance expressed as a percent). Then average them for an average slope. You could, but how about choosing the maximum
slope? That's what water does. Or the minimum slope? That's what a weary hiker does. In special cases you would choose one of
these statistics.
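The eight individual slopes can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `eight_slopes`, the 100-unit cell size and the sample window are hypothetical, slopes are taken as absolute values, and the diagonal neighbors are treated as farther away than the edge neighbors.

```python
import math

def eight_slopes(window, cell_size):
    """Percent slope from the center of a 3x3 window to each of its eight neighbors."""
    center = window[1][1]
    slopes = []
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue
            # Assumption: diagonal neighbors are sqrt(2) cell widths away.
            run = cell_size * math.hypot(r - 1, c - 1)
            rise = abs(window[r][c] - center)
            slopes.append(100.0 * rise / run)
    return slopes

# Hypothetical window: flat on the west and center, 10 units higher on the east side.
window = [[100, 100, 110],
          [100, 100, 110],
          [100, 100, 110]]
slopes = eight_slopes(window, cell_size=100.0)
average_slope = sum(slopes) / len(slopes)
maximum_slope = max(slopes)   # the water's choice
minimum_slope = min(slopes)   # the weary hiker's choice
```

Swapping `max`, `min` or the average is all it takes to move between the three summaries.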
Most
often you are interested in the best overall slope value. This is determined by the 'best fitted plane'
to the data. Replay your vision of the
nine floating balls. Now insert a glass
panel (plane) in such a way that the balls appear balanced about it (minimizing
the deviations from the plane to the balls).
If you're a 'techy', you will recognize this is simple 'linear
regression', except in 3-dimensional space.
But, if you value your computer's friendship, don't use a 'least-squares
fit' algorithm. Use vector algebra; it's
much faster.
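For a square nine-pane window the fitted plane has a simple closed form, so no general regression routine is needed. The sketch below is that closed-form least-squares fit, not the vector algebra shortcut recommended above; the function names and the sample window are hypothetical, and row 0 of the window is assumed to be the northern edge.

```python
import math

def fitted_plane_gradient(window, cell_size):
    """Gradient (dz/dx, dz/dy) of the least-squares plane through a 3x3 window.

    Assumes x increases to the east (columns) and y to the north (row 0 is north).
    On a symmetric 3x3 grid the fit reduces to column and row sum differences.
    """
    east = sum(row[2] for row in window)
    west = sum(row[0] for row in window)
    north = sum(window[0])
    south = sum(window[2])
    dz_dx = (east - west) / (6.0 * cell_size)
    dz_dy = (north - south) / (6.0 * cell_size)
    return dz_dx, dz_dy

def fitted_slope_percent(window, cell_size):
    dz_dx, dz_dy = fitted_plane_gradient(window, cell_size)
    return 100.0 * math.hypot(dz_dx, dz_dy)

# Hypothetical window rising 5 units per 100-unit cell toward the east.
window = [[100, 105, 110],
          [100, 105, 110],
          [100, 105, 110]]
slope = fitted_slope_percent(window, cell_size=100.0)
```

For this evenly tilted window the result is a 5 percent slope, which matches the rise of 5 units over each 100-unit step.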
As
the window progresses about the map, slope values are assigned to the center
window pane (cell) until all locations have received a value... abra-ca-da-bra, a slope map. Locations with larger values indicate steep
terrain; smaller values for gently sloped terrain. But what is the terrain's orientation? That's an aspect map. Move the same window and best fitted plane
about the map, but this time use its 'direction cosines' to indicate the
orientation of the plane. Is it facing
south? Or north? Or 47 degrees azimuth? It could make a big difference. If you're trying to grow trees in a moisture-limited
region, those south-facing slopes are only good for rattlesnakes. However, if there is ample water, they get
the most sunlight and tend to grow your best trees. If you're a land planner, the southern slopes
tend to grow the best houses, or at least the ones with the lowest heating bills.
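Roving the window over a whole map might look like the sketch below. It is an assumption-laden illustration: aspect is derived with `atan2` from the fitted plane's gradient rather than from direction cosines, azimuth is measured clockwise from north, row 0 is taken as north, edge cells are simply left unassigned, and all names and the sample elevations are hypothetical.

```python
import math

def plane_gradient(elev, r, c, cell_size):
    """Gradient of the least-squares plane for the 3x3 window centered on (r, c)."""
    rows = [elev[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)]
    dz_dx = (sum(row[2] for row in rows) - sum(row[0] for row in rows)) / (6.0 * cell_size)
    dz_dy = (sum(rows[0]) - sum(rows[2])) / (6.0 * cell_size)  # row 0 is north
    return dz_dx, dz_dy

def aspect_degrees(dz_dx, dz_dy):
    """Azimuth (clockwise from north) the plane faces, i.e. the downhill direction."""
    if dz_dx == 0 and dz_dy == 0:
        return None  # flat: no orientation
    return math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0

def slope_and_aspect_maps(elev, cell_size):
    n_rows, n_cols = len(elev), len(elev[0])
    slope = [[None] * n_cols for _ in range(n_rows)]
    aspect = [[None] * n_cols for _ in range(n_rows)]
    # Visit every interior cell; the window does not fit on the map edge.
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            gx, gy = plane_gradient(elev, r, c, cell_size)
            slope[r][c] = 100.0 * math.hypot(gx, gy)
            aspect[r][c] = aspect_degrees(gx, gy)
    return slope, aspect

# Hypothetical terrain that drops 10 units per 100-unit cell from north to south.
elevation = [[130, 130, 130, 130],
             [120, 120, 120, 120],
             [110, 110, 110, 110],
             [100, 100, 100, 100]]
slope_map, aspect_map = slope_and_aspect_maps(elevation, cell_size=100.0)
```

Every interior cell of this sample comes out as a 10 percent slope facing 180 degrees, a south-facing hillside.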
There
is one final surface configuration factor to consider-- profile. Imagine a loaf of bread, fresh from the
oven. It's sort of like an elevation
surface. At least mine has deep
depressions and high ridges. Now start
slicing the loaf and pull away an individual slice. Look at it in profile concentrating on the
line the top crust portion traces. From
left to right, the line goes up and down in accordance with the valleys and
ridges it sliced through. Use your arms
to mimic the shapes along the line. A 'V' with both arms up for a valley. An inverted 'V' with both
arms down for a ridge. Actually
there are only nine fundamental profile classes (distinct positions for your
two arms). Values one through nine will
serve as our numerical summary of profile.
However,
a new window is needed. This time as you
look down onto the landscape, move a window with just three panes along a
series of parallel lines. At an instant
in time, you have defined three elevation values. Compare the left side value to the
center. Is it higher or lower? Put your left arm in that position. Now do the same for the right side and center
values. Note the fundamental profile
shape you have formed and assign its value to the center location. Move the window over one pane and repeat
until you have assigned a profile value to every map location. The result of all this arm waving is a
profile map-- the continuous distribution of profiles as viewed from some direction. Provided your elevation data is at the proper
resolution, it's a big help in finding ridges and valleys running in a certain
direction. That's where the gold might
be. Or, if you look from two perpendicular (orthogonal)
directions and put the two profile maps together, a location with
an inverted 'V' in both directions is likely a peak.
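The arm waving translates almost directly into code. In this sketch each arm is up, level or down, which gives the nine combinations; the numbering of the classes (1 for a valley through 9 for a ridge) is an arbitrary choice made here for illustration, as are the function names and the sample line of elevations.

```python
def arm_position(side, center):
    """+1 if the side value is higher than the center, -1 if lower, 0 if level."""
    return (side > center) - (side < center)

# Assumption: class numbers 1..9 are an arbitrary labeling chosen for this sketch.
PROFILE_CLASS = {}
for i, left in enumerate((1, 0, -1)):
    for j, right in enumerate((1, 0, -1)):
        PROFILE_CLASS[(left, right)] = 3 * i + j + 1

def profile_row(elevations):
    """Profile class for each interior location along one line of elevations."""
    classes = [None] * len(elevations)
    for k in range(1, len(elevations) - 1):
        left = arm_position(elevations[k - 1], elevations[k])
        right = arm_position(elevations[k + 1], elevations[k])
        classes[k] = PROFILE_CLASS[(left, right)]
    return classes

# Hypothetical elevations along one line of the map.
line = [120, 100, 120, 140, 120, 120]
classes = profile_row(line)
```

Here the second location is a 'V' (class 1) and the fourth is an inverted 'V' (class 9); running the same function along every parallel line yields the profile map.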
There
is a lot more to neighborhood analysis than just characterizing the lumps and
bumps of the terrain. What would happen
if you created a slope map of a slope map?
Or a slope map of a barometric pressure map? Or of a cost surface? What would happen if the window wasn't a
fixed geometric shape? Say a ten minute
drive window. I wonder what the average
age and income are for the population within such a bizarre window?... See you next issue.
It’s Like the New Math… I am just too old
(GIS World, August/September 1990)
Earlier
discussions have established 'maps as data'.
The fact that maps are numbers in a GIS is what allows us to go beyond
mapping. It extends pens, symbols and
colors to spatial statistics, mathematics and modeling. But in doing so, does it leave the typical
user in the dust?
Your
initial response is likely, "You bet.
It's like new math, I'm just too
old." We have our established
procedures for dealing with maps and data, built up through years of study at
the School of Hard Knocks. Maps are colorful graphics you hang on the
wall, and data are colorless numbers you align in a column. The thought that they are one and the same is
unsettling. Well, let's return to that
Land of Oz, featuring neighborhood characterization.
When
last we saw our hero, the 'roving window,' he was about to crash into a lumpy,
bumpy terrain surface. If that is all
that he is good for, he was going to end it all. Let's review the facts. The procedure for assessing surface
configuration was described as a window with nine panes moving about a map of
elevation values. At an instant in time,
the nine values in the window are summarized for the slope and aspect of the
three-dimensional surface they form (see last issue for algorithms). The results are assigned to the location at
the center of the window, then the window advances to the next position. This procedure is repeated until a slope or aspect
value is assigned to all locations in a project area. Useful information, but is that all there is?
Of
course not, this is the Land of Oz. A
slope map is actually the first derivative of the elevation surface. You remember the derivative; that blood-sucking
calculus teacher threatened you with it.
In simple terms, a derivative indicates the 'rate of change' in one
variable with respect to another. In the
terrain slope example, it is the rate of change in elevation per geographic
step-- "rise over run." If
elevation doesn't change, the terrain is flat.
If it changes a lot over a short distance, it's steep. Slope (derivative) indicates how rapidly
things are changing throughout your map.
Aspect indicates the direction of this change. "I can handle that."
OK. Then what is the second derivative of an
elevation map? The
slope of a slope map? Let's see,
the first derivative is the rate of change in elevation per geographic
step. Then the second derivative must be
the rate of change in the rate of change in elevation per geographic step. What?
That doesn't make sense. Maps are
maps and math is math, and you shouldn't confuse them. No, the result is called a surface
'roughness' map. It shows you where
those little brown contour lines are close together, then far apart, then close
together again, then far apart-- heartbreak terrain for tired hikers. Close your eyes and envision a steep mountain
side you have to climb. It could be
worse.
Suppose
the slope isn't constant (a tilted, straight line in profile), but variable (a
tilted, wiggly line in profile). To get
to the same point, you would hike up, then down, then up, then down... at each
rise, your hopes (and elevation advantage earned) would be lost. Neither man nor machine likes to run around
in this sort of terrain. If you're a
forester, a roughness map could help you in harvest planning. If you're a regional planner, it could help
you assess likely corridors for a proposed highway. If you're a hydrologist, it could help in
modeling surface runoff.
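A roughness map, read as the slope of a slope map, can be sketched by applying one slope routine twice. The code below assumes the fitted-plane slope on a nine-pane window, returns only interior cells (so each pass trims one cell from every edge), and uses two made-up surfaces; none of the names come from a particular GIS package.

```python
import math

def slope_map(surface, cell_size):
    """Slope (rise per unit run) of the fitted plane in each 3x3 window.

    Returns only interior cells, so the result has two fewer rows and columns.
    """
    out = []
    for r in range(1, len(surface) - 1):
        out_row = []
        for c in range(1, len(surface[0]) - 1):
            win = [surface[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)]
            gx = (sum(w[2] for w in win) - sum(w[0] for w in win)) / (6.0 * cell_size)
            gy = (sum(win[0]) - sum(win[2])) / (6.0 * cell_size)
            out_row.append(math.hypot(gx, gy))
        out.append(out_row)
    return out

def roughness_map(elevation, cell_size):
    """Second derivative sketch: the slope of the slope map."""
    return slope_map(slope_map(elevation, cell_size), cell_size)

# A constant incline: the slope is the same everywhere, so roughness is zero.
ramp = [[10.0 * c for c in range(5)] for _ in range(5)]
# A surface that steepens to the east (z = x squared): roughness is not zero.
bowl = [[float(c * c) for c in range(5)] for _ in range(5)]
ramp_roughness = roughness_map(ramp, 1.0)
bowl_roughness = roughness_map(bowl, 1.0)
```

The same two-pass idea applies to any map surface, not just elevation.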
Weatherpersons
have used map derivatives for years.
They collect barometric pressure readings at weather stations, then interpolate these data into pressure gradient
maps. To a casual observer, these maps
look just like a terrain surface... peaks, valleys and a host of varying slopes
connecting them. Winds blow from high to
low pressure; or stated another way, from the peaks to the valleys along the
pressure gradient. The steeper the slope
between peaks and valleys, the stronger the wind will be. Therefore, a 'slope' map of the pressure
gradient surface indicates wind speed at each location. The aspect of the same map indicates wind
direction at each location. So that's
how they get tomorrow's 'gusty' prediction.
Or how about a cooling pond's 'thermal gradient'? ...a mountain of high
temperature at the point of discharge that dissipates at different rates and
directions as a function of depth, bottom conditions, streams and springs. Slope and aspect maps of this thermal gradient
capture these complex interactions.
Let's
recapitulate before we go on. The
familiar concepts of terrain slope and aspect are down-to-earth examples of
that elusive mathematical concept of the derivative. It gives a firm footing in the real world to
one of the most powerful mathematical tools in numerical space. A slope map indicates the spatial
distribution of the rate of change in any map variable. An aspect map indicates the direction of that
change. Since slope and aspect have a
more general meaning than giving direction to raindrops, what else can they
do?
Consider
an accumulation surface. Remember that
bizarre map discussed a few issues ago in connection with weighted distance
measurement? An example is a
'travel-time' map indicating how long it would take to travel from one location
to all other locations in a project area.
If you move in straight lines in all directions, a perfect bowl of
constantly increasing distance is formed.
However, if you are a realistic hiker your movement throughout the area
will bend and twist around both absolute and relative barriers, as defined by
terrain and land cover features. The
result is a travel-time map that is bowl-like, but wrinkled with ridges and
valleys. That’s weird, but still a map
surface that is not unlike a terrain surface.
Its slope indicates the ease of optimal movement across any location in
the project area. Its aspect indicates
the direction of optimal movement. So
what?
Think
about it. This is a map of the pattern
of optimal movement throughout an area.
Such information is invaluable whether you are launching a crew of fire
fighters or a cruise missile. But why
stop at a travel-time map? Why not a
'cost surface,' whose derivative produces a marginal cost map-- the cost
to go an additional step in space. Or a 'marginal revenue' map. Such is the decision fodder that fuels the
salaries of most MBA's-- except expressed as an entire map, instead of a single
number.
At
least two things should be apparent from the above discussion. First, that map analysis in a GIS is based on
mathematics. Maps are large groups of
numbers and most of the traditional analysis techniques are applicable. Certainly the derivative is a switch hitter
that propels GIS beyond mapping. How
about the integral? Sure, why not? Envision a single, huge window that covers
the entire map and sum all the values.
How about the mean? Just divide
the integrated sum by the number of locations.
How about the standard deviation?
And the coefficient of variation? Sure, but that's for the next issue. The other thing that should be apparent is
that you don't want to have a thing to do with this map-ematics
…maybe, maybe not. By reading this far,
a seed has been set. At minimum, the
thought of a map derivative will haunt you in your shower; like that innocent
bather who checked in at the Bates Motel.
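The single, huge window is easy to sketch. The example assumes the population form of the standard deviation and expresses the coefficient of variation as a percent; the function name and the tiny cost surface are made up for illustration.

```python
import math

def map_summary(grid):
    """Whole-map total, mean, standard deviation and coefficient of variation."""
    values = [v for row in grid for v in row]
    total = sum(values)              # the "integral": one window over the whole map
    mean = total / len(values)
    # Assumption: population (divide by n) form of the variance.
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std_dev = math.sqrt(variance)
    coeff_var = 100.0 * std_dev / mean if mean != 0 else None
    return total, mean, std_dev, coeff_var

# Hypothetical four-cell cost surface.
cost_surface = [[2, 4], [4, 6]]
total, mean, std_dev, coeff_var = map_summary(cost_surface)
```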
Torture Numbers, They’ll Tell you Anything
(GIS World, October/November 1990)
The
last two sections introduced the idea of 'roving' a small window throughout a
map summarizing the surface configuration detected at each location. Slope, aspect and profile of an elevation
surface made sense. You have dug in your
fingernails on steep, southerly slopes and pitched your tent on flat ones. But the extension of the concept to abstract
maps, such as travel-time and cost surfaces, may have been a bit
uncomfortable. Relating
it to math's derivative made it downright inhospitable. Hopefully you saw through all that academic
hyperbole to visualize its application potential. Surface configuration tells you how and where
a 'mapped variable' is changing-- important information to tell you how and
where you should change your management action.
There
is another fundamental way to summarize a roving window-- statistically. For example, "How many houses are there
within a quarter mile radius?" Or, "What is the average lead
concentration in the soil within a hundred meters?" In these instances, the numbers defining a
map are 'windowed', then statistically summarized, with the summary value
assigned to the center location of the window.
The window repeats this process as it moves about the map. The list of data summary techniques is large
indeed-- total, average, standard deviation, coefficient of variation, maximum,
minimum, median, mode, diversity, deviation, and many others. In fact, any of the traditional mathematical
and statistical operations can be used to summarize the data in the window.
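A statistical roving window might be sketched as follows. The radius is expressed in cells rather than miles or meters, windows are clipped at the map edge, and the house counts are made up; any function that reduces a list to one number (here `sum` and `max`) can serve as the summary.

```python
import math

def circular_offsets(radius_cells):
    """Row/column offsets of every cell whose center lies within the radius."""
    reach = int(radius_cells)
    return [(dr, dc)
            for dr in range(-reach, reach + 1)
            for dc in range(-reach, reach + 1)
            if math.hypot(dr, dc) <= radius_cells]

def roving_summary(grid, radius_cells, summarize):
    """Summarize the values in a circular window centered on every cell."""
    offsets = circular_offsets(radius_cells)
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Assumption: panes that fall off the map are simply ignored.
            values = [grid[r + dr][c + dc] for dr, dc in offsets
                      if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
            out[r][c] = summarize(values)
    return out

# Hypothetical map of house counts per cell.
houses = [[0, 1, 0, 0],
          [2, 0, 0, 0],
          [0, 0, 3, 0],
          [0, 0, 0, 1]]
housing_density = roving_summary(houses, radius_cells=1, summarize=sum)
local_maximum = roving_summary(houses, radius_cells=1, summarize=max)
```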
But
that's not all. Because of the spatial
nature of mapped data, new operations arise, such as Fourier two-dimensional
digital filtering-- a real trek on the quantitative side that is beyond the
scope of this article. Yet the basic
concept imbedded in these seemingly complex procedures actually is quite
simple. Consider the housing density map
noted above. The number of houses within
a quarter mile of any location is an indicator of human activity. More houses means
more activity. Yet suppose your concern
is a noisy neighborhood. It's not just
the total number of houses in the vicinity of a location, but their juxtaposition.
If the woofers and tweeters are concentrated close to you, you'll be rockin' through the night.
If most are at the edge of your neighborhood 'window', no problem. Physics describes this condition as the
'dissipation of sound as a non-linear function of distance.' You probably describe it as relief. That means a house twice as far away sounds a
whole lot quieter than the one next door.
To
our GIS, that means a 'distance weighted' window capability. Weights are calculated as an inverse function
of the distance for each window position.
The result is a matrix of numbers analogous to writing a weighting
factor on each pane of glass forming the entire window. When you look through this window at the
landscape, multiply the data you see times its respective weight, then
statistically summarize these data and assign the summary value to the center
location of the window. In this
instance, the noise emanating from each house is adjusted for its positioning
in the window and the total noise computed by summing all of the adjusted
values.
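The weights written on the panes can be sketched as a small matrix. The weighting function 1 / (1 + distance squared) is an assumption picked so the center pane gets a finite weight; it is one possible inverse function of distance, not a physical model of sound, and the noise values are hypothetical.

```python
def weight_matrix(reach):
    """One weight per pane; an inverse function of distance from the center pane.

    Assumption: weight = 1 / (1 + d squared), with d measured in cells.
    """
    return [[1.0 / (1.0 + dr * dr + dc * dc)
             for dc in range(-reach, reach + 1)]
            for dr in range(-reach, reach + 1)]

def weighted_total(grid, r, c, weights):
    """Multiply each value seen through the window by its weight and sum."""
    reach = len(weights) // 2
    total = 0.0
    for dr in range(-reach, reach + 1):
        for dc in range(-reach, reach + 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
                total += grid[rr][cc] * weights[dr + reach][dc + reach]
    return total

weights = weight_matrix(reach=2)
noise_near = [[0] * 5 for _ in range(5)]
noise_near[2][3] = 10      # a loud house next door
noise_far = [[0] * 5 for _ in range(5)]
noise_far[2][4] = 10       # the same house twice as far away
near_total = weighted_total(noise_near, 2, 2, weights)
far_total = weighted_total(noise_far, 2, 2, weights)
```

With these weights the house next door contributes 5.0 and the one twice as far away only 2.0, less than half as much.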
The
concept of weighted windows is fairly easy to grasp. The procedure used to derive the weights is
what separates the manager from the mathematician. For now, let's stick to the easy stuff-- for
example, weighted nearest neighbor interpolation. It uses an 'inverse distance squared' window
similar to the one described above.
Instead of noisy data, field collected measurements of well pollution
levels, or barometric pressure, or animal activity can be used.
Consider
figure 1, but please excuse the PC EGA color graphics slide of the screen. It's not as pretty as a workstation rendering,
but I did it on my lap at 30,000 feet.
Inset (a) shows a geographic plot of animal activity recorded during a
twenty-four hour period at sixteen sample sites for a 625 hectare project area. Note the higher measurements are concentrated
in the northeast, while the lower measurements are in the northwest. The highest activity level is 87, while the
lowest is 0-- forming a rather large data range of 87. The computed average activity is 22.56, with
a standard deviation of ±26.2.
On the whole, the area is fairly active but not too bad.
Inset
(b) shows the result of moving the inverse distance squared window over the
data map. At each stop the activity data
is multiplied by its weighting factor, and the weighted average of all the
adjusted measurements is assigned to the center of the window. This provides an estimate (interpolated
value) of activity which is primarily influenced by those sampled points closer
to it. It's common sense... if there is
a lot of activity immediately around a location, chances are there is a lot of
activity at that location. This is often
the case, but not always (that's why we need the mathematician's complex
weighting schemes).
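A minimal sketch of the inverse distance squared estimate at one location follows. The three samples are invented for illustration (they are not the sixteen sites in the figure), the window is a simple circle of a given radius, and a location that coincides with a sample is assumed to take that sample's value.

```python
import math

def idw_estimate(x, y, samples, radius, power=2):
    """Inverse-distance-weighted average of the samples inside the window."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        dist = math.hypot(sx - x, sy - y)
        if dist == 0:
            return float(value)          # exactly on a sample: use its value
        if dist <= radius:
            weight = 1.0 / dist ** power
            weighted_sum += weight * value
            weight_total += weight
    if weight_total == 0:
        return None                      # no samples inside the window
    return weighted_sum / weight_total

# Hypothetical samples as (x, y, activity).
samples = [(0.0, 0.0, 10.0), (4.0, 0.0, 50.0), (0.0, 4.0, 30.0)]
estimate = idw_estimate(1.0, 0.0, samples, radius=5.0)
```

The estimate lands much closer to the nearby reading of 10 than to the distant readings of 50 and 30. Calling the function for every cell center produces the interpolated surface.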
"Whoa! You mean tabular data can be translated into
maps?" In many instances, the answer
is yes. The 22.56 average animal activity actually implies a map. It's just that it is a perfectly flat surface
that estimates an activity level of 22.56 everywhere... plus or minus the
standard deviation of 26.2, of course.
But it doesn't indicate where you would expect more activity (plus), or
where to expect less (minus). That's
what the interpolated map does. If
higher activity is measured all around a location, such as in the northeastern
portion, then why not estimate more than the average? Not a bad assumption in this case, but 'it
depends on the data' is the correct answer.
As with all map analysis operations, you aren't just coloring maps,
you're processing numbers with all the rights, privileges and responsibilities
of math and stat. Be careful.
Figure 1. Spatial interpolation of discrete point samples generates a continuous map surface that,
in turn, identifies areas of unusually high activity.
You
might be asking yourself, "If the interpolated surface predicts a
different animal activity at each location, I wonder where there are areas of
unusual activity." That's a
'standard normal variable (SNV)' map.
It's this simple... SNV = ((x - average) / standard deviation) * 100,
where x is an interpolated value. It's not as bad as you might think. If the interpolated value (x) is exactly the
same as the average, then it computes to 0... exactly what
you would expect. Positive SNV values
indicate areas above the average (more than you would expect); negative values
indicate areas below the average (less than you would expect). A +100 or larger value indicates areas that
are 100%, or more, of a standard deviation above the average... very unusually
high activity. Inset (c) of the
accompanying figure locates this area as easily accessible by the woods road in
the northeast. Now get in your pickup
truck and check it out. For the
techy-types, the SNV map is the geographic plot of the standard normal curve
and the map in inset (c) is the plot of the upper tail of the curve. For the rest of us, it's just a darn useful
technique that provides a new way of looking at our old data. It brings statistics down to earth.
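The SNV calculation is a one-liner per cell. In this sketch the average and standard deviation are the sample statistics quoted above, while the small interpolated grid is made up for illustration; the function names are hypothetical.

```python
def snv_map(interpolated, average, std_dev):
    """SNV = ((x - average) / standard deviation) * 100 for every cell."""
    return [[100.0 * (x - average) / std_dev for x in row] for row in interpolated]

def unusually_high(snv, threshold=100.0):
    """True where a cell is at least one standard deviation above the average."""
    return [[value >= threshold for value in row] for row in snv]

# Made-up interpolated values; 22.56 and 26.2 are the statistics from the text.
interpolated = [[22.56, 50.0], [9.46, 60.0]]
snv = snv_map(interpolated, average=22.56, std_dev=26.2)
high = unusually_high(snv)
```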
So,
spatial interpolation is a neighborhood operation involving, at least
conceptually, a roving window-- a weighted one at that. Actually, it's an operation fairly similar to
the familiar concepts of slope and aspect calculation. In the next issue, we will finish our brush with
neighbors by considering 'dynamic' windows.
Once you have tasted weighted windows, you will love dynamic ones. See you then.
I Don’t Do Windows
(GIS World, December 1990)
The
previous sections have discussed 'neighborhood' operations as moving a window
about a map. We found that the data within
a window at an instant in time could be used to characterize the surface
configuration (e.g., slope or aspect) or generate a summary statistic (e.g.,
total or average). The value
representing the entire neighborhood is assigned to the focus of the window,
then the window shifts to the next location.
It
is a simple, straightforward process, except on two counts. One involves understanding the wealth of
mathematical and statistical processes involved. Most traditional math/stat operations are
possible (those termed 'commutative' operations for the techy types). That leads to the other complicating count--
why would I want to do these unnatural, numerical things to a map? And what would I do with the bizarre results,
such as a marginal cost map (slope of a cost surface)? Hopefully, the preceding articles provided
enough examples to stimulate your thinking beyond traditional mapping, to maps
as data, and finally to map analysis.
Intellectual
stimulation, however, can quickly turn to conceptual overload. Risking this, let's return to spatial
interpolation. Recall that interpolation
involves moving a window about a map, identifying the sampled values within the
window, summarizing these samples and finally assigning the summary to the
focus of the window. The summary could
be a simple arithmetic average or a weighted average (most commonly 'the inverse
distance squared' weighted average).
How
about another conceptual step? Instead
of making the weights a simple function of distance, incorporate a 'bias' based
on the trend in the sampled data. This
is what that mysterious interpolator 'kriging' does. It's based on common sense-- the accuracy of
an estimated value is best at a sampled location and becomes less reliable as
interpolated points get farther away. Simple and straightforward.
But the direction to a sampled value often makes a difference. For example, consider the change in
ecological conditions as you climb from Death Valley to the top of Mount
Whitney. As elevation rapidly increases
you quickly pass through several ecological communities. If you move along an elevation contour,
things don't change as quickly. For years,
ecologists have used elevation in their mapping.
Now
envision a map of this area (or refer to a map of southeastern
California). Major changes in elevation
primarily occur along the East/West axis.
Most of the contours (constant elevation) run along the North/South
axis. If our understanding of ecology
holds, an estimated location should be influenced more by samples in a
North/South direction from it. Samples
to the East/West should have less influence.
That's what kriging does. It
first analyzes the sample data set for directional bias, then
adjusts the weighting factors it uses in summarizing the samples in the
window. In this case, it would uncover
the directional bias in the sample data (induced by the elevation gradient,
provided theory holds), then set the window weighting factors accordingly.
Another
way to conceptualize the direction-biased window is as an ellipse instead of a
circle. Inverse distance squared
weighting forms concentric halos of equal weights-- a circular window. Kriging windows form football-shaped halos
reaching out the farthest in the direction along which the data change most slowly (North/South in the example above)-- an elliptical
window. For the techy few, this is
similar to the 'Mahalanobis' distance in multivariate analysis. For the rest of us, it demonstrates the first
consideration in roving window design-- direction. Why do windows have to form simple geometric
shapes, like circles and squares, in which all directions are symmetrically
considered? Well, they don't.
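The elliptical window can be sketched by stretching distance across the long axis. This is only the geometric part of the idea, with a hand-picked axis direction and a hypothetical 2:1 ratio; it is not kriging, which derives its weights from an analysis of the sample data.

```python
import math

def anisotropic_distance(dx, dy, major_azimuth_deg, ratio):
    """Effective distance for an elliptical window.

    dx is the offset to the east, dy the offset to the north. Distance along the
    major axis counts at face value; distance across it is multiplied by ratio.
    """
    az = math.radians(major_azimuth_deg)
    along = dx * math.sin(az) + dy * math.cos(az)
    across = dx * math.cos(az) - dy * math.sin(az)
    return math.hypot(along, ratio * across)

def in_elliptical_window(dx, dy, reach, major_azimuth_deg=0.0, ratio=2.0):
    return anisotropic_distance(dx, dy, major_azimuth_deg, ratio) <= reach

# Assumed North/South major axis (azimuth 0) and a hypothetical 2:1 ellipse.
north_sample = in_elliptical_window(0.0, 8.0, reach=10.0)   # 8 units due north
east_sample = in_elliptical_window(8.0, 0.0, reach=10.0)    # 8 units due east
```

The sample to the north falls inside the window while the equally distant sample to the east does not. Feeding the effective distance into an inverse distance weight would give the football-shaped halos.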
For
example, consider secondary source air pollution and health risk mapping. If you have a map of the concentration of
lead in soils you might identify as 'risky' those areas with high
concentrations within five hundred meters.
To produce this map you could move a window with a radius of 500 meters
throughout the lead map, assigning the average concentration as you go. But this process ignores the prevailing
winds. An area might have a high
concentration to the north, with low concentrations elsewhere. Its average might be within the guidelines,
but as the wind blows from the north, the real effect would be disastrous for a
home built at this location. In this
case, a wedge-shaped window oriented to the north (upwind) would be more
appropriate.
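A wedge-shaped window might be sketched as a circular window with an angular filter. The 50 degree half-angle, the radius of 1.5 cells and the lead values are all hypothetical; row 0 is assumed to be north and the center cell is always included.

```python
import math

def wedge_offsets(radius_cells, bearing_deg, half_angle_deg):
    """Offsets (d_row, d_col) of cells within the radius and inside the wedge.

    Row 0 is north, so a cell to the north has a negative d_row.
    """
    reach = int(radius_cells)
    offsets = [(0, 0)]
    for dr in range(-reach, reach + 1):
        for dc in range(-reach, reach + 1):
            if (dr, dc) == (0, 0) or math.hypot(dr, dc) > radius_cells:
                continue
            azimuth = math.degrees(math.atan2(dc, -dr)) % 360.0
            # Smallest angle between the cell's azimuth and the wedge bearing.
            off_axis = abs((azimuth - bearing_deg + 180.0) % 360.0 - 180.0)
            if off_axis <= half_angle_deg:
                offsets.append((dr, dc))
    return offsets

def window_average(grid, r, c, offsets):
    values = [grid[r + dr][c + dc] for dr, dc in offsets
              if 0 <= r + dr < len(grid) and 0 <= c + dc < len(grid[0])]
    return sum(values) / len(values)

# Hypothetical lead concentrations: high to the north, low elsewhere.
lead = [[90, 90, 90],
        [10, 10, 10],
        [10, 10, 10]]
circle = wedge_offsets(1.5, bearing_deg=0.0, half_angle_deg=180.0)
upwind = wedge_offsets(1.5, bearing_deg=0.0, half_angle_deg=50.0)
circle_average = window_average(lead, 1, 1, circle)
upwind_average = window_average(lead, 1, 1, upwind)
```

The full circle averages about 37 at the center cell, while the upwind wedge averages 70, which is the point of the example.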
Actually
there is more to windows than just direction.
There is distance. For example,
consider a big wind from the north.
Under these conditions relatively distant locations of high
concentrations could affect you. Under
light winds they wouldn't. Considering
both wind direction and strength results in a dynamic window that adjusts
itself each time it defines a neighborhood.
To accomplish this, you need a wind map (often referred to as a 'wind
rose') as well as the lead concentration map.
The wind map develops the window configuration and the lead map provides
the data for summary. In reality,
'cumulative effects' and 'particulate mixing' should be considered, but that's
another story... even more complicated.
But in the end, it just results in better definition of window
weights.
Let's
try another dynamic window example.
Suppose you were looking for a good place for a fast food
restaurant. It should be on an existing
road (the automobile is king). It should
be close to those most prone to a 'Mac attack' (wealthy families with young
children). Armed with these criteria,
you begin your analysis. First you need
to build a data base containing information on roads and
demographics. With any luck, the
necessary data are in the 'TIGER files' available for your area (see the
several previous GIS World articles on this data source).
Now
all you need is a procedure that relates movement along roads from a location
to the people data-- a 'travel-time window'.
Based on the type of roads around a location, reach out ten
minutes in all directions. The result is
a spider-web-like window that reaches farther along fast roads than along slow
roads. A bit odd-shaped, but it's a
window no less. Now lay the window over
the demographic data to calculate the average income and number of children per
household. Assign your summary value
then move to the next location along the road.
When all locations have been considered, the ones with the highest
'yuppie indexes' are the candidate restaurant locations.
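One way to sketch the travel-time window is a shortest-path search over a road network, stopped at the ten-minute mark. Dijkstra's algorithm is a technique swapped in here for illustration; the road network, the travel times and the income figures are entirely hypothetical.

```python
import heapq

def travel_time_window(roads, start, limit_minutes):
    """Locations reachable from start within the time limit (Dijkstra's algorithm).

    roads maps each location to a list of (neighbor, minutes) pairs.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        elapsed, here = heapq.heappop(queue)
        if elapsed > best.get(here, float("inf")):
            continue                       # stale queue entry
        for neighbor, minutes in roads.get(here, []):
            arrival = elapsed + minutes
            if arrival <= limit_minutes and arrival < best.get(neighbor, float("inf")):
                best[neighbor] = arrival
                heapq.heappush(queue, (arrival, neighbor))
    return best

# Hypothetical road network and household incomes (thousands of dollars).
roads = {
    "A": [("B", 4), ("C", 9)],
    "B": [("A", 4), ("D", 5)],
    "C": [("A", 9), ("E", 3)],
    "D": [("B", 5)],
    "E": [("C", 3)],
}
income = {"A": 40, "B": 60, "C": 50, "D": 80, "E": 30}
window = travel_time_window(roads, "A", limit_minutes=10)
average_income = sum(income[loc] for loc in window) / len(window)
```

From location A the window reaches B, C and D but not E, which is twelve minutes away. Repeating the search from every road location gives one summary value per location.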
All
this may sound simple (ha!), but it's a different story when you attempt to
implement the theory. Several GIS
software packages will allow you to create 'dynamic weighted window' maps. It's not a simple keystroke, but a complex
command 'macro.' Such concepts are
pushing at the frontier of GIS. It's
currently the turf of the researcher.
Then again, GIS as you know it was just a glint in the researcher's eye
not so long ago. I bet you will 'do
windows' in your lifetime. It'll be fun.
_______________________________________