The geoid, on the other hand, is an undulating surface lying roughly where sea level would be if the land didn't get in the way. It accounts for local gravity variations caused by density variations in the Earth's crust, but not for variations in the water itself, and it is referenced to an absolute gravity model. It is a gravitational equipotential surface.
Sea level is another undulating surface; it differs from the geoid because of temperature, current, wind and salinity differences in the water. However, sea level doesn't work on land!
When we formally speak of vertical, the vertical we mean is the orthometric height. This is the algebraic result of subtracting the geoid separation (the difference between the geoid and the ellipsoid, both referenced to the Earth's centre of mass, at a given surface point) from the ellipsoid height.
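The subtraction can be sketched in a few lines. The function name and the sample numbers here are hypothetical, and the sign convention (separation positive where the geoid lies above the ellipsoid) is an assumption stated for the example.

```python
def orthometric_height(ellipsoid_height_m, geoid_separation_m):
    """Return orthometric height H = h - N, all values in metres.

    Assumes the geoid separation N is positive where the geoid lies
    above the ellipsoid and negative where it lies below.
    """
    return ellipsoid_height_m - geoid_separation_m


# Hypothetical values: a receiver reports 250 m above the ellipsoid
# at a spot where the geoid sits 30 m below the ellipsoid.
height_above_geoid_m = orthometric_height(250.0, -30.0)
print(height_above_geoid_m)
```

In practice the separation would come from a geoid model lookup at the point's latitude and longitude, not from a constant.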
If, on the other hand, one uses multi-frequency, geodetic-quality GPS receivers, the autonomous measurements serve only to get into the ballpark for timing and position. One then uses a network of multiple receivers located on well-surveyed sites whose Cartesian coordinates are well known (at least for the purposes of a given survey). Then, once again, one calculates the Cartesian coordinates of all receivers in the survey, including the unknown point, creates a fairly rigid geometric network among the various points, and performs a least-squares adjustment to determine x, y, and z for the point of interest. Beyond that, one generally transforms the coordinates from Cartesian to one of the more conventional forms (geographic, state plane, transverse Mercator, etc.) for textual and graphical dissemination.
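As an illustration of that last transformation step, here is a minimal sketch of converting between geographic coordinates and Earth-centred Cartesian coordinates on the WGS84 ellipsoid. The function names are hypothetical, the inverse uses a simple fixed-point iteration chosen for brevity (it is not robust at the poles), and real survey software would normally rely on a tested geodesy library.

```python
import math

# WGS84 defining constants: semi-major axis (m) and flattening.
WGS84_A = 6378137.0
WGS84_F = 1.0 / 298.257223563
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h_m):
    """Geographic (degrees, ellipsoid height in m) to Cartesian x, y, z."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, iterations=10):
    """Cartesian x, y, z to geographic, by fixed-point iteration.

    Sketch only: divides by cos(latitude), so it fails at the poles.
    """
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1.0 - WGS84_E2))  # first guess assumes h = 0
    h = 0.0
    for _ in range(iterations):
        n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1.0 - WGS84_E2 * n / (n + h)))
    return math.degrees(lat), math.degrees(lon), h
```

Note that the height coming out of this conversion is the ellipsoid height, not the orthometric height.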
GPS determination of height using a commercial, consumer-grade GPS is problematic for several reasons.
3D positions are calculated using the code-phase method, where one uses the pseudoranges between the satellites and the receiver at a given epoch to determine a position. If 4 satellites are in view, the procedure is straightforward. If more than 4 are in view, then most receivers "over-determine" the position, using all combinations of 4 satellites from those in view to establish positions, and then perform yet another least-squares solution to attempt to determine which satellite combination is best. This combination is re-tested periodically to improve matters, but usually not at each and every solution.
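To make the pseudorange idea concrete, here is a minimal sketch of a code-phase position fix. It uses the common textbook formulation, a single Gauss-Newton least-squares solve over all satellites in view, and not the combination-testing procedure described above. All function names are hypothetical, satellite coordinates are assumed to be Earth-centred Cartesian metres, and atmospheric and other corrections are ignored.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def solve_position(sat_positions, pseudoranges, iterations=10):
    """Estimate [x, y, z, clock_bias] (all in metres) from pseudoranges.

    Needs at least 4 satellites; with more, the extra measurements are
    absorbed by the least-squares fit.
    """
    est = [0.0, 0.0, 0.0, 0.0]  # start at the Earth's centre, zero bias
    for _ in range(iterations):
        ata = [[0.0] * 4 for _ in range(4)]
        atr = [0.0] * 4
        for (sx, sy, sz), rho in zip(sat_positions, pseudoranges):
            dx, dy, dz = est[0] - sx, est[1] - sy, est[2] - sz
            rng = math.sqrt(dx * dx + dy * dy + dz * dz)
            # Partial derivatives of the predicted pseudorange.
            row = [dx / rng, dy / rng, dz / rng, 1.0]
            resid = rho - (rng + est[3])  # observed minus predicted
            for i in range(4):
                atr[i] += row[i] * resid
                for j in range(4):
                    ata[i][j] += row[i] * row[j]
        # Normal equations give the correction to the current estimate.
        delta = solve_linear(ata, atr)
        est = [e + d for e, d in zip(est, delta)]
    return est
```

The fourth unknown is the receiver clock offset expressed as a distance, which is why 4 satellites are the minimum for a 3D fix.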
Code-phase, however, carries a larger error budget, which contributes to the overall error of autonomous positioning. In general, to get good 3D positioning, one would want one satellite directly overhead and 3 others below the horizon, in a constellation similar to the structure of a statically depicted methane molecule: all 4 hydrogen bonds are at ~109.5 deg to each adjacent bond, with the lower 3 spaced 120 deg apart in azimuth. Since GPS signals don't traverse dirt too well, that's impractical for surface-based receivers.
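The effect of geometry can be put in numbers with dilution-of-precision (DOP) factors, which scale the ranging error into position error. The sketch below compares the ideal tetrahedral constellation with one a surface receiver could actually see. The function names and the 15 deg elevation chosen for the visible case are assumptions for illustration only.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dops(sky):
    """Return (HDOP, VDOP, GDOP) for a list of (azimuth, elevation) in degrees."""
    rows = []
    for az_deg, el_deg in sky:
        az, el = math.radians(az_deg), math.radians(el_deg)
        rows.append([math.cos(el) * math.sin(az),  # east component
                     math.cos(el) * math.cos(az),  # north component
                     math.sin(el),                 # up component
                     1.0])                         # receiver clock term
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(4)]
              for i in range(4)]
    q = invert(normal)
    hdop = math.sqrt(q[0][0] + q[1][1])
    vdop = math.sqrt(q[2][2])
    gdop = math.sqrt(sum(q[i][i] for i in range(4)))
    return hdop, vdop, gdop


# Ideal case: one satellite overhead, three about 19.5 deg BELOW the horizon.
low = math.degrees(math.asin(-1.0 / 3.0))
tetrahedral = [(0.0, 90.0), (0.0, low), (120.0, low), (240.0, low)]

# Assumed realistic case: the lower three at 15 deg ABOVE the horizon.
visible = [(0.0, 90.0), (0.0, 15.0), (120.0, 15.0), (240.0, 15.0)]

print("tetrahedral HDOP, VDOP, GDOP:", dops(tetrahedral))
print("visible     HDOP, VDOP, GDOP:", dops(visible))
```

For these two constellations the horizontal factor barely changes, while the vertical factor roughly doubles once all satellites are forced above the horizon.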
Since most satellites used by conventional consumer-grade hardware are selected for being somewhat above the horizon, the accuracy of vertical determination suffers.
We've consistently seen horizontal accuracies of 6 m or better ever since Selective Availability was switched off in 2000. However, vertical accuracies of 10-20 m are not uncommon, because of A) the constellation geometry cited above, which is poorly suited to vertical determination, and B) the larger error budget of code-phase solutions.
Using geodetic receivers, a good network for adjustment, long-period (4-8 hours) data acquisition, data decimation to remove autocorrelation effects, and careful postprocessing to achieve good solutions to submit to a least-squares adjustment, you can readily achieve 1 cm horizontal and 3 cm vertical accuracies. But you'll never do that with anything from the consumer product line, unfortunately.
Wikipedia entry on WGS84, containing useful information.
Story of Seven Level Headed Scientists -- a humorous take on all this.
Modelling GPS Vertical Accuracy -- some actual experiments.