I spoke to one of the Spatial Product Managers to get this response:
Using spatial queries is definitely the better option, especially for long/lat coordinates, and is the only way if you want to search for more than points in a rectangle. For that however, we need actual geometry objects (in SDO_GEOMETRY). I am starting with this table:
create table t1 as
select
rownum id,
rpad(rownum,10) data,
dbms_random.value(-90,90) lat,
dbms_random.value(-180,180) lng
from
(select 1 from dual connect by level <= 1000),
(select 1 from dual connect by level <= 1000);
Then I expand that table with a geometry column:
alter table t1 add location sdo_geometry;
And I populate the geometries:
update t1 set location = sdo_geometry(2001,4326,sdo_point_type(lng,lat,null),null,null);
commit;
Notice the number 4326. This number is a spatial reference system ID (or SRID). It says that the coordinates are not just any random numbers, but that they are longitude and latitude in GPS coordinates (also known as WGS84 coordinates). This is extremely important. If we do not tell the database what coordinate system we are using, then we will not be able to to meaningful comparisons or measurements. This is especially true for spherical coordinates (long/lat). If we do not say that they are spherical, then any measurement (area, length, distances, ...) will be totally wrong. Also we will not be able to correlate them with other geographic shapes, in other coordinate systems.
Coordinate systems for spatial data are like time zones for time. If I call for a tele conference at 2PM and send that to my people in Moscow, Berlin, Tokyo and New York, all of them will dial-in at 2PM - and find no-one else: they dialed at 2PM in their time zone. I need to specify 2PM CET for example. Same for spatial coordinates: if I do not specify in which coordinate system they are expressed, I will be unable to compare them.
Coordinate systems come in two flavors. Those that are represented on the spherical representation of the globe (called geodetic systems). And where you may think there is only one universal long/lat system, there are actually hundreds of different long/lats, each one using a slightly different representation of the earth, adapted to local conditions. But when nothing is specified, then in 99.9% of the case, it means the long/lat coordinates are in the WGS84 system, the one used by the GPS (and the other global positioning system). Incidentally: the earth is NOT a sphere. It is really slightly flattened at the poles (due to its rotation) … Also there used to be some systems using a different meridian for the origin (other than Greenwich): some old french systems used the Paris meridian (obviously).
The other kind of coordinate systems are projections: the coordinates are meters from some origin. There are literally thousands of those (the oracle database has some 5000 of them). Each country has several because projections lose accuracy on large areas. Each US state has several projections. There are also old and new projections - Oracle provides functions to convert from one coordinate system to the other.
To use the spatial types, you need to define metadata:
insert into user_sdo_geom_metadata (table_name,column_name,diminfo,srid)
values (
'T1' ,
'LOCATION',
sdo_dim_array(
sdo_dim_element('Longitude',-180,180,0.05),
sdo_dim_element('Latgitude',-90,90,0.05)
),
4326
);
commit;
Where:
- T1 is the table name and LOCATION the name of the SDO_GEOMETRY column.
- The DIMINFO structure describes the bounds of each dimension. For long/lat this is always -180 to 180 and -90 to 90.
- The value 0.05 in this structure (the same for each dimension) represents the tolerance of the spatial data in that table. It is expressed in meters and represents the minimum distance between two points for being considered distinct. It is not very important when dealing with customer locations (since those are very approximate), but it is important when it comes to determine if two adjacent land parcels are really adjacent. Here 0.05 means 5cm. It also means that if two parcels are separated
create index t1_sx on t1 (location) indextype is mdsys.spatial_index;
That is actually a resource-consuming operation, quite CPU-intensive, more so for long/lat coordinates.
Now we can start doing some queries. To do some of them, we use a table that contains the shape of all countries worldwide. You can find it along with other tables via
SPATIAL_COUNTRIES and importing via traditional import. The imported tables are:
TABLE_NAME NUM_ROWS
------------------------------ ----------
US_CITIES 195
US_COUNTIES 3230
US_INTERSTATES 239
US_PARKS 6371
US_RASTERS 1
US_RIVERS 56
US_STATES 56
WORLD_CONTINENTS 111
WORLD_COUNTRIES 251
31 rows selected.
And then we can do some sample queries:
1) The original query:
select t1.id
from t1
where sdo_inside (
location,
sdo_geometry(
2003,4326,null,
sdo_elem_info_array(1,1003,3),
sdo_ordinate_array (
-75.78729387674332, 39.188920190130624,
-72.017976123256688, 42.091677809869374
)
)
) = 'TRUE';
2) Find all points within 20km from a given point:
select b.id
from t1 a, t1 b
where a.id = 42
and sdo_within_distance (b.location, a.location, 'distance=20 unit=km') = 'TRUE';
ID
----------
29373
42
954025
922187
4 rows selected.
3) How far are they ?
select b.id, sdo_geom.sdo_distance(a.location, b.location, 0.05, 'unit=m') distance
from t1 a, t1 b
where a.id = 42
and sdo_within_distance (b.location, a.location, 'distance=20 unit=km') = 'TRUE'
order by distance;
ID DISTANCE
---------- ----------
42 0
29373 10881.377
954025 12654.2977
922187 17152.4322
4 rows selected.
Note that point 42 is the closest - at a distance of 0 meters.
4) How many points are in Germany ?
select count(*)
from t1 p, world_countries w
where sdo_inside (p.location, w.geometry) = 'TRUE'
and w.cntry_name = 'Germany';
COUNT(*)
----------
716
1 row selected.
5) How many points in each country ?
select w.cntry_name, count(*)
from t1 p, world_countries w
where sdo_inside (p.location, w.geometry) = 'TRUE'
group by w.cntry_name
order by w.cntry_name;
CNTRY_NAME COUNT(*)
---------------------------------------- ----------
Afghanistan 970
Albania 51
Algeria 3315
Andorra 1
Angola 1572
Antarctica 81817
Antigua and Barbuda 1
...
Western Samoa 1
Yemen 533
Zaire 2901
Zambia 1023
Zimbabwe 499
203 rows selected.
6) Countries with the largest number of points
select w.cntry_name, count(*) npoints
from t1 p, world_countries w
where sdo_inside (p.location, w.geometry) = 'TRUE'
group by w.cntry_name
order by npoints desc
fetch first 5 rows only;
CNTRY_NAME NPOINTS
---------------------------------------- ----------
Antarctica 81817
Russia 45255
Canada 26235
United States 17071
China 14642
5 rows selected.
7) Points not in any country
select count(*)
from t1
where id not in (
select p.id
from t1 p, world_countries w
where sdo_inside (p.location, w.geometry) = 'TRUE'
);
COUNT(*)
----------
680527
1 row selected.
8) Points within a radius of 100km around Denver
select count(*)
from t1, us_cities c
where c.city = 'Denver'
and sdo_within_distance (t1.location, c.location, 'distance=100 unit=km') = 'TRUE';
COUNT(*)
----------
51
1 row selected.