2.2 ORACLE 11g

1

INTRODUCTION A Benchmark is the act of running a computer program, or a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term benchmark is also mostly utilized for the purposes of elaborately-designed benchmarking programs themselves. There are primarily two kinds of benchmarking. One is the Functionality Benchmarking to test all the functionalities supported by the system and the other is the Performance Benchmarking which test the speed of the system. Without Benchmarks, making detailed comparisons between different systems concerning their functionality or performance is generally difficult. ;i=1;} { if(i==1) { str="cd " $1 ";"; str=str "/usr/local/pgsql/bin/shp2pgsql -s 4326 -I " $1 " tiger.feature |"; str=str "grep -v \"\\[\" | "; str=str "/usr/local/pgsql/bin/psql -d test;" "cd ..;"; } else { str="cd " $1 ";"; str=str "/usr/local/pgsql/bin/shp2pgsql -a -s 4326 " $1 " tiger.feature |"; str=str "grep -v \"\\[\" | "; str=str "/usr/local/pgsql/bin/psql -d test;" "cd ..;"; } print str>>"Generate ld.sh"; i++;

7 }' ; cp “copy ld.sh to the required folder” This program generates a new script file called ld.sh for each feature and copies it to the directory where that feature lies. The reason for not using a single script for loading is that if we run ld.sh in different terminals for each new feature, then multiple features can be loaded into the ( java -cp %class_path% oracle.spatial.util.SampleShapefileToJGeomFeature -h localhost -p 1521 -s spatial -u subham -d subham -t /*Feature_Name*/ -f %%G\%%G -i gid -r 4326 -g geom set var=1 ) ELSE ( java -cp %class_path% oracle.spatial.util.SampleShapefileToJGeomFeature -a -h localhost -p 1521 -s spatial -u subham -d subham -t /*Feature_Name*/ -f %%G\%%G -i gid -r 4326 -g geom ) ) To facilitate parallel ; char p2[]=" a,tiger.states b where b.statefp10=" ; char p3[]=" and ST_Intersects(a.the_geom,b.the_geom);\n"; char o1[]="select distinct a.fullname from "; char o2[]=" a,states b where b.statefp10="; char o3[]=" and sdo_anyinteract(a.geom,b.geom)='TRUE';\n"; int bs[]={0,0,0,0,0,0,0,0}; int bt[]={0,0,0,0,0}; int globc=0,cs=0,ct=0,rt,rs; int flag1=0,flag2=0; int s,t,i; char tp[4000],to[4000]; fp=fopen("postgres.sql","w"); fo=fopen("oracle.sql","w"); srand(time(NULL)); while(1) { t=rand()%4; if(bt[t]) { t=(t+1)%4; while(bt[t]==1) t=(t+1)%4; } bt[t]=1; strcpy(tp,p1); strcat(tp,tables[t]);

33 strcat(tp,p2); strcpy(to,o1); strcat(to,tables[t]); strcat(to,o2); ct++; if(ct==4) { for(i=0;i Seq Scan on states b (cost=0.00..3.30 rows=2 width=1065526) (actual time=0.020..0.050 rows=2 loops=1) Filter: ((statefp10)::text = '36'::text) Buffers: shared hit=2 -> Index Scan using areawater_the_geom_gist on areawater a (cost=0.00..16.43 rows=3 width=3288) (actual time=0.068..12.497 rows=2384 loops=2) Index Cond: (a.the_geom && b.the_geom) Buffers: shared hit=2111 read=9 Total runtime: 2948.711 ms Here Postgres takes only 3 secs whereas Oracle 11g takes about 1545/60=26 mins! to execute the same query. Hit Ratio for Postgres is 117295/(117295+67)=0.99 and in Oracle 11g it is (1-(17/(537+4)))=0.97

38

QUERY 2 Oracle 11g:

select distinct a.fullname from linearwater a,states b where b.statefp10='53' and sdo_anyinteract(a.geom, b.geom)='TRUE' call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------------Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 284 246.15 283.32 9315 194261 4 4245 ------- ------ -------- ---------- ---------- ---------- ---------- ----------------total 286 246.15 283.32 9315 194261 4 4245 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 92 (SUBHAM) Rows Row Source Operation ------- --------------------------------------------------4245 HASH UNIQUE (cr=202941 pr=11311 pw=0 time=1929 us cost=3385 size=15018752 card=110432) 196428 NESTED LOOPS (cr=202941 pr=11311 pw=0 time=47318560 us cost=3 size=15478704 card=113814) 2 TABLE ACCESS FULL STATES (cr=7 pr=0 pw=0 time=35 us cost=3 size=188 card=2) 196428 TABLE ACCESS BY INDEX ROWID LINEARWATER (cr=202934 pr=11311 pw=0 time=47212960 us cost=3 size=2390094 card=56907) 196428 DOMAIN INDEX LINEARWATER_IDX (cr=13730 pr=2022 pw=0 time=160619 us cost=0 size=0 card=0) Rows Execution Plan ------- --------------------------------------------------0 SELECT STATEMENT MODE: ALL_ROWS 4245 HASH (UNIQUE) 196428 NESTED LOOPS 2 TABLE ACCESS MODE: ANALYZED (FULL) OF 'STATES' (TABLE) 196428 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'LINEARWATER' (TABLE) 196428 DOMAIN INDEX OF 'LINEARWATER_IDX' (INDEX (DOMAIN))

Postgres: explain (analyze,buffers) select distinct a.fullname from tiger.linearwater a, tiger.states b where b.statefp10='53' and ST_Intersects(a.the_geom,b.the_geom);

39 QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------HashAggregate (cost=436.67..467.87 rows=3120 width=12) (actual time=97567.370..97570.036 rows=4243 loops=1) Buffers: shared hit=3796624 read=46038 -> Nested Loop (cost=0.00..254.28 rows=72958 width=12) (actual time=49.342..97008.239 rows=196368 loops=1) Join Filter: _st_intersects(a.the_geom, b.the_geom) Buffers: shared hit=3786530 read=45712 -> Seq Scan on states b (cost=0.00..3.30 rows=2 width=1065526) (actual time=0.036..0.072 rows=2 loops=1) Filter: ((statefp10)::text = '53'::text) Buffers: shared hit=1 read=1 -> Index Scan using linearwater_the_geom_gist on linearwater a (cost=0.00..118.14 rows=28 width=3380) (actual time=19.515..7159.395 rows=105391 loo ps=2) Index Cond: (a.the_geom && b.the_geom) Buffers: shared hit=92110 read=42174 Total runtime: 97632.240 ms Here Postgres took about 98 secs whereas Oracle 11g took 246 secs to process the same query. Hit Ratio in Postgres is 3796624/(3796624+46038)=0.98 whereas in Oracle 11g it is (1-(9315/(194261+ 4)))=0.95

QUERY 3 Oracle 11g: select distinct a.fullname from pointlm a,states b where b.statefp10='48' and sdo_anyinteract(a.geom,b.geom)='TRUE' call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- -- ---------------Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 751 245.60 246.47 482 33731 4 11236 ------- ------ -------- ---------- ---------- ---------- ----------------------------total 753 245.60 246.47 482 33731 4 11236 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 92 (SUBHAM)

40 Rows Row Source Operation ------- --------------------------------------------------11236 HASH UNIQUE (cr=35283 pr=889 pw=0 time=3874 us cost=490 size=2187648 card=13504) 59168 NESTED LOOPS (cr=35283 pr=889 pw=0 time=203300368 us cost=3 size=2187648 card=13504) 2 TABLE ACCESS FULL STATES (cr=7 pr=0 pw=0 time=25 us cost=3 size=188 card=2) 59168 TABLE ACCESS BY INDEX ROWID POINTLM (cr=35276 pr=889 pw=0 time=202847184 us cost=3 size=459136 card=6752) 59168 DOMAIN INDEX POINTLM_IDX (cr=7290 pr=603 pw=0 time=200291952 us cost=0 size=0 card=0) Rows Execution Plan ------- --------------------------------------------------0 SELECT STATEMENT MODE: ALL_ROWS 11236 HASH (UNIQUE) 59168 NESTED LOOPS 2 TABLE ACCESS MODE: ANALYZED (FULL) OF 'STATES' (TABLE) 59168 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'POINTLM' (TABLE) 59168 DOMAIN INDEX OF 'POINTLM_IDX' (INDEX (DOMAIN))

Postgres: explain (analyze,buffers) select distinct a.fullname from tiger.pointlm a,tiger.states b where b.statefp10='48' and ST_Intersects(a.the_geom,b.the_geom); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------HashAggregate (cost=59.44..146.01 rows=8657 width=14) (actual time=370445.935..370452.941 rows=11236 loops=1) Buffers: shared hit=12062959 read=1309 -> Nested Loop (cost=0.00..37.80 rows=8657 width=14) (actual time=535.327..370094.733 rows=59168 loops=1) Join Filter: _st_intersects(a.the_geom, b.the_geom) Buffers: shared hit=12062959 read=1309 -> Seq Scan on states b (cost=0.00..3.30 rows=2 width=1065526) (actual time=0.042..0.091 rows=2 loops=1) Filter: ((statefp10)::text = '48'::text) Buffers: shared read=2 -> Index Scan using pointlm_the_geom_gist on pointlm a (cost=0.00..16.46 rows=3 width=114) (actual time=14.479..857.048 rows=49394 loops=2) Index Cond: (a.the_geom && b.the_geom) Buffers: shared hit=10820 read=1066 Total runtime: 370459.456 ms

41 Here Postgres takes 370/60=6 mins whereas surprisingly here Oracle 11g takes only 246/60=4 mins to execute the same query. Hit Ratio in Postgres is 12062959 /(12062959+1309)=0.99 and in Oracle 11g it is (1 – 482/(33731+4))=0.99

QUERY 4 Oracle 11g: select distinct a.fullname from arealm a,states b where b.statefp10='06' and sdo_anyinteract(a.geom,b.geom)='TRUE' call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- -------------------------Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 368 126.60 127.35 766 15705 4 5501 ------- ------ -------- ---------- ---------- ---------- -------------------------total 370 126.60 127.35 766 15705 4 5501 Misses in library cache during parse: 0 Optimizer mode: ALL_ROWS Parsing user id: 92 (SUBHAM) Rows Row Source Operation ------- --------------------------------------------------5501 HASH UNIQUE (cr=18013 pr=1056 pw=0 time=1571 us cost=92 size=390897 card=2313) 15910 NESTED LOOPS (cr=18013 pr=1056 pw=0 time=3170854 us cost=3 size=390897 card=2313) 2 TABLE ACCESS FULL STATES (cr=7 pr=0 pw=0 time=43 us cost=3 size=188 card=2) 15910 TABLE ACCESS BY INDEX ROWID AREALM (cr=18006 pr=1056 pw=0 time=3140946 us cost=3 size=86700 card=1156) 15910 DOMAIN INDEX AREALM_IDX (cr=3208 pr=330 pw=0 time=16413 us cost=0 size=0 card=0) Rows Execution Plan ------- --------------------------------------------------0 SELECT STATEMENT MODE: ALL_ROWS 5501 HASH (UNIQUE) 15910 NESTED LOOPS 2 TABLE ACCESS MODE: ANALYZED (FULL) OF 'STATES' (TABLE) 15910 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'AREALM' (TABLE) 15910 DOMAIN INDEX OF 'AREALM_IDX' (INDEX (DOMAIN))

42

Postgres: explain (analyze,buffers) select distinct a.fullname from tiger.arealm a,tiger.states b where b.statefp10='06' and ST_Intersects(a.the_geom,b.the_geom); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------HashAggregate (cost=24.12..38.95 rows=1483 width=18) (actual time=64477.008..64480.448 rows=5501 loops=1) Buffers: shared hit=432812 read=1398 -> Nested Loop (cost=0.00..20.42 rows=1483 width=18) (actual time=52.056..64350.055 rows=15910 loops=1) Join Filter: _st_intersects(a.the_geom, b.the_geom) Buffers: shared hit=427131 read=1349 -> Seq Scan on states b (cost=0.00..3.30 rows=2 width=1065526) (actual time=0.013..0.053 rows=2 loops=1) Filter: ((statefp10)::text = '06'::text) Buffers: shared hit=2 -> Index Scan using arealm_the_geom_gist on arealm a (cost=0.00..8.30 rows=1 width=3941) (actual time=9.024..74.826 rows=8237 loops=2) Index Cond: (a.the_geom && b.the_geom) Buffers: shared hit=5842 read=870 Total runtime: 64495.677 ms Here Postgres takes only 65 secs whereas Oracle 11g takes about 127 secs to process the same query, Hit Ratio in Postgres is 432812/(432812+1398)=0.99 and in Oracle 11g it is (1-766/(15705+4))=0.95

43

5. CONCLUSION & SUMMARY From the experimental results that we saw, we can conclude that Postgres performs better than Oracle 11g both in the Cold Phase and Warm Phase. Though in few queries Oracle 11g performed better but on the whole Postgres overpowered Oracle 11g. In the warm phase in 3 out of 4 queries Postgres performed significantly well, from this we can conclude that Postgres has better automatic memory management capabilities and page replacement policies. May be Oracle 11g needs more tuning to perform better. And also in the Cold Phase, Postgres performs significantly well except in few cases such as in the Adjacency Operation (TOUCH). Since Postgres uses the underlying GEOS(Geometry Engine - Open Source) library functions for implementing the geometric operations whereas Oracle 11g implements them on its own, and since in majority Postgres performs well, we can conclude that GEOS geometric algorithms are more efficiently designed than Oracle 11g. And also Postgres planner is more efficiently designed to take advantage of any available indexes to use in queries for achieving better performance whereas in Oracle 11g we saw that we have to specify them explicitly through functions. On the whole it is the open-source that wins the game! The next section briefly summarizes the queries and gives the bar graphs for the two phases separately.

5.1 COLD PHASE QUERIES 5.1.1 Contains Properly: a) Polygon/Point Query 1: Retrieve all the Hospitals located inside the state District_of_Columbia b) Polygon/Line Query 2: List all the linear water bodies in the state District_of_Columbia c) Polygon/Polygon Arealm/States

Query 3: Retrieve all Shopping Center located in the state California Areawater/States

Query 4: Retrieve all the Lakes/Ponds in the state District_of_Columbia

5.1.2 Adjacent Operation: a) Polygon/Polygon Query 5: Retrieve all the states which are adjacent to Nebraska b) Polygon/Line Query 6: Retrieve all polygons from arealm which are adjacent to Yukon River c) Line/Line Query 7: Retrieve all the tributaries of Yukon River

44

5.1.3 Intersect Operation: a) Polygon/Line Small Query 8: Retrieve all the interstate connector roads in the state California b) Polygon/Line Large Query 9: Retrieve all roads which go through forest Since the values range from 0.32 secs to 687 secs, so instead of bar graphs we have given Percent Bar Graph showing the percentage of time or the ratio of time taken by the two databases:

COLD PHASE 100%

90%

80%

PERCENTAGE OF TIME

70%

60%

50%

Oracle 11g Postgres

40%

30%

20%

10%

0% Query 1 Query 2 Query 3 Query 4 Query 5 Query 6 Query 7 Query 8 Query 9

QUERIES

45

5.2 WARM PHASE QUERIES a) Polygon/Polygon Large Query 1: Retrieve all the areawater bodies which intersect with the state New York having stateid 36 b) Polygon/Line Query 2: Retrieve all the linearwater features which intersect with the state Washington having stateid 53 c) Polygon/Point Query 3: Retrieve all the point features which intersect with the state Texas having stateid 48 a) Polygon/Polygon Query 4: Retrieve all the area landmarks which intersect with the state California having stateid 06

WARM PHASE 100% 90%

PERCENTAGE OF TIME

80% 70% 60% 50%

Oracle 11g Postgres

40% 30% 20% 10% 0% Query 1

Query 2

QUERIES

Query 3

Query 4

46

REFERENCES [1] Norman W. Paton, M. Howard Williams, Kosmas Dietrich, Olive Liew, Andrew Dinn, Alan Patrick, “VESPA: A benchmark for vector spatial databases,” Lecture Notes in Computer Science, Vol. 1832/2000, pp.81-101, 2001. [2] Stonebraker M., Frew J., Gardels K., and Meredith J.1993. The SEQUOIA 2000 Storage Benchmark, Proc. ACM SIGMOD Int. Conf. on Management of Data, Washington, DC, 1993, pp. 2-11. [3] PostGIS 1.5.2SVN Manual [4] Oracle® Spatial Developer's Guide 11g Release 1 (11.1) [5] Oracle® Database Concepts 11g Release 1 (11.1) [6] Oracle® Database Performance Tuning Guide 11g Release 2 (11.2) [7] 2010 TIGER/Line Shapefiles Technical Documentation. [8] PostgreSQL 9.0.4 Documentation: Appendix F. Additional Supplied Modules: F.24. pg_buffercache [9] PostgreSQL 9.0.4 Documentation: Chapter 18. Server Configuration [10] PostgreSQL 9.0.4 Documentation: EXPLAIN [11] PostgreSQL 9.0.4 Documentation: ANALYZE [12] Chapter 4. Using PostGIS: Data Management and Queries [13] EXPLAINING EXPLAIN EnterpriseDB [14] Inside the PostgreSQL Shared Buffer Cache by Greg Smith [15] http://pgfoundry.org/projects/pgfincore [16] http://www.adp-gmbh.ch/ora/concepts/sga.html [17] http://wiki.postgresql.org/wiki/VACUUM_FULL [18] http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT