Brian -- Thanks for the question regarding "Partitioning question", version 8.1.7
Submitted on 12-Oct-2001 13:16 Central time zone
Last updated 8-Oct-2011 6:53
You Asked
Let's say we have a table of documents with a key of doc_id. This table contains millions of documents (rows). There is also a table that assigns one or more industry codes to each doc:

    create table doc_ind
    ( doc_id   integer,
      ind_code varchar2(10)
    )

This table will therefore also contain millions of rows. The industry code is part of the default set of industry codes used by our business.

Now let's say that we sign deals with clients that want to use their own industry classifications. I don't want to create new tables for each client (there could eventually be hundreds of clients and therefore hundreds of new tables) with some scheme where logic is built into the application to derive the correct table name to query based on the client. My first thought is to create a table like:

    create table doc_client_ind
    ( client_id varchar2(10),
      doc_id    integer,
      ind_code  varchar2(10)
    )

The client id could be any value -- usually resembling the client name. Because each new client would add millions of rows to this table, I want to use partitioning. After reading up on partitioning in your book, my situation does not seem to fit perfectly into the standard scenarios. At first, it seemed I would need to use hash partitioning. However, the number of distinct client_id values in this table will be very small to start with and will grow over time. It is not clear how I would easily manage hash partitioning well over time. In order to use range partitioning, it seems to me that client_ids would need to be assigned values that have a natural ascending sort order (i.e., c0001, c0002, etc.). Then I could have one partition per client and create them on an ongoing, as-needed basis.

I have four questions:

1. Are there other options (with or without partitioning) I don't see to manage this?
2. Is there a way to manage this effectively using the original, essentially random client id values?
3. What do you suggest, given the limited info above?
4. Given your suggestion in question 3, what disadvantages and caveats do I need to be aware of?

Thanks - Brian
and we said...
Well, in 9i there is LIST partitioning as well; that may be something to consider. Here each client_id could be targeted to a specific partition.

If you just want to achieve a nice distribution of data across many disks/partitions, you could hash right now on DOC_ID. There will be millions of unique values for that from the get go -- that will hash very nicely. Since you'll always join on doc_id and client_id, partition elimination will take place just as effectively.

So, either look at 9i with list partitioning if you feel you would gain some advantage by partitioning by client_id (one advantage would be that if you drop a client, getting rid of their data is trivial -- drop the partition). Or, hash partition on doc_id to achieve a good overall spread of data.
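To make the two options concrete, here is a minimal sketch of each (the partition names and client_id values such as 'ACME' are hypothetical; list partitioning requires 9i or later):

```sql
-- Option 1 (9i): list partition by client_id -- one partition per client.
create table doc_client_ind
( client_id varchar2(10),
  doc_id    integer,
  ind_code  varchar2(10)
)
partition by list (client_id)
( partition p_acme   values ('ACME'),
  partition p_widget values ('WIDGETCO')
);

-- Adding a new client later is a single DDL statement:
alter table doc_client_ind add partition p_newco values ('NEWCO');

-- Dropping a client's data is equally trivial:
alter table doc_client_ind drop partition p_acme;

-- Option 2 (works in 8i as well): hash partition on doc_id
-- for an even spread of data across partitions.
create table doc_client_ind2
( client_id varchar2(10),
  doc_id    integer,
  ind_code  varchar2(10)
)
partition by hash (doc_id) partitions 8;
```

With option 2, queries that supply doc_id in the predicate still get partition elimination even though client_id is not the partition key.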
I wanted to know if client_id was a small domain, something manageable. If not, they would not be able to range partition by it, as it would be an ongoing job just to split and partition the table. With 50 values, they could do it. With 500,000, they would have a hard time of it (but hash partitioning would start to come into play). Values in between, with ranges/hashes, become impractical at some point.

Index key compression is a feature whereby repeating fields on the leading edge of the index are removed from the leaf block row entries, compressing the data. If you create an index on (owner, object_type, object_name) on a copy of ALL_OBJECTS with COMPRESS 2, we would store the owner and object type ONCE per leaf block, rather than once per ROW per leaf block -- leading to significant savings in space.
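A short sketch of the index key compression example just described (the table and index names are hypothetical):

```sql
-- Build a copy of ALL_OBJECTS to index.
create table t as select * from all_objects;

-- COMPRESS 2: the first two key columns (owner, object_type) are
-- stored once per leaf block instead of once per row, since many
-- rows share the same owner/object_type prefix.
create index t_idx
    on t (owner, object_type, object_name)
    compress 2;
```

Comparing LEAF_BLOCKS in USER_INDEXES for a compressed versus uncompressed build of this index would show the space savings directly.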
A couple of questions for you:

1. What if the column I'm partitioning on contains NULL entries? Is this allowed? How will Oracle know where to place the data?

2. On page 620 of the Expert One-on-One book, an error ORA-01502 was raised because the local nonprefixed index was unusable. Why? Are queries on partitioned tables (PTs) index dependent?

3. Say I have a table:

    create table detail
    ( cust_id      number,
      cust_balance number,
      -- some other columns...
    )

and cust_id has 65 distinct values over 7 million records. Will PARTITION BY HASH (cust_id) be of any use, assuming that the front-end apps issue a dynamic query that will not use any indexed columns in any of its predicates? One advantage I see in using PTs is that data will be forced to group by partition.
Followup June 27, 2003 - 9am Central time zone:
1) Yes, allowed. It'll put them where it "wants".

ops$tkyte@ORA920> create table t
  2  (
  3    object_id   number,
  4    object_type varchar2(18)
  5  )
  6  partition by hash(object_id) partitions 8
  7  /

Table created.

ops$tkyte@ORA920> insert into t select object_id, object_type from all_objects;

31589 rows created.

ops$tkyte@ORA920> insert into t select NULL, 'xxx' from all_users;

47 rows created.

ops$tkyte@ORA920> declare
  2      l_cnt number;
  3  begin
  4      for x in ( select partition_name
  5                   from user_tab_partitions
  6                  where table_name = 'T' )
  7      loop
  8          execute immediate
  9              'select count(*)
 10                 from t partition(' || x.partition_name || ')'
 11              into l_cnt;
 12          dbms_output.put_line
 13              ( x.partition_name || ' ' || l_cnt || ' rows...' );
 14
 15          execute immediate
 16              'select count(*)
 17                 from t partition(' || x.partition_name || ')
 18                where object_id is null'
 19              into l_cnt;
 20          dbms_output.put_line
 21              ( x.partition_name || ' ' || l_cnt || ' NULL rows...' );
 22      end loop;
 23  end;
 24  /
SYS_P347 3965 rows...
SYS_P347 47 NULL rows...
SYS_P348 3948 rows...
SYS_P348 0 NULL rows...
SYS_P349 4042 rows...
SYS_P349 0 NULL rows...
SYS_P350 3930 rows...
SYS_P350 0 NULL rows...
SYS_P351 3868 rows...
SYS_P351 0 NULL rows...
SYS_P352 3895 rows...
SYS_P352 0 NULL rows...
SYS_P353 4072 rows...
SYS_P353 0 NULL rows...
SYS_P354 3916 rows...
SYS_P354 0 NULL rows...

PL/SQL procedure successfully completed.

For example -- they all went into the "first" partition in that case.

2) Page 640, maybe? It was raised because (a) the index was unusable and (b) the query plan said "I need to read that index". Queries against partitioned tables are no more or less "index dependent" than queries against unpartitioned tables. The same thing would happen against a "normal" table if its indexes were unusable and the query plan said "I want to use that index".

3) 65 is too small in all likelihood, unless you are really lucky.
It would be far too easy to have this happen:

ops$tkyte@ORA920> select count(distinct object_id) from t;

COUNT(DISTINCTOBJECT_ID)
------------------------
                      65

ops$tkyte@ORA920> declare
  2      l_cnt number;
  3  begin
  4      for x in ( select partition_name
  5                   from user_tab_partitions
  6                  where table_name = 'T' )
  7      loop
  8          execute immediate
  9              'select count(*)
 10                 from t partition(' || x.partition_name || ')'
 11              into l_cnt;
 12          dbms_output.put_line
 13              ( x.partition_name || ' ' || l_cnt || ' rows...' );
 14      end loop;
 15  end;
 16  /
SYS_P395 65 rows...
SYS_P396 0 rows...
SYS_P397 0 rows...
SYS_P398 0 rows...
SYS_P399 0 rows...
SYS_P400 0 rows...
SYS_P401 0 rows...
SYS_P402 0 rows...

PL/SQL procedure successfully completed.

All 7 million rows could be in the same hash partition. You would use range or list instead.
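For a small, known domain like 65 distinct cust_id values, a list-partitioned layout guarantees the grouping that hash cannot. A minimal sketch (the partition names and cust_id groupings here are hypothetical; the DEFAULT catch-all partition requires 9iR2 or later):

```sql
create table detail
( cust_id      number,
  cust_balance number
)
partition by list (cust_id)
( partition p_grp1 values (1, 2, 3, 4, 5),
  partition p_grp2 values (6, 7, 8, 9, 10),
  partition p_rest values (DEFAULT)  -- catch-all for any other cust_id
);
```

Here you decide exactly which cust_id values go into which partition, so the data distribution is predictable regardless of how the values hash.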