Damir Vadas, Oracle as I learned: 2009.10

Thursday, October 29, 2009

Is you PC really fast...

Just to make it permanently. These days one thing is really fascinating me all the time.

First information come from a colleague of mine, Goran Cvijanović, CEO of VINTEH. Topic is about nVidia graphic cards based servers with TeraFLOPS CPU power. Core idea is to use nVidia graphic cards as main CPU. If one GeForce graphic card has cca. 200 Intel CPUs it is easy to imagine how much CPU power has server represented with 4 graphic cards in parallel (like on picture left). Such a server has 16GB RAM (4x4Gb each module) and you mount on it Oracle database and ... all you need to say is "Jaba Daba Duuu....".

All that for cca 9.000€ in total price....looks like a beautiful dream.

Sounds great... more read on Finance people will demand Gaming Cards in their PCs and nVidia GPU Technology Conference site where you can download some live presentations.

For those who has no time to read the article, let me cite one paragraph:
Here at Jedox we run TESLA hardware with 4 parallel GPUs (4 graphic cards op.a.) and 16 GB RAM in one server and it still scales almost linear. So this makes 20 x 4 x 5 = 400. A query that took 40 seconds to calculate on a CPU will be done in 0,1 seconds with GPU. Theoretically of course. Results in practice will be seen on CeBIT 2010.

Saturday, October 24, 2009

NOLOGGING facts and impacts

Another well-documented feature is Oracle’s NOLOGGING capability. Oracle’s NOLOGGING feature provides the ability to suppress the generation of redo for a subset of operations: Direct loader (SQL*Loader), direct-path inserts resulting from INSERT or MERGE statements and some DDL commands. Here they are in more descriptive order:

sqlldr (in direct mode)
CTAS (CREATE TABLE AS ...)
ALTER TABLE statements (move/add/split/merge partitions)
CREATE INDEX
INSERT /*+APPEND*/ ...
ALTER INDEX statements (move/add/split/merge partitions)

While Data Warehousing applications usually take advantage of NOLOGGING operations, 24x7 OLTP databases can benefit as well but with some caution.

In particular, most 24x7 applications still require maintenance windows to perform varied database activities as fast as possible. However, each maintenance window usually incurs at least a small outage to the business. Expeditious and well-planned changes are a necessity. The DML and data dictionary language (DDL) NOLOGGING features can dramatically reduce the time an Oracle scientist needs to perform the maintenance and substantially reduce the redo generated. If meticulous planning and care are given to address the recoverability issues the Oracle experts should give serious consideration to the NOLOGGING feature.

The performance gain and redo reduction benefits of NOLOGGING operations come at a sacrifice to recoverability. However, this can be quite dangerous if you need to roll-forward through this time period during a database recovery.

For example, if you invoke a sequence of direct-path inserts the affected data cannot be recovered using Oracle’s media recovery mechanisms. Following any NOLOGGING operation the data file(s) impacted by the NOLOGGING operation need to be backed up to ensure that the changes persist through media failure.

When use NOLOGGING operations there is one simple rule: Backup before and after! You must take a backup, both before and after all NOLOGGING operations. Keep in mind as well that it is not possible to roll forward through a point in time when an NOLOGGING operation has taken place.

This means very simple. If your backup was taken in time when NOLOGGING action occurred, then this backup will suffer from logical corruption (soft kind of error), what lead directly to restore problem in the future with more or less impact on your data!

FORCE_LOGGING parameter
From my previous EBS experience, the whole Oracle E-Business Suite 11.x and 12.x is masivelly using GTT (Global Temporary Tables) whose DDL creation has NOLOGGING option. This is done, as mentioned before, because of performance gain, so be aware that most of your backups would probably have bad blocks if you will be using EBS instance installed out of the box with no parameter changed!

However in some situations, NOLOGGING is not acceptable behavior. Beside backup problem this is a conditio sinne qua non for Oracle Streams and Data Guard (standby databases) where this option is not allowed. For those situations Oracle has provided init.ora parameter FORCE_LOGGING, which ensure that ALL operations are redo logged on most top level regardless underlying SQL statement source.

To check current value of FORCE_LOGGING perform:

SQL> SELECT force_logging FROM v$database;
FORC
----
NO

To enable force logging perform:

SQL> ALTER DATABASE force logging;
Database altered.

To explain more widely complexity of NOLOGGING problem in E-Business Suite, follow Metaling note 216211.1 "Nologging In The E-Business Suite" which gives you a complete "How to" steps you must provide after complete database recover if EBS instance was backup ed with FORCE_LOGGING=NO parameter.

The most frighten sentence comes at the very start of mentioned Metalink note:
We have made every effort to identify where these steps are required, and will update this note when we identify new uses of nologging. It is really a nightmare for every DBA if his production database dependent on updates of some document, even that document come from Oracle itself.

Checking of RMAN backup
If during restore/recover process you get any of next errors:

ORA-1578: ORACLE data block corrupted (file # %s, block # %s)
ORA-26040: Data block was loaded using the NOLOGGING option

then your backup is corrupted!

Normally, RMAN checks only for physically corrupt blocks, with every backup it takes and every image copy it makes. This is a common misunderstanding among a lot of DBAs. RMAN doesn't automatically detect logical corruption by default! We have to tell it to do so, implicitly by using option CHECK LOGICAL. Here is an example for that:

rman target /
run {
BACKUP VALIDATE CHECK LOGICAL DATABASE;
}

Corruption information are recorded in the V$DATABASE_BLOCK_CORRUPTION view. Ensure that MAXCORRUPT parameter (sets the total number of physical and logical corruptions permitted in a file) is set to enough large value.

Checking backups with dbverify
The primary tool for checking for corruption in an Oracle database is DBVERIFY. It can be used to perform a physical data structure integrity check on data files whether the database is online or offline.

dbv file=data01.dbf logfile=verify.log blocksize=8192 feedback=100

Combining CHECK LOGICAL and dbverify utility prevents DBA from unexpected backup results.

For the end mother of all backup thoughts-Test it in live from time to time!

Friday, October 23, 2009

Oracle 11g XE at least a year away

It may be "a year or two" before Oracle releases a no-cost Express Edition (XE) of its 11g database, according to Andrew Mendelsohn, the company's senior vice president of database server technologies. That's because Oracle is going to wait until after the first patch set ships for Oracle 11gR2, which was launched in July, Mendelsohn said in a brief interview following a speech at Oracle's OpenWorld conference in San Francisco on Monday.

Some Oracle database administrators believe there is a deliberate reason for the protracted rollout. "It's an approach that ensures that adoption is nil," said Paul Vallée, founder of the Pythian Group, a database management outsourcing company in Ontario, Canada. "I don't think they're interested in adoption.... I think they have to have it out there just for maybe a check box, just to maybe say they have a free edition."

It's unclear how the arrival of Sun's MySQL will affect new XE, or any other aspect of Oracle's database strategy, Vallée said.

Oracle plans to increase investment in MySQL, CEO Larry Ellison said.

All future plans for Oracle database incoming releases can be retrieved through Metalink note 742060.1 "Release Schedule of Current Database Releases"

Thursday, October 22, 2009

HROUG 2009

From 13.10-17.10, 14th HROUG conference took place in Rovinj, Croatia.

HROUG what is abbreviation for Croatian Oracle Users Group, is annual meeting of Oracle users in Croatia.This meeting has it's purpose in several ways. People can see and hear some new stuff from Oracle technology, exchange their experiences, extend acquaintances and in all means see in short time Oracle Croatia market.

This year we have several special guests, like Denes Kubiček, first Croatian Oracle ACE, Francisco Alvarez Munoz, president of Chile's OUG community, and some others famous Oracle experts from around the world.
After every day's expert lessons, evening was fun for all ages. Atmosphere, as it is every year, through all 5 days, was really fantastic. This is big compliment to spiritus movens of HROUG, Mr. Davor Ranković (president of the board), Mr. Vladimir Radić (vice president of the board) and all others involved in organization who has done their maximum.

Regardless economy crise in Croatia, there were enough events for all of us and we promised that we'll met next year in a much bigger number on the jubilant 15th annual HROUG meeting.

Tuesday, October 20, 2009

Enterpise backup strategy leaks

Every company has some kind of backup strategy regardless underlying technology. In Oracle world most of them are based on some enterprise archiving solution (Legato, Veritas etc.) which in core represent a powerful and robust backup solution. But no one of them will ensure and create backup as you need.

Oracle is really unbeatable platform for backup data in all means. It's features are well famous but here I'd like to accent some important but commonly made mistakes.

Any backup strategy without DRS (Disaster Recovery Site), which is very often in many small and mid range companies) has one crucial question: How much of data can company afford to loose? Lost of data are always present in such a situations and this answer determine many next steps and procedures which are crucial in defining backup strategy of the company at the very start.

At the begging let us clear up term "data loss problem". In it's core it means how often data has to be backup ed to ensure that unexpected lost of ANY production data will not be over wondered range (measured in time).What ever software vendor promise (including Oracle) that data are safe is 99.99% true. Most of experienced users can very easy demonstrate some kind of data loss, regardless underlying technology! Other thing is: How much is that possible and how to quantify that risk.

In praxis in Oracle, preventing "loss of data" means how often archive logs have to be backup ed on some other location so in a case of some huge disaster on production data, DBA can perform recover which could never be above mentioned data loss policy. The most common solution is implemented in a way that archived logs are sent to some tape/disk storage on a time policy. In most cases 2 hours are very reasonable timings for many companies around the world. In our case, where example script will be based on Legato backup, here is how to make that concept:

connect target backup_admin/backup_admin_pass@PROD_DB
connect catalog rman/rman_pass@RMAN_DB
run {
allocate channel t1 type 'SBT_TAPE';
allocate channel t2 type 'SBT_TAPE';
send 'NSR_ENV=(NSR_SERVER=backup_server.com,NSR_CLIENT=production_server.com)';
# 0.0840278 = 121 minutes!
backup archivelog FROM TIME = 'sysdate-0.0840278' filesperset 50 format '/arch1_%d_%u/';
release channel t1;
release channel t2;
}

If we put that script in Legato schedule (special kind of crontab for Legato solution) with restart policy on 120 minutes, 24x7, DBA ensure that all archived logs will be backup ed every 2 hours.

Even experienced user will not immediately noticed two problematic points in this looks like bullet proof concept:

Would that schedule fire every two hours?
What do we backup every two hours?

1) Would that schedule fire every two hours?
Immediate answer is: probably NOT!
In a case when tape/disk library backup is shared among other backups, this approach has a, I'll call it this way, "busy backup resource problem". This term manifests in a way that sometimes Oracle backup schedule can be postponed because in that time some other backup has already started so we face lack of free backup resources. If you keep in mind that ouor solution is time sensitive (every two hours has to be run!) it is easy to see that we have a serious problem.Solving this problem lead us to two kinds of solution:

Easiest implementation (and most frequent in praxis) is to create another schedule on daily basis which will backup all not backup ed archived logs. More frequent backup than once a day may lead to overlapping archived logs (backup storage space problem) or in a case of selective backup time consuming operation (RMAN catalog database may not be so fast) what lead us to beginning of the problem-busy backup resource problem.
However, problem of this implementation is that we still have a risk of not backup ed archive logs, but afaik, this is acceptable risk according time that last.
Here is a code for that:

connect target backup_admin/backup_admin_pass@PROD_DB
connect catalog rman/rman_pass@RMAN_DB
run {
allocate channel t1 type 'SBT_TAPE';
send 'NSR_ENV=(NSR_SERVER=backup_server.com,NSR_CLIENT=production_server.com)';
# 0.0840278 = 121 minutes!
backup NOT BACKED UP 1 times archivelog all filesperset 50 format '/arch1_%d_%u/';
release channel t1;
}

Enterprise solution would be to buy a dedicated backup solution only for this database or more "head device" (some kind of parallelism). With this no backup busy resource should happened any time!
But implementing this solution bring you next problem: continuity. If just one scheduled job is skipped (some problems in any IT resource during the day) you will be without some archived logs backup. So this bring you implement previous method because you have to be 100% sure to backup all archived log files and this cannot be done without sequential checking RMAN catalog database separately on some time basis whenever.

2) What do we backup every two hours?
This is interesting question so let me show an example what I meant by this. What will happened when (usually at night or during weekend days) there is no activities on database, so archive log switch doesn't occurs at all!
And this is main danger which could happened in your strategy because no archived log files are produced, so our retention policy is dangered regardless dedicated storage solution or not-RMAN has no redo log files to backup!
To solve that Oracle has a solution out of the box in way of one init.ora parameter ARCHIVE_LAG_TARGET which can fire archive log switch on time basis.From my past experience, setting this value to 899 seconds (slightly less then 15 minutes) will be satisfying for many companies. With this implementation you ensure that you risk of data loss is now smaller and more acceptable in all means of risk. For the end here is a script archived redo logs generation:

/* -----------------------------------------------------------------------------
Filename: rl.sql
CR/TR# :
Purpose : Archive log distribution per hours on each day ...
Date : 07.08.2007.
Author : Damir Vadas, funky_disco_freak@yahoo.com
Remarks : run as privileged user
Changes (DD.MM.YYYY, Name, CR/TR#):
--------------------------------------------------------------------------- */

set pagesize 120;
set linesize 200;
col day for a8;

spool rl.txt

PROMPT Archive log distribution per hours on each day ...

select
to_char(first_time,'YY-MM-DD') day,
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'00',1,0)),'999') "00",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'01',1,0)),'999') "01",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'02',1,0)),'999') "02",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'03',1,0)),'999') "03",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'04',1,0)),'999') "04",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'05',1,0)),'999') "05",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'06',1,0)),'999') "06",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'07',1,0)),'999') "07",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'08',1,0)),'999') "08",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'09',1,0)),'999') "09",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'10',1,0)),'999') "10",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'11',1,0)),'999') "11",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'12',1,0)),'999') "12",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'13',1,0)),'999') "13",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'14',1,0)),'999') "14",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'15',1,0)),'999') "15",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'16',1,0)),'999') "16",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'17',1,0)),'999') "17",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'18',1,0)),'999') "18",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'19',1,0)),'999') "19",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'20',1,0)),'999') "20",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'21',1,0)),'999') "21",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'22',1,0)),'999') "22",
to_char(sum(decode(substr(to_char(first_time,'HH24'),1,2),'23',1,0)),'999') "23",
COUNT(*) TOT
from v$log_history
group by to_char(first_time,'YY-MM-DD')
order by day
;
spool off;

This script reproduce rl.txt file in directory where sqlplus was started and here is an example of it with notice that this company was having ARCHIVE_LAG_TARGET=1799 (30 minutes). Result is intentionally reduced to be more readable (some hours are removed from original spool file):

DAY      00   01   02   03   04   05   06   07   20   21   22   23          TOT
-------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----------
09-09-27    5    2   64    2    2    2    2    2    2    2    2    2        113
09-09-28    4    2   63    2    2    2   16   29    2    2    3    2        294
09-09-29    5    2   65    2    2    2   23   44   43   33   34   37        895
09-09-30   41   27    3    2    2    2 142   44    2    2    2    2        373
09-10-01    4    3    3    2    2    2    2    2    2    2    2    2         57
09-10-02    4    3    3    2    2    2    2    2    2    2    2    2         52
09-10-03    4    2    3    2    2    2    2    2    2    2    2    2         51
09-10-04    4    2    2    2    2    2    2    2    2    2    2    2         50
09-10-05    4    2    2    2    2    2    2   15    2    2    2    2        546
09-10-06    4    2   83   12   33   37    2    2    2    2    2    2        207
09-10-07    4    2   93    3    2    2    2    2    2    2    2    2        142
09-10-08    4    2    3    2    2    2    2    2    2    2    2    2         51
09-10-09    4    2    3    2    2   54   41    2    2    2    2    2        142
09-10-10    4    2   80   15    2    2    2    2    2    2    2    2        141
09-10-11    5    2   94    2    2    2    2    2    2    2    2    2        143
09-10-12    4    2   93    2    2    2    2    2    2    2    2    2        142
09-10-13    4    2   82   13    2    2    2    7   14    3    2    2        257
09-10-14    4    2   95    3    2    2    2    2    2    2    2    2        144
09-10-15    4    2   94    4    2    2    2    2    2    2    2    2        144
09-10-16    5    2   95    3    2    2    2    2    2    2    3    2        146
09-10-17    5    3   96    2    2    2    2    2    2    2    2    2        146
09-10-18    4    2   95    2    2    2    2    2    2    2    2    2        143
09-10-19    4    2   96    2    2    2    2    2    2    2    4    2        148

As you can see at least two archived log are created each hour so we are sure that we can backup something!

Applying mentioned methods 1) and 2) in company backup strategy really reduce risk to it's minimuum.

For the end I'd like to express that there is no backup policy without testing it-so test it from time to time!!

Monday, October 19, 2009

Autonomous RMAN online backup

Regardless there are many topics written about RMAN and way of making a lot of variety of backup strategy (scenarios), for me a nice approach was a challenge to create autonomous consistent backup. "Autonomous consistent backup" stand for backup which could be run in any moment on any configuration, with 100% success its own consistency against parent database using its own files and no other external resource (like RMAN catalog for an example). This backup can be used for testing as well for some reporting purpose.

Autonomous consistent backup is typically created as a “point in time” process based on some periods (monthly, quarterly, yearly etc.) in a case when company data backup retention policy fail beyond supported time (company does not save so many backups in the past) and it is based on ad hock basis… regardless time of run or configuration of database.

Except condition that database must be in archive log mode, main problem is always the same: how to determine which archived redo log files should we backup along with data files to restore database from this backup set? Here is an example one way of treating this issue.

Even thought, we could find dynamically number of groups and its members, just to make easy of understanding the procedure I'll use fixed values. In my case I’ll show solving the problem with four groups with two members in each group (number of members are not important but only number of groups). This value is easy to get:

select distinct THREAD# from v$log order by 1;
THREAD#
----------
1
2
3
4

If you use RAC then you can get the same result through gv$log view in a way:

SELECT THREAD# FROM gv$log WHERE inst_id = THREAD#;

But I see no advantage in this query so probably the best way best way is to use v$log in all situations, whether is RAC or single instance database.

Basic idea of the whole example, which can be easy copied in bash script (same may be done on Windows as well with some changes in script), should be like this:
   1. Find all redo log’s minimum sequence numbers in moment before starting of backup
   2. Switch archive log
   3. Backup database
   4. Switch archive log
   5. Find all redo log’s maximum sequence numbers
   6. Backup archived log files created between min and max sequence numbers

1. Find all redo log’s minimum sequence# numbers in moment before starting of backup
Finding minimum sequence# is doen through easy sqlplus command:

select MIN(sequence#) from v$log where thread#=1;
select MIN(sequence#) from v$log where thread#=2;
select MIN(sequence#) from v$log where thread#=3;
select MIN(sequence#) from v$log where thread#=4;

In bash script all together for four threads looks like:

MIN1=`echo "select 'xX '||MIN(sequence#) from v\\$log where thread#=1;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`
MIN2=`echo "select 'xX '||MIN(sequence#) from v\\$log where thread#=2;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`
MIN3=`echo "select 'xX '||MIN(sequence#) from v\\$log where thread#=3;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`
MIN4=`echo "select 'xX '||MIN(sequence#) from v\\$log where thread#=4;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`

Interesting part is how easy is in bash script to read result from sqlplus prompt back and forth.

2-4 Switch archive log, backup database, switch archive log
Here is classic RMAN script with some difference from classic backup scripts at the begging and at the end:

rman target / nocatalog log=autonomous_backup.log
run {
sql "alter system archive log current";
backup database include current controlfile;
sql "alter system archive log current";
restore database preview;
}

Do not use other command then “alter system switch logfile” on RAC environment. Command "restore database preview" gives you comprehensive explanation about all necessary files and data for this backup-very useful for restore/recover part in the future.

5 Find all redo log’s maximum sequence# numbers
Now it easy to repeat step from step 1 with assigning values in different variables in a slightly changed script.

MAX1=`echo "select 'xX '||MAX(sequence#) from v\\$log where thread#=1;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`
MAX2=`echo "select 'xX '||MAX(sequence#) from v\\$log where thread#=2;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`
MAX3=`echo "select 'xX '||MAX(sequence#) from v\\$log where thread#=3;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`
MAX4=`echo "select 'xX '||MAX(sequence#) from v\\$log where thread#=4;" | sqlplus -s "/ as sysdba" | grep "xX" | awk '{print $2}'`

6 Backup archived log files created between min and max sequence# numbers
Now is RMAN archived log backup part where we use previously collected MINx and MAXx values:

rman target / nocatalog log=autonomous_backup_arch.log
run {
backup archivelog sequence between ${MIN1} and ${MAX1} thread 1;
backup archivelog sequence between ${MIN2} and ${MAX2} thread 2;
backup archivelog sequence between ${MIN3} and ${MAX3} thread 3;
backup archivelog sequence between ${MIN4} and ${MAX4} thread 4;
}

In all RMAN scripts I didn't wanted to place "allocate channel..." commands because this depends on your default RMAN configuration. Put it before all operation if you need it.

As you can see all scripts are run with nocatalog switch even thought it can run against the catalog as well. This solution works against RAC as well in non RAC database.

To make this script work put all together in one .sh file and run as ora_dba user with appropriate environment. If will be more interest for this topic, I can post bash script which will dynamically examine number of groups in for loop instead of (as in this example) fixed 4 values.

Subscribe to: Posts ( Atom )

Blog Archive

Zagreb u srcu!

Sign by Danasoft - Get Your Sign