如何處理 Database 一直處於 Recovery 狀態及 Recovery 狀態－#007FFF

(1)遇到資料庫一直持續復原的情形是正常的現象，請參考以下的說明：

1. 如果正在執行的批次 T-SQL 指令為處理大量的資料列，那麼SQL Server為了資料庫的一致性及完整性，就會使用相對應的Transaction Log 大小進行處理交易資料

2. 當取消T-SQL指令或作業時，SQL Server會進行 Transaction Rollback 的動作，越大的資料量需要花更長的時間處理

3. Rollback 是無法干預或是將其加快的(除非其他的資料庫都先停用)，如果在 Rollback 的過程中遇到未預期的重新開機或是服務停止的動作，由於 Rollback 尚未完成，因此 SQL Server 在下次服務啟動並載入 Database 時，將會檢視 Transaction Log 的資訊。SQL Server 會將該 Commit 或是該 Rollback 的資料列一一的寫入資料庫檔案(MDF)中，或是取消套用的資料更新等動作。如果上次 Rollback 時並沒有完成所有的紀錄，那麼 Database Recovery 會繼續進行 Recovery 動作，直到所有交易紀錄處理完畢為止。

4. Database Recovery 的過程是無法干預及加速進行的

建議作法：

在服務重新啟動或是重新開機之前，請對於大型資料庫手動下達 CHECKPOINT 的 T-SQL 指令
如果有大量資料的匯入作業，建議設定 DTS/SSIS Package 元件中Batch Size (批次量大小)選項，避免讓批次量太大。如果遇到需要 Rollback 的狀況，那麼就不需要太久的時間復原。

(2)如果確認AP人員不需要在發生問題前的所有交易紀錄，或是AP人員的作業可以重新執行的話，那麼可以透過重建交易紀錄檔的方式，取消這些需要復原的資訊。以下為 Workaround 方法：

1) SQL Server 2005/2008 Rebuild Log Steps

1. Stop SQL Server service and add –T3608;-T4022 into the startup parameters from SQL Configuration Manager.
2. Rename the original LDF file name.
3. Restart SQL Server service. After SQL Server is started, the problematic database is no longer in “recovery” status.
4. Run the following command to put database in emergency mode:

Alter Database <DatabaseName> set emergency

5. Remove –T3608;-T4022 from the startup parameters using SQL configuration Manager. Restart SQL Service service.
6. After SQL Server is started, you will see the database status is “emergency”. Run “select db_id('<DatabaseName>') to get the database ID.
7. Run “select * from sysprocesses where dbid = <dbid>” to find out if there is any connection using the database now. If any, run “Kill <SPID>” command to kill the connections.
8. Run the following command to put database in emergency mode:

Alter Database <DatabaseName> set single_user with rollback immediate

9. Run the following command which will help to build a new LDF for us.

Alter database SPN_dashboard rebuild log on (NAME=logical file name ,FILENAME='file location')

10.After the LDF is rebuilt, run the command “DBCC CHECKDB(<DatabaseName>, repair_allow_data_loss)” again to ensure there is no more consistency error or allocation error in the database.

11. Run the following commands to bring the database back to normal status:

Alter Database E_Commerce set multi_user

Alter Database E_Commerce set online

Note : 指定 DBCC CHECKTABLE 修復發現的錯誤。資料庫必須是單一使用者模式，才能使用以下的修復選項之一。

值	說明
REPAIR_ALLOW_DATA_LOSS	執行所有由 REPAIR_REBUILD 所完成的修復工作，並包括資料列與分頁的配置及解除配置，以更正配置錯誤、結構化資料列或分頁錯誤，以及損毀文字物件的刪除。這些修復動作可能會讓部份資料遺失。修復動作可在某個使用者交易之下執行，好讓使用者可以復原所做的變更。若修復動作被復原，資料庫仍含有錯誤，必須再從備份資料還原。如果修復某個錯誤的動作因為所提供的修復層級不足而跳過，則依賴該修復的其他修復動作也跟著跳過。修復動作完成之後，請替資料庫做備份。
REPAIR_FAST	執行較小、不費時的修復動作，如修復非叢集索引中額外的索引鍵。這類修復可以很快完成，而且沒有資料遺失的危險。
REPAIR_REBUILD	執行 REPAIR_FAST 所完成的所有修復動作，並包括諸如重建索引等耗時的修復。這類修復也沒有資料遺失的危險。

(3)以下介紹 Database Recovery 的原理：

What is Undo or Redo ?

Before knowing these, it is necessary to know how a transaction work with the modification or how SQL Server handles a transaction.

Whenever a data modification request is received by the SQL Server, regardless of explicit or implicit transaction, all the underlying pages are loaded into the buffer cache. A series of log records are created for this transaction including page numbers for which the modifications are to be carried out, before stage and after stage of the modification. All these logs records are internally linked together. Then the modifications take effect in the pages loaded in cache.

After modifications are done in cache, if a Rollback request for this particular transaction is received then all the undo operations for this transaction are carried out from the Transaction Log records. This Roll Backward operation is called 'Undoing'.

Suppose a Commit request for this transaction is received, then first log records are written to log disk files, prior to the data pages that are modified in cache are written to the data disk files. The buffer manager ensures this. Write LOG Records first and then DATA Records into Disk Files. This mechanism is called as 'Write-Ahead-Log'(WAL).

After writing log records into the log file or files and before writing the modified data records into the data files, say suppose, SQL Server stopped due to some resource problems, then, SQL Server uses these log records to recover all the transactions that are marked as committed and not reflected in data, during the restart of SQL Server. This recovery is called Restart Recovery. This restart recovery is always done for all the databases of an instance when that instance of SQL Server is restarted. This restart recovery, which is doing Roll Forward of all transactions to the data files is called 'Re-doing'.

Checkpoints in Transaction Logs

Within a start and end of a complete transaction, we may use checkpoints or save points with the help of CHECKPOINT and SAVE TRANSACTION T-SQL statements, to store partially done transactions to write in disk files. These checkpoints may also internally triggered by SQL Server itself too. What a checkpoint does within the transaction?

Checkpoint is a SQL Server process that writing all modified data pages I buffer cache into disk files.It is also forces any pending transaction log records into log file. Performing Checkpoints reduces the recovery time of restart recovery, as it forced the transactions to written to log files and also writes the dirty pages into disk files. This process of Checkpoints minimize the Roll forward operations of Restore Recovery.

The Checkpoint Operation involves following steps.

Writing out all dirty pages into Data disk files.
Writing a list of active transactions to Transaction log.
Storing check point log records to Transaction Log.

Scope of the Checkpoints is the Database level. So the Checkpoint operation run only for the current database only.

Checkpoint occurs in the following cases.

Whenever we issue CHECKPOINT T-SQL Command for the current database.
Whenever SQL Server shuts down without option WITH NOWAIT. This checkpoint works for all the databases in that instance. WITH NOWAIT option skips the checkpoint.
Whenever a Data Files are added to or removed from a Database using ALTER DATABASE Command.
When Bulk copy operation or Select Into operation performed in a database for which ‘Bulk-logged’ Recovery model is set.
When a database’s recovery model is changed from Bulk-logged or FULL to SIMPLE.
For a Simple Recovery Model Database, if the size of the Transaction Log exceeds 70%.
When number of log entries exceeds the estimated amount of work required by the SQL Server's 'Recovery Interval' configuration.

I think, I covered utmost every aspect of Restart Recovery often called as Crash Recovery.In my next part of this post, I will write about the another type of Recovery - Restore Recovery

When a checkpoint operation occurs, no matter how it's triggered (for instance through a manual CHECKPOINT, from a database or differential backup, or automatically) the same set of operations occurs:

All dirty data file pages for the database are written to disk (all pages that have changed in memory since they were read from disk or since the last checkpoint), regardless of the state of the transaction that made the change.
Before a page is written to disk, all log records up to and including the most recent log record describing a change to that page are written to disk (yes, log records can be cached in memory too). This guarantees recovery can work and is called write-ahead logging. Log records are written to the log sequentially, and log records from multiple transactions will be interspersed in the log. The log cannot be selectively written to disk, so writing a dirty page to disk that only has a single log record affecting it may mean writing many more previous log records to disk as well.
Log records describing the checkpoint are generated.
The LSN of the checkpoint is recorded in the database boot page in the dbi_checkptLSN field (see Search Engine Q&A #20: Boot pages, and boot page corruption).
If in the SIMPLE recovery model, the VLFs in the log are checked to see whether they can be marked inactive (called clearing or truncating the log - both of which are terrible misnomers, as nothing is either physically cleared or truncated).

(4)如果資料庫的的交易紀錄檔過大，建議參考以下的說明及建議作法：

1. 以下 KB 說明為何 Transaction Log 快速增長的原因：

INF: Causes of SQL Transaction Log Filling Up
http://support.microsoft.com/kb/110139/en-us

Even with the "truncate log on checkpoint" option enabled, a number of factors can cause log space exhaustion. These are listed below:

A large atomic transaction, especially a bulk UPDATE, INSERT, or DELETE: Each single SQL statement is considered an atomic unit that must be applied or not applied in its entirety. For this reason, all row alterations must be logged, and the transaction cannot be truncated over its duration. For example, if a large bulk INSERT was issued that had a running time of five minutes, the log consumed by this transaction cannot be truncated for this period. The database administrator must provide sufficient log space for the largest bulk operation expected or must perform the bulk operation in smaller groups.

2. 請參考以下KB中提到的建議作法：

INF：SQL Server 中的 Autogrow 及 Autoshrink 設定考量
http://support.microsoft.com/?id=315512

如何避免SQL Server 資料庫的交易記錄檔超出預期大小
http://support.microsoft.com/?id=873235

執行 SQL Server 之電腦上的交易記錄檔意外地擴充或滿溢
http://support.microsoft.com/?id=317375

my007FFF

#007FFF

my007FFF 發表在痞客邦留言(0) 人氣()

E-mail轉寄

#007FFF

"Sir, I have learnt to stand up on my feet after having broken both my legs. This attitude has come with great difficulty. No sir, I can't. You may keep your job, and let me keep my attitude." by 3 idiots

如何處理 Database 一直處於 Recovery 狀態及 Recovery 狀態

歷史上的今天

留言列表

文章搜尋

My Links

文章分類

Infra (2)

Azure (13)

文章精選

最新文章

誰來我家

新聞交換(RSS)

熱門文章

最新留言

動態訂閱