Basic Performance Investigation


When we have to troubleshoot a performance issue, the first thing to do is gather as much information as possible from the user reporting it. Many times the source of the problem is in the user's environment, so we have to confirm that it is a real performance problem before starting the investigation. To gather information, we need to ask the user as many questions as it takes to build a meaningful picture of the issue (error messages, actions performed, etc.), and if needed we can ask the user to share their desktop so we can fully understand the problem.

 

When we’re sure there is a server-side performance issue, we have to focus our efforts on three factors: CPU, memory, and disk. These three factors are the principal drivers of SQL Server performance and the usual sources of bottlenecks. To monitor them, we connect to the server and run Performance Monitor (perfmon.exe). We can gather the server infrastructure information from the command prompt by typing systeminfo; this shows the system type, operating system version, number of processors, and other server-related information. We can also check the processor architecture by typing set processor_architecture at the command prompt.
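The same basic hardware picture can also be pulled from inside SQL Server itself, which is handy when we only have a query window. A minimal sketch (the physical_memory_in_bytes column applies to SQL Server 2005/2008; later versions renamed it physical_memory_kb):

```sql
-- Logical CPU count and physical memory visible to SQL Server
-- (column is physical_memory_in_bytes on 2005/2008,
--  physical_memory_kb on 2012 and later)
SELECT cpu_count,
       hyperthread_ratio,
       physical_memory_in_bytes / 1048576 AS physical_memory_mb
FROM   sys.dm_os_sys_info;

-- Edition and version of the instance
SELECT SERVERPROPERTY('Edition')        AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version;
```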

As for Performance Monitor, we can invoke it via Run… -> perfmon. SQL Server registers its counters for 64-bit systems by default; for a 32-bit SQL Server instance we have to run the monitor in 32-bit mode by typing mmc /32 perfmon.msc. The counters we need to select and analyze, grouped by performance object, are the following.

For MEMORY we need to select the following counters:

  1. Available MBytes – Shows the amount of physical memory, in megabytes (MB), immediately available for allocation to a process or for system use.

 

  2. Pages/sec – The number of pages read from or written to disk to resolve references to pages that were not in memory at the time of the reference. It is the sum of Pages Input/sec and Pages Output/sec. The value should be close to zero; a sustained higher value indicates that virtual memory is being used to serve requests.

 

For SQL SERVER BUFFER MANAGER we need to select the following counters:

  1. Buffer Cache Hit Ratio – What counts as a good value is specific to the application, but a rate of 90 percent or higher is desirable: a value above 90 percent indicates that more than 90 percent of all requests for data were satisfied from the data cache. If it is consistently lower than expected, consider giving SQL Server more memory. To view the current memory configuration, run sp_configure 'max server memory' in a query window; it displays both the config value and the run value. To change the config value, run sp_configure 'max server memory', 5120 — but this does not change the run value. To apply the run value, run RECONFIGURE WITH OVERRIDE.
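The max server memory steps above can be sketched as one batch (the 5120 MB value is only an example, and on a default installation the advanced options must be exposed first):

```sql
-- 'max server memory' is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- View the current config value and run value
EXEC sp_configure 'max server memory';

-- Change the config value (in MB); the run value only changes
-- after RECONFIGURE WITH OVERRIDE
EXEC sp_configure 'max server memory', 5120;
RECONFIGURE WITH OVERRIDE;
```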

 

  2. Checkpoint Pages/sec – The number of pages flushed to disk per second by a checkpoint or other operation that requires all dirty pages to be flushed. It should be close to zero as well.

 

  3. Page Life Expectancy – The number of seconds a page stays in the buffer pool without being referenced. A very low value means the server has to go to disk more often to fetch data.

 

  4. Target Pages – The ideal number of pages in the buffer pool.

 

  5. Total Pages – The number of pages in the buffer pool.

 

Ideally, Target Pages and Total Pages should be the same.
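These Buffer Manager counters can also be read from T-SQL through the sys.dm_os_performance_counters DMV (SQL Server 2005 and later); a rough sketch:

```sql
-- Buffer Manager counters straight from the DMV. Note: ratio counters
-- such as Buffer cache hit ratio must be divided by their matching
-- "base" counter to get a meaningful percentage.
SELECT counter_name, cntr_value
FROM   sys.dm_os_performance_counters
WHERE  object_name LIKE '%Buffer Manager%'
  AND  counter_name IN ('Page life expectancy',
                        'Target pages',
                        'Total pages',
                        'Checkpoint pages/sec');
```

The exact counter names can differ slightly between versions, so when in doubt select everything from the Buffer Manager object and inspect the list.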

For PHYSICAL DISK we need to select the following counters. When monitoring disk counters, select only the disks where the database's data files and log files reside. For all of these counters, a sustained high value means too many disk operations, and high response times may indicate disk problems.

  1. Avg. Disk Queue Length – Shows the average number of both read and write requests that were queued for the selected disk during the sample interval.

 

  2. Avg. Disk Bytes/Write – Shows the average number of bytes that were transferred to the disk during write operations.

 

  3. Avg. Disk Bytes/Read – Shows the average number of bytes that were transferred from the disk during read operations.

 

  4. Disk Reads/Sec – Shows the rate, in incidents per second, at which read operations were performed on the disk.

 

  5. Disk Writes/Sec – Shows the rate, in incidents per second, at which write operations were performed on the disk.

 

For PROCESS we need to select the following counters for the sqlservr process.

  1. %Privileged Time – Shows the percentage of non-idle processor time spent executing code in privileged mode. Privileged mode is a processing mode designed for operating system components and hardware-manipulating drivers. It allows direct access to hardware and memory. The operating system switches application threads to privileged mode to access operating system services.  When a Windows system service is called, the service often runs in privileged mode to gain access to system-private data.

 

  2. %Processor Time – Shows the percentage of time that the processor spent executing a non-idle thread. It is calculated by measuring the duration that the idle thread is active during the sample interval and subtracting that time from 100%. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. Code executed to handle some hardware interrupts and trap conditions is included in this count.

 

  3. %User Time – Shows the percentage of time that the processor spent executing code in user mode. Applications, environment subsystems, and integral subsystems execute in user mode. Code executing in user mode cannot damage the integrity of the Windows Executive, kernel, and/or device drivers.

 

For PROCESSOR we need to select the following counters:

  1. %Privileged Time – Shows the percentage of elapsed time that this thread spent executing code in privileged mode. When a Windows Server 2003 family operating system service is called, the service often runs in privileged mode in order to gain access to system-private data. Such data is protected from access by threads executing in user mode. Calls to the system can be explicit or implicit, such as page faults and interrupts.

 

  2. %Processor Time – Shows the percentage of elapsed time that this thread used the processor to execute instructions. An instruction is the basic unit of execution in a processor, and a thread is the object that executes instructions. Code executed to handle some hardware interrupts and trap conditions is included in this count.

 

  3. %User Time – Shows the percentage of elapsed time that this thread spent executing code in user mode. Applications, environment subsystems, and integral subsystems execute in user mode. Code executing in user mode cannot damage the integrity of the Windows Server 2003 Executive, Kernel, and device drivers.

 

Another tool we can use to find the source of a performance problem is SQL Profiler. It can be launched from Management Studio under the Tools menu, or from Start/All Programs/Microsoft SQL Server/Performance Tools/SQL Server Profiler. Its purpose is to show us the set of operations being performed when the bottlenecks or deadlocks appear, and it lets us create a trace file for post-analysis and for reproducing the events. When using SQL Profiler, we should capture only the few event classes that are actually useful, because too much information just gets confusing. We should also ensure that we NEVER RUN THE TOOL ON THE SERVER ITSELF: it consumes server resources and will itself impact performance. Instead, run the tool from another machine and connect to the server remotely.

There are various template types to choose from when creating a trace; for performance issues we use the Standard template. We need to provide a trace name, and we have to check “Save to file” so the trace is available for post-analysis. The trace we'll use has the following attributes:

–       All the events from the Standard template are deleted; then we add new ones.

–       From the Errors and Warnings group, we'll add Exception and User Error Message.

–       From the Locks group, we'll add Lock:Deadlock and Lock:Deadlock Chain.

–       From the Stored Procedures group, we'll add RPC:Completed, SP:Completed and SP:StmtCompleted.

–       From the TSQL group, we'll add SQL:BatchCompleted and SQL:StmtCompleted.

–       Add filters for the area we want to monitor. We should always filter by DatabaseID rather than database name (on SQL Server 2000 instances the name does not show up), and by NTUserName or LoginName if we are monitoring a specific user.

So, when our trace is ready, we can run it. Once we have gathered enough information, we go through the trace to identify error messages and deadlocks (the critical events). Before we start the trace, we need to ensure the monitored server has deadlock reporting enabled globally, that is, for all sessions. To do this, execute DBCC TRACEON (1222, -1) on SQL Server 2005 instances and DBCC TRACEON (1204, -1) on SQL Server 2000 instances. If we identify a deadlock, we may need to kill the offending process, whose work is then immediately rolled back. The SQL Server error logs will report the objects involved in the deadlocks.

DBCC TRACESTATUS (-1) displays the list of trace flags enabled globally on the server.
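Putting the deadlock-related commands above together, a minimal sketch (run as sysadmin; the session id 57 in the KILL example is hypothetical):

```sql
-- Enable global deadlock reporting to the SQL Server error log
DBCC TRACEON (1222, -1);    -- SQL Server 2005 and later
-- DBCC TRACEON (1204, -1); -- use this one on SQL Server 2000

-- Verify which trace flags are active globally
DBCC TRACESTATUS (-1);

-- If a deadlocked/blocking session must be terminated
-- (the rollback happens automatically), e.g. for session 57:
-- KILL 57;
```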

 

NOTE: To replay a T-SQL trace, we need a full backup of the database. We can then restore the database in another environment and replay the trace there.

When we're working with a specific query or query batch, we can check its estimated cost by clicking “Display Estimated Execution Plan” in Management Studio. This shows the execution plan and the estimated resources the batch will consume. In this plan, we should focus on the steps that account for the largest share of the resource allocation, isolate them, and improve them.
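To back the estimated plan up with actual numbers, we can also measure the batch before and after a change. A sketch (the Orders query is a hypothetical placeholder):

```sql
-- Report logical/physical reads and CPU/elapsed time for the batch,
-- so before/after comparisons are based on real numbers
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- hypothetical query under investigation
SELECT *
FROM   dbo.Orders
WHERE  OrderDate >= '20100101';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```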

We can view the trace file from T-SQL by using SELECT * FROM fn_trace_gettable('path where the trace file is located', DEFAULT);

If there is a table scan, we can suggest creating a clustered index to improve performance.

If there is a bookmark lookup, we can suggest creating non-clustered indexes that cover the query.
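For example (the dbo.Orders table and its columns are hypothetical, used only for illustration), the two situations above might be addressed like this:

```sql
-- A table scan on a heap can be removed by clustering the table:
CREATE CLUSTERED INDEX IX_Orders_OrderID
    ON dbo.Orders (OrderID);

-- A bookmark lookup disappears when a non-clustered index covers the
-- query, via its key columns or (SQL Server 2005+) INCLUDE columns:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    INCLUDE (OrderDate, TotalDue);
```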

The final tool, which helps us improve performance in a specific scenario, is the Database Engine Tuning Advisor. It analyzes a workload source, such as a trace file or a query batch, and makes suggestions about improvements we can apply to our database objects. It can be launched from the Tools menu in Management Studio, or by right-clicking on the query batch we want to tune. When launching an analysis, we can leave the default options and run it. Once the analysis is done, we need to review the recommendations and evaluate which ones yield the best improvement. For example, when a “Create Statistics” suggestion comes up, we need to check with the application team before making any changes; it may not be the best solution. It's important to share the details of the analysis with the application team once it's completed.

Finally, once we've decided which improvements to make and have applied them, we need to test the performance again and compare the results with the previous ones.


3 Responses

  1. Good one. I would like to add some more (though not specific, but in general):

    1. Check connectivity to the SQL instance.
    2. Have any changes occurred on the server recently (within 24 hours, or maybe less)?
    3. Check all the logs (SQL, Windows events).
    4. Check whether any locks/blocks are occurring; if so, find the source and what is causing the blocking.
    5. Check the CPU/memory/disk I/O usage (use the Performance Dashboard or PAL reports; these are available on the MS site).
    6. Analyze the report.
    7. Check how many CPUs are available on the server and how many SQL Server is using; the same goes for memory.
    8. Check the disk I/O from the perf counters.
    9. See that all the data and log files are on different filegroups, or at least on different disks.
    10. See if there are any missing indexes, using DMVs.

    Arun

  2. Really, it is a very good topic on finding the causes of poor SQL Server performance.

    Good work, thanks

  3. We have to look at all levels, right from the server down to the query level. It could be because of inadequate server configuration settings, like not reserving memory appropriately for SQL Server, or it could be related to MAXDOP. It could be related to database configuration in terms of file and filegroup arrangement across multiple independent disks, or to SAN-based RAID configuration. It could be an issue with tempdb: tempdb usage has increased a lot, and often, if not always, I have noticed queries using a worktable to generate the final result set kept waiting because of a disorganized tempdb database. And last but not least: bad indexes, unused indexes (read vs. write ratio), and unoptimized queries. So we should use Profiler intelligently, try to capture information related to index usage and index operational stats using DMVs, and also configure Perfmon to monitor (mostly) hardware-related counters. And we could also use some customized tools to diagnose further.

    Sachin Sharma
