Channel: MSDN Blogs

Learn How To Unit Test Your Database Changes With The DevOps Lab Show


Haven’t checked out The DevOps Lab yet? Well, now is your chance! Launched in December 2017, The DevOps Lab is a new Channel 9 show that focuses on - you guessed it - everything DevOps at Microsoft. Hosted by Cloud DevOps Advocate Damian Brady, the show goes beyond the buzzword to solve real DevOps problems using a range of tools and techniques.

This time around, Damian meets with newly awarded Data Platform MVP Hamish Watson to chat about unit testing and databases - a topic sure to intrigue anyone working with CI/CD pipelines.

Hamish begins by explaining that while unit testing is vital, in many cases people don’t extend those tests to the database. In fact it happens so seldom that many might doubt whether it can be done at all. But Hamish demonstrates that it can. He spends this 27-minute episode showing us two tools to write database tests in T-SQL and run them as part of a CI/CD pipeline.
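
To give a flavour of what a T-SQL database unit test can look like, here is a minimal sketch using the open-source tSQLt framework. This is purely illustrative: the table, function, and test names are made up, and it is not necessarily the exact tooling shown in the episode.

-- Hypothetical example: create a test class and one test for a pricing function.
EXEC tSQLt.NewTestClass 'PricingTests';
GO
CREATE PROCEDURE PricingTests.[test discount is applied to orders over 100]
AS
BEGIN
    -- Arrange: isolate the table so the test does not depend on real data.
    EXEC tSQLt.FakeTable 'dbo.Orders';
    INSERT INTO dbo.Orders (OrderId, Amount) VALUES (1, 150.00);

    -- Act: call the code under test (a hypothetical scalar function).
    DECLARE @actual money = dbo.CalculateDiscount(1);
    DECLARE @expected money = 15.00;

    -- Assert: the discount for this order should be 15.00.
    EXEC tSQLt.AssertEquals @Expected = @expected, @Actual = @actual;
END;
GO
-- Run every test in the class, for example from a CI/CD build step.
EXEC tSQLt.Run 'PricingTests';

A build task can execute tSQLt.Run and publish the results, which is how tests like these typically slot into a CI/CD pipeline.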

“One of the key things about testing is that we don’t just do one thing. We want to do as [many] different forms of testing as we can,” he said. Hamish explains that this is especially important because well - no one wants to be called into a room full of developers at 3 a.m. to fix an issue. It’s better to just run the tests at 3 p.m., instead.

To learn more about unit testing database changes, check out the latest DevOps Lab episode here! The DevOps show also has a library of 8 more awesome episodes released since the launch back in December, which cover product areas including Visual Studio Team Services, related Azure cloud services and beyond. Don’t forget to check them out here.


Monitoring performance of natively compiled stored procedures – database-scoped configuration options


We just added new database-scoped configuration options that will help with monitoring performance of natively compiled stored procedures. The new options XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS are available now in Azure SQL Database, and will be available in the next major release of SQL Server. These options will improve your monitoring and troubleshooting experience for databases leveraging In-Memory OLTP with natively compiled stored procedures.

After enabling these options, you can monitor the performance of natively compiled stored procedures using Query Store, as well as the DMVs sys.dm_exec_query_stats and sys.dm_exec_procedure_stats. Note that there is a performance impact to enabling execution statistics collection, so we recommend disabling statistics collection when it is not needed.

To enable execution statistics collection at the procedure level, run:

  ALTER DATABASE SCOPED CONFIGURATION SET XTP_PROCEDURE_EXECUTION_STATISTICS = ON

To enable execution statistics collection at the query level, run:

  ALTER DATABASE SCOPED CONFIGURATION SET XTP_QUERY_EXECUTION_STATISTICS = ON
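
To double-check which of these options are currently enabled for a database, you can query the sys.database_scoped_configurations catalog view (a quick verification step, not part of the original instructions):

-- Show the current XTP execution statistics settings for the current database.
SELECT [name], [value]
FROM sys.database_scoped_configurations
WHERE [name] IN (N'XTP_PROCEDURE_EXECUTION_STATISTICS', N'XTP_QUERY_EXECUTION_STATISTICS');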

The following example queries show the procedure-level and query-level execution statistics for natively compiled stored procedures:

-- Procedure-level execution statistics for natively compiled stored procedures
select object_id, 
 object_name(object_id) as 'object name', 
 cached_time, 
 last_execution_time, 
 execution_count, 
 total_worker_time, 
 last_worker_time, 
 min_worker_time, 
 max_worker_time, 
 total_elapsed_time, 
 last_elapsed_time, 
 min_elapsed_time, 
 max_elapsed_time 
from sys.dm_exec_procedure_stats 
where database_id=db_id() and object_id in (select object_id 
from sys.sql_modules where uses_native_compilation=1) 
order by total_worker_time desc 

-- Query-level execution statistics for natively compiled stored procedures
select st.objectid, 
 object_name(st.objectid) as 'object name', 
 SUBSTRING(st.text, (qs.statement_start_offset/2) + 1, ((qs.statement_end_offset-qs.statement_start_offset)/2) + 1) as 'query text', 
 qs.creation_time, 
 qs.last_execution_time, 
 qs.execution_count, 
 qs.total_worker_time, 
 qs.last_worker_time, 
 qs.min_worker_time, 
 qs.max_worker_time, 
 qs.total_elapsed_time, 
 qs.last_elapsed_time, 
 qs.min_elapsed_time, 
 qs.max_elapsed_time 
from sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st 
where st.dbid=db_id() and st.objectid in (select object_id 
from sys.sql_modules where uses_native_compilation=1) 
order by qs.total_worker_time desc

For more details about monitoring and troubleshooting the performance of natively compiled stored procedures, see Monitoring Performance of Natively Compiled Stored Procedures.

For an overview of In-Memory OLTP, including natively compiled stored procedures, see Overview and Usage Scenarios.

Intro to the DirectX Shader Compiler Editor


One of the goodies that you get when you build the GitHub DirectX Shader Compiler yourself is the DirectX Shader Compiler Editor, casually referred to by its executable name 'dndxc'.

Building the DirectX Compiler Sources

In case you're wondering, that's "dot net dxc" - it started off as a way to show how to call into the compiler from a C# project. There is very little work going on besides user interface management - pretty much every interesting thing that happens is delegated to the dxcompiler.dll component.

The basic UI is as follows: there's a code editor on the left-hand side, options to operate on that code in the menus, and additional tools on the right-hand side.

Probably the most basic thing to do is to compile a shader and look at its disassembly. To do this quickly, press Ctrl+N to start with a new shader template, and then Ctrl+F7 to compile it. You'll see the Disassembly tab come to the foreground with color-coded disassembly.

If something goes wrong, you'll get error messages in the output instead.

This simply exercises the IDxcCompiler APIs to compile and disassemble a blob. Next time, we'll look around some more.

Enjoy!

Microsoft Immersion Program: Bringing Microsoft Product Engineers to your Team


App Dev Manager Katie Konow spotlights the Microsoft Immersion Program and the value it brings to both Microsoft and our valued customers.


immerse

verb  im·merse  i-ˈmərs

ENGROSS.  to take or engage the whole attention of : occupy completely

As a developer or development lead using a Microsoft technology, you may find yourself wondering: does Microsoft take feedback? How do I make an impact on SQL Server, Typescript or Visual Studio (to name a few)? Can anyone hear me?

We can! The Microsoft Immersion Program was created so that our engineers can come to where their users are.

What is the Immersion Program?

The Immersion Program is a unique opportunity to connect with Microsoft Product Group engineers. A group of engineers will be carefully chosen based on your workloads and feedback areas. The engineers will come to you - onsite and ready to meet face to face. They will learn about your company, how you use their products, hear your feedback and suggestions and work with you to understand your product and team. There are often situations where your question or suggestion can lead to a design change request (or you will be updated that the roadmap already reflects your idea if you are covered by a non-disclosure agreement).

Working with the Immersion Program through Premier Developer, I have been privileged to be a part of several Immerse Sessions. I have witnessed fascinating, deeply technical product conversations and have seen some of the development teams that I work with make improvements to their processes as a result. After a Visual Studio/Roslyn Immerse visit, one customer had builds that were twice as fast. Through working with the product engineers they received tips on what to modify about their build process and saw the benefit.

In another case, while watching a developer demonstrate their usage of Visual Studio, the product group engineers noticed that the engineer was finding files in a manner they felt was odd. After a deep dive, it was determined that the developer was working around an issue they had forgotten to even mention: that they found “Navigate to” search results were unstable during solution load. The results would change order as new projects and files were loaded. This is one of those pain points that the product group values learning about but the developer never thought to mention because he had his own workaround. Through seeing day-to-day Visual Studio onsite, with the developer, the product group found something unexpected to improve.

Does this type of visit appeal to your developers as a way to show Microsoft how you do business using our products? If so, learn more about the Immersion Program by contacting your Application Development Manager or visit the Immersion site to learn more.

Katie Konow has been an Application Development Manager with Premier Developer since 2015. Jeff Young has been the Program Manager running the Immerse program since 2016.


Premier Support for Developers provides strategic technology guidance, critical support coverage, and a range of essential services to help teams optimize development lifecycles and improve software quality.  Contact your Application Development Manager (ADM) or email us to learn more about what we can do for you.

Investigating Issues while accessing Teams under VS Team Services accounts in South Central US – 04/04 – Investigating


Initial Update: Wednesday, April 4th 2018 19:22 UTC

We're investigating issues accessing Teams under Visual Studio Team Services accounts in South Central US. Customers may receive a 404 error code when accessing teams and backlogs under their Visual Studio Team Services accounts.

  • Next Update: Before Wednesday, April 4th 2018 20:30 UTC

Sincerely,
Krishna Kishore

Fail to create bot with error ‘Authorization_RequestDenied’


Since last December our team has been supporting the Azure Bot Framework, so on this blog you'll start seeing some best practices from our field experience.

Many of our customers encounter the following error when creating a bot in Azure Bot portal (please check here for bot creation steps):

"Authorization_RequestDenied

Insufficient privileges to complete the operation. Please check that your account has sufficient access to the Microsoft app registration portal link below. "

This is a known issue which is often caused by insufficient permissions on the user account in Azure AD. Basically, there are two settings to verify whether the account has appropriate permissions:

In Azure portal, go to Azure Active Directory -> click on "User Settings" -> verify the following values:

  • Under "App registrations" section, "Users can register applications" should be "Yes";
  • If your account is a guest user, then under "External users" section, "Guest users permissions are limited" should be set to "No".

You may find that you are not able to modify those settings; that's because they are managed by the Global Admin accounts for this Azure AD tenant.

Please contact your Global Admin to apply the right settings.

Another situation you may encounter is that you don't have access to the "User settings" page at all and see a "No access" message:

This is probably because you are a guest user with a custom domain in this Azure AD tenant, and the action plan is the same: reach out to your Global Admin and ask for a permission check.

If this does not resolve your problem, please don't hesitate to raise a support ticket so we can work on it together.

We'll continue sharing our knowledge about Azure Bot Framework, stay tuned and let's Bot together!

Some articles you may be interested in:

Migrate your bot from Bot Framework Portal to Azure Portal

https://docs.microsoft.com/en-us/bot-framework/bot-service-migrate-bot

Debug bots with Bot Framework Emulator

https://docs.microsoft.com/en-us/bot-framework/bot-service-debug-emulator

Jin W. from Microsoft Support team of IIS/ASP.NET/Azure Bot

Power BI Premium and Power BI Embedded available for Azure Government


Power BI Embedded and Power BI Premium are now generally available for Microsoft Azure Government, extending existing Power BI capabilities to support U.S. government agencies and their partners in advancing the mission.

Power BI Premium offers dedicated capacity for your organization or team, giving you more consistent performance and larger data volumes without requiring per-user licenses. This premium service enables you to:

  • Distribute content to anyone: Power BI Premium lets you distribute dashboards, reports, and other content broadly, without purchasing individual licenses for each recipient—whether they’re inside or outside your organization.
  • Even greater performance with dedicated capacity: Get the performance your organization, department, or team needs—with even more capacity in the Power BI service allocated exclusively to you.
  • Deploy on-premises or in the cloud: Choose Power BI in the cloud or keep reports on-premises with Power BI Report Server—with the flexibility to move to the cloud at your pace. View reports through the portal, on mobile, or embedded in your applications.

Power BI Embedded is a platform service that helps you supercharge your applications with stunning and interactive visuals, dashboards, and reports. With Power BI Embedded, quickly create, deploy, and manage data-rich applications using Power BI and connect to hundreds of data sources for real-time analytical insights.

The driving vision for Power BI has always been to enable users across roles, disciplines, and industries to get value by drawing insights from their data within minutes. In response to customer demand, we’re adding these new services to Azure Government to empower government agencies to share analytics more broadly and embed analytics in applications more easily. Power BI delivers an end-to-end business intelligence solution while helping government agencies and their partners comply with regulatory and policy requirements.

This announcement reinforces Microsoft’s longstanding and full commitment to support the needs of U.S. government agencies, delivering the exclusivity, highest compliance and security, hybrid flexibility, and commercial-grade innovation required to better meet citizen expectations.

Learn More

For additional information on Microsoft Power BI, please see our documentation. Power BI Premium can only be purchased through Volume Licensing. You can see prices at the Power BI pricing page. You can contact your Microsoft representative for more information.

To explore Azure Government, request your free trial today. Or, check out purchasing options to get started now.

 

 

Troubleshooting data movement latency between synchronous-commit AlwaysOn Availability Groups


Writer: Simon Su
Technical Reviewer: Pam Lahoud, Sourabh Agarwal, Tejas Shah
Applies to: SQL Server 2014 SP2, SQL Server 2016 SP1, SQL Server 2017 RTM 

On synchronous-commit mode AG nodes you may sometimes observe transactions pending on HADR_SYNC_COMMIT waits. HADR_SYNC_COMMIT waits indicate that SQL Server is waiting for the signal from the remote replicas before it can commit the transaction. To understand the transaction commit latency, you can refer to the articles below: 

Troubleshooting High HADR_SYNC_COMMIT wait type with AlwaysOn Availability Groups
https://blogs.msdn.microsoft.com/sql_server_team/troubleshooting-high-hadr_sync_commit-wait-type-with-always-on-availability-groups/ 

SQL Server 2012 AlwaysOn – Part 12 – Performance Aspects and Performance Monitoring II
https://blogs.msdn.microsoft.com/saponsqlserver/2013/04/24/sql-server-2012-alwayson-part-12-performance-aspects-and-performance-monitoring-ii/ 

In the above links you will learn that the transaction delay can be evaluated using the two performance counters below: 

  • SQL Server:Database Replica –> Transaction Delay
  • SQL Server:Database Replica –> Mirrored Write Transactions/sec

For example, assume you have poorly performing AG nodes and you see “SQL Server:Database Replica –> Transaction Delay” at 1000 ms (milliseconds) and “SQL Server:Database Replica –> Mirrored Write Transactions/sec” at 50; this means that on average each transaction has a delay of 1000 ms / 50 = 20 ms. 
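
If you prefer to sample these counters from T-SQL rather than Performance Monitor, they are also exposed through sys.dm_os_performance_counters. A minimal sketch (the object_name prefix varies for named instances, hence the LIKE filter):

-- Snapshot the AG commit-latency counters for all database replicas on this instance.
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Replica%'
  AND counter_name IN ('Transaction Delay', 'Mirrored Write Transactions/sec');

Counter values in this DMV are raw; per-second figures like the ones Performance Monitor shows are derived by sampling twice and computing the deltas over the interval.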

Given the above example, can we tell where the 20 ms delay comes from? What factors cause this latency? To answer these kinds of questions, we need to understand how synchronous commit works: 

https://blogs.msdn.microsoft.com/psssql/2011/04/01/alwayson-hadron-learning-series-how-does-alwayson-process-a-synchronous-commit-request/ 

To track the data movement between replicas, we are lucky to have the updated extended events (xevents):
https://support.microsoft.com/en-us/help/3173156/update-adds-alwayson-extended-events-and-performance-counters-in-sql-s 

In synchronous-commit mode, the basic flow of log block movement is as follows:


In the primary:

1.1 Log block -> LogCache -> LogFlush -> LocalHarden (LDF)
1.2 Log block -> LogPool -> LogCapture -> SendToRemote
(1.1 and 1.2 happen in parallel)

In the remote synchronous replica: 

The harden process is similar to the primary:
LogBlock receive -> LogCache -> LogFlush -> HardenToLDF -> AckPrimary 

 

The xevents (see the first link in this article) fire at different points of the log block movement. The figure below shows the detailed log movement flow of each step and the related xevents: 

As shown above, once the xevent trace is captured we know the precise time point of each step of the log block movement, so we can tell exactly where the transaction latency comes from. Commonly the delay comes from three parts: 

1. The duration of log harden on the primary 

    It equals the time delta between log_flush_start (step 2) and log_flush_complete (step 3).

2. The duration of log harden on the remote replica 

    It equals the time delta between log_flush_start (step 10) and log_flush_complete (step 11).

3. The duration of network traffic 

    It is the sum of the time deltas of (primary: hadr_log_block_send_complete -> secondary: hadr_transport_receive_log_block_message, steps 6-7) and (secondary: hadr_lsn_send_complete -> primary: hadr_receive_harden_lsn_message, steps 12-13).

 

I use the script below to capture the xevents: 

/* Note: this trace can generate a very large amount of data very quickly, depending on the actual transaction rate. On a busy server it can grow by several GB per minute, so do not run the script for too long, to avoid impacting the production server.  */

CREATE EVENT SESSION [AlwaysOn_Data_Movement_Tracing] ON SERVER 
ADD EVENT sqlserver.file_write_completed,
ADD EVENT sqlserver.file_write_enqueued,
ADD EVENT sqlserver.hadr_apply_log_block,
ADD EVENT sqlserver.hadr_apply_vlfheader,
ADD EVENT sqlserver.hadr_capture_compressed_log_cache,
ADD EVENT sqlserver.hadr_capture_filestream_wait,
ADD EVENT sqlserver.hadr_capture_log_block,
ADD EVENT sqlserver.hadr_capture_vlfheader,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden,
ADD EVENT sqlserver.hadr_db_commit_mgr_harden_still_waiting,
ADD EVENT sqlserver.hadr_db_commit_mgr_update_harden,
ADD EVENT sqlserver.hadr_filestream_processed_block,
ADD EVENT sqlserver.hadr_log_block_compression,
ADD EVENT sqlserver.hadr_log_block_decompression,
ADD EVENT sqlserver.hadr_log_block_group_commit ,
ADD EVENT sqlserver.hadr_log_block_send_complete,
ADD EVENT sqlserver.hadr_lsn_send_complete,
ADD EVENT sqlserver.hadr_receive_harden_lsn_message,
ADD EVENT sqlserver.hadr_send_harden_lsn_message,
ADD EVENT sqlserver.hadr_transport_flow_control_action,
ADD EVENT sqlserver.hadr_transport_receive_log_block_message,
ADD EVENT sqlserver.log_block_pushed_to_logpool,
ADD EVENT sqlserver.log_flush_complete ,
ADD EVENT sqlserver.log_flush_start,
ADD EVENT sqlserver.recovery_unit_harden_log_timestamps 
ADD TARGET package0.event_file(SET filename=N'c:\mslog\AlwaysOn_Data_Movement_Tracing.xel',max_file_size=(500),max_rollover_files=(4))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=ON) 

GO 
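
Once the session has produced files, one way to pull the captured events into T-SQL for analysis is the sys.fn_xe_file_target_read_file function. A minimal sketch, assuming the file path configured above:

-- Read the captured .xel files and expose the raw event payload for further shredding.
SELECT CAST(event_data AS xml) AS event_xml
FROM sys.fn_xe_file_target_read_file('c:\mslog\AlwaysOn_Data_Movement_Tracing*.xel', NULL, NULL, NULL);

From the event_xml column you can extract the event name, timestamp, and log_block_id with XQuery, or simply open the .xel files in SQL Server Management Studio as the screenshots below do.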

For demo purposes I just run “insert into [AdventureWorks2014]..t1 values(1)” and then capture the xevent trace on the primary and the secondary; below are screenshots of the captured xevents: 

 On the primary: 

On the secondary synchronous replica: 

 

Note: You may notice that the log_block_id (146028889512) of hadr_receive_harden_lsn_message is not the same as the others (146028889488). This is because the returned id is always the next immediate id after the hardened log block; we can use the hadr_db_commit_mgr_update_harden xevent to correlate the xevents: 

 

With the above xevent log we can now build the detailed latency breakdown of the transaction commit below:  

  • Network: Primary -> Secondary
      From: Primary hadr_log_block_send_complete at 2018-03-06 16:56:28.2174613
      To: Secondary hadr_transport_receive_log_block_message at 2018-03-06 16:56:32.1241242
      Latency: 3.907 seconds
  • Network: Secondary -> Primary
      From: Secondary hadr_lsn_send_complete at 2018-03-06 16:56:32.7863432
      To: Primary hadr_receive_harden_lsn_message at 2018-03-06 16:56:33.3732126
      Latency: 0.587 seconds
  • Log harden (Primary)
      From: log_flush_start at 2018-03-06 16:56:28.2168580
      To: log_flush_complete at 2018-03-06 16:56:28.8785928
      Latency: 0.663 seconds
  • Log harden (Secondary)
      From: Log_flush_start at 2018-03-06 16:56:32.1242499
      To: Log_flush_complete at 2018-03-06 16:56:32.7861231
      Latency: 0.663 seconds

 

I listed the time deltas (the latency) just for the network and the log harden process; there can be other sources of latency as well, like log block compression/decompression, but the latency mainly comes from these three parts: 

  1. Network latency between replicas. In the above example, it is 3.907 + 0.587 = 4.494 seconds.
  2. Log harden on the primary = 0.663 seconds.
  3. Log harden on the secondary = 0.663 seconds.

To get the total transaction delay, we cannot simply sum these up, because the log flush on the primary and the network transfer happen in parallel. Say the network takes 4.494 seconds, but the primary log harden finishes (log_flush_complete: 2018-03-06 16:56:28.8785928) far before the primary gets the confirmation from the replica (hadr_receive_harden_lsn_message: 2018-03-06 16:56:33.3732126). Luckily, we do not need to manually determine which timestamp to use to calculate the total commit time of a transaction. We can use the time delta between the two hadr_log_block_group_commit xevents to know the time to commit. For example, in the above log: 

Primary: hadr_log_block_group_commit: 2018-03-06 16:56:28.2167393 

Primary: hadr_log_block_group_commit: 2018-03-06 16:56:33.3732847 

Total time to commit=delta of above two timestamps= 5.157 seconds 

This number is equal to the network transfer time plus the log harden time on the secondary. This makes sense because the secondary has to wait for the log block to arrive over the network before it can harden the log; it cannot harden the log in parallel as the primary does. 

If you look at the second hadr_log_block_group_commit event, it has a “processing_time” column, which is exactly the transaction commit time we are talking about: 

 

So now you have an overall picture of the log block movement between synchronous-commit mode replicas, and you know where any latency comes from: the replicas, the network, the disk (log harden), or elsewhere.  

By the way, you may notice the “hadr_db_commit_mgr_harden_still_waiting” xevent occurring in the primary xevents. This event fires every 2 seconds (the 2 seconds is hardcoded) while the primary is waiting for the acknowledgement message from the secondary replica. If the ack comes back within 2 seconds you won’t see this xevent. 

 

Reference 

New in SSMS – Always On Availability Group Latency Reports
https://blogs.msdn.microsoft.com/sql_server_team/new-in-ssms-always-on-availability-group-latency-reports/ 

AlwaysOn Availability Groups Troubleshooting and Monitoring Guide
https://msdn.microsoft.com/library/dn135328 


Troubleshooting SQL Server Scheduling and Yielding


Writer: Simon Su
Technical Reviewer: Pam Lahoud, Sourabh Agarwal, Tejas Shah
  

Scheduling and Yielding Knowledge Recap 

We all know that SQL Server is a multi-threaded, multi-tasking system with its own thread scheduling mechanism, which is a small part of the job of what we call SQLOS. If you are not familiar with SQLOS, you can refer to the links below for details: 

A new platform layer in SQL Server 2005 to exploit new hardware capabilities and their trends
https://blogs.msdn.microsoft.com/slavao/2005/07/20/platform-layer-for-sql-server 

Inside the SQL Server 2000 User Mode Scheduler
https://msdn.microsoft.com/library/aa175393.aspx 

How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888
https://technet.microsoft.com/en-us/library/cc917684.aspx 

 

Inside the SQL Server source code there are many voluntary yield points to make multiple threads run efficiently and cooperatively. If a SQL Server worker thread does not voluntarily yield, it will likely prevent other threads from running on the same scheduler. When the owner of the scheduler has not yielded within 60 seconds and as a result pending requests (tasks) are stalled, SQL Server logs a “non-yielding scheduler” error in the error log like the one below: 

 

2018-03-10 21:16:35.89 Server      ***********************************************
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      * BEGIN STACK DUMP:
2018-03-10 21:16:35.89 Server      *   03/10/18 21:16:35 spid 22548
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      * Non-yielding Scheduler
2018-03-10 21:16:35.89 Server      *
2018-03-10 21:16:35.89 Server      *********************************************** 

 

A mini dump is also generated in the LOG folder. You can contact Microsoft Support to understand the details of the mini dump and check whether this non-yielding scheduler warning is a serious issue or whether it can be safely ignored.  

 

How long a thread has been running on a scheduler without yielding 

Regarding SQL Server scheduling and yielding, everything sounded perfect until I got deeply involved in a customer’s case troubleshooting a sudden transaction drop issue (I will write another post to share that story). In that case I needed to find out how long a SQL Server worker thread had been running on a scheduler without yielding. We know that if a non-yielding condition exceeds 60 seconds, SQL Server logs a non-yielding scheduler error accordingly. But what about threads that have been running on a scheduler without yielding for less than 60 seconds? Do we have a way to get detailed information about these non-yielding threads? Note that you cannot use the CPU time column of a statement in a Profiler trace, because the CPU time of a query does not mean it has been running exclusively on the CPU without yielding for that long: some yields could have occurred during the life cycle of the query execution, and SQL Server records these yields with the “SOS_SCHEDULER_YIELD” wait type.  

 

A SQL Server worker has a quantum target of 4 ms (milliseconds); however, in practice there can be places where the code runs unexpectedly long before it reaches a voluntary yield point. Normally this is not a problem because the thread will eventually yield without starving the threads on the runnable list. In case we really need to know the details (like how long a thread has been running on the CPU without yielding), we can use the approaches below.  

Before talking about how to troubleshoot this kind of scheduling and yielding issue, let’s see what a scheduler and its tasks look like: 

Here is the logic of how scheduling works:  

  1. When the task is running, it is in the “Running” state and is the active worker of the scheduler. 
  2. When the task waits for nothing but CPU to run, it is put into the “Runnable” queue of a scheduler. 
  3. When the task is waiting for something (like a lock, disk I/O, etc.), it is in the “Suspended” state. 
  4. If the suspended task finishes its wait (waits for nothing) and is ready to run, it is put at the end of the runnable queue. 
  5. If the running thread voluntarily yields, it is put back at the end of the runnable queue. 
  6. If the running thread needs to wait for something, it is switched out of the scheduler and put into the suspended state. 
  7. If the running thread finishes its work, the top thread of the runnable queue becomes the “Running” thread. 

 

Resource Wait Time 

For a suspended task that is waiting on something, we have many ways to get the wait-related information such as wait_time, wait_resource, etc. For example, both the sys.dm_os_waiting_tasks and sys.dm_exec_requests DMVs can show detailed wait statistics: 

SELECT session_id,status,wait_time,wait_type,wait_resource,last_wait_type
FROM sys.dm_exec_requests
WHERE session_id=52

Result: 

 

Signal Wait Time 

If you query “sys.dm_os_wait_stats” you will find a column called “signal_wait_time_ms”. Signal wait time is the time a thread spent on the scheduler’s “runnable” list waiting to get back on the CPU and run again. The sys.dm_os_wait_stats output gives you an overall picture of waits for each wait type, including the signal wait time. If you want detailed information about the signal wait time of each individual session, you can leverage the wait_info and wait_info_external xevents. There is an excellent article discussing how to use the wait_info event to trace REDO latency: 

https://blogs.msdn.microsoft.com/alwaysonpro/2015/01/06/troubleshooting-redo-queue-build-up-data-latency-issues-on-alwayson-readable-secondary-replicas-using-the-wait_info-extended-event/ 
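
As a quick aggregate check (before going to per-session tracing), you can compare signal wait time against total wait time directly from sys.dm_os_wait_stats. A minimal sketch:

-- Wait types where threads spent the most time in the runnable queue since the stats were last cleared.
SELECT TOP (20)
    wait_type,
    wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
ORDER BY signal_wait_time_ms DESC;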

The same wait_info approach applies to all other wait types. I use the steps below to simulate signal waits:

1. Create a table in tempdb 

USE tempdb
CREATE TABLE t1 (c1 int)

2. Alter SQL server to use only one CPU (never do this on production server!): 

EXEC SP_CONFIGURE  'affinity mask',2 --use only the second CPU of the system
RECONFIGURE WITH OVERRIDE

3. Now start the xevent trace: 

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name LIKE 'SignalWaitDemo')
DROP EVENT SESSION [SignalWaitDemo] ON SERVER 
GO 
CREATE EVENT SESSION [SignalWaitDemo] ON SERVER 
ADD EVENT sqlos.wait_info(
    ACTION(sqlos.scheduler_id,sqlserver.database_id,sqlserver.session_id)
    --Capture End event (opcode=1) only 
WHERE ([package0].[equal_uint64]([opcode],(1)) 
--Only capture user sessions (session_id>=50) 
AND [package0].[greater_than_equal_uint64]([sqlserver].[session_id],(50))
--You can change duration to bigger value, say, change below 10 ms to 3000ms 
AND [duration]>=(10)))
ADD TARGET package0.event_file(SET filename=N'E:\temp\Wait_Info_Demo.xel')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30  SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF) 

GO 

 ALTER EVENT SESSION [SignalWaitDemo] ON SERVER STATE=START; 

 4. Then use the ostress.exe tool to simulate a workload against SQL Server: 

ostress -n1000 -r200 -S. -isignal_wait.sql 

In signal_wait.sql, I have the query below: 

SET NOCOUNT ON
USE tempdb
DECLARE @I int=0,@k int=0

BEGIN
IF(rand()>0.9)update t1 set c1=c1+10*(rand()-0.5)
DECLARE @document varchar(64);  
SELECT @document = 'Reflectors are vital safety' +  
                   ' components of your bicycle.';  
DECLARE @j int 
SET @j=CHARINDEX('bicycle', @document);
SET @j=CHARINDEX('are', @document);
SET @j=CHARINDEX('vital', @document);
SET @j=CHARINDEX('dd', @document);
SET @j=CHARINDEX('your', @document);
END 

While the ostress tool is running, I query “select * from sys.dm_exec_requests where session_id>50” and get the output below: 

Notice that there are lots of runnable threads as well as suspended threads. The suspended threads are waiting for the UPDATE lock, while the runnable ones are waiting for the scheduler to run them. 

 I stop the SignalWaitDemo xevent trace and get the result below: 

You can see sessions with very long signal_duration values in the result, which means they sat in the runnable queue for that long. 

 

Non-yielding Scheduler Time 

From the above description, we know how to check resource wait time and signal wait time. Now comes the million-dollar question: how do we know how long a thread has been running on a given scheduler without yielding (I call it non-yielding scheduler time)?  

Note that non-yielding scheduler time means how long a thread occupies the scheduler without yielding. It does not always equal CPU execution time: the thread holding the scheduler may itself be preempted by the operating system if another application is using the same CPU at that point in time. This is not common, since in most cases the server is dedicated to SQL Server and no other heavy application runs on the machine. 

The fact is that we do not have a handy way to track how long a thread has actually been running continuously on a scheduler without yielding. I would expect a max_non_yielding_scheduler_time_ms column somewhere in a DMV, but there is none at this moment. 

 

The good news is that we have the yield_count column in the “sys.dm_os_schedulers” DMV, as below: 

select yield_count, runnable_tasks_count, * from sys.dm_os_schedulers where scheduler_id<1024 

If the scheduler yields the yield_count will be increased by one. We can query this DMV regularly and get the delta of this column. If the yield_count doesn’t change during the monitoring time delta we know that someone is running on the scheduler for that period of time.  

For example: 

At timestamp1, yield_count is 33555 

After 5 seconds we query it again, and if the yield_count is still the same, 33555, then we know that a thread has been holding the scheduler for at least 5 seconds. 

Once we identify a non-yielding scheduler, we can use the scheduler’s active worker to join sys.dm_os_workers and get the active task, and then use the active task to join sys.dm_exec_requests and get the related user session information. A scheduler’s active worker is the worker thread currently running on the scheduler, which normally corresponds to the running session on that scheduler. 

 Here is the script to save yield_count and other related information to a permanent table called “yields”. The script runs at the specified interval until you manually stop it:  

USE <yourdb>
 CREATE TABLE yields 
 (runtime datetime, scheduler_id bigint,yield_count bigint,runnable int, session_id int,start_time datetime,command varchar(200),database_id int) 

 GO   

 SET NOCOUNT ON 
 WHILE(1=1)
 BEGIN 
 INSERT INTO yields
 SELECT getdate() 'runtime', a.scheduler_id, a.yield_count, runnable_tasks_count, session_id,start_time, command,database_id 
 FROM sys.dm_os_schedulers a
inner join sys.dm_os_workers b on a.active_worker_address=b.worker_address 
left join sys.dm_exec_requests c on c.task_address=b.task_address 
--Most system has less than 1024 cores, use this to ignore those HIDDEN schedulers 
WHERE a.scheduler_id<1024 
 --Monitor it every 5 seconds. you can change it to meet your needs
 WAITFOR DELAY '00:00:05'
 END 

 To get interesting non-yielding scheduler information out of the yields table, I use the script below. It is not perfect, but it gives you an idea of how to extract meaningful information from the captured data. 

DECLARE scheduler_cur CURSOR  
FOR SELECT scheduler_id from yields group by scheduler_id order by scheduler_id
OPEN scheduler_cur
DECLARE @id bigint
FETCH NEXT  FROM scheduler_cur INTO @id
WHILE (@@FETCH_STATUS=0)
BEGIN 
 DECLARE delta_cur CURSOR 
 FOR SELECT runtime, yield_count,scheduler_id,runnable,session_id,start_time, command,database_id 
 FROM yields WHERE scheduler_id=@id ORDER BY runtime ASC 
 OPEN delta_cur
 DECLARE @runtime_previous datetime,@yieldcount_previous bigint
 DECLARE @runtime datetime,@yieldcount bigint,@scheduler_id bigint,@runnable int,@session_id int,@start_time datetime,@command varchar(200),@database_id int

 FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id,@runnable ,@session_id ,@start_time,@command,@database_id 
 SET @runtime_previous=@runtime;SET @yieldcount_previous=@yieldcount
 FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id ,@runnable,@session_id ,@start_time,@command ,@database_id  

 WHILE(@@FETCH_STATUS=0)
 BEGIN 
--We find one non-yielding scheduler during the runtime delta
IF(@yieldcount=@yieldcount_previous)
BEGIN 
PRINT 'Non-yielding Scheduler Time delta found!'
  SELECT @runtime_previous 'runtime_previous', @runtime 'runtime', datediff(second, @runtime_previous,@runtime) 'non_yielding_scheduler_time_second', @yieldcount_previous 'yieldcount_previous',
  @yieldcount 'yieldcount' ,@scheduler_id 'scheduler_id',@runnable 'runnable_tasks' ,@session_id 'session_id' ,@start_time 'start_time', 

  @command 'command' ,@database_id  'database_id'
END 

-- print @id
SET @runtime_previous=@runtime;SET @yieldcount_previous=@yieldcount
FETCH NEXT FROM delta_cur INTO  @runtime ,@yieldcount ,@scheduler_id,@runnable ,@session_id ,@start_time,@command ,@database_id    

 END

 CLOSE delta_cur
 DEALLOCATE delta_cur
 FETCH NEXT  FROM scheduler_cur INTO @id 

END 
CLOSE scheduler_cur
DEALLOCATE scheduler_cur 

The output looks like below: 

From the above output you can see that scheduler 1 shows non_yielding_scheduler_time several times. Actually, scheduler 1 is hung because I suspended its active worker thread in a debugger. 

 

If you want to capture more information about the user session, like the application name, hostname, etc., you can run a Profiler trace or an xevent session at the same time to capture those events, and then correlate that information with the yields table to drill down further. 

 

 

Now Boarding Startups And Scaleups For Qantas Innovation Program


Guest post by Slingshot.

 

 

 

 

 

 

3 APRIL 2018 – Qantas Group is inviting the world’s most innovative startups, scaleups and digital disruptors to apply for its second AVRO Accelerator program.

The AVRO Accelerator, run in partnership with Slingshot, gives entrepreneurs the chance to work with Qantas Group to develop, incubate and scale their businesses.

The program, launched in 2017, helped participants build products to trial with Qantas customers, people and operations, as well as commercial and investment deals to grow their businesses.

This year, Qantas is looking to continue that momentum and invest in emerging businesses that can help Qantas continue to diversify and grow, while offering its customers new and improved experiences, and its people smart solutions to transform how they work.

Qantas Group Executive of Strategy, Innovation and Technology, Rob Marcolina, said that Qantas is looking to discover more talent this year.

“The aim of the AVRO Accelerator is to find the next generation of businesses that can benefit from our unique assets to help identify new growth opportunities for Qantas, while helping to build their businesses at the same time,” he said.

“Through our 2017 cohort we discovered ideas that we have implemented to help improve our customers’ experiences. For example, with our help, Volantio launched its Yana platform, enabling airlines to give customers more flexibility when they travel. The dynamic technology is currently being trialled here at Qantas, along with several other airlines around the world.”

Slingshot CEO Karen Lawson said the aim of this year’s program is to continue to drive commercial outcomes for Qantas and startup founders.

“Australia is now recognised by the Global Entrepreneurship rankings as 7th in the world. There is a wealth of talent in this country that we want to work with. Our search and cohorts are often global in nature representing strength in diversity in all its forms,” she said.

AVRO Accelerator participants will receive funding – with the opportunity to secure additional, next round funding from Qantas – and work with industry-leading mentors and technology partners to accelerate their product development and customer traction. Supported by a structured curriculum and mentorship program, businesses will also have unique access to the scale and global reach of the Group’s brands, its domain expertise and anonymised data and insights.

“When you give agile startups and scaleups access to resources like these, the sky really is the limit”, added Ms Lawson.

Applicants wanting to join the program should reflect one or more of the following themes:

  1. Personalised, seamless experiences: Revolutionise our customer journey. Streamline it, personalise it, and give the customer complete control.
  2. Live well, feel well: Help us enrich the lives of our customers and our communities.
  3. Connecting to customers: Help us better understand, engage and reach each of our customers, personalising and optimising the interactions we have with them.
  4. Smarter, safer operations: Transform our processes and platforms to make us safer, faster, simpler and more efficient.
  5. Innovating without limits: Challenge us with original thinking and show us new ways to use our unique assets.

More information can be found at qantas.com/avro along with details of the roadshows in Melbourne and Sydney.

 

Applications can be made here and will close at 11:59pm AEST 30 April 2018.

 

Read more about Slingshot here

 

Media Enquiries

Qantas Media +61 418 210 005 qantasmedia@qantas.com.au

 

Slingshot Media

Susannah Binstead

Media and Capital Partners

+61 497 858 651

susannah.binsted@mcpartners.com.au

Azure Cosmos DB: využijte přístup s Mongo DB API bez změny kódu


MongoDB has become popular among developers for its simplicity of use, while still offering more advanced aggregation operations. Cosmos DB, however, brings fundamental things you will certainly like: fully as a service, SLAs for availability, throughput, latency and consistency, a tunable consistency model, and global distribution. Use the fantastic Cosmos DB in Azure, which you can access through the Mongo DB API without even having to change your libraries and code.

Creating the database and importing data

First, in the Azure portal we create a Cosmos DB account with the Mongo DB API type.

After a moment our Cosmos DB is ready, and on the quick start tab we get the connection parameters and ready-made samples for connecting the mongo CLI, .NET, Node and so on.

Let's create our first collection.

I will use the single-partition model (that is, for smaller workloads). To start, I will choose the lowest throughput of 400 RU.

Download the tiny set of JSON documents I prepared for testing purposes: https://raw.githubusercontent.com/tkubica12/cosmosdb-demo/master/zviratka.json

Continue reading

Error during serialization or deserialization using the JSON JavaScriptSerializer. The length of the string exceeds the value set on the maxJsonLength property


Recently, I came across an issue where my WebMethod call was failing with the error message below.

 

Error(s): {"Message":"Error during serialization or deserialization using the JSON JavaScriptSerializer. The length of the string exceeds the value set on the maxJsonLength property.","StackTrace":"   at System.Web.Script.Serialization.JavaScriptSerializer.Serialize(Object obj, StringBuilder output, SerializationFormat serializationFormat)\r\n   at System.Web.Script.Serialization.JavaScriptSerializer.Serialize(Object obj, SerializationFormat serializationFormat)\r\n   at System.Web.Script.Services.RestHandler.InvokeMethod(HttpContext context, WebServiceMethodData methodData, IDictionary`2 rawParams)\r\n   at System.Web.Script.Services.RestHandler.ExecuteWebServiceCall(HttpContext context, WebServiceMethodData methodData)","ExceptionType":"System.InvalidOperationException"}

 

I was not sure what the value of the maxJsonLength property should be so that it would not fail in the future if my application pulls more data through a WebMethod. As we know, in real-world scenarios the data may grow over time and I may have to pull more records from the database, which would crash my application and result in HTTP 500 errors. I cannot change this value every time based on the amount of data I am pulling from the backend; I need a safe value that I can specify once and that meets my future requirements.

 

Let’s first try to understand what maxJsonLength is:

The MaxJsonLength property gets or sets the maximum length that is accepted by the JavaScriptSerializer object for JavaScript Object Notation (JSON) strings.

Syntax :

[ConfigurationPropertyAttribute("maxJsonLength", DefaultValue = 102400)]

public int MaxJsonLength { get; set; }

 

Property Value

Type: System.Int32

An integer that represents the maximum length for JSON strings. The default is 102400 characters.

 

The value of the MaxJsonLength property applies only to the internal JavaScriptSerializer instance that is used by the asynchronous communication layer to invoke web service methods.

Basically, the "internal" JavaScriptSerializer respects the value of maxJsonLength when called from a web method. Direct use of a JavaScriptSerializer (or use via an MVC action-method/Controller) does not respect the maxJsonLength property, at least not from the systemWebExtensions.scripting.webServices.jsonSerialization section which you define in the web.config file.

You will get the error below if you exceed the above default value (DefaultValue = 102400) while serializing the JSON object, i.e. when your JSON string is too big:

Error(s): {"Message":"Error during serialization or deserialization using the JSON JavaScriptSerializer. The length of the string exceeds the value set on the maxJsonLength property.","StackTrace":"   at System.Web.Script.Serialization.JavaScriptSerializer.Serialize(Object obj, StringBuilder output, SerializationFormat serializationFormat)\r\n   at System.Web.Script.Serialization.JavaScriptSerializer.Serialize(Object obj, SerializationFormat serializationFormat)\r\n   at System.Web.Script.Services.RestHandler.InvokeMethod(HttpContext context, WebServiceMethodData methodData, IDictionary`2 rawParams)\r\n   at System.Web.Script.Services.RestHandler.ExecuteWebServiceCall(HttpContext context, WebServiceMethodData methodData)","ExceptionType":"System.InvalidOperationException"}

 

The error “The length of the string exceeds the value set on the maxJsonLength property” itself suggests that you have to increase the value of the maxJsonLength property.

 

Now the question in your mind would be how to find a “safe” value for the maxJsonLength property so that your application will not fail in the future if the web service returns more data than the defined (or default) value allows.

The answer is: don’t pull a huge amount of data from the web service in a single request. Instead, chunk the data and access it in pieces, so it will not exceed the value you have set for maxJsonLength.

Also, pulling a huge number of records in a single request will definitely hamper the performance of your application. To arrive at a safe number for maxJsonLength you have to perform some rigorous testing on your application and determine it yourself; there is no single value that will suit every amount of data your web service method pulls from the database.
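
On the database side, one common way to chunk the result set is OFFSET/FETCH paging, so each web method call returns a bounded slice instead of the whole table. A minimal sketch (the table, columns, and page size are hypothetical):

-- Return one page of rows per call; the client passes the page number on each request.
DECLARE @PageNumber int = 1, @PageSize int = 500;

SELECT OrderId, CustomerName, OrderDate
FROM dbo.Orders
ORDER BY OrderId
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;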

If you are exceeding the default value for this property you have to update it in the web.config file as below :

 

<configuration>

   <system.web.extensions>       

  <scripting>           

         <webServices>    

            <jsonSerialization maxJsonLength="....">          

           </jsonSerialization>         

   </webServices>     

   </scripting>   

  </system.web.extensions>

</configuration>

 

Alternatively, you can set MaxJsonLength in your controller like this:

JavaScriptSerializer jss = new JavaScriptSerializer();
jss.MaxJsonLength = Int32.MaxValue;

 

Hope this helps!

Lesson learned from an Availability Group performance case


Writer: Simon Su
Technical Reviewer: Pam Lahoud, Sourabh Agarwal, Tejas Shah 

Problem description 

One of my customers implemented a very high-workload synchronous AG (Availability Group) solution and needs 10K transactions/sec on the AG databases. With in-memory technology this 10K/sec goal was achieved, but they found a very strange behavior in SQL Server's transaction processing. During stress testing, about every 5-10 minutes the transactions/sec counter (actually the SQL Server 2016 XTP Transactions:Transactions Created/sec counter) would suddenly drop to zero and then quickly return to normal within a second or tens of microseconds. Normally you would not notice this because the duration of the dip is so short, but my customer's transactions are very time-sensitive, he has his own transactions/sec calculation formula, and he spotted this short, sharp drop in his monitoring log. The SQL Server 2016 XTP Transactions:Transactions Created/sec counter in his captured performance monitor log from the primary replica looks like this: 

I highlighted the sharp drop with a red circle in the chart above. If we export the performance monitor log to a text file, the “Transactions Created/sec” counter has the values below: 

You can see that the counter suddenly dropped to 33 at 37:53.4, as highlighted above. I do not think this drop is serious, since SQL Server keeps the same high transaction processing speed from the next second on. However, my customer was curious about this little dip and wanted to find out its root cause.  

How to troubleshoot AG performance delay? 

For AG performance troubleshooting, we have two very good public articles: 

https://blogs.msdn.microsoft.com/saponsqlserver/2013/04/21/sql-server-2012-alwayson-part-11-performance-aspects-and-performance-monitoring-i/ 

 https://blogs.msdn.microsoft.com/saponsqlserver/2013/04/24/sql-server-2012-alwayson-part-12-performance-aspects-and-performance-monitoring-ii/ 

If you are not familiar with AG performance troubleshooting concepts and steps, please read the above two articles first. Let's look at the two key performance counters to check the transaction delay in my customer’s synchronous-commit mode replicas: 

  • SQL Server:Database Replica –> Transaction Delay
  • SQL Server:Database Replica –> Mirrored Write Transactions/sec

In performance monitor, these two counters look like below: 

 

The Transaction Delay value is an accumulation of the delay of all current transactions, in milliseconds. You can see that the Transaction Delay counter spikes at the same points where Transactions Created/sec suddenly drops. Its spikes indicate that at those time points the AG transactions experience delay during commit. This gives us a very good starting point: we can focus on the transaction delay in our AG performance troubleshooting. 

So what causes the transaction delay? Is it the primary replica, the secondary replica, or another factor like network traffic? 

As a mandatory step for performance troubleshooting, we captured performance monitor logs to check how both replicas behaved. We wanted to find out whether any performance bottleneck existed on the primary or the secondary: for example, whether CPU usage was high when the transaction delay spike happened, whether the disk queue length was long, whether disk latency was large, etc. We expected to find something with the same spike trend as “Transactions Created/sec” or “Transaction Delay”. Unfortunately, we did not find anything interesting. CPU usage was as low as 30%, the disks were quite fast, and there was no disk queue length at all. We then checked AG-related counters, like the log send queue and the recovery queue mentioned in the two links above, but again we did not find anything helpful. We drew the conclusions below from the performance monitor log: 

--There is no overall CPU performance bottleneck 

--There is no disk performance bottleneck, especially no disk issue on second replica. 

--There is no network traffic issue. 

In short, the performance monitor log does not tell us much about why the transaction delay is happening.  

 

Who introduces the transaction delay? 

To investigate the details of the AG transaction performance, we need to study the performance of data movement between the two synchronous replicas. I wrote another article discussing the detailed steps to troubleshoot log block movement latency: 

Troubleshooting data movement latency between synchronous-commit AlwaysOn Availability Groups
https://blogs.msdn.microsoft.com/psssql/2018/04/05/troubleshooting-data-movement-latency-between-synchronous-commit-always-on-availability-groups/ 

I used a similar script as in the above article to capture xevent traces on both replicas. From the xevent logs we found that the transaction latency is not caused by the factors below: 

  • Network transfer 

  • Local log harden 

  • Remote log harden 

The latency is happening on the primary replica after the primary receives the LSN harden message from the remote node. This is a big milestone because it gives us a clear direction on where to investigate further: we should focus on the primary to learn why it cannot commit the transaction in time. The figure below shows where the delay comes from: 

 

From the above xevent log we can see the delay (a gap of about 3.2 seconds) occurs mainly between the hadr_receive_harden_lsn_message and hadr_db_commit_mgr_update_harden xevents, i.e. between step 13 and step 14 in the figure below: 

Normally, once hadr_receive_harden_lsn_message arrives from the remote replica, SQL Server processes the message and updates the LSN progress very quickly. Here we see it is delayed in processing the messages.  

HADR_LOGPROGRESS_SYNC Wait 

Now comes the challenge: how do we troubleshoot further why steps 13-14 produce the latency? To answer this, I use the script below (wait.sql) to capture the request status every second:  

declare @i integer = 0

WHILE (1=1)
BEGIN

    set @i = @i + 1

    RAISERROR ('-- sys.dm_exec_requests --', 0, 1) WITH NOWAIT
    SELECT GETDATE() 'runtime', * from sys.dm_exec_requests where session_id > 50

    RAISERROR ('-- sys.dm_os_waiting_tasks --', 0, 1) WITH NOWAIT
    SELECT getdate() 'runtime', * from sys.dm_os_waiting_tasks WHERE session_id > 50

    --Please don’t use so small value in production, it will eat up one core’s usage.
    WAITFOR DELAY '00:00:01.000';

END

GO 

 

I was lucky: the script output gave me a big finding. Whenever the sharp transaction drop occurs, there are always HADR_LOGPROGRESS_SYNC waits happening as well: 

The HADR_LOGPROGRESS_SYNC wait is a “concurrency control wait when updating the log progress status of database replicas”. To update the log progress for an AG database, such as the latest hardened LSN from the remote replica, a thread has to acquire the HADR_LOGPROGRESS_SYNC lock first. At any given point in time only one thread can hold this lock, and while it is held, other threads that want to update the log progress have to wait until the lock is released. For example, thread A holds the lock to record that the latest hardened LSN is 1:20:40; after thread A finishes it releases the lock, and then thread B takes the lock and updates the remote hardened LSN to 1:20:44. LSN progress updates have to be serialized to keep the log consistent.  
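
If you want to watch for this situation specifically (rather than capturing everything as wait.sql above does), a narrower query can filter for just these waits. A minimal sketch, not part of the original troubleshooting script:

-- Sessions currently waiting on, or whose last wait was, the log-progress lock.
SELECT r.session_id, r.status, r.command, r.wait_type, r.wait_time, r.last_wait_type
FROM sys.dm_exec_requests AS r
WHERE r.wait_type = 'HADR_LOGPROGRESS_SYNC'
   OR r.last_wait_type = 'HADR_LOGPROGRESS_SYNC';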

 

Besides the HADR_LOGPROGRESS_SYNC waits in the output, there are also lots of HADR_SYNC_COMMIT waits occurring. This is expected because we know there is latency at that time (see the transaction delay spike at the beginning of this article). Here is the screenshot of the HADR_SYNC_COMMIT threads: 

What is the relationship between the HADR_LOGPROGRESS_SYNC wait and the HADR_SYNC_COMMIT wait? It took me some time to understand that for a synchronous replica, a log block can contain several log records from different transactions, and these transactions are committed to the replica as a group; this is the behavior called “group commit” in Availability Groups. When the log block is hardened on the remote replica, the replica sends the hardened LSN back to the primary (we call these sync progress messages). The primary receives the hardened LSN and then acquires the HADR_LOGPROGRESS_SYNC lock to record the latest hardened LSN in the primary database. All transactions waiting on HADR_SYNC_COMMIT are then signaled that the remote commit is done if their expected harden LSN is less than the latest hardened LSN from the remote replica. When both the local commit and the remote commit are done, the user transaction is considered “committed”. Note that we are talking about synchronous-commit mode replicas here. If the thread cannot acquire the HADR_LOGPROGRESS_SYNC lock to update the latest LSN, there can be lots of threads sitting in HADR_SYNC_COMMIT waits because they cannot get the signal from the log progress update thread. 

 

Now comes the million-dollar question: why is a long HADR_LOGPROGRESS_SYNC wait happening? In other words, who owns the HADR_LOGPROGRESS_SYNC lock for that long? From the HADR_LOGPROGRESS_SYNC wait figure shown above, we see that SPID 438 has a last_wait_type of HADR_LOGPROGRESS_SYNC; could it be the owner of the HADR_LOGPROGRESS_SYNC lock? Later investigation confirmed that SPID 438 was indeed holding HADR_LOGPROGRESS_SYNC at that time. But why did it hold the lock for so long? 

 

 

Scheduler Yielding issue 

We checked the output of wait.sql to see whether we could find out why SPID 438 held the lock for more than 2 seconds. In the output, SPID 438's status is "background", so I cannot tell whether it is running or sitting in the runnable queue. To figure out whether this thread is actually running or runnable, we can check the active worker thread of its scheduler: if the scheduler's active worker is the same thread, then we know the thread is running on the scheduler. I wrote the article below to demonstrate how to troubleshoot thread scheduling and yielding:

Troubleshooting SQL Server Scheduling and Yielding 

https://blogs.msdn.microsoft.com/psssql/2018/04/05/troubleshooting-sql-server-scheduling-and-yielding/ 
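Before walking through the full capture, here is a lightweight way to check whether a given session is actually on the CPU: compare its worker with its scheduler's active worker. This is a minimal sketch of that idea, using SPID 438 from this case as the example session:

-- If the scheduler's active worker is this session's worker, the thread is
-- running; otherwise it is waiting in the runnable queue or is suspended.
SELECT t.session_id,
       s.scheduler_id,
       s.yield_count,
       s.runnable_tasks_count,
       CASE WHEN s.active_worker_address = t.worker_address
            THEN 'RUNNING on the scheduler'
            ELSE 'RUNNABLE or SUSPENDED'
       END AS thread_status
FROM sys.dm_os_tasks t
JOIN sys.dm_os_schedulers s
    ON t.scheduler_id = s.scheduler_id
WHERE t.session_id = 438;   -- the HADR_LOGPROGRESS_SYNC owner in this case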

 

I used the same technique to capture logs. The finding is simple: the HADR_LOGPROGRESS_SYNC thread sat in the runnable queue for roughly 100 ms to 1 second, which caused lots of HADR_SYNC_COMMIT waits of the same duration, and that in turn caused the transaction delay spike you saw at the beginning of this article. Here is what the scheduler yield_count looks like:

You can see that the yield_count (27076130) of scheduler 23 does not change within 1 second, which means someone is actively running on the scheduler without yielding to other threads. You can also see that runnable_tasks_count is 7, which means 7 threads are waiting in the runnable queue.
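If you prefer not to eyeball consecutive snapshots, a simple one-second delta makes a non-yielding scheduler stand out. This is a minimal sketch (the scheduler 23 / yield_count 27076130 values above are specific to this case):

-- Snapshot yield_count, wait one second, and compute the per-scheduler delta.
-- A scheduler whose yield_count barely moves while runnable_tasks_count > 0
-- is being held by a worker that is not yielding.
SELECT scheduler_id, yield_count
INTO #yield_before
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

WAITFOR DELAY '00:00:01';

SELECT s.scheduler_id,
       s.yield_count - b.yield_count AS yield_delta_1s,
       s.runnable_tasks_count
FROM sys.dm_os_schedulers s
JOIN #yield_before b
    ON s.scheduler_id = b.scheduler_id
WHERE s.status = 'VISIBLE ONLINE'
ORDER BY yield_delta_1s ASC;

DROP TABLE #yield_before;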

The wait_info XEvent trace also confirms that the HADR_LOGPROGRESS_SYNC thread spent a while waiting in the runnable queue:

You can see that for SPID 438 the signal_duration (2407 ms) is the same as the duration column. This means it sat in the runnable queue for about 2407 ms.

 

Who is holding the scheduler without yielding 

From the investigation above, we understand that the thread that owns the HADR_LOGPROGRESS_SYNC lock cannot get a chance to run on the scheduler in time, and this causes the transaction delay spike (i.e., the sharp transaction rate drop). Using the technique described in the article "Troubleshooting SQL Server Scheduling and Yielding", we finally found that the "offending" thread was running a monitoring script. That script is very long, querying and joining several system DMVs, and it often takes hundreds of milliseconds to run. The script runs every minute to generate SQL health logs. When the worker thread that picks up the HADR_LOGPROGRESS_SYNC message happens to be on the same scheduler, the two compete for the same CPU. In this case, SQL Server ran the monitoring script without yielding for about one second, and that one second of non-yielding scheduler time caused the HADR_LOGPROGRESS_SYNC thread to wait in the runnable queue for one second. As a result, all of the HADR_LOGPROGRESS_SYNC waiters had to wait one second for the lock, which in turn blocked the threads in HADR_SYNC_COMMIT waits for one second as well.

The solution was simple: my customer optimized the monitoring script to shorten its execution time, and the problem was resolved.

 

 

E2 Spotlight - Daniel Kerr


Here is the second in a series of blogs sharing and highlighting the great stories of our amazing MIEEs who attended the E2 Global Education Exchange in Singapore! We asked them to share their experiences of becoming an MIEE as well as some of their personal highlights from the Road to E2! Second up in the series is Daniel Kerr, Head of Computing at Sandymoor School, who attended as one of our MIEExperts.


Tell us about yourself and your MIEE journey so far

I am currently Head of Computing at Sandymoor School in Runcorn. I became an MIEE when Sandymoor was awarded Microsoft Showcase status in 2014. Over the past few years, I have been working alongside Microsoft to improve student outcomes through innovative uses of technology in teaching and learning at Sandymoor. At Sandymoor, Office 365 is readily establishing digitally literate pupils who will thrive in a technological world. Our Computer Science curriculum embraces Microsoft technology and it is built into a whole school scheme of learning. OneNote creates digital exercise books in every Computing lesson and SharePoint and Stream are creating dynamic and collaborative platforms for our students. As a school, we have also dedicated Microsoft Office lessons to encourage students to complete Microsoft Office Specialist exams which is a further example of how we are preparing our students for a digital future.


What were you looking to get out of the E2 experience?

Before heading to E2, I wanted to explore Microsoft apps such as Minecraft Education and Paint 3D and how they could be incorporated into my classroom. Following my visit, I have been able to try out these apps and see how they work in the context of teaching and learning, which is very exciting. I can now really see the benefits of using them and I cannot wait to try them out with my students back in the UK.


Three highlights of your E2 experience were:
• Testing out Minecraft Education for the first time
• Working with teachers from all over the world
• Meeting my fellow E2 Team UK; they really are an amazing and talented group of educators


How will this experience impact on your role back in your institution?

Now that I am back in the UK, I will be getting to work on rolling out Minecraft Education and Paint 3D across the whole school Computer Science curriculum. I am also really excited about looking at how Skype in the classroom could be used with my Microsoft Student Ambassadors as I know they would really enjoy being able to participate in that!


Follow @MrKerr_ICT on Twitter to keep up to date with the great things he is doing using technology.


Business Central and Dynamics NAV Object Ranges


In Business Central running in the Microsoft cloud, we operate with three different object ranges in terms of licensing. Developing for Business Central is done using Visual Studio Code with the AL Language extension. Developing for Dynamics NAV can be done either by using Visual Studio Code with the AL Language extension or by using C/SIDE with an appropriate license file. All tenants in Business Central as of April 2nd, 2018 can freely use objects in the following ranges:

  • 50.000-99.999
  • 1.000.000-60.000.000
  • 70.000.000-74.999.999

In the following, we take a look at the intention of each individual range.

50.000-99.999 

As in current on-premises implementations, this range is for per-tenant/customer customizations. A partner can develop an extension tailored to the individual tenant's needs. The partner does this either by using a sandbox tenant (currently in preview) or by obtaining a Docker image of the current release of Business Central that matches the version of the tenant. Once development is done, the extension can be deployed to the individual tenant.

1.000.000-60.000.000

This is the RSP range, which partners that have an ISV solution for on-premises have access to. As of April 2nd, 2018, a partner can choose to use this range for developing extensions that can be used either in Dynamics NAV on-premises or in Business Central in the Microsoft cloud. When used in Business Central, these extensions are obtained as apps from appsource.microsoft.com.

70.000.000-74.999.999

This range continues as it is today. Partners can obtain ranges for extension development that runs in Business Central in the Microsoft Cloud. This range is only available for extension development and only in Business Central. These extensions are obtained as apps from appsource.microsoft.com.


Migrating from Azure Service Manager (ASM) Virtual Machines to Azure Resource Manager (ARM) VM


Time is up. With the announcement that http://manage.windowsazure.com is no longer available as of 2nd April 2018, it is high time you think about your classic Azure (ASM) resources. Most of the classic services can be managed from the new portal, i.e., https://portal.azure.com.

Here we will consider classic Virtual Machines inside Virtual Networks. This is the simple case, because when you move a classic Virtual Network to an ARM Virtual Network, the Virtual Machines in it are migrated along with it.

There is an excellent article I followed, and it just worked for me: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/migration-classic-resource-manager-ps

This article uses PowerShell. I am assuming that you already have Azure PowerShell configured on your machine.

Step 1: Login and prepare the PowerShell

Log in to your Azure subscription and check that the selected subscription is the one you are working in.
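Here is a minimal sketch of that preparation, assuming both the AzureRM module (used for the Register-AzureRmResourceProvider step below) and the classic Azure module (used by the Move-AzureVirtualNetwork cmdlets) are installed; replace the subscription ID placeholder with your own:

# Log in for the ARM (AzureRm) cmdlets and select the subscription to migrate.
Login-AzureRmAccount
Get-AzureRmSubscription | Select-Object Name, Id
Select-AzureRmSubscription -SubscriptionId "<your-subscription-id>"

# Move-AzureVirtualNetwork is a classic (ASM) cmdlet, so also log in with the
# classic module and point it at the same subscription.
Add-AzureAccount
Select-AzureSubscription -SubscriptionId "<your-subscription-id>"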

Then register the classic infrastructure migration provider.

Register-AzureRmResourceProvider -ProviderNamespace Microsoft.ClassicInfrastructureMigrate

 

This can take some time (5-10 minutes). Check the status by running the command below:

Get-AzureRmResourceProvider -ProviderNamespace Microsoft.ClassicInfrastructureMigrate

 

The status should be "Registered". Wait until the status is "Registered" before you proceed to the next steps.

Step 2: Identifying Classic Virtual Network name

Finding the classic Virtual Network name is tricky; in the new portal it shows something that is not directly usable. If you open the classic Virtual Network, it shows a section called "Virtual network site name (for .cscfg file)" (as below):

Alternatively, you can also run the command below to find the name of the classic VNet:

Get-AzureVnetSite | Select -Property Name

 

Step 3: Virtual Network Movement: Identify, Validate, Prepare and Commit

Identify:

Use the name you found as your VNet name:

$vnetName = "Group ASM-ARM ASM-ARM"

 

Validate:

To validate whether there are any issues, use this command:

Move-AzureVirtualNetwork -Validate -VirtualNetworkName $vnetName


 

If everything is fine, you can proceed with the next step; otherwise, you need to investigate the message.

Prepare:

Then prepare the VNet

Move-AzureVirtualNetwork -Prepare -VirtualNetworkName $vnetName

 

Commit:

If everything above is fine, then commit; otherwise you can roll back with the -Abort switch.

Move-AzureVirtualNetwork -Commit -VirtualNetworkName $vnetName

 

This should move your resources from ASM to ARM. However, I have observed that it creates separate resource groups; you can later move them into a single resource group.

Namoskar!!!

Investigating issues with Release Management Feature in West Europe


Update: Thursday, April 5th 2018 10:54 UTC

Our DevOps team continues to investigate issues with the Release Management feature in West Europe. The root cause is not fully understood at this time. The problem began at 09:00 UTC on Thursday, April 5th. We currently have no estimated time for resolution.

  • Next Update: Before Thursday, April 5th 2018 13:00 UTC

Sincerely,
Niall


Initial Update: Thursday, April 5th 2018 09:43 UTC

We're investigating issues with Release Management Feature in West Europe. We have engaged the relevant engineers to identify the cause and mitigate the issue.
Apologies for any inconvenience this may have caused.

  • Next Update: Before Thursday, April 5th 2018 10:30 UTC

Sincerely,
Kalpana

Publish failed


While I was writing these articles about creating, developing locally, and deploying an Azure Function App, I received the error shown in Figure 1.

  • How to create an Azure Function in Visual Studio
  • How to connect to a database from an Azure Function
  • Deploy an Azure Function created from Visual Studio
  • Check out all my Azure Functions articles here
1>------ Build started: Project: chsharpguitar-func-db, Configuration: Release Any CPU ------
1>Function1.cs(37,81,37,82): error CS1010: Newline in constant
1>Function1.cs(37,82,37,82): error CS1026: ) expected
1>Function1.cs(37,82,37,82): error CS1002: ; expected
1>Function1.cs(43,75,43,76): error CS1010: Newline in constant
1>Function1.cs(43,76,43,76): error CS1026: ) expected
1>Function1.cs(43,76,43,76): error CS1002: ; expected
1>Done building project "chsharpguitar-func-db.csproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
  Publish Started
Function1.cs(37,81): error CS1010: Newline in constant [C:\Users\benperk\source\repos\chsharpguitar-func-db\chsharpguitar-func-db\chsharpguitar-func-db.csproj]
Function1.cs(37,82): error CS1026: ) expected [C:\Users\benperk\source\repos\chsharpguitar-func-db\chsharpguitar-func-db\chsharpguitar-func-db.csproj]
Function1.cs(37,82): error CS1002: ; expected [C:\Users\benperk\source\repos\chsharpguitar-func-db\chsharpguitar-func-db\chsharpguitar-func-db.csproj]
Function1.cs(43,75): error CS1010: Newline in constant [C:\Users\benperk\source\repos\chsharpguitar-func-db\chsharpguitar-func-db\chsharpguitar-func-db.csproj]
Function1.cs(43,76): error CS1026: ) expected [C:\Users\benperk\source\repos\chsharpguitar-func-db\chsharpguitar-func-db\chsharpguitar-func-db.csproj]

Figure 1, publishing failed while deploying to an Azure Function App

It turned out to be a compile issue; I had added some code without testing or compiling it first. I saw the failure reasons in the Output window in Visual Studio, Figure 2.

Figure 2, publishing failed while deploying to an Azure Function App

Once I fixed those compile errors, everything worked fine.

An error occurred


While I was writing these articles about creating, developing locally, and deploying an Azure Function App, I received the error shown in Figure 1.

  • How to create an Azure Function in Visual Studio
  • How to connect to a database from an Azure Function
  • Deploy an Azure Function created from Visual Studio
  • Check out all my Azure Functions articles here
Template deployment failed. Deployment operation statuses:
Failed: /subscriptions/*****/resourceGroups/PRODUCTION-AM2-001/providers/Microsoft.Storage/
storageAccounts/myazurefunctionsstorage ()   error (ResourceNotFound): The Resource 
'Microsoft.Storage/storageAccounts/****' under resource group 'PRODUCTION-AM2-001'
 was not found.  Succeeded: /subscriptions/####/resourceGroups/PRODUCTION-AM2-001
/providers/Microsoft.Web /sites/csharpguitarfuncdb ()


Figure 1, An error occurred while deploying an Azure Function App

This happened when I first attempted to deploy and simultaneously create the Azure Function App and the Azure Function. I did not resolve it; rather, I worked around it. Despite this error the Azure Function App was created, so I published to the existing Function App instead of creating a new one, and it worked.

Deploy an Azure Function created from Visual Studio


I created an Azure Function and added some code to connect to a database, and now I want to deploy it to the Azure platform.

  • How to create an Azure Function in Visual Studio
  • How to connect to a database from an Azure Function
  • Check out all my Azure Functions articles here

Right-click on the Project and select Publish, as seen in Figure 1.


Figure 1, how to publish an Azure Function from Visual Studio, develop local

I chose to create a new Azure Function App, as you can see in Figure 2.


Figure 2, how to publish an Azure Function from Visual Studio, develop local, how to create Function App from Visual Studio

Then I filled out the details as seen in Figure 3.


Figure 3, how to publish an Azure Function from Visual Studio, develop local, how to create Function App from Visual Studio

I then selected the Create button. This actually failed; see "An error occurred". The Function App was created, but not the Function. So I attempted to deploy again by selecting the newly created Function App (see Figure 4), and all worked out fine.


Figure 4, how to publish an Azure Function from Visual Studio, develop local, select existing, how to create Function App from Visual Studio

I selected the Azure Function App, as shown in Figure 5.


Figure 5, how to publish an Azure Function from Visual Studio, develop local, how to create Function App from Visual Studio

And all worked out just fine. When I viewed the Azure Function App in the portal, it looked like what is shown in Figure 6.


Figure 6, how to publish an Azure Function from Visual Studio, develop local, how to create Function App from Visual Studio

As I mentioned in "How to create an Azure Function in Visual Studio": "This is fine; since I decided to create from Visual Studio, I will need to develop, test, and publish from Visual Studio from this point on. Do not think that you can go back and forth between developing in the portal and in Visual Studio. It is either/or, and you need to decide how you want to develop. When created from this direction, I see this, Figure 7, in the portal when I navigate to it."

When I tested originally, I received the exception shown in Figure 7.


Figure 7, how to publish an Azure Function from Visual Studio, develop local, how to create Function App from Visual Studio

I resolved it by adding the "DatabaseConnectionString" application setting, as I discuss in "How to connect to a database from an Azure Function". The issue was that I was reading the value from the local.settings.json file locally, and the value didn't exist in Azure when I published.

Once I added that, all was well, see Figure 8.


Figure 8, how to publish an Azure Function from Visual Studio, develop local, how to create Function App from Visual Studio
