Akash Verenkar MSIT, Microsoft Corporation
One recurring pattern that most application developers encounter is applying a set of business rules to large amounts of data. When developing such applications, developers face the arduous task of ensuring that voluminous data is processed with satisfactory performance. Since applying business logic to data is CPU intensive, developers often leverage concurrency to achieve the required performance. However, developing concurrent programs is extremely hard and drifts the focus away from the main business problem. The .NET Framework 4 introduces a new programming model for writing concurrent code that greatly simplifies this work.

In this article I will explain how the new programming model in .NET Framework 4 can be used to achieve data parallelism. I will demonstrate how you can build a robust and scalable application that processes data faster while reducing the load on the database. Further, all this is achieved without the hassle of creating threads or dealing directly with the thread pool.

First, we will create a simple hypothetical example to explain the business problem and the solution approach. Then I'll show you how to apply the parallel programming model in .NET Framework 4 to attain data parallelism. As part of building this example, I will touch upon Parallel Language Integrated Query (PLINQ) and show how it can process data faster than LINQ simply by adding '.AsParallel()' to LINQ constructs. We will also look at new constructs such as 'Parallel.For' and thread-safe data structures.
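As a brief preview of the idea, the sketch below shows how a sequential LINQ query becomes a parallel one by adding '.AsParallel()'. The ApplyBusinessRules method here is a hypothetical stand-in for CPU-intensive business logic, not code from the article's application:

```csharp
using System;
using System.Linq;

class PlinqPreview
{
    // Hypothetical stand-in for a CPU-intensive business rule.
    static int ApplyBusinessRules(int record) => record * record;

    static void Main()
    {
        var records = Enumerable.Range(1, 1000);

        // Sequential LINQ.
        var sequential = records.Select(ApplyBusinessRules).ToList();

        // PLINQ: adding .AsParallel() lets the runtime partition the work
        // across cores. Note that output ordering is no longer guaranteed
        // unless .AsOrdered() is also specified.
        var parallel = records.AsParallel()
                              .Select(ApplyBusinessRules)
                              .ToList();

        Console.WriteLine(sequential.Sum() == parallel.Sum());
    }
}
```

Because PLINQ may reorder results, the comparison above checks the sums rather than element-by-element equality.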
For the sake of brevity, this article assumes that readers have a good understanding of the following technologies:

- .NET Framework
- C#
- LINQ
- XML data types
- Lambda expressions
Primary Audience: IT professionals and software application developers who want to use the new features in .NET Framework 4 to achieve data/task parallelism and improve scalability across cores.
Secondary Audience: Any software developers who want a broad understanding of the new parallel constructs in .NET Framework 4.
In a typical enterprise, data is stored in databases such as SQL Server 2008, and different applications consume this data for various purposes. These applications may use a data access layer, built using .NET, to access the data; this layer can use dynamic SQL or stored procedures. Consider a scenario where we have 'n' records (X1, X2, ..., Xn) and, for each record X, there are 't' rows (T1, T2, ..., Tt) in the database, where 'n' and 't' are each approximately 1000 and the number of rows 't' varies across different Xs. In all, there are about 1000 × 1000 rows in the database for the entire input record set (X1, X2, ..., Xn). Figure 1 gives a pictorial representation of the data. Note that each colored group can be processed independently.
[Figure 1: Pictorial representation of the rows in the database for records X1, X2, X3, ..., Xn. Each record's group of rows can be processed independently.]
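Since each record's group of rows is independent, the per-record processing loop can be sketched with 'Parallel.For' and a thread-safe collection. The FetchRows and ProcessRows methods below are hypothetical placeholders for the data access call and the CPU-intensive business logic, not the article's actual implementation:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class ParallelProcessingSketch
{
    // Hypothetical: fetch the rows T1..Tt for record Xi from the database.
    static int[] FetchRows(int recordId) =>
        Enumerable.Range(0, 1000).Select(t => recordId + t).ToArray();

    // Hypothetical CPU-intensive business logic over one record's rows.
    static long ProcessRows(int[] rows) => rows.Sum(r => (long)r);

    static void Main()
    {
        const int n = 1000; // number of records X1..Xn
        var results = new ConcurrentBag<long>(); // thread-safe result set

        // Each iteration handles one independent group of rows, so the
        // iterations can run concurrently without locking shared state.
        Parallel.For(1, n + 1, recordId =>
        {
            var rows = FetchRows(recordId);
            results.Add(ProcessRows(rows));
        });

        Console.WriteLine(results.Count);
    }
}
```

ConcurrentBag<T> is used instead of List<T> because multiple iterations may add results at the same time; an ordinary list is not safe for concurrent writes.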
We need to fetch the data from the database, apply business rules, and create a processed result set. The logic that applies the business rules is CPU intensive. There are different approaches to solving this problem, and I will explain how I arrived at our solution using the new data parallelism features in .NET Framework 4.

Solution Approaches
As mentioned before, there are various approaches to solving the business problem of retrieving voluminous data from a data source and processing it. In this section, I will delineate the three approaches that I have used to solve the problem. For simplicity, let's assume our application is a 3-tier application:

Front End: This is the customer-facing layer that sends...