Building LINQ expressions
Jaco Jansen van Vuuren
Software DeveloperI ran into a scenario this week where I needed to build my own expression to query a database and thought that it could make for an interesting post - maybe we could all learn something about LINQ and expressions together.
Expressions?
Expressions are simple a way to strongly type lambda expressions - I find this easier to explain visually.
So looking at the following code sample, is there a way we can write the argument to .Where
any differently?
It turns out that you can - the code to do it is below. If you don't understand it all just yet - don't worry - I'll explain it afterwards.
Essentially the lambda argument is broken down into its essential pieces.
num
is captured withExpression.Parameter
8
is captured withExpression.Constant
>
is captured withExpression.GreaterThan
- and the entire lambda is captured with
Expression.Lambda
Below is an color coded image with each part of the lambda color coded to a respective variable.
Another great way to have a visual look at what happens under the hood is by adding a breakpoint, inspecting the lambda variable and looking at the DebugView
property, you'll see that it looks very similar to the actual lambda that we passed in to the .Where
call originally.
Ok. But why would I do all of this extra work?
It turns out that there are some things that aren't possible with LINQ. I explicitly required an .OrWhere
function and just adding more .Where
calls equates to adding AND statements. Furthermore - I needed to be able to do it dynamically - so I couldn't build the entire lambda upfront with ||
. Joseph Albahari mentions a similar problem on his website (and also a different approach to solving it) which I recommend you read.
The problem I needed to solve
Suppose you have a database with the following structure (I've omitted sales, because it is not needed to illustrate my concept) and you have a service that gives you the location where each product was sold the most - and you now needed to say what the price was of the product at each of those locations. How would you do it (without multiple database queries)?
The service returns a list of TopProductSalesLocation
of which you can see the definition below.
Quick environment setup
This is how I set up the environment for the purpose of this post:
- 100 products
- 100 sellers with 100 locations each (10000 locations in total)
- A link between each product and each location (1000000 in total)
The following (very ugly) code was used to generate my fake data (be warned - it takes a while to run).
The first naive approach
Well - I'd just select all the ProductLocations
's that contain all the product ids and all the location ids. Let's try it.
There are two problems with this:
- You get a cartesian product of 100 * 100 = 10 000 items as a worst case scenario. What happens if you had 200 products, or what about 1000 products?
- You will have to filter out the results in memory again to get back to the original 100 items that you actually wanted.
As you can imagine - this is not going to scale very well.
Expressions to the rescue
Let's do this again - but using expressions and see what the difference is.
We only get the 100 records we wanted - instead of the 10000 that is a result of the cartesian product. Which is a lot easier to work with and will have a smaller footprint.