Entity Framwork’s Include Function Produces Inner Joins Instead of Outer Joins

Published on Author nickriggs6 Comments

I ran into an issue with Entity Framework and Outer Joins when using the Include function. Consider the following problem:

I have Categories that can have many Parent Categories and many Child Categories. Therefore, I have a Hierarchy table that has ParentCategoryId and ChildCategoryId.

Now, picture an UI where you drill into the child categories. Each time you ask for another set of child entities, you need to know whether or not to display the “+” icon next to a particular child entity. You would only display the icon if that entity had children of it’s own.

What I would like to do is query the Hierarchy table for ParentCategoryId = X and get all the ChildCategories and as well as those ChildCategory’s ChildHeriarchy entities.

My first attempt at the EF expression was:

data.Hierarchies
    .Include("ChildCategory.ChildHierarchies")
    .Where(h => h.ParentCategoryId == 1);

I instruct EF to include the Parent Category and the Parent Category’s Child Hierarchies. Clean enough. However, I noticed a real problem with the SQL that was produced:

SELECT
[Project1].[HierarchyId] AS [HierarchyId],
[Project1].[ParentCategoryId] AS [ParentCategoryId],
[Project1].[ChildCategoryId] AS [ChildCategoryId],
[Project1].[CategoryId] AS [CategoryId],
[Project1].[Name] AS [Name],
[Project1].[C1] AS [C1],
[Project1].[HierarchyId1] AS [HierarchyId1],
[Project1].[ParentCategoryId1] AS [ParentCategoryId1],
[Project1].[ChildCategoryId1] AS [ChildCategoryId1]
FROM ( SELECT
	[Extent1].[HierarchyId] AS [HierarchyId],
	[Extent1].[ParentCategoryId] AS [ParentCategoryId],
	[Extent1].[ChildCategoryId] AS [ChildCategoryId],
	[Extent2].[CategoryId] AS [CategoryId],
	[Extent2].[Name] AS [Name],
	[Extent3].[HierarchyId] AS [HierarchyId1],
	[Extent3].[ParentCategoryId] AS [ParentCategoryId1],
	[Extent3].[ChildCategoryId] AS [ChildCategoryId1],
	1 AS [C1]
	FROM   [dbo].[Hierarchy] AS [Extent1]
	INNER JOIN [dbo].[Categories] AS [Extent2] ON [Extent1].[ChildCategoryId] = [Extent2].[CategoryId]
	INNER JOIN [dbo].[Hierarchy] AS [Extent3] ON ([Extent1].[ChildCategoryId] = [Extent2].[CategoryId]) AND ([Extent3].[ParentCategoryId] = [Extent2].[CategoryId])
	WHERE 1 = [Extent1].[ParentCategoryId]
)  AS [Project1]
ORDER BY [Project1].[HierarchyId] ASC, [Project1].[CategoryId] ASC, [Project1].[C1] ASC

Notice the join where it gets the Child Category’s children – it’s an INNER JOIN. Which means it will filter out any Child Category that doesn’t have children of its own: Definitely not what I was needing.

The question is why? Using Include in EF usually produces OUTER JOINs. It would seem the issue is in the direction I am asking the question. Since I’m starting with a child table and joining to a parent table – that join must be an INNER JOIN. And even though the next join to the child table could/should be an OUTER JOIN, EF seems to only use INNER JOINs once an INNER JOIN has been used in a chain.

The answer is to ask the question starting from the parent table. Let’s try:

data.Categories
    .Include("ChildHierarchies")
    .Where(c =>
        data.Hierarchies
            .Where(ch => ch.ParentCategoryId == 1)
            .Select(ch => ch.ChildCategoryId)
            .Contains(c.CategoryId));

The new SQL output looks much better; notice the hierarchy join is an OUTER JOIN.

SELECT
[Project2].[CategoryId] AS [CategoryId],
[Project2].[Name] AS [Name],
[Project2].[C1] AS [C1],
[Project2].[HierarchyId] AS [HierarchyId],
[Project2].[ParentCategoryId] AS [ParentCategoryId],
[Project2].[ChildCategoryId] AS [ChildCategoryId]
FROM ( SELECT
	[Extent1].[CategoryId] AS [CategoryId],
	[Extent1].[Name] AS [Name],
	[Extent2].[HierarchyId] AS [HierarchyId],
	[Extent2].[ParentCategoryId] AS [ParentCategoryId],
	[Extent2].[ChildCategoryId] AS [ChildCategoryId],
	CASE WHEN ([Extent2].[HierarchyId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
	FROM  [dbo].[Categories] AS [Extent1]
	LEFT OUTER JOIN [dbo].[Hierarchy] AS [Extent2] ON [Extent1].[CategoryId] = [Extent2].[ParentCategoryId]
	WHERE  EXISTS (SELECT
		1 AS [C1]
		FROM [dbo].[Hierarchy] AS [Extent3]
		WHERE (1 = [Extent3].[ParentCategoryId]) AND ([Extent3].[ChildCategoryId] = [Extent1].[CategoryId])
	)
)  AS [Project2]
ORDER BY [Project2].[CategoryId] ASC, [Project2].[C1] ASC

6 Responses to Entity Framwork’s Include Function Produces Inner Joins Instead of Outer Joins

  1. I have 3 tables ,First Table have Primary Key , Second Table have primary Key. The relationship between these 2 table is stored in third table .
    In entity frmework while genearting the entities 3 table is lost. How I can join between these two tables without changing the table structure?

    Thanks in advance.

  2. I have 3 tables ,First Table have Primary Key , Second Table have primary Key. The relationship between these 2 table is stored in third table .
    In entity frmework while genearting the entities 3rd table is lost. How I can join between these two tables without changing the table structure?

    Thanks in advance.

  3. Very cute dress! i love it, i think gladiators would go with it. Maybe black or seivlr. Look them up in google and you can see what im talking about.. i have some of my own and they are very comfortable.

  4. This can also happen if you have setup your navigation property with the WithRequired() method in a code first approach. For example, in OnModelCreating(DbModelBuilder modelBuilder) method, if you had the following, an Include() would produce INNER JOIN statements:

    modelBuilder.Entity()
    .HasMany(e => e.Children)
    .WithRequired(e => e.Parent);

    Changing the WithRequired() call to WithOptional(), calls to Include() would produce a LEFT OUTER JOIN statement.

Leave a Reply

Your email address will not be published. Required fields are marked *