Entity Framework with Asynchronous behaviours

They finally did it:  the future release of Entity Framework (version 6) will sport asynchronous behaviour based on .NET 4.0 Task Parallel Library (TPL).

The API is pretty neat.  First, SaveChanges gets an asynchronous sibling, SaveChangesAsync, which returns a Task.  So we can now write things like:

await context.SaveChangesAsync();

The more complicated topic is queries.  LINQ was designed before the TPL and doesn’t have a notion of asynchrony.  They got around it in a clever fashion:  LINQ describes the queries while EF lets you enumerate the results in an asynchronous fashion:

var q = from e in context.Employees
        where e.Name.StartsWith("V")
        select e;

await q.ForEachAsync(e => Console.WriteLine(e.FirstName));

So the entire enumeration is done asynchronously, which lets Entity Framework control the moments when it needs to hit the database for more objects.

This new feature is quite powerful since DB access is typically a place where your thread blocks, waiting for something external.  For instance, a web service doing a query and returning data is typically written synchronously, with the thread blocking while waiting for the DB server.  Using this new asynchronous mode, we can just as easily write an asynchronous version, which is much more scalable:  since no threads are blocked, more threads are available to process requests.
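To illustrate (a sketch only:  the context, the Employees DbSet and the method name are my own invention, and ToListAsync assumes the EF async extensions are in scope), such a service method could look like this:

public async Task<List<Employee>> GetEmployeesAsync()
{
    using (var context = new MyDomainContext())
    {
        //    The thread is released back to the pool while SQL Server works
        return await context.Employees
            .Where(e => e.Name.StartsWith("V"))
            .ToListAsync();
    }
}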

Entity Framework 4.1 Series

I did a few Proofs of Concept (POC) around the new Entity Framework 4.1 as part of an SOA project I’m currently on, and I decided to share the acquired knowledge.  The following series of blog posts is the result:

Enjoy!

UPDATE:  If you are more comfortable reading with Chinese characters, haoguanjun did a Chinese translation right over here!

Entity Framework 4.1: Bypassing EF query mapping (8)

This is part of a series of blog posts about Entity Framework 4.1.  The past blog entries are:

In this final article, I’ll cover some ways to bypass EF query mapping.

Like all good frameworks, EF recognizes it isn’t perfect and won’t ever cover every angle, so it provides an opening to the underlying framework, ADO.NET, by allowing you to call the database directly.

There are three (3) APIs opening the hood to ad hoc SQL:

  • DbContext.Database.ExecuteSqlCommand
  • DbContext.Database.SqlQuery
  • DbSet.SqlQuery

The first API isn’t very interesting, as it looks identical to its ADO.NET counterpart:

public int ExecuteSqlCommand(string sql, params object[] parameters);
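Still, for completeness, a quick usage sketch (table and column names are assumed):

//    The {0}-style placeholders are turned into DbParameters,
//    so the values aren’t concatenated into the SQL string
int rowCount = context.Database.ExecuteSqlCommand(
    "UPDATE dbo.LegalEntity SET EntityName = {0} WHERE LegalEntityBaseID = {1}",
    "New name",
    42);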

The second one is more interesting:

public IEnumerable<TElement> SqlQuery<TElement>(string sql, params object[] parameters);

We can use this method to send a SQL command (either a stored procedure call or ad hoc SQL) directly to the database.  It delivers a little more than ADO.NET in that it’s able to convert the data readers directly into objects.

TElement can be any class.  An important characteristic is that EF doesn’t track the returned objects, even if they are entity types.  This contrasts with the third API, part of DbSet, which does track the returned objects.

Let’s try DbContext.Database.SqlQuery:

public IEnumerable<SprocReport> GetEntityList()
{
    return Database.SqlQuery<SprocReport>("SELECT LegalEntityBaseID, EntityName FROM dbo.LegalEntity");
}

It’s good practice to encapsulate those calls inside the DbContext-derived class.  Here the class we use is SprocReport:

public class SprocReport
{
    public int LegalEntityBaseID { get; set; }
    public string EntityName { get; set; }
}

This class isn’t an entity and its properties are mapped directly by name:  you have no control over the mapping.  Even if you use a complex type and override some mappings, those overrides won’t be honoured.

Now if you use DbSet.SqlQuery, the returned entities will be tracked by the change tracker and therefore, if you make modifications to those entities, they will be caught when DbContext.SaveChanges is called.  On the other hand, no column mapping overrides would be honoured there either.
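As a quick sketch of that third API (the Orders DbSet and table name are assumed for the example), the tracked behaviour looks like this:

using (var context = new MyDomainContext())
{
    //    DbSet.SqlQuery returns tracked entities
    var orders = context.Orders
        .SqlQuery("SELECT * FROM dbo.Orders WHERE CustomerName = {0}", "Mac")
        .ToList();

    //    This modification is caught by the change tracker…
    orders.First().CustomerName = "Mac II";

    //    …and persisted here
    context.SaveChanges();
}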

Another way we could bypass EF query-mapping management is by querying the store using Entity SQL.  Remember that Entity Framework maps an entity model to a physical model, and LINQ queries are transformed into queries against the entity model (using eSQL syntax) before being transformed into native queries against the underlying data store (e.g. TSQL).

For instance, we could create an entity set without exposing it:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<SimpleEntry>().HasEntitySetName("MyEntry");
    modelBuilder.Entity<SimpleEntry>().ToTable("MyEntry", "man");
    modelBuilder.Entity<SimpleEntry>()
        .Property(s => s.ID)
        .HasColumnName("SimpleEntryID");
    modelBuilder.Entity<SimpleEntry>()
        .Property(s => s.Name)
        .HasColumnName("SimpleEntryName");

}

We could then expose a query:

public IEnumerable<SimpleEntry> GetSimpleEntries()
{
    IObjectContextAdapter adapter = this;
    var entries = adapter.ObjectContext.CreateQuery<SimpleEntry>("SELECT VALUE MyEntry FROM MyEntry");

    return entries;
}

Here we fetch the underlying ObjectContext in order to query it.  The advantage of this approach over querying SQL directly is that we can compose further LINQ over the resulting query, and the final query sent to SQL will be the combined one.  We can therefore expose a query returning everything and simply refine it with LINQ afterwards, obtaining an efficient query without ever materializing the entire table on the consumer side.

To convince ourselves that what I just said is true, let’s try it:

public IEnumerable<SimpleEntry> GetSimpleEntries()
{
    IObjectContextAdapter adapter = this;
    var entries = adapter.ObjectContext.CreateQuery<SimpleEntry>("SELECT VALUE MyEntry FROM MyEntry");
    var final = from e in entries
                where e.Name == "Mark"
                select e;
    var f = (System.Data.Objects.ObjectQuery<SimpleEntry>)final;
    var s = f.ToTraceString();

    return entries;
}

If you debug this and look at the value of s, you’ll obtain:

SELECT
[Extent1].[SimpleEntryID] AS [SimpleEntryID],
[Extent1].[SimpleEntryName] AS [SimpleEntryName]
FROM [man].[MyEntry] AS [Extent1]
WHERE N'Mark' = [Extent1].[SimpleEntryName]

This is your typical EF-rendered TSQL, but you can notice that the filter applied by the LINQ query made it into the SQL query.

Now if you want to intercept INSERT, UPDATE & DELETE on those tracked entities, you’re a bit on your own.  You could override DbContext.SaveChanges, pick the entities with altered states, perform your own DB logic to save the changes and switch the state of those entities back to Unchanged before calling base.SaveChanges.  It’ll work, but it starts to be a bit on the stunt side.
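A hedged sketch of that stunt, using the SimpleEntry entity from above (the man.UpdateSimpleEntry stored procedure is hypothetical):

public override int SaveChanges()
{
    var modifiedEntries = ChangeTracker.Entries<SimpleEntry>()
        .Where(e => e.State == EntityState.Modified)
        .ToList();

    foreach (var entry in modifiedEntries)
    {    //    Persist the change ourselves
        Database.ExecuteSqlCommand(
            "EXEC man.UpdateSimpleEntry {0}, {1}",
            entry.Entity.ID,
            entry.Entity.Name);
        //    Switch back to Unchanged so base.SaveChanges ignores the entity
        entry.State = EntityState.Unchanged;
    }

    return base.SaveChanges();
}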

Entity Framework 4.1: Inheritance (7)

This is part of a series of blog posts about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover inheritance.

I must say that I did struggle a fair bit with this feature.  Like the rest of EF 4.1, it isn’t officially documented, but it seems the API changed substantially between CTPs and RC, so the blog posts out there were a bit misleading for me.

A good source for me was Morteza Manavi’s blog, which I recommend.

In ORM literature, there are three ways to map tables to object hierarchies (i.e. classes related to each other with inheritance relations):

  • Table per Type (TPT):  for each class in the hierarchy, there is a table in the DB, the tables being related to each other with foreign keys
  • Table per Hierarchy (TPH):  there is only one table for the entire hierarchy, with all possible data in it
  • Table per Concrete Type (TPC):  a bit of a mix of the other two; there is one table for each concrete type, each flattening the abstract types into it as in TPH

Here I’m going to cover TPT & TPH.  The beauty of EF 4.1 is that you can mix those ways of dealing with the mapping within one hierarchy, as I’ll show.

Let’s start with table per type.  Here I define a simple hierarchy with one abstract base class and two derived classes:

public abstract class PersonBase
{
    public int PersonID { get; set; }
    [Required]
    public string FirstName { get; set; }
    [Required]
    public string LastName { get; set; }
    public int Age { get; set; }
}

public class Worker : PersonBase
{
    public decimal AnnualSalary { get; set; }
}

public class Retired : PersonBase
{
    public decimal MonthlyPension { get; set; }
}

We need to tell the model builder how to map those classes to tables:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<PersonBase>().HasKey(x => x.PersonID);
    modelBuilder.Entity<PersonBase>().Property(x => x.PersonID)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
    //    TPT mapping
    modelBuilder.Entity<PersonBase>().ToTable("tpt.Person");
    modelBuilder.Entity<Worker>().ToTable("tpt.Worker");
    modelBuilder.Entity<Retired>().ToTable("tpt.Retired");
}

We use simple table name overrides, but with this information the model builder knows it has to build the database with TPT:

[image:  generated database schema with the tpt.Person, tpt.Worker & tpt.Retired tables]

If we run some code against that model, we can appreciate how we can leverage the mapping.  Basically, we expose one and only one DbSet:  a collection of PersonBase.  EF takes care of figuring out the concrete type of each member.

public static void ManageTPT()
{
    using (var context1 = new TptContext())
    {
        var worker = new Worker
        {
            AnnualSalary = 20000,
            Age = 25,
            FirstName = "Joe",
            LastName = "Plumber"
        };
        var retired = new Retired
        {
            MonthlyPension = 1500,
            Age = 22,
            FirstName = "Mike",
            LastName = "Smith"
        };
        //    Make sure the tables are empty…
        foreach (var entity in context1.Persons)
        {
            context1.Persons.Remove(entity);
        }
        context1.Persons.Add(worker);
        context1.Persons.Add(retired);

        context1.SaveChanges();
    }
    using (var context2 = new TptContext())
    {
        Console.WriteLine("Persons count:  " + context2.Persons.OfType<PersonBase>().Count());
        Console.WriteLine("Worker:  " + context2.Persons.OfType<Worker>().Count());
        Console.WriteLine("Retired:  " + context2.Persons.OfType<Retired>().Count());
    }
}

This is quite powerful since we can access the Workers only and EF takes care of querying only the relevant tables.

TPH is actually the default for EF.  We can simply comment out lines from the previous example to fall back to the default and see the general mechanics:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<PersonBase>().HasKey(x => x.PersonID);
    modelBuilder.Entity<PersonBase>().Property(x => x.PersonID)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
    //    TPT mapping
    //modelBuilder.Entity<PersonBase>().ToTable("tpt.Person");
    //modelBuilder.Entity<Worker>().ToTable("tpt.Worker");
    //modelBuilder.Entity<Retired>().ToTable("tpt.Retired");
}

This results in only one table (for the entire hierarchy):

[image:  generated database schema with a single table for the entire hierarchy]

We notice the entire hierarchy is flattened into one table.  The properties not found in the base class are automatically marked as nullable.  There is also an addition:  a discriminator column.  If we run the previous code sample, we’ll see how this discriminator column is used by default by looking at the content of the table afterwards:

[image:  table content showing the Discriminator column]

The discriminator column is used by EF to know which class to instantiate when it reads a row, since all classes map to the same table.

We can override all that.  Let’s show it while at the same time mixing TPH & TPT.  I’ll define two new subclasses of Worker and I want to map them both, along with the Worker class, into one table:

public class Manager : Worker
{
    public int? ManagedEmployeesCount { get; set; }
}

public class FreeLancer : Worker
{
    [Required]
    public string IncCompanyName { get; set; }
}

You’ll notice that each new property must map to a nullable column.  This is a great inconvenience with TPH:  every property introduced by a subclass must be nullable in the table, since rows of the sibling types have no value for it.  Now we can instruct the model builder:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<PersonBase>().HasKey(x => x.PersonID);
    modelBuilder.Entity<PersonBase>().Property(x => x.PersonID)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
    //    TPT mapping
    modelBuilder.Entity<PersonBase>().ToTable("tpt.Person");
    modelBuilder.Entity<Retired>().ToTable("tpt.Retired");
    //    TPH mapping
    modelBuilder.Entity<Worker>()
        .Map<FreeLancer>(m => m.Requires(f => f.IncCompanyName).HasValue())
        .Map<Manager>(m => m.Requires(ma => ma.ManagedEmployeesCount).HasValue())
        .ToTable("tph.Worker");
}

Here I use one way of discriminating:  I ask that a column belonging to only one class be non-null for a row to be mapped to that class.  This is different from what the default does.

The consumer code isn’t dissimilar to the TPT version (you could change your mapping without impacting the consumer code):

public static void ManageTPH()
{
    using (var context1 = new HierarchyContext())
    {
        var worker = new Worker
        {
            AnnualSalary = 20000,
            Age = 25,
            FirstName = "Joe",
            LastName = "Plumber"
        };
        var freeLancer = new FreeLancer
        {
            Age = 22,
            FirstName = "Mike",
            LastName = "Smith",
            IncCompanyName = "Mike & Mike Inc"
        };
        var manager = new Manager
        {
            Age = 43,
            FirstName = "George",
            LastName = "Costanza",
            ManagedEmployeesCount = 12
        };
        //    Make sure the tables are empty…
        foreach (var entity in context1.Persons)
        {
            context1.Persons.Remove(entity);
        }
        context1.Persons.Add(worker);
        context1.Persons.Add(freeLancer);
        context1.Persons.Add(manager);

        context1.SaveChanges();
    }
    using (var context2 = new HierarchyContext())
    {
        Console.WriteLine("Persons count:  " + context2.Persons.OfType<PersonBase>().Count());
        Console.WriteLine("Worker:  " + context2.Persons.OfType<Worker>().Count());
        Console.WriteLine("Retired:  " + context2.Persons.OfType<Retired>().Count());
        Console.WriteLine("FreeLancer:  " + context2.Persons.OfType<FreeLancer>().Count());
        Console.WriteLine("Manager:  " + context2.Persons.OfType<Manager>().Count());
    }
}

The SQL schema is, as planned, a hybrid of TPT & TPH:

[image:  generated database schema mixing TPT & TPH]

I haven’t found a way to use a discriminator column and override its values.  I do find this way of testing for nulls more interesting than having a discriminator column anyway, since a discriminator introduces yet more denormalization.

Entity Framework 4.1: Optimistic Concurrency (6)

This is part of a series of blog posts about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover optimistic concurrency.

Very often we need to manage multi-user concurrency.  This is due to the speed at which humans interact with a machine compared to the speed at which machines interact with each other.  Typically, a human will use the better part of a minute (often more) to fill a data form, while a machine transaction usually takes less than a second.

We can’t keep SQL transactions open on the server while a user is editing the corresponding form, for many reasons:  contention, trust, technical constraints, etc.  For those reasons, we need to manage concurrency differently.  There are two big schools of concurrency management:  optimistic & pessimistic.  Pessimistic is typically more difficult since you need to implement some record-based locking mechanism, which brings a lot of problems with it, such as how long the system should keep a lock before releasing it automatically.  Optimistic concurrency is easier.

The basic assumption of optimistic concurrency is that user changes rarely clash.  This is true for applications with either few users, or a lot of users but few of them editing the same data at the same time.  The basic idea is that you give the service consumer a version of the data and, when the consumer comes back to update, it gives you back that original version.  If the data changed in the back-end store, the version changed too, so the service can detect it and refuse the update.

Now the ‘version’ of the data can be different things.  It could be a version number associated with your data, it could be a ‘last modified’ date-time field, it could even be several fields.  In Entity Framework, those are called concurrency tokens.  In this article, I’ll use the SQL Server timestamp (rowversion) feature:  this involves adding a column of the timestamp type on the tables where we want to implement optimistic concurrency.  SQL Server takes care of updating this column each time a row is updated.

In order to tell Entity Framework that a property in an entity represents a concurrency token, you can either decorate the property with the ConcurrencyCheck attribute or use the model-builder.  I consider that concurrency tokens should be part of the model since they define business rules, so I’m going to use the attribute:

public class Order
{
    public int OrderID { get; set; }
    [Required]
    [StringLength(32, MinimumLength = 2)]
    public string OrderTitle { get; set; }
    [Required]
    [StringLength(64, MinimumLength=5)]
    public string CustomerName { get; set; }
    public DateTime TransactionDate { get; set; }
    [ConcurrencyCheck]
    [Timestamp]
    public byte[] TimeStamp { get; set; }

    public virtual List<OrderDetail> OrderDetails { get; set; }
    public virtual List<Employee> InvolvedEmployees { get; set; }
}

With this code in, optimistic concurrency will take place when we call SaveChanges on the DbContext.

The timestamp property type is byte[].  By marking the property with the Timestamp attribute, it is mapped to a timestamp-typed column in SQL Server.

Now, let’s simulate a concurrency conflict.  The example here will go in three steps:

  • Create an order
  • Simulate the modification of the order by a user X
  • Simulate the modification of the order by the user Y, while the order has been modified already without Y’s knowledge

The way to simulate modifications is to attach the entity to another DbContext, simulating service boundaries.  When we attach an entity to a DbContext, the context assumes the entity is unmodified, so we modify it afterwards.

private static void ConcurrencyCheck()
{
    Order originalOrder;

    //    Create an order
    using (var context1 = new MyDomainContext())
    {
        originalOrder = new Order
        {
            OrderTitle = "Paper",
            CustomerName = "*Bob*",
            TransactionDate = DateTime.Now
        };

        context1.Orders.Add(originalOrder);
        context1.SaveChanges();
    }
    //    Simulate the modification of the created order by user X
    using (var context2 = new MyDomainContext())
    {    //    Recreate the order object in order to attach it
        var order = new Order
        {
            OrderID = originalOrder.OrderID,
            OrderTitle = originalOrder.OrderTitle,
            CustomerName = originalOrder.CustomerName,
            TransactionDate = originalOrder.TransactionDate,
            TimeStamp = originalOrder.TimeStamp
        };

        context2.Orders.Attach(order);

        //    Alter the order
        order.CustomerName = "Robert";

        context2.SaveChanges();
    }
    //    Simulate the modification of the created order by user Y (after user X already modified it)
    using (var context3 = new MyDomainContext())
    {    //    Recreate the order in order to attach it
        var order = new Order
        {
            OrderID = originalOrder.OrderID,
            OrderTitle = originalOrder.OrderTitle,
            CustomerName = originalOrder.CustomerName,
            TransactionDate = originalOrder.TransactionDate,
            TimeStamp = originalOrder.TimeStamp
        };

        context3.Orders.Attach(order);

        //    Alter the order
        order.CustomerName = "Luke**";

        try
        {
            context3.SaveChanges();
        }
        catch (DbUpdateConcurrencyException ex)
        {
            Console.WriteLine("Concurrency exception on " + ex.Entries.First().Entity.GetType().Name);
        }
    }
}

You could also force EF to believe the order is modified:

context3.Entry(order).State = EntityState.Modified;

So basically, EF supports optimistic concurrency out-of-the-box in a fairly simple form.  The trick is more around state management with EF:  when you attach entities to contexts, make sure they are in the right state:

  • Detached
  • Unchanged
  • Added
  • Deleted
  • Modified

Entity Framework 4.1: Many to Many Relationships (5)

This is part of a series of blog posts about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover the many to many relationships.

Let’s start with the easiest case, we’ll let EF infer the table mapping.  I model a many-to-many relationship between order and employee:

public class Order
{
    public int OrderID { get; set; }
    [Required]
    [StringLength(32, MinimumLength = 2)]
    public string OrderTitle { get; set; }
    [Required]
    [StringLength(64, MinimumLength=5)]
    public string CustomerName { get; set; }
    public DateTime TransactionDate { get; set; }
    public byte[] TimeStamp { get; set; }

    public virtual List<OrderDetail> OrderDetails { get; set; }
    public virtual List<Employee> InvolvedEmployees { get; set; }
}

public class Employee
{
    public int EmployeeID { get; set; }
    public string EmployeeName { get; set; }

    public virtual List<Order> Orders { get; set; }
}

I simply put a list of employees in Order and a list of orders in Employee.  Voila!  Here are the mapped tables:

[image:  mapped tables, including the inferred relation table]

Now, we might want to control two things:

  • The name of the relation table
  • The name of the two columns in the relation table

This can all be done with the following code:

modelBuilder.Entity<Employee>()
    .HasMany(e => e.Orders)
    .WithMany(e => e.InvolvedEmployees)
    .Map(m =>
    {
        m.ToTable("EmployeeOrder");
        m.MapLeftKey("EmployeeID");
        m.MapRightKey("OrderID");
    });

Basically, we say that an employee has many orders and that each order has many involved employees (hence we have a many-to-many relationship).  We then go on to say that the relation table should be named EmployeeOrder, the left key (from the employee perspective, so the employee key) should be named EmployeeID and the right key OrderID:

[image:  the EmployeeOrder relation table with EmployeeID & OrderID columns]

So there you go, you can control a table that doesn’t directly map to a class.

In terms of using such a model, it is quite straightforward and natural:

private static void ManyToMany()
{
    using (var context = new MyDomainContext())
    {
        var order = new Order
        {
            OrderTitle = "Pens",
            CustomerName = "Mcdo’s",
            TransactionDate = DateTime.Now,
            InvolvedEmployees = new List<Employee>()
        };
        var employee1 = new Employee { EmployeeName = "Joe", Orders = new List<Order>() };
        var employee2 = new Employee { EmployeeName = "Black", Orders = new List<Order>() };

        context.Orders.Add(order);

        order.InvolvedEmployees.Add(employee1);
        order.InvolvedEmployees.Add(employee2);

        context.SaveChanges();
    }
}
In this example I didn’t even bother adding the employees to the Employees collection of the data context:  EF took care of that for me since they were referenced in a collection.

Entity Framework 4.1: Complex Types (4)

This is part of a series of blog posts about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover the complex types.

By default, EF 4.1 maps classes to tables.  That is convenient, but sometimes we would like our model to be a bit more granular than the tables.

A classical example is an address class.  Let’s say we have the following client class:

public class Client
{
    public int ClientID { get; set; }
    [Required]
    [StringLength(32, MinimumLength=2)]
    public string ClientName { get; set; }
    public Address ResidentialAddress { get; set; }
    public Address DeliveryAddress { get; set; }
}

public class Address
{
    [Required]
    public int StreetNumber { get; set; }
    [Required]
    [StringLength(32, MinimumLength=2)]
    public string StreetName { get; set; }
}

We do not want the two address properties to map to entries in an Address table.  Instead we want EF 4.1 to map the entire class to one table, flattening the address properties.  How do we go about that?  Enter complex types.

As we’ve seen previously, there is always an attribute path and a model-builder path when we want to override conventions.  As I mentioned, I recommend going with attributes when we enrich the model with business elements (e.g. a required field).  I consider complex types a detail of mapping classes to tables, hence I won’t use attributes.  I’ll do it all the model-builder way:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<Client>().Property(x => x.ClientID)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
    modelBuilder.ComplexType<Address>();
    modelBuilder.Entity<Client>().Property(i => i.ResidentialAddress.StreetNumber).HasColumnName("ResStreetNumber");
    modelBuilder.Entity<Client>().Property(i => i.ResidentialAddress.StreetName).HasColumnName("ResStreetName");
    modelBuilder.Entity<Client>().Property(i => i.DeliveryAddress.StreetNumber).HasColumnName("DelStreetNumber");
    modelBuilder.Entity<Client>().Property(i => i.DeliveryAddress.StreetName).HasColumnName("DelStreetName");

}

First I specify that ClientID is an identity column (auto-incremented).  Then I specify that the Address class is a complex type.  (You could also put [ComplexType] on top of the class if you prefer.)  Then I map each sub-property (giving the path of the sub-property in the lambda expression) to a column name.  This generates the following table:

[image:  the generated Client table with the flattened address columns]

Now I can use the models:

    using (var context1 = new MyDomainContext())
    {
        var client = new Client
        {
            ClientName = "Joe",
            ResidentialAddress = new Address
            {
                StreetNumber = 15,
                StreetName = "Oxford"
            },
            DeliveryAddress = new Address
            {
                StreetNumber = 514,
                StreetName = "Nolif"
            }
        };
        context1.Clients.Add(client);

        context1.SaveChanges();
    }
    using (var context2 = new MyDomainContext())
    {
        var clients = from w in context2.Clients
                      where w.ClientName == "Joe"
                      select w;

        foreach (var client in clients)
        {
            Console.WriteLine("client residential StreetNumber:  " + client.ResidentialAddress.StreetNumber);
            Console.WriteLine("client residential StreetName:  " + client.ResidentialAddress.StreetName);
            Console.WriteLine("client delivery StreetNumber:  " + client.DeliveryAddress.StreetNumber);
            Console.WriteLine("client delivery StreetName:  " + client.DeliveryAddress.StreetName);
        }
    }

The big caveat with complex types is null management.  Even if all the properties in the complex type are nullable, you can’t just hand out a null complex type.  For instance, in this case, even if StreetName & StreetNumber weren’t required, I couldn’t have a client with a null residential address:  I would need to new up an Address with everything null in it.  Likewise, when you retrieve an entity, EF 4.1 will always new up the complex type, even if everything in it is null.
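For instance (assuming, for the sake of the sketch, that the Required attributes were removed from Address), a client without a delivery address would still need an empty Address instance:

var client = new Client
{
    ClientName = "Jane",
    ResidentialAddress = new Address
    {
        StreetNumber = 1,
        StreetName = "Main"
    },
    //    "No delivery address" must be expressed as an empty instance, not null
    DeliveryAddress = new Address()
};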

Entity Framework 4.1 Release to Web (RTW)

Two weeks ago, Microsoft published a Release Candidate (RC)  of Entity Framework 4.1.  They promised to release the product within 4 weeks.

Well, they released it today!

So you can now use Entity Framework 4.1, which is a fully supported Microsoft product.

The only change from the RC is that the default length of nvarchar columns went from 128 to MAX.

You can follow my series of blog posts on Entity Framework 4.1 to learn more about the technology.

Entity Framework 4.1: Deep Fetch vs Lazy Load (3)

This is part of a series of blog posts about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover the control of what is getting loaded in queries.

EF 4.1 is able to manage relations.  Now, which relations get loaded when you run a query?  Everything “visible” to the object?  In some cases that might make sense (e.g. when you query an entity that has only one child entity), but in many cases you would end up loading a good portion of the database, or at least much more data than you would like.

By default, EF 4.1 loads only the entities in the query, but it supports two features to help you control what is loaded:

  • Deep Fetch
  • Lazy Load

—————————-

UPDATE:  Typically, deep fetch is referred to as “Eager loading”.  Sorry for inventing a term!

—————————-

I came up with the name deep fetch; I’m not sure there is an official one for that mechanism.  If there is, feel free to tell me.  Anyhow, this mechanism allows you to specify entities you would like to be loaded along the way:

using (var context = new MyDomainContext())
{
    var orders = from o in context.Orders.Include("OrderDetails")
                 where o.CustomerName == "Mac"
                 select o;

Here I specify that I want to load certain orders (the ones with a Mac as customer name) and that I want the order details of those orders to be loaded along the way.

You can actually look at the generated SQL query:

Console.WriteLine(orders.ToString());

EF 4.1 doesn’t generate easy-to-read queries and they quickly become impossible to decipher, but this one you should be able to read, and you’ll see that the order details are loaded with it.

Now this brings up a general issue of EF 4.1 regarding deep fetch:  query efficiency.  If you give a graduate the exercise of writing a query to retrieve the orders and order details, chances are they’re going to write something functionally equivalent to the generated query.  If you’re lucky they might be smarter and write a query returning two result sets:  one for the orders and one for the details.  This would be much more efficient since you wouldn’t repeat all the order information for each order detail.  For some reason EF never supported that.  Probably because it isn’t supported across all database systems, and EF would be forced to generate two SQL queries for one LINQ query, which would probably open a can of worms.  Anyway, keep this in mind as it can easily get ugly on the performance side.

Anyhow, you can request more than one sub-collection be brought along:

var orders = from o in context.Orders.Include("OrderDetails").Include("Businesses")
             where o.CustomerName == "Mac"
             select o;

The other feature you can use to control what’s brought along is lazy loading.  By default, lazy loading is supported.  If you want to disable it, you need to do it explicitly.  The best place would be in the constructor of your db context:

public MyDomainContext()
{
    this.Configuration.LazyLoadingEnabled = false;
}

Now lazy loading works as you would expect:  you request an entity-set, it gets loaded and if you try to access a sub-collection of an entity, it gets loaded on the fly, auto-magically!

How does EF know you’re trying to access a sub-collection?  Your collections are POCO collections (e.g. List&lt;EntityType&gt;), so no events are raised when you access them.  Anyone?  It generates dynamic proxy types deriving from your entities and overrides your sub-collection accessor properties.  Yes, it does.  This is why you need to mark your sub-collection accessor properties with the keyword virtual in order for the magic to operate:

public class Order
{
    public int OrderID { get; set; }
    public string OrderTitle { get; set; }
    public string CustomerName { get; set; }
    public DateTime TransactionDate { get; set; }
    public virtual List<OrderDetail> OrderDetails { get; set; }
    public virtual List<Business> Businesses { get; set; }
}

Let’s give some characteristics of the two mechanisms.  For deep fetch:

  • Reduces Latency (it fetches all data in one trip to the DB server)
  • You need to know in advance what you’re going to need and be explicit about it

Lazy Loading:

  • Very forgiving:  since it just loads the data on request, you do not need to plan in advance
  • Can kill performance because of latency (think of a loop over parent entities with a lazy load of each parent’s children in the body of the loop)

Now, when should we use which mechanism?  I’ll give you my guidelines here; feel free to bring up other ones.  I would use lazy loading except when you have loops with a lazy load in the body of the loop.  Lazy loading might create 2-3 server queries instead of one, but that is still acceptable, especially given the shortcomings of the deep fetch query mechanism.
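To make the loop scenario concrete, here is a sketch of the anti-pattern and its deep-fetch fix (using the Order entity from the earlier samples):

//    Anti-pattern:  one extra DB round-trip per order (lazy load in a loop)
foreach (var order in context.Orders)
{
    Console.WriteLine(order.OrderDetails.Count);
}

//    Better in this case:  eager-load the details in a single query
foreach (var order in context.Orders.Include("OrderDetails"))
{
    Console.WriteLine(order.OrderDetails.Count);
}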