Entity Framework with Asynchronous behaviours

They finally did it:  the future release of Entity Framework (version 6) will sport asynchronous behaviour based on .NET 4.0 Task Parallel Library (TPL).

The API is pretty neat.  First the SaveChanges gets an async brother SaveChangesAsync returning a Task.  So we can now write things like:

await context.SavesChangesAsync();

The more complicated topic is the queries.  LINQ was designed before TPL and doesn’t have the notion of asynchrony.  They got around it in a clever fashion:  LINQ describe the queries while EF allows you to enumerate the result in an asynchronous fashion:

var q = from e in context.Employees

  where e.Name.StartsWith("V")

  select e;

 

await q.ForEachAsync(e => Console.WriteLine(e.FirstName));

So the entire enumeration is done asynchronously hence Entity Framework can manage the moment when it needs to fetch the DB for new objects.

This new feature is quite powerful since DB access is typically a place where your thread blocks, waiting for something external.  For instance, a web service doing a query and returning data is typically written synchronously with the thread blocking waiting for the DB server.  Using this new asynchronous mode, we can as easily write an asynchronous version, much more scalable since no threads are blocking, hence more thread can be used to process requests.

Entity Framework 4.1 Series

I did a few Proof of Concepts (POC) around the new Entity Framework 4.1 as part of an SOA project I’m currently on and I decided to share the acquired knowledge.  The following series of blog posts are the result:

Enjoy!

UPDATE:  If you are more comfortable reading with Chinese characters, haoguanjun did a Chinese translation right over here!

Entity Framework 4.1: Bypassing EF query mapping (8)

This is part of a series of blog post about Entity Framework 4.1.  The past blog entries are:

In this final article, I’ll cover some ways to bypass EF query mapping.

As all good framework, EF recognize it’s not perfect and won’t ever cover all the angles and it does support opening to the underlying Framework, ADO.NET, by allowing you to call the database directly.

There are three (3) APIs opening the hood to ad-hoc:

  • DbContext.Database.ExecuteSqlCommand
  • DbContext.Database.SqlQuery
  • DbSet.SqlQuery

The first API isn’t very interesting, as it looks identical to its ADO.NET counterpart:

public int ExecuteSqlCommand(string sql, params object[] parameters);

The second one is more interesting:

public IEnumerable<TElement> SqlQuery<TElement>(string sql, params object[] parameters);

We can use this method to send SQL command (either a stored procedure call or ad-hoc SQL) directly to the database.  It delivers a little more than ADO.NET in that it’s able to convert the data-readers directly in entities.

TElement could be any class.  An important characteristic is that EF doesn’t track returned object, even if they are entity types.  This contrasts with the third API, part of DbSet, which tracks returned objects.

Let’s try DbContext.Database.SqlQuery:

public IEnumerable<SprocReport> GetEntityList()
{
    return Database.SqlQuery<SprocReport>("SELECT LegalEntityBaseID, EntityName FROM dbo.LegalEntity");
}

It’s good practice to encapsulate those calls inside the DbContext derived class.  Here the class we use is SprocReport:

public class SprocReport
{
    public int LegalEntityBaseID { get; set; }
    public string EntityName { get; set; }
}

This class isn’t an entity and its property are mapped directly:  you have no control on the mapping.  Even if you use a complex type and override some mapping, those overrides won’t be honoured.

Now if you use DbSet.SqlQuery, the returned entities will be tracked by the change tracker and therefore, if you make modifications to those entities, they will be caught when DbContext.SaveChanges is called.  On the other hand, no column mapping overrides would be honoured there either.

Another way we could bypass EF query-mapping management would be by querying the store using Entity SQL.  Remember the Entity Framework maps an entity model to a physical model and LINQ queries are transformed in queries to the entity model (using eSQL syntax) before being transformed to native queries to the underlying data store (e.g. TSQL).

For instance, we could create an entity set without exposing it:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<SimpleEntry>().HasEntitySetName("MyEntry");
    modelBuilder.Entity<SimpleEntry>().ToTable("MyEntry", "man");
    modelBuilder.Entity<SimpleEntry>()
        .Property(s => s.ID)
        .HasColumnName("SimpleEntryID");
    modelBuilder.Entity<SimpleEntry>()
        .Property(s => s.Name)
        .HasColumnName("SimpleEntryName");

    }

We could then expose a query:

public IEnumerable<SimpleEntry> GetSimpleEntries()
{
    IObjectContextAdapter adapter = this;
    var entries = adapter.ObjectContext.CreateQuery<SimpleEntry>("SELECT VALUE MyEntry FROM MyEntry");

    return entries;
}

Here we fetch the underlying ObjectContext in order to query it.  There advantage of this approach versus querying SQL directly is that we can query the resulting query further using LINQ and the final query sent to SQL will be the combined query.  Therefore we can expose a query returning everything and simply query it with LINQ afterwards to obtain efficient query without ever materializing the entire table on the consumer side.

To convince ourselves that what I just said is true, let’s try it:

public IEnumerable<SimpleEntry> GetSimpleEntries()
{
    IObjectContextAdapter adapter = this;
    var entries = adapter.ObjectContext.CreateQuery<SimpleEntry>("SELECT VALUE MyEntry FROM MyEntry");
    var final = from e in entries
                where e.Name == "Mark"
                select e;
    var f = (System.Data.Objects.ObjectQuery<SimpleEntry>)final;
    var s = f.ToTraceString();

    return entries;
}

If you debug this and look at the value of s, you’ll obtain:

SELECT
[Extent1].[SimpleEntryID] AS [SimpleEntryID],
[Extent1].[SimpleEntryName] AS [SimpleEntryName]
FROM [man].[MyEntry] AS [Extent1]
WHERE N’Mark’ = [Extent1].[SimpleEntryName]

This is your typical EF rendered TSQL but you can notice that the filter applied by the LINQ query got into the SQL query.

Now if you want to intercept INSERT, UPDATE & DELETE on those tracked entities, you’re a bit on your own.  You could override the DbContext.SaveChanges, pick entities with altered states, perform you db-logic to save the changes and switch the state of those entities back to Unmodified before calling base.SaveChanges.  It’ll work but it’s start to be a bit on the stunt side.

Entity Framework 4.1: Inheritance (7)

This is part of a series of blog post about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover inheritance.

I must say that I did struggle a fair bit with this feature.  Like the rest of EF 4.1, it isn’t officially documented, but it seems the API changed substantially between CTPs and RC, so the blog posts out there were a bit misleading for me.

A good source for me was Morteza Manavi’s blog, which I recommend.

In ORM literature, there are three ways to map tables to object hierarchies (i.e. classes related to each others with inheritance relations):

  • Table per Type (TPT):  for each class in the hierarchy, there is a table in the DB, related to each others with FK
  • Table per Hierarchy (TPH):  there is only one table for the entire hierarchy with all possible data in it
  • Table per Concrete Type (TPC):  a bit of mix of the other two, there is one table for each concrete type, which flattens the abstract types as in TPH

Here I’m going to cover TPT & TPH.  The beauty of EF 4.1 is that you can mix those ways of dealing with the mapping within one hierarchy, as I’ll show.

Let’s start with table per type.  Here I define a simple hierarchy with one abstract base class and two derived classes:

public abstract class PersonBase
    {
        public int PersonID { get; set; }
        [Required]
        public string FirstName { get; set; }
        [Required]
        public string LastName { get; set; }
        public int Age { get; set; }
    }

public class Worker : PersonBase
    {
        public decimal AnnualSalary { get; set; }
    }

public class Retired : PersonBase
{
    public decimal MonthlyPension { get; set; }
}

We need to tell the model builder how to map those classes to tables:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<PersonBase>().HasKey(x => x.PersonID);
    modelBuilder.Entity<PersonBase>().Property(x => x.PersonID)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
    //    TPT mapping
    modelBuilder.Entity<PersonBase>().ToTable("tpt.Person");
    modelBuilder.Entity<Worker>().ToTable("tpt.Worker");
    modelBuilder.Entity<Retired>().ToTable("tpt.Retired");
}

We use simple table name overrides, but with this information the model builder knows it has to build the database with TPT:

image

If we run some code against that model, we can appreciate how we can leverage the mapping.  Basically, we expose one and only one DbSet:  a collection of PersonBase.  EF takes care of managing which real type each member should be.

public static void ManageTPT()
{
    using (var context1 = new TptContext())
    {
        var worker = new Worker
        {
            AnnualSalary = 20000,
            Age = 25,
            FirstName = "Joe",
            LastName = "Plumber"
        };
        var retired = new Retired
        {
            MonthlyPension = 1500,
            Age = 22,
            FirstName = "Mike",
            LastName = "Smith"
        };
        //    Make sure the tables are empty…
        foreach (var entity in context1.Persons)
        {
            context1.Persons.Remove(entity);
        }
        context1.Persons.Add(worker);
        context1.Persons.Add(retired);

        context1.SaveChanges();
    }
    using (var context2 = new TptContext())
    {
        Console.WriteLine("Persons count:  " + context2.Persons.OfType<PersonBase>().Count());
        Console.WriteLine("Worker:  " + context2.Persons.OfType<Worker>().Count());
        Console.WriteLine("Retired:  " + context2.Persons.OfType<Retired>().Count());
    }
}

This is quite powerful since we can access the Workers only and EF takes care of accessing only the Worker table.

TPH is actually the default for EF.  We could simply comment out lines from the previous example to fallback on the default to see the general mechanic:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<PersonBase>().HasKey(x => x.PersonID);
    modelBuilder.Entity<PersonBase>().Property(x => x.PersonID)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
    //    TPT mapping
    //modelBuilder.Entity<PersonBase>().ToTable("tpt.Person");
    //modelBuilder.Entity<Worker>().ToTable("tpt.Worker");
    //modelBuilder.Entity<Retired>().ToTable("tpt.Retired");
}

This results in only one table (for the entire hierarchy):

image

We notice the entire hierarchy is flatten within one table.  The properties not found in the base class are automatically marked as nullable.  There is also an addition:  a discriminator column.  If we run the previous code sample, we’ll see how this discriminator column is used by default by looking at the content of the table afterwards:

image

The discriminator column is used by EF to know which class to instantiate when it reads a row, since all classes map to the same table.

We can override all that.  Let’s show that at the same time as showing the mix of TPH & TPT.  I’ll define two new subclasses of Worker and I want to map them both, and the Worker class, in one table:

public class Manager : Worker
{
    public int? ManagedEmployeesCount { get; set; }
}

public class FreeLancer : Worker
{
    [Required]
    public string IncCompanyName { get; set; }
}

You’ll notice that each property must be null.  This is a great inconvenience with TPH:  every property must be nullable.  Now we can instruct the model builder:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<PersonBase>().HasKey(x => x.PersonID);
    modelBuilder.Entity<PersonBase>().Property(x => x.PersonID)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
    //    TPT mapping
    modelBuilder.Entity<PersonBase>().ToTable("tpt.Person");
    modelBuilder.Entity<Retired>().ToTable("tpt.Retired");
    //    TPH mapping
    modelBuilder.Entity<Worker>()
        .Map<FreeLancer>(m => m.Requires(f => f.IncCompanyName).HasValue())
        .Map<Manager>(m => m.Requires(ma => ma.ManagedEmployeesCount).HasValue())
        .ToTable("tph.Worker");
}

Here I use one way of discriminating:  I asked that a column belonging only to one class be not-null for a row to be mapped to that class.  This is different than what the default does.

The consumer code isn’t dissimilar than TPT (you could change your mapping without impacting the consumer code):

public static void ManageTPH()
{
    using (var context1 = new HierarchyContext())
    {
        var worker = new Worker
        {
            AnnualSalary = 20000,
            Age = 25,
            FirstName = "Joe",
            LastName = "Plumber"
        };
        var freeLancer = new FreeLancer
        {
            Age = 22,
            FirstName = "Mike",
            LastName = "Smith",
            IncCompanyName = "Mike & Mike Inc"
        };
        var manager = new Manager
        {
            Age = 43,
            FirstName = "George",
            LastName = "Costanza",
            ManagedEmployeesCount = 12
        };
        //    Make sure the tables are empty…
        foreach (var entity in context1.Persons)
        {
            context1.Persons.Remove(entity);
        }
        context1.Persons.Add(worker);
        context1.Persons.Add(freeLancer);
        context1.Persons.Add(manager);

        context1.SaveChanges();
    }
    using (var context2 = new HierarchyContext())
    {
        Console.WriteLine("Persons count:  " + context2.Persons.OfType<PersonBase>().Count());
        Console.WriteLine("Worker:  " + context2.Persons.OfType<Worker>().Count());
        Console.WriteLine("Retired:  " + context2.Persons.OfType<Retired>().Count());
        Console.WriteLine("FreeLancer:  " + context2.Persons.OfType<FreeLancer>().Count());
        Console.WriteLine("Manager:  " + context2.Persons.OfType<Manager>().Count());
    }
}

The SQL schema is, as planned, an hybrid of TPT & TPH:

image

I haven’t found the way to use a discriminator and override it.  I do find this way of testing for nulls more interesting than having a discriminator column, which introduces yet more denormalization.

Entity Framework 4.1: Optimistic Concurrency (6)

This is part of a series of blog post about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover optimistic concurrency.

Very often we need to manage many-users concurrency.  This is due to the speed at which humans interact with a machine compared to the speed machines interact with each other.  Typically, a human will use the better part of a minute (often more) to fill a data form while a machine transactions usually takes less than a second.

We can’t keep SQL transactions open on the server while a user is editing the corresponding form for many reasons:  contention, trust, technical reasons, etc.  .  For those reasons, we need to manage concurrency differently.  There are two big schools on concurrency management:  optimistic & pessimistic.  Pessimistic is typically more difficult since you need to implement some record-base locking mechanism and with it comes a lot problematic, such as how long should the system keep a lock before releasing it automatically.  Optimistic concurrency is easier.

The basic assumption of optimistic concurrency is that user changes rarely clashes.  This is true for applications with either few users or a lot of users but few users editing the same data at the same time.  The basic idea is that you give the service consumer a version of the data and when the consumer comes back to update, it gives you back that original version.  If the data changed in the back-end store and the version changed, the service can detect it and refuse the update.

Now the ‘version’ of the data can be different things.  It could be a version number associated to your data, it could be a ‘last modified date’ date-time field, it could be more than one field.  In Entity Framework, those are called concurrency token.  In this article, I’ll use SQL Server time-stamp feature:  this involves adding a column with the time-stamp type on the tables where we want to implement optimistic concurrency.  SQL Server takes care of updating this column each time a row is updated.

In order to tell Entity Framework that a property in an entity represents a concurrency token, you can either decorate the property with the ConcurrencyCheck attribute or use the model-builder.  I consider that concurrency tokens should be part of the model since they define business rules, so I’m going to use the attribute:

public class Order
{
    public int OrderID { get; set; }
    [Required]
    [StringLength(32, MinimumLength = 2)]
    public string OrderTitle { get; set; }
    [Required]
    [StringLength(64, MinimumLength=5)]
    public string CustomerName { get; set; }
    public DateTime TransactionDate { get; set; }
    [ConcurrencyCheck]
    [Timestamp]
    public byte[] TimeStamp { get; set; }

    public virtual List<OrderDetail> OrderDetails { get; set; }
    public virtual List<Employee> InvolvedEmployees { get; set; }
}

With this code in, optimistic concurrency will take place when we call SaveChanges on the DbContext.

The timestamp property type is byte[].  By marking the property with the attribute Timestamp, the mapping to SQL Server is done to a time-stamp SQL type column.

Now, let’s simulate optimistic problem.  The example here will go in three steps:

  • Create an order
  • Simulate the modification of the order by a user X
  • Simulate the modification of the order by the user Y, while the order has been modified already without Y’s knowledge

The way to simulate modification is to attach the entity to another Db-context, simulating service boundaries.  When we attach an entity to a db-context, it assumes the entity is unmodified, so we modify it afterwards. 

private static void ConcurrencyCheck()
{
    Order originalOrder;

    //    Create an order
    using (var context1 = new MyDomainContext())
    {
        originalOrder = new Order
        {
            OrderTitle = "Paper",
            CustomerName = "*Bob*",
            TransactionDate = DateTime.Now
        };

        context1.Orders.Add(originalOrder);
        context1.SaveChanges();
    }
    //    Simulate the modification of the created order by user X
    using (var context2 = new MyDomainContext())
    {    //    Recreate the order object in order to attach it
        var order = new Order
        {
            OrderID = originalOrder.OrderID,
            OrderTitle = originalOrder.OrderTitle,
            CustomerName = originalOrder.CustomerName,
            TransactionDate = originalOrder.TransactionDate,
            TimeStamp = originalOrder.TimeStamp
        };

        context2.Orders.Attach(order);

        //    Alter the order
        order.CustomerName = "Robert";

        context2.SaveChanges();
    }
    //    Simulate the modification of the created order by user Y (after user X already modified it)
    using (var context3 = new MyDomainContext())
    {    //    Recreate the order in order to attach it
        var order = new Order
        {
            OrderID = originalOrder.OrderID,
            OrderTitle = originalOrder.OrderTitle,
            CustomerName = originalOrder.CustomerName,
            TransactionDate = originalOrder.TransactionDate,
            TimeStamp = originalOrder.TimeStamp
        };

        context3.Orders.Attach(order);

        //    Alter the order
        order.CustomerName = "Luke**";

        try
        {
            context3.SaveChanges();
        }
        catch (DbUpdateConcurrencyException ex)
        {
            Console.WriteLine("Concurrency exception on " + ex.Entries.First().Entity.GetType().Name);
        }
    }
}

You could also force EF to believe the order is modified:

context3.Entry(order).State = EntityState.Modified;

So basically, EF supports optimistic concurrency out-of-the-box in a fairly simple form.  The trick is more around state management with EF:  when you attach entities to contexts, make sure they are in the right state:

  • Detached
  • Unchanged
  • Added
  • Deleted
  • Modified

Entity Framework 4.1: Many to Many Relationships (5)

This is part of a series of blog post about Entity Framework 4.1.  The past blog entries are:

In this article, I’ll cover the many to many relationships.

Let’s start with the easiest case, we’ll let EF infer the table mapping.  I model a many-to-many relationship between order and employee:

public class Order
{
    public int OrderID { get; set; }
    [Required]
    [StringLength(32, MinimumLength = 2)]
    public string OrderTitle { get; set; }
    [Required]
    [StringLength(64, MinimumLength=5)]
    public string CustomerName { get; set; }
    public DateTime TransactionDate { get; set; }
    public byte[] TimeStamp { get; set; }

    public virtual List<OrderDetail> OrderDetails { get; set; }
    public virtual List<Employee> InvolvedEmployees { get; set; }
}

public class Employee
{
    public int EmployeeID { get; set; }
    public string EmployeeName { get; set; }

    public virtual List<Order> Orders { get; set; }
}

I simply put a list of employees in order and a list of orders in employee.  Voila!  Here are the mapped tables:

image

Now, we might want to control two things:

  • The name of the relation table
  • The name of the two columns in the relation table

This can all be done with the following code:

modelBuilder.Entity<Employee>()
    .HasMany(e => e.Orders)
    .WithMany(e => e.InvolvedEmployees)
    .Map(m =>
    {
        m.ToTable("EmployeeOrder");
        m.MapLeftKey("EmployeeID");
        m.MapRightKey("OrderID");
    });

Basically, we say that an employee as many orders, that each employee has many orders (hence we have a many-to-many relationship).  We then go and say that the relation table should be named EmployeeOrder, the left key (from employee perspective, so the employee key) should be named employee-id and the right key order-id:

image

So there you go, you can control a table that doesn’t directly map to a class.

In terms of using such a model, it is quite straightforward and natural:

private static void ManyToMany()
{
    using (var context = new MyDomainContext())
    {
        var order = new Order
        {
            OrderTitle = "Pens",
            CustomerName = "Mcdo’s",
            TransactionDate = DateTime.Now,
            InvolvedEmployees = new List<Employee>()
        };
        var employee1 = new Employee { EmployeeName = "Joe", Orders = new List<Order>() };
        var employee2 = new Employee { EmployeeName = "Black", Orders = new List<Order>() };

        context.Orders.Add(order);

        order.InvolvedEmployees.Add(employee1);
        order.InvolvedEmployees.Add(employee2);

        context.SaveChanges();
    }

In this example I didn’t even bother to add the employees in the employees collection of the data context:  EF took care of that for me since they were referenced in a collection.