Calculation of the number of days between purchases

We start from a list of sales (Sales) in which, for example, the customer identifier and the dates of the purchases are indicated:

Starting data

The objective is to add to this table a column that indicates the number of days elapsed since the previous purchase (earlier in time, since nothing assures us that the order of the data in the original table is chronological, and we already know that Power BI neither ensures the order of the rows).

The resolution of this scenario is not easy. Although it could be solved in a single expression, its resolution is presented below step by step. The phases involved are the following:

  1. It is necessary to obtain, for each purchase date of customer X, the date of the previous purchase.
  2. The previous purchase date is the highest date of all customer X's purchases (except for the purchase we are analyzing, of course).
  3. We can obtain all the previous purchases of customer X by cross-joining the table with itself, taking as common field the one corresponding to the customer's identifier and considering in the right table (see the documentation of the GENERATE function) only the dates before the purchase date being considered in the left table. Thus, for example, for the purchase that appears in the third line of the previous image (corresponding to customer 1, dated January 21, 2019), a cross joining with the same table in the aforementioned conditions would return a row for the combination of row 3 and row 1 (same customer and previous date) and another for the combination of row 3 and row 2 (again, same customer and previous date).

Finally, once the table mentioned in point 1 has been obtained, it would be enough to calculate the difference between each purchase date and the date of the last purchase.

Let's do it step by step.

First we want to obtain, for each row (that is, for each purchase), the set of previous purchases of the same customer. This, as mentioned, can be achieved with a cross join that allows us to specify the conditions of the join, that is, with the GENERATE function. As this function requires that the tables involved in the combination have fields with different names, we are going to modify the names of the fields of the second copy with the SELECTCOLUMNS function, adding a "_" symbol at the end of the names:

SelfJoin = 
    GENERATE(
        Sales;
        FILTER(
            SELECTCOLUMNS(
                Sales;
                "Customer_"; Sales[Customer];
                "Purchase Date_"; Sales[Purchase Date]
            );
            [Purchase Date_] < Sales[Purchase Date] && Sales[Customer] = [Customer_]
        )
    )

Self-join using the row context created by the GENERATE function

As we can see, we are obtaining the cross join by imposing the condition that the customer identifier is the same, and that the dates in the column on the right are lower than the date considered in each of the rows on the left (for simplicity, we are assuming that the same customer will not make two purchases on the same day).

Note that the records in the first table for which there are no records in the second that meet the imposed conditions do not appear (that is, in the table that we have just created, only purchase dates for which there is a previous purchase are included).

Obtaining, from here, the date before each purchase is easy: just group (with the GROUPBY function) by customer and purchase date (Customer and Purchase Date fields) and select the maximum value (of each group) of the field Purchase Date_ :

LastPurchases = 
    GROUPBY(
        SelfJoin;
        SelfJoin[Customer];
        SelfJoin[Purchase Date];
        "Last Purchase"; MAXX(
                            CURRENTGROUP();
                            SelfJoin[Purchase Date_]
                        )
    )

Table with the dates of the last purchase

Now comes a somewhat complicated process: as the table only includes purchase dates for which there is a previous purchase, if we want all the original dates to appear in the final result we would have to perform a left outer join between the original date table and the new one. For this we would have to use the NATURALLEFTOUTERJOIN function, but this function requires that the tables to be joined have the same lineage (they come from the same source), which is not true in our case. As explained in the documentation for this function, it is possible to solve this problem by adding an empty text string to the end of the names:

Purchase and Last Purchase = 
    NATURALLEFTOUTERJOIN(
        SELECTCOLUMNS(
            Sales;
            "Customer"; Sales[Customer] & "";
            "Purchase Date"; Sales[Purchase Date] & ""
        );
        SELECTCOLUMNS(
            LastPurchases;
            "Customer"; LastPurchases[SelfJoin_Customer] & "";
            "Purchase Date"; LastPurchases[SelfJoin_Purchase Date] & "";
            "Last Purchase"; LastPurchases[Last Purchase]
        )
    )

Initial table including the date of the last purchase (if any)

Finally, to calculate the number of days since the last purchase, simply add a new column that returns the difference (only if there is a last purchase):

Days since last purchase = 
    ADDCOLUMNS(
        'Purchase and Last Purchase';
        "# days";
            VAR
                NumberOfDays =
                    'Purchase and Last Purchase'[Purchase Date] - 
                    'Purchase and Last Purchase'[Last Purchase]
            RETURN
                IF(
                    ISBLANK('Purchase and Last Purchase'[Last Purchase]);
                    BLANK();
                    NumberOfDays
                )
    )

Final table showing the number of days since the last purchase

To return the day difference, we use the IF function to confirm if the value of the field corresponding to the last purchase is a BLANK or not, for which we use the ISBLANK function.

Difficulty
High
Submitted by admin on Tue, 07/02/2019 - 16:23