Data Science: Food Hub Data Analysis

by Mario Mamalis February 21, 2025

Explore a data science case study, showcasing data-driven strategies for exploring, cleaning and preparing data for statistical analysis and visualizations.

Project Foundations for Data Science: FoodHub Data Analysis¶

Context¶

The number of restaurants in New York is increasing day by day. Many students and busy professionals rely on these restaurants because of their hectic lifestyles, and online food delivery is a convenient option that brings them good food from their favorite restaurants. A food aggregator company, FoodHub, offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive direct online orders from customers. Once an order is confirmed by the restaurant, the app assigns a company delivery person to pick it up. The delivery person uses the map to reach the restaurant and waits for the food package. Once the food package is handed over, he/she confirms the pick-up in the app and travels to the customer’s location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin on each delivery order from the restaurants.

Objective¶

The food aggregator company has stored the data of the different orders made by registered customers in their online portal. They want to analyze the data to get a fair idea of the demand for different restaurants, which will help them enhance the customer experience. Suppose you are hired as a Data Scientist at this company, and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions and help the company improve the business.

Data Description¶

The data contains information related to each food order. The detailed data dictionary is given below.

Data Dictionary¶

  • order_id: Unique ID of the order
  • customer_id: ID of the customer who ordered the food
  • restaurant_name: Name of the restaurant
  • cuisine_type: Cuisine ordered by the customer
  • cost_of_the_order: Cost of the order (in dollars)
  • day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
  • rating: Rating given by the customer out of 5
  • food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated as the difference between the timestamps of the restaurant’s order confirmation and the delivery person’s pick-up confirmation.
  • delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated as the difference between the timestamps of the delivery person’s pick-up confirmation and drop-off confirmation. (A sketch of how such durations can be derived from raw timestamps follows below.)
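
As a side note, the dataset provides these durations directly as integer minutes, but if raw event timestamps were available they could be derived as in the sketch below. The timestamp column names here are hypothetical and are not part of the FoodHub dataset.

import pandas as pd

# Hypothetical event log with one timestamp per order milestone
# (these column names are illustrative only, not actual FoodHub columns).
events = pd.DataFrame({
    'restaurant_confirmed_at': pd.to_datetime(['2025-02-21 12:00', '2025-02-21 12:05']),
    'pickup_confirmed_at': pd.to_datetime(['2025-02-21 12:25', '2025-02-21 12:35']),
    'dropoff_confirmed_at': pd.to_datetime(['2025-02-21 12:45', '2025-02-21 13:00']),
})

# food_preparation_time: restaurant confirmation -> pick-up confirmation (minutes)
events['food_preparation_time'] = (
    events['pickup_confirmed_at'] - events['restaurant_confirmed_at']
).dt.total_seconds() / 60

# delivery_time: pick-up confirmation -> drop-off confirmation (minutes)
events['delivery_time'] = (
    events['dropoff_confirmed_at'] - events['pickup_confirmed_at']
).dt.total_seconds() / 60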

Let us start by importing the required libraries¶

In [9]:

# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

Understanding the structure of the data¶

In [10]:

#Access the drive
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

In [31]:

# read the data
df = pd.read_csv('/content/drive/MyDrive/MIT-ADSP/0131-Foundations-Python-Statistics/Week 2/Project - Food Hub/foodhub_order.csv')
# returns the first 5 rows
df.head()

Out[31]:

order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time
0 1477147 337525 Hangawi Korean 30.75 Weekend Not given 25 20
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend Not given 25 23
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24

Observations:¶

The DataFrame has 9 columns, as described in the Data Dictionary. Each row corresponds to an order placed by a customer.

Question 1: How many rows and columns are present in the data? [0.5 mark]¶

In [32]:

# Write your code here
rows, columns = df.shape
print(f"There are {rows} rows and {columns} columns present in the data.")
There are 1898 rows and 9 columns present in the data.

Observations:¶

There are 1898 rows and 9 columns present in the data. This means that there are 1898 orders that we can analyze.

Question 2: What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]¶

In [33]:

# Use info() to print a concise summary of the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB

Observations:¶

  • order_id, customer_id, food_preparation_time and delivery_time are integers.
  • restaurant_name, cuisine_type, day_of_the_week and rating are strings (object dtype).
  • cost_of_the_order is a float (decimal).
  • rating should be numeric; the “Not given” placeholder forces pandas to read the column as object. I will convert it later in this notebook (a quick check of the raw values follows below).
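
A quick way to confirm why pandas reads rating as an object column is to inspect its raw values; this is a small check against the same df (the exact set of values depends on the data, but it should include the “Not given” placeholder).

# Inspect the raw values stored in the 'rating' column; the string placeholder
# 'Not given' mixed in with numeric-looking strings forces the object dtype.
df['rating'].unique()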

Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]¶

In [34]:

# Write your code here
df.isnull().sum()

Out[34]:

0
order_id 0
customer_id 0
restaurant_name 0
cuisine_type 0
cost_of_the_order 0
day_of_the_week 0
rating 0
food_preparation_time 0
delivery_time 0

Observations:¶

  1. There are no null values in any column.
  2. However, some rows have the “rating” missing and recorded as the string “Not given” instead. To address this, I will convert the rating values to integers later in this workbook, since the remaining ratings are integers stored as strings.

Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]¶

In [35]:

# Write your code here
df.describe()

Out[35]:

order_id customer_id cost_of_the_order food_preparation_time delivery_time
count 1.898000e+03 1898.000000 1898.000000 1898.000000 1898.000000
mean 1.477496e+06 171168.478398 16.498851 27.371970 24.161749
std 5.480497e+02 113698.139743 7.483812 4.632481 4.972637
min 1.476547e+06 1311.000000 4.470000 20.000000 15.000000
25% 1.477021e+06 77787.750000 12.080000 23.000000 20.000000
50% 1.477496e+06 128600.000000 14.140000 27.000000 25.000000
75% 1.477970e+06 270525.000000 22.297500 31.000000 28.000000
max 1.478444e+06 405334.000000 35.410000 35.000000 33.000000

In [36]:

# Write your code here
# Display only the statistics for "food_preparation_time"
food_prep_stats = df['food_preparation_time'].describe()
food_prep_stats

Out[36]:

food_preparation_time
count 1898.000000
mean 27.371970
std 4.632481
min 20.000000
25% 23.000000
50% 27.000000
75% 31.000000
max 35.000000


In [37]:

# Extract only the minimum, mean, and maximum values for "food_preparation_time"
food_prep_specific_stats = df['food_preparation_time'].agg(['min', 'mean', 'max']).round(2)
food_prep_specific_stats

Out[37]:

food_preparation_time
min 20.00
mean 27.37
max 35.00

Observations:¶

Food Prep Stats

  1. Minimum Prep Time: 20 minutes
  2. Average Prep Time: 27.37 minutes
  3. Maximum Prep Time: 35 minutes

Question 5: How many orders are not rated? [1 mark]¶

In [38]:

# Write the code here
# Count the number of rows where "rating" is "Not given"
not_given_count = (df['rating'] == "Not given").sum()
not_given_count

Out[38]:

736

Observations:¶

There are 736 instances where rating was “Not given”.

In [40]:

# Convert 'Not given' to NaN
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
# Calculate the mean rating per restaurant
mean_ratings_per_restaurant = df.groupby('restaurant_name')['rating'].mean().round()
# Replace NaN in 'rating' with the mean rating of the respective restaurant (row-wise apply)
df['rating'] = df.apply(
    lambda row: mean_ratings_per_restaurant[row['restaurant_name']] if pd.isna(row['rating']) else row['rating'],
    axis=1
)
# Fill any remaining NaN values (restaurants with no rated orders) with the overall mean rating
overall_mean_rating = round(df['rating'].mean())
df['rating'] = df['rating'].fillna(overall_mean_rating)
# Convert 'rating' to integer
df['rating'] = df['rating'].astype(int)
df.head()

Out[40]:

order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time
0 1477147 337525 Hangawi Korean 30.75 Weekend 4 25 20
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend 4 25 23
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24

In [41]:

# Re-check: after the conversion above, the 'rating' column is numeric, so no
# "Not given" strings remain and this count should be 0.
not_given_count = (df['rating'] == "Not given").sum()
not_given_count

Out[41]:

0

Observations:¶

I converted all ratings to integers and, where the value was “Not given”, assigned the mean rating of the respective restaurant (based on its other orders), rounded to the nearest integer. I will use these imputed ratings in the rest of the analysis.

Exploratory Data Analysis (EDA)¶

Univariate Analysis¶

Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]¶

In [42]:

# Let's eyeball the first 5 rows again
df.head()

Out[42]:

order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time
0 1477147 337525 Hangawi Korean 30.75 Weekend 4 25 20
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend 4 25 23
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24

In [51]:

# Set plot style
sns.set_style("whitegrid")
# Creating a histogram for the cost of the orders
plt.figure(figsize=(10, 6))
sns.histplot(df['cost_of_the_order'], bins=30, kde=True, color="blue")
plt.title('Distribution of Cost of the Orders')
plt.xlabel('Cost of the Order ($)')
plt.ylabel('Frequency')
plt.show()

[Figure: Distribution of Cost of the Orders]

Observations:¶

  1. Skewed Distribution: The distribution of the cost of orders is right-skewed, indicating that most orders are concentrated in the lower price range, with fewer orders as the price increases (the skewness is quantified in the check below).
  2. Common Price Range: The most common price range for orders appears to be between approximately $10 and $20. This suggests that most customers opt for moderately priced items.
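
As a quick numeric check of the points above (a small sketch against the same df), the sample skewness quantifies the right tail, and a simple mask measures how many orders fall in the roughly $10-$20 band:

# Skewness of the order cost distribution (positive => right-skewed)
print(round(df['cost_of_the_order'].skew(), 2))

# Share of orders in the roughly $10-$20 band highlighted above
in_band = df['cost_of_the_order'].between(10, 20)
print(f"{in_band.mean() * 100:.1f}% of orders cost between $10 and $20")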

In [52]:

# Creating a box plot for the cost of the orders
plt.figure(figsize=(8, 6))
sns.boxplot(x=df['cost_of_the_order'])
plt.title('Box Plot of the Cost of Orders')
plt.xlabel('Cost of the Order ($)')
plt.show()

[Figure: Box Plot of the Cost of Orders]

Observations:¶

  1. Central Tendency: The median cost of an order is about $14, a typical amount customers spend per order.
  2. Spread and Variability: The interquartile range (IQR), represented by the box, spans roughly $12 to $22, so the middle half of orders falls within a moderate price band (see the quick check below).
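
The figures quoted above can be checked directly from the data; a minimal sketch using the same df:

# Median and interquartile range of the order cost, matching the box plot
q1, median, q3 = df['cost_of_the_order'].quantile([0.25, 0.5, 0.75])
print(f"Median: ${median:.2f}, IQR: ${q1:.2f} to ${q3:.2f} (width ${q3 - q1:.2f})")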

In [53]:

# Create a count plot for "rating"
plt.figure(figsize=(8, 5))
sns.countplot(x=df['rating'], hue=df['rating'], palette="viridis", legend=False,
              order=sorted(df['rating'].unique()))
plt.title("Count of Each Rating")
plt.xlabel("Rating")
plt.ylabel("Count")
plt.show()

[Figure: Count of Each Rating]

Observations:¶

  1. Overall, ratings are good, with most being 4 and above (the share is quantified in the check below).
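
To put a number on that (a small check against the same df; note that this includes the ratings imputed earlier for “Not given” orders):

# Share of orders with a rating of 4 or above (includes imputed ratings)
print(f"{(df['rating'] >= 4).mean() * 100:.1f}% of orders are rated 4 or higher")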

In [45]:

# Create a boxplot for "food_preparation_time"
plt.figure(figsize=(8, 5))
sns.boxplot(y=df['food_preparation_time'], color="green")
plt.title("Boxplot of Food Preparation Time")
plt.ylabel("Food Preparation Time (minutes)")
plt.show()

[Figure: Boxplot of Food Preparation Time]

Observations:¶

  1. Median and Central Tendency: The median food preparation time is about 27 minutes, so half of the orders are prepared in under 27 minutes and the other half take longer.
  2. Interquartile Range (IQR): The IQR, depicted by the box, spans roughly 23 to 31 minutes and is relatively narrow, which suggests that the majority of food preparation times are consistent and cluster around the median. This consistency might be indicative of standardized processes within restaurants or similar types of dishes being ordered.
  3. Variability: The whiskers extend over the full range of 20 to 35 minutes, somewhat wider than the IQR. Variation outside the central cluster can indicate differences in restaurant efficiency, menu complexity, or operational challenges during peak times.
  4. Operational Insights: Restaurants and the food delivery service can use this data to identify and investigate the reasons behind unusually long or short preparation times. Addressing these could improve overall efficiency, reduce customer wait times, and enhance satisfaction.

In [65]:

# Creating a count plot for delivery time
plt.figure(figsize=(10, 6))
sns.countplot(x='delivery_time', data=df, hue='delivery_time', dodge=False, palette='viridis')
plt.title('Count of Orders by Delivery Time')
plt.xlabel('Delivery Time (minutes)')
plt.ylabel('Count of Orders')
plt.xticks(rotation=90)  # Rotating x-axis labels for better visibility
plt.legend().set_visible(False)  # Properly setting legend visibility
plt.show()

[Figure: Count of Orders by Delivery Time]
In [56]:

# Creating a box plot for delivery time
plt.figure(figsize=(8, 6))
sns.boxplot(x=df['delivery_time'])
plt.title('Box Plot of Delivery Time')
plt.xlabel('Delivery Time (minutes)')
plt.show()

[Figure: Box Plot of Delivery Time]

Observations:¶

  1. Median Delivery Time: The median delivery time is about 25 minutes, providing a central benchmark for evaluating delivery efficiency.
  2. Interquartile Range (IQR): The box, which encapsulates the middle 50% of delivery times (roughly 20 to 28 minutes), is relatively tight. This indicates that a majority of deliveries are consistent in duration, showing effective standardization and predictability in the delivery process.
  3. Range and Outliers: The whiskers span the full range of about 15 to 33 minutes; given the quartiles, no delivery times fall beyond 1.5 × IQR, so there are no extreme outliers (verified in the check below).
  4. Operational Insights: Investigating the longer delivery times could still help in optimizing routes, improving delivery scheduling, and potentially selecting more reliable delivery methods or personnel. Similarly, analyzing why some deliveries are unusually quick could highlight best practices that could be replicated across the service.
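
The absence of outliers noted in point 3 can be verified with the same 1.5 × IQR rule that box plot whiskers use; a small sketch against the same df:

# Outlier check for delivery_time using the 1.5 * IQR rule
q1, q3 = df['delivery_time'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df['delivery_time'] < lower) | (df['delivery_time'] > upper)]
print(f"Whisker bounds: {lower:.1f} to {upper:.1f} minutes; outliers found: {len(outliers)}")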

In [49]:

# Create a count plot for "cuisine_type"
plt.figure(figsize=(10, 5))
sns.countplot(y=df['cuisine_type'], hue=df['cuisine_type'], palette="viridis", legend=False,
              order=df['cuisine_type'].value_counts().index)
plt.title("Count Plot of Cuisine Type")
plt.xlabel("Count")
plt.ylabel("Cuisine Type")
plt.show()

[Figure: Count Plot of Cuisine Type]

Observations:¶

Clearly the top cuisine types are American, Japanese and Italian.

In [50]:

# Create a count plot for "day_of_the_week"
plt.figure(figsize=(8, 5))
sns.countplot(x=df['day_of_the_week'], hue=df['day_of_the_week'], palette="magma", legend=False,
              order=df['day_of_the_week'].value_counts().index)
plt.title("Count Plot of Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.show()

[Figure: Count Plot of Day of the Week]

Observations:¶

Clearly a lot more orders are placed during the weekend!
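
A quick breakdown of the split, using the same df (value_counts with normalize=True returns proportions):

# Percentage of orders placed on weekends vs. weekdays
(df['day_of_the_week'].value_counts(normalize=True) * 100).round(1)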

Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]¶

In [66]:

# Write the code here
# Create a count plot for the top 5 restaurants
plt.figure(figsize=(10, 5))
sns.countplot(y=df['restaurant_name'], hue=df['restaurant_name'], palette="deep", legend=False,
              order=df['restaurant_name'].value_counts().index[:5])
plt.title("Top 5 Restaurants by Number of Orders")
plt.xlabel("Number of Orders")
plt.ylabel("Restaurant Name")
plt.show()

[Figure: Top 5 Restaurants by Number of Orders]

Observations:¶

The top 5 restaurants in terms of orders received are listed below (the corresponding order counts are printed in the sketch after this list):

  1. Shake Shack
  2. The Meatball Shop
  3. Blue Ribbon Sushi
  4. Blue Ribbon Fried Chicken
  5. Parm
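
The order counts behind the plot can be printed directly; a one-line sketch using the same df:

# Number of orders received by the top 5 restaurants
df['restaurant_name'].value_counts().head(5)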

Question 8: Which is the most popular cuisine on weekends? [1 mark]¶

In [67]:

# Write the code here
# Filter data for weekends (labeled as "Weekend" in the dataset)
weekend_data = df[df['day_of_the_week'] == 'Weekend']
# Create a count plot for cuisine types on weekends, ordered by popularity
plt.figure(figsize=(10, 5))
sns.countplot(y=weekend_data['cuisine_type'], hue=weekend_data['cuisine_type'],
              palette=sns.color_palette("magma", len(weekend_data['cuisine_type'].unique())),
              order=weekend_data['cuisine_type'].value_counts().index, dodge=False)
plt.title("Most Popular Cuisine on Weekends")
plt.xlabel("Number of Orders")
plt.ylabel("Cuisine Type")
plt.legend([],[], frameon=False)  # Hide legend
plt.show()
# Get the most popular cuisine type on weekends
popular_cuisine_weekends = weekend_data['cuisine_type'].value_counts().idxmax()
# Display the most popular cuisine type
popular_cuisine_weekends

[Figure: Most Popular Cuisine on Weekends]
Out[67]:

'American'

Observations:¶

Clearly the most popular cuisine during weekends is American.

Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]¶

In [68]:

# Write the code here
# Calculate the percentage of orders that cost more than 20 dollars
total_orders = len(df)
orders_above_20 = len(df[df['cost_of_the_order'] > 20])
percentage_above_20 = (orders_above_20 / total_orders) * 100
percentage_above_20

Out[68]:

29.24130663856691

In [69]:

# Create a pie chart to visualize the percentage of orders above and below $20
plt.figure(figsize=(6, 6))
labels = ["Orders > $20", "Orders ≤ $20"]
sizes = [orders_above_20, total_orders - orders_above_20]
colors = ["#FF9999", "#66B2FF"]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', colors=colors, startangle=90)
plt.title("Percentage of Orders Above and Below $20")
plt.show()

[Figure: Percentage of Orders Above and Below $20]

Observations:¶

About 29% of the orders cost more than 20 dollars; the remaining ~71% cost 20 dollars or less.

Question 10: What is the mean order delivery time? [1 mark]¶

In [70]:

# Write the code here
mean_delivery_time = df['delivery_time'].mean()
mean_delivery_time

Out[70]:

24.161749209694417

In [71]:

# Create a histogram with mean line to visualize delivery time distribution
plt.figure(figsize=(8, 5))
sns.histplot(df['delivery_time'], kde=True, bins=20, color="blue", alpha=0.7)
plt.axvline(mean_delivery_time, color='red', linestyle='dashed', linewidth=2, label=f'Mean: {mean_delivery_time:.2f} min')
plt.title("Distribution of Delivery Time with Mean Indicator")
plt.xlabel("Delivery Time (minutes)")
plt.ylabel("Frequency")
plt.legend()
plt.show()

[Figure: Distribution of Delivery Time with Mean Indicator]

Observations:¶

The mean delivery time is 24.16 minutes.

The peak of the histogram suggests that most delivery times are close to the mean, indicating a relatively consistent delivery time for most orders.

Delivery times span a fairly narrow range (15 to 33 minutes), and the mean (24.16 minutes) sits just below the median (25 minutes), so the distribution is not strongly skewed.

Question 11: The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]¶

In [72]:

# Write the code here
top_3_customers = df['customer_id'].value_counts().head(3)
top_3_customers

Out[72]:

customer_id  count
52832           13
47440           10
83287            9


In [73]:

# Create a bar chart for the top 3 most frequent customers
plt.figure(figsize=(8, 5))
sns.barplot(x=top_3_customers.index, y=top_3_customers.values,
            hue=top_3_customers.index, palette="coolwarm", legend=False)
plt.title("Top 3 Most Frequent Customers")
plt.xlabel("Customer ID")
plt.ylabel("Number of Orders")
plt.show()

[Figure: Top 3 Most Frequent Customers]

Observations:¶

The top 3 most frequent customers are customer IDs 52832 (13 orders), 47440 (10 orders), and 83287 (9 orders); these are the customers who qualify for the 20% discount vouchers.

Multivariate Analysis¶

Question 12: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]¶

In [79]:

# Calculating total time from order to delivery by summing food preparation time and delivery time
df['total_time_from_order_to_delivery'] = df['food_preparation_time'] + df['delivery_time']
# Creating a heatmap to visualize the correlation between total time, cost of the order, and rating
heatmap_data = df[['total_time_from_order_to_delivery', 'cost_of_the_order', 'rating']]
correlation = heatmap_data.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap between Total Time, Cost, and Rating')
plt.show()

[Figure: Correlation Heatmap between Total Time, Cost, and Rating]

Observations:¶

  1. Total Time vs. Rating: The correlation between total time from order to delivery and rating is very low, with a coefficient close to zero. This suggests that total time does not significantly influence the ratings directly (a scatter plot sketch below illustrates this).
  2. Cost vs. Rating: Similarly, the correlation between cost of the order and rating is also very low. This suggests that higher costs are not directly associated with higher customer satisfaction (or dissatisfaction), at least not linearly.
  3. Cost vs. Total Time: There is a negligible correlation between the cost of the order and the total time from order to delivery. This indicates that longer preparation and delivery times are not necessarily associated with higher costs.
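
To visualize the first point, a simple scatter plot of rating against total time can be drawn; this is a sketch using the columns created above (with only a handful of discrete rating values, the points form horizontal bands).

# Scatter of rating vs. total time from order to delivery; the flat spread across
# the x-axis at each rating level reflects the near-zero correlation seen above.
plt.figure(figsize=(8, 5))
sns.scatterplot(x='total_time_from_order_to_delivery', y='rating', data=df, alpha=0.3)
plt.title('Rating vs. Total Time from Order to Delivery')
plt.xlabel('Total Time from Order to Delivery (minutes)')
plt.ylabel('Rating')
plt.show()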

In [84]:

# 1. Cuisine Type Analysis - Comparing average delivery and preparation times across cuisines
# Aggregating average times per cuisine
cuisine_times = df.groupby('cuisine_type')[['food_preparation_time', 'delivery_time']].mean().reset_index()
# Limiting to top 20 cuisines to prevent memory overload
top_cuisines = df['cuisine_type'].value_counts().index[:20]
cuisine_times_filtered = cuisine_times[cuisine_times['cuisine_type'].isin(top_cuisines)]
# Plotting average food preparation time by cuisine type
plt.figure(figsize=(10, 6))
sns.barplot(x='cuisine_type', y='food_preparation_time', hue='cuisine_type', data=cuisine_times_filtered, palette='viridis', dodge=False)
plt.title('Average Food Preparation Time by Cuisine Type (Top 20)')
plt.xlabel('Cuisine Type')
plt.ylabel('Average Food Preparation Time (minutes)')
plt.xticks(rotation=90)
plt.show()
# Plotting average delivery time by cuisine type
plt.figure(figsize=(10, 6))
sns.barplot(x='cuisine_type', y='delivery_time', hue='cuisine_type', data=cuisine_times_filtered, palette='magma', dodge=False)
plt.title('Average Delivery Time by Cuisine Type (Top 20)')
plt.xlabel('Cuisine Type')
plt.ylabel('Average Delivery Time (minutes)')
plt.xticks(rotation=90)
plt.show()

[Figure: Average Food Preparation Time by Cuisine Type (Top 20)]
[Figure: Average Delivery Time by Cuisine Type (Top 20)]

Observations:¶

Food Preparation Time by Cuisine Type:

  1. Certain cuisines take significantly longer to prepare than others.
  2. The variation in preparation time could be due to differences in dish complexity, cooking techniques, or restaurant efficiency.

Delivery Time by Cuisine Type:

  1. Delivery times also vary across cuisines, which may be influenced by restaurant locations, demand levels, or how well the food travels.
  2. Some cuisines might be more frequently ordered from farther distances, leading to longer delivery times.

In [88]:

# 2. Restaurant Performance Analysis - Comparing average ratings, delivery times, and preparation times across restaurants
# Aggregating performance metrics per restaurant
restaurant_performance = df.groupby('restaurant_name')[['rating', 'food_preparation_time', 'delivery_time']].mean().reset_index()
# Limiting to top 20 most popular restaurants to prevent memory overload
top_restaurants = df['restaurant_name'].value_counts().index[:20]
restaurant_performance_filtered = restaurant_performance[restaurant_performance['restaurant_name'].isin(top_restaurants)]
# Plotting average ratings by restaurant
plt.figure(figsize=(14, 6))
sns.barplot(x='restaurant_name', y='rating', hue='restaurant_name', data=restaurant_performance_filtered, palette='coolwarm', dodge=False)
plt.title('Average Customer Rating by Restaurant (Top 20)')
plt.xlabel('Restaurant Name')
plt.ylabel('Average Rating')
plt.xticks(rotation=90)
plt.show()
# Plotting average food preparation time by restaurant
plt.figure(figsize=(14, 6))
sns.barplot(x='restaurant_name', y='food_preparation_time', hue='restaurant_name', data=restaurant_performance_filtered, palette='viridis', dodge=False)
plt.title('Average Food Preparation Time by Restaurant (Top 20)')
plt.xlabel('Restaurant Name')
plt.ylabel('Average Food Preparation Time (minutes)')
plt.xticks(rotation=90)
plt.show()
# Plotting average delivery time by restaurant
plt.figure(figsize=(14, 6))
sns.barplot(x='restaurant_name', y='delivery_time', hue='restaurant_name', data=restaurant_performance_filtered, palette='magma', dodge=False)
plt.title('Average Delivery Time by Restaurant (Top 20)')
plt.xlabel('Restaurant Name')
plt.ylabel('Average Delivery Time (minutes)')
plt.xticks(rotation=90)
plt.show()

[Figure: Average Customer Rating by Restaurant (Top 20)]
[Figure: Average Food Preparation Time by Restaurant (Top 20)]
[Figure: Average Delivery Time by Restaurant (Top 20)]

Observations¶

Average Ratings by Restaurant:

  1. Some restaurants consistently receive higher ratings than others, indicating better customer satisfaction.
  2. The variation in ratings may be influenced by factors such as food quality, service efficiency, and pricing.

Average Food Preparation Time by Restaurant:

  1. Certain restaurants take significantly longer to prepare food than others.
  2. This could be due to the type of cuisine, restaurant efficiency, or the complexity of menu items.

Average Delivery Time by Restaurant:

  1. Some restaurants have longer delivery times on average, which could be due to location, demand, or delivery efficiency.
  2. Identifying which restaurants have consistently longer delivery times may help FoodHub optimize delivery logistics.

In [92]:

# 3. Day of the Week Analysis - Comparing ratings, preparation times, and delivery times with hue set to x
# Aggregating performance metrics per day of the week
day_performance = df.groupby('day_of_the_week')[['rating', 'food_preparation_time', 'delivery_time']].mean().reset_index()
# Plotting average ratings by day of the week
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='rating', hue='day_of_the_week', data=day_performance, palette='coolwarm', dodge=False)
plt.title('Average Customer Rating by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Average Rating')
plt.show()
# Plotting average food preparation time by day of the week
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='food_preparation_time', hue='day_of_the_week', data=day_performance, palette='viridis', dodge=False)
plt.title('Average Food Preparation Time by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Average Food Preparation Time (minutes)')
plt.show()
# Plotting average delivery time by day of the week
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='delivery_time', hue='day_of_the_week', data=day_performance, palette='magma', dodge=False)
plt.title('Average Delivery Time by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Average Delivery Time (minutes)')
plt.show()

[Figure: Average Customer Rating by Day of the Week]
[Figure: Average Food Preparation Time by Day of the Week]
[Figure: Average Delivery Time by Day of the Week]

Observations:¶

Average Ratings by Day of the Week:

  1. Ratings remain relatively consistent across weekdays and weekends, indicating stable customer satisfaction regardless of the day.

Average Food Preparation Time by Day of the Week:

  1. There is a slight increase in preparation times on weekends, suggesting a higher volume of orders or operational constraints.
  2. Restaurants might experience increased kitchen workloads on weekends, leading to slightly slower preparation.

Average Delivery Time by Day of the Week:

  1. Delivery times, by contrast, are noticeably shorter on weekends than on weekdays (quantified in Question 16 below), despite the higher weekend order volume.
  2. Lower weekend traffic or greater delivery capacity scheduled for the expected weekend demand could explain the difference.

Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]¶

In [93]:

# Write the code here
# Group by restaurant and calculate rating count and average rating
restaurant_ratings = df.groupby('restaurant_name')['rating'].agg(['count', 'mean'])
# Filter restaurants meeting the criteria
qualified_restaurants = restaurant_ratings[(restaurant_ratings['count'] > 50) & (restaurant_ratings['mean'] > 4)]
qualified_restaurants

Out[93]:

count mean
restaurant_name
Blue Ribbon Fried Chicken 96 4.218750
Blue Ribbon Sushi 119 4.134454
Parm 68 4.073529
RedFarm Broadway 59 4.169492
RedFarm Hudson 55 4.109091
Shake Shack 219 4.168950
The Meatball Shop 132 4.689394


In [96]:

# Group by restaurant and calculate rating count and average rating
restaurant_ratings = df.groupby('restaurant_name')['rating'].agg(['count', 'mean'])
# Filter restaurants meeting the criteria (more than 50 ratings and average rating above 4)
qualified_restaurants = restaurant_ratings[(restaurant_ratings['count'] > 50) & (restaurant_ratings['mean'] > 4)]
# Sort the qualified restaurants by average rating in descending order
qualified_restaurants_sorted = qualified_restaurants.sort_values(by='mean', ascending=False).reset_index()
# Create a bar chart with sorted restaurants and assign the y variable to hue
plt.figure(figsize=(10, 5))
sns.barplot(y='restaurant_name', x='mean', hue='restaurant_name', data=qualified_restaurants_sorted, palette="coolwarm", dodge=False)
plt.title("Restaurants Eligible for Promotional Offer (Sorted by Avg Rating)")
plt.xlabel("Average Rating")
plt.ylabel("Restaurant Name")
plt.xlim(4, 5)  # Set x-axis range to focus on valid rating values
plt.show()

[Figure: Restaurants Eligible for Promotional Offer (Sorted by Avg Rating)]

Observations:¶

This bar chart displays the restaurants that qualify for the promotional offer based on:

  • Having more than 50 ratings (ensuring enough data points for reliability).
  • Maintaining an average rating above 4.0.

Restaurants are sorted in descending order of average rating, highlighting the best-rated ones, and the hue is assigned to the restaurant name so each bar gets a distinct color. Note that because missing ratings were imputed earlier, the rating count per restaurant here equals its order count.

Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]¶

In [97]:

# Write the code here
# Define commission rates based on order cost
def calculate_commission(cost):
    if cost > 20:
        return cost * 0.25  # 25% commission for orders above $20
    elif cost > 5:
        return cost * 0.15  # 15% commission for orders above $5
    else:
        return 0  # No commission for orders $5 or below
# Apply the commission calculation to each order
df['commission'] = df['cost_of_the_order'].apply(calculate_commission)
# Calculate total revenue generated by the company
total_revenue = df['commission'].sum()
total_revenue

Out[97]:

6166.303

Observations:¶

Net Revenue Generated by the Company:
The total revenue generated by the company from commissions across all orders is $6,166.30.
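
As a design note, the same commission schedule can be expressed without a row-wise apply by using numpy's select, which is typically faster on large datasets. This is a sketch assuming the same column names; the commission_check column is illustrative and only added to cross-check the result.

import numpy as np

# Vectorized equivalent of calculate_commission: conditions are evaluated in order,
# so orders above $20 get 25% and the remaining orders above $5 get 15%.
rates = np.select(
    [df['cost_of_the_order'] > 20, df['cost_of_the_order'] > 5],
    [0.25, 0.15],
    default=0.0,
)
df['commission_check'] = df['cost_of_the_order'] * rates
print(round(df['commission_check'].sum(), 3))  # should match the apply-based total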

Question 15: The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]¶

In [100]:

# Write the code here
# Calculate total time from order placement to delivery
df['total_time_from_order_to_delivery'] = df['food_preparation_time'] + df['delivery_time']
# Count total number of orders
total_orders = len(df)
# Count orders that took more than 60 minutes
orders_above_60_min = df[df['total_time_from_order_to_delivery'] > 60].shape[0]
# Calculate percentage of orders taking more than 60 minutes
percentage_above_60_min = (orders_above_60_min / total_orders) * 100
round(percentage_above_60_min, 2)

Out[100]:

10.54

Observations:¶

10.54% of orders take more than 60 minutes from order placement to delivery.

Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]¶

In [102]:

# Write the code here
# Calculate the mean delivery time for weekdays and weekends
delivery_time_analysis = df.groupby('day_of_the_week')['delivery_time'].mean().reset_index()
# Plotting the variation in mean delivery time
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='delivery_time', data=delivery_time_analysis, palette='magma', hue='day_of_the_week', dodge=False)
plt.title('Mean Delivery Time on Weekdays vs. Weekends')
plt.xlabel('Day of the Week')
plt.ylabel('Mean Delivery Time (minutes)')
plt.show()
# Display mean delivery times for weekdays and weekends
delivery_time_analysis

[Figure: Mean Delivery Time on Weekdays vs. Weekends]
Out[102]:

day_of_the_week delivery_time
0 Weekday 28.340037
1 Weekend 22.470022

Observations:¶

The mean delivery time is lower on weekends (22.47 minutes) than on weekdays (28.34 minutes).

Possible reasons:

  • Lower traffic congestion on weekends, leading to faster deliveries.
  • More efficient delivery management due to higher expected demand on weekends.
  • Fewer business district orders (which might take longer due to distance or traffic).

Conclusion and Recommendations¶

Question 17: What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]¶

Conclusions:¶

My analysis of FoodHub’s order and delivery data provides valuable insights into restaurant performance, customer satisfaction, and operational efficiency. Key findings include:

  1. Delivery & Preparation Time Insights
  • 10.54% of orders take more than 60 minutes from order placement to delivery, which could negatively impact customer satisfaction.
  • Mean delivery time is longer on weekdays (28.34 min) than on weekends (22.47 min), suggesting that traffic congestion or restaurant availability might be factors affecting delivery speed.
  • Some cuisine types and restaurants have significantly longer preparation times, which contributes to overall delays.
  2. Restaurant & Cuisine Performance
  • Certain restaurants consistently receive high ratings (above 4.0) with more than 50 orders, making them strong candidates for promotional offers.
  • Some cuisines have higher preparation times than others, affecting overall delivery efficiency.
  • The most profitable orders come from customers who spend more than $20, as they generate the highest commissions for FoodHub.
  3. Customer Satisfaction & Revenue
  • Customer ratings are not strongly correlated with delivery time, suggesting that food quality and overall experience play a bigger role in satisfaction.
  • The company generated $6,166.30 in commission revenue, with most revenue coming from orders above $20.
  • Certain restaurants consistently take longer to prepare food, which might contribute to lower ratings.

Recommendations:¶

  1. Improve Delivery & Operational Efficiency
  • Optimize restaurant partnerships: Identify restaurants with consistent delays and work with them to improve preparation efficiency.
  • Prioritize high-performing restaurants: Promote and feature restaurants with high ratings and good efficiency to improve customer satisfaction.
  • Reduce weekday delivery delays: Explore better delivery scheduling and traffic-optimized routing for faster weekday deliveries.
  2. Enhance Customer Satisfaction & Engagement
  • Encourage high-rated restaurants: Offer incentives or priority placement in the app for restaurants maintaining an average rating above 4.0.
  • Improve customer communication: Notify customers in real-time if orders are expected to take longer than 60 minutes to manage expectations.
  • Introduce loyalty rewards: Offer discounts or free delivery for customers who frequently order from top-rated restaurants.
  3. Revenue Optimization Strategies
  • Increase commissions on premium orders: Since most revenue comes from orders above $20, consider adjusting pricing models to encourage higher-value purchases.
  • Feature premium & high-performing cuisines: If certain cuisines generate higher-value orders, highlight them in the app for better visibility.
  • Reduce inefficiencies in long-prep-time cuisines: Work with restaurants offering slow-prep cuisines to improve kitchen efficiency and reduce preparation times.


Breaking the Code

by Mario Mamalis November 16, 2023

A storm has been brewing in the world of Software Development. This storm is characterized by a landscape riddled with outdated systems, over-architected frameworks, and a culture of constant firefighting. It’s a world where the voices of frustration are as loud as the keystrokes of developers racing against time, where stress levels soar as high as the stacks of unresolved bugs, and where the norm has become a reactive approach to problem-solving, rather than a proactive one.

The symptoms of this storm are unmistakable: an alarming rate of employee turnover, with talented developers and IT professionals seeking refuge in environments that promise a semblance of sanity; work environments strained to their limits, where the pressure to deliver often overshadows the need for quality and thoughtful development; and a pervasive culture of putting out fires, where teams are perpetually in crisis mode, reacting to problems rather than preventing them.

This introduction to the world of software development today is not just a narrative of despair but a prelude to a crucial discussion. We stand at a crossroads where the path to sustainability and efficiency in software development is not just desired but desperately needed. The ensuing sections of this post delve into the root causes of this chaotic state, the multifaceted costs it incurs, and most importantly, the path to a more structured, efficient, and healthy software development future. It’s a one-way path from chaos to clarity, from reactive to proactive strategies, and from short-term fixes to long-term solutions.

The time to act is now. The stakes are high, not just for individual organizations but for the software industry as a whole. The following insights and discussions aim to shed light on the path forward, offering guidance and strategies to replace the chaos with streamlined processes, innovative solutions, and a work culture that values both efficiency and employee well-being.

Analyzing the Current State

The landscape of software development is diverse and complex, marked by various kinds of systems, each with their unique challenges. We can broadly categorize these into three types: legacy systems using outdated technologies, systems that are not too old but poorly architected, and modern systems developed with shortcuts.

Legacy Systems Using Outdated Technologies

Legacy systems are like the old giants of the software world. They were built years, sometimes decades ago, using the best technologies available at the time. However, as the tech world rapidly evolved, these systems became obsolete. The challenges here are many:

  • Compatibility Issues: Integrating these systems with newer technologies is often a headache, leading to a patchwork of solutions that are neither efficient nor sustainable.
  • Skill Shortage: The pool of developers proficient in older technologies is shrinking, making maintenance and updates a costly affair.
  • Security Risks: Older systems often don’t meet current security standards, posing significant risks.

Systems with Wrong Architecture

Not all problematic systems are old. Some relatively newer systems suffer from poor architectural decisions. This could be due to a lack of foresight, rushed development timelines, or inadequate understanding of future scalability needs. The primary issues with such systems include:

  • Overengineering: Sometimes, in an attempt to future-proof, systems are overengineered, making them unnecessarily complex and hard to maintain.
  • Rigid Structure: A monolithic design can make it challenging to adapt to changing business needs or to integrate with other more modular and flexible systems.
  • Resource Intensiveness: Poorly architected systems can be resource hogs, requiring more server power and maintenance effort than necessary.

Modern Systems Built with Shortcuts

Startups and companies under intense market pressure often develop systems rapidly to meet immediate business needs. While these systems use modern technologies, they may cut corners in best practices, such as thorough testing, proper decoupling, design patterns or security protocols. The “move fast and fix later” approach can lead to technical debt, where the cost of reworking the system later becomes much higher than it would have been to build it correctly in the first place.

These systems face challenges like:

  • Technical Debt: Accumulated deficiencies in code, documentation, and processes make future changes and upgrades time-consuming and expensive.
  • Scalability Issues: Systems built quickly to solve a current problem may not be scalable or flexible enough to accommodate growth.
  • Maintenance Difficulty: Lack of proper design patterns and unit testing makes these systems fragile and difficult to maintain.

The Cost of Inefficiency

The inefficiencies stemming from outdated, poorly architected, or shortcut-laden systems create a ripple effect of costs for organizations, both tangible and intangible. These costs go beyond mere financial figures, affecting human resources, innovation capabilities, and competitive standing.

Financial Costs

Maintaining and operating inefficient software systems is a significant financial burden for organizations. This includes:

  • High Maintenance Costs: Legacy systems often require specialized skills for maintenance and a larger number of people to constantly put out fires, leading to higher labor costs. Additionally, the cost of patching and updating these systems can be substantial.
  • Increased Downtime: Inefficient systems are prone to breakdowns and errors, leading to increased downtime and loss of productivity, which in turn translates to lost revenue.
  • Scalability Expenses: Systems that are not architected for scalability can require expensive rewrites or additions when scaling is necessary, rather than simple, cost-effective upgrades.

Employee Dissatisfaction and Turnover

The human cost of working with outdated or poorly designed systems is significant. This includes:

  • Stress and Frustration: Employees forced to work with clunky, inefficient systems often experience heightened levels of stress and frustration. This can stem from the constant firefighting mode they find themselves in due to frequent system failures or inefficiencies.
  • Decreased Job Satisfaction: Developers and IT professionals take pride in working with cutting-edge technology and efficient systems. Working with outdated or poorly designed systems can lead to a decrease in job satisfaction and morale.
  • Increased Turnover: Prolonged dissatisfaction can lead to increased turnover. Recruiting and training new employees is an expensive and time-consuming process, adding to the organization’s costs.

Stifled Innovation and Competitive Disadvantage

Inefficient systems not only drain resources but also hinder a company’s ability to innovate and stay competitive.

  • Slower Time-to-Market: Inflexible and complex systems can slow down the development of new features or services, delaying time-to-market and reducing the organization’s ability to respond to market demands.
  • Reduced Agility: Companies burdened with inefficient systems are less agile. They struggle to adapt to changes in the market or to pivot in response to customer needs.
  • Lost Opportunities: In the fast-paced tech world, the inability to quickly adopt new technologies or methodologies can lead to missed opportunities, allowing competitors to gain an edge.

The Root Causes

Understanding the root causes of why many organizations find themselves burdened with inefficient software systems is essential for crafting effective solutions. These causes are often a mix of historical decisions, business constraints, skill gaps, and biases.

Historical Perspective

Legacy systems didn’t start as the cumbersome entities they are today. They were often cutting-edge solutions when initially implemented. For instance, systems developed twenty or even ten years ago were based on the best available technologies and methodologies of the time. These systems were designed to meet the specific needs of that era, which often did not anticipate the rapid advancements in technology and changes in business models that would follow. As technology evolved, these systems became increasingly incompatible with modern requirements, but their deep integration into critical business processes makes them hard to replace.

Business Decisions and Constraints

Short-term business needs and constraints have frequently guided software development decisions. In the push to meet immediate goals and deliver quick results, long-term sustainability and scalability were often overlooked. For example, a company might have chosen a particular technology because it was expedient or cost-effective at the time, not considering how it would scale or integrate with future technologies. Such decisions, while solving immediate problems, laid the groundwork for future challenges. It is important to point out here that such decisions are often forced by people at the top who have very little understanding of technology.

Lack of Awareness and Skills

The rapid evolution of technology means that what was best practice a few years ago may no longer be relevant. There can be a significant gap in the awareness and adoption of modern best practices among developers, especially those who have spent years working within a specific technology stack or methodology. This gap is not just in technical skills but also in approaches to software development, such as Agile methodologies, DevOps practices, or cloud-native development.

Biases in Process, Technology and Personal Choices

Biases play a significant role in the choice of processes, technologies, and even personnel. These biases can be towards familiar technologies, established processes, or known personnel, often at the expense of more efficient, innovative solutions.

For example, a decision-maker might favor a particular programming language or framework because of personal familiarity, despite it not being the best fit for the project’s requirements. Similarly, processes like waterfall methodologies might be chosen over more agile approaches due to a comfort with traditional structures. These biases can lead to the adoption of technologies and processes that are not aligned with current best practices or the future needs of the business.

Unnecessary Complexity

In software development, the KISS principle, which stands for “Keep It Simple, Stupid,” has long been a guiding mantra. It emphasizes the value of simplicity in design and execution. However, in recent years, there’s been a noticeable drift away from this principle, leading to unnecessary complexity in systems.

Over-Architecting Driven by Trends

One common manifestation of this deviation is the tendency to over-architect systems, often influenced by emerging trends and buzzwords in the tech world. For instance, the current surge in the popularity of microservices architecture has led many organizations to adopt these technologies without a clear need. While microservices and container orchestration offer significant benefits in the right context, they are not a one-size-fits-all solution. Implementing them in scenarios where a simpler architecture would suffice not only adds unnecessary complexity but also increases the cost and effort required for development and maintenance.

The Lure of Cutting-Edge Technology

The tech industry’s fast-paced nature often creates a rush to adopt the latest technologies and methodologies, sometimes at the expense of simplicity and practicality. Decision-makers, driven by the desire to be at the technological forefront, may choose complex solutions over simpler, more effective ones. This approach can lead to systems that are over-engineered, difficult to understand, and challenging to maintain.

Ignoring the Basics

In the pursuit of advanced solutions, there’s often a neglect of basic principles of good software design. This includes aspects like clear code readability, maintainability, and the ability to scale or modify the system efficiently. When these fundamentals are overlooked in favor of complexity or novelty, it can lead to systems that are not only difficult to manage but also fail to meet the intended business objectives effectively.

Impact of Over-Complexity

The repercussions of deviating from the KISS principle are significant. Systems become bloated with unnecessary features and layers, complicating troubleshooting and updates. Developers spend more time deciphering and working around the complexity rather than focusing on innovation or improvement. Furthermore, over-complicated systems often lead to increased resource consumption and reduced system performance.

Path to Modernization

Modernizing legacy software is a complex, multi-faceted process that involves much more than just updating technologies; it represents a fundamental shift in how we approach software development.

Comprehensive Assessment and Strategic Planning

The first step in modernization is a thorough assessment of existing systems. This involves identifying which systems require immediate attention and understanding their current limitations. The assessment should cover not just the technical aspects but also how well these systems align with business objectives and user needs.

Once the assessment is complete, a strategic plan for modernization must be developed. This plan should be phased and iterative, allowing for adjustments as needed. Key considerations in this plan should include:

  • Maintainability: Ensuring that the new system is easier to update and maintain.
  • Scalability: Designing the system to handle increased load and growth effortlessly.
  • Flexibility: Building a system that can adapt to changing business needs and technological advancements.

Choosing the Right Architectural Approach

While microservices architecture has gained popularity for its scalability and flexibility, it’s crucial to recognize that it is not a universal solution. The choice of architecture should be driven by the specific needs of the organization and the nature of the application. For some, a simpler architecture might be more appropriate, especially if the application is less complex or doesn’t require frequent scaling.

Embracing Agile and DevOps

Adopting Agile methodologies can transform the software development process, making it more responsive and iterative. Agile focuses on collaboration, customer feedback, and small, frequent releases, which can be particularly effective in modernization projects.

Incorporating DevOps practices is equally important. DevOps bridges the gap between development and operations, streamlining the entire lifecycle of software development, from design through deployment. This can significantly reduce the time-to-market and improve the quality of software releases.

Using the Right Technologies and Skills

Choosing technologies should be a decision driven by functionality, maintainability, and future growth, rather than trends or biases. It requires a deep understanding of the available technologies and how they align with the specific requirements of the system being developed.

Equally important is having a team with the right skills. This includes not only technical skills but also an understanding of modern software development practices. Continuous learning and upskilling should be a part of the organization’s culture, ensuring that the team remains adept at employing the most effective tools and methodologies.

Strategies for Implementation

Successfully modernizing software systems is as much about the right strategies as it is about the right technologies. Implementation requires careful planning, clear communication, and a well-thought-out approach to change management.

Stakeholder Buy-In

Gaining stakeholder buy-in is critical for the success of any modernization project. This includes not just the executive team but also end users, IT staff, and other departments impacted by the change.

  • Communicating the Vision: Clearly articulate the benefits of modernization, including long-term cost savings, increased efficiency, and enhanced competitive edge. Use data and case studies to support your arguments.
  • Addressing Concerns: Understand and address the concerns of various stakeholders. This may involve discussing how the change will impact their work and what measures will be taken to minimize disruption.
  • Involvement in the Process: Engage stakeholders throughout the process, from planning to implementation. Their input can provide valuable insights and help in tailoring the solution to meet the organization’s unique needs.

Training and Development

A modernization initiative can only be successful if the team driving it has the necessary skills and knowledge.

  • Skill Assessment and Training Programs: Assess the current skill levels of the team and identify areas where training is needed. Implement comprehensive training programs to upskill employees in new technologies and methodologies.
  • Continuous Learning Culture: Foster a culture of continuous learning and improvement. Encourage participation in workshops, conferences, and online courses. This not only keeps the team updated but also boosts morale and job satisfaction.
  • Mentorship and Knowledge Sharing: Encourage knowledge sharing within the team. Experienced members can mentor others, facilitating a smoother transition to new technologies and practices.

Balancing Act: Maintaining Operations During Transition

One of the biggest challenges in software modernization is keeping the business running smoothly while the changes are being implemented.

  • Phased Rollout: Instead of a complete overhaul, consider a phased approach. This reduces risk and allows for adjustments based on feedback and performance.
  • Parallel Systems: In some cases, running parallel systems (old and new) for a transitional period can ensure continuity of operations. This also provides a fallback option in case of issues with the new system.
  • Regular Communication and Feedback Loops: Maintain open lines of communication with all stakeholders. Regular updates on progress, challenges, and changes are essential. Also, establish feedback loops to quickly identify and address issues that arise during the transition.

Conclusion: Embracing Modernization as an Imperative

Modernization is not merely an option but a necessity. It involves more than just updating old systems; it’s about a paradigm shift in how software is developed, maintained, and evolved. From rethinking architecture to embracing new methodologies like Agile and DevOps, from prioritizing continuous learning and skill development to engaging in a delicate balancing act during the transition – each step is critical in navigating the path to a more efficient, adaptable, and robust software environment.

The expanded perspectives on the root causes, including the deviation from the KISS principle and the perils of trend-driven decision-making, highlight the importance of thoughtful, needs-based approaches to system design and development. This understanding, coupled with a strategic, phased implementation that secures stakeholder buy-in and aligns with ongoing business operations, is key to successful modernization.

In essence, the call to action is clear: organizations must commit to modernizing their software systems, not just as a means to enhance operational efficiency and innovation but as a strategic imperative to ensure their survival and success. The future of software development, and indeed of the broader technology landscape, hinges on our collective ability to adapt, evolve, and continually strive for systems that are not only technologically advanced but also strategically aligned, user-centric, and resilient in the face of ever-changing business and technological conditions.

As we move forward, the challenge for each organization is to not only recognize the necessity of software modernization but to actively pursue it with the right blend of strategy, technology, and people. In doing so, we can collectively ensure that our industry not only survives but thrives, driving innovation and progress across all sectors.

 

Artificial Intelligence, Development Posts

Navigating the AI Landscape

by Mario Mamalis August 10, 2023
written by Mario Mamalis

Artificial Intelligence (AI) is a transformative technology that has the potential to revolutionize our lives.

We all know by now the enormous impact AI will have on our society. Some of us are excited and optimistic about the potential and the new capabilities we will unlock through the proper implementation and absorption of AI into our society. Others are much less optimistic and focus only on the risks.

I am more of an optimist, so my view is that while AI presents many challenges, with foresight, planning, and collaborative effort, society can navigate these changes in a manner that not only safeguards but enhances human lives. The future with AI doesn’t have to be a zero-sum game between machines and humans; it can be a symbiotic relationship where each amplifies the other’s strengths.

In this post, I will focus on the reality of AI and what is presently available to us. In future posts I will dive deeper into specific AI applications.

Current Breakthroughs

In essence, AI involves developing software that mirrors human actions and skills. Some of the areas where we have seen tangible benefits are:

1. Machine Learning

Often called the backbone of AI, machine learning is the method we use to train computer models to make inferences and predictions from data.

Simply put, machines learn from data. Our daily activities result in the production of enormous amounts of data. Whether it’s the text messages, emails, or social media updates we share, or the photos and videos we capture with our smartphones, we’re constantly churning out vast quantities of information. Beyond that, countless sensors in our homes, vehicles, urban environments, public transportation systems, and industrial zones create even more data.

Data experts harness this immense amount of information to train machine learning models. These models can then draw predictions and conclusions based on the patterns and associations identified within the data.

Real World Example: Machine Learning for Predicting Rainfall Patterns

  1. Data Collection: Data scientists gather years of meteorological data, which includes variables like temperature, humidity, wind speed, air pressure, and past rainfall measurements from various weather stations and satellites.
  2. Feature Engineering: Not all collected data might be relevant. Hence, it’s important to identify which features (or combinations of features) are the most indicative of an impending rainfall event.
  3. Training the Model: With the relevant features identified, a machine learning model, like a neural network or a decision tree, is trained on a portion of the collected data. The model learns the relationships between the features and the outcomes (e.g., whether it rained the next day).
  4. Validation and Testing: Once trained, the model is tested on a different subset of the data (which it hasn’t seen before) to verify its accuracy in predicting rainfall.
  5. Real-time Predictions: Once the model is adequately trained and validated, it can be used in real-time. For instance, if sensors detect a specific combination of temperature, humidity, and pressure on a particular day, the model might predict a 90% chance of rainfall the next day in a certain region.
  6. Continuous Learning: Weather is dynamic, and patterns may evolve over time due to various reasons, including climate change. Machine learning models can be set up for continuous learning. This means that as new data comes in, the model refines and updates its understanding, ensuring predictions remain accurate.

By utilizing ML in this way, meteorologists can offer more precise and timely warnings about rainfall, helping farmers plan their crops, cities manage potential flooding, and people plan their activities.
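
To make steps 3 and 5 above a little more concrete, here is a minimal, self-contained sketch of the idea in C#. It uses a tiny hand-rolled logistic regression instead of a real ML library, and every feature value, weight, and training example is invented purely for illustration.

using System;

// Minimal logistic-regression "rain predictor". Data and features are invented for illustration only.
public static class RainfallDemo
{
    public static void Main()
    {
        // Each row: { relative humidity (0-1), pressure drop in hPa }; label: 1 = it rained the next day.
        double[][] features =
        {
            new[] { 0.90, 6.0 }, new[] { 0.85, 4.5 }, new[] { 0.40, 0.5 },
            new[] { 0.30, 1.0 }, new[] { 0.75, 3.0 }, new[] { 0.20, 0.2 }
        };
        int[] labels = { 1, 1, 0, 0, 1, 0 };

        // Step 3 (training): gradient descent on the logistic loss.
        double[] weights = { 0.0, 0.0 };
        double bias = 0.0, learningRate = 0.1;

        for (int epoch = 0; epoch < 5000; epoch++)
        {
            for (int i = 0; i < features.Length; i++)
            {
                double prediction = Sigmoid(weights[0] * features[i][0] + weights[1] * features[i][1] + bias);
                double error = prediction - labels[i];   // gradient of the logistic loss for this sample
                weights[0] -= learningRate * error * features[i][0];
                weights[1] -= learningRate * error * features[i][1];
                bias       -= learningRate * error;
            }
        }

        // Step 5 (real-time prediction): apply the learned weights to today's sensor readings.
        double humidityToday = 0.80, pressureDropToday = 5.0;
        double rainProbability = Sigmoid(weights[0] * humidityToday + weights[1] * pressureDropToday + bias);
        Console.WriteLine($"Chance of rain tomorrow: {rainProbability:P0}");
    }

    private static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));
}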

2. Anomaly Detection

Anomaly detection, often termed outlier detection, refers to the identification of items, events, or observations that do not conform to the expected pattern in a dataset. In the context of AI, it’s the use of algorithms and models to identify unusual patterns that do not align with expected behavior.

Real World Example: Anomaly Detection in F1 Gearbox Systems

  1. Data Collection: Modern F1 cars are equipped with thousands of sensors that continuously monitor various aspects of the car’s performance, from engine metrics to tire conditions. For the gearbox, these sensors can track parameters like temperature, RPM, gear engagement speed, and vibrations.
  2. Baseline Creation: Data from hundreds of laps is used to establish a ‘baseline’ or ‘normal’ behavior of the gearbox under various conditions – straights, tight turns, heavy acceleration, or deceleration.
  3. Real-time Monitoring: During a race or a practice session, the gearbox’s performance metrics are continuously compared to this baseline. Any deviation from the baseline, be it a sudden temperature spike or unexpected vibration, can be flagged instantly.
  4. Anomaly Detection: Advanced algorithms process this data in real-time to detect anomalies. For instance, if a gearbox typically operates at a specific temperature range during a certain track segment but suddenly registers a temperature that’s significantly higher or lower, the system flags this as an anomaly.
  5. Immediate Action: Once an anomaly is detected, the team receives instant alerts. Depending on the severity and type of anomaly, different actions can be taken. It could range from sending a warning to the driver, planning a pit stop to address the issue, or, in critical situations, advising the driver to retire the car to avoid catastrophic failure or danger.
  6. Post-Race Analysis: After the race, data engineers and technicians can delve deeper into the anomaly data to understand its root cause, ensuring that such issues can be preemptively addressed in future races.

This approach of anomaly detection in F1 not only ensures the optimal performance of the car but also significantly enhances driver safety. An unforeseen failure at the high speeds at which F1 cars operate can be catastrophic, making the quick detection and mitigation of potential issues a top priority.
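
As a simplified illustration of the baseline-and-deviation idea in steps 2 to 4, the sketch below computes a mean and standard deviation from historical gearbox temperatures and flags any live reading more than three standard deviations away. All numbers are invented; a real F1 telemetry system uses far richer models and data.

using System;
using System.Linq;

public static class GearboxAnomalyDemo
{
    public static void Main()
    {
        // Baseline creation: gearbox temperatures (°C) from previous laps (invented values).
        double[] baselineTemps = { 101, 103, 99, 102, 100, 104, 98, 101, 102, 100 };
        double mean = baselineTemps.Average();
        double stdDev = Math.Sqrt(baselineTemps.Average(t => Math.Pow(t - mean, 2)));

        // Real-time monitoring: compare each new reading against the baseline.
        double[] liveReadings = { 102, 103, 101, 118, 100 };
        foreach (double reading in liveReadings)
        {
            double zScore = (reading - mean) / stdDev;
            bool isAnomaly = Math.Abs(zScore) > 3.0;   // simple 3-sigma rule
            Console.WriteLine($"Temp {reading}°C, z-score {zScore:F1} -> {(isAnomaly ? "ANOMALY - alert the pit wall" : "normal")}");
        }
    }
}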

3. Computer Vision

Computer vision systems utilize machine learning models designed to process visual data from sources like cameras, videos, or pictures. The following are common computer vision tasks:

3.1 Image classification 

This refers to the task of assigning a label to an image based on its visual content. Essentially, it’s about categorizing what the image represents. For instance, given a picture, an image classification system might categorize it as a “cat”, “dog”, “car”, etc. This is typically achieved using deep learning models. The primary objective is to identify the main subject or theme of the image from a predefined set of categories.

3.2 Object detection

This is the process of identifying and locating specific objects within an image or video. Unlike image classification, which assigns a single label to the entire picture, object detection can recognize multiple items in the image and provide bounding boxes around each identified object. Commonly used in tasks like autonomous driving, surveillance, and image retrieval, it often employs deep learning models to both classify and spatially locate objects within the visual frame.

3.3 Semantic segmentation

This task involves dividing an image into segments where each segment corresponds to a specific object or class category. Instead of just identifying that an object is present (as in object detection) or classifying the image (as in image classification), semantic segmentation classifies each pixel of the image. As a result, it provides a detailed, pixel-level labeling, highlighting the specific regions in an image where each object or class is located. Common applications include self-driving cars (to understand road scenes) and medical imaging (to identify regions of interest).

3.4 Image analysis

This refers to the process of inspecting and interpreting visual data to derive meaningful insights. It involves various techniques that evaluate the features, patterns, and structures within images. By transforming visual content into actionable data, image analysis can be applied across diverse fields, from medical diagnostics to satellite imagery interpretation. Its goal is often to categorize, quantify, or enhance the visual data for further understanding or application. 

3.5 Face detection

This is the task of identifying and locating faces within an image or video frame. It determines the presence and location of faces. Typically, face detection algorithms focus on unique facial features such as eyes, nose, and mouth to differentiate faces from other objects in the image. This technology is foundational for applications like facial recognition, camera autofocus, and various security and social media applications.

4. Optical Character Recognition (OCR)

This is a technology that converts different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. By recognizing the characters present in the visual data, OCR enables the transformation of static, image-based content into dynamic text that can be edited, formatted, indexed, or searched. It’s commonly used in data entry automation, digitizing printed books, and extracting information from images.

5. Natural Language Processing

Natural language processing (NLP) is a subfield of AI focused on developing software capable of comprehending and generating human language, whether written or spoken.

With NLP, it’s possible to develop applications that can:

  • Examine and deduce meaning from text in documents, emails, and other mediums.
  • Recognize spoken words and produce spoken feedback.
  • Instantly convert phrases between languages, whether they’re spoken or written.
  • Understand instructions and decide on the relevant responses.

Real World Example: Customer Service Chatbots in E-commerce Websites

Problem Statement

Online retailers often have a vast number of customers visiting their websites, many of whom have queries about products, services, shipping, returns, etc. Addressing these in real-time with human agents for each customer can be costly and time-consuming.

NLP Solution

E-commerce platforms deploy chatbots equipped with NLP capabilities. When a customer types in a query, such as “What is the return policy for electronics?”, the NLP system in the chatbot interprets the question’s intent.

Functionality

  1. Tokenization: Breaks the input text into individual words or tokens.
  2. Intent Recognition: Understands the main purpose of the user’s message, i.e., getting information about the return policy for electronics.
  3. Entity Recognition: Identifies key components in the text, e.g., “electronics” as the product category.
  4. Response Generation: Based on the identified intent and entities, the chatbot retrieves the relevant information from its database (in this case, the return policy for electronics) and crafts a coherent response.
  5. Feedback Loop: If the chatbot’s answer is not satisfactory, the user’s feedback can be utilized to train and improve the NLP model, making the chatbot more efficient over time.

Benefits

  • 24/7 Customer Support: The chatbot can operate round the clock, ensuring customers from different time zones get real-time assistance.
  • Cost Efficiency: Reduces the need for a large customer service team.
  • Consistency: Provides uniform information to all customers.

This application of NLP has revolutionized the way businesses interact with their customers online, offering quick, consistent, and efficient responses.
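
To make the tokenization, intent-recognition, and entity-recognition steps above a bit more tangible, here is a toy, keyword-based sketch in C#. Real chatbots use trained language models rather than lookups like this, and the intents, entities, and policy text shown are invented for illustration.

using System;
using System.Collections.Generic;
using System.Linq;

public static class ChatbotDemo
{
    // Invented knowledge base: (intent, entity) -> canned response.
    private static readonly Dictionary<(string Intent, string Entity), string> Responses = new()
    {
        [("return_policy", "electronics")] = "Electronics can be returned within 30 days in their original packaging.",
        [("shipping_info", "electronics")] = "Electronics ship within 2 business days."
    };

    public static void Main()
    {
        Console.WriteLine(Answer("What is the return policy for electronics?"));
    }

    public static string Answer(string userMessage)
    {
        // 1. Tokenization: split the message into lowercase tokens.
        string[] tokens = userMessage.ToLowerInvariant()
            .Split(new[] { ' ', '?', '!', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);

        // 2. Intent recognition: crude keyword matching stands in for a trained classifier.
        string intent = tokens.Contains("return") ? "return_policy"
                      : tokens.Contains("shipping") ? "shipping_info"
                      : "unknown";

        // 3. Entity recognition: look for a known product category.
        string entity = tokens.FirstOrDefault(t => t == "electronics" || t == "clothing") ?? "unknown";

        // 4. Response generation: look up the canned answer.
        // 5. A real system would log unanswered questions as feedback for retraining.
        return Responses.TryGetValue((intent, entity), out string response)
            ? response
            : "Sorry, I didn't understand that. A human agent will follow up.";
    }
}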

6. Knowledge Mining

Knowledge mining involves extracting valuable insights, patterns, and knowledge from vast and often unstructured data sources. It combines techniques from data mining, machine learning, and big data analytics to transform raw information into a structured and understandable format. The goal is to discover hidden relationships, trends, and patterns that can inform decision-making, drive innovation, and provide a deeper understanding of complex subjects. Knowledge mining is particularly valuable in areas with huge datasets, like research, healthcare, and business analytics, where it aids in converting vast data into actionable intelligence.

Risks and Challenges

Artificial Intelligence holds immense potential to bring positive change to our world, but its use demands careful oversight and ethical considerations. Here are some potential shortcomings:

  • Bias influencing outcomes: For example, a lending model shows discrimination towards a particular gender due to skewed training data.
  • Unintended harm from errors: For example, a self-driving car has a system malfunction, leading to an accident.
  • Potential data breaches: For example, a bot designed for medical diagnoses uses confidential patient records stored without adequate security.
  • Inclusive design shortcomings: For example, a smart home device fails to offer audio feedback, leaving visually challenged users unsupported.
  • Need for transparency and trust: For example, a finance AI tool suggests investment strategies, but how does it determine them?
  • Accountability for AI decisions: For example, a faulty facial recognition system results in a wrongful conviction; who is held accountable?

Social Implications

The impact of AI on jobs and the skills landscape is profound, complex, and multifaceted. This seems to be one of the biggest fears people have about AI. Let’s delve deeper into this.

  • Job Displacement: Repetitive, manual, and rule-based tasks are more prone to automation. This can impact sectors like manufacturing, customer service, and basic data entry roles.
  • Job Creation: Historically, technological advancements have given rise to new jobs. Similarly, AI will create new roles that we might not even be able to envision now. Positions in AI ethics, AI system training, and AI system maintenance are examples of new job avenues.
  • Job Transformation: Some jobs won’t disappear but will transform. For instance, radiologists might spend less time analyzing X-rays (as AI can do that) and more time consulting with patients or other doctors based on AI’s findings.
  • Technical Skills: There will be an increased demand for individuals who understand AI, data science, machine learning, and related technologies.
  • Soft Skills: Emotional intelligence, creativity, critical thinking, and complex problem-solving will become even more valuable. As AI systems handle more data-oriented tasks, uniquely human traits will become more prominent in the job market.
  • Adaptability: The pace of change means that the ability to learn and adapt is crucial. Lifelong learning and the readiness to acquire new skills will be vital.
  • Interdisciplinary Knowledge: Combining AI with domain-specific knowledge, whether it’s in arts, medicine, or finance, can lead to groundbreaking applications.

Ideas to Address Negative Impact

  • Education & Training: Governments and private institutions need to focus on retraining programs to help the workforce transition. This includes updating educational curricula to reflect the new skills demand and offering adult education initiatives focused on AI and technology.
  • Safety Nets: Support for those who lose jobs due to automation is vital. This could be in the form of unemployment benefits, retraining programs, or even discussions around universal basic income.
  • Ethical Considerations: Businesses should be encouraged to deploy AI responsibly, understanding its societal impact, and not just the bottom line. Ethical guidelines for AI application can help.
  • Inclusive Development: AI tools should be developed with input from a diverse group to ensure they address a broad range of needs and avoid built-in biases.
  • Local Solutions: AI’s impact might differ based on the region, economy, and culture. Tailored local strategies can better address specific challenges and opportunities.

Responsible AI – The Six Principles

Artificial Intelligence is not just a tool; it has become an integral part of our daily lives, reshaping industries and altering the fabric of society. With its increasing influence comes a pressing need for Responsible AI. But what exactly does this mean?

Responsible AI encompasses the practice of designing, developing, deploying, and managing AI in a manner that is transparent, ethical, and aligned with societal values and norms. It’s about ensuring that as AI systems make decisions, they do so in ways that are understandable, fair, and beneficial, while actively mitigating unintended consequences and harms.

Fair

AI systems ought to ensure equal treatment for everyone. Let’s say you design a machine learning model for a home loan approval process. The model’s predictions on loan approvals or rejections should be unbiased. It’s crucial that the model doesn’t favor or discriminate against groups based on gender, ethnicity, or any other criteria that could unjustly benefit or hinder specific applicant groups.

Safe and Reliable

AI systems must function with both precision and security. Imagine an AI-infused drone system for package deliveries or a machine learning algorithm assisting in air traffic control. Inaccuracies in these systems can have profound consequences, potentially jeopardizing safety.

It’s essential that AI-based software undergo meticulous testing and stringent deployment protocols to guarantee their reliability before they’re introduced to real-world scenarios.

Secure

AI systems ought to prioritize security and uphold privacy standards. AI systems, particularly their underlying machine learning models, draw upon vast data sets that might encompass sensitive personal information. The obligation to protect privacy doesn’t end once the models are developed and operational. As these systems continually utilize fresh data for predictions or decisions, both the data itself and the resultant choices can have associated privacy and security implications.

Inclusive

AI systems should be inclusive and resonate with all individuals. It’s vital that the benefits of AI extend across all societal divisions, be it physical abilities, gender, sexual orientation, ethnicity, or any other characteristics.

For example: An AI-driven voice recognition software shouldn’t just understand accents from major world languages but should also effectively recognize and interpret dialects and variations, ensuring people from remote regions or minority linguistic groups aren’t left out.

Transparent

AI systems should be transparent and comprehensible. Users ought to be well-informed about the system’s intent, its operational mechanisms, and any potential constraints.

For example: If a health app uses AI to assess the likelihood of a certain medical condition based on input symptoms, users should be informed about the sources of its medical data and the accuracy rate of its predictions.

Responsible

Responsibility for AI systems rests with their creators. Those designing and implementing AI solutions should adhere to a well-defined set of ethical and legal rules, ensuring the technology conforms to established standards.

For example: If a company designs an AI tool for recruitment, the architects should ensure it adheres to employment laws and anti-discrimination guidelines. If the tool inadvertently favors a particular age group or ethnicity, the creators must rectify the issue and ensure fairness in the recruitment process.

Final Thoughts

Artificial Intelligence presents transformative solutions to many challenges. AI systems possess the capacity to emulate human behaviors, interpret their environment, and take actions that were once thought of as science fiction.

However, such profound capabilities also carry significant responsibilities. As architects of AI innovations, we have a duty to ensure these technologies benefit the masses without unintentionally disadvantaging any individual or community.

Development Posts, Success Stories

Case Study: Transformative Digital Solutions

by Mario Mamalis July 24, 2023
written by Mario Mamalis

Our consultants were recently engaged in a project by a leading company in the insurance industry. The mission was to develop a new API that would be used as an example of best practices and design patterns for all other APIs developed at the client. This was a unique opportunity for my team to actualize our core corporate values: building engineering cultures, sharing knowledge, quality without compromise, and fostering long-lasting client relationships.

Our client, a well-established entity, faced challenges due to its development team’s unfamiliarity with the latest technological trends. Sensing the need for strategic upgrades to stay competitive in the digital era, they called upon our expertise.

Understanding the Challenge

The client had a robust team of developers who were eager to learn but found it challenging to stay abreast of the latest technological trends and best practices. Despite their experience and eagerness to learn, they struggled to keep pace with the dynamic evolution of the tech industry.

We had a twofold challenge: to build a modern, scalable, and efficient API and to enrich the client’s development team’s knowledge base and skill set, bringing them in line with the latest industry standards.

Embarking on a Solution-Driven Path

In an era where change is the only constant, our team of three experienced consultants worked closely with the client to first understand their specific needs, strengths, and areas for growth. To set a solid foundation for scalable and maintainable future growth, the API was built using the principles of Clean Architecture, a set of best practices that ensure code is easy to understand, modify, and test.

To capitalize on the eagerness of the client’s team, every step of the API development process became a teaching moment. The knowledge sharing was done generously, focusing not only on the ‘how,’ but also the ‘why’ behind specific methods, technologies, and design patterns. This hands-on approach imbued the client’s development team with the confidence and knowledge to not just understand, but also to implement and maintain the systems on their own in the future.

The Azure Pivot and Quality Commitment

Our commitment to delivering uncompromising quality and best-fit solutions led the team to recommend a switch from a planned deployment on Azure Kubernetes Service (AKS) to Azure App Service. This pivot was not just a technological change, but a strategic decision informed by our understanding of the client team’s capabilities and the specific technical requirements.

Instead of choosing a technology based on its popularity, our consultants recommended the best solution after meticulous evaluation of the client’s needs and capabilities, as well as the application’s functional and technical requirements. This decision was backed by in-depth explanations and walkthroughs of potential benefits, ensuring the client stakeholders were not just accepting, but comprehending and championing this shift.

Outcomes and Beyond

The collaborative journey between my team and our client resulted in an effective, scalable, and maintainable API that will serve as a model for all future API builds. Beyond this technical success, the engagement was transformative for the client’s development team, who found themselves armed with contemporary knowledge, upgraded skills, and a newly found enthusiasm for embracing the changing technology landscape.

Thrilled with the outcome of the engagement, the client didn’t wait long to begin another project with us. We are now gearing up to prepare them for an Azure Cloud Migration utilizing Azure Landing Zones, a clear demonstration of a burgeoning, long-lasting relationship founded on trust, respect, and shared success.

Conclusion

This engagement is a compelling testimony of how we live our core values. We focus on more than just providing technological solutions; we believe in building engineering cultures that nurture continuous learning, sharing knowledge generously, ensuring quality without compromise, and building long-lasting relationships.

Our collaboration with the client validated our unique approach, and the success we achieved together reaffirms our role as a trusted partner for businesses seeking to leverage technology for growth and competitiveness. This success story reflects our unwavering commitment to be more than just consultants; we are educators, collaborators, and partners, dedicated to ensuring the success of our clients in the digital age.

Cloud Architecture, Development Posts

Azure Landing Zones

by Mario Mamalis July 21, 2023
written by Mario Mamalis

As the digital landscape evolves, businesses are increasingly turning to cloud solutions for their scalability, flexibility, and cost-efficiency. Among the various cloud platforms available, Microsoft Azure has emerged as a top choice for enterprises looking to transform their operations and harness the full potential of the cloud. However, successfully migrating applications to Azure requires meticulous planning and execution. One essential aspect that can significantly enhance the migration process is the proper implementation of Azure Landing Zones. In this blog post, we’ll explore the benefits of adopting Azure Landing Zones and how they can expedite the journey of a company migrating its applications to the Azure Cloud.

What are Azure Landing Zones?

Azure Landing Zones are a set of best practices, guidelines, and pre-configured templates designed to establish a foundation for smooth, secure, and scalable cloud adoption. Think of them as a blueprint for creating a well-structured environment in Azure. With Azure Landing Zones, companies can avoid potential pitfalls and ensure that their cloud resources are organized, compliant, and aligned with industry standards from the outset.

Benefits of Proper Azure Landing Zones Implementation

Let’s explore the key benefits of an Azure Landing Zones implementation:

Accelerated Cloud Adoption

One of the primary advantages of Azure Landing Zones is the rapid acceleration of the cloud adoption process. By providing a structured framework and pre-configured templates, organizations can skip time-consuming manual setups and start their cloud journey quickly. This allows the company to focus on core business objectives, reduce deployment cycles, and derive value from Azure’s services sooner.

Enhanced Security and Compliance

Security is a top concern when migrating applications to the cloud. Azure Landing Zones help address these concerns by providing a solid foundation for security and compliance best practices. With predefined security policies and controls, organizations can ensure consistent security configurations across their cloud environment. This includes identity and access management, network security, data protection, and compliance with industry regulations.

Standardized Governance

Maintaining governance and control in a cloud environment can be complex, especially as the infrastructure scales. Azure Landing Zones establish standardized governance models, enabling a centralized approach to managing resources, access permissions, and cost controls. By adopting these predefined governance policies, companies can avoid shadow IT and maintain full visibility and control over their cloud assets.

Improved Cost Management

Proper implementation of Azure Landing Zones allows organizations to optimize cloud costs effectively. By following best practices for resource organization and using Azure’s cost management tools, businesses can track their cloud spending, identify cost-saving opportunities, and avoid unexpected expenses.

Increased Scalability and Flexibility

Azure Landing Zones are designed to accommodate future growth and changing business requirements seamlessly. By setting up a scalable and flexible foundation, companies can expand their cloud infrastructure to meet the evolving needs of their applications without encountering bottlenecks or architectural constraints.

Streamlined Collaboration

For companies with multiple teams or departments involved in the migration process, Azure Landing Zones provide a standardized framework that fosters collaboration and communication. This shared approach ensures that everyone follows the same guidelines, leading to consistent results and a smoother migration experience.

Azure Landing Zone Architecture

The architecture of an Azure landing zone is designed to be flexible and adaptable, catering to various deployment requirements. Its modular and scalable nature enables consistent application of configurations and controls across all subscriptions. By utilizing modules, specific components of the Azure landing zone can be easily deployed and adjusted as your needs evolve over time.
 
The conceptual architecture of the Azure landing zone, depicted below, serves as a recommended blueprint, providing an opinionated and target design for your cloud environment. However, it should be viewed as a starting point rather than a rigid framework. It is essential to tailor the architecture to align with your organization’s unique needs, ensuring that the Azure landing zone perfectly fits your requirements.
 
Conceptual Architecture Diagram (the full diagram is available in the Azure landing zone documentation on Microsoft Learn).
 

Landing Zone Types

An Azure Landing Zone can be either a Platform Landing Zone or an Application Landing Zone. A closer look at their respective functions helps clarify the role each plays in the overall cloud architecture.

Platform Landing Zones

Platform Landing Zones, also known as Foundational Landing Zones, provide the core infrastructure and services required for hosting applications in Azure. They are the initial building blocks that establish a well-structured and governed foundation for the entire cloud environment.
 
The primary focus of Platform Landing Zones is on creating a robust and scalable infrastructure to host applications. They address common requirements, such as identity and access management, networking, security, monitoring, and compliance. These landing zones provide shared services that are consumed by multiple application workloads.
 

Key Features of Platform Landing Zones

Below are some key features of Platform Landing Zones:

  • Identity and Access Management: Platform Landing Zones set up centralized identity and access control mechanisms using Microsoft Entra ID (formerly known as Azure Active Directory or AAD) to manage user identities and permissions effectively.
  • Networking: They establish virtual networks, subnets, and network security groups to ensure secure communication and connectivity between various resources.
  • Security and Compliance: Platform Landing Zones implement security best practices and policies to protect the cloud environment and ensure compliance with industry standards and regulations.
  • Governance and Cost Management: Platform Landing Zones include resource organization, tagging, and governance mechanisms to facilitate cost allocation, tracking, and optimization.
  • Shared Services: Platform Landing Zones may include shared services like Azure Policy, Azure Monitor, and Azure Log Analytics to ensure consistent management and monitoring.

Application Landing Zones

Application Landing Zones focus on the specific requirements of individual applications or application types. They are designed to host and optimize the deployment of a particular application workload in Azure.
 
The primary focus of Application Landing Zones is on the unique needs of applications. They address factors such as application architecture, performance, scalability, and availability. Each Application Landing Zone is tailored to meet the demands of a specific application or application family.
 

Key Features of Application Landing Zones

Below are some key features of Application Landing Zones:

  • Application Architecture: Application Landing Zones include resources and configurations specific to the application’s architecture, such as virtual machines, containers, or serverless functions.
  • Performance Optimization: Application Landing Zones may implement caching mechanisms, content delivery networks (CDNs), or other optimizations to enhance application performance.
  • Scalability and Availability: They leverage Azure’s auto-scaling capabilities, load balancers, and availability sets or zones to ensure the application can handle varying workloads and maintain high availability.
  • Data Storage and Management: Application Landing Zones include configurations for databases and data storage solutions, such as Azure SQL Database, Azure Cosmos DB, or Azure Blob Storage, depending on the application’s data requirements.
  • Application-Specific Security: Application Landing Zones may have customized security settings and access controls based on the application’s sensitivity and compliance requirements.

Platform vs. Application Landing Zones Summary

In summary, Platform Landing Zones focus on providing a standardized and governed foundation for the entire cloud environment, addressing infrastructure and shared services needs. They set the stage for consistent management, security, and cost optimization across the organization’s Azure resources. On the other hand, Application Landing Zones concentrate on tailoring the cloud environment to suit the specific requirements of individual applications, optimizing performance, scalability, and data management for each workload.
 

Both Platform Landing Zones and Application Landing Zones play crucial roles in a successful Azure cloud adoption strategy. Platform Landing Zones ensure the overall health and governance of the cloud environment, while Application Landing Zones cater to the unique needs of diverse application workloads, enabling efficient and optimized hosting of applications in Azure.

Conclusion

In conclusion, embracing Azure Landing Zones is a strategic move for any company preparing to migrate their applications to the Microsoft Azure Cloud. With these predefined best practices and guidelines, organizations can streamline their cloud adoption process, ensure robust security and compliance, and optimize resource utilization. The benefits of proper Azure Landing Zones implementation extend beyond the initial migration phase, providing a foundation for scalable growth and seamless management of cloud resources. As a cloud solutions architect, you will find that understanding the value of Azure Landing Zones empowers you to guide businesses towards a successful and rewarding cloud journey with Microsoft Azure. For more information regarding Azure Landing Zones, you can explore the documentation on Microsoft Learn.

Development Posts, Serverless

Durable Functions Workflows

by Mario Mamalis July 6, 2023
written by Mario Mamalis

In this post we will examine the application of the workflow patterns covered in the Durable Functions Fundamentals part of this series. I will demonstrate a sample application, showcase how to set up the demo solution, and explain important parts of the code. I strongly suggest that you read the first post of this series before continuing here.

Solution Overview

The solution I developed for this demonstration is composed of two separate applications. One is an Azure Durable Functions App, developed to run in Isolated Worker Mode, and the other is a hosted Blazor WebAssembly App, used to visualize the workflow patterns. I used a hosted Blazor WebAssembly App because I wanted to create a SignalR Hub for real-time updates to the user interface as the Functions run.

Together these applications make up the Coffee Shop Demo. Imagine a fictitious coffee shop where clients place their coffee orders at the register. After a coffee order is placed, automated coffee machines process the order and prepare the coffees using the following steps: Grind, Dose, Tamp, and Brew. All these operations depend on the coffee specifications. The coffee properties are:

  • Coffee Type: Espresso, Cappuccino and Latte
  • Intensity: Single, Double, Triple and Quad
  • Sweetness: None, Low, Medium and Sweet

Depending on the specs of each coffee, the automated coffee machine will execute the necessary steps and report the progress to the web application using SignalR web sockets. Through this process we will be able to see how the different workflow patterns affect the behavior and performance of the coffee machines.
 
Isolated Worker Mode

I would like to provide some insights about the decision to develop and run the Functions using the .NET isolated worker. The isolated worker enables us to run Functions apps using the latest .NET version. The alternative, the Azure Functions in-process mode, supports only the same .NET version as the Functions runtime, which means that only LTS versions of .NET are supported. At the time this solution was created, .NET 7 was the latest version available, but the Functions runtime supported .NET 6.

Some of the benefits of using the isolated worker process are:
  • Fewer conflicts between the assemblies of the Functions runtime and the application runtime.
  • Complete control over the startup of the app, the configurations, and the middleware (see the sketch below).
  • The ability to use dependency injection and middleware.
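
To give a rough idea of what that control over startup looks like, here is a minimal Program.cs sketch for a Functions app running in the isolated worker model. The commented-out registration is hypothetical and only shows where dependency injection hooks in; the actual demo code on GitHub may differ.

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Minimal startup for an Azure Functions app on the .NET isolated worker.
var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()   // wires up the Functions worker
    .ConfigureServices(services =>
    {
        // Hypothetical registration: this is where application services (for example
        // a SignalR publisher) would be added so they can be constructor-injected into Functions.
        // services.AddSingleton<ICoffeeShopHubClient, CoffeeProcessStatusPublisher>();
    })
    .Build();

host.Run();
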
Workflow Pattern Demonstrations

In the following short videos I will demonstrate the workflow patterns by running the applications and visualizing the differences.

Function Chaining

First we will take a look at the Function Chaining pattern. Using this pattern we will simulate a coffee order with 5 coffees of different type, intensity and sweetness. The Function Chaining Orchestrator will call four Activity Functions corresponding to the coffee making steps (Grind, Dose, Tamp and Brew), in sequential fashion one after the other for each coffee. In order to simulate the order entry, I make an HTTP call to a Starter Function, which in turn calls the Orchestrator.

Fan out/fan in

Now we will take a look at the Fan out/fan in pattern. Using this pattern we will simulate a coffee order with the same 5 coffees we used before. This time however, the Fan Out Fan In Orchestrator will call the Coffee Maker Orchestrator for each coffee in the order in parallel. To do this I use the Coffee Maker Orchestrator as a sub-orchestrator. Again we will utilize Postman to make an HTTP call to the appropriate Starter Function. The difference in processing speed is clear! 

Note: I did not use the Chaining Orchestrator as a sub-orchestrator because the Chaining Orchestrator is a standalone Orchestrator that accepts a coffee order. The Coffee Maker Orchestrator accepts a coffee object and executes the steps for one coffee, which is more appropriate as a sub-orchestrator.

Human Interaction

The final demonstration will be the Human Interaction pattern. With this pattern we can pause the Durable Function Orchestrator at any point we want to receive an external event. The coffee order will be placed, but the coffee machine will not execute the order until the payment is received. To demonstrate that, I initiate the order and then after I get the appropriate prompt, I make another HTTP call using a predefined event and a specific URL for this purpose. The Human Interaction Orchestrator will intercept that event and will continue processing the order by calling the Coffee Maker sub-orchestrator.

Steps to Create the Durable Functions App

I used Visual Studio 2022 to create the Durable Functions App. The following snapshots demonstrate the process of creating the solution.

Create New Durable Functions App Project

1. Create New Project
2. Project Name
3. Select Runtime
4. Add Durable Orchestrator
5. Select Trigger

Durable Functions App Solution Structure

After creating the solution, you will get the default files the template creates for you. Below is a snapshot of the solution in its final form after all the code was completed, along with high-level descriptions of the different classes.

Common Folder

Here you can find common Activity Functions, Models, SignalR classes, a Coffee Price Calculator and Enums.

Activity Functions

These are the functions that simulate the steps to prepare a coffee. Activity Functions are where you normally place your application logic. (Please view the first post in the series to see detailed information about the different Function types).

Models

These are the POCO objects that represent the payloads passed to the Functions throughout the project. We have two objects: CoffeeOrder and Coffee. The relationship is one-to-many, where one CoffeeOrder can have many Coffees.
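
Based on how these objects are used in the orchestrator snippets later in the post, the two models look roughly like the sketch below. Anything not visible in those snippets (property types, the exact collection type) is an assumption rather than a copy of the repository code; the enum types are sketched in the Enums section below.

using System.Collections.Generic;

// Rough reconstruction of the two POCOs, inferred from the orchestrator snippets in this post.
public class CoffeeOrder
{
    public string Id { get; set; }
    public OrderStatus Status { get; set; }
    public string Message { get; set; }
    public decimal TotalCost { get; set; }
    public List<Coffee> Coffees { get; set; } = new();   // one order -> many coffees
}

public class Coffee
{
    public string Id { get; set; }
    public string OrderId { get; set; }
    public CoffeeType Type { get; set; }
    public Intensity Intensity { get; set; }
    public Sweetness Sweetness { get; set; }
    public CoffeeStatus Status { get; set; }
    public string Message { get; set; }
}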
 

SignalR Classes

Here you will find all classes necessary to enable communication through SignalR. These classes are organized using composition and inheritance, using the appropriate SignalR client and exposing methods to send messages to the SignalR service I have provisioned on Azure. (I will not get into details about SignalR in this post as it is not the main topic).
 

Coffee Price Calculator

The CoffeePriceCalculator is a static class with static methods used to calculate the price of coffees and coffee orders.
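
A minimal sketch of what such a static calculator might look like is shown below. The prices and pricing rules are invented for illustration and build on the model and enum sketches in this section; the real CoffeePriceCalculator on GitHub may differ.

using System.Linq;

// Hypothetical pricing logic; uses the Coffee/CoffeeOrder and enum sketches shown in this section.
public static class CoffeePriceCalculator
{
    public static decimal CalculateCoffeePrice(Coffee coffee)
    {
        decimal basePrice = coffee.Type switch
        {
            CoffeeType.Espresso   => 2.50m,
            CoffeeType.Cappuccino => 3.50m,
            CoffeeType.Latte      => 4.00m,
            _                     => 3.00m
        };

        // Invented rule: each extra shot of intensity adds 50 cents.
        return basePrice + ((int)coffee.Intensity - 1) * 0.50m;
    }

    public static decimal CalculateOrderPrice(CoffeeOrder order) =>
        order.Coffees.Sum(CalculateCoffeePrice);
}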
 

Enums

The Enums class contains all enumerations used in the code, such as the coffee type, sweetness, and the different statuses.
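
Putting together the values mentioned earlier (coffee type, intensity, sweetness) and the statuses that appear in the orchestrator snippets, the enumerations look roughly like this; members not seen in the snippets and the numeric values are assumptions.

// Reconstructed from the values mentioned in this post; members and numeric values are assumed.
public enum CoffeeType { Espresso, Cappuccino, Latte }

public enum Intensity { Single = 1, Double = 2, Triple = 3, Quad = 4 }   // also used as a delay multiplier

public enum Sweetness { None, Low, Medium, Sweet }

public enum CoffeeStatus { Ordered, Ground, Dosed, Tamped, Brewed }

public enum OrderStatus { Placed, Started, Completed, Failed }

public enum PaymentStatus { Pending, Approved, Declined }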

 

Patterns Folder

In this folder you will find all the Starters and Orchestrators, organized in subfolders for each pattern. As you can see we have Chaining, FanOutFanIn, and HumanInteraction subfolders containing the appropriate classes. 

The HumanInteraction subfolder contains 3 additional classes: an Activity Function (CalculateOrderPrice) that utilizes the static CoffeePriceCalculator class to calculate the prices, another Activity Function (SendPaymentRequest) that simulates a 2-second delay for sending a payment request, and a Model (ProcessPaymentEventModel) that is expected to be included in the request made by the human interacting with the orchestration. That model contains a PaymentStatus property, which is an enumeration. If the PaymentStatus passed to the event is “Approved”, the orchestrator will continue the work and prepare the coffees.

Finally the CoffeeMaker subfolder contains the sub-orchestrator used by the FanOutFanIn and HumanInteraction Orchestrator Functions.

Important Code Sections

Now let’s go over some important code sections to help you understand how things come together. (The snippets below are excerpts; the complete code can be found on GitHub.)
 

Activity Functions

The code below is the Brew Activity Function. We can see that it has a constructor that accepts an ICoffeeShopHubClient, whose concrete implementation is the CoffeeProcessStatusPublisher. As you can see, we use dependency injection just like we would in any normal .NET 7 application; this is straightforward because we use the Isolated Worker Mode, as I explained in the beginning. This injected object is found in many other Functions, and it is used to publish messages to SignalR so that we can eventually visualize them in the Blazor Web App.

You can see that the Activity Function has a name attribute. This is very important because that is how we invoke Activity Functions from Orchestrators. We can also see that the Run method is an async method and it has an ActivityTrigger attribute to indicate the type of Function it is. It accepts one object of type Coffee as the first parameter and a FunctionContext as the second parameter. We can only pass a single input parameter to an Activity Function (other than the context). In this case we encapsulate all the properties we need in the POCO class Coffee and we are good to go.

The FunctionContext is not used in this scenario but it could be used to call other Activity Functions if needed. If we were using the In-Process mode we would be using the Context to extract the input passed into the function since in that version parameters cannot be passed directly into the Function. The Context would be of a different type as well (IDurableActivityContext).

As mentioned above, all the Activity Functions used in this project are simple and simulate the steps of the coffee making process. In a real scenario you can have more complex code. It is, however, recommended that you keep the function code simple and use well-known design patterns to push the complexity into other class libraries.

public class Brew
{
    private readonly ICoffeeShopHubClient _coffeeProcessStatusPublisher;

    public Brew(ICoffeeShopHubClient coffeeProcessStatusPublisher)
    {
        _coffeeProcessStatusPublisher = coffeeProcessStatusPublisher;
    }

    [Function(nameof(Brew))]
    public async Task<Coffee> Run([ActivityTrigger] Coffee coffee, FunctionContext executionContext)
    {
        ILogger logger = executionContext.GetLogger(nameof(Brew));

        logger.LogInformation($"Brewing {coffee.Type} coffee  {coffee.Id} for order: {coffee.OrderId} ...");

        await Task.Delay((int)coffee.Intensity * 1000);

        coffee.Status = CoffeeStatus.Brewed;
        coffee.Message = "Brewing Completed";

        logger.LogInformation($"Brewing completed for {coffee.Type} coffee  {coffee.Id} of order: {coffee.OrderId}.");

        await _coffeeProcessStatusPublisher.UpdateCoffeeProcessStatus(coffee);

        return coffee;
    }
}

Starter Functions (Durable Clients)

The following code is a representation of a Starter Function, otherwise known as a Durable Client Function. You can see that again we have a function name, a trigger (in this case an HttpTrigger bound to the HttpRequestData parameter) and two additional parameters: the DurableTaskClient and the FunctionContext.

This is one of the Functions I call over HTTP using Postman. The main purpose of a Durable Client Function is to kick off an Orchestrator Function. We can see that happening in the call to ScheduleNewOrchestrationInstanceAsync. Before that, we read the request body and extract (deserialize) the CoffeeOrder, which, if you remember from the videos, was passed in as a JSON object.

public static class ChainingStarter
{
    [Function(nameof(StartChainingOrchestrator))]
    public static async Task<HttpResponseData> StartChainingOrchestrator(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")] HttpRequestData req,
        [DurableClient] DurableTaskClient client,
        FunctionContext executionContext)
    {
        ILogger logger = executionContext.GetLogger("StartChainingOrchestrator");

        string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
        var coffeeOrder = JsonConvert.DeserializeObject<CoffeeOrder>(requestBody);

        // Function input comes from the request content.
        string instanceId = await client.ScheduleNewOrchestrationInstanceAsync(
            nameof(ChainingOrchestrator), coffeeOrder);

        logger.LogInformation("Started Coffee Order Orchestration with ID = '{instanceId}'.", instanceId);

        // Returns an HTTP 202 response with an instance management payload.
        // See https://learn.microsoft.com/azure/azure-functions/durable/durable-functions-http-api#start-orchestration
        return client.CreateCheckStatusResponse(req, instanceId);
    }
}

Orchestrator Function: ChainingOrchestrator

In the first pattern demonstrated we utilize the ChainingOrchestrator. This first piece of code shows the constructor dependency injection, the function name and the appropriate trigger used (OrchestrationTrigger).

The first thing to notice inside the method is the use of CreateReplaySafeLogger. If you remember from the first post of this series, there are many code constraints we have to follow in Orchestrator Functions because the Orchestrator stops and re-starts frequently. The code cannot create any ambiguity. To ensure reliable execution of the orchestration state, Orchestrator Functions must contain deterministic code, meaning the code must produce the same result every time it runs.

Next, I use the context to get the CoffeeOrder input parameter; however, I could have passed it in as a parameter in the signature as well.

The code within the try block starts the actual orchestration logic. I wanted to point out the CallActivityAsync calls as examples of how we invoke Activity Functions: we use the name of the function and we pass the input as a parameter.

public class ChainingOrchestrator
{
    private readonly ICoffeeShopHubClient _coffeeProcessStatusPublisher;

    public ChainingOrchestrator(ICoffeeShopHubClient coffeeProcessStatusPublisher)
    {
        _coffeeProcessStatusPublisher = coffeeProcessStatusPublisher;
    }

    [Function(nameof(ChainingOrchestrator))]
    public static async Task<CoffeeOrder> RunOrchestrator(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        ILogger logger = context.CreateReplaySafeLogger(nameof(ChainingOrchestrator));
        var coffeeOrder = context.GetInput<CoffeeOrder>();

        try
        {
            if (coffeeOrder == null)
            {
                coffeeOrder = new CoffeeOrder
                {
                    Status = OrderStatus.Failed,
                    Message = "Coffee order not specified."
                };

                logger.LogInformation(coffeeOrder.Message);

                await context.CallActivityAsync("UpdateCoffeeOrderStatus", coffeeOrder);

                return coffeeOrder;
            }

            coffeeOrder.Status = OrderStatus.Started;
            coffeeOrder.Message = $"Started processing coffee order {coffeeOrder.Id}.";
            await context.CallActivityAsync("UpdateCoffeeOrderStatus", coffeeOrder);

In the code below (still in the same file), you can see how I chain the function calls. I iterate the coffee collection of the coffee order and invoke each Activity Function (coffee making step) in sequence.

foreach (var coffee in coffeeOrder.Coffees)
{
    coffee.Message = $"Started making {coffee.Type} coffee {coffee.Id} for order {coffee.OrderId}.";
    await context.CallActivityAsync("UpdateCoffeeMakerStatus", coffee);

    var processedCoffee = await context.CallActivityAsync<Coffee>(nameof(Grind), coffee);
    processedCoffee = await context.CallActivityAsync<Coffee>(nameof(Dose), processedCoffee);
    processedCoffee = await context.CallActivityAsync<Coffee>(nameof(Tamp), processedCoffee);
    processedCoffee = await context.CallActivityAsync<Coffee>(nameof(Brew), processedCoffee);

    if (processedCoffee.Status == CoffeeStatus.Brewed)
    {
        processedCoffee.Message = $"{processedCoffee.Type} coffee {processedCoffee.Id} " +
            $"for order {processedCoffee.OrderId} is ready.";

        await context.CallActivityAsync("UpdateCoffeeMakerStatus", processedCoffee);
    }
}

Orchestrator Function: FanOutFanInOrchestrator

The following code is the most important part of the Fan Out/Fan In pattern. First I create a list of tasks of type Coffee. Then I iterate the coffee order and, for each coffee, add the task returned by the sub-orchestrator function (CoffeeMakerOrchestrator) to the list. Each task returns a Coffee object that contains the status. Finally, I await all the tasks with Task.WhenAll, and when they complete the orchestrator finishes its execution (a few steps further down, not shown here).
 
This parallel execution of the sub-orchestrator makes the processing simulation prepare all the coffees together. As you can see in the visualization in the video, the execution is a lot faster. The code of the sub-orchestrator is very similar to the Chaining Orchestrator code I showed above. The only difference is that the CoffeeMakerOrchestrator (used as a sub-orchestrator here) accepts a coffee instead of a coffee order. In this way I can make the parallel execution calls in the parent Orchestrator, where they actually belong.

 

var coffeeTasks = new List<Task<Coffee>>();

coffeeOrder.Status = OrderStatus.Started;
coffeeOrder.Message = $"Started processing coffee order {coffeeOrder.Id}.";
await context.CallActivityAsync("UpdateCoffeeOrderStatus", coffeeOrder);

foreach (var coffee in coffeeOrder.Coffees)
{
    var task = context.CallSubOrchestratorAsync<Coffee>(nameof(CoffeeMakerOrchestrator), coffee);
    coffeeTasks.Add(task);
}

var coffeeMessages = await Task.WhenAll(coffeeTasks);

coffeeOrder.Status = OrderStatus.Completed;
coffeeOrder.Message = $"Coffee order {coffeeOrder.Id} is ready.";
await context.CallActivityAsync("UpdateCoffeeOrderStatus", coffeeOrder);

Orchestrator Function: HumanInteractionOrchestrator

The code below shows several interesting aspects of this pattern and the capability of Durable Functions to wait for external events.

First you can see the call to the Activity Function that calculates the price. Immediately after that, I send a payment request and update the Coffee Order status appropriately. The call to WaitForExternalEvent then puts the Orchestrator to sleep, waiting for an external event named “ProcessPayment” to trigger the rest of the code to execute. Once the event is received, I examine its payload, and only if the PaymentStatus is Approved do I proceed with the coffee making process.

After this point the code is identical to the Fan Out/Fan In pattern.

coffeeOrder = await context.CallActivityAsync<CoffeeOrder>("CalculateOrderPrice", coffeeOrder);

await context.CallActivityAsync("SendPaymentRequest", coffeeOrder);

coffeeOrder.Status = OrderStatus.Placed;
coffeeOrder.Message = $"Coffee order {coffeeOrder.Id} was placed. Total Cost: {coffeeOrder.TotalCost:C}. Waiting for payment confirmation...";
await context.CallActivityAsync("UpdateCoffeeOrderStatus", coffeeOrder);

var response = await context.WaitForExternalEvent<ProcessPaymentEventModel>("ProcessPayment");

if (response.PaymentStatus == PaymentStatus.Approved)
{
    coffeeOrder.Status = OrderStatus.Placed;
    coffeeOrder.Message = $"Payment received for coffee order {coffeeOrder.Id}";
    await context.CallActivityAsync("UpdateCoffeeOrderStatus", coffeeOrder);

    var coffeeTasks = new List<Task<Coffee>>();

    coffeeOrder.Status = OrderStatus.Started;
    coffeeOrder.Message = $"Started processing coffee order {coffeeOrder.Id}.";
    await context.CallActivityAsync("UpdateCoffeeOrderStatus", coffeeOrder);

    foreach (var coffee in coffeeOrder.Coffees)
    {
        var task = context.CallSubOrchestratorAsync<Coffee>(nameof(CoffeeMakerOrchestrator), coffee);
        coffeeTasks.Add(task);
    }
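    // From here on the code is identical to the Fan Out/Fan In orchestrator above:
    // await Task.WhenAll(coffeeTasks), set the order status to Completed and call
    // the UpdateCoffeeOrderStatus activity.
}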

As you may remember from the corresponding video, when I called the Human Interaction HTTP endpoint with a POST, I received the response shown below. The sendEventPostUri property contains the URL for posting events to this particular instance of the Orchestrator; at the end of that URL you can see the {eventName} route parameter. I replaced it with the name of the event (ProcessPayment) and made a POST call to that endpoint. At that point this particular Orchestrator instance received the event and continued the code execution. This is pretty powerful!

{
    "id": "c7ccbd737d194f7cbf66995b8bcc3e03",
    "purgeHistoryDeleteUri": "https://func-durable-functions-demo.azurewebsites.net/runtime/webhooks/durabletask/instances/c7ccbd737d194f7cbf66995b8bcc3e03?code=7WrfquVTHSs9yXTv9Rk0-NacLhqEHrV2mozAHL-jYGTcAzFuA7_Erg==",
    "sendEventPostUri": "https://func-durable-functions-demo.azurewebsites.net/runtime/webhooks/durabletask/instances/c7ccbd737d194f7cbf66995b8bcc3e03/raiseEvent/{eventName}?code=7WrfquVTHSs9yXTv9Rk0-NacLhqEHrV2mozAHL-jYGTcAzFuA7_Erg==",
    "statusQueryGetUri": "https://func-durable-functions-demo.azurewebsites.net/runtime/webhooks/durabletask/instances/c7ccbd737d194f7cbf66995b8bcc3e03?code=7WrfquVTHSs9yXTv9Rk0-NacLhqEHrV2mozAHL-jYGTcAzFuA7_Erg==",
    "terminatePostUri": "https://func-durable-functions-demo.azurewebsites.net/runtime/webhooks/durabletask/instances/c7ccbd737d194f7cbf66995b8bcc3e03/terminate?reason={{text}}}&code=7WrfquVTHSs9yXTv9Rk0-NacLhqEHrV2mozAHL-jYGTcAzFuA7_Erg=="
}
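Alternatively, instead of using the webhook URL, the same event can be raised programmatically from any function that has a Durable Client binding, via IDurableOrchestrationClient.RaiseEventAsync. The sketch below is my own illustrative assumption (the function name, route and payload shape are made up for the example) and is not part of the demo code.

public static class ProcessPaymentEventRaiser
{
    [FunctionName("ProcessPaymentEventRaiser")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = "payments/{instanceId}")] HttpRequest req,
        string instanceId,
        [DurableClient] IDurableOrchestrationClient client,
        ILogger log)
    {
        // Assumed payload shape, e.g. { "PaymentStatus": "Approved" }
        string body = await new StreamReader(req.Body).ReadToEndAsync();
        var paymentEvent = JsonConvert.DeserializeObject<ProcessPaymentEventModel>(body);

        // Deliver the "ProcessPayment" event to the waiting orchestration instance.
        await client.RaiseEventAsync(instanceId, "ProcessPayment", paymentEvent);

        log.LogInformation($"Raised ProcessPayment event for instance '{instanceId}'.");
        return new AcceptedResult();
    }
}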

Azure Resources

Both the Web Application and the Durable Functions Application shown during the demo have been running on resources provisioned in Azure. Below you can see the resources used for both applications.

Conclusion

In this post I presented three important workflow patterns we can implement using Durable Functions. I hope it has sparked your interest in investigating Durable Functions further. With Durable Functions we can develop complex workflows without needing a messaging bus. Of course such a decision has pros and cons, but having all the orchestration code organized in one place and not having to rely on other technologies is a pretty powerful argument. All the code I developed for this post can be found here: CoffeeShopDemo and DurableFunctionsDemo.
July 6, 2023
Development Posts / Serverless

Durable Functions Fundamentals

by Mario Mamalis December 5, 2022
written by Mario Mamalis

Durable Functions is an extension of Azure Functions that provides additional functionality for programming scenarios requiring stateful workflows. With Durable Functions we can simplify the implementation of workflow related programming patterns that would otherwise require more complex setups.

Serverless Computing

Let’s take a few steps back and talk about serverless computing on Azure first. Serverless computing is one of the compute options available on Azure. It enables developers to go from code to cloud faster by removing dependencies on any type of infrastructure. Instead of worrying about where and how the code runs, developers only have to focus on coding and adding business value to the applications they are developing.

Of course this does not work by magic. The code still has to run on servers; however, Azure takes care of provisioning, scaling and managing the hardware and the operating systems behind the scenes. The “serverless” notion is from the developer’s point of view, and it is a very powerful and exciting capability to have! Azure provides several serverless compute offerings such as Azure Functions, Logic Apps, Serverless Kubernetes (AKS) and, most recently, Azure Container Apps. I will talk more about those in other posts.

Azure Functions Overview

As I mentioned in the beginning, Durable Functions is an extension of Azure Functions. I am assuming that most developers already have some idea about what Azure Functions are and I will not be covering the basics in detail. In case you want to learn more about Azure Functions, there is plenty of information on the internet. You can start here.

As a brief overview, with Azure Functions you can deploy blocks of code to a Functions as a Service (FaaS) environment. Each function can be executed using different triggers such as a timer, a queue message, an HTTP request and others. The code can be written in several supported languages such as C#, F#, JavaScript, Python and more. One of the most important characteristics of Azure Functions is that they offer Bindings for easy integration with other services such as Blob Storage, Cosmos DB, Service Bus, Event Grid and many others.
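To make triggers and bindings a bit more concrete, here is a minimal, hypothetical example (the queue and container names are invented for illustration): a function that is triggered by a Storage queue message and uses a Blob output binding to archive that message.

public static class OrderArchiver
{
    [FunctionName("OrderArchiver")]
    public static void Run(
        [QueueTrigger("incoming-orders")] string orderJson,
        [Blob("order-archive/{rand-guid}.json", FileAccess.Write)] out string archivedOrder,
        ILogger log)
    {
        // The queue message triggers the function; the Blob output binding
        // writes whatever we assign to archivedOrder into the container.
        log.LogInformation("Archiving an incoming order message.");
        archivedOrder = orderJson;
    }
}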

Introducing Durable Functions

As powerful and convenient as the basic capabilities of Azure Functions are, they do leave some room for improvement. That’s where the Durable Functions extension comes into play.

We often need to execute Azure Functions in relation to previously executed ones, as part of a larger workflow. Sometimes we want to pass the output of one function as input to the next, decide which functions to call next, handle errors, or roll back the work of previously executed functions. You might think: OK, we can already handle such scenarios by adding a message broker such as Service Bus to our architecture. That is true; however, Durable Functions provide a simpler and more powerful alternative.

With Durable Functions we can orchestrate the execution of functions in a workflow defined in code. This way we never have to leave the orchestration context and we can simplify our architecture. It is also easy to have a complete picture of the workflow just by looking at the code in the orchestrator.

Durable Function Types

The Durable Functions extension enables the use of special function types. Each function type has a unique purpose within a workflow.

Activity Functions

Activity Functions are the components that execute the main code of the workflow. This is where the main tasks execute. There are no restrictions on the type of code we write in Activity Functions. We can access databases, interact with external services, execute CPU intensive code etc. If you think of a workflow, all the code that executes specific tasks should be in Activity Functions. The Durable Task Framework guarantees the execution of an Activity Function at least once within the context of an orchestration instance.

Activity Functions are configured with an Activity Trigger, applied through the ActivityTrigger attribute. By default Activity Functions accept an IDurableActivityContext parameter as input, but we can also pass primitive types and custom objects as long as they are JSON-serializable. One caveat is that the signature can only accept a single value; if you want to pass multiple values, wrap them in a custom object.
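To illustrate, an Activity Function from the coffee example could look like the following sketch. Note that this is my own simplified illustration rather than the exact demo code: the CoffeeStatus.Ground value is an assumption, and the full implementation lives in the linked repositories.

[FunctionName(nameof(Grind))]
public static Coffee Grind([ActivityTrigger] Coffee coffee, ILogger log)
{
    // The ActivityTrigger binding deserializes the single JSON-serializable input.
    log.LogInformation($"Grinding beans for coffee {coffee.Id}...");

    coffee.Status = CoffeeStatus.Ground; // assumed enum value, for illustration only
    return coffee;
}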

The fact that Activity Functions contain most of the action does not mean that all the code to accomplish a task should live in the function itself. I encourage you to develop your projects with whatever architectural style you are comfortable with, and separate your code into different layers and classes, as you would do for an API project for example. My favorite style of code organization is Clean Architecture, and I have developed very complex Durable Functions-based services taking full advantage of it. Treat your Activity Functions as you treat API Controllers in an API project: keep the code in the Activity Function itself to a minimum and invoke code in the other layers.

Entity Functions

Entity Functions, also known as Durable Entities, are a special type of Durable Function available in version 2.0 and above. They are meant for maintaining small pieces of state and making that state available for reading and updating. They are very similar to virtual actors and are inspired by the actor model. Entity Functions are particularly useful in scenarios that require keeping track of small pieces of information for thousands or hundreds of thousands of objects; for example the score of players in a computer game or the status of an IoT device. They provide a means to scale out applications by distributing state and work across many entities.

Entity Functions are defined with a special trigger, the Entity Trigger. They are accessed through a unique identifier, the Entity ID. The Entity ID consists of the Entity Name and the Entity Key, both of which are strings. The name should match the Entity Function name, and the key must be unique among all other entity instances that share the same name, so it is safest to use a GUID. My recommendation is to use Entity Functions the way we use classes and methods: define properties that hold the state and methods that perform operations on that state. For example, we can have an Entity Function called “BankAccount” with a property “Balance” and methods “Deposit”, “Withdraw” and “GetBalance”.
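A minimal, class-based sketch of such an entity (assuming the in-process Durable Functions 2.x programming model) might look like this:

[JsonObject(MemberSerialization.OptIn)]
public class BankAccount
{
    [JsonProperty("balance")]
    public decimal Balance { get; set; }

    public void Deposit(decimal amount) => Balance += amount;

    public void Withdraw(decimal amount) => Balance -= amount;

    public decimal GetBalance() => Balance;

    // The Entity Function itself simply dispatches incoming operations to this class.
    [FunctionName(nameof(BankAccount))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<BankAccount>();
}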

Entity Functions can be called using one-way communication (signaling: call the function without waiting for a response) or two-way communication (calling: wait for a response). Entities can be called from Orchestrator Functions, Client Functions, Activity Functions or other Entity Functions, but not all forms of communication are supported in every context. From clients we can signal (call one-way) an entity and we can read the entity state. From orchestrators we can both signal (one-way) and call (two-way) entities. From other entities we can only signal (one-way) other entities.
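For example, signaling and calling the BankAccount entity from within an Orchestrator Function could look like the sketch below (the account key is an arbitrary example value):

// Inside an orchestrator: [OrchestrationTrigger] IDurableOrchestrationContext context
var entityId = new EntityId(nameof(BankAccount), "7f9d6d35-2c2e-4f0a-9b1a-3a1b2c3d4e5f");

// One-way (signal): fire-and-forget, the orchestrator does not wait for a result.
context.SignalEntity(entityId, "Deposit", 100m);

// Two-way (call): invoke the entity and wait for its response.
decimal balance = await context.CallEntityAsync<decimal>(entityId, "GetBalance");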

Client Functions

Client Functions are the functions that call or trigger Orchestrator Functions or Entity Functions. Orchestrator and Entity Functions cannot be triggered directly; they must instead receive a message from a Client Function. In other words, Client Functions are starter functions. Any non-orchestrator function can be a Client Function as long as it uses a Durable Client output binding. The code below shows an example of a Client Function that starts an Orchestrator Function.

public static class CoffeeMakerStarter
{
    [FunctionName("CoffeeMakerStarter")]
    public static async Task<IActionResult> HttpStart(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")] HttpRequest req,
        [DurableClient] IDurableOrchestrationClient starter,
        ILogger log)
    {
        string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
        var coffeeOrder = JsonConvert.DeserializeObject<CoffeeOrder>(requestBody);  

        // Function input comes from the request content.
        string instanceId = await starter.StartNewAsync<object>("CoffeeMakerOrchestrator", coffeeOrder);

        log.LogInformation($"Started orchestration with ID = '{instanceId}'.");

        return starter.CreateCheckStatusResponse(req, instanceId);
    }
}

Orchestrator Functions

Orchestrator Functions orchestrate the execution sequence of other Durable Function types using procedural code within a workflow. They can contain conditional logic and error-handling code, call other functions synchronously and asynchronously, take the output of one function and pass it as input to subsequent functions, and even initiate sub-orchestrations. In general they contain the orchestration code of a workflow. With Orchestrator Functions we can implement complex patterns such as Function Chaining, Fan Out/Fan In, Async HTTP APIs, Monitor, Human Interaction and Aggregator. We will explore the theory behind the most important of these patterns later on in this post, and sample code will be provided in subsequent posts.

Orchestrator Code Constraints

Code in Orchestrator Functions must comply with certain constraints. Failure to honor these constraints will result in unpredictable behavior. Before we list what those constraints are, it is important to understand why they exist.

Under the covers, Orchestrators utilize the Durable Task Framework, which enables long-running, persistent workflows using async/await constructs. In order to maintain persistence and be able to safely replay the code and continue from the appropriate point in the workflow, the Durable Functions extension uses Azure Storage Queues to trigger the next Activity Function in the workflow. It also uses Storage Tables to save the state of the orchestrations in progress. The entire orchestration state is stored using Event Sourcing: rather than storing only the current state, the whole execution history of actions that resulted in the current state is stored. This pattern enables the safe replay of the orchestration’s code. Safe replay means that code that was already executed within the context of a particular orchestration instance will not be executed again. For more clarity consider the following diagram:

This diagram shows a typical workflow involving an Orchestrator and two Activity Functions. As you can see, the Orchestrator sleeps and wakes up several times during the workflow. Every time it wakes up, it replays (re-executes) the entire code from the start to rebuild the local state. While the Orchestrator is doing that, the Durable Task Framework examines the execution history stored in the Azure Storage Tables, and if the code encounters an Activity Function that has already been executed, it replays that function’s result and the Orchestrator continues to run until the code is finished or until a new activity needs to be triggered.

To ensure reliable execution of the orchestration state, Orchestrator Functions must contain deterministic code; meaning it must produce the same result every time it runs. This imposes certain code constraints such as:

  • Do not generate random numbers or GUIDs
  • Do not ask for current dates and times. If you have to, then use IDurableOrchestrationContext.CurrentUtcDateTime
  • Do not access data stores such as databases
  • Do not look for configuration settings
  • Do not write blocking code such as I/O code or Thread related code
  • Do not perform async operations outside the orchestration context, such as Task.Run, Task.Delay or HttpClient.SendAsync
  • Do not use any bindings including orchestration client and entity client
  • Avoid using static variables
  • Do not use environment variables

This is not a comprehensive list, but you already get the idea: code in Orchestrator Functions should be limited to workflow orchestration code. Everything you cannot do in Orchestrators you can, and should, do in the Activity or Entity Functions they invoke (see the sketch below). For a full list and a lot more detail about these constraints you can go to the Microsoft Learn site here.
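As a rough illustration of what these constraints mean in practice, the sketch below contrasts a couple of non-deterministic calls with their replay-safe equivalents on the orchestration context (the activity name is made up):

// Inside an orchestrator: [OrchestrationTrigger] IDurableOrchestrationContext context

// Non-deterministic: these would produce different values on every replay.
// var startedAt = DateTime.UtcNow;
// var correlationId = Guid.NewGuid();

// Deterministic: the Durable Task Framework replays the same values.
var startedAt = context.CurrentUtcDateTime;
var correlationId = context.NewGuid();

// Blocking or I/O work (database access, HTTP calls, Thread.Sleep) belongs in
// an Activity Function that the orchestrator invokes instead.
var result = await context.CallActivityAsync<string>("DoTheActualWork", correlationId);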

Workflow Patterns

Now that we have learned the basic building blocks of Durable Functions we can begin exploring the different patterns that become available for us to implement. As a reminder, in this post we will only be covering the theory behind some of these patterns. Actual sample implementations will follow in subsequent posts.

Pattern 1: Function Chaining

The first pattern we can implement using Durable Functions is the Function Chaining pattern. Imagine a workflow that requires multiple sequential steps to accomplish a goal. Each step depends on the previous step completing before it can run, and a step may or may not take the output of the previous step as its input.

In the following diagram we can see a Function Chaining pattern of a fictitious automatic espresso coffee maker machine. The pattern utilizes a Client Function (starter) that triggers an Orchestrator Function and the Orchestrator executes 4 Activity Functions in sequence.

The power that comes with Durable Functions, and specifically with the orchestration, is that we can now introduce complex logic and error handling to gracefully handle failures at any step of the workflow, or execute different tasks based on certain activity outputs. This would be much harder to do if we used regular functions that called one another. In addition, we get the benefits of the Durable Task Framework, which allows the workflow to go to sleep while an activity executes and then wake up and execute the appropriate next step.

Pattern 2: Fan out/fan in

Another powerful pattern we can implement is the Fan out/fan in pattern. This pattern fits well in scenarios where we want to execute multiple similar tasks in parallel and then wait for them all to complete before we execute the next task. For example, let’s say the automated coffee maker machine can prepare multiple coffees. As the owner of the coffee shop, when I get an order from a group of 3 people, I want to prepare the 3 coffees at the same time and serve them together. In Function terms, we can wrap the coffee workflow in one orchestration and then kick off 3 instances of that orchestration from a parent orchestrator, making each one, in a sense, a sub-orchestration. This is depicted in the following diagram.

Pattern 3: Human Interaction

So far we have seen patterns that provide fully automated solutions. But what about workflows that require human intervention? There is a way to handle those scenarios as well. The Durable Functions Orchestrator context exposes a method called WaitForExternalEvent(). This method also accepts a TimeSpan parameter, so we can specify how long to wait for an external event before continuing the workflow. This is pretty powerful, and I should point out that no extra charges are incurred while the function is waiting if it runs on a Consumption plan.

So let’s assume that we are building a workflow that handles an onboarding process for a new employee. As we can see in the diagram below, the Orchestrator first gets the offer letter from storage and then sends an email to the candidate. At this point the Orchestrator goes to sleep and waits for human interaction, which in this case happens when the candidate clicks on the acceptance link inside the email. This link invokes a regular HTTP Trigger Function, and this new function in turn wakes up the Orchestrator by calling RaiseEventAsync(), which is part of the IDurableOrchestrationClient interface. A rough sketch of this flow follows below.
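Here is a hedged sketch of the waiting side of this onboarding scenario; the event name, the timeout and the activity names are assumptions I am making purely for illustration. The WaitForExternalEvent overload that takes a TimeSpan throws a TimeoutException if the event does not arrive in time.

// Inside the onboarding orchestrator
var candidateId = context.GetInput<string>();

var offerLetter = await context.CallActivityAsync<string>("GetOfferLetter", candidateId);
await context.CallActivityAsync("SendOfferEmail", offerLetter);

try
{
    // Sleep until the candidate clicks the acceptance link, or give up after 7 days.
    bool accepted = await context.WaitForExternalEvent<bool>("OfferAccepted", TimeSpan.FromDays(7));

    if (accepted)
    {
        await context.CallActivityAsync("StartOnboarding", candidateId);
    }
}
catch (TimeoutException)
{
    await context.CallActivityAsync("EscalateToRecruiter", candidateId);
}

The HTTP Trigger Function behind the acceptance link would then wake the orchestration up with something like client.RaiseEventAsync(instanceId, "OfferAccepted", true).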

Wrapping Up

As we have seen, Azure Durable Functions provide a lot of useful functionality that allows us to create powerful workflows. Serverless computing is a revolutionary platform that we can use to build software on. It allows us to focus on coding and completely forget about the nuances of infrastructure and operating systems. We can create services that scale in and out automatically to meet demand, handle complex workflows, pay only for the compute power we use (in a consumption plan) and save time and effort because we do not have to maintain any of the infrastructure.

We have focused on the fundamental theory of the most important aspects of Durable Functions in this first post of the series. In subsequent posts we will dive deeper into the setup, development and deployment of Durable Functions. Stay tuned!

December 5, 2022
