Data Science: Food Hub Data Analysis

by Mario Mamalis February 21, 2025

A data science case study showcasing data-driven strategies for exploring, cleaning, and preparing data for statistical analysis and visualization.

Project Foundations for Data Science: FoodHub Data Analysis

Context

The number of restaurants in New York is increasing day by day. Many students and busy professionals rely on these restaurants because of their hectic lifestyles, and online food delivery brings them good food from their favorite restaurants. FoodHub, a food aggregator company, offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer’s location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

Objective

The food aggregator company has stored data on the different orders placed by registered customers in its online portal. The company wants to analyze this data to get a fair idea of the demand for different restaurants, which will help it enhance the customer experience. Suppose you are hired as a Data Scientist at this company, and the Data Science team has shared some key questions that need to be answered. Perform the data analysis to find answers to these questions and help the company improve the business.

Data Description

The dataset contains information related to each food order. The detailed data dictionary is given below.

Data Dictionary

  • order_id: Unique ID of the order
  • customer_id: ID of the customer who ordered the food
  • restaurant_name: Name of the restaurant
  • cuisine_type: Cuisine ordered by the customer
  • cost: Cost of the order
  • day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
  • rating: Rating given by the customer out of 5
  • food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant’s order confirmation and the delivery person’s pick-up confirmation.
  • delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person’s pick-up confirmation and drop-off confirmation.

Let us start by importing the required libraries

In [9]:

# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

Understanding the structure of the data

In [10]:

# Access the drive
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

In [31]:

# read the data
df = pd.read_csv('/content/drive/MyDrive/MIT-ADSP/0131-Foundations-Python-Statistics/Week 2/Project - Food Hub/foodhub_order.csv')
# returns the first 5 rows
df.head()

Out[31]:

order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time
0 1477147 337525 Hangawi Korean 30.75 Weekend Not given 25 20
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend Not given 25 23
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24

Observations:

The DataFrame has 9 columns, as described in the Data Dictionary. Each row corresponds to a single order placed by a customer.

Question 1: How many rows and columns are present in the data? [0.5 mark]

In [32]:

# Write your code here
rows, columns = df.shape
print(f"There are {rows} rows and {columns} columns present in the data.")
There are 1898 rows and 9 columns present in the data.

Observations:

There are 1898 rows and 9 columns present in the data. This means that there are 1898 orders that we can analyze.

Question 2: What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]

In [33]:

# Use info() to print a concise summary of the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB

Observations:

  • order_id, customer_id, food_preparation_time and delivery_time are integers.
  • restaurant_name, cuisine_type, day_of_the_week and rating are strings.
  • cost_of_the_order is a float (decimal).
  • rating should be numeric; I will convert it later in this notebook.

Question 3: Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]

In [34]:

# Write your code here
df.isnull().sum()

Out[34]:

order_id                 0
customer_id              0
restaurant_name          0
cuisine_type             0
cost_of_the_order        0
day_of_the_week          0
rating                   0
food_preparation_time    0
delivery_time            0

Observations:

  1. There are no null values in any column.
  2. However, some rows are missing a “rating” and instead contain the string “Not given”. To address this, I will convert the rating values to integers later in this notebook, since the remaining ratings are integers stored as strings.

Question 4: Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]

In [35]:

# Write your code here
df.describe()

Out[35]:

order_id customer_id cost_of_the_order food_preparation_time delivery_time
count 1.898000e+03 1898.000000 1898.000000 1898.000000 1898.000000
mean 1.477496e+06 171168.478398 16.498851 27.371970 24.161749
std 5.480497e+02 113698.139743 7.483812 4.632481 4.972637
min 1.476547e+06 1311.000000 4.470000 20.000000 15.000000
25% 1.477021e+06 77787.750000 12.080000 23.000000 20.000000
50% 1.477496e+06 128600.000000 14.140000 27.000000 25.000000
75% 1.477970e+06 270525.000000 22.297500 31.000000 28.000000
max 1.478444e+06 405334.000000 35.410000 35.000000 33.000000

In [36]:

# Write your code here
# Display only the statistics for "food_preparation_time"
food_prep_stats = df['food_preparation_time'].describe()
food_prep_stats

Out[36]:

food_preparation_time
count 1898.000000
mean 27.371970
std 4.632481
min 20.000000
25% 23.000000
50% 27.000000
75% 31.000000
max 35.000000


In [37]:

# Extract only the minimum, mean, and maximum values for "food_preparation_time"
food_prep_specific_stats = df['food_preparation_time'].agg(['min', 'mean', 'max']).round(2)
food_prep_specific_stats

Out[37]:

food_preparation_time
min 20.00
mean 27.37
max 35.00

Observations:

Food Prep Stats

  1. Minimum Prep Time: 20 minutes
  2. Average Prep Time: 27.37 minutes
  3. Maximum Prep Time: 35 minutes

Question 5: How many orders are not rated? [1 mark]

In [38]:

# Write the code here
# Count the number of rows where "rating" is "Not given"
not_given_count = (df['rating'] == "Not given").sum()
not_given_count

Out[38]:

736

Observations:

There are 736 orders where the rating was “Not given”.

In [40]:

# Convert 'Not given' to NaN
df['rating'] = pd.to_numeric(df['rating'], errors='coerce')
# Calculate the mean rating per restaurant
mean_ratings_per_restaurant = df.groupby('restaurant_name')['rating'].mean().round()
# Replace NaN ratings with the mean rating of the respective restaurant
df['rating'] = df.apply(
    lambda row: mean_ratings_per_restaurant[row['restaurant_name']] if pd.isna(row['rating']) else row['rating'],
    axis=1
)
# Replace any remaining NaN values (restaurants with no rated orders) with the overall mean
overall_mean_rating = round(df['rating'].mean())
df['rating'] = df['rating'].fillna(overall_mean_rating)
# Convert 'rating' to integer
df['rating'] = df['rating'].astype(int)
df.head()

Out[40]:

order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time
0 1477147 337525 Hangawi Korean 30.75 Weekend 4 25 20
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend 4 25 23
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24

In [41]:

# Count the number of rows where "rating" equals the string "Not given" to
# confirm the issue is resolved (the column is now numeric, so this is 0)
not_given_count = (df['rating'] == "Not given").sum()
not_given_count

Out[41]:

0

Observations:

I converted all ratings to integers, assigning the restaurant’s mean rating (based on its other orders, rounded to the nearest integer) wherever the value was “Not given”. I will use these imputed ratings for the rest of the analysis.

Exploratory Data Analysis (EDA)

Univariate Analysis

Question 6: Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]

In [42]:

# Let's eyeball the first 5 rows again
df.head()

Out[42]:

order_id customer_id restaurant_name cuisine_type cost_of_the_order day_of_the_week rating food_preparation_time delivery_time
0 1477147 337525 Hangawi Korean 30.75 Weekend 4 25 20
1 1477685 358141 Blue Ribbon Sushi Izakaya Japanese 12.08 Weekend 4 25 23
2 1477070 66393 Cafe Habana Mexican 12.23 Weekday 5 23 28
3 1477334 106968 Blue Ribbon Fried Chicken American 29.20 Weekend 3 25 15
4 1478249 76942 Dirty Bird to Go American 11.59 Weekday 4 25 24

In [51]:

# Set plot style
sns.set_style("whitegrid")
# Creating a histogram for the cost of the orders
plt.figure(figsize=(10, 6))
sns.histplot(df['cost_of_the_order'], bins=30, kde=True, color="blue")
plt.title('Distribution of Cost of the Orders')
plt.xlabel('Cost of the Order ($)')
plt.ylabel('Frequency')
plt.show()

[Figure: Distribution of Cost of the Orders]

Observations:

  1. Skewed Distribution: The distribution of the cost of orders is right-skewed, indicating that most of the orders are concentrated in the lower price range, with fewer orders as the price increases.
  2. Common Price Range: The most common price range for orders appears to be between approximately $10 and $20. This suggests that most customers opt for moderately priced items.

In [52]:

# Creating a box plot for the cost of the orders
plt.figure(figsize=(8, 6))
sns.boxplot(x=df['cost_of_the_order'])
plt.title('Box Plot of the Cost of Orders')
plt.xlabel('Cost of the Order ($)')
plt.show()

[Figure: Box Plot of the Cost of Orders]

Observations:

  1. Central Tendency: The median cost of the orders is around $15-$20, suggesting that this is a typical amount customers spend per order.
  2. Spread and Variability: The interquartile range (IQR), represented by the box, is relatively compact, indicating that the majority of orders cluster around the median price, within a moderate price range.

In [53]:

# Create a count plot for "rating"
plt.figure(figsize=(8, 5))
sns.countplot(x=df['rating'], hue=df['rating'], palette="viridis", legend=False,
              order=sorted(df['rating'].unique()))
plt.title("Count of Each Rating")
plt.xlabel("Rating")
plt.ylabel("Count")
plt.show()

[Figure: Count of Each Rating]

Observations:

  1. Overall, ratings are good, with most being 4 and above.

In [45]:

# Create a boxplot for "food_preparation_time"
plt.figure(figsize=(8, 5))
sns.boxplot(y=df['food_preparation_time'], color="green")
plt.title("Boxplot of Food Preparation Time")
plt.ylabel("Food Preparation Time (minutes)")
plt.show()

[Figure: Boxplot of Food Preparation Time]

Observations:

  1. Median and Central Tendency: The median food preparation time is about 27 minutes (consistent with the statistical summary above). This indicates that half of the food orders are prepared in less than 27 minutes and the other half take longer, showing a balance in preparation times across the dataset.
  2. Interquartile Range (IQR): The IQR, depicted by the box, is relatively narrow, which suggests that the majority of food preparation times are consistent and cluster around the median. This consistency might be indicative of standardized processes within restaurants or similar types of dishes being ordered.
  3. Variability: The whiskers, which extend to the lowest and highest typical preparation times, show a wider range than the IQR. This variability outside the central cluster can indicate differences in restaurant efficiency, menu complexity, or operational challenges during peak times.
  4. Operational Insights: Restaurants and the food delivery service can use this data to identify and investigate the reasons behind unusually long or short preparation times. Addressing these could improve overall efficiency, reduce customer wait times, and enhance satisfaction.

In [65]:

# Creating a count plot for delivery time
plt.figure(figsize=(10, 6))
sns.countplot(x='delivery_time', data=df, hue='delivery_time', dodge=False, palette='viridis')
plt.title('Count of Orders by Delivery Time')
plt.xlabel('Delivery Time (minutes)')
plt.ylabel('Count of Orders')
plt.xticks(rotation=90)  # Rotating x-axis labels for better visibility
plt.legend().set_visible(False)  # Properly setting legend visibility
plt.show()

[Figure: Count of Orders by Delivery Time]
In [56]:

# Creating a box plot for delivery time
plt.figure(figsize=(8, 6))
sns.boxplot(x=df['delivery_time'])
plt.title('Box Plot of Delivery Time')
plt.xlabel('Delivery Time (minutes)')
plt.show()

[Figure: Box Plot of Delivery Time]

Observations:

  1. Median Delivery Time: The median delivery time is around 25 minutes (matching the 50th percentile in the statistical summary), providing a central benchmark for evaluating delivery efficiency.
  2. Interquartile Range (IQR): The box, which encapsulates the middle 50% of delivery times, is relatively tight. This indicates that a majority of deliveries are consistent in duration, showing effective standardization and predictability in the delivery process.
  3. Outliers: The plot reveals several outliers on the higher end, indicating deliveries that take significantly longer than usual. These outliers might be due to external factors such as traffic conditions, distance, or order complications.
  4. Operational Insights: Identifying the reasons behind the longer delivery times could help in optimizing routes, improving delivery scheduling, and potentially selecting more reliable delivery methods or personnel. Similarly, analyzing why some deliveries are unusually quick could highlight best practices that could be replicated across the service.

In [49]:

# Create a count plot for "cuisine_type"
plt.figure(figsize=(10, 5))
sns.countplot(y=df['cuisine_type'], hue=df['cuisine_type'], palette="viridis", legend=False,
              order=df['cuisine_type'].value_counts().index)
plt.title("Count Plot of Cuisine Type")
plt.xlabel("Count")
plt.ylabel("Cuisine Type")
plt.show()

[Figure: Count Plot of Cuisine Type]

Observations:

Clearly the top cuisine types are American, Japanese, and Italian.

In [50]:

# Create a count plot for "day_of_the_week"
plt.figure(figsize=(8, 5))
sns.countplot(x=df['day_of_the_week'], hue=df['day_of_the_week'], palette="magma", legend=False,
              order=df['day_of_the_week'].value_counts().index)
plt.title("Count Plot of Day of the Week")
plt.xlabel("Day of the Week")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.show()

[Figure: Count Plot of Day of the Week]

Observations:

Clearly a lot more orders are placed during the weekend!

Question 7: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]

In [66]:

# Write the code here
# Create a count plot for the top 5 restaurants
plt.figure(figsize=(10, 5))
sns.countplot(y=df['restaurant_name'], hue=df['restaurant_name'], palette="deep", legend=False,
              order=df['restaurant_name'].value_counts().index[:5])
plt.title("Top 5 Restaurants by Number of Orders")
plt.xlabel("Number of Orders")
plt.ylabel("Restaurant Name")
plt.show()

[Figure: Top 5 Restaurants by Number of Orders]

Observations:

The top 5 restaurants in terms of orders received are:

  1. Shake Shack
  2. The Meatball Shop
  3. Blue Ribbon Sushi
  4. Blue Ribbon Fried Chicken
  5. Parm

Question 8: Which is the most popular cuisine on weekends? [1 mark]

In [67]:

# Write the code here
# Filter data for weekends (labeled as "Weekend" in the dataset)
weekend_data = df[df['day_of_the_week'] == 'Weekend']
# Create a count plot for cuisine types on weekends, with hue assigned and ordered by popularity
plt.figure(figsize=(10, 5))
sns.countplot(y=weekend_data['cuisine_type'], hue=weekend_data['cuisine_type'],
              palette=sns.color_palette("magma", len(weekend_data['cuisine_type'].unique())),
              order=weekend_data['cuisine_type'].value_counts().index, dodge=False)
plt.title("Most Popular Cuisine on Weekends")
plt.xlabel("Number of Orders")
plt.ylabel("Cuisine Type")
plt.legend([],[], frameon=False)  # Hide legend
plt.show()
# Get the most popular cuisine type on weekends
popular_cuisine_weekends = weekend_data['cuisine_type'].value_counts().idxmax()
# Display the most popular cuisine type
popular_cuisine_weekends

[Figure: Most Popular Cuisine on Weekends]
Out[67]:

'American'

Observations:

Clearly the most popular cuisine during weekends is American.

Question 9: What percentage of the orders cost more than 20 dollars? [2 marks]

In [68]:

# Write the code here
# Calculate the percentage of orders that cost more than 20 dollars
total_orders = len(df)
orders_above_20 = len(df[df['cost_of_the_order'] > 20])
percentage_above_20 = (orders_above_20 / total_orders) * 100
percentage_above_20

Out[68]:

29.24130663856691

In [69]:

# Create a pie chart to visualize the percentage of orders above and below $20
plt.figure(figsize=(6, 6))
labels = ["Orders > $20", "Orders ≤ $20"]
sizes = [orders_above_20, total_orders - orders_above_20]
colors = ["#FF9999", "#66B2FF"]
plt.pie(sizes, labels=labels, autopct='%1.1f%%', colors=colors, startangle=90)
plt.title("Percentage of Orders Above and Below $20")
plt.show()

[Figure: Percentage of Orders Above and Below $20]

Observations:

About 29% of the orders cost more than 20 dollars; the remaining ~71% cost $20 or less.

Question 10: What is the mean order delivery time? [1 mark]

In [70]:

# Write the code here
mean_delivery_time = df['delivery_time'].mean()
mean_delivery_time

Out[70]:

24.161749209694417

In [71]:

# Create a histogram with mean line to visualize delivery time distribution
plt.figure(figsize=(8, 5))
sns.histplot(df['delivery_time'], kde=True, bins=20, color="blue", alpha=0.7)
plt.axvline(mean_delivery_time, color='red', linestyle='dashed', linewidth=2, label=f'Mean: {mean_delivery_time:.2f} min')
plt.title("Distribution of Delivery Time with Mean Indicator")
plt.xlabel("Delivery Time (minutes)")
plt.ylabel("Frequency")
plt.legend()
plt.show()

[Figure: Distribution of Delivery Time with Mean Indicator]

Observations:

The mean delivery time is 24.16 minutes.

The peak of the histogram suggests that most delivery times are close to the mean, indicating a relatively consistent delivery time for most orders.

The histogram is right-skewed (longer tail on the right), which means that some orders take significantly longer than the average.

Question 11: The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]

In [72]:

# Write the code here
top_3_customers = df['customer_id'].value_counts().head(3)
top_3_customers

Out[72]:

customer_id  count
52832           13
47440           10
83287            9


In [73]:

# Create a bar chart for the top 3 most frequent customers
plt.figure(figsize=(8, 5))
sns.barplot(x=top_3_customers.index, y=top_3_customers.values,
            hue=top_3_customers.index, palette="coolwarm", legend=False)
plt.title("Top 3 Most Frequent Customers")
plt.xlabel("Customer ID")
plt.ylabel("Number of Orders")
plt.show()

[Figure: Top 3 Most Frequent Customers]

Observations:

The chart above shows the top 3 customers: 52832 (13 orders), 47440 (10 orders), and 83287 (9 orders).

Multivariate Analysis

Question 12: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables.) [10 marks]

In [79]:

# Calculating total time from order to delivery by summing food preparation time and delivery time
df['total_time_from_order_to_delivery'] = df['food_preparation_time'] + df['delivery_time']
# Creating a heatmap to visualize the correlation between total time, cost of the order, and rating
heatmap_data = df[['total_time_from_order_to_delivery', 'cost_of_the_order', 'rating']]
correlation = heatmap_data.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap between Total Time, Cost, and Rating')
plt.show()

[Figure: Correlation Heatmap between Total Time, Cost, and Rating]

Observations:

  1. Total Time vs. Rating: The correlation between total time from order to delivery and rating is very low, as indicated by a coefficient close to zero. This suggests that total time does not significantly influence the ratings directly.
  2. Cost vs. Rating: Similarly, the correlation between cost of the order and rating is also very low. This suggests that higher costs are not directly associated with higher customer satisfaction (or dissatisfaction), at least not linearly.
  3. Cost vs. Total Time: There is a negligible correlation between the cost of the order and the total time from order to delivery. This indicates that longer preparation and delivery times are not necessarily associated with higher costs.

In [84]:

# 1. Cuisine Type Analysis - Comparing average delivery and preparation times across cuisines
# Aggregating average times per cuisine
cuisine_times = df.groupby('cuisine_type')[['food_preparation_time', 'delivery_time']].mean().reset_index()
# Limiting to top 20 cuisines to prevent memory overload
top_cuisines = df['cuisine_type'].value_counts().index[:20]
cuisine_times_filtered = cuisine_times[cuisine_times['cuisine_type'].isin(top_cuisines)]
# Plotting average food preparation time by cuisine type
plt.figure(figsize=(10, 6))
sns.barplot(x='cuisine_type', y='food_preparation_time', hue='cuisine_type', data=cuisine_times_filtered, palette='viridis', dodge=False)
plt.title('Average Food Preparation Time by Cuisine Type (Top 20)')
plt.xlabel('Cuisine Type')
plt.ylabel('Average Food Preparation Time (minutes)')
plt.xticks(rotation=90)
plt.show()
# Plotting average delivery time by cuisine type
plt.figure(figsize=(10, 6))
sns.barplot(x='cuisine_type', y='delivery_time', hue='cuisine_type', data=cuisine_times_filtered, palette='magma', dodge=False)
plt.title('Average Delivery Time by Cuisine Type (Top 20)')
plt.xlabel('Cuisine Type')
plt.ylabel('Average Delivery Time (minutes)')
plt.xticks(rotation=90)
plt.show()

[Figures: Average Food Preparation Time and Average Delivery Time by Cuisine Type (Top 20)]

Observations:

Food Preparation Time by Cuisine Type:

  1. Certain cuisines take significantly longer to prepare than others. The variation in preparation time could be due to differences in dish complexity, cooking techniques, or restaurant efficiency.

Delivery Time by Cuisine Type:

  2. Delivery times also vary across cuisines, which may be influenced by restaurant locations, demand levels, or how well the food travels. Some cuisines might be more frequently ordered from farther distances, leading to longer delivery times.

In [88]:

# 2. Restaurant Performance Analysis - Comparing average ratings, delivery times, and preparation times across restaurants
# Aggregating performance metrics per restaurant
restaurant_performance = df.groupby('restaurant_name')[['rating', 'food_preparation_time', 'delivery_time']].mean().reset_index()
# Limiting to top 20 most popular restaurants to prevent memory overload
top_restaurants = df['restaurant_name'].value_counts().index[:20]
restaurant_performance_filtered = restaurant_performance[restaurant_performance['restaurant_name'].isin(top_restaurants)]
# Plotting average ratings by restaurant
plt.figure(figsize=(14, 6))
sns.barplot(x='restaurant_name', y='rating', hue='restaurant_name', data=restaurant_performance_filtered, palette='coolwarm', dodge=False)
plt.title('Average Customer Rating by Restaurant (Top 20)')
plt.xlabel('Restaurant Name')
plt.ylabel('Average Rating')
plt.xticks(rotation=90)
plt.show()
# Plotting average food preparation time by restaurant
plt.figure(figsize=(14, 6))
sns.barplot(x='restaurant_name', y='food_preparation_time', hue='restaurant_name', data=restaurant_performance_filtered, palette='viridis', dodge=False)
plt.title('Average Food Preparation Time by Restaurant (Top 20)')
plt.xlabel('Restaurant Name')
plt.ylabel('Average Food Preparation Time (minutes)')
plt.xticks(rotation=90)
plt.show()
# Plotting average delivery time by restaurant
plt.figure(figsize=(14, 6))
sns.barplot(x='restaurant_name', y='delivery_time', hue='restaurant_name', data=restaurant_performance_filtered, palette='magma', dodge=False)
plt.title('Average Delivery Time by Restaurant (Top 20)')
plt.xlabel('Restaurant Name')
plt.ylabel('Average Delivery Time (minutes)')
plt.xticks(rotation=90)
plt.show()

[Figures: Average Customer Rating, Average Food Preparation Time, and Average Delivery Time by Restaurant (Top 20)]

Observations:

Average Ratings by Restaurant:

  1. Some restaurants consistently receive higher ratings than others, indicating better customer satisfaction. The variation in ratings may be influenced by factors such as food quality, service efficiency, and pricing.

Average Food Preparation Time by Restaurant:

  2. Certain restaurants take significantly longer to prepare food than others. This could be due to the type of cuisine, restaurant efficiency, or the complexity of menu items.

Average Delivery Time by Restaurant:

  3. Some restaurants have longer delivery times on average, which could be due to location, demand, or delivery efficiency. Identifying which restaurants have consistently longer delivery times may help FoodHub optimize delivery logistics.

In [92]:

# 3. Day of the Week Analysis - Comparing ratings, preparation times, and delivery times with hue set to x
# Aggregating performance metrics per day of the week
day_performance = df.groupby('day_of_the_week')[['rating', 'food_preparation_time', 'delivery_time']].mean().reset_index()
# Plotting average ratings by day of the week
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='rating', hue='day_of_the_week', data=day_performance, palette='coolwarm', dodge=False)
plt.title('Average Customer Rating by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Average Rating')
plt.show()
# Plotting average food preparation time by day of the week
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='food_preparation_time', hue='day_of_the_week', data=day_performance, palette='viridis', dodge=False)
plt.title('Average Food Preparation Time by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Average Food Preparation Time (minutes)')
plt.show()
# Plotting average delivery time by day of the week
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='delivery_time', hue='day_of_the_week', data=day_performance, palette='magma', dodge=False)
plt.title('Average Delivery Time by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Average Delivery Time (minutes)')
plt.show()

[Figures: Average Customer Rating, Average Food Preparation Time, and Average Delivery Time by Day of the Week]

Observations:

Average Ratings by Day of the Week:

  1. Ratings remain relatively consistent across weekdays and weekends, indicating stable customer satisfaction regardless of the day.

Average Food Preparation Time by Day of the Week:

  2. There is a slight increase in preparation times on weekends, suggesting a higher volume of orders or operational constraints. Restaurants might experience increased kitchen workloads on weekends, leading to slightly slower preparation.

Average Delivery Time by Day of the Week:

  3. Delivery times, by contrast, are noticeably shorter on weekends than on weekdays (confirmed in Question 16 below), possibly due to lighter traffic on weekends despite the higher demand.

Question 13: The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]

In [93]:

# Write the code here
# Group by restaurant and calculate rating count and average rating
restaurant_ratings = df.groupby('restaurant_name')['rating'].agg(['count', 'mean'])
# Filter restaurants meeting the criteria
qualified_restaurants = restaurant_ratings[(restaurant_ratings['count'] > 50) & (restaurant_ratings['mean'] > 4)]
qualified_restaurants

Out[93]:

restaurant_name            count      mean
Blue Ribbon Fried Chicken 96 4.218750
Blue Ribbon Sushi 119 4.134454
Parm 68 4.073529
RedFarm Broadway 59 4.169492
RedFarm Hudson 55 4.109091
Shake Shack 219 4.168950
The Meatball Shop 132 4.689394


In [96]:

# Group by restaurant and calculate rating count and average rating
restaurant_ratings = df.groupby('restaurant_name')['rating'].agg(['count', 'mean'])
# Filter restaurants meeting the criteria (more than 50 ratings and average rating above 4)
qualified_restaurants = restaurant_ratings[(restaurant_ratings['count'] > 50) & (restaurant_ratings['mean'] > 4)]
# Sort the qualified restaurants by average rating in descending order
qualified_restaurants_sorted = qualified_restaurants.sort_values(by='mean', ascending=False).reset_index()
# Create a bar chart with sorted restaurants and assign the y variable to hue
plt.figure(figsize=(10, 5))
sns.barplot(y='restaurant_name', x='mean', hue='restaurant_name', data=qualified_restaurants_sorted, palette="coolwarm", dodge=False)
plt.title("Restaurants Eligible for Promotional Offer (Sorted by Avg Rating)")
plt.xlabel("Average Rating")
plt.ylabel("Restaurant Name")
plt.xlim(4, 5)  # Set x-axis range to focus on valid rating values
plt.show()

[Figure: Restaurants Eligible for Promotional Offer (Sorted by Avg Rating)]

Observations:

This bar chart displays restaurants that qualify for the promotional offer based on:

  • Having more than 50 ratings (ensuring enough data points for reliability).
  • Maintaining an average rating above 4.0.

Restaurants are sorted in descending order of average rating, highlighting the best-rated ones. The hue is assigned to the restaurant name, ensuring distinct colors for each.

Question 14: The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]

In [97]:

# Write the code here
# Define commission rates based on order cost
def calculate_commission(cost):
    if cost > 20:
        return cost * 0.25  # 25% commission for orders above $20
    elif cost > 5:
        return cost * 0.15  # 15% commission for orders above $5
    else:
        return 0  # No commission for orders $5 or below
# Apply the commission calculation to each order
df['commission'] = df['cost_of_the_order'].apply(calculate_commission)
# Calculate total revenue generated by the company
total_revenue = df['commission'].sum()
total_revenue

Out[97]:

6166.303

Observations:

Net Revenue Generated by the Company:
The total revenue generated by the company from commissions across all orders is $6,166.30.

Question 15: The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]

In [100]:

# Write the code here
# Calculate total time from order placement to delivery
df['total_time_from_order_to_delivery'] = df['food_preparation_time'] + df['delivery_time']
# Count total number of orders
total_orders = len(df)
# Count orders that took more than 60 minutes
orders_above_60_min = df[df['total_time_from_order_to_delivery'] > 60].shape[0]
# Calculate percentage of orders taking more than 60 minutes
percentage_above_60_min = (orders_above_60_min / total_orders) * 100
round(percentage_above_60_min, 2)

Out[100]:

10.54

Observations:

10.54% of orders take more than 60 minutes from order placement to delivery.

Question 16: The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]

In [102]:

# Write the code here
# Calculate the mean delivery time for weekdays and weekends
delivery_time_analysis = df.groupby('day_of_the_week')['delivery_time'].mean().reset_index()
# Plotting the variation in mean delivery time
plt.figure(figsize=(10, 6))
sns.barplot(x='day_of_the_week', y='delivery_time', data=delivery_time_analysis, palette='magma', hue='day_of_the_week', dodge=False)
plt.title('Mean Delivery Time on Weekdays vs. Weekends')
plt.xlabel('Day of the Week')
plt.ylabel('Mean Delivery Time (minutes)')
plt.show()
# Display mean delivery times for weekdays and weekends
delivery_time_analysis

[Figure: Mean Delivery Time on Weekdays vs. Weekends]
Out[102]:

day_of_the_week delivery_time
0 Weekday 28.340037
1 Weekend 22.470022

Observations:

The mean delivery time is lower on weekends (22.47 minutes) than on weekdays (28.34 minutes).

Possible reasons:

  • Lower traffic congestion on weekends, leading to faster deliveries.
  • More efficient delivery management due to higher expected demand on weekends.
  • Fewer business district orders (which might take longer due to distance or traffic).

Conclusion and Recommendations

Question 17: What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]

Conclusions:

My analysis of FoodHub’s order and delivery data provides valuable insights into restaurant performance, customer satisfaction, and operational efficiency. Key findings include:

  1. Delivery & Preparation Time Insights
  • 10.54% of orders take more than 60 minutes from order placement to delivery, which could negatively impact customer satisfaction.
  • Mean delivery time is longer on weekdays (28.34 min) than on weekends (22.47 min), suggesting that traffic congestion or restaurant availability might be factors affecting delivery speed.
  • Some cuisine types and restaurants have significantly longer preparation times, which contributes to overall delays.
  2. Restaurant & Cuisine Performance
  • Certain restaurants consistently receive high ratings (above 4.0) with more than 50 orders, making them strong candidates for promotional offers.
  • Some cuisines have higher preparation times than others, affecting overall delivery efficiency.
  • The most profitable orders come from customers who spend more than $20, as they generate the highest commissions for FoodHub.
  3. Customer Satisfaction & Revenue
  • Customer ratings are not strongly correlated with delivery time, suggesting that food quality and overall experience play a bigger role in satisfaction.
  • The company generated $6,166.30 in commission revenue, with most revenue coming from orders above $20.
  • Certain restaurants consistently take longer to prepare food, which might contribute to lower ratings.

Recommendations:

  1. Improve Delivery & Operational Efficiency
  • Optimize restaurant partnerships: Identify restaurants with consistent delays and work with them to improve preparation efficiency.
  • Prioritize high-performing restaurants: Promote and feature restaurants with high ratings and good efficiency to improve customer satisfaction.
  • Reduce weekday delivery delays: Explore better delivery scheduling and traffic-optimized routing for faster weekday deliveries.
  2. Enhance Customer Satisfaction & Engagement
  • Encourage high-rated restaurants: Offer incentives or priority placement in the app for restaurants maintaining an average rating above 4.0.
  • Improve customer communication: Notify customers in real time if orders are expected to take longer than 60 minutes to manage expectations.
  • Introduce loyalty rewards: Offer discounts or free delivery for customers who frequently order from top-rated restaurants.
  3. Revenue Optimization Strategies
  • Increase commissions on premium orders: Since most revenue comes from orders above $20, consider adjusting pricing models to encourage higher-value purchases.
  • Feature premium & high-performing cuisines: If certain cuisines generate higher-value orders, highlight them in the app for better visibility.
  • Reduce inefficiencies in long-prep-time cuisines: Work with restaurants offering slow-prep cuisines to improve kitchen efficiency and reduce preparation times.


Navigating the AI Landscape

by Mario Mamalis August 10, 2023

Artificial Intelligence (AI) is a transformative technology that has the potential to revolutionize our lives.

We all know by now the enormous impact AI will have on our society. Some of us are excited and optimistic about the unlocked potential and the new capabilities we will realize with the proper implementation and absorption of AI into our society. Others are much less optimistic and focus only on the risks.

I am more of an optimist, so my view is that while AI presents many challenges, with foresight, planning, and collaborative effort, society can navigate these changes in a manner that not only safeguards but enhances human lives. The future with AI doesn’t have to be a zero-sum game between machines and humans; it can be a symbiotic relationship where each amplifies the other’s strengths.

In this post, I will focus on the reality of AI and what is presently available to us. In future posts I will dive deeper into specific AI applications.

Current Breakthroughs

In essence, AI involves developing software that mirrors human actions and skills. Some of the areas where we have seen tangible benefits are:

1. Machine Learning

Often the backbone of AI, it’s the method we use to instruct computer models to make inferences and predictions from data.

Simply put, machines learn from data. Our daily activities result in the production of enormous amounts of data. Whether it’s the text messages, emails, or social media updates we share, or the photos and videos we capture with our smartphones, we’re constantly churning out vast quantities of information. Beyond that, countless sensors in our homes, vehicles, urban environments, public transportation systems, and industrial zones create even more data.

Data experts harness this immense amount of information to educate machine learning models. These models can then draw predictions and conclusions based on the patterns and associations identified within the data.

Real World Example: Machine Learning for Predicting Rainfall Patterns

  1. Data Collection: Data scientists gather years of meteorological data, which includes variables like temperature, humidity, wind speed, air pressure, and past rainfall measurements from various weather stations and satellites.
  2. Feature Engineering: Not all collected data might be relevant. Hence, it’s important to identify which features (or combinations of features) are the most indicative of an impending rainfall event.
  3. Training the Model: With the relevant features identified, a machine learning model, like a neural network or a decision tree, is trained on a portion of the collected data. The model learns the relationships between the features and the outcomes (e.g., whether it rained the next day).
  4. Validation and Testing: Once trained, the model is tested on a different subset of the data (which it hasn’t seen before) to verify its accuracy in predicting rainfall.
  5. Real-time Predictions: Once the model is adequately trained and validated, it can be used in real-time. For instance, if sensors detect a specific combination of temperature, humidity, and pressure on a particular day, the model might predict a 90% chance of rainfall the next day in a certain region.
  6. Continuous Learning: Weather is dynamic, and patterns may evolve over time due to various reasons, including climate change. Machine learning models can be set up for continuous learning. This means that as new data comes in, the model refines and updates its understanding, ensuring predictions remain accurate.

By utilizing ML in this way, meteorologists can offer more precise and timely warnings about rainfall, helping farmers plan their crops, cities manage potential flooding, and people plan their activities.
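
To make steps 3-5 concrete, here is a minimal sketch in Python using scikit-learn. The features, labeling rule, and sensor values are entirely illustrative, synthetic stand-ins for real meteorological data:

# Minimal sketch: training and using a rainfall classifier (synthetic data)
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
# Illustrative features: temperature (C), humidity (%), pressure (hPa)
X = rng.uniform([10, 30, 990], [35, 100, 1035], size=(1000, 3))
# Toy labeling rule: high humidity plus low pressure tends to mean rain
y = ((X[:, 1] > 75) & (X[:, 2] < 1010)).astype(int)

# Train on one portion of the data, validate on a portion the model hasn't seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
print(f"Validation accuracy: {model.score(X_test, y_test):.2f}")

# Real-time prediction for today's (hypothetical) sensor readings
print(f"Chance of rain: {model.predict_proba([[22.0, 88.0, 1002.0]])[0][1]:.0%}")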

2. Anomaly Detection

Anomaly detection, often termed outlier detection, refers to the identification of items, events, or observations that do not conform to the expected pattern in a dataset. In the context of AI, it’s the use of algorithms and models to identify unusual patterns that do not align with expected behavior.

Real World Example: Anomaly Detection in F1 Gearbox Systems

  1. Data Collection: Modern F1 cars are equipped with thousands of sensors that continuously monitor various aspects of the car’s performance, from engine metrics to tire conditions. For the gearbox, these sensors can track parameters like temperature, RPM, gear engagement speed, and vibrations.
  2. Baseline Creation: Data from hundreds of laps is used to establish a ‘baseline’ or ‘normal’ behavior of the gearbox under various conditions – straights, tight turns, heavy acceleration, or deceleration.
  3. Real-time Monitoring: During a race or a practice session, the gearbox’s performance metrics are continuously compared to this baseline. Any deviation from the baseline, be it a sudden temperature spike or unexpected vibration, can be flagged instantly.
  4. Anomaly Detection: Advanced algorithms process this data in real-time to detect anomalies. For instance, if a gearbox typically operates at a specific temperature range during a certain track segment but suddenly registers a temperature that’s significantly higher or lower, the system flags this as an anomaly.
  5. Immediate Action: Once an anomaly is detected, the team receives instant alerts. Depending on the severity and type of anomaly, different actions can be taken. It could range from sending a warning to the driver, planning a pit stop to address the issue, or, in critical situations, advising the driver to retire the car to avoid catastrophic failure or danger.
  6. Post-Race Analysis: After the race, data engineers and technicians can delve deeper into the anomaly data to understand its root cause, ensuring that such issues can be preemptively addressed in future races.

This approach of anomaly detection in F1 not only ensures the optimal performance of the car but also significantly enhances driver safety. An unforeseen failure at the high speeds at which F1 cars operate can be catastrophic, making the quick detection and mitigation of potential issues a top priority.
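
Here is a minimal sketch of the baseline-and-threshold idea using a simple z-score test in Python; the sensor values and threshold are hypothetical, and real telemetry systems are far more sophisticated:

# Step 2: establish a baseline from historical laps (synthetic data here)
import numpy as np

baseline_temps = np.random.default_rng(0).normal(loc=110.0, scale=3.0, size=5000)
mean, std = baseline_temps.mean(), baseline_temps.std()

def is_anomalous(reading, threshold=3.0):
    # Flag a gearbox temperature whose z-score exceeds the threshold
    return abs(reading - mean) / std > threshold

# Steps 3-4: compare live readings against the baseline in real time
for temp in [111.2, 109.8, 124.5]:  # the last value is a deliberate spike
    print(temp, "ANOMALY" if is_anomalous(temp) else "ok")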

3. Computer Vision

Computer vision systems utilize machine learning models designed to process visual data from sources like cameras, videos, or pictures. The following are common computer vision tasks:

3.1 Image classification 

This refers to the task of assigning a label to an image based on its visual content. Essentially, it’s about categorizing what the image represents. For instance, given a picture, an image classification system might categorize it as a “cat”, “dog”, “car”, etc. This is typically achieved using deep learning models. The primary objective is to identify the main subject or theme of the image from a predefined set of categories.
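
As a sketch, classifying an image takes only a few lines with a pretrained model from torchvision (assuming torchvision 0.13 or later; "photo.jpg" is a hypothetical local file):

import torch
from PIL import Image
from torchvision.models import resnet18, ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()  # resize, crop, and normalize as the model expects

img = Image.open("photo.jpg")  # hypothetical input image
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))
print(weights.meta["categories"][logits.argmax().item()])  # e.g. "tabby"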

3.2 Object detection

This is the process of identifying and locating specific objects within an image or video. Unlike image classification, which assigns a single label to the entire picture, object detection can recognize multiple items in the image and provide bounding boxes around each identified object. Commonly used in tasks like autonomous driving, surveillance, and image retrieval, it often employs deep learning models to both classify and spatially locate objects within the visual frame.

3.3 Semantic segmentation

This task involves dividing an image into segments where each segment corresponds to a specific object or class category. Instead of just identifying that an object is present (as in object detection) or classifying the image (as in image classification), semantic segmentation classifies each pixel of the image. As a result, it provides a detailed, pixel-level labeling, highlighting the specific regions in an image where each object or class is located. Common applications include self-driving cars (to understand road scenes) and medical imaging (to identify regions of interest).

3.4 Image analysis

This refers to the process of inspecting and interpreting visual data to derive meaningful insights. It involves various techniques that evaluate the features, patterns, and structures within images. By transforming visual content into actionable data, image analysis can be applied across diverse fields, from medical diagnostics to satellite imagery interpretation. Its goal is often to categorize, quantify, or enhance the visual data for further understanding or application. 

3.5 Face detection

This is the task of identifying and locating faces within an image or video frame. It determines the presence and location of faces. Typically, face detection algorithms focus on unique facial features such as eyes, nose, and mouth to differentiate faces from other objects in the image. This technology is foundational for applications like facial recognition, camera autofocus, and various security and social media applications.

4. Optical Character Recognition (OCR)

This is a technology that converts different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. By recognizing the characters present in the visual data, OCR enables the transformation of static, image-based content into dynamic text that can be edited, formatted, indexed, or searched. It’s commonly used in data entry automation, digitizing printed books, and extracting information from images.
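
As a sketch, a basic OCR pass can be done with pytesseract, a Python wrapper around the open-source Tesseract engine (which must be installed separately); "scanned_page.png" is a hypothetical input file:

from PIL import Image
import pytesseract

# Convert the image-based content into editable, searchable text
text = pytesseract.image_to_string(Image.open("scanned_page.png"))
print(text)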

5. Natural Language Processing

Natural language processing (NLP) is a subfield of AI focused on developing software capable of comprehending and generating human language, whether written or spoken.

With NLP, it’s possible to develop applications that can:

  • Examine and deduce meaning from text in documents, emails, and other mediums.
  • Recognize spoken words and produce spoken feedback.
  • Instantly convert phrases between languages, whether they’re spoken or written.
  • Understand instructions and decide on the relevant responses.

Real World Example: Customer Service Chatbots in E-commerce Websites

Problem Statement

Online retailers often have a vast number of customers visiting their websites, many of whom have queries about products, services, shipping, returns, etc. Addressing these in real-time with human agents for each customer can be costly and time-consuming.

NLP Solution

E-commerce platforms deploy chatbots equipped with NLP capabilities. When a customer types in a query, such as “What is the return policy for electronics?”, the NLP system in the chatbot interprets the question’s intent.

Functionality

  1. Tokenization: Breaks the input text into individual words or tokens.
  2. Intent Recognition: Understands the main purpose of the user’s message, i.e., getting information about the return policy for electronics.
  3. Entity Recognition: Identifies key components in the text, e.g., “electronics” as the product category.
  4. Response Generation: Based on the identified intent and entities, the chatbot retrieves the relevant information from its database (in this case, the return policy for electronics) and crafts a coherent response.
  5. Feedback Loop: If the chatbot’s answer is not satisfactory, the user’s feedback can be utilized to train and improve the NLP model, making the chatbot more efficient over time.
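
A toy sketch of this pipeline in Python, using simple keyword matching in place of a trained model; the policies dictionary is hypothetical illustrative data:

POLICIES = {
    "electronics": "Electronics can be returned within 30 days of delivery.",
    "clothing": "Clothing can be returned within 60 days of delivery.",
}  # hypothetical knowledge base

def answer(message):
    tokens = message.lower().rstrip("?").split()               # 1. tokenization
    wants_returns = "return" in tokens or "returns" in tokens  # 2. intent recognition
    product = next((t for t in tokens if t in POLICIES), None) # 3. entity recognition
    if wants_returns and product:                              # 4. response generation
        return POLICIES[product]
    return "Sorry, could you rephrase that?"                   # 5. a miss would be logged for retraining

print(answer("What is the return policy for electronics?"))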

Benefits

  • 24/7 Customer Support: The chatbot can operate round the clock, ensuring customers from different time zones get real-time assistance.
  • Cost Efficiency: Reduces the need for a large customer service team.
  • Consistency: Provides uniform information to all customers.

This application of NLP has revolutionized the way businesses interact with their customers online, offering quick, consistent, and efficient responses.

6. Knowledge Mining

Knowledge mining involves extracting valuable insights, patterns, and knowledge from vast and often unstructured data sources. It combines techniques from data mining, machine learning, and big data analytics to transform raw information into a structured and understandable format. The goal is to discover hidden relationships, trends, and patterns that can inform decision-making, drive innovation, and provide a deeper understanding of complex subjects. Knowledge mining is particularly valuable in areas with huge datasets, like research, healthcare, and business analytics, where it aids in converting vast data into actionable intelligence.

Risks and Challenges

Artificial Intelligence holds immense potential to bring positive change to our world, but its use demands careful oversight and ethical considerations. Here are some potential shortcomings:

  • Bias influencing outcomes: For example, a lending model shows discrimination towards a particular gender due to skewed training data.
  • Unintended harm from errors: For example, a self-driving car has a system malfunction, leading to an accident.
  • Potential data breaches: For example, a bot designed for medical diagnoses uses confidential patient records stored without adequate security.
  • Inclusive design shortcomings: For example, a smart home device fails to offer audio feedback, leaving visually challenged users unsupported.
  • Need for transparency and trust: For example, a finance AI tool suggests investment strategies, but how does it determine them?
  • Accountability for AI decisions: For example, a faulty facial recognition system results in a wrongful conviction; who is held accountable?

Social Implications

The impact of AI on jobs and the skills landscape is profound, complex, and multifaceted. This seems to be one of the biggest fears people have about AI. Let’s delve deeper into this.

  • Job Displacement: Repetitive, manual, and rule-based tasks are more prone to automation. This can impact sectors like manufacturing, customer service, and basic data entry roles.
  • Job Creation: Historically, technological advancements have given rise to new jobs. Similarly, AI will create new roles that we might not even be able to envision now. Positions in AI ethics, AI system training, and AI system maintenance are examples of new job avenues.
  • Job Transformation: Some jobs won’t disappear but will transform. For instance, radiologists might spend less time analyzing X-rays (as AI can do that) and more time consulting with patients or other doctors based on AI’s findings.
  • Technical Skills: There will be an increased demand for individuals who understand AI, data science, machine learning, and related technologies.
  • Soft Skills: Emotional intelligence, creativity, critical thinking, and complex problem-solving will become even more invaluable. As AI systems handle more data-oriented tasks, uniquely human traits will become more prominent in the job market.
  • Adaptability: The pace of change means that the ability to learn and adapt is crucial. Lifelong learning and the readiness to acquire new skills will be vital.
  • Interdisciplinary Knowledge: Combining AI with domain-specific knowledge, whether it’s in arts, medicine, or finance, can lead to groundbreaking applications.

Ideas to Address Negative Impact

  • Education & Training: Governments and private institutions need to focus on retraining programs to help the workforce transition. This includes updating educational curricula to reflect the new skills demand and offering adult education initiatives focused on AI and technology.
  • Safety Nets: Support for those who lose jobs due to automation is vital. This could be in the form of unemployment benefits, retraining programs, or even discussions around universal basic income.
  • Ethical Considerations: Businesses should be encouraged to deploy AI responsibly, understanding its societal impact, and not just the bottom line. Ethical guidelines for AI application can help.
  • Inclusive Development: AI tools should be developed with input from a diverse group to ensure they address a broad range of needs and avoid built-in biases.
  • Local Solutions: AI’s impact might differ based on the region, economy, and culture. Tailored local strategies can better address specific challenges and opportunities.

Responsible AI – The Six Principles

Artificial Intelligence is not just a tool; it has become an integral part of our daily lives, reshaping industries and altering the fabric of society. With its increasing influence comes a pressing need for Responsible AI. But what exactly does this mean?

Responsible AI encompasses the practice of designing, developing, deploying, and managing AI in a manner that is transparent, ethical, and aligned with societal values and norms. It’s about ensuring that as AI systems make decisions, they do so in ways that are understandable, fair, and beneficial, while actively mitigating unintended consequences and harms.

Fair

AI systems ought to ensure equal treatment for everyone. Let’s say you design a machine learning model for a home loan approval process. The model’s predictions on loan approvals or rejections should be unbiased. It’s crucial that the model doesn’t favor or discriminate against groups based on gender, ethnicity, or any other criteria that could unjustly benefit or hinder specific applicant groups.

Safe and Reliable

AI systems must function with both precision and security. Imagine an AI-infused drone system for package deliveries or a machine learning algorithm assisting in air traffic control. Inaccuracies in these systems can have profound consequences, potentially jeopardizing safety.

It’s essential that AI-based software undergo meticulous testing and stringent deployment protocols to guarantee their reliability before they’re introduced to real-world scenarios.

Secure

AI systems ought to prioritize security and uphold privacy standards. AI systems, particularly their underlying machine learning models, draw upon vast data sets that might encompass sensitive personal information. The obligation to protect privacy doesn’t end once the models are developed and operational. As these systems continually utilize fresh data for predictions or decisions, both the data itself and the resultant choices can have associated privacy and security implications.

Inclusive

AI systems should be inclusive and resonate with all individuals. It’s vital that the benefits of AI extend across all societal divisions, be it physical abilities, gender, sexual orientation, ethnicity, or any other characteristics.

For example: An AI-driven voice recognition software shouldn’t just understand accents from major world languages but should also effectively recognize and interpret dialects and variations, ensuring people from remote regions or minority linguistic groups aren’t left out.

Transparent

AI systems should be transparent and comprehensible. Users ought to be well-informed about the system’s intent, its operational mechanisms, and any potential constraints.

For example: If a health app uses AI to assess the likelihood of a certain medical condition based on input symptoms, users should be informed about the sources of its medical data and the accuracy rate of its predictions.

Responsible

Responsibility for AI systems rests with their creators. Those designing and implementing AI solutions should adhere to a well-defined set of ethical and legal rules, ensuring the technology conforms to established standards.

For example: If a company designs an AI tool for recruitment, the architects should ensure it adheres to employment laws and anti-discrimination guidelines. If the tool inadvertently favors a particular age group or ethnicity, the creators must rectify the issue and ensure fairness in the recruitment process.

Final Thoughts

Artificial Intelligence presents transformative solutions to many challenges. AI systems possess the capacity to emulate human behaviors, interpret their environment, and take actions that were once thought of as science fiction.

However, such profound capabilities also carry significant responsibilities. As architects of AI innovations, we have a duty to ensure these technologies benefit the masses without unintentionally disadvantaging any individual or community.
