Pressure’s off the Menu! Enjoyable Plating for Healthy Living

Published: March 13, 2023

EXECUTIVE SUMMARY

High blood pressure, also known as hypertension, is a common chronic condition affecting millions of people worldwide. It is a major risk factor for cardiovascular disease, stroke, and kidney disease, among other health problems. Diet plays a crucial role in managing hypertension, and people with high blood pressure often struggle to find healthy and tasty recipes that meet their nutritional needs. [1] While there are many recipe websites and software applications available, they do not always provide tailored recommendations for people with specific health conditions such as hypertension. Moreover, some recipes labeled as healthy may not be appropriate for people with high blood pressure due to their sodium content or other ingredients that can exacerbate the condition.

This project focused on helping hypertensive consumers overcome the challenge of adhering to dietary recommendations. As the prevalence of this chronic disease continues to rise globally, proper diet management is crucial. However, patients often struggle to cook healthy and tasty meals that fit their nutritional limitations. To address this problem, our team developed a personalized meal recommender system that takes into account consumers’ nutritional requirements, taste preferences, cooking skills, and other relevant factors. The goal is to enhance the user experience of hypertensive consumers and empower them to make informed and enjoyable food choices.

Recipes, user ratings, and nutritional values were scraped from allrecipes.com [2]. After pre-processing, the recipes, which include selected ingredients and their corresponding nutritional values, were vectorized using Term Frequency-Inverse Document Frequency (TF-IDF). Singular Value Decomposition (SVD) was then applied to help identify latent features that are relevant for recipe recommendations. Using k-Means clustering, the recipes were grouped and labelled as either Sweet Treats or Savory Eats based on the cluster features. A Neighborhood-Based Collaborative Filtering recommender then returned recipes that fit the consumer's preferences. The recommended items were further filtered based on the 2,000 mg sodium threshold prescribed by the WHO for hypertensive individuals. [3]

The team is confident in the potential of this recommendation system and has identified several key areas for improvement and expansion, including but not limited to the following: (1) enhance customization by capturing consumer-specific dietary requirements, (2) extend the explainability of the recommended recipes to eaters, (3) expand the recommendation system to include other health conditions such as diabetes, cancer, and gout, and (4) maximize the impact and reach of the recommendation system through partnerships with food delivery services.

Overall, this project highlighted the potential of data-driven approaches to promote healthier eating habits, especially for those with specific health concerns such as hypertension. By leveraging machine learning techniques and nutritional expertise, we can create personalized recommendations that are both enjoyable and beneficial for individuals’ health.

TABLE OF CONTENTS

Executive Summary

Problem Statement

Motivation

List of Tables and Figures

Methodology

Data Source and Description

Data Exploration

Dimensionality Reduction

Results

Conclusion

Recommendations

References

PROBLEM STATEMENT

How the world eats is shifting dramatically. The proliferation of food options, alongside technological advancements, has increased convenience but at the same time lessened consumers’ nutritional awareness. With this in mind, we explored data from allrecipes.com, focusing on the key question that our team seeks to address:

How can we recommend delicious and healthy recipes to promote the well-being of hypertensive eaters?

The team’s primary focus is on developing a recommendation system that can effectively suggest tasty and nutritious recipes to individuals with hypertension. This question addresses the main challenge of balancing the dietary restrictions required to manage hypertension while still promoting enjoyable and satisfying meals.

Several key considerations were taken into account, such as identifying and incorporating relevant nutritional guidelines, capturing user preferences and dietary restrictions, and developing a robust and accurate recommendation algorithm. However, for the time being, the team will only incorporate the World Health Organization’s 2,000 mg daily sodium restriction as the nutritional guideline. By addressing these challenges, the team aims to provide an effective solution that promotes the well-being of hypertensive individuals while letting them enjoy their meals.


MOTIVATION

Hypertension, or high blood pressure, is a common medical condition that affects millions of people worldwide, including many of our loved ones. It is a major risk factor for cardiovascular disease, stroke, and other serious health problems, and is a leading cause of premature death globally.

One of the main ways to manage hypertension is through lifestyle modifications, including changes to diet and exercise habits. For many hypertensive individuals, this means adhering to specific dietary restrictions, such as reducing salt intake and increasing consumption of nutrient-rich foods like fruits, vegetables, and whole grains.

However, following these dietary restrictions can be challenging for many patients, especially if they feel that they have to sacrifice flavor or enjoyment in their meals. This is where the team aims to help hypertensive patients. By developing a recommendation system that suggests delicious and healthy recipes tailored to their dietary requirements, our team hopes to make it easier for hypertensive individuals to adhere to their restrictions while still enjoying their meals.

This can have significant health benefits for hypertensive patients, as it can help them to maintain a healthy weight, manage their blood pressure, and reduce their risk of developing other health problems associated with hypertension. Overall, the team aims to improve the quality of life and health outcomes for hypertensive individuals by helping them to make tasty and nutritious food choices.


LIST OF TABLES AND FIGURES

Table 1. Description of the Data per DataFrame

Table 2. Features of the Recipes Dataset

Table 3. Features of the Ratings Dataset

Figure 1: Project Methodology

Figure 2: Distribution of Ratings per Rating Value

Figure 3: Top 10 Foods with the Most Reviews

Figure 4: Top 10 Foods with the Highest Sodium Content (in mg)

Figure 5: Top 10 Foods with the Highest Sodium Content (in mg) per Serving

Figure 6: Top 10 Foods with the Lowest Sodium Content (in mg) per Serving

Figure 7: Cumulative Variance Explained

Figure 8: Results of k-Means Clustering

Figure 9: Results of k-Means Clustering Metrics

Figure 10: Final k-Means Clustering

Figure 11: Hierarchical Clustering Methods

Figure 12: Extracted Cluster Themes

Figure 13: FlavorFit Recommender System

Figure 14: FlavorFit Recommender System for Sweet Treats

Figure 15: FlavorFit Recommender System for Savory Eats


METHODOLOGY

Figure 1: Project Methodology

To develop the proposed recipe recommendation system for hypertensive eaters, the following methodology was employed:

1. Data Extraction, Cleaning and Processing:

Data Extraction

The recipes, ratings, and nutrients for this study were extracted from allrecipes.com, a popular recipe website that provides nutritional values for each recipe.

Data Cleaning & Pre-processing

The following cleaning and preprocessing steps were performed to clean the datasets:

Combined the food and nutrients dataframes.
Retained only the calories, carbohydrates_g, sugars_g, fat_g, saturated_fat_g, cholesterol_mg, protein_g, dietary_fiber_g, sodium_mg, and calories_from_fat nutrients. Dropped the rest of the nutrients, as they either contain null information or will not impact consumers diagnosed with hypertension.
Retained only the ratings where the food_id is also in the food dataframe.
Cleaned the recipe text by employing lemmatization and removing stopwords unique to the dataset (e.g., chef, easy, make).

2. Text Vectorization and Clustering

The recipes resulting from the data cleaning and pre-processing stage are then vectorized using Term Frequency-Inverse Document Frequency (TF-IDF). This identifies the most important or relevant words in a recipe (e.g., key ingredients and/or preparation methods). After this, Singular Value Decomposition (SVD), a dimensionality reduction technique often used in recommender systems, was employed to help identify latent features that are relevant for recipe recommendations.

Finally, using the k-Means clustering method, the recipes are grouped based on features such as ingredients, preparation, and nutritional values.
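The vectorize, reduce, and cluster steps described above can be sketched as follows. This is a minimal illustration that assumes scikit-learn; the ingredient strings are hypothetical toy data, and the component and cluster counts are chosen for the toy example rather than taken from the project.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy ingredient strings (hypothetical, not taken from the project data).
recipes = [
    "sugar butter flour egg vanilla",
    "sugar cream egg vanilla caramel",
    "flour sugar butter chocolate egg",
    "chicken garlic onion pepper salt",
    "beef onion garlic tomato pepper",
    "pasta cheese garlic onion tomato",
]

# 1. TF-IDF: weight each term by how distinctive it is for a recipe.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(recipes)  # sparse matrix, one row per recipe

# 2. Truncated SVD: project the sparse matrix onto a few latent components.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)

# 3. k-Means: group the recipes in the reduced space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)
```

In this toy data the dessert-like and the savory recipes share no terms, so they fall into separate clusters; real recipe text overlaps far more and needs many more components.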

3. Recommendation System

Using Neighborhood-Based Collaborative Filtering, a method that makes personalized recommendations based on the preferences and behaviours of similar users, the algorithm returned recipes that fit the eater's food preferences. The results are then filtered by the 2,000 mg sodium threshold prescribed by the WHO.
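The final filtering step can be illustrated with a short pandas sketch. The candidate table, the `predicted_rating` column, and the `filter_by_sodium` helper are hypothetical; whether the project compares the threshold against a recipe's total sodium or its per-serving sodium is not stated, so the sketch assumes the `sodium_mg` column as given and a strict "less than" comparison.

```python
import pandas as pd

SODIUM_LIMIT_MG = 2000  # WHO daily sodium guideline cited in the text

# Hypothetical candidate recipes returned by a recommender, best first.
candidates = pd.DataFrame({
    "name": ["Recipe A", "Recipe B", "Recipe C", "Recipe D"],
    "predicted_rating": [4.9, 4.7, 4.5, 4.2],
    "sodium_mg": [2500.0, 850.0, 1999.0, 320.0],
})

def filter_by_sodium(df, limit_mg=SODIUM_LIMIT_MG, top_n=3):
    """Keep recipes whose sodium is under the limit, preserving rank order."""
    allowed = df[df["sodium_mg"] < limit_mg]
    return allowed.head(top_n).reset_index(drop=True)

recommended = filter_by_sodium(candidates)
```

Filtering after ranking keeps the recommender itself unchanged, so a different threshold can be swapped in without retraining anything.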


DATA SOURCE AND DESCRIPTION

For this project, the team focused on exploring the Recipes and Ratings datasets with detailed features as follows:

DataFrame	Feature	Data Type	Description
`food`	`itemid`	int64	ItemID of the recipe
`food`	`title`	object	Title of the recipe
`food`	`prep_time`	int64	Preparation time for the recipe
`food`	`cook_time`	int64	Cooking time for the recipe
`food`	`ready_time`	int64	Readying time for the recipe
`food`	`ingredients`	object	Ingredients of the recipe
`food`	`directions`	object	Directions of the recipe
`food`	`url`	object	URL of the recipe
`food`	`photo_url`	object	Photo URL of the recipe
`nutrients`	`name`	object	Name of the recipe
`nutrients`	`url`	object	URL of the recipe
`nutrients`	`category`	object	Category of the recipe
`nutrients`	`author`	object	Author of the recipe
`nutrients`	`summary`	object	Summary of the recipe
`nutrients`	`rating`	float64	Rating per recipe
`nutrients`	`rating_count`	int64	Rating Count per recipe
`nutrients`	`review_count`	int64	Review Count per recipe
`nutrients`	`ingredients`	object	Ingredients of the recipe
`nutrients`	`directions`	object	Directions of the recipe
`nutrients`	`prep`	object	Preparation of the recipe
`nutrients`	`cook`	object	Cooking time of each step of the recipe
`nutrients`	`total`	object	Total Cooking Time of the recipe
`nutrients`	`servings`	int64	Serving size of the recipe
`nutrients`	`yield`	object	Yield of the recipe
`nutrients`	`calories`	float64	Total Calories of the recipe
`nutrients`	`carbohydrates_g`	float64	Total Carbohydrates in grams of the recipe
`nutrients`	`sugars_g`	float64	Total Sugar in grams of the recipe
`nutrients`	`fat_g`	float64	Total Fat in grams of the recipe
`nutrients`	`saturated_fat_g`	float64	Total Saturated Fat in grams of the recipe
`nutrients`	`cholesterol_mg`	float64	Total Cholesterol in milligrams of the recipe
`nutrients`	`dietary_fiber_g`	float64	Total Dietary fiber in grams of the recipe
`nutrients`	`sodium_mg`	float64	Total Sodium in milligrams of the recipe
`nutrients`	`calories_from_fat`	float64	Total Calories from fat of the recipe
`nutrients`	`calcium_mg`	float64	Total Calcium in milligrams of the recipe
`nutrients`	`iron_mg`	float64	Total Iron in milligrams of the recipe
`nutrients`	`magnesium_mg`	float64	Total Magnesium in milligrams of the recipe
`nutrients`	`potassium_mg`	float64	Total Potassium in milligrams of the recipe
`nutrients`	`zinc_mg`	float64	Total Zinc in milligrams of the recipe
`nutrients`	`phosphorus_mg`	float64	Total Phosphorus in milligrams of the recipe
`nutrients`	`vitamin_a_iu_IU`	float64	Total Vitamin A of the recipe
`nutrients`	`niacin_equivalents_mg`	float64	Total Niacin Equivalents in milligrams of the recipe
`nutrients`	`vitamin_b6_mg`	float64	Total Vitamin B6 in milligrams of the recipe
`nutrients`	`vitamin_c_mg`	float64	Total Vitamin C in milligrams of the recipe
`nutrients`	`folate_mcg`	float64	Total Folate in micrograms of the recipe
`nutrients`	`thiamin_mg`	float64	Total Thiamin in milligrams of the recipe
`nutrients`	`riboflavin_mg`	float64	Total Riboflavin in milligrams of the recipe
`nutrients`	`vitamin_e_iu_IU`	float64	Total Vitamin E of the recipe
`nutrients`	`vitamin_k_mcg`	float64	Total Vitamin K in micrograms of the recipe
`nutrients`	`biotin_mcg`	float64	Total Biotin in micrograms of the recipe
`nutrients`	`vitamin_b12_mcg`	float64	Total Vitamin B12 in micrograms of the recipe
`nutrients`	`mono_fat_g`	float64	Total Mono Fat in grams of the recipe
`nutrients`	`poly_fat_g`	float64	Total Poly Fat in grams of the recipe
`nutrients`	`trans_fatty_acid_g`	float64	Total Trans Fatty Acid in grams of the recipe
`nutrients`	`omega_3_fatty_acid_g`	float64	Total Omega 3 Fatty Acid in grams of the recipe
`nutrients`	`omega_6_fatty_acid_g`	float64	Total Omega 6 Fatty Acid in grams of the recipe
`ratings`	`User_ID`	int64	User ID of the rater
`ratings`	`Food_ID`	int64	Food ID of the recipe
`ratings`	`Rating`	int64	Rating of the rater per recipe

Table 1. Description of the Data per DataFrame

As discussed in the methodology, the food and nutrients dataframes were combined, so the rest of this project works with the resulting Recipes dataset and the Ratings dataset, described below.

The Recipes dataset contains 12,351 rows and 14 columns, including detailed information on the nutritional values of each recipe.
For this project, we used the following nutritional values per recipe: calories, carbohydrates_g, sugars_g, fat_g, saturated_fat_g, cholesterol_mg, protein_g, dietary_fiber_g, and sodium_mg. These values were documented based on the ingredients used in each recipe, which were sourced from a reliable nutritional database. Eaters with specific dietary needs, or those simply interested in tracking their nutritional intake, benefit from having these values. More specifically, the sodium content helps eaters with hypertension manage their condition better.
This information can lead to better health choices and outcomes, improved quality of life, and reduced risk of complications such as heart disease and stroke. Overall, including nutritional values in food recipes is a small but significant step towards a healthier lifestyle for everyone.
The Ratings dataset contains 1,555,581 rows and 3 columns, including the User_id and Rating given by users for each recipe.
To increase the reliability of the reviews and ensure that they were written by knowledgeable users, we filtered the dataset to only include reviews written by users who had provided feedback more than 100 times. This reduced the influence of potentially biased or unreliable reviews and focused the analysis on feedback from experienced reviewers.

By analyzing these datasets, we aim to gain insights into the nutritional value of popular recipes, identify trends in the types of ingredients and preparation methods used by allrecipes.com users, as well as understand the ratings and user behavior. In the following sections, we will describe the data analysis process and present our findings.

Table 2. Features of the Recipe Dataset

Features	Data Type	Description
Food_ID	int	unique identifier of the food item
name	string	name of the food item
ingredients	string	list of ingredients
photo_url	string	URL to a photo of the food item
calories	float	total number of calories
carbohydrates_g	float	total number of carbohydrates in grams
sugars_g	float	total number of sugars in grams
fat_g	float	total number of fat in grams
saturated_fat_g	float	total number of saturated fat in grams
cholesterol_mg	float	total amount of cholesterol in milligrams
protein_g	float	total number of protein in grams
dietary_fiber_g	float	total number of dietary fiber in grams
sodium_mg	float	total amount of sodium in milligrams
calories_from_fat	float	total number of calories from fat

Table 3. Features of the Ratings Dataset

Features	Data Type	Description
Food_ID	int	unique identifier for the food item
User_id	int	unique identifier for the user who rated the food item
Rating	float	rating given by the user for the food item


DATA EXPLORATION

Data Collection

Allrecipes.com is a popular recipe-sharing website that has been providing home cooks with a vast collection of recipes for over two decades. One of the unique features of allrecipes.com is that it includes nutritional information for many of its recipes. This feature has made it a go-to source for individuals who are looking for healthier meal options or who have specific dietary needs.

In this study, the team scraped selected data from allrecipes.com due to its diverse food options and the availability of nutritional information for the majority of its recipes. By leveraging this information, the team could provide more accurate and comprehensive information about the recipes, including nutritional values such as calories, carbohydrates_g, sugars_g, fat_g, saturated_fat_g, cholesterol_mg, protein_g, dietary_fiber_g, and sodium_mg. This information can benefit a wide range of users, from consumers with specific dietary needs to consumers simply interested in tracking their nutritional intake.

Data Preparation

food_raw, nutrients, ratings_raw = dmw2_final.read_data()

The team performed data preparation, cleaning, and preprocessing procedures to ensure the datasets’ quality and appropriateness for subsequent analysis, addressing notable concerns such as missing data or irregular formatting where present.

Recipe Dataset

Data Preview

dmw2_final.df_exploration(food_raw)

Impute missing values

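# Fill each missing cell from the next non-null column in the same row
food_raw = food_raw.bfill(axis=1)
# Then fill any remaining gaps from the previous column in the same row
food_raw = food_raw.ffill(axis=1)
dmw2_final.df_exploration(food_raw, display_df=False)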

Add Nutritional Values

Nutritional values in food recipes are important for consumers with specific dietary restrictions or personal preferences, as they allow consumers to make informed decisions about what to eat based on their varying nutritional needs.

food_nutrients = dmw2_final.add_nutrients(food_raw, nutrients)
dmw2_final.df_exploration(food_nutrients)

Preview Data with Nutritional Values

df_food = dmw2_final.get_food(food_nutrients)
dmw2_final.df_exploration(df_food)

Impute missing values

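# Fill each missing cell from the next non-null column in the same row
df_food = df_food.bfill(axis=1)
# Then fill any remaining gaps from the previous column in the same row
df_food = df_food.ffill(axis=1)
dmw2_final.df_exploration(df_food, display_df=False)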

Ratings Dataset

Data Preview

dmw2_final.df_exploration(ratings_raw)

As discussed above, in order to increase the reliability of the reviews and ensure that they were written by knowledgeable users, we filtered the dataset to only include reviews written by users who had provided feedback more than 100 times. By doing so, we were able to reduce the influence of potentially biased or unreliable reviews, and focus on the feedback provided by experienced reviewers.

df_ratings = dmw2_final.get_ratings(ratings_raw, df_food)
dmw2_final.df_exploration(df_ratings)
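The internals of `get_ratings` are not shown here. As a rough sketch of the "more than 100 ratings" filter described above, the following assumes a ratings table with the `User_ID`, `Food_ID`, and `Rating` columns from Table 1; the `keep_active_users` helper is hypothetical, and the step that restricts ratings to foods present in the recipes dataset is not reproduced.

```python
import pandas as pd

def keep_active_users(ratings, min_ratings=100):
    """Keep rows from users who have rated more than `min_ratings` recipes."""
    counts = ratings.groupby("User_ID")["Rating"].transform("count")
    return ratings[counts > min_ratings]

# Toy data, with a lower cutoff so the effect is visible.
ratings = pd.DataFrame({
    "User_ID": [1, 1, 1, 2, 3, 3],
    "Food_ID": [10, 11, 12, 10, 11, 12],
    "Rating": [5, 4, 5, 3, 5, 4],
})
active = keep_active_users(ratings, min_ratings=2)
```

With `min_ratings=2`, only user 1 (three ratings) survives; users with exactly two ratings are dropped because the comparison is strict.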


Exploratory Data Analysis

The team performed an in-depth exploration of the datasets discussed above to better understand their contents and structure. This critical step ensured that the subsequent analysis was based on accurate and dependable data.

We took a closer look at the distribution of ratings by plotting the total number of ratings per rating value alongside the percentage of ratings per rating value. By presenting these two metrics side by side, we were able to identify any potential patterns or trends in the data and gain valuable insights into the overall distribution of ratings. This process helped us to identify the most common rating values and determine the most effective way to analyze and interpret the data.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from IPython.display import HTML, display

# Calculate the percentage of each rating value
total = df_ratings.Rating.count()
percent_plot = pd.DataFrame({"Total": df_ratings.Rating.value_counts()})
percent_plot.reset_index(inplace=True)
# Depending on the pandas version, reset_index names the column "index" or "Rating"
percent_plot.rename(columns={"index": "Rating"}, inplace=True)
percent_plot["Percent"] = percent_plot["Total"] / total * 100

# Plot the first bar chart: total ratings per rating value
plt.figure(figsize=(10, 5))
ax1 = plt.subplot(1, 2, 1)
sns.barplot(x="Rating", y="Total", data=percent_plot, color="#808080")
plt.xlabel("Rating")
plt.ylabel("Total")
plt.title("Total Ratings per Rating Value")

# Plot the second bar chart: percentage of ratings per rating value
ax2 = plt.subplot(1, 2, 2)
sns.barplot(x="Rating", y="Percent", data=percent_plot, color="#ff7f0e")
plt.xlabel("Rating")
plt.ylabel("Percent")
plt.title("Percentage of Ratings per Rating Value")

# Save the plot as a PNG file
plt.tight_layout()
plt.savefig("plots.png")

# Create an HTML img tag to display the image
img_tag = '<img src="plots.png" alt="plots" style="display:block;margin-left:auto;margin-right:auto;width:80%;">'

# Display the img tag in the Jupyter Notebook
display(HTML(img_tag))
plt.close()

Figure 2: Distribution of Ratings per Rating Value

From both the total ratings and the percentage of ratings per rating value, we can infer that the majority of ratings in the dataset are 5.0. Although ratings are subjective, this suggests that most of the dishes in our dataset were well received by the users who rated them.

dmw2_final.display_top_foods(df_ratings, df_food)

Top 10 Foods with the Most Reviews

Figure 3: Top 10 Foods with the Most Reviews

The recipes listed above are the top 10 foods by number of reviews. We tabulated the frequency of each food_id in the reviews, added a “total rating” column that holds the total number of ratings for each food item, and then selected the 10 items with the highest totals.

Intuitively, as the number of reviews increases, the total rating also increases. Notably, the dishes with the most reviews all have a sodium content within the 2,000 mg threshold.

dmw2_final.display_top_10_sodium(df_food)

Top 10 Foods with the Highest Sodium Content (in mg)

Figure 4: Top 10 Foods with the Highest Sodium Content (in mg)

The recipes listed above are the top 10 foods by sodium content (in mg). The highest on the list is Baker’s Clay, which, according to allrecipes.com, is used to make non-edible cookies that make lovely ornaments to hang on a Christmas tree [4]. Its main ingredients are 4 cups of all-purpose flour, 1 1/2 cups of water, and 1 cup of salt. Because its sodium content is beyond the 2,000 mg threshold, there is no need to remove this outlier manually; it is filtered out automatically when recommending recipes to hypertensive eaters.

The other dishes in the list are mostly main dishes with meat as the main ingredient, such as ribs, steak, and ham.

dmw2_final.display_top_10_sodium_serving(df_food)

Top 10 Foods with the Highest Sodium Content (in mg) per Serving

Figure 5: Top 10 Foods with the Highest Sodium Content (in mg) per Serving

The recipes listed above are the top 10 foods by sodium content (in mg) per serving. Aside from Baker’s Clay, which was discussed with the previous list, the dishes returned this time are mostly stuffed dishes and sandwiches (burgers).

dmw2_final.display_low_10_sodium_serving(df_food)

Top 10 Foods with the Lowest Sodium Content (in mg) per Serving

Figure 6: Top 10 Foods with the Lowest Sodium Content (in mg) per Serving

The recipes listed above are the 10 foods with the lowest sodium content (in mg) per serving. Scanning the list, these dishes do not look appetizing on their own; they appear to be complements to other foods or meant to be eaten together with other dishes.
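For reference, a per-serving ranking of this kind can be computed as in the sketch below. It assumes that per-serving sodium is the recipe's `sodium_mg` divided by its `servings`, which the report does not state explicitly; the dish names and values are hypothetical.

```python
import pandas as pd

# Hypothetical recipes with total sodium and number of servings.
foods = pd.DataFrame({
    "name": ["Dish A", "Dish B", "Dish C"],
    "sodium_mg": [1200.0, 450.0, 2400.0],
    "servings": [4, 1, 12],
})

# Sodium per serving = total sodium / number of servings (assumed definition).
foods["sodium_mg_per_serving"] = foods["sodium_mg"] / foods["servings"]

highest = foods.nlargest(2, "sodium_mg_per_serving")
lowest = foods.nsmallest(2, "sodium_mg_per_serving")
```

Dish C has the most sodium in total but the least per serving, which is why the total and per-serving rankings in the figures differ.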


DIMENSIONALITY REDUCTION

Dimensionality reduction refers to the process of reducing the number of features in a dataset while retaining and being able to explain the variation in the original dataset as much as possible.

For the purpose of this study, given the number and nature of our features, we used Truncated Singular Value Decomposition, since it is well suited to sparse datasets, in which many of the values are zero. [5] Because we vectorized the ingredients, the resulting dataset was highly sparse, which is why we chose truncated SVD for dimensionality reduction.

df_corpus = dmw2_final.get_ingredient_matrix(df_food)
dmw2_final.df_exploration(df_corpus)

dmw2_final.plot_variance_explained(df_corpus)

Figure 7: Cumulative Variance Explained

The SVD analysis of the ingredient matrix indicated that 109 components were required to explain at least 80% of the variance, based on the cumulative proportion of variance explained by the singular values.

df_svs = dmw2_final.perform_svd(dmw2_final.n_sv, df_corpus)
dmw2_final.df_exploration(df_svs)

The table above provides a detailed breakdown of all 109 retained singular values and their corresponding weights.
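The component count can be read off the cumulative explained variance, as in the sketch below. It assumes scikit-learn's `TruncatedSVD` and uses a random sparse matrix as a stand-in for the ingredient matrix, so the resulting number differs from the 109 components reported above; only the 80% target comes from the text.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse matrix standing in for the TF-IDF ingredient matrix.
rng = np.random.default_rng(0)
dense = rng.random((200, 50))
dense[rng.random((200, 50)) > 0.1] = 0.0  # keep roughly 10% of the entries
X = csr_matrix(dense)

# Fit with one component fewer than the number of features.
svd = TruncatedSVD(n_components=X.shape[1] - 1, random_state=0)
svd.fit(X)

# Smallest number of components whose cumulative share reaches 80%.
cumulative = np.cumsum(svd.explained_variance_ratio_)
n_components = int(np.searchsorted(cumulative, 0.80)) + 1
```

The selected `n_components` can then be used to refit the decomposition and transform the matrix before clustering.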


RESULTS

CLUSTERING

We evaluated three different clustering methods:

k-Means representative-based clustering;
Complete Linkage hierarchical-based clustering; and
Ward’s Linkage hierarchical-based clustering.

When selecting the best clustering method for the dataset, we took the following factors into account:

results of internal validation metrics such as the Silhouette Score, Davies-Bouldin index, and Calinski-Harabasz index;
visual inspection of the resulting clusters when projected onto two selected singular value components of the ingredients dataset; and
the sensibility of the resulting cluster interpretations based on domain expertise.

After systematically evaluating these clustering techniques, we found that k-Means was the most effective approach for clustering the data, producing superior results in terms of cluster separation and coherence. To optimize the k-Means algorithm, we tested k values from 2 to 11.

On the other hand, for the hierarchical methods, we performed sensitivity analyses on the dendrogram cut-off points. Through this thorough evaluation process, we were able to identify the optimal clustering method for the input data and gain valuable insights into its underlying structure.

K-Means

dmw2_final.evaluate_kmeans_clusters(dmw2_final.final_tsvd)

Figure 8: Results of k-Means Clustering

These are the results of the grid search for the optimal K, which involved 10 iterations.

dmw2_final.show_internal_validation(dmw2_final.silhouette_scores,
                                    dmw2_final.ch_scores,
                                    dmw2_final.db_scores)

Figure 9: Results of k-Means Clustering Metrics

A grid search was performed on the k-Means clustering algorithm to identify the optimal number of clusters, with each candidate value of k evaluated using internal validation metrics: the Silhouette Score, Davies-Bouldin index, and Calinski-Harabasz index. This systematic evaluation identified the best balance between cluster separation and coherence, and the optimal number of clusters was determined to be k = 2.
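A search of this kind can be sketched as follows, assuming scikit-learn. The two-blob data is a hypothetical stand-in for the SVD-reduced recipe matrix, and picking the k with the highest Silhouette Score is one simple selection rule, not necessarily the one the team applied.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

# Hypothetical 2-D data with two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),
               rng.normal(5, 0.5, size=(100, 2))])

scores = {}
for k in range(2, 12):  # k = 2..11, as in the text
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = {
        "silhouette": silhouette_score(X, labels),                # higher is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
    }

# Simple selection rule (an assumption): highest silhouette score wins.
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
```

In practice the three metrics do not always agree, which is why the text pairs them with visual inspection and domain judgment.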

dmw2_final.run_kmeans(dmw2_final.final_tsvd, df_corpus, dmw2_final.feature_names)

Figure 10: Final k-Means Clustering

The k-Means clustering resulted in two optimal clusters, providing valuable insights into the structure of the dataset and simplifying the representation of the data points.

Single, Complete, Average, and Ward

dmw2_final.plot_dendrograms(dmw2_final.final_tsvd)

Figure 11: Hierarchical Clustering Methods

This visualization helps us better understand the clustering process by displaying the outcomes generated by four different linkage methods: single, complete, average, and Ward. Given that Ward linkage may produce clustering results similar to k-Means but can be computationally expensive, k-Means clustering is the more reasonable choice.
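The four linkage methods can be compared on the same data with SciPy, as in this sketch. The data is a hypothetical two-blob example, and cutting each dendrogram into two flat clusters is an assumption made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 2-D data with two well-separated groups of 20 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(5, 0.5, size=(20, 2))])

cluster_sizes = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                     # build the dendrogram
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut it into 2 clusters
    cluster_sizes[method] = sorted(np.bincount(labels)[1:].tolist())
```

On cleanly separated data all four methods recover the same split; on real recipe data they can differ considerably, which is what the dendrograms in Figure 11 are meant to show.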

WORDCLOUD

dmw2_final.create_word_clouds(df_food, dmw2_final.cluster_indexes)

Figure 12: Extracted Cluster Themes

We utilized k-Means clustering with two clusters to generate two word clouds based on the frequency of terms in each cluster, labeling them as “Sweet Treats” and “Savory Eats”. The resulting word clouds provided a concise visual summary of the main themes and topics associated with each category, enabling us to easily identify the most relevant recipes for each group. By using this approach, we were able to gain valuable insights into the underlying structure of the input data and facilitate more efficient recipe recommendations or menu planning.
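A word cloud is driven by term frequencies within each cluster. The sketch below computes those frequencies with the standard library only; the ingredient strings, labels, and the `top_terms_per_cluster` helper are hypothetical and do not reproduce `create_word_clouds`.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned ingredient strings with their k-Means cluster labels.
recipes = [
    ("sugar butter flour", 0),
    ("sugar egg vanilla", 0),
    ("sugar butter cream", 0),
    ("garlic onion chicken", 1),
    ("garlic onion beef", 1),
    ("garlic tomato pasta", 1),
]

def top_terms_per_cluster(labeled_texts, n=2):
    """Return the n most frequent terms for each cluster label."""
    counters = defaultdict(Counter)
    for text, label in labeled_texts:
        counters[label].update(text.split())
    return {label: counter.most_common(n) for label, counter in counters.items()}

top_terms = top_terms_per_cluster(recipes)
```

Inspecting the dominant terms in this way is what supports naming one cluster "Sweet Treats" and the other "Savory Eats".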

RECOMMENDER SYSTEM

Figure 13: FlavorFit Recommender System

Collaborative Filtering

In this section of the analysis, two separate recommendation systems were built: one for Sweet Treats and the other for Savory Eats. We used a collaborative filtering approach based on user-item interactions to make personalized recommendations. Specifically, we implemented a nearest-neighbor collaborative filtering algorithm that predicts user ratings for recipes based on their past ratings and the ratings of similar users. This approach proved effective in generating accurate and personalized recommendations for users.
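To make the idea concrete, here is a minimal user-based nearest-neighbor sketch in NumPy. It is not the implementation behind `food_recommendation1` or `food_recommendation2`, which is not shown; the rating matrix is toy data, the `cosine_similarity` and `predict` helpers are hypothetical, and the sketch skips refinements such as mean-centering.

```python
import numpy as np

# Rows are users, columns are recipes; 0 means "not rated" (toy data).
R = np.array([
    [5, 4, 0, 1],
    [5, 5, 4, 1],
    [1, 2, 1, 5],
    [4, 5, 5, 2],
], dtype=float)

def cosine_similarity(a, b):
    """Cosine similarity over the recipes that both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    num = np.dot(a[mask], b[mask])
    den = np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])
    return float(num / den)

def predict(R, user, item, k=2):
    """Similarity-weighted mean rating from the k nearest users who rated item."""
    sims = [
        (cosine_similarity(R[user], R[other]), R[other, item])
        for other in range(R.shape[0])
        if other != user and R[other, item] > 0
    ]
    neighbors = sorted(sims, reverse=True)[:k]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return 0.0
    return sum(s * r for s, r in neighbors) / total

# Predict how user 0 would rate recipe 2, which they have not rated yet.
estimate = predict(R, user=0, item=2)
```

User 0's tastes track users 1 and 3 much more closely than user 2, so the estimate lands between their ratings of 4 and 5 for that recipe.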

Cluster 1: Sweet Treats

dmw2_final.food_recommendation1(df_food, df_ratings, 'Creamy Caramel Flan Recipe')

Figure 14: FlavorFit Recommender System for Sweet Treats

The Sweet Treats recommendation system has generated a list of three top recommendations based on the input “Creamy Caramel Flan Recipe”.

It not only provides the names of the recommended desserts but also displays their sodium content. This information can be useful for users who are conscious about their sodium intake and would like to avoid hypertension. By presenting this additional information, the system enables users to make informed decisions about which dessert to choose, based on their dietary requirements and preferences. This feature highlights the system’s commitment to providing personalized and relevant recommendations that take into account the individual user’s needs and preferences.

Cluster 2: Savory Eats

dmw2_final.food_recommendation2(df_food, df_ratings, 'Homemade Mac and Cheese Recipe')

Figure 15: FlavorFit Recommender System for Savory Eats

The Savory Eats recommendation system has generated a list of three top recommendations based on the input “Homemade Mac and Cheese Recipe”.


CONCLUSION

The team used k-Means clustering to explore the structure of the recipe dataset and discovered two main clusters: Sweet Treats and Savory Eats. This information can be useful for identifying popular flavor profiles and for tailoring recipe recommendations based on individual preferences. Moreover, by leveraging Neighborhood-Based Collaborative Filtering and capturing nutritional thresholds for hypertensive individuals, our recommendation system was able to provide recipe suggestions that are both delicious and healthy. This has the potential to help hypertensive patients adhere to dietary restrictions and improve their overall health outcomes.

__Overall, our study highlights the potential of data-driven approaches to promote healthier eating habits, especially for those with specific health concerns such as hypertension.__ By leveraging machine learning techniques and nutritional expertise, we can create personalized recommendations that are both enjoyable and beneficial for individuals’ health.

RECOMMENDATIONS

The team is confident in the potential of our recommendation system, and we have identified several key areas for improvement and expansion. These include:

User Value:

To enhance the customization of our recommendation system, we plan to capture nutritional thresholds beyond the current 2,000 mg daily sodium restriction for hypertensive individuals. While this restriction is recommended by the World Health Organization, some individuals may have been prescribed different thresholds by their doctors based on their specific health needs. Supporting such personalized thresholds would let the system accommodate a wider range of scenarios and provide more accurate and relevant recipe recommendations to our users.
To make the recommended recipes more explainable to users, we plan to apply Explainable Artificial Intelligence (XAI) techniques. XAI is a growing field that focuses on developing transparent and interpretable machine learning models that show how a model arrived at a particular decision or recommendation. Giving users clear explanations of why a recipe was recommended can improve trust and engagement, leading to greater adherence to dietary restrictions and ultimately better health outcomes for individuals with hypertension.

Business Value:

We plan to expand our recommendation system to cover other health conditions, such as cancer and gout, to help improve the health outcomes of individuals with these conditions. By reusing the data extraction, cleaning, processing, text vectorization, clustering, and recommendation methodologies applied for hypertensive patients, we can tailor the system to the nutritional thresholds and dietary restrictions specific to each condition. This would enable individuals with these conditions to find recipes that fit their dietary requirements. Expanding to other health conditions could also increase the user base and reach of our recommendation system.
To maximize the impact and reach of our recommendation system, we plan to explore partnerships with food delivery services. Such partnerships would allow us to integrate our recommendation system with their platforms and reach a wider audience of potential users. They could also provide a new revenue stream. Moreover, collaborating with these services could give us access to large amounts of consumer purchase behavior data, which we could use to fine-tune the recommendation system to better align with user preferences and dietary needs, ultimately leading to improved health outcomes for individuals with hypertension and other health conditions.

REFERENCES

[1] Hypertension. (2021, August 25). World Health Organization. https://www.who.int/news-room/fact-sheets/detail/hypertension

[2] Allrecipes. https://www.allrecipes.com

[3] World Heart Federation. https://world-heart-federation.org

[4] Baker’s Clay. (2022, July 22). Johanna. https://www.allrecipes.com/recipe/11125/bakers-clay/

[5] 11 Dimensionality reduction techniques you should know in 2021. (2021, Apr 14). Rukshan Pramoditha. https://towardsdatascience.com/11-dimensionality-reduction-techniques-you-should-know-in-2021-dcb9500d388b

ACKNOWLEDGEMENT

I completed this project with my Learning Team, which consisted of Frances Divina Egango, Jed Alvin Guinto, Jose Ramon Jimenez, and Bernadette Obiso.

Francis
