Objective: The objective is to perform an ngram analysis on either a bulk file or an aggregated historical customer search term report file. This analysis will help identify single word phrase match candidates that can be excluded from campaigns.
The excel sheet contains the following column headers:
- Column A: "Row Labels" which represents the query from the customer search term report
- Column B: "Sum of Impressions"
- Column C: "Sum of Clicks"
- Column D: "Sum of Spend"
- Column E: "Sum of 7 Day Total Sales"
- Column F: "Sum of 7 Day Total Units (#)"
- Column G: "Sum of 7 Day Advertised SKU Sales"
These columns are saved in an excel file called "ngram analysis".
To implement this, follow the steps below:
- Read the Excel file "ngram analysis" into a DataFrame.
- Loop through each row and split the "Row Labels" column to obtain unique words.
- For each unique word, aggregate the values of the other columns based on the presence of that word in "Row Labels".
- Calculate the additional columns as mentioned.
- Write the resulting DataFrame to a new Excel file named "ngram analysis - done".
The output will look something like this (single words hidden for privacy)
We can see the top word has spent $60 without any sales indicating that it’s a good candidate to negate as a Phrase Negative match to eliminate wasted ad spend. In looking at this data it’s also interesting to analyze the CTR - we can see that some of these bad words have a high CTR
Possible Errors that can come up:
- Python is sensitive if a column has a space at the end of the text. For example if I add a space in the “Sum of 7 Day Total Sales” then it will cause a KeyError. The script is trying to access a column named 'Sum of 7 Day Total Sales' in result_df, but it seems this column doesn't exist, resulting in a KeyError. To resolve this, you need to ensure that the column name is spelled correctly and that it actually exists in result_df.