As I discussed in my previous post, vector embeddings have become an indispensable tool, allowing us to go beyond keyword-based strategies and use semantic relevance in content analysis. Many technical SEOs don’t code (no judgment) but are spreadsheet wizards, so let’s talk about leveraging Google Sheets for calculating cosine similarity and k-nearest neighbors with your vector embeddings. Whether you’re enhancing your keyword research, looking for gaps, or clustering related content, integrating these techniques into your Google Sheets workflow can provide powerful, data-driven insights that inform improved rankings.
Writing User-Defined Functions in Google Sheets
Google Apps Script is a powerful, cloud-based scripting language that enables users to automate tasks and extend the functionality of Google Workspace applications like Sheets, Docs, and Slides. A form of JavaScript, Apps Script allows for seamless integration and customization across these tools, making it an underrated yet invaluable resource for enhancing productivity in the Google Workspace environment.
Cosine Similarity and kNN are not native functions in Sheets, so we’ll need to prepare our own in Apps Script. Here are steps to set it up:
- Open Google Sheets – Open the Google Sheets document where you want to write your script. In our case, we’ll start from a blank sheet.
- Access the Apps Script Editor – Click on Extensions > Apps Scrip to open the code editor.
- Create a New Script – In the Apps Script editor, delete any default code in the script file (Code.gs). The default code will look like this when you open it.
- Write Your Function – Start by writing your function. For example, let’s write a function to get OpenAI embeddings based on a target cell. This is a good way for us to vectorize keyword data for comparison against page content embeddings collected via Screaming Frog SEO Spider.
function GETEMBEDDINGS(input) {
const url = 'https://api.openai.com/v1/embeddings';
const apiKey = 'YOUR_OPENAI_KEY';
const payload = {
model: 'text-embedding-3-small',
input: input
};
console.log("payload: " + JSON.stringify(payload));
const options = {
method: 'post',
contentType: 'application/json',
headers: {
'Authorization': 'Bearer ' + apiKey
},
payload: JSON.stringify(payload),
muteHttpExceptions: true
};
console.log("options:" + JSON.stringify(options));
const response = UrlFetchApp.fetch(url, options);
const responseData = JSON.parse(response.getContentText());
if (responseData.error) {
throw new Error('API Error: ' + responseData.error.message);
}
const embeddings = responseData.data[0].embedding;
return JSON.stringify(embeddings);
}
- Fill in your OpenAI API Key – To generate embeddings, you’ll need an OpenAI API key. You can follow these instructions to get a key.
- Verify it Works – The GETEMBEDDINGS() function is written to return data, not to print it to the screen, so we’ll set up a test function that passes text and logs it to the console to make sure everything is working.
function testEmbeddings()
{
Logger.log(GETEMBEDDINGS("I really need to hire iPullRank for SEO."))
}
- Save Your Script – Click the floppy disk icon or press Ctrl + S to save your script. Name your project appropriately.
- Run Your Function – In the Apps Script editor, click the dropdown menu next to the Run button and select your function name. The first time you run the script, you’ll need to authorize it. Follow the prompts to allow necessary permissions.
Authorization looks like any other authorization in the Google ecosystem. Don’t be surprised when Apps Script asks for access.
- Test – Check the Logs under the Executions menu to debug any issues. If you see a series of decimal numbers print to your screen, then things are working.
You can now access your script from your Google Sheet as a function using
=GETEMBEDDINGS
(cell)
. Keep in mind that OpenAI embeddings have a 3191 token input limit. I have not implemented any of the chunking and averaging from the other post in this function because our primary use case here is vectorizing keywords.
Now, let’s walk through the other functions you’ll need to do vector comparisons.
Convert Strings to Floats
First things first, we need to convert our string embeddings to float arrays. Embeddings are often stored as strings (text) if you captured them via Screaming Frog, but to perform mathematical operations like cosine similarity, we need to convert these strings into arrays of floats (decimal numbers). This conversion ensures that each vector’s numerical values can be processed correctly for distance calculations.
function convertStringToFloatArray(str) {
return str.replace(/[\[\]]/g, '') // Remove brackets
.split(',') // Split by commas
.map(parseFloat); // Convert to floats
}
Follow the steps above to add the function to your project in Apps Script.
Cosine Similarity
Next, let’s define the function to calculate cosine similarity between two vectors. As we previously discussed, cosine similarity measures the cosine of the angle between two vectors, providing a metric for how similar they are in terms of direction. This is particularly useful in SEO for understanding the semantic similarity between different pieces of content, or the relevance of content to a given keyword in this example thereby enabling more nuanced keyword analysis and content clustering.
Here’s the code:
function cosineSimilarity(vecA, vecB) {
if (vecA.length !== vecB.length) {
throw new Error("Vectors must be the same length");
}
vecA = convertStringToFloatArray(vecA);
vecB = convertStringToFloatArray(vecB);
const dotProduct = vecA.reduce((sum, a, idx) => sum + a * vecB[idx], 0);
const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
Here’s an example of the computed similarity in the spreadsheet. We have our embeddings for our keyword pulled via the =GETEMBEDDINGS()
function and embeddings for the landing page from our Screaming Frog crawl. Then we run our cosine similarity function by passing both vectors.
Follow the steps above to add the cosine similarity function to your project.
K-Nearest Neighbors
Finally, we compute the nearest neighbors using cosine similarity. The k-nearest neighbors (k-NN) algorithm identifies the k vectors that are most similar to a target vector, based on a chosen distance metric like cosine similarity. In SEO, this can help in finding the most semantically similar keywords or content pieces, aiding in content optimization and internal linking.
function computeNearestNeighbors(targetCell, embeddingsColumn, dataColumn, numNeighbors,showScores=0) {
const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
// Get target embedding from target cell
const targetEmbeddingString = targetCell;
const targetEmbedding = convertStringToFloatArray(targetEmbeddingString);
// Get embeddings and associated data from specified columns
const embeddingsRange = sheet.getRange(`${embeddingsColumn}2:${embeddingsColumn}`).getValues();
const dataRange = sheet.getRange(`${dataColumn}2:${dataColumn}`).getValues();
const embeddingsValues = embeddingsRange.filter(row => row[0] !== "");
const dataValues = dataRange.filter(row => row[0] !== "");
// Compute distances using cosine similarity
const distances = embeddingsValues.map((row, index) => {
const embedding = convertStringToFloatArray(row[0]);
return {
index: index + 2, // +2 to match row number in sheet (since we start from row 2)
distance: cosineSimilarity(targetEmbedding, embedding),
data: dataValues[index][0]
};
});
// Sort by distance
distances.sort((a, b) => b.distance - a.distance); // Descending order for cosine similarity
// Output nearest neighbors
const nearestNeighbors = distances.slice(0, numNeighbors);
var results = nearestNeighbors.map(neighbor => neighbor.data);
if (showScores !=0){
results = nearestNeighbors.map(neighbor => `Data: ${neighbor.data}, Score: ${neighbor.distance.toFixed(4)}`);
}
return results.join(", ");
}
This function is a bit more complex to use because you need to provide the lookup embeddings and the list of embeddings and the column you want to pull results from.
- Place the data and set the range – Copy your data whether pulled via Screaming Frog or otherwise. Ensure your embeddings that you want to compare against are in the specified column, starting from row 2.
- Function Parameters – Call the
computeNearestNeighbors()
function with the following parameters:- targetCell – The cell containing the embedding you want (e.g., B2).
- embeddingsColumn – The column containing the embeddings (e.g., “K”).
- dataColumn – the column containing the data that you want returned based on the match. In our case this will be the URL column. (E.g. “L”)
- numNeighbors – The number of nearest neighbors to return (e.g., 5).
- showScores – this is an optional flag that returns the cosine similarity scores
- Run the Function – using
=computeNearestNeighbors()
in a cell.
If we want a single match we run the function with numNeighbors set to 1 as follows: =computeNearestNeighbors(B2,"L","K",1)
.
If we want three matches, we set numNeighbors to 3 like so: =computeNearestNeighbors(B2,"L","K",3)
.
If we want three matches and the scores cosine similarity scores, we use the optional 5th parameter =computeNearestNeighbors(B2,"L","K",3,1)
.
Download our Workbook
If you’re in a rush and just want to get to the analysis instead of setting everything up, go ahead and download our Google Sheets workbook by filling out this form.
Sheets Is the Not Ideal Environment
I share these functions in support of SEOs that prefer spreadsheets over code. However, this is far less efficient and far more limited than using Python to do the same. In fact, I don’t know how performant this will be when you start having many rows. As you find more value in using embeddings, I’d encourage you to learn more about Python or at least use your favorite chatbot to help you write code.
Failing that, when your data gets too big, BigQuery ML has embedding and distance functions that integrate with VertexAI directly.
Happy semantic optimizing!
Next Steps
Here are 3 ways iPullRank can help you combine SEO and content to earn visibility for your business and drive revenue:
- Schedule a 30-Minute Strategy Session: Share your biggest SEO and content challenges so we can put together a custom discovery deck after looking through your digital presence. No one-size-fits-all solutions, only tailored advice to grow your business. Schedule your consultation session now.
- Mitigate the AI Overviews’ Potential Impact: How prepared is your SEO strategy for Google’s AI Overviews? Get ahead of potential threats and ensure your site remains competitive with our comprehensive AI Overviews Threat Report. Get your AI Overviews report.
- Enhance Your Content Relevancy with Orbitwise: Not sure if your content is mathematically relevant? Use Orbitwise to test and improve your content’s relevancy, ensuring it ranks for your targeted keywords. Test your content relevance today.
Want more? Visit our Resources Page for access to past webinars, exclusive guides, and insightful blogs crafted by our team of experts.
Leave a Comment