Quick Tip: Cosine Similarity and k-Nearest Neighbors in Google Sheets

As I discussed in my previous post, vector embeddings have become an indispensable tool, allowing us to go beyond keyword-based strategies and use semantic relevance in content analysis. Many technical SEOs don’t code (no judgment) but are spreadsheet wizards, so let’s talk about leveraging Google Sheets for calculating cosine similarity and k-nearest neighbors with your vector embeddings. Whether you’re enhancing your keyword research, looking for gaps, or clustering related content, integrating these techniques into your Google Sheets workflow can provide powerful, data-driven insights that inform improved rankings.

Writing User-Defined Functions in Google Sheets

Google Apps Script is a powerful, cloud-based scripting language that enables users to automate tasks and extend the functionality of Google Workspace applications like Sheets, Docs, and Slides. A form of JavaScript, Apps Script allows for seamless integration and customization across these tools, making it an underrated yet invaluable resource for enhancing productivity in the Google Workspace environment.

Cosine Similarity and kNN are not native functions in Sheets, so we’ll need to prepare our own in Apps Script. Here are steps to set it up:

  1. Open Google Sheets – Open the Google Sheets document where you want to write your script. In our case, we’ll start from a blank sheet.
Screenshot of an open Google Sheets spreadsheet
  1. Access the Apps Script Editor – Click on Extensions > Apps Scrip to open the code editor.
Screenshot with an arrow showing how to open up the Apps Script Editor in Google Sheets
  1. Create a New Script – In the Apps Script editor, delete any default code in the script file (Code.gs). The default code will look like this when you open it.
screenshot of a new Apps Script
  1. Write Your Function – Start by writing your function. For example, let’s write a function to get OpenAI embeddings based on a target cell. This is a good way for us to vectorize keyword data for comparison against page content embeddings collected via Screaming Frog SEO Spider.
function GETEMBEDDINGS(input) {
  const url = 'https://api.openai.com/v1/embeddings';
  const apiKey = 'YOUR_OPENAI_KEY';

   const payload = {
    model: 'text-embedding-3-small', 
    input: input  
  };

  console.log("payload: " + JSON.stringify(payload));
  const options = {
    method: 'post',
    contentType: 'application/json',
    headers: {
      'Authorization': 'Bearer ' + apiKey
    },
    payload: JSON.stringify(payload),
    muteHttpExceptions: true
  };
  console.log("options:" + JSON.stringify(options));
  const response = UrlFetchApp.fetch(url, options);
  const responseData = JSON.parse(response.getContentText());

  if (responseData.error) {
    throw new Error('API Error: ' + responseData.error.message);
  }

  const embeddings = responseData.data[0].embedding;
  return JSON.stringify(embeddings);
}
  1. Fill in your OpenAI API Key – To generate embeddings, you’ll need an OpenAI API key. You can follow these instructions to get a key.
Screenshot of Apps Script where to paste OpenAI API key
  1. Verify it Works – The GETEMBEDDINGS() function is written to return data, not to print it to the screen, so we’ll set up a test function that passes text and logs it to the console to make sure everything is working.
function testEmbeddings()
{
  Logger.log(GETEMBEDDINGS("I really need to hire iPullRank for SEO."))
}
  1. Save Your Script – Click the floppy disk icon or press Ctrl + S to save your script. Name your project appropriately.
  1. Run Your Function – In the Apps Script editor, click the dropdown menu next to the Run button and select your function name. The first time you run the script, you’ll need to authorize it. Follow the prompts to allow necessary permissions.
Screenshot of Apps Script where you should run the function

Authorization looks like any other authorization in the Google ecosystem. Don’t be surprised when Apps Script asks for access.

Screenshot of the Google Authorization window
  1. Test – Check the Logs under the Executions menu to debug any issues. If you see a series of decimal numbers print to your screen, then things are working.
screenshot of apps script showing a test of the execution log

You can now access your script from your Google Sheet as a function using =GETEMBEDDINGS (cell). Keep in mind that OpenAI embeddings have a 3191 token input limit. I have not implemented any of the chunking and averaging from the other post in this function because our primary use case here is vectorizing keywords.

screenshot of google sheet getembeddings function cell

Now, let’s walk through the other functions you’ll need to do vector comparisons.

Convert Strings to Floats

First things first, we need to convert our string embeddings to float arrays. Embeddings are often stored as strings (text) if you captured them via Screaming Frog, but to perform mathematical operations like cosine similarity, we need to convert these strings into arrays of floats (decimal numbers). This conversion ensures that each vector’s numerical values can be processed correctly for distance calculations.

function convertStringToFloatArray(str) {
  return str.replace(/[\[\]]/g, '') // Remove brackets
            .split(',')             // Split by commas
            .map(parseFloat);       // Convert to floats
}

Follow the steps above to add the function to your project in Apps Script.

Cosine Similarity

Next, let’s define the function to calculate cosine similarity between two vectors. As we previously discussed, cosine similarity measures the cosine of the angle between two vectors, providing a metric for how similar they are in terms of direction. This is particularly useful in SEO for understanding the semantic similarity between different pieces of content, or the relevance of content to a given keyword in this example thereby enabling more nuanced keyword analysis and content clustering.

Here’s the code:

function cosineSimilarity(vecA, vecB) {
  if (vecA.length !== vecB.length) {
    throw new Error("Vectors must be the same length");
  }
  vecA = convertStringToFloatArray(vecA);
  vecB = convertStringToFloatArray(vecB);
  const dotProduct = vecA.reduce((sum, a, idx) => sum + a * vecB[idx], 0);
  const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
  const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

Here’s an example of the computed similarity in the spreadsheet. We have our embeddings for our keyword pulled via the =GETEMBEDDINGS() function and embeddings for the landing page from our Screaming Frog crawl. Then we run our cosine similarity function by passing both vectors.

Follow the steps above to add the cosine similarity function to your project. 

K-Nearest Neighbors

Finally, we compute the nearest neighbors using cosine similarity. The k-nearest neighbors (k-NN) algorithm identifies the k vectors that are most similar to a target vector, based on a chosen distance metric like cosine similarity. In SEO, this can help in finding the most semantically similar keywords or content pieces, aiding in content optimization and internal linking.

function computeNearestNeighbors(targetCell, embeddingsColumn, dataColumn, numNeighbors,showScores=0) {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();

  // Get target embedding from target cell
  const targetEmbeddingString = targetCell;
  const targetEmbedding = convertStringToFloatArray(targetEmbeddingString);

  // Get embeddings and associated data from specified columns
  const embeddingsRange = sheet.getRange(`${embeddingsColumn}2:${embeddingsColumn}`).getValues();
  const dataRange = sheet.getRange(`${dataColumn}2:${dataColumn}`).getValues();

  const embeddingsValues = embeddingsRange.filter(row => row[0] !== "");
  const dataValues = dataRange.filter(row => row[0] !== "");

  // Compute distances using cosine similarity
  const distances = embeddingsValues.map((row, index) => {
    const embedding = convertStringToFloatArray(row[0]);
    return {
      index: index + 2, // +2 to match row number in sheet (since we start from row 2)
      distance: cosineSimilarity(targetEmbedding, embedding),
      data: dataValues[index][0]
    };
  });

  // Sort by distance
  distances.sort((a, b) => b.distance - a.distance); // Descending order for cosine similarity

  // Output nearest neighbors
  const nearestNeighbors = distances.slice(0, numNeighbors);

  var results = nearestNeighbors.map(neighbor => neighbor.data);

  if (showScores !=0){
    results = nearestNeighbors.map(neighbor => `Data: ${neighbor.data}, Score: ${neighbor.distance.toFixed(4)}`);
  }

  return results.join(", ");
}

This function is a bit more complex to use because you need to provide the lookup embeddings and the list of embeddings and the column you want to pull results from.

  1. Place the data and set the range – Copy your data whether pulled via Screaming Frog or otherwise. Ensure your embeddings that you want to compare against are in the specified column, starting from row 2.
  2. Function Parameters – Call the computeNearestNeighbors() function with the following parameters:
    1. targetCell – The cell containing the embedding you want (e.g., B2).
    2. embeddingsColumn – The column containing the embeddings (e.g.,  “K”).
    3. dataColumn – the column containing the data that you want returned based on the match. In our case this will be the URL column. (E.g. “L”)
    4. numNeighbors – The number of nearest neighbors to return (e.g., 5).
    5. showScores – this is an optional flag that returns the cosine similarity scores
  3. Run the Function – using =computeNearestNeighbors() in a cell.

If we want a single match we run the function with numNeighbors set to 1 as follows: =computeNearestNeighbors(B2,"L","K",1).

Screenshot of Google Sheets with k-nearest neighbor top match example

If we want three matches, we set numNeighbors to 3 like so: =computeNearestNeighbors(B2,"L","K",3).

Google Sheets screenshot of k-nearest neighbors top 3 example

If we want three matches and the scores cosine similarity scores, we use the optional 5th parameter =computeNearestNeighbors(B2,"L","K",3,1).

Screenshot of Google Sheet with top 3 matches of k-nearest neighbor with cosine similarity score

Download our Workbook

If you’re in a rush and just want to get to the analysis instead of setting everything up, go ahead and download our Google Sheets workbook by filling out this form.

Sheets Is the Not Ideal Environment

I share these functions in support of SEOs that prefer spreadsheets over code. However, this is far less efficient and far more limited than using Python to do the same. In fact, I don’t know how performant this will be when you start having many rows. As you find more value in using embeddings, I’d encourage you to learn more about Python or at least use your favorite chatbot to help you write code. 

Failing that, when your data gets too big, BigQuery ML has embedding and distance functions that integrate with VertexAI directly. 

Happy semantic optimizing!

Next Steps

Here are 3 ways iPullRank can help you combine SEO and content to earn visibility for your business and drive revenue:

  1. Schedule a 30-Minute Strategy Session: Share your biggest SEO and content challenges so we can put together a custom discovery deck after looking through your digital presence. No one-size-fits-all solutions, only tailored advice to grow your business. Schedule your consultation session now.
  2. Mitigate the AI Overviews’ Potential Impact: How prepared is your SEO strategy for Google’s AI Overviews? Get ahead of potential threats and ensure your site remains competitive with our comprehensive AI Overviews Threat Report. Get your AI Overviews report.
  3. Enhance Your Content Relevancy with Orbitwise: Not sure if your content is mathematically relevant? Use Orbitwise to test and improve your content’s relevancy, ensuring it ranks for your targeted keywords. Test your content relevance today.

Want more? Visit our Resources Page for access to past webinars, exclusive guides, and insightful blogs crafted by our team of experts. 

Mike King

Leave a Comment

Your email address will not be published. Required fields are marked *

Get The Rank Report

iPullRank's weekly SEO, Content Strategy and Generative AI Newsletter

TIPS, ADVICE, AND EXCLUSIVE INSIGHTS DIRECT TO YOUR INBOX

Join over 4000+ Rank Climbers and get the SEO, Content, and AI industry news, updates, and best practices to level up your digital performance.

Considering AI Content?

AI generative content has gone mainstream.

Discover what that means for your business and why AI generation can be your competitive advantage in the world of content and SEO.