How to perform a correlation test in RStudio

In this article, we will learn how to perform a correlation hypothesis test between two variables using RStudio. The objective is to determine if there is a significant relationship between the selected variables.

The data we will use

For this example, we will use a dataset called data that contains the following variables:

  1. Campaign: Binary categorical variable that indicates whether a specific day belongs to the Black Friday campaign. It takes the value of 1 if the day is part of the campaign and 0 otherwise.

  2. Purchases: Discrete numeric variable containing the number of purchases made by users on a specific day.

  3. Revenue: Continuous numeric variable containing the amount of revenue achieved on a specific day. 

The dataset has 30 rows, one per day, covering one month.
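
As a minimal sketch, a data frame with this structure could be built as shown below. The values are simulated purely for illustration, since the article's actual data are not reproduced in the text. The choice of 7 campaign days, the Poisson rates, the ticket range, and the 20% discount are all assumptions.

```r
# Simulated stand-in for the article's `data`; every number below is an assumption.
set.seed(42)

# 30 days: the last 7 are arbitrarily flagged as Black Friday campaign days.
Campaign <- rep(c(0, 1), times = c(23, 7))

# Daily purchase counts (discrete), assumed to be higher during the campaign.
Purchases <- rpois(30, lambda = 40 + 30 * Campaign)

# Daily revenue (continuous): purchases times a random average ticket,
# with an assumed 20% discount on campaign days.
Revenue <- round(Purchases * runif(30, min = 20, max = 30) * (1 - 0.2 * Campaign), 2)

data <- data.frame(Campaign, Purchases, Revenue)

str(data)   # one row per day, three columns
head(data)
```

The later steps only need a data frame with these three column names, so a real dataset can be substituted by reading it into `data` instead.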


Next, we will analyze how the Purchases and Revenue variables each correlate with the Campaign variable. We want to determine whether purchases and revenue increase during the Black Friday campaign.

Step 1: Hypothesis statement

First, we must define our hypotheses:

  • Null hypothesis (H₀): There is no correlation between the two variables.
  • Alternative hypothesis (H₁): There is a correlation between the two variables.
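
In symbols, writing ρ for the population correlation coefficient (a notation introduced here for convenience, not used in the original text), this corresponds to a two-sided test:

```latex
H_0:\ \rho = 0 \qquad \text{versus} \qquad H_1:\ \rho \neq 0
```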

Step 2: Initial data visualization

Before performing the statistical test, it is useful to visualize the relationship between the two variables:

[Screenshot: image not included]

This graph allows us to visually observe whether there appears to be a relationship between the campaign and purchases. The red line represents a linear regression, which is useful for identifying trends.

[Figure: Purchases versus Campaign; image not included]

We observe a positive correlation between purchases and the days belonging to the Black Friday campaign. That is, purchases increase during the Black Friday days. Now we will check whether the same happens with revenue, since it is possible that, although purchases increase during the campaign, revenue remains stable because of the discounts applied.

[Figure: Revenue versus Campaign; image not included]

A positive correlation is also observed between revenue and the days belonging to the Black Friday campaign. However, it is weaker than the correlation with purchases.
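
A plot like the ones described above could be produced with base R as sketched below. The article's own plotting code is not available in the text, so this is only an approximation. It uses simulated data (all values are assumptions) so that it runs on its own.

```r
# Simulated stand-in data; values are assumptions, not the article's dataset.
set.seed(42)
Campaign  <- rep(c(0, 1), times = c(23, 7))
Purchases <- rpois(30, lambda = 40 + 30 * Campaign)
data <- data.frame(Campaign, Purchases)

# Fit a simple linear regression of Purchases on Campaign.
fit <- lm(Purchases ~ Campaign, data = data)

# Scatter plot with the fitted regression line drawn in red.
plot(data$Campaign, data$Purchases,
     xlab = "Campaign (0 = no, 1 = yes)", ylab = "Purchases",
     main = "Purchases vs. Campaign", xaxt = "n")
axis(1, at = c(0, 1))
abline(fit, col = "red", lwd = 2)

# With a binary predictor, the slope equals the difference in mean purchases
# between campaign and non-campaign days.
coef(fit)["Campaign"]
```

Replacing Purchases with Revenue in the formula and the plot call gives the corresponding plot for revenue.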

Step 3: Method selection

There are different methods to calculate the correlation depending on the characteristics of the data:

  • Pearson: Pearson's correlation coefficient measures the strength of the linear relationship between two quantitative variables, and its significance test assumes that the data are approximately normally distributed. It is a weaker fit here, because Campaign is binary and Purchases is a discrete count, so the normality assumption is doubtful.
  • Spearman: Spearman's correlation coefficient is a non-parametric, rank-based measure that assesses the monotonic relationship between two variables, which need not be linear. It works best when the data do not follow a normal distribution or when the variables are not continuous. Given that one of our variables is binary and one is discrete, Spearman is more appropriate.
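
To make the difference concrete, the sketch below computes both coefficients with base R's cor() on simulated data (all values are assumptions). It also illustrates that Spearman's coefficient is Pearson's coefficient applied to the ranks of the data, with tied values receiving average ranks.

```r
# Simulated stand-in data; values are assumptions, not the article's dataset.
set.seed(42)
Campaign  <- rep(c(0, 1), times = c(23, 7))
Purchases <- rpois(30, lambda = 40 + 30 * Campaign)

# Both coefficients from the same built-in function.
pearson  <- cor(Purchases, Campaign, method = "pearson")
spearman <- cor(Purchases, Campaign, method = "spearman")

# Spearman computed by hand: Pearson's coefficient on the ranks.
spearman_by_hand <- cor(rank(Purchases), rank(Campaign), method = "pearson")

c(pearson = pearson, spearman = spearman, by_hand = spearman_by_hand)
```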

Step 4: Obtaining the correlation coefficient

[Screenshot: image not included]

[Figure: correlation chart for Campaign, Purchases and Revenue; image not included]

In this graph, we must look at the coefficients that appear in the cells above the main diagonal. These tell us the strength and direction of the correlation between each pair of variables. We will focus on the correlation between Campaign and the two numerical variables, since we already know that Purchases and Revenue are related to each other.

  • Purchases and Campaign: The correlation is 0.71, which suggests a fairly strong positive relationship. This means that on Black Friday days, purchases increase.
  • Revenue and Campaign: The correlation is 0.61, indicating a moderate positive relationship. This means that on Black Friday days revenue increases, although less strongly than purchases.

The three asterisks (***) next to the coefficients indicate that the correlations are highly statistically significant (by the usual R convention, a p-value below 0.001), i.e. it is highly unlikely that these relationships are the product of chance. These asterisks already summarize the result of a correlation hypothesis test for each pair of variables.
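
The same coefficients can also be obtained as a plain matrix. The sketch below uses base R's cor() on simulated data (all values are assumptions, so the numbers will differ from the 0.71 and 0.61 reported above). The function that produced the chart in the article is not shown in the text. A chart of this kind can be drawn, for example, with chart.Correlation() from the PerformanceAnalytics package, which is left commented out here because it needs an extra package.

```r
# Simulated stand-in data; values are assumptions, not the article's dataset.
set.seed(42)
Campaign  <- rep(c(0, 1), times = c(23, 7))
Purchases <- rpois(30, lambda = 40 + 30 * Campaign)
Revenue   <- round(Purchases * runif(30, min = 20, max = 30) * (1 - 0.2 * Campaign), 2)
data <- data.frame(Campaign, Purchases, Revenue)

# Spearman correlation matrix for all pairs of variables.
cor_matrix <- cor(data, method = "spearman")
round(cor_matrix, 2)

# Optional chart (assumes the PerformanceAnalytics package is installed):
# PerformanceAnalytics::chart.Correlation(data, histogram = TRUE, method = "spearman")
```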

Step 5: Performing the correlation test

We will proceed to perform the correlation test in more detail:

[Screenshot of the R command; image not included]

This command provides us with a p-value and a correlation coefficient (rho). The p-value tells us if the correlation is statistically significant. If it is less than 0.05, we have enough evidence to reject the null hypothesis and conclude that there is a significant correlation between the two variables.

We obtain the following results:

[Screenshot: test output for Purchases and Campaign; image not included]

With a p-value well below 0.05, we have enough evidence to reject the null hypothesis and conclude that there is a correlation between purchases and Black Friday campaign days, with a positive coefficient of 0.7057.

[Screenshot: test output for Revenue and Campaign; image not included]

With a p-value of less than 0.05, we have sufficient evidence to reject the null hypothesis and conclude that there is a correlation between revenue and Black Friday campaign days, with a positive coefficient of 0.6146.
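
For readers who want to reproduce this step, the test can be run with base R's cor.test(). The sketch below is only an approximation of the command in the screenshot, which is not reproduced in the text. It runs on simulated data (all values are assumptions), so its output will not match the 0.7057 and 0.6146 reported above. Because a binary variable produces many tied ranks, exact = FALSE is set so that R uses the asymptotic approximation for the p-value instead of warning that an exact p-value cannot be computed.

```r
# Simulated stand-in data; values are assumptions, not the article's dataset.
set.seed(42)
Campaign  <- rep(c(0, 1), times = c(23, 7))
Purchases <- rpois(30, lambda = 40 + 30 * Campaign)
Revenue   <- round(Purchases * runif(30, min = 20, max = 30) * (1 - 0.2 * Campaign), 2)
data <- data.frame(Campaign, Purchases, Revenue)

# Spearman correlation tests of each numeric variable against Campaign.
test_purchases <- cor.test(data$Purchases, data$Campaign,
                           method = "spearman", exact = FALSE)
test_revenue   <- cor.test(data$Revenue, data$Campaign,
                           method = "spearman", exact = FALSE)

# Collect rho and the p-value, and apply the 0.05 decision rule.
results <- data.frame(
  variable = c("Purchases", "Revenue"),
  rho      = unname(c(test_purchases$estimate, test_revenue$estimate)),
  p_value  = c(test_purchases$p.value, test_revenue$p.value)
)
results$reject_H0 <- results$p_value < 0.05
results
```

Printing test_purchases or test_revenue directly shows the full test summary, including the S statistic, the p-value, and the sample estimate of rho.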

 


