Which descriptive statistics tool should you choose?

This article will help you choose the right descriptive statistics tool for your data. Each tool is available in Excel using the XLSTAT software.

The purpose of descriptive statistics

Describing data is an essential part of statistical analysis, aiming to provide a complete picture of the data before moving to exploratory analysis or predictive modeling. The statistical methods used for this purpose are called descriptive statistics. They include both numerical tools (e.g. measures of central tendency such as the mean, mode and median, or measures of variability) and graphical tools (e.g. histogram, box plot, scatter plot…) which summarize the dataset and extract important information such as central tendencies and variability. Moreover, we can use descriptive statistics to explore the association between two or several variables (bivariate or multivariate analysis).

For example, let’s say we have a data table which represents the results of a survey on the amount of money people spend on online shopping per month, on average. Rows correspond to respondents and columns to the amount of money spent as well as the age group they belong to. Our goal is to extract important information from the survey and detect potential differences between the age groups. For this, we can simply summarize the results per group using common descriptive statistics, such as:

• The mean and the median, which reflect the central tendency.

• The standard deviation, the variance, and the coefficient of variation, which reflect the dispersion (see the sketch below).
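To reproduce this kind of per-group summary outside of Excel, here is a minimal sketch in Python with pandas; the column names ("age_group", "monthly_spend") and the values are hypothetical stand-ins for the survey table.

```python
# Mean, median, standard deviation, variance and coefficient of
# variation of monthly online-shopping spend, per age group.
import pandas as pd

survey = pd.DataFrame({
    "age_group":     ["18-25", "18-25", "26-40", "26-40", "41-60", "41-60"],
    "monthly_spend": [120.0, 95.0, 180.0, 210.0, 60.0, 75.0],
})

summary = survey.groupby("age_group")["monthly_spend"].agg(
    mean="mean", median="median", std="std", var="var"
)
summary["cv"] = summary["std"] / summary["mean"]  # coefficient of variation
print(summary)
```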

In another example, using qualitative data, we consider a survey on commuting. Rows correspond to respondents and columns to the mode of transportation as well as to the city they live in. Our goal is to describe transportation preferences when commuting to work, per city, using:

• The mode, reflecting the most frequent mode of commuting (the most frequent category).

• The frequencies, reflecting how many times each mode of commuting appears as an answer.

• The relative frequencies (percentages), i.e. the frequency divided by the total number of answers.

• Bar charts and stacked bars, which graphically illustrate the relative frequencies by category (see the sketch below).
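As a rough illustration, the mode, frequencies and relative frequencies can be computed with the Python standard library alone; the answers below are hypothetical.

```python
# Frequencies and relative frequencies of commuting modes.
from collections import Counter

answers = ["car", "bus", "bike", "car", "train", "car", "bus", "walk"]

freq = Counter(answers)                      # frequency of each category
mode = freq.most_common(1)[0][0]             # most frequent category: 'car'
rel_freq = {k: v / len(answers) for k, v in freq.items()}  # proportions

print(mode, freq, rel_freq)
```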

A guide to choosing a descriptive statistics tool according to the situation

In order to choose the right descriptive statistics tool, we need to consider the types and the number of variables we have as well as the objective of the study. Based on these three criteria we have generated a grid that will help you decide which tool to use according to your situation. The first column of the grid refers to data types:

Quantitative dataset: containing variables that describe quantities of the objects of interest. The values are numbers. The weight of an infant is an example of a quantitative variable.

Qualitative dataset: containing variables that describe qualities of the objects of interest (also referred to as categorical or nominal data). The values are called categories, also referred to as levels or modalities. The gender of an infant is an example of a qualitative variable: the possible values are the categories male and female.

Mixed dataset: containing both types of variables.

The second column indicates the number of variables. The proposed tools can handle either the description of one variable (univariate analysis) or the description of the relationships between two (bivariate analysis) or several variables (multivariate analysis). The grid provides an intuitive example for each situation as well as a link to a tutorial explaining how to apply each XLSTAT tool using a demo file.

Descriptive Statistics grid

Please note that the list below is not exhaustive. However, it contains the most commonly used descriptive statistics, all available in Excel using the XLSTAT add-on.

Quantitative data, one variable (univariate analysis):

• Estimate a frequency distribution: How many people per age class attended this event? (here the investigated variable is age in a quantitative form)
• Measure the central tendency of one sample: What is the average grade in a classroom? (e.g. scattergram, strip plot)
• Measure the dispersion of one sample: How widely or narrowly are the grade scores dispersed around the mean score in a classroom? (e.g. quartiles; scattergram, strip plot)
• Characterize the shape of a distribution: Is the employee wage distribution in a company symmetric?
• Visually check whether a sample follows a given distribution: What is the theoretical percentage of students who obtained a grade above a given threshold?
• Measure the position of a value within a sample: What data point can be used to split the sample into 95% of low values and 5% of high values?
• Detect extreme values: Is a height of 184 cm an extreme value in this group of students?

Quantitative data, two variables (bivariate analysis):

• Describe the association between two variables: Does plant biomass increase or decrease with soil Pb content?

Quantitative data, several variables (multivariate analysis):

• Describe the association between multiple variables: What is the evolution of the life expectancy, the fertility rate and the size of the population over the last 10 years in this country? (up to 3 variables to describe, over time or not)
• Describe the association between three variables under specific conditions: How to visualize the proportions of three ice cream ingredients in several ice cream samples?

Quantitative data, two matrices of several variables:

• Describe the association between two matrices: Does the evaluation of a series of products differ from one panel to another?

Qualitative data, one variable (univariate analysis):

• Compute the frequencies of the different categories: How many clients said they were satisfied with the service and how many said they were not?
• Detect the most frequent category: Which is the most frequent hair color in this country?

Qualitative data, two variables (bivariate analysis):

• Measure the association between two variables: Does the presence of a trace element change according to the presence of another trace element? (e.g. stacked or clustered bars)

Mixed data (quantitative & qualitative), two variables (bivariate analysis):

• Describe the relationship between a binary and a continuous variable: Is the concentration of a molecule in rats linked to the rats' sex (M/F)?
• Describe the relationship between a categorical and a continuous variable: Does sepal length differ between three flower species?

Mixed data, several variables (multivariate analysis):

• Describe the relationship between one categorical and two quantitative variables: Does the amount of money spent on this commercial website change according to the age class and the salary of the customers? (analysis by group)

How to run descriptive statistics in XLSTAT?

In XLSTAT, you will find a large variety of descriptive statistics tools in the Describing data menu. The most popular feature is Descriptive Statistics. All you have to do is select your data on the Excel sheet, then set up the dialog box and click OK. It's simple and quick. If you do not have XLSTAT, you can download a free 14-day trial version.

XLSTAT dialog box for Descriptive Statistics-General tab

Outputs for quantitative data

Statistics : Min./max. value, 1st quartile, median, 3rd quartile, range, sum, mean, geometric mean, harmonic mean, kurtosis (Pearson), skewness (Pearson), kurtosis, skewness, CV (standard deviation/mean), sample variance, estimated variance, standard deviation of a sample, estimated standard deviation, mean absolute deviation, standard deviation of the mean.

Graphs : box plots, scattergrams, strip plots, Q-Q plots, P-P plots, stem and leaf plots. It is possible to group the various box plots, scattergrams and strip plots together on the same chart, sort them by mean, and color them by group to compare them.

Outputs for qualitative data

Statistics : No. of categories, mode, mode frequency, mode weight, % mode, relative frequency of the mode, frequency, weight of the category, percentage of the category, relative frequency of the category

Graphs : Bar charts, pie charts, double pie charts, doughnuts, stacked bars, multiple bars

XLSTAT has developed a series of statistics tutorials that will provide you with a theoretical background on inferential statistics, data modeling, clustering, multivariate data analysis and more. These guides will also help you choose an appropriate statistical method to investigate the question you are asking.

Which statistical test to use?

Which statistical model should you use?

Which multivariate data analysis method to choose?

Which clustering method should you choose?

Choosing an appropriate time series analysis method

Comparison of supervised machine learning algorithms

Source: Introductory Statistics: Exploring the World Through Data, by Robert Gould and Colleen Ryan.


Quant Analysis 101: Descriptive Statistics

Everything You Need To Get Started (With Examples)

By: Derek Jansen (MBA) | Reviewers: Kerryn Warren (PhD) | October 2023

If you’re new to quantitative data analysis, one of the first terms you’re likely to hear being thrown around is descriptive statistics. In this post, we’ll unpack the basics of descriptive statistics, using straightforward language and loads of examples. So grab a cup of coffee and let’s crunch some numbers!

Overview: Descriptive Statistics

  • What are descriptive statistics?

  • Descriptive vs inferential statistics
  • Why the descriptives matter
  • The “Big 7” descriptive statistics
  • Key takeaways

At the simplest level, descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset – for example, a set of survey responses. They provide a snapshot of the characteristics of your dataset and allow you to better understand, roughly, how the data are “shaped” (more on this later). For example, a descriptive statistic could include the proportion of males and females within a sample or the percentages of different age groups within a population.

Another common descriptive statistic is the humble average (which in statistics-talk is called the mean). For example, if you undertook a survey and asked people to rate their satisfaction with a particular product on a scale of 1 to 10, you could then calculate the average rating. This is a very basic statistic, but as you can see, it gives you some idea of how the data are shaped.

Descriptive statistics summarise and describe relatively basic but essential features of a quantitative dataset, including its “shape”

What about inferential statistics?

Now, you may have also heard the term inferential statistics being thrown around, and you’re probably wondering how that’s different from descriptive statistics. Simply put, descriptive statistics describe and summarise the sample itself, while inferential statistics use the data from a sample to make inferences or predictions about a population.

Put another way, descriptive statistics help you understand your dataset, while inferential statistics help you make broader statements about the population, based on what you observe within the sample. If you’re keen to learn more, we cover inferential stats in another post.

Why do descriptive statistics matter?

While descriptive statistics are relatively simple from a mathematical perspective, they play a very important role in any research project. All too often, students skim over the descriptives and run ahead to the seemingly more exciting inferential statistics, but this can be a costly mistake.

The reason for this is that descriptive statistics help you, as the researcher, comprehend the key characteristics of your sample without getting lost in vast amounts of raw data. In doing so, they provide a foundation for your quantitative analysis. Additionally, they enable you to quickly identify potential issues within your dataset – for example, suspicious outliers, missing responses and so on. Just as importantly, descriptive statistics inform the decision-making process when it comes to choosing which inferential statistics you’ll run, as each inferential test has specific requirements regarding the shape of the data.

Long story short, it’s essential that you take the time to dig into your descriptive statistics before looking at more “advanced” inferentials. It’s also worth noting that, depending on your research aims and questions, descriptive stats may be all that you need in any case. So, don’t discount the descriptives!


The “Big 7” descriptive statistics

With the what and why out of the way, let’s take a look at the most common descriptive statistics. Beyond the counts, proportions and percentages we mentioned earlier, we have what we call the “Big 7” descriptives. These can be divided into two categories – measures of central tendency and measures of dispersion.

Measures of central tendency

True to the name, measures of central tendency describe the centre or “middle section” of a dataset. In other words, they provide some indication of what a “typical” data point looks like within a given dataset. The three most common measures are:

The mean, which is the mathematical average of a set of numbers – in other words, the sum of all numbers divided by the count of all numbers.
The median, which is the middlemost number in a set of numbers, when those numbers are ordered from lowest to highest.
The mode, which is the most frequently occurring number in a set of numbers (in any order). Naturally, a dataset can have one mode, no mode (no number occurs more than once) or multiple modes.

To make this a little more tangible, let’s look at a sample dataset, along with the corresponding mean, median and mode. This dataset reflects the service ratings (on a scale of 1 – 10) from 15 customers.

Example set of descriptive stats

As you can see, the mean of 5.8 is the average rating across all 15 customers. Meanwhile, 6 is the median. In other words, if you were to list all the responses in order from low to high, Customer 8 would be in the middle (with their service rating being 6). Lastly, the number 5 is the most frequent rating (appearing 3 times), making it the mode.

Together, these three descriptive statistics give us a quick overview of how these customers feel about the service levels at this business. In other words, most customers feel rather lukewarm and there’s certainly room for improvement. From a more statistical perspective, this also means that the data tend to cluster around the 5-6 mark, since the mean and the median are fairly close to each other.
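To see how these three statistics fall out in code, here is a minimal sketch using Python's statistics module. The ratings below are hypothetical values chosen to reproduce the mean (5.8), median (6) and mode (5) quoted above; they are not the exact dataset shown in the figure.

```python
# Central tendency of 15 hypothetical service ratings (scale 1-10).
import statistics

ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

print(statistics.mean(ratings))    # 5.8
print(statistics.median(ratings))  # 6
print(statistics.mode(ratings))    # 5
```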

To take this a step further, let’s look at the frequency distribution of the responses. In other words, let’s count how many times each rating was received, and then plot these counts onto a bar chart.

Example frequency distribution of descriptive stats

As you can see, the responses tend to cluster toward the centre of the chart, creating something of a bell-shaped curve. In statistical terms, this is called a normal distribution.

As you delve into quantitative data analysis, you’ll find that normal distributions are very common, but they’re certainly not the only type of distribution. In some cases, the data can lean toward the left or the right of the chart (i.e., toward the low end or high end). This lean is reflected by a measure called skewness, and it’s important to pay attention to this when you’re analysing your data, as this will have an impact on what types of inferential statistics you can use on your dataset.

Example of skewness

Measures of dispersion

While the measures of central tendency provide insight into how “centred” the dataset is, it’s also important to understand how dispersed that dataset is – in other words, to what extent the data cluster toward the centre, specifically the mean. In some cases, the majority of the data points will sit very close to the centre, while in other cases, they’ll be scattered all over the place. Enter the measures of dispersion, of which there are three:

Range, which measures the difference between the largest and smallest number in the dataset. In other words, it indicates how spread out the dataset really is.

Variance, which measures how much each number in a dataset varies from the mean (average). More technically, it calculates the average of the squared differences between each number and the mean. A higher variance indicates that the data points are more spread out, while a lower variance suggests that the data points are closer to the mean.

Standard deviation, which is the square root of the variance. It serves the same purposes as the variance, but is a bit easier to interpret as it presents a figure that is in the same unit as the original data. You’ll typically present this statistic alongside the means when describing the data in your research.

Again, let’s look at our sample dataset to make this all a little more tangible.

Example set of descriptive stats (range, variance and standard deviation for the sample dataset)

As you can see, the range of 8 reflects the difference between the highest rating (10) and the lowest rating (2). The standard deviation of 2.18 tells us that, on average, results within the dataset are 2.18 away from the mean (of 5.8), reflecting a relatively dispersed set of data.
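Reusing the hypothetical ratings from the previous sketch, the three measures of dispersion can be computed the same way. Because the exact dataset behind the figure is not published, only the range (8) matches by construction; the variance and standard deviation come out slightly different from the 2.18 quoted above.

```python
# Dispersion of the same 15 hypothetical ratings.
import statistics

ratings = [2, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 10]

data_range = max(ratings) - min(ratings)  # 10 - 2 = 8
variance = statistics.variance(ratings)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(ratings)       # square root of the variance

print(data_range, round(variance, 2), round(std_dev, 2))  # 8 5.89 2.43
```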

For the sake of comparison, let’s look at another much more tightly grouped (less dispersed) dataset.

Example of skewed data

As you can see, all the ratings lie between 5 and 8 in this dataset, resulting in a much smaller range, variance and standard deviation. You might also notice that the data are clustered toward the right side of the graph – in other words, the data are skewed. If we calculate the skewness for this dataset, we get a result of -0.12, confirming this right lean.

In summary, range, variance and standard deviation all provide an indication of how dispersed the data are. These measures are important because they help you interpret the measures of central tendency within context. In other words, if your measures of dispersion are all fairly high numbers, you need to interpret your measures of central tendency with some caution, as the results are not particularly centred. Conversely, if the data are all tightly grouped around the mean (i.e., low dispersion), the mean becomes a much more “meaningful” statistic.

Key Takeaways

We’ve covered quite a bit of ground in this post. Here are the key takeaways:

  • Descriptive statistics, although relatively simple, are a critically important part of any quantitative data analysis.
  • Measures of central tendency include the mean (average), median and mode.
  • Skewness indicates whether a dataset leans to one side or another.
  • Measures of dispersion include the range, variance and standard deviation.



Descriptive Statistics – Types, Methods and Examples

Descriptive Statistics

Descriptive statistics is a branch of statistics that deals with the summarization and description of collected data. This type of statistics is used to simplify and present data in a manner that is easy to understand, often through visual or numerical methods. Descriptive statistics is primarily concerned with measures of central tendency, variability, and distribution, as well as graphical representations of data.

Here are the main components of descriptive statistics:

  • Measures of Central Tendency : These provide a summary statistic that represents the center point or typical value of a dataset. The most common measures of central tendency are the mean (average), median (middle value), and mode (most frequent value).
  • Measures of Dispersion or Variability : These provide a summary statistic that represents the spread of values in a dataset. Common measures of dispersion include the range (difference between the highest and lowest values), variance (average of the squared differences from the mean), standard deviation (square root of the variance), and interquartile range (difference between the upper and lower quartiles).
  • Measures of Position : These are used to understand the distribution of values within a dataset. They include percentiles and quartiles.
  • Graphical Representations : Data can be visually represented using various methods like bar graphs, histograms, pie charts, box plots, and scatter plots. These visuals provide a clear, intuitive way to understand the data.
  • Measures of Association : These measures provide insight into the relationships between variables in the dataset, such as correlation and covariance.

Descriptive Statistics Types

Descriptive statistics can be classified into two types:

Measures of Central Tendency

These measures help describe the center point or average of a data set. There are three main types:

  • Mean : The average value of the dataset, obtained by adding all the data points and dividing by the number of data points.
  • Median : The middle value of the dataset, obtained by ordering all data points and picking out the one in the middle (or the average of the two middle numbers if the dataset has an even number of observations).
  • Mode : The most frequently occurring value in the dataset.

Measures of Variability (or Dispersion)

These measures describe the spread or variability of the data points in the dataset. There are four main types:

  • Range : The difference between the largest and smallest values in the dataset.
  • Variance : The average of the squared differences from the mean.
  • Standard Deviation : The square root of the variance, giving a measure of dispersion that is in the same units as the original dataset.
  • Interquartile Range (IQR) : The range between the first quartile (25th percentile) and the third quartile (75th percentile), which provides a measure of variability that is resistant to outliers.

Descriptive Statistics Formulas

Here are some of the most commonly used formulas in descriptive statistics:

Mean (μ or x̄) :

The average of all the numbers in the dataset. It is computed by summing all the observations and dividing by the number of observations.

Formula : μ = Σx/n or x̄ = Σx/n (where Σx is the sum of all observations and n is the number of observations)

Median :

The middle value in the dataset when the observations are arranged in ascending or descending order. If there is an even number of observations, the median is the average of the two middle numbers.

Mode :

The most frequently occurring number in the dataset. There’s no formula for this as it’s determined by observation.

Range :

The difference between the highest (max) and lowest (min) values in the dataset.

Formula : Range = max – min

Variance (σ² or s²) :

The average of the squared differences from the mean. Variance is a measure of how spread out the numbers in the dataset are.

Population Variance formula : σ² = Σ(x – μ)² / N

Sample Variance formula : s² = Σ(x – x̄)² / (n – 1)

(where x is each individual observation, μ is the population mean, x̄ is the sample mean, N is the size of the population, and n is the size of the sample)

Standard Deviation (σ or s) :

The square root of the variance. It measures the amount of variability or dispersion for a set of data.

Population Standard Deviation formula : σ = √σ²

Sample Standard Deviation formula : s = √s²

Interquartile Range (IQR) :

The range between the first quartile (Q1, 25th percentile) and the third quartile (Q3, 75th percentile). It measures statistical dispersion, or how far apart the data points are.

Formula : IQR = Q3 – Q1
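As a cross-check, here is a minimal sketch implementing the formulas above with NumPy; the dataset is hypothetical.

```python
# Descriptive-statistics formulas applied to a small hypothetical sample.
import numpy as np

x = np.array([4, 8, 6, 5, 3, 7, 9, 5])

mean = x.sum() / x.size                              # x̄ = Σx / n
median = np.median(x)
data_range = x.max() - x.min()                       # Range = max - min
pop_var = np.sum((x - mean) ** 2) / x.size           # σ² = Σ(x - μ)² / N
sample_var = np.sum((x - mean) ** 2) / (x.size - 1)  # s² = Σ(x - x̄)² / (n - 1)
pop_std = np.sqrt(pop_var)                           # σ = √σ²
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1                                        # IQR = Q3 - Q1

print(mean, median, data_range, pop_var, sample_var, pop_std, iqr)
```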

Descriptive Statistics Methods

Here are some of the key methods used in descriptive statistics:

Tabulation

This method involves arranging data into a table format, making it easier to understand and interpret. Tables often show the frequency distribution of variables.

Graphical Representation

This method involves presenting data visually to help reveal patterns, trends, outliers, or relationships between variables. There are many types of graphs used, such as bar graphs, histograms, pie charts, line graphs, box plots, and scatter plots.

Calculation of Central Tendency Measures

This involves determining the mean, median, and mode of a dataset. These measures indicate where the center of the dataset lies.

Calculation of Dispersion Measures

This involves calculating the range, variance, standard deviation, and interquartile range. These measures indicate how spread out the data is.

Calculation of Position Measures

This involves determining percentiles and quartiles, which tell us about the position of particular data points within the overall data distribution.

Calculation of Association Measures

This involves calculating statistics like correlation and covariance to understand relationships between variables.

Summary Statistics

Often, a collection of several descriptive statistics is presented together in what’s known as a “summary statistics” table. This provides a comprehensive snapshot of the data at a glance.

Descriptive Statistics Examples

Descriptive Statistics Examples are as follows:

Example 1: Student Grades

Let’s say a teacher has the following set of grades for 7 students: 85, 90, 88, 92, 78, 88, and 94. The teacher could use descriptive statistics to summarize this data:

  • Mean (average) : (85 + 90 + 88 + 92 + 78 + 88 + 94)/7 = 615/7 ≈ 87.86
  • Median (middle value) : First, rearrange the grades in ascending order (78, 85, 88, 88, 90, 92, 94). The median grade is 88.
  • Mode (most frequent value) : The grade 88 appears twice, more frequently than any other grade, so it’s the mode.
  • Range (difference between highest and lowest) : 94 (highest) – 78 (lowest) = 16
  • Variance and Standard Deviation : These would be calculated using the appropriate formulas, providing a measure of the dispersion of the grades (see the sketch below).
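A minimal sketch of the same summary with Python's statistics module (standard library only); note the mean prints as 87.857..., matching the corrected figure above.

```python
# Descriptive summary of the seven grades.
import statistics

grades = [85, 90, 88, 92, 78, 88, 94]

print(statistics.mean(grades))        # ≈ 87.86
print(statistics.median(grades))      # 88
print(statistics.mode(grades))        # 88
print(max(grades) - min(grades))      # range: 16
print(statistics.pvariance(grades))   # population variance
print(statistics.pstdev(grades))      # population standard deviation
```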

Example 2: Survey Data

A researcher conducts a survey on the number of hours of TV watched per day by people in a particular city. They collect data from 1,000 respondents and can use descriptive statistics to summarize this data:

  • Mean : Calculate the average hours of TV watched by adding all the responses and dividing by the total number of respondents.
  • Median : Sort the data and find the middle value.
  • Mode : Identify the most frequently reported number of hours watched.
  • Histogram : Create a histogram to visually display the frequency of responses. This could show, for example, that the majority of people watch 2-3 hours of TV per day.
  • Standard Deviation : Calculate this to find out how much variation there is from the average.

Importance of Descriptive Statistics

Descriptive statistics are fundamental in the field of data analysis and interpretation, as they provide the first step in understanding a dataset. Here are a few reasons why descriptive statistics are important:

  • Data Summarization : Descriptive statistics provide simple summaries about the measures and samples you have collected. With a large dataset, it’s often difficult to identify patterns or tendencies just by looking at the raw data. Descriptive statistics provide numerical and graphical summaries that can highlight important aspects of the data.
  • Data Simplification : They simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary, making it easier to understand and interpret the dataset.
  • Identification of Patterns and Trends : Descriptive statistics can help identify patterns and trends in the data, providing valuable insights. Measures like the mean and median can tell you about the central tendency of your data, while measures like the range and standard deviation tell you about the dispersion.
  • Data Comparison : By summarizing data into measures such as the mean and standard deviation, it’s easier to compare different datasets or different groups within a dataset.
  • Data Quality Assessment : Descriptive statistics can help identify errors or outliers in the data, which might indicate issues with data collection or entry.
  • Foundation for Further Analysis : Descriptive statistics are typically the first step in data analysis. They help create a foundation for further statistical or inferential analysis. In fact, advanced statistical techniques often assume that one has first examined their data using descriptive methods.

When to use Descriptive Statistics

They can be used in a wide range of situations, including:

  • Understanding a New Dataset : When you first encounter a new dataset, using descriptive statistics is a useful first step to understand the main characteristics of the data, such as the central tendency, dispersion, and distribution.
  • Data Exploration in Research : In the initial stages of a research project, descriptive statistics can help to explore the data, identify trends and patterns, and generate hypotheses for further testing.
  • Presenting Research Findings : Descriptive statistics can be used to present research findings in a clear and understandable way, often using visual aids like graphs or charts.
  • Monitoring and Quality Control : In fields like business or manufacturing, descriptive statistics are often used to monitor processes, track performance over time, and identify any deviations from expected standards.
  • Comparing Groups : Descriptive statistics can be used to compare different groups or categories within your data. For example, you might want to compare the average scores of two groups of students, or the variance in sales between different regions.
  • Reporting Survey Results : If you conduct a survey, you would use descriptive statistics to summarize the responses, such as calculating the percentage of respondents who agree with a certain statement.

Applications of Descriptive Statistics

Descriptive statistics are widely used in a variety of fields to summarize, represent, and analyze data. Here are some applications:

  • Business : Businesses use descriptive statistics to summarize and interpret data such as sales figures, customer feedback, or employee performance. For instance, they might calculate the mean sales for each month to understand trends, or use graphical representations like bar charts to present sales data.
  • Healthcare : In healthcare, descriptive statistics are used to summarize patient data, such as age, weight, blood pressure, or cholesterol levels. They are also used to describe the incidence and prevalence of diseases in a population.
  • Education : Educators use descriptive statistics to summarize student performance, like average test scores or grade distribution. This information can help identify areas where students are struggling and inform instructional decisions.
  • Social Sciences : Social scientists use descriptive statistics to summarize data collected from surveys, experiments, and observational studies. This can involve describing demographic characteristics of participants, response frequencies to survey items, and more.
  • Psychology : Psychologists use descriptive statistics to describe the characteristics of their study participants and the main findings of their research, such as the average score on a psychological test.
  • Sports : Sports analysts use descriptive statistics to summarize athlete and team performance, such as batting averages in baseball or points per game in basketball.
  • Government : Government agencies use descriptive statistics to summarize data about the population, such as census data on population size and demographics.
  • Finance and Economics : In finance, descriptive statistics can be used to summarize past investment performance or economic data, such as changes in stock prices or GDP growth rates.
  • Quality Control : In manufacturing, descriptive statistics can be used to summarize measures of product quality, such as the average dimensions of a product or the frequency of defects.

Limitations of Descriptive Statistics

While descriptive statistics are a crucial part of data analysis and provide valuable insights about a dataset, they do have certain limitations:

  • Lack of Depth : Descriptive statistics provide a summary of your data, but they can oversimplify the data, resulting in a loss of detail and potentially significant nuances.
  • Vulnerability to Outliers : Some descriptive measures, like the mean, are sensitive to outliers. A single extreme value can significantly skew your mean, making it less representative of your data.
  • Inability to Make Predictions : Descriptive statistics describe what has been observed in a dataset. They don’t allow you to make predictions or generalizations about unobserved data or larger populations.
  • No Insight into Correlations : While some descriptive statistics can hint at potential relationships between variables, they don’t provide detailed insights into the nature or strength of these relationships.
  • No Causality or Hypothesis Testing : Descriptive statistics cannot be used to determine cause and effect relationships or to test hypotheses. For these purposes, inferential statistics are needed.
  • Can Mislead : When used improperly, descriptive statistics can be used to present a misleading picture of the data. For instance, choosing to only report the mean without also reporting the standard deviation or range can hide a large amount of variability in the data.


Chapter 14: Quantitative Analysis – Descriptive Statistics

Numeric data collected in a research project can be analyzed quantitatively using statistical tools in two different ways. Descriptive analysis refers to statistically describing, aggregating, and presenting the constructs of interest or associations between these constructs. Inferential analysis refers to the statistical testing of hypotheses (theory testing). In this chapter, we will examine statistical techniques used for descriptive analysis, and the next chapter will examine statistical techniques for inferential analysis. Much of today’s quantitative data analysis is conducted using software programs such as SPSS or SAS. Readers are advised to familiarize themselves with one of these programs for understanding the concepts described in this chapter.

Data Preparation

In research projects, data may be collected from a variety of sources: mail-in surveys, interviews, pretest or posttest experimental data, observational data, and so forth. This data must be converted into a machine-readable, numeric format, such as in a spreadsheet or a text file, so that they can be analyzed by computer programs like SPSS or SAS. Data preparation usually involves the following steps.

Data coding. Coding is the process of converting data into numeric format. A codebook should be created to guide the coding process. A codebook is a comprehensive document containing detailed description of each variable in a research study, items or measures for that variable, the format of each item (numeric, text, etc.), the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale; whether such scale is a five-point, seven-point, or some other type of scale), and how to code each value into a numeric format. For instance, if we have a measurement item on a seven-point Likert scale with anchors ranging from “strongly disagree” to “strongly agree”, we may code that item as 1 for strongly disagree, 4 for neutral, and 7 for strongly agree, with the intermediate anchors in between. Nominal data such as industry type can be coded in numeric form using a coding scheme such as: 1 for manufacturing, 2 for retailing, 3 for financial, 4 for healthcare, and so forth (of course, nominal data cannot be analyzed statistically). Ratio scale data such as age, income, or test scores can be coded as entered by the respondent. Sometimes, data may need to be aggregated into a different form than the format used for data collection. For instance, for measuring a construct such as “benefits of computers,” if a survey provided respondents with a checklist of benefits that they could select from (i.e., they could choose as many of those benefits as they wanted), then the total number of checked items can be used as an aggregate measure of benefits. Note that many other forms of data, such as interview transcripts, cannot be converted into a numeric format for statistical analysis. Coding is especially important for large complex studies involving many variables and measurement items, where the coding process is conducted by different people, to help the coding team code data in a consistent manner, and also to help others understand and interpret the coded data.

Data entry. Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program like SPSS. Most statistical programs provide a data editor for entering data. However, these programs store data in their own native format (e.g., SPSS stores data as .sav files), which makes it difficult to share that data with other statistical programs. Hence, it is often better to enter data into a spreadsheet or database, where they can be reorganized as needed, shared across programs, and subsets of data can be extracted for analysis. Smaller data sets with fewer than 65,000 observations and 256 items can be stored in a spreadsheet such as Microsoft Excel, while larger datasets with millions of observations will require a database. Each observation can be entered as one row in the spreadsheet and each measurement item can be represented as one column. The entered data should be frequently checked for accuracy, via occasional spot checks on a set of items or observations, during and after entry. Furthermore, while entering data, the coder should watch out for obvious evidence of bad data, such as the respondent selecting the “strongly agree” response to all items irrespective of content, including reverse-coded items. If so, such data can be entered but should be excluded from subsequent analysis.

Missing values. Missing data is an inevitable part of any empirical data set. Respondents may not answer certain questions if they are ambiguously worded or too sensitive. Such problems should be detected earlier during pretests and corrected before the main data collection process begins. During data entry, some statistical programs automatically treat blank entries as missing values, while others require a specific numeric value such as -1 or 999 to be entered to denote a missing value. During data analysis, the default mode of handling missing values in most software programs is to simply drop the entire observation containing even a single missing value, in a technique called listwise deletion. Such deletion can significantly shrink the sample size and make it extremely difficult to detect small effects. Hence, some software programs allow the option of replacing missing values with an estimated value via a process called imputation. For instance, if the missing value is one item in a multi-item scale, the imputed value may be the average of the respondent’s responses to remaining items on that scale. If the missing value belongs to a single-item scale, many researchers use the average of other respondents’ responses to that item as the imputed value. Such imputation may be biased if the missing value is of a systematic nature rather than a random nature. Two methods that can produce relatively unbiased estimates for imputation are the maximum likelihood procedures and multiple imputation methods, both of which are supported in popular software programs such as SPSS and SAS.
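As a rough illustration of scale-item imputation, here is a minimal sketch with pandas; the item names and values are hypothetical, and NaN marks a skipped response.

```python
# Impute a missing scale item with the respondent's mean on the
# remaining items of the same scale.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "item1": [5.0, 3.0, 4.0],
    "item2": [4.0, np.nan, 5.0],  # respondent 2 skipped item2
    "item3": [5.0, 4.0, 4.0],
})

row_means = df.mean(axis=1)  # per-respondent mean; NaN is ignored by default
df_imputed = df.apply(lambda col: col.fillna(row_means))

print(df_imputed)  # respondent 2's item2 becomes (3 + 4) / 2 = 3.5
```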

Data transformation. Sometimes, it is necessary to transform data values before they can be meaningfully interpreted. For instance, reverse coded items, where items convey the opposite meaning of their underlying construct, should be reversed (e.g., in a 1-7 interval scale, 8 minus the observed value will reverse the value) before they can be compared or combined with items that are not reverse coded. Other kinds of transformations may include creating scale measures by adding individual scale items, creating a weighted index from a set of observed measures, and collapsing multiple values into fewer categories (e.g., collapsing incomes into income ranges).
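A minimal sketch of the reverse-coding transformation described above; the column names are hypothetical.

```python
# Reverse a reverse-coded item on a 1-7 scale and build a scale score.
import pandas as pd

df = pd.DataFrame({
    "q1": [7, 2, 5],      # regular item
    "q2_rev": [1, 6, 3],  # reverse-coded item
})

df["q2"] = 8 - df["q2_rev"]                        # 8 minus value flips a 1-7 scale
df["scale_score"] = df[["q1", "q2"]].mean(axis=1)  # combine items into a scale

print(df)
```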

Univariate Analysis

Univariate analysis, or analysis of a single variable, refers to a set of statistical techniques that can describe the general properties of one variable. Univariate statistics include: (1) frequency distribution, (2) central tendency, and (3) dispersion. The frequency distribution of a variable is a summary of the frequency (or percentages) of individual values or ranges of values for that variable. For instance, we can measure how many times a sample of respondents attend religious services (as a measure of their “religiosity”) using a categorical scale: never, once per year, several times per year, about once a month, several times per month, several times per week, and an optional category for “did not answer.” If we count the number (or percentage) of observations within each category (except “did not answer” which is really a missing value rather than a category), and display it in the form of a table as shown in Figure 14.1, what we have is a frequency distribution. This distribution can also be depicted in the form of a bar chart, as shown on the right panel of Figure 14.1, with the horizontal axis representing each category of that variable and the vertical axis representing the frequency or percentage of observations within each category.


Figure 14.1. Frequency distribution of religiosity.
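A frequency distribution like the one in Figure 14.1 can be sketched in a few lines of pandas; the responses below are hypothetical, and "did not answer" is entered as a missing value so that it is excluded from the counts.

```python
# Frequency distribution of a categorical "religiosity" measure.
import pandas as pd

responses = pd.Series([
    "never", "once per year", "several times per year", "never",
    "about once a month", "several times per week", None, "never",
])

counts = responses.value_counts()                          # missing values dropped
percentages = responses.value_counts(normalize=True) * 100

print(counts)
print(percentages.round(1))
# counts.plot(kind="bar") would draw the bar chart shown in Figure 14.1.
```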

With very large samples where observations are independent and random, the frequency distribution tends to follow a bell-shaped curve (a smoothed bar chart of the frequency distribution) similar to that shown in Figure 14.2, where most observations are clustered toward the center of the range of values, and fewer and fewer observations toward the extreme ends of the range. Such a curve is called a normal distribution.

Central tendency is an estimate of the center of a distribution of values. There are three major estimates of central tendency: mean, median, and mode. The arithmetic mean (often simply called the “mean”) is the simple average of all values in a given distribution. Consider a set of eight test scores: 15, 22, 21, 18, 36, 15, 25, 15. The arithmetic mean of these values is (15 + 22 + 21 + 18 + 36 + 15 + 25 + 15)/8 = 20.875. Other types of means include the geometric mean (the nth root of the product of n numbers in a distribution) and the harmonic mean (the reciprocal of the arithmetic mean of the reciprocals of each value in a distribution), but these means are not very popular for statistical analysis of social research data.

The second measure of central tendency, the median, is the middle value within a range of values in a distribution. This is computed by sorting all values in a distribution in increasing order and selecting the middle value. In case there are two middle values (if there is an even number of values in a distribution), the average of the two middle values represents the median. In the above example, the sorted values are: 15, 15, 15, 18, 21, 22, 25, 36. The two middle values are 18 and 21, and hence the median is (18 + 21)/2 = 19.5.

Lastly, the mode is the most frequently occurring value in a distribution of values. In the previous example, the most frequently occurring value is 15, which is the mode of the above set of test scores. Note that any value estimated from a sample, such as the mean, median, mode, or any of the later estimates, is called a statistic.

Dispersion refers to the way values are spread around the central tendency, for example, how tightly or how widely are the values clustered around the mean. Two common measures of dispersion are the range and standard deviation. The range is the difference between the highest and lowest values in a distribution. The range in our previous example is 36-15 = 21.

The range is particularly sensitive to the presence of outliers. For instance, if the highest value in the above distribution was 85 and the other values remained the same, the range would be 85 - 15 = 70. Standard deviation, the second measure of dispersion, corrects for such outliers by using a formula that takes into account how close or how far each value is from the distribution mean:

σ = √[ Σ(x – μ)² / n ]

(where x is each observation, μ is the distribution mean, and n is the number of observations)

Figure 14.2. Normal distribution.


Table 14.1. Hypothetical data on age and self-esteem.

The two variables in this dataset are age (x) and self-esteem (y). Age is a ratio-scale variable, while self-esteem is an average score computed from a multi-item self-esteem scale measured using a 7-point Likert scale, ranging from “strongly disagree” to “strongly agree.” The histogram of each variable is shown on the left side of Figure 14.3. The formula for calculating bivariate correlation is:

r = Σ(x – x̄)(y – ȳ) / √[ Σ(x – x̄)² × Σ(y – ȳ)² ]

(where x̄ and ȳ are the means of x and y, and the sums run over all observations)

Figure 14.3. Histogram and correlation plot of age and self-esteem.

After computing bivariate correlation, researchers are often interested in knowing whether the correlation is significant (i.e., a real one) or caused by mere chance. Answering such a question would require testing the following hypothesis:

H0: r = 0

H1: r ≠ 0

H0 is called the null hypothesis, and H1 is called the alternative hypothesis (sometimes also represented as Ha). Although they may seem like two hypotheses, H0 and H1 actually represent a single hypothesis, since they are direct opposites of each other. We are interested in testing H1 rather than H0. Also note that H1 is a non-directional hypothesis, since it does not specify whether r is greater than or less than zero. A directional hypothesis would be specified as H0: r ≤ 0; H1: r > 0 (if we are testing for a positive correlation). Significance testing of a directional hypothesis is done using a one-tailed t-test, while that of a non-directional hypothesis is done using a two-tailed t-test.

In statistical testing, the alternative hypothesis cannot be tested directly. Rather, it is tested indirectly by rejecting the null hypothesis with a certain level of probability. Statistical testing is always probabilistic, because we are never sure if our inferences, based on sample data, apply to the population, since our sample never equals the population. The probability that a statistical inference is caused by pure chance is called the p-value. The p-value is compared with the significance level (α), which represents the maximum level of risk that we are willing to take that our inference is incorrect. For most statistical analyses, α is set to 0.05. A p-value less than α=0.05 indicates that we have enough statistical evidence to reject the null hypothesis, and thereby, indirectly accept the alternative hypothesis. If p>0.05, then we do not have adequate statistical evidence to reject the null hypothesis or accept the alternative hypothesis.

The easiest way to test the above hypothesis is to look up critical values of r from statistical tables available in any standard textbook on statistics or on the Internet (most software programs also perform significance testing). The critical value of r depends on our desired significance level (α = 0.05), the degrees of freedom (df), and whether the desired test is a one-tailed or two-tailed test. The degrees of freedom is the number of values that can vary freely in any calculation of a statistic. In the case of correlation, the df simply equals n – 2, or for the data in Table 14.1, df is 20 – 2 = 18. There are two different statistical tables for one-tailed and two-tailed tests. In the two-tailed table, the critical value of r for α = 0.05 and df = 18 is 0.44. For our computed correlation of 0.79 to be significant, it must be larger than the critical value of 0.44 or less than -0.44. Since our computed value of 0.79 is greater than 0.44, we conclude that there is a significant correlation between age and self-esteem in our data set, or in other words, the odds are less than 5% that this correlation is a chance occurrence. Therefore, we can reject the null hypothesis that r ≤ 0, which is an indirect way of saying that the alternative hypothesis r > 0 is probably correct.
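Rather than looking up critical values in a table, the same test can be run in software. Here is a minimal sketch with SciPy; the age and self-esteem values are hypothetical stand-ins for Table 14.1, which is not reproduced here.

```python
# Pearson correlation with a two-tailed significance test of H0: r = 0.
from scipy import stats

age    = [21, 25, 30, 34, 40, 45, 52, 58, 62, 66]
esteem = [3.1, 3.4, 3.9, 4.2, 4.4, 4.9, 5.1, 5.6, 5.8, 6.2]

r, p_value = stats.pearsonr(age, esteem)

print(round(r, 2), round(p_value, 4))
# If p_value < 0.05, reject the null hypothesis of zero correlation.
```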

Most research studies involve more than two variables. If there are n variables, then we will have a total of n*(n-1)/2 possible correlations between these n variables. Such correlations are easily computed using a software program like SPSS, rather than manually using the formula for correlation (as we did in Table 14.1), and are represented using a correlation matrix, as shown in Table 14.2. A correlation matrix is a matrix that lists the variable names along the first row and the first column, and depicts bivariate correlations between pairs of variables in the appropriate cell in the matrix. The values along the principal diagonal (from the top left to the bottom right corner) of this matrix are always 1, because any variable is always perfectly correlated with itself. Further, since correlations are non-directional, the correlation between variables V1 and V2 is the same as that between V2 and V1. Hence, the lower triangular matrix (values below the principal diagonal) is a mirror reflection of the upper triangular matrix (values above the principal diagonal), and therefore, we often list only the lower triangular matrix for simplicity. If the correlations involve variables measured using interval scales, then correlations of this specific type are called Pearson product moment correlations.

Another useful way of presenting bivariate data is cross-tabulation (often abbreviated to cross-tab, and sometimes called, more formally, a contingency table). A cross-tab is a table that describes the frequency (or percentage) of all combinations of two or more nominal or categorical variables. As an example, let us assume that we have the following observations of gender and grade for a sample of 20 students, as shown in Table 14.3. Gender is a nominal variable (male/female or M/F), and grade is a categorical variable with three levels (A, B, and C). A simple cross-tabulation of the data may display the joint distribution of gender and grades (i.e., how many students of each gender are in each grade category, as a raw frequency count or as a percentage) in a 2 x 3 matrix. This matrix will help us see if A, B, and C grades are equally distributed across male and female students. The cross-tab data in Table 14.3 shows that the distribution of A grades is biased heavily toward female students: in a sample of 10 male and 10 female students, five female students received the A grade compared to only one male student. In contrast, the distribution of C grades is biased toward male students: three male students received a C grade, compared to only one female student. However, the distribution of B grades was somewhat uniform, with six male students and five female students. The last row and the last column of this table are called marginal totals because they indicate the totals across each category and are displayed along the margins of the table.


Table 14.2. A hypothetical correlation matrix for eight variables.


Table 14.3. Example of cross-tab analysis.

Although we can see a distinct pattern of grade distribution between male and female students in Table 14.3, is this pattern real or “statistically significant”? In other words, do the above frequency counts differ from what may be expected from pure chance? To answer this question, we should compute the expected count of observations in each cell of the 2 x 3 cross-tab matrix. This is done by multiplying the marginal column total and the marginal row total for each cell and dividing it by the total number of observations. For example, for the male/A grade cell, expected count = 5 * 10 / 20 = 2.5. In other words, we were expecting 2.5 male students to receive an A grade, but in reality, only one male student received the A grade. Whether this difference between expected and actual counts is significant can be tested using a chi-square test. The chi-square statistic is computed by summing, across all cells, the squared difference between the observed and expected count divided by the expected count. We can then compare this number to the critical value associated with a desired probability level (p < 0.05) and the degrees of freedom, which is simply (m-1)*(n-1), where m and n are the number of rows and columns respectively. In this example, df = (2 – 1) * (3 – 1) = 2. From standard chi-square tables in any statistics book, the critical chi-square value for p=0.05 and df=2 is 5.99. The computed chi-square value, based on our observed data, is 1.00, which is less than the critical value. Hence, we must conclude that the observed grade pattern is not statistically different from the pattern that can be expected by pure chance.
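For readers who prefer software to chi-square tables, here is a minimal sketch with SciPy. The observed counts approximate the ones quoted in the text (Table 14.3 is not fully reproduced), so the resulting statistic may differ from the chapter's 1.00.

```python
# Chi-square test of independence on a 2 x 3 gender-by-grade cross-tab.
from scipy.stats import chi2_contingency

#             A  B  C
observed = [[1, 6, 3],   # male
            [5, 5, 1]]   # female

chi2, p, dof, expected = chi2_contingency(observed)

print(round(chi2, 2), round(p, 3), dof)  # dof = (2 - 1) * (3 - 1) = 2
print(expected)                          # expected counts under independence
```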

  • Social Science Research: Principles, Methods, and Practices. Authored by : Anol Bhattacherjee. Provided by : University of South Florida. Located at : http://scholarcommons.usf.edu/oa_textbooks/3/ . License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike

Child Care and Early Education Research Connections

Descriptive Statistics

This page describes graphical and pictorial methods of descriptive statistics and the three most common measures of descriptive statistics (central tendency, dispersion, and association).

Descriptive statistics can be useful for two purposes: 1) to provide basic information about variables in a dataset and 2) to highlight potential relationships between variables. The three most common descriptive statistics can be displayed graphically or pictorially and are measures of:

• Central tendency
• Dispersion
• Association

Graphical/Pictorial Methods

There are several graphical and pictorial methods that enhance researchers' understanding of individual variables and the relationships between variables. Graphical and pictorial methods provide a visual representation of the data. Some of these methods include:

• Histograms
• Scatter plots
• Geographic Information Systems (GIS)
• Sociograms

Histograms

• Visually represent the frequencies with which values of variables occur
• Each value of a variable is displayed along the bottom of a histogram, and a bar is drawn for each value
• The height of the bar corresponds to the frequency with which that value occurs

Scatter plots

• Display the relationship between two quantitative or numeric variables by plotting one variable against the value of another variable
• For example, one axis of a scatter plot could represent height and the other could represent weight. Each person in the data would receive one data point on the scatter plot that corresponds to his or her height and weight

Geographic Information Systems (GIS)

• A GIS is a computer system capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location
• Using a GIS program, a researcher can create a map to represent data relationships visually

Sociograms

• Display networks of relationships among variables, enabling researchers to identify the nature of relationships that would otherwise be too complex to conceptualize

Visit the following websites for more information:

Graphical Analytic Techniques

Geographic Information Systems

Glossary terms related to graphical and pictorial methods:

GIS, Histogram, Scatter Plot, Sociogram

Measures of Central Tendency

Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics. They describe the "average" member of the population of interest. There are three measures of central tendency:

  • Mean -- the sum of a variable's values divided by the total number of values
  • Median -- the middle value of a variable
  • Mode -- the value that occurs most often

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000.

Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = $225,000
Median Income = $45,000
Modal Income = $10,000
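These figures can be reproduced in a couple of lines with Python's standard library, as a quick check using the values from the example above:

```python
from statistics import mean, median, mode

incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

print(mean(incomes))    # 225000 -> mean income $225,000
print(median(incomes))  # 45000  -> median income $45,000
print(mode(incomes))    # 10000  -> modal income $10,000
```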

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between $0 and $200,000, a handful of individuals earn millions.

Basic Statistics

Measures of Position

Glossary terms related to measures of central tendency:

Average, Central Tendency, Confidence Interval, Mean, Median, Mode, Moving Average, Point Estimate, Univariate Analysis

Measures of Dispersion

Measures of dispersion provide information about the spread of a variable's values. There are four key measures of dispersion:

  • Range
  • Variance
  • Standard Deviation
  • Skew

Range  is simply the difference between the smallest and largest values in the data. The interquartile range is the difference between the values at the 75th percentile and the 25th percentile of the data.

Variance  is the most commonly used measure of dispersion. It is calculated by taking the average of the squared differences between each value and the mean.

Standard deviation , another commonly used statistic, is the square root of the variance.

Skew  is a measure of whether some values of a variable are extremely different from the majority of the values. For example, income is skewed because most people make between $0 and $200,000, but a handful of people earn millions. A variable is positively skewed if the extreme values are higher than the majority of values. A variable is negatively skewed if the extreme values are lower than the majority of values.

Example: The incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000:

Range = 1,000,000 - 10,000 = 990,000
Variance = [(10,000 - 225,000)² + (10,000 - 225,000)² + (45,000 - 225,000)² + (60,000 - 225,000)² + (1,000,000 - 225,000)²] / 5 = 150,540,000,000
Standard Deviation = Square Root(150,540,000,000) ≈ 387,995
Skew = Income is positively skewed
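These figures, too, can be checked with Python's standard library. Note that the worked example divides by N rather than N - 1 (the population formulas), which is what pvariance and pstdev compute:

```python
from statistics import pvariance, pstdev

incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

print(max(incomes) - min(incomes))  # 990000 (range)
print(pvariance(incomes))           # 150540000000
print(pstdev(incomes))              # 387994.85..., rounding to 387995
```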

Survey Research Tools

Variance and Standard Deviation

Summarizing and Presenting Data

Skewness Simulation

Glossary terms related to measures of dispersion:

Confidence Interval, Distribution, Kurtosis, Point Estimate, Quartiles, Range, Skewness, Standard Deviation, Univariate Analysis, Variance

Measures of Association

Measures of association indicate whether two variables are related. Two measures are commonly used:

  • Chi-square
  • Correlation

As a measure of association between variables, chi-square tests are used on nominal data (i.e., data that are put into classes: e.g., gender [male, female] and type of job [unskilled, semi-skilled, skilled]) to determine whether they are associated*

A chi-square is called significant if there is an association between two variables, and nonsignificant if there is not an association

To test for associations, a chi-square is calculated in the following way. Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts the number of female administrative assistants, female construction workers, male administrative assistants, and male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (each expected count is the product of the corresponding row and column totals divided by the total sample size). If there is a large difference between the observed and expected values, the chi-square test is significant, indicating an association between the two variables.

*The chi-square test can also be used as a measure of goodness of fit, to test whether data from a sample come from a population with a specific distribution, as an alternative to the Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests. As such, the chi-square test is not restricted to nominal data; with non-binned data, however, the results depend on how the bins or classes are created and on the size of the sample
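As an illustration of the goodness-of-fit use, here is a minimal Python sketch with SciPy; the die-roll counts are invented for the example:

```python
# Do 120 (hypothetical) die rolls depart from the uniform distribution
# expected of a fair die?
from scipy.stats import chisquare

observed = [18, 22, 16, 25, 20, 19]   # hypothetical counts per face
expected = [sum(observed) / 6] * 6    # fair die: 20 rolls per face

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
# A small p-value (e.g. p < 0.05) would suggest the sample does not come
# from the hypothesized uniform distribution.
```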

A correlation coefficient is used to measure the strength of the relationship between numeric variables (e.g., weight and height)

The most common correlation coefficient is  Pearson's r , which can range from -1 to +1.

If the coefficient is between 0 and 1, as one variable increases, the other also increases. This is called a positive correlation. For example, height and weight are positively correlated because taller people usually weigh more

If the correlation coefficient is between -1 and 0, as one variable increases the other decreases. This is called a negative correlation. For example, age and hours slept per night are negatively correlated because older people usually sleep fewer hours per night
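As a small sketch, Pearson's r can be computed with SciPy; the height and weight values below are invented for illustration:

```python
from scipy.stats import pearsonr

height_cm = [155, 160, 168, 172, 180, 185]
weight_kg = [52, 58, 63, 70, 79, 84]

r, p = pearsonr(height_cm, weight_kg)
print(f"r = {r:.2f}")  # close to +1: taller people in this sample weigh more
```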

Chi-Square Procedures for the Analysis of Categorical Frequency Data

Chi-square Analysis

Glossary terms related to measures of association:

Association, Chi Square, Correlation, Correlation Coefficient, Measures of Association, Pearson's Correlation Coefficient, Product Moment Correlation Coefficient


Introduction to Descriptive Statistics

Submitted: 04 July 2023. Reviewed: 20 July 2023. Published: 07 September 2023. DOI: 10.5772/intechopen.1002475

From the edited volume Recent Advances in Biostatistics, edited by B. Santhosh Kumar.


This chapter offers a comprehensive exploration of descriptive statistics, tracing its historical development from Condorcet’s “average” concept to Galton and Pearson’s contributions. Emphasizing its pivotal role in academia, descriptive statistics serve as a fundamental tool for summarizing and analyzing data across disciplines. The chapter underscores how descriptive statistics drive research inspiration and guide analysis, and provide a foundation for advanced statistical techniques. It delves into their historical context, highlighting their organizational and presentational significance. Furthermore, the chapter accentuates the advantages of descriptive statistics in academia, including their ability to succinctly represent complex data, aid decision-making, and enhance research communication. It highlights the potency of visualization in discerning data patterns and explores emerging trends like large dataset analysis, Bayesian statistics, and nonparametric methods. Sources of variance intrinsic to descriptive statistics, such as sampling fluctuations, measurement errors, and outliers, are discussed, stressing the importance of considering these factors in data interpretation.

Keywords: academic research, data analysis, data visualization, decision-making, research methodology, data summarization

Author Information

Olubunmi Alabi*

  • African University of Science and Technology, Abuja, Nigeria

Tosin Bukola

  • University of Greenwich, London, United Kingdom

*Address all correspondence to: [email protected]

1. Introduction

Descriptive statistics got their start when the French mathematician and philosopher Condorcet established the idea of the "average" as a means to summarize data. Yet the widespread use of descriptive statistics in academic study did not begin until the 19th century. Francis Galton, who was concerned with the examination of human features and attributes, was one of the major forerunners of descriptive statistics. Galton created various statistical methods that are still frequently applied in academic research today, including the concepts of correlation and regression analysis. In the early 20th century, the English mathematician and statistician Karl Pearson popularized the term "normal distribution" for the bell-shaped curve that characterizes the distribution of many natural phenomena. Moreover, Pearson created a number of correlational measures and introduced the chi-square test, which evaluates the significance of variations between observed and predicted frequencies. With the advent of new methods like multivariate analysis and factor analysis in the middle of the 20th century, the development of electronic computers sparked a revolution in statistical analysis. Descriptive statistics is the analysis and summarization of data to gain insights into its characteristics and distribution [ 1 ].

Descriptive statistics help researchers generate study ideas and guide further analysis by allowing them to explore data patterns and trends [ 2 ]. Descriptive statistics came to be used more often in academic research because they helped researchers better comprehend their datasets and served as a basis for more sophisticated statistical techniques. They are used to summarize and analyze data in a variety of academic areas, including psychology, sociology, economics, education, and epidemiology [ 3 ]. Descriptive statistics remain a crucial research tool in academia today, giving researchers a method to compile and analyze data from many fields. Thanks to the development of new statistical techniques and computing tools, it is now simpler than ever to analyze and understand data, enabling researchers to make better-informed judgments based on their results. Descriptive statistics can support hypothesis creation and exploratory analysis by identifying trends, patterns, and correlations between variables in large datasets [ 4 ]. They are also important in data-driven decision-making processes because they allow stakeholders to make educated decisions based on reliable data [ 5 ].

2. Background

The history of descriptive statistics may be traced back to the 17th century, when early pioneers like John Graunt and William Petty laid the groundwork for statistical analysis [ 6 ]. Descriptive statistics is a fundamental concept in academia, widely used across many disciplines, including the social sciences, economics, medicine, engineering, and business. It provides a comprehensive background for understanding data by organizing, summarizing, and presenting information effectively [ 7 ]. In academia, descriptive statistics is used to summarize and analyze data, providing insights into the patterns, trends, and characteristics of a dataset. In academic research it is often used as a preliminary analysis technique to gain a better understanding of the dataset before applying more complex statistical methods. Descriptive statistics lay the groundwork for inferential statistics by assisting researchers in drawing inferences about a population based on observed sample data [ 8 ]. They also aid in the identification and analysis of outliers, which can give useful insights into unusual observations or data collection problems [ 9 ].

Descriptive statistics enable researchers to synthesize both quantitative and qualitative data, allowing for a thorough examination of factors [ 10 ]. They can provide valuable information about the central tendency, variability, and distribution of the data, allowing researchers to make informed decisions about the appropriate statistical techniques to use. Descriptive statistics are an essential component of survey research methodology, allowing researchers to efficiently summarize and display survey results [ 11 ]. They may be used to summarize data as well as to spot outliers, that is, observations that dramatically depart from the trend of the data as a whole. Finding outliers can help researchers spot issues or abnormalities in the data so they can make the necessary corrections. In academic research, descriptive statistics are frequently employed to address research questions and evaluate hypotheses. For instance, they can be used to compare the average scores of two groups to see if there is a notable difference between them. Descriptive statistics may also be used to find patterns and correlations in the data in order to create new hypotheses or validate preexisting ideas.

There are several sources of variation that can affect the descriptive statistics of a dataset, including:

  • Sampling variation: descriptive statistics are often calculated using a sample of data rather than the entire population, so they can vary depending on the particular sample that is selected.
  • Measurement variation: different measurement methods can produce different results. For example, if a scale is used to measure the weight of objects, slight differences in how the scale is used can produce slightly different measurements.
  • Data entry errors: mistakes made during the data entry process can lead to variation in descriptive statistics. Even small errors, such as transposing two digits, can significantly impact the results.
  • Outliers: extreme values that fall outside of the expected range of values can skew the descriptive statistics, making them appear more or less extreme than they actually are.
  • Natural variation: the inherent variability in the data itself. For example, a dataset containing measurements of the heights of trees will naturally contain variation in those heights.

It is important to understand these sources of variation when interpreting and using descriptive statistics in academia. Properly accounting for them can help ensure that the descriptive statistics accurately reflect the underlying data.

Some emerging patterns in descriptive statistics in academia include:

  • Big data analysis: with the increasing availability of large datasets, researchers are using descriptive statistics to identify patterns and trends in the data. Big data techniques, such as machine learning and data mining, are becoming more common in academic research.
  • Visualization techniques: advances in data visualization are enabling researchers to identify patterns in datasets more easily. For example, heat maps and scatter plots can be used to visualize the relationship between different variables.
  • Bayesian statistics: an emerging area of research that uses probability theory to make inferences about data. Bayesian statistics can provide more accurate estimates of descriptive statistics, particularly when dealing with complex datasets.
  • Non-parametric statistics: increasingly popular, particularly when dealing with datasets that do not meet the assumptions of traditional parametric statistical tests. Non-parametric tests do not require the data to be normally distributed, and can be more robust to outliers.
  • Open science practices: practices such as pre-registration and data sharing are becoming more common, enabling researchers to replicate and verify the results of descriptive statistical analyses more easily, which can improve the quality and reliability of research findings.

Overall, these emerging patterns reflect the increasing availability of data, the need for more accurate and robust statistical techniques, and a growing emphasis on transparency and openness in research practices.

3. Benefits of descriptive statistics

The advantages of descriptive statistics extend beyond research and academia, with applications in commercial decision-making, public policy, and strategic planning [ 12 ]. The benefits of descriptive statistics include providing a clear and concise summary of data, aiding in decision-making processes, and facilitating effective communication of findings [ 13 ]. In academia, they include:

  • Summarization of data: descriptive statistics allow researchers to quickly and efficiently summarize large datasets, providing a snapshot of the key characteristics of the data. This can help researchers identify patterns and trends, and can also simplify complex datasets.
  • Better decision-making: descriptive statistics can help researchers make data-driven decisions. For example, when comparing the effectiveness of two different treatments, descriptive statistics can indicate which treatment performs better on the observed data.
  • Visualization of data: descriptive statistics can be used to create visualizations, which make it easier to communicate research findings. Histograms, bar charts, and scatterplots are examples of data visualization techniques that may be used to graphically depict data in order to detect trends, outliers, and correlations [ 14 ]. Visualizations can also reveal patterns and trends in the data that might not be immediately apparent from the raw values.
  • Hypothesis testing: descriptive statistics are often used in hypothesis testing, which allows researchers to determine whether a particular hypothesis about a dataset is supported by the data. This can help validate research findings and increase confidence in the conclusions drawn.
  • Improved data quality: descriptive statistics can help identify errors or inconsistencies in the data, which researchers can then correct. This can lead to more accurate research findings and a better understanding of the underlying phenomena.

Overall, descriptive statistics help researchers summarize large datasets, make data-driven decisions, visualize data, validate research findings, and improve the quality of the data. By using them, researchers can gain valuable insights into complex datasets and make more informed decisions based on the data.

4. Practical applications of descriptive statistics

Descriptive statistics has practical applications in disciplines such as business, social sciences, healthcare, finance, and market research [ 15 ]. In academia, practical applications include:

  • Data summarization: summarizing large datasets makes it easier for researchers to understand the key characteristics of the data. This is particularly useful when dealing with complex datasets that contain many variables.
  • Hypothesis testing: descriptive statistics can be used to test hypotheses about a dataset. For example, researchers can test whether the mean value of a particular variable is significantly different from a hypothesized value.
  • Data visualization: visualizations make it easier to identify patterns and trends in the data. For example, a histogram or boxplot can be used to visualize the distribution of a variable.
  • Comparing groups: descriptive statistics can be used to compare different groups within a dataset, for example the mean values of a particular variable across demographic groups such as age or gender.
  • Predictive modeling: descriptive statistics can inform predictive models used to forecast future trends or outcomes. For example, a researcher might use descriptive statistics to identify the key variables that predict student performance in a particular course.

The practical applications of descriptive statistics in academia are wide-ranging and varied. They can be used in many different fields, including psychology, economics, sociology, and biology, among others, to provide insights into complex datasets and help researchers make data-driven decisions ( Figure 1 ).

Figure 1. Types of descriptive statistics. Ref: https://www.analyticssteps.com/blogs/types-descriptive-analysis-examples-steps

Descriptive statistics is a useful tool for researchers in a variety of sectors, since it allows them to express the major characteristics of a dataset, such as its frequency, central tendency, variability, and distribution.

4.1 Central tendency measurements

Central tendency metrics, such as the mean, median, and mode, are essential descriptive statistics that offer information about the average or typical value in a dataset [ 16 ]. One of the primary purposes of descriptive statistics is to summarize data in a succinct and useful manner. Measures of central tendency such as the median are resistant to outliers and offer a more representative assessment of the average value in a skewed distribution [ 17 ]. The mean of a dataset is the arithmetic average, while the median is the middle value when the data are ordered by magnitude, and the mode is the most frequently occurring value. Central tendency measurements are among the most important aspects of descriptive statistics, as they provide a summary of the "typical" value of a dataset.

The three most commonly used measures of central tendency are:

  • Mean: calculated by adding up all the values in a dataset and dividing by the total number of values. The mean is sensitive to outliers, as even one extreme value can greatly affect it.
  • Median: the middle value in a dataset when the values are ordered from smallest to largest. If the dataset has an odd number of values, the median is the middle value; if it has an even number of values, the median is the average of the two middle values. The median is more robust to outliers than the mean.
  • Mode: the most common value in a dataset. In some cases there may be multiple modes (i.e. bimodal or multimodal distributions). The mode is useful for identifying the most frequently occurring value.

Each of these measures provides a different perspective on the "typical" value of a dataset, and the most appropriate measure depends on the nature of the data and the research question being addressed. For example, if the dataset contains extreme outliers, the median may be a better measure of central tendency than the mean. Conversely, if the dataset is symmetrical and normally distributed, the mean may provide the best measure of central tendency.

4.2 Variability indices

Determining data variability is another key part of descriptive statistics. The spread or dispersion of data points about the central tendency is quantified by variability indices such as the range, variance, and standard deviation [ 18 ]. These measures reveal how spread out the data are. Indices such as the coefficient of variation allow you to compare variability across datasets with different scales or units of measurement [ 19 ]. The range is the distance between the dataset's greatest and lowest values, while the variance and standard deviation measure how much the data values depart from the mean. Some indices, such as the interquartile range, give insight into the data's distribution while being less affected by extreme values than the standard deviation [ 20 ]. Some commonly used variability indices include:

  • Range: the difference between the largest and smallest values in a dataset. It provides a simple measure of the spread of the data, but is sensitive to outliers.
  • Interquartile range (IQR): the range of the middle 50% of the data, calculated by subtracting the 25th percentile (lower quartile) from the 75th percentile (upper quartile). The IQR is more robust to outliers than the range.
  • Variance: a measure of how spread out the data are around the mean, calculated by taking the average of the squared differences between each data point and the mean. The variance is sensitive to outliers.
  • Standard deviation: the square root of the variance. It measures how much the data vary from the mean, and is more commonly used than the variance because it has the same units as the original data.
  • Coefficient of variation (CV): a measure of relative variability, expressed as a percentage, calculated by dividing the standard deviation by the mean and multiplying by 100. The CV is useful for comparing variability across datasets that have different units or scales.

These variability indices provide important information about the spread and variability of the data, which can help researchers better understand the characteristics of the data and draw meaningful conclusions from it.
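As a concrete sketch, the indices above can be computed with NumPy; the data values are invented for illustration:

```python
import numpy as np

data = np.array([4.0, 7.0, 7.5, 8.0, 9.5, 12.0, 21.0])

data_range = data.max() - data.min()
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                      # interquartile range
variance = data.var()              # population variance (divides by N)
std = data.std()                   # population standard deviation
cv = std / data.mean() * 100       # coefficient of variation, in percent

print(data_range, iqr, round(variance, 2), round(std, 2), round(cv, 1))
```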

4.3 Data visualization

Data may also be represented visually using graphical approaches in addition to numerical metrics. Graphs and charts, such as histograms, box plots, and scatterplots, allow researchers to investigate data patterns and correlations, and the application of graphical approaches such as scatterplots and heat maps improves comprehension of correlations and patterns in large datasets [ 22 ]. Box plots and violin plots are efficient data visualization approaches for showing data distribution and spotting potential outliers [ 21 ]. Graphical methods may also be used to detect outliers, or data points that deviate dramatically from the rest of the data. Data visualization is an important aspect of descriptive statistics, as it allows researchers to communicate complex data in a visual and easily understandable format. Some common types of data visualization used in descriptive statistics include:

  • Histograms: used to display the distribution of a continuous variable. The data are divided into intervals (or "bins"), and the number of observations falling into each bin is displayed on the vertical axis. Histograms provide a visual representation of the shape of the distribution, and can help to identify outliers or skewness.
  • Box plots: provide a graphical representation of the distribution of a continuous variable. The box represents the middle 50% of the data, with the median displayed as a horizontal line inside the box. The whiskers extend to the minimum and maximum values in the dataset, and any outliers are displayed as points outside the whiskers. Box plots are useful for comparing distributions across different groups or for identifying outliers.
  • Scatter plots: used to display the relationship between two continuous variables. Each data point is represented as a point on the graph, with one variable displayed on the horizontal axis and the other on the vertical axis. Scatter plots can help to identify patterns or relationships in the data, such as a positive or negative correlation.
  • Bar charts: used to display the distribution of a categorical variable. The categories are displayed on the horizontal axis, and the frequency or percentage of observations falling into each category on the vertical axis. Bar charts can help to compare the frequency of different categories or to display the results of a survey or questionnaire.
  • Heat maps: used to display the relationship between two categorical variables. The categories are displayed on both the horizontal and vertical axes, and the frequency or percentage of observations falling into each combination of categories is displayed using a color scale. Heat maps can help to identify patterns or relationships in the data, such as a higher frequency of observations in certain combinations of categories.

These types of data visualizations help researchers communicate complex data in a clear and understandable format, and can also provide insights into the characteristics of the data that may not be immediately apparent from the raw values.
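To illustrate, here is a minimal matplotlib sketch of three of the chart types discussed above, drawn from randomly generated data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=200)    # one continuous variable
y = 0.8 * x + rng.normal(scale=5, size=200)   # a correlated second variable

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(x, bins=20)       # histogram: distribution of x
axes[0].set_title("Histogram")
axes[1].boxplot(x)             # box plot: median, quartiles, outliers
axes[1].set_title("Box plot")
axes[2].scatter(x, y, s=8)     # scatter plot: relationship between x and y
axes[2].set_title("Scatter plot")
plt.tight_layout()
plt.show()
```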

4.4 Data cleaning and preprocessing

Data cleaning and preprocessing procedures, such as imputation methods for missing data, help preserve data integrity and reduce bias in descriptive analysis [ 23 ]. Before beginning any statistical analysis, be certain that the data are clean and well organized. Data cleaning is the process of discovering and fixing flaws or inconsistencies in the data, such as missing values or outliers; data preprocessing is the process of putting data into an appropriate format for analysis, such as scaling or normalizing it. Both are essential steps in descriptive analysis, requiring researchers to find and deal with missing values, outliers, and data inconsistencies so that the data are accurate, complete, and ready for analysis [ 25 ]. Some common steps include:

  • Handling missing data: missing data are a common problem in datasets and can affect the accuracy of the analysis. Depending on the amount of missing data, researchers may remove incomplete cases or impute missing values using techniques such as mean imputation, regression imputation, or multiple imputation.
  • Handling outliers: outliers are extreme values that differ from the majority of the data points and can distort the analysis. Outlier identification and removal procedures help increase the accuracy and reliability of descriptive statistics [ 24 ]. Researchers may choose to remove or transform outliers to better reflect the characteristics of the data.
  • Data transformation: used to normalize the data or to make it easier to analyze. Common transformations include logarithmic, square root, or Box-Cox transformations.
  • Handling categorical data: categorical data, such as nominal or ordinal data, may need to be recoded into numerical form before analysis. Researchers may also need to handle missing or inconsistent categories within the data.
  • Standardizing data: scaling the data to have a mean of zero and a standard deviation of one. This can be useful for comparing variables with different units or scales.
  • Data integration: merging or linking multiple datasets to create a single, comprehensive dataset for analysis, for example by matching records on common variables or identifiers.

By performing these data cleaning and preprocessing steps, researchers can ensure that the data are accurate and ready for analysis, which can lead to more reliable and meaningful insights.
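As an illustration of several of these steps, here is a small pandas sketch; the column names and values are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [23, 35, np.nan, 41, 29, 33],
    "income": [40_000, 52_000, 48_000, np.nan, 1_000_000, 45_000],
})

# 1. Impute missing values (mean for age, median for the skewed income).
df["age"] = df["age"].fillna(df["age"].mean())
df["income"] = df["income"].fillna(df["income"].median())

# 2. Flag outliers more than 2 standard deviations from the mean.
z = (df["income"] - df["income"].mean()) / df["income"].std()
df["income_outlier"] = z.abs() > 2

# 3. Standardize income to mean 0 and standard deviation 1.
df["income_std"] = (df["income"] - df["income"].mean()) / df["income"].std()

print(df)
```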

5. Descriptive statistics in academic methodology

Descriptive statistics are important in academic methodology because they enable researchers to synthesize and describe data collected for research purposes [ 26 ]. Descriptive statistics is often used in combination with other statistical techniques, such as inferential statistics, to draw conclusions and make predictions from the data. In academic research, it is used in a variety of ways:

  • Describing sample characteristics: descriptive statistics describe the characteristics of a sample, such as the mean, median, and standard deviation of a variable. This information can be used to identify patterns, trends, or differences within the sample.
  • Identifying data outliers: descriptive statistics can help researchers identify potential outliers or anomalies in the data, which can affect the validity of the results. For example, identifying extreme values in a dataset can help researchers investigate whether those values are due to measurement error or a true characteristic of the population.
  • Communicating research findings: descriptive statistics summarize and communicate research findings in a clear and concise manner. Graphs, charts, and tables can be used to display descriptive statistics in a way that is easy to understand and interpret.
  • Testing assumptions: descriptive statistics can be used to test assumptions about the data, such as normality or homogeneity of variance, which are important for selecting appropriate statistical tests and interpreting the results.

Overall, descriptive statistics is a critical methodology in academic research that helps researchers describe and understand the characteristics of their data, draw meaningful insights and conclusions, and communicate these findings to others clearly and concisely.

6. Pitfalls of descriptive statistics

The pitfalls of descriptive statistics include the possibility of misinterpretation, reliance on summary measures alone, and susceptibility to extreme values or outliers [ 27 ]. While descriptive statistics is an essential tool in academic research, researchers should be aware of several potential pitfalls:

  • Limited scope: descriptive statistics can provide a useful summary of a dataset's characteristics, but they offer limited insight into the underlying causes or mechanisms that drive the data. Descriptive statistics alone cannot establish causal relationships or test hypotheses.
  • Misleading interpretations: descriptive statistics can mislead if not interpreted correctly. For example, a small sample may not accurately represent the population, and summary statistics such as the mean may not be meaningful if the data are not normally distributed.
  • Incomplete analysis: descriptive statistics provide only a limited view of the data, and researchers may need additional statistical techniques to analyze it fully. For example, hypothesis testing and regression analysis may be needed to establish relationships between variables and make predictions.
  • Biased data: descriptive statistics will be biased if the data are not representative of the population of interest. Sampling bias, measurement bias, and non-response bias can all undermine the validity of descriptive statistics.
  • Over-reliance on summary statistics: summary statistics such as the mean or median may not provide a complete picture of the data. Visualizations and other descriptive statistics, such as measures of variability, can provide additional insight.

To avoid these pitfalls, researchers should carefully consider the scope and limitations of descriptive statistics and use additional statistical techniques as needed. They should also ensure that their data are representative of the population of interest and interpret their descriptive statistics in a thoughtful and nuanced manner.

7. Conclusion

Researchers can check the normality assumptions of their data by using relevant descriptive statistics techniques such as measures of skewness and kurtosis [ 28 ]. Descriptive statistics has become a fundamental methodology in academic research, used to summarize and describe the characteristics of a dataset, such as the central tendency, variability, and distribution of the data. It is used in a wide range of disciplines, including the social sciences, natural sciences, engineering, and business. Descriptive statistics can be used to describe sample characteristics, identify data outliers, communicate research findings, and test assumptions. The kind of data, the research topic, and the particular aims of the study all influence the appropriate choice and implementation of descriptive statistical approaches [ 29 ].

However, there are several potential pitfalls of descriptive statistics, including limited scope, misleading interpretations, incomplete analysis, biased data, and over-reliance on summary statistics. The use of descriptive statistics in data presentation can improve the interpretability of study findings, making complicated material more accessible to a larger audience [ 30 ]. To use descriptive statistics effectively in academic research, researchers should carefully consider the limitations and scope of the methodology, use additional statistical techniques as needed, ensure that their data is representative of the population of interest, and interpret their descriptive statistics in a thoughtful and nuanced manner.

Conflict of interest

The authors declare no conflict of interest.

  • 1. Agresti A, Franklin C. Statistics: The Art and Science of Learning from Data. Upper Saddle River, NJ: Pearson; 2009
  • 2. Norman GR, Streiner DL. Biostatistics: The Bare Essentials. 4th ed. Shelton (CT): PMPH-USA; 2014
  • 3. Cohen J, Cohen P, West SG, Aiken LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. New York: Routledge; 2013
  • 4. Osborne J. Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment. 2019; 10 (7):1-9
  • 5. Field A, Hole G. How to Design and Report Experiments. London: Sage; 2003
  • 6. Hald A. A History of Mathematical Statistics from 1750 to 1930. New York: Wiley; 1998. p. xvii+795. ISBN 0-471-17912-4
  • 7. Warner RM. Applied Statistics: From Bivariate Through Multivariate Techniques. 2nd ed. Thousand Oaks, CA: SAGE Publications; 2012
  • 8. Sullivan LM, Artino AR Jr. Analyzing and interpreting continuous data using ordinal regression. Journal of Graduate Medical Education. 2013; 5 (4):542-543
  • 9. Hoaglin DC, Mosteller F, Tukey JW. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons; 2011
  • 10. Maxwell SE, Delaney HD, Kelley K. Designing Experiments and Analyzing Data: A Model Comparison Perspective. Routledge; 2017
  • 11. De Leeuw ED, Hox JJ. International Handbook of Survey Methodology. Routledge; 2008
  • 12. Chatfield C. The Analysis of Time Series: An Introduction. CRC Press; 2016
  • 13. Tabachnick BG, Fidell LS. Using Multivariate Statistics. Pearson; 2013
  • 14. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer; 2016
  • 15. Field A, Miles J, Field Z. Discovering Statistics Using R. Sage; 2012
  • 16. Howell DC. Statistical Methods for Psychology. Cengage Learning; 2013
  • 17. Wilcox RR. Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. CRC Press; 2017
  • 18. Hair JF, Black WC, Babin BJ, Anderson RE. Multivariate Data Analysis. Pearson; 2019
  • 19. Beasley TM, Schumacker RE. Multiple regression approach to analyzing contingency tables: Post hoc and planned comparison procedures. Journal of Experimental Education. 2013; 81 (3):310-312
  • 20. Dodge Y. The Concise Encyclopedia of Statistics. Springer Science & Business Media; 2008
  • 21. Krzywinski M, Altman N. Points of significance: Visualizing samples with box plots. Nature Methods. 2014; 11 (2):119-120
  • 22. Cleveland WS. Visualizing data. Hobart Press; 1993
  • 23. Little RJ, Rubin DB. Statistical Analysis with Missing Data. John Wiley & Sons; 2019
  • 24. Filzmoser P, Maronna R, Werner M. Outlier identification in high dimensions. Computational Statistics & Data Analysis. 2008; 52 (3):1694-1711
  • 25. Shmueli G, Bruce PC, Yahav I, Patel NR, Lichtendahl KC Jr, Desarbo WS. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons; 2017
  • 26. Aguinis H, Gottfredson RK. Statistical power analysis in HRM research. Organizational Research Methods. 2013; 16 (2):289-324
  • 27. Stevens JP. Applied Multivariate Statistics for the Social Sciences. Routledge; 2012
  • 28. Byrne BM. Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming. Routledge; 2016
  • 29. Everitt BS, Hothorn T. An Introduction to Applied Multivariate Analysis with R. Springer; 2011
  • 30. Kosslyn SM. Graph Design for the Eye and Mind. Oxford University Press; 2006

© The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Descriptive statistics in research: a critical component of data analysis

With any data, the objective is to describe the population at large. But what does that mean, and what processes, methods and measures are used to uncover insights from that data? In this short guide, we explore descriptive statistics and how they're applied to research.

What do we mean by descriptive statistics?

With any kind of data, the main objective is to describe a population at large — and using descriptive statistics, researchers can quantify and describe the basic characteristics of a given data set.

For example, researchers can condense large data sets, which may contain thousands of individual data points or observations, into a series of statistics that provide useful information on the population of interest. We call this process “describing data”.

In the process of producing summaries of the sample, we use measures like the mean, median and variance, alongside displays such as graphs, charts, frequencies, histograms, box and whisker plots, and percentages. For datasets with just one variable, we use univariate descriptive statistics. For datasets with multiple variables, we use bivariate correlation and multivariate descriptive statistics.

Here are the definitions:

Univariate descriptive statistics: this is when you want to describe data with only one characteristic or attribute

Bivariate correlation: this is when you simultaneously analyze (compare) two variables to see if there is a relationship between them

Multivariate descriptive statistics: this is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable

Then, after describing and summarizing the data, as well as using simple graphical analyses, we can start to draw meaningful insights from it to help guide specific strategies. It’s also important to note that descriptive statistics can employ and use both quantitative and qualitative research .

Describing data is undoubtedly the most critical first step in research as it enables the subsequent organization, simplification and summarization of information — and every survey question and population has summary statistics. Let’s take a look at a few examples.

Examples of descriptive statistics

Consider for a moment a number used to summarize how well a striker is performing in football: the shot conversion rate, which is simply the number of goals scored divided by the number of shots taken (reported to three significant figures). If a striker is scoring 0.333, that's one goal for every three shots. If they're scoring one in four, that's 0.250.

A classic example is a student’s grade point average (GPA). This single number describes the general performance of a student across a range of course experiences and classes. It doesn’t tell us anything about the difficulty of the courses the student is taking, or what those courses are, but it does provide a summary that enables a degree of comparison with people or other units of data.

Ultimately, descriptive statistics make it incredibly easy for people to understand complex (or data intensive) quantitative or qualitative insights across large data sets.


Types of descriptive statistics

To quantitatively summarize the characteristics of raw, ungrouped data, we use the following types of descriptive statistics:

  • Measures of Central Tendency ,
  • Measures of Dispersion and
  • Measures of Frequency Distribution.

Following the application of any of these approaches, the raw data then becomes ‘grouped’ data that’s logically organized and easy to understand. To visually represent the data, we then use graphs, charts, tables etc.

Let’s look at the different types of measurement and the statistical methods that belong to each:

Measures of Central Tendency are used to describe data by determining a single representative central value, for example the mean, median or mode.

Measures of Dispersion are used to determine how spread out a data distribution is with respect to a central value such as the mean, median or mode. While central tendency gives the average or central value, it doesn't describe how the data is distributed within the set.

Measures of Frequency Distribution are used to describe the occurrence of data within the data set (count).

The methods of each measure are summarized in the table below:

Measures of Central Tendency: Mean, Median, Mode
Measures of Dispersion: Range, Standard deviation, Quartile deviation, Variance, Absolute deviation
Measures of Frequency Distribution: Count

Mean: The most popular and well-known measure of central tendency. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.

Median: The median is the middle score for a set of data that has been arranged in order of magnitude. If you have an even number of data, e.g. 10 data points, take the two middle scores and average the result.

Mode: The mode is the most frequently occurring observation in the data set.  

Range: The difference between the highest and lowest value.

Standard deviation: Standard deviation measures the dispersion of a data set relative to its mean and is calculated as the square root of the variance.

Quartile deviation: Quartile deviation measures the spread in the middle of the data; it is half the difference between the upper and lower quartiles.

Variance: Variance measures the variability from the average, or mean.

Absolute deviation: The absolute deviation of a dataset is the average distance between each data point and the mean.

Count: How often each value occurs.
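Most of these measures can be obtained in a single call with pandas, as a quick illustration. The values below are the illustrative income figures used earlier in this document; note that pandas computes the sample variance and standard deviation (dividing by n - 1):

```python
import pandas as pd

s = pd.Series([10_000, 10_000, 45_000, 60_000, 1_000_000])

print(s.describe())      # count, mean, std, min, quartiles (25/50/75%), max
print(s.mode().iloc[0])  # 10000 (the mode)
print(s.var(), s.std())  # sample variance and standard deviation (n - 1)
```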

Scope of descriptive statistics in research

Descriptive statistics (or descriptive analysis) is considered broader in scope than many other quantitative and qualitative methods, as it provides a more complete picture of an event, phenomenon or population.

But that’s not all: it can use any number of variables, and as it collects data and describes it as it is, it’s also far more representative of the world as it exists.

However, it’s also important to consider that descriptive analyses lay the foundation for further methods of study. By summarizing and condensing the data into easily understandable segments, researchers can further analyze the data to uncover new variables or hypotheses.

Mostly, this practice is all about the ease of data visualization. With data presented in a meaningful way, researchers have a simplified interpretation of the data set in question. That said, while descriptive statistics helps to summarize information, it only provides a general view of the variables in question.

It is, therefore, up to the researchers to probe further and use other methods of analysis to discover deeper insights.

Things you can do with descriptive statistics

Define subject characteristics

If a marketing team wanted to build out accurate buyer personas for specific products and industry verticals, they could use descriptive analyses on customer datasets (procured via a survey) to identify consistent traits and behaviors.

They could then ‘describe’ the data to build a clear picture and understanding of who their buyers are, including things like preferences, business challenges, income and so on.

Measure data trends

Let’s say you wanted to assess propensity to buy over several months or years for a specific target market and product. With descriptive statistics, you could quickly summarize the data and extract the precise data points you need to understand the trends in product purchase behavior.

Compare events, populations or phenomena

How do different demographics respond to certain variables? For example, you might want to run a customer study to see how buyers in different job functions respond to new product features or price changes. Are all groups as enthusiastic about the new features and likely to buy? Or do they have reservations? This kind of data will help inform your overall product strategy and potentially how you tier solutions.

Validate existing conditions

When you have a belief or hypothesis but need to prove it, you can use descriptive techniques to ascertain underlying patterns or assumptions.

Form new hypotheses

With the data presented and surmised in a way that everyone can understand (and infer connections from), you can delve deeper into specific data points to uncover deeper and more meaningful insights — or run more comprehensive research.

Guiding your survey design to improve the data collected

To use your surveys as an effective tool for customer engagement and understanding, every survey goal and item should answer one simple, yet highly important question:

What am I really asking?

It might seem trivial, but by having this question frame survey research, it becomes significantly easier for researchers to develop the right questions that uncover useful, meaningful and actionable insights.

Planning becomes easier, questions become clearer, and your perspective becomes broader yet more nuanced.

Hypothesize – what’s the problem that you’re trying to solve? Far too often, organizations collect data without understanding what they’re asking, and why they’re asking it.

Finally, focus on the end result. What kind of data do you need to answer your question? Also, are you asking a quantitative or qualitative question? Here are a few things to consider:

  • Clear questions are clear for everyone. It takes time to make a concept clear
  • Ask about measurable, evident and noticeable activities or behaviors.
  • Make rating scales easy. Avoid long lists, confusing scales or “don’t know” or “not applicable” options.
  • Ensure your survey makes sense and flows well. Reduce the cognitive load on respondents by making it easy for them to complete the survey.
  • Read your questions aloud to see how they sound.
  • Pretest by asking a few uninvolved individuals to answer.

Furthermore…

As well as understanding what you’re really asking, there are several other considerations for your data:

Keep it random

How you select your sample is what makes your research replicable and meaningful. Having a truly random sample helps prevent bias, increasing the quality of evidence you find.

Plan for and avoid sample error

Before starting your research project, have a clear plan for avoiding sample error. Use larger sample sizes, and apply random sampling to minimize the potential for bias.

Don’t over sample

Remember, you can sample 500 respondents selected randomly from a population and they will closely reflect the actual population 95% of the time.

Think about the mode

Match your survey methods to the sample you select. For example, how do your current customers prefer communicating? Do they have any shared characteristics or preferences? A mixed-method approach is critical if you want to drive action across different customer segments.

Use a survey tool that supports you with the whole process

Surveys created using survey research software can support researchers throughout the process, and such software typically includes ready-made templates, for example:

  • Employee satisfaction survey template
  • Employee exit survey template
  • Customer satisfaction (CSAT) survey template
  • Ad testing survey template
  • Brand awareness survey template
  • Product pricing survey template
  • Product research survey template
  • Employee engagement survey template
  • Customer service survey template
  • NPS survey template
  • Product package testing survey template
  • Product features prioritization survey template

These considerations have been included in Qualtrics’ survey software, which summarizes and creates visualizations of data, making it easy to access insights, measure trends, and examine results without complexity or jumping between systems.

Uncover your next breakthrough idea with Stats iQ™

What makes Qualtrics so different from other survey providers is that it is built in consultation with trained research professionals and includes high-tech statistical software like Qualtrics Stats iQ.

With just a click, the software can run specific analyses or automate statistical testing and data visualization. Testing parameters are automatically chosen based on how your data is structured (e.g. categorical data will run a statistical test like Chi-squared), and the results are translated into plain language that anyone can understand and put into action.
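For illustration, this is roughly the kind of test such a tool runs behind the scenes on categorical data. A minimal sketch with SciPy and hypothetical counts (not Qualtrics’ actual implementation):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical survey counts: rows = age group, columns = preferred feature.
observed = np.array([
    [45, 30, 25],   # 18-34
    [35, 40, 25],   # 35-54
    [20, 35, 45],   # 55+
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
if p < 0.05:
    print("Feature preference appears to differ across age groups.")
```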

Get more meaningful insights from your data

Stats iQ includes a variety of statistical analyses, including: describe, relate, regression, cluster, factor, TURF, and pivot tables — all in one place!

Confidently analyze complex data

Built-in artificial intelligence and advanced algorithms automatically choose and apply the right statistical analyses and return the insights in plain English so everyone can take action.

Integrate existing statistical workflows

For more experienced stats users, built-in R code templates allow you to run even more sophisticated analyses by adding R code snippets directly in your survey analysis.

Advanced statistical analysis methods available in Stats iQ

Regression analysis – Measures the degree of influence of independent variables on a dependent variable (the relationship between two or multiple variables).

Analysis of Variance (ANOVA) test – Commonly used with a regression study to find out what effect independent variables have on the dependent variable. It can compare multiple groups simultaneously to see if there is a relationship between them.

Conjoint analysis – Asks people to make trade-offs when making decisions, then analyzes the results to give the most popular outcome. Helps you understand why people make the complex choices they do.

T-Test – Helps you compare whether two data groups have different mean values and allows the user to interpret whether differences are meaningful or merely coincidental.
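As a rough illustration of what a t-test involves, here is a minimal SciPy sketch on simulated (hypothetical) satisfaction scores for two independent groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical satisfaction scores for two independent customer groups.
group_a = rng.normal(loc=7.2, scale=1.1, size=40)
group_b = rng.normal(loc=6.6, scale=1.3, size=40)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the difference in means is unlikely to be
# coincidental; a large one suggests it may just be sampling noise.
```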

Crosstab analysis – Used in quantitative market research to analyze categorical data (variables whose values are distinct and mutually exclusive), allowing you to compare the relationship between two variables in contingency tables.

Go from insights to action

Now that you have a better understanding of descriptive statistics in research and how to leverage statistical analysis methods correctly, it’s time to adopt a tool that can take your research and subsequent analysis to the next level.

Try out a Qualtrics survey software demo so you can see how it can take you through descriptive research and further research projects from start to finish.


Introduction: Statistics as a Research Tool

David Weisburd, Chester Britt, David B. Wilson & Alese Wooditch

Statistics seem intimidating because they are associated with complex mathematical formulas and computations. Although some knowledge of math is required, an understanding of the concepts is much more important than an in-depth understanding of the computations. The researcher’s aim in using statistics is to communicate findings in a clear and simple form. As a result, the researcher should always choose the simplest statistic appropriate for answering the research question. Statistics offer commonsense solutions to research problems. The following principles apply to all types of statistics: (1) in developing statistics, we seek to reduce the level of error as much as possible; (2) statistics based on more information are generally preferred over those based on less information; (3) outliers present a significant problem in choosing and interpreting statistics; and (4) the researcher must strive to systematize the procedures used in data collection and analysis. There are two principal uses of statistics discussed in this book. In descriptive statistics, the researcher summarizes large amounts of information in an efficient manner. Two types of descriptive statistics that go hand in hand are measures of central tendency, which describe the characteristics of the average case, and measures of dispersion, which tell us just how typical this average case is. We use inferential statistics to make statements about a population on the basis of a sample drawn from that population.


Source: Weisburd, D., Britt, C., Wilson, D. B., & Wooditch, A. (2020). Introduction: Statistics as a Research Tool. In Basic Statistics in Criminology and Criminal Justice. Springer, Cham. https://doi.org/10.1007/978-3-030-47967-1_1


Descriptive Statistics: Reporting the Answers to the 5 Basic Questions of Who, What, Why, When, Where, and a Sixth, So What?

From the Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas. PMID: 28891910; DOI: 10.1213/ANE.0000000000002471

Descriptive statistics are specific methods basically used to calculate, describe, and summarize collected research data in a logical, meaningful, and efficient way. Descriptive statistics are reported numerically in the manuscript text and/or in its tables, or graphically in its figures. This basic statistical tutorial discusses a series of fundamental concepts about descriptive statistics and their reporting. The mean, median, and mode are 3 measures of the center or central tendency of a set of data. In addition to a measure of its central tendency (mean, median, or mode), another important characteristic of a research data set is its variability or dispersion (ie, spread). In simplest terms, variability is how much the individual recorded scores or observed values differ from one another. The range, standard deviation, and interquartile range are 3 measures of variability or dispersion. The standard deviation is typically reported for a mean, and the interquartile range for a median. Testing for statistical significance, along with calculating the observed treatment effect (or the strength of the association between an exposure and an outcome), and generating a corresponding confidence interval are 3 tools commonly used by researchers (and their collaborating biostatistician or epidemiologist) to validly make inferences and more generalized conclusions from their collected data and descriptive statistics. A number of journals, including Anesthesia & Analgesia, strongly encourage or require the reporting of pertinent confidence intervals. A confidence interval can be calculated for virtually any variable or outcome measure in an experimental, quasi-experimental, or observational research study design. Generally speaking, in a clinical trial, the confidence interval is the range of values within which the true treatment effect in the population likely resides. In an observational study, the confidence interval is the range of values within which the true strength of the association between the exposure and the outcome (eg, the risk ratio or odds ratio) in the population likely resides. There are many possible ways to graphically display or illustrate different types of data. While there is often latitude as to the choice of format, ultimately, the simplest and most comprehensible format is preferred. Common examples include a histogram, bar chart, line chart or line graph, pie chart, scatterplot, and box-and-whisker plot. Valid and reliable descriptive statistics can answer basic yet important questions about a research data set, namely: "Who, What, Why, When, Where, How, How Much?"
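As a minimal illustration of these reporting conventions (hypothetical data, and NumPy/SciPy rather than any particular journal’s tooling), the following sketch computes a mean with its standard deviation, a median with its interquartile range, and a 95% confidence interval for the mean:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of recorded values (e.g., procedure times in minutes).
x = np.array([32, 35, 36, 38, 40, 41, 43, 44, 47, 58], dtype=float)

mean, median = np.mean(x), np.median(x)
sd = np.std(x, ddof=1)                      # sample standard deviation
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1                               # interquartile range

# 95% confidence interval for the mean (t distribution, small sample).
ci_low, ci_high = stats.t.interval(0.95, len(x) - 1,
                                   loc=mean, scale=stats.sem(x))

print(f"mean = {mean:.1f} (SD {sd:.1f}); median = {median:.1f} (IQR {iqr:.1f})")
print(f"95% CI for the mean: {ci_low:.1f} to {ci_high:.1f}")
```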



Tools for Descriptive Statistics

  • Scatter Plot Chart Maker, with Line of Best Fit (Offsite)
  • Mean, Median and Mode Calculator
  • Variance Calculator
  • Standard Deviation Calculator
  • Coefficient of Variation Calculator
  • Percentile Calculator
  • Interquartile Range Calculator
  • Pooled Variance Calculator
  • Skewness and Kurtosis Calculator
  • Sum of Squares Calculator
  • Easy Histogram Maker
  • Frequency Distribution Calculator
  • Histograms: What are they? How do you make one?
  • Easy Frequency Polygon Maker
  • Easy Bar Chart Creator
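Most of the quantities these calculators produce can also be reproduced in a few lines of Python. A minimal sketch with hypothetical data (note that `scipy.stats.mode` with `keepdims` requires SciPy 1.9 or later):

```python
import numpy as np
from scipy import stats

data = np.array([12, 15, 15, 18, 20, 22, 22, 22, 25, 30], dtype=float)

print("mean:", np.mean(data))
print("median:", np.median(data))
print("mode:", stats.mode(data, keepdims=False).mode)       # SciPy >= 1.9
print("sample variance:", np.var(data, ddof=1))
print("sample std dev:", np.std(data, ddof=1))
print("coefficient of variation:", np.std(data, ddof=1) / np.mean(data))
print("90th percentile:", np.percentile(data, 90))
print("interquartile range:", stats.iqr(data))
print("skewness:", stats.skew(data))
print("kurtosis (excess):", stats.kurtosis(data))
```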


Enago Academy

Effective Use of Statistics in Research – Methods and Tools for Data Analysis


Remember that sinking feeling you get when you are asked to analyze your data? Now that you have all the required raw data, you need to statistically prove your hypothesis. Representing your numerical data as part of statistics in research will also help break the stereotype of the biology student who can’t do math.

Statistical methods are essential for scientific research: they underpin planning, design, data collection, analysis, meaningful interpretation and reporting of research findings. The results acquired from a research project remain meaningless raw data unless analyzed with statistical tools, so determining the right statistics is necessary to justify research findings. In this article, we discuss how statistical methods can help draw meaningful conclusions in biological studies.


Role of Statistics in Biological Research

Statistics is a branch of science that deals with the collection, organization and analysis of data, from the sample to the whole population. It aids in designing a study meticulously and provides logical reasoning for the conclusions drawn from the hypothesis. Biology focuses on living organisms and their complex, dynamic pathways, which cannot always be explained by simple logical reasoning; statistics defines and explains the patterns in a study based on the samples used. In short, statistics reveals the trends in the conducted study.

Biological researchers often disregard statistics in their research planning and only turn to statistical tools at the end of their experiment, giving rise to complicated sets of results that are not easily analyzed. Statistics in research can help a researcher approach the study in a stepwise manner, wherein the statistical analysis follows:

1. Establishing a Sample Size

Usually, a biological experiment starts with choosing samples and selecting the right number of replicate experiments. Basic statistical principles, such as randomness and the law of large numbers, show how choosing an adequate sample size from a large random pool helps extrapolate the findings while reducing experimental bias and error.
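The payoff of larger samples is easy to demonstrate by simulation. A minimal sketch (hypothetical population with a true mean of 100) showing sample means settling toward the true mean as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean, true_sd = 100.0, 15.0

# Draw one sample of each size from the same population and watch the
# sample mean approach the true mean (law of large numbers).
for n in (5, 20, 100, 1000):
    sample = rng.normal(loc=true_mean, scale=true_sd, size=n)
    print(f"n = {n:4d}: sample mean = {sample.mean():6.2f}")
```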

2. Testing of Hypothesis

When conducting a statistical study with a large sample pool, biological researchers must make sure that a conclusion is statistically significant. To achieve this, a researcher must formulate a hypothesis before examining the distribution of the data. Statistics in research then helps interpret whether the data cluster near the mean of the distribution or spread across it; these trends help characterize the sample and test the significance of the hypothesis.

3. Data Interpretation Through Analysis

When dealing with large data, statistics in research assists in data analysis, helping researchers draw effective conclusions from their experiments and observations. Concluding a study manually or from visual observation alone may give erroneous results; a thorough statistical analysis takes all statistical measures and the variance in the sample into consideration to provide a detailed interpretation of the data, producing solid evidence to support the conclusion.

Types of Statistical Research Methods That Aid in Data Analysis


Statistical analysis is the process of examining samples of data for patterns or trends that help researchers anticipate situations and draw appropriate research conclusions. Based on the type of data, statistical analyses are of the following types:

1. Descriptive Analysis

Descriptive statistical analysis organizes and summarizes large datasets into graphs and tables. It involves processes such as tabulation, measures of central tendency, measures of dispersion or variance, and skewness measurements.

2. Inferential Analysis

Inferential statistical analysis allows you to extrapolate from data acquired from a small sample to the complete population, drawing conclusions and making decisions about the whole population on the basis of the sample data. It is the recommended approach for research projects that work with small sample sizes and aim to extrapolate conclusions to a larger population.

3. Predictive Analysis

Predictive analysis is used to make predictions about future events. It is widely applied by marketing companies, insurance organizations, online service providers, data-driven marketers, and financial corporations.

4. Prescriptive Analysis

Prescriptive analysis examines data to determine what should be done next. It is widely used in business analysis to find the best possible outcome for a situation. It is closely related to descriptive and predictive analysis, but focuses on giving appropriate suggestions among the available options.

5. Exploratory Data Analysis

EDA is generally the first step of the data analysis process, conducted before any other statistical analysis technique. It focuses on analyzing patterns in the data to recognize potential relationships, discover unknown associations, inspect missing values, and obtain maximum insight from the data.
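In practice, a first EDA pass often looks something like the sketch below (pandas; the file name and columns are hypothetical, and the histogram call needs matplotlib installed):

```python
import pandas as pd

# Hypothetical dataset; replace with your own file.
df = pd.read_csv("experiment_results.csv")

print(df.describe())                  # summary statistics per column
print(df.isna().sum())                # where is data missing?
print(df.corr(numeric_only=True))     # candidate relationships between variables
df.hist(figsize=(8, 6))               # distribution of each numeric variable
```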

6. Causal Analysis

Causal analysis assists in understanding and determining the reasons why things happen the way they do. It helps identify the root causes of failures, or simply the basic reason why something happened. For example, causal analysis can be used to understand what will happen to one variable if another variable changes.

7. Mechanistic Analysis

This is the least common type of statistical analysis. Mechanistic analysis is used in big data analytics and the biological sciences. It aims to understand how individual changes in some variables cause corresponding changes in other variables, while excluding external influences.

Important Statistical Tools In Research

Researchers in the biological field often find statistical analysis the most daunting aspect of completing their research. However, statistical tools can help researchers understand what to do with their data and how to interpret the results, making the process as easy as possible.

1. Statistical Package for Social Science (SPSS)

It is a widely used software package for human behavior research. SPSS can compile descriptive statistics as well as graphical depictions of results. Moreover, it includes the option to create scripts that automate analysis or carry out more advanced statistical processing.

2. R Foundation for Statistical Computing

This software package is used in human behavior research and other fields. R is a powerful tool, but it has a steep learning curve and requires a certain level of coding skill. It also comes with an active community engaged in building and enhancing the software and its associated plugins.

3. MATLAB (The Mathworks)

It is an analytical platform and a programming language. Researchers and engineers use it to write their own code to help answer their research questions. While MATLAB can be a difficult tool for novices, it offers great flexibility in terms of what the researcher needs.

4. Microsoft Excel

Not the most powerful solution for statistical analysis in research, but MS Excel offers a wide variety of tools for data visualization and simple statistics. It makes it easy to generate summaries and customizable graphs and figures, and it is the most accessible option for those wanting to start with statistics.

5. Statistical Analysis Software (SAS)

It is a statistical platform used in business, healthcare, and human behavior research alike. It can carry out advanced analyses and produce publication-worthy figures, tables and charts.

6. GraphPad Prism

It is a premium software package primarily used among biology researchers, but it can be applied in a range of other fields. Similar to SPSS, GraphPad offers scripting options to automate analyses and carry out complex statistical calculations.

7. Minitab

This software offers basic as well as advanced statistical tools for data analysis. However, similar to GraphPad and SPSS, Minitab requires some command of coding and can offer automated analyses.

Use of Statistical Tools In Research and Data Analysis

Statistical tools help manage large data. Many biological studies rely on large datasets to analyze trends and patterns, so statistical tools become essential: they handle large data sets and make data processing more convenient.

Following these steps will help biological researchers present the statistics in their research in detail, develop accurate hypotheses, and use the correct tools for them.

There is a range of statistical tools that can help researchers manage their research data and improve the outcome of their research through better interpretation of the data. Using statistics well comes down to understanding the research question, knowledge of statistics, and your personal experience with coding.

Have you faced challenges while using statistics in research? How did you manage it? Did you use any of the statistical tools to help you with your research data? Do write to us or comment below!

Frequently Asked Questions

Statistics in research can help a researcher approach the study in a stepwise manner: 1. Establishing a sample size 2. Testing of hypothesis 3. Data interpretation through analysis

Statistical methods are essential for scientific research, covering planning, design, data collection, analysis, interpretation and reporting. Without statistical analysis, the results of a research project remain meaningless raw data, so determining the right statistics is necessary to justify research findings.

Statistical tools in research can help researchers understand what to do with data and how to interpret the results, making this process as easy as possible. They can manage large data sets, making data processing more convenient. A great number of tools are available to carry out statistical analysis of data like SPSS, SAS (Statistical Analysis Software), and Minitab.




Selection of Appropriate Statistical Methods for Data Analysis

Prabhaker Mishra, Chandra Mani Pandey, Uttam Singh, Amit Keshri, and Mayilvaganan Sabaretnam

Departments of Biostatistics and Health Informatics, Neuro-otology, and Endocrine Surgery, Sanjay Gandhi Post Graduate Institute of Medical Sciences, Lucknow, Uttar Pradesh, India (Ann Card Anaesth. 2019;22(3))

In biostatistics, statistical methods are available for the analysis and interpretation of data in each specific situation. To select the appropriate statistical method, one needs to know the assumptions and conditions of the methods so that the proper method can be chosen for the data. Two main branches of statistics are used in data analysis: descriptive statistics, which summarizes data using indexes such as the mean and median, and inferential statistics, which draws conclusions from data using statistical tests such as the Student's t-test. Selection of the appropriate statistical method depends on three things: the aim and objective of the study, the type and distribution of the data used, and the nature of the observations (paired/unpaired). Statistical methods used to compare means are called parametric, while those used to compare anything other than means (e.g., medians, mean ranks, proportions) are called nonparametric. In this article, we discuss parametric and nonparametric methods, their assumptions, and how to select the appropriate statistical methods for the analysis and interpretation of biomedical data.

Introduction

Selection of an appropriate statistical method is a very important step in the analysis of biomedical data. A wrong selection not only creates serious problems during the interpretation of the findings but also affects the conclusions of the study. In statistics, specific methods are available for each situation; to select the appropriate one, a researcher needs to know the assumptions and conditions of those methods.[1] Beyond knowledge of the statistical methods themselves, the nature and type of the data collected and the objective of the study are equally important, because the statistical methods are selected according to the objective and must suit the given data. The use of wrong or inappropriate statistical methods is a common phenomenon in published biomedical research: examples include using an unpaired t-test on paired data, or using a parametric test on data that do not follow the normal distribution. At present, many statistical software packages, such as SPSS, R, Stata, and SAS, make it easy to perform statistical analysis, but selecting the appropriate statistical test is still a difficult task for biomedical researchers, especially those with a nonstatistical background.[2] Two main branches of statistics are used in data analysis: descriptive statistics, which summarizes data using indexes such as the mean, median, and standard deviation, and inferential statistics, which draws conclusions from data using statistical tests such as the Student's t-test and the ANOVA test.[3]

Factors Influencing Selection of Statistical Methods

Selection of the appropriate statistical method depends on three things: the aim and objective of the study, the type and distribution of the data used, and the nature of the observations (paired/unpaired).

Aim and objective of the study

The choice of statistical test depends on the aim and objective of the study. If the objective is to find the predictors of an outcome variable, regression analysis is used; to compare the means of two independent samples, the unpaired samples t-test is used.

Type and distribution of the data used

For the same objective, the choice of statistical test varies with the data type. For nominal, ordinal, and discrete data we use nonparametric methods, while for continuous data both parametric and nonparametric methods can be used.[4] For example, in regression analysis, logistic regression is used when the outcome variable is categorical, and a linear regression model when it is continuous. The most appropriate representative measure for a continuous variable depends on how its values are distributed: if the variable follows a normal distribution, the mean is the representative measure, while for non-normal data the median is the most appropriate representative measure of the data set. Similarly, for categorical data the proportion (percentage) is used, and for ranking/ordinal data the mean ranks. In inferential statistics, the hypothesis is constructed using these measures, and in hypothesis testing they are used to compare groups and calculate the significance level. Suppose we want to compare diastolic blood pressure (DBP) between three age groups (<30, 30-50, and >50 years). If the DBP variable is normally distributed, the mean is the representative measure, and the null hypothesis states that the mean DBP values of the three age groups are statistically equal; the one-way ANOVA test is then used to compare the means. If DBP is non-normal, the median is the representative measure, the null hypothesis states that the distribution of DBP values among the three age groups is statistically equal, and the Kruskal-Wallis H test or median test is used to compare the distributions. Similarly, to compare mean arterial pressure (MAP) between treatment and control groups, the independent samples t-test is used when MAP follows a normal distribution, and the Mann-Whitney U test when it does not.
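The decision rule in this paragraph can be written down directly. A minimal SciPy sketch on simulated (hypothetical) DBP readings: test each group for normality with Shapiro-Wilk, then pick one-way ANOVA or the Kruskal-Wallis H test accordingly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical DBP readings for three age groups (<30, 30-50, >50 years).
groups = [rng.normal(78, 8, 30), rng.normal(82, 9, 30), rng.normal(85, 10, 30)]

# Shapiro-Wilk normality check in every group (alpha = 0.05).
all_normal = all(stats.shapiro(g).pvalue > 0.05 for g in groups)

if all_normal:
    stat, p = stats.f_oneway(*groups)      # parametric: one-way ANOVA
    test = "one-way ANOVA"
else:
    stat, p = stats.kruskal(*groups)       # nonparametric: Kruskal-Wallis H
    test = "Kruskal-Wallis H"

print(f"{test}: statistic = {stat:.2f}, p = {p:.4f}")
```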

Observations are paired or unpaired

Another important point in selecting a statistical test is to assess whether the data are paired (the same subjects are measured at different time points or with different methods) or unpaired (each group contains different subjects). For example, to compare the means of two groups, the paired samples t-test is used when the data are paired, and the independent samples t-test when they are unpaired.

Concept of Parametric and Nonparametric Methods

Inferential statistical methods fall into two categories: parametric and nonparametric. Statistical methods used to compare means are called parametric, while those used to compare anything other than means (e.g., medians, mean ranks, proportions) are called nonparametric. Parametric tests rely on the assumption that the variable is continuous and approximately normally distributed. When continuous data have a non-normal distribution, or for any data type other than continuous variables, nonparametric methods are used. Fortunately, the most frequently used parametric methods have nonparametric counterparts, which is useful when the assumptions of a parametric test are violated and a nonparametric alternative is needed as a backup analysis.[3]

Selection between Parametric and Nonparametric Methods

All t-tests and F-tests are considered parametric tests. The Student's t-test (one-sample t-test, independent samples t-test, paired samples t-test) is used to compare means between two groups, while the F-test (one-way ANOVA, repeated measures ANOVA, etc.), an extension of the Student's t-test, is used to compare means among three or more groups. The Pearson correlation coefficient and linear regression are also considered parametric methods, as they are calculated using the mean and standard deviation of the data. For these parametric methods, counterpart nonparametric methods are available: the Mann-Whitney U test and the Wilcoxon test correspond to the Student's t-test, while the Kruskal-Wallis H test, the median test, and the Friedman test are alternatives to the F-test (ANOVA). Similarly, the Spearman rank correlation coefficient and log-linear regression serve as nonparametric counterparts of Pearson correlation and linear regression, respectively.[3,5,6,7,8] Parametric methods and their nonparametric counterparts are given in Table 1.

Table 1. Parametric and their alternative nonparametric methods

| Description | Parametric methods | Nonparametric methods |
|---|---|---|
| Descriptive statistics | Mean, standard deviation | Median, interquartile range |
| Sample with population (or hypothetical value) | One-sample t-test (n < 30) and one-sample Z-test (n ≥ 30) | One-sample Wilcoxon signed-rank test |
| Two unpaired groups | Independent samples t-test (unpaired samples t-test) | Mann-Whitney U test / Wilcoxon rank-sum test |
| Two paired groups | Paired samples t-test | Related samples Wilcoxon signed-rank test |
| Three or more unpaired groups | One-way ANOVA | Kruskal-Wallis H test |
| Three or more paired groups | Repeated measures ANOVA | Friedman test |
| Degree of linear relationship between two variables | Pearson's correlation coefficient | Spearman rank correlation coefficient |
| Predict one outcome variable by at least one independent variable | Linear regression model | Nonlinear regression model / log-linear regression model on log-normal data |

Statistical Methods to Compare the Proportions

The statistical methods used to compare proportions are considered nonparametric, and they have no parametric alternatives. The Pearson Chi-square test and Fisher's exact test are used to compare proportions between two or more independent groups. To test a change in proportions between two paired groups, the McNemar test is used, while the Cochran Q test serves the same purpose for three or more paired groups. The Z-test for proportions is used to compare proportions between two groups, whether independent or dependent.[6,7,8] [Table 2]

Table 2. Statistical methods to compare proportions

| Description | Statistical methods | Data type |
|---|---|---|
| Test the association between two categorical variables (independent groups) | Pearson Chi-square test / Fisher's exact test | Variable has ≥2 categories |
| Test the change in proportions between two (McNemar) or three or more (Cochran Q) paired groups | McNemar test / Cochran Q test | Variable has 2 categories |
| Comparisons between proportions | Z-test for proportions | Variable has 2 categories |
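A minimal sketch of these proportion tests on hypothetical 2 × 2 tables (SciPy for the independent-groups tests, statsmodels for McNemar):

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact
from statsmodels.stats.contingency_tables import mcnemar

# Independent groups: outcome (rows) by treatment arm (columns).
table = np.array([[18, 12],
                  [ 9, 21]])
odds_ratio, p_fisher = fisher_exact(table)
chi2, p_chi2, dof, expected = chi2_contingency(table)
print(f"Fisher exact p = {p_fisher:.3f}, Chi-square p = {p_chi2:.3f}")

# Paired groups: before/after classification of the same subjects.
paired = np.array([[30, 10],
                   [ 4, 26]])
result = mcnemar(paired, exact=True)
print(f"McNemar p = {result.pvalue:.3f}")
```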

Other Statistical Methods

The intraclass correlation coefficient is calculated when both pre and post data are on a continuous scale. Unweighted and weighted Kappa statistics are used to test the absolute agreement between two methods measured on the same subjects (pre-post) for nominal and ordinal data, respectively. Some methods are either semiparametric or nonparametric, and for these no parametric counterparts are available; they include logistic regression analysis, survival analysis, and the receiver operating characteristic curve.[9] Logistic regression analysis is used to predict a categorical outcome variable from independent variable(s). Survival analysis is used to calculate survival time/survival probability, to compare survival times between groups (Kaplan-Meier method), and to identify predictors of the survival time of subjects/patients (Cox regression analysis). The receiver operating characteristic (ROC) curve is used to calculate the area under the curve (AUC) and cutoff values for a given continuous variable, with the corresponding diagnostic accuracy, against a categorical outcome variable. The diagnostic accuracy of a test method is calculated relative to another method (usually a gold standard). Sensitivity (the proportion of actual disease cases that the test detects), specificity (the proportion of actual non-disease subjects that the test detects), and overall accuracy (the proportion of agreement between the test and gold standard methods in correctly classifying disease and non-disease subjects) are the key measures of diagnostic accuracy. Other measures, such as the false-negative rate (1 - sensitivity), the false-positive rate (1 - specificity), the likelihood ratio positive (sensitivity/false-positive rate), the likelihood ratio negative (false-negative rate/specificity), the positive predictive value (the proportion of detected disease cases that truly have the disease), and the negative predictive value (the proportion of detected non-disease subjects that are truly disease-free), are also used to assess the diagnostic accuracy of the test method.[3,6,10] [Table 3]

Semi-parametric and non-parametric methods

Table 3. Semi-parametric and non-parametric methods

| Description | Statistical methods | Data type |
|---|---|---|
| To predict the outcome variable using independent variables | Binary logistic regression analysis | Outcome variable: two categories; independent variable(s): categorical (≥2 categories), continuous, or both |
| To predict the outcome variable using independent variables | Multinomial logistic regression analysis | Outcome variable: ≥3 categories; independent variable(s): categorical (≥2 categories), continuous, or both |
| Area under the curve and cutoff values for a continuous variable | Receiver operating characteristic (ROC) curve | Outcome variable: two categories; test variable: continuous |
| To predict the survival probability of subjects at given equal intervals | Life table analysis | Outcome variable: two categories; follow-up time: continuous |
| To compare survival time between ≥2 groups | Kaplan-Meier curve | Outcome variable: two categories; follow-up time: continuous; one categorical group variable |
| To assess the predictors influencing survival probability | Cox regression analysis | Outcome variable: two categories; follow-up time: continuous; independent variable(s): categorical (≥2 categories), continuous, or both |
| To assess the diagnostic accuracy of a test variable compared with a gold standard method | Diagnostic accuracy (sensitivity, specificity, etc.) | Both variables (gold standard method and test method) categorical (2 × 2 table) |
| Absolute agreement between two diagnostic methods | Unweighted and weighted Kappa statistics / intraclass correlation | Two nominal variables (unweighted Kappa); two ordinal variables (weighted Kappa); two continuous variables (intraclass correlation) |
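The diagnostic accuracy measures defined above reduce to simple arithmetic on a 2 × 2 table. A minimal sketch with hypothetical counts:

```python
# Hypothetical 2 x 2 table: test result (rows) vs gold standard (columns).
tp, fp = 85, 10    # test positive: disease present / disease absent
fn, tn = 15, 90    # test negative: disease present / disease absent

sensitivity = tp / (tp + fn)                  # detected disease / actual disease
specificity = tn / (tn + fp)                  # detected healthy / actual healthy
ppv = tp / (tp + fp)                          # positive predictive value
npv = tn / (tn + fn)                          # negative predictive value
accuracy = (tp + tn) / (tp + fp + fn + tn)    # overall agreement
lr_pos = sensitivity / (1 - specificity)      # likelihood ratio positive
lr_neg = (1 - sensitivity) / specificity      # likelihood ratio negative

print(f"Sensitivity {sensitivity:.2f}, Specificity {specificity:.2f}")
print(f"PPV {ppv:.2f}, NPV {npv:.2f}, Accuracy {accuracy:.2f}")
print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```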

Advantages and Disadvantages of Nonparametric Methods over Parametric Methods and Sample Size Issues

Parametric methods are more powerful at detecting differences between groups than their nonparametric counterparts, but because of some strict assumptions, including normality of the data and adequate sample size, parametric tests cannot be used in every situation, and their nonparametric alternatives are used instead. Parametric methods compare means, which are severely affected by outliers, whereas nonparametric methods use the median or mean rank as the representative measure, which is not affected by outliers.[11]

In parametric methods such as the Student's t-test and the ANOVA test, the significance level is calculated using the mean and standard deviation, and calculating a standard deviation requires at least two observations per group. If any group does not have at least two observations, the alternative nonparametric method, which works by comparing the mean ranks of the data, should be selected.

For small sample sizes (on average ≤15 observations per group), normality tests are less sensitive to non-normality, and there is a chance of detecting normality despite the data being non-normal. It is therefore recommended that with small samples, parametric methods be used only on clearly normally distributed data; otherwise the corresponding nonparametric methods should be preferred. Conversely, with sufficient or large sample sizes (on average >15 observations per group), most normality tests are highly sensitive to non-normality, and there is a chance of wrongly detecting non-normality despite the data being normal. It is therefore recommended that with sufficient samples, nonparametric methods be used only on clearly non-normal data; otherwise the corresponding parametric methods should be preferred.[12]

Minimum Sample Size Required for Statistical Methods

To detect a significant difference between means/medians/mean ranks/proportions at a given level of confidence (usually 95%) and power (usually 80%), the number of individuals/subjects required depends on the effect size to be detected. Effect size and required sample size are inversely proportional: at the same level of confidence and power, as the effect size increases, the required sample size decreases. In short, no minimum or maximum sample size is fixed for any particular statistical method; it must be estimated from the given inputs, including the effect size, level of confidence, and power of the study. Only with a sufficient sample size can we detect a difference significantly. With a smaller sample than actually required, the study will be underpowered to detect the given difference, and the result will be statistically insignificant.
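The inverse relationship between effect size and required sample size is easy to see with a power calculation. A minimal sketch using statsmodels (alpha = 0.05, power = 0.80; effect sizes are standardized as Cohen's d):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for effect_size in (0.2, 0.5, 0.8):   # small, medium, large (Cohen's d)
    n = analysis.solve_power(effect_size=effect_size, alpha=0.05, power=0.80)
    print(f"d = {effect_size}: about {n:.0f} subjects per group")
```

Smaller effects demand far larger samples: detecting d = 0.2 needs roughly six times the subjects per group that d = 0.5 does.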

Impact of Wrong Selection of the Statistical Methods

There are specific statistical methods for each situation; failing to select the appropriate method affects both the significance level and the conclusions.[13] For example, in one study, the systolic blood pressure (mean ± SD) of the control group (126.45 ± 8.85, n1 = 20) and the treatment group (121.85 ± 5.96, n2 = 20) was compared using the independent samples t-test (correct practice). The result showed that the mean difference between the two groups was statistically insignificant (P = 0.061), while on the same data the paired samples t-test (incorrect practice) indicated that the mean difference was statistically significant (P = 0.011). Due to the incorrect practice, a statistically significant difference was detected between the groups even though no such difference actually existed.
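This pitfall can be demonstrated directly. A minimal sketch with simulated data matching the group summaries above (it will not reproduce the exact P values, but it shows that the two tests return different answers on the same unpaired data, and only the independent test is valid here):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Simulated systolic blood pressure for two *independent* groups.
control = rng.normal(126.45, 8.85, 20)
treatment = rng.normal(121.85, 5.96, 20)

t_ind, p_ind = stats.ttest_ind(control, treatment)   # correct practice
t_rel, p_rel = stats.ttest_rel(control, treatment)   # incorrect: data unpaired

print(f"independent samples t-test: p = {p_ind:.3f} (correct)")
print(f"paired samples t-test:      p = {p_rel:.3f} (invalid for this design)")
```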

Conclusions

Selection of the appropriate statistical methods is very important for quality research. It is important that a researcher knows the basic concepts of the statistical methods used to conduct a research study so that it produces valid and reliable results. There are various statistical methods that can be used in different situations; each test makes particular assumptions about the data, and these assumptions should be taken into consideration when deciding which test is most appropriate. Wrong or inappropriate use of statistical methods may lead to defective conclusions and ultimately harm evidence-based practice. Hence, adequate knowledge of statistics and the appropriate use of statistical tests are important for improving and producing quality biomedical research. However, it is extremely difficult for biomedical researchers or academics to learn all statistical methods, so at least a basic knowledge is important, both for selecting appropriate methods and for recognizing correct and incorrect practices in published research. Many software packages are available, online as well as offline, for analyzing data, but deciding which set of statistical tests is appropriate for given data and study objectives remains difficult for researchers. Therefore, from the planning of the study through data collection, analysis, and finally the review process, proper consultation with statistical experts may be a good option; it can spare clinicians the time and effort of going into the depths of statistics, which would ultimately affect their clinical work. These practices not only ensure the correct and appropriate use of biostatistical methods in research but also ensure the highest quality of statistical reporting in research and journals.[14]

Conflicts of interest

There are no conflicts of interest.

Acknowledgements

The authors would like to express their deep and sincere gratitude to Dr. Prabhat Tiwari, Professor, Department of Anaesthesiology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, for his encouragement to write this article. His critical reviews and suggestions were very useful for improving the article.

Top 9 Statistical Tools Used in Research

Well-designed research requires a well-chosen study sample and a suitable statistical test selection. To plan an epidemiological study or a clinical trial, you’ll need a solid understanding of the data. Improper inferences from it could lead to false conclusions and unethical behavior. And given the ocean of data available nowadays, it’s often a daunting task for researchers to gauge its credibility and do statistical analysis on it.

With that said, the many statistical tools available on the market help researchers make such studies much more manageable. Statistical tools are extensively used in academic and research sectors to study human, animal, and material behaviors and reactions.

Statistical tools  aid in the interpretation and use of data. They can be used to evaluate and comprehend any form of data. Some statistical tools can help you see trends, forecast future sales, and create links between causes and effects. When you’re unsure where to go with your study, other tools can assist you in navigating through enormous amounts of data.

What is Statistics? And its Importance in Research

Statistics is the study of collecting, arranging, and interpreting data from samples and inferring it to the total population. Also known as the “Science of Data,” it allows us to derive conclusions from a data set. It may also assist people in all industries in answering research or business queries and forecasting outcomes, such as what show you should watch next on your favorite video app.

Statistics is a technique that social scientists, such as psychologists, use to examine data and answer research questions. Scientists raise a wide range of questions that statistics can answer. Moreover, it provides credibility and legitimacy to research. If two research publications are presented, one without statistics and the other with statistical analysis supporting each assertion, people will choose the latter. 

Statistical Tools Used in Research

Researchers often cannot discern a simple truth from a set of data; they can only draw conclusions after statistical analysis. However, carrying out a statistical analysis is a difficult task, and this is when statistical tools come into play. Researchers can use statistical tools to back up their claims, make sense of a vast set of data, graphically show complex data, or clarify many things in a short period.

Let’s go through the top 9 statistical tools used in research below:

1. SPSS:

SPSS first stores and organizes the data, then compiles the data set to generate appropriate output. SPSS is intended to work with a wide range of variable data formats.

2. R:

R is a statistical computing and graphics programming language that you can use to clean, analyze and graph your data. It is frequently used by researchers from various fields, and by lecturers of statistics and research methodologies, to estimate and display results. It’s free, making it an appealing option, but it relies on programming code rather than drop-down menus or buttons.

3. SAS:

Many big tech companies use SAS due to its support and integration for large teams. Setting up the tool can be a bit time-consuming initially, but once it’s up and running, it will surely streamline your statistical processes.

4. MATLAB:

MATLAB provides a multi-paradigm numerical computing environment, which means the language may be used for both procedural and object-oriented programming. MATLAB is ideal for matrix manipulation, including data function plotting, algorithm implementation, and user interface design, among other things. Last but not least, MATLAB can also run programs written in other programming languages.

5. TABLEAU:

Tableau is a data visualization program that is among the most capable on the market. Data visualization is an approach commonly employed in data analytics, and in only a few minutes you can use Tableau to produce an excellent visualization of a large amount of data, helping the data analyst make quick decisions. It supports a large number of online analytical processing cubes, cloud databases, spreadsheets, and other sources, and provides users with a drag-and-drop interface: the user drags the data set sheet into Tableau and sets the filters according to their needs.


7. MS EXCEL:

Microsoft Excel  is undoubtedly one of the best and most used statistical tools for beginners looking to do basic data analysis. It provides data analytics specialists with cutting-edge solutions and can be used for both data visualization and simple statistics. Furthermore, it is the most suitable statistical tool for individuals who wish to apply fundamental data analysis approaches to their data.

You can apply various formulas and functions to your data in Excel without prior knowledge of statistics. The learning curve is gentle, and even newcomers can achieve great results quickly since everything is just a click away. This makes Excel a great choice for amateurs and beginners alike.

8. RAPIDMINER:

RapidMiner  is a valuable platform for data preparation, machine learning, and the deployment of predictive models. RapidMiner makes it simple to develop a data model from the beginning to the end. It comes with a complete data science suite. Machine learning, deep learning, text mining, and predictive analytics are all possible with it.

9. APACHE HADOOP:

If you have massive data on your hands and want something that doesn’t slow you down and works in a distributed way, Hadoop is the way to go.


Standard statistical tools in research and data analysis

Introduction

Statistics is a field of science concerned with gathering, organising, analysing, and extrapolating from sample data to the entire population. This necessitates a well-designed study, a well-chosen sample, and the selection of a proper statistical test. A good understanding of statistics is required to design epidemiological research or a clinical trial, and improper statistical approaches can lead to erroneous findings and unethical behaviour.

A variable is a trait that differs from one person to the next within a population. Quantitative variables are measured on a scale and provide quantitative information, such as height and weight. Qualitative variables, such as sex and eye colour, provide qualitative information (Figure 1).

Figure 1. Classification of variables [1]

Quantitative variables

Quantitative (numerical) data are divided into discrete and continuous measures. Continuous data can take on any value, whereas discrete data are stored as whole numbers such as 0, 1, 2, 3, … (integers). Discrete data consist of countable observations, while continuous data consist of measurable observations. Examples of discrete data include the number of episodes of respiratory arrest or re-intubation in an intensive care unit; continuous data include serial serum glucose levels, partial pressure of oxygen in arterial blood, and oesophageal temperature. Measurements can also be placed on a hierarchical scale of increasing precision: categorical, ordinal, interval, and ratio (Figure 1).

Descriptive statistics try to explain how variables in a sample or population behave and relate to one another. In the form of the mean, median, and mode, descriptive statistics give an overview of the data. Inferential statistics use a random sample of data to characterise and make inferences about the population as a whole. They are useful when it is not possible to investigate every single member of a population.


Descriptive statistics

The central tendency describes how observations cluster about a centre point, whereas the degree of dispersion describes the spread towards the extremes.
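To make this concrete, here is a minimal sketch in Python (the spending figures are invented; any small numeric sample would do) computing the usual central tendency and dispersion measures with the built-in statistics module:

```python
import statistics

# Hypothetical sample: monthly online spending (in dollars) for 8 respondents
spending = [120, 150, 150, 180, 200, 230, 260, 310]

# Central tendency: where the observations cluster
print("mean:  ", statistics.mean(spending))    # arithmetic average
print("median:", statistics.median(spending))  # middle value
print("mode:  ", statistics.mode(spending))    # most frequent value

# Dispersion: spread towards the extremes
print("variance:", statistics.variance(spending))  # sample variance
print("std dev: ", statistics.stdev(spending))     # sample standard deviation
```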

Inferential statistics

In inferential statistics, data from a sample are analysed to draw conclusions about the entire population. The goal is to support or reject hypotheses. A hypothesis (plural: hypotheses) is a proposed explanation for a phenomenon, and hypothesis testing is an essential process for making rational decisions about whether observed effects are real.

SOFTWARE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

There are several statistical software packages accessible today. The most commonly used systems are the Statistical Package for the Social Sciences (SPSS, by IBM Corporation), the Statistical Analysis System (SAS, developed by the SAS Institute, North Carolina, United States of America), Minitab (developed by Minitab Inc), R (designed by Ross Ihaka and Robert Gentleman of the R core team), Stata (developed by StataCorp), and MS Excel. Several websites are also devoted to statistical power analysis. Here are a few examples:

  • StatPages.net – links to a variety of online power calculators.
  • G*Power – a downloadable power analysis program that runs under DOS.
  • ANOVA power analysis – an interactive web page that estimates the power, or the sample size required to achieve a specified power, for one effect in a factorial ANOVA design.
  • Sample Power – software created by SPSS. It generates a comprehensive report on the computer screen that may be copied and pasted into another document.

A researcher must be familiar with the most important statistical approaches for doing research. This aids in the implementation of a well-designed study that yields accurate and valid data. Incorrect statistical approaches can result in erroneous findings and mistakes, and can diminish the significance of a paper. Poor statistics can lead to poor research, which can in turn lead to unethical behaviour. As a result, a proper statistical understanding and the right application of statistical tests are essential. A thorough grasp of fundamental statistical methods will go a long way toward enhancing study designs and creating high-quality medical research that can be used to develop evidence-based guidelines.

[1] Ali, Zulfiqar, and S Bala Bhaskar. “Basic statistical tools in research and data analysis.” Indian Journal of Anaesthesia, vol. 60, no. 9 (2016): 662–669. doi:10.4103/0019-5049.190623


Choosing the Right Statistical Test | Types & Examples

Published on January 28, 2020 by Rebecca Bevans. Revised on June 22, 2023.

Statistical tests are used in hypothesis testing. They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.


What does a statistical test do?

Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p value (probability value). The p value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the statistic calculated from the null hypothesis, then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than the one calculated from the null hypothesis, then you can infer no statistically significant relationship between the predictor and outcome variables.
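As a concrete sketch of both quantities, an independent-samples t-test in Python returns exactly this pair of numbers, a test statistic and a p value (scipy.stats is assumed; the scores are made up for illustration):

```python
from scipy import stats

# Hypothetical exam scores for two groups of students
group_a = [78, 85, 90, 73, 88, 95, 81, 84]
group_b = [72, 79, 69, 75, 80, 71, 77, 74]

# t measures how far the observed difference in group means is from the
# zero difference expected under the null hypothesis; p is the probability
# of a statistic at least this extreme if the null hypothesis were true.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```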


When to perform a statistical test

You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment, or through observations made using probability sampling methods.

For a statistical test to be valid, your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  • Independence of observations (a.k.a. no autocorrelation): the observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  • Homogeneity of variance: the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test’s effectiveness.
  • Normality of data: the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data.

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test, which allows you to make comparisons without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).
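In practice these assumption checks are often run before the main test. A minimal sketch, assuming scipy.stats and invented measurements: the Shapiro–Wilk test screens for non-normality and Levene’s test for unequal variances.

```python
from scipy import stats

# Hypothetical measurements from three independent groups
g1 = [4.1, 5.0, 4.8, 5.2, 4.6, 5.1, 4.9]
g2 = [5.5, 6.1, 5.8, 6.0, 5.7, 6.3, 5.9]
g3 = [4.9, 5.4, 5.2, 5.6, 5.1, 5.3, 5.0]

# Normality: Shapiro-Wilk tests H0 "the sample comes from a normal distribution"
for name, g in [("g1", g1), ("g2", g2), ("g3", g3)]:
    _, p = stats.shapiro(g)
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")  # p > 0.05: no evidence of non-normality

# Homogeneity of variance: Levene's test tests H0 "all group variances are equal"
_, p = stats.levene(g1, g2, g3)
print(f"Levene p = {p:.3f}")  # p > 0.05: variances look similar across groups
```

If either check fails, the nonparametric alternatives discussed below are the usual fallback.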

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (aka ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (aka integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal: represent data with an order (e.g. rankings).
  • Nominal: represent group names (e.g. brands or species names).
  • Binary: represent data with a yes/no or 1/0 outcome (e.g. win or lose).
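As a small illustration of how these types look in a dataset (a Python sketch assuming pandas is available; all column names and values are invented):

```python
import pandas as pd

# A toy dataset mixing the variable types described above
df = pd.DataFrame({
    "height_cm": [172.4, 165.0, 180.3],    # continuous (ratio): fractional units
    "tree_count": [12, 7, 19],             # discrete (integer): counts
    "finish_rank": pd.Categorical(         # ordinal: categories with an order
        ["second", "first", "third"],
        categories=["first", "second", "third"],
        ordered=True,
    ),
    "species": ["oak", "pine", "birch"],   # nominal: unordered group names
    "survived": [1, 0, 1],                 # binary: 1/0 outcome
})

print(df.dtypes)
```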

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment, these are the independent and dependent variables). Consult the tables below to see which test best matches your variables.

Choosing a parametric test: regression, comparison, or correlation

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships. They can be used to estimate the effect of one or more continuous variables on another variable.

Test | Predictor variable | Outcome variable | Research question example
Simple linear regression | Continuous (1 predictor) | Continuous | What is the effect of income on longevity?
Multiple linear regression | Continuous (2 or more predictors) | Continuous | What is the effect of income and minutes of exercise per day on longevity?
Logistic regression | Continuous | Binary | What is the effect of drug dosage on the survival of a test subject?
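For instance, a simple linear regression takes only a few lines (a sketch using scipy.stats.linregress; the income and longevity numbers are fabricated for illustration):

```python
from scipy import stats

# Hypothetical data: annual income (thousands of dollars) and longevity (years)
income = [25, 32, 48, 54, 61, 70, 83, 95]
longevity = [72, 74, 75, 78, 77, 80, 82, 84]

# Fit longevity as a linear function of income
result = stats.linregress(income, longevity)
print(f"slope     = {result.slope:.3f} years per $1k of income")
print(f"intercept = {result.intercept:.1f} years")
print(f"p value   = {result.pvalue:.4f}")  # tests H0: slope = 0
```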

Comparison tests

Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g., the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults).

Test | Predictor variable | Outcome variable | Research question example
Paired t-test | Categorical (1 predictor) | Quantitative (groups come from the same population) | What is the effect of two different test prep programs on the average exam scores for students from the same class?
Independent t-test | Categorical (1 predictor) | Quantitative (groups come from different populations) | What is the difference in average exam scores for students from two different schools?
ANOVA | Categorical (1 or more predictors) | Quantitative (1 outcome) | What is the difference in average pain levels among post-surgical patients given three different painkillers?
MANOVA | Categorical (1 or more predictors) | Quantitative (2 or more outcomes) | What is the effect of flower species on petal length, petal width, and stem length?
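A sketch of the two most common cases, again with scipy.stats and invented pain scores (the drug names are placeholders):

```python
from scipy import stats

# Hypothetical pain scores for patients given three different painkillers
drug_a = [3.1, 2.8, 3.5, 2.9, 3.2]
drug_b = [4.0, 4.3, 3.8, 4.1, 4.5]
drug_c = [2.5, 2.2, 2.9, 2.4, 2.7]

# Independent t-test: compares the means of exactly two groups
t, p = stats.ttest_ind(drug_a, drug_b)
print(f"t-test (A vs B): t = {t:.2f}, p = {p:.4f}")

# One-way ANOVA: compares the means of more than two groups
f, p = stats.f_oneway(drug_a, drug_b, drug_c)
print(f"ANOVA (A, B, C): F = {f:.2f}, p = {p:.4f}")
```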

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are autocorrelated.

Test | Variables | Research question example
Pearson’s r | Two quantitative variables | How are latitude and temperature related?
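A corresponding sketch (scipy.stats; the latitude and temperature pairs are invented):

```python
from scipy import stats

# Hypothetical paired observations: latitude (degrees) and mean temperature (C)
latitude = [5, 15, 25, 35, 45, 55, 65]
temperature = [27, 26, 22, 17, 11, 6, -2]

# Pearson's r: strength and direction of the linear relationship
r, p = stats.pearsonr(latitude, temperature)
print(f"Pearson's r = {r:.2f}, p = {p:.4f}")  # r near -1: strong negative relationship
```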

Choosing a nonparametric test

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

Test | Use in place of…
Spearman’s r | Pearson’s r
Sign test | One-sample t-test
Kruskal–Wallis H | ANOVA
ANOSIM | MANOVA
Wilcoxon rank-sum test | Independent t-test
Wilcoxon signed-rank test | Paired t-test
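The nonparametric substitutes follow the same calling pattern (a sketch with scipy.stats, reusing the invented pain-score data from above):

```python
from scipy import stats

# Hypothetical pain scores for three painkillers (no normality assumed)
drug_a = [3.1, 2.8, 3.5, 2.9, 3.2]
drug_b = [4.0, 4.3, 3.8, 4.1, 4.5]
drug_c = [2.5, 2.2, 2.9, 2.4, 2.7]

# Mann-Whitney U (Wilcoxon rank-sum): in place of the independent t-test
u, p = stats.mannwhitneyu(drug_a, drug_b)
print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")

# Kruskal-Wallis H: in place of one-way ANOVA
h, p = stats.kruskal(drug_a, drug_b, drug_c)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")

# Spearman's rank correlation: in place of Pearson's r
rho, p = stats.spearmanr(drug_a, drug_b)
print(f"Spearman's rho = {rho:.2f}, p = {p:.4f}")
```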


Flowchart: choosing a statistical test

This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above.

[Flowchart: choosing the right statistical test]


Frequently asked questions about statistical tests

Statistical tests commonly assume that:

  • the data are normally distributed
  • the groups that are being compared have similar variance
  • the data are independent

If your data do not meet these assumptions, you might still be able to use a nonparametric statistical test, which has fewer requirements but also makes weaker inferences.

A test statistic is a number calculated by a statistical test. It describes how far your observed data are from the null hypothesis of no relationship between variables or no difference among sample groups.

The test statistic tells you how different two or more groups are from the overall population mean, or how different a linear slope is from the slope predicted by a null hypothesis. Different test statistics are used in different statistical tests.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis.

When the p value falls below the chosen alpha value, then we say the result of the test is statistically significant.

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.

Discrete and continuous variables are two types of quantitative variables:

  • Discrete variables represent counts (e.g. the number of objects in a collection).
  • Continuous variables represent measurable amounts (e.g. water volume or weight).

