Farthest Neighbor [ ]
Average Linkage Pool [ ]
Minimum Variance [ ]
Median Method [ ]
Association rules base their analysis on algorithmic “if-then” statements, which capture the probabilities relating the multiple elements of the data found in large databases of different formats and types. Throughout their evolution, data mining techniques based on association rules have had multiple applications, among which sales analysis and the analysis of medical data sets stand out.
Based on these “if-then” statements and on established criteria such as support and confidence, association rules can identify the most important patterns. The support criterion measures how frequently the elements appear in the data set. The confidence criterion determines the proportion of cases in which the Boolean value of the “if-then” statement is true. There is also another common metric, called lift, which is fundamentally based on comparing the confidence observed in the data against the expected confidence. The literature review reveals the progress of association rules, as detailed below; see Table 2 .
Association Rules Evolutions.
Based on | Algorithms
---|---
Frequent Itemsets Mining | Apriori [ ]; Apriori-TID [ ]; ECLAT TID-list [ , ]; FP-Growth [ ]
Big Data Algorithms | R-Apriori [ ]; YAFIM [ ]; ParEclat [ ]; Par-FP (Parallel FP-Growth with Sampling) [ ]; HPA (Hash Partitioned Apriori) [ ]
Distributed algorithms | PEAR (Parallel Efficient Association Rules) [ ]
Distributed algorithms for fuzzy association rule mining | Count Distribution algorithm [ , ]
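The support, confidence, and lift criteria described above can be sketched in a few lines of Python; the transaction database here is invented purely for illustration:

```python
# Toy transaction database (items and transactions invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the rule antecedent => consequent holds when the antecedent occurs."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Observed confidence relative to the confidence expected under independence."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # ~0.667
print(lift({"bread"}, {"milk"}))        # ~0.889 (below 1: weak negative association)
```

A rule is typically kept only when its support and confidence exceed user-defined thresholds, which is exactly the pruning that Apriori-style algorithms exploit.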
Dimensionality reduction methods are statistical techniques that map the data set to lower-dimensional subspaces derived from the original space, allowing a description of the data at a lower cost. These techniques become important because many algorithms from fields such as numerical analysis, machine learning, or data mining tend to degrade in performance when used with high-dimensional data; in extreme cases, the algorithm is no longer useful for the purpose for which it was designed. The curse of dimensionality refers to the various phenomena that arise when analyzing and organizing data in multi-dimensional spaces. Among the most important algorithms, we can highlight the following.
Missing Values Ratio [ 73 ]: If, on examining the data, we find that a variable contains only a few missing values, we can fill them in; when the proportion of missing values in a variable is too high, it is usually removed directly, because it contains too little information. How to proceed depends on the situation: we can set a threshold and remove any column whose proportion of missing values exceeds it. The lower the threshold, the more aggressive the dimensionality reduction.
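A minimal sketch of this filter, assuming NumPy and an invented toy matrix in which NaN marks a missing value:

```python
import numpy as np

# Hypothetical dataset: rows are samples, columns are variables; NaN = missing.
X = np.array([
    [1.0,    np.nan, 3.0],
    [4.0,    np.nan, 6.0],
    [7.0,    8.0,    np.nan],
    [np.nan, np.nan, 12.0],
])

threshold = 0.5  # drop any column whose missing-value ratio exceeds 50%

missing_ratio = np.isnan(X).mean(axis=0)  # per-column fraction of missing values
keep = missing_ratio <= threshold         # boolean mask of columns to retain
X_reduced = X[:, keep]

print(missing_ratio)    # [0.25 0.75 0.25]
print(X_reduced.shape)  # (4, 2): the middle column is dropped
```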
Low Variance Filter [ 74 ]: If the values of a column in a dataset are nearly all the same, that is, its variance is very low, we generally consider that such a low-variance variable contains very little information, so it can be eliminated directly. In practice, this means computing the variance of every variable and then eliminating those with the smallest values.
High Correlation Filter [ 75 ]: If two variables are highly correlated, they have similar trends and carry similar information. The presence of such variables can reduce the performance of certain models (such as linear and logistic regression models). To address this, we can calculate the correlation between independent variables and, if the correlation coefficient exceeds a certain threshold, eliminate one of the pair.
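The low variance and high correlation filters can be sketched together in NumPy; the data, thresholds, and drop rule below are illustrative assumptions, not prescriptions from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
X = np.column_stack([
    a,                                          # informative variable
    2.0 * a + rng.normal(scale=0.01, size=n),   # almost perfectly correlated with it
    np.full(n, 5.0),                            # constant column: variance = 0
    rng.normal(size=n),                         # independent variable
])

# Low variance filter: drop columns whose variance falls below a threshold.
var_keep = X.var(axis=0) > 1e-6
Xv = X[:, var_keep]

# High correlation filter: among the remaining columns, drop the later member
# of any pair whose absolute correlation exceeds 0.95.
corr = np.corrcoef(Xv, rowvar=False)
drop = set()
for i in range(corr.shape[0]):
    for j in range(i + 1, corr.shape[1]):
        if i not in drop and abs(corr[i, j]) > 0.95:
            drop.add(j)
X_reduced = Xv[:, [j for j in range(Xv.shape[1]) if j not in drop]]

print(X_reduced.shape)  # (200, 2): the constant and the redundant column are gone
```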
Random Forests/Ensemble Trees [ 76 ]: Random Forest is a widely used feature selection algorithm that automatically calculates the importance of each feature, so no separate programming is required; this helps us choose a smaller subset of features. The advantages of random forests are high precision; the randomness they introduce makes them hard to overfit and gives them good noise tolerance (they handle outliers well); they can handle very high-dimensional data without explicit feature selection; they handle both discrete and continuous data, and the data set does not need to be normalized; training is fast and easy to parallelize; and they yield a ranking of variable importance. Their disadvantages are that, when the forest contains many decision trees, training requires considerable space and time, and their interpretability is poor.
Principal Component Analysis (PCA) [ 77 ]: PCA is a very common dimensionality reduction method. It reduces the number of predictors by lowering the dimensionality of high-dimensional data while eliminating noise. Its most direct application is data compression, and it is mainly used for noise reduction in signal processing and for visualization after dimensionality reduction.
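PCA reduces to centering the data and taking the leading directions of an SVD; a minimal NumPy sketch on invented random data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical high-dimensional data: 100 samples, 10 features.
X = rng.normal(size=(100, 10))

def pca(X, k):
    """Project X onto its k leading principal components via SVD."""
    Xc = X - X.mean(axis=0)             # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                 # top-k directions of maximal variance
    explained = (S ** 2) / (len(X) - 1) # variance carried by each component
    return Xc @ components.T, explained[:k]

Z, var = pca(X, k=2)
print(Z.shape)  # (100, 2): ten predictors compressed to two components
```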
An ensemble is a set of machine learning models, each producing a different prediction; the predictions from the different models are combined to obtain a single prediction. The advantage of combining different models is that, because each model works differently, their errors tend to compensate for one another, which results in a better generalization error.
Majority voting consists of training multiple machine learning models with the same data [ 78 ]: when new data arrive, we obtain a prediction from each model, each model has a vote associated with it, and the final prediction is the one most of the models vote for. There is another way to combine the votes: when the machine learning models output a probability, we can use “soft voting”. In soft voting, more importance is given to results in which some model is very confident; that is, when a model's prediction is very close to probability 0 or 1, that model's prediction is given more weight.
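The two voting modes can be contrasted in a few lines of NumPy; the three models and their probabilities below are invented so that hard and soft voting disagree:

```python
import numpy as np

# Hypothetical class probabilities from three models on two samples
# (binary problem; columns are P(class 0), P(class 1)).
probs = np.array([
    [[0.90, 0.10], [0.40, 0.60], [0.30, 0.70]],  # sample 1
    [[0.20, 0.80], [0.55, 0.45], [0.70, 0.30]],  # sample 2
])  # shape: (samples, models, classes)

# Hard (majority) voting: each model casts one vote for its most likely class.
votes = probs.argmax(axis=2)
hard = np.array([np.bincount(v, minlength=2).argmax() for v in votes])

# Soft voting: average the probabilities first, so a very confident model
# (probability near 0 or 1) pulls the ensemble toward its prediction.
soft = probs.mean(axis=1).argmax(axis=1)

print(hard)  # [1 0]
print(soft)  # [0 1]: confidence flips both decisions relative to hard voting
```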
Unlike plain majority voting, bagging gets the errors to compensate for each other by training each model on a subset of the training set [ 79 ]. These subsets are formed by randomly choosing samples (with replacement) from the training set. For classification problems, the results are combined as in majority voting, or with soft voting for models that output probabilities; for regression problems, the arithmetic mean is normally used.
In boosting, each model tries to fix the errors of the previous models [ 80 ]. For example, in classification, the first model tries to learn the relationship between the input attributes and the result and will surely make some mistakes; the second model then tries to reduce these errors. This is achieved by giving more weight to poorly classified samples and less weight to well-classified ones. For regression problems, predictions with a higher mean squared error carry more weight for the next model.
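The reweighting step just described can be sketched in AdaBoost style (the labels and predictions are invented; this shows only the weight update, not a full boosting loop):

```python
import numpy as np

# Sketch of the boosting reweighting step: misclassified samples get larger
# weights so that the next model focuses on them.
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, 1, 1])        # current model's predictions
w = np.full(len(y_true), 1 / len(y_true))   # uniform weights initially

miss = y_pred != y_true
err = w[miss].sum()                           # weighted error of this model
alpha = 0.5 * np.log((1 - err) / err)         # the model's say in the ensemble
w = w * np.exp(alpha * np.where(miss, 1, -1)) # raise missed, lower correct
w = w / w.sum()                               # renormalize to a distribution

print(np.round(w, 3))  # missed samples now carry half the total weight
```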
When we talk about a stacking ensemble, we mean that we are stacking models [ 81 ]: the outputs of multiple models are used as the inputs of another model, which learns how to combine them.
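A minimal stacking sketch, with tiny hand-rolled "base models" (purely illustrative) and a least-squares meta-model that learns how to combine their outputs:

```python
import numpy as np

# Stacking sketch: base-model outputs become the input of a meta-model.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

base_preds = np.column_stack([
    0.8 * X[:, 0],        # base model 1: systematically underestimates
    1.3 * X[:, 0] + 0.5,  # base model 2: overestimates with an offset
])

# Meta-model: a least-squares fit on the base predictions.
coef, *_ = np.linalg.lstsq(base_preds, y, rcond=None)
stacked = base_preds @ coef

print(np.allclose(stacked, y))  # True: the meta-model corrects the base errors
```

In practice the base predictions fed to the meta-model are produced out-of-fold to avoid leaking the training labels.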
Deep Learning is a type of machine learning whose structure is inspired by the human brain and its neural networks [ 82 ]. Deep learning processes data to detect objects, recognize speech, translate languages, and make decisions. Being a type of machine learning, this technology helps artificial intelligence learn continuously. Deep learning is based on the use of artificial neural networks, of which three types are the most used.
Convolutional neural networks are artificial neural networks that have been designed to process structured matrices, such as images [ 83 ]. That is, they are responsible for classifying images based on the patterns and objects that appear in them, for example, lines, circles, or even eyes and faces.
Recurrent neural networks are neural networks that use sequential data or time-series data [ 84 ]. These types of networks solve ordinal or temporal problems, such as language translation, speech recognition, Natural Language Processing (NLP), and image captioning; therefore, they are found in technologies such as Siri or Google Translate. In this case, natural language processing recognizes a person's speech. For example, it distinguishes whether the speaker is a man or a woman, an adult or a minor, or whether they have an Andalusian or Catalan accent. In this way, the person's way of speaking is analyzed until their idiolect is reached.
Generative adversarial networks consist of two artificial neural networks set against each other (which is why they are called adversarial) to generate new content or synthetic data that can pass as real [ 85 ]. One of the networks generates content and the other works as a “discriminator”. The discriminator network (also known as the adversarial network) is trained to recognize real content and acts as a check on the generator network, pushing it to produce content that appears real.
The field of machine learning is the branch of Artificial Intelligence that encompasses techniques allowing machines to learn from their environment. This environment can be considered the set of data that the algorithm has, or obtains, in the training stage. Reinforcement learning is the most common form of learning in nature: an individual interacts with the environment and obtains information from cause-effect relationships, from the results of the actions carried out, and from the strategy to follow to complete an objective [ 86 ].
The temporal difference method was introduced by Sutton [ 87 ] as a model-free method based on a bootstrapping update rule; it estimates the values of immediate and future rewards in a way similar to dynamic programming, and such methods are denoted TD(λ). Temporal difference methods attempt to estimate the value function of a given state under a policy and, contrary to Monte Carlo methods, do not need to wait until the end of an episode to make such an estimate. Some prominent algorithms follow.
One of the algorithms derived from the temporal difference method is the SARSA algorithm [ 88 ], which is an on-policy method: it starts from an initial policy and updates it at the end of each episode.
Q-learning is a value-based learning algorithm that focuses on optimizing the value function according to the environment or problem [ 89 , 90 ]. The Q in Q-learning represents the quality of an action, which the model uses to find its next quality-improving action. The process can be automatic and simple, which makes this technique a good starting point for a reinforcement learning journey. The model stores all the values in a table, the Q-table; in simple words, the learned table is then used to choose the best action.
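The Q-table and its update rule can be sketched on an invented toy environment (a 1-D corridor; all parameters here are illustrative choices):

```python
import random

# Minimal tabular Q-learning sketch: states 0-4, actions 0 = left, 1 = right;
# reaching state 4 ends the episode with reward 1.
random.seed(0)
n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.3  # learning rate, discount, exploration rate

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == n_states - 1 else 0.0), s2 == n_states - 1

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward the reward plus the
        # discounted value of the best action available in the next state.
        target = r + gamma * max(Q[s2]) * (not done)
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

policy = [max(range(n_actions), key=lambda act: Q[s][act]) for s in range(n_states - 1)]
print(policy)  # the greedy policy moves right in every non-terminal state
```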
Deep Reinforcement Learning [ 91 ] integrates reinforcement learning with neural networks. The company DeepMind began using this type of learning to create agents that learned to play Atari games from scratch without having any information about them, not even the rules of the video game.
Metaheuristics are clever strategies for designing or improving very general heuristic procedures with high performance. The term metaheuristic first appeared in Fred Glover’s seminal article on tabu search in 1986 [ 92 ]. Since then, many proposed guidelines for designing good procedures to solve certain problems have emerged and, as their field of application expanded, have adopted the denomination of metaheuristics.
Some of the main types are the following. Relaxation metaheuristics [ 93 ] refer to problem-solving procedures that use relaxations of the original model (that is, modifications of the model that make the problem easier to solve), whose solution facilitates the solution of the original problem. Constructive metaheuristics [ 94 ] cover procedures that try to obtain a solution through the analysis and gradual selection of the components that form it. Search metaheuristics [ 95 ] guide procedures that use transformations or moves to traverse the space of alternative solutions and exploit the associated neighborhood structures. Evolutionary metaheuristics [ 96 ] focus on procedures based on sets of solutions that evolve over the solution space.
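As a toy instance of a search metaheuristic, the following sketch runs greedy hill climbing with random restarts on an invented one-dimensional objective (the landscape and move rule are illustrative only):

```python
import random

random.seed(42)

def objective(x):
    return -(x - 7) ** 2  # invented landscape with a single peak at x = 7

def hill_climb(start, steps=100):
    """Repeatedly move to the best neighbor until no move improves."""
    x = start
    for _ in range(steps):
        best = max((x - 1, x, x + 1), key=objective)  # explore +/-1 neighborhood
        if best == x:
            break  # local optimum reached
        x = best
    return x

# Random restarts guard against bad starting points on rugged landscapes.
solutions = [hill_climb(random.randint(-50, 50)) for _ in range(5)]
print(solutions)  # every restart converges to the peak: [7, 7, 7, 7, 7]
```

On this single-peak landscape every restart reaches the optimum; the restarts (and, in tabu or simulated-annealing variants, non-improving moves) matter precisely when the landscape has many local optima.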
Deep Learning primarily emphasizes features, Reinforcement Learning primarily emphasizes feedback, and Transfer Learning primarily emphasizes adaptation. Traditional machine learning is like planting melons and harvesting melons, or planting beans and harvesting beans: each task yields only what was trained for it, whereas transfer learning can draw inferences from one task to another.
In artificial intelligence competitions, from algorithm and model development to data quality and data contests, the successful models and algorithms have mainly been driven by supervised learning, which consumes a great deal of data and requires big data support to meet the precise requirements of applications. The development of artificial intelligence now tends toward satisfying those precise requirements without requiring massive data. Therefore, “small data learning” is becoming a new point of interest. Small data learning techniques, represented by transfer learning and reinforcement learning, can better reflect artificial intelligence.
Since the transfer learning (TL) concept was proposed by Stevo Bozinovski and Ante Fulgosi in 1976 [ 97 ], it has received a great deal of attention from the academic community. The definition of transfer learning is very broad, and a variety of specialized terms have appeared in related research, such as learning to learn, lifelong learning, multitask learning, meta-learning, inductive transfer, knowledge transfer, context-sensitive learning, etc. Among them, transfer learning is most closely related to multitask learning, which learns multiple different tasks at the same time and discovers implicit common features to aid single-task learning.
Recognizing human activities consists of interpreting human gestures or movements through sensors to determine the human action or activity [ 98 ]. For example, a HAR system can report activities performed by patients outside hospital facilities, which makes it a useful tool for evaluating health interventions and therapy progress, and for clinical decision-making [ 99 ]. HAR can be supervised or unsupervised: a supervised HAR system requires prior training with a labeled data set, whereas an unsupervised system does not require training but has a set of rules configured during development. In this particular work, we focus on a supervised HAR system that recognizes the following six human activities: walking (WK), climbing stairs (WU), descending stairs (WD), standing (ST), lying down (LD), and sitting (SD). We call WK, WU, and WD dynamic activities, since they involve a voluntary movement that causes displacement and is reflected in the inertial sensors, and we call ST, LD, and SD static activities, given that they do not involve voluntary movements of the subject and there is no displacement of the person.
In HAR systems it is common to use signals and images from sensors that can be located in a specific physical space, such as a room, or can be placed on or carried by people, like those found in smartphones or smartwatches. Smartphones are mobile phones that can perform tasks like those of a computer, such as storing and processing data and browsing the Internet [ 100 ]. In addition, compared to personal computers, smartphones are widely accepted due to their small size, low weight, more personal character, and, especially, great connectivity, which allows access to information sites and social networks at any time and place [ 101 ]. Other applications usually present include integrated cameras, contact management, multimedia software capable of playing music and viewing photos and videos, navigation programs, and the ability to view business documents in different formats such as PDF and Microsoft Office [ 101 ].
Currently, different sensors are installed in these devices, such as positioning sensors, proximity sensors, temperature sensors, accelerometers, gyroscopes, magnetometers, microphones, etc., as shown in Figure 2 . Exploiting them for activity recognition is currently a challenge taken up by different scientific communities, particularly in the fields of computer vision, signal processing, and machine learning. The sensors are usually operated by a microcontroller or microprocessor, which performs the function of a computer.
Sensors of Human Activity Recognition.
Inertial sensors are based on the principle of inertia, the tendency of a body to conserve its velocity (in the absence of an external influence, a body remains in uniform rectilinear motion). There are different types of sensors for measuring signals usable by HAR systems; two of the most used are the accelerometer and the gyroscope. The accelerometer measures acceleration (in meters per second squared, m/s²) from the variations of a capacitance inside the sensor. This capacitance belongs to a microelectromechanical system (MEMS) consisting of suspended silicon elements that are anchored at a fixed point and move freely along the axis being measured. When acceleration occurs, the elements move and disturb the capacitance equilibrium; this change is measured to provide the acceleration along a given axis.
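Before classification, HAR pipelines commonly segment the raw inertial stream into fixed-size windows and extract simple statistics from each; the window length, overlap, and features below are illustrative choices, not taken from the cited works:

```python
import numpy as np

# Sketch: turn a raw tri-axial accelerometer stream into overlapping windows
# with simple statistical features, ready for a HAR classifier.
rng = np.random.default_rng(3)
signal = rng.normal(size=(1000, 3))  # 1000 samples of (x, y, z) acceleration in m/s^2

def windows(sig, size=128, overlap=64):
    """Yield fixed-size windows with the given overlap (in samples)."""
    step = size - overlap
    for start in range(0, len(sig) - size + 1, step):
        yield sig[start:start + size]

def features(win):
    """Mean and standard deviation per axis: 6 features per window."""
    return np.concatenate([win.mean(axis=0), win.std(axis=0)])

X = np.array([features(w) for w in windows(signal)])
print(X.shape)  # (14, 6): one 6-feature row per window
```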
According to the type of sensors and the occupation of the indoor environments, a series of datasets have been built that have served to carry out different experiments based on machine learning techniques. The most prominent datasets are: UCI HAR [ 102 ], KU-HAR [ 103 ], Precis HAR [ 104 ], Fall-Up Dataset [ 105 ], VanKasteren [ 106 ], CASAS Multiresident [ 107 ], WISDM [ 108 ], DOMUS [ 109 ], Opportunity [ 110 ], CASAS Aruba [ 111 ], USC-HAD [ 112 ], MIT PlaceLab [ 113 ], Smart Environment—Ulster University [ 114 ], CASAS–Daily Life Kyoto [ 115 ], PAMAP2 [ 116 ], mHealth [ 117 ], DSADS [ 118 ], UJAmI SmartLab [ 119 ].
The methodology for the analysis of the publications is supported and defined by Kitchenham [ 120 ]. This methodology consists of identifying the main research problem, and then disaggregating each of its components by analyzing the different inclusions and exclusions, to determine a suitable search string to be used in scientific databases. Specifically for our case study, in addition to the Scientometric type variables, those related to the type of dataset used, techniques or algorithms implemented, as well as the quality of the results measured by the quality metrics, were identified. Kitchenham [ 120 ] defines different stages of the literature review process, among which the following can be highlighted: (a) Identification of the search parameters (search objectives, hypotheses identified) (b) Definition of search engines (selection of specialized databases where the study is to be developed) (c) Response to the hypotheses that were raised for the literature inquiry process.
Following these previously defined phases, the first step is to identify the central question of the inquiry process. For this literature review, it is: “What are the different techniques based on Machine Learning that support the analysis of human activity recognition datasets?”. To carry out the literature review, the IEEE, Scopus, Science Direct, and WOS databases were used. To delimit the documentary findings, the following search string was used: (HAR OR ADL OR AAL) AND dataset AND (“indoor environment” OR “smart homes” OR “intelligent buildings” OR “ambient intelligence” OR “assisted living”). Figure 3 shows the basic concept scheme for the review document filter. The references were then analyzed according to the machine learning technique implemented, as described in Section 6 .
Relationship between concepts for the literature review.
It is important to specify that the order of the different terms observed in Figure 3 determines all those that are part of the domain of knowledge; this order was previously tested in the different search engines of the scientific databases to eliminate the noise that can be generated at search time and to exclude papers not related to the study area. Taking into account the previously explained methodology, different factors of analytical order and of high importance for those interested in this area of knowledge were described in the meta-analytic matrix, such as the year of publication of the work (within a window of no more than 5 years), the journal, conference, or book where the publication was made, the quartile in the case of journal publications, and the country of origin of the first author as well as their university or research center. Other technical variables were likewise taken into account for the development of this research, such as the name of the dataset, type of data collection, type of activities carried out, number of individuals defining the occupation, data mining techniques used, hybridization of techniques, and results of quality metrics.
In the results obtained from the 570 articles processed, different relevant variables were taken into account, among which are: (1) the year of publication of the article (see Figure 4 ), (2) the database where the publication can be found, (3) the type of publication (journal, conference, or book), (4) the quartile of the journal in the case of journal publications, (5) the country of origin of the journal, (6) the country of origin of the first author, (7) the university of the first author, (8) the dataset used for the experiments, (9) the techniques used for the discovery of information, and (10) the results of the metrics of each technique.
Years of publication of the articles.
It can be identified that 2018 was the year in which the most publications were generated in HAR’s line of work. Likewise, when discriminating by the databases in which the publications appear, most of the works have been published in the Science Direct database, followed by Scopus. Some publications are visible in several databases, as shown in Figure 5 .
Of the total articles analyzed, 64% refer to conference publications, 4% are books, and 36% refer to journals; see Figure 6 a,b.
( a ) Division of publications according to typology. ( b ) Distribution of publications by quartiles.
6.1. Supervised Learning Applied to Human Activity Recognition Datasets
Regarding the application of Machine Learning techniques to Human Activity Recognition datasets, various experiments have been developed; the most relevant ones found in the literature are highlighted below (see Table 3 ). Tasmin [ 121 ] carried out implementations on the UCI-HAR Dataset using the supervised algorithms Nearest Neighbor, Decision Tree, Random Forest, and Naive Bayes; of the techniques used, the one with the best results in activity detection was the Bayesian one, with an accuracy of 76.9%. Igwe [ 122 ] concentrated his experimentation on the ARAS Dataset, which was collected in two different locations (House A and House B), and on CASAS Tulum, created by WSU University; the author applied supervised techniques such as SVM, ANN, and MSA (Margin Setting Algorithm), demonstrating the effectiveness of the latter in identifying activities with accuracies of 68.85%, 96.24%, and 68% on the respective datasets.
Supervised Techniques results.
Dataset | Technique | Accuracy | Precision | Recall | F-Measure | References
---|---|---|---|---|---|---
UCI Machine Learning | Nearest Neighbor | 75.7 | - | - | - | [ ] |
Decision Tree | 76.3 | - | - | - | ||
Random Forest | 75.9 | - | - | - | ||
Naive Bayes | 76.9 | - | - | - | ||
Aras (House A) | MSA (Margin Setting Algorithm) | 68.85 | - | - | - | [ ] |
SVM | 66.90 | - | - | - | ||
ANN | 67.32 | - | - | - | ||
Aras (House B) | MSA (Margin Setting Algorithm) | 96.24 | - | - | - | |
SVM | 94.81 | - | - | - | ||
ANN | 95.42 | - | - | - | ||
CASAS Tulum | MSA (Margin Setting Algorithm) | 68.00 | - | - | - | |
SVM | 66.6 | - | - | - | ||
ANN | 67.37 | - | - | - | ||
Mhealth | K-NN | 99.64 | - | - | 99.7 | [ ] |
ANN | 99.55 | - | - | 99.6 | ||
SVM | 99.89 | - | - | 100 | ||
C4.5 | 99.32 | - | - | 99.3 | ||
CART | 99.13 | - | - | 99.7 | ||
Random Forest | 99.89 | - | - | 99.89 | ||
Rotation Forest | 99.79 | - | - | 99.79 | ||
WISDM, SCUT_NA-A | Sliding window with variable size, S transform, and regularization based robust subspace (SRRS) for selection and SVM for Classification | 96.1 | - | - | - | [ ] |
SCUT NA-A | Sliding window with fixed samples, SVM like a classifier, cross-validation | 91.21 | - | - | - | |
PAMPA2, Mhealth | Sliding windows with fixed 2s, SVM, and Cross-validation | 84.10 | - | - | - | |
SBHAR | Sliding windows with fixed 4s, SVM, and Cross-validation | 93.4 | - | - | - | |
WISDM | MLP based on voting techniques with nb-Tree are used | 96.35 | - | - | - | |
UTD-MHAD | Feature level fusion approach& collaborative representation classifier | 79.1 | - | - | - | |
Groupware | Mark Hall’s feature selection and Decision Tree | 99.4 | - | - | - | |
Free-living | k-NN and Decision Tree | 95 | - | - | - | |
WISDM, Skoda | Hybrid Localizing learning (k-NN-LSS-VM) | 81 | - | - | - | |
UniMiB SHAR | LSTM and Deep Q-Learning | 95 | - | - | - | |
Groupware | Sliding windows Gaussian Linear Filter and NB classifier | 89.5 | - | - | - | |
Groupware | Sliding windows Gaussian Linear Filter and Decision Tree classifier | 99.99 | - | - | - | |
CSI-data | SVM | 96 | - | - | - | [ ] |
LSTM | 89 | - | - | - | ||
Built by the authors | IBK | 95 | - | - | - | [ ] |
Classifier based ensemble | 98 | - | - | - | ||
Bayesian network | 63 | - | - | - | ||
Built by the authors | Decision Tree | 91.08 | - | - | 89.75 | [ ] |
Random Forest | 91.25 | - | - | 90.02 | ||
Gradient Boosting | 97.59 | - | - | 97.4 | ||
KNN | 93.76 | - | - | 93.21 | ||
Naive Bayes | 88.57 | - | - | 88.07 | ||
SVM | 92.7 | - | - | 91.53 | ||
XGBoost | 96.93 | - | - | 96.63 | ||
UK-DALE | FFNN | 95.28 | - | - | - | [ ] |
SVM | 93.84 | - | - | - | ||
LSTM | 83.07 | - | - | - | ||
UCI Machine Learning | KNN | 90.74 | 91.15 | 90.28 | 90.45 | [ ] |
SVM | 96.27 | 96.43 | 96.14 | 96.23 | ||
HMM+SVM | 96.57 | 96.74 | 96.49 | 96.56 | ||
SVM+KNN | 96.71 | 96.75 | 96.69 | 96.71 | ||
Naive Bayes | 77.03 | 79.25 | 76.91 | 76.72 | ||
Logistic Reg | 95.93 | 96.13 | 95.84 | 95.92 | ||
Decision Tree | 87.34 | 87.39 | 86.95 | 86.99 | ||
Random Forest | 92.3 | 92.4 | 92.03 | 92.14 | ||
MLP | 95.25 | 95.49 | 95.13 | 95.25 | ||
DNN | 96.81 | 96.95 | 96.77 | 96.83 | ||
LSTM | 91.08 | 91.38 | 91.24 | 91.13 | ||
CNN+LSTM | 93.08 | 93.17 | 93.10 | 93.07 | ||
CNN+BiLSTM | 95.42 | 96.58 | 95.26 | 95.36 | ||
Inception+ResNet | 95.76 | 96.06 | 95.63 | 95.75 | ||
UCI Machine Learning | NB-NB | 73.68 | - | - | 46.9 | [ ] |
NB-KNN | 85.58 | - | - | 61.08 | ||
NB-DT | 89.93 | - | - | 69.75 | ||
NB-SVM | 79.97 | - | - | 53.69 | ||
KNN-NB | 74.93 | - | - | 45 | ||
KNN-KNN | 79.3 | - | - | 49.82 | ||
KNN-DT | 87.01 | - | - | 60.98 | ||
KNN-SVM | 82.24 | - | - | 53.1 | ||
DT-NB | 84.72 | - | - | 60.05 | ||
DT-KNN | 91.55 | - | - | 73.11 | ||
DT-DT | 92.73 | - | - | 75.97 | ||
DT-SVM | 93.23 | - | - | 77.35 | ||
SVM-NB | 30.40 | - | - | - | ||
SVM-KNN | 25.23 | - | - | - | ||
SVM-DT | 92.43 | - | - | 75.31 | ||
SVM-SVM | 43.32 | - | - | - | ||
CASAS Tulum | Back-Propagation | 88.75 | - | - | - | [ ] |
SVM | 87.42 | - | - | - | ||
DBM | 90.23 | - | - | - | ||
CASAS Twor | Back-Propagation | 76.9 | - | - | - | |
SVM | 73.52 | - | - | - | ||
DBM | 78.49 | - | - | - | ||
WISDM | KNN | 69 | 78 | - | 78 | [ ] |
LDA | 40 | 34 | - | 34 | ||
QDA | 65 | 58 | - | 58 | ||
RF | 90 | 91 | - | 91 | ||
DT | 77 | 77 | - | 77 | ||
CNN | 66 | 62 | - | 60 | ||
DAPHNET | KNN | 90 | 87 | - | 88 | |
LDA | 91 | 83 | - | 83 | ||
QDA | 91 | 82 | - | 82 | ||
RF | 91 | 91 | - | 91 | ||
DT | 91 | 83 | - | 83 | ||
CNN | 90 | 87 | - | 87 | ||
PAPAM | KNN | 65 | 66 | - | 66 | |
LDA | 45 | 45 | - | 45 | ||
QDA | 15 | 19 | - | 19 | ||
RF | 80 | 83 | - | 83 | ||
DT | 60 | 60 | - | 60 | ||
CNN | 73 | 76 | - | 73 | ||
HHAR(Phone) | KNN | 83 | 85 | - | 85 | |
LDA | 43 | 45 | - | 45 | ||
QDA | 40 | 50 | - | 50 | ||
RF | 88 | 89 | - | 89 | ||
DT | 67 | 66 | - | 66 | ||
CNN | 84 | 84 | - | 84 | ||
HHAR(watch) | KNN | 78 | 82 | - | 82 | |
LDA | 54 | 52 | - | 52 | ||
QDA | 26 | 27 | - | 27 | ||
RF | 85 | 85 | - | 85 | ||
DT | 69 | 69 | - | 69 | ||
CNN | 83 | 83 | - | 83 | ||
Mhealth | KNN | 76 | 81 | - | 81 | |
LDA | 38 | 59 | - | 59 | ||
QDA | 91 | 82 | - | 82 | ||
RF | 85 | 85 | - | 85 | ||
DT | 77 | 77 | - | 77 | ||
CNN | 80 | 80 | - | 80 | ||
RSSI | KNN | 91 | 91 | - | 91 | |
LDA | 91 | 91 | - | 91 | ||
QDA | 91 | 91 | - | 91 | ||
RF | 91 | 91 | - | 91 | ||
DT | 91 | 91 | - | 91 | ||
CNN | 91 | 90 | - | 91 | ||
CSI | KNN | 93 | 93 | - | 93 | |
LDA | 93 | 93 | - | 93 | ||
QDA | 92 | 92 | - | 92 | ||
RF | 93 | 93 | - | 93 | ||
DT | 93 | 93 | - | 93 | ||
CNN | 92 | 92 | - | 92 | ||
Casas Aruba | DT | 96.3 | 93.8 | 92.3 | 93 | [ ] |
SVM | 88.2 | 88.3 | 87.8 | 88.1 | ||
KNN | 89.2 | 87.8 | 85.9 | 86.8 | ||
AdaBoost | 98 | 96 | 95.9 | 95.9 | ||
DCNN | 95.6 | 93.9 | 95.3 | 94.6 | ||
SisFall | SVM | 97.77 | 76.17 | 75.6 | [ ] | |
Random Forest | 96.82 | 79.99 | 79.95 | |||
KNN | 96.71 | 93.99 | 68.36 | |||
CASAS Milan | Naive Bayes | 76.65 | [ ] | |||
HMM+SVM | 77.44 | |||||
CRF | 61.01 | |||||
LSTM | 93.42 | |||||
CASAS Cairo | Naive Bayes | 82.79 | ||||
HMM+SVM | 82.41 | |||||
CRF | 68.07 | |||||
LSTM | 83.75 | |||||
CASAS Kyoto 2 | Naive Bayes | 63.98 | ||||
HMM+SVM | 65.79 | |||||
CRF | 66.20 | |||||
LSTM | 69.76 | |||||
CASAS Kyoto 3 | Naive Bayes | 77.5 | ||||
HMM+SVM | 81.67 | |||||
CRF | 87.33 | |||||
LSTM | 88.71 | |||||
CASAS Kyoto 4 | Naive Bayes | 63.27 | ||||
HMM+SVM | 60.9 | |||||
CRF | 58.41 | |||||
LSTM | 85.57 |
Subasi [ 123 ] performed analysis on the Mhealth Dataset, applying techniques such as K-NN, ANN, SVM, C4.5, CART, Random Forest, and Rotation Forest, and obtained the best results with SVM and Random Forest at 99.89%. Maswadi [ 124 ] first prepared the data using sliding-window segmentation with a variable size on different datasets, such as WISDM with SCUT_NA-A, SCUT NA-A alone, PAMPA2 with Mhealth, SBHAR, WISDM, UTD-MHAD, Groupware, Free-living, WISDM with Skoda, UniMiB SHAR, and Groupware, showing the superiority of this technique by obtaining results above 80% accuracy. Other authors, such as Damodaran [ 125 ], applied SVM and LSTM to the CSI-Data Dataset, where the best results were obtained with SVM at 96%.
Other authors, such as Saha [ 126 ] and Das [ 127 ], define the characteristics and process for the construction of their own datasets, to which a set of techniques is applied; it should be noted that both authors show that support vector machines are efficient in classifying human activities. Franco [ 128 ] uses techniques such as FFNN, SVM, and LSTM on the UK-DALE Dataset, showing the effectiveness of FFNN with 95.28% accuracy in the quality metrics.
Bozkurt [ 129 ] and Wang [ 130 ] carried out supervised learning implementations on the UCI HAR Dataset with various combined supervised techniques; Bozkurt describes that using SVM + KNN obtains good classification results with an accuracy of 96.71%, and Wang explains that a Decision Tree combination achieves an accuracy of 92.73%. Outreach [ 131 ] performs analysis on two datasets of the CASAS set, Tulum and Twor, highlighting the use of Back-Propagation with accuracies of 88.75% and 76.9%, respectively.
Demrozi [ 132 ] performs multiple experiments with many supervised techniques on widely known datasets such as WISDM, DAPHNET, PAMAP2, HHAR (Phone), HHAR (Watch), Mhealth, RSSI, and CSI, implementing algorithms such as KNN, LDA, QDA, RF, DT, and CNN. On WISDM, DAPHNET, PAMAP2, HHAR (Phone), and HHAR (Watch), the RF algorithm obtains the best results, with accuracies of 90%, 91%, 80%, 88%, and 85%, and precision and recall of 91%, 91%, 83%, 89%, and 85%. For Mhealth and RSSI, the QDA algorithm stands out, with 91% and 92% accuracy and 85% and 92% precision and recall, respectively.
Xu [ 133 ] compares techniques such as DT, SVM, KNN, AdaBoost, and DCNN on the CASAS Aruba Dataset, showing the superiority of ensemble techniques such as AdaBoost, with an accuracy of 98%, precision of 96%, recall of 95.9%, and F-measure of 95.9%. Other authors, such as Hussain [ 134 ], apply algorithms such as SVM, Random Forest, and KNN to datasets such as SisFall, with SVM performing best at 97.77% accuracy. Finally, Liciotti [ 135 ] experiments on a set of well-known CASAS project datasets (Milan, Cairo, Kyoto 2, Kyoto 3, Kyoto 4) with algorithms such as Naive Bayes, HMM + SVM, CRF, and LSTM, showing the superiority of LSTM in the results.
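To make the kind of supervised comparison reported in these works concrete, the sketch below trains three of the recurring classifiers (SVM, Random Forest, KNN) and scores them on a held-out split. This is a minimal sketch assuming scikit-learn is available; the data are synthetic stand-ins for segmented sensor features, not any of the cited datasets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for windowed sensor features with 6 activity classes.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

scores = {}
for name, clf in {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}.items():
    clf.fit(X_tr, y_tr)
    scores[name] = clf.score(X_te, y_te)  # hold-out accuracy per classifier
```

The same loop structure extends to the precision, recall, and F-measure columns of Table 3 via `sklearn.metrics.classification_report`.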
In the unsupervised learning applications found in the literature, different algorithms can be observed, measured with clustering quality metrics such as ARI, the Jaccard index, the silhouette index, Euclidean distance, and F1 Fisher’s discriminant ratio (see Table 4 ). The following works stand out. Wang [ 130 ] uses various versions of the UCI-HAR Dataset, implementing algorithms such as K-means, HAC, and FCM, with FCM showing the best results. Mohmed [ 136 ] applies unsupervised algorithms such as FCM to the Nottingham Trent University Dataset. Brena [ 137 ] applies a method developed by the author, called PM Model, to perform unsupervised analysis of the Chest Sensor, Wrist Sensor, WISDM, and Smartphone datasets, which he measures using the silhouette index. He [ 138 ] applies another method developed by the authors, called the wavelet tensor fuzzy clustering scheme (WTFCS), to the DSAD Dataset, obtaining an ARI of 89.66%.
Unsupervised Techniques results.
Dataset | Technique | Metrics | References | ||||
---|---|---|---|---|---|---|---|
ARI | Jaccard Index | Silhouette Index | Euclidean | F1 Fisher’s Discriminant Ratio | |||
UCI HAR SmartPhone | K-means | 0.7727 | 0.3246 | 0.4416 | [ ] | ||
HAC | 0.4213 | 0.2224 | 0.5675 | ||||
FCM | 0.8343 | 0.4052 | 0.4281 | ||||
UCI HAR Single Chest-Mounted Accelerometer | K-means | 0.8850 | 0.6544 | 0.6935 | |||
HAC | 0.5996 | 0.2563 | 0.6851 | ||||
FCM | 0.9189 | 0.7230 | 0.7751 | ||||
Nottingham Trent University | FCM | - | - | - | - | [ ] | |
Chest Sensor Dataset | PM Model | 25.8% | - | [ ] | |||
Wrist Sensor Dataset | 64.3% | - | |||||
WISDM Dataset | 54% | - | |||||
Smartphone Dataset | 85% | - | |||||
DSAD | wavelet tensor fuzzy clustering scheme (WTFCS) | 0.8966 | - | - | - | [ ] | |
UCI HAR | Spectral Clustering | 0.543 | 0.583 | [ ] | |||
Single Linkage | 0.807 | 0.851 | |||||
Ward Linkage | 0.770 | 0.810 | |||||
Average Linkage | 0.790 | 0.871 | |||||
K-medoids | 0.653 | 0.654 |||||
UCI HAR | K-means | 52.1 | [ ] | ||||
K-Means 5 | 50.7 | ||||||
Spectral Clustering | 57.8 | ||||||
Gaussian Mixture | 49.8 | ||||||
DBSCAN | 16.4 | ||||||
CADL | K-means | 50.9 | |||||
K-Means 5 | 50.5 | ||||||
Spectral Clustering | 61.9 | ||||||
Gaussian Mixture | 58.9 | ||||||
DBSCAN | 13.9 |
Wang [ 139 ] implements clustering-based algorithms such as Spectral Clustering, Single Linkage, Ward Linkage, Average Linkage, and K-medoids on the UCI-HAR dataset, analyzing their Jaccard and Euclidean indices as shown in Table 4 . Similarly, Bota [ 140 ] experiments on the UCI-HAR and CADL datasets with the K-means, K-Means 5, Spectral Clustering, Gaussian Mixture, and DBSCAN algorithms, analyzing their F1 Fisher’s discriminant ratio.
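The clustering evaluations summarized in Table 4 can be reproduced in miniature: run two or three clusterers and score each with an external metric (ARI, against ground-truth activity labels) and an internal one (silhouette, from the data alone). A minimal sketch assuming scikit-learn, on synthetic blobs rather than the cited datasets:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic stand-in for activity feature windows: 4 "activities".
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=1.0,
                       random_state=0)

results = {}
for name, algo in {
    "K-means": KMeans(n_clusters=4, n_init=10, random_state=0),
    "Ward Linkage": AgglomerativeClustering(n_clusters=4, linkage="ward"),
    "Average Linkage": AgglomerativeClustering(n_clusters=4, linkage="average"),
}.items():
    labels = algo.fit_predict(X)
    results[name] = {
        "ARI": adjusted_rand_score(y_true, labels),  # agreement with labels
        "Silhouette": silhouette_score(X, labels),   # cohesion vs. separation
    }
```

ARI requires ground-truth labels and so measures recovery of the annotated activities, while the silhouette index needs none, which is why both columns appear side by side in Table 4.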
In approaches based on ensemble learning, multiple techniques are usually applied together to obtain better results (see Table 5 ). Below is a detailed description of the works found in the literature review that apply these techniques to the recognition of human activities. Yacchirema [ 141 ] uses a combination of techniques such as Decision Tree, Ensemble, Logistic Regression, and Deepnet to analyze the SisFall Dataset, reporting the results of the DeepNet algorithm with an accuracy of 99.06%. For his part, Manzi [ 142 ] uses a mixture of X-means and SVM to analyze the Cornell Activity Dataset and the TST Dataset, obtaining 98.4% and 92.7%, respectively.
Ensembled Learning Techniques results.
Dataset | Technique | Metrics | References | |||
---|---|---|---|---|---|---|
Accuracy | Precision | Recall | F-Measure | |||
SisFall | Decision Tree | 97.48 | - | - | - | [ ] |
Ensemble | 99.51 | - | - | - | ||
Logistic Regression | 84.87 | - | - | - | ||
Deepnet | 99.06 | - | - | - | ||
Cornell Activity Dataset | X-means-SVM | 98.4 | 95.0 | 95.8 | - | [ ] |
TST Dataset | 92.7 | 95.6 | 91.1 | - | ||
HHAR | Multi-task deep clustering | 67.2 | 65.3 | 65.9 | [ ] | |
MobiAct | 68.3 | 69.1 | 66.8 | |||
MobiSense | 72.5 | 71.2 | 70.7 | |||
NTU-RGB + D | K-Means | 85.72 | - | - | - | [ ] |
GMM | 87.26 | - | - | - | ||
UCI HAR | CELearning | 96.88% | - | - | - | [ ] |
UCI HAR | RF | 96.96 | 97.0 | 97.0 | 98 | [ ] |
XGB | 96.2 | 96 | 96 | 96 | ||
AdaB | 50.5 | 61 | 51 | 51 | ||
GB | 94.53 | 95 | 95 | 95 | ||
ANN | 92.51 | 92 | 93 | 92 | ||
V. RNN | 90.53 | 90 | 91 | 90 | ||
LSTM | 91.23 | 90 | 91 | 90 | ||
DT | 94.23 | 95 | 95 | 95 | ||
KNN | 96.59 | 97 | 97 | 97 | ||
NB | 80.67 | 84 | 81 | 81 | ||
Proposed Dataset | GB | 84.1 | 84.1 | 84.2 | 84.1 | [ ] |
RFs | 83.9 | 83.9 | 84.1 | 83.9 | ||
Bagging | 83 | 83 | 83.1 | 83 | ||
XGB | 80.4 | 80.5 | 80.4 | 80.4 | ||
AdaBoost | 77.2 | 77.3 | 77.3 | 77.3 | ||
DT | 76.9 | 77 | 77 | 77 | ||
MLP | 67.6 | 68.7 | 67.8 | 67.8 | ||
LSVM | 65 | 65.7 | 65.1 | 64.9 | ||
NLSVM | 63 | 63.3 | 63.2 | 62.8 | ||
LR | 59.6 | 60.2 | 59.8 | 59.4 | ||
KNNs | 58.9 | 60.1 | 59.2 | 58.9 | ||
GNB | 56.1 | 59.4 | 55.4 | 45.2 | ||
House A | Bernoulli NB | 78.7 | 64 | - | - | [ ] |
Decision Tree | 88 | 79.4 | - | - | ||
Logistic Regression | 81.4 | 69.2 | - | - | ||
KNN | 75.8 | 64.9 | - | |||
House B | Bernoulli NB | 95.9 | 79.4 | - | ||
Decision Tree | 97.2 | 86.4 | - | |||
Logistic Regression | 96.5 | 82.7 | - | |||
KNN | 93.1 | 79.8 | - | |||
UCI HAR | SVM-AdaBoost | 99.9 | 99.9 | [ ] | ||
k-NN-AdaBoost | 99.43 | 99.4 | ||||
ANN-AdaBoost | 99.33 | 99.33 | ||||
NB-AdaBoost | 97.24 | 97.2 | ||||
RF-AdaBoost | 99.98 | 100 | ||||
CART-AdaBoost | 99.97 | 100 | ||||
C4.5-AdaBoost | 99.95 | 100 | ||||
REPTree-AdaBoost | 99.95 | 100 | ||||
LADTree-AdaBoost | 98.84 | 98.8 | ||||
HAR Dataset | KNN | 90.3 | [ ] | |||
CART | 84.9 | |||||
BAYES | 77 | |||||
RF | 92.7 | |||||
HAPT Dataset | KNN | 89.2 | ||||
CART | 80.2 | |||||
BAYES | 74.7 | |||||
RF | 91 | |||||
ET | 91.7 | |||||
Proposed Method | 92.6 |
Ma [ 143 ] uses a model based on multi-task deep clustering on the HHAR, MobiAct, and MobiSense datasets, where on the latter the algorithm obtains an accuracy of 72.5%, a precision of 71.2%, and a recall of 70.7%. Budisteanu [ 144 ] describes the NTU-RGB + D Dataset and implements the K-Means and GMM algorithms, obtaining 85.72% and 87.26%, respectively. Xu [ 145 ] uses the well-known UCI-HAR Dataset, implementing the authors’ own CELearning technique and obtaining an accuracy of 96.88%.
Choudhury [ 146 ] also analyzes the UCI-HAR Dataset with the algorithms RF, XGB, AdaB, GB, ANN, V. RNN, LSTM, DT, KNN, and NB, where RF performs best among the ensemble models with 96.96%. Wang [ 147 ], for his part, defines his own dataset, to which he applies the algorithms GB, RFs, Bagging, XGB, AdaBoost, DT, MLP, LSVM, NLSVM, LR, KNNs, and GNB, with the RF algorithm obtaining strong results at 83.9% accuracy. Jethanandani [ 148 ] works with the popular House A and House B datasets, applying algorithms such as Bernoulli NB, Decision Tree, Logistic Regression, and KNN; this experimentation shows the good results of the decision-tree-based algorithms, with 88% and 97.2%, respectively.
Subasi [ 149 ] also uses the UCI-HAR Dataset, applying SVM-AdaBoost, k-NN-AdaBoost, ANN-AdaBoost, NB-AdaBoost, RF-AdaBoost, CART-AdaBoost, C4.5-AdaBoost, REPTree-AdaBoost, and LADTree-AdaBoost, obtaining the best results with the REPTree-AdaBoost combination at 99.95% accuracy. Padmaja [ 150 ] uses the HAR and HAPT datasets, implementing the KNN, CART, BAYES, RF, and ET algorithms together with a method proposed by the authors, demonstrating the superiority of the proposed method.
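The X-AdaBoost combinations above all share one mechanism: boosting re-weights training samples so each successive base learner focuses on the previous ones' mistakes. A minimal sketch assuming scikit-learn, using its default base learner (a depth-1 decision tree) on a synthetic binary task rather than the cited datasets:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary task standing in for, e.g., fall / no-fall detection.
X, y = make_classification(n_samples=500, n_features=15, n_informative=8,
                           random_state=0)

# AdaBoost's default base learner is a decision stump; the cited works swap
# in stronger bases such as SVM, C4.5, or REPTree for the same boosting loop.
model = AdaBoostClassifier(n_estimators=100, random_state=0)
cv_acc = cross_val_score(model, X, y, cv=5).mean()  # 5-fold CV accuracy
```

Swapping the base estimator (as in SVM-AdaBoost or RF-AdaBoost) changes only the weak learner, not the sample re-weighting scheme that does the ensembling.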
Implementations based on Deep Learning have become very useful for the identification of activities of daily living, especially those that include image processing [ 151 , 152 ] (see Table 6 ). Some relevant results of the literature review are detailed below. Wan [ 153 ] makes use of the UCI-HAR and PAMAP2 datasets, implementing algorithms such as CNN, LSTM, BLSTM, MLP, and SVM, with the CNN implementation showing good results at 92.71% and 91%, respectively. Akula [ 154 ] built their own dataset, to which they applied the LBP-Naive Bayes, HOG-Naive Bayes, LBP-KNN, HOG-KNN, LBP-SVM, and HOF-SVM algorithms, obtaining the best results with HOF-SVM at 85.92% accuracy.
Deep Learning Techniques results.
Dataset | Technique | Metrics | References | |||
---|---|---|---|---|---|---|
Accuracy | Precision | Recall | F-Measure | |||
Uci Har | CNN | 92.71 | 93.21 | 92.82 | 92.93 | [ ] |
LSTM | 89.01 | 89.14 | 88.99 | 88.99 | ||
BLSTM | 89.4 | 89.41 | 89.36 | 89.35 | ||
MLP | 86.83 | 86.83 | 86.58 | 86.61 | ||
SVM | 89.85 | 90.5 | 89.86 | 89.85 | ||
PAMAP2 | CNN | 91.00 | 91.66 | 90.86 | 91.16 | |
LSTM | 85.86 | 86.51 | 84.67 | 85.34 | ||
BLSTM | 89.52 | 90.19 | 89.02 | 89.4 | ||
MLP | 82.07 | 83.35 | 82.17 | 82.46 | ||
SVM | 84.07 | 84.71 | 84.23 | 83.76 | ||
Propio Infrared Images | LBP-Naive Bayes | 42.1 | - | - | - | [ ] |
HOG-Naive Bayes | 77.01 | - | - | - | ||
LBP-KNN | 53.261 | - | - | - | ||
HOG-KNN | 83.541 | - | - | - | ||
LBP-SVM | 62.34 | - | - | - | ||
HOF-SVM | 85.92 | - | - | - | ||
Uci Har | DeepConvLSTM | 94.77 | - | - | - | [ ] |
CNN | 92.76 | - | - | - | ||
Weakly Dataset | DeepConvLSTM | 92.31 | - | - | - | |
CNN | 85.17 | - | - | - | ||
Opportunity | HC | 85.69 | - | - | - | [ ] |
CBH | 84.66 | - | - | - | ||
CBS | 85.39 | - | - | - | ||
AE | 83.39 | - | - | - | ||
MLP | 86.65 | - | - | - | ||
CNN | 87.62 | - | - | - | ||
LSTM | 86.21 | - | - | - | ||
Hybrid | 87.67 | - | - | - | ||
ResNet | 87.67 | - | - | - | ||
ARN | 90.29 | - | - | - | ||
UniMiB-SAHR | HC | 21.96 | - | - | - | |
CBH | 64.36 | - | - | - | ||
CBS | 67.36 | - | - | - | ||
AE | 68.39 | - | - | - | ||
MLP | 74.82 | - | - | - | ||
CNN | 73.36 | - | - | - | ||
LSTM | 68.81 | - | - | - | ||
Hybrid | 72.26 | - | - | - | ||
ResNet | 75.26 | - | - | - | ||
ARN | 76.39 | - | - | - | ||
Uci Har | KNN | 90.74 | 91.15 | 90.28 | 90.48 | [ ] |
SVM | 96.27 | 96.43 | 96.14 | 96.23 | ||
HMM+SVM | 96.57 | 96.74 | 96.49 | 96.56 | ||
SVM+KNN | 96.71 | 96.75 | 96.69 | 96.71 | ||
Naive Bayes | 77.03 | 79.25 | 76.91 | 76.72 | ||
Logistic Regression | 95.93 | 96.13 | 95.84 | 95.92 | ||
Decision Tree | 87.34 | 87.39 | 86.95 | 86.99 | ||
Random Forest | 92.30 | 92.4 | 92.03 | 92.14 | ||
MLP | 95.25 | 95.49 | 95.13 | 95.25 | ||
DNN | 96.81 | 96.95 | 96.77 | 96.83 | ||
LSTM | 91.08 | 91.38 | 91.24 | 91.13 | ||
CNN+LSTM | 93.08 | 93.17 | 93.10 | 93.07 | ||
CNN+BiLSTM | 95.42 | 95.58 | 95.26 | 95.36 | ||
Inception+ResNet | 95.76 | 96.06 | 95.63 | 95.75 | ||
Utwente Dataset | Naive Bayes | - | - | - | 94.7 | [ ] |
SVM | - | - | - | 91.6 | ||
Deep Stacked Autoencoder | - | - | - | 97.6 | ||
CNN-BiGRu | - | - | - | 97.8 | ||
PAMAP2 | DeepCOnvTCN | - | - | - | 81.8 | |
InceptionTime | - | - | - | 81.1 | ||
CNN-BiGRu | - | - | - | 85.5 | ||
FrailSafe dataset | CNN | 91.84 | - | - | - | [ ] |
CASAS Milan | LSTM | 76.65 | - | - | - | [ ] |
Bi-LSTM | 77.44 | - | - | - | ||
Casc-LSTM | 61.01 | - | - | - | ||
ENs2-LSTM | 93.42 | - | - | - | ||
CASAS Cairo | LSTM | 82.79 | - | - | - | |
Bi-LSTM | 82.41 | - | - | - | ||
Casc-LSTM | 68.07 | - | - | - | ||
ENs2-LSTM | 83.75 | - | - | - | ||
CASAS Kyoto 2 | LSTM | 63.98 | - | - | - | |
Bi-LSTM | 65.79 | - | - | - | ||
Casc-LSTM | 66.20 | - | - | - | ||
ENs2-LSTM | 69.76 | - | - | - | ||
CASAS Kyoto 3 | LSTM | 77.5 | - | - | - | |
Bi-LSTM | 81.67 | - | - | - | ||
Casc-LSTM | 87.33 | - | - | - | ||
ENs2-LSTM | 88.71 | - | - | - | ||
Proposal | ANN | 89.06 | - | - | - | [ ] |
SVM | 94.12 | - | - | - | ||
DBN | 95.85 | - | - | - |
He [ 155 ] implements DeepConvLSTM and CNN on the UCI-HAR and Weakly datasets, showing good results for the deep learning implementation with 94.77% and 92.31%, respectively. Long [ 156 ], in turn, uses the Opportunity and UniMiB-SAHR datasets with the HC, CBH, CBS, AE, MLP, CNN, LSTM, Hybrid, ResNet, and ARN algorithms, with ARN reaching 90.29% and 76.39%. Bozkurt [ 157 ], for his part, analyzes only the UCI-HAR Dataset with KNN, SVM, HMM + SVM, SVM + KNN, Naive Bayes, Logistic Regression, Decision Tree, Random Forest, MLP, DNN, LSTM, CNN + LSTM, CNN + BiLSTM, and Inception + ResNet, with the DNN algorithm showing the best accuracy at 96.81%.
Mekruksavanich [ 158 ] uses the Utwente Dataset and PAMAP2, applying the Naive Bayes, SVM, Deep Stacked Autoencoder, and CNN-BiGRU techniques, with the last technique showing the best results. Papagiannaki [ 159 ] used the FrailSafe dataset with an implementation of CNN networks reaching an accuracy of 91.84%. Liciotti [ 139 ] uses techniques such as LSTM, Bi-LSTM, Casc-LSTM, and ENs2-LSTM on the CASAS group of datasets to show the dynamics of processes based on deep learning. Hassan [ 160 ] applied ANN, SVM, and DBN to a proposed dataset for the development of a robust human activity recognition system based on smartphone sensor data, obtaining the following accuracy results: ANN 89.06%, SVM 94.12%, and DBN 95.85%.
Currently, there is a new trend toward reinforcement-learning-based processes, in which systems are capable of learning by themselves from punishment and reward schemes defined by behavioral psychology. This line of work has recently been introduced for HAR, and this review identifies several highly relevant works (see Table 7 ). Berlin [ 161 ] implemented a Spiking Neural Network on the Weizmann and KTH datasets, showing promising results of 94.44% and 92.50%. Lu [ 162 ] used the DoMSEV Dataset with the Deep-Shallow algorithm, reaching an accuracy of 72.9%, and Hossain [ 163 ] proposed a new dataset to which they applied the Deep Q-Network algorithm, obtaining an accuracy of 83.26%.
Reinforcement Learning Techniques results.
Dataset | Technique | Metrics | References |
---|---|---|---|
Accuracy | |||
Weizmann datasets | Spiking Neural Network | 94.44 | [ ] |
KTH datasets | 92.50 | ||
DoMSEV | Deep-Shallow | 72.9 | [ ] |
Proposal | Deep Q-Network (DQN) | 83.26 | [ ] |
S.Yousefi-2017 | Reinforcement Learning Agent Recurrent Neural Network with Long Short-Term Memory | 80 | [ ] |
FallDeFi | 83 | ||
UCI HAR | Reinforcement Learning + DeepConvLSTM | 98.36 | [ ] |
Proposal | 79 | [ ] | |
UCF-Sports | Q-learning | 95 | [ ] |
UCF-101 | 85 | ||
sub-JHMDB | 80 | ||
MHEALTH | Cluster-Q learning | 94.5 | [ ] |
PAMAP2 | 83.42 | ||
UCI HAR | 81.32 | ||
MARS | 85.92 | ||
DataEgo | LRCN | 88 | [ ] |
Proposal | Mask Algorithm | 96.02 | [ ] |
Proposal | LSTM-Reinforcement Learning | 90.50 | [ ] |
Proposal | Convolutional Autoencoder | 87.7 | [ ] |
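Several entries in Table 7 build on tabular Q-learning, where an agent iterates the update Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)] from reward signals alone. The toy environment below (a five-state corridor, not any dataset from the table) is an assumed minimal illustration of that loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state corridor: start at state 0, reward +1 for reaching state 4.
# Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward, s_next == GOAL

Q = np.zeros((N_STATES, 2))
for _ in range(2000):
    s, done = 0, False
    while not done:
        # Epsilon-greedy exploration: occasionally try a random action.
        a = rng.integers(2) if rng.random() < EPSILON else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Q-learning update: bootstrap from the greedy value of s_next.
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next

greedy_policy = np.argmax(Q[:GOAL], axis=1)  # learned policy: move right
```

Deep Q-Networks such as the one in [ 163 ] replace the Q table with a neural network so the same update scales to high-dimensional sensor states.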
In the review of the state of the art, it was possible to identify different metaheuristic techniques that support the optimization of different algorithms. Among the most evident results are applications of Genetic Algorithms, with accuracies of 96.43% [ 171 ], 87.5% [ 172 ], 95.71% [ 173 ], 99.75% [ 174 ], 98.00% [ 175 ], and 98.96% [ 175 ]. In many solutions, hybrid systems or new algorithms proposed by the authors are used; see Table 8 .
Metaheuristic Learning Techniques results.
Dataset | Technique | Metrics | References |
---|---|---|---|
Accuracy | |||
Cifar-100 | L4-Banched-ActionNet + EntACS + Cub-CVM | 98.00 | [ ] |
Sbharpt | Ant-Colony, NB | 98.96 | [ ] |
Ucihar | Bee swarm optimization with a deep Q-network | 98.41 | [ ] |
Motionsense | Binary Grey Wolf Optimization | 93.95 | [ ] |
Mhealth | 96.83 | ||
Uci Har | Genetic Algorithms-SVM | 96.43 | [ ] |
Ucf50 | Genetic Algorithms-CNN | 87.5 | [ ] |
Sbhar | GA-PCA | 95.71 | [ ]
Mnist | GA-CNN | 99.75 | [ ] |
Cifar-100 | Genetic Algorithms-SVM | 98.00 | [ ] |
Sbharpt | Genetic Algorithms-CNN | 98.96 | [ ] |
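Hybrids such as GA-PCA and GA-SVM in Table 8 typically use a genetic algorithm to search a binary feature-selection mask that maximizes a classifier's accuracy. The sketch below is an assumed minimal version of that loop, with synthetic "sensor features" and a nearest-class-centroid classifier standing in for SVM or PCA pipelines:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic features: only the first 3 of 10 columns are informative.
n, d = 300, 10
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, :3] += y[:, None] * 2.0  # informative columns shift with the class

def fitness(mask):
    """Training accuracy of a nearest-class-centroid classifier
    restricted to the features selected by the binary mask."""
    if mask.sum() == 0:
        return 0.0
    Xm = X[:, mask.astype(bool)]
    c0, c1 = Xm[y == 0].mean(axis=0), Xm[y == 1].mean(axis=0)
    pred = np.linalg.norm(Xm - c1, axis=1) < np.linalg.norm(Xm - c0, axis=1)
    return float(np.mean(pred.astype(int) == y))

pop = rng.integers(0, 2, size=(20, d))  # population of binary masks
for _ in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    # Tournament selection: keep the fitter of two random individuals.
    idx = [max(rng.choice(20, 2, replace=False), key=lambda i: scores[i])
           for _ in range(20)]
    parents = pop[idx]
    # One-point crossover between consecutive parents, then bit-flip mutation.
    children = parents.copy()
    for i in range(0, 20, 2):
        cut = rng.integers(1, d)
        children[i, cut:] = parents[i + 1, cut:]
        children[i + 1, cut:] = parents[i, cut:]
    flip = rng.random(children.shape) < 0.05
    children[flip] = 1 - children[flip]
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
```

The cited works plug a CNN or SVM into `fitness` in place of the centroid classifier; the selection-crossover-mutation skeleton is unchanged.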
Transfer Learning (TL) transfers the parameters of a learned, trained model to a new model to aid the training of the new model. Considering that most data and tasks are related, through transfer learning the learned model parameters can be shared with the new model in order to speed up and optimize its learning efficiency. The basic motivation of TL is to try to apply the knowledge gained from one problem to a different but related problem; see Table 9 .
Transfer Learning Techniques results.
Dataset | Technique | Metrics | References | |||
---|---|---|---|---|---|---|
Accuracy | Precision | Recall | F-Measure | |||
CSI | KNN | 98.3 | - | - | - | [ ] |
SVM | 98.3 | - | - | - | ||
CNN | 99.2 | - | - | - | ||
Opportunity | KNN+PCA | 60 | - | - | - | [ ] |
GFK | 59 | - | - | - | ||
STL | 65 | - | - | - | ||
SA-GAN | 73 | - | - | - | ||
USC-HAD | MMD | 80 | - | - | - | [ ] |
DANN | 77 | - | - | - | ||
WD | 72 | - | - | - | ||
Proposal | KNN-OS | 79.84 | 85.84 | 91.88 | 88.61 | [ ] |
KNN-SS | 89.64 | 94.41 | 94.76 | 94.52 | ||
SVM-OS | 77.14 | 97.04 | 79.23 | 87.09 | ||
SVM-SS | 87.5 | 94.39 | 92.61 | 93.27 | ||
DT-OS | 87.5 | 94.61 | 92.16 | 93.14 | ||
DT-SS | 91.79 | 95.19 | 96.26 | 95.71 | ||
JDA | 86.79 | 92.71 | 93.07 | 92.89 | ||
BDA | 91.43 | 95.9 | 95.18 | 95.51 | ||
IPL-JPDA | 93.21 | 97.04 | 95.97 | 96.48 | ||
KNN-OS | 79.84 | 85.84 | 91.88 | 88.61 | ||
Wiezmann Dataset | VGG-16 MODEL | 96.95 | 97.00 | 97.00 | 97.00 | [ ] |
VGG-19 MODEL | 96.54 | 97.00 | 97.00 | 96.00 | ||
Inception-v3 Model | 95.63 | 96.00 | 96.00 | 96.00 | ||
PAMAP2 | DeepConvLSTM | - | - | - | 93.2 | [ ] |
Skoda Mini Checkpoint | - | - | - | 93 | ||
Opportunity | PCA | 66.78 | - | - | - | [ ] |
TCA | 68.43 | - | - | - | ||
GFK | 70.87 | - | - | - | ||
TKL | 70.21 | - | - | - | ||
STL | 73.22 | - | - | - | ||
TNNAR | 78.4 | - | - | - | ||
PAMAP2 | PCA | 42.87 | - | - | - | |
TCA | 47.21 | - | - | - | ||
GFK | 48.09 | - | - | - | ||
TKL | 43.32 | - | - | - | ||
STL | 51.22 | - | - | - | ||
TNNAR | 55.48 | - | - | - | ||
UCI DSADS | PCA | 71.24 | - | - | - | |
TCA | 73.47 | - | - | - | ||
GFK | 81.23 | - | - | - | ||
TKL | 74.26 | - | - | - | ||
STL | 83.76 | - | - | - | ||
TNNAR | 87.41 | - | - | - | ||
UCI HAR | CNN-LSTM | 90.8 | - | - | - | [ ] |
DT | 76.73 | . | - | - | [ ] | |
RF | 71.96 | - | - | - | ||
TB | 75.65 | - | - | - | ||
TransAct | 86.49 | - | - | - | ||
Mhealth | DT | 48.02 | - | - | - | |
RF | 62.25 | - | - | - | ||
TB | 66.48 | - | - | - | ||
TransAct | 77.43 | - | - | - | ||
Daily Sport | DT | 66.67 | . | . | . | |
RF | 70.38 | . | . | . | ||
TB | 72.86 | . | - | - | ||
TransAct | 80.83 | - | - | - | ||
Proposal | Without SVD (Singular Value Decomposition) | 63.13% | - | - | - | [ ] |
With SVD (Singular Value Decomposition) | 43.13% | - | - | - | ||
Transfer Accuracy | 97.5% | - | - | - | ||
PAMAP2 | CNN | 84.89 | - | - | - | [ ] |
UCI HAR | 83.16 | - | - | - | ||
UCI HAR | kNN | 77.28 | - | - | - | [ ] |
DT | 72.16 | - | - | - | ||
DA | 77.46 | - | - | - | ||
NB | 69.93 | - | - | - | ||
Transfer Accuracy | 83.7 | - | - | - | ||
UCF Sports Action dataset | VGGNet-19 | 97.13 | - | - | - | [ ] |
AMASS | DeepConvLSTM | 87.46 | - | - | - | [ ] |
DIP | 89.08 | - | - | - | ||
DAR Dataset | Base CNN | 85.38 | - | - | - | [ ] |
AugToAc | 91.38 | - | - | - | ||
HDCNN | 86.85 | - | - | - | ||
DDC | 86.67 | - | - | - | ||
UCI HAR | CNN_LSTM | 92.13 | - | - | - | [ ] |
CNN_LSTM_SENSE | 91.55 | - | - | - | ||
LSTM | 91.28 | - | - | - | ||
LSTM_DENSE | 91.40 | - | - | - | ||
ISPL | CNN_LSTM | 99.06 | - | - | - | |
CNN_LSTM_SENSE | 98.43 | - | - | - | ||
LSTM | 96.23 | - | - | - | ||
LSTM_DENSE | 98.11 | - | - | - |
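The parameter-transfer idea behind Table 9 can be shown with a deliberately tiny model: pretrain on a data-rich source task, then use those parameters as the starting point for brief fine-tuning on a related target task with few labels. This is a NumPy-only sketch under assumed synthetic Gaussian tasks, not a reproduction of any cited method:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_task(n, shift=0.0):
    """Two Gaussian classes per task; `shift` nudges the class means
    to emulate a related-but-different target domain."""
    X0 = rng.normal(loc=-1.0 + shift, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=+1.0 + shift, scale=1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

def train_logreg(X, y, w=None, b=0.0, lr=0.1, epochs=200):
    """Gradient-descent logistic regression; `w`/`b` allow warm starts."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(X, y, w, b):
    return float(np.mean(((X @ w + b) > 0).astype(int) == y))

# Source task: plenty of labelled data.
Xs, ys = make_task(500)
w_src, b_src = train_logreg(Xs, ys)

# Target task: related distribution, very few labels.
Xt, yt = make_task(10, shift=0.3)
Xt_test, yt_test = make_task(200, shift=0.3)

# Transfer: initialise from the source parameters, then fine-tune briefly.
w_tl, b_tl = train_logreg(Xt, yt, w=w_src.copy(), b=b_src, epochs=20)
acc_tl = accuracy(Xt_test, yt_test, w_tl, b_tl)
```

The deep variants in Table 9 (e.g., VGG or DeepConvLSTM backbones) apply the same warm-start idea layer-wise, usually freezing early layers and fine-tuning only the final ones.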
The objective of this systematic literature review is to provide HAR researchers with a set of recommendations, highlighting the different datasets that can be used depending on the type of research. For the development of this analysis, different data sources were considered in an observation window between the years 2017 and 2021. Among the most representative databases, IEEE Xplore stands out with 256 articles, far surpassing other specialized databases such as Scopus, Science Direct, Web of Science, and ACM.
It is important to specify that 47% of the publications correspond to proceedings of congresses or conferences and 36% to specialized journals. Discriminating by the quartiles in which the articles are published, although the majority of publications are conference proceedings without a specific category, the 36% of publications that appeared in journals are mostly in the first two quartiles, Q1 and Q2.
In this article, a technical analysis of the different types of datasets used for HAR experimentation was carried out. It should be noted that the creation of new datasets has increased. Some traditional approaches remain, such as the use of indoor datasets based on the WSU CASAS project. Likewise, public repositories such as UCI Machine Learning have provided sets widely used in the literature, such as Opportunity and UCI HAR. The use of image and video datasets has also increased, allowing the application of different cutting-edge techniques, as with the Weakly and UniMiB-SAHR datasets.
In this review, the different data processing approaches used in this area of knowledge were examined. For the specific case of supervised learning, the usability of algorithms such as Random Forest (based on decision trees), Naive Bayes, and Support Vector Machines stands out. Regarding unsupervised learning, most of the analyzed works use techniques such as Spectral Clustering, Single Linkage, Ward Linkage, Average Linkage, and K-medoids. With ensemble learning, it was possible to demonstrate the use of different sets of techniques that improved experimental results, among which those based on classification and clustering can be highlighted. Another modern and widely used approach is Deep Learning, focused on datasets with massive image processing requirements, where the use of LSTM variants stands out: Bi-LSTM, Casc-LSTM, and ENs2-LSTM. Other approaches based on Reinforcement Learning use resources such as Q-learning and Cluster-Q learning in their experimentation processes. The metaheuristic-based approach shows the usability of different algorithms, among which the following stand out: L4-Banched-ActionNet + EntACS + Cub-CVM, Ant-Colony with NB, Bee swarm optimization with a deep Q-network, and Genetic Algorithms.
It is important to point out that, due to the high demand for data and information processing, it becomes increasingly necessary to implement techniques capable of improving performance and results, such as those based on Reinforcement Learning and Transfer Learning. Another challenge found in the literature is the processing of multi-occupancy datasets, which makes the use of computational resources and the identification of activities more expensive.
Among the future works that can be undertaken after this systematic literature review is the real-time analysis of datasets containing not only sensor data but also images and sound; algorithms based on Reinforcement Learning and Transfer Learning can provide a wide range of competitive solutions here, as can the addition of multi-occupancy to the datasets.
This research has received funding under the REMIND project Marie Sklodowska-Curie EU Framework for Research and Innovation Horizon 2020, under Grant Agreement No. 734355. Furthermore, this research has been supported by the Spanish government by means of the projects RTI2018-098979-A-I00, PI-0387-2018 and CAS17/00292.
European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 734355.
Definition of taxonomy, P.P.A.-C., F.P. and E.V.; Conceptualization, P.P.A.-C., A.I.O.-C. and M.A.P.-M.; Human Activity Recognition conceptual Information P.P.A.-C., F.P. and E.V.; Methodology P.P.A.-C. and M.A.P.-M.; Technical and Scientometric Analysis P.P.A.-C., M.A.P.-M. and F.P. and A.Q.-L.; Formal Conclusions P.P.A.-C. and A.I.O.-C.; Supervision F.P. and E.V.; Writing-Review & Editing, P.P.A.-C., S.B.A. and F.P. All authors have read and agreed to the published version of the manuscript.
The authors declare no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Human activity recognition for production and logistics—a systematic literature review.
Ref. | Year | Author & Description |
---|---|---|
[ ] | 2013 | Lara and Labrador reviewed the state of the art in HAR based on wearable sensors. They addressed the general structure of HAR systems and design issues. Twenty-eight systems are evaluated in terms of recognition performance, energy consumption and other criteria. |
[ ] | 2014 | Xing Su et al. surveyed recent advances in HAR with smartphone sensors and addressed experiment settings. They divided activities into five types: living, working, health, simple and complex. |
[ ] | 2015 | Attal et al. reviewed classification techniques for HAR using accelerometers. They provided an overview of sensor placement, detected activities and performance metrics of current state-of-the-art approaches. |
[ ] | 2016 | Edwards et al. presented a review on publicly available datasets for HAR. The examined sensor technology includes MoCap and IMUs. The observed application domains are ADL, surveillance, sports and generic activities, meaning that a wide variety of actions is covered. |
[ ] | 2018 | Twomey et al. surveyed the state-of-the-art in activity recognition using accelerometers. They focused on ADL and examined, among other issues, the sensor placement and its influence on the recognition performance. |
[ ] | 2018 | O’Reilly et al. synthesised and evaluated studies which investigate the capacity for IMUs to assess movement quality in lower limb exercises. The studies are categorised into three groups: exercise detection, movement classification or measurement validation. |
Inclusion Criteria | Description |
---|---|
Database | IEEE Xplore, Science Direct, Google Scholar, Scopus, European Union Digital Library (EUDL), ACM Digital Library, LearnTechLib, Springer Link, Wiley Online Library, dblp computer science bibliography, IOP Science, World Scientific, Multidisciplinary Digital Publishing Institute (MDPI), SciTePress Digital Library (Science and Technology Publications) |
Keywords | Motion Capturing, Motion Capture, MoCap, OMC, OMMC Inertial Measurement Unit, IMU, Accelerometer, body-worn/on-body/wearable/wireless Sensor (Human) Activity/Action, Recognition, HAR Production, Manufacturing, Logistics, Warehousing, Order Picking |
Year of publication | 2009–2018 |
Language | English |
Source Types | Conference Proceedings & Peer-reviewed Journals |
Identifier | Persistent Identifier mandatory (DOI, ISBN, ISSN, arxiv) |
Content Criteria | Description |
---|---|
( ) IMU or OMMC | Method is based on data from IMUs or OMMC-Systems. The sensors and markers are either attached to the subject’s body or body-worn. |
( ) Human | Contribution addresses the recognition of activities performed by humans. |
( ) Physical World | Data are recorded in the physical world without the use of simulated or immersive environments. |
( ) Quantification | The application aims to quantitatively determine the occurrence of activities, not to capture and analyse them for developing new methods in related fields. |
( ) Application-oriented | Perspectives for deploying the proposed method in P+L is conceivable. Definition of HAR-related terms is not the contribution’s focus. |
( ) Physical activity | According to Caspersen et al., [ ] “physical activity is defined as any bodily movement produced by skeletal muscles that results in energy expenditure”. In this literature review, bodily movement is limited to torso and limb movement. |
( ) No focus on hardware | Comparison of sensor technologies or a showcase of new hardware when using it for HAR is not the contribution’s focus. |
( ) Clear Method | Publications are computer science oriented, stating clear pattern recognition methods and performance metrics. |
Stage | Description |
---|---|
(I) Keywords | Keywords of the publication match with the Inclusion Criteria. Contributions have not yet been examined by the reviewers at this point. |
(II) Title | The title does not conflict with any Content Criteria. This is because the title either complies with the criteria or it is ambiguous. |
(III) Abstract | The abstract’s content does not conflict with any Content Criteria. This is because the content either complies with the criteria or necessary specifications are missing. |
(IV) Full Text | Reading the full text confirms compliance with all Content Criteria. Properties of the publication are recorded in the literature overview. |
Root Category | ||
---|---|---|
Subcategory | Description | |
P+L | Deployment in industrial settings, e.g., production facilities or warehouses | |
Other | Related application domain, e.g., health or ADL | |
Work | Working activities such as assembly or order picking | |
Exercises | Sport Activities, e.g., riding a stationary bicycle or gymnastic exercises | |
Locomotion | Walking, running as well as the recognition of the lack of locomotion when standing | |
ADL | Activities of daily living including cooking, doing the laundry, driving a car and so forth | |
Arm | Upper and lower arm | |
Hand | including wrists | |
Leg | including knee and shank | |
Foot | including ankle | |
Torso | including chest, back, belt and waist | |
Head | including sensors attached to a helmet or protective gear | |
Smartphone | Worn in a pocket or a bag. If attached to a limb, the subcategory is checked as well |
Repository | Utilised dataset is available in a repository | |
Individual | Dataset is created specifically for the contribution and not available in a repository | |
Laboratory | Recording takes place in a constrained laboratory environment |
Real-life | Recording takes place in a real-life environment, e.g., a real warehouse or in public places | |
Name of dataset | Name, origin, repository and description of dataset | |
Passive Markers | Markers reflect light for the camera to capture | |
Active Markers | Markers emit light for the camera to capture | |
IMU | Devices that measure specific force and angular rate using accelerometers and gyroscopes |
Pre.-Pr. | Pre-Processing: Normalisation, noise filtering, low-pass and high-pass filtering, and re-sampling | |
Segm. | Segmentation: Sliding window-approach | |
FE - Stat. Feat. | Statistical feature extraction: Time- and Frequency-Domain Features | |
FE- App.-based | Application-based features, e.g., Kinematics, Body model, Event-Based | |
FR | Feature reduction, e.g., Principal Components Analysis (PCA), Linear Discriminant Analysis (LDA), Kernel Discriminant Analysis (KDA), Random Projection (RP) | |
CL-NB | Classification method: Naïve Bayes | |
CL-HMMs | Classification method: Hidden Markov Models | |
CL-SVM | Classification method: Support Vector Machines | |
CL-MLP | Classification method: Multilayer Perceptron | |
CL-Other | Classification method: Random Forest (RF), Decision Trees (DT), Dynamic Time Warping (DTW), K-Nearest Neighbor (KNN), Fuzzy-Logic (FL), Logistic Regression (LR), Bayesian Network (BN), Least-Squares (LS), Conditional Random Field (CRF), Factorial Conditional Random Field (FCR), Conditional Clauses (CC), Gaussian Mixture Models (GMM), Template Matching (TM), Dynamic Bayesian Mixture Model (DBMM), Emerging Patterns (EP), Gradient-Boosted Trees (GBT), Sparsity Concentration Index (SCI) | |
CNN | Convolutional Neural Networks | |
tCNN | Temporal CNNs and Dilated tCNNs (DTCNN) |
rCNN | Recurrent Neural Networks, e.g., GRU, LSTM, Bidirectional LSTM |
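The data-processing chain abbreviated above (Pre.-Pr. → Segm. → FE-Stat.Feat. → CL) can be illustrated with a minimal, self-contained Python sketch. All names, parameter values, and the class centroids below are illustrative assumptions, not taken from any reviewed publication: a sliding window segments a 1-D acceleration trace, simple time-domain statistics (mean, variance) serve as features, and a toy nearest-centroid rule stands in for the classifiers listed in the table.

```python
from statistics import mean, pvariance

def sliding_windows(signal, size, step):
    """Segment a 1-D signal into fixed-size windows (Segm. stage)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def stat_features(window):
    """Time-domain statistical features (FE-Stat.Feat. stage)."""
    return (mean(window), pvariance(window))

def nearest_centroid(feature, centroids):
    """Toy stand-in for a classifier (CL stage): pick the closest class centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(feature, centroids[label]))

# Illustrative signal: a calm phase followed by a high-variance phase.
signal = [0.0, 0.1, 0.0, -0.1] * 8 + [2.0, -2.0, 2.5, -2.5] * 8
centroids = {"standing": (0.0, 0.01), "walking": (0.0, 5.0)}  # assumed centroids

labels = [nearest_centroid(stat_features(w), centroids)
          for w in sliding_windows(signal, size=16, step=8)]
print(labels)
```

With a 16-sample window and 8-sample step, the low-variance windows are labelled "standing" and the high-variance ones "walking"; real systems replace the centroid rule with one of the CL methods above.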
Stage | No. of Publications |
---|---|
(I) Keywords | 1243 |
(II) Title | 524 |
(III) Abstract | 263 |
(IV) Full Text | 52 |
General Information | Domain | Activity | Attachment | Dataset | DP | Shallow Method | DL | ||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Ref. | Year | Author | FWCI | P+L | Other | Work | Exercises | Locomotion | ADL | Arm | Hand | Leg | Foot | Torso | Head | Smartphone | Repository | Individual | Laboratory | Real-Life | Pre.-Pr. | Segm. | FE-Stat.Feat. | FE-Others | FR | CL-NB | CL-HMMs | CL-SVM | CL-MLP | CL-Others | CNN | tCNN | rCNN |
[ ] | 2009 | Xi Long et al. | 12.48 | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||||
[ ] | 2010 | Altun and Barshan | 7.95 | x | x | x | x | x | x | x | x | x | x | x | LS, KNN | ||||||||||||||||||
[ ] | 2010 | Altun et al. | 4.60 | x | x | x | x | x | x | x | x | x | x | x | x | BDM,LSM,KNN,DTW | |||||||||||||||||
[ ] | 2010 | Khan et al. | 7.38 | x | x | x | x | x | x | x | x | LDA, KDA | |||||||||||||||||||||
[ ] | 2010 | Kwapisz et al. | - | x | x | x | x | x | x | x | x | DT, LR | |||||||||||||||||||||
[ ] | 2010 | Wang et al. | 4.62 | x | x | x | x | x | x | x | x | FCR | |||||||||||||||||||||
[ ] | 2011 | Casale et al. | 8.02 | x | x | x | x | x | x | x | RF | ||||||||||||||||||||||
[ ] | 2011 | Gu et al. | 5.16 | x | x | x | x | x | x | x | EP | ||||||||||||||||||||||
[ ] | 2011 | Lee and Cho | 13.37 | x | x | x | x | x | x | x | |||||||||||||||||||||||
[ ] | 2012 | Anguita et al. | 35.00 | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||||||
[ ] | 2012 | Deng et al. | 0.58 | x | x | x | x | x | x | x | x | x | x | x | x | x | GM, DTW | ||||||||||||||||
[ ] | 2012 | Lara and Labrador | 7.53 | x | x | x | x | x | x | x | x | x | DT | ||||||||||||||||||||
[ ] | 2012 | Lara et al. | 18.74 | x | x | x | x | x | x | x | x | x | BN, DT, LR | ||||||||||||||||||||
[ ] | 2012 | Siirtola and Röning | - | x | x | x | x | x | x | x | x | x | QDA,KNN,DT | ||||||||||||||||||||
[ ] | 2013 | Koskimäki et al. | 1.20 | x | x | x | x | x | x | x | KNN | ||||||||||||||||||||||
[ ] | 2013 | Shoaib et al. | 9.30 | x | x | x | x | x | x | x | x | x | x | x | x | x | LR,KNN,DT | ||||||||||||||||
[ ] | 2013 | Zhang and Sawchuk | 6.37 | x | x | x | x | x | x | x | x | SCI | |||||||||||||||||||||
[ ] | 2014 | Bayat et al. | 11.96 | x | x | x | x | x | x | x | x | x | x | x | RF, LR | ||||||||||||||||||
[ ] | 2014 | Bulling et al. | 64.63 | x | x | x | x | x | x | x | x | x | x | x | x | x | KNN boosting | ||||||||||||||||
[ ] | 2014 | Garcia-Ceja et al. | 2.52 | x | x | x | x | x | x | x | x | x | CRF | ||||||||||||||||||||
[ ] | 2014 | Gupta and Dallas | 9.06 | x | x | x | x | x | x | x | x | KNN | |||||||||||||||||||||
[ ] | 2014 | Kwon et al. | 5.89 | x | x | x | x | x | x | x | x | GMM | |||||||||||||||||||||
[ ] | 2014 | Zeng et al. | 39.20 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||||
[ ] | 2015 | Aly and Ismail | 0.00 | x | x | x | x | x | x | x | x | x | x | x | x | x | CC | ||||||||||||||||
[ ] | 2015 | Bleser et al. | 2.95 | x | x | x | x | x | x | x | x | x | |||||||||||||||||||||
[ ] | 2015 | Chen and Xue | 20.10 | x | x | x | x | x | x | x | x | x | DBM | x | |||||||||||||||||||
[ ] | 2015 | Guo and Wang | 3.56 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | KNN, DT | |||||||||||||||
[ ] | 2015 | Zainudin et al. | 6.95 | x | x | x | x | x | x | x | x | DT, LR | |||||||||||||||||||||
[ ] | 2016 | Ayachi et al. | 1.59 | x | x | x | x | x | x | x | x | x | x | x | |||||||||||||||||||
[ ] | 2016 | Fallmann and Kropf | - | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||
[ ] | 2016 | Feldhorst et al. | 1.31 | x | x | x | x | x | x | x | x | x | x | x | RF | ||||||||||||||||||
[ ] | 2016 | Hammerla et al. | 23.99 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||
[ ] | 2016 | Liu et al. | 30.36 | x | x | x | x | x | x | x | x | x | x | x | x | x | KNN | ||||||||||||||||
[ ] | 2016 | Margarito et al. | 5.24 | x | x | x | x | x | x | x | x | DTW, TM | |||||||||||||||||||||
[ ] | 2016 | Ordóñez and Roggen | 42.72 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||
[ ] | 2016 | Reyes-Ortiz et al. | 12.46 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | |||||||||||||
[ ] | 2016 | Ronao and Cho | 24.82 | x | x | x | x | x | x | x | |||||||||||||||||||||||
[ ] | 2016 | Ronao and Cho | 4.89 | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||||||
[ ] | 2017 | Song-Mi Lee et al. | 10.91 | x | x | x | x | x | x | x | |||||||||||||||||||||||
[ ] | 2017 | Scheurer et al. | 7.06 | x | x | x | x | x | x | x | x | GBT, KNN | |||||||||||||||||||||
[ ] | 2017 | Vital et al. | 0.93 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | DBMM | |||||||||||||
[ ] | 2018 | Chen et al. | 1.77 | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||||||
[ ] | 2018 | Moya Rueda et al. | 3.69 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||
[ ] | 2018 | Nair et al. | 0.00 | x | x | x | x | x | x | x | x | x | |||||||||||||||||||||
[ ] | 2018 | Reining et al. | 0.00 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||
[ ] | 2018 | Tao et al. | 0.00 | x | x | x | x | x | x | x | x | ||||||||||||||||||||||
[ ] | 2018 | Wolff et al. | 0.00 | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||||||
[ ] | 2018 | Xi et al. | 1.61 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||
[ ] | 2018 | Xie et al. | 6.43 | x | x | x | x | x | x | x | x | RF | |||||||||||||||||||||
[ ] | 2018 | Yao et al. | 3.53 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | ||||||||||||||||
[ ] | 2018 | Zhao and Obonyo | 0.00 | x | x | x | x | x | x | x | x | x | x | x | x | x | x | KNN | |||||||||||||||
[ ] | 2018 | Zhu et al. | 0.00 | x | x | x | x | x | x | x | x | x | RF,KNN,LR | x | |||||||||||||||||||
Ref. | Name | Utl. in |
---|---|---|
[ ] | Actitracker from Wireless Sensor Data Mining (WISDM) | [ ] |
[ ] | Activity Prediction from Wireless Sensor Data Mining (WISDM) | [ ] |
[ ] | Daphnet Gait dataset (DG) | [ ] |
[ ] | Mocap Database HDM05 | [ ] |
[ ] | Realistic Sensor Displacement Benchmark Dataset (REALDISP) | [ ] |
[ ] | Smartphone-Based Recognition of Human Activities and Postural Transitions Dataset | [ ] |
[ ] | USC-SIPI Human Activity Dataset (USC-HAD) | [ ] |
[ ] | Wearable Action Recognition Database (WARD) | [ ] |
Ref. | Name | Description | Utl. in |
---|---|---|---|
[ ] | Opportunity | Published in 2012, this dataset contains recordings from wearable, object, and ambient sensors in a room simulating a studio flat. Four subjects were asked to perform early morning cleanup and breakfast activities. | [ , , , , , , ] |
[ ] | Human Activity Recognition Using Smartphones Data Set | The dataset from 2012 contains smartphone-recordings. 30 subjects at the age of 19 to 48 performed six different locomotion activities wearing a smartphone on the waist. | [ , , , , ] |
[ ] | PAMAP2 | Published in 2012, this dataset provides recordings from three IMUs and a heart rate monitor. Nine subjects performed twelve different household, sports and daily living activities. Some subjects performed further optional activities. | [ , , , , ]
[ ] | Hand Gesture | The dataset from 2013 contains 70 minutes of arm movements per subject from eight ADLs as well as from playing tennis. Two recorded subjects were equipped with three IMUs on the right hand and arm. | [ , , ] |
[ ] | Skoda | This dataset from the year 2008 contains ten manipulative gestures performed by a single worker in a car maintenance scenario. 20 accelerometers were used for recording. | [ , ] |
Sampling Rate [Hz] | 1 | 20 | 20 | 20 | 20 | 25 | 25 | 30 | 40 | 30 | 50 | 50 | 50 | 50 | 50 | 98 | 100 | 100 | 100 | 126 | 300 |
Window Size [s] | 10–20 | 3 | 5 | 5 | 25 | 5 | 10 | 0.72 | 1.9–7.5 | 1.67 | 1.28 | 1.2–1.3 | 2 | 2.56 | 5, 12, 20 | 0.5 | 2.56 | 1.28 | 4 | 6 | 0.67 |
Overlap [%] | - | 33 | 50 | - | 50 | - | - | 50 | - | 50 | 50–75 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 5 |
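The quantities in the table above are linked by simple arithmetic: the sampling rate and window length determine the number of samples per window, and the overlap sets the hop between consecutive window starts. A small illustrative helper (names and the example column choice are our own, not from any reviewed paper):

```python
def window_params(rate_hz, window_s, overlap_pct=0.0):
    """Convert sampling rate [Hz], window length [s] and overlap [%]
    into samples per window and hop size between window starts."""
    samples = round(rate_hz * window_s)
    hop = round(samples * (1 - overlap_pct / 100.0))
    return samples, hop

# Example column from the table: 50 Hz, 2.56 s window, 50% overlap.
samples, hop = window_params(50, 2.56, 50)
print(samples, hop)  # 128 samples per window, a new window every 64 samples
```

The popular 50 Hz / 2.56 s / 50% combination thus yields 128-sample windows, a power of two that is convenient for FFT-based feature extraction.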
Domain | Features | Definitions | Publications |
---|---|---|---|
Time | Variance | Arithmetic Variance | [ , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ] |
Mean | Arithmetic Mean | [ , , , , , , , , , , , , , , , , , , , , , , , , , , ] | |
Pairwise Correlation | Correlation between every pair of axes | [ , , , , , , , , , , , , , ] | |
Minimum | Smallest value in the window | [ , , , , , , , , ] | |
Maximum | Largest value in the window | [ , , , , , , , , ] |
Energy | Average sum of squares | [ , , , , , , , ] | |
Signal Magnitude Area | [ , , , , , , ] | ||
IQR | Interquartile Range | [ , , , , , ] | |
Root Mean Square | Square root of the arithmetic mean of the squares | [ , , , , , ] |
Kurtosis | [ , , , , ] | ||
Skewness | [ , , , , ] | ||
MinMax | Difference between the Maximum and the Minimum in the window | [ , , , ] | |
Zero Crossing Rate | Rate of the changes of the sign | [ , , ] | |
Average Absolute Deviation | Mean absolute deviations from a central point | [ , ] | |
MAD | Median Absolute Deviation | [ , ] | |
Mean Crossing Rate | [ , ] | ||
Slope | Sen’s slope for a series of data | [ ] | |
Log-Covariance | [ ] | ||
Norm | Euclidean Norm | [ ] | |
APF | Average Number of occurrences of Peaks | [ ] | |
Variance Peak Frequency | Variance of APF | [ ] | |
Pearson Correlation Coefficient | [ ] | ||
Angle | Angle between mean signal and vector | [ ] | |
Time Between Peaks | Time [ms] between peaks | [ ] | |
Binned Distribution | Quantisation of the difference between the Maximum and the Minimum | [ ] | |
Median | Middle value in the window | [ ] | |
Five different Percentiles | Observations in five different percentiles | [ ] | |
Sum and Square Sum in Percentiles | Sum and Square sum of observations above/below certain percentile | [ ] | |
ADM | Average Derivative of the Magnitude | [ ] | |
Frequency | Entropy | Normalised information entropy of the discrete FFT component magnitudes of the signal | [ , , , , , , , , , , ] |
Signal Energy | Sum squared signal amplitude | [ , , , , , , , , ] | |
Skewness | Symmetry of the distribution | [ , , , ] | |
Kurtosis | Heavy tail of the distribution | [ , , , ] | |
DC Component of FFT and DCT | [ , , ] | ||
Peaks of the DFT | First 5 Peaks of the FFT | [ , ] | |
Spectral | [ ] | ||
Spectral centroid | Centroid of a given spectrum | [ ] | |
Frequency Range Power | Sum of absolute amplitude of the signal | [ ] | |
Cepstral coefficients | Mel-Frequency Cepstral Coefficients | [ ] | |
Correlation | [ ] | ||
maxFreqInd | Largest Frequency Component | [ ] | |
MeanFreq | Frequency Signal Weighted Average | [ ] | |
Energy Band | Spectral Energy of a Frequency Band | [ ] | |
PPF | Peak Power Frequency | [ ] |
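Two of the most frequently used frequency-domain features above, signal energy and normalised spectral entropy, can be sketched directly from their definitions. The code below is illustrative (a naive O(n²) DFT is used for self-containment; real pipelines use an FFT), and the sine-wave example is our own:

```python
import cmath, math

def dft_magnitudes(window):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) DFT)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n)]

def spectral_energy(mags):
    """Signal energy: sum of squared spectral magnitudes."""
    return sum(m * m for m in mags)

def spectral_entropy(mags):
    """Normalised information entropy of the spectral magnitudes."""
    total = sum(mags)
    probs = [m / total for m in mags if m > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(len(mags))  # normalised to [0, 1]

# A pure sine concentrates energy in two bins -> low spectral entropy.
window = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
print(spectral_entropy(dft_magnitudes(window)))
```

Periodic activities such as walking concentrate spectral energy in few bins (low entropy), whereas irregular movement spreads it out (high entropy), which is what makes entropy discriminative for HAR.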
Domain | Features | Definitions | Publications |
---|---|---|---|
Spatial | Gravity variation | Gravity acceleration computed using the harmonic mean of the acceleration along the three axes (x,y,z) | [ ] |
Eigenvalues of Dominant Directions | [ ] | ||
Structural | Trend | [ , ] | |
Magnitude of change | [ , ] | ||
Time | Autoregressive Coefficients | [ ] | |
Kinematics | User steps frequency | Number of detected steps per unit time | [ ] |
Walking Elevation | Correlation between the acceleration along the y-axis vs. the gravity acceleration or acceleration along the z-axis | [ ] | |
Correlation Hand and foot | Acceleration correlation between wrist and ankle | [ ] | |
Heel Strike Force | Mean and variance of the Heel Strike Force, which is computed using dynamics | [ ] | |
Average Velocity | Integral of the acceleration | [ ] |
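The "Average Velocity" kinematic feature above is defined as the integral of the acceleration. As a hedged sketch (assuming a gravity-compensated acceleration signal and uniform sampling; function and variable names are our own), the trapezoidal rule turns a window of acceleration samples into the velocity change over that window:

```python
def velocity_change(accel, dt):
    """Kinematic feature sketch: integrate an acceleration window with the
    trapezoidal rule, yielding the velocity change [m/s] over the window.
    Assumes gravity has already been removed from the signal."""
    return sum((a0 + a1) * dt / 2 for a0, a1 in zip(accel, accel[1:]))

# Constant 1 m/s^2 acceleration sampled at 50 Hz for 1 s:
dv = velocity_change([1.0] * 51, dt=0.02)
print(dv)  # integral = delta-v = 1.0 m/s
```

In practice, drift from sensor noise accumulates under integration, which is why such kinematic features are computed per window rather than over long recordings.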
Metric | # of Publications |
---|---|
Accuracy | 38 |
Precision | 12 |
Recall | 11 |
weightedF_1 | 5 |
meanF_1 | 6 |
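The metrics tallied above can all be derived from per-class counts of true positives, false positives, and false negatives. The sketch below (illustrative label lists, our own function names) shows how accuracy, per-class precision/recall, the mean (macro) F1, and the support-weighted F1 relate:

```python
from collections import Counter

def per_class_scores(y_true, y_pred):
    """Per-class (precision, recall, F1) from label lists."""
    scores = {}
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores

def summary(y_true, y_pred):
    scores = per_class_scores(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    mean_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)    # macro
    weighted_f1 = sum(support[c] / n * scores[c][2] for c in scores)   # by frequency
    return accuracy, mean_f1, weighted_f1

y_true = ["walk", "walk", "walk", "stand", "stand", "sit"]
y_pred = ["walk", "walk", "stand", "stand", "stand", "sit"]
print(summary(y_true, y_pred))
```

The weighted F1 discounts rare classes, while the mean F1 treats all classes equally; this is why the two are reported separately in imbalanced HAR datasets.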
Reining, C.; Niemann, F.; Moya Rueda, F.; Fink, G.A.; ten Hompel, M. Human Activity Recognition for Production and Logistics—A Systematic Literature Review. Information 2019 , 10 , 245. https://doi.org/10.3390/info10080245
Human activity recognition (HAR) has multifaceted applications owing to the widespread use of acquisition devices such as smartphones and video cameras and their ability to capture human activity data. While electronic devices and their applications are steadily growing, advances in artificial intelligence (AI) have revolutionized the ability to extract deeply hidden information for accurate detection and interpretation. This calls for a better understanding of the three pillars of HAR under one roof: rapidly growing acquisition devices, AI, and applications. Many review articles have been published on the general characteristics of HAR; few have compared all HAR devices at the same time, and few have explored the impact of evolving AI architectures. The proposed review presents a detailed narration of the three pillars of HAR covering the period from 2011 to 2021, together with recommendations for improved HAR design, reliability, and stability. Five major findings were: (1) HAR rests on three major pillars: devices, AI, and applications; (2) HAR has been dominated by the healthcare industry; (3) hybrid AI models are in their infancy and need considerable work to provide stable and reliable designs; such trained models need solid prediction, high accuracy, generalization, and, finally, must meet the objectives of the application without bias; (4) little work was observed on abnormality detection during actions; and (5) almost no work has been done on forecasting actions. We conclude that (a) the HAR industry will evolve in terms of the three pillars of electronic devices, applications, and the type of AI, and (b) AI will provide a powerful impetus to the HAR industry in the future.
Human activity recognition (HAR) can be described as the art of identifying and naming activities using artificial intelligence (AI) from raw activity data gathered by various sources (so-called devices). Examples of such devices include wearable sensors (Pham et al. 2020 ), electronic-device sensors such as smartphone inertial sensors (Qi et al. 2018 ; Zhu et al. 2019 ), camera devices such as the Kinect (Wang et al. 2019a ; Phyo et al. 2019 ), closed-circuit television (CCTV) (Du et al. 2019 ), and some commercial off-the-shelf (COTS) equipment (Ding et al. 2015 ; Li et al. 2016 ). The use of diverse sources makes HAR important for multifaceted application domains, such as healthcare (Pham et al. 2020 ; Zhu et al. 2019 ; Wang et al. 2018 ), surveillance (Thida et al. 2013 ; Deep and Zheng 2019 ; Vaniya and Bharathi 2016 ; Shuaibu et al. 2017 ; Beddiar et al. 2020 ), remote care for elderly people living alone (Phyo et al. 2019 ; Deep and Zheng 2019 ; Yao et al. 2018 ), smart homes/offices/cities (Zhu et al. 2019 ; Deep and Zheng 2019 ; Fan et al. 2017 ), and various monitoring applications such as sports and exercise (Ding et al. 2015 ). The widespread use of HAR is beneficial for the safety and quality of life of humans (Ding et al. 2015 ; Chen et al. 2020 ).
Devices such as sensors, video cameras, radio frequency identification (RFID), and Wi-Fi are not new, but their usage in HAR is in its infancy. The reason for HAR’s evolution is the fast growth of techniques such as AI, which enables the use of these devices in various application domains (Suthar and Gadhia 2021 ). Therefore, we can say that there is a mutual relationship between AI techniques, or AI models, and HAR devices. Earlier, these models were based on a single image or a small sequence of images, but advancements in AI have provided more opportunities. According to our observations (Chen et al. 2020 ; Suthar and Gadhia 2021 ; Ding et al. 2019 ), the growth of HAR is directly proportional to the advancement of AI, which expands the scope of HAR in various application domains.
The introduction of deep learning (DL) in the HAR domain has simplified the task of meaningful feature extraction from raw sensor data. The evolution of DL models, including (1) convolutional neural networks (CNNs) (Tandel et al. 2020 ); (2) transfer-learning weighting schemes (which allow knowledge reusability, where a recognition model is trained on one dataset and the trained knowledge is then reused on a different test dataset), such as Inception (Szegedy et al. 2015 , 2016 , 2017 ), VGG-16 (Simonyan and Zisserman 2015 ), and the 50-layer Residual Neural Network (ResNet-50) (Nash et al. 2018 ); (3) a series of hybrid DL models, such as the fusion of a CNN with long short-term memory (LSTM) or of Inception with ResNets (Yao et al. 2017 , 2019 , 2018 ; Buffelli and Vandin 2020 ); (4) loss-function designs such as cross-entropy, Kullback–Leibler divergence, and Tversky loss (Janocha and Czarnecki 2016 ; Wang et al. 2020a ); and (5) optimization paradigms such as stochastic gradient descent (SGD) (Soydaner 2020 ; Sun et al. 2020 ), has made HAR design nearly plug-and-play. Even though it is becoming black-box oriented, a better understanding is required to ensure that the three-legged stool is stable and effective.
Typically, HAR consists of four stages (Fig. 1 ): (1) capturing the activity signal, (2) data pre-processing, (3) AI-based activity recognition, and (4) the user interface for managing HAR. Each stage can be implemented using several techniques, giving the HAR system multiple design choices. Thus, the choice of the application domain, the type of data acquisition device, and the artificial intelligence (AI) algorithms for activity detection make these choices even more challenging.
Four stages of HAR process (Hx et al. 2017 )
Numerous reviews of HAR have been published, but our observations show that most of these studies address either vision-based (Beddiar et al. 2020 ; Dhiman Chhavi 2019 ; Ke et al. 2013 ) or sensor-based (Carvalho and Sofia 2020 ; Lima et al. 2019 ) HAR, while very few have considered RFID-based and device-free HAR. Further, there is no AI review article that covers a detailed analysis of all four device types: sensor-based (Yao et al. 2017 , 2019 ; Hx et al. 2017 ; Hsu et al. 2018 ; Xia et al. 2020 ; Murad and Pyun 2017 ), vision-based (Feichtenhofer et al. 2018 ; Simonyan and Zisserman 2014 ; Newell Alejandro 2016 ; Crasto et al. 2019 ), RFID-based (Han et al. 2014 ), and device-free (Zhang et al. 2011 ).
An important observation is that technology has advanced in the field of AI, i.e., deep learning (Agarwal et al. 2021 ; Skandha et al. 2020 ; Saba et al. 2021 ) and machine learning methods (Hsu et al. 2018 ; Jamthikar et al. 2020 ), and is revolutionizing the ability to extract deeply hidden information for accurate detection and interpretation. Thus, there is a need to understand the role of these new paradigms, which are rapidly changing HAR devices. This motivates a review that simultaneously addresses changing AI and HAR devices. Therefore, the main objective of this study is to better understand the HAR framework while integrating devices and application domains into the specialized AI framework. Which types of devices fit which types of application, and which attributes of the AI should be considered when designing such a framework (Agarwal et al. 2021 ), are some of the issues that need to be explored. Thus, this review illustrates how one can select such a combination by first understanding the types of HAR devices and then the knowledge-based infrastructure in the fast-moving world of AI, knowing that some such combinations can be transferred to different applications (domains).
The proposed review is structured as follows: Sect. 2 covers the search strategy, and literature review with statistical distributions of HAR attributes. Section 3 illustrates the description of the HAR stages, HAR devices, and HAR application domains in the AI framework. Section 4 illustrates the role of emerging AI as the core of HAR. Section 5 presents performance evaluation criteria in the HAR and integration of AI in HAR devices. Section 6 consists of a critical discussion on factors influencing HAR, benchmarking of the study against the previous studies, and finally, the recommendations. Section 7 finally concludes the study.
“Google Scholar” was used to search for articles published between 2011 and the present. The search included the keywords “human activity recognition” or “HAR” in combination with the terms “machine learning”, “deep learning”, “sensor-based”, “vision-based”, “RFID-based”, and “device-free”. Figure 2 shows the PRISMA diagram with the criteria for the selection of HAR articles. We identified around 1548 articles in the last 10-year period, which were then short-listed to 175 articles based on three major assessment criteria: the AI models used, the target application domain, and the data acquisition devices, which are the three main pillars of the proposed review. In the proposed review, we formed two clusters of attributes based on these three criteria. Cluster 1 includes 7 HAR device- and application-based attributes: data source, #activities, datasets, subjects, scenarios, total #actions, and performance evaluation. Cluster 2 includes 7 AI attributes: #features, feature extraction, ML/DL model, architecture, metrics, validation, and hyperparameters/optimizer/loss function. The device- and application-based attributes are described in Sect. 3.2 , and Tables A.1 , A.2 , A.3 and A.4 of "Appendix 1 " illustrate these attributes for the studies considered in the proposed review. Cluster 2’s AI attributes are discussed in Sect. 4.2 , and Tables 3 , 4 , 5 and 6 give insight into the AI models adopted by researchers in their HAR models. Apart from the three major criteria, three exclusion and four inclusion criteria were also followed in selecting the research articles. Excluded were (1) articles with traditional and older AI techniques, (2) non-relevant articles, and (3) articles with insufficient data.
These exclusion criteria removed 991, 125, and 54 articles (marked as E1, E2, and E3 in the PRISMA flowchart), leading to the final 175 articles. Included were (1) non-redundant articles, (2) articles passing a detailed screening of abstract and conclusion, (3) articles passing the eligibility-criteria assessment, which covers advanced AI techniques, target domain, and device type, and (4) articles passing qualitative synthesis, including the impact factor of the journal and the authors’ contribution to the HAR domain (marked as I1, I2, I3, and I4 in the PRISMA flowchart).
PRISMA model for the study selection
In the proposed review, we performed a rigorous analysis of the HAR framework in terms of AI techniques, device types, and application domains. One major observation of the proposed study is the mutual relationship between HAR device types and AI techniques. First, the analysis of HAR devices is presented in Fig. 3 a, which is based on the articles considered between 2011 and 2021; it shows the changing pattern of HAR devices over time. Second, the growth of ML and DL techniques is presented in Fig. 3 b, which shows that HAR is trending towards DL-based techniques. The HAR device distribution is elaborated in Fig. 4 a; Fig. 4 b shows the further categorization of sensor-based HAR into wearable sensors (WS) and smartphone sensors (SPS); Fig. 4 c shows the division of vision-based HAR into video- and skeleton-based models; and Fig. 4 d shows the types of HAR application domains.
a Changing pattern of HAR devices over time, b distribution of machine learning (ML) and deep learning (DL) articles in last decade
a Types of HAR devices, b sensor-based devices, c vision-based devices, d HAR applications. WS: wearable sensors, SPS: smartphone sensor, sHome: smart home, mHealthcare: health care monitoring, cSurv: crowd surveillance, fDetect: fall detection, eMonitor: exercise monitoring, gAnalysis: gait analysis
According to the device-wise analysis in Fig. 3 a, vision-based HAR was popular between 2011 and 2016. From 2017 onwards, the growth of sensor-based models has been more prominent, which is the same period in which DL techniques entered the HAR domain (Fig. 3 b). In the period 2017–2021, Wi-Fi devices also evolved as one of the data sources for gathering activity data.
Figure 3 b shows the year-wise distribution of articles published using ML and DL techniques. The key observation is the transition of AI techniques from ML to DL. From 2011 to 2016, HAR models using the ML framework were popular, while HAR models using DL techniques started to evolve from 2014; in the last three years, this growth has increased significantly. Therefore, after analysing the graphs of Fig. 3 a, b thoroughly, we can say that HAR devices are evolving as the trend shifts towards the DL framework. This combined analysis supports our claim of a mutual relationship between AI and device types.
The devices used in the HAR paradigm are its premier component, by which HAR can be classified. We observed a total of 9 review articles, arranged in chronological order (see Table 1 ). These reviews focused mainly on three sets of devices: sensor-based (marked in a light shade) (Carvalho and Sofia 2020 ; Lima et al. 2019 ; Wang et al. 2016a , 2019b ; Lara and Labrador 2013 ; Hx et al. 2017 ; Demrozi et al. 2020 ; Crasto et al. 2019 ; De-La-Hoz-Franco et al. 2018 ), vision-based (marked in a dark shade) (Beddiar et al. 2020 ; Dhiman Chhavi 2019 ; Ke et al. 2013 ; Obaida and Saraee 2017 ; Popoola and Wang 2012 ), and device-free HAR (Hussain et al. 2020 ). Table 1 summarizes the nine articles based on the focus area, keywords, number of keywords, research period, and #citations. Note that sensor-based HAR captures activity signals using ambient and embedded sensors, while vision-based HAR involves three-dimensional (3D) activity data gathering using a 3D or depth camera. In device-free HAR, activity data are captured using Wi-Fi transmitter–receiver units.
The objective of developing HAR models is to provide information about human actions, which helps in analyzing the behavior of a person in a real environment. This allows computer-based applications to help users perform tasks and to improve their lifestyle, for example through remote care for the elderly living alone and posture monitoring during exercise. This section presents the HAR framework, including the HAR stages, HAR devices, and target application domains.
There are four main stages in the HAR process: data acquisition, pre-processing, model training, and performance evaluation (Figure S.1(a) in supporting document) . In stage 1 , depending on the target application, a HAR device is selected. For example, in surveillance application involving multiple persons, the HAR device for data collection is the camera. Similarly, for applications where a person's daily activity monitoring is involved, the data acquisition source is sensor preferably. One can use a camera also, but it breaches the user's privacy and needs high computational cost. Table 2 illustrates the variation in HAR devices according to the application domains. It elaborates the description of diverse HAR applications in terms of various data sources and AI techniques. Note that sometimes the acquired data suffer from noise or other unwanted signals, and therefore offers challenges in post-processing AI-based systems. Thus, it is very important to have a robust feature extraction system with a robust network for better prediction. In stage 2 , data cleaning is performed, which involves low-pass or high-pass filters for noise suppression or image enhancement (Suri 2013 ; Sudeep et al. 2016 ). This data undergoes regional and boundary segmentation (Multi Modality State-of-the-Art Medical Image Segmentation and 2011 ; Suri et al. 2002 ; Suri 2001 ). Our group has published several dedicated monograms on segmentation paradigms and are available as ready reference (Suri 2004 , 2005 ; El-Baz and Jiang 2016 ; El-Baz and Suri JS 2019 ). This segmented data can now be used for model training. Stage 3 involves the training of HAR model using ML or DL techniques. When using hand-crafted features, one can use ML-based techniques (Maniruzzaman et al. 2017 ). For automated feature extraction, one can use the DL framework. 
Apart from automatic feature learning, DL offers knowledge reusability through transfer learning models, exploration of huge datasets (Biswas et al. 2018), and hybrid DL models that allow both spatial and temporal features to be identified and learned. After stage 3, the HAR model is ready to be used for an application or prediction. Stage 4 is the most challenging part, since the model is applied to real data whose behavior varies with physical factors such as age, physique, and the way a task is performed. A HAR model is efficient if its performance is independent of these physical factors.
The HAR device type depends on the target application. Figure S.1(b) (Supporting document) presents the different sources for activity data: sensors, video cameras, RFID systems, and Wi-Fi devices.
The sensor-based approaches can be categorized into wearable sensors and device sensors. In the wearable-sensor approach, a body-worn sensor module is designed that includes inertial sensors and environmental sensor units (Pham et al. 2017, 2020; Hsu et al. 2018; Xia et al. 2020; Murad and Pyun 2017; Saha et al. 2020; Tao et al. 2016a, b; Cook et al. 2013; Zhou et al. 2020; Wang et al. 2016b; Attal et al. 2015; Chen et al. 2021; Fullerton et al. 2017; Khalifa et al. 2018; Tian et al. 2019). Wearable sensor devices can sometimes be stressful for the user, so the alternative is to use smart-device sensors. In the device-sensor approach, data are captured using smartphone inertial sensors (Zhu et al. 2019; Yao et al. 2018; Wang et al. 2016a, 2019b; Zhou et al. 2020; Li et al. 2019; Civitarese et al. 2019; Chen and Shen 2017; Garcia-Gonzalez et al. 2020; Sundaramoorthy and Gudur 2018; Gouineua et al. 2018; Lawal and Bano 2019; Bashar et al. 2020). The most commonly used sensors for HAR are the accelerometer and the gyroscope. Table A.1 of “Appendix 1” shows the types of data acquisition devices, activity classes, and scenarios in earlier sensor-based HAR models.
Camera-based HAR can be further classified into two types: 3D cameras and depth cameras. 3D camera-based HAR models use closed-circuit television (CCTV) cameras in the user's environment to monitor the actions performed by the user. Usually, the monitoring task is performed by humans or by a recognition model. Numerous HAR models have been proposed that can process and evaluate activity video or image data and recognize the performed activities (Wang et al. 2018; Feichtenhofer et al. 2018, 2017, 2016; Diba et al. 2016, 2020; Yan et al. 2018; Chong and Tay 2017). The accuracy of activity recognition from 3D camera data depends on physical factors such as lighting and background color. This issue can be addressed by using a depth camera (such as the Kinect). The Kinect camera provides several data streams: depth, RGB, and audio. The depth stream captures body-joint coordinates, from which a skeleton-based HAR model can be developed. Skeleton-based HAR models have applications in domains that involve posture recognition (Liu et al. 2020; Abobakr et al. 2018; Akagündüz et al. 2016). Table A.2 of “Appendix 1” provides an overview of earlier vision-based HAR models. Apart from 3D and depth cameras, one can use thermal cameras, but they can be expensive.
By installing passive RFID tags in close proximity to the user, activity data can be collected using RFID readers. Compared to active RFID tags, passive tags have a longer operational life because they do not need a separate battery; instead, they harvest the reader's energy and convert it into an electrical signal to operate their circuitry. However, active tags have a longer range than passive tags. Both can be used in HAR models (Du et al. 2019; Ding et al. 2015; Li et al. 2016; Yao et al. 2018; Zhang et al. 2011; Xia et al. 2012; Fan et al. 2019). A further description of existing RFID-based HAR models is provided in Table A.3 of “Appendix 1”.
Wi-Fi device: In the last 5 years, device-free HAR has gained popularity. Researchers have explored the possibility of capturing activity signals using Wi-Fi devices, where channel state information (CSI) from the wireless signal is used to acquire activity data. Many models have been developed for fall detection and gait recognition using CSI (Yao et al. 2018; Wang et al. 2019c, d, 2020b; Zou et al. 2019; Yan et al. 2020; Fei et al. 2020). Descriptions of some popular existing Wi-Fi device-based HAR models are provided in Table A.4 of “Appendix 1”.
There are thus four main types of HAR devices, and researchers have proposed various HAR models with advanced AI techniques. The usage of electronic devices for gathering activity data in the HAR domain is gradually increasing, but with this growth the challenges are also evolving: (1) video camera-based applications gather data using a video camera, which invades the user's privacy and requires high-power systems to process the large volumes of data produced; (2) in sensor-based HAR models, wearable devices are stressful and inconvenient for the user, so smartphone sensors are preferable, but the use of smartphones and smartwatches is limited to recognizing simple activities such as walking, sitting, and going upstairs; (3) in RFID tag-and-reader-based HAR models, activity capture is limited to indoor settings; and (4) Wi-Fi-based HAR models are new to the HAR industry and still have open issues: they can capture activities performed within Wi-Fi range but cannot identify movement in blind-spot areas.
In the last decade, researchers have developed various HAR models for different domains. “What type of HAR device is suitable for which application domain, and what is the suitable AI methodology?” is the biggest question that arises when developing a HAR framework. The description of diverse HAR applications with data sources and AI techniques is illustrated in Table 2, which shows the variation in HAR devices and AI techniques depending on the application domain. The pie chart in Fig. 4d shows the distribution of applications based on existing articles. HAR is used in fields such as:
Crowd surveillance (cSurv): Crowd pattern monitoring and detecting panic situations in the crowd.
Health care monitoring (mHealthcare): Assistive care to ICU patients, Trauma resuscitation.
Smart home (sHome): Care to elderly or dementia patients and child activity monitoring.
Fall detection (fDetect): Detection of abnormality in action which results in a person's fall.
Exercise monitoring (eMonitor): Pose estimation while doing exercise.
Gait analysis (gAnalysis): Analyze gait patterns to monitor health problems.
There is no predefined set of activities; rather, the human activity type varies according to the application domain. Figure S.2 (Supporting document) shows the activity types involved in human activity recognition.
Here the action is performed by a single person. Figure S.3 (Supporting document) shows examples of single-person activities (jumping jack, baby crawling, punching a boxing bag, and handstand walking). Single-person actions can be divided into the following categories:
Behavior: The goal of behavior recognition is to recognize a person's behavior from activity data; it is useful in monitoring applications such as dementia-patient and child behavior (Han et al. 2014; Nam and Park 2013; Arifoglu and Bouchachia 2017).
Gestures: Gesture recognition has applications in sign-language recognition for differently-abled persons; wearable sensor-based HAR models are the most suitable here (Sreekanth and Narayanan 2017; Ohn-Bar and Trivedi 2014; Xie et al. 2018; Kasnesis et al. 2017; Zhu and Sheng 2012).
Activity of daily living (ADL) and ambient assistive living (AAL): ADL activities are performed in an indoor environment, such as cooking, sleeping, and sitting. In a smart home, ADL monitoring for dementia patients can be performed using wireless sensor-based HAR models (Nguyen et al. 2017; Sung et al. 2012) or RFID tag-based HAR models (Ke et al. 2013; Oguntala et al. 2019; Raad et al. 2018; Ronao and Cho 2016). AAL-based models help elderly and disabled people by providing remote care, medication reminders, and management (Rashidi and Mihailidis 2013). CCTV cameras are an ideal choice, but they have privacy issues (Shivendra shivani and Agarwal 2018); therefore, sensor- or RFID-based HAR models (Parada et al. 2016; Adame et al. 2018) or wearable sensor-based models are more suitable (Azkune and Almeida 2018; Ehatisham-Ul-Haq et al. 2020; Magherini et al. 2013).
These actions are performed by a group of persons. Multiple-person movement is illustrated in Figure S.4 (Supporting document), which depicts normal human movement on a pedestrian pathway and the anomalous activity of a cyclist and a truck on the pathway. Such activities belong to the following categories.
Interaction: There are human–object activities (cooking, reading a book) (Kim et al. 2019; Koppula et al. 2013; Ni et al. 2013; Xu et al. 2017) and human–human activities (handshake) (Weng et al. 2021). An example of human–object interaction is the free weight exercise monitoring (FEMO) model, which uses RFID devices to monitor exercise by installing a tag on dumbbells (Ding et al. 2015).
Group: This involves monitoring a people count in an indoor environment such as a museum, or monitoring crowd patterns (Chong and Tay 2017; Xu et al. 2013). To count the number of people in an area, Wi-Fi units can be used: the received signal strength varies with the number of people present and can therefore serve for counting.
Observation 3: Vision-based HAR has broad application domains, but it has limitations such as privacy and the need for more resources (such as GPUs). These issues can be overcome with sensor-based HAR, but its application domain is currently limited to single-person activity monitoring.
The foremost goal of HAR is to predict the movement or action of a person based on action data collected from a data acquisition device. These movements include activities like walking, exercising, and cooking. Predicting movements is challenging, as it involves huge amounts of unlabelled sensor data and video data that suffer from conditions such as lighting, background noise, and scale variation. To overcome these challenges, the AI framework offers numerous ML and DL techniques.
ML architectures: ML is a subset of AI that aims at developing an intelligent model through the extraction of unique features, which helps in recognizing patterns in the input data (Maniruzzaman et al. 2018). There are two types of ML approaches: supervised and unsupervised. In the supervised approach, a mathematical model is created based on the relationship between raw input data and output data. The idea behind the unsupervised approach is to detect patterns in raw input data without prior knowledge of the output. Figure S.5 (Supporting document) illustrates the popular ML techniques used in recognizing human actions (Qi et al. 2018; Yao et al. 2019; Multi Modality State-of-the-Art Medical Image Segmentation and 2011). Our group has developed several applications of ML models in handling different diseases, such as diabetes management (Maniruzzaman et al. 2018), liver cancer (Biswas et al. 2018), thyroid cancer (Rajendra Acharya et al. 2014), ovarian cancer (Acharya et al. 2013a, 2015), prostate (Pareek et al. 2013), breast (Huang et al. 2008), skin (Shrivastava et al. 2016), arrhythmia classification (Martis et al. 2013), and recently cardiovascular disease (Acharya et al. 2012; Acharya et al. 2013b). In the last 5 years, researchers' focus has shifted to semi-supervised learning, where the HAR model is trained on labelled as well as unlabelled data. The semi-supervised approach aims to label unlabelled data using the knowledge gained from the set of labelled data: the HAR model is trained on popular labelled datasets, and new users' unlabelled test data is classified into activity classes according to the knowledge gained from the training data (Mabrouk et al. 2015; Cardoso and Mendes Moreira 2016).
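The self-labelling idea behind the semi-supervised approach can be sketched in a few lines. The following toy self-training loop is our own minimal construction, with 1-D features and a nearest-neighbour rule standing in for a trained classifier; it shows how knowledge from labelled data propagates to unlabelled samples:

```python
def self_train(labelled, unlabelled, threshold=0.5):
    """Self-training sketch: repeatedly label any unlabelled point whose
    nearest labelled neighbour (1-D features) lies within `threshold`,
    then add it to the labelled pool."""
    labelled = list(labelled)
    pending = list(unlabelled)
    changed = True
    while changed and pending:
        changed = False
        for x in list(pending):
            nearest = min(labelled, key=lambda pair: abs(pair[0] - x))
            if abs(nearest[0] - x) <= threshold:
                labelled.append((x, nearest[1]))  # adopt the neighbour's label
                pending.remove(x)
                changed = True
    return labelled, pending

# Two toy activity clusters: "walk" features near 1.0, "run" near 5.0
seed = [(1.0, "walk"), (5.0, "run")]
grown, left = self_train(seed, [1.4, 1.8, 4.6, 4.2], threshold=0.5)
```

Note how 1.8 is only labelled after 1.4 has joined the labelled pool: confident labels cascade outward, which is the essence of self-training.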
DL/TL architectures: In recent years, DL has become quite popular due to its capability of learning high-level features and its superior performance (Saba et al. 2019; Biswas et al. 2019). The basic idea behind DL is data representation, which enables it to produce optimal features; it learns unknown patterns from raw data without human intervention. The DL techniques used in HAR can be divided into three parts (shown in Figure S.5 of the supporting document): deep neural networks (DNN), hybrid deep learning (HDL) models, and transfer learning (TL)-based models (Agarwal et al. 2021). DNNs include models like convolutional neural networks (CNN) (Deep and Zheng 2019; Liu et al. 2020; Zeng et al. 2014), recurrent neural networks (RNN) (Murad and Pyun 2017), and RNN variants, which include long short-term memory (LSTM) and gated recurrent units (GRU) (Zhu et al. 2019; Du et al. 2019; Fazli et al. 2021). In hybrid HAR models, a combination of CNN and RNN models is trained on spatio-temporal data; researchers have proposed various hybrid models in the last 5 years, such as DeepSense (Yao et al. 2017) and DeepConvLSTM (Wang et al. 2019a). Apart from hybrid AI models, there are various transfer learning-based HAR models that involve pre-trained DL architectures like ResNet-50, Inception v3, and VGG-16 (Feichtenhofer et al. 2018; Newell Alejandro 2016; Crasto et al. 2019; Tran et al. 2019; Feichtenhofer and Ai 2019). However, the role of TL in sensor-based HAR is still evolving (Deep and Zheng 2019).
Figure 5a depicts a representative CNN architecture for HAR: two convolution layers followed by a pooling layer extract features from the activity image, leading to dimensionality reduction. This is followed by a fully connected (FC) layer for iterative weight computation and a softmax layer for binary or granular decision making, after which the input image is classified into an activity class. Figure 5b presents a representative TL-based HAR model, which includes pre-trained models such as VGG-16, Inception v3, and ResNet. The pre-trained model is trained on a large dataset of natural images (e.g., people, cats, dogs, and food). These pre-trained weights are applied to the training data of the image sequence via an intermediate layer, forming the customized fully connected layer. Further, the training weights are fine-tuned using the optimizer function. Next, the retrained model is applied to the test data to classify the activity into an activity class.
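The convolution-plus-pooling feature extraction described above can be shown in miniature. The sketch below is a toy 1-D analogue in plain Python (the signal, kernel, and sizes are our own illustrative choices, not taken from Fig. 5):

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as CNNs compute it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(features):
    """Element-wise rectified linear activation."""
    return [max(0.0, f) for f in features]

def max_pool(features, size=2):
    """Non-overlapping max pooling: keeps the strongest response per window,
    reducing dimensionality exactly as the pooling layer in Fig. 5a does."""
    return [max(features[i:i + size])
            for i in range(0, len(features) - size + 1, size)]

# One conv -> ReLU -> pool stage over a toy 1-D accelerometer window
window = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
edge_kernel = [1.0, -1.0]          # responds to rising/falling edges
features = max_pool(relu(conv1d(window, edge_kernel)), size=2)
```

A real HAR CNN stacks several such stages with learned (not hand-set) kernels, then flattens into the FC and softmax layers for classification.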
Miniaturized mobile devices are handy to use and offer a set of physiological sensors that can capture activity signals, but the captured data have a complex structure and strong inner correlation. Deep learning models that combine CNN and RNN make it possible to explore this complex data and identify detailed features for activity recognition. One such model, offered by Ordóñez et al., is DeepConvLSTM (Ordóñez and Roggen 2016), in which the CNN works as a feature extractor representing the sensor input data as feature maps, and the LSTM layer explores the temporal dynamics of the feature maps. Yao et al. proposed a similar model, named DeepSense, in which two convolution layers (individual and merged conv layers) and stacked GRU layers are the main building blocks (Yao et al. 2017). Figure 5c shows the representative hybrid HAR model with a CNN-LSTM framework.
a CNN model for HAR (where \(\omega_{(*)}\): weights of the hidden layers, \(\sigma(*)\): activation function, \(\lambda\): learning rate, \(*\): convolution operation, \(\mathcal{L}(\omega)\): loss function), b TL-based model for HAR, and c hybrid HAR model (CNN-LSTM)
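The hybrid idea of Fig. 5c, per-timestep convolutional features passed to a recurrent layer that accumulates temporal context, can be illustrated with a minimal recurrent cell. For brevity, a plain tanh RNN stands in for the LSTM here, and the weights and feature sequences are made up for illustration:

```python
import math

def simple_rnn(sequence, w_in=0.5, w_rec=0.5):
    """Minimal recurrent cell (tanh), standing in for the LSTM/GRU layer:
    each incoming feature updates a hidden state that carries temporal
    context from earlier timesteps."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Per-timestep "CNN features" (toy scalars) for two activity intensities
features_walk = [0.1, 0.2, 0.1, 0.2]
features_run = [0.9, 1.0, 0.9, 1.0]
h_walk = simple_rnn(features_walk)
h_run = simple_rnn(features_run)
```

In DeepConvLSTM-style models the scalar feature becomes a feature map per timestep and the cell becomes an LSTM with gates, but the flow — convolutional features in, temporally accumulated state out, classifier on top — is the same.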
A DL model learns by means of a loss function, which evaluates how well the algorithm models the applied data: if predictions deviate largely from the actual output, the loss value is very large. With the help of an optimization function, the model gradually learns to reduce the prediction error. The most used loss functions in HAR models are mean squared loss and cross-entropy (Janocha and Czarnecki 2016; Wang et al. 2020a).
Mean absolute error (\(\delta\)): calculated as the average of the absolute differences between the predicted \((\hat{y}_i)\) and actual \((y_i)\) output, \(\delta = \frac{1}{N}\sum_{i=1}^{N} |y_i - \hat{y}_i|\), where N is the number of training samples.
Mean squared error (\(\varepsilon\)): calculated as the average of the squared differences between the predicted \((\hat{y}_i)\) and actual \((y_i)\) output, \(\varepsilon = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2\), where N is the number of training samples.
Cross-entropy loss (\(\eta\)): evaluates the performance of a model whose output is a probability between 0 and 1. The loss increases as the predicted probability \((\hat{y}_i)\) diverges from the actual output \((y_i)\).
Binary cross-entropy loss: predicts the probability of one of two activity classes.
Multiclass cross-entropy loss: the generalization of binary CEL, where each class is assigned a unique integer value in the range 0 to n − 1 (n is the number of classes).
Kullback–Leibler divergence (KL-divergence): a measure of how one probability distribution diverges from another. For probability distributions P(x) and Q(x), the KL-divergence is the expectation, under P(x), of the logarithmic difference between them: \(D_{KL}(P\|Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)}\).
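The losses above reduce to a few lines of plain Python; the sketch below (our own minimal implementation, with cross-entropy shown for a single one-hot sample) makes the definitions concrete:

```python
import math

def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between actual and predicted."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """MSE: average squared difference between actual and predicted."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    """Multi-class CE for one sample: y_true is one-hot, y_pred sums to 1."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

mae = mean_absolute_error([1.0, 0.0], [0.8, 0.4])
mse = mean_squared_error([1.0, 0.0], [0.8, 0.4])
ce_good = cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])   # confident and correct
ce_bad = cross_entropy([0, 1, 0], [0.8, 0.1, 0.1])    # confident and wrong
kl_same = kl_divergence([0.5, 0.5], [0.5, 0.5])
```

The two cross-entropy calls show the property stated above: the loss grows sharply as the predicted probability diverges from the true class, while KL-divergence is zero for identical distributions.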
Hyper-parameters and optimization
Drop-out rate: a regularization technique in which a fraction of neurons is dropped to avoid overfitting.
Learning rate: defines how fast the parameters of the network are updated.
Momentum: helps determine the direction of the next step based on knowledge gained in previous steps.
Number of hidden layers: the number of hidden layers between the input and output layers.
An optimizer is a method used for updating the parameters of a neural network. DL provides a wide range of optimizers: gradient descent (GD), stochastic gradient descent (SGD), RMSprop, and the Adam optimizer. GD is a first-order optimization method that relies on the first-order derivative of the loss function. SGD is a variant of GD that updates the model's parameters frequently: it computes the loss for each training sample and alters the parameters accordingly. The RMSprop optimizer lies in the domain of adaptive learning; it deals with the vanishing/exploding gradient issue by using a moving average of squared gradients to normalize the gradient. The most powerful optimizer is Adam, which combines the momentum of GD, to retain knowledge of past updates, with the adaptive learning of RMSprop, and introduces two new hyper-parameters, beta1 and beta2 (Soydaner 2020; Sun et al. 2020).
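The GD and Adam update rules described above can be sketched on a toy one-parameter loss. This is our own illustration: the quadratic, learning rates, and step counts are arbitrary choices, not tuned values from the review:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Vanilla GD: w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Adam: momentum (beta1) plus RMSprop-style scaling (beta2),
    with bias correction for the running averages m and v."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # momentum term
        v = beta2 * v + (1 - beta2) * g * g      # squared-gradient average
        m_hat = m / (1 - beta1 ** t)             # bias-corrected estimates
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (v_hat ** 0.5 + eps)
    return w

# Toy "loss" L(w) = (w - 3)^2 with gradient 2(w - 3); minimum at w = 3
grad = lambda w: 2.0 * (w - 3.0)
w_gd = gradient_descent(grad, w0=0.0)
w_adam = adam(grad, w0=0.0)
```

Both converge toward w = 3; the point of the sketch is that Adam's per-step magnitude is normalized by the gradient history (beta1/beta2), whereas GD steps scale directly with the raw gradient.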
The most common validation strategies are k-fold cross-validation and leave-one-subject-out (LOSO). In k-fold, k − 1 folds are used for training and the remaining fold for validation; a similar pattern is followed in its variants such as twofold, threefold, and tenfold cross-validation. In LOSO, the data of one subject is held out of the whole dataset for validation, and the rest is used for training.
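The two strategies can be sketched as index-splitting routines. This is a minimal plain-Python illustration; real pipelines typically use library implementations such as scikit-learn's:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_idx, val_idx) pairs: each fold serves as the
    validation set exactly once; the last fold absorbs any remainder."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for f in range(k):
        val = indices[f * fold_size:(f + 1) * fold_size] if f < k - 1 \
            else indices[(k - 1) * fold_size:]
        train = [i for i in indices if i not in val]
        yield train, val

def loso_splits(subject_ids):
    """Leave-one-subject-out: hold out every sample of one subject per split,
    so validation subjects are never seen during training."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        val = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, val

folds = list(k_fold_splits(10, 5))
loso = list(loso_splits(["A", "A", "B", "B", "C"]))
```

LOSO is the stricter protocol for HAR: it tests whether the model generalizes to a person it has never seen, rather than to unseen samples from already-seen people.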
There are various HAR devices for capturing human activity signals, and their goal is to capture those signals with minimal distortion. To provide deeper insight into existing HAR models, we have identified seven AI attributes and used a tabular representation for better understanding, with attributes such as #features, feature extraction, AI model architecture, metrics, validation, hyper-parameters, optimizer, and loss function. For an in-depth description of recent HAR models between 2019 and 2021, we have prepared one table per HAR device: Table 3 (sensor), Table 4 (vision), Table 5 (RFID), and Table 6 (device-free).
In Table 3, we provide insight into the AI techniques adopted in sensor-based HAR models in the last two years. Apart from recent sensor-based HAR models, knowledge about earlier sensor-based HAR models published between 2011 and 2018 is provided in Table S.1 (Supporting document) (Zhu et al. 2019; Ding et al. 2019; Yao et al. 2017, 2019; Hsu et al. 2018; Murad and Pyun 2017; Sundaramoorthy and Gudur 2018; Lawal and Bano 2019). Sensor-based HAR is dominated by DL techniques, especially CNNs or combinations of CNNs with RNNs or their variants. The most used hyper-parameters in sensor-based HAR are the learning rate, batch size, number of layers, and drop-out rate; the Adam optimizer, cross-entropy loss, and k-fold validation are dominant. For example, entry (R2, C4) of Table 3 presents a 3D CNN-based HAR model that includes 3 convolutional layers of size (32, 64, 128) followed by a pooling layer, then an FC layer of size (128) and a softmax layer. Entry (R2, C6) illustrates the validation strategy (10% of the data was used for validation), and entry (R2, C7) illustrates the hyperparameters (i.e., LR = 0.001, batch size = 50) and the selected optimizer (Adam) for performance fine-tuning. Table 4 illustrates the AI framework in vision-based HAR models published in the last 2 years; descriptions of earlier vision-based HAR models published between 2011 and 2018 are provided in Table S.2 (Supporting document) (Qi et al. 2018; Wang et al. 2018; Thida et al. 2013; Feichtenhofer et al. 2018, 2017; Simonyan and Zisserman 2014; Newell Alejandro 2016; Diba et al. 2016; Xia et al. 2012; Vishwakarma and Singh 2017; Chaaraoui 2015). Initial vision-based HAR models were dominated by ML algorithms such as the support vector machine (SVM) and k-means clustering with principal component analysis (PCA)-based feature extraction.
In the last few years, researchers have shifted to the DL paradigm, where the most dominant techniques are multi-dimensional CNNs, LSTMs, and combinations of both. In video camera-based HAR models, the incoming data is a video stream, which needs more resources and processing time; this issue has given rise to the usage of TL in vision-based HAR approaches. The hyper-parameters used in vision-based HAR are the drop-out rate, learning rate, weight decay, and batch normalization. The mean squared loss and cross-entropy loss are the most used loss functions, while RMSprop and SGD are the most dominant optimizers in vision-based HAR. For example, entry (R1, C3) of Table 4 describes a 3D CNN-based HAR model that includes an input layer with skeletal-joint information split into coloured skeleton motion history images (Color-skl-MHI) and relative joint images (RJI), followed by two 3D CNN streams, then a fusion layer combining the outputs of both 3D CNN layers, and finally the output layer. Table 5 shows recognition models using RFID devices published in the last 2 years, while details of earlier RFID-based HAR models are provided in Table S.3 (Supporting document) (Ding et al. 2015; Li et al. 2016; Fan et al. 2017). RFID-based HAR is mostly dominated by ML algorithms such as SVM, sparse coding, and dictionary learning; very few researchers have used DL techniques. Some RFID-based HAR models use a traditional approach in which the received signal strength indicator (RSSI) is used for data gathering and the recognition task is performed by calculating similarity with dynamic time warping (DTW). Table 6 provides an overview of device-free HAR models, where Wi-Fi devices are used for collecting activity data; the recognition approach is similar to RFID-based HAR, and ML approaches are more dominant than DL.
A visible growth of DL in vision-based HAR can be observed in the existing HAR models mentioned in Table 4, where most of the recent work uses advanced DL techniques like TL with VGG-16, VGG-19, and ResNet-50. Apart from these TL-based models, there are hybrid models using autoencoders, as shown in row R8 of Table 4, which includes a CNN-, LSTM-, and autoencoder-based HAR model for extracting deep features from large video datasets. But the impact of advanced DL techniques on sensor-based and device-free HAR is not yet as strong. Owing to the compact size and versatility of miniaturized and wireless sensing devices, they are on track to become the next revolution in the HAR framework, and the key to their progress is the emerging DL framework. The data gathered from these devices is unlabelled, complex, and strongly inter-correlated. DL offers (1) advanced algorithms like TL and unsupervised learning techniques such as generative adversarial networks (GAN) and variational autoencoders (VAE), (2) fast optimization techniques such as SGD and Adam, and (3) dedicated DL libraries like TensorFlow, (Py)Torch, and Theano to handle complex data.
Observation 4: DL techniques are still evolving. Minimal work has been done using TL in sensor-based HAR models. Most approaches are discriminative, using supervised learning to train HAR models. Generative models like VAE and GAN have matured in the computer vision domain but are still new to HAR.
5.1 Performance evaluation
Researchers have adopted different metrics for evaluating the performance of HAR models; the most popular evaluation metric is accuracy. The most used metrics in sensor-based HAR include accuracy, sensitivity, specificity, and F1-score. The evaluation metrics used in existing vision-based HAR models are accuracy (top-1 and top-5) and mean average precision (mAP). Metrics used in RFID-based HAR include accuracy, F1-score, recall, and precision, and those used in device-free HAR include F1-score, precision, recall, and accuracy. “Appendix 2” shows the mathematical representations of the performance evaluation metrics used in the HAR framework.
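For reference, the core metrics reduce to simple counts over the confusion matrix. The sketch below is our own minimal implementation for a binary activity-vs-rest labelling, computing them from predicted and actual labels:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity, precision, and F1-score
    from the binary confusion matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Toy fall-detection labels: 1 = "fall", 0 = "no fall"
m = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

Multi-class variants (and the top-1/top-5 accuracies used in vision-based HAR) follow the same pattern, averaging these per-class counts over all activity classes.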
In the last few years, significant growth can be seen in the usage of DL in the HAR framework, but DL models come with challenges: (1) Overfitting/underfitting: When the amount of activity data is limited, the HAR model fits the training data so well that it learns its irregularities and random noise as part of the data, which negatively impacts the model's generalization ability. Underfitting is the opposite condition, where the HAR model neither models the training data well nor generalizes to new, unseen data. Both overfitting and underfitting result in lower performance. Overfitting can be mitigated by selecting an appropriate optimizer and tuning the right hyperparameters, by increasing the size of the training data, or by using k-fold cross-validation. The challenge is to select a range of hyperparameters that works well during the training and testing protocols and continues to work well when the HAR model is used in real-life applications. (2) Hardware integration in HAR devices: In the last 10 years, various high-performance HAR models have appeared, but the question is how well they can be used in a real environment without integrating specialized hardware like graphics processing units (GPUs) and extra memory. The objective, therefore, is to design a robust, lightweight model that can run in a real environment without specialized hardware. For applications with huge data volumes, such as video, GPUs are needed for training the model. Python offers libraries (such as Keras and TensorFlow) for implementing an AI framework on a general-purpose CPU; for working on GPUs, one needs to explore specialized libraries, which may end up requiring specialized hardware in the target application and make it expensive. Processing power and cost are interrelated: one pays more for extra power.
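One common, generic guard against the overfitting described above — a standard DL practice, not a technique attributed to any model in this review — is early stopping on the validation loss. A minimal sketch, with an illustrative loss curve of our own:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch (0-based) at which training should stop: once the
    validation loss has failed to improve for `patience` consecutive
    epochs, the model is assumed to be overfitting the training data."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

# Validation loss falls, then rises as the model starts overfitting
losses = [0.9, 0.7, 0.5, 0.45, 0.46, 0.48, 0.50, 0.52]
stop_at = early_stopping(losses, patience=3)
```

In practice one would also restore the weights from the best epoch; here the rising tail of the curve is exactly the generalization loss that the paragraph above describes.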
In the proposed review, we made four observations based on the three pillars of HAR: HAR devices, AI techniques, and applications. Based on these observations and the challenges highlighted in Sects. 3 and 4, we make three claims and four recommendations.
(i) Mutual relationship between HAR devices and the AI framework: Our first claim is based on observations 1 and 2, where we illustrate that advancement in AI directly affects the growth of HAR devices. In Sect. 2, Fig. 3a presents the growth of HAR devices in the last 10 years, and Fig. 3b illustrates the advancement in AI, showing how researchers have shifted from ML to the DL paradigm in the last 5 years. Therefore, from observations 1 and 2, we can infer that advancement in AI is driving the growth of HAR devices. Most earlier HAR models depended on camera or customized wearable-sensor data, but in the last 5 years more devices, such as embedded sensors and Wi-Fi devices, have emerged as prominent HAR sources.
(ii) Growth in HAR devices increases the scope of HAR in various application domains: Claim 2 is based on observation 3, where we have shown how, for the best results, a target application depends on the HAR device. For applications like crowd monitoring, using sensor devices to gather activity data will not give prominent results, because sensors are best suited to single-person applications. Similarly, a camera in a smart-home environment is not a good choice, because cameras invade the user's privacy and require high computational cost.
Therefore, we can conclude that for multi-person applications like surveillance, video cameras have proven best, whereas for single-person monitoring applications, smart-device sensors are more suitable.
(iii) HAR devices, AI, and target application domains are three pillars in HAR framework: From all four observations and claims (1 and 2), we have proved that HAR devices, AI, and application domains are three pillars in the success of a HAR model.
The objective of the proposed review is to provide a complete and comprehensive review of HAR based on the three pillars, i.e., device type, AI techniques, and application domains. Table 7 benchmarks the proposed review against existing studies.
The narrative review surely needs a special note on the types of HAR datasets. (1) Sensor-based: Researchers have proposed many popular sensor-based datasets. In Table A.5 (“Appendix 1”), the sensor datasets are described with attributes such as data source, #factors, sensor location, and activity type. It includes wearable sensor-based datasets (Alsheikh et al. 2016; Asteriadis and Daras 2017; Zhang et al. 2012; Chavarriaga et al. 2013; Munoz-Organero 2019; Roggen et al. 2010; Qin et al. 2019) as well as smart-device sensor-based datasets (Ravi et al. 2016; Cui and Xu 2013; Weiss et al. 2019; Miu et al. 2015; Reiss and Stricker 2012a, b; Lv et al. 2020; Gani et al. 2019; Stisen et al. 2015; Röcker et al. 2017; Micucci et al. 2017). Apart from the datasets mentioned in Table A.5, a few more are worth mentioning, such as the very popular Kasteren dataset (Kasteren et al. 2011; Chen et al. 2017). (2) Vision-based: Devices for collecting 3D data are CCTV cameras (Koppula and Saxena 2016; Devanne et al. 2015; Zhang and Parker 2016; Li et al. 2010; Duan et al. 2020; Kalfaoglu et al. 2020; Gorelick et al. 2007; Mahadevan et al. 2010), depth cameras (Cippitelli et al. 2016; Gaglio et al. 2015; Neili Boualia and Essoukri Ben Amara 2021; Ding et al. 2016; Cornell Activity Datasets: CAD-60 & CAD-120 2021), and videos from public domains like YouTube and Hollywood movie scenes (Gu et al. 2018; Soomro et al. 2012; Kuehne et al. 2011; Sigurdsson et al. 2016; Kay et al. 2017; Carreira et al. 2018; Goyal et al. 2017). The reason for using public-domain videos is that, unlike cameras, they raise no privacy issue. Table A.6 (“Appendix 1”) describes the vision-based datasets, including data source, #factors, sensor location, and activity type.
Apart from the datasets mentioned in Table A.6, there are a few more publicly available datasets, such as the MCDS (magnetic wall chessboard video) dataset (Tanberk et al. 2020), the NTU-RGBD datasets (Yan et al. 2018; Liu et al. 2016), VIRAT 1.0 (3 hours of person–vehicle interaction), and VIRAT 2.0 (8 hours of surveillance scenes of a school parking lot) (Wang and Ji 2014). (3) RFID-based: RFID-based HAR is mostly used for smart-home applications, where actions performed by the user are monitored by RFID tags. To the best of our knowledge, there is hardly any public dataset available for RFID-based HAR; researchers have developed their own datasets for their respective applications. One such dataset was developed by Ding et al. (2015), which includes data for 10 exercises performed by 15 volunteers over 2 weeks with a total duration of 1543 min. Similarly, Li et al. (2016) developed a dataset for trauma resuscitation covering 10 activities and 5 resuscitation phases. A similar strategy was followed by Du et al. (2019), Fan et al. (2017), Yao et al. (2017, 2019), and Wang et al. (2019d). (4) Device-free: Not many popular datasets are publicly available, so researchers have followed the same strategy as in RFID-based HAR. One dataset includes data from 6 volunteers performing 440 actions, for a total of 4400 samples (Wang et al. 2019c). Similarly, Yan et al. (2020), Fei et al. (2020), and Wang et al. (2019d) have proposed their own datasets.
This is the first review of its kind in which we demonstrate a HAR system consisting of three components: HAR devices, AI models, and HAR applications. It is the only review that considers all four kinds of HAR devices (sensor-based, vision-based, RFID-based, and device-free) within the AI framework. The engineering perspective on AI was discussed in terms of architecture, loss-function design, and optimization strategies. A comprehensive comparative study was conducted in the benchmarking section, and we also provided dataset sources for readers. Limitations: A significant amount of work has been done in the HAR domain, but some limitations need to be addressed. (1) Synchronized activities: Earlier HAR models presume that a person performs a single activity at a time. In the real world, however, humans perform synchronized activities, such as talking on a smartphone while walking, or reading a book while walking. To the best of our knowledge, hardly any HAR model considers synchronized activities in its recognition pipeline. (2) Complex and composite activities: Researchers have achieved state-of-the-art results on simple, atomic activities such as running, walking, and going up or down stairs, but very limited work addresses complex activities in which one activity comprises two or more simple actions. For example, in exercise monitoring, an exercise like the burpee includes jumping, bending down, and extending the legs. Such complex and risky activities require proper posture monitoring, yet to the best of our knowledge no HAR model can monitor exercises involving complex activities. (3) Future action forecast: A significant amount of work has been done in HAR, but most of it identifies an action already performed by the user, such as fall detection; no HAR model predicts future actions.
For example, in a smart-home environment, if an elderly person is exercising and there is a risk of falling, a smart system that identifies the fall in advance and alerts the person in time to take precautions would be very helpful. (4) Lack of real-time validation of HAR models: For validating earlier HAR models, researchers have used k-fold cross-validation and LOSO, where a part of the dataset is held out for validation. However, most of the data in these datasets are gathered in experimental setups and lack real-world conditions. Therefore, there is a need for models that perform well on both experimental and real-time data without AI bias (Suri et al. 2016).
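The LOSO protocol mentioned above can be sketched in a few lines of plain Python: each fold holds out all samples from one subject for validation and trains on everyone else. The subject IDs and toy feature vectors below are purely illustrative.

```python
from collections import defaultdict

def loso_splits(samples):
    """Leave-One-Subject-Out splits over (subject_id, features) pairs.
    Yields (held_out_subject, train_indices, val_indices) per fold."""
    by_subject = defaultdict(list)
    for idx, (subject, _features) in enumerate(samples):
        by_subject[subject].append(idx)
    for held_out in sorted(by_subject):
        val = by_subject[held_out]
        train = sorted(i for s, idxs in by_subject.items()
                       if s != held_out for i in idxs)
        yield held_out, train, val

# Toy dataset: (subject_id, feature_vector) pairs
data = [("s1", [0.1]), ("s1", [0.2]), ("s2", [0.9]),
        ("s3", [0.5]), ("s3", [0.4])]
for subject, train_idx, val_idx in loso_splits(data):
    print(subject, train_idx, val_idx)
```

Unlike plain k-fold, no subject contributes samples to both the training and validation sides of a fold, which gives a more honest estimate of how a model generalizes to unseen users.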
Trending AI technique: the use of transfer learning has shown significant results in vision-based HAR models, but comparatively little work has been done on sensor-based HAR. Sensor-based HAR with TL could be the next revolution in the HAR domain.
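As a rough illustration of the transfer-learning idea (a toy sketch, not any published HAR model), the code below reuses a "pretrained" feature extractor, keeps it frozen, and fine-tunes only a small classification head on hypothetical target-domain accelerometer windows; all names and numbers are made up:

```python
import math

def pretrained_extractor(window):
    """Stand-in for layers pretrained on a source activity dataset;
    kept frozen (never updated) during target-domain fine-tuning."""
    mean = sum(window) / len(window)
    energy = sum(x * x for x in window) / len(window)
    return [mean, energy]

def finetune_head(windows, labels, lr=0.5, epochs=500):
    """Fine-tune only a logistic-regression head via SGD on log-loss;
    the feature extractor stays frozen."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(windows, labels):
            f = pretrained_extractor(x)           # frozen features
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + math.exp(-z))        # sigmoid
            g = p - y                             # log-loss gradient
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = pretrained_extractor(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0

# Toy target-domain windows: low-energy "sit" (0) vs high-energy "walk" (1)
X = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1],
     [1.0, -1.2, 1.1], [0.9, -1.0, 1.2]]
y = [0, 0, 1, 1]
w, b = finetune_head(X, y)
print([predict(w, b, x) for x in X])
```

The same structure scales to deep models: freeze the convolutional or recurrent layers learned on a large source dataset and retrain only the final classifier on the (usually much smaller) target-domain data.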
Trending device type: a decade ago, the most popular sources for capturing activity signals were video cameras. However, vision-based HAR comes with major issues, such as user privacy and GPU requirements. The solution to these problems is sensor-based HAR, where a simple smartphone or smartwatch captures the activity signals. Over the last 3 years, sensor-based HAR has been one of the most popular HAR approaches.
Dominant target domain in HAR: although HAR has multifaceted application domains such as surveillance, healthcare, and fall detection, healthcare is the most crucial domain, encompassing remote health monitoring of patients, exercise monitoring, and assistive care for the elderly living alone. In the current COVID-19 pandemic scenario, a sensor-based HAR model with DL techniques can provide assistive care to home-quarantined COVID-19 patients by monitoring their health remotely.
Abnormal action identification and future action prediction: a significant amount of work has been done in HAR, but most of it revolves around recognizing simple activities; very little work addresses finding abnormalities in actions. Abnormal conditions fall into two categories: physical and non-physical. Physical examples include (a) fall detection in normal conditions during activities of daily living (ADL), (b) fall detection in elderly health monitoring, and (c) fall detection in sports. Only physical abnormality can be detected under this paradigm. Non-physical examples include dizziness, headaches, and nausea; these are not physical parameters detectable via a camera, although they can be monitored via special sensor-based devices such as hypertension monitors and oximeters. Further, to our knowledge, few applications combine camera and sensor devices in the non-physical frame. Apart from abnormality identification, hardly any work predicts future actions from current ones. For example, a person is running or walking without concentrating on the road; suddenly an obstacle appears in his path, and he trips and falls. No application can detect the obstacle and raise an alarm in advance. Forecasting concerns projections at distant times, unlike spatial and temporal information at nearly the current time, and it is especially challenging for motion estimation in subsequent frames for which data are unavailable and unseen.
Unlike earlier review articles, where researchers focused on a single HAR device, we have proposed a study that revolves around the three pillars of HAR: HAR devices, AI, and application domains. In the proposed review, we hypothesized that the growth in HAR devices is synchronized with the evolving AI framework, and the study rationalizes this with graphical representations of existing HAR models. Our second hypothesis is that the growth in AI is the core of HAR, making it suitable for multifaceted domains. We rationalized this by presenting representative CNN and TL architectures of HAR models and by discussing the importance of hyperparameters, optimizers, and loss functions in the design of HAR models. A unique contribution is the analysis of the role of the AI framework in existing HAR models for each of the HAR devices. The study further revealed that (1) sensor-based HAR with miniaturized devices will open up opportunities in healthcare applications, especially remote care and monitoring, and (2) device-free HAR using Wi-Fi devices can make HAR an essential part of a healthy human life. Finally, the study presented four recommendations that will expand the vision of new researchers and help them broaden the scope of HAR in diverse domains with the evolving AI framework, providing humans a better quality of life.
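To make the loss-function discussion concrete, the snippet below computes the softmax cross-entropy used by most of the surveyed classification models. It is a generic sketch, not tied to any specific HAR model; the three-class logits are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, true_class):
    """CE loss for one sample: -log p(true class)."""
    return -math.log(softmax(logits)[true_class])

# Hypothetical 3-class HAR logits (e.g., walk / run / sit)
logits = [4.0, 0.5, 0.1]
loss_right = cross_entropy(logits, 0)   # confident and correct: small loss
loss_wrong = cross_entropy(logits, 2)   # confident but wrong: large loss
print(round(loss_right, 3), round(loss_wrong, 3))
```

The asymmetry is the point of the loss: a confidently correct prediction is penalized very little, while a confidently wrong one is penalized heavily, which is what drives the gradient updates during training.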
AAL: Ambient assistive living
ADL: Activity of daily living
AI: Artificial intelligence
AP: Average pooling
AUC: Area under curve
BN: Batch normalization
CE: Cross-entropy loss
CNN: Convolutional neural network
Conv: Convolution
CV: Cross-validation
DTW: Dynamic time warping
EER: Equal error rate
FC: Fully connected
GAN: Generative adversarial network
GFLOPS: Giga floating-point operations per second
GPU: Graphics processing unit
GRU: Gated recurrent unit
KL: Kullback–Leibler
LOSO: Leave one subject out
LR: Learning rate
LSTM: Long short-term memory
MAE: Mean absolute error
mAP: Mean average precision
MP: Max pooling
MSE: Mean square error
PCA: Principal component analysis
RFID: Radio frequency identification
RNN: Recurrent neural network
RSSI: Received signal strength indicator
SEN: Sensitivity
SGD: Stochastic gradient descent
SPE: Specificity
SVM: Support vector machine
TL: Transfer learning
VAE: Variational autoencoder
TP: True positive
TN: True negative
FP: False positive
FN: False negative
LR: Likelihood ratio
Abobakr A, Hossny M, Nahavandi S (2018) A skeleton-free fall detection system from depth images using random decision forest. IEEE Syst J 12(3):2994–3005. https://doi.org/10.1109/JSYST.2017.2780260
Acharya UR et al (2012) An accurate and generalized approach to plaque characterization in 346 carotid ultrasound scans. IEEE Trans Instrum Meas 61(4):1045–1053. https://doi.org/10.1109/TIM.2011.2174897
Acharya UR et al (2013b) Automated classification of patients with coronary artery disease using grayscale features from left ventricle echocardiographic images. Comput Methods Programs Biomed 112(3):624–632. https://doi.org/10.1016/j.cmpb.2013.07.012
Acharya UR et al (2015) Ovarian tissue characterization in ultrasound: a review. Technol Cancer Res Treat 14(3):251–261. https://doi.org/10.1177/1533034614547445
Acharya UR, Sree SV, Saba L, Molinari F, Guerriero S, Suri JS (2013a) Ovarian tumor characterization and classification using ultrasound—a new online paradigm. J Digit Imaging 26(3):544–553. https://doi.org/10.1007/s10278-012-9553-8
Adame T, Bel A, Carreras A, Melià-Seguí J, Oliver M, Pous R (2018) CUIDATS: An RFID–WSN hybrid monitoring system for smart health care environments. Future Gen Comput Syst 78:602–615. https://doi.org/10.1016/j.future.2016.12.023
Agarwal M et al (2021) A novel block imaging technique using nine artificial intelligence models for COVID-19 disease classification, characterization and severity measurement in lung computed tomography scans on an Italian cohort. J Med Syst. https://doi.org/10.1007/s10916-021-01707-w
Agarwal M et al (2021) Wilson disease tissue classification and characterization using seven artificial intelligence models embedded with 3D optimization paradigm on a weak training brain magnetic resonance imaging datasets: a supercomputer application. Med Biol Eng Comput 59(3):511–533. https://doi.org/10.1007/s11517-021-02322-0
Akagündüz E, Aslan M, Şengür A (2016) Silhouette orientation volumes for efficient fall detection in depth videos. IEEE J Biomed Health Inform. https://doi.org/10.1109/JBHI.2016.2570300
Alsheikh MA, Selim A, Niyato D, Doyle L, Lin S, Tan HP (2016) Deep activity recognition models with triaxial accelerometers. In: AAAI workshop technical reports, vol. WS-16-01, pp 8–13, 2016.
Arifoglu D, Bouchachia A (2017) Activity recognition and abnormal behaviour detection with recurrent neural networks. Procedia Comput Sci 110:86–93. https://doi.org/10.1016/j.procs.2017.06.121
Asteriadis S, Daras P (2017) Landmark-based multimodal human action recognition. Multimed Tools Appl 76(3):4505–4521. https://doi.org/10.1007/s11042-016-3945-6
Attal F, Mohammed S, Dedabrishvili M, Chamroukhi F, Oukhellou L, Amirat Y (2015) Physical human activity recognition using wearable sensors. Sensors (Switzerland) 15(12):31314–31338. https://doi.org/10.3390/s151229858
Azkune G, Almeida A (2018) A scalable hybrid activity recognition approach for intelligent environments. IEEE Access 6(8):41745–41759. https://doi.org/10.1109/ACCESS.2018.2861004
Bashar SK, Al Fahim A, Chon KH (2020) Smartphone based human activity recognition with feature selection and dense neural network. In: Proceedings of the annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC), vol 2020-July, pp 5888–5891. https://doi.org/10.1109/EMBC44109.2020.9176239
Beddiar DR, Nini B, Sabokrou M, Hadid A (2020) Vision-based human activity recognition: a survey. Multimed Tools Appl 79(41–42):30509–30555. https://doi.org/10.1007/s11042-020-09004-3
Biswas M et al (2018) Symtosis: a liver ultrasound tissue characterization and risk stratification in optimized deep learning paradigm. Comput Methods Programs Biomed 155:165–177. https://doi.org/10.1016/j.cmpb.2017.12.016
Biswas M et al (2019) State-of-the-art review on deep learning in medical imaging. Front Biosci Landmark 24(3):392–426. https://doi.org/10.2741/4725
Buffelli D, Vandin F (2020) Attention-based deep learning framework for human activity recognition with user adaptation. arXiv, 2020.
Cardoso HL, Mendes Moreira J (2016) Human activity recognition by means of online semi-supervised learning, pp. 75–77. https://doi.org/10.1109/mdm.2016.93
Carreira J, Noland E, Banki-Horvath A, Hillier C, Zisserman A (2018) A short note about kinetics-600, 2018. [Online]. http://arxiv.org/abs/1808.01340 .
Carvalho LI, Sofia RC (2020) A review on scaling mobile sensing platforms for human activity recognition: challenges and recommendations for future research. IoT 1(2):451–473. https://doi.org/10.3390/iot1020025
Chaaraoui AA (2015) Abnormal gait detection with RGB-D devices using joint motion history features
Chavarriaga R et al (2013) The opportunity challenge: a benchmark database for on-body sensor-based activity recognition. Pattern Recognit Lett 34(15):2033–2042. https://doi.org/10.1016/j.patrec.2012.12.014
Chen WH, Cho PC, Jiang YL (2017) Activity recognition using transfer learning. Sensors Mater 29(7):897–904. https://doi.org/10.18494/SAM.2017.1546
Chen Y, Shen C (2017) Performance analysis of smartphone-sensor behavior for human activity recognition. IEEE Access 5(c):3095–3110. https://doi.org/10.1109/ACCESS.2017.2676168
Chen J, Sun Y, Sun S (2021) Improving human activity recognition performance by data fusion and feature engineering. Sensors (Switzerland) 21(3):1–23. https://doi.org/10.3390/s21030692
Chen K, Zhang D, Yao L, Guo B, Yu Z, Liu Y (2020) Deep learning for sensor-based human activity recognition: overview, challenges and opportunities. arXiv, vol. 37, no. 4, 2020
Chong YS, Tay YH (2017) Abnormal event detection in videos using spatiotemporal autoencoder. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 10262 LNCS, pp 189–196, 2017. https://doi.org/10.1007/978-3-319-59081-3_23
Cippitelli E, Gasparrini S, Gambi E, Spinsante S (2016) A human activity recognition system using skeleton data from RGBD sensors. Comput Intell Neurosci. https://doi.org/10.1155/2016/4351435
Civitarese G, Presotto R, Bettini C (2019) Context-driven active and incremental activity recognition, 2019. [Online]. http://arxiv.org/abs/1906.03033 .
Cook DJ, Krishnan NC, Rashidi P (2013) Activity discovery and activity recognition: a new partnership. IEEE Trans Cybern 43(3):820–828. https://doi.org/10.1109/TSMCB.2012.2216873
Cornell Activity Datasets: CAD-60 & CAD-120 (2021) [Online]. Available: re3data.org: Cornell Activity Datasets: CAD-60 & CAD-120; editing status 2019-01-22; re3data.org—Registry of Research Data Repositories. https://doi.org/10.17616/R3DD2D . Accessed 17 Apr 2021
Crasto N et al (2019) MARS: motion-augmented RGB stream for action recognition to cite this version : HAL Id : hal-02140558 MARS: motion-augmented RGB stream for action recognition, 2019. [Online]. http://www.europe.naverlabs.com/Research/
Cui J, Xu B (2013) Cost-effective activity recognition on mobile devices. In: BODYNETS 2013—8th international conference on body area networks, pp 90–96, 2013. https://doi.org/10.4108/icst.bodynets.2013.253656
De-La-Hoz-Franco E, Ariza-Colpas P, Quero JM, Espinilla M (2018) Sensor-based datasets for human activity recognition—a systematic review of literature. IEEE Access 6(c):59192–59210. https://doi.org/10.1109/ACCESS.2018.2873502
Deep S, Zheng X (2019) Leveraging CNN and transfer learning for vision-based human activity recognition. In: 2019 29th international telecommunication networks and application conference ITNAC 2019, pp 35–38, 2019. https://doi.org/10.1109/ITNAC46935.2019.9078016
Demrozi F, Pravadelli G, Bihorac A, Rashidi P (2020) Human activity recognition using inertial, physiological and environmental sensors: a comprehensive survey. IEEE Access 8:210816–210836. https://doi.org/10.1109/ACCESS.2020.3037715
Devanne M, Wannous H, Berretti S, Pala P, Daoudi M, Del Bimbo A (2015) 3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold. IEEE Trans Cybern 45(7):1340–1352. https://doi.org/10.1109/TCYB.2014.2350774
Dhiman C, Vishwakarma DK (2019) A review of state-of-the-art techniques for abnormal human activity recognition. Eng Appl Artif Intell 77:21–45
Diba A, Pazandeh AM, Van Gool L (2016) Efficient two-stream motion and appearance 3D CNNs for video classification, 2016, [Online]. http://arxiv.org/abs/1608.08851
Diba A et al. (2020) Large scale holistic video understanding. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 12350 LNCS, pp 593–610, 2020. https://doi.org/10.1007/978-3-030-58558-7_35
Ding R et al (2019) Empirical study and improvement on deep transfer learning for human activity recognition. Sensors (Switzerland). https://doi.org/10.3390/s19010057
Ding W, Liu K, Fu X, Cheng F (2016) Profile HMMs for skeleton-based human action recognition. Signal Process Image Commun 42:109–119. https://doi.org/10.1016/j.image.2016.01.010
Ding H et al. (2015) FEMO: a platform for free-weight exercise monitoring with RFIDs. In: SenSys 2015—proceedings of 13th ACM conference on embedded networked sensor systems, pp 141–154. https://doi.org/10.1145/2809695.2809708 .
Du Y, Lim Y, Tan Y (2019) A novel human activity recognition and prediction in smart home based on interaction. Sensors (Switzerland). https://doi.org/10.3390/s19204474
Duan H, Zhao Y, Xiong Y, Liu W, Lin D (2020) Omni-sourced webly-supervised learning for video recognition. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 12360 LNCS, pp 670–688, 2020. https://doi.org/10.1007/978-3-030-58555-6_40
Ehatisham-Ul-Haq M, Azam MA, Amin Y, Naeem U (2020) C2FHAR: coarse-to-fine human activity recognition with behavioral context modeling using smart inertial sensors. IEEE Access 8:7731–7747. https://doi.org/10.1109/ACCESS.2020.2964237
El-Baz JSSA, Jiang X (2016) Biomedical Image Segmentation: Advances and Trends. CRC Press, Taylor & Francis Group
El-Baz A, Suri JS (2019) Level set method in medical imaging segmentation. CRC Press, Taylor & Francis Group, London
Fan X, Gong W, Liu J (2017) I2tag: RFID mobility and activity identification through intelligent profiling. ACM Trans Intell Syst Technol 9(1):1–21. https://doi.org/10.1145/3035968
Fan X, Wang F, Wang F, Gong W, Liu J (2019) When RFID meets deep learning: exploring cognitive intelligence for activity identification. IEEE Wirel Commun 26(3):19–25. https://doi.org/10.1109/MWC.2019.1800405
Fazli M, Kowsari K, Gharavi E, Barnes L, Doryab A (2020) HHAR-net: hierarchical human activity recognition using neural networks, pp 48–58, 2021. https://doi.org/10.1007/978-3-030-68449-5_6
Fei H, Xiao F, Han J, Huang H, Sun L (2020) Multi-variations activity based gaits recognition using commodity WiFi. IEEE Trans Veh Technol 69(2):2263–2273. https://doi.org/10.1109/TVT.2019.2962803
Feichtenhofer C, Ai F (2019) SlowFast networks for video recognition technical report AVA action detection in ActivityNet challenge 2019, pp. 2–5
Feichtenhofer C, Fan H, Malik J, He K (2018) SlowFast networks for video recognition 2018. [Online]. http://arxiv.org/abs/1812.03982
Feichtenhofer C, Pinz A, Wildes RP (2017) Spatiotemporal multiplier networks for video action recognition. In: Proceedings of 30th IEEE conference on computer vision and pattern recognition, CVPR 2017, vol 2017-Janua, no. Nips, pp 7445–7454, 2017. https://doi.org/10.1109/CVPR.2017.787
Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition, vol 2016-Decem, no. i, pp. 1933–1941, 2016. https://doi.org/10.1109/CVPR.2016.213
Ferrari A, Micucci D, Mobilio M, Napoletano P (2020) On the personalization of classification models for human activity recognition. IEEE Access 8:32066–32079. https://doi.org/10.1109/ACCESS.2020.2973425
Fullerton E, Heller B, Munoz-Organero M (2017) Recognizing human activity in free-living using multiple body-worn accelerometers. IEEE Sens J 17(16):5290–5297. https://doi.org/10.1109/JSEN.2017.2722105
Gaglio S, Lo Re G, Morana M (2015) Human activity recognition process using 3-D posture data. IEEE Trans Hum Mach Syst 45(5):586–597. https://doi.org/10.1109/THMS.2014.2377111
Gani MO et al (2019) A light weight smartphone based human activity recognition system with high accuracy. J Netw Comput Appl 141(May):59–72. https://doi.org/10.1016/j.jnca.2019.05.001
Garcia-Gonzalez D, Rivero D, Fernandez-Blanco E, Luaces MR (2020) A public domain dataset for real-life human activity recognition using smartphone sensors. Sensors (Switzerland). https://doi.org/10.3390/s20082200
Gorelick L, Blank M, Shechtman E, Irani M, Basri R (2007) Actions as space-time shapes. IEEE Trans Pattern Anal Mach Intell 29(12):2247–2253. https://doi.org/10.1109/TPAMI.2007.70711
Gouineau F, Sotir M, Chikhaoui B (2018) A deep learning approach to activity recognition from accelerometer sensors. Springer, pp 302–315
Goyal R et al. (2017) The ‘Something Something’ video database for learning and evaluating visual common sense. In: Proceedings of the IEEE international conference on computer vision, pp 5843–5851. https://doi.org/10.1109/ICCV.2017.622 .
Gu C et al. (2018) AVA: a video dataset of spatio-temporally localized atomic visual actions. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 6047–6056, 2018. https://doi.org/10.1109/CVPR.2018.00633
Han J et al. (2014) CBID: a customer behavior identification system using passive tags. In: Proceedings of international conference on network protocols, ICNP, pp 47–58, 2014. https://doi.org/10.1109/ICNP.2014.26 .
Hsu YL, Yang SC, Chang HC, Lai HC (2018) Human daily and sport activity recognition using a wearable inertial sensor network. IEEE Access 6(c):31715–31728. https://doi.org/10.1109/ACCESS.2018.2839766
Huang SF, Chang RF, Moon WK, Lee YH, Chen DR, Suri JS (2008) Analysis of tumor vascularity using ultrasound images. IEEE Trans Med Imaging 27(3):320–330
Hussain Z, Sheng QZ, Zhang WE (2020) A review and categorization of techniques on device-free human activity recognition. J Netw Comput Appl 167:102738. https://doi.org/10.1016/j.jnca.2020.102738
Hx P, Wang J, Hu L, Chen Y, Hao S (2017) Deep learning for sensor-based activity recognition: a survey. Pattern Recognit Lett, pp 1–9
Jalal A, Uddin M, Kim TS (2012) Depth video-based human activity recognition system using translation and scaling invariant features for life logging at smart home. IEEE Trans Consum Electron 58(3):863–871. https://doi.org/10.1109/TCE.2012.6311329
Jamthikar AD et al (2020) Multiclass machine learning vs. conventional calculators for stroke/CVD risk assessment using carotid plaque predictors with coronary angiography scores as gold standard: a 500 participants study. Int J Cardiovasc Imaging. https://doi.org/10.1007/s10554-020-02099-7
Janocha K, Czarnecki WM (2016) On loss functions for deep neural networks in classification. Schedae Informaticae 25:49–59. https://doi.org/10.4467/20838476SI.16.004.6185
Jiang B, Wang M, Gan W, Wu W, Yan J (2019) STM: spatiotemporal and motion encoding for action recognition. In: Proceedings of the IEEE international conference on computer vision, vol. 2019-Octob, pp 2000–2009, 2019. https://doi.org/10.1109/ICCV.2019.00209 .
Kalfaoglu ME, Kalkan S, Alatan AA (2020) Late temporal modeling in 3D CNN architectures with bert for action recognition. arXiv, pp 1–19. https://doi.org/10.1007/978-3-030-68238-5_48
Kasnesis P, Patrikakis CZ, Venieris IS (2017) Changing mobile data analysis through deep learning, pp 17–23
Kay W et al. (2017) The kinetics human action video dataset, 2017 [Online]. http://arxiv.org/abs/1705.06950
Ke SR, Thuc HLU, Lee YJ, Hwang JN, Yoo JH, Choi KH (2013) A review on video-based human activity recognition. Computers 2(2):88–131
Khalifa S, Lan G, Hassan M, Seneviratne A, Das SK (2018) HARKE: human activity recognition from kinetic energy harvesting data in wearable devices. IEEE Trans Mob Comput 17(6):1353–1368. https://doi.org/10.1109/TMC.2017.2761744
Kim S, Yun K, Park J, Choi JY (2019) Skeleton-based action recognition of people handling objects. In: Proceedings of 2019 IEEE winter conference on applications of computer vision, WACV 2019, pp 61–70, 2019. https://doi.org/10.1109/WACV.2019.00014 .
Koppula HS, Gupta R, Saxena A (2013) Learning human activities and object affordances from RGB-D videos. Int J Rob Res 32(8):951–970. https://doi.org/10.1177/0278364913478446
Koppula HS, Saxena A (2016) Anticipating human activities using object affordances for reactive robotic response. IEEE Trans Pattern Anal Mach Intell 38(1):14–29. https://doi.org/10.1109/TPAMI.2015.2430335
Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: Proceedings of the IEEE international conference on computer vision, pp. 2556–2563. https://doi.org/10.1109/ICCV.2011.6126543 .
Lara ÓD, Labrador MA (2013) A survey on human activity recognition using wearable sensors. IEEE Commun Surv Tutorials 15(3):1192–1209. https://doi.org/10.1109/SURV.2012.110112.00192
Lawal IA, Bano S (2020) Deep human activity recognition with localisation of wearable sensors. IEEE Access 8:155060–155070. https://doi.org/10.1109/ACCESS.2020.3017681
Lawal IA, Bano S (2019) Deep human activity recognition using wearable sensors. In: ACM international conference proceedings series, pp 45–48, 2019. https://doi.org/10.1145/3316782.3321538
Li JH, Tian L, Wang H, An Y, Wang K, Yu L (2019) Segmentation and recognition of basic and transitional activities for continuous physical human activity. IEEE Access 7:42565–42576. https://doi.org/10.1109/ACCESS.2019.2905575
Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3D points. In: 2010 IEEE computer society conference on computer vision and pattern recognition—work. CVPRW 2010, vol 2010, pp 9–14, 2010. https://doi.org/10.1109/CVPRW.2010.5543273 .
Li X, Zhang Y, Marsic I, Sarcevic A, Burd RS (2016) Deep learning for RFID-based activity recognition. In: Proceedings of 14th ACM conference on embedded networked sensor systems SenSys 2016, pp 164–175. https://doi.org/10.1145/2994551.2994569 .
Lima WS, Souto E, El-Khatib K, Jalali R, Gama J (2019) Human activity recognition using inertial sensors in a smartphone: an overview. Sensors (switzerland) 19(14):14–16. https://doi.org/10.3390/s19143213
Liu J, Shahroudy A, Xu D, Wang G (2016) Spatio-temporal LSTM with trust gates for 3D human action recognition. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 9907 LNCS, pp 816–833, 2016. https://doi.org/10.1007/978-3-319-46487-9_50 .
Liu Z, Zhang H, Chen Z, Wang Z, Ouyang W (2020) Disentangling and unifying graph convolutions for skeleton-based action recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 140–149, 2020. https://doi.org/10.1109/CVPR42600.2020.00022
Lv T, Wang X, Jin L, Xiao Y, Song M (2020) A hybrid network based on dense connection and weighted feature aggregation for human activity recognition. IEEE Access 8:68320–68332. https://doi.org/10.1109/ACCESS.2020.2986246
Mabrouk MF, Ghanem NM, Ismail MA (2016) Semi supervised learning for human activity recognition using depth cameras. In: Proceedings of 2015 IEEE 14th international conference on machine learning and applications ICMLA 2015, pp 681–686, 2016. https://doi.org/10.1109/ICMLA.2015.170
Magherini T, Fantechi A, Nugent CD, Vicario E (2013) Using temporal logic and model checking in automated recognition of human activities for ambient-assisted living. IEEE Trans Hum Mach Syst 43(6):509–521. https://doi.org/10.1109/TSMC.2013.2283661
Mahadevan V, Li W, Bhalodia V, Vasconcelos N (2010) Anomaly detection in crowded scenes. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 1975–1981, 2010. https://doi.org/10.1109/CVPR.2010.5539872
Maniruzzaman M et al (2017) Comparative approaches for classification of diabetes mellitus data: machine learning paradigm. Comput Methods Programs Biomed 152:23–34. https://doi.org/10.1016/j.cmpb.2017.09.004
Maniruzzaman M et al (2018) Accurate diabetes risk stratification using machine learning: role of missing value and outliers. J Med Syst 42(5):1–17. https://doi.org/10.1007/s10916-018-0940-7
Martis RJ, Acharya UR, Prasad H, Chua CK, Lim CM, Suri JS (2013) Application of higher order statistics for atrial arrhythmia classification. Biomed Signal Process Control 8(6)
Micucci D, Mobilio M, Napoletano P (2017) UniMiB SHAR: a dataset for human activity recognition using acceleration data from smartphones. Appl Sci. https://doi.org/10.3390/app7101101
Miu T, Missier P, Plötz T (2015) Bootstrapping personalised human activity recognition models using online active learning. In: Proceedings of 15th international conference on computer science and information technology CIT 2015, 14th IEEE international conference on ubiquitous computing and communications IUCC 2015, 13th international conference on dependable, autonomic and secure, pp 1138–1147, 2015. https://doi.org/10.1109/CIT/IUCC/DASC/PICOM.2015.170
Multi Modality State-of-the-Art Medical Image Segmentation and Registration Methodologies (2011)
Munoz-Organero M (2019) Outlier detection in wearable sensor data for human activity recognition (HAR) based on DRNNs. IEEE Access 7:74422–74436. https://doi.org/10.1109/ACCESS.2019.2921096
Murad A, Pyun JY (2017) Deep recurrent neural networks for human activity recognition. Sensors (Switzerland). https://doi.org/10.3390/s17112556
Nam Y, Park JW (2013) Child activity recognition based on cooperative fusion model of a triaxial accelerometer and a barometric pressure sensor. IEEE J Biomed Heal Inform 17(2):420–426. https://doi.org/10.1109/JBHI.2012.2235075
Nash W, Drummond T, Birbilis N (2018) A review of deep learning in the study of materials degradation. NPJ Mater Degrad 2(1):1–12. https://doi.org/10.1038/s41529-018-0058-x
Neili Boualia S, Essoukri Ben Amara N (2021) Deep full-body HPE for activity recognition from RGB frames only. Informatics 8(1):2. https://doi.org/10.3390/informatics8010002
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation, pp 1–15
Nguyen DT, Kim KW, Hong HG, Koo JH, Kim MC, Park KR (2017) Gender recognition from human-body images using visible-light and thermal camera videos based on a convolutional neural network for image feature extraction, pp 1–22, 2017. https://doi.org/10.3390/s17030637
Ni B, Pei Y, Moulin P, Yan S (2013) Multilevel depth and image fusion for human activity detection. IEEE Trans Cybern 43(5):1382–1394. https://doi.org/10.1109/TCYB.2013.2276433
Obaida MA, Saraee MAM (2017) A novel framework for intelligent surveillance system based on abnormal human activity detection in academic environments. Neural Comput Appl 28(s1):565–572. https://doi.org/10.1007/s00521-016-2363-z
Oguntala GA et al (2019) SmartWall: novel RFID-enabled ambient human activity recognition using machine learning for unobtrusive health monitoring. IEEE Access 7:68022–68033. https://doi.org/10.1109/ACCESS.2019.2917125
Ohn-Bar E, Trivedi MM (2014) Hand gesture recognition in real time for automotive interfaces: a multimodal vision-based approach and evaluations. IEEE Trans Intell Transp Syst 15(6):2368–2377. https://doi.org/10.1109/TITS.2014.2337331
Ordóñez FJ, Roggen D (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors (Switzerland). https://doi.org/10.3390/s16010115
Parada R, Nur K, Melia-Segui J, Pous R (2016) Smart surface: RFID-based gesture recognition using k-means algorithm. In: Proceedings of 12th international conference on intelligent environments IE 2016, pp 111–118, 2016. https://doi.org/10.1109/IE.2016.25 .
Pareek G et al (2013) Prostate tissue characterization/classification in 144 patient population using wavelet and higher order spectra features from transrectal ultrasound images. Technol Cancer Res Treat 12(6):545–557. https://doi.org/10.7785/tcrt.2012.500346
Pham C et al (2020) SensCapsNet: deep neural network for non-obtrusive sensing based human activity recognition. IEEE Access 8:86934–86946. https://doi.org/10.1109/ACCESS.2020.2991731
Pham C, Diep NN, Phuong TM (2017) E-shoes: smart shoes for unobtrusive human activity recognition. In: Proceedings of 2017 9th international conference on knowledge and systems engineering KSE 2017, vol 2017-Janua, pp 269–274, 2017. https://doi.org/10.1109/KSE.2017.8119470 .
Phyo CN, Zin TT, Tin P (2019) Deep learning for recognizing human activities using motions of skeletal joints. IEEE Trans Consum Electron 65(2):243–252. https://doi.org/10.1109/TCE.2019.2908986
Popoola OP, Wang K (2012) Video-based abnormal human behavior recognitiona review. IEEE Trans Syst Man Cybern Part C Appl Rev 42(6):865–878. https://doi.org/10.1109/TSMCC.2011.2178594
Qi J, Wang Z, Lin X, Li C (2018) Learning complex spatio-temporal configurations of body joints for online activity recognition. IEEE Trans Hum Mach Syst 48(6):637–647. https://doi.org/10.1109/THMS.2018.2850301
Qin Z, Zhang Y, Meng S, Qin Z, Choo KKR (2020) Imaging and fusing time series for wearable sensor-based human activity recognition. Inf Fusion 53:80–87. https://doi.org/10.1016/j.inffus.2019.06.014
Raad MW, Sheltami T, Soliman MA, Alrashed M (2018) An RFID based activity of daily living for elderly with Alzheimer’s. In: Lecture notes of the institute for computer sciences, social-informatics and telecommunications engineering LNICST, vol 225, pp 54–61, 2018. https://doi.org/10.1007/978-3-319-76213-5_8
Rajendra Acharya U et al (2014) A review on ultrasound-based thyroid cancer tissue characterization and automated classification. Technol Cancer Res Treat 13(4):289–301. https://doi.org/10.7785/tcrt.2012.500381
Rashidi P, Mihailidis A (2013) A survey on ambient-assisted living tools for older adults. IEEE J Biomed Heal Inform 17(3):579–590. https://doi.org/10.1109/JBHI.2012.2234129
Ravi D, Wong C, Lo B, Yang GZ (2016) Deep learning for human activity recognition: a resource efficient implementation on low-power devices. In: BSN 2016—13th annual body sensor networks conference, pp 71–76, 2016. https://doi.org/10.1109/BSN.2016.7516235
Reiss A, Stricker D (2012) Introducing a new benchmarked dataset for activity monitoring. In: Proceedings of international symposium on wearable computers ISWC, pp 108–109, 2012. https://doi.org/10.1109/ISWC.2012.13 .
Reiss A, Stricker D (2012) Creating and benchmarking a new dataset for physical activity monitoring. In: ACM international conference proceeding series, 2012. https://doi.org/10.1145/2413097.2413148
Roggen D et al (2010) Collecting complex activity datasets in highly rich networked sensor environments. In: INSS 2010—7th international conference on networked sensing systems, pp 233–240. https://doi.org/10.1109/INSS.2010.5573462
Ronao CA, Cho SB (2016) Human activity recognition with smartphone sensors using deep learning neural networks. Expert Syst Appl 59:235–244. https://doi.org/10.1016/j.eswa.2016.04.032
Röcker C, O’Donoghue J, Ziefle M, Maciaszek L, Molloy W (2017) Preface. Commun Comput. Inf Sci 736:5. https://doi.org/10.1007/978-3-319-62704-5
Saba L et al (2019) The present and future of deep learning in radiology. Eur J Radiol 114:14–24. https://doi.org/10.1016/j.ejrad.2019.02.038
Saba L et al (2021) Ultrasound-based internal carotid artery plaque characterization using deep learning paradigm on a supercomputer: a cardiovascular disease/stroke risk assessment system. Int J Cardiovasc Imaging. https://doi.org/10.1007/s10554-020-02124-9
Saha J, Ghosh D, Chowdhury C, Bandyopadhyay S (2020) Smart handheld based human activity recognition using multiple instance multiple label learning. Wirel Pers Commun. https://doi.org/10.1007/s11277-020-07903-0
Shivani S, Agarwal S, Suri JS (2018) Handbook of image-based security techniques. Chapman and Hall/CRC, London, p 442
Shrivastava VK, Londhe ND, Sonawane RS, Suri JS (2016) Computer-aided diagnosis of psoriasis skin images with HOS, texture and color features: a first comparative study of its kind. Comput Methods Programs Biomed 126(2016):98–109. https://doi.org/10.1016/j.cmpb.2015.11.013
Shuaibu AN, Malik AS, Faye I, Ali YS (2017) Pedestrian group attributes detection in crowded scenes. In: Proceedings of 3rd international conference on advanced technologies for signal and image processing ATSIP 2017, pp 1–5, 2017. https://doi.org/10.1109/ATSIP.2017.8075584
Sigurdsson GA, Varol G, Wang X, Farhadi A, Laptev I, Gupta A (2016) Hollywood in homes: crowdsourcing data collection for activity understanding. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 9905 LNCS, pp 510–526, 2016. https://doi.org/10.1007/978-3-319-46448-0_31 .
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. Adv Neural Inf Process Syst 1:568–576
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd international conference on learning representations ICLR 2015—conference track proceedings, pp 1–14
Skandha SS et al (2020) 3-D optimized classification and characterization artificial intelligence paradigm for cardiovascular/stroke risk stratification using carotid ultrasound-based delineated plaque: Atheromatic TM 2.0. Comput Biol Med 125:103958. https://doi.org/10.1016/j.compbiomed.2020.103958
Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild, no. November, 2012, [Online]. http://arxiv.org/abs/1212.0402 .
Soydaner D (2020) A comparison of optimization algorithms for deep learning. Int J Pattern Recognit Artif Intell. https://doi.org/10.1142/S0218001420520138
Sreekanth NS, Narayanan NK (2017) Proceedings of the international conference on signal, networks, computing, and systems, vol 395, pp 105–115, 2017. https://doi.org/10.1007/978-81-322-3592-7
Stisen A et al. (2015) Smart devices are different: assessing and mitigating mobile sensing heterogeneities for activity recognition. In: SenSys 2015—proceedings of 13th ACM conference on embedded networked sensor systems, no. November, pp 127–140, 2015. https://doi.org/10.1145/2809695.2809718
Sudeep PV et al (2016) Speckle reduction in medical ultrasound images using an unbiased non-local means method. Biomed Signal Process Control 28:1–8. https://doi.org/10.1016/j.bspc.2016.03.001
Sun S, Cao Z, Zhu H, Zhao J (2020) A survey of optimization methods from a machine learning perspective. IEEE Trans Cybern 50(8):3668–3681. https://doi.org/10.1109/TCYB.2019.2950779
Sundaramoorthy P, Gudur GK (2018) HARNet: towards on-device incremental learning using deep ensembles on constrained devices, pp 31–36
Sung J, Ponce C, Selman B, Saxena A (2012) Unstructured human activity detection from RGBD images. In: Proceedings of IEEE international conference on robotics and automation, pp 842–849, 2012. https://doi.org/10.1109/ICRA.2012.6224591
Suri JS (2001) Two-dimensional fast magnetic resonance brain segmentation. IEEE Eng Med Biol Mag 20(4):84–95. https://doi.org/10.1109/51.940054
Suri JS (2005) Handbook of biomedical image analysis: segmentation models. Springer, New York
Suri JS et al (2021) Systematic review of artificial intelligence in acute respiratory distress syndrome for COVID-19 lung patients: a biomedical imaging perspective. IEEE J Biomed Heal Inform 2194(1):1–12. https://doi.org/10.1109/JBHI.2021.3103839
Suri JS, Liu K, Singh S, Laxminarayan SN, Zeng X, Reden L (2002) Shape recovery algorithms using level sets in 2-D/3-D medical imagery: a state-of-the-art review. IEEE Trans Inf Technol Biomed 6(1):8–28. https://doi.org/10.1109/4233.992158
Suri JS (2013) DK Med_Image_Press_Eng.Pdf.” [Online]. https://www.freepatentsonline.com/20080051648.pdf .
Suri JS (2004) Segmentation method and apparatus for medical images using diffusion propagation, pixel classification, and mathematical morphology
Suthar B, Gadhia B (2021) Human activity recognition using deep learning: a survey. Lect Notes Data Eng Commun Technol 52:217–223. https://doi.org/10.1007/978-981-15-4474-3_25
Szegedy C et al (2015) Going deeper with convolutions. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of IEEE computer society conference on computer vision and pattern recognition, pp 2818–2826, 2016. https://doi.org/10.1109/CVPR.2016.308 .
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-ResNet and the impact of residual connections on learning. In: 31st AAAI conference on artificial intelligence AAAI 2017, pp. 4278–4284
Tanberk S, Kilimci ZH, Tukel DB, Uysal M, Akyokus S (2020) A hybrid deep model using deep learning and dense optical flow approaches for human activity recognition. IEEE Access 8:19799–19809. https://doi.org/10.1109/ACCESS.2020.2968529
Tandel GS, Balestrieri A, Jujaray T, Khanna NN, Saba L, Suri JS (2020) Multiclass magnetic resonance imaging brain tumor classification using artificial intelligence paradigm. Comput Biol Med 122:103804. https://doi.org/10.1016/j.compbiomed.2020.103804
Tao D, Jin L, Yuan Y, Xue Y (2016a) Ensemble manifold rank preserving for acceleration-based human activity recognition. IEEE Trans Neural Networks Learn Syst 27(6):1392–1404. https://doi.org/10.1109/TNNLS.2014.2357794
Tao D, Wen Y, Hong R (2016b) Multicolumn bidirectional long short-term memory for mobile devices-based human activity recognition. IEEE Internet Things J 3(6):1124–1134. https://doi.org/10.1109/JIOT.2016.2561962
Thida M, Eng HL, Remagnino P (2013) Laplacian eigenmap with temporal constraints for local abnormality detection in crowded scenes. IEEE Trans Cybern 43(6):2147–2156. https://doi.org/10.1109/TCYB.2013.2242059
Tian Y, Zhang J, Chen L, Geng Y, Wang X (2019) Single wearable accelerometer-based human activity recognition via kernel discriminant analysis and QPSO-KELM classifier. IEEE Access 7:109216–109227. https://doi.org/10.1109/access.2019.2933852
Tran D, Wang H, Feiszli M, Torresani L (2019) Video classification with channel-separated convolutional networks. In: Proceedings of IEEE international conference on computer vision, vol 2019-Octob, pp 5551–5560, 2019. https://doi.org/10.1109/ICCV.2019.00565 .
Vaniya SM, Bharathi B (2017) Exploring object segmentation methods in visual surveillance for human activity recognition. In: Proceedings of International Conference on Global Trends in Signal Processing, Information Computing and Communication. ICGTSPICC 2016, pp 520–525, 2017. https://doi.org/10.1109/ICGTSPICC.2016.7955356
Vishwakarma DK, Singh K (2017) Human activity recognition based on spatial distribution of gradients at sublevels of average energy silhouette images. IEEE Trans Cogn Dev Syst 9(4):316–327. https://doi.org/10.1109/TCDS.2016.2577044
Wang A, Chen G, Yang J, Zhao S, Chang CY (2016a) A comparative study on human activity recognition using inertial sensors in a smartphone. IEEE Sens J 16(11):4566–4578. https://doi.org/10.1109/JSEN.2016.2545708
Wang F, Feng J, Zhao Y, Zhang X, Zhang S, Han J (2019c) Joint activity recognition and indoor localization with WiFi fingerprints. IEEE Access 7:80058–80068. https://doi.org/10.1109/ACCESS.2019.2923743
Wang F, Gong W, Liu J (2019d) On spatial diversity in wifi-based human activity recognition: a deep learning-based approach. IEEE Internet Things J 6(2):2035–2047. https://doi.org/10.1109/JIOT.2018.2871445
Wang K, He J, Zhang L (2019a) Attention-based convolutional neural network for weakly labeled human activities’ recognition with wearable sensors. IEEE Sens J 19(17):7598–7604. https://doi.org/10.1109/JSEN.2019.2917225
Wang Q, Ma Y, Zhao K, Tian Y (2020) A comprehensive survey of loss functions in machine learning. Ann Data Sci. https://doi.org/10.1007/s40745-020-00253-5
Wang Z, Wu D, Chen J, Ghoneim A, Hossain MA (2016b) A triaxial accelerometer-based human activity recognition via EEMD-based features and game-theory-based feature selection. IEEE Sens J 16(9):3198–3207. https://doi.org/10.1109/JSEN.2016.2519679
Wang F, Liu J, Gong W (2020) Multi-adversarial in-car activity recognition using RFIDs. IEEE Trans Mob Comput 1–1. https://doi.org/10.1109/tmc.2020.2977902
Wang X, Ji Q (2014) A hierarchical context model for event recognition in surveillance video. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 2561–2568. https://doi.org/10.1109/CVPR.2014.328 .
Wang K, He J, Zhang L (2019) Attention-based convolutional neural network for weakly labeled human activities recognition with wearable sensors. arXiv, vol 19, no. 17, pp 7598–7604
Wang L, Zhou F, Li Z, Zuo W, Tan H (2018) Abnormal event detection in videos using hybrid spatio-temporal autoencoder school of instrumentation science and opto-electronics Engineering, Beihang University, Beijing, China Department of Electronic Information Engineering, Foshan University, Fo. In: 2018 25th IEEE international conference on image processing, pp 2276–2280
Weiss GM, Yoneda K, Hayajneh T (2019) Smartphone and smartwatch-based biometrics using activities of daily living. IEEE Access 7:133190–133202. https://doi.org/10.1109/ACCESS.2019.2940729
Weng Z, Li W, Jin Z (2021) Human activity prediction using saliency-aware motion enhancement and weighted LSTM network. Eurasip J Image Video Process 1:2021. https://doi.org/10.1186/s13640-020-00544-0
Xia K, Huang J, Wang H (2020) LSTM-CNN architecture for human activity recognition. IEEE Access 8:56855–56866. https://doi.org/10.1109/ACCESS.2020.2982225
Xia L, Chen C, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints The University of Texas at Austin. In: CVPR 2012 HAU3D workshop, pp 20–27, 2012, [Online]. http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:View+Invariant+Human+Action+Recognition+Using+Histograms+of+3D+Joints+The+University+of+Texas+at+Austin#1
Xie L, Wang C, Liu AX, Sun J, Lu S (2018) Multi-Touch in the air: concurrent micromovement recognition using RF signals. IEEE/ACM Trans Netw 26(1):231–244. https://doi.org/10.1109/TNET.2017.2772781
Xu W, Miao Z, Zhang XP, Tian Y (2017) A hierarchical spatio-temporal model for human activity recognition. IEEE Trans Multimed 19(7):1494–1509. https://doi.org/10.1109/TMM.2017.2674622
Xu X, Tang J, Zhang X, Liu X, Zhang H, Qiu Y (2013) Exploring techniques for vision based human activity recognition: methods, systems, and evaluation. Sensors (Switzerland) 13(2):1635–1650. https://doi.org/10.3390/s130201635
Yan H, Zhang Y, Wang Y, Xu K (2020) WiAct: a passive WiFi-based human activity recognition system. IEEE Sens J 20(1):296–305. https://doi.org/10.1109/JSEN.2019.2938245
Yan S, Xiong Y, Lin D (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition, arXiv, 2018
Yao L et al (2018) Compressive representation for device-free activity recognition with passive RFID signal strength. IEEE Trans Mob Comput 17(2):293–306. https://doi.org/10.1109/TMC.2017.2706282
Yao S., Hu S, Zhao Y, Zhang A, Abdelzaher T (2017) DeepSense: A unified deep learning framework for time-series mobile sensing data processing. In: 26th international world wide web conferences WWW 2017, pp 351–360. https://doi.org/10.1145/3038912.3052577
Yao S et al. (2019) SADeepSense: self-attention deep learning framework for heterogeneous on-device sensors in internet of things applications. In: Proceedings of IEEE INFOCOM, vol 2019-April, pp 1243–1251. https://doi.org/10.1109/INFOCOM.2019.8737500
Yao S et al (2018) Deep learning for the Internet of Things, 2018, [Online]. https://fardapaper.ir/mohavaha/uploads/2018/06/Fardapaper-Deep-Learning-for-the-Internet-of-Things.pdf
Zeng M et al. (2015) Convolutional Neural Networks for human activity recognition using mobile sensors. In: Proceedings of 2014 6th international conference on mobile computing, applications and services MobiCASE 2014, vol 6, pp 197–205, 2015. https://doi.org/10.4108/icst.mobicase.2014.257786 .
Zhang H, Parker LE (2016) CoDe4D: color-depth local spatio-temporal features for human activity recognition from RGB-D videos. IEEE Trans Circuits Syst Video Technol 26(3):541–555. https://doi.org/10.1109/TCSVT.2014.2376139
Zhang D, Zhou J, Guo M, Cao J, Li T (2011) TASA: tag-free activity sensing using RFID tag arrays. IEEE Trans Parallel Distrib Syst 22(4):558–570. https://doi.org/10.1109/TPDS.2010.118
Zhang M, Sawchuk AA (2012) USC-HAD: a daily activity dataset for ubiquitous activity recognition using wearable sensors. In: UbiComp’12—proceedings of 2012 ACM conference on ubiquitous computing, pp 1036–1043
Zhou X, Liang W, Wang KIK, Wang H, Yang LT, Jin Q (2020) Deep-learning-enhanced human activity recognition for internet of healthcare things. IEEE Internet Things J 7(7):6429–6438. https://doi.org/10.1109/JIOT.2020.2985082
Zhu R et al (2019) Efficient human activity recognition solving the confusing activities via deep ensemble learning. IEEE Access 7:75490–75499. https://doi.org/10.1109/ACCESS.2019.2922104
Zhu C, Sheng W (2012) Realtime recognition of complex human daily activities using human motion and location data. IEEE Trans Biomed Eng 59(9):2422–2430. https://doi.org/10.1109/TBME.2012.2190602
Zou H, Zhou Y, Arghandeh R, Spanos CJ (2019) Multiple kernel semi-representation learning with its application to device-free human activity recognition. IEEE Internet Things J 6(5):7670–7680. https://doi.org/10.1109/JIOT.2019.2901927
van Kasteren TLM, Englebienne G, Kröse BJA (2011) Human activity recognition from wireless sensor network data: benchmark and software, pp 165–186. https://doi.org/10.2991/978-94-91216-05-3_8 .
Authors and Affiliations
CSE Department, Bennett University, Greater Noida, UP, India
Neha Gupta & Suneet K. Gupta
Rawatpura Sarkar University, Raipur, Chhattisgarh, India
Rajesh K. Pathak
Bharati Vidyapeeth’s College of Engineering, Paschim Vihar, New Delhi, India
Neha Gupta & Vanita Jain
Intelligent Health Laboratory, Department of Biomedical Engineering, University of Florida, Gainesville, USA
Parisa Rashidi
Stroke Diagnostic and Monitoring Division, AtheroPointTM, Roseville, CA, 95661, USA
Jasjit S. Suri
Global Biomedical Technologies, Inc., Roseville, CA, USA
Correspondence to Jasjit S. Suri.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Below is the link to the electronic supplementary material.
Appendix A
The types of HAR devices and their applications are the two main components of HAR. Tables A.1, A.2, A.3, and A.4 illustrate the device-wise description of existing HAR models in terms of data source, number of activities, number of subjects, datasets, activity scenarios, and performance evaluation.
The performance of a HAR model is evaluated using metrics. Table B.1 illustrates the various evaluation metrics used in existing HAR models. Before describing the metrics, a few terms need to be defined:
True positive (TP): number of positive samples predicted correctly.
False positive (FP): number of actual negative samples predicted as positive.
True negative (TN): number of negative samples predicted correctly.
False negative (FN): number of actual positive samples predicted as negative.
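The four counts above combine into the standard evaluation metrics summarized in Table B.1. The following minimal sketch (the function name `har_metrics` is illustrative, not from the reviewed models) shows how accuracy, precision, recall, specificity, and F1-score are derived from a binary confusion matrix:

```python
def har_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common HAR evaluation metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total                          # fraction of all samples predicted correctly
    precision = tp / (tp + fp) if (tp + fp) else 0.0      # fraction of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0         # fraction of actual positives detected (sensitivity)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0    # fraction of actual negatives detected
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)               # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Example: 90 TP, 10 FP, 85 TN, 15 FN
m = har_metrics(tp=90, fp=10, tn=85, fn=15)
print(m["accuracy"])   # 0.875
print(m["precision"])  # 0.9
```

For multi-class activity recognition these quantities are typically computed per class and then macro- or weighted-averaged, since HAR datasets are often imbalanced across activities.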
Gupta, N., Gupta, S.K., Pathak, R.K. et al. Human activity recognition in artificial intelligence framework: a narrative review. Artif Intell Rev 55 , 4755–4808 (2022). https://doi.org/10.1007/s10462-021-10116-x
Published : 18 January 2022
Issue Date : August 2022