Skew

What is skew in machine learning?

In machine learning, skew refers to an imbalance in the distribution of the label (target variable) in a training dataset. A training dataset is said to be skewed if the distribution of its target variable is asymmetric around its mean, that is, some values are much more heavily represented than others. For example, in a dataset of credit card transactions where only a small fraction of the transactions are fraudulent, the training data is skewed towards non-fraudulent transactions.
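As a rough illustration of the definition above, the following sketch measures skew in a toy label column in two ways: the share of each class, and the moment-based skewness coefficient, which is zero for a distribution that is symmetric around its mean. The function names and the 98-to-2 fraud data are made up for this example and do not come from any particular library or dataset.

```python
from collections import Counter


def class_proportions(labels):
    """Return the fraction of examples that carry each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def sample_skewness(values):
    """Moment-based skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5


# Hypothetical labels: 0 = legitimate transaction, 1 = fraudulent transaction.
labels = [0] * 98 + [1] * 2

proportions = class_proportions(labels)
skew = sample_skewness(labels)

print(proportions)  # share of each class
print(skew)         # positive: the rare label 1 forms a long right tail
```

For a categorical target, the class proportions are usually the more informative of the two numbers; the skewness coefficient is better suited to numeric targets such as prices or durations.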

Skew can degrade the performance of predictive models, because models trained on imbalanced data may have difficulty accurately predicting minority classes or rare values. In such cases, techniques such as oversampling (adding copies of under-represented examples) or undersampling (dropping over-represented examples) can be used to balance the data distribution and improve model performance.
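The two resampling techniques mentioned above can be sketched with the Python standard library alone. This is a minimal illustration of random oversampling and random undersampling, assuming the data fits in memory as parallel lists; the helper names and toy data are hypothetical, and real projects often use a dedicated library instead.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows belonging to each label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical data: 98 legitimate (0) and 2 fraudulent (1) transactions.
rows = [{"id": i} for i in range(100)]
labels = [0] * 98 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # both classes now have 98 rows
print(Counter(under_labels))  # both classes now have 2 rows
```

Resampling is normally applied to the training split only, so that validation and test data keep the original class distribution. Oversampling repeats minority rows and can encourage overfitting to them, while undersampling discards majority rows and loses information, so the choice depends on how much data is available.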
