Data mining has a long history. The practice was originally known as knowledge discovery in databases; the term “data mining” was not coined until the 1990s. Data mining is the process of digging through large sets of data to identify patterns and predict future trends. It sits at the intersection of machine learning, statistics, and database systems.

Database information is doubling every two years, creating chaotic and repetitive noise in the data. More information does not necessarily mean more knowledge. Unstructured data makes up 90 percent of the digital universe. This is why data mining is important: it allows you to understand what is relevant and how you can use that information to assess likely outcomes.

Data mining consists of four main steps: setting objectives, collecting and analyzing data, applying data mining algorithms, and evaluating results.

Data is stored in data warehouses, either on in-house servers or in the cloud. Information technology professionals, business analysts, and management teams then access the data and decide how they want to organize it using application software. The data is then presented in formats that are easy to understand and share, such as a graph or table.

The most common algorithms and techniques used to turn data into useful information include association rules, neural networks, decision trees, and k-nearest neighbors.
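To make one of these techniques concrete, here is a minimal sketch of k-nearest neighbors in plain Python. The customer data, the segment labels, and the function name `knn_predict` are all made up for illustration; a real project would more likely use an established library and far more data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank every training point by its Euclidean distance to the query.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (visits per month, average basket in dollars) -> segment
customers = [
    ((1, 10), "occasional"),
    ((2, 15), "occasional"),
    ((3, 12), "occasional"),
    ((10, 80), "loyal"),
    ((12, 95), "loyal"),
    ((11, 70), "loyal"),
]

print(knn_predict(customers, (9, 75)))  # nearest neighbors are all "loyal"
print(knn_predict(customers, (2, 11)))  # nearest neighbors are all "occasional"
```

The idea is simply that a new record is labeled like the known records it most resembles. In practice the features would usually be rescaled first so that one large-valued column does not dominate the distance.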

In sales and marketing, companies collect large amounts of data about their customers and prospects. By observing consumer demographics and online user behavior, companies can optimize their marketing efforts and increase profits. In educational institutions, data is collected to understand the student population and the learning environment so that both can be set up for success.

Lately, data mining has come under criticism because users are often unaware that their personal information is being mined. This data is being collected to influence consumer behavior and change consumers' preferences. Ways to protect yourself from data mining include using a secure VPN, removing personal information from social networking sites, and always reading the privacy policy of any website or social media platform.


The field of data science struggles with the representation and success of women and people of color. According to Ms. Magazine, research suggests that only 15 percent of data scientists are women and fewer than 3 percent are women of color. Because the education system fails to attract young girls and women to computer science, math, and other related fields, the number of girls and women leaning toward careers in data science is disproportionately low. The leaky pipeline metaphor describes this gender gap in STEM-related careers.

A few solutions to this leaky pipeline include STEM education for women and people of color early in life, mentorship programs for women in data science, and gender-balanced policies.

One of the issues that the lack of diversity in data science brings is racial and gender bias in algorithms. Women and people of color can be overlooked depending on who is developing these algorithms. Machine learning is the act of training a computer to make judgments or predictions about the information it processes based on patterns it sees. According to an article by Rebecca Heilweil for Vox Recode, Amazon tried to use artificial intelligence to develop a resume screening tool. Its objective was to make screening resumes easier. The problem was that the data it learned from came mainly from men. In the end, this taught the computer to discriminate against women. Amazon decided not to use this tool for several reasons.
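The mechanism can be illustrated with a toy example. The sketch below is not a reconstruction of Amazon's tool; the resumes, the tokens (including `womens_society`), and the scoring rule are invented purely to show how a model that learns from skewed past decisions can end up penalizing an attribute that has nothing to do with skill.

```python
from collections import Counter


def train_token_weights(history):
    """Weight each token by how much more often it appears in hired resumes
    than in rejected ones. Assumes both groups are non-empty."""
    hired = [set(tokens) for tokens, was_hired in history if was_hired]
    rejected = [set(tokens) for tokens, was_hired in history if not was_hired]
    hired_counts = Counter(t for resume in hired for t in resume)
    rejected_counts = Counter(t for resume in rejected for t in resume)
    vocab = set(hired_counts) | set(rejected_counts)
    return {
        t: hired_counts[t] / len(hired) - rejected_counts[t] / len(rejected)
        for t in vocab
    }


def score(resume_tokens, weights):
    """Sum the learned weights of the distinct tokens in a resume."""
    return sum(weights.get(t, 0.0) for t in set(resume_tokens))


# Invented history: (resume tokens, was the applicant hired?).
# Only one resume mentions the hypothetical token "womens_society".
history = [
    (["python", "sql"], True),
    (["python", "statistics"], True),
    (["sql", "statistics"], True),
    (["python", "sql", "womens_society"], False),
    (["excel"], False),
]

weights = train_token_weights(history)

# Two applicants with identical skills; only one mentions the society.
resume_a = ["python", "sql", "statistics"]
resume_b = ["python", "sql", "statistics", "womens_society"]

print(score(resume_a, weights) > score(resume_b, weights))  # True
```

Nothing in the code mentions gender, yet the second resume scores lower because the token was rare in the training data and happened to coincide with a rejection. The bias comes from the data the model was given.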

We can start accounting for everyone by hiring people of color and women into leadership roles. In addition, companies can start using rich and diverse data when training computers to make these judgments.