With the current demand for data processing, it's no wonder that so many computer programs exist for that very purpose. In fact, there's even a name for it: the process of examining very large data sets to find useful information, such as patterns or relationships, is known as data mining.
A common example is a spreadsheet program, such as Google Sheets or Microsoft Excel. You can use these programs to record, modify, and organize data. If you're using numbers, you can write equations and perform operations on your data as well.
You can also process text data using text analysis (or text mining) tools.
Text analysis looks for patterns within a written piece (anywhere in length from a clause to a novel and beyond) to categorize or classify it. If you've ever had a program tell you what the tone of your writing was, you've seen text analysis at work. Text analysis can be used to sort product reviews, detect trends in public opinion, and identify anonymous authors.
Data processing programs can also allow you to make tables and diagrams, such as line or bar graphs, to visualize your data. Creating visualizations of data allows you to convey what the data means and to make trends apparent. It's much easier to see positive or negative trends from a line chart, for instance, than when the data's sitting in a table. This is especially true when a lot of data is involved.
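To get a feel for why a chart makes a trend easier to spot than a table, here's a minimal Python sketch that turns a few numbers into a text bar chart. The sales figures and the `text_bar_chart` function are made up for illustration; a real program would more likely use a charting tool.

```python
# Hypothetical monthly sales figures (made up for illustration).
monthly_sales = {"Jan": 3, "Feb": 5, "Mar": 8, "Apr": 12}


def text_bar_chart(data):
    """Return one line per entry, with a bar of '#' as long as the value."""
    lines = []
    for label, value in data.items():
        lines.append(f"{label} {'#' * value} {value}")
    return lines


chart = text_bar_chart(monthly_sales)
print("\n".join(chart))
```

The bars get longer from one month to the next, so the upward trend stands out right away.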
Other examples of data processing programs include search tools, like the ones that Google uses for images:
Image source: Google Images
You can use these tools to find information faster and to specify exactly what you're looking for. For example, if you want an image in a certain color for a mood board, you can find it using the color filter. If you want images taken before or after a certain date, you can use the time filter.
Different engines have different search tools based on what the search engine is used for. The search tools for an online academic journal, for instance, differ from the search tools that Google Images uses.
Some programs also have data filtering capabilities, which means that they can create and extract different subsets of data for users to work with. These subsets can be based on time (like only looking at results from the winter) or value (like only looking at values below 30 or only positive values).
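Here's a rough Python sketch of what filtering can look like. The temperature readings, the field names (`month`, `temp`), and the choice of winter months are all assumptions made for this example.

```python
# Hypothetical temperature readings (made up for illustration).
readings = [
    {"month": "Jan", "temp": 28},
    {"month": "Feb", "temp": 31},
    {"month": "Jul", "temp": 85},
    {"month": "Dec", "temp": -4},
]

# Assumption: "winter" means December through February.
WINTER_MONTHS = {"Dec", "Jan", "Feb"}

# Subset based on time: only readings from the winter.
winter = [r for r in readings if r["month"] in WINTER_MONTHS]

# Subsets based on value: only readings below 30, or only positive ones.
below_30 = [r for r in readings if r["temp"] < 30]
positive = [r for r in readings if r["temp"] > 0]

print(winter)
print(below_30)
print(positive)
```

Each filter creates a new subset and leaves the original data untouched, so you can keep pulling out different subsets from the same data set.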
One of the cool things that programs can do with data is to transform it! This is when you edit or modify data in some way to extract more information from it.
Data Transformation Examples:
- Modifying every element of a data set. This can be an arithmetic modification, although it isn't necessarily one.
  - Ex. Multiplying each number by some constant value (like converting a list of measurements from liters to milliliters).
  - A non-arithmetic example is adding a grade level or class rank to a list of student records.
- Filtering a data set by category, as mentioned above.
  - Besides time or value, data sets can also be filtered by a quality, such as which extracurricular activities a group of students are in.
- Combining or comparing data in some way.
  - Ex. Comparing the average SAT score of students going to all the colleges in one state and combining that data with average scores from other states.
- Creating data visualization tools.
  - Ex. Graphs, charts, and word bubbles.
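The first three kinds of transformation in the list above can be sketched in a few lines of Python. Every name and number here (the students, the `grade_levels` lookup, the state scores) is hypothetical and only meant to show the idea.

```python
# All names and values here are made up for illustration.

# 1. Modify every element (arithmetic): liters -> milliliters.
liters = [1.5, 0.25, 2.0]
milliliters = [amount * 1000 for amount in liters]

# 2. Modify every element (non-arithmetic): add a grade level to each record.
students = [
    {"name": "Ana", "activities": ["band", "soccer"]},
    {"name": "Ben", "activities": ["chess"]},
    {"name": "Cam", "activities": ["band"]},
]
grade_levels = {"Ana": 11, "Ben": 12, "Cam": 11}  # hypothetical lookup table
students_with_grade = [
    {**student, "grade": grade_levels[student["name"]]} for student in students
]

# 3. Filter by a quality: which students are in band?
in_band = [s["name"] for s in students_with_grade if "band" in s["activities"]]

# 4. Combine and compare: average score per state, then find the highest.
scores_by_state = {
    "State A": [1100, 1200],
    "State B": [1000, 1300, 1180],
}
average_by_state = {
    state: sum(scores) / len(scores) for state, scores in scores_by_state.items()
}
highest_state = max(average_by_state, key=average_by_state.get)

print(milliliters)
print(in_band)
print(average_by_state, highest_state)
```

Notice that each step produces new data (a new list or dictionary) instead of overwriting the raw data, which makes it easy to try a different transformation later.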
Users often work with these tools in an iterative and interactive process. You get to choose what filtering tools you want to use or what subsets you want to look at. You can also run data through data processing programs multiple times, depending on what information you want to look for. For example, you can look at data by the date it was collected, then sort it again by the location where it was collected.
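As a small sketch of that iterative process, the Python below looks at the same data twice: first ordered by collection date, then ordered by location. The samples and their field names are assumptions made for this example.

```python
# Hypothetical field samples (made up for illustration).
samples = [
    {"date": "2024-03-02", "location": "Lake", "value": 7},
    {"date": "2024-03-01", "location": "River", "value": 4},
    {"date": "2024-03-01", "location": "Forest", "value": 9},
]

# First pass: look at the data by the date it was collected.
# Dates written as YYYY-MM-DD sort correctly as plain text.
by_date = sorted(samples, key=lambda s: s["date"])

# Second pass: sort the same data again, this time by location.
by_location = sorted(samples, key=lambda s: s["location"])

print([s["location"] for s in by_date])
print([s["location"] for s in by_location])
```

Because `sorted` returns a new list each time, the original `samples` list stays as it was, and you can keep running it through different views.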
Manipulating data by combining, clustering, or classifying it can bring out new information and patterns that weren't visible in the raw data, which makes data manipulation a helpful tool for data analysis.
Some of the things we can discover by analyzing data are patterns, relationships, and trends.
That's Big Idea 2: Data for you! Our next Big Idea Guide is a crash course in algorithms and (basic) programming.