Visualization of Big Data : 2015

Monday, December 7, 2015

Final Project

For my final project I analyzed the amount of meat recalled in 2014 (2015 is not over yet, so the data may not be fully accurate). The type of meat most recalled was beef, with over 13 million pounds called into question for potential health hazards. The least recalled was ovine (sheep), barely cracking 27,000 pounds. I chose to display the data in this way so that it would be easy to read while still packing in a lot of data.

The data set comes from the United States Department of Agriculture, which lists everything from the species and type of contamination to, occasionally, the specific product that was recalled (e.g., chicken noodle soup, beef jerky) (fsis.usda.gov). I didn't know much about meat recalls at first, but after some research, some of the information blew my mind. Recalling meat is at a food company's discretion, even when the recall is government mandated, meaning that even if the company knew the product was tainted it could simply sell it anyway. Most of the contaminated meat is also never recovered (USA Today). Customers are buying and consuming it without knowing the meat can make them sick. Interestingly enough, poultry and pork had more recalls overall than beef, but the number of pounds of beef recalled was much greater. This research led me to rediscover a major beef recall last year that I had totally forgotten about (Food Safety News). Beef all over the country had been contaminated with E. coli, which the USDA classifies as a "Class I" recall (the most serious of the classes, meaning there is a reasonable chance the meat will cause health problems or even death). I eat meat fairly regularly, so this information was fascinating to me and has me intrigued to learn more about meat industry practices.

References

http://www.fsis.usda.gov/wps/portal/fsis/topics/recalls-and-public-health-alerts/recall-summaries

http://usatoday30.usatoday.com/money/industries/food/2007-12-02-meat-recalls_N.htm

http://www.foodsafetynews.com/2014/02/whats-going-on-with-the-massive-rancho-beef-recall/#.VmYfg_krKUk

Monday, November 30, 2015

Project 1 and 2

For this assignment I decided to plot organization type against the end date of each organization's time using the program. As you can see, I've used a scatter plot to display the data. From analyzing it I can now say with certainty that private use of the programs had the most longevity: it ended decades after other groups' use had. This graph also tells me that most program use ends shortly after 2010; while a few private buyers are still around, the vast majority of usage stops there.

I chose this analysis because I thought it was one of the more coherent ways to make sense of a large set of data, and it might be something someone would want to know: what kinds of organizations were using the programs for the longest. I also chose a scatter plot because I thought it would work best to illustrate general trends for a large set of data. While the result does have its drawbacks, it's at least very clean looking and easy to read.

Sunday, November 15, 2015

Assignment 12

This is a cleaned-up and improved version of the graph I did for Assignment 10. I liked Evergreen and Emery's strategies and thought they were well-reasoned and helpful. Their tips make graphs as easy to read as possible. I especially liked their points about color, particularly the one about making visualizations accessible for people with color-blindness (I've known several people with that condition and think about them often when I see various designs).

Sunday, November 8, 2015

Assignment 11

My animation was a graph building itself from the results of a coin toss. I was inspired by a blog I found that talks about how to use R efficiently and had code for this kind of animation. I changed some of the numbers around and watched it form. The code was much more complicated than anything I had seen before. It was very rewarding, however, to watch the sequence animate itself and to realize the hard work coders put into relatively simple designs like these. I see animated GIFs all over the internet, so it's very interesting to know some of the coding that goes into making them.
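
As a rough idea of the numbers behind an animation like this, here is a minimal sketch. It is not the code from the blog I used (that was in R); it is a plain Python stand-in, and the function name, seed, and number of tosses are all assumptions made for illustration. It simulates fair coin tosses and records the running proportion of heads, which is the kind of value each frame of such an animation could plot.

```python
import random

def running_heads_proportion(n_tosses, seed=0):
    """Simulate fair coin tosses and return the proportion of heads after each toss.

    The function name and the fixed seed are illustrative assumptions,
    not part of the original R animation.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # treat values below 0.5 as "heads"
            heads += 1
        proportions.append(heads / toss)
    return proportions

proportions = running_heads_proportion(10_000)

# Early values jump around; later values settle near 0.5
print(proportions[9], proportions[99], proportions[-1])
```

Each value in the list would be one frame of the animation, which is why the line swings wildly at first and then flattens out.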

Sunday, November 1, 2015

Assignment 10

Unfortunately I couldn't get ggplot2 to install into R. I made another bar plot anyway, with made-up data about which presidential candidate students might want to vote for in the Democratic primary elections. I'm eager to look at my peers' work to see what they came up with, and what ggplot2 can offer.

Sunday, October 25, 2015

Assignment 9

I had never worked with R before this assignment (though obviously I remembered hearing about it in previous lectures), so it was an interesting experience. I don't personally have much experience with coding, so at first inputting what I wanted was challenging, but eventually I got used to it. It ended up being fascinating to see the graph build itself before my eyes, rather than just seeing the final product. I'm interested in learning more about R and bettering my understanding of it (including how to insert colors: I tried to give each bar its own color but could not get it to work).

Saturday, October 17, 2015

Assignment 8

After generating the Chi-square results, here is what I have found:

-Goals: chi-squared equals 0.000 with a P value of 1.
-Grades: chi-squared equals 0.533 with a P value of 0.7661.
-Popular: chi-squared equals 0.982 with a P value of 0.6119.
-Sports: chi-squared equals 0.003 with a P value of 0.9987.

From the results, I gather that the biggest difference between the actual and expected results was in how many students in each group valued popularity the most, and the smallest (actually no difference at all) was in how many students valued goals. All four P values are well above 0.05, so none of these differences is statistically significant.
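
The P values above can be rechecked from the chi-squared statistics alone. The sketch below assumes each test had 2 degrees of freedom; that is not stated in my results, but it is consistent with the numbers reported. With 2 degrees of freedom the P value is simply exp(-x/2), where x is the chi-squared statistic. The function name is made up for illustration.

```python
import math

def chi2_p_value_df2(statistic):
    """Upper-tail P value for a chi-squared statistic, ASSUMING 2 degrees of freedom."""
    return math.exp(-statistic / 2)

# Chi-squared statistics as reported above (already rounded to three decimals)
reported = {"Goals": 0.000, "Grades": 0.533, "Popular": 0.982, "Sports": 0.003}

p_values = {name: chi2_p_value_df2(stat) for name, stat in reported.items()}

for name, p in p_values.items():
    print(f"{name}: P = {p:.4f}")
```

Because the statistics were rounded before being reported, the recomputed P values can differ from the ones listed above in the last digit or two.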

Sunday, October 11, 2015

Assignment 7

Mean: 55,303,632.375 (FB), 36,042,208.5 (T)

Median: 57,963,191 (FB), 37,133,201 (T)

Standard Deviation: 15,979,901.476981508 (FB), 7,783,594.278588524 (T)
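
For reference, here is a minimal sketch of how these three statistics can be computed with Python's standard library. The numbers in it are made-up placeholders, not the real follower counts, and since I did not note whether my standard deviation was the population or the sample version, the sketch shows both.

```python
import statistics

# Made-up placeholder values, NOT the real Facebook or Twitter counts
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)
median = statistics.median(sample)          # average of the two middle values
population_sd = statistics.pstdev(sample)   # divides by n
sample_sd = statistics.stdev(sample)        # divides by n - 1

print(mean, median, population_sd, sample_sd)
```

The two standard deviations differ only in the divisor, so the sample version is always a little larger for the same data.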

Displaying the data in a bar graph really conveys how many more users follow celebrities' social media on Facebook than on Twitter. It's easy to look at and gain general knowledge about the data overall (Rihanna has the most Facebook likes, Shakira has the fewest followers on Twitter, etc.). However, there are downsides to this model as well. If two values are very close to one another, like Justin Bieber's and Katy Perry's Twitter followers, then it's hard to tell which is greater.

Sunday, October 4, 2015

Assignment 6

For this assignment I chose to use the Wolfram Alpha program on my Facebook profile to see what kind of data it could produce from it.

Some of the information it gathered was fairly obvious, for instance that I currently live in Tampa and that I'm 21 years old. However, the data that it gathered from my friends list was very interesting. It revealed the average age of my friends list (26), that most of my friends are women, that most of them are in relationships, and that all but four of my friends are mutual friends. If a company were to use my Facebook page to try to market to me, it would likely be able to appeal to most of my friends list as well.

Another insight the program gave me was that I rarely tag anyone in any of the photos I upload. This can occasionally cause problems, since I won't remember later on who was in the photos or where I was when they were taken. It also noted that most of my posts had been made within the past few years, since I hadn't really used Facebook until I started college.

Sunday, September 27, 2015

Assignment 5

This map is built from the data set provided in class on homicides in Chicago.

Sunday, September 20, 2015

Assignment 4

Figure 1 is a Descriptive model. It's a summary of the given data, most often presented as a table or graph. The data is organized in a way that displays its most obvious features. From the charts we could reasonably deduce the mean, median, and mode of the data and see if there is any skewness.

Figure 2 is a Predictive model. The chart is predicting what the scores of the students and instructor will most likely be, based on the given data. It cannot predict the future; however, it can estimate what might happen, and it includes risk assessments in its analysis. Predictive statistics typically help business owners understand their customers better, identify new opportunities for growth, or spot a potential problem. For this figure, it predicts the scores will decline based on previous data.

Figure 3 is an Inferential model. It draws conclusions based on a sample of a bigger data set. This particular figure wants to know if Rick Perry has a chance of winning in the upcoming primary election. Asking every registered Republican in the nation would be impossible, so the chart draws from a smaller sample: a poll. Measuring a sample of a bigger population lets us draw conclusions about the population as a whole.

Sunday, September 13, 2015

Assignment 3

Robin Camarote's "4 Great Resources for Presenting Your Data Creatively" lists resources that can help inspire anyone who needs to create a data visualization. Each source contains sample charts, graphs, and other visualizations that convey how differently information can be displayed. The article, like the lecture, explains that people need visualization in order to understand the full scope of big data. While regular bar graphs are appropriate for some projects, creativity in visualizations can also be rewarding.

Another article on Forbes, "Big Data Needs More 'Creative Types'", explains that the data science field should be populated with creative, arty people. These people, whom the article refers to as "data artists," are able to combine knowledge of statistics with problem-solving skills to tell a story out of large amounts of data. In the lecture it was explained that some data visualizations are able to point out certain trends or inconsistencies more than others. Data artists are able to discern patterns from information in unconventional ways that most people simply cannot.

Denise Lu's "7 Data Viz Sites to Inspire Your Creative Eye" features a list of sites that display interesting, out-of-the-box visual displays of data. Some sites put more of an emphasis on the aesthetics they offer, while others are optimal for different types of data. For example, "Chart Porn" is frequently used for political and financial graphics. The lecture gave examples of several measurements of data (gender, Twitter users, etc.) out of the many possibilities. Knowing about as many visualizations as possible makes it easier to think up new ways to display a given data set.

Referenced articles:

http://www.inc.com/robin-camarote/look-smart-with-inspiration-from-these-top-4-data-visualization-sites.html

http://www.forbes.com/sites/teradata/2015/01/30/big-data-needs-more-creative-types/

http://mashable.com/2013/10/01/data-viz-sites/#gkbjZh.qDuk_

Saturday, September 5, 2015

Assignment 2

The first program I tried to open the data with was Google Spreadsheets, since I had never used it before and was curious about trying it out. It kept crashing, so I moved on to using Excel. From what I can gather, the document appears to be a list of programs used by various companies. The spreadsheet includes the acronym and id for the program and lists the company that uses it, along with its mailing address. It also includes two columns, "start date" and "end date," that I'm assuming show how long the company used the program. However, I'm not sure what a few of the columns are supposed to represent, including "duns number" and "ein."

Saturday, August 29, 2015

Assignment 1

Hi! I'm Bridget White. I'm really looking forward to learning how to use different kinds of spreadsheets. I was just mentioning to a friend a few months ago that I'd love to learn how to use at least Excel, so I'm excited that this class is using even more programs. The visualization that caught my eye the most was Kenneth Buker's example. It looks really intricate and unique. I almost never see data presented in that kind of way, and I spent a few minutes looking at everything on it. It excites me to know that I'll be able to make something similar to that by the end of the class.