Visualizing Movie Data | VMD_01 | Mapping Movie Data

In 2008 we began reviewing movies. In the beginning we were doing that using points that lay between 0 and 10. At a later date we used a more narrative way. That led to detailed reviews of movies that we posted on Facebook. But over time was that too much work and it cost too much time so I decided to introduce a more accurate way of reviewing the movies. I started on 5 January 2015 to use 13 categories for each movie. Storyline, originality, cinematography, involvement, sound, editing, educational, title design, acting, interesting, unusual, exciting and superior. Every category earns a score between 0 and 10. A simple Processing program add’s all points together. And the result of this addition is divided by the number of categories. Then you get an average point for one movie. The aim of Visualizing Movie Data is to give us more insight into the choices we make in evaluating a film. Ultimately, this should tell more about ourselves than about the films. If that’s true, that would be a positive spin-off. Through Visualizing Movie Data I try different ways of visualizing data collected by us. And I get help from Ben Fry’s book ‘Visualizing Data’ published by O’Reilly Media Inc. Contrary to the Generative Design Variations project Visualizing Movie Data is not about making as many as possible variations. Now it is the intention that the movie data is used as functional and as basic as possible. I will skip any form of decoration. In this chapter I try to use a very simple and basic way of reading, displaying and interacting with a number of small data sets. These datasets consist partly of the data we found in our movie reviews.

I started looking for a world map. Eventually I found a world map that uses the Mercator projection. Mercator projection is a conformal cylinder projection with large surface deformations at higher latitudes. In this projection Europe is slightly larger which is an advantage because a lot is going to happen in this ‘small’ part of the world. Perhaps it is ultimately necessary to use a separate map of Europe as an insert. But I don’t know that at this moment. I decided to omit all color because I want to use the color for the markers of the countries. At this stage I only display a world map.

This is a first version in which the program reads coördinates from a text file to place red dots. I wanted to know if that worked. And If that works then it should also work with exact coördinates.

Now I need to gather a list of all 46 countries that have produced films. I happened to find a two-letter code list of countries through the International Organization for Standardization (ISO 3166 Country Codes). These abbreviations I can use to display in a later stage. The overall list is much too extensive so I have to select only those countries that have made films that we actually have seen from the beginning of 2015. If you look at this visualization there are a few things that stand out. In Europe it is very busy. Some countries are completely covered by a red dot. And some dots overlap other spots. Actually, it’s a mess. On all other continents there are only a few spots. It looks empty. Further, the overall color of the image looks too dark. I also have made an outline version of the dots. But this does not solve the problem. In Europe the small countries are slightly more visible but it is a minimal improvement and no real solution.

This version I’m going to make an insert for Europe. At least so I thought at that time. After I had made a rough sketch it actually delivered more problems than a solution. In the first place Europe is out of proportion when you compare it with the rest of the world map. Which is the case anyway at each world map. An exception is the projection of Goode. That has is an equal-area map projection. And in the second place an insert covers always a portion of the world map. Making that to be moved again. And actually you create five continents, which are all out of proportion. The question is whether that’s good. So it seemed best to temporarily leave everything as it is. And I solve problems when they are relevant. The dots are slightly reduced and everything becomes pretty clear. I have also added a title plus additional information which makes it even more complete.

A few years ago I have worked on patterns and photography in Processing. I could apply one of those patterns on the world map. Since a realistic world map is not possible anyway, it is just as good to make an abstraction of the world map. The question is now: is abstraction decoration? The world map is now completed with dots with a diameter of 4 pixels. The locations are displayed (as much as possible) in the middle of the country. Although the middle in some countries is hard to find. Where is the middle of the USA with its territories and various possessions?

Those dots are of course meaningless. They only give the central locations of countries where movies are made. It would for instance be more helpful when you could see where the most films are made. This would be able when you could vary the size of the dot. Large dot is a lot of movies. Small dots are a few films. This is a version that uses random generated numbers.

This version uses our movie data. The size of the dot determines the number of films from that country. It is immediately clear that most films come from France, USA, UK and Germany. This of course says nothing about the quality of the movie. It only shows information about the quantity. And it is only partly the truth as we will see later. And how will this visualization look like at the end of 2015 or 2016 as we have seen more movies. It was a surprise that France plays such a leading role in the field of film production. But it also raises immediately questions. Maybe we’re being manipulated? Might there be another variable which makes that France appears so high in the movie production?

It is also possible to interpolate between two colors, and make all dots the same size. I go for the low numbers in red and green for the high numbers. But I find that this version does not really show the smaller differences very well. In fact, there are only two green dots and two interpolations between green and red tending towards brown. The rest is nuances in red. And what does green-ish or reddish-brown mean then?

All countries of which we have seen five or more films are green. The largest dot on the world map indicates that we have seen much more than five movies. A small dot indicates that we have seen five movies. All countries of which we have seen less than five movies are red. A larger red dot indicates that we have seen almost 5 movies. A small dot indicates that we may just have seen only one movie. In short, very complicated and not very clear. And you really need some textual information here.

Of course you do want to be more accurate in displaying this data. So I’ve changed the program in a way that the abbreviated name of the country is displayed when you get to the dot of the country with the cursor. This goes not flawless though. Sometimes the name of the country disappears under a dot. At a certain point when the countries are small and close together both country names are being activated. I made a version of the Futura Medium 12 and with Futura Bold 12. I think that the Futura Bold version is better suited because it is more readable.

I have optimized the mouse interaction. Now, there are never two countries selected at the same time and it is also true that the name of a country is always drawn last. So it can never be overwritten by a dot.

I have replaced the floating numbers with integers. And I replaced all the colors by green. I think this is a reasonable version. You can easily see from which country most films come from. Ad the for the exact quantities you get feedback from the cursor.

I’m going a little deeper into the movie productions made in France, United Kingdom, Germany and the USA. I begin with France. That data does not look very spectacular. Apparently, most of the movies are filmed in Paris. In addition there are a few film locations in the south and center of France. Furthermore it seems that of the 47 French films that we have seen only 20 are filmed on a location in France. The other 27 are made through partnerships with other countries. And the film locations are all situated outside of France.

There is a problem with the data. When I run the program I get an ArrayIndexOutOfBoundsException: 8 error. I tried to find out what causes this error. It appears when you read data from text files that are of different lengths you get an error. That seems logical because in that way the arrays can not all be filled just the same way. If you have an array with 10 lines and another with 8 lines you get this error. And that was the case when I went to adjust the French data files for the United Kingdom data files. The 8 in the error message is the number of lines that were set aside for the array.

It looks empty in Germany. But I think it all will be fine in the long-term. Also, I think I’ve done something wrong. Some movies, of course, have several film locations. If I can trace those it is guaranteed that the image is getting more interesting.

Also this US version is not really interesting. But something else is happening. If the first two characters in my text file are not unique (and thus are duplicates in the list) the positioning of the dots and text goes wrong. And because the first two characters have no further function (but apparently have influence) it might be better to keep this form of abbreviations: AA, AB, AC, AD, AE, etc., and after AZ continuing with BA, BB, BC, BD.

I have now found all film locations for all American films we have seen since January 2015. I did a much too  superficially search so I found only 18 locations. Now I have 218 locations available. A number of them will not be used because the film locations are outside the USA. And there are several movies that play at the same location. So the final list will be shorter than 218 but longer than 18. Now I need to make a list and avoid duplication. All the movies filmed in Los Angeles are to be summed up to a total. All films in New York. All films of Detroit. And this goes for all the other cities and towns in the USA. I ended up with a final list of 87 cities and villages. Los Angeles has reached the top 61 productions. Which is actually not very surprising.

I now go one level deeper. I started on a global scale. Then countries scale. And I’m now going to work on an urban scale. After some research it seemed suitable to me to use the film locations of the series ‘Breaking Bad’. I display the abstract version of the map of Albuquerque in the background. And I used a reduced version of the Breaking Bad wordmark. But that doesn’t work at all. I also find the amount of film locations insufficient.

After some more research I could trace a lot more ‘Breaking Bad’ film locations. I changed the title and background colors. But I think it’s really ugly. When something is designed simple and basically it doesn’t have to look ugly. So I have to work on that.

This is the state in which I want to finish this exercise for the time being. All locations can now be read and the amount of scenes are shown after the location name. There is a title and a subtile. You can immediately see that Walter White’s House is the film location which is most used (with 80 scenes). Then Jesse Pinkman’s House follows (with 42 scenes). Hank and Marie’s House (with 29 scenes). The DEA offices (with 27 scenes). The Car Wash (with 22 scenes), Jesse Pinkman & Jane’s House (with 21), Gus’s Laundry Service (with 20) and Los Pollos Hermanos (with 17). Too bad I can not show this in a JavaScript version because that is unfortunately not working.

A rough conclusion: Data visualization is much harder than dreaming up nice effects like I did in the previous Generative Design Variations project. In data visualization you should limit yourself in order not to come up with lots nonfunctional decoration. A lot of time must be invested in research in the beginning, during and sometimes afterwards the design phase. And it is an iterative process. There are always improvements possible after the improvement. It also takes more time than a general design job. You must be very precise and constantly looking for better data and better interpretation and visualization of the data. Data visualization has a high level of detective work.


Comments? Leave a reply.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s