Visualizing Movie Data | VMD_08 | Performers' names and full names

Actors and actresses appear in almost all the films that we have seen in 2015 (and a few years before). The question I asked myself was: ‘Where do these actors and actresses come from?’ To answer this simple question I started by importing a world map. I expected to find the actors’ birthplaces all over the world, so a map seemed a good place to start. The color and size of the map are probably wrong, but I can always change that later. Gradually I realized that it might be better to list only the directors of the films instead of the actors and actresses. This is easier because a film usually has a single director. It is certainly easier than, say, 25 actors in one movie. But then I found out that some films have several directors. That made me decide to name all directors. Listing the films is not very relevant, because a director may have directed multiple movies. The downside is that you have to look up all this information yourself: put the data in a spreadsheet and check it for typos. Another problem is that you can keep adding more and more columns, because a lot of information about actors, actresses and directors is available, and much of it is inconsistent. For example, the names of the actors. And their real names. And their parents. Plus the places where they live. And their birthplaces. And their years of birth. In short, there is plenty of data to be found. In the end I compiled a list of the actors and actresses who appear in our list of 200 films from 2015. My original question, ‘Where do these actors and actresses come from?’, turned into comparing the name each actor is known by with his or her real name. The full name, that is.

I started with a test in which I imported some actors into Processing, each with their known name and their real name. I made sure to include the longest name in the whole list. I chose to write every name in a monospaced font, because then the lengths are easier to compare. An i is not as wide as a w, and an o is narrower than an m. In a monospaced font all characters have the same width, and that goes for uppercase and lowercase letters, numbers and punctuation alike. This first test shows that the workflow works. The list of actors with their names and full names is typed into a spreadsheet. This list is exported to a CSV (comma-separated values) text file. That file is read into Processing, where I create the layout. The program also counts the characters in each name.
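
The spreadsheet, CSV and Processing step can be sketched in plain Java, since Processing sketches are Java underneath. This is a minimal sketch, not the code behind the screen dumps. It assumes a two-column file with a header row (the column names name,fullName are hypothetical) and no commas inside the names. In Processing itself, loadStrings() or loadTable() would take the place of the string literal.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: read "name,fullName" rows and count the characters of each name.
// Assumptions: a header row comes first and the names contain no commas
// (quoted CSV fields are not handled here).
class PerformerName {
    final String name;      // the name the performer is known by
    final String fullName;  // the performer's real, full name

    PerformerName(String name, String fullName) {
        this.name = name;
        this.fullName = fullName;
    }

    int nameLength() { return name.length(); }          // characters in the known name
    int fullNameLength() { return fullName.length(); }  // characters in the full name

    // Parses the CSV text, skipping the header row, blank lines and incomplete rows.
    static List<PerformerName> parseCsv(String csv) {
        List<PerformerName> result = new ArrayList<>();
        String[] lines = csv.split("\\R");
        for (int i = 1; i < lines.length; i++) {   // start at 1 to skip the header
            String[] fields = lines[i].split(",", -1);
            if (fields.length < 2) continue;       // blank or incomplete row
            result.add(new PerformerName(fields[0].trim(), fields[1].trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical example row, not taken from the actual list.
        String csv = "name,fullName\nJane Doe,Jane Mary Doe\n";
        for (PerformerName p : parseCsv(csv)) {
            System.out.println(p.nameLength() + " " + p.name + "  |  "
                    + p.fullNameLength() + " " + p.fullName);
        }
    }
}
```

Running main prints one line with both character counts in front of the names, which is the same information that appears before each name in the layout.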

Now it’s time to think about the layout of the page. First I put the names in two columns. But how many lines fit on a page? Maybe it’s a good idea to give the names in the right column a different color from the names in the left column? However, a more urgent problem is that both columns are too long. In a 1000 × 1000 pixel display window, you can fit up to 47 names vertically. But my complete list of actors contains 225 names. That’s almost five times as many. So I have to divide this text file in one way or another.
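
The numbers in this paragraph can be checked with a little arithmetic. A row height of 1000 / 47 is about 21.3 px, and 225 names need 5 pages of 47. The scale factor 47 / 225 ≈ 0.209 rounds to the 21% used in the next paragraph. The sketch assumes that every row has the same height and that there are no margins.

```java
import java.util.Locale;

// Layout arithmetic for a 1000 x 1000 px window that holds 47 names
// and a complete list of 225 names.
class PageMath {
    // Pages needed when each page holds namesPerPage names (integer ceiling division).
    static int pagesNeeded(int totalNames, int namesPerPage) {
        return (totalNames + namesPerPage - 1) / namesPerPage;
    }

    // Scale factor that squeezes the whole list onto a single page.
    static double scaleToFit(int totalNames, int namesPerPage) {
        return (double) namesPerPage / totalNames;
    }

    public static void main(String[] args) {
        int windowHeight = 1000, namesPerPage = 47, totalNames = 225;
        System.out.printf(Locale.ROOT, "row height: %.1f px%n", (double) windowHeight / namesPerPage);
        System.out.println("pages needed: " + pagesNeeded(totalNames, namesPerPage));
        System.out.printf(Locale.ROOT, "scale to fit: %.0f%%%n", 100 * scaleToFit(totalNames, namesPerPage));
    }
}
```

This prints a row height of 21.3 px, 5 pages and a scale of 21%.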

If I scale the list in the program to 21%, the whole list of names fits on one page. Perhaps the mouse could detect a name, which is then enlarged in the layout. But that still does not solve the problem, because you lose the possibility of comparing the names with each other. So I think a scrollbar is a better option.
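
The idea of the mouse detecting a name comes down to converting the mouse’s y position into a row index. This is a minimal sketch under a hypothetical layout. The list starts at screen y = topMargin, every row is lineHeight pixels tall before scaling, and the whole list is drawn at one uniform scale factor. In Processing the first argument would be mouseY.

```java
// Sketch: find the index of the name under the mouse in a scaled list.
// Assumptions (hypothetical layout): the list starts at screen y = topMargin,
// each row is lineHeight px tall before scaling, and one scale factor applies.
class NameHitTest {
    // Returns the index of the name under mouseY, or -1 when the mouse is outside the list.
    static int nameIndexAt(float mouseY, float topMargin, float lineHeight,
                           float scale, int nameCount) {
        float yInList = (mouseY - topMargin) / scale;  // undo the scaling
        if (yInList < 0) return -1;                     // above the list
        int index = (int) (yInList / lineHeight);
        return index < nameCount ? index : -1;          // below the last name
    }

    public static void main(String[] args) {
        float lineHeight = 1000f / 47f;  // 47 rows fill 1000 px at full size
        int index = nameIndexAt(500f, 0f, lineHeight, 0.21f, 225);
        System.out.println("name under the mouse: " + index);
    }
}
```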

Just to be sure, I checked which name in the list of actors is the longest. And the longest name is: ‘Isabella Orsini Princesse de Ligne de La Trémoïlle’. She is an Italian actress who in 2009 married His Highness Prince Édouard Lamoral Rodolphe de Ligne de La Trémoïlle. That is an even longer name, for a man who cannot act. And he does not have to act in order to make a living. Here are some screen dumps in which all names are displayed. Each name is preceded by a number that gives the number of characters in the name. On the right side of the right column, yellow numbers show how many characters longer the real name is than the common name.
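
Both checks in this paragraph can be sketched in a few lines. One finds the longest name in a list. The other computes the "yellow number", the difference in length between full name and common name. This is a sketch, not the original code. One assumption matters here: Java’s String.length() counts an accented letter such as é or ï as one character only when it is stored precomposed (Unicode NFC), which is the usual case. Counted that way, ‘Isabella Orsini Princesse de Ligne de La Trémoïlle’ has 50 characters.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: the longest name in a list, and how many characters longer a
// full name is than the common name (the "yellow number").
// Assumption: length() counts UTF-16 chars, so accented letters count once
// only when stored precomposed (NFC).
class NameStats {
    static String longest(List<String> names) {
        return names.stream()
                .max(Comparator.comparingInt(String::length))
                .orElseThrow(IllegalArgumentException::new);  // empty list
    }

    // Positive when the full name is longer than the common name.
    static int extraCharacters(String commonName, String fullName) {
        return fullName.length() - commonName.length();
    }

    public static void main(String[] args) {
        // Hypothetical names, not taken from the actual list.
        List<String> names = Arrays.asList("Jane Doe", "Jane Mary Doe", "Al Roe");
        System.out.println("longest: " + longest(names));
        System.out.println("extra: " + extraCharacters("Jane Doe", "Jane Mary Doe"));
    }
}
```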

In fact, I could leave it at that. But I wanted to make a version with a scrollbar. I had never programmed a scrollbar, so that was a good reason to make one. It took a lot of time. But it would have been worse if it had taken a lot of time and I had still failed to make a scrollbar. In this case, I succeeded. These are two test files. To use the scrollbar, I imported an image in which all names are displayed.
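
The arithmetic behind such a scrollbar can be sketched without any drawing code. The position of the thumb in the track is converted into the vertical offset at which the tall image of names is drawn. This is a minimal sketch, assuming that the track spans the full window height and that the thumb height is proportional to the visible fraction of the image. It is not necessarily how the scrollbar above was built.

```java
// Sketch: map a scrollbar thumb position to the offset of a tall image.
// Assumptions: the track spans the full view height and the thumb height is
// proportional to the visible fraction of the image.
class ScrollMath {
    final float viewHeight;   // height of the visible area in pixels
    final float imageHeight;  // height of the full image of names in pixels

    ScrollMath(float viewHeight, float imageHeight) {
        this.viewHeight = viewHeight;
        this.imageHeight = imageHeight;
    }

    // Thumb height: the visible fraction of the image, scaled to the track.
    float thumbHeight() {
        return viewHeight * Math.min(1f, viewHeight / imageHeight);
    }

    // Keeps the top edge of the thumb inside the track.
    float clampThumb(float thumbY) {
        float maxThumbY = viewHeight - thumbHeight();
        return Math.max(0f, Math.min(thumbY, maxThumbY));
    }

    // Y offset at which to draw the image; it moves up as the thumb moves down.
    float imageOffset(float thumbY) {
        float maxThumbY = viewHeight - thumbHeight();
        if (maxThumbY <= 0f) return 0f;                   // image fits, nothing to scroll
        float fraction = clampThumb(thumbY) / maxThumbY;  // 0 = top, 1 = bottom
        return -fraction * (imageHeight - viewHeight);
    }

    public static void main(String[] args) {
        ScrollMath s = new ScrollMath(1000f, 4000f);  // hypothetical sizes
        System.out.println("thumb: " + s.thumbHeight() + " px, offset at middle: " + s.imageOffset(375f));
    }
}
```

In Processing the result would be used as the y argument of image(), with the thumb position following mouseY while the mouse is dragged.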

In this setup I also used a title and subtitles, and the names scroll underneath them: ‘Actors by name length’, ‘Actors by name’ and ‘Actors by full name’. But perhaps the word performers is better, because it covers both male and female actors. I have still not answered my original question, ‘Where do the actors and actresses come from?’ But that’s for later.


Visualizing Movie Data | VMD_03 | Waltzing with Bezier

When I started this assignment, I was interested in how much money is actually involved in the film industry. What does a movie cost? What is the budget? How much money does it bring in? And how do these figures compare with our ratings? I thought it would be easy to find the data on the IMDb site. But unfortunately the data I found was very incomplete. I checked all 150 films that we have seen since January 2015. And guess what? Only 56 films list both the budget and the gross income. In addition, the amounts are given in different currencies, so I have to convert them to dollars or another currency. Moreover, the descriptions of almost all films that are not from the United States contain hardly any information on costs and revenues. So I have to check other websites for additional information.
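
The cleaning step described here has two parts. Films that lack a budget or a gross are dropped, and the remaining amounts are converted to US dollars. The sketch below assumes that a missing amount is stored as null and that currencies use ISO codes such as "EUR". It also assumes that you look up the exchange rates yourself, because the rate in main is only a placeholder. Films in a currency without a known rate are skipped rather than guessed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: keep only films with both budget and gross, converted to US dollars.
// Assumptions: missing amounts are null; currencies are ISO codes; rates are
// supplied by the caller (no real exchange rates are given here).
class FilmMoney {
    final String title;
    final Long budget;      // null when unknown
    final Long gross;       // null when unknown
    final String currency;  // e.g. "USD", "EUR"

    FilmMoney(String title, Long budget, Long gross, String currency) {
        this.title = title;
        this.budget = budget;
        this.gross = gross;
        this.currency = currency;
    }

    static List<FilmMoney> completeInUsd(List<FilmMoney> films, Map<String, Double> usdPerUnit) {
        List<FilmMoney> result = new ArrayList<>();
        for (FilmMoney f : films) {
            if (f.budget == null || f.gross == null) continue;  // incomplete data
            Double rate = usdPerUnit.get(f.currency);
            if (rate == null) continue;                          // unknown currency: skip
            result.add(new FilmMoney(f.title,
                    Math.round(f.budget * rate), Math.round(f.gross * rate), "USD"));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> usdPerUnit = new HashMap<>();
        usdPerUnit.put("USD", 1.0);
        usdPerUnit.put("EUR", 1.1);  // placeholder rate, look up the real one
        List<FilmMoney> films = Arrays.asList(
                new FilmMoney("Film A", 2_000_000L, 5_000_000L, "USD"),  // hypothetical
                new FilmMoney("Film B", null, 750_000L, "EUR"));         // hypothetical
        System.out.println(completeInUsd(films, usdPerUnit).size() + " film(s) with complete data");
    }
}
```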

After that extensive check, I ended up with 69 films with complete financial information. I think I should leave out the series. These often run over several years and have varying budgets, while a film is released once and receives just one budget. Another thing is that these figures only cover the period in which a movie is shown. Some films play longer than others, and because they are more popular they bring in more money. But that says nothing about their quality. Our list shows that only three films cost less than one million dollars to make. However, 14 films brought in less than one million dollars. I made two text files of them: one with the highest budget at the top, the other with the highest gross at the top.
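
The two text files are the same films in two different orders. The sketch below produces both orderings and also counts films below a threshold, such as the three films with a budget under one million dollars. It assumes that the amounts are already converted to whole US dollars. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToLongFunction;

// Sketch: highest-budget-first and highest-gross-first orderings, plus a
// count of films below a threshold. Assumption: amounts in whole US dollars.
class FilmRanking {
    static final class Film {
        final String title;
        final long budget;
        final long gross;

        Film(String title, long budget, long gross) {
            this.title = title;
            this.budget = budget;
            this.gross = gross;
        }
    }

    // New list sorted from the largest to the smallest value of the given field.
    static List<Film> sortedDesc(List<Film> films, ToLongFunction<Film> field) {
        List<Film> sorted = new ArrayList<>(films);
        sorted.sort(Comparator.comparingLong(field).reversed());
        return sorted;
    }

    // Number of films whose value for the given field is below the threshold.
    static long countBelow(List<Film> films, ToLongFunction<Film> field, long threshold) {
        return films.stream().filter(f -> field.applyAsLong(f) < threshold).count();
    }

    public static void main(String[] args) {
        List<Film> films = Arrays.asList(  // hypothetical figures
                new Film("Film A", 5_000_000L, 900_000L),
                new Film("Film B", 800_000L, 3_000_000L));
        for (Film f : sortedDesc(films, f -> f.budget)) System.out.println(f.title + " " + f.budget);
        System.out.println("gross under 1M: " + countBelow(films, f -> f.gross, 1_000_000L));
    }
}
```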

Next, the text file has to be read into Processing and shown in the display window. A simple task, I thought, but it turned out to be more complicated than that. The tutorials pay a lot of attention to getting a text file into Processing’s console, but I could not find anywhere how to get the data into the display window. The Processing Forum answered half of my question, and I solved the rest myself. It took me an afternoon, and this is the first result. Not very impressive, but all the data in the text file is displayed in my Processing display window. And that was the first goal I had in mind.

The next step is to separate the data lists. It should be possible to reposition the movie titles, budgets and revenues independently. If I cannot do that, I cannot work on the layout. Incidentally, at this point the sort and reverse functions are quite handy. And I have changed the font to Futura Book.
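
Separating the lists means that every field of a line ends up in its own array, and each array can then get its own x position. This is a minimal sketch under stated assumptions. There is one film per line, the fields are tab-separated in the order title, budget, revenue, and the layout values are hypothetical. In Processing the lines could come from loadStrings() and each value would be drawn with text(value, x, y).

```java
// Sketch: split lines into separate lists so each column can be placed on its own.
// Assumptions: one film per line, tab-separated as title, budget, revenue;
// the x and y layout values are hypothetical.
class ColumnSplit {
    static final float TITLE_X = 40f, BUDGET_X = 420f, REVENUE_X = 700f;  // column positions
    static final float TOP = 60f, LEADING = 20f;                          // first row, row spacing

    final String[] titles;
    final long[] budgets;
    final long[] revenues;

    ColumnSplit(String[] lines) {
        titles = new String[lines.length];
        budgets = new long[lines.length];
        revenues = new long[lines.length];
        for (int i = 0; i < lines.length; i++) {
            String[] fields = lines[i].split("\t");
            titles[i] = fields[0].trim();
            budgets[i] = Long.parseLong(fields[1].trim());
            revenues[i] = Long.parseLong(fields[2].trim());
        }
    }

    // Baseline y of row i: shared by all columns, while each column has its own x.
    static float rowY(int i) {
        return TOP + i * LEADING;
    }

    public static void main(String[] args) {
        ColumnSplit data = new ColumnSplit(new String[] {"Film A\t2000000\t5000000"});
        System.out.println(data.titles[0] + " at x=" + TITLE_X + ", y=" + rowY(0));
        System.out.println(data.budgets[0] + " at x=" + BUDGET_X + ", " + data.revenues[0] + " at x=" + REVENUE_X);
    }
}
```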

How does the program know which budget and income belong to which movie? That was a question for me too, because each column of figures is sorted on its own. The budget and income lists are both sorted from large to small amounts, so the budget list is not in the same order as the income list. That is why I added the film titles to both lists. This way it is easy for me to check whether each line from the budget list goes to the right amount in the income list.
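
Using the film title as a key is enough to connect the two sorted lists. The sketch finds each film’s row in the budget list and in the income list, and those two rows are the two ends of its connecting line. When both rows are equal, the line is horizontal. It assumes that titles are unique within each list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: use the film title as a key to find its row in two independently
// sorted lists. Assumption: titles are unique within each list.
class RankMatch {
    // Maps each title to its row index in an already sorted list.
    static Map<String, Integer> rowsByTitle(List<String> sortedTitles) {
        Map<String, Integer> rows = new HashMap<>();
        for (int i = 0; i < sortedTitles.size(); i++) {
            rows.put(sortedTitles.get(i), i);
        }
        return rows;
    }

    // Returns {budgetRow, incomeRow}: the two ends of the film's connecting line.
    static int[] lineEnds(String title, Map<String, Integer> budgetRows,
                          Map<String, Integer> incomeRows) {
        Integer b = budgetRows.get(title);
        Integer inc = incomeRows.get(title);
        if (b == null || inc == null) throw new IllegalArgumentException("Unknown title: " + title);
        return new int[] {b, inc};
    }

    public static void main(String[] args) {
        // Hypothetical orderings of the same three films.
        Map<String, Integer> budgetRows = rowsByTitle(Arrays.asList("Film A", "Film B", "Film C"));
        Map<String, Integer> incomeRows = rowsByTitle(Arrays.asList("Film B", "Film A", "Film C"));
        int[] ends = lineEnds("Film A", budgetRows, incomeRows);
        System.out.println("Film A: budget row " + ends[0] + ", income row " + ends[1]);
    }
}
```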
VMD_03_03

I changed the background colour to a very dark grey. The budget and income lists are now also connected to one another by lines. Everything looks pretty cluttered, but that will change in the next design. What’s striking is that the biggest blockbuster has a horizontal line, because it sits at the same position in both lists: ‘Interstellar’, with a budget of 165,000,000 dollars and a total income of 675,020,017 dollars.

I’ve started checking whether the film titles are written correctly. I translated all non-English film titles: Les Petits Mouchoirs is Little White Lies. Loin des Hommes is Far from Men. Relatos Salvajes: Wild Tales. Marie Heurtin: Marie’s Story. Eldfjall: Volcano. And that is one side of data visualization: you must be an administrator, Sherlock Holmes, graphic designer, translator, animation designer and programmer at the same time. I have given the chart some more space and increased the distance to the lists of numbers. Which suddenly gives me a new idea.

I now work in Processing 2. It’s time to download the new Processing 3 and donate to the Processing Foundation. That is the least I can do, because I work with Processing every day. In Processing 3 you can use the Table class. It’s easier to work with, because all the data then sits in one text file.
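
The convenience of the Table class is that one file with a header row holds all columns, and each value is looked up by column name. In Processing that looks like loadTable("file.csv", "header") followed by row.getString("title") or row.getInt("budget") on each TableRow. The plain-Java stand-in below only illustrates the idea and is not Processing’s implementation. It assumes comma-separated values without quoted fields, and the column names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the idea behind Processing's Table class:
// one CSV file with a header row, values looked up by column name.
// Assumptions: comma-separated, no quoted fields; column names hypothetical.
class HeaderCsv {
    static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(",");
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue;  // skip blank lines
            String[] cells = lines[i].split(",", -1);
            Map<String, String> row = new HashMap<>();
            for (int c = 0; c < header.length && c < cells.length; c++) {
                row.put(header[c].trim(), cells[c].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String csv = "title,budget,income,rating\nFilm A,2000000,5000000,8\n";  // hypothetical row
        for (Map<String, String> row : parse(csv)) {
            long budget = Long.parseLong(row.get("budget"));
            System.out.println(row.get("title") + " cost " + budget + ", rated " + row.get("rating"));
        }
    }
}
```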

I added the $ sign, but I don’t think it works well. Maybe I will find another solution. Right now you cannot see what the amounts in the lists stand for. I know that the left-hand amounts are the budgets and that the right-hand column is the income, but a viewer cannot tell.
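
One possible alternative to a $ in front of every number is to write each amount with thousands separators and name the currency once, in a label. Processing’s own nfc() function also adds these commas. In plain Java the same can be sketched with String.format. Locale.US is an assumption that fixes the separator to a comma whatever the computer’s language settings are.

```java
import java.util.Locale;

// Sketch: format amounts with thousands separators instead of a "$" prefix.
// Locale.US is assumed so the separator is always a comma.
class AmountFormat {
    static String format(long dollars) {
        return String.format(Locale.US, "%,d", dollars);
    }

    public static void main(String[] args) {
        System.out.println(format(165_000_000L));  // prints 165,000,000
    }
}
```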

Because ‘Interstellar’ is misrepresented, I have removed this film. I think the columns should have proper labels, and I need room for that. I have also added the sequence 0-10 on the right. The numbers 0-10 represent the ratings we have given to the films. The idea is that I’m going to draw lines once again, but this time from the income list to our ratings.
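
Placing the ratings 0-10 along the right-hand side is a linear mapping from a rating to a y position. Processing does this with map(value, start1, stop1, start2, stop2). The sketch reproduces that formula in plain Java. It assumes, as a hypothetical layout, that rating 10 sits at the top (yTop) and rating 0 at the bottom (yBottom).

```java
// Sketch: position ratings 0-10 vertically with the same linear formula as
// Processing's map(). Assumption (hypothetical layout): 10 at yTop, 0 at yBottom.
class RatingScale {
    static float map(float value, float start1, float stop1, float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    static float ratingY(float rating, float yTop, float yBottom) {
        return map(rating, 0f, 10f, yBottom, yTop);  // higher rating, higher on screen
    }

    public static void main(String[] args) {
        for (int rating = 0; rating <= 10; rating++) {
            System.out.println(rating + " -> y " + ratingY(rating, 100f, 600f));
        }
    }
}
```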

I have adapted the whole graph a bit. Lines now start and stop slightly closer to the lists of numbers. I have added the vertical text ‘Amounts in American Dollars’. The overall chart remains somewhat chaotic, but I think the result is not disappointing.

Added colour. I chose green for the films that cost less than their revenue, and red for the films that cost more than their revenue. Now the graph begins to show a disadvantage: because the lines are thicker, it is difficult to see which amounts they belong to.
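
The colour rule itself is a simple comparison of budget and revenue. In this sketch colours are packed ARGB ints, which is how Processing stores its color type. The exact shades are hypothetical. So is the grey for break-even films, which the text does not cover.

```java
// Sketch: green when revenue exceeds budget, red when the budget exceeds revenue.
// Assumptions: packed ARGB ints as in Processing's color type; the shades and the
// grey for break-even films are hypothetical.
class ProfitColour {
    static final int GREEN = 0xFF2ECC40;
    static final int RED = 0xFFFF4136;
    static final int GREY = 0xFF888888;

    static int colourFor(long budget, long income) {
        if (income > budget) return GREEN;  // profit
        if (income < budget) return RED;    // loss
        return GREY;                        // break-even
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(colourFor(2_000_000L, 5_000_000L)));  // hypothetical film
    }
}
```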

I replaced the line function with the bezier function. Now it is easier to see which amount belongs to which line, and the overall chart looks slightly smoother. Of the 64 films, 26 made a loss and 38 made a profit. Mr. Turner eventually made a loss but was still at the top of our ratings. Locke is a movie made for 2,000,000 dollars. It brought in 5,000,000 dollars and received a 10 in our rating. The Salvation cost 11,524,796 dollars. As far as we know it brought in 5,000 dollars (which I strongly doubt), but it still gets a 7 in our rating. In short, data visualization is very interesting, very time-consuming and a matter of precise puzzling. Actually, I should have coded the program much smarter, but that would have cost even more time.
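
Why the Bézier curve reads better can be seen from its control points. If both control points sit halfway between the two columns, each at the height of its own endpoint, the curve leaves the budget amount and arrives at the income amount horizontally. In Processing that would be bezier(x1, y1, midX, y1, midX, y2, x2, y2). The sketch evaluates points on such a curve with the cubic Bézier formula, which is the same formula as Processing’s bezierPoint(a, b, c, d, t). The control-point placement is an assumption and is not taken from the original sketch.

```java
// Sketch: points on a cubic Bezier curve from a budget amount (x1, y1) to an
// income amount (x2, y2). Assumption (hypothetical layout): both control points
// sit at the horizontal midpoint, each at the height of its own endpoint.
class BudgetCurve {
    // One coordinate of a cubic Bezier at t in [0, 1] (same formula as bezierPoint()).
    static float bezierPoint(float a, float b, float c, float d, float t) {
        float u = 1f - t;
        return u * u * u * a + 3f * u * u * t * b + 3f * u * t * t * c + t * t * t * d;
    }

    // Returns {x, y} on the curve at parameter t.
    static float[] pointAt(float x1, float y1, float x2, float y2, float t) {
        float midX = (x1 + x2) / 2f;
        float x = bezierPoint(x1, midX, midX, x2, t);
        float y = bezierPoint(y1, y1, y2, y2, t);
        return new float[] {x, y};
    }

    public static void main(String[] args) {
        for (float t = 0f; t <= 1.0001f; t += 0.25f) {
            float[] p = pointAt(200f, 100f, 600f, 400f, t);  // hypothetical endpoints
            System.out.println("t=" + t + " -> (" + p[0] + ", " + p[1] + ")");
        }
    }
}
```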