Visualizing Movie Data | VMD_04 | Reviews by categories

As a next step, I find it interesting to see what our reviews are telling us when I show each category of a movie. I can imagine that the titles of the films are on the left. Suppose we start from the first 100 movies we have watched from the beginning of 2015? What does it look like? And what conclusions can we commit to? I’ll try programming this version slightly smarter than the earlier version.

I start by creating a grid of numbers. There are 13 categories (13 columns) with decreasing numbers from top to bottom and from 10 to 0. The size of the display window is a bit of guesswork. I now work on a size of 800 by 800 pixels. On the left side of the display window film titles have yet to be placed. And all 13 category labels should still come on top. I expect that I need much more space than 800 pixels in width and height. In the program I have added an empty draw block. Otherwise functions as keyReleased and timeStamp do not work.

Placing the film titles is a matter of creating a text file with 100 titles of films that we have seen since the beginning of 2015. Then read this text file into Processing and displaying it in the display window. The order (from top to bottom) corresponds to the viewing order. The list starts with the film ‘Boyhood’. Which is the first film that we saw in 2015. The list ends with the film ‘Restless’. And that’s the hundredth film we’ve seen. However, there is only one-third of the list visible. This is up to the film ‘Calvary’. And that is film number 38. Putting another 62 films in this display height makes no sense because the point size would become too small to read.

To get all the movies titles on the left in the picture, I have a few options. Reduce the line spacing. Reduce the point size of the font. Or I can increase the size of the display window. In this case I have used all three possibilities. I end up with 1500 x 1300 pixels. I also added the names of categories.

Another stage where I further optimize the distances. The category names (the labels of the columns) are still too far from the category columns. I’m going to put them closer and place them on an angle of 45º. The category numbers are now placed on an imaginary square. The display window is now 1460 x 1228 pixels. And the grid is built with squares of 90 x 90 pixels. Testing a first line which is drawn through the numbers who rated the film ‘Boyhood’. That does not look good. The lines are too stiff. It should be more fluid. VMD_04_04

In order to make more fluid lines I did one attempt with the curveVertex function. The problem here is that the curveVertex function uses Catmull-Rom splines. It does not make beautiful curves. In the end I opted for bezier curves. For the quality of the curve that is the best solution, but it requires more passes of data to describe the curve. Four anchor points and four control points per line. That means 13 x 8 points per bezier curve. That is 104 numbers for the first movie. Thus, in total there must be 10.400 points calculated to make the final visualization.

The first six films drawn using bezier curves.

I have now drawn 26 films with bezier curves. And it shows directly the weakness of this visualization method. Since all lines have the same color and thickness it is difficult to see which movie has scored which number in which category. At a later stage I will do something about that. But the problem is not completely solvable.

About half way with the positioning of bezier curves. I place the curves in a very straightforward way. I know that this can be done with more intelligence but I will not have time enough to solve this problem now. I think it requires an additional study which I might do in a later stage.

And about to place a fourth number of bezier curves.

All bezier curves are now positioned. On the left, it has become a pretty organized chaos. Looking at the line patterns you can conclude that most movies have brought us a 6, 7 or 8. What might also be said of our rating. Is our rating mediocre?

With all the lines in their place, it is now the time to bring in the Futura font. I have changed the background color to black. Font color is white. The color of the lines is gray with 50% transparency.

Time for a number of tests with line widths. Some are absolutely exaggerated. Others are functional. These variations also show that the number columns have to be written as a last item. Otherwise they will be overwritten by the bezier lines. And I shifted the column with movie titles slightly to create some space  between the start of the bezier lines and the end of the movie titles.

Trying to solve a problem that popped up in VMD_04_07. To what extent is it possible to get more distinction between the bezier curves themselves. I start with two colors. Red and green. There seems to be a strange effect to occur. When a certain amount of red and green lines overlap it creates an additional color. It looks like orange. At least that seems to be orange but if you make the lines thicker it seems to be some light version of something brown-ish.

Added a blue color. Now it seems that there are many more shades of additional color variations possible.

What happens if I make an ascending color scale from 0 to 360? I switch to color mode HSB. HSB is easier to work with (as a human).

Which movies have been honored with at least once the highest possible value of 10 points?

Which movies have been awarded with at least once the highest value of 9 points or higher?

And finally: which movies have been rewarded with at least once the highest value of 8 points or more?

A quick conclusion. I am tempting to say that if a film did not score one 8, 9 or 10 in the assessment it would be not a good movie. That means it is of a lower level than films who scored at least one 8. Or one 9. Or one 10. This visualization is showing the worst films of all 100 films we have seen since the beginning of 2015. In total these are only 27 movies. So a little over a quarter. That means that three-quarters of the 100 films that we have seen always had something of good quality in them. And that’s very reassuring. For the filmmakers, the film industry and for us.


Visualizing Movie Data | VMD_03 | Waltzing with Bezier

When I started this assignment I was interested in how much money is actually going on in the film industry. What costs a movie? What is the budget? How much money does it produce? And how do these figures compare with our ratings. I thought it was easy to check the data on the site of IMDb. But unfortunately all I found was very incomplete data. I checked all 150 films that we have seen since January 2015. And guess what. There are only 56 films that both show you the budget and the profits. In addition, all amounts are mentioned in different currencies. So I have to convert them to dollars or an other currency unit. Additionally, in all the movies descriptions that are not from the United States, there is almost no sign of costs and benefits to find. So I have to check at other websites if there is additional information.

After that extensive check this resulted in 69 films with complete financial information. I think I should leave out the series. These often run over several years and are applying varying budgets. While a film only runs once and receives just one budget. Another thing is that these figures represent only periods when movies are played. Some play longer periods than others. Because they are more popular they bring in more money. But that says nothing about the quality. Our list shows that there are only three films made which costs less than one million dollar. However, there are 14 films which benefits less than 1 million. I made two text files of them. One with the highest budget on the top. The other list has the highest gross at the top.

Then it is important to read the text-file into Processing and display it in the display window. A simple task. But that turned out to be more complicated than I thought. It comes down to that there is a lot of attention in the tutorials to get a text-file into Processing’s console. But how to get the data into the display window I could not find anywhere. I got my question answered 50% through the Processing Forum. And partly solved it myself. Been busy with it for one afternoon. And this is the first result. Not very impressive but all data that is in the text file is displayed in my Processing display window. And that was the first goal I had in mind.

The next step I need to take is to get the data lists separated. It should be possible to reposition the movie-titles, budget and revenue. If I cannot do that I cannot deal with the layout. Incidentally, at this moment the sort and reverse functions are quite handy. And I have changed the font to Futura Book.

How does the program know which budget and income are associated with a movie? That is a question for me too. For the two digit columns are mixed-up. The budget and the income lists are both sorted from large to small amounts. The budget list thus does not have the same order as on the income list. So I have added film titles both to the budget and the income list. In that manner it is easy to check for me if the lines of the budget is written to the right amounts of the income list.
VMD _03_03

Changed the background colour to a very dark grey. Furthermore, now the budget and income-lists are connected by a line to one another. Everything looks pretty cluttered. But that will change in the next design. What’s striking is that the biggest blockbuster has a horizontal line. ‘Interstellar’ with a budget of 165,000.000 dollar and a total income of 675,020.017 dollar.

I’ve started checking the film titles. Whether they are written correctly and without mistakes. All non-English-language film-titles translated. Les Petits Mouchoirs is Little White Lies. Loin des Hommes is Far from Men. Relatos Salvajes: Wild Tales. Marie Heurtin: Marie’s Story. Elddfjall: Volcano. And that is one side of data visualization. You must be an administrator, Sherlock Holmes, graphic designer, translator, animation designer and programmer at the same time. I have given the chart some more space. And the distance is increased to the lists of numbers. Which suddenly brings me to a new idea.

I now work in Processing 2. Its time to download the new Processing 3 and fund the Processing Foundation. That is the least I can do because I work daily with Processing. In Processing 3 you can use the Table Class. It’s easier to work with because everything is now in one text file.

The $ sign was added but I do not find it successful. Maybe find another solution. Right now you do not see what the amounts of the lists are. I know that the left-hand amounts are for the budget. The right column represents the amount of income.

Because ‘Interstellar’ is misrepresented I have thrown this film out. I think the columns should have proper labels. And I need room for doing that. I have also added the sequence of 0-10 to the right. The numbers 0-10 represent the ratings we have given to the films. The idea is that I’m once again going to draw the lines but now from the income-list to our ratings.

I have adapted the total graph a bit. Lines start and stop now slightly closer to the lists of numbers. I have added the vertical text ‘Amounts in American Dollars’. The overall chart remains somewhat chaotic but I think the result is not disappointing.

Added colour. I chose green for the films that cost less than their revenue. And I choose red for the films that have cost more than their revenue. Now the graph begins to show a disadvantage. Because the lines are thicker it is difficult to see to what amounts they belong.

I replaced the line function by the bezier function. Now it is better to see which amount belongs to which line. And the overall chart looks slightly smoother. Of the 64 films, 26 films have made a loss. 38 Films have made profits. Mr. Turner eventually made losses but was still on top of our rating. Locke is a movie made for 2,000,000 dollar. It made a profit of 5,000.000 dollar and received a 10 in our rating. The Salvation has cost 11,524.796 dollar. To our knowledge it has brought 5000 dollar (which I strongly doubt). But it still gets a 7 in our rating. In short, data visualization is very interesting, very time-consuming and precise puzzling. Actually I had to code the program much smarter. But that would cost even more time.