Visualizing Movie Data | VMD_04 | Reviews by categories

As a next step, I want to see what our reviews tell us when I show every category of a movie. I imagine the film titles on the left. Suppose we start with the first 100 movies we have watched since the beginning of 2015. What does that look like? And what conclusions can we draw? I’ll try to program this version slightly smarter than the earlier one.

I start by creating a grid of numbers. There are 13 categories (13 columns), each with numbers decreasing from 10 at the top to 0 at the bottom. The size of the display window is a bit of guesswork. For now I work at 800 by 800 pixels. The film titles have yet to be placed on the left side of the display window, and all 13 category labels still have to go on top. I expect to need much more space than 800 pixels in width and height. I have added an empty draw block to the program. Otherwise functions such as keyReleased and timeStamp do not work.
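A minimal sketch of such a grid, written in plain Python rather than Processing so it can be run anywhere. The function name `grid_positions` and the pixel values (cell size, margins) are assumptions for illustration, not the values used in the actual sketch.

```python
def grid_positions(columns=13, top_score=10, cell=50, left=100, top=100):
    """Return {(column, score): (x, y)} for a grid of score numbers.

    cell, left and top are hypothetical pixel values.
    """
    positions = {}
    for column in range(columns):
        # Scores run from top_score in the top row down to 0 in the bottom row.
        for row, score in enumerate(range(top_score, -1, -1)):
            positions[(column, score)] = (left + column * cell, top + row * cell)
    return positions


grid = grid_positions()
print(len(grid))       # number of grid cells: 13 columns times 11 scores
print(grid[(0, 10)])   # pixel position of the 10 in the first column
```

Keeping the positions in one lookup table means that every later step (labels, lines, curves) can ask for the position of "category 3, score 7" instead of recomputing pixel values.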

Placing the film titles is a matter of creating a text file with the titles of the 100 films we have seen since the beginning of 2015, reading that file into Processing and displaying it in the display window. The order (from top to bottom) corresponds to the viewing order. The list starts with ‘Boyhood’, the first film we saw in 2015, and ends with ‘Restless’, the hundredth. However, only about a third of the list is visible. It runs up to ‘Calvary’, which is film number 38. Squeezing the other 62 films into this display height makes no sense because the point size would become too small to read.

To fit all the movie titles on the left, I have a few options: reduce the line spacing, reduce the point size of the font, or increase the size of the display window. In this case I used all three. I ended up with 1500 x 1300 pixels. I also added the category names.
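The trade-off between window height and line spacing can be sketched as a small calculation. The margin and spacing values below are assumptions chosen for illustration, and both function names are hypothetical.

```python
def visible_titles(window_height, top_margin, line_spacing):
    """How many title lines fit below the top margin."""
    return max(0, (window_height - top_margin) // line_spacing)


def required_height(title_count, top_margin, line_spacing):
    """Window height needed to show title_count lines."""
    return top_margin + title_count * line_spacing


# Hypothetical values: 40 px top margin, 20 px line spacing.
print(visible_titles(800, 40, 20))
# Tighter spacing (12 px) for all 100 titles.
print(required_height(100, 40, 12))
```

With these assumed values an 800 pixel window shows 38 titles, and 100 titles at 12 pixel spacing need 1240 pixels of height.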

Another stage in which I further optimize the distances. The category names (the column labels) are still too far from the category columns. I’m going to move them closer and set them at an angle of 45º. The category numbers are now placed on an imaginary grid of squares. The display window is now 1460 x 1228 pixels, and the grid is built from squares of 90 x 90 pixels. As a test I draw a first line through the scores given to the film ‘Boyhood’. That does not look good. The lines are too stiff. They should be more fluid. VMD_04_04

To make the lines more fluid I made one attempt with the curveVertex function. The problem is that curveVertex uses Catmull-Rom splines, which do not make beautiful curves here. In the end I opted for bezier curves. They give the best curve quality, but they need more data to describe the curve: four anchor coordinates and four control coordinates per segment, eight numbers in all. A movie has 13 segments, so that is 13 x 8 = 104 numbers for the first movie. In total, 10,400 numbers must be calculated to make the final visualization.
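The arithmetic above can be checked with a small sketch in plain Python. It assumes one possible way to choose the control points (halfway between two columns, at the height of the nearest anchor, so every segment leaves and arrives horizontally). That choice and the function names are assumptions, not necessarily how the curves in the sketch were placed. Each tuple uses the same parameter order as Processing's `bezier()`: anchor, control, control, anchor.

```python
def bezier_segments(points):
    """Turn a list of (x, y) points into cubic bezier segments.

    Each segment is (x1, y1, cx1, cy1, cx2, cy2, x2, y2).
    Control points sit halfway between the columns (an assumed layout).
    """
    segments = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        mid_x = (x1 + x2) / 2
        segments.append((x1, y1, mid_x, y1, mid_x, y2, x2, y2))
    return segments


def bezier_point(a, b, c, d, t):
    """Evaluate one coordinate of a cubic bezier at t in [0, 1]."""
    u = 1 - t
    return u**3 * a + 3 * u**2 * t * b + 3 * u * t**2 * c + t**3 * d


# One title position plus 13 category positions gives 13 segments.
points = [(i * 90, 100 + (i * 37) % 200) for i in range(14)]
segments = bezier_segments(points)
numbers_per_movie = sum(len(segment) for segment in segments)
print(len(segments), numbers_per_movie, numbers_per_movie * 100)
```

Because the control points are derived from the anchors, only the anchor positions have to be stored per movie; the other numbers follow from them.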

The first six films drawn using bezier curves.

I have now drawn 26 films with bezier curves, and the weakness of this visualization method shows immediately. Since all lines have the same color and thickness, it is difficult to see which movie scored which number in which category. At a later stage I will do something about that, but the problem cannot be solved completely.

About halfway with the positioning of the bezier curves. I place the curves in a very straightforward way. I know this can be done more intelligently, but I do not have enough time to solve that now. I think it requires an additional study, which I might do at a later stage.

And about to place the last quarter of the bezier curves.

All bezier curves are now positioned. On the left, it has become a pretty organized chaos. Looking at the line patterns you can conclude that we gave most movies a 6, 7 or 8. That might say something about our rating as well. Is our rating mediocre?

With all the lines in place, it is time to bring in the Futura font. I have changed the background color to black. The font color is white. The color of the lines is gray with 50% transparency.

Time for a number of tests with line widths. Some are absolutely exaggerated. Others are functional. These variations also show that the number columns have to be drawn last. Otherwise they will be overwritten by the bezier lines. And I shifted the column with movie titles slightly to create some space between the end of the movie titles and the start of the bezier lines.

Trying to solve a problem that popped up in VMD_04_07: to what extent is it possible to get more distinction between the bezier curves themselves? I start with two colors, red and green. A strange effect seems to occur. When a certain number of red and green lines overlap, an additional color appears. It looks like orange. At least it seems to be orange, but if you make the lines thicker it looks like a light version of something brownish.

Added a blue color. Now many more additional shades seem to be possible.

What happens if I make an ascending color scale from 0 to 360? I switch to color mode HSB. HSB is easier to work with (as a human).
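A minimal sketch of such an ascending hue scale, in plain Python with the standard `colorsys` module standing in for Processing's HSB color mode. The function names and the choice to spread the films evenly over the 0 to 360 range are assumptions.

```python
import colorsys


def movie_hue(index, count, hue_range=360.0):
    """Spread `count` movies evenly over the hue circle (degrees)."""
    return index * hue_range / count


def hue_to_rgb(hue_degrees, saturation=1.0, brightness=1.0):
    """Convert an HSB color (hue in degrees) to 0-255 RGB values."""
    r, g, b = colorsys.hsv_to_rgb((hue_degrees % 360) / 360.0, saturation, brightness)
    return (round(r * 255), round(g * 255), round(b * 255))


# Film number 0 starts at red; film 50 of 100 lands on cyan (hue 180).
for index in (0, 25, 50, 75):
    hue = movie_hue(index, 100)
    print(index, hue, hue_to_rgb(hue))
```

The appeal of HSB here is that one number, the hue, is enough to give every film its own color while saturation and brightness stay fixed.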

Which movies scored the highest possible value of 10 points at least once?

Which movies scored 9 points or more at least once?

And finally: which movies scored 8 points or more at least once?

A quick conclusion. I am tempted to say that a film that did not score a single 8, 9 or 10 in the assessment is not a good movie. That means it is of a lower level than films that scored at least one 8. Or one 9. Or one 10. This visualization shows the worst of the 100 films we have seen since the beginning of 2015. In total these are only 27 movies. So a little over a quarter. That means that almost three-quarters of the 100 films we have seen had at least something of good quality in them. And that’s very reassuring. For the filmmakers, the film industry and for us.
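The three questions and the conclusion come down to one filter on the highest score a film received in any category. A minimal sketch in plain Python follows. The function names are hypothetical and the three films and their scores are made-up sample data, not our real ratings.

```python
def films_scoring_at_least(ratings, threshold):
    """Titles whose best category score is at least `threshold`."""
    return [title for title, scores in ratings.items() if max(scores) >= threshold]


def films_never_reaching(ratings, threshold):
    """Titles that never reach `threshold` in any category."""
    return [title for title, scores in ratings.items() if max(scores) < threshold]


# Made-up sample data: 13 category scores per film.
ratings = {
    "Film A": [10, 8, 9, 7, 6, 7, 5, 4, 8, 8, 6, 7, 9],
    "Film B": [7, 6, 7, 7, 6, 7, 5, 4, 7, 7, 6, 7, 7],
    "Film C": [9, 6, 7, 7, 6, 7, 5, 4, 8, 7, 6, 7, 7],
}

print(films_scoring_at_least(ratings, 10))
print(films_scoring_at_least(ratings, 9))
print(films_never_reaching(ratings, 8))
```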

Visualizing Movie Data | VMD_03 | Waltzing with Bezier

When I started this assignment I was interested in how much money actually goes around in the film industry. What does a movie cost? What is the budget? How much money does it make? And how do these figures compare with our ratings? I thought it would be easy to look up the data on the IMDb site. Unfortunately, all I found was very incomplete data. I checked all 150 films that we have seen since January 2015. And guess what. Only 56 films show both the budget and the revenue. In addition, the amounts are given in different currencies, so I have to convert them to dollars or another currency. And for films that are not from the United States there is almost no trace of costs and revenue to be found. So I have to check other websites for additional information.

That extensive check resulted in 69 films with complete financial information. I think I should leave out the series. These often run over several years with varying budgets, while a film is released once and receives just one budget. Another thing is that these figures only cover the period in which a movie is shown. Some are shown for longer periods than others. Because they are more popular they bring in more money. But that says nothing about the quality. Our list shows that only three films cost less than one million dollars. However, there are 14 films that grossed less than one million. I made two text files of them. One has the highest budget at the top. The other has the highest gross at the top.

Next, the text file has to be read into Processing and displayed in the display window. A simple task, I thought. But it turned out to be more complicated. The tutorials pay a lot of attention to getting a text file into Processing’s console, but I could not find anywhere how to get the data into the display window. The Processing Forum answered about half of my question, and the rest I solved myself. It kept me busy for one afternoon. And this is the first result. Not very impressive, but all the data in the text file is displayed in my Processing display window. And that was the first goal I had in mind.

The next step is to separate the data lists. It should be possible to reposition the movie titles, budget and revenue independently. If I cannot do that, I cannot deal with the layout. Incidentally, the sort and reverse functions are quite handy at this point. And I have changed the font to Futura Book.

How does the program know which budget and income belong to a movie? That is a question for me too, because the two number columns are mixed up. The budget list and the income list are both sorted from large to small amounts, so the budget list does not have the same order as the income list. That is why I have added the film titles to both lists. That way it is easy for me to check whether each line runs from a budget to the right amount in the income list.
VMD_03_03
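Using the title as the key that links the two separately sorted lists can be sketched like this in plain Python. The function name is hypothetical, and so is the entry ‘Film X’ with its amounts. The figures for ‘Interstellar’ and ‘The Salvation’ are the ones quoted in this post.

```python
def pair_by_title(budgets, incomes):
    """Return (title, budget_row, income_row) for titles found in both lists.

    Each list holds (title, amount) pairs and may be sorted independently.
    """
    income_row = {title: row for row, (title, _amount) in enumerate(incomes)}
    links = []
    for budget_row, (title, _amount) in enumerate(budgets):
        if title in income_row:
            links.append((title, budget_row, income_row[title]))
    return links


# Both lists are sorted from large to small, so their row orders differ.
budgets = [("Interstellar", 165_000_000), ("The Salvation", 11_524_796), ("Film X", 1_000_000)]
incomes = [("Interstellar", 675_020_017), ("Film X", 3_000_000), ("The Salvation", 5_000)]

for title, budget_row, income_row in pair_by_title(budgets, incomes):
    print(title, "budget row", budget_row, "-> income row", income_row)
```

Each returned row pair is exactly what is needed to draw one connecting line from the budget column to the income column.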

Changed the background colour to a very dark grey. The budget and income lists are now connected to one another by lines. Everything looks pretty cluttered. But that will change in the next design. What’s striking is that the biggest blockbuster has a horizontal line: ‘Interstellar’, with a budget of 165,000,000 dollars and a total income of 675,020,017 dollars.

I’ve started checking the film titles to see whether they are written correctly. All non-English film titles have been translated. Les Petits Mouchoirs is Little White Lies. Loin des Hommes is Far from Men. Relatos Salvajes: Wild Tales. Marie Heurtin: Marie’s Story. Eldfjall: Volcano. And that is one side of data visualization. You must be an administrator, Sherlock Holmes, graphic designer, translator, animation designer and programmer at the same time. I have given the chart some more space. And I have increased the distance to the lists of numbers. Which suddenly brings me to a new idea.

I now work in Processing 2. It’s time to download the new Processing 3 and donate to the Processing Foundation. That is the least I can do because I work with Processing daily. In Processing 3 you can use the Table class. It’s easier to work with because everything is now in one text file.

The $ sign was added but I do not find it successful. Maybe I will find another solution. Right now you cannot see what the amounts in the lists stand for. I know that the left-hand amounts are the budgets. The right column represents the income.

Because ‘Interstellar’ is misrepresented I have thrown this film out. I think the columns should have proper labels. And I need room for doing that. I have also added the sequence of 0-10 to the right. The numbers 0-10 represent the ratings we have given to the films. The idea is that I’m once again going to draw the lines but now from the income-list to our ratings.

I have adapted the total graph a bit. Lines start and stop now slightly closer to the lists of numbers. I have added the vertical text ‘Amounts in American Dollars’. The overall chart remains somewhat chaotic but I think the result is not disappointing.

Added colour. I chose green for the films that cost less than their revenue. And I chose red for the films that cost more than their revenue. Now the graph begins to show a disadvantage. Because the lines are thicker it is difficult to see which amounts they belong to.

I replaced the line function with the bezier function. Now it is easier to see which amount belongs to which line. And the overall chart looks slightly smoother. Of the 64 films, 26 made a loss and 38 made a profit. Mr. Turner eventually made a loss but was still at the top of our rating. Locke is a movie made for 2,000,000 dollars. It made a profit of 5,000,000 dollars and received a 10 in our rating. The Salvation cost 11,524,796 dollars. As far as we know it brought in 5,000 dollars (which I strongly doubt). But it still gets a 7 in our rating. In short, data visualization is very interesting, very time-consuming and a precise puzzle. Actually I should have coded the program much smarter. But that would have cost even more time.
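The green/red rule and the profit/loss count can be sketched in a few lines of plain Python. The function names are hypothetical. Treating a film that exactly breaks even as red is an assumption, since the text does not say what happens in that case.

```python
GREEN = (0, 255, 0)
RED = (255, 0, 0)


def line_color(budget, income):
    """Green if the film earned more than it cost, otherwise red.

    Assumption: a film that exactly breaks even is drawn red.
    """
    return GREEN if income > budget else RED


def count_profit_and_loss(films):
    """Return (profit_count, loss_count) for a list of (title, budget, income)."""
    profits = sum(1 for _title, budget, income in films if income > budget)
    return profits, len(films) - profits


# Amounts as quoted in this post.
films = [
    ("Interstellar", 165_000_000, 675_020_017),
    ("The Salvation", 11_524_796, 5_000),
]
print(count_profit_and_loss(films))
print(line_color(11_524_796, 5_000))
```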

Visualizing Movie Data | VMD_02 | Time Series

A dozen years ago I heard the word ‘ubiquitous’ for the first time. I wondered what it stood for. I looked it up: ‘ubiquitous’ means present, appearing, or found everywhere. So these time series graphs are a type of graph that you can find anywhere. Because this project is about visualizing our movie data I need three or more data sets. The idea is that I will compare these data sets with our own. I hope to find out how our ratings relate to, for instance, IMDb (Internet Movie Database), Metacritic and/or Rotten Tomatoes. If I can see the first one hundred films compared with the results of these websites, I should be able to draw the necessary conclusions. This is going to be a lot of manual work. But that’s okay because I’m in a learning process.

I could imagine having the numbers 1 to 10 on the left side of the graphic and all the film titles at the bottom. That seems logical. But it is not. Movie titles can be very long. For example: ‘A Pigeon Sat on a Branch Reflecting on Existence’. So you would expect the film titles on the left side of the graph and the numbers 1 to 10 at the bottom. At this moment I think the best solution would be to display the movie title when you place your mouse cursor on a data point. But maybe I am getting too far ahead of myself. For now I have loaded the original data of Ben Fry’s Time Series chapter into the program and changed the display format.

Let me concentrate on the data. The first thing you notice about the IMDb, Metacritic and Rotten Tomatoes reviews is that they work with floats. Our own movie data also works with floats, but the end result is an int. So I actually have to run all 100 film programs again and see what the end result is using floats. When I have those results, I have to type them into a text file. And then I do the same with the results of IMDb, Metacritic and Rotten Tomatoes. I left out Metacritic in the end. It sometimes happens that we have seen a film that cannot be found on IMDb or Rotten Tomatoes. In that case, the film gets a zero. The first thing I noticed in the chart that uses our own data is that it looks quite messy. There is no real logic in the positioning of the points. The reason is that our films are chosen randomly, which results in random positions for the points. The sequence is the real sequence of the first 100 films we have seen in 2015 though. Furthermore, the points sit at the bottom of the chart. This is caused by the largest value in the other data series. Our data set ranges from 0.0 to 10.0, while the other two data sets still range from 5.1 to 46.4. Those two sets therefore still have to be adjusted. But I do not have the right data for them yet.
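Why the points end up at the bottom can be shown with a linear range mapping, the same calculation Processing's `map()` function performs. This is a sketch in plain Python. The plot coordinates (top at 50, bottom at 450) are assumed values.

```python
def map_range(value, start1, stop1, start2, stop2):
    """Re-map value from the range start1..stop1 to the range start2..stop2."""
    return start2 + (stop2 - start2) * (value - start1) / (stop1 - start1)


PLOT_TOP, PLOT_BOTTOM = 50, 450  # assumed pixel coordinates of the plot area

# Axis scaled to the largest value of the other series (46.4):
# even our maximum score of 10.0 stays in the lower part of the plot.
print(map_range(10.0, 0.0, 46.4, PLOT_BOTTOM, PLOT_TOP))

# Axis scaled to our own range (0.0 to 10.0): a 10.0 reaches the top.
print(map_range(10.0, 0.0, 10.0, PLOT_BOTTOM, PLOT_TOP))
```

As long as all three series share one axis whose maximum comes from the 46.4 series, a score of 10.0 can never rise above roughly the lowest fifth of the plot.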

I have now added all the scores from IMDb and Rotten Tomatoes. I can hit the “]” key and the “[” key to step through the three different graphs. It all looks a bit sparse. But you do get an impression of how the scores are distributed. I’ve also added titles as placeholders (We, IMDb and RT (Rotten Tomatoes)).

I have increased the number of films to 150. It now looks somewhat less sparse. Eleven films have not been rated on Rotten Tomatoes. That puts them at zero, at the bottom of the chart. However, these films have been rated on IMDb and by us.

At the bottom, I added the number of films we have seen as figures. I also reduced the white background area slightly. This makes everything look less cramped in the display window. It would be even better if you could read the titles of the movies instead of our numbering. But perhaps I can add that at a later stage. And perhaps not at all. After all, these graphs are only about comparing our voting behavior with IMDb and RT. A quick conclusion is that our differences are spread slightly wider. They range from 3.3 to 5.9 points. IMDb ranges from 4.1 to 9.3. The Rotten Tomatoes series goes from 4.5 to 9.8 (if you do not count the 0.0).

I have added horizontal and vertical grid lines that may help to compare the data points. The scores from 0.0 to 10.0 are now displayed on the left side of the graph. As a result, there is no need for additional tick marks. The horizontal and vertical lines do that work instead. I think the score numbers are displayed too long. They have four digits after the point because we are working with floats. The function ceil does not help in this case, because it rounds everything up. Floor rounds everything down. The function I’ve used now is nf, so that just one digit is shown after the point. I use two versions of Futura: Medium and Bold. Furthermore, I also labeled the numbers. That makes the chart clearer.
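The difference between rounding to a whole number and formatting with one decimal can be sketched in plain Python. Python's format specification stands in here for Processing's `nf()`. The function name `one_decimal` and the sample score are made up.

```python
import math


def one_decimal(score):
    """Format a score with exactly one digit after the point."""
    return f"{score:.1f}"


score = 7.3846  # made-up score with four digits after the point

print(math.ceil(score))    # rounds up to a whole number
print(math.floor(score))   # rounds down to a whole number
print(one_decimal(score))  # keeps one digit after the point
```

Ceil and floor both throw away the fractional part, which is exactly the information the chart labels need, while formatting only shortens how the value is written.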

I now go ahead and replace the points with a line. Actually this is a bit rubbish. The scores of the films have nothing to do with each other. Each film’s score is a value on its own. So there is no need to connect them with a line. But as a variation it is perhaps interesting. I also changed the colors. The white field is replaced with a dark gray, because the colored lines stand out better against it.

In this version all scores are displayed on top of each other to see where the differences are. The title of the data sets should change with it if you choose another data set. But I don’t like it anyway. It is a poor and chaotic whole. So this seems to be not a good option.

I now have retrieved some items from one of the earlier sessions. The line connections remained blue and the points themselves are white. The points are most important so they are allowed to stand out. I’ve made them a little smaller. This has as a result that (when points are close to each other) they overlap each other less.

This proposal introduces rollovers. I now get feedback that I could already read from the x and y axes, but much more precisely. But actually you would like to see the movie title when your cursor is on a data point. I think I’m going to do that at a later stage. But I am unsure about it. I think it’s more important that I get some sense of what you can do with the data.

I do have the feeling that the lines have become too dominant. Especially now that you’re getting direct feedback at the cursor. The lines are no longer functional. I will also see if I can make the middle block more square. The price is that the smaller rectangles are no longer square. However, it does create more room in the width. I also reduced the proximity range of the cursor and increased the point size from 10 to 12. And Futura Bold is used for the values under the cursor.

Replacing vertex with curveVertex in drawDataLine actually does not make much sense. Most of the time the data points are so close together that no fluid line can be drawn between them. But if you extend the shape to the lower right and lower left corners so that it becomes a filled plane, it makes more sense and it gives a different picture. The question then is whether the horizontal and vertical lines are still functional. So I have removed them. I think this looks better than all the previous versions. And along with the feedback you get when your cursor is on a data point it looks just fine.

I have made the background of the chart the same color as the window background. That gives a completely different picture. I initially had accentuated the vertical lines. But I think it is better to accentuate the horizontal lines. These lead you to much more meaningful data. At first I gave the horizontal lines 50% transparency. But afterwards I got a better result by decreasing the line width to 0.5 pixels. Which is basically logically impossible.

It seems silly to transform this graph into a bar graph. I must then let the program draw rectangles instead of one flat plane. But then I have a problem, because I have 150 bars in a width of 600 pixels. This means that one bar can be at most 3 pixels wide. At 4 pixels the bars touch and the whole lower surface is filled again. But with 3 pixels I think it’s just about acceptable and it even has some form of sophistication.
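The bar width follows from simple arithmetic, sketched here in plain Python with hypothetical function names. Every bar gets a slot of 600 / 150 = 4 pixels, so whatever the bar does not use is the gap to its neighbor.

```python
def bar_slot(plot_width, bar_count):
    """Horizontal space available per bar, in pixels."""
    return plot_width / bar_count


def gap_between_bars(plot_width, bar_count, bar_width):
    """Empty space left between neighboring bars, in pixels."""
    return bar_slot(plot_width, bar_count) - bar_width


print(bar_slot(600, 150))             # slot per bar
print(gap_between_bars(600, 150, 3))  # 3 px bars leave a visible gap
print(gap_between_bars(600, 150, 4))  # 4 px bars leave no gap at all
```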

As a last proposal I introduced tabs for the three different data sets. But I found the Futura Bold far too heavy in these white tabs. So I opted for the Futura Medium.

Now I have to do a few more things. The white area behind the title is way too loud and is almost visually independent of the graph. Plus the bar chart layout is not the best I’ve seen so far. As a final detail I go back to the design of VMD_02_12. I now only use Futura Medium. I also adjusted the color. I chose red and green. Two distinctly different colors. The strong contrast between the two colors makes the dividing line between the two planes stand out extra. And thus it seems to me that this session is finished. But there is one more thing.

I have made a very simple animation of the three datasets. The datasets of us, IMDb and Rotten Tomatoes interpolate their points. Unfortunately, the interactive version is not available. I captured the animation so that there is at least something to see.


Visualizing Movie Data | VMD_01 | Mapping Movie Data

In 2008 we began reviewing movies. In the beginning we did that with points between 0 and 10. Later we used a more narrative way. That led to detailed reviews that we posted on Facebook. But over time that became too much work and cost too much time, so I decided to introduce a more accurate way of reviewing the movies. On 5 January 2015 I started to use 13 categories for each movie: storyline, originality, cinematography, involvement, sound, editing, educational, title design, acting, interesting, unusual, exciting and superior. Every category earns a score between 0 and 10. A simple Processing program adds all the points together and divides the sum by the number of categories. That gives an average score for one movie. The aim of Visualizing Movie Data is to give us more insight into the choices we make in evaluating a film. Ultimately, this should tell more about ourselves than about the films. If that’s true, that would be a positive spin-off. In Visualizing Movie Data I try different ways of visualizing the data we have collected. And I get help from Ben Fry’s book ‘Visualizing Data’, published by O’Reilly Media Inc. Unlike the Generative Design Variations project, Visualizing Movie Data is not about making as many variations as possible. The intention now is to use the movie data as functionally and as basically as possible. I will skip any form of decoration. In this chapter I try to use a very simple and basic way of reading, displaying and interacting with a number of small data sets. These data sets consist partly of the data we found in our movie reviews.
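The scoring described above (add all category points, divide by the number of categories) can be sketched in plain Python. The category names come from the text. The function name and the sample scores are made up for illustration.

```python
CATEGORIES = [
    "storyline", "originality", "cinematography", "involvement", "sound",
    "editing", "educational", "title design", "acting", "interesting",
    "unusual", "exciting", "superior",
]


def average_score(scores):
    """Average of one score (0 to 10) per category."""
    if len(scores) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} scores, got {len(scores)}")
    if any(score < 0 or score > 10 for score in scores):
        raise ValueError("every score must lie between 0 and 10")
    return sum(scores) / len(scores)


# Made-up scores for one film, in the order of CATEGORIES.
sample = [8, 7, 9, 7, 6, 7, 5, 4, 8, 8, 6, 7, 9]
print(average_score(sample))       # float result
print(int(average_score(sample)))  # truncated to an int
```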

I started looking for a world map. Eventually I found one that uses the Mercator projection. The Mercator projection is a conformal cylindrical projection with large area distortions at higher latitudes. In this projection Europe is slightly larger, which is an advantage because a lot is going to happen in this ‘small’ part of the world. Perhaps it will ultimately be necessary to use a separate map of Europe as an inset. But I don’t know that at this moment. I decided to omit all color because I want to use color for the markers of the countries. At this stage I only display a world map.

This is a first version in which the program reads coordinates from a text file to place red dots. I wanted to know if that worked. And if that works, it should also work with exact coordinates.
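If the exact coordinates were given as latitude and longitude, they would first have to be converted to pixel positions on the Mercator map. The sketch below, in plain Python, uses the standard Mercator formula. It assumes the map image spans the full 360 degrees of longitude and has the equator at mid-height, which may not hold for the map used here. The function name is hypothetical.

```python
import math


def mercator_xy(lat_deg, lon_deg, map_width, map_height):
    """Convert latitude/longitude (degrees) to pixel x, y on a Mercator map.

    Assumes the image covers longitudes -180..180 and that the equator
    lies at map_height / 2.
    """
    x = (lon_deg + 180.0) / 360.0 * map_width
    lat = math.radians(lat_deg)
    mercator_n = math.log(math.tan(math.pi / 4 + lat / 2))
    y = map_height / 2 - map_width * mercator_n / (2 * math.pi)
    return x, y


# Latitude 0, longitude 0 lands in the middle of the map.
print(mercator_xy(0.0, 0.0, 800, 800))
# Approximate position of Paris (48.86 N, 2.35 E): above and right of centre.
print(mercator_xy(48.86, 2.35, 800, 800))
```

The logarithm in the y calculation is what stretches the map towards the poles, which is why Europe takes up more room than its real area would suggest.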

Now I need to gather a list of all 46 countries that have produced films. I happened to find a list of two-letter country codes from the International Organization for Standardization (ISO 3166 Country Codes). I can use these abbreviations for display at a later stage. The overall list is much too extensive, so I have to select only those countries that have made films we have actually seen since the beginning of 2015. If you look at this visualization there are a few things that stand out. Europe is very busy. Some countries are completely covered by a red dot. And some dots overlap other dots. Actually, it’s a mess. On all other continents there are only a few dots. It looks empty. Further, the overall color of the image looks too dark. I have also made an outline version of the dots. But this does not solve the problem. In Europe the small countries are slightly more visible, but it is a minimal improvement and no real solution.

In this version I’m going to make an inset for Europe. At least so I thought at the time. After I had made a rough sketch it turned out to create more problems than it solved. In the first place Europe is out of proportion when you compare it with the rest of the world map. Which is the case on any world map anyway. An exception is the Goode projection, which is an equal-area map projection. And in the second place an inset always covers a portion of the world map, which then has to be moved again. And you actually create five continents that are all out of proportion. The question is whether that’s good. So it seemed best to leave everything as it is for now. I will solve problems when they become relevant. The dots are slightly reduced and everything becomes pretty clear. I have also added a title plus additional information, which makes it even more complete.

A few years ago I worked on patterns and photography in Processing. I could apply one of those patterns to the world map. Since a realistic world map is not possible anyway, it is just as good to make an abstraction of the world map. The question is now: is abstraction decoration? The world map is now built up from dots with a diameter of 4 pixels. The locations are displayed (as much as possible) in the middle of the country. Although the middle of some countries is hard to find. Where is the middle of the USA with its territories and various possessions?

Those dots are of course meaningless. They only give the central locations of countries where movies are made. It would be more helpful, for instance, if you could see where the most films are made. This would be possible if you could vary the size of the dot. A large dot means a lot of movies. A small dot means a few films. This is a version that uses randomly generated numbers.

This version uses our movie data. The number of films from a country determines the size of its dot. It is immediately clear that most films come from France, USA, UK and Germany. This of course says nothing about the quality of the movies. It only shows information about the quantity. And it is only partly the truth, as we will see later. And what will this visualization look like at the end of 2015 or 2016, when we have seen more movies? It was a surprise that France plays such a leading role in the field of film production. But it also immediately raises questions. Maybe we’re being manipulated? Might there be another variable that makes France appear so high in movie production?

It is also possible to interpolate between two colors and make all dots the same size. I go for red for the low numbers and green for the high numbers. But I find that this version does not show the smaller differences very well. In fact, there are only two green dots and two interpolations between green and red tending towards brown. The rest is nuances of red. And what does greenish or reddish-brown mean then?
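Why most dots stay red can be seen from a sketch of the interpolation in plain Python. It mixes the RGB channels linearly, in the spirit of Processing's `lerpColor()` in RGB mode. The function names are hypothetical, and the range of 1 to 47 films is an assumption based on the 47 French films mentioned elsewhere in this post.

```python
RED = (255, 0, 0)
GREEN = (0, 255, 0)


def lerp_color(c1, c2, amt):
    """Linear interpolation between two RGB colors, amt from 0.0 to 1.0."""
    return tuple(round(a + (b - a) * amt) for a, b in zip(c1, c2))


def color_for_count(count, lowest, highest):
    """Red for the lowest film count, green for the highest."""
    amt = (count - lowest) / (highest - lowest)
    return lerp_color(RED, GREEN, amt)


# Assumed range: 1 film (lowest) to 47 films (highest).
for count in (1, 5, 24, 47):
    print(count, color_for_count(count, 1, 47))
```

A country with five films sits at less than a tenth of the way from red to green, so it is visually indistinguishable from a country with one film. Halfway between the two colors the result is a dull olive-brown, which carries no obvious meaning for a reader.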

All countries from which we have seen five or more films are green. The largest dot on the world map indicates that we have seen many more than five movies. A small dot indicates that we have seen five movies. All countries from which we have seen fewer than five movies are red. A larger red dot indicates that we have seen almost five movies. A small dot indicates that we may have seen only one movie. In short, very complicated and not very clear. And you really need some textual information here.

Of course you want to be more accurate in displaying this data. So I’ve changed the program so that the abbreviated name of a country is displayed when the cursor reaches its dot. This does not work flawlessly though. Sometimes the name of the country disappears under a dot. And where countries are small and close together, both country names are activated. I made one version with Futura Medium 12 and one with Futura Bold 12. I think the Futura Bold version is better suited because it is more readable.

I have optimized the mouse interaction. Now two countries are never selected at the same time, and the name of a country is always drawn last. So it can never be overwritten by a dot.
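One way to make sure that only a single country is ever selected is to pick the dot closest to the cursor rather than every dot within reach. The sketch below is plain Python. The function name, the pixel positions and the 10 pixel radius are assumptions, and this is not necessarily how the sketch itself solves it.

```python
import math


def dot_under_cursor(mouse_x, mouse_y, dots, radius=10):
    """Return the code of the single nearest dot within `radius`, or None.

    `dots` maps a country code to its (x, y) pixel position.
    """
    best_code = None
    best_distance = radius
    for code, (x, y) in dots.items():
        distance = math.hypot(x - mouse_x, y - mouse_y)
        if distance <= best_distance:
            best_code, best_distance = code, distance
    return best_code


# Hypothetical pixel positions of two small neighboring countries.
dots = {"NL": (400, 200), "BE": (398, 206)}
print(dot_under_cursor(399, 204, dots))  # both are in reach, the nearest wins
print(dot_under_cursor(100, 100, dots))  # nothing in reach
```

Drawing all dots first and only then the label of the selected country would keep the name on top, matching the drawing order described above.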

I have replaced the floating-point numbers with integers. And I replaced all the colors with green. I think this is a reasonable version. You can easily see which country most films come from. And for the exact quantities you get feedback at the cursor.

I’m going a little deeper into the movie productions made in France, United Kingdom, Germany and the USA. I begin with France. That data does not look very spectacular. Apparently, most of the movies are filmed in Paris. In addition there are a few film locations in the south and center of France. Furthermore it seems that of the 47 French films that we have seen only 20 are filmed on a location in France. The other 27 are made through partnerships with other countries. And the film locations are all situated outside of France.

There is a problem with the data. When I run the program I get an ArrayIndexOutOfBoundsException: 8 error. I tried to find out what causes this error. It appears when you read data from text files of different lengths. That seems logical, because then the arrays cannot all be filled in the same way. If you have an array with 10 lines and another with 8 lines you get this error. And that was the case when I swapped the French data files for the United Kingdom data files. The 8 in the error message is the index the program tried to read. An array with 8 lines only has the indices 0 to 7.
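A check like the one below catches the mismatch before any drawing starts and gives a more readable message than the index error. It is a sketch in plain Python (where the equivalent failure would be an `IndexError`). The function name and the column names are hypothetical.

```python
def check_same_length(**columns):
    """Raise ValueError if the named lists differ in length; return the length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"data files differ in length: {lengths}")
    return next(iter(lengths.values()), 0)


names = ["loc"] * 10      # stands in for a file with 10 lines
positions = [(0, 0)] * 8  # stands in for a file with 8 lines

try:
    check_same_length(names=names, positions=positions)
except ValueError as error:
    print(error)
```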

It looks empty in Germany. But I think it will all be fine in the long term. Also, I think I’ve done something wrong. Some movies, of course, have several film locations. If I can trace those, the image is bound to become more interesting.

This US version is not really interesting either. But something else is happening. If the first two characters of a line in my text file are not unique (and thus are duplicates in the list), the positioning of the dots and text goes wrong. And because the first two characters have no further function (but apparently do have influence), it might be better to use this form of abbreviation: AA, AB, AC, AD, AE, etc., continuing after AZ with BA, BB, BC, BD.
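Such a sequence of unique two-letter keys does not have to be typed by hand. A minimal sketch in plain Python, with a hypothetical function name:

```python
from itertools import islice, product
from string import ascii_uppercase


def two_letter_codes(count):
    """Return `count` unique codes in the order AA, AB, ..., AZ, BA, BB, ..."""
    if count > 26 * 26:
        raise ValueError("only 676 two-letter codes exist")
    all_codes = ("".join(pair) for pair in product(ascii_uppercase, repeat=2))
    return list(islice(all_codes, count))


print(two_letter_codes(5))
print(two_letter_codes(28)[25:])  # the step from AZ to BA
```

Two letters give 676 unique keys, which is more than enough for a list of film locations of the size used here.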

I have now found all film locations for all American films we have seen since January 2015. My first search was much too superficial, so I found only 18 locations. Now I have 218 locations available. A number of them will not be used because the film locations are outside the USA. And several movies were filmed at the same location. So the final list will be shorter than 218 but longer than 18. Now I need to make a list and avoid duplication. All the movies filmed in Los Angeles are summed to one total. All films in New York. All films in Detroit. And this goes for all the other cities and towns in the USA. I ended up with a final list of 87 cities and villages. Los Angeles tops the list with 61 productions. Which is actually not very surprising.
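The clean-up steps (drop locations outside the USA, avoid duplicates, sum the productions per city) can be sketched in plain Python. The function name and the records are made up for illustration. Counting a film only once per city is an assumption about what "avoid duplication" means here.

```python
from collections import Counter


def productions_per_city(records, country="USA"):
    """Count how many different films were shot in each city of `country`.

    `records` holds (film, city, country) tuples; a film that lists the
    same city several times is counted once for that city.
    """
    unique_pairs = {(film, city) for film, city, place in records if place == country}
    return Counter(city for _film, city in unique_pairs)


# Made-up records for illustration.
records = [
    ("Film A", "Los Angeles", "USA"),
    ("Film A", "Los Angeles", "USA"),  # duplicate entry
    ("Film B", "Los Angeles", "USA"),
    ("Film B", "New York", "USA"),
    ("Film C", "London", "UK"),        # outside the USA, not used
]

totals = productions_per_city(records)
print(totals.most_common(1))
print(len(totals))  # number of cities in the final list
```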

I now go one level deeper. I started on a global scale. Then on the scale of countries. And I’m now going to work on an urban scale. After some research, the film locations of the series ‘Breaking Bad’ seemed suitable to me. I display the abstract version of the map of Albuquerque in the background. And I used a reduced version of the Breaking Bad wordmark. But that doesn’t work at all. I also find the number of film locations insufficient.

After some more research I could trace a lot more ‘Breaking Bad’ film locations. I changed the title and background colors. But I think it’s really ugly. When something is designed in a simple and basic way it doesn’t have to look ugly. So I have to work on that.

This is the state in which I want to finish this exercise for the time being. All locations can now be read and the number of scenes is shown after the location name. There is a title and a subtitle. You can immediately see that Walter White’s House is the film location which is used most (with 80 scenes). Then follow Jesse Pinkman’s House (with 42 scenes), Hank and Marie’s House (with 29 scenes), the DEA offices (with 27 scenes), the Car Wash (with 22 scenes), Jesse Pinkman & Jane’s House (with 21), Gus’s Laundry Service (with 20) and Los Pollos Hermanos (with 17). Too bad I cannot show this in a JavaScript version because that is unfortunately not working.

A rough conclusion: Data visualization is much harder than dreaming up nice effects like I did in the previous Generative Design Variations project. In data visualization you should limit yourself in order not to come up with lots of nonfunctional decoration. A lot of time must be invested in research before, during and sometimes after the design phase. And it is an iterative process. There are always improvements possible after the improvement. It also takes more time than a general design job. You must be very precise and constantly look for better data and better interpretation and visualization of the data. Data visualization involves a high level of detective work.

Generative Design Variations M.6.6 Fish-eye view

This is the last exercise of the Generative Design Variations project. Technically I have modified 1219 programs. The programs generated 5957 images. And I have written 83 articles about this project which lasted from November 10, 2013 until August 21, 2015.

Here is the summary from the Generative Design book: ‘The more nodes added to the graph, the more the network expands in all directions. One can zoom out in order to gain an overview, but the nodes, and especially their inscriptions, lose their legibility. To resolve this discrepancy, the area in which the graph is displayed can be distorted with a fish-eye projection. This functions like a wide-angle lens with a 180º angle. The elements in the middle of the display are still depicted at their original size, but the farther the elements are from the center, the smaller they are drawn–although they remain visible in the display. This kind of distortion can help maintain perspective without distorting information, especially with representations like force-directed layouts, in which the linking structure of the nodes is more important than their exact position. Projection always means taking the original coördinates of the points to be displayed (in this case the coördinates of the nodes) and calculating new coördinates. Only the latter coördinates are used to draw the object in the display.’ So much for the summary. And here you can find the original program.
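The projection idea in the quote can be sketched in a few lines. This is not the formula from the book or from the original program. It is a generic arctangent mapping that I assume here because it has the two properties the summary describes: points near the center keep roughly their original distance, and points far away are squeezed inside a fixed radius so they stay visible. The names FishEye, project and radius are hypothetical.

```java
class FishEye {
    // Project the point (x, y) around the center (cx, cy).
    // The returned point never lies farther than 'radius' from the center.
    static double[] project(double x, double y, double cx, double cy, double radius) {
        double dx = x - cx;
        double dy = y - cy;
        double r = Math.sqrt(dx * dx + dy * dy);
        if (r == 0) {
            return new double[] { cx, cy }; // the center maps to itself
        }
        // atan(r * k) / k is close to r for small r and approaches 'radius' for large r.
        double k = Math.PI / (2 * radius);
        double rNew = Math.atan(r * k) / k;
        double scale = rNew / r;
        return new double[] { cx + dx * scale, cy + dy * scale };
    }
}
```

Only the projected coordinates would be used for drawing; the original node positions stay untouched for the layout calculation.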

JavaScript does not support the Generative Design library. Or vice versa. I think the Processing XML data class is also not supported. But I am not sure about that. Furthermore, there are some obscure things imported during the initialization phase, including java.util.regex.*, java.util.Calendar, java.util.Iterator and java.util.Map. Because this program is a tool and because I did not modify it much, I thought it was not necessary to put it online. But I have created a Flickr album with all the images that I have made during this exercise.

This is still the same tool I used in the previous assignments. I have only chosen to use a little more of its functionality. And as I said in the previous exercise, it is now my intention to use all the functionality of the tool to create images. I repeat once again the basic elements of the tool (for myself):
– any element (or node) reflects an article from Wikipedia pages
– the themes are science, nature and society, art and culture
– in the original setup of the tool science is blue, nature and society is yellow, art and culture is purple
– the size of the element reflects the length of the article
– there should be an indication in the node that represents the number of links
– another element should show that there are additional items.

I have made a few screen dumps from the arrangement as it was in the earlier exercise (M.6.5). So the colors I have used in this previous set-up are changed and do not correspond to the above list. Any node reflects a Wikipedia article. And the keywords that I would like to use are coming from people who are working in the science, nature, society, art and cultural world. I start with: Louis B Mayer (1885-1957), US film producer.

Keyword: Maria Callas (1923-1977), Operatic soprano. In this graphic there are three theme colors. But I want to modify those because I think there are more possibilities. Science is now red, nature and society are yellow and art and culture is blue. And looking a bit deeper into this linking mechanism, it seems that Maria Callas was little concerned with science. That is why there is only a bit of red present in the image. The majority is art, which is blue-ish. Of course it depends on how many nodes you click. But there are also interesting relationships noticeable. For example, what has Maria Callas to do with L-Dopa? I continue to change colors during the following exercises.

Keyword: Ralph Waldo Emerson (1803-1882), US writer. Because the colors are interpolated it seems better to me to choose three very different colors. I have made science red, nature and society white, art and culture is blue. The disadvantage of white is that it brings back all bright colors to a shade or a tint of that color. So I replaced white with magenta. Probably this is not a good choice but I think yellow is still good. Although I think it’s a very ugly color combination. I experiment now by exaggerating the shapes. Which also leads to terrible results if you ask me.

Keyword: Alma Mahler Werfel (1879-1964), Wife of composer Gustav Mahler, then architect Walter Gropius, and finally writer Franz Werfel. There are now three bars, one above the other. The white bar is there only if there are additional items. The middle bar represents the themes and the lower bar is there if there are any links.

Keyword: Arthur Wellesley 1st Duke of Wellington (1769-1852), British general and statesman. What happens if I drop all these circles and just start with texts and lines? Maybe I should also cut the arrow points for a moment, but it could well be that they eventually disappear forever. After all, the main thing that I am interested in is the article’s name. So I’m going to bring this chart all the way back to its essence.

Keyword: George Sand (1804-1876), French novelist and memoirist. So far I have not made much use of the feature that lets you specify colored lines. That gives a completely different picture.

Keyword: Ludwig Mies van der Rohe (1886-1969), German-American architect. Color was added to the central point that indicates that there are back links. I think the point could be slightly larger. Actually, I am making the same mistake again. The emphasis is now on the nodes, while I would like the emphasis to be on the texts.

Keyword: Eva Duarte de Perón (1919-1952), Actress and First Lady of Argentina. I have associated the colors with the text. So now you see directly in which group the texts fall. Over time the colored texts disappear but the context of the categories is taken over by the colored dots in the nodes.

Keyword: Groucho Marx (1895-1977), Comedian. I tried to find out how many links are possible. But only so that it all remains manageable. I also think that those black circles in the background do a great job. I leave them in.

And let’s finally dedicate the last keyword to the woman to whom we owe all this programming: Ada, Countess of Lovelace (1815-1852), English mathematician, writer and pioneer of computing. And the Wikipedia link is not working. What is wrong? Nothing! Just kidding!

Generative Design Variations M.6.5 Semantic text analysis

Here is a copy of the summary from the Generative Design book: ‘Now the nodes can tell us how significant a Wikipedia article is. Other than the title, there is still no way to infer the content of the article. It would be useful when, for example, the color of the node reflected its thematic affiliation–if the article is about science, art or culture or if it concerns geographic or political subjects. Unfortunately, Wikipedia does not supply this information. It is possible, however, to implement a simple semantic text analysis. This means that keywords in the text are defined and counted. The more often the keywords of a particular subject appear and the less frequently others do, the more likely it is that the article is about that one subject. If colors are assigned to subjects, then the frequency of the keywords can be used to interpolate between these colors.’ So much for the summary. Here is the original program.
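The approach in the quote can be sketched as two small steps: count the keywords per subject, then mix the subject colors in proportion to those counts. This is only an illustration under my own assumptions, not the code of the original tool. The class name, the gray fallback color and the simple substring matching are all hypothetical choices.

```java
class ThemeColor {
    // Count how often any of the keywords occurs in the text (case-insensitive).
    // Note: this is plain substring matching, so "art" also matches "party".
    static int countKeywords(String text, String[] keywords) {
        String lower = text.toLowerCase();
        int count = 0;
        for (String keyword : keywords) {
            String k = keyword.toLowerCase();
            if (k.isEmpty()) {
                continue; // an empty keyword would otherwise loop forever
            }
            int from = 0;
            while ((from = lower.indexOf(k, from)) != -1) {
                count++;
                from += k.length();
            }
        }
        return count;
    }

    // Interpolate between RGB colors, weighted by the keyword count per subject.
    // colors[i] is the {r, g, b} color of subject i, counts[i] its keyword count.
    static int[] blend(int[][] colors, int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return new int[] { 128, 128, 128 }; // no keywords found: neutral gray
        }
        int[] rgb = new int[3];
        for (int channel = 0; channel < 3; channel++) {
            double sum = 0;
            for (int i = 0; i < colors.length; i++) {
                sum += colors[i][channel] * (double) counts[i];
            }
            rgb[channel] = (int) Math.round(sum / total);
        }
        return rgb;
    }
}
```

An article with three science keywords and one geography keyword would get a color that lies three quarters of the way toward the science color.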

JavaScript does not support the Generative Design library. Or vice versa. I think that the Processing XML data class is also not supported. Furthermore, there are some (for me obscure) things imported at initialization, including java.util.regex, java.util.Calendar, java.util.Iterator and java.util.Map. This program is a tool and that is why I thought it was not necessary to put my versions of it online. I did not change a lot. Most changes are about the visual look of what the tool generates. I have created a Flickr album with all the images that I have made during this exercise.

This exercise is about the functional use of color. Furthermore, I’m going to see if I can do something about the readability of the text. During the earlier exercise (M.6.4) I chose design-related keywords. This time let me choose computer-related words. I begin with the keyword: Computer. I also switched on colorize nodes. And then some strange things happen. I start the program and it does not work. Only after 3 restarts does it work. I try it again. Now the program works only after starting twice. That’s one thing. Another thing is that I have reduced the number of nodes (resultCount) from 50 to 5. The first time I get the following nodes from the keyword ‘Computer’: Kermit (protocol), Computer Monitor, EGB, fighter aircraft and Alan Turing. The second time I get the following nodes: Digital object identifier, Interactive fiction, Floppy disk, first-person shooter, Lisp (programming language) and Digital object identifier. The third time I run the program: Computer data storage, Mac OS, BUNCH, BASIC, and Nano Engineering. So I keep getting different links (nodes) with the same keyword. I’m not sure I like that. Concerning color, we have three groups: science is blue, geography and politics are yellow, culture and art is purple. The colors are fine. I’ve just made them a bit brighter. And it is still true that I have to start the program at least three times before it runs.

Keyword: Software. I have to think of something to improve the readability of the text. I find those gray texts not (or hardly) legible. So I removed the black rectangle beneath all the texts, just like I did in the previous exercise. With regard to the size of the font, you simply turn off auto zoom. And then it all reads just fine. With auto zoom turned on, the text is automatically enlarged or reduced. At a certain point (very large or very small) the texts are unreadable. Increasing the size of all texts (so that they are always legible) is tricky because at a certain point you cannot see the graphics anymore.

Keyword: Hardware. I want to see if I can get rid of those circles. I do a search in the program itself, and all circles are in fact generated in the WikipediaNode class. I first made a version in which I replaced a portion of the circles with squares.

Keyword: Malware. The color scheme of science is now white, geography and politics are now red. And culture and art is now blue. All shapes now consist solely of squares. Changed the colors again. Science is now blue, geography and politics are now purple and culture and art is now red.

Keyword: Algorithm. I have changed the squares into rectangles. That creates a less chaotic image than the previous images. But is that any better?

Keyword: Boolean. Maybe it’s good to go back to the original version with circles. And maybe those circles do not have to sit on top of each other but close to each other. Or maybe side by side. This also means that some colors will flow into the color of their neighbor. And I think an outline version could also be an option. But for now I will only give the center dot (for back links) a white outline. There is still one problem with the color of some texts. Some stay gray. Others remain white. I also think that the node’s circles should be a little closer to each other.

Keyword: Programming language. I think it is time to sort out a few things. I feel that I have strayed a bit. In the original program, each circle reflects a Wikipedia article. The arrows between the articles show whether an article is linked to another article. Then there are three themes. Blue is science, nature and society is yellow and purple is art and culture. The size of the circle indicates the length of the article. And the thickness of the outer ring represents the number of links in the article. The dot in the center indicates if there are multiple items. So I think I should go back to these principles. Maybe I can make another setup.

Keyword: Virtual machine. I have to change the names of some variables in the program, as ‘s’, ‘b’ and ‘d’ tell me so little about their functionality. I have now commented out some circles. The circles that are still switched on I replaced with rectangles. Eventually I replaced the last circle with a rectangle too. Now all the information is translated from circle to rectangle. This does not give me an image that is wrong, but it is not entirely right either. I do have the impression that it is an improvement on M_6_5_01_GDV_04. But I really want to try something else.

Keyword: Processor. Actually, I want to get rid of the circles because they do not give a proper interpretation of the data. Why? For that I suggest buying Alberto Cairo’s book ‘The Functional Art’. He explains the circle problem better than I can. At first glance, this looks more like a London Underground map. What I want to try is to make not a square but a horizontal line. I now have two horizontal rectangles. The lower rectangle has a color which is a percentage of the upper rectangle’s color. If there is no bottom rectangle, then there is no link. I cannot show the quantity of links yet. And the texts have to be moved out of the way.

Keyword: Microprocessor. I have not yet been able to figure out how to translate the quantity of links visually in the lower rectangle. But you can clearly see if there are any links. So this exercise was not 100 percent successful. But perhaps 75 percent. Let’s see if I get to 100 percent in the next exercise. I’ll also try to bring more structure to the layout of the nodes (as I’ve tried in M_6_4_01_GDV_06). That has to be better in the next assignment.

Generative Design Variations M.6.4.1 Visualizing proportions

Here is the summary from the Generative Design book: ‘Until now the nodes have all looked the same, although they represent completely different Wikipedia articles. For instance, it is not apparent if a represented article is long or short, how many links in it refer to other articles, or if the article itself is referred to. It would make sense to draw the larger nodes when the articles are longer. It must be noted that the quantitative information is recognized by the area of the graphic element and not by its radius. A ring is drawn around the node with a width indicating how many links have yet to be displayed with the arrows originating from the node.’ So much for the summary. And here you can find the original program.
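The remark about area versus radius is worth a small worked example. If the area of the circle should be proportional to the article length, the radius has to grow with the square root of that length. A sketch, where the class name and the areaPerUnit scale factor are hypothetical:

```java
class NodeSize {
    // Radius of a circle whose AREA is proportional to the given value.
    // From area = PI * r * r it follows that r = sqrt(area / PI).
    static double radiusForValue(double value, double areaPerUnit) {
        double area = Math.max(0, value) * areaPerUnit;
        return Math.sqrt(area / Math.PI);
    }
}
```

An article that is four times as long gets a radius that is only twice as large. Scaling the radius directly would make it look sixteen times as big in area.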

JavaScript does not support the Generative Design library. And I am not sure, but I think the Processing XML data class is also not supported. Furthermore, there are some (for me) obscure things imported in this tool, including some Java utilities. And because this program is a tool I did not find it necessary to put the programs online. I have created a Flickr album where all the images that I have made during this exercise can be found.

It is extremely annoying to say, but to be frank this tool does not work. It displays the keyword Design. And you can add a new node. But that’s all. I started the program several times. That sometimes helped with the previous program (M.6.3.1): after three attempts the program then suddenly started. But that does not work now with this program. I have figured out three different ways I can still use this tool. The first way is to import the changes I made in M.6.3.1 (the previous program) and see what happens. The second way is to build on the previous program (M.6.3.1) and forget about the tool. And the third way is to try to make my own version of the tool. That program would have absolutely nothing to do with what is described in the Generative Design book. That would obviously be sad, since I have been able to use all the programs so far. I first replaced all http:// links in the WikipediaNode class by https://. And then I was surprised that the program worked normally. I get no more errors in the console. So the problem seems to be solved easily. I reported that on the Generative Design website. Ah … and this could also be one of the causes. I work with a Wacom tablet and a pen. When I start the program with the pen it sometimes does not work. When I start the program with the mouse on the Wacom tablet it starts most of the time. Sounds very vague. But it is what it is.
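I made the http:// to https:// change by hand in the source, but the same idea can be written as a small helper. This is only a sketch; the class name is hypothetical and the URL in the usage example below is a placeholder, not a line copied from the WikipediaNode class.

```java
class HttpsUpgrade {
    // Rewrite a URL that starts with http:// so that it uses https:// instead.
    // URLs that already use https:// (or any other scheme) are returned unchanged.
    static String upgrade(String url) {
        String prefix = "http://";
        if (url.startsWith(prefix)) {
            return "https://" + url.substring(prefix.length());
        }
        return url;
    }
}
```

For example, HttpsUpgrade.upgrade("http://example.org/api") returns "https://example.org/api".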

As usual I started by typing all the book comments into the code. I also downloaded the complete font family MISO. MISO is an architectural lettering font completed in 2006 by Mårten Nettelbladt. I wanted to test whether any of these fonts (or a combination of them) is also suitable for use in this program. But unfortunately this did not lead to better results. I did not choose a different keyword. That will happen in the following exercises. The keyword for this exercise is ‘Design’. I am now going to set the first variables. Auto Zoom is true. Invert background is true. I do not know what the variable resultCount represents. So I’m going to increase it from 10 to 50. Ah… resultCount is the number of nodes with which the program starts. I put this back to 1 for now. Then it is much easier to track where all nodes link to. I think a resultCount of 500 is the limit for this program. That number is also stated in the API lines of the WikipediaNode class. There is also a strange side effect in the graphics of this program. The design node begins to vibrate violently when I use high numbers of nodes. I have no idea why. I also removed all color information from the image. Bringing color into the graphics will be the next assignment.

Let’s use design-related keywords for these exercises. I will use the keyword ‘Designer’ for this exercise. That’s a tiny difference from the previous keyword: Design. But there is a big difference in the end result when you put the spring length at 10 or 500. I can influence that amount via the GUI. So these differences are getting even bigger. I reduced the minimum setting and doubled the maximum setting in the GUI. The disadvantage is that the texts in many nodes can no longer be read. So you may have to zoom in. Actually, this is typographically not very wise. Perhaps the texts should decrease more slowly when more nodes are created. Or I should switch off automatic zoom.

Keyword: Industrial Designer. I have put the spring stiffness very low. This way you can make beautiful organic structures. If you want to go that way. The tool can also create unexpected node clusters.

Keyword: Design Research. What happens if I double the slider range for the spring stiffness again? Then everything becomes even more interesting. Only the readability takes a few steps backwards. In this case I do not care, but my clients will appreciate it less.

Keyword: Graphic design. Spring strength and spring stiffness are at their maximum. Spring damping is at its minimum. This results in very fancy circles which are formed by the nodes. As you change the settings the circles become more organic again.
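To get a feeling for what the spring length, stiffness and damping sliders do, here is a generic one-dimensional spring step. It is a textbook-style sketch under my own assumptions and not the code of the Generative Design library; the class name, the parameter names and the way damping is applied are hypothetical.

```java
class SpringStep {
    // One simulation step for a node on a line, attached by a spring to an anchor at 0.
    // Returns {newPosition, newVelocity}.
    static double[] step(double position, double velocity,
                         double restLength, double stiffness, double damping) {
        double stretch = position - restLength;   // how far the spring is from its rest length
        double force = -stiffness * stretch;      // Hooke's law: pull back toward rest length
        // Damping removes a fraction of the velocity each step (0 = none, 1 = all).
        double newVelocity = (velocity + force) * (1.0 - damping);
        return new double[] { position + newVelocity, newVelocity };
    }
}
```

In this sketch, high stiffness with little damping lets nodes overshoot and keep swinging, while more damping lets them settle quickly at the spring length.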

Keyword: Industrial design. I increased the node radius to its maximum, then set it to its minimum, and once to the middle of the slider. I also increased the spring length. At a certain moment all nodes pile up on each other, so it seems that there are fewer nodes present. But by playing with the settings the nodes become visible again (M_6_4_01_GDV_06).

Keyword: Designer drug. Brought every setting to its maximum. But wait … that’s a reassuring warning in my console: Error during asyncHTMLLoad – but no problem.

Keyword: Design Patent. Brought the line weight to an extreme setting. This results graphically in nice images. But I really should do something about the size of the texts. I’m going to try to solve that problem in the next assignment.

Keyword: Software design. Just a variation in which I made the line thickness excessively thick. It is less refined but not really wrong. Plus some screen dumps of the moments that I’ve zoomed in really deep on a constellation of nodes and springs.

Keyword: User interface design. Actually, the gray text on black rectangles is not readable. Perhaps there is also room for improvement for that issue in the next assignment.