R2 - Data Integrity

Subtleties of Color by Robert Simmon

The use of color to display data is a solved problem, right? Just pick a palette from a drop-down menu (probably either a grayscale ramp or a rainbow), set start and end points, press “apply,” and you’re done. Although we all know it’s not that simple, that’s often how colors are chosen in the real world. As a result, many visualizations fail to represent the underlying data as well as they could.

Read the blog series and optionally also watch the lecture.

Use the tag “R2” when you post your assessment of the readings and the questions raised.

Avishi Jain

In ‘Subtleties of Color’, Robert Simmon talks about the importance of color in data visualization and how effective use of color can be extremely functional in conveying information and making a point about a dataset. For instance, the author notes that ever since the first images of Mars taken from an interplanetary probe, color has been used to represent spatial datasets at scales ranging from individual atoms to the cosmic background radiation. The writer states “Careful use of color enhances clarity, aids storytelling, and draws the viewer into your dataset. Poor use of color can obscure data, or even mislead.” When talking about some of the problems with the use of color in data visualization, he lays emphasis on the difference between the representation of color on screen and the perception of color by the human eye. He explains that one of the biggest problems with the use of color in representing data sets lies in the fact that computers display and interpret color very differently than humans. Firstly, computers make use of the RGB system to represent colors, while humans often interpret colors in terms of their specific characteristics, namely lightness or value, hue, and saturation or chroma. While the cones in our retinas respond to a broad spectrum of colors, computers can only display colors that are a combination of very narrow frequency bands. Our eyes are also more sensitive to certain colors than others and may perceive certain values and hues as brighter than others. The unevenness of color perception has been analyzed and resolved to a great extent by the CIE (the International Commission on Illumination), whose color spaces help translate color accurately across different media and ensure consistent change across the entire color palette, so that it becomes easier to represent data accurately using perceivable color ranges.

Based on what the writer defines as a ‘perfect palette’, he emphasizes the need for color palettes to be consistent in the steps across the range of colors so that the change between any two steps is equivalent. Consistent relationships between colors on a scale help preserve the quality of the data and convey differences or variations effectively. He also explains that phenomena such as simultaneous contrast, an optical illusion that makes certain colors appear different (lighter or darker) when they are placed on other colors, need to be taken into account in order to avoid misconceptions in the representation of data. In order to take full advantage of the three characteristics of color, the writer advises the use of a linear and proportional change in lightness accompanied by a simultaneous but subtler change in hue and saturation. In this manner, the change in lightness helps represent patterns in data, the change in hue makes reading quantities easier and the changes in saturation magnify contrast.
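As a rough illustration of the "equal steps" idea above, here is a minimal Python sketch that builds a grayscale ramp whose steps are even in CIE lightness (L*) rather than in raw RGB values. The function names are my own, and the sketch assumes the display follows the standard sRGB transfer curve; it only covers lightness, not the subtler hue and saturation shifts that Simmon also recommends.

```python
def lightness_to_luminance(lightness):
    """Convert CIE L* (0 to 100) to relative luminance Y (0 to 1)."""
    if lightness > 8:
        return ((lightness + 16) / 116) ** 3
    return lightness * 27 / 24389


def luminance_to_lightness(y):
    """Convert relative luminance Y (0 to 1) back to CIE L* (0 to 100)."""
    if y > 216 / 24389:
        return 116 * y ** (1 / 3) - 16
    return y * 24389 / 27


def linear_to_srgb(c):
    """Apply the sRGB transfer curve to a linear-light value (0 to 1)."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055


def srgb_to_linear(v):
    """Undo the sRGB transfer curve for an encoded value (0 to 1)."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def equal_lightness_grays(n):
    """Return n sRGB gray levels (0 to 1) spaced evenly in L*."""
    grays = []
    for i in range(n):
        lightness = 100 * i / (n - 1)
        grays.append(linear_to_srgb(lightness_to_luminance(lightness)))
    return grays


if __name__ == "__main__":
    for gray in equal_lightness_grays(5):
        # Scale to the familiar 0-255 range for display.
        print(round(gray * 255))
```

Each printed value is the gray level to use for R, G and B alike. The point of the sketch is only that "equal steps" has to be defined in a perceptual space such as CIE L*, and then translated back into the numbers a computer needs.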

Further in the series, the writer also talks about how the use of color palettes may vary based on the type of data that is being represented. For instance, sequential data is best represented using color palettes that have equal steps of variation from light to dark or vice versa. Divergent data, on the other hand, is better represented by two sequential palettes that have individual changes in hue and saturation across the values. In a divergent palette, the data is often shown diverging or varying from a central data point, such as temperature ranges from an average temperature or profit and loss variations in the stock market, and thus it is often more effective to use color to represent the particular increases and decreases on either side of that point. For bipolar data, the two hues that are used should vary from a central neutral color, as this aids the proper perception of the changes in the data on either side. Divergent palettes are often harder to design because similarities in lightness may make data impossible to read for people with color blindness. Categorical or qualitative data uses color to separate areas into distinct categories, and usually each color can then be associated with a specific category, which makes such data easier to read. For larger sets of data with more categories, it is often helpful to use additional elements such as symbols, textures, patterns and labelled elements.

The writer also lays great emphasis on connecting color to meaning in various ways. Sometimes, complicated conventional color palettes such as those used in scientific visualization may not be easily understood by a general audience, and therefore it is better to use color palettes that are widely recognized and cater to people’s general associations of color based on culture or nature. For instance, representing ocean with the color blue and tree cover with the color green is more likely to be understood by the majority of an audience since these colors are conventionally associated with these particular elements. Layering datasets that communicate different but supporting information is extremely informative, and using color palettes that differ in hue and saturation, such that one set of colors is more muted than the other, can be extremely helpful in accurately conveying the information. For data sets that depict a specific breakpoint or a drastic difference among a range of values, it is useful to keep the change in lightness consistent but use a sudden change in hue or saturation, perhaps a contrasting color, in order to depict the area of drastic change. The author also states that areas that do not represent any data should be treated as a background and use shades of grey, white or black so that they can easily be differentiated from the areas that represent data points. Sometimes, differences in the data points and the range they are trying to communicate, or changes in time period, can result in changes in the way that the colors convey the data. For instance, the difference that color represents between foreground and background may be altered by changes in some aspects of the data.

Finally, the writer suggests ColorBrewer as a useful tool for creating suitable color palettes for maps and other data visualization graphics. After reading this article, I realized that the use of color in data visualization is far more important than I thought. Very often, my choices of color for graphs and maps have been rather arbitrary, having only seen color as a distinguishing factor that helps separate one shape, line or bar from another. However, I realize now that color can have significant underlying meaning, and making full use of its different characteristics (lightness, hue and saturation) is extremely important. It is just as important to know where each of these aspects can be used in data visualization and how differences in value and hue can be translated efficiently into differences in particular datasets. Another important takeaway from this article is that color is extremely subjective, and it is important to know how a particular target audience perceives color in order to use it appropriately in data visualization.

Branden (Ji Hoon) Choi

We learned all of this as freshmen at Pratt in LCD class. However, it was interesting to read about color theory through the scope of data visualization, especially in terms of cartography: a color palette has to follow the basics, has to follow cultural associations, and has to work intuitively.

A sequential palette shows step-by-step gradient change. This palette is meant to picture gradual changes in information. The reading mentions the upsides and downsides of grayscale usage. It also says that when you use color, it is good to have two different hues that change gradually. A lot of color platforms can have a short color range for certain colors.

Divergent palettes are useful when there are two different ends of a range of values that the user wants to compare. This is why it is important to make sure the middle value color does not have any association with the two ends of the palette. Also, palette makers have to be careful about choosing colors for the two different ends, because a color blind person cannot differentiate some colors. As a person who lays out information, you have to make sure all people can read it.

A qualitative palette conveys the difference between unrelated values. The tip for making this palette is to avoid similar colors and usually to keep the number of colors under seven (because when the colors go above seven, it is easy to end up with colors that overlap each other). In circumstances where your palette has to go above seven, it is good to group the different factors you are trying to show and use similar colors within each group.

Showing data is important, but sometimes knowing how to show no data is as important as showing data. When we talk about showing no data, usually we are talking about the background of your visual information. Depending on how you portrayed your data, different background colors can help or ruin your whole work. The background color has to disassociate itself from any other colors on your graphic to be effective.

Lila Meyer

This series on color lays out three important lessons I inherently already knew, but maybe wouldn’t have been able to articulate myself: the discrepancies between how we perceive color in the physical world and on digital screens, and how to translate these values; the components of color to focus on (namely, I was surprised to learn that ‘lightness’ is the strongest of the HLS values); and that datasets can be simplified down into the three categories of sequential, divergent, and qualitative data. These are all considerations to keep in mind (that I, for one, have not been), each coming with its own set of rules and principles to follow in order to best represent the information, and all make up the central message of the piece.

I thought there was something interesting about the strictness of these rules, however - the author briefly alludes to their lack of subtleties in a section titled “Aesthetics” in Part 4. Everything presented in this reading is very scientific with a practical, neurological basis, but here Simmon acknowledges that design includes so many other formal elements (typography, line, shape, alignment, etc.) and that even if color rules are followed to a T, it doesn’t necessarily make for a “great” or even “good” design. Personal judgement is another factor that doesn’t come with an instruction manual. I’d be curious to hear others’ opinions on the importance of this: how much should aesthetic judgement matter in relation to the goal of displaying information accurately? Can a visualization be considered “successful” if it is entirely technically accurate, yet has no aesthetic value?

I would also be interested in a discussion of the dilemma that color-blindness presents: should this be a consideration in every piece of design? Should all works be accessible to everyone, even if the percent of colorblind people is only 5%? I don’t have an answer myself, though I would assume most people would say no, it is not vital to have all work be clear to everyone. I think it’s a noble goal though I think it would add an extra layer of difficulty to every project, and perhaps takes away from the possibilities that could be used if only designing for the other 95% of the population.

Daniel Salomon

In the blog series "Subtleties of Color," written by Robert Simmon, he explains the importance of color for highlighting data and finding patterns and relationships. In data visualization, color is a crucial factor that affects how data is perceived and understood. For the sake of conveying data in a more precise manner, color cannot be picked randomly from a drop-down menu; often, that is how people choose colors, and it negatively affects data perception. For this, there are multiple color schemes, which suit different types of data in ways that emphasize data accordingly and ease readability.

The human eye and computers perceive color differently. On the one hand, humans perceive color in a non-linear and uneven way: we are more sensitive to changes at lower lightness levels than at high lightness levels, and we are most sensitive to green light, then red light, and least sensitive to blue light. On the other hand, computer colors are linear and symmetrical and utilize systems that are not so "sympathetic" to the way human eyes perceive color. One of the main issues is that computers use the RGB color system (Red, Green, Blue), and because the way humans perceive changes in these three colors differs depending on the lightness, colors may not translate accurately.
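The uneven sensitivity to green, red and blue can be seen in the standard formula for relative luminance. The following is a minimal Python sketch, assuming sRGB color values between 0 and 1 and the Rec. 709 luminance weights; the function names are my own and are not taken from the reading.

```python
def srgb_to_linear(v):
    """Undo the sRGB transfer curve for an encoded value (0 to 1)."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def relative_luminance(r, g, b):
    """Relative luminance of an sRGB color, using Rec. 709 weights.

    Green carries the largest weight and blue the smallest, which
    mirrors the eye being most sensitive to green and least to blue.
    """
    r_lin, g_lin, b_lin = (srgb_to_linear(c) for c in (r, g, b))
    return 0.2126 * r_lin + 0.7152 * g_lin + 0.0722 * b_lin


if __name__ == "__main__":
    for name, color in [("red", (1, 0, 0)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
        print(name, relative_luminance(*color))
```

Although pure red, green and blue are all "full strength" to the computer, the computed luminance is very different for each, which is one reason palettes picked by RGB numbers alone can mislead.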

Because of the way the human brain works, some palettes are more efficient than others at translating data and value differences accurately. In that manner, sequential data is best represented by a color scheme that varies continuously and with even graduation. Divergent data, which is a dataset with two opposing sets of values, such as differences in temperature or the fluctuations of the stock market, is best represented with a divergent palette. A divergent palette is created by combining two sequential palettes, joined in the center and expanding evenly in opposing, yet mirrored graduations. Last but not least, categorical data does not intend to represent proportional relationships but distinct categories. For this, a range of distinct, unrelated colors that are similar in contrast is chosen; nevertheless, perceptual limitations come into play, allowing a maximum of twelve, often fewer, colors per set.
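The idea of a divergent palette as two mirrored sequential palettes joined at a neutral center can be sketched in a few lines of Python. This is only an illustration: the function names and the example colors are my own choices, and it interpolates directly in RGB for simplicity, which is not perceptually even (a more careful version would interpolate in a perceptual space such as CIE L*C*h).

```python
def lerp_color(start, end, t):
    """Linearly blend two RGB tuples; t=0 gives start, t=1 gives end."""
    return tuple(a + (b - a) * t for a, b in zip(start, end))


def diverging_palette(low, neutral, high, steps_per_side):
    """Join two sequential ramps at a shared neutral midpoint.

    Returns 2 * steps_per_side + 1 colors running low -> neutral -> high.
    """
    left = [lerp_color(low, neutral, i / steps_per_side)
            for i in range(steps_per_side)]
    right = [lerp_color(neutral, high, i / steps_per_side)
             for i in range(1, steps_per_side + 1)]
    return left + [neutral] + right


if __name__ == "__main__":
    # Hypothetical example colors: dark blue, light gray, dark red.
    palette = diverging_palette((0.1, 0.3, 0.7), (0.95, 0.95, 0.95),
                                (0.7, 0.1, 0.1), steps_per_side=3)
    for color in palette:
        print(tuple(round(c, 2) for c in color))
```

The middle entry is the neutral color, and both sides take the same number of steps away from it, so equal distances from the center get equal visual weight.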

Also, it is necessary to think of the audience. Sometimes palettes commonly utilized in scientific visualization may be confusing for a more general audience, so it is better to use color palettes that are relatable to a broader public and draw on associations of color based on culture or nature. Color is crucial to better understand and read datasets; however, many factors affect how humans perceive color versus how screens translate these values. Color is very subjective, and it is essential to be discerning when choosing a color palette to address the data and the audience better.

Jeahun Jung

In “Subtleties of Color” the author describes what a powerful tool color is in visualization, how it clarifies data and helps people to understand complicated patterns. I find this reading interesting because it brought out a topic I already had unconsciously in my brain but simply hadn’t realized on the conscious level. There is a picture in the reading representing bars of lightness, hue and saturation. I see these elements many times in Photoshop when I change the colors of some objects. However, I did not know that they are a conversion from the computer’s color perception to human color perception. I even remember having difficulties creating the color I wanted in P5.js. It was because the computer and I had different perceptions of color. Computers perceive colors as linear and symmetrical, while humans perceive them as non-linear. Because we see colors in a non-linear way, the rainbow palette in chapter 2 accentuates two lines of bright cyan and yellow, and a wide range of green.

Then, the author explains why lightness is the most important component for visualization. With gradient changes, sequential data is able to show change easily. Apart from sequential data, there are two other types of data: divergent data and categorical data. A divergent palette is perfect for representing two different sets of data and showing the middle of the data, e.g. stock market fluctuations. It also shows the middle ground, which doesn’t have any association with the two different sets of data. Meanwhile, categorical data does not tend to show correlation of data, but different categories of data. To represent categorical data, it is important to use different color ranges so as not to confuse the reader. Other than that, the author offers good advice on using color in data visualization.

It was a great reminder for me. For example, he said to use color to separate data from non-data. This is something I knew before, yet I didn’t think about it consciously. Since a computer’s perception is different from a human’s, it is important to humanize data visualization, so that the readers are able to understand it clearly.

Anna Maguire

In the blog series Subtleties of Color, Robert Simmon discusses the importance, power, and underlying issues of using color. Like most forms of retinal indicators, colors are complicated and become even more complicated when you begin exploring the psychology and mental processes that go into our perception of them. Interestingly, we do not perceive colors the same way as they are scientifically measured. For instance, the cones in our eyes are RGB, but the way our brains interpret colors can be much more complicated, taking into account hue, saturation, brightness, and even contrast. Another aspect of this is how computers come into play. He writes, “Computer colors are linear and symmetrical, human color perception is non-linear and uneven.” I found this well said and interesting: when we use computers to help us use color, we are trying to apply a non-linear approach to an extremely linear one. Both grayscale and color gradients hold power for data visualization but can also become problematic when they are used in an incorrect manner. A good thing to always take into account is to use a consistent change of gradient with each important value. Also, while using divergent color palettes, it is important to keep your colors saturated in order to keep their visual importance, since desaturated colors often lose emphasis. The most interesting part of this blog to me was the discussion of intuitive colors. Intuitive colors are often meaningless and just become intuitive due to our preexisting mental schemas, for instance the use of blue for chilly and red for warm. I find this interesting because of how often these intuitive colors are applied to scientific things and digested as science. In the case of the colorized images created by NASA, images that are originally black and white are colorized by humans to increase visual information, yet the colors are intuitive to some extent.
Color theory is an endless project; there is really no way to define the correct scientific use of color, but there are definitely correct and incorrect approaches.

Yeojin Kim

In Robert Simmon's Subtleties of Color, Simmon writes about effective use of color and how it can be used to optimize data visualization in a world where computer colors are linear and symmetrical, whereas human color perception is non-linear and uneven. He outlines the principles behind the “perfect” color palette, describes different types of data that require unique types of palettes, gives some suggestions for mitigating color blindness, and illustrates some tricks enabled by careful use of colors.

Simmon states that of the three components of color (hue, saturation, and lightness), lightness is the strongest, and as a result, clear, one-way changes in lightness are more important than those in hue or saturation. He moves on to state that different palettes are suited to specific types of data. For example, sequential data (data that varies continuously from a high to a low value, such as temperature, elevation, or income) is best served by a color palette that shifts linearly from light to dark. For divergent data (data that varies from a central value/breakpoint), it is more important to differentiate data on either side of the breakpoint; merging two sequential palettes with equal variation in lightness and saturation is most optimal for expressing this type of data. For qualitative data (categorical/thematic data), it is important to use colors that are distinct from each other. The reading then elaborates on how colors can be connected to meaning.

I thought the reading was interesting because it was an attempt to define optimal color usage on distinct rules and principles, which to my mind, is something that is more intuitive than learned. Whenever I use color, whether it is for illustration or data visualization, I never really distinctly think about why I'm using the color I'm using outside of "well, it just makes sense in this context/ I feel like it describes the feeling I'm going for." It's interesting to read about a process that I think of as being very subconscious being defined and organized in a way that can be learned/ read.

Elena Sonnenfeld

In the reading Subtleties of Color, Robert Simmon very clearly talks about the different ways in which color can be perceived and how it affects the way in which viewers interpret data. Humans view color in three different parts: hue, saturation, and lightness. Color has the ability to enhance data and draw the viewer in. It allows a simple visual to tell a possibly more complex story. By using color associations, a graph is able to build on what people already know and feel about a topic and emphasize it. For example, if a person was presented a graph about the ice caps melting and the colors used were blue and red, the person would automatically interpret that the red is indicative of heat and is a threat. If the graph was in black and white, it might not be as effective in conveying the emotion and the story. Simmon brings up three different types of data and makes the point that different data needs to have different color treatments in order for the data to make the most sense. The three types of data that he mentions are sequential data, divergent data, and qualitative data. Sequential data is best suited to light-to-dark color palettes. Divergent data has a breakpoint in the middle, which should be a neutral color, and the two sides of the point should be contrasting. Lastly, qualitative data divides data into categories, and these categories should use color to distinctly differentiate. One example that I found really helpful with this reading was in section four, when the author starts to describe how he used different aspects of color to treat a heat map of the world. By using colors that varied in hue and saturation, he was able to create an almost three-dimensional plane that showed the difference in heat. It allowed some of the data to recede and other parts of the data to stand out. This contrast was very helpful when trying to decipher what the graph was about.
Color is very important when applied to data and has the ability to either enhance it and create a strong graphic or hinder it and take away from the data. It is important to know how to use color correctly so one’s work can be taken to the next level.