Materials and methods


We tested 48 cats (28 males and 20 females). Twenty-nine (17 males and 12 females; mean age 3.59 years, SD 2.71 years) lived in five “cat cafés” (mean number of cats living together: 14.2, SD 10.01), where visitors can freely interact with the cats. The other 19 (11 males and 8 females; mean age 8.16 years, SD 5.16 years) were household cats (mean number of cats living together: 6.37, SD 4.27). We tested only household cats living with at least two other cats, because the experiment required two cats as models. The model cats were quasi-randomly chosen from the cats living with the subject, provided that they had cohabited with the subject for at least 6 months and had different coat colors, so that their faces could be more easily identified. We did not ask owners to make any changes to water or feeding schedules.


For each subject, the visual stimuli were photos of two cats, other than the subject, that lived with it, and the auditory stimuli were recordings of the owner's voice calling those cats' names. We asked the owner to call each cat's name as they usually would, and recorded the calls in WAV format with a handheld digital audio recorder (SONY ICD-UX560F, Japan) at a sampling rate of 44,100 Hz and 16-bit resolution. Each call lasted about 1 s, depending on the length of the cat's name (mean duration 1.04 s, SD 0.02 s). All sound files were adjusted to the same volume using Audacity® recording and editing software, version 2.3.0 [26]. We took a digital color photo of each cat's face, frontal and with a neutral expression, against a plain background (resolution range: width 185–1039 pixels, height 195–871 pixels); each photo was enlarged or reduced to fit the monitor (12.3″ PixelSense™ built-in display).
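The text states only that Audacity was used to bring the recordings to the same volume; it does not say whether peak or loudness (RMS) normalization was applied. As a minimal sketch, assuming RMS normalization of 16-bit mono PCM samples that have already been decoded into a list of integers (for example with Python's `wave` module, not shown), the volume adjustment and the duration arithmetic at 44,100 Hz could look like this. The function names and the target level are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

SAMPLE_RATE_HZ = 44_100               # sampling rate reported in the text
INT16_MIN, INT16_MAX = -32768, 32767  # representable range of 16-bit PCM


def duration_s(n_samples: int, rate: int = SAMPLE_RATE_HZ) -> float:
    """Length of a recording in seconds."""
    return n_samples / rate


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    return sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples: Sequence[int], target_rms: float) -> List[int]:
    """Scale samples so their RMS equals target_rms (hypothetical choice of
    normalization; the paper does not specify one). Values that would exceed
    the 16-bit range are clipped."""
    current = rms(samples)
    if current == 0:  # silent file: nothing to scale
        return list(samples)
    gain = target_rms / current
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]
```

A call of 45,864 samples corresponds to the reported mean duration of 1.04 s. Matching RMS rather than peak level tends to make names of different lengths sound similarly loud, but either choice is an assumption here, and a real pipeline would also watch for clipping.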


We tested cats individually in a familiar room. Experimenter 1 gently restrained the cat 30 cm in front of the laptop computer (Surface Pro 6, Microsoft), which controlled the auditory and visual stimuli. Each cat was tested in a single session, and each trial consisted of two phases. First, in the name phase, the model cat's name was played back from the laptop's built-in speaker four times, with a 2.5-s inter-stimulus interval between calls. During this phase the monitor remained black. Immediately after the name phase, the face phase began, in which a cat's face appeared on the monitor for 7 s. The face photos measured ca. 16.5 × 16 cm on the monitor. While restraining the cat, Experimenter 1 looked down at its head; she never looked at the monitor, and so was unaware of the test condition. When the cat was calm and oriented toward the monitor, Experimenter 1 started the name phase by pressing a key on the computer. She restrained the cat until the end of the name phase and then released it. Some cats remained stationary, whereas others moved around and explored the photograph presented on the monitor. The trial ended after the 7-s face phase.
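To make the timing of one trial concrete, the sketch below computes when each name call and the face would appear. It assumes that the first call starts at the key press, that the 2.5-s inter-stimulus interval runs from the end of one call to the start of the next, and that the face appears as soon as the fourth call ends; these are interpretations of the text, and the names `trial_timeline` and `TrialTimeline` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

N_CALLS = 4            # name played four times
ISI_S = 2.5            # assumed offset-to-onset gap between calls
FACE_DURATION_S = 7.0  # face shown for 7 s


@dataclass
class TrialTimeline:
    call_onsets_s: List[float]  # onset of each call, relative to the key press (assumed t = 0)
    face_onset_s: float         # start of the face phase; the cat is released here
    trial_end_s: float          # end of the 7-s face phase


def trial_timeline(call_duration_s: float) -> TrialTimeline:
    step = call_duration_s + ISI_S
    onsets = [i * step for i in range(N_CALLS)]
    # Assumption: the face appears immediately after the last call ends.
    face_onset = onsets[-1] + call_duration_s
    return TrialTimeline(onsets, face_onset, face_onset + FACE_DURATION_S)


if __name__ == "__main__":
    t = trial_timeline(1.04)
    print(f"face onset {t.face_onset_s:.2f} s, trial end {t.trial_end_s:.2f} s")
```

With the mean call duration of 1.04 s, the face would appear about 11.7 s after the first call (4 × 1.04 s + 3 × 2.5 s) and the trial would end about 18.7 s after it.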

We conducted two congruent and two incongruent trials for each subject (Fig. 1) in pseudo-random order, with the restriction that the same name call was not played on consecutive trials. The inter-trial interval was at least 3 min. The subject's behavior was recorded by three cameras (two GoPro HERO7 Black cameras and a SONY FDR-X3000): one beside the monitor for a lateral view, one in front of the cat to measure time looking at the monitor, and one recording the entire trial from behind.
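The ordering rule can be illustrated by enumerating every order of the four name–face pairings and keeping those in which no name is played twice in a row. This is a minimal sketch; the model-cat labels "A" and "B" and all function names are hypothetical, and the text does not say whether further constraints (for example, on which condition comes first) were applied.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Trial = Tuple[str, str]  # (name played, face shown)


def trial_set(model_a: str = "A", model_b: str = "B") -> List[Trial]:
    """Two congruent and two incongruent name-face pairings for one subject."""
    return [
        (model_a, model_a),  # congruent
        (model_b, model_b),  # congruent
        (model_a, model_b),  # incongruent
        (model_b, model_a),  # incongruent
    ]


def name_not_repeated(order: Sequence[Trial]) -> bool:
    """True if no name is played on two consecutive trials."""
    return all(prev[0] != nxt[0] for prev, nxt in zip(order, order[1:]))


def valid_orders(model_a: str = "A", model_b: str = "B") -> List[Tuple[Trial, ...]]:
    """All permutations of the four trials that satisfy the restriction."""
    return [p for p in itertools.permutations(trial_set(model_a, model_b))
            if name_not_repeated(p)]


def draw_order(rng: random.Random, model_a: str = "A", model_b: str = "B") -> List[Trial]:
    """Pick one valid order uniformly at random."""
    return list(rng.choice(valid_orders(model_a, model_b)))


def is_congruent(trial: Trial) -> bool:
    return trial[0] == trial[1]
```

Only 8 of the 24 possible orders satisfy the restriction, because the names must alternate (A, B, A, B or B, A, B, A). Enumerating and sampling is trivial at this size; for larger designs, repeatedly shuffling until the constraint holds would be an alternative.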

Figure 1

Diagram illustrating each condition in Exp.1. Two model cats were chosen from the cats living with the subject. The model cat's name, called by the owner, was played through the laptop's built-in speaker (Name phase). Immediately after playback, a cat's face appeared on the monitor (Face phase). On half of the trials the name and face matched (congruent condition); on the other half they did not (incongruent condition).


One cat completed only the first trial before escaping from the room and climbing out of reach. For the face phase, we measured time attending to the monitor, defined as visual orientation toward, or sniffing of, the monitor. Trials in which the subject paid no attention to the monitor during the face phase were excluded from the analyses. In total, 34 congruent and 33 incongruent trials from café cats, and 26 congruent and 27 incongruent trials from house cats, were analyzed (69 trials excluded overall). A coder who was blind to the conditions counted the number of video frames (30 frames/s) in which the cat attended to the monitor. To check inter-observer reliability, an assistant who was also blind to the conditions coded a randomly chosen 20% of the videos. The correlation between the two coders was high and positive (Pearson's r = 0.88, n = 24, p < 0.001).
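Frame counts convert to seconds by dividing by the 30 frames/s coding rate, and the reliability check is an ordinary Pearson correlation between the two coders' counts. The sketch below, with hypothetical function names, also rechecks the trial bookkeeping: 47 cats × 4 trials plus the single trial from the cat that escaped gives 189 trials, of which 120 were analyzed, leaving the 69 excluded trials. The reliability sample of n = 24 equals 20% of those 120 trials, which is consistent with one video per analyzed trial, although the text does not state this explicitly.

```python
from math import sqrt
from typing import Sequence

FPS = 30  # coding frame rate reported in the text


def frames_to_seconds(n_frames: int, fps: int = FPS) -> float:
    """Attention time in seconds from a count of coded frames."""
    return n_frames / fps


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two coders' frame counts for the same trials."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Trial bookkeeping using the counts given in the text.
TOTAL_TRIALS = 47 * 4 + 1          # 47 cats completed four trials; one cat completed one
ANALYZED = (34 + 33) + (26 + 27)   # café (congruent + incongruent) + house (congruent + incongruent)
EXCLUDED = TOTAL_TRIALS - ANALYZED
RELIABILITY_SAMPLE = round(0.20 * ANALYZED)
```

The p value for r would come from the t distribution with n − 2 degrees of freedom, which the Python standard library does not provide; a statistics package would normally be used for that step.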

We used R version 3.5.1 for all statistical analyses [27]. Time attending to the monitor was analyzed with a linear mixed model (LMM) fitted using the lmer function in the lme4 package, version 1.1.10 [28]. Attention time was log-transformed to approximate a normal distribution. Congruency (congruent/incongruent), environment (cat café/house), and their interaction were entered as fixed factors, and subject identity as a random factor. We tested the significance of each effect with Wald chi-square (χ²) tests using the Anova function in the car package [29]. To test for differences between conditions, we compared least-squares means using the emmeans function in the emmeans package [30], with degrees of freedom adjusted by the Kenward–Roger procedure.
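Written out, the model described above corresponds to the equation below, assuming treatment (dummy) coding with congruent and café as reference levels, a random intercept per subject, and the natural logarithm; the text specifies neither the coding nor the log base. Here $y_{ij}$ is the attention time of subject $i$ on trial $j$, $\mathrm{Inc}_{ij} = 1$ for an incongruent trial, and $\mathrm{House}_i = 1$ for a household cat. In lme4 syntax this would be roughly `lmer(log(time) ~ congruency * environment + (1 | subject))`, with hypothetical variable names. The logarithm is always defined because trials with zero attention were excluded.

```latex
\log y_{ij} = \beta_0 + \beta_1\,\mathrm{Inc}_{ij} + \beta_2\,\mathrm{House}_{i}
  + \beta_3\,\mathrm{Inc}_{ij}\,\mathrm{House}_{i} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% incongruent minus congruent, on the log scale, within each environment
\Delta_{\text{café}} = \beta_1, \qquad \Delta_{\text{house}} = \beta_1 + \beta_3
```

Under this coding, the interaction test asks whether $\beta_3 = 0$, that is, whether the congruency effect differs between environments, while the emmeans comparisons estimate $\Delta$ separately for café cats and house cats.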

In addition to attention to the monitor, we calculated a Violation Index (VI), which indicates how much longer cats attended in the incongruent condition than in the congruent condition. For each subject, VI was calculated by subtracting the mean congruent value from the mean incongruent value; greater VI values indicate longer looking in the incongruent condition. We used data only from subjects with at least one congruent–incongruent pair; when a subject had only one data point in a condition, that value was used instead of a mean. Data from 14 household cats and 16 café cats were analyzed. We fitted a linear model (LM) with living environment (café/house) as a fixed factor. To examine whether VI was greater than 0, we also conducted a one-sample t-test for each group.
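A minimal sketch of the VI pipeline follows, assuming attention times are stored per subject as (condition, seconds) pairs. The data layout and function names are hypothetical, and the text does not say whether VI was computed on raw or log-transformed times. The group comparison is written as a one-way F test, which for two groups equals the square of the pooled two-sample t statistic; with 30 subjects in two groups it yields the 1 and 28 degrees of freedom reported in the Results.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def violation_index(trials: Dict[str, List[Tuple[str, float]]]) -> Dict[str, float]:
    """VI = mean incongruent time - mean congruent time, per subject.
    Subjects lacking either condition are dropped; a single trial in a
    condition is used as-is (the mean of one value is that value)."""
    vi = {}
    for subject, rows in trials.items():
        cong = [t for cond, t in rows if cond == "congruent"]
        incong = [t for cond, t in rows if cond == "incongruent"]
        if cong and incong:
            vi[subject] = mean(incong) - mean(cong)
    return vi


def one_sample_t(values: Sequence[float], mu: float = 0.0) -> Tuple[float, int]:
    """t statistic and degrees of freedom for H0: population mean == mu."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / sqrt(n)), n - 1


def one_way_f(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """F statistic and degrees of freedom for a one-way fixed-effect comparison."""
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

The p values would come from the t and F distributions (for example via SciPy's `scipy.stats.ttest_1samp`), which the standard library does not provide.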

Results and discussion

Figure 2 shows time attending to the monitor for each group. House cats attended for longer in the incongruent than the congruent condition, as predicted; however, café cats did not show this difference.

Figure 2

Time attending to the monitor during the face phase for each group in Exp.1. Red bars represent the congruent condition; blue bars represent the incongruent condition. The left panel shows café cat data; the right panel shows house cat data. The y-axis is log-transformed.

The LMM revealed a significant main effect of living environment (χ²(1) = 16.544, p < 0.001) and a significant congruency × living environment interaction (χ²(1) = 6.743, p = 0.009). The least-squares means comparisons confirmed a significant difference between the congruent and incongruent conditions in house cats (t(86) = 2.027, p = 0.045), but not in café cats (t(97.4) = 1.604, p = 0.110).

Figure 3 shows the difference in VI between groups. House cats had a significantly greater VI than café cats (F(1, 28) = 6.334, p = 0.017). A one-sample t-test revealed that house cats' VI was greater than 0 (t(13) = 2.522, p = 0.025), whereas that of café cats was not (t(15) = 1.309, p = 0.210).

Figure 3

Violation Index for each group in Exp.1. Red boxplot (left) shows café cat data; blue boxplot (right) shows house cat data.

These results indicate that only household cats anticipated a specific cat's face upon hearing that cat's name, suggesting that they matched the name to the specific individual. Cats probably learn such name–face relationships by observing third-party interactions; a role for direct receipt of rewards or punishments seems highly unlikely. The ability to learn others' names would involve a form of social learning. Cats can also acquire new behaviors or other knowledge by observing other cats [31], and a recent study reported that cats learn new behaviors from humans [32]. However, we could not identify the mechanism of learning, and how cats learn other cats' names and faces remains an open question.

Environmental differences between house cats and café cats include how often they observe other cats being called and reacting to those calls. Unlike human infants, who can disambiguate the referent of a new word among many potential referents [33], cats may not do so, at least under the conditions of this study. Using a habituation–dishabituation procedure, Saito et al. showed that café cats did not distinguish their own name from the names of cohabiting cats, whereas household cats did [25]. We extend this finding by showing that café cats also do not appear to learn the association between another cat's name and its face.

We also asked whether the ability to recall another individual's face upon hearing its name was limited to conspecifics, or whether it extended to human family members. In Exp.2 we tested household cats with the same procedure, using a family member's name and face.

A limitation of Exp.1 was that we could not analyze the effect of how long the subject had cohabited with the model cats, because this varied across cats and in some cases the information was lacking (i.e., it was hard to track exactly how long subject and model cats had lived together, because owners quarantined cats that did not get along with others). We predicted that the longer the cat and human had lived together, the stronger the association between name and face would be, owing to more opportunities to learn it.

