How we hate and how statistics can help

Researchers at the University of Chieti-Pescara are developing statistical methodologies to analyse Italian hate speech data (increasingly present on social networks, not least thanks to an Italian political party). I’ve asked Alice Tontodimamma, a PhD student there, to introduce this subject, which is not only interesting but also extremely relevant these days. Here’s her contribution.

The exponential growth of social media has brought an increasing propagation of hate speech and hate-based propaganda. Hate speech is commonly defined as any communication that disparages a person or a group on the basis of some characteristic, such as race, colour, ethnicity, gender, sexual orientation, nationality, or religion.

It is a natural activity, in societies where freedom of speech is recognised, for people to express their opinions about certain subjects. Evidently, the development of social media has created new means for people to communicate their ideas and share them with others: we have moved from an era in which individuals could communicate their ideas only to a small number of other people, usually orally, in a meeting place such as the town square, to an era in which individuals can make free use of a variety of channels to communicate, instantaneously, with people who are far away. Moreover, more and more users take advantage of these platforms not only to interact with others but also to share news.

The detachment created by being able to write without any obligation to reveal oneself directly means that this new medium of virtual communication allows people to feel greater freedom in the way they express themselves. Unfortunately, there is also a dark side to this system: social media have become fertile ground for heated discussions, often resulting in insulting and offensive language.

The ease with which hate can be spread means that it is no longer a phenomenon confined to the internet; it influences real society and can affect individual behaviour, and countries are recognising hate speech as a serious problem. This has led to a number of international initiatives aimed at qualifying the problem and developing effective counter-measures. In this context, it is not surprising that most existing efforts are motivated by the impulse to detect and eliminate hateful messages.

That is why the research area of hate speech is receiving increased attention and, accordingly, a continuously growing publication rate. A wide variety of disciplines, among them Social Science, Psychology, Statistics, and Computer Science, are engaged in research into hate speech.

[Figure: number of publications on hate speech per year]

Until 2011, publications on the topic remained limited, with fewer than fifty per year. Since then, an increasing number of publications has appeared every year, with a peak in 2018.

The question remains: will this growing trend continue over the next few years?

Price’s law states that the development of science goes through four phases. In the first phase (known as the precursor phase) a small group of scientists begins to publish research in a new field. The second phase is one of proper exponential growth, since the expansion of the field attracts an increasing number of scientists, as many aspects of the subject still have to be explored. In the third phase, there is a consolidation of the body of knowledge, followed by a decrease in the number of publications: the growth of scientific production becomes linear, so, ultimately, the curve changes from exponential to logistic. The fourth phase corresponds to the collapse of the domain and a sharp reduction in publications.

It can be said that research about hate speech has probably now entered the second phase of development: an increasing amount of research is being published, but there is still room for improvement in many respects, among them the need for statistical methodologies and software tools to identify hate speech automatically and then to devise effective counter-measures.

Institutions and companies agree on the importance of automatic detection of hate speech. In recent years, the European Union has developed a number of programs for preventing the appearance of hate speech online, and various companies and platforms have a clear interest in the detection and removal of hate speech: for instance, newspapers need to attract advertisers and therefore cannot risk becoming known as platforms for hate speech; social media companies wish to maximise the quality of communication service that they offer to their users.

There is, in general, and especially in Italy, a lack of systematic monitoring, documentation, and data collection for online hate speech. Furthermore, works with open source code are rare, and no open source tools are available for the automatic detection of hate speech.

Regarding the main aspects of previous research, we can say that:

– research generally focuses on datasets containing messages collected from social networks: the most commonly used source is Twitter;

– the most frequent approach consists in building a Machine Learning model for the classification of hate speech: the most widely used algorithms are SVMs, Random Forests, and Decision Trees (see the sketch after this list);

– the most widely used language is English;

– researchers tend to begin by collecting and classifying new messages; often those datasets remain private.
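
To make the second point above concrete, here is a minimal sketch of such a classifier: a toy TF-IDF + linear SVM pipeline with invented example messages, not the pipeline of any of the studies surveyed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: texts labelled 1 (hateful) or 0 (not hateful).
texts = [
    "example of an offensive message",
    "another offensive example",
    "a perfectly normal message",
    "another harmless message",
]
labels = [1, 1, 0, 0]

# TF-IDF features followed by a linear SVM: the kind of baseline
# most commonly reported in the literature described above.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
classifier.fit(texts, labels)
print(classifier.predict(["a new message to screen"]))
```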

It is evident that the detection of hate speech involves much more than simple keyword spotting. For this reason, we list the main difficulties:

– authors do not use public datasets, and do not publish the new ones they collect: this makes it very difficult to compare results and conclusions;

– there is a low rate of agreement (33%) in the classification of hate speech by humans, indicating that such classification would be an even harder task for machines;

– the task of annotating a dataset is made even more difficult by the fact that it requires expertise about culture and social structure, and the evolution of social phenomena and language makes it hard to track all racial and minority insults.

This is undoubtedly an area that has profound societal impact and which presents many research challenges.


The first attempts to visualize data

One of the most important things in a statistician’s job is data visualization. A graph can make a difference in a publication, and pictures are essential when statisticians meet non-statisticians.

Data visualization is considered so important a subject that the Royal Statistical Society issued a call for papers in 2017 for an extended discussion meeting, which was held at the annual RSS conference in September 2018 in Cardiff. On that occasion, three papers were discussed:

‘Visualizing spatiotemporal models with virtual reality: from fully immersive environments to applications in stereoscopic view’ (S. Castruccio, M. G. Genton and Y. Sun) – video;
‘Visualization in Bayesian workflow’ (J. Gabry, D. Simpson, A. Vehtari, M. Betancourt and A. Gelman) – video;
‘Graphics for uncertainty’ (A. W. Bowman) – video.

There is a picture that is considered the first representation of a statistical methodology. It is the following picture, proposed by Michael Florent van Langren (or Langrenus), a Dutch cartographer and astronomer (again an astronomer so close to the beginnings of Statistics!) who served the Spanish Monarchy in the first half of the 17th century. In particular, van Langren’s father had attended lessons and observations by Tycho Brahe.

[Figure: Grados de la Longitud – van Langren’s graph]

This picture is taken from “La Verdadera Longitud Por Mar y Tierra Demonstrada y Dedicada A Su Majestad Catolica Felipe IV” by Miguel Florencio Van Langren (Cosmografo y Matematico de su Majestad en Flandes), 1644.

The picture shows the enormous variability among measurements of the longitude difference between Rome and Toledo, made by eminent astronomers and cartographers. Starting from this, he claimed to be able to determine longitude at sea better than previous works had done. It is quite interesting that, in the work cited above, van Langren not only presented a review of the literature on determining longitude at sea but also listed the amount of money each researcher had received to pursue his studies.
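
Just to give an idea of the kind of display, here is a minimal sketch of a one-dimensional dot plot in the spirit of van Langren’s graph; the numbers below are invented placeholders, not his historical values.

```python
import matplotlib.pyplot as plt

# Hypothetical longitude estimates (in degrees), standing in for the values
# van Langren plotted in 1644: each point is one astronomer's estimate of
# the Toledo-Rome longitude difference, all shown along a single axis.
estimates = [17.7, 19.6, 21.1, 25.4, 26.0, 27.7, 30.1]

fig, ax = plt.subplots(figsize=(8, 1.2))
ax.plot(estimates, [0] * len(estimates), "o")
ax.set_yticks([])
ax.set_xlabel("estimated longitude difference (degrees)")
plt.tight_layout()
plt.show()
```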

The method proposed by van Langren was entirely based on the observation of peaks and craters of the Moon and, indeed, he produced a very detailed map of it.

It is also interesting that, once again, the recognition of the variability of measurements did not lead to a theory of errors (for which we would have to wait almost a century).

The mean’s path

As I’ve shown in previous posts, in the past researchers were figures with mixed interests, such as medicine, literature, astronomy (and astrology), etc. Statistics was born in this way, by proposing methods to compute probabilities (in card games, as in Cardano) or to reduce measurement errors produced by instruments that were amazing for their time, but imperfect nonetheless.

One of these measures was the arithmetic mean. Today it is a concept so familiar to us that children know how to compute a mean. It is the first measure computed in all reports and in most scientific papers; it is available everywhere. However, its use was not obvious to scientists until the 18th century. The most used summary was the “best guess”, the idea of using the single measurement considered closest to the truth. The issue is that, when we have discrete observations (or observations discretised by the instrument used), the arithmetic mean proposes a number that cannot actually be read during a measurement. It is as when we compute the posterior distribution of the number of clusters in a mixture model and say that, a posteriori, there are 2.74 clusters. It is easy to see that this could be considered nonsense.
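
A minimal sketch of what I mean, with made-up posterior probabilities for the number of clusters:

```python
import numpy as np

# Hypothetical posterior sample for the number of clusters in a mixture model.
n_clusters_draws = np.random.default_rng(1).choice(
    [2, 3, 4], size=1000, p=[0.40, 0.46, 0.14]
)

# The posterior mean is a perfectly valid summary, yet it is a value
# (close to 2.74 here) that no single draw can ever take.
print(n_clusters_draws.mean())
```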

While nowadays nobody has to explain to anybody else what a mean is (maybe it is more difficult to explain a weighted mean, according to the students in my past classes), I think it is useful to remember that what is clear to a statistician may not be so clear to a non-statistician: the use of the mean was not so obvious in the past, just as the (correct) use of the p-value (or of its alternatives) is not so obvious nowadays.

The arithmetic mean was initially introduced (even if not formally) for measurements. For example, when Köbel defined the “rod”, the unit of land measurement equal to 16 feet, he physically used 16 individuals, lining them up toe to heel. He was, in some way, “averaging” the foot length; he also recorded the identity of the people involved, so that it was possible to account for personal variability. Nevertheless, he never introduced a formal arithmetic mean.
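
In modern notation (my own formalisation, not Köbel’s): lining up sixteen men toe to heel measures the sum of their foot lengths x_1, \dots, x_{16}, so the resulting “foot” is their arithmetic mean,

\text{rod} = \sum_{i=1}^{16} x_i, \qquad \text{foot} = \frac{\text{rod}}{16} = \frac{1}{16}\sum_{i=1}^{16} x_i.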

[Figure: Köbel’s woodcut of the 16-foot rod, available at https://enroquedeciencia.blogspot.com/2016/04/pie-medieval-1.html]

Astronomy was the field where the arithmetic mean was initially most used. The scientist who is normally considered the first to have proposed the use of repeated measurements in astronomy was Tycho Brahe. Brahe was an excellent astronomer (a geo-heliocentric astronomer, though), whose determinations of celestial positions were the most accurate of his time.

Nevertheless, there was no consensus about how to merge these measurements (the mean, the median, mid-ranges, etc.). One important consideration is that the repetitions had to be equivalent in a strong sense: measurements had to be taken at the same time of day, by the same person, in the same place in order to be merged.

We have to wait until the middle of the 18th century to see the average become an accepted standard summary of data. As I said above, one point against the use of the arithmetic mean was the idea that the accepted value of a measure had to be an actual measurement. But there was another conceptual problem with the acceptance of the mean: the belief that errors should be summed up and not averaged. This idea was so strong that apparently Euler initially failed to find a solution to the three-body problem because he assumed that errors should be summed.

(The three-body problem considers the motion of a particle attracted by the gravitational field of two other point masses fixed in space; Euler arrived at an exact solution in his memoirs of 1760, and generalisations appeared in the following years.)

The idea that errors should be summed up rather than averaged (in the absence of biases) was discarded only at the beginning of the 19th century, with the work of Gauss, who introduced the normal distribution for his theory of errors.
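
In modern terms (my gloss, not the historical argument): if n unbiased measurement errors \epsilon_1, \dots, \epsilon_n are independent with common variance \sigma^2, then

\mathrm{Var}\left(\sum_{i=1}^{n} \epsilon_i\right) = n\sigma^2, \qquad \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} \epsilon_i\right) = \frac{\sigma^2}{n},

so combining measurements by averaging makes the error shrink rather than accumulate.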

A useful reading about the use of the mean and other statistical concepts is “The Seven Pillars of Statistical Wisdom” by Stephen Stigler.

changing p-values or changing journals?

There has recently been renewed interest in the use of p-values and in alternative measures for answering statistical questions. This may come from a recent Nature Correspondence in which more than 800 scientists (including some friends) from all around the world signed an appeal to “retire statistical significance”. This is only the latest attempt to analyse the concept of significance in more detail, and to show that there is no such thing as a standard choice (p=0.05!!) when drawing statistical conclusions. There was the suggestion to abandon statistical significance by McShane, Gal, Gelman, Robert and Tackett and the ASA’s statement by Wasserstein and Lazar, among others.

While none of these contributions states that p-values should be completely forgotten (in general), the message I’ve often heard from colleagues in different fields is that we should stop using p-values. My personal opinion on this is, as usual, somewhere in the middle. It is evident that, although scientists (non-statisticians) have become familiar with the concept of the p-value (so familiar they sometimes seem obsessed), they have not become familiar with its flaws.

I would like to give an example from my personal experience.

I participated in a project to validate a microtitre plate, i.e. a biological plate used to test the pattern of resistance/sensitivity of a bacterium to several drugs. You put some drug on the plate, add the bacterium, and see whether the bacterium continues to grow or whether its growth is inhibited by the drug. At that point, you can determine the level of the drug needed to stop the bacterium’s growth (very, very basically). The plate included different classes of drugs, i.e. different levels of concentration. It was necessary to see whether the analysis made on that particular plate could be replicated in several laboratories. In the paper, you can find several p-values. There were several problems in the analysis:

  1. The first problem is the model: we sent 19 strains to 7 laboratories, where they were duplicated several times and read by two readers, with three different methods, on four different days, on different plates. The basic idea was to define a linear mixed model that could take all these factors into account (a sketch of the kind of model is given after this list). Part of the group I was working in was specialised in using linear mixed models with thousands of parameters for just tens of observations, with no correction at all (many Ph.D. theses were based on this idea), and in interpreting the resulting p-values as if they were meaningful, without acknowledging that it was not possible to estimate the proposed model with standard methods. It is evident that the problem was not the presence of p-values in the analysis, but the use made of them.

  2. The second problem relates to what was included in the paper. We decided to define replicability in an odd way, i.e. as when the reading was within + or – 1 class of the modal class. This is a standard definition of reproducibility in phenotypic testing when the truth is known. Of course, saying that the reading is within one class of the mode in the resistant group is very different from saying that it is within one class of the mode in the area which separates resistant from sensitive cases. Moreover, we did not include in the paper an analysis showing that the level of the reading depended on the temporal distance from the expiry date of the plate, and that the lowest level of reproducibility appeared when we asked for the analysis to be repeated on blinded replicates (so that the experimenters were not supposed to know which strain they were analysing). This flaw is related to the practice of showing only positive results in the analysis.
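
For what it is worth, here is a rough sketch (with simulated data and invented column names, not our actual dataset) of one way such a mixed model could be written: a handful of variance parameters, in contrast to the over-parameterised models described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data in the spirit of the design described above:
# laboratory, strain, reader, method and day are factors, and the
# response is (say) the log2 dilution read from the plate.
rng = np.random.default_rng(0)
design = pd.DataFrame(
    [(lab, strain, reader, method, day)
     for lab in range(7) for strain in range(19)
     for reader in range(2) for method in range(3) for day in range(4)],
    columns=["lab", "strain", "reader", "method", "day"],
)
design["log2_dilution"] = (
    rng.normal(scale=0.5, size=len(design))          # residual noise
    + rng.normal(scale=0.3, size=7)[design["lab"]]   # a simulated lab effect
)

# Fixed effect for the reading method, random intercept per laboratory,
# variance components for strain, reader and day.
model = smf.mixedlm(
    "log2_dilution ~ C(method)",
    data=design,
    groups=design["lab"],
    vc_formula={"strain": "0 + C(strain)",
                "reader": "0 + C(reader)",
                "day": "0 + C(day)"},
)
print(model.fit().summary())
```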

This is an example of using statistics in the wrong way (including reporting p-values) to support some conclusions.

I’ve proposed this example because I think it also points to a possible solution which does not involve abandoning p-values. The first time we submitted the paper, we sent it to a medical journal whose editor-in-chief is a famous statistician. The paper was immediately rejected. We resubmitted the paper elsewhere, where no statistician was involved in the reviewing process, and it was accepted in the end (even though many flaws had been identified).

This made me think: maybe we do not need to ask scientists to stop using p-values; we just need to ask journals to have statisticians review the papers that use statistics (as some journals already do). I think this should be common practice in every field where statistics is used to support conclusions, and should also be a basic requirement for accepting or rejecting a paper.

Justice or chance?

Historians like Sambursky, Sheynin, Hacking, and Styan have looked to Aristotle when searching for the birth of the idea of probability, relating it to the chance of an event; all the subsequent Scholastic intellectuals also relied on the idea that an event is probable if it is reasonable that it will happen. Cardano‘s use of probability is different, although it still relies on Aristotle: it is associated with the idea of justice more than with the idea of chance. This is the reason why Cardano is interested in games.

One possible explanation for his association between chance and justice may come from a problem often attributed to Cicero, but which enjoyed great success after him, also among Scholastic philosophers, and was popular at the time when Cardano lived. This problem is known as the “lifeboat problem”. Suppose there are several people in a lifeboat, but the lifeboat will sink if everybody stays on it: how do you choose the person to be thrown overboard? In particular, Cicero supposed there were two wise men on a plank after a shipwreck, but they could not both survive there. Both are wise and equal, so the decision is left to pure chance. Chance is thus seen as something fair.

The first discussion of games of chance in the Liber de Ludo Aleae is related to dice. Cardano focuses his interest on the six-sided die. Cardano explicitly links the idea of chance with the idea of the circuit: “the magnitude of the circuit is the length of time which shows forth all forms”, which means that after six throws of the die you should see all six possible faces (in fact, under independent throws, the chance of this actually happening is only 6!/6^6 ≈ 0.015). This is a rather strict definition of chance, and Cardano himself admitted that it did not hold empirically. He then spent three chapters of the book describing the circuit and the results and chances of the outcomes, in particular for the game called Sors, where the points are the sum of the faces when throwing two or three dice. Cardano’s mathematical method is called “reasoning on the mean“:

  • The probability that any particular face shows in the throw of a single die is 1/6, since the length of the circuit is 6.
  • If the die is thrown three times, the expected number of times the face shows is 1/2.
  • Therefore, there is an even chance (one-half) that any particular face turns up in three throws.

Cardano thinks that one-half of the total number of faces always represents equality, so that the chances are equal that

  • a given point turns up in three throws
  • one of three points turns up in one throw

but Cardano’s reasoning is incorrect: the probability that an ace, a two or a three shows up in one throw is indeed 1/2; however, the probability of obtaining at least one ace in three throws of a single die is 91/216 ≈ 0.42.
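
The correct value comes from treating the three throws as independent (a modern restatement, not Cardano’s wording): the chance of seeing no ace in three throws is (5/6)^3, so

P(\text{at least one ace in three throws}) = 1 - \left(\frac{5}{6}\right)^3 = 1 - \frac{125}{216} = \frac{91}{216} \approx 0.42.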

This error is still due to Aristotle’s concept of justice, which Cardano interpreted as equiprobable outcomes. It’s like forcing the mathematics to produce justice, trying to create an agreement between the correct mathematics and an overall sense of justice.

Cardano himself noticed the error and, later in the book, produced the correct calculations:

Table: Chances in the throw of three dice

Sum of Faces    Number of Chances    Sum of Faces
3               1                    18
4               3                    17
5               6                    16
6               10                   15
7               15                   14
8               21                   13
9               25                   12
10              27                   11
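
These counts are easy to verify by brute force; a minimal sketch:

```python
from collections import Counter
from itertools import product

# For each possible sum, count the number of ordered outcomes of three dice
# that produce it (216 equally likely outcomes in total), reproducing the
# table above.
chances = Counter(sum(throw) for throw in product(range(1, 7), repeat=3))
for total in range(3, 19):
    print(total, chances[total])
```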


Cardano also treated the card game called primero, which involves a problem solved later by Pascal and Fermat, leading to the basics of probability calculus. The object of the game, as in poker, is to attain the highest possible score, or to bluff (as in poker) the other players betting against you. There are only descriptions of the game, not written rules; nevertheless, we know it was played with a 40-card deck. The player who holds a “prime”, which means a card from each suit, is sure to win, which explains the name of the game. When two players remain in the game with a card still to draw, the player with the lowest number of points can ask for a “fare a salvare“, so that the pot is split into two parts: one part is evenly divided between the two players, and the other is still played for and taken by the winner. Cardano stated that the “fare a salvare” should be decided before the start of the game, since the underdog may find it sometimes advantageous and sometimes disadvantageous to invoke it; using his mathematical criterion, he shows that this rule favours the underdog.

In the Liber de Ludo Aleae, there is no calculation of the chances of each type of hand in primero. However, a few years after its publication, Cardano proposed a combinatorial technique based on an arithmetical triangle (Cardano’s triangle) which could also be used for card games, although he did not actually use it for them. It provides the number of combinations of n different things taken r at a time, i.e. it can generate the successive numbers of combinations of n objects taken 1,2,3,4,\dots at a time

{}^nC_r = \frac{n(n-1)(n-2)\cdots(n-r+1)}{1\cdot 2\cdot 3\cdots r}

Table: Cardano’s triangle

    1    2    3    4    5    6    7    8    9   10   11
    1    1    1    1    1    1    1    1    1    1    1
         2    3    4    5    6    7    8    9   10   11
              3    6   10   15   21   28   36   45   55
                   4   10   20   35   56   84  120  165
                        5   15   35   70  126  210  330
                             6   21   56  126  252  462
                                  7   28   84  210  462
                                       8   36  120  330
                                            9   45  165
                                                10   55
                                                     11
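
The triangle is easy to reproduce from the formula above; a minimal sketch:

```python
from math import comb

# Reproduce Cardano's triangle: row r lists C(n, r) for n = r + 1, ..., 11,
# i.e. the number of combinations of n objects taken r at a time.
for r in range(11):
    print([comb(n, r) for n in range(r + 1, 12)])
```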

Gender equity: where are we?

The International Women’s Day on the 8th of March is inevitably a moment of reflection on gender equity and on the achievements made by women over time.

On this occasion, I would like to relive a very difficult time in my career and try to understand what I have learned.

In my career, I have found Academia quite an open work environment for women. Much still has to be done: the presence of women does not seem to have increased in recent years in fields like economics, there is a recent study showing gender bias in students’ evaluations of teachers, and there seems to be a bias in funding as well. However, I have always had the impression that Academia is a bit more open to women than other work environments.

However, my work experience in the United Kingdom was in a Department characterized by fairly narrow views, both in terms of openness to new methods and in terms of openness to new people. One of the characteristics of the Department was the lack of women in prestigious roles. I was involved in the analysis group of an international project; the analysis group was made up of three people, one for descriptive statistics (a man), one for machine learning (a woman) and one for statistics (me, a woman). The first thing I noticed was a strong imbalance in the opportunities to present results: the two women had at most a few minutes, while the man had the opportunity to present at every meeting for most of the time. Every two months there was a meeting with the entire consortium to present the results of the analysis group: the two women had the opportunity to present their results for two minutes each (exactly, only two minutes!), while the man could talk for about 90 minutes. During the annual meeting, the two women could present their results for a quarter of an hour each, while the man had one hour and his Master’s student (also a man, with no analytical background) another hour. We spent two hours looking at counts of frequencies.

But the difficulties did not end there: I witnessed very embarrassing situations. Women were told that they should think about having children instead of concentrating on research; male Master’s students who had joined the group for a few days were nonetheless asked to explain analyses, in front of collaborators or funders, in place of their supervisors (women); female project managers were treated as secretaries, their tasks reduced to writing minutes during meetings, preparing slides for principal investigators or entering data manually.

None of my male colleagues ever considered it necessary to point out that perhaps women, too, should have had the opportunity to present their work. Why should they? They had an advantage in keeping quiet. But when these male colleagues are in charge, will they allow men and women to present their work equally? My colleague who was dedicated to descriptive statistics only had male students. Is it just by chance, or is he more attracted to people similar to himself when he hires?

For one year, I insisted on the need to devote some attention to an analysis defining the concept of drug resistance in our experiment (the project had to do with the identification of genetic mutations that make tuberculosis antibiotic-resistant). Nobody seemed interested. When the group finally began to think it was important, the project was given to my descriptive colleague and I was excluded; my colleague decided to use techniques that were in vogue in the past (descriptive, non-inferential techniques) and that have been proven to be biased.

One might think that the level of the research is not affected by these decisions. Unfortunately, I took part in a project whose publication was highly criticised as “unscientific” by the journals where we tried to publish, and the reviewers even asked us to completely repeat the experiment (which had lasted over a year). The article, one year after its publication, has only two citations.

I think situations are sometimes just unlucky for women: we can land in environments whose culture is male-oriented. The problem is the possibility of changing things: in such a situation, I had no chance to change anything. Even when I asked to present my results in at least one meeting, I was asked to postpone my presentation again and again, and when I finally presented, the people I was working with did not come. I had no chance to complain formally, because I would have had to present my complaint to the very people who were the problem.

In such a situation, I would like to thank the societies of which I am a member, ISBA and the Royal Statistical Society, and their members, who greatly supported me with their advice. However, I think we are still far from having methods within departments to create a safe environment for people who feel discriminated against in any way and want to change that.


The (Italian) birth of probability

In these days of obscurantism in Italy (where the no-vax movement is one of the strongest in the world and the term “professor” has openly become an insult, thanks to Italian politicians, many of whom do not hold a degree), it is sometimes difficult to remember that there was a time when Italy was the centre of European culture, in particular during the 16th century and the so-called humanism of the Renaissance. Alongside Petrarca, Leonardo, Machiavelli and Michelangelo, there was also Girolamo Cardano, one of the fathers of probability and statistics; many countries would like to be considered the birthplace of modern Statistics (while I prefer to look at knowledge as a path with no clear start and no clear end), and maybe Italy can claim it thanks to Cardano.

Girolamo Cardano (1501-1576) is the author of the “Liber de Ludo Aleae”, which can be seen as a treatise on probability calculus or as a gambling manual, written almost a century before Huygens, Pascal, and Fermat. In the book, you can find calculations of the probabilities of the sums when throwing two or three dice and probabilities relative to some card games. Even if it is unclear which audience Cardano had in mind (the book remained unpublished throughout his life), there are several suggestions on how to detect cheating (or how to cheat).

Like many other authors of his time, Cardano had a wide range of interests, working in medicine and astronomy (and astrology). It is interesting to remember that a great development in mathematical theory happened in those years, thanks to the commercial relationships among countries in the Mediterranean area, in particular with the Muslim world, which had more developed commercial practices; it is also interesting to remember that European mathematical books of that time derived from Arabic books, such as the book by Ahmad ibn Ibrahim Uqlidisi dating from the 10th century.

Cardano took the classic university curriculum, called the “quadrivium”, including arithmetic, geometry, astronomy and music (after which it was possible to go on to doctoral studies in law, theology or medicine) at the University of Pavia, which was then (as it is today) one of the best for the teaching of Mathematics, after Bologna. After that, he studied medicine at the University of Padua.

In the past, opinion on Cardano’s work was highly influenced by Gabriel Naude, who in 1643 wrote a preface to Cardano’s autobiography in which he accused Cardano of superstition, but there has been a more recent re-evaluation of his work.

It has to be said that Cardano himself did not see the “Liber de Ludo Aleae” as a mathematical book, so it is difficult to judge it mathematically. He was a gambler all his life, and the book may be associated more with this interest than with mathematical reasoning.

The Liber de Ludo Aleae was driven by a sense of justice (or equality), and this ethical sense is at the basis of all the probability calculations (and the following description of cheating methods). In the book, equality means equality of chances for all the players.

One can still see this goal of statistics in many applications: identifying as soon as possible in a clinical trial whether a medical treatment works and is not dangerous, so that the largest number of patients can benefit from it; analysing the short- and long-term effects of pollution on the environment and human health and trying to understand the specific causes, so that resources can be focused on them and specific policies can be developed; and so on.

However, Cardano was also influenced by Aristotle’s Ethics. Suppose there are two players, one wagering x and winning with probability p, and one wagering y and winning with probability (1-p). The game is defined as “fair” if the expected gains of the two players are the same, i.e.

yp - x(1-p) = x(1-p) - yp

Since both sides must then equal zero (i.e. yp = x(1-p)), this can be rewritten in terms of the ratio between bets and probabilities as

\frac{y}{x} = \frac{(1-p)}{p}

which is also known as Cardano’s rule. So he relies on the definition of probability associated with justice more than on the definition associated with chance.
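
As a quick numerical illustration (my own example, not Cardano’s): if p = 1/3, the rule gives y/x = (2/3)/(1/3) = 2, so the player who wins twice as often must stake twice as much for the game to be fair.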