A Stylometric Analysis of the Mar Saba Letter

enricotuccinardi
Posts: 18
Joined: Wed Jun 29, 2016 3:57 am

A Stylometric Analysis of the Mar Saba Letter

Post by enricotuccinardi » Fri Jun 12, 2020 4:23 am

Abstract

Since the publication of Clement’s letter to Theodore, discovered by Morton Smith at Mar Saba, there has been a great deal of controversy surrounding its authenticity.
The main aim of the present paper is to weigh the linguistic evidence for and against Clementine authorship of the letter, also checking its alleged excessively Clementine nature in an objective manner, using a profile-based stylometric technique for authorship verification which has proven to be a valuable tool for text of relatively small size.
The outcomes of the analysis tend to attribute the disputed letter to Clement but they also show its hyper-Clementine quality. Is this due to a forger, deliberately trying to imitate Clement’s style or is it instead a feature characteristic of the epistolary style of Clement? Regrettably without further samples of Clement’s letters to be used as terms of comparison it seems not possible to safely answer this question.

https://brill.com/view/journals/vc/74/3 ... p265_2.xml

Secret Alias
Posts: 12125
Joined: Sun Apr 19, 2015 8:47 am

Re: A Stylometric Analysis of the Mar Saba Letter

Post by Secret Alias » Fri Jun 12, 2020 5:02 am

Will read with interest! Thank you. Have you thought of posting it to Academia.edu, or is there a public (free) copy available?
“Finally, from so little sleeping and so much reading, his brain dried up and he went completely out of his mind.”
― Miguel de Cervantes Saavedra, Don Quixote

Secret Alias
Posts: 12125
Joined: Sun Apr 19, 2015 8:47 am

Re: A Stylometric Analysis of the Mar Saba Letter

Post by Secret Alias » Tue Jun 16, 2020 10:18 am

Question: the words are Clementine, the phraseology is Clementine. The letter as a whole might be hyper-Clementine because it's too perfect. That's the gist here, right?

Let's say I was one of those monkeys on a keyboard. If you compared my output on a particular day in June with a particular day in December or any two days from an infinite period of time, you are going to find a 'too perfect' match too. There are going to be days where there are more differences. But surely the purpose of stylometry is to assume that commonality exists between things written by the same person. The only reason we are studying this letter is because of one line in the letter:
But "naked man with naked man," and the other things about which you wrote, are not found.
I mean, by the same token - somewhere out there in the body of literature handed down to us from antiquity there are forgeries - ancient forgeries - whatever. We picked this text to examine because homosexuality or perceived homosexuality has been deemed 'un-Christian.' That's the bottom line. So when is it 'hyper' Clementine as opposed to just 'Clementine'? Is there an agreed upon 'line in the sand'? I am just wondering if you hesitate to call it Clementine simply because there's a controversy or because someone happened to coin the term 'hyper-Clementine' when looking at the same phenomenon you looked at. Would you have thought it strange if two 'acknowledged' passages from Plato showed the same number of reused words?

I guess I am asking: is it 'hyper-Clementine' because someone before you said it was hyper-Clementine? I can remember coming home after doing LSD for the first time and my Dad was watching the Leaf game. And I was like - 'how do I act normal'? So I came in and was like 'hey Dad what's the score?' and the like. But I was out of my fucking mind. That was the closest I ever got to 'hyper-Stephanine.'

DCHindley
Posts: 2815
Joined: Mon Oct 07, 2013 9:53 am
Location: Ohio, USA

Re: A Stylometric Analysis of the Mar Saba Letter

Post by DCHindley » Wed Jun 17, 2020 8:33 am

IIRC, there are also several treatises and letters preserved under the name of Plato, some of which are considered genuine and some of which are considered pseudepigraphic and exhibit traits of a later stage of Platonic theory. Of course, the true authors tried to give the writings verisimilitude, and were largely successful, as these same works continued to be preserved by the ancients and survive to this day despite doubts.

To someone unfamiliar with Platonic philosophy they may seem "authentic." Modern specialists can tell, though, by the presence of anachronisms.

To a specialist, though, if the anonymous writer was meticulous enough in making the letter or treatise look like one of Plato's undisputed works or letters, the work might seem "hyper-Platonic."

Lunch is over, boss.

andrewcriddle
Posts: 1814
Joined: Sat Oct 05, 2013 12:36 am

Re: A Stylometric Analysis of the Mar Saba Letter

Post by andrewcriddle » Wed Jun 17, 2020 11:31 am

I've been reading the article.

What it does is compile a list of all 4-character sequences from the Mar Saba letter. It then compiles a list of 4-character sequences from passages from the Protrepticus, the Paedagogus and the Stromata. It uses the list of sequences from these passages to construct a Clementine profile. This profile is then compared with other passages from Clement's acknowledged works, with non-Clementine works, and with the Mar Saba letter.
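
As a rough illustration of the profile-building step described above, here is a minimal Python sketch. It is not the article's code: the function names, the toy training strings and the absence of any text normalization (accents, punctuation, spacing) are all assumptions made for the example.

```python
from collections import Counter

def char_ngrams(text, n=4):
    """Return every overlapping n-character sequence in the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_profile(texts, n=4, size=200):
    """Return the `size` most frequent n-grams across the training texts.

    `size` plays the role of the profile length L mentioned in the thread.
    """
    counts = Counter()
    for t in texts:
        counts.update(char_ngrams(t, n))
    return [gram for gram, _ in counts.most_common(size)]

# Toy stand-ins for training passages (hypothetical, not real Greek text).
training = ["abcabcabc", "abcab"]
profile = build_profile(training, n=4, size=2)
```

With real data, `training` would hold the passages chosen to represent the author, and the fragments to be tested would be kept out of it.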

The results vary depending on whether one is comparing only the most frequent character sequences or whether one is comparing a much larger list. With a small list of the very frequent sequences the Mar Saba letter agrees with the profile more than do passages from Clement's acknowledged works. The Mar Saba letter appears hyper-Clementine. Using a very large list of sequences the Mar Saba letter differs substantially from the profile and appears prima facie non-Clementine.

Either of these results in itself is probably compatible with authenticity but the combination is surprising. It may indicate that a non-Clementine writer has attempted to imitate Clement by overusing Clement's favourite words and phrases.

(One thing the paper almost certainly demonstrates is that the Mar Saba letter is not by Origen - see another thread. Passages from Origen are included in the non-Clementine works and behave very differently to the Mar Saba letter.)

Andrew Criddle

enricotuccinardi
Posts: 18
Joined: Wed Jun 29, 2016 3:57 am

Re: A Stylometric Analysis of the Mar Saba Letter

Post by enricotuccinardi » Wed Jun 17, 2020 11:53 am

I will try to explain in what sense the fragment can be considered 'hyper' Clementine.

Neil Godfrey on Vridar has explained the basic concepts behind this stylometric method:

https://vridar.org/2016/02/17/fresh-dou ... hristians/

The measure of the similarity between the authorial profile and the discussed fragment is the SPI value which represents the number of common elements between the authorial profile and the fragment itself.

It is rather intuitive that the higher the SPI value, the higher the probability that the fragment belongs to the candidate author.
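
To make the idea concrete, here is a minimal Python sketch of such a count, assuming the authorial profile is a set of character 4-grams and the fragment is compared through the set of all its own 4-grams. Whether the fragment's list is also truncated to its most frequent sequences is not stated in this thread, so that choice, like the function names and toy strings, is an assumption.

```python
def ngram_set(text, n=4):
    """Return the set of distinct n-character sequences in the text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def spi(author_profile, fragment, n=4):
    """Count the n-grams shared by the authorial profile and the fragment."""
    return len(set(author_profile) & ngram_set(fragment, n))

# Toy profile and fragment (hypothetical values, for illustration only).
toy_profile = {"abcd", "bcde", "zzzz"}
score = spi(toy_profile, "abcdef")
```

A fragment sharing more of the profile's sequences gets a higher score, which is the sense in which a higher value points towards the candidate author.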

Secret Alias wrote:
Tue Jun 16, 2020 10:18 am

So when is it 'hyper' Clementine as opposed to just 'Clementine'? Is there an agreed upon 'line in the sand'?
Now look at the picture below:

Image

If you consider the model with L=200, the letter (LC fragment) has the highest value among all the considered fragments.

This is surprising, to say the least, because Clement’s profile was created from Stromateis, Paedagogus and Protrepticus; consequently, higher values should be expected from fragments coming from these works.

Nonetheless, no fragment from Stromateis, Paedagogus and Protrepticus (among the ones not used to generate the authorial profile) reaches an SPI value as high as that of LC.

We are not speaking of a few fragments: LC has the highest SPI value among 194 fragments from Stromateis, 69 fragments from Paedagogus and 19 fragments from Protrepticus.

Basically it means that wherever you draw the “line in the sand”, LC will always be beyond it. That's why I consider it 'hyper' Clementine.

The model with L=200 is not the best model for distinguishing Clementine from non-Clementine fragments (that is the model with L=1000), but it is nevertheless a very interesting one, because it is the model most easily affected by a human attempt at imitation.

If this were the case, a marked drop should be expected in the SPI values of LC as the length of the profile increases, and that is exactly what happens.
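
One way to look for this pattern is sketched below in Python: for each profile length, compute where the disputed fragment ranks among held-out fragments of the candidate author. In this sketch the raw shared count can only grow as the profile gets longer, so the comparison is made relative to the other fragments; that reading, the function names and the toy strings are assumptions, not the normalization used in the article.

```python
from collections import Counter

def ngrams(text, n=4):
    """Return every overlapping n-character sequence in the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def profile(training_text, size, n=4):
    """Return the `size` most frequent n-grams of the training text as a set."""
    return {g for g, _ in Counter(ngrams(training_text, n)).most_common(size)}

def spi(prof, fragment, n=4):
    """Count the n-grams shared by the profile and the fragment."""
    return len(prof & set(ngrams(fragment, n)))

def rank_by_length(training_text, held_out, disputed, lengths, n=4):
    """For each profile length, rank the disputed fragment (1 = highest SPI)."""
    ranks = {}
    for size in lengths:
        prof = profile(training_text, size, n)
        target = spi(prof, disputed, n)
        higher = sum(1 for frag in held_out if spi(prof, frag, n) > target)
        ranks[size] = higher + 1
    return ranks

# Toy data with 2-character sequences (hypothetical, for illustration only).
ranks = rank_by_length("ababababcdcdef", ["abcdef"], "abab", [2, 7], n=2)
```

In the toy data the disputed string reuses only the two most frequent sequences, so it ranks first against the short profile and falls behind the held-out string once the full profile is used.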

Image

In fact, “It is rather intuitive that a forger, supposedly well acquainted with Clement’s works and deliberately trying to imitate his style, could have succeeded in this task reproducing, somewhat excessively, Clement’s most common stylistic features but barely could the forger have gone so far as to emulate even the nuances of his writing habits. That would explain why LC appears to be strongly Clementine when compared with relatively short profiles of Clement’s works, to become instead definitely an outlier when longer lengths of the profile are considered.”

Interestingly, if we applied the method mechanically, we would not notice any of what is noted above. The best model is L=1000, which correctly recognizes 94% of Clement’s fragments and 74% of non-Clementine fragments. With this model LC is attributed to Clement and its SPI value does not raise any suspicion (it is neither too close to nor too far from the threshold).
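
For readers unfamiliar with how such recognition percentages arise, here is a minimal Python sketch, assuming a fragment is attributed to the author when its SPI value is at or above a fixed threshold. How the threshold is chosen in the article is not described in this thread, and the scores below are invented toy numbers, not results from the paper.

```python
def recognition_rates(author_scores, other_scores, threshold):
    """Return (share of author fragments accepted, share of other fragments rejected)."""
    accepted = sum(1 for s in author_scores if s >= threshold)
    rejected = sum(1 for s in other_scores if s < threshold)
    return accepted / len(author_scores), rejected / len(other_scores)

# Invented toy SPI scores (hypothetical, for illustration only).
author_scores = [10, 12, 8, 15]
other_scores = [5, 9, 11, 3]
rates = recognition_rates(author_scores, other_scores, threshold=9)
```

The two shares correspond to the two percentages quoted for each model: the first for the author's own fragments, the second for fragments by other writers.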

The fact that LC is instead rejected by the second-best model (L=8000, which recognizes 89% of Clement’s fragments and 77% of non-Clementine fragments) is not really surprising, because the article shows that this model is very genre-sensitive.

Basically, if LC is a forgery, the forger was very skilled, and this does not bode well for an ancient fabrication. By way of comparison, in the case of Pliny’s letter concerning the Christians we are dealing with a far cruder interpolator, and the letter is rejected whichever model is used.

This “strange” behavior of LC may also be explained, in theory, by the possibility that Clement was more prone to drawing only from his basic vocabulary when writing private letters. We would need a number of further letters by Clement to test the likelihood of this possibility.

Secret Alias
Posts: 12125
Joined: Sun Apr 19, 2015 8:47 am

Re: A Stylometric Analysis of the Mar Saba Letter

Post by Secret Alias » Wed Jun 17, 2020 4:52 pm

Let me ask you this. I have long thought that the Instructor, for instance, is a composite work. It's Clement plus something else. Some have argued that in one book there is a Stoic treatise added to Clement. But it is not wholly Clementine. I have also thought that the references to the subordinate role of women in the Stromata are suspicious. Even the structure of the Stromata - a patchwork of things with no discernible order - is peculiar. Others have wondered about the Hypotyposes. Is this fully Clementine?

The same situation exists with Justin Martyr, Tertullian, Irenaeus and Clement of Rome. If an authentic work from any one of these Church Fathers was being compared to a composite work - part Church Father, part forger - the authentic work would have 'hyper' matches for that particular Church Father, no?

The accusation that Eusebius altered the works of Origen and of the other Alexandrian Fathers to make them seem less heretical, less 'Arian' (given that the Arians cited previous Alexandrian Fathers to argue for their own 'orthodoxy'), is relevant here. I just think that we are a little naive with respect to the notion that anyone faithfully preserved any Church Father, given the ever tightening noose of what orthodoxy was in the fourth and fifth centuries.

Of course scholars work the other way - in Origen's treatise De principiis, 1.4.1, 1.6.3, 1.8.1, there are some references to the 'embodiment' of the spiritual beings. However, there is an ongoing debate among modern scholars regarding the authenticity of these passages; see Traité des Principes, ed. H. Crouzel and M. Simonetti, 2 vols, Sources chrétiennes 252, 253 (Paris: Cerf, 1978). But I'd argue that the authentic Origen would be more heretical, while other scholars argue that heresy is a sign of tampering. So when confronted with Photius's statement that Clement's Hypotyposes had heretical statements they inevitably postulate that his copy was forged https://books.google.com/books?id=zCes- ... on&f=false rather than the inverse, that our copies of Clement were corrected.

Here Watson cites Ehrman saying that 'authentic Clement' i.e. Stromata, Instructor etc - contradicts the Letter to Theodore. https://books.google.com/books?id=MEhJA ... ry&f=false But the question is - how 'authentic' is 'authentic Clement'? Because clearly there are 'heretical' works of Clement (Hypotyposes, the Letter to Theodore) and then works where no obvious examples of heresy are attested.

So the question is - in a situation where the Patristic works themselves are all corrupt, if an authentic Origen or an authentic Clement survived via another channel wouldn't it necessarily be 'hyper' that Church Father?

andrewcriddle
Posts: 1814
Joined: Sat Oct 05, 2013 12:36 am

Re: A Stylometric Analysis of the Mar Saba Letter

Post by andrewcriddle » Thu Jun 18, 2020 8:42 am

enricotuccinardi wrote:
Wed Jun 17, 2020 11:53 am


This “strange” behavior of LC may also be explained, in theory, by the possibility that Clement was more prone to drawing only from his basic vocabulary when writing private letters. We would need a number of further letters by Clement to test the likelihood of this possibility.
Even if this could explain the value with L=200, could it also plausibly explain the difference between the value for L=200 and the value for L=8000 ?
One would seem to be suggesting that Clement's letters a/ show differences of genre from his main works and b/ draw surprisingly heavily from the core vocabulary of his main works. Even if a/ and b/ are both plausible in themselves their combination seems less so.

Andrew Criddle

enricotuccinardi
Posts: 18
Joined: Wed Jun 29, 2016 3:57 am

Re: A Stylometric Analysis of the Mar Saba Letter

Post by enricotuccinardi » Sat Jun 20, 2020 1:14 pm

Secret Alias wrote:
Wed Jun 17, 2020 4:52 pm

So the question is - in a situation where the Patristic works themselves are all corrupt, if an authentic Origen or an authentic Clement survived via another channel wouldn't it necessarily be 'hyper' that Church Father?
In this scenario, with the Letter of Clement as the only 100% Clementine fragment, LC would of course appear 'hyper' when compared with other Clementine fragments. But this should happen in every one of the considered models, and not only when L=200, as is instead the case.

andrewcriddle wrote:
Wed Jun 17, 2020 11:31 am

(One thing the paper almost certainly demonstrates is that the Mar Saba letter is not by Origen - see another thread. Passages from Origen are included in the non-Clementine works and behave very differently to the Mar Saba letter.)
I have not created an Origenian profile; consequently, LC was not tested against one. My analysis therefore tells us nothing about possible textual similarities between LC and Origen. What we can say is that fragments from Against Celsus are not “Clementine”...

andrewcriddle wrote:
Thu Jun 18, 2020 8:42 am

Even if this could explain the value with L=200, could it also plausibly explain the difference between the value for L=200 and the value for L=8000 ?
One would seem to be suggesting that Clement's letters a/ show differences of genre from his main works and b/ draw surprisingly heavily from the core vocabulary of his main works. Even if a/ and b/ are both plausible in themselves their combination seems less so.
I fully agree.
