First of all, I want to apologize to Andrew for using this topic to reply to questions raised in another one, but this seems to me a more appropriate place to explore the matter.
Chris Hansen wrote: ↑Thu Oct 13, 2022 9:34 am
I personally do not think that Pliny the Younger or Tacitus are interpolations, and I've elsewhere shared several reservations I have with Tuccinardi's work on the former, specifically that he does not actually demonstrate the letters were forgeries but only "parts" of the letters seemed inauthentically Plinian. Further, he did not identify which parts were inauthentic, so we have no way of knowing, and there are several caveats. Since that letter is the only one where Pliny discusses Christians, it would make perfect sense for him to have terms and phrases (which he got from interrogating Christians supposedly) that are not Plinian in origin. As a result, finding "non-Plinian" phraseology and language in the letter actually demonstrates nothing about its authenticity.
So, from the get go, I think Tuccinardi's entire methodology and results are essentially not pointing out anything meaningful. If authentic, we would expect non-Plinian elements in a letter discussing a hitherto undiscussed cult of Christians. And then Tuccinardi never actually points out what parts are supposedly inauthentic, so in total, he demonstrates nothing usable in the paper.
As far as I'm concerned, then, he hasn't demonstrated anything about its inauthenticity. He has just given us a few things interesting to think about, at most.
Years ago I had an extensive correspondence with Larry Hurtado. I am copying it below (Hurtado's questions in red). I believe it helps to explain the method used in my article in plain terms. I must insist that what makes the Plinian Testimonium different from other Plinian subsections of the same size is its missing n-grams; the fact that this is the only letter where Pliny discusses Christians is therefore totally irrelevant. I hope this will be clearer after reading the exchanges.
Furthermore, Book 10 has one global topical pattern (questions about affairs in Pontus and Bithynia) common to all its letters (including Ep.96.10) and as many local topics as there are letters. In this sense every letter is unique in the Pliny corpus.
I also wish to highlight that the outcomes of my analysis simply imply that Ep.96.10 should be excluded from Book 10. I suggest instead, as a minimum, the presence of interpolations that might justify this result.
I’m convinced that even a non-specialist reader who is curious to examine the matter in depth, and who makes use of the references provided in the article, can understand the basic principles of the method and the results of the analysis. First of all you can read this article:
http://vridar.org/2016/02/17/fresh-doub ... more-64033
where Neil made a good synthesis of the work without being an “expert” in the field.
I will try to explain in plain language, albeit simplifying a bit, how the method I used for authorship verification (which of course I did not invent) works. I presume that, after reading the synthesis on Vridar, the concept of an n-gram is perfectly clear to you (i.e. I suppose that when I write n=4 or 5 or 6 you understand what I mean).
From the text of Book 10 the Plinian profile (i.e. Lk the text of known authorship) is created. Lk is the list of the most frequently found n-grams of Book 10, sorted in descending order of frequency. In fact the most frequent character n-grams can give information concerning the stylistic peculiarity of an author. Lk=500 means that only the first 500 most frequent character n-grams are considered in the analysis.
Then PT (Ep.96) was isolated from Book 10, and the text of the remainder of Pliny’s letters was divided into fifteen sections of about the same length as PT. These 15 subsections, extracted from what we know to be reliable pieces of Plinian authorship, were used to see what we should expect from reliable Plinian fragments of the same size as the disputed document; their results were then compared with those obtained from PT.
So, the profile of each Plinian subsection (PT and P1-15) is created. These are called Lu, i.e. the text of unknown authorship. Lu is the list of all the n-grams found in the corresponding Plinian subsection. Now it’s rather intuitive that the intersection (i.e. the common n-grams – CNG) between Lk (from Book 10) and each Lu of the Plinian subsections can give information about the authorship of the Plinian subsection. Of course this intersection (i.e. the number of the common n-grams between the two profiles) will be higher if the Plinian subsection has the same author as Book 10.
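The intersection described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Perl module used for the article: the function names are hypothetical, n-grams are taken over raw characters (spaces included), and n-grams with equal frequency are ranked by order of first appearance.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return every overlapping character n-gram of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def known_profile(text, n, size):
    """Lk: the `size` most frequent n-grams of the text of known authorship."""
    counts = Counter(char_ngrams(text, n))
    return {gram for gram, _ in counts.most_common(size)}

def unknown_profile(text, n):
    """Lu: all the n-grams found in the text of unknown authorship."""
    return set(char_ngrams(text, n))

def common_ngrams(known_text, unknown_text, n, size):
    """CNG: the n-grams shared by Lk and Lu."""
    return known_profile(known_text, n, size) & unknown_profile(unknown_text, n)

# Toy strings only (assumption: real input would be the Latin text of Book 10).
shared = common_ngrams("abcabcabc", "xxabcxx", n=3, size=2)
print(len(shared))
```

The size of the returned set is the count of common n-grams that the rest of the explanation relies on.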
The parameter “measuring” this intersection is called SPI. The values of SPI for the 15 Plinian subsections and for each considered model are homogeneously distributed, following a normal distribution. This is not at all surprising, because the stylistic homogeneity of Book 10 (see fig.3) has long been recognized even without stylometric tools. Conformance in genre, in register (all letters are written to the Emperor Trajan), and in time of writing all contribute strongly to its uniformity. The problem is PT. In all the considered models it always has the lowest value of SPI, and in 4 models out of 6 it is clearly an outlier. So what makes PT different from other Plinian subsections of the same size, just as letters from Cicero are different, is its missing n-grams, which cause its low values of SPI.
If this is clear enough, I can respond to your further questions.
I hadn't noticed (and still can't see it) where you stated that you used a sequence of 5 characters as your "n-gram", for example.
I suppose that you now understand what I mean where my article says that “the results of the analysis are that the model having greater discriminatory power is Model 3” (Model 3 has n=5, i.e. sequences of 5 characters are used, and Lk=500).
Here is part of the email from my colleague, Dr. David Mealand, which indicates a similar lack of adequate data:
"It does look as though, using sequences of 4, 5 or 6 characters (esp the last of these?), PT does seem significantly different from the rest of Pliny's letters in Book 10. I would be more confident if I could see just which sequences of characters were implicated, and where they are most prevalent in PT. This would list words or small sequences of part words which contribute to the significant difference. That might also indicate whether any genre difference might be an issue, or whether P writes to Trajan in ways which we would expect to differ from his letters to other recipients. But these are purely speculative responses which I could only check if I had the necessary data, and those seem not to be there on a quick read through."
Whether the data are adequate depends strictly on the scope of the work, which in my article is to answer this basic question: is the style of Ep.96.10 coherent with that of the rest of Book 10? The results of my analysis clearly show that the answer is NO. Are the data supplied in the article adequate to support this answer? I hope that now, having understood the meaning of SPI, you understand that the answer to this question is YES. Attaching hundreds of pages of n-gram lists to the paper would have added nothing to my analysis and, of course, they would not have been published at all. Besides, on the basis of the information given in my article, anyone can replicate my experiment and confirm its results. I have even indicated the Perl module used to extract n-grams from Book 10.
Dr. Mealand is not saying that the data provided are inadequate to conclude that Ep.96.10 behaves stylistically differently from the rest of Book 10; he is instead trying to understand “which sequences of characters were implicated, and where they are most prevalent in PT” in order to understand the causes of this difference. This is quite a different question. Of course it would be great to have such a list. If that were possible, one might then be able to see which words in which passages are most responsible for the evident differences, and so make some hypothesis about the spurious passages. It’s true that this “might also indicate whether any genre difference might be an issue, or whether P writes to Trajan in ways which we would expect to differ from his letters to other recipients” but, as I said previously, the stylistic homogeneity of Book 10 has long been recognized and all the letters of this book are written to Trajan.
Unfortunately this task (understanding which sequences of characters were implicated and where they are most prevalent in PT) is not feasible with this method because, as I have already explained, what makes PT different from other Plinian subsections of the same size is its missing n-grams. Other methods that work on the discriminant n-grams (like the one I cited in the conclusion of my article) might give some further useful information.
My final question is not at all baffling. If you can't indicate whether Letter 10 as a whole is a fabrication, or which parts are insertions, then it is not clear that you have done more than to say something such as: On the frozen lake there are some possible pockets of thin ice. I can't say where they are, however. So, what? Do we avoid going onto the lake altogether?
From the findings of my analysis Ep.96.10 should be excluded from Book 10, but I’m not an advocate of this extreme solution, because I’m fully aware that large insertions in the letter might explain the anomaly. I fully agree with you that it would be better for a historian to know which parts are insertions, but at this moment that is not possible. Nevertheless I think that a historian who knows that this letter of Pliny, so important for its implications, has been contaminated by large Christian interpolations should be very cautious in using it as a primary source, even without knowing exactly which parts of the letter are likely spurious (so I can say that YES, in this condition I will avoid going onto the lake).
--Can I check that I understand the "mechanics" of your method? It's not clear to me how you select or identify your "n-grams" or even what they are. I take it that they are a set of characters, e.g., 4 or 5 or whatever, that appear in a fixed sequence in a given text. Yes? I.e., the phenomenon tracked is the recurrence of these particular characters in a given sequence?
--If so, what are the specific characters that (1) identify authentic Pliny letters, and (2) seem to be absent or altered in letter 10.96? This doesn't require a large list of occurrences, only the specific n-grams being tracked.
_IN PRAETORIIS LEONES IN CASTRIS LEPORES_
The list represents all the n-grams (with n=3) of the short sentence above, sorted in descending order of frequency. Now you can imagine a list of this kind:
1) for Book 10 as a whole
2) for each Plinian subsection
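For readers who want to reproduce a list of this kind, here is a minimal Python sketch. It assumes that spaces count as characters and that the sentence is used in upper case exactly as written; the function name is hypothetical, and the article itself used a Perl module for this step.

```python
from collections import Counter

def ngram_frequencies(text, n):
    """Count every overlapping character n-gram of `text` (spaces included)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

sentence = "IN PRAETORIIS LEONES IN CASTRIS LEPORES"
frequencies = ngram_frequencies(sentence, 3)

# Most frequent first; n-grams with equal counts keep their order of first appearance.
for gram, count in frequencies.most_common():
    print(repr(gram), count)
```

Running the same function over the whole of Book 10, or over one subsection, gives the two kinds of list mentioned in points 1) and 2).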
Now, for Book 10, we consider not all the n-grams but only the first 500 or the first 1000 n-grams, i.e. only the first 500 or 1000 most frequent n-grams. This is the Plinian fingerprint of Book 10.
Now let’s consider the case of Lk=500. We can ask: how many n-grams are common to the list of Book 10 and the list of each Plinian subsection?
We obtain, let’s say:
For P1, 302 common n-grams (i.e. P1 shares 302 n-grams out of the 500 most frequent of Book 10)
For P2, 294 common n-grams (i.e. P2 shares 294 n-grams out of the 500 most frequent of Book 10)
For P3, 298 common n-grams (i.e. P3 shares 298 n-grams out of the 500 most frequent of Book 10) and so on.
Of course the 302 n-grams shared between P1 and Book 10 are not the same as those shared by P2 and Book 10 or by P3 and Book 10. So you can understand that specific characters identifying authentic Pliny letters don’t exist.
What is important for recognizing the Plinian fingerprint is the total number of these common n-grams, not which ones they are. If PT has only, let’s say, 250 common n-grams, i.e. a value lower than that of any of the 15 Plinian subsections and comparable to the values obtained from Cicero’s fragments, we are not able to recognize the Plinian fingerprint that is instead fully identifiable in the 15 Plinian subsections. Now please look at Fig.1 of my article, where the whole process is graphically described. I hope I have clarified your doubts.
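One simple way to put a number on "much lower than the others" is to express the PT count as a distance from the mean of the reference fragments, in units of their standard deviation. This z-score is an assumption of the sketch below and not necessarily the outlier criterion applied in the article; the counts are the "let's say" figures quoted in this reply, not real results, and only three of the 15 reference values are given.

```python
from statistics import mean, stdev

def z_score(value, reference_values):
    """Distance of `value` from the reference mean, in sample standard deviations."""
    return (value - mean(reference_values)) / stdev(reference_values)

# Illustrative counts quoted in the reply for P1, P2 and P3 (hypothetical figures).
fragment_counts = [302, 294, 298]
# Illustrative count quoted in the reply for PT (hypothetical figure).
pt_count = 250

z = z_score(pt_count, fragment_counts)
print(z)
```

A strongly negative value would indicate that the count for PT lies far below what the reference fragments lead one to expect.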
--If I understand your response correctly, you say that your method hasn't been used for other bodies of letters of ancients. If so, I would think that this is a very strong desideratum. You may have demonstrated that letter 10.96 is on your method an "outlying" item, but, even assuming the validity of the data, the larger question is what to make of this. If, e.g., one finds that other collections of letters of verified authenticity include similar "outlying" members, then the implication would be that this doesn't tell us much about authenticity. Or have I missed something important?
First of all I wish to specify that this is not my method. The method I used for authorship verification was proposed by Potha and Stamatatos (2014). It was tested on the PAN-2013 corpora, surpassing the performance of the PAN-2013 winners, and it works with texts of any type. N-gram analyses are language-independent: they have been successfully used in problems of authorship attribution with texts written in Latin, Spanish, English, French or Greek. The bibliography on the matter is huge (my article gives only some references). It’s rather intuitive, as Dr. Mealand has pointed out, that a difference in genre or, in the case of letters, in recipients can cause stylistic differences between two texts written by the same author, but this is not the case with Book 10. Thanks to its stylistic homogeneity Book 10 is instead an ideal testing ground, and this homogeneity is fully confirmed by the SPI values of the 15 Plinian subsections. So the question arises: why is Ep.96.10 so different?
I just want to add a couple of things regarding your first question. You may ask why the method uses Lk=500 or 1000 and not, let's say, Lk=20 or 50 or 100. This is because the first 20 (or 50 or 100) most frequent n-grams are not able to capture the Plinian fingerprint (i.e. they are used indifferently by Pliny, Cicero, Seneca or...the supposed Christian forger). This leads to an important consequence. I can easily identify the common n-grams of Ep.96.10 (and consequently the non-common n-grams), but what can they say about the altered parts of letter 96?
1) Non-common n-grams don't automatically imply altered parts. Coming back to the previous example, P1 has 198 non-common n-grams (500 minus its 302 common ones) and it is surely a Plinian fragment.
2) Common n-grams don't automatically imply genuine parts. As I explained before, the most frequent n-grams are common to almost all Latin writers.
Thank you yet again for taking the time to engage my questions (and ignorance of stylometric approaches). I've spent the last few days reading through several of the articles mentioned as helpful in your own article.
I think I now have a better general sense of stylometric developments and issues. Could I try now to see if I understand what your own work comprises?
--You developed a stylistic "profile" of book 10 of Pliny's letters, by extracting Trajan's replies and also letter 96 about the Christians.
--You then compared the stylistic traits of each of the samples of Pliny's letters with this profile. You did the same with letter 96.
Yes, you have grasped the logic of the method, but some further clarifications are needed. The doubtful authenticity of PT is not taken as an a priori assumption in the analysis. I didn’t treat Ep.96 differently from the other Plinian fragments (P1-15). In fact I compared each Plinian fragment with the profile of Book 10 excluding that fragment, i.e. each of the 15 fragments was compared, in turn, with the remainder of the Pliny set.
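The "each fragment against the remainder" procedure can be sketched as follows. This is a simplified illustration, not the code behind the article: the function names are hypothetical, fragments are cut at fixed character offsets, and n-grams are counted inside each fragment only, so none straddles a fragment boundary.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return every overlapping character n-gram of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def split_into_fragments(text, fragment_length):
    """Cut `text` into consecutive fragments; the last one may be shorter."""
    return [text[i:i + fragment_length]
            for i in range(0, len(text), fragment_length)]

def leave_one_out_counts(fragments, n, profile_size):
    """For each fragment, count how many of the `profile_size` most frequent
    n-grams of all the OTHER fragments also occur in that fragment."""
    results = []
    for i, fragment in enumerate(fragments):
        rest_counts = Counter()
        for j, other in enumerate(fragments):
            if j != i:  # the fragment under test is left out of the profile
                rest_counts.update(char_ngrams(other, n))
        profile = {gram for gram, _ in rest_counts.most_common(profile_size)}
        results.append(len(profile & set(char_ngrams(fragment, n))))
    return results

# Toy fragments only: the third one shares nothing with the other two.
print(leave_one_out_counts(["abcabc", "abcabc", "xyzxyz"], n=3, profile_size=2))
```

Leaving the tested fragment out of the profile keeps a fragment from matching n-grams that it contributed itself.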
--The results vary a bit, depending on the model (e.g., the size of the n-grams chosen), but letter 96 comes out a bit of an "out-rider" in comparison to the other samples, though more in some models than others (e.g., in model 2, p. 9 of your article, letter 96 isn't so distant from all the others).
The models differ in the size of the n-grams and in the length of the profile of known authorship. Varying these parameters (i.e. the models) helps to identify the model best able to capture stylistic differences between different authors. For fragments having the same author these differences should be less relevant. Ep.96, instead, changes its behavior significantly from model to model (as do the fragments of Cicero or Seneca). Please look at fig. 4 (ex 6) and fig. 5.
Question: Did you compare each of the samples with each of the others? Am I correct that your comparisons were all a comparison of each sample with the general profile of the corpus of book 10?
Of course I didn’t compare each of the samples with each of the others, because this makes little or no sense with this method: single Plinian fragments are too short to preserve the Plinian fingerprint.
--I do wonder if your rhetoric is a bit . . . tendentious. It is not (yet) clear that the sort of (still experimental) method you use takes us to "more solid ground" than the historical-critical and other methods (p. 3), or that it is more "objective".
“More solid ground” does not refer to the method I used but to stylistic analysis in general. It’s no surprise that the lively debates about the authenticity of this letter of Pliny almost ended after the stylistic analysis carried out by Mayer and Linck (and then exploited by Sherwin-White). But these analyses are, of necessity, less objective than an analysis using stylometric tools. However, I didn't want to reduce the whole study to a mere stylometric analysis, so I tried to contextualize my investigation.
--Cf. your pp. 6-7, it is not clear to me that your results "suggest the presence of *interpolations* inside the text of PT." Your data indicate some stylistic "distance" from the profile of book 10, but the reasons for this you have not established, and your method doesn't point to any particular reason by itself. I think that you introduce your own hypothesis, which itself must first be tested.
The analysis I carried out concerns Authorship Verification using a stylometric tool. The answer of this tool is that Ep.96.10 should be excluded from Book 10. I suggest instead, as a minimum, the presence of interpolations that might justify this result.
The reason I asked about comparing each of the Pliny book 10 blocks of material with each of the others is precisely to determine how much variation there is in this collection of his letters. This would provide a context in which to assess where letter 96 stands in relation to the others.
By constructing a composite author-profile for Pliny, combining the book 10 letters, you wind up with a construct that isn't actually found in any of the letters (unless I misunderstand you). It's interesting that letter 96 (on your preferred model) sits as something of an "outrider" to the composite author-profile, to be sure. But it's not immediately clear to me what to make of this, without more of the sort of context that I mention.
By constructing a composite author-profile for Pliny, combining the book 10 letters, I wind up with a construct that includes all the book 10 letters (Ep.96.10 included), considering only those characteristics (the most frequent n-grams) useful for capturing Pliny’s profile in book 10. The logic behind the method proposed by Potha and Stamatatos is completely clear. You are proposing something different that has nothing to do with a profile-based method and, let me be frank, I’m not able to see the logic of these comparisons. You say “to determine how much variation there is in this collection of his letters”, but suppose you find that all the Plinian fragments seem very different from one another (as they may): what’s the problem if they all preserve enough of the Plinian fingerprint (as indeed they do)? I don’t know if it’s clear what I mean. The Plinian profile is the key to this approach, which in fact uses a profile-based paradigm. You have read the article by Potha and Stamatatos and you now know what I mean (it is even in the title of my article).
I’m not saying that a new approach cannot be proposed, but in this field, as Dr. Mealand can confirm, in order to validate the results the author should compare the newly proposed approach with a state-of-the-art approach using metrics such as False Acceptance, False Rejection and Equal Error Rate.
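For readers unfamiliar with these three metrics, the following Python sketch shows how they are commonly computed from verification scores. It is a generic illustration, not taken from the article or from Potha and Stamatatos: the function names and the scores are invented for the example, and the Equal Error Rate is only approximated by scanning the observed scores as thresholds.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Accept a text as same-author when its score is >= threshold.
    Returns (false acceptance rate, false rejection rate)."""
    false_acceptances = sum(s >= threshold for s in impostor_scores)
    false_rejections = sum(s < threshold for s in genuine_scores)
    return (false_acceptances / len(impostor_scores),
            false_rejections / len(genuine_scores))

def approximate_eer(genuine_scores, impostor_scores):
    """Return (threshold, FAR, FRR) at the observed score where FAR and FRR
    are closest to each other; the first such threshold wins on ties."""
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = error_rates(genuine_scores, impostor_scores, t)
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (t, far, frr)
    return best

# Invented scores: higher means "more like the known author".
genuine = [300, 295, 290, 260]    # texts truly by the known author
impostor = [250, 255, 265, 240]   # texts by other authors

print(error_rates(genuine, impostor, threshold=262))
print(approximate_eer(genuine, impostor))
```

Lowering the threshold trades false rejections for false acceptances, which is why the point where the two rates meet is a convenient single figure for comparing methods.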
And I'm still not fully clear why you contend that your data (assuming for the moment its accuracy) suggest simply interpolations. Surely, the data allow at least three suggestions (in no particular order): (1) forgery, (2) partial forgery (your interpolations theory), and (3) Pliny wrote this letter somewhat distinctively, and it simply departs from his usual "profile".
There would have to be a valid reason for (3); this is the basis of any stylometric analysis. You say that “the contents of letter 96 are unique in the Pliny corpus, being the only letter dealing with the judicial handling of Christians”, implying that this may be a valid reason for Pliny to depart from his usual “profile”. I have already replied to this point.
But I wonder also what candidate-portions of letter 96 strike you as likely interpolations (and you surely have given this some thought, as it appears that you have devoted other efforts to trying to discredit some of the traditional sources referring to Jesus and early Christianity). What statements seem to you improbable for Pliny to have written? And why?
I will probably disappoint you: I have no idea what portions of letter 96 might be interpolations. I have read all the debates concerning the authenticity of Ep.96.10 and I found valid reasons on both sides of the quarrel. You speak about my “efforts to trying to discredit some of the traditional sources referring to Jesus and early Christianity”. I suppose you are referring to “Nazareth, l'Épigraphe de Césarée et la main de Dieu” (CER 252, 2011) and Vardaman’s archeological fraud. I hope that one day you will find the time to read this article; then you can decide whether or not it is time to set this archeological artifact aside for good.