Basic Stylometry Beta (early access)
- Peter Kirby
- Site Admin
- Posts: 6682
- Joined: Fri Oct 04, 2013 2:13 pm
- Location: Santa Clara
- Contact:
Re: Basic Stylometry Beta (early access)
Very good feedback. Thanks. Glad just to know anyone's actually interested in using it, other than me. Those all sound like good features. And 'automated word discovery' could itself lead to increased accuracy and/or decreased subjectivity. Thanks again for this valuable feedback.
Also, yes: if I loaded the Greek on the back end, you could (a) skip the wait on the upload and (b) use a GET request with all the data referenced in the URL, meaning you could share results by URL.
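To illustrate the "share results by URL" idea, here is a minimal Python sketch (not the program's actual Perl code). The endpoint `https://example.com/stylometry` and the parameter names `samples` and `words` are hypothetical; the only point is that once the texts live on the server, a GET URL needs to carry nothing but short identifiers.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://example.com/stylometry"

def share_url(sample_ids, word_ids):
    """Build a shareable GET URL that references server-side texts by ID."""
    query = urlencode({
        "samples": ",".join(sample_ids),
        "words": ",".join(word_ids),
    })
    return f"{BASE_URL}?{query}"

def parse_share_url(url):
    """Recover the referenced IDs from a shared URL."""
    params = parse_qs(urlsplit(url).query)
    return params["samples"][0].split(","), params["words"][0].split(",")

url = share_url(["josephus", "origen"], ["kai", "de"])
print(url)
print(parse_share_url(url))
```

Because the URL only names the texts rather than containing them, it stays short enough to paste into a forum post.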
"... almost every critical biblical position was earlier advanced by skeptics." - Raymond Brown
- Peter Kirby
Re: Basic Stylometry Beta (early access)
The program right now is very raw. It has about 600 lines of Perl, written over 3 to 4 days or so. Point is, I am by no means opposed to putting more work into it.
Of course, the biggest problems are more 'theoretical' or scientific than technical, i.e., finding which techniques offer increased accuracy, better detection of unreliable results, and/or results from less data.
Re: Basic Stylometry Beta (early access)
Hey, here is a tip that I'd venture other users could find useful:
Selected (highlighted) text in a program like Notepad/WordPad can be unproblematically dragged right into any of the boxes in Peter's program.
To quickly select a single, entire line of text in (e.g.) WordPad, triple-click anywhere within it. (A single click in the empty margin to the left of the line's start will work too.) This is especially helpful when adding the 'word formulas' you want into the program. Download and open Peter's "greek.txt" from this thread, then simply triple-click and drag each common word formula you want to use right into the appropriate boxes in the program. This method works especially well because each word formula (i.e. all the grammatical permutations of a single base word) is on its own (non-word-wrapped) line, so even though a huge word formula may appear to occupy many lines of text, triple-clicking will still select the entire word formula without a problem.
And, as all should have learned in kindergarten: [Ctrl]+[A] is the keyboard shortcut for Select All! So adding a document into the program (when correctly formatted as Peter specified in this thread's original post, like "justin.txt" from the collection provided earlier in this thread, for example) is as easy as opening the text file, pressing [Ctrl]+[A], and dragging the highlighted body into your box-of-choice in the program.
If you narrow the program's web browser window to just the left or right half of your monitor, and open your text files in a Notepad/WordPad window occupying the other half, simply clicking and dragging selections right into the program's boxes makes the whole process much faster and easier than you might expect!
This was just the way that worked best in my own case, of course, so YMMV!
Jeff
P.S.: So, what's this "Mac" word I keep hearing mean, again?
Last edited by Aleph One on Sun Jun 07, 2015 10:08 pm, edited 1 time in total.
- Peter Kirby
Re: Basic Stylometry Beta (early access)
An advanced text editor like EditPlus is recommended for the time being. It has useful features such as "join"/"split" lines and the ability to 'pre-process' text in order to remove quotations, for example, with regular expressions. (I can share the regular expressions I'm using to remove quotes.)
https://www.editplus.com/
(IMO the best feature is really that EditPlus has never stalled, hung, or stuttered no matter how many MB of text I throw into it...)
I agree with the general idea that Jeff is talking about. What I ended up doing is putting each author (or each sample) on a single line in my text files. When 'word wrap' is turned off in EditPlus, it becomes very easy to scroll through them and copy them.
And I haven't tried them personally, but there are various Chrome extensions that give forms a "memory":
https://chrome.google.com/webstore/deta ... d?hl=en-US
https://chrome.google.com/webstore/deta ... fgno?hl=en
Thanks for sharing what you've learned here, Jeff.
Please let me know if you think of anything else.
(If the program is popular, I will want to spin up a new server--my blog becomes inaccessible if the program is busy churning through text!)
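For anyone without EditPlus, the "each author (or each sample) on a single line" layout can be approximated with a few lines of Python. This is only a sketch of the idea, not the workflow used in the post: the helper names are made up, and it assumes that collapsing every run of whitespace into one space is acceptable for your texts.

```python
import re

def to_single_line(text):
    """Collapse all runs of whitespace (including newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def one_sample_per_line(samples):
    """Join several multi-line samples into one string, one sample per line."""
    return "\n".join(to_single_line(s) for s in samples)

# Two short multi-line samples, used purely as demonstration input.
samples = ["ἐν ἀρχῇ ἦν\nὁ λόγος", "καὶ ὁ λόγος\n  ἦν πρὸς τὸν θεόν"]
print(one_sample_per_line(samples))
```

With word wrap turned off, each line of the result is one complete sample, so it can be selected and copied in one go.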
Re: Basic Stylometry Beta (early access)
Peter Kirby wrote: I agree with the general idea that Jeff is talking about. What I ended up doing is putting each author (or each sample) on a single line in my text files.
Hell yea heh!

Another potentially helpful tip I've found is that (at least in Chrome) you can right-click on the tab header and use "Duplicate tab" to make an additional copy of whatever state you have the program in right then (in other words, any text within the boxes IS duly duplicated, along with the program tab). And I don't think this causes any problems for the program's functioning (as far as I can tell).
- Peter Kirby
Re: Basic Stylometry Beta (early access)
Peter Kirby wrote: (I can share the regular expressions I'm using to remove quotes.)
Put everything on a single line first!
This removes anything in fancy curly double quotes.
Code:
“[^”]*”
This removes anything in curly single quotes.
Code:
‘[^’]*’
This removes anything in guillemets.
Code:
«[^»]*»
This removes anything between two closing guillemets.
Code:
»[^»]*»
This removes anything in angle brackets.
Code:
<[^>]*>
This removes square-bracketed text made up of spaces, Latin letters, digits, and the listed punctuation.
Code:
\[[ A-Za-z0-9\\\/\(\)\*\|\.\:\;\,\–\=\+\'\{\}\"\<\>\“\”\«\»\?\!\&\†\#]*\]
This removes any stray non-Greek text that is occasionally found in the TLG (with lower case letters). Enable 'case sensitive' first!
Code:
[A-Z]*[a-z][A-Za-z]*
"Another potentially helpful tip I've found is that (at least in Chrome) you can right-click on the tab header and use "Duplicate tab" to make an additional copy of whatever state you have the program in right then (in other words, any text within the boxes IS duly duplicated, along with the program tab). And I don't think this causes any problems for the program's functioning (as far as I can tell)."
Good one. I didn't know that.
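For those who would rather script the clean-up than run each search-and-replace by hand in an editor, the patterns from this post can be applied in Python roughly as follows. This is a sketch under assumptions, not the exact procedure used here: the function name is made up, the patterns are applied in the order they were posted, and the final whitespace tidy-up is an addition of the sketch. Python's `re` is case-sensitive by default, which matches the 'case sensitive' requirement for the last pattern.

```python
import re

# Patterns copied from the post, applied in the order given there.
REMOVAL_PATTERNS = [
    r"“[^”]*”",   # text in curly double quotes
    r"‘[^’]*’",   # text in curly single quotes
    r"«[^»]*»",   # text in guillemets
    r"»[^»]*»",   # text between two closing guillemets
    r"<[^>]*>",   # text in angle brackets
    # bracketed Latin text, digits, and punctuation
    r"""\[[ A-Za-z0-9\\\/\(\)\*\|\.\:\;\,\–\=\+\'\{\}\"\<\>\“\”\«\»\?\!\&\†\#]*\]""",
    # stray Latin-alphabet words containing at least one lowercase letter
    r"[A-Z]*[a-z][A-Za-z]*",
]

def strip_quotations(text):
    """Put the text on one line, then delete every match of each pattern."""
    text = re.sub(r"\s+", " ", text)  # "Put everything on a single line first!"
    for pattern in REMOVAL_PATTERNS:
        text = re.sub(pattern, "", text)
    # Assumption of this sketch: tidy the gaps left behind by the deletions.
    return re.sub(r" {2,}", " ", text).strip()

raw = "λόγος “quoted\nτέκνον” καὶ <note> θεός [cf. 1.2] abc"
print(strip_quotations(raw))
```

Joining the lines first matters because `[^”]*` can only reach a closing quote mark on a later line if the editor's search is allowed to span lines; in Python the negated class crosses newlines anyway, so the join mainly keeps the output in the one-sample-per-line shape.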
- Ben C. Smith
- Posts: 8994
- Joined: Wed Apr 08, 2015 2:18 pm
- Location: USA
- Contact:
Re: Basic Stylometry Beta (early access)
Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?
ΤΙ ΕΣΤΙΝ ΑΛΗΘΕΙΑ
- Peter Kirby
Re: Basic Stylometry Beta (early access)
Ben C. Smith wrote: Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?
Yes. I will prepare a nice set of them, in a single text file formatted one to a line.
- Ben C. Smith
Re: Basic Stylometry Beta (early access)
Ben C. Smith wrote: Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?
Peter Kirby wrote: Yes. I will prepare a nice set of them, in a single text file formatted one to a line.
That would be most helpful! Thanks.
Ben.
- Peter Kirby
Re: Basic Stylometry Beta (early access)
Ben C. Smith wrote: Peter, do you happen to have more text files available with Greek samples, like those samples you provided up the thread of Josephus, Origen, Mark, and Justin?
Peter Kirby wrote: Yes. I will prepare a nice set of them, in a single text file formatted one to a line.
Ben C. Smith wrote: That would be most helpful! Thanks. Ben.
Okay, here it is.

- Attachments
- greekcompendium.txt (15.6 MiB), downloaded 618 times