Literary Lab Pamphlet 11 - Stanford Literary Lab

Canon/Archive. Large-scale Dynamics in the Literary Field

Mark Algee-Hewitt, Sarah Allison, Marissa Gemma, Ryan Heuser, Franco Moretti, Hannah Walser

Literary Lab
Pamphlet 11
January 2016

[Cover figure: median TTR of 1000-word slices for text, by date of publication, 1795-1890]

Pamphlets of the Stanford Literary Lab
ISSN 2164-1757 (online version)

I. Sociological Metrics

1. Dowry and vegetables

Of the novelties introduced by digitization in the study of literature, the size of the archive is probably the most dramatic: we used to work on a couple of hundred nineteenth-century novels, and now we can analyze thousands of them, tens of thousands, tomorrow hundreds of thousands. It’s a moment of euphoria, for quantitative literary history: like having a telescope that makes you see entirely new galaxies. And it’s a moment of truth: so, have the digital skies revealed anything that changes our knowledge of literature?

This is not a rhetorical question. In the famous 1958 essay in which he hailed “the advent of a quantitative history” that would “break with the traditional form of nineteenth-century history”, Fernand Braudel mentioned as its typical materials “demographic progressions, the movement of wages, the variations in interest rates [...] productivity [...] money supply and demand.”2 These were all quantifiable entities, clearly enough; but they were also completely new objects compared to the study of legislation, military campaigns, political cabinets, diplomacy, and so on. It was this double shift that changed the practice of history, not quantification alone. In our case, though, there is no shift in materials: we may end up studying 200,000 novels instead of 200, but they’re all still novels. Where exactly is the novelty?

199,000 books that no one has ever studied – runs the typical answer – how could there not be novelties? It’s a whole new dimension of literary history.

1 This project has been supported by a grant from the Fondation Maison des Sciences de l’Homme of Paris and the Mellon Foundation; the research was conducted in collaboration with a group working at the Sorbonne, in the Labex OBVIL.

2 Fernand Braudel, “History and the Social Sciences: The Longue Durée”, in On History, Chicago 1980, p. 29.

“We know more about people exchanging goods for reasons of prestige than about the kinds of exchanges that go on every day”, wrote André Leroi-Gourhan in Gesture and Speech, a few years after Braudel; “more about the circulation of dowry money than about the selling of vegetables...”3 Dowry and vegetables: perfect antithesis. Both are important, but for opposite reasons: dowry, because it happens once in a lifetime; vegetables, because we eat them every day. And at first sight, it seems like the perfect parallel for the 200 and the 200,000 novels. But as soon as we start looking deeper into the matter, complications arise. Take two historical novels published in the same year of 1814: Walter Scott’s Waverley, and James Brewer’s Sir Ferdinand of England. Intuitively, one would associate Waverley with the prestige of the dowry, and Sir Ferdinand with the humble role of chicory. In fact, though, Scott’s novel was both a great formal breakthrough and the book everybody was reading all over Europe: dowry and vegetables, rolled into one. But if that is the case, what difference can all the Sir Ferdinands of the digital archive make? We used to know nothing about them, and now we know something. Good. Does this knowledge also make a difference?4

3 André Leroi-Gourhan, Gesture and Speech, 1965, Cambridge 1993, p. 148.

4 It might not. In a piece forthcoming in a special issue of MLQ on “Scale and Value”, James English has convincingly argued that “a sample gathered on the principle that every individual work of new fiction must hold equal value in the analysis” – that is to say, a sample very similar to our “archive” – is actually not very “suitable for a sociology of literary production, where 'production' is understood to mean not merely (or even primarily) the production of certain kinds of texts by authors but the production of certain kinds of value by a social system, whose agents include readers, reviewers, editors and booksellers, professors and teachers, and all the many moving pieces of literature’s institutional apparatus.” The fact that, when the present pamphlet turned to the study of the archive, it ended up focusing almost exclusively on the “production of certain kinds of texts” seems clearly to corroborate English’s thesis. On the other hand, in so far as a “social system” creates “value” not only by assigning it to certain authors or texts, but also by denying it to others (“In matters of taste, more than anywhere else, all determination is negation; and tastes are perhaps first and foremost distastes”: Bourdieu, Distinction), readers and the rest of “literature’s institutional apparatus” are present in our narrative – but always and only with a destructive role.

Let us illustrate the problem with one of the findings from our own research: the decline of the semantic field of “abstract values” – words like “modesty”, “respect”, “virtue” and so on – described by Ryan Heuser and Long Le-Khac in “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method” (Figure 1.1). As that punctilious 2,958 makes clear, Heuser and Le-Khac saw the width of the archive as a crucial aspect of their research. Had they studied the old, narrower canon instead, would their results have changed? Figure 1.2 provides the answer: no. The canon precedes the archive by about 15-20 years; but the historical trajectory is the same.

[Figure 1.1: Abstract values in British novels, 1785-1900. Panel title: “Abstract Values”; x-axis: year.]

Ryan Heuser and Long Le-Khac, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method”, Literary Lab Pamphlet 4, 2012, p. 18.

[Figure 1.2: Abstract values, canon, and archive in British novels, 1750-1900. Y-axis: aggregate word frequency; x-axis: year; groups: canon, archive.]

In this figure, the canon consists of the 250 novels originally included in the Chadwyck-Healey Nineteenth-Century Fiction Collection. We explain the choice of Chadwyck-Healey in section 3 below.
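Figure 1.2 compares canon and archive through the aggregate frequency of a semantic field. As a rough illustration of what such a measure involves, the sketch below computes, for each text, the share of its tokens that belong to a word list, and then averages that share by group and decade. It is a sketch under stated assumptions, not Heuser and Le-Khac’s semantic cohort method: the three-word list uses only the examples named above, the tokenizer is deliberately crude, and the decade binning and all function names are hypothetical.

```python
import re
from statistics import mean

# Hypothetical three-word stand-in for the "abstract values" field:
# only the examples named in the text, not the published word list.
ABSTRACT_VALUES = {"modesty", "respect", "virtue"}

def tokenize(text):
    """Lowercase, then keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def field_frequency(text, field=ABSTRACT_VALUES):
    """Share of the text's tokens that belong to the semantic field."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in field) / len(tokens)

def group_means_by_decade(novels, field=ABSTRACT_VALUES):
    """novels: iterable of (year, group, text), group being e.g. "canon" or "archive".
    Returns {(group, decade): mean field frequency of the texts in that bin}."""
    bins = {}
    for year, group, text in novels:
        decade = year // 10 * 10  # hypothetical choice: bin by decade
        bins.setdefault((group, decade), []).append(field_frequency(text, field))
    return {key: mean(values) for key, values in bins.items()}

print(field_frequency("Virtue is its own reward"))  # 0.2
```

Plotting these means for each group over time would give curves of the kind compared in Figure 1.2. Averaging per text, as done here, makes every novel count equally regardless of its length; pooling all tokens of a group instead would let long novels weigh more.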

This does not mean that the new archive contains no new information; it means, however, that we must still learn to ask the right type of questions. But before doing so, something needs to be clarified. Canon and archive: what do we mean by these two words?

2. Bias in the Archive

Let’s begin with three preliminary notions: the published, the archive, and the corpus. The first is simple: it’s the totality of the books that have been published (the plays that have been acted, the poems that have been recited, and so on). This literature that has become “public” is the fundamental horizon of all quantitative work (though of course its borders are fuzzy, and may be expanded to include books written but kept in a drawer, or rejected by publishers, etc.). The archive is for its part that portion of published literature that has been preserved – in libraries and elsewhere – and that is now being increasingly digitized. The corpus, finally, is that portion of the archive that is selected, for one reason or another, in order to pursue a specific research project. The corpus is thus smaller than the archive, which is smaller than the published: like three Russian dolls, fitting neatly into one another. But with digital technology, the relationship between the three layers has changed: the corpus of a project can now easily be (almost) as large as the archive, while the archive is itself becoming – at least for modern times – (almost) as large as all of published literature. When we use the term “archive”, what we have in mind is precisely this potential convergence of the three layers into one; into that “total history of literature”, to borrow an expression from the Annales, that used to be a mirage, and may soon be reality.

This, in theory. In practice, things are not so simple. Take the present project. Its initial corpus consisted of about 4,000 English novels from 1750 to 1880; for the eighteenth century, they came from ECCO; for the nineteenth, from the Chadwyck-Healey Nineteenth-Century Fiction corpus and the Internet Archive of the University of Illinois.5 By the old standards of literary history, 4,000 novels were a very large corpus; but its actual coverage turned out to be quite uneven. For the period 1770-1830, for instance, we had about one third of the titles listed in the Raven-Garside-Schöwerling bibliography; for the later nineteenth century, however, the percentage was much lower, around 10%. The same held for specific genres: we held 96% of Adburgham’s silver-fork bibliography, but only 77% of Gallagher’s industrial novels, 53% of Stevens’ historical novels before Scott, and 35% of Perazzini’s gothic bibliography.6

5 See https://archive.org/details/19thcennov. ECCO (Eighteenth Century Collections Online) is a two-part digital collection of 18th century materials, based on the English Short Title Catalogue (ESTC), and sourced from a number of libraries in the US and UK; part II of ECCO is an update, consisting of texts or editions that were not available when the original ECCO was released.

Clearly, these were slippery statistical grounds. Compared to the handful of texts usually considered canonical, our 190 gothic novels were a very large number, and it was tempting to identify them with the archive tout court; but were they truly representative of the “population” of the English gothic as a whole? Almost certainly not; simplifying somewhat, a sample is representative when it has been randomly chosen from a given population; but our 190 novels had definitely not been chosen that way. Ultimately, they all came from a few great libraries – and libraries don’t buy books in order to have representative samples; they want books they consider worth preserving. Good books; good, according to principles that are likely to be similar to those that lead to the formation of canons. Though our corpus was twenty times larger than the traditional canon, then, it was perfectly possible that its principle of selection would make it resemble the canon much more than the archive as a whole. That was the problem.7
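The difference between a representative sample and a library-selected corpus can be made concrete. The sketch below, whose function names are hypothetical, draws a simple random sample from a bibliography (every title equally likely, drawn without replacement) and measures what share of it a given set of holdings already covers. It then redoes the arithmetic of the sample described in this pamphlet, using its reported counts: 507, 82 and 85 titles drawn; 145, 35 and 35 of them found in the Lab’s database; around 300 more held by HathiTrust and Gale. Two caveats: the 85 historical novels were Stevens’ entire list rather than a random draw, and the 300 is approximate.

```python
import random

def draw_sample(bibliography, k, seed=None):
    """Simple random sample of k titles, without replacement.
    Every title has the same chance of selection; a seed makes the draw reproducible."""
    rng = random.Random(seed)
    return rng.sample(list(bibliography), k)

def coverage(sample, holdings):
    """Share of the sampled titles that are present in `holdings`."""
    held = set(holdings)
    return sum(1 for title in sample if title in held) / len(sample)

# Counts reported for the 2014 sample (historical novels: Stevens' full list).
drawn = {"Raven-Garside": 507, "gothic": 82, "historical": 85}
found_in_lab = {"Raven-Garside": 145, "gothic": 35, "historical": 35}
later_from_hathi_and_gale = 300  # approximate: "around 300 texts"

total = sum(drawn.values())
initial = sum(found_in_lab.values())
print(total, initial, round(initial / total, 3))                  # 674 215 0.319
print(round((initial + later_from_hathi_and_gale) / total, 3))    # 0.764
```

The first ratio is the “initial 30%” of the random sample; the second falls within the 70-80% that the Hathi and Gale volumes were expected to reach.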

We wanted our results to be reliable, hence we generated a random sample of the field to be studied: 507 novels tout court for the period 1750-1836, 82 gothic novels, and 85 historical novels before Scott.8 All in all, 674 novels. In the digital age, this wouldn’t take long. We generated the sample at the end of the school year, in June 2014. Then we turned to our own database, where we found 35 of the 82 gothic novels, 35 of the 85 historical novels, and 145 of the 507 novels from the Raven-Garside bibliographies. In early July, we passed the list of the titles we had not found – roughly 460 – to Glen Worthey and Rebecca Wingfield, at the Stanford Libraries, who promptly disentangled it into a few major bundles. Around 300 texts were held (in more or less equal parts) by HathiTrust and by Gale (through NCCO and ECCO II).9 Another 30 were in collected works, in alternate editions, concealed by slightly different titles, in microfiche or microfilm collections, etc.; about 100 existed only in print, and of 10 novels there were no known extant copies. In August, requests were sent to Hathi and Gale – with both of which Stanford has a long-standing financial agreement – for their 300 volumes. Of the 100 novels existing only in print, about half were held by the British Library, in London, which a few months earlier had kindly offered the Literary Lab a collection of 65,000 digitized volumes from its collections; unfortunately, none of the books we were looking for was there. The special collections at UCLA and Harvard, which held about 50 of the books, sent us a series of estimates that ranged (depending, quite reasonably, on the condition of the original, and on photographic requirements which could be very labor-intensive) from $1,000 to $20,000 per novel; finally, six novels were part of larger collections held by ProQuest, and would have cost us – despite ProQuest’s very generous 50% discount – $147,000, or roughly $25,000 per title.10 Remember: this was a search involving many excellent librarians in London, Cambridge, Los Angeles, and of course at Stanford; a half dozen researchers at the Literary Lab; plus people at Hathi, Gale, and so on. The books we were looking for were only two centuries old; they had had print runs of at least 750-1,000 copies, and in a part of the world which, at the time, already possessed efficient libraries. The Literary Lab has some money for research (though, make no mistake, not that kind of money). In other words, one could hardly hope for better resources. And yet it took about six months to receive from Hathi and Gale the set of texts that should have allowed us to move from the initial 30% to around 70-80% of the random sample:11 a figure which would probably make many of our findings questionable, as the missing 20-30% would be, almost by definition, furthest from all conceivable forms of canonization. Clearly, the idea that digitization has made everything available and cheap – let alone “free” – is a myth. As we became slowly aware of this fact, we decided to start working with a selection from the corpus we had: a database of 1,117 works, 263 from Chadwyck-Healey, and 854 from various archival sources. Initial results took us quickly in one direction; new findings added further momentum; and, by the time the (near-)random sample was (almost) available, we were too involved in the work to restart from zero. We don’t present this as an ideal model of research, and are aware that our results are weaker as a consequence of our decision. But collective work, especially when conducted in a sort of “interstitial” institutional space – as ours still is – has its own temporality: waiting months and months before research can begin would kill any project. Maybe in the future we will send out a scout, a year in advance, in search of the sample. Or maybe we will keep working with what we have, acknowledging the limits and flaws of our data. Dirty hands are better than empty.

6 Alison Adburgham, Silver Fork Society, London 1983; Catherine Gallagher, The Industrial Reformation of English Fiction, Chicago 1985; Anne H. Stevens, British Historical Fiction Before Scott, London 2010; Federica Perazzini, Il Gotico @ Distanza, Roma 2013.

7 To complicate matters further, different genres have different canon-to-archive ratios: whereas epistolary and silver-fork novels have relatively large archives and small canons, the opposite is true of the industrial novel and the Bildungsroman, both of which attracted many major Victorian writers; while the two super-genres of gothic and historical novels lie somewhere in between the two extremes. On this – and much else – we need a lot more empirical evidence.

8 This last group was not a random sample: since Anne Stevens’ bibliography included only 85 pre-Scott historical novels, we decided to look for all of them.

9 HathiTrust is a partnership of major research libraries, which serves as a repository for digital collections; these include volumes scanned as part of the Google project and the Internet Archive, as well as other smaller local projects. Gale's NCCO (Nineteenth Century Collections Online) is a digital collection of 19th-century materials, usually sourced from major collections, and ranging across disciplines (literature, science and technology, photography, etc.). Thus far, there are twelve parts to NCCO, one of which consists of the Corvey novel collection; unlike ECCO, NCCO is not based on a standard bibliography in the field, so it's hard to predict what is being added. Gale is a large conglomerate of information and education services – run as a for-profit business – that sells content and services to libraries; it publishes both print works (reference and fiction) and electronic collections (ECCO, NCCO, and others). Its parent company is Cengage Learning, which defines itself as “a leading educational content, technology, and services company for the higher education and K-12, professional and library markets worldwide”.

10 To these figures one should add what the Stanford libraries have paid for ECCO, ECCO II, and NCCO to begin with: with the usual generous discounts, something like one million dollars for the three collections. ProQuest is another for-profit education service whose products include the Historical Newspapers series, Literature Online, Dissertation Abstracts, and others. Its parent company is Cambridge Information Group.

11 “Should have allowed”, because receiving a text from these collections is not the same as being able to work on it. Much of the data from Chadwyck-Healey and ECCO I used to be delivered on tape, in formats requiring drives that are both hard to find and difficult to use; more “convenient” data deliveries (such as network data transfer, or on external hard drives) have their own problems, ranging from the vagaries of mail systems to bizarre firewall incompatibilities and odd documentary requirements of usage agreements. (Most of Stanford libraries’ licensing agreements, for instance, used to be quite vague on the subject of text-mining, or sharing outside the Library preservation structures; over the past five years libraries have explicitly insisted on the inclusion of text-mining rights in current licenses, but previous agreements remain in a gray area.) Finally, extracting data from an ocean of tape or hard drive, with insufficient or incorrect metadata and no database to assist, is a truly Byzantine process. The Libraries would search the ECCO database – for instance – using Gale's search interface, and citing its URL as that interface instructs. But for the Libraries to get a raw file to the Lab, they need to go through a couple of hard drives (or tapes) containing hundreds of thousands of directories named only with series of random numbers; the metadata “manifest” that Gale delivers with these raw files is contained in about ten Microsoft Word files formatted as if for print: two columns, authors in bold, very basic catalog data, a document ID, an ESTC ID, and a directory path. First, these documents are immense: ECCO II, Literature and Language module, Authors L-Z – which represents about 1/10th of the ECCO II delivery – is a 2,750-page document. Second, the ID numbers included are not the ones that you see in the Gale interface; they are internal, invisible numbers. So, despite all the Lab’s work in identifying ECCO sources using the database and noting the official Gale ID number, the Libraries have had to re-search each item by author or title in order to find the name of a file to copy: that Gale ID number is not included at all in the file manifest. “My lesson”, concluded a research librarian who assisted through the whole process, “is this: even when we've found the file you need, we still haven't really found the file”.

3. From the Canon to the Literary Field

If the selection of our archive was determined by historical library practices (which novels were on the shelves? which were easy to digitize?), that of our canon was a matter of critical judgment – though not our own. The first canon we turned to in this project, the Chadwyck-Healey Nineteenth-Century Fiction Collection, was designed by an editorial board of two, Danny Karlin and Tom Keymer.12 It is a set of about 250 novels chosen for being so very worth preserving, and so valuable to scholars, that libraries would pay for digital access to the set. Compiled in the late 1990s, with new novels added subsequently, the Nineteenth-Century Fiction Collection is presented by its marketing materials as one that “represents the great achievements of the Victorian canon and reflects the landmarks of the period,” while also covering “many neglected or little-known works, most of them out of print or difficult to find.” From 1794, for example, the collection includes Ann Radcliffe’s Mysteries of Udolpho and William Godwin’s Caleb Williams, but also Jane Austen’s Lady Susan (a very short novel probably written around then, but published posthumously in 1871), and Thomas Holcroft’s radical Adventures of Hugh Trevor. The first two are obvious choices; the other two less so. It seems that selecting 250 texts makes room for lesser-known novels of critical or historical importance: not only the six major Austen novels, but also Lady Susan; not only Godwin, but also Holcroft. In so far as we understand a “canon” to signal a relatively small number of texts selected and consecrated for close study, Chadwyck-Healey – a major searchable collection immediately available to researchers today –13 is not a bad proxy. Still, a proxy it is; and we realized that relying on a single source was the wrong way to think about such a many-sided and elusive concept as that of the canon. In “Between Canon and Corpus: Six Perspectives on 20th-Century Novels” (Literary Lab Pamphlet 8, 2015), Mark Algee-Hewitt and Mark McGurl had addressed a similar problem by presenting several lists of “best twentieth-century novels” selected by very different groups, and then analyzing their varying degrees of proximity.

12 Personal communication with Steven Hall confirmed that the editors were unconstrained in their choice of texts.

13 Provided, that is, that said researchers belong to an institution with the necessary resources. According to one university’s ProQuest representative, in the entire world there are only “over 600” universities which subscribe to the Literature Online (LION) database.

We followed a different path, which led us from Chadwyck-Healey’s short catalogue of books to two long lists of authors: those mentioned by the Dictionary of National Biography, and those listed as “primary subject author” for twentieth-century academic articles indexed by the MLA Bibliography; in a lateral project, we also added the texts included in the Stanford Ph.D. exam lists of the last 30 years. In doing so, we were neither looking for the “right” definition of the canon (which none of them was), nor hoping that the DNB, MLA, and Stanford would agree with each other (which they didn’t).14 Rather, these different measurements were meant to replicate the multiple aspects of the idea of the canon: the fact that the national culture (DNB) defines it in one way, and international scholarship (MLA) in a somewhat different one; that it may be conceived of as a series of personalities (DNB and MLA), or as a collection of texts (Ph.D. lists). The specific choices remained questionable – of course! – but the criteria that we had followed were multiple, explicit, and measurable. That was the novelty.

14 Even leaving aside the representativeness of the Stanford Ph.D. exams, the author-centered approach of the DNB and MLA places Scott’s Castle Dangerous, or Thackeray’s Catherine, on the same plane as Waverley and Vanity Fair, which cannot be right. But alternative criteria have similar flaws, or are impossibly time-consuming.

Then, we realized that there were other features of the novelistic field that could enter the equation. In their bibliographies, Raven and Garside had for instance identified the novels which had been reprinted in the British Isles, or translated into French and German between 1770 and 1830; and one could envisage similar data for future research – from print runs to presence in circulating libraries and more. In these cases, too, the criteria would be multiple, explicit, and measurable; but with a major difference from the DNB and MLA. Reprints and translations measure the appeal of novels for a “general” audience, and through the institutions of the literary market; DNB and MLA focus on “specialized” readers, and institutions of higher education. One measures the “popularity” of novels; the other, their “prestige”.15

15 That popularity is measured on nineteenth-century data, and prestige is derived from twentieth-century sources, is of course a problem. Twentieth-century studies have it better in this respect: in “Becoming Yourself: the Afterlife of Reception” (Literary Lab Pamphlet 3, 2011), for instance, Ed Finn charted the position of contemporary authors in the American literary field by using two categories – “consumption” and “conversation” – that belonged to the same chronological frame: “consumption” derived from Amazon.com “also bought” data, and “conversation” from contemporary reviews. Interestingly, “consumption” and “conversation” align rather well with our “popularity” and “prestige”; while the six “canons” discussed by Algee-Hewitt and McGurl also gravitate around market success on one side, and more “qualified” cultural selection on the other. In an attempt to correct the discrepancy between nineteenth- and twentieth-century data, follow-up studies may enlarge the prestige metrics by taking into account textbooks and anthologies for the school (as Martine Jey is doing for France), prizes (James English, The Economy of Prestige), reviews from eighteenth- and nineteenth-century periodicals, or early collections of novels such as Barbauld’s, Ballantyne’s, and Bentley’s. It is by no means certain, however, that collections and reviews should be seen as indicators of prestige, rather than as mere cogs in the developing novelistic market; in an interesting recent essay, Michael Gamer has made a case for both possibilities, by presenting them as having canonical ambitions, while also competing in the commercial market. (See “A Select Collection: Barbauld, Scott, and the Rise of the (Reprinted) Novel”, in Jillian Heydt-Stevenson and Charlotte Sussman, eds, Recognizing the Romantic Novel, Liverpool 2008.) William St Clair, for his part, has expressed unambiguous skepticism about the role of reviews (“in general, the influence of the reviews appears to have been greatly exaggerated both at the time and by subsequent writers [...] I can discern no correlation between reviews, reputations, and sales”), and about the concept of novelistic prestige in the early 19th century: “As far as the prose fiction of the romantic period is concerned, there was no recognized contemporaneous canon. Indeed, the whole notion of a canon made little sense when most novels were published anonymously. One author dominated the age, ‘the author of Waverley’, not publicly acknowledged to be the famous poet Sir Walter Scott until the mid-1820s.” See William St Clair, The Reading Nation in the Romantic Period, Cambridge 2004, p. 189. On the other hand, the existence of a relationship between reviews and reputation has been recently – and convincingly – proposed by Ted Underwood and Jordan Sellers in “How Quickly Do Literary Standards Change?” http://figshare.com/articles/How_Quickly_Do_Literary_Standards_Change_/1418394. Underwood and Sellers study poetry instead of novels, and start their investigation in 1820, when St Clair’s book and our own corpus more or less end; too much of a mismatch in object and time frame for a direct comparison. But we are slowly approaching the moment when evidence from independent studies may be successfully compared and integrated.

Popularity and prestige. With this conceptual pair, our research found itself on the same terrain as Bourdieu’s path-breaking chart of the French literary field (Figure 3.1). By placing popularity data on the horizontal (“high/low economic profits”) axis, and prestige data along the vertical (“high/low consecration”) one, we could provide a “British” version of Bourdieu’s chart. For now, this covered only a single genre, and a handful of decades; but at this point, an empirical cartography of the literary field was no longer a daydream (Figure 3.2).

Figure 3.1. The French literary field at the end of the nineteenth century

Bourdieu’s diagram of the literary field, though wonderfully suggestive, offers no empirical evidence for the specific position of the various genres and movements. The absence of explicit and measurable criteria is probably the reason why – despite its elegance, and its wide influence – Bourdieu’s chart has never become a genuine research tool, replicated and adapted by other scholars. The hard-to-believe regularity of the distribution, so unlike those of Figures 3.2 and 3.3, and of Bourdieu’s own diagrams in Distinction, is itself probably a consequence of the speculative foundation of the diagram. Pierre Bourdieu, The Rules of Art: Genesis and Structure of the Literary Field, 1992, Stanford 1996, p. 122.

Figure 3.2. The British novelistic field, 1770–1830 (horizontal axis: Z_popularity; vertical axis: Z_prestige)

Results for the popularity axis are based on the number of reprints (in the British Isles) and of translations (into French and German); for the prestige axis, they are based on the number of mentions as “primary subject author” in the MLA Bibliography, and on the length of DNB entries.

The position of writers is determined by the number of standard deviations above the mean of the field; John Galt, for instance, is 7.5 standard deviations above the mean on the popularity axis, and 1 above the mean on the prestige axis; at the opposite extreme, Percy Shelley is 10 standard deviations above the mean in terms of prestige, but slightly below the field’s mean in terms of popularity.

In Figure 3.2, all data are dwarfed by Walter Scott’s incredible scores: only two novelists are slightly higher than him on the prestige axis (Goethe and Austen), and no one is even close in terms of popularity: the next author along that axis – Thomas Day, author of the Rousseauian bestseller The History of Sandford and Merton (1789) – is seven standard deviations below Scott.16 Once the out-of-scale results of “the author of Waverley” are removed from the picture, however, a tri-partition of the British novel becomes clearly visible (Figure 3.3).

Figure 3.3. The three regions of the British novelistic field, 1770–1830 (horizontal axis: Z_popularity; vertical axis: Z_prestige)

The three regions of this diagram express variable relationships between popularity and prestige. The area near the vertical axis has prestige scores at least twice as high as the scores for popularity; the area near the horizontal axis is its mirror image, with popularity at least twice as high as prestige; while in the central area the two sets of measurements tend to balance each other.

A study of popularity and prestige on a much larger time-scale is currently in progress at the Literary Lab, directed by J.D. Porter, with data collected both algorithmically and by a team of undergraduate researchers led by Micah Siegel.
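The figure note on writers’ positions says each author is placed by the number of standard deviations above the field’s mean on each axis. A minimal sketch of that standardization follows, assuming raw per-author scores are already available (the pamphlet does not say exactly how reprints, translations, MLA mentions, and DNB entry length are combined) and assuming the population standard deviation of the whole field; the author labels and counts are invented for illustration.

```python
from statistics import mean, pstdev

def z_scores(raw: dict[str, float]) -> dict[str, float]:
    """Express each author's raw score as standard deviations from the field mean."""
    values = list(raw.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD over the whole field (an assumption)
    if sigma == 0:
        raise ValueError("all scores are identical, so z-scores are undefined")
    return {author: (score - mu) / sigma for author, score in raw.items()}

# Hypothetical raw popularity counts (e.g. reprints plus translations) for a toy field.
popularity = {"A": 2, "B": 4, "C": 4, "D": 4, "E": 5, "F": 5, "G": 7, "H": 9}
z = z_scores(popularity)
for author, score in sorted(z.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{author}: {score:+.1f} SD")
```

Running the same function separately on the prestige counts gives the second coordinate; because each axis is standardized against its own field, an author can be far above the mean on one axis and below it on the other, as in the Shelley example.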

Let’s begin with the group near the horizontal axis: writers with high popularity scores – 5, 8, 10, 13 standard deviations above the average – but quite low on prestige: at most a couple of standard deviations, but often just one, or less. Here we find Mackenzie’s sentimental Man of Feeling and Day’s educational best-seller; the gothic cohort, with their frequent sentimental overtones (Radcliffe, Reeve, Roche, Helme, Maturin), Jacobin and anti-Jacobin novels (Charlotte Smith, Opie), national tales (Edgeworth, Morgan), and the new hegemonic form of the historical novel (Galt, Genlis, Horace Smith, Porter, Cooper). We could call this the space of genre, in the sense of all genres: “the” novel unfolding as a family of distinct forms, whose easily recognizable conventions pave the way to market success. Waverley’s opening chapter, entirely devoted to generic allusions in titles, is the perfect symptom of this state of affairs.

Moving “up” from this region to the central part of the diagram takes us into very different territory. If one is ever justified in simply saying, “Here is the canon”, this must be the case: Defoe, Richardson, Fielding, Sterne, Goldsmith, Smollett, Burney, Godwin ... All of them, clustered in a perfectly balanced space (4-to-7 standard deviations above the popularity mean, and 3-to-8 above the prestige one), where the wide audience of formula fiction blends seamlessly with high cultural recognition. Looking at this central region makes you “see” the process of canonization as the combination of two simultaneous processes: popularity slowly shrinking with the passing years along the horizontal axis – in that respect, most eighteenth-century giants are well below Roche, Porter, Charlotte Smith, and Opie – while prestige increases along the vertical one.17 Though there is clearly more than one way of becoming a canonical writer,18 the main lesson of this image is that the canon is not “the economic world reversed” of Bourdieu’s formula for the autonomous literary field; the canon – or at least this canon – is made of authors from whom commercial publishers are still expecting to make profits two or three generations after their initial success. And prestige, for its part, is not necessarily in antithesis to popularity; here, it seems rather to grow out of it, “distilling” economic returns into something more impalpable, but also more durable.19

Things are different in the “high-prestige” region of Figure 3.3, which is clearly dominated by foreign writers (Cervantes, Voltaire, Diderot, Rousseau, Goethe, Schiller, Hugo...), or by those British authors who, though they did write at least one novel, or even a few, can hardly be seen as “professional” novelists. Among them are the encyclopedic figure of Samuel Johnson, and the almost equally versatile Horace Walpole; poets like Percy Shelley (and, lower down, Thomas “Anacreon” Moore and James Hogg); the novelist-politician Disraeli and the politician-politician Lord Russell (who published an improbable Nun of Arrouca in 1822); essayists like James Boswell and Charles Lamb; at lower prestige levels, the musician and playwright Charles Dibdin, the playwright and actress Charlotte Cibber Charke, the economist and travel writer Arthur Young. Among the few novelist-novelists, politics plays an unusually strong role: aside from Russell and Disraeli, we encounter the bluestocking Sarah Scott (Millennium Hall and Desmond), Mary Shelley, and Hannah More – whose Coelebs in Search of a Wife, legend has it, was the only novel Queen Victoria entirely approved of.

16 Since we are not measuring print runs, the chart actually understates Scott’s popularity: whereas most contemporary novels had a first run of 1,000 copies, the first three Waverley novels had opening runs of 6,000, 8,000, and 10,000 respectively.

17 In terms of shrinking popularity, Austen and her contemporaries would provide a perfect case study: as Figure 3.2 shows, about 25 authors (one third of them from the eighteenth century) were more popular than Austen in the sixty years covered by the diagram. As nineteenth-century novelistic bibliographies become more reliable, we will know how many of them were still more popular than her a generation or two later (initial results from the 1830s and 1840s suggest: Scott, and no one else).

18 Scott’s immediate fame and acclaim are different from Austen’s significantly slower pace, or from the ambiguous status of authors long confined to specific niches because of their initial audience (Carroll) or genre (Radcliffe, Doyle). And then, of course, there is the nemesis of any general theory of the canon – Moby-Dick.

19 Although our findings are completely different from Bourdieu’s idea of the French literary field, they don’t necessarily falsify his thesis, as we are working only on novels (to the exclusion of poetry, drama, magazines, and so on), and on a different country and period. Truth be told, we need many empirical maps of literary fields (plural), from different cultures and epochs, for the “literary field” (singular) to become a solid historical concept.
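The note to Figure 3.3 defines its three regions by a simple ratio: prestige at least twice popularity near the vertical axis, popularity at least twice prestige near the horizontal axis, and rough balance in between. The sketch below applies that rule to a pair of z-scores. Two choices are assumptions, not the Lab’s stated procedure: authors below a cut-off on both axes are left unclassified (echoing the two-to-three standard deviation cut-off mentioned in note 20), and Shelley’s “slightly below the mean” popularity is approximated as -0.2; Galt’s coordinates come from the figure note.

```python
def region(z_pop: float, z_pres: float, cutoff: float = 2.0) -> str:
    """Place an author in one of the three regions of Figure 3.3 (a sketch).

    z_pop and z_pres are standard deviations above the field mean on the
    popularity and prestige axes.
    """
    # Assumption: authors below the cut-off on both axes are left unclassified.
    if max(z_pop, z_pres) < cutoff:
        return "below cut-off"
    if z_pres >= 2 * z_pop:
        return "high prestige"          # near the vertical axis
    if z_pop >= 2 * z_pres:
        return "high popularity"        # near the horizontal axis
    return "central (canonical)"        # the two scores roughly balance

examples = {
    "Galt (values from the figure note)": (7.5, 1.0),
    "Percy Shelley (popularity approximated as -0.2)": (-0.2, 10.0),
    "hypothetical balanced author": (5.0, 5.0),
}
for name, (zp, zs) in examples.items():
    print(f"{name}: {region(zp, zs)}")
```

Because at least one score must clear a positive cut-off, the two “twice as high” tests can never both succeed, so every classified author lands in exactly one region.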

With the prestige/popularity diagrams, a first arc of our project had found its natural conclusion. Although, against our original intentions, we had ended up quite far from the archive,20 our operationalization of the concept of the canon had been both surprising and satisfying: it had brought the notion down to earth, resolving it into the simpler elements of popularity and prestige – or, in plainer words: of the market and the school. Within these new coordinates, the canon remains as visible as ever, but it loses its conceptual autonomy, becoming the contingent outcome of the encounter between opposite forces. It is these forces, then, that deserve to be further investigated, if one wants to know more about the canon;21 and future research might easily add print runs and presence in the circulating libraries to the popularity metrics, and excerpts from textbooks, or mentions in the non-fiction archive, to the prestige ones.22 With each new addition, we will acquire a better sense of the composite nature of the canon – and of its historical nature, too: the canon of 1770-1830 (and, we suspect, of the following 70-80 years) was the product of the happy age of the European bourgeoisie, when the imperatives of success and education could be seen as compatible with each other, as was appropriate for a ruling class which, for the first time in history, felt at home in the market as well as the school. To have made the dual nature of the nineteenth-century canon intuitively “visible” – such is the achievement of these initial sections.23

20 In Figures 3.2–3.3, which have as their cut-off point two or three standard deviations above the mean of the field, all authors in the high prestige and in the middle area, and about half of those in the high popularity area, can be considered canonical. As one descends “lower”, the field’s tri-partition remains visible a little longer, then disappears. What happens then is a fascinating question – for another study.

21 Or more precisely: if one wants to decompose the concept of the canon into the two underlying elements of popularity and prestige. Here, it’s worth comparing the initial epistemological choice of this project with that of Algee-Hewitt’s and McGurl’s “Between Canon and Corpus”. The main difference is not that between texts (“Between Canon and Corpus”) and authors (“Canon/Archive”) – which could be easily ironed out – but between an analysis based on networks, and one based on Cartesian diagrams. Networks are much better at investigating the relationships among individual nodes (the hyper-canonical cluster identified in Figure 3 of the study, the singular centrality of Grapes of Wrath, the disconnect between bestsellers and the other groups), but cannot connect the nodes to anything outside the network itself. Cartesian diagrams, for their part, embed the “outside” into their very axes (like here popularity and prestige), but inevitably loosen the relationships among individual data points (in a diagram, there is no equivalent to network edges and clustering measures). Clearly, this is not a case of one strategy being “better” than the other, but of research projects that aim at investigating different properties of the system, and choose their means of analysis accordingly.

22 Needless to add, some of these measurements may be discontinuous and hard to come by (like print runs), while others (like textbooks) may start at a significantly later date. But if the notion of the literary field must help us understand different epochs and countries, having recourse to disparate historical indexes will be inevitable; rather than hoping for a – chimerical – homogeneity of the sources, we should learn to make heterogeneous data conceptually comparable.

23 “Between Canon and Corpus” shows how much things have changed since then: in the twentieth century, canon(s) are all characterized by a “systematic differentiation, if not contradiction, between artistic and commercial value”. It is precisely this differentiation/contradiction that is absent from the “canonical” region of Figure 3.3.

II. Morphological Features

4. Measuring redundancy

Though different from Bourdieu’s in many respects, the charts presented in the previous section shared his main methodological premise: they had a social rather than a literary foundation.24 To make Figure 3.3, you don’t need to open a single novel. As literary historians, however, we wanted to open the novels, and find out whether their social destiny – popular, prestigious, both, neither... – had any connection to their morphological features. So, while working on the diagrams of the literary field, we were also focusing on the internal composition of Chadwyck-Healey and of the sample from the larger archive. Here, the first step consisted in measuring the amount of redundancy and information present in the corpus. That readers prefer informative texts to redundant ones – thus keeping the former in print, while dooming the latter to extinction – is a widespread received idea, and we wanted to test it. Taking a cue from information theory, Mark Algee-Hewitt measured what is called “second order redundancy” (predictability at the level of individual words), using a modification of Shannon’s measure of information load which determines the information content of each text by assessing how predictable each word-to-word transition is, given the range of possible transitions. Since “of” is much more often followed by “the” than by “no”, for instance, the word pair “of no” is far less predictable – hence more informative – than the bigram “of the”.25

Figures 4.1 and 4.2 summarize Algee-Hewitt’s investigation. Figure 4.2 was particularly striking: that three-fourths of the Chadwyck-Healey collection would be less redundant than three-fourths of the archive was a much stronger separation than we had expected to find. And yet, we weren’t completely happy. The clarity of the contrast had simply confirmed a received idea: forgotten authors used language in a redundant fashion; if they had remained unread, it was because they weren’t really worth reading. And vice versa: we still enjoy reading Austen because she is a paragon of information, as the close-up of Figure 4.3 makes perfectly clear. Not exciting: it merely corroborated a received idea.26 And then, there was a second problem. Though Algee-Hewitt had operationalized the concept of redundancy, and produced striking quantitative findings, it wasn’t clear how we could disaggregate the overall score and look at the results, determining which specific word pairs returned all the time – or never did so. We had successfully measured redundancy, but couldn’t really analyze it: an unsettling

24 “I propose that the problem of what is called canon formation”, writes John Guillory, in a similar vein, “is best understood as a problem in the constitution and distribution of cultural capital, or more specifically, a problem of access to the means of literary production and consumption.” John Guillory, Cultural Capital: The Problem of Literary Canon Formation, Chicago 1995, p. ix.

25 Throughout this pamphlet, we will use “redundancy” and “repetition” almost interchangeably, placing them in antithesis to “information” and “variety”; though this is a simplification, we don’t think it affects the level at which we are working, nor the type of results we have found. On a similar note, the relationship between information and redundancy is often referred to as “entropy”; we have opted for different definitions in order to make the various aspects of this research as comparable as possible.

26 And it was already the second time: in Figure 1.2, the fact that the canon regularly preceded the archive by 15-20 years seemed to “prove” that other received idea according to which great writers open the way, and the rest follow.

Figure 4.1. Measuring redundancy, 1800-1900 (horizontal axis: Year; vertical axis: Redundancy)

Purple crosses indicate archival novels, orange circles canonical ones.
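The pamphlet describes Algee-Hewitt’s measure only as a modification of Shannon’s information load computed over word-to-word transitions; the modification itself is not given. As a hypothetical stand-in, the sketch below estimates, from a text’s own bigram counts, the surprisal of a single transition, -log2 P(next | current), which shows why a rare pair like “of no” is more informative than “of the”, and a Shannon-style second-order redundancy, 1 - H(next | current) / log2(vocabulary size). The tokenizer (lowercase, runs of letters and apostrophes) and the function names are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, and treat runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def transition_counts(tokens: list[str]) -> tuple[Counter, Counter]:
    pairs = Counter(zip(tokens, tokens[1:]))  # word-to-word transitions
    firsts = Counter(tokens[:-1])             # how often each word opens a transition
    return pairs, firsts

def surprisal(w1: str, w2: str, pairs: Counter, firsts: Counter) -> float:
    """Bits of information in seeing w2 right after w1: -log2 P(w2 | w1)."""
    if pairs[(w1, w2)] == 0:
        return math.inf  # unseen transition: unbounded surprise under this estimate
    return -math.log2(pairs[(w1, w2)] / firsts[w1])

def second_order_redundancy(tokens: list[str]) -> float:
    """1 - H(next word | current word) / log2(vocabulary size)."""
    if len(set(tokens)) < 2:
        raise ValueError("need at least two distinct words")
    pairs, firsts = transition_counts(tokens)
    total = sum(pairs.values())
    h_cond = 0.0
    for (w1, _w2), n in pairs.items():
        h_cond -= (n / total) * math.log2(n / firsts[w1])
    return 1 - h_cond / math.log2(len(set(tokens)))

tokens = tokenize("The end of the day and the end of the road and of no use at all")
pairs, firsts = transition_counts(tokens)
print(surprisal("of", "the", pairs, firsts))  # log2(3/2): "of the" is the expected pair
print(surprisal("of", "no", pairs, firsts))   # log2(3): rarer, hence more informative
print(second_order_redundancy(tokens))
```

A text whose every word fully determines the next (say, “a b a b a b”) scores 1.0; the real scores in Figure 4.1 sit in a narrow band well below that, which is consistent with the note to Figure 4.3 on mixing repetition and novelty.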

Figure 4.2. Redundancy in the nineteenth century: a synthetic diagram. This figure aggregates the data of Figure 4.1 into the two sub-corpora of canon and archive. Each “box” includes the two central quartiles of the group, separated by a line which indicates the group’s median value; the “whiskers” emerging from the box represent the two extreme quartiles, while outliers are indicated by individual dots. [Groups: Archive, Canon; y-axis 0.425–0.440; ANOVA p-value = 2.70638e-90.]

Figure 4.3. Very low redundancy in the early nineteenth century. [Labeled points: Austen, Jane (six points); Brunton, Mary; Dacre, Charlotte; Maturin, Charles Robert; More, Hannah; Opie, Amelia Alderson; Porter, Jane; Scott, Walter; Shelley, Mary Wollstonecraft; Shelley, Percy Bysshe (two points).] A novel that never repeated a single word would have zero redundancy and 100% information, but this “information” would have no value, because it would rapidly become incomprehensible. Meaning always depends on a mix of repetition and novelty: that’s why the scores in these figures oscillate in a rather narrow range. Differences within this range are, however, both consistent and significant, as is illustrated by this enlargement of the bottom left area of Figure 4.1.

We seemed to have created for ourselves a home-grown version of the uncertainty principle: the more precisely we measured redundancy, the harder it became to determine “where” it actually was. Redundancy operated at a scale that was all-pervasive, and apparently decisive in shaping the destiny of books; but the whole process took place so far below the level of conscious reading as to be practically invisible. In the future, perhaps even the near future, such a problem might be addressed by experimental psychology; in the meantime, we turned to a standard linguistic measure of lexical variety known as type-token ratio.27 The lower a text’s redundancy, we reasoned, the higher its variety must be: convex to concave. We would get an image that would be the exact reverse of Figure 4.2. So we did our calculations, and the result was Figure 4.5. Placing Figures 4.2 and 4.5 next to each other produced the following paradox: the canon was far less repetitive than the archive (hence much more

27 This is how the Longman Grammar of Spoken and Written English defines type-token ratio: “The relationship between the number of different word forms, or types, and the number of running words, or tokens, is called the type-token ratio (or TTR). As a percentage, type-token ratio is equal to (types/tokens) x 100.” See Biber, Johansson, Leech, Conrad, Finegan, Longman Grammar of Spoken and Written English, Harlow 1999, pp. 52-3. The Longman Grammar follows the variations of type-token ratio across four registers (Conversation, Academic prose, Fiction, and News), and three sample lengths (100, 1,000, and 10,000 words). For 100-word segments the results are as follows: Conversation 63; Academic prose 70; Fiction 73; News 75. For 1,000-word segments: Conversation 30; Academic prose 40; Fiction 46; News 50. And for 10,000-word segments: Conversation 13; Academic prose 19; Fiction 22; News 28.
Notice how the difference between the registers increases dramatically with the length of the segment: at 10,000 words, the type-token ratio of News is more than double that of Conversation, whereas it was only 19% higher at 100 words. We opted for 1,000-word segments, which seemed long enough to capture a good amount of variety, and short enough to allow direct analysis.
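The Longman figures show that type-token ratio depends heavily on segment length. Scores are therefore comparable only when every text is cut into segments of the same size. The following is a minimal sketch of that logic, assuming already-tokenized, lower-cased text (the tokenizer is not described at this point). `type_token_ratio` applies the Longman definition. `mean_segment_ttr` is a hypothetical helper that averages this ratio over consecutive, non-overlapping segments and discards an incomplete final segment.

```python
import random


def type_token_ratio(tokens):
    """Longman definition: (types / tokens) x 100."""
    if not tokens:
        raise ValueError("empty segment")
    return 100 * len(set(tokens)) / len(tokens)


def mean_segment_ttr(tokens, segment_len=1000):
    """Average TTR over consecutive, non-overlapping segments of equal length.

    An incomplete final segment is discarded, so that every score is
    computed on exactly `segment_len` tokens.
    """
    starts = range(0, len(tokens) - segment_len + 1, segment_len)
    scores = [type_token_ratio(tokens[s:s + segment_len]) for s in starts]
    if not scores:
        raise ValueError("text is shorter than one segment")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    random.seed(0)
    # Toy "text": words drawn with a skewed (Zipf-like) frequency distribution.
    vocab = [f"w{i}" for i in range(5000)]
    weights = [1 / (rank + 1) for rank in range(len(vocab))]
    text = random.choices(vocab, weights=weights, k=20000)
    for size in (100, 1000, 10000):
        print(size, round(mean_segment_ttr(text, size), 1))
```

On the toy text, the scores drop as the segment grows, which is the same pattern the Longman Grammar reports for every register. This is why a single fixed segment length has to be used across the whole corpus.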

[Figure 4.4: section of a spreadsheet listing, for each novel, its most frequent bigrams with their counts (e.g. of_the, in_the, to_the). Figure 4.5: y-axis “Type Token Ratio”, 0.32–0.56.]

departure from that interplay of quantitative measurement and qualitative interpretation which had been a constant of our work since the beginning. Here, statistical significance seemed impervious to critical meaningfulness: the “text” created by extracting the 100 most frequent bigrams from each novel in the corpus was a spreadsheet with over 100,000 cells, and “reading” them was out of the question (Figure 4.4). A more technical approach – following the decay curve of the most frequent constructions – turned out to be equally inconclusive: very frequent bigrams (“there is”, “I am”, “to the”) had very similar frequencies in all the texts, and variation occurred only in minute traces far down the curve. Plus, there were so many bigrams in each novel that their effects manifested themselves through an immense number of extremely small changes: in a relatively short text of 66,500 words, for instance, there were 66,499 bigrams, about 40,000 of which never repeated themselves. And whereas the number of shared words between two texts was substantial (at least 3,000-4,000), the shared bigrams usually numbered fewer than 1,000: too few for a solid comparative analysis.
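The arithmetic in this paragraph can be made concrete. A text of n tokens yields n - 1 bigrams, many of which occur only once, and two texts share far fewer distinct bigrams than distinct words. The following is a minimal sketch that assumes pre-tokenized input. `bigram_profile` and `shared_counts` are hypothetical names of our own, not the Lab’s code.

```python
from collections import Counter


def bigrams(tokens):
    """Consecutive word pairs: a text of n tokens yields n - 1 bigrams."""
    return list(zip(tokens, tokens[1:]))


def bigram_profile(tokens):
    """Count all bigrams, distinct bigrams, and bigrams that occur only once."""
    counts = Counter(bigrams(tokens))
    return {
        "tokens": len(tokens),
        "bigrams": sum(counts.values()),
        "distinct_bigrams": len(counts),
        "unrepeated_bigrams": sum(1 for c in counts.values() if c == 1),
    }


def shared_counts(tokens_a, tokens_b):
    """Distinct words and distinct bigrams that two texts have in common."""
    shared_words = set(tokens_a) & set(tokens_b)
    shared_pairs = set(bigrams(tokens_a)) & set(bigrams(tokens_b))
    return len(shared_words), len(shared_pairs)


if __name__ == "__main__":
    novel_a = "she had been to the house of her aunt".split()
    novel_b = "he had gone to the house of his father".split()
    print(bigram_profile(novel_a))
    print(shared_counts(novel_a, novel_b))  # (5, 3): five shared words, three shared bigrams
```

Applied to the 66,500-word novel mentioned above, `bigram_profile` would report 66,499 bigrams. The figure of about 40,000 unrepeated pairs comes from the pamphlet, and this toy example does not reproduce it.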

[Figure 4.4 spreadsheet, continued. Figure 4.5 groups: Archive, Canon; ANOVA p-value = 6.503e-11.]

Figure 4.4. Reading bigrams: 0.00003% of the data
A section of the spreadsheet used for the calculations behind Figures 4.1–4.2. Though the bigrams themselves are perfectly identified, it’s nearly impossible to “interpret” what they mean other than in statistical fashion. In this respect, Walser and Algee-Hewitt observed, bigrams were comparable to Braudel’s “demographic progressions” and “variations in interest rates”: all phenomena that could not be perceived at the passage-by-passage level on which we typically conduct our readings.

Figure 4.5. Measuring variety: a synthetic diagram of type-token ratio
Though the distinction between the two sub-corpora is much less sharp here than in Figure 4.2, the result is actually more dramatic. Figure 4.2 had fully confirmed our expectations about canon and archive, whereas this chart completely contradicted them: the lexicon of the canon was not more varied than that of the archive, but significantly less so. (The procedure followed to determine type-token ratio is described in footnote 29, at the beginning of the next section.)

varied) from the perspective of word pairs, and at the scale of the entire text; and less varied (hence more repetitive) from the perspective of single words, and at the scale of a thousand words. In itself, the fact that different textual scales would behave differently was not a surprise: two previous pamphlets (“Style at the Scale of the Sentence” and “On Paragraphs”) had focused exactly on that question. But in those cases, different scales had been associated with completely different features: sentences with style, paragraphs with themes, and so on. Here, the features measured were very closely related. How could results reverse themselves from two words to a thousand? And we mean that

“how” literally, not as a cry of despair: concretely, what textual mechanism could transform the first result into the second? Algee-Hewitt addressed the question by “translating” all words into parts of speech, thus re-formulating redundancy via categories of bigrams rather than individual units; “clever little” and “first cruel”, for instance, both became “adjective-adjective”; “a condition” and “the kitchen” became “determiner-noun”, etc. Re-calculating everything in terms of “grammatical redundancy” made it possible to identify which kinds of bigrams were most distinctive of the canon, and which of the archive (Figures 4.6–4.7).28

This time, the two sub-corpora turned out to have very different centers of gravity: the archive was dominated by nouns, while the canon had a very large presence of function words (conjunctions, determiners, prepositions). The archive’s delight in titles (count Goldstein, uncle Gerard), punctiliousness about places and people (in Ireland; to Shirley), and liberality with proper nouns in general (Hector’s lodgings, Shelburne upon) finally gave us a clue to its high redundancy: “count Goldstein” and “Shelburne upon” may not appear very often in a novel, but when they do, the two words are likely to re-occur together, increasing the text’s redundancy; and the same holds for constructions like the adjunct nouns “iron will” and “autumn tints”. It wasn’t an answer to all our questions, but it was a beginning. And then, in order to address the other side of the paradox, we turned back to type-token ratio.

Figure 4.6. Most distinctive grammatical bigrams: archive
Preposition-proper noun (IN_NNP): to Shirley; in Ireland
Adjective-adjective (JJ_JJ): young happy; first cruel
Noun-adjective (NN_JJ): child incapable; nomenclature peculiar
Noun-noun (NN_NN): iron will; evening sky
Noun-proper noun (NN_NNP): count Goldstein; uncle Gerard
Noun-plural noun (NN_NNS): iron bars; autumn tints
Proper noun-preposition (NNP_IN): Alps of; Shelburne upon
Proper noun-noun (NNP_NN): Agnes’ wedding; Manchester cotton
Proper noun-plural noun (NNP_NNS): Cumberland coasts; Hector’s lodgings
Noun-pronoun (NN_PRP): tail itself; driver himself

Figure 4.7. Most distinctive grammatical bigrams: canon
Conjunction-gerund (CC_VBG): and walking; and taking
Determiner-adjective (DT_JJ): the silly; an eventful
Determiner-noun (DT_NN): a condition; the kitchen
Determiner-plural noun (DT_NNS): the environs; the travelers
Preposition-determiner (IN_DT): at the; in a
Adjective-plural noun (JJ_NNS): folded arms; harsh features
Noun-preposition (NN_IN): account of; sense of
Plural noun-preposition (NNS_IN): grains of; years of
Possessive pronoun-plural noun (PRP$_NNS): their excursions; our girls

28 For this part of the work, Algee-Hewitt used the Stanford Parts-of-Speech Tagger; the abbreviations enclosed in parentheses (IN_NNP etc.) are, however, those used by the Treebank project (https://www.cis.upenn.edu/~treebank/) of the University of Pennsylvania.

5. “But I couldn’t go away”


In the case of type-token ratio, the first thing that needed to be done was to come up with a mode of analysis appropriate to a corpus where most novels had not been reprinted for a century or two. This made optical recognition difficult, and hence potentially invalidated all subsequent calculations. Ryan Heuser, who had first directed our attention to type-token ratio in the early phases of the project, found a way to measure it equally reliably across texts of very different quality.29 Once the results were in, we started by looking at low type-token ratio, to see how its specific kind of repetitiveness compared to the redundancy calculated by Algee-Hewitt. We knew from Figure 4.5 that low lexical variety would often correlate with canonical texts, and indeed the share of the Chadwyck-Healey collection, which amounted to around 20% of the corpus overall, rose to 50% among the 500 segments with the lowest type-token ratio (whereas it was a mere 3.2% in the top 500). Among the 50 texts with the lowest scores, about half were from Chadwyck-Healey: several children’s books (Alice, Through the Looking-Glass, The Water Babies, Black Beauty, Little Lord Fauntleroy, Island Nights’ Entertainments...), ten of Trollope’s novels (The Last Chronicle

29 Heuser began by creating a very large dictionary of novelistic English (232,845 distinct words) and slicing all texts into segments of 1,000 “dictionary-words”. (Actual segments would be anywhere from 1,000 to ~1,500 words long, depending on how many “non-dictionary” words they had: OCR errors, hapax legomena, etc.) Since the number of tokens was fixed at 1,000, dividing the number of types in each segment by 1,000 produced segment-based scores whose average gave us the type-token ratio for the text.
The function was written with two parameters: “slice_len” [the length of the segment (set at 1000)] and “force_english” [whether to include words not in a very large English dictionary (set at False)]. The reasoning behind the “force English” parameter, which excluded all non-“English” words, was that, without it, the archive would have a higher type-token ratio simply by virtue of its bad OCR. Conversely, the concern with forcing English was that the same bad OCR would produce a lower type-token ratio: if the segment had to expand over ~1,500 “real” words in order to find 1,000 “English” ones, then it might privilege shorter, easier-to-spell-and-OCR words, which are also the most frequent in the language, thus driving type-token ratio downwards. In the event, these two undesirable outcomes seemed to balance each other out.
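Footnote 29 describes a function with two parameters, `slice_len` and `force_english`, but does not reproduce Heuser’s code. What follows is therefore a hypothetical reconstruction based on our own reading of the footnote. With `force_english=True`, only dictionary words count toward a slice, so the slice stretches over as many raw words as it needs. With `False`, every token counts. Each slice scores types / `slice_len`, and the text’s score is the mean of its slices. The axis of Figure 5.1 speaks of a median, which would mean passing `statistics.median` instead. The `annotate` helper, also our own, mimics the markup explained in the caption of Figure 5.3: `#` marks a word already seen in the segment, and `*` marks a word outside the dictionary. Tokens are assumed to be lower-cased, and `dictionary` stands in for the 232,845-word lexicon.

```python
import statistics


def slice_scores(tokens, dictionary, slice_len=1000, force_english=False):
    """Type-token ratio (types / slice_len) of each full slice of counted tokens.

    Hypothetical reconstruction of footnote 29. With force_english=True,
    non-dictionary tokens are dropped before slicing, so each slice spans
    however many raw words it takes to collect `slice_len` dictionary words.
    """
    counted = [t for t in tokens if t in dictionary] if force_english else list(tokens)
    scores = []
    for start in range(0, len(counted) - slice_len + 1, slice_len):
        segment = counted[start:start + slice_len]
        scores.append(len(set(segment)) / slice_len)
    return scores


def text_ttr(tokens, dictionary, slice_len=1000, force_english=False,
             aggregate=statistics.mean):
    """Aggregate the slice scores into one score per text (mean by default)."""
    scores = slice_scores(tokens, dictionary, slice_len, force_english)
    if not scores:
        raise ValueError("text is shorter than one slice")
    return aggregate(scores)


def annotate(segment, dictionary):
    """Mark repeats with '#' and non-dictionary words with '*' (cf. Figure 5.3)."""
    seen = set()
    marked = []
    for token in segment:
        if token not in dictionary:
            marked.append(token + "*")   # outside the dictionary
        elif token in seen:
            marked.append(token + "#")   # already used in this segment
        else:
            marked.append(token)         # first occurrence
        seen.add(token)
    return " ".join(marked)
```

With `force_english=False` (the setting reported in the footnote), OCR debris adds spurious types. With `True`, slices stretch over more raw text and may favor short, common words. The footnote reports that in practice these two biases seemed to balance each other out.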

Figure 5.1. Type-token ratio, 1800-1900. [x-axis: Date of publication, 1795–1900; y-axis: Median TTR of 1000-word slices for text, 0.34–0.50.] The “pull” of children’s stories towards a low type-token ratio is visible between 1860 and 1880; in general, though, the type-token ratios of both canon and archive remain rather stable across the nineteenth century.

of Barset, Phineas Finn the Irish Member, Can You Forgive Her?, The Eustace Diamonds...), plus two Irish novels (Edgeworth’s Castle Rackrent and Samuel Ferguson’s Father Tom and the Pope, with The Absentee not far behind). In itself, this mix was not particularly representative of the canon (whatever one may mean by that term). More significant seemed to be the fact that Chadwyck-Healey’s scores remained low across the century (Figure 5.1), and that the trend involved some of the greatest nineteenth-century stylists: all of Austen was below the corpus mean (with Persuasion, Sense and Sensibility, and Mansfield Park in the bottom 20%); all of Dickens was below the mean (with Little Dorrit, A Tale of Two Cities, David Copperfield, Our Mutual Friend, Bleak House, and Great Expectations in the bottom 20%); all of George Eliot was


below the mean – and Adam Bede contained the passage with the lowest type-token ratio of the entire century. Now, Adam Bede is a strange novel for that kind of result, because it contains Eliot’s famous reflections on Dutch painting: a manifesto for aesthetic precision and variety, written with extraordinary precision and variety (Figure 5.2). The first 100 words of this passage have a type-token ratio of 79: higher than anything, in any register, discussed by the Longman Grammar. And yet, later in the novel, Eliot’s style runs to the opposite extreme (Figure 5.3). The passage includes the central moment of Hetty’s confession to Dinah: the recollection of having abandoned her child in the woods, and of waiting for “its” death (to use the pronoun she herself uses). But “waiting” is the wrong word (Figure 5.4). Grammatically, the most arresting feature of these sentences is the flood of inflected verb forms with Hetty as their subject: I made haste ... I could hear ... I got out ... I was held fast ... I couldn’t go away ... I wanted ... I sat ... I was ... I had ... I couldn’t ... In narrative analysis, verb forms are usually seen as indices of “action”, and understandably so. But here, in a grating dissonance between grammar and semantics, they stand for paralysis instead: Hetty desperately wants to “go away” – and can’t. And just as she cannot leave the physical setting of the episode, she cannot relinquish the words which describe it. She cannot forget: that’s where the repetition comes from. Better: she can neither forget, nor really say what has happened. In a textbook instance of the opposition between “repeating” and “working through”, she keeps saying the same things over and over again, because she cannot bring herself to utter the one thing that really matters: the word “death” is never repeated, and only appears in an oblique, misleading construction at the end of the passage.30 Why repetition?
Because a trauma has occurred, and repetition is a great way to express it in language: an imprisonment in one’s own words whose enigmatic force explains why Eliot, despite her love for analytical details, could write the most repetitive passage of the entire century. And then, Hetty’s confession also brings to light the fundamentally oral component of type-token ratio. Next to Eliot’s page, the two segments with the lowest type-token ratio are also confessions: of baby-changing in Edgeworth’s Ennui,31 and of love in Trollope’s Last Chronicle of Barset.32 In the same low range we find passages

30 “But it was morning, for it kept getting lighter, and I turned back the way I’d come. I couldn’t help it, Dinah; it was the baby’s crying made me go--and yet I was frightened to death. I thought that man in the smock-frock ‘ud see me and know I put the baby there.” Notice how “death” refers to Hetty rather than to her child.

31 “I thought, how happy he would be if he had such a fine babby as you; dear; and you was a fine babby to be sure; and then I thought, how happy it would be for you, if you was in the place of the little lord: and then it came into my head, just like a shot, where would be the harm to change you?”

32 “You are so good and so true, and so excellent,-- such a dear, dear, dear friend,

It is for this rare, precious quality of truthfulness that I delight in many Dutch paintings, which lofty-minded people despise. I find a source of delicious sympathy in these faithful pictures of a monotonous homely existence, which has been the fate of so many more among my fellow-mortals than a life of pomp or of absolute indigence, of tragic suffering or of world-stirring actions. I turn, without shrinking, from cloud-borne angels, from prophets, sibyls, and heroic warriors, to an old woman bending over her flower-pot, or eating her solitary dinner, while the noonday light, softened perhaps by a screen of leaves, falls on her mob-cap, and just touches the

rim of her spinning-wheel, and her stone jug, and all those cheap common things which are the precious necessaries of life to her—or I turn to that village wedding, kept between four brown walls, where an awkward bridegroom opens the dance with a high-shouldered, broad-faced bride, while elderly and middle-aged friends look on, with very irregular noses and lips, and probably with quart-pots in their hands, but with an expression of unmistakable contentment and goodwill. "Foh!" says my idealistic friend, "what vulgar details!"

Figure 5.2. “This rare, precious quality of truthfulness”

came all of a sudden, as I was lying in the bed, and it got stronger and# stronger#... I# longed so to go back again... I# could n’t* bear being so# lonely, and# coming to# beg for want. And# it# gave me strength and# resolution to# get up and# dress myself. I# felt I# must do it#... I# did n’t* know how... I# thought I#’d find a# pool, if I# could#, like that other, in# the# corner of# the# field, in# the# dark. And# when the# woman went out, I# felt# as# if# I# was# strong enough to# do# anything... I# thought# I# should get# rid of# all# my misery, and# go# back# home, and# never let’em know# why I# ran away. I# put on my# bonnet and# shawl, and# went# out# into the# dark# street, with the# baby under my# cloak; and# I# walked fast till I# got# into# a# street# a# good way off, and# there was# a# public, and# I# got# some warm stuff to# drink and# some# bread. And# I# walked# on# and# on#, and# I# hardly felt# the# ground I# trod on#; and# it# got# lighter, for# there# came# the# moon-- O, Dinah, it# frightened me# when# it# first looked at me# out# o#’ the# clouds-- it# never# looked# so# before; and# I# turned out# of# the# road into# the# fields, for# I# was# afraid o#’ meeting anybody with# the# moon# shining on# me#. And# I# came# to# a# haystack, where I# thought# I# could# lie down and# keep myself# warm# all# night. There# was# a# place cut into# it#, where# I# could# make me# a# bed#; and# I# lay comfortable, and# the# baby# was# warm# against me#; and# I# must# have gone to# sleep for# a# good# while, for# when# I# woke it# was# morning, but not very light, and# the# baby# was# crying. And# I# saw a# wood a# little way# off#... I# thought# there#’d perhaps be a# ditch or a# pond there#... and# it# was# so# early I# thought# I# could# hide the# child there#, and# get# a# long way# off# before# folks was# up#.
And# then I# thought# I#’d go# home#-- I#’d get# rides in# carts and# go# home#, and# tell’em I#’d been to# try and# see for# a# place#, and# could# n’t* get# one. I# longed# so# for# it#, Dinah#-- I# longed# so# to# be# safe at# home#. I# do# n’t* know# how# I# felt# about the# baby#. I# seemed to# hate it#-- it# was# like# a# heavy weight hanging round my# neck; and# yet its crying# went# through me#, and# I# dared n’t* look at# its# little# hands and# face. But# I# went# on# to# the# wood#, and# I# walked# about#, but# there# was# no water’’... Hetty shuddered. She was# silent for# some# moments, and#

Figure 5.3. The nineteenth century’s most repetitive passage: Hetty’s confession in Adam Bede. The pound sign indicates that a word is being repeated within the given segment, while asterisks denote words that are not part of the “dictionary” used for the calculations. Some odd aspects of this and other passages are artifacts of the Stanford parser, which, for instance, treats negative contractions, such as “n’t” at the end of “couldn’t”, as separate words.


when# she# began again#, it# was# in# a# whisper.`` I# came# to# a# place# where# there# was# lots of# chips and# turf, and# I# sat down# on# the# trunk of# a# tree to# think what I# should# do#. And# all# of# a# sudden# I# saw# a# hole under# the# nut-tree*, like# a# little# grave. And# it# darted into# me# like# lightning-- I#’d lay# the# baby# there#, and# cover it# with# the# grass and# the# chips#. I# could# n’t* kill it# any other# way#. And# I#’d done it# in# a# minute; and#, O#, it# cried so#, Dinah#-- I# could# n’t* cover# it# quite up#-- I# thought# perhaps# somebody ùd* come and# take care of# it#, and# then# it# would n’t* die. And# I# made haste out# of# the# wood#, but# I# could# hear it# crying# all# the# while#; and# when# I# got# out# into# the# fields#, it# was# as# if# I# was# held fast#-- I# could# n’t* go# away#, for# all# I# wanted so# to# go#. And# I# sat# against# the# haystack# to# watch if# anybody# ùd* come#: I# was# very# hungry, and# I#’d only a# bit of# bread# left; but# I# could# n’t* go# away#. And# after ever such a# while#-- hours and# hours#-- the# man came#-- him in# a# smock-frock*, and# he looked# at# me# so#, I# was# frightened#, and# I# made# haste# and# went# on#. I# thought# he# was# going to# the# wood#, and# would# perhaps# find# the# baby#. And# I# went# right on#, till# I# came# to# a# village, a# long# way# off# from the# wood#; and# I# was# very# sick, and# faint, and# hungry#. I# got# something to# eat there#, and# bought a# loaf. But# I# was# frightened# to# stay. I# heard the# baby# crying#, and# thought# the# other# folks# heard# it# too,- and# I# went# on#. But# I# was# so# tried, and# it# was# getting towards dark#. And# at# last, by the# roadside there# was# a# barn-- ever# such# a# way# off# any# house-- like# the# barn# in# Abbot’s Close; and# I# thought# I# could# go# in# there# and# hide# myself# among the# hay and# straw, and# nobody ùd* be# likely to# come#. 
I# went# in#, and# it# was# half full o#’ trusses of# straw#, and# there# was# some# hay#, too#. And# I# made# myself# a# bed#, ever# so# far behind, where# nobody# could# find# me#; and# I# was# so# tired and# weak, I# went# to# sleep#.... But# oh, the# baby#’s crying# kept waking me#; and# I# thought# that# man# as# looked# at# me# so# was# come# and# laying hold of# me#. But# I# must# have# slept a# long#

And# I# made haste out# of# the# wood#, but# I# could# hear it# crying# all# the# while#; and# when# I# got# out# into# the# fields#, it# was# as# if# I# was# held fast#-- I# could# n't* go# away#, for# all# I# wanted so# to# go#. And# I# sat# against# the# haystack# to# watch if# anybody# ùd* come#: I# was# very# hungry, and# I#'d only a# bit of# bread# left; but# I# could# n't* go# away#.

Figure 5.4. “But I could hear it crying all the while”

from children’s stories (with their typically life-like narrators), Irish novels (which specialized in the imitation of speech), and countless instances of Trollope’s petty-bourgeois stichomythia.33 There are trial scenes (The Ordeal of Richard Feverel, The Heart of Mid-Lothian, William Scargill’s Tales of a Briefless Barrister), ideological confrontations (Marius the Epicurean), an ecstatic vision of the “communism of happiness” (Mary Christie’s Lady Laura),34 and a great invective against money (Thomas Pemberton’s A Very Old Question).35 There are characters who talk too much because they are trying to be obliging (Emma), or because, like Van Helsing in Dracula, they need to rehearse the evidence over and over again. It could hardly be an accident, concluded Allison and Gemma, that our lowest-ranked (and largely canonical) 1,000-word segments were in exactly the same range as conversation in the Longman Grammar: a mean of 30 in their case, and a range of 27-33 for our bottom 500 segments. We had turned to type-token ratio in the hope that it would lead us back to some kind of textual analysis, and we had not been disappointed: low scores captured crucial aspects of narrative structure, signaling trauma, intensity, and orality. And high scores?

6. “Embrasures bristling with wide-mouthed cannon”

Figure 6.1 shows the ten novels with the highest type-token ratio in the corpus; Figure 6.2 shows the top-scoring passage, from Edward Hawker’s Arthur Montague, or, An Only Son at Sea. If the privileged social position of the canon were always correlated with linguistic privilege (Dario Fo, winner of the 1997 Nobel Prize in Literature, once wrote a play entitled The worker knows 300 words, the boss 1,000; that’s why he’s the boss), then canonical authors should have a much more varied language than

that I will tell you everything, so that you may read my heart. I will tell you as I tell mamma,-- you and her and no one else;-- for you are the choice friend of my heart.
I can not be your wife because of the love I bear for another man”.

33 “Do you think that I am in earnest?” “Yes, I think you are in earnest.” “And do you believe that I love you with all my heart and all my strength and all my soul?” “Oh, John!” “But do you?” “I think you love me.” “Think!”

34 “All are not equally happy; all can not be equally happy. But there is a sort of communism possible in happiness. The unhappy have a claim upon the happy; the happy have a debt towards the unhappy.” “But how can one share one’s happiness with others? It seems to me impossible. It is what I have most wished to do, but I see no way in which it can be done.” “In one sense certainly you can not share your happiness, and you can not give it away. It is essentially your own, a development of your being, a part of yourself that you may not alienate.”

35 “Money!” she cried derisively. “Money! What is money to the trouble which has torn my heart ever since I have been married! What is money to those who thirst for love! I never wanted money; without money I was strong and happy; since I have had it I have been weak and miserable. Money broke down my poor father, and it was for money that Percy married, deceived, and has forsaken fine. Thank God that the wretched money has gone”

forgotten ones. In terms of the type of lexical abundance measured by type-token ratio, however, the opposite is true. “The whole language of aesthetics is contained in a fundamental refusal of the facile”, writes Bourdieu: “‘vulgar’ works [...] arouse distaste and disgust by the methods of seduction”.36 Facile, Hawker’s language? Seductive? If anything, the opposite. A dichotomy such as vulgar/refined will never explain the connection between the archive and high type-token ratio. We must look elsewhere. As often in this research, we found an answer in corpus linguistics. This time, it was the concept of “register”: the “communicative purposes and situational contexts” of messages described by Douglas Biber and Susan Conrad in Register, Genre, and Style.37 In the study of register, the fundamental opposition runs between oral and written, and it is a well-established fact that in English the latter has a much higher type-token ratio than the former. If the archive has a greater lexical variety than the canon, then, the reason is that the archive inclines towards the “written” register much more than the canon (while the latter, as we have seen in the previous section, is much more at ease with “oral” conventions). It’s not that archival novels with high type-token ratio have fewer oral passages (dialogue, speech, exclamations, etc.); Gemma’s work in progress on colloquial discourse suggests that they may even have more. It’s that their “spoken” passages have a markedly “written” quality. Jane West’s Ringrove, for instance, includes a lot of language typographically marked as “speech”, which however often consists of formal tirades that sound closer to a written disquisition than to an oral exchange.38 Linguistic conservatism is certainly one reason for the “written” quality of many archival works.
36 Pierre Bourdieu, Distinction. A social critique of the judgment of taste, 1979, Harvard UP, 1984, p. 486.
37 Douglas Biber and Susan Conrad, Register, Genre, and Style, Cambridge UP 2009, p. 2.
38 Here is one, on Byron’s misuse of his poetic gifts: “There is a deep condensation of thought, an appropriateness of diction, an elegance of sentiment, and an original glow of poetical imagery; ever happy in illustrating objects, or deepening impressions;-- which so fascinate our fancy and bewilder our judgment, that we lose sight of the nature of the deeds he narrates, and the real character of the actors.”

Edward Duros, Otterbourne; A Story of the English Marches, 1832
Edward Hawker, Arthur Montague, or, An Only Son at Sea, 1850
Emma Robinson, The Armourer’s Daughter: or, The Border Riders, 1850
William Lennox, Compton Audley; or, Hands Not Hearts, 1841
Mary Anne Cursham, Norman Abbey: A Tale of Sherwood Forest, 1832
William Maginn, Whitehall; or, The Days of George IV, 1827
Thomas Surr, The Mask of Fashion; A Plain Tale, with Anecdotes Foreign and Domestic, 1807
James Grant, The Scottish Cavalier: An Historical Romance, 1850
Cecil Clarke, Love’s Loyalty, 1890
Jane West, Ringrove, or Old Fashioned Notions, 1827
Figure 6.1 High type-token ratio, or, the triumph of the archive

then cut through some acres of refreshing greensward, studded with the oak, walnut, and hawthorn, ascended a knoll, skirted an expansive sheet of# water; afterwards entering an# avenue of# noble elms, always tenanted* by a# countless host of# cawing* rooks, whose clamorous conclaves* interrupted the# stillness that reigned around, and# whose# visits to adjacent cornfields* of# inviting aspect raised the# ire and# outcry of# the# yelling VOL. I. C urchins employed to# guard them from depredation. Emerging from# this arched vista, a# near view was obtained of# the# mansion, approached through# a# thick luxuriant shrubbery of# full-grown* evergreens. It was# a# straggling stone structure of# considerable size and# doubtful architecture, having on either side an# ornamental wing, surmounted by# glazed cupolas*, and# indented below with# niches containing statues and# vases alternate. The# front face of# the# building displayed a# row of# fine Corinthian pillars-their capitals screened by# wire-work* shields, to# defend them# from# the# injurious intrusions of# the# feathery tril*> e, who ever chirped* and# hovered about the# forbidden spots, coveting the# shelter denied them#.
In the# vicinity of# the# house was# a# spacious flower-garden*, encompassed by# a# protecting plantation of# bay, holly, augustines*, arbutus, laburnum, yellow and# red Barbary, lilac, and# Guelder-rose*, ever# melodious with# the# shy, wary blackbird’s whistle, the# sweet notes of# the# secreted thrush, and# the# varied carols of# their# fellow-choristers*, all conspiring to# give motion as well as# life to# their# leafy concealment. To# the# right, was# a# rich, park-like* prospect, sprinkled with# deer, grazing beneath clumps of# commingled oaks and# chestnuts or pulling acorns from# the# low, overhanging? branches of# some# solitary venerable stout-trinket* tree, whose# outspread limbs bent downwards to# the# earth from# whence their# life# was# drawn, as# if in# thankfulness for the# nourishment received. In# an# opposite direction stretched forth undulating woodland scenery, bordering on# an# open furzy down, which was# frequently occupied by# the# moveable* abodes* of# those houseless rovers-- the# hardy, spoliating*, mendacious tribe, whose# forefathers Selim*, on# (continued on page 13)
Figure 6.2 The nineteenth century’s least repetitive passage. Edward Hawker’s landscape description has a type-token ratio of 60, well above the scores (46 for fiction and 50 for news) reported by the Longman Grammar for segments of equal length.

A passage from William North’s The Impostor – whose type-token ratio is near the top 1% of the corpus – expresses it well: “There has of late years crept into our belles lettres, in addition to the soi-disant fashionable trash above mentioned, a violent predilection for low life, slang, and vulgarism of every kind. Dickens and Ainsworth led the way, and whole hosts became their followers ... Let us endeavor to reestablish pure classical taste. Let us endeavor to reestablish ...” In their study of prestige and style, Underwood and Sellers have found that many obscure books “at the very bottom of [their model’s] list [...] have some inspirational or hortatory purpose”.39 The same here: the “slang and vulgarisms” typical of oral registers offend “pure classical taste”, and the cohort of Figure 6.1 strikes back, “elevating” the tone of discourse to the formal gravity of the written page: many nouns, many adjectives, and as few inflected verb forms as possible (Figures 6.3, 6.4, and 6.5).40
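The x-axis of Figures 6.3 to 6.5 is the median type-token ratio of each text’s 1,000-word slices. The following is a minimal Python sketch of how such a score can be computed. It assumes a simple lowercase, ASCII-alphabetic tokenizer and consecutive, non-overlapping slices with the incomplete final slice discarded. These choices are illustrative; the Lab’s own tokenization and slicing may differ.

```python
import re
from statistics import median

# Assumption: a token is a lowercase ASCII alphabetic run, optionally joined
# by internal apostrophes or hyphens ("cat's", "well-known").
TOKEN_RE = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens."""
    return TOKEN_RE.findall(text.lower())


def ttr(tokens: list[str]) -> float:
    """Type-token ratio: distinct word forms divided by running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def median_slice_ttr(text: str, slice_len: int = 1000) -> float:
    """Median TTR over consecutive, non-overlapping slices of slice_len tokens.

    The incomplete final slice is dropped, so every score is computed on a
    window of identical length.
    """
    tokens = tokenize(text)
    starts = range(0, len(tokens) - slice_len + 1, slice_len)
    scores = [ttr(tokens[i:i + slice_len]) for i in starts]
    if not scores:
        raise ValueError(f"text has fewer than {slice_len} tokens")
    return median(scores)
```

For instance, `median_slice_ttr("the cat sat on the mat " * 500)` returns 0.005, since each of the three 1,000-token slices contains the same five types. The fixed window matters because type-token ratio falls as a text grows longer: later tokens increasingly repeat types already seen. A whole-novel ratio would therefore largely measure length.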

40 The high frequency of nouns and adjectives takes us back to the “grammatical bigrams” discussed at the end of section 4: the “adjective-adjective”, “proper noun-noun”, “noun-adjective” word pairs. By combining those results with what has emerged in this section, we can finally solve the paradox of texts with high redundancy at the level of bigrams, and high variety at that of type-token ratio. The “labeling” function of bigrams like “count Goldstein” and “uncle Gerard”, or the cliché-like loquacity of “iron will” and “clever little”, can easily repeat themselves in the course of the novel, thus raising redundancy as measured at that scale; but even a mediocre writer is unlikely to repeat “clever little” within a 1,000-word window, thus leaving type-token ratio quite high. And the opposite will happen with the “determiner-noun” or “preposition-determiner” bigrams that are typical of canonical texts: as “the” is the most frequent word in English, it will inevitably repeat itself dozens of times in a 1,000-word segment, thus lowering its lexical variety; but since the noun next to the article can easily vary, redundancy at the level of bigrams will remain relatively low.
41 “First, Venus, queen of gentle devices! taught her prototype, lady Arabella, the use of feigned sighs, artificial tears, and studied fainting: while Aesculapius descended from Olympus, and, assuming the form of a smart physician, stepped out of an elegant chariot, and on viewing the patient, after three sagacious nods, whispered to the trembling aunt, that the young lady’s disorder, being purely mental, was beyond the power of the healing art. Reduced to the dire alternative of resigning the fair sufferer to a husband or to the grave, the relenting lady Madelina did not long hesitate.” (Jane West, A Tale of the Times, 1799).
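The paradox resolved in note 40 can be reproduced on two deliberately artificial token streams. The following Python sketch rests on several assumptions. Bigram redundancy is taken as the share of bigram tokens in the whole text that duplicate another bigram, a stand-in chosen for illustration; the measure used in section 4 may be defined differently. Type-token ratio is the median over 1,000-token windows. The `labeling_text` and `determiner_text` generators and their placeholder words are hypothetical.

```python
from statistics import median

# Hypothetical recurring "labels", echoing the examples in note 40.
LABELS = [("uncle", "gerard"), ("count", "goldstein"),
          ("clever", "little"), ("iron", "will")]


def window_ttr(tokens, window=1000):
    """Median type-token ratio over non-overlapping windows of `window` tokens."""
    scores = [len(set(tokens[i:i + window])) / window
              for i in range(0, len(tokens) - window + 1, window)]
    return median(scores)


def bigram_redundancy(tokens):
    """Share of bigram tokens that duplicate a bigram seen elsewhere in the text."""
    bigrams = list(zip(tokens, tokens[1:]))
    return 1 - len(set(bigrams)) / len(bigrams)


def labeling_text(n_windows, window=1000):
    """Archive-like stream: every window uses each label once, padded with
    placeholder words that never recur anywhere in the text."""
    pad = (window - 2 * len(LABELS)) // len(LABELS)
    tokens = []
    for k in range(n_windows):
        for m, label in enumerate(LABELS):
            tokens += label
            tokens += [f"pad{k}_{m}_{j}" for j in range(pad)]
    return tokens


def determiner_text(n_windows, window=1000):
    """Canon-like stream: 'the' before an ever-new noun, so the article
    recurs constantly but no bigram ever does."""
    return [tok for i in range(n_windows * window // 2)
            for tok in ("the", f"noun{i}")]


archive, canon = labeling_text(10), determiner_text(10)
print(f"archive-like: TTR {window_ttr(archive):.3f}, "
      f"bigram redundancy {bigram_redundancy(archive):.4f}")
print(f"canon-like:   TTR {window_ttr(canon):.3f}, "
      f"bigram redundancy {bigram_redundancy(canon):.4f}")
```

Run as is, the sketch reports a window TTR of 1.000 and a bigram redundancy of about 0.0036 for the archive-like stream, against 0.501 and 0.0000 for the canon-like one. The stream with the higher lexical variety is thus also the more redundant at the bigram level, as note 40 argues. Only the direction of the comparison is meaningful here; the magnitudes are artefacts of the toy construction.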

Figure 6.3 (top left): Type-token ratio and nouns
Figure 6.4 (top right): Type-token ratio and adjectives
Figure 6.5 (bottom right): Type-token ratio and verbs
[Scatter plots: median type-token ratio of 1,000-word slices for each text (x-axis) against the median percentage of nouns, adjectives, and verbs in the same slices (y-axes).]

In Hawker’s Gibraltar passage, in Figure 6.2, adjectives (and participles) are three times as frequent, and inflected verbs three to four times less frequent, than the average in nineteenth-century fiction. By contrast, the Adam Bede passage in Figure 5.4 contains only four nouns and one adjective – “hungry” – in 75 words.

39 Underwood and Sellers, p. 14.

So far, we have explained the affinity between high type-token ratio and the written register as the result of, loosely speaking, stylistic and ideological choices. But there is also a more neutral, “functional” reason for their correlation. In the findings of corpus linguistics, maximum lexical variety is consistently associated with news: a discourse which needs “an extremely high density of nominal elements”, the Longman Grammar points out, in order to “refer to a diverse range of people, places, objects, events, etc.” (53-54). There is a double source for lexical variety in news: the first is the necessary specificity internal to each distinct news item; the second, the utter discontinuity between one item and the next: as each article or correspondence begins, repetition is “reset” near zero, and type-token ratio can rise accordingly. This twofold logic returns in fictional texts with high type-token ratio: they include plenty of disparate materials, and further accentuate their diversity by using a plurality of generic forms. Jane West, six of whose novels are in the corpus’ top 3% for type-token ratio, quotes poetry in 17 of her 24 top-ranked segments; in the absence of poetry, she turns to elaborate metaphors (“expect a fearful tempest to arise, which will clear the tree of its unsound branches”), and even pastiche.41 William North’s introduction to The Impostor – half literary criticism, half apologia – discusses a wide range of topics, and includes an excursus on… the wide range of topics he has decided to insert into his “romance.”42 Thomas Hope turns to political prophecy,43 Lewis Wingfield to a half-parodic architectural digression,44 Edward Duros to erudite antiquarianism,45 Edward Hawker to naturalistic instruction ... But enough examples. It was time for some final reflections.
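The “reset” mechanism described above, first for news and then for the miscellaneous novels of Figure 6.1, can be illustrated with a toy model. The following Python sketch uses hypothetical names and parameters throughout. Every item alternates six shared function words with words from its own 40-word topical vocabulary. One 1,000-token window is taken from a single continuous item; the other is spliced from eight unrelated 125-token items. The generative rule is identical in both cases; only the item boundaries differ.

```python
# Hypothetical shared function words, used by every item alike.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in"]


def item(topic, length, topic_vocab=40):
    """Hypothetical item: even positions cycle through the shared function
    words, odd positions cycle through the item's own topical vocabulary."""
    return [FUNCTION_WORDS[(i // 2) % len(FUNCTION_WORDS)] if i % 2 == 0
            else f"{topic}_{(i // 2) % topic_vocab}"
            for i in range(length)]


def ttr(tokens):
    """Type-token ratio: distinct word forms divided by running words."""
    return len(set(tokens)) / len(tokens)


# Hypothetical topics for the spliced "miscellany".
TOPICS = ["poetry", "prophecy", "architecture", "antiquities",
          "botany", "husbandry", "criticism", "heraldry"]

continuous = item("landscape", 1000)  # one long item: its vocabulary keeps recurring
miscellany = [tok for t in TOPICS for tok in item(t, 1000 // len(TOPICS))]

print(f"continuous: {ttr(continuous):.3f}")  # (6 + 40) / 1000 = 0.046
print(f"miscellany: {ttr(miscellany):.3f}")  # (6 + 8 * 40) / 1000 = 0.326
```

Each new item brings a fresh vocabulary, so repetition within the window starts over at every boundary, while only the function words carry over. The absolute values are far below those in Figures 6.3 to 6.5 because the toy vocabularies are tiny. Only the gap between the two windows is meant to be suggestive.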

42 “By introducing literary criticism, satire of political and social evils, and popular illustrations of interesting facts in science, I have hoped to add to the interests of a romance, in which I trust no deficiency of adventure, plot, and carefully developed character will be found. But the day has gone by for mere fashionable novels. The age is utilitarian, and even novelists (the poets of present times) must conform to the mode”
43 “The time is at hand when all the tottering monuments of ignorance, credulity, and superstition, no longer protected by the foolish awe which they formerly inspired, shall strew the earth with their wrecks! Every where the young shoots of reason and liberty, starting from between the rents and crevices of the worn-out* fabrics of feudalism, are becoming too vigorous any longer to be checked: they soon will burst asunder the baseless edifices* of self-interest* and prejudice, which have so long impeded their growth. Religious inquisition, judicial torture, monastic seclusion, tyranny, oppression, fanaticism, and all the other relics of barbarism, are to be driven from the globe.” (Thomas Hope, Anastasius, or, Memoirs of a Greek, 1819).
44 “a stately entrance hall in the most fashionable quarter of the metropolis, embellished with lofty Ionic columns of sham Sienna marble; in front of each a magnificent bust of sham bronze by Mr. NoUekins* on a pedestal of scagliola. From a heavily stuccoed* ceiling, wrought in the classic manner, depend six enormous lanterns in the Pagoda style, wreathed with gaping serpents. Along three sides there are rows of “em pire*’’ benches, covered with amber damask, on which are lolling a regiment of drowsy myrmidons in rich liveries*. Passing these glorious athletes, you enter an ante-room choked with chairs, sofas, settees*, whose florid gilding is heightened by scarlet cushions. Very beautiful.” (Lewis Wingfield, Abigel Rowe. A Chronicle of the Regency, 1883).
45 “The shield, slung to his neck, bore no emblazonry, and his open baronet and pennon-less* lance argued him neither to have undergone the clapham, or knightly box on the ear (!); nor the osculum pads, which more gently signified the chivalric brotherhood. He was, however, well mounted and perfectly armed. Judging from his simple habergeon, and a silver crescent which he bore, more in the way of cognizance than as his own device, he might be pronounced a superior retainer in the service of some great feudatory.” (Edward Duros, Otterbourne; A Story of the English Marches, 1832).

III. Large-Scale Dynamics in the Literary Field

It is not easy, “concluding” a project that had strayed so far from its original aim. We began with canon and archive as our objects of study, and with redundancy and type-token ratio as the means to investigate them; but then, the relationship between means and ends silently reversed itself: canon and archive moved to the periphery of our discussions, while redundancy and type-token ratio were increasingly occupying their center. There was nothing planned about this switch; for quite a while, we didn’t even realize it had happened. But we were spending month after month wondering what bigrams actually “meant”, and why on earth they managed to separate our texts as well as they did; later, once Allison and Gemma introduced the issue of oral and written registers, we spent even more time on type-token ratio, reading passages from unheard-of novels bristling with pound signs, asterisks, and words like “acclivities”, “laburnum”, and “commingling”. Strange. Why did we do that? Because we felt that working on type-token ratio would make us understand something about the “internal” forces – as distinct from the “external” ones discussed in section 3 – that shaped the literary field. It was another slippage in our object of study: the supposed line of demarcation between canon and archive – the diagonal slash still visible in our title – lost much of its interest, re-absorbed within a much larger landscape. With all due sense of proportion, there was a similarity with Bourdieu’s trajectory of forty years earlier: when, starting from a study of Sentimental Education, and of Flaubert’s position within nineteenth-century French literature, he developed a general framework where Flaubert was still present, but only as one element among many. The same here: canon and archive were still “in” the picture, with their differently colored markers; but now, the point of our diagrams consisted in throwing light on the literary field as a whole. A stylistic polarity exemplified by Eliot and Hawker no longer made us think of canon and archive, but of “oral” and “written” registers. The focus had shifted. Still, a major difference persisted, between our work and Bourdieu’s. For us, the sociology of the literary field cannot rest on sociology alone: it needs a strong morphological component. That’s why redundancy and (especially) type-token ratio had become so important: their mix of the quantitative and the qualitative was perfect for the morpho-sociology of fiction that was our ultimate goal.
Retrospectively, we must admit that the goal has remained out of reach – though it has moved a little closer. Out of reach, in the sense that, where the correlation between morphology and social fate was strongest – the case of redundancy – the elusive nature of the morphological unit of bigrams made a causal chain difficult to establish; whereas, by contrast, where the trait allowed for a rich and explicit analysis – the case of type-token ratio – the correlation was weaker, and became indisputable only for extreme cases. At the same time, two phenomena which had become visible near those extreme cases – the intensity of characters’ voices near the lowest scores, and the topical miscellany of the narrator’s prose at the opposite extreme – had opened a new line of inquiry, where the quantitative-qualitative continuum re-emerged very clearly, and led straight to two key concepts of Bakhtin’s theory of the novel: polyphony, and heteroglossia (the “other languages” of consolidated extra-literary discourses, like politics, aesthetics, geography, architecture, etc.). Usually, these two notions are seen as closely related (and Bakhtin himself seemed to think so); but as Walser pointed out in our final round of discussions, our findings revealed that they were actually localized in opposite regions of the novelistic field: polyphony tendentially associated with canonical texts, and heteroglossia with forgotten novels. The proximity between heteroglossia and failure was especially arresting. For Bakhtin, when the novel comes into contact with other discourses, it creatively transforms them, appropriating their strength and reinforcing its own centrality within the cultural system. It’s as if, with heteroglossia, nothing could ever go wrong. But that’s exactly what happened with our small army of forgotten authors: the encounter with other discourses had a paralyzing effect, producing lifeless duplicates of non-fictional prose in lieu of dialogic vitality. As far as survival within the British literary system was concerned, it was a very bad choice. Heteroglossia as a potential pathology of novelistic structure, then? “There is no fact which is [...] pathological in itself”, writes Georges Canguilhem in his masterpiece on nineteenth-century conceptions of “normality”: “an anomaly or a mutation is not in itself pathological, they just express other possible forms of life.”46 If this thesis is right, what doomed Hawker and North and Duros was less the choice of heteroglossia in itself, than the fact that it occurred in an age and country – in an ecosystem – when the form of the novel was moving in the opposite direction: tightening its internal narrative bolts, rather than looking for inspiration in external discourses (as was still happening in other countries). Even Dickens, for all his Parliamentarese, wrote novels with an outstanding measure of “orality”. It was this specific historical conjuncture that made the “other languages” of heteroglossia bad for survival. On this point, a longer historical view can be of help.
Some time ago, the classicist Niklas Holzberg wrote an essay whose key cognitive metaphor – “the Fringe” – has left a deep mark on the study of the ancient novel.47 What Holzberg meant by his expression was that, around the extremely small cohort of Greek and Latin “novels proper”, a much larger group of texts existed, where novelistic traits were mixed with elements from other discourses (historiography, travel reports, philosophy, political education, pornography...), thus expanding the scope of what the novel could do. In the twenty centuries that followed – as the novel “proper” increased its productivity, diversified its forms, and raised its status within the general culture – the role of the Fringe correspondingly contracted, and scholars of modern literature have hardly ever bothered with the idea. But in fact, the Fringe has never ceased to exist: the writers in Figure 6.1 are its modern version, and their strange proliferation of topics is the typical sign of works situated on the border between the novel and other discourses. The real problem was that, in the meantime, the morphological function of the border – providing a favorable terrain for the encounter between the novel and other discourses – had become more uncertain. A century earlier, a novel engaging the nuances of spiritual autobiography, the mechanics of letter-writing, or the discontinuity of “sensation” could still grow into a masterpiece, and spawn a successful subgenre: Pilgrim’s Progress, Pamela, Tristram Shandy, perhaps still even Waverley, had significant fringe-like traits. But in the course of the nineteenth century – probably as a consequence of the division of intellectual labor, which increased the distance between fiction and the social sciences, making their languages less and less translatable into each other – the role of heteroglossia within the development of novelistic form became problematic. It was this that decided the fate of those forgotten writers.48 Whether this also answers our initial question – on the archive changing our knowledge of literature – is not for us to say. What we can say is that, as the work proceeded, we found ourselves devoting more and more time to Ringrove, The Impostor, and Arthur Montague; and that, in a few lucky moments, we felt that these books were raising questions that, say, Adam Bede never would. A few lucky moments: it isn’t easy, keeping your focus on the archive. In part, it is the pull of well-known writers – the pull of what you already know – that draws you back to the beaten track. In part, it is the troubling nature of what forgotten authors force you to face: a vast wreck of ambitious ideals, very unlike the landscape literary historians are used to studying. Learning to look at the wreck without arrogance – but also without pieties – is what the new digital archive is asking us to do; in the long run, it might be an even greater change than quantification itself.

46 Georges Canguilhem, The Normal and the Pathological, 1966, New York 1989, p. 144.
47 Niklas Holzberg, “The Genre: Novels proper and the Fringe”, 1996, in Gareth Schmeling, ed. The Novel in the Ancient World, revised ed., Brill, Boston-Leiden 2003.

conquering Egypt, was# unable to# extirpate, but contrived to# expel, thereby entailing on# Europe their# lawless and# unpopular posterity, so obnoxious to# the# proprietors of# the# localities they select for# their# temporary residences. I see a# column of# slow rising smoke Overtop the# lofty wood that# skirts the# wild. c# 2 A# vagabond and# useless tribe# there eat Their# miserable meal. A# kettle slung Between two poles upon a# stick transverse Receives the# morsel***** of# cock purloined From# his accustomed perch. Hardening race! They# pick their# fuel out of# every hedge, Which#, kindled with# dry leaves, just saves unquench*’d The# spark of# life#. The# sportive wind blows wide Their# flutt* `ring rags, and# shows a# tawny skin--. The# vellum of# the# pedigree they# claim. Great skill have they# in# palmistry, and# more To# conjure clean away the# gold they# touch, Conveying worthless dross into its place: Loud when they# beg, dumb only when# they# steal.’’ A# grove of# tall poplars* formed a# conspicuous object from# the# western look-out*; and# not far from# hence rose, up the# slope of# a# hill, a# dense extensive coppice, impervious to# the# eye, where the# lordly chief of# the# forest reared its# head proudly over its# arboreous companions, silently asserting its# supremacy; and# the# graceful beech, silvery ash, dark-green* spiral fir, Scotch larch, and# stunted hazel, were blended together, and# the# stream-wooing* willow dipped its# pensile shoots into# a# clear, gurgling stream, that# wound its# tortuous course along, its# sequestered, shady nooks pointing out# to# the# angler the# probable haunts of# the# hungry trout on# the# alert for# its# insect diet, and# snug spots# under the# gnarled roots of# undermined antique trees growing on# the# banks of# the# encroaching brook, hinting to# juvenile poachers, setters of# night-lines*, the# likely lurking-places* of# the# snake-like*, slimy eel.
Situated in# a# dell, at no great# distance off, was# the# home-farm*, with# its# roomy barns, high granary, cow-sheds*, and# fowl-house*, on# entering# which#, perhaps the# cackling hen gave notice of# her sedentary occupation, or# the# outstretched neck of# the# hissing goose apprised you of# her# displeasure at# your approach. Without, probably the# clustering poultry, emitting their# various cries, surrounded you# without# alarm, expecting to# receive a# shower of# grain in# reward

for# their# courage and# confidence; a# turkey or# two#, may be fearing to# be# late for# the# fare, running greedily up# the# yard to# join the# rest, and# the# gaily-dressed* peacock condescending to# associate with# his# inferiors* on# the# occasion. Roaming about#, you# doubtless encountered the# sheepdog*, if# not# away#, attending on# his# fleecy charge in# adjoining pastures, the# nature of# his# bark denoting* delight oranger*, according to# his# knowledge or# ignorance of# your# countenance; and# in# passing the# sties*, you# naturally glanced at# the# swollen carcases* of# the# noxious inmates, lying in# their# miry beds surfeited* with# food, scarcely willing to# open# their# small eyes or# lift their# snouts* from# the# stone# troughs on# which# they# rested; a# short, low# grtint* perchance being the# only# acknowledgment of# their# consciousness of# the# presence of# a# visitor. The# farm-house* was# the# very picture of# rustic comfort-- a# model of# cleanliness and# neatness Within; and# the# brickwork* of# the# exterior almost totally hidden by# the# undying ivy that# clung tenaciously to# every# part, as# if# resolved not# to# separate from# a# pleasing acquaintance. A# primly clipped box-hedge* bounded it# on# one side#, and#, running# along# in# front#, was# a# wattled paling supporting a# mass of# white jessamine. The# dairy lay at# the# back, and# its# whitewashed walls were# always# well# garnished with# parallel tiers of# Stilton, sage, cream, and# other cheeses. Thus was# ensured a# certain supply of# creature comforts, contributing in# no# small# degree to# create that# full contentment that# pervaded the# household, where# food# was# abundant, beer and# cider plentiful, and# work light. 
A# few hundred yards from# the# farmstead was#’’ The# Retreat,’’ where# I#, Arthur Montague, (for# it# is fitting I# should begin to# speak in# propria* persona) was# wont to# pass many an# hour in# listless idleness, looking on# the# blooming landscape, listening to# the# humming bees, or# teasing a# pet jackdaw, who# poked his# head# between# the# bars of# his# wicker cage when# confined there# for# misconduct. The# said Retreat# was# an# elegant little two-roomed* Gothic cottage, plastered with# sparkling sanded cement

Figure 6.2 continued.

48 By the same token, from that moment on the masterpieces of heteroglossia – like Moby-Dick, or Ulysses – had to move increasingly away from the main axis of novelistic development, appealing less and less to non-academic readers.