Retrieve fulltext and metadata for all editions of one project
Task:
We want to analyse the fulltext content of all editions for one project.
We want to have access to the edition metadata.
As there may be many editions, we split the job by paging through the search results.
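The paging arithmetic can be tried out without any network access. The following is only a minimal sketch: `page_offsets` is a hypothetical helper, not part of tgclients, and it assumes the total number of hits is already known. The real loop only learns that number from the first search response.

```python
def page_offsets(total_hits, limit):
    """Yield the start offset of every page needed to cover total_hits results."""
    start = 0
    while start < total_hits:
        yield start
        start += limit


# 8 editions fit into a single request with a page size of 10
print(list(page_offsets(8, 10)))   # [0]
# 51 editions need six requests
print(list(page_offsets(51, 10)))  # [0, 10, 20, 30, 40, 50]
```

Each offset corresponds to one search call with that `start` value and the same `limit`.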
from tgclients import TextgridSearch, TextgridSearchRequest, Aggregator, TextgridConfig
from tgclients.config import DEV_SERVER
###
# prepare textgrid clients which are configured to use the dev instance,
# because aggregating fulltext of editions may put high load on the aggregator
###
config = TextgridConfig(DEV_SERVER)
tgsearch = TextgridSearch(config)
aggregator = Aggregator(config)
###
# choose a project ID, look at https://sandbox.dev.textgridrep.org/projects for inspiration
###
#project_id = 'TGPR-1789bd93-99c5-58e4-e100-619e27ec1119' # Keine Wahlwerbung (1 Ed.)
project_id = 'TGPR-f3e628ae-74b8-2ebb-9fee-614c59c9b522' # Distant Reading – 2021-09-23 (8 Ed.)
#project_id = 'TGPR-ca80b39a-5487-27ee-1289-6294a25f975a' # Goethes Farbenlehre (51 Ed.)
#project_id = 'TGPR-44684af6-1d30-b6d0-3665-62a87b5380b7' # CoNSSA (219 Ed.)
#project_id = 'TGPR-372fe6dc-57f2-6cd4-01b5-2c4bbefcfd3c' # Digitale Bibliothek (93462 Ed.)
###
# start is the offset of the first search result to retrieve; it begins at 0 and grows by limit after each page
# limit is the number of search results to retrieve per request
###
start = 0
limit = 10
nextpage = True
while nextpage:
    ###
    # filter for all editions in the chosen project
    ###
    results = tgsearch.search(
        filters=[
            'project.id:' + project_id,
            'format:text/tg.edition+tg.aggregation+xml'],
        start=start, limit=limit)

    for result in results.result:
        edition_uri = result.object_value.generic.generated.textgrid_uri.value
        edition_agent = result.object_value.edition.agent[0].value
        edition_title = result.object_value.generic.provided.title[0]
        # edition metadata
        print(edition_agent + ' - ' + edition_title + '\n')
        ###
        # aggregate all text content of all children of this edition as plaintext
        ###
        fulltext = aggregator.text(edition_uri).text
        print(fulltext[0:100])
        print("---\n")

    # increment the start counter for the next run
    start = start + limit
    if start >= int(results.hits):
        # stop if there are no more results left
        # (>= avoids one extra, empty request when hits is a multiple of limit)
        nextpage = False
print('\n+------+\n| DONE |\n+------+')
Frances Trollope - The Life and Adventures of Michael Armstrong
PREFACE.
When the author of "Michael Armstrong" first determined on attempting to draw the attention
---
Otto, Louise - Nürnberg. Zweiter Band
Erstes Capitel
Gobelins
Die kalten Strahlen einer halbverschleierten Wintersonne brachen sich auf de
---
Christ, Lena - Mathias Bichler
Im Weidhof
Meine Kostmutter hat mir gesagt, daß ich am vierten Sonntag nach der Erscheinung des Herr
---
Carroll, Lewis, 1832-1898 - Alice's Adventures in Wonderland
ALICE’S ADVENTURES IN WONDERLAND
By Lewis Carroll
THE MILLENNIUM FULCRUM EDITION 3.0
CHAPTER I. Down
---
Nesbit, E. (Edith), 1858-1924 - The Red House
THE RED HOUSE A Novel
BY E. NESBIT AUTHOR OF “THE TREASURE “THE WOULDBEGOODS,”
ILLUSTRATED BY A.I. K
---
Fontane, Theodor - Unterm Birnbaum
Erstes Kapitel
Vor dem in dem großen und reichen Oderbruchdorfe Tschechin um Michaeli 20 eröffneten
---
- The Mayor of Casterbridge: The Life and Death of a Man of Character
THE MAYOR OF CASTERBRIDGE
by Thomas Hardy
1.
One evening of late summer, before the nineteenth centu
---
Willkomm, Ernst Adolf - Weisse Sclaven oder die Leiden des Volkes
Erster Theil
Erstes Buch
Erstes Kapitel.
Der Haidekretscham.
Ein ansehnlicher Theil der beiden Lausi
---
+------+
| DONE |
+------+
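One entry in the output above starts with a bare " - " because the agent field of that edition is empty. A minimal sketch of a fallback for such cases follows. `edition_label` and the placeholder text are hypothetical and not part of tgclients. The sketch assumes the agent arrives as a string or `None`.

```python
def edition_label(agent, title):
    """Build a printable label, falling back when the agent field is empty."""
    agent = (agent or '').strip()
    return f"{agent or '(no agent)'} - {title}"


print(edition_label('Fontane, Theodor', 'Unterm Birnbaum'))
print(edition_label('', 'The Mayor of Casterbridge: The Life and Death of a Man of Character'))
```

In the loop, the result could replace the string concatenation passed to `print`.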