CSV -> TTL


We use Python throughout, with pandas and rdflib turning a folder of CSV files into a single RDF graph. First, we read a small “prefixes” CSV and register every namespace it lists, so URIs stay short and consistent. We then load every other CSV in the folder into a pandas DataFrame and treat each row as a record: the subject URI is a base namespace plus a key field, each column name maps to an RDF predicate (from one of our registered namespaces or a standard vocabulary), and each cell becomes either a typed literal (for numbers and dates) or a URI reference, depending on its content. All triples go into one in-memory rdflib.Graph, which we finally serialize to a human-readable Turtle file.

XML -> TTL


The XML-to-Turtle converter uses Python’s lxml for parsing and the re module to normalize numeric XML IDs into valid identifiers. We walk through the TEI scenes and global elements, capturing story mentions, place names, and quotes with namespace-aware XPath queries. With our prefixes declared (such as val:, schema:, frbr:), we assemble RDF triples describing the overall work and each scene’s parts, mentions, and citations. A final clean-up step replaces trailing semicolons with periods, and we write the resulting Turtle text to disk.
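The XML steps above might look something like the sketch below. To stay dependency-free it uses the standard library's xml.etree.ElementTree rather than lxml; the namespace-aware queries have the same shape, but ElementTree supports only a subset of XPath. The sample TEI fragment, the val: namespace URI, the scene_ prefix for numeric IDs, and the choice of frbr:Expression, schema:mentions and schema:citation are assumptions for illustration, not the converter's actual mapping.

```python
import re
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The val: URI is a hypothetical placeholder; schema: and frbr: are the public ones.
PREFIXES = (
    "@prefix val: <http://example.org/val/> .\n"
    "@prefix schema: <https://schema.org/> .\n"
    "@prefix frbr: <http://purl.org/vocab/frbr/core#> .\n"
)

# Hypothetical TEI fragment with a purely numeric scene ID.
TEI_SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="scene" xml:id="12">
      <placeName>Paris</placeName>
      <quote>Memory is a river.</quote>
    </div>
  </body></text>
</TEI>
""".strip()


def normalize_id(raw_id: str, prefix: str = "scene_") -> str:
    """Replace unsafe characters and make sure the name does not start with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", raw_id)
    return prefix + cleaned if re.match(r"\d", cleaned) else cleaned


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def scene_to_turtle(scene: ET.Element) -> str:
    name = normalize_id(scene.get(XML_ID, ""))
    lines = [f"val:{name} a frbr:Expression ;"]
    for place in scene.findall(".//tei:placeName", NS):
        lines.append(f'    schema:mentions "{escape((place.text or "").strip())}" ;')
    for quote in scene.findall(".//tei:quote", NS):
        lines.append(f'    schema:citation "{escape((quote.text or "").strip())}" ;')
    # Clean-up step: the last predicate list must end with a period, not a semicolon.
    return re.sub(r";\s*$", ".", "\n".join(lines))


root = ET.fromstring(TEI_SAMPLE)
scenes = root.findall(".//tei:div[@type='scene']", NS)
turtle = PREFIXES + "\n" + "\n\n".join(scene_to_turtle(s) for s in scenes) + "\n"
print(turtle)
```

Building the Turtle as text keeps the output layout under direct control, which is why the semicolon clean-up is needed; writing the result to disk is then a single file write of the turtle string.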