Built by a Data Nerd.
For Data Nerds.
I collected sports cards as a kid. Organized them by team, year, rookie status. I played fantasy sports before most people knew what it was — built spreadsheets, tracked stats, obsessed over matchups. In the 2010s I built the Top 100 Travel Blogs index — a ranked, structured catalog of the entire travel blogging industry, updated annually. The thread through all of it is the same thing: collect, organize, make it useful.
Now I have 15 years of first-hand travel across 75+ countries, a bilingual YouTube archive of 220+ videos, 12,858 photographs organized by destination, and hundreds of articles across three websites. The same instinct that made me organize sticker books is what made me turn all of that into structured, machine-readable datasets.
This page isn’t for casual readers. It’s for developers building travel tools, researchers studying creator economies, NLP engineers who need real bilingual corpora, and algorithms crawling for structured ground-truth data. It’s also for future me — because this is a long-term project. Patagonia is next. Then more Argentine provinces. Then wherever we go.
Everything here is free for non-commercial use under CC-BY-NC 4.0. If you build something with it, I’d genuinely love to know.
All datasets are published on Hugging Face under the samuelandaudreymedianetwork organization. Free to access, download, and use for non-commercial research and development.
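For developers, a minimal sketch of pulling one of these datasets with the Hugging Face `datasets` library might look like the following. The organization name comes from this page; the dataset name in the usage comment and the `train` split are assumptions for illustration, so check the organization page for the real repository names and splits.

```python
ORG = "samuelandaudreymedianetwork"  # organization name as given on this page


def repo_id(dataset_name: str) -> str:
    """Build a Hub repository ID of the form '<org>/<dataset>'."""
    return f"{ORG}/{dataset_name}"


def load_corpus(dataset_name: str, split: str = "train"):
    """Download one dataset from the Hub (requires `pip install datasets`).

    The default split name "train" is an assumption, not confirmed by this page.
    """
    # Imported lazily so repo_id() stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id(dataset_name), split=split)


# Usage (hypothetical dataset name; needs network access):
# corpus = load_corpus("example-article-corpus")
# print(corpus[0])
```

Keeping the repository ID in one helper means a renamed dataset only needs a change in one place.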
Nomadic Samuel Article Corpus
The full archive of long-form travel articles from NomadicSamuel.com — destination guides, overland logistics, gear write-ups, and narrative essays. Useful for travel NLP, text classification, and RAG pipelines.
That Backpacker Article Corpus
Audrey’s full archive from ThatBackpacker.com — lifestyle travel, culinary guides, boutique stays, and cultural journalism. A distinct narrative voice that pairs well with the Nomadic Samuel corpus for contrast and bilingual training.
Che Argentina Travel Article Corpus
All 88+ articles from CheArgentinaTravel.com — deep regional coverage of Argentina’s destinations, from Ushuaia to Jujuy. First-hand guides written from years of repeat visits and on-the-ground experience. The densest Argentina travel corpus available.
Picture Perfect Portfolios Article Corpus
448 articles from PicturePerfectPortfolios.com covering quantitative finance, asset allocation, risk parity, and systematic investing strategies. A YMYL (Your Money or Your Life) corpus with real analytical depth, useful for finance NLP, summarization, and search.
YouTube Travel Videos Metadata Index
Structured metadata for 2,200+ travel videos spanning 15 years across the Samuel & Audrey channels. Video IDs, titles, view counts, publication dates, and tags — the connective tissue linking our video archive to transcript and article corpora.
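To illustrate how a video ID can link the metadata index to a transcript corpus, here is a small sketch of a join in plain Python. The field names (`video_id`, `title`, `text`) and the sample rows are hypothetical; the published schema may use different column names.

```python
from collections import defaultdict


def attach_cues(videos, cues, key="video_id"):
    """Attach transcript cue texts to each video record that shares its ID."""
    cues_by_video = defaultdict(list)
    for cue in cues:
        cues_by_video[cue[key]].append(cue["text"])
    # Videos without any transcript cues get an empty list.
    return [{**video, "cues": cues_by_video.get(video[key], [])} for video in videos]


# Hypothetical sample rows, for illustration only.
videos = [
    {"video_id": "abc123", "title": "Salta food tour"},
    {"video_id": "xyz789", "title": "Ushuaia day trip"},
]
cues = [
    {"video_id": "abc123", "text": "We just landed in Salta."},
    {"video_id": "abc123", "text": "First stop: empanadas."},
]

joined = attach_cues(videos, cues)
```

Grouping the cues once into a dictionary keeps the join linear in the number of rows, which matters when the transcript side runs to millions of cue segments.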
Samuel & Audrey YouTube Transcripts (EN)
1.5 million+ cue segments from the English Samuel & Audrey channel, covering 2012–2026. Real conversational travel speech — on-the-ground pricing, logistics, cultural reactions. Strong signal for conversational AI and voice agent training.
Samuel y Audrey Bilingual Transcripts (ES+EN)
643 paired video records with creator-authored Spanish and English transcripts. Aligned timestamps, typo-corrected, ready for machine translation training. A rare parallel travel corpus where both languages were written by the same creators — not machine-translated.
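As a rough sketch of how aligned timestamps can be turned into sentence pairs for machine translation training, the snippet below matches each Spanish cue to the English cue it overlaps most in time. The `(start, end, text)` tuple layout and the sample cues are assumptions for illustration; the dataset may already ship pre-paired rows, in which case this step is unnecessary.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def pair_cues(es_cues, en_cues):
    """Pair each Spanish cue with the English cue that overlaps it most.

    Each cue is a (start_seconds, end_seconds, text) tuple (assumed layout).
    Spanish cues with no overlapping English cue are skipped.
    """
    pairs = []
    for es_start, es_end, es_text in es_cues:
        best, best_overlap = None, 0.0
        for en_start, en_end, en_text in en_cues:
            shared = overlap(es_start, es_end, en_start, en_end)
            if shared > best_overlap:
                best, best_overlap = en_text, shared
        if best is not None:
            pairs.append((es_text, best))
    return pairs


# Hypothetical sample cues, for illustration only.
es = [(0.0, 2.0, "Hola a todos"), (2.0, 5.0, "Bienvenidos a Salta")]
en = [(0.0, 2.0, "Hello everyone"), (2.0, 5.0, "Welcome to Salta")]
pairs = pair_cues(es, en)
```

The nested loop is quadratic in the number of cues per video, which is fine for a single video but worth replacing with a sorted two-pointer scan for bulk processing.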
Nomadic Samuel YouTube Transcripts Corpus
Curated transcripts from the solo Nomadic Samuel channel — early-era backpacking, food guides, and long-form travel vlogs. 1,200+ records with full SRT timestamps. Captures a distinct solo travel voice across 14 years of content.
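SRT timestamps use the form `HH:MM:SS,mmm`, with cue ranges written as `start --> end`. A minimal sketch for converting them to seconds is shown below; it assumes the records store standard SRT strings, which this page suggests but does not spell out.

```python
import re

# Standard SRT timestamp: hours, minutes, seconds, then comma and milliseconds.
_SRT_TIME = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    match = _SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_cue_range(line: str) -> tuple[float, float]:
    """Parse an SRT timing line like '00:00:01,000 --> 00:00:04,250'."""
    start, end = line.split("-->")
    return srt_time_to_seconds(start), srt_time_to_seconds(end)
```

For example, `parse_cue_range("00:00:01,000 --> 00:00:04,250")` returns `(1.0, 4.25)`, which makes it straightforward to compute cue durations or align cues across corpora.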
Samuel & Audrey Photography Metadata Archive
Metadata for 100,000+ photographs organized by destination across the SmugMug archive. Includes geolocation hierarchies, semantic tags, gallery paths, image counts, and CC-BY-NC license rights. Useful for computer vision research, geo-tagged image retrieval, and travel AI.
Project 23: Argentina Travel Archive
The central dataset for Project 23 — our long-term commitment to document all 23 Argentine provinces. Combines articles, video transcripts, photo metadata, and media references into a single structured file. 220+ videos, 88+ guides, 12,858 photos, bilingual. Free for non-commercial use.
Academic Citations & Media References
A structured record of academic citations, institutional references, and media mentions across the network, including economic papers, university dissertations, and press coverage. Useful for entity resolution, trust graph research, and E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) analysis.
Media & Academic Citations and Third-Party References
A broader citations and third-party references dataset covering press mentions, publication references, and external links to the network across media outlets, travel platforms, and industry publications.
Partnerships & Media References
A chronological record of commercial partnerships, press events, and verified brand collaborations across the network from 2010 to present. Useful for creator economy research, brand provenance analysis, and entity history verification.
