Built by a Data Nerd.
For Data Nerds.
I collected sports cards as a kid. Organized them by team, year, rookie status. I played fantasy sports before most people knew what it was — built spreadsheets, tracked stats, obsessed over matchups. In the 2010s I built the Top 100 Travel Blogs index — a ranked, structured catalog of the entire travel blogging industry, updated annually. The thread through all of it is the same thing: collect, organize, make it useful.
Now I have 15 years of first-hand travel across 75+ countries, a bilingual YouTube archive of 220+ videos, 12,858 photographs organized by destination, and hundreds of articles across three websites. The same instinct that made me organize sports cards is what made me turn all of that into structured, machine-readable datasets.
This page isn’t for casual readers. It’s for developers building travel tools, researchers studying creator economies, NLP engineers who need real bilingual corpora, and algorithms crawling for structured ground-truth data. It’s also for future me — because this is a long-term project. Patagonia is next. Then more Argentine provinces. Then wherever we go.
Everything here is free for non-commercial use under CC-BY-NC 4.0. If you build something with it, I’d genuinely love to know.
Hugging Face is the primary dataset hub for the Samuel & Audrey Media Network. Datasets are also mirrored or archived across GitHub, Zenodo, Kaggle, DagsHub, Figshare, Harvard Dataverse, and Mendeley Data for preservation, citation, discoverability, and research access.
Hugging Face
Primary public dataset hub for the Samuel & Audrey Media Network, including travel corpora, video transcripts, photography metadata, citation records, historical archives, and finance research datasets.
GitHub
Repository mirror for dataset packages, documentation, citation files, checksums, manifests, and technical context behind the public archive.
Zenodo
Academic archive and DOI-backed release layer for selected dataset packages, long-term preservation, and cite-all-versions records.
Kaggle
Data-science discovery mirror for researchers, analysts, and builders who prefer Kaggle-hosted public datasets and notebooks.
DagsHub
Open data and machine-learning repository mirror for dataset discoverability, data workflows, and AI-oriented archive access.
Figshare
Research-sharing profile for public dataset deposits, supplemental archive records, and citable media-data outputs.
Harvard Dataverse
Academic repository profile for selected long-term dataset deposits, including Project 23, Top 100 Travel Blogs, and early travel blogging archive records.
Mendeley Data
Elsevier-hosted research data repository for selected dataset deposits with DOI-backed archive records. Currently hosts Project 23, Top 100 Travel Blogs, and Early Travel Blogging Directory.
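The GitHub mirror is described above as carrying checksums and manifests. As a minimal sketch of how a downloaded package could be checked against one, assuming the manifest uses the common `sha256sum` layout (hex digest, whitespace, file name), which this page does not confirm:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text: str) -> dict[str, str]:
    """Parse '<hex digest>  <file name>' lines into {file name: digest}.

    ASSUMPTION: the manifest follows the layout printed by `sha256sum`.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'
        entries[name.strip().lstrip("*")] = digest.lower()
    return entries


def verify(folder: Path, manifest_text: str) -> dict[str, bool]:
    """Map each file named in the manifest to True if its digest matches."""
    results = {}
    for name, expected in parse_manifest(manifest_text).items():
        target = folder / name
        results[name] = target.is_file() and sha256_of(target) == expected
    return results
```

A file that is missing or altered maps to `False`, so a single `all(verify(...).values())` call is enough to gate a data pipeline on an intact download.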
Nomadic Samuel Article Corpus
The full archive of long-form travel articles from NomadicSamuel.com — destination guides, overland logistics, gear write-ups, and narrative essays. Useful for travel NLP, text classification, and RAG pipelines.
That Backpacker Article Corpus
Audrey’s full archive from ThatBackpacker.com — lifestyle travel, culinary guides, boutique stays, and cultural journalism. A distinct narrative voice that pairs well with the Nomadic Samuel corpus for contrast and bilingual training.
Che Argentina Travel Article Corpus
88+ articles from CheArgentinaTravel.com with deep regional coverage of Argentina’s destinations, from Ushuaia to Jujuy. First-hand guides written from years of repeat visits and on-the-ground experience. A dense, Argentina-focused travel corpus.
Picture Perfect Portfolios Article Corpus
448 articles from PicturePerfectPortfolios.com covering quantitative finance, asset allocation, risk parity, and systematic investing strategies. A YMYL (Your Money or Your Life) corpus with real analytical depth, useful for finance NLP, summarization, and search.
YouTube Travel Videos Metadata Index
Structured metadata for 2,200+ travel videos spanning 15 years across the Samuel & Audrey channels. Video IDs, titles, view counts, publication dates, and tags — the connective tissue linking our video archive to transcript and article corpora.
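Since the video ID is what ties the metadata index to the transcript corpora, a join on that key is the natural first step. The sketch below is illustrative only: the field names `video_id`, `title`, and `text` are assumptions and may differ from the published schema.

```python
from collections import defaultdict


def attach_transcripts(videos: list[dict], cues: list[dict]) -> list[dict]:
    """Return a copy of each video record with its cue texts joined in order.

    ASSUMPTION: both inputs carry a shared `video_id` key and each cue
    has a `text` field. Check the actual column names before relying on this.
    """
    cues_by_video = defaultdict(list)
    for cue in cues:
        cues_by_video[cue["video_id"]].append(cue["text"])

    return [
        {**video, "transcript": " ".join(cues_by_video.get(video["video_id"], []))}
        for video in videos
    ]
```

Videos without any matching cues come back with an empty `transcript` string rather than being dropped, which keeps the output aligned one-to-one with the metadata rows.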
Samuel & Audrey YouTube Transcripts (EN)
1.5 million+ cue segments from the English Samuel & Audrey channel, covering 2012–2026. Real conversational travel speech — on-the-ground pricing, logistics, cultural reactions. Strong signal for conversational AI and voice agent training.
Samuel y Audrey Bilingual Transcripts (ES+EN)
643 paired video records with creator-authored Spanish and English transcripts. Aligned timestamps, typo-corrected, ready for machine translation training. A rare parallel travel corpus where both languages were written by the same creators — not machine-translated.
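For machine translation work, the aligned timestamps can be used to turn one paired record into sentence-level training pairs. This is a rough sketch, not the dataset's documented loader: the record layout (`cues_es` and `cues_en` lists of cues with `start` and `text`) and the 0.05 second tolerance are assumptions made for illustration.

```python
def aligned_pairs(record: dict, tolerance: float = 0.05) -> list[tuple[str, str]]:
    """Pair Spanish and English cues whose start times agree within `tolerance` seconds.

    ASSUMPTION: `record` has `cues_es` and `cues_en`, each a list of
    {"start": float, "text": str}. Cues with no partner are skipped.
    """
    spanish = sorted(record["cues_es"], key=lambda cue: cue["start"])
    english = sorted(record["cues_en"], key=lambda cue: cue["start"])

    pairs = []
    i = j = 0
    while i < len(spanish) and j < len(english):
        delta = spanish[i]["start"] - english[j]["start"]
        if abs(delta) <= tolerance:
            pairs.append((spanish[i]["text"], english[j]["text"]))
            i += 1
            j += 1
        elif delta < 0:
            i += 1  # Spanish cue starts earlier and has no English partner
        else:
            j += 1  # English cue starts earlier and has no Spanish partner
    return pairs
```

Matching on start time with a small tolerance, rather than on list position, means a cue that exists in only one language does not shift every later pair out of alignment.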
Nomadic Samuel YouTube Transcripts Corpus
Curated transcripts from the solo Nomadic Samuel channel — early-era backpacking, food guides, and long-form travel vlogs. 1,200+ records with full SRT timestamps. Captures a distinct solo travel voice across 14 years of content.
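SRT timing lines use the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`. A small helper like the following converts them to seconds for sorting, slicing, or alignment. It assumes the timestamps are stored as standard SRT strings; how they are actually stored in each record is not specified here.

```python
import re

# Standard SRT timestamp: hours, minutes, seconds, then milliseconds after a comma.
# A dot separator is also accepted because some exporters use it.
SRT_TIME = re.compile(r"(\d{2,}):(\d{2}):(\d{2})[,.](\d{3})")


def srt_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    match = SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_cue_timing(line: str) -> tuple[float, float]:
    """Split an SRT timing line 'start --> end' into (start, end) in seconds."""
    start, arrow, end = line.partition("-->")
    if not arrow:
        raise ValueError(f"missing '-->' in timing line: {line!r}")
    return srt_to_seconds(start), srt_to_seconds(end)
```

For example, `parse_cue_timing("00:00:01,000 --> 00:00:04,250")` returns `(1.0, 4.25)`.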
Samuel & Audrey Photography Metadata Archive
Metadata for 100,000+ photographs organized by destination across the SmugMug archive. Includes geolocation hierarchies, semantic tags, gallery paths, image counts, and CC-BY-NC license rights. Useful for computer vision research, geo-tagged image retrieval, and travel AI.
Project 23: Argentina Travel Archive
The central dataset for Project 23 — our long-term commitment to document all 23 Argentine provinces. Combines articles, video transcripts, photo metadata, and media references into a single structured file. 220+ videos, 88+ guides, 12,858 photos, bilingual. Free for non-commercial use.
Academic Citations & Media References
A structured record of academic citations, institutional references, and media mentions across the network, including economics papers, university dissertations, and press coverage. Useful for entity resolution, trust graph research, and E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) analysis.
Media & Academic Citations and Third-Party References
A broader citations and third-party references dataset covering press mentions, publication references, and external links to the network across media outlets, travel platforms, and industry publications.
Partnerships & Media References
A chronological record of commercial partnerships, press events, and verified brand collaborations across the network from 2010 to present. Useful for creator economy research, brand provenance analysis, and entity history verification.
Top 100 Travel Blogs 2010s Historical Archive
A structured historical record of the Top 100 Travel Blogs index — a ranked catalog of the independent travel blogging industry updated annually throughout the 2010s. Useful for creator economy research, media history, and longitudinal analysis of independent publishing.
Early Travel Blogging Directory Archive
A structured archive of independent travel blogs, creator directories, link lists, and early web records from the travel blogging era. Useful for creator economy research, web history, link graph analysis, and historical discovery of pre-platform independent publishers.
Samuel & Audrey Media Network Dataset Directory
The meta-index dataset for the entire Samuel & Audrey Media Network corpus — a structured directory of all 16 datasets with identifiers, DOIs, descriptions, and provenance records. The canonical entry point for programmatic discovery of the full archive.
