[Coding] Saturday Scripting: Parsing a YouTube Playlist
Posted by Khatharsis on May 12, 2018
I find great irony in writing a script on a Saturday (a day off) that resembles the functionality I have been working on for the past ~3 years. I didn't attend Microsoft Build this year, so I didn't receive a paper printout of all the sessions, and I wasn't able to find a PDF version. Instead, I did a little manual work and wrote a short snippet of NodeJS to generate an Excel file for my own use, to keep track of which videos I've watched. Total time: < 1 hr.
I should add that I don't have a [great] developer setup at home. Each time I start to set one up, I lose interest or, more often, time just vaporizes before my eyes, a few months go by, and I may as well start over. I went with what I had rather than fiddle around with installs for something that should be quick (I probably could have used Selenium or some other automation tool if one had been installed). So I'm running Node 0.10.30. Yes, that's correct. Zero-point-ten. I think I installed it for a take-home problem for an interview a few years back. I was more surprised that I still had it on my computer at all.
All I really needed was the fs module, to read three files and write one output file.
But first, I needed to grab the data. I loaded the playlist page and, in naive hope, looked at the page source to see if I could find any data related to the playlist. There was some; it just wasn't complete. That was my first JSON file.
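Pulling that embedded data out of the page source was a manual copy-and-paste. If you wanted to script it, one approach is to find a marker in the saved page source and extract the balanced `{...}` object that follows it. This is a minimal sketch that assumes the data is embedded as strict JSON right after some marker string; `extractJsonAfter` and the `pageData` marker are hypothetical names, not what YouTube actually uses. It's written in ES5 style (no arrow functions or `let`), so it should also run on an old Node like 0.10.

```javascript
// Hypothetical helper: find `marker` in `html`, then return the JSON object
// literal starting at the next '{', parsed with JSON.parse. Brace matching
// skips over double-quoted strings so braces inside titles don't end it early.
function extractJsonAfter(html, marker) {
  var start = html.indexOf(marker);
  if (start === -1) return null;
  start = html.indexOf('{', start + marker.length);
  if (start === -1) return null;

  var depth = 0;
  var inString = false;
  for (var i = start; i < html.length; i++) {
    var ch = html[i];
    if (inString) {
      if (ch === '\\') {
        i++; // skip the escaped character (e.g. \" inside a string)
      } else if (ch === '"') {
        inString = false;
      }
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) {
        return JSON.parse(html.slice(start, i + 1));
      }
    }
  }
  return null; // braces never balanced: the blob was truncated
}

// Usage with a made-up page fragment:
var page = '<script>var pageData = {"contents":[{"title":"a {b}"}]};</script>';
var data = extractJsonAfter(page, 'pageData');
console.log(data.contents[0].title); // prints "a {b}"
```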
Only the first 100 videos of a playlist are loaded initially; AJAX calls then fetch the next 100 as you scroll. So, with Developer Tools open to the Network tab, I reloaded the page and scrolled down until the AJAX call fired. I grabbed the data from that call: my second JSON file. Rinse and repeat to get my final JSON file.
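The scroll-and-capture routine is essentially a continuation loop: take a page, collect its items, and ask for the next page until no continuation is returned. The sketch below models that loop against an in-memory fake server. `fetchPage`, `nextToken`, and the page size are assumptions for illustration; they do not reflect YouTube's actual request or response format.

```javascript
// Hypothetical stand-in for the paged responses: each page carries up to
// `pageSize` items plus a token for the next page (null on the last page).
function makeFakeServer(totalItems, pageSize) {
  return function fetchPage(token) {
    var start = token === null ? 0 : token;
    var end = Math.min(start + pageSize, totalItems);
    var items = [];
    for (var i = start; i < end; i++) {
      items.push('video ' + (i + 1));
    }
    var next = start + pageSize < totalItems ? start + pageSize : null;
    return { items: items, nextToken: next };
  };
}

// Keep requesting pages until there is no continuation token,
// collecting every item in order and counting the requests made.
function fetchAll(fetchPage) {
  var all = [];
  var token = null;
  var pages = 0;
  do {
    var page = fetchPage(token);
    all = all.concat(page.items);
    token = page.nextToken;
    pages++;
  } while (token !== null);
  return { items: all, pages: pages };
}

// A made-up 250-video playlist served 100 at a time needs three pages.
var result = fetchAll(makeFakeServer(250, 100));
console.log(result.pages, result.items.length); // prints "3 250"
```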
I used a JSON formatter tool to help trim out the extra noise. What I really cared about was a contents array containing playlistVideoRenderer objects. (It's also really interesting to see the payloads and learn how I might better structure my own at work, similar to how we used to look at the HTML source of webpages to learn how someone else had pulled off something cool, before everything got obscured by uglifiers.) From there, it was just a matter of using NodeJS's fs module and built-in JavaScript capabilities.
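Instead of trimming each payload by hand in a formatter, the contents array could also be located programmatically. This sketch assumes only the key names mentioned above (`contents` and `playlistVideoRenderer`) and treats everything around them as unknown nesting; `findPlaylistContents` is a hypothetical helper.

```javascript
// Walk a parsed JSON value depth-first and return the first `contents`
// array that holds at least one object with a `playlistVideoRenderer` key.
function findPlaylistContents(node) {
  if (node === null || typeof node !== 'object') return null;
  if (Array.isArray(node.contents) &&
      node.contents.some(function (item) {
        return item && item.playlistVideoRenderer;
      })) {
    return node.contents;
  }
  var keys = Object.keys(node); // for arrays this yields the indices
  for (var i = 0; i < keys.length; i++) {
    var found = findPlaylistContents(node[keys[i]]);
    if (found) return found;
  }
  return null;
}

// Usage on a made-up, deeply nested payload:
var raw = {
  response: { tabs: [{ content: { list: {
    contents: [
      { playlistVideoRenderer: { title: { simpleText: 'Keynote' } } }
    ]
  } } }] }
};
var contents = findPlaylistContents(raw);
console.log(contents[0].playlistVideoRenderer.title.simpleText); // prints "Keynote"
```

With a helper like this, `JSON.parse(data).contents` in the script could become `findPlaylistContents(JSON.parse(data))`, so the raw payloads wouldn't need trimming first.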
Code below (disclaimer: it's quick and dirty):
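// Pull video titles and durations out of saved YouTube playlist JSON
// payloads and write them out as tab-separated lines.
var fs = require('fs');

// Append one "title<TAB>length" line per video found in infile.
// readFileSync keeps the three files in order; with async readFile the
// calls could finish in any order and shuffle the output.
var readFormat1 = function(infile, outfile) {
  var data;
  try {
    data = fs.readFileSync(infile, 'utf8');
  } catch (err) {
    console.log(err);
    return; // nothing to parse
  }
  var playlist = JSON.parse(data).contents;
  for (var i = 0; i < playlist.length; i++) {
    var video = playlist[i].playlistVideoRenderer;
    if (!video) {
      continue; // skip entries that aren't videos
    }
    var title = video.title.simpleText;
    // Deleted or private videos may have no lengthText.
    var length = video.lengthText ? video.lengthText.simpleText : '';
    var output = title + '\t' + length + '\r\n';
    console.log(output);
    fs.appendFileSync(outfile, output);
  }
};

var outFile = './list.csv';
// Start from an empty file; unlike truncateSync, this also works
// when the file doesn't exist yet.
fs.writeFileSync(outFile, '');
readFormat1('./youTubeVideoList1.json', outFile);
readFormat1('./youTubeVideoList2.json', outFile);
readFormat1('./youTubeVideoList3.json', outFile);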
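One caveat: the script writes tab-separated lines into a file named list.csv. Depending on regional settings, Excel usually expects commas in a .csv, so a tab-separated file may open with everything in one column. For true comma-separated output, fields containing commas, double quotes, or line breaks need quoting as described in RFC 4180. A minimal sketch, with `csvField` and `csvRow` as hypothetical helpers:

```javascript
// Quote a field RFC 4180 style: wrap it in double quotes when it contains
// a comma, a double quote, or a line break, and double any embedded quotes.
function csvField(value) {
  var s = String(value);
  if (/[",\r\n]/.test(s)) {
    return '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}

// Build one CRLF-terminated CSV row from an array of fields.
function csvRow(fields) {
  return fields.map(csvField).join(',') + '\r\n';
}

// Usage with made-up titles:
var row1 = csvRow(['Keynote, Day 1', '2:31:05']);
var row2 = csvRow(['The "Cloud" talk', '45:10']);
console.log(row1 + row2);
```

In the loop above, `title + '\t' + length + '\r\n'` would become `csvRow([title, length])`.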