You didn’t hear this from me, but HOW TO RIP MP3s FROM A WEBSITE
Ok, I probably shouldn’t be posting this, because I’d be pissed if someone ripped my content. But this trick has saved me some extraneous effort in ripping MP3s or even other formats from certain websites (*cough* Grooveshark *cough cough* Pandora *cough*).
This approach works for everything from websites loaded in the browser using HTML5 or Flash, to custom applications (Adobe AIR, desktop applications, etc.).
Let’s do dis!
I can haz MP3z?!?
First and foremost, go grab WireShark, and install it.
OK, you’re done! … I keed I keed… Ok, so if you’ve run the application, you’ll realize there’s a BUNCH of crap that’s going across the screen, and that’s not really all that helpful. Real quick, what’s happening is the software is analyzing every packet (inbound and outbound) on a particular network device. If you haven’t, check it out… It’s pretty neat.
Obviously, the problem is that it’s hard to keep track of – so I’m going to give you some tips on how to minimize the data so it makes more sense.
The first thing we want to set up is exactly which network device to listen on, and also to limit which ports we sniff. This reduces the amount of work that WireShark will do, and in general limits the cruft of data spewing onto your screen.
- Open WireShark (duh!)
- Click “Capture Options”
- Select your Interface (aka, Network Device)
- In the “Capture Filter” text field, put “port 80” – This will limit listening to port 80. Typically speaking, most HTTP traffic goes directly over port 80, but that’s up to the implementation. Meaning, if it doesn’t work or you don’t see anything, skip this part.
- Click START!
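If you'd rather skip the clicking, the same setup can be roughed out from a script. This is just a sketch, assuming you have tshark (the command-line tool that normally ships with WireShark) on your path. The interface name "eth0" and the output file name are made up for the example, and the helper function is mine, not part of any tool. It only builds the command; it doesn't run it.

```python
import shlex


def build_tshark_command(interface, capture_filter="port 80",
                         output_file="capture.pcapng"):
    """Build (but do not run) a tshark command line.

    interface      -- network device to listen on (hypothetical: "eth0")
    capture_filter -- same syntax as the "Capture Filter" text field
    output_file    -- hypothetical file name for the saved capture
    """
    return [
        "tshark",
        "-i", interface,       # which network device to listen on
        "-f", capture_filter,  # capture filter, e.g. only port 80
        "-w", output_file,     # save the raw packets for later inspection
    ]


cmd = build_tshark_command("eth0")
# Print a copy-pasteable version of the command for a terminal.
print(shlex.join(cmd))
```

You can then open the saved capture file in WireShark and carry on with the display filter steps.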
At this point, go browse the web. You should still see a bunch of crap come across the screen, but upon inspecting you should realize it’s only port 80 stuff. Almost there!
In order to limit the data shown on the screen we need to apply a filter.
- In the filter text box, put: http.content_type == "audio/mpeg" (use plain straight quotes; curly ones pasted from a web page won’t parse)
This will only show responses whose content type is audio/mpeg, which is the MIME type for an MP3.
NOTE: After applying this filter, you will have to wait for the ENTIRE file to be streamed/buffered/loaded before you see anything on the screen. Most UIs will indicate how much has been buffered, and once it reaches 100%, you should see a single entry PER stream.
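To get a feel for why nothing shows up until the whole file has loaded, here's a rough sketch of the idea in Python. It is NOT how WireShark works internally, just an illustration under simple assumptions: the response is plain HTTP with a Content-Length header, and the sample bytes are fake. The function names are made up for the example.

```python
def parse_http_response(raw: bytes):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def is_complete_audio(raw: bytes, wanted_type: str = "audio/mpeg") -> bool:
    """True only if the content type matches AND the whole body has arrived."""
    _, headers, body = parse_http_response(raw)
    # Simplification: ignore any parameters after a ';' in Content-Type.
    content_type = headers.get("content-type", "").split(";")[0].strip().lower()
    expected_length = int(headers.get("content-length", "-1"))
    return content_type == wanted_type and expected_length == len(body)


# Fake 8-byte "MP3" body, purely for illustration.
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: audio/mpeg\r\n"
    b"Content-Length: 8\r\n"
    b"\r\n"
    b"ID3\x03\x00\x00\x00\x00"
)
partial = response[:-3]  # pretend the last 3 bytes are still buffering

print(is_complete_audio(partial))   # False
print(is_complete_audio(response))  # True
```

Same deal as in the NOTE: a half-buffered stream doesn't count as a match yet, so you get nothing until the last byte lands.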
- Once you see the entry on the screen, click it, and look at the panel showing you the hierarchy of the data stream.
- You should see a “Media Type” node with an expandable area. Simply click the header for this section, and you should see a popup stating it’s “Processing Packets.” (You can technically cancel this, but if you have problems with the next step, let it finish)
- Next thing to do is right click on the item, and choose “Export Selected Packet Bytes…,” and save the file to your disk.
- This is your content (mp3, wav, avi, etc.)… ENJOY!
- Sometimes, websites will utilize a proxy, or a redirect system, to help thwart users attempting to look at the source or using lamer tools to identify loaded content. In this case, you’ll sometimes see TWO entries for the same stream, but only ONE has the content you want. This is usually identifiable by an HTTP response code of 302 (redirect). You can limit results by appending && http.response.code != 302 to the filter. This might turn the input box yellow, but that’s just WireShark warning you that you MIGHT not see what you want… Although, you should.
- If you want to look at other types, you can append them to this string by doing something like:
http.content_type == "audio/mpeg" || http.content_type == "audio/wav", etc., etc.
See this giant list of mime-types for a good resource of identifying the proper mime-type string to use.
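If you end up juggling a bunch of types, typing the filter by hand gets old. Here's a tiny sketch that glues the filter string together from the tips above. The helper name is made up, and the only filter fields it uses are the two already mentioned in this post (http.content_type and http.response.code). It wraps the type checks in parentheses so the 302 exclusion applies to all of them.

```python
def build_display_filter(mime_types, skip_redirects=True):
    """Build a display filter string matching any of the given MIME types."""
    type_clauses = [f'http.content_type == "{m}"' for m in mime_types]
    expr = " || ".join(type_clauses)
    if skip_redirects:
        # Parentheses keep the 302 exclusion applied to every type clause.
        expr = f"({expr}) && http.response.code != 302"
    return expr


audio_filter = build_display_filter(["audio/mpeg", "audio/wav"])
print(audio_filter)
```

Paste whatever it prints into the filter text box, and add more MIME types to the list as needed.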