Showing posts from December, 2017

Web scraping tips and tricks

I've listed various web-scraping-related tips and tricks below. I have collected them over the years; hopefully you will find them useful.

Tip #1: don't do it

Scraping should only be considered as a last resort. If the website from which you intend to extract information offers an API, use the API instead. It's easier to parse a nicely formatted JSON response than to download an entire web page and wade through verbose and sometimes malformed HTML just to extract a small piece of information.

Tip #2: check for mobile versions

Mobile versions of websites tend to be lighter and more to the point, which makes them easier to scrape. Mobile websites also tend to rely less on JavaScript than their desktop counterparts. Certain websites offer a mobile.* or m.* domain, while others simply redirect you based on your user agent. In the latter case, you might need to send a mobile user agent string so that the website thinks you're browsing from a mobile device. Others,…
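As a rough illustration of tip #2, here is a minimal Python sketch using only the standard library. It assumes the site follows the m.* / mobile.* subdomain convention, which many sites do not. The helper names (`mobile_url_candidates`, `build_mobile_request`, `fetch_mobile`) are hypothetical, and the user agent string is just an example of an iPhone Safari UA; any realistic mobile UA should work the same way.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

# Example mobile user agent (an iPhone Safari string); an assumption, not a requirement.
MOBILE_USER_AGENT = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_2 like Mac OS X) "
    "AppleWebKit/604.4.7 (KHTML, like Gecko) Version/11.0 Mobile/15C114 Safari/604.1"
)


def mobile_url_candidates(url):
    """Return the m.* and mobile.* variants of a URL (port and credentials are dropped)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Strip a leading "www." so "www.example.com" becomes "m.example.com".
    if host.startswith("www."):
        host = host[len("www."):]
    return [
        urlunsplit((parts.scheme, prefix + host, parts.path, parts.query, parts.fragment))
        for prefix in ("m.", "mobile.")
    ]


def build_mobile_request(url):
    """Build a request that announces itself as a mobile browser."""
    return Request(url, headers={"User-Agent": MOBILE_USER_AGENT})


def fetch_mobile(url, timeout=10):
    """Download a page with the mobile user agent and decode it to text."""
    with urlopen(build_mobile_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, `mobile_url_candidates("https://www.example.com/news?page=2")` yields `https://m.example.com/news?page=2` and `https://mobile.example.com/news?page=2`. You could try `fetch_mobile` on each candidate, or on the original URL for sites that redirect based on the user agent, and keep whichever response is lightest.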