Articles

Showing posts from 2017

Web scraping tips and tricks

I've listed various web scraping tips and tricks below. I have collected them over the years; hopefully you will find them useful.

Tip #1: Don't do it

Scraping should only be considered a last resort. If the website you want to extract information from offers an API, use the API instead. Parsing a nicely formatted JSON response is far easier than downloading an entire web page and wading through verbose, sometimes malformed HTML just to extract a small piece of information.

Tip #2: Check for mobile versions

Mobile versions of websites tend to be lighter and more to the point, which makes them easier to scrape. They also tend to rely less on JavaScript than their desktop counterparts. Some websites offer a mobile.* or m.* domain, while others simply redirect you based on your user agent. In the latter case, you might need to craft a specific user agent in order to fool the website into thinking you're on mobi

Does C++ have the array[x, y] syntax?

The short answer is no. At least not out of the box. This happened a few months ago, but I forgot to document it. I was reading the solutions to [Monday's r/dailyprogrammer challenge] when I came across [an elegant C++ solution] posted by fellow user MrFluffyThePanda. Having written [a C++ solution] myself, I found it particularly interesting that MrFluffyThePanda's solution used a syntax I did not know C++ supported:

int* field = new int[8,8];

The field array was later accessed with the field[x, y] syntax. The only language I know of that supports this syntax is C#, where such arrays are called "rectangular arrays". Was it one of those [secret C++ gems] that no one (read: me) knew about? The answer was much simpler than that. My initial assumption was that the compiler translates field[x, y] into field[x * width + y], a method I sometimes use to avoid the headaches of working with multidimensional arrays in C. This was suggested by
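The excerpt stops before the explanation, so for reference, here is how standard C++ treats that code: inside the brackets, the comma is the built-in comma operator. It evaluates the left operand, discards it, and yields the right one, so new int[8,8] allocates only 8 ints and field[x, y] reads field[y]. The sketch below spells that out with an explicit void cast (a bare x, y triggers "unused value" warnings, and commas inside [] have been deprecated since C++20), then shows the flat x * width + y layout mentioned above. commaSubscript and Grid are hypothetical helpers, not code from either solution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What field[x, y] actually does: x is evaluated and thrown away, and the
// element read is field[y]. The void cast makes the discard explicit.
int commaSubscript(const int* field, int x, int y) {
    return field[(static_cast<void>(x), y)];  // same element as field[y]
}

// The flat-array workaround: store the grid row by row in one block and
// map (x, y) to x * width + y, where x picks the row and y the column.
struct Grid {
    std::size_t width, height;
    std::vector<int> cells;

    Grid(std::size_t w, std::size_t h) : width(w), height(h), cells(w * h, 0) {}

    int& at(std::size_t x, std::size_t y) {
        assert(x < height && y < width);  // catch out-of-range indices in debug builds
        return cells[x * width + y];
    }
};
```

With only 8 ints behind field, every x maps to the same slot: commaSubscript(field, 0, 3) and commaSubscript(field, 7, 3) read the same element, whereas Grid(8, 8) keeps all 64 cells distinct.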

Hotmail is having problems today

Outlook hasn't been functioning properly today. My Hotmail account can neither send nor receive emails, and after checking Google News, it appears that [I am not the only one] suffering from this issue. Earlier this afternoon, I messaged a seller on Avito through a form on their website to inquire about a certain product. Avito offers to email you a copy of the message, and although I checked that option in the form, I never received the email. At first I suspected it was a problem on Avito's end, or that the email had ended up in my spam folder. I then ran a few tests: I sent emails between my main Outlook account and my secondary Gmail account, then between two Outlook addresses, both from my mail client and from the web interfaces. Unfortunately, none of the emails were delivered. It was only after I turned to Google for help that I came across the article linked above. I'm only writing about this to let it be known

Re-adjusting out-of-sync subtitles

Have you ever downloaded a .srt file only to find out that it is out of sync? Even a small delay can be intolerable. Back in the day, I used VLC to readjust the subtitles in real time. But since there wasn't a way (that I know of) to save the changes, I decided to look elsewhere. For a while I wondered how easy it would be to write a script for this simple task. After looking into it, I found, to my surprise, that the SRT file format is quite simple. It is composed of fragments formatted like this:

N
HH:MM:SS,mmm --> HH:MM:SS,mmm
Actual subtitle

It starts with a number N that identifies the fragment. This number starts at 1 and increments for each subtitle displayed on screen. A line with two timestamps follows. These mark the interval during which the subtitle is displayed, and they are fairly precise since they include milliseconds. The third line contains the actual subtitle, an
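To show how little code the timestamp arithmetic needs, here is a minimal C++ sketch; it is not the script from this post, whose code isn't in the excerpt. It assumes the HH:MM:SS,mmm --> HH:MM:SS,mmm layout described above. toMillis, fromMillis and shiftTimingLine are hypothetical names, and clamping negative results to zero is my own assumption, since a subtitle can't start before the video does.

```cpp
#include <cstdio>
#include <string>

// Converts "HH:MM:SS,mmm" to milliseconds; returns -1 if the text doesn't match.
long long toMillis(const std::string& ts) {
    int h = 0, m = 0, s = 0, ms = 0;
    // sscanf returns how many fields it matched; all four are required.
    if (std::sscanf(ts.c_str(), "%d:%d:%d,%d", &h, &m, &s, &ms) != 4) return -1;
    return ((h * 60LL + m) * 60 + s) * 1000 + ms;
}

// Formats milliseconds back into "HH:MM:SS,mmm", clamping negatives to zero.
std::string fromMillis(long long total) {
    if (total < 0) total = 0;
    long long ms = total % 1000; total /= 1000;
    long long s = total % 60;    total /= 60;
    long long m = total % 60;
    long long h = total / 60;
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02lld:%02lld:%02lld,%03lld", h, m, s, ms);
    return buf;
}

// Shifts both timestamps of a "start --> end" line by offsetMs.
// Any other line (index, subtitle text, blank) is returned unchanged.
std::string shiftTimingLine(const std::string& line, long long offsetMs) {
    const std::string arrow = " --> ";
    std::size_t pos = line.find(arrow);
    if (pos == std::string::npos) return line;
    long long start = toMillis(line.substr(0, pos));
    long long end = toMillis(line.substr(pos + arrow.size()));
    if (start < 0 || end < 0) return line;  // not a timing line after all
    return fromMillis(start + offsetMs) + arrow + fromMillis(end + offsetMs);
}
```

To fix a whole file, every line can go through shiftTimingLine with the same offset: shiftTimingLine("00:00:01,500 --> 00:00:03,000", 2500) gives "00:00:04,000 --> 00:00:05,500", while index and subtitle lines pass through untouched because they contain no arrow.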

Remapping an arbitrary combo to ALT TAB in (L)Ubuntu

My laptop has a missing Tab key. This has led me to some [interesting workarounds], but I could never quite get the keyboard's behavior back to what it once was with these hackish solutions. I installed Lubuntu in dual boot with Windows 7 a few weeks ago. Lubuntu comes with Openbox, a customizable and lightweight window manager. I didn't feel like rewriting Keymapper for Linux since it relies on low-level hooks, so I turned to Google in search of an alternative. I quickly discovered that it's possible to use some of the already-installed tools to accomplish what I needed. One such tool is Xmodmap. It lets you configure a mapping for a given pair of keys. To figure out the key codes I needed, I used xev, yet another utility, which opens a small window and prints events to the terminal as they happen. Let's say you want to remap 1 (& on an AZERTY keyboard) to Tab. Running xev and pressing 1 while the xev window is in focus produces the follo

Writing a naive keylogger in D

I would like to start by saying that this is strictly for educational purposes and to demonstrate how to interact with the Windows API in D. The reason I know how keyloggers work is that I once wrote [a tool to intercept pressed keys and in return, simulate key presses that are defined in a config file]. I had to do this because some of my keys stopped working and I thought it wise to reuse some of the already functional ones as a replacement. It should come as no surprise to you at this point that I am an extremely lazy individual, so it was natural for me to write "keymapper" instead of actually getting the keyboard fixed. I only found out a year later that the keyboard was not broken at all. In fact, I had opened up my laptop to fix the power jack, which resulted in me unscrewing a bunch of stuff, so when it came time to put it back together, I botched the operation and may or may not have inserted the keyboard connector correctly. By the time I was done, I noticed

Decrypting .eslock files

I was feeling nostalgic the other day, so I decided to take a trip down memory lane by browsing some old entries on my phone. To my surprise, I found a few encrypted files that I don't remember putting there. After I tried a few passwords that made sense, it quickly became apparent that a mental dictionary-based brute-force approach was not the way to go. I turned to Google in search of answers. The first results pointed to an app that can decrypt .eslock files, and while it looked promising, I wasn't sure it would be smart to trust a black box of an app with data that I once deemed private enough to encrypt. I had to find another way. This blog post in particular caught my eye. It states that the password of an encrypted file is hashed with the MD5 algorithm, then stored within the file itself! Since MD5 is considered obsolete, surely I would be able to un-hash the password if I were to get my hands on it. With no pointers on how an .eslock file is structured, I figured I'
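A note on "un-hashing": MD5 can't literally be reversed. In practice you hash candidate passwords and compare each digest with the stored one, which works because MD5 is fast and the password is likely short. The excerpt stops before the .eslock layout is described, so this minimal C++ sketch assumes you have already pulled the password's 32-character hex MD5 out of the file and that it is a plain, unsalted MD5 of the password. md5Hex and findPassword are hypothetical helpers; the hashing follows RFC 1321 and is meant for illustration, not as a replacement for a vetted library such as OpenSSL.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// MD5 (RFC 1321) of a byte string, returned as 32 lowercase hex digits.
std::string md5Hex(const std::string& input) {
    static const int S[4][4] = {{7, 12, 17, 22}, {5, 9, 14, 20}, {4, 11, 16, 23}, {6, 10, 15, 21}};
    std::uint32_t K[64];  // K[i] = floor(2^32 * |sin(i + 1)|), as defined by the RFC
    for (int i = 0; i < 64; ++i)
        K[i] = static_cast<std::uint32_t>(std::floor(std::fabs(std::sin(i + 1.0)) * 4294967296.0));

    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (little-endian).
    std::vector<std::uint8_t> msg(input.begin(), input.end());
    std::uint64_t bitLen = static_cast<std::uint64_t>(msg.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0);
    for (int i = 0; i < 8; ++i) msg.push_back(static_cast<std::uint8_t>(bitLen >> (8 * i)));

    std::uint32_t h[4] = {0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476};
    for (std::size_t off = 0; off < msg.size(); off += 64) {
        std::uint32_t M[16];  // the 64-byte block as 16 little-endian words
        for (int j = 0; j < 16; ++j)
            M[j] = std::uint32_t(msg[off + 4 * j]) | std::uint32_t(msg[off + 4 * j + 1]) << 8 |
                   std::uint32_t(msg[off + 4 * j + 2]) << 16 | std::uint32_t(msg[off + 4 * j + 3]) << 24;
        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
        for (int i = 0; i < 64; ++i) {
            std::uint32_t f;
            int g;
            if (i < 16)      { f = (b & c) | (~b & d); g = i; }
            else if (i < 32) { f = (d & b) | (~d & c); g = (5 * i + 1) % 16; }
            else if (i < 48) { f = b ^ c ^ d;          g = (3 * i + 5) % 16; }
            else             { f = c ^ (b | ~d);       g = (7 * i) % 16; }
            f = f + a + K[i] + M[g];
            int s = S[i / 16][i % 4];
            a = d; d = c; c = b;
            b = b + ((f << s) | (f >> (32 - s)));  // left-rotate by s
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    }

    std::string hex;
    char buf[3];
    for (std::uint32_t word : h)
        for (int i = 0; i < 4; ++i) {
            std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>((word >> (8 * i)) & 0xff));
            hex += buf;
        }
    return hex;
}

// Dictionary attack: returns the first candidate whose MD5 equals targetHex, or "" if none does.
std::string findPassword(const std::string& targetHex, const std::vector<std::string>& candidates) {
    for (const auto& candidate : candidates)
        if (md5Hex(candidate) == targetHex) return candidate;
    return "";
}
```

A wordlist of plausible old passwords fed to findPassword is usually far more productive than trying them by hand, though an empty result only means none of the candidates matched.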

Leveraging the power of flags in function arguments

I bet you have called functions like these at some point in your life. They seemingly accept parameters separated by the pipe (bitwise OR) operator. PHP's file() function, for instance, accepts an optional second parameter, $flags, that can combine one or more values. It's not uncommon for me to call it with the FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES flags. But how do they actually work under the hood? It turns out that it's only a matter of simple bitwise operations. Admittedly, the pipe operator was a dead giveaway. Let's implement a simple example to demonstrate how it works. I'll be using D in the code below, but the same rules apply to any language that supports bitwise operators. If your language of choice doesn't support them, I recommend that you stop using Brainfuck. Here's a problem that sounds like it could have some real-world applications. Imagine that you're making a run-of-the-mill RPG where the main characte
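The post's own code is in D and its RPG example is cut off here, so as a stand-in, here is the same mechanism sketched in C++, loosely modelled on PHP's file() flags. The flag names and values below are hypothetical and are not PHP's actual constants. The key points are that each flag is a distinct power of two, | packs several of them into one integer, & tests whether a given bit is set, and & ~flag clears one.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flags: each one occupies its own bit, so they can be OR-ed together.
enum LineFlags : unsigned {
    NO_FLAGS         = 0,
    IGNORE_NEW_LINES = 1u << 0,  // binary 01
    SKIP_EMPTY_LINES = 1u << 1,  // binary 10
};

// Splits text into lines; the behavior is controlled by the bits set in flags.
std::vector<std::string> splitLines(const std::string& text, unsigned flags = NO_FLAGS) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // A non-zero result from & means that flag's bit is set.
        if ((flags & SKIP_EMPTY_LINES) && line.empty()) continue;
        // getline strips the '\n'; put it back unless asked not to. eof() is only set
        // when the last line had no trailing newline, so nothing is added there.
        if (!(flags & IGNORE_NEW_LINES) && !in.eof()) line += '\n';
        lines.push_back(line);
    }
    return lines;
}

// Setting and clearing individual flags on an existing mask.
unsigned withFlag(unsigned mask, unsigned flag) { return mask | flag; }
unsigned withoutFlag(unsigned mask, unsigned flag) { return mask & ~flag; }
```

For instance, splitLines("a\n\nb\n", IGNORE_NEW_LINES | SKIP_EMPTY_LINES) yields {"a", "b"}, while passing only IGNORE_NEW_LINES keeps the empty middle line. Because the flags share a single unsigned parameter, adding a new option later only requires a new power of two, not a new argument.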