The Hundred Minute Hack: June 2016

Suppose you were to open up an image file in Python 3. Even without knowing much about image formats or binary files, what's possible here? Let's see what we can find out about these files. Let's also see what it says about learning strategies.

Some General Ideas

Reading binary files isn't much different than reading text. One difference is that you open it with "rb" for "read binary" instead of "rt" for "read text". If the file in question is small enough, you can grab the data with a one-liner like this....

raw_data = open(file_path, "rb").read()
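If you'd rather not leave the file handle dangling, the same read can be wrapped in a with block. This is just a sketch, and read_bytes is a made-up helper name, not something from a library.

```python
def read_bytes(file_path):
    """Return the entire contents of a file as a bytes object."""
    # "rb" opens the file in binary mode, so read() returns bytes, not str.
    # The with block closes the file as soon as the read is finished.
    with open(file_path, "rb") as f:
        return f.read()
```

With that in place, raw_data = read_bytes(file_path) does the same job as the one-liner.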

One issue with handling bytes is that there is sometimes text mixed in that's worth a look. Normal text operations often won't play well with bytes. For that reason, we decode the bytes so that we have actual text strings to work with.

text_data = raw_data.decode("ascii", errors="ignore")

And yes, a lot of data in the source binary won't translate nicely to text. Right now, we don't care about that stuff so we ignore it.

What about going the other direction? Sometimes, a text string needs to be a pile of bytes. So we want to encode this.

byte_data = text_data.encode("ascii")
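To make both directions concrete, here is a tiny round trip. The sample bytes are the first six bytes of a PNG file, picked only for illustration.

```python
# The first six bytes of a PNG file: one non-ASCII byte, then "PNG", then CR LF.
byte_data = b"\x89PNG\r\n"

# errors="ignore" silently drops any byte that is not valid ASCII (here, 0x89).
text_data = byte_data.decode("ascii", errors="ignore")

# Encoding turns the text back into bytes; the dropped byte does not come back.
round_trip = text_data.encode("ascii")
```

Note that the trip is lossy. Whatever got ignored on the way to text is gone for good.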

Do you need to do encoding and decoding everywhere you go? Not necessarily. It's just a generally good thing to understand the idea of working in both directions. Now, with those ideas established, let's poke around with some image files.

Identifying Image File Formats (sort of)

Here's a thought. You can dump bytes from an image file straight to the console. A good Linux shell environment is handy like that. Heck, you could even page through the mess if you felt so inclined. For example, you can do this.....

cat pinkie1.png | less

From that, you get a screen full of mostly unreadable characters with a few scraps of text mixed in.

While most of the data isn't readable, there are bits of text there. Also worth noting is that PNG, GIF, and JFIF (aka JPEG) files appear to have something in common. The first instance of text that shows up seems to consistently identify the image format. So that's useful.

Finding text looks like a job for regex. But wait a minute! These are bytes, and regex is usually pointed at text. No problem. We'll just decode the bytes we can and toss the rest. The result is a function, image_format, that takes a file path and returns the first block of text it finds in the file.

So if you put in image_format("pinkie1.png"), you get PNG as the result. Pretty nifty.
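For the curious, here is a rough sketch of how a function like that might be put together. It is a reconstruction under assumptions, not the original listing: the helper name first_text_block, the "three or more letters" rule, and the 64-byte read are all my own guesses.

```python
import re


def first_text_block(raw_data):
    """Return the first run of three or more ASCII letters, or None."""
    # Throw away every byte that isn't ASCII, keep the rest as text.
    text_data = raw_data.decode("ascii", errors="ignore")
    # Assumption: a "text block" means at least three letters in a row.
    match = re.search(r"[A-Za-z]{3,}", text_data)
    return match.group(0) if match else None


def image_format(file_path):
    """Guess an image format from the first text found near the file start."""
    with open(file_path, "rb") as f:
        # Assumption: the identifying text sits within the first 64 bytes.
        return first_text_block(f.read(64))
```

A JPEG comes back as JFIF rather than JPEG with this approach, since that is the text that actually sits in the file.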

But also pretty useless. This "pick-the-first-ascii-text-block-you-find" approach has only been seen to work with samples of PNG, GIF, and JPEG files. It bombs on TIFF, Windows BMP, and probably several other image formats too. Oh well.

Stirring Up Curiosity Is Its Own Reward

Making a turd like this one did get me thinking. How would the pros do it? The Python 3 standard library has a function called what in the imghdr module. How, exactly, did they do it? Oh wait! There is source code out there. Here's an unofficial GitHub resource right here. This implementation is good reading and beautiful in its simplicity.
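The general idea is to compare the first few bytes of the file against known signatures instead of hunting for text. Here is a simplified sketch in that spirit. It is not the library's actual code, and the SIGNATURES table and sniff_format name are made up for illustration. It also avoids importing imghdr, which was deprecated in Python 3.11 and removed in 3.13.

```python
# Hypothetical table of (leading bytes, format name) pairs.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_format(header):
    """Return a format name if header starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None
```

Feeding it the first dozen or so bytes of a file is enough, and unlike the regex trick it has an answer for BMP and TIFF.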

This is a good learning strategy for a lot of things. You poke around with no particular goal. You stumble upon something. You dive in. You screw it up. You step back. You see how the experts handle it. Lather. Rinse. Repeat. It's both educational and a lot of fun. So go do that!

Okay, so here's the project on my mind. Create and provision a local Fumblina server. Deploy a simple web app to it. Time the setup process. Why? Because Devops!

The key tools in use are as follows...

64-bit Linux Mint
VirtualBox 4.3
Ansible 1.5
Vagrant 1.7
Git 1.9

Technically, this whole arrangement works just as well with a standard Ubuntu desktop distro with VirtualBox 5. But whatever! We hit the stopwatch start button and off we go.

The Server Setup

We need a local server. There's a project called "basemachine" on my GitHub that would help. So that gets git cloned over and started up like this...

cote@mymachine ~/PycharmProjects $ git clone https://github.com/pcote/basemachine.git
Cloning into 'basemachine'...
remote: Counting objects: 14, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 14 (delta 0), reused 0 (delta 0), pack-reused 11
Unpacking objects: 100% (14/14), done.
Checking connectivity... done.
cote@mymachine ~/PycharmProjects $ cd basemachine/
cote@mymachine ~/PycharmProjects/basemachine $ vagrant up

Conveniently, there is a Vagrant box image installed locally so that saves us some time. But still, it is going to take a few minutes to make and provision this server. On to other things while this job bakes.

A Skeleton Web Application

While we're busy provisioning, there's a generic web application skeleton on GitHub to grab. We're also going to need the contents copied into an actual project folder called "DomEditor". Here it goes!

cote@mymachine ~/PycharmProjects $ git clone https://github.com/pcote/appskeleton.git
Cloning into 'appskeleton'...
remote: Counting objects: 38, done.
remote: Compressing objects: 100% (24/24), done.
remote: Total 38 (delta 6), reused 36 (delta 4), pack-reused 0
Unpacking objects: 100% (38/38), done.
Checking connectivity... done.
cote@mymachine ~/PycharmProjects $ cd appskeleton/
cote@mymachine ~/PycharmProjects/appskeleton $ cp -R * ../DomEditor/
cote@mymachine ~/PycharmProjects/appskeleton $ cd ../DomEditor/
cote@mymachine ~/PycharmProjects/DomEditor $

We switch over to that and get app-specific things set up. There's a folder in the Ansible role that needs renaming by the way. It needs to match the name we gave this app.

cotejrp@mymachine ~/PycharmProjects/DomEditor/roles $ mv myapp domeditor

Very good. So now, we deal with configuration in the deployvarsdev.json file.

{
  "hostname": "all",
  "domain": "domeditor.com",
  "appname": "domeditor",
  "mysql_root_password": "rootPassword",
  "proxyport": 3032
}

Careful with that mysql_root_password by the way. That variable has to be the same here as it is in basemachine's Vagrantfile. Yes, the setup is stupid and the root password should be established in one place rather than two. For the time being, we just accept that the value must match lest the Ansible playbook barf on you later.

Oh yeah.... The /etc/hosts file needs a domain name alias to localhost. That's going to have to match the domain variable set in our JSON file. So this line goes into hosts....

127.0.0.1 domeditor.com
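Since the domain has to agree in two places, a quick sanity check can save some head scratching later. The following is only a sketch. The function names hosts_has_alias and check_hosts are hypothetical, and it assumes the JSON file sits in the current directory and that the alias points at 127.0.0.1.

```python
import json


def hosts_has_alias(hosts_text, domain, address="127.0.0.1"):
    """Return True if hosts_text contains a line mapping address to domain."""
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into address and host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == address and domain in fields[1:]:
            return True
    return False


def check_hosts(vars_path="deployvarsdev.json", hosts_path="/etc/hosts"):
    """Check that the domain in the deploy vars file has a hosts alias."""
    with open(vars_path) as f:
        domain = json.load(f)["domain"]
    with open(hosts_path) as f:
        return hosts_has_alias(f.read(), domain)
```

Running check_hosts() from the DomEditor folder should come back True once the hosts line is in place.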

Time to check on the base machine provisioning. Ding! It's done. Time to move forward with application deployment.

Application Deployment

Deployment needs one more thing. Time to open up provisiondev.sh for a little editing...

#!/bin/sh
basemachine=/path/to/base/machine
inventory=$basemachine/.vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory
privatekey=$basemachine/.vagrant/machines/default/virtualbox/private_key
ansible-playbook -u vagrant -i $inventory --private-key=$privatekey --extra-vars="@deployvarsdev.json" playbook.yml

See that basemachine variable near the top? We replace that with the path to wherever we git cloned the basemachine folder to in the first place. Something like this does the trick nicely.

basemachine=$HOME/PycharmProjects/basemachine/

And there's that. Time to run this deployment script....

cote@mymachine ~/PycharmProjects/DomEditor $ ./provisiondev.sh

A couple of seconds pass. Script completes. Open up this URL in a browser...

http://domeditor.com:8080/

And once we see the message "This is an angularjs stub page." in the browser, we know we're done.

Total Time Elapsed: 4 minutes 39 seconds.

Could it be faster? Maybe. One of these days, I'll have to figure that one out.

Digression: What Does Devops Mean Anyways?

A conversation with a friend raised the question of exactly how one defines the term "devops". It depends on who you ask. I can think of it in terms of management reality, my reality, and wishful thinking.

For a manager, it's about improvement in team dynamics. Developers are paid to change I.T. systems. Operations people are paid to keep everything stable. These job definitions naturally put these two camps at odds. The idea of devops in this context is to get developers and ops people on the same page to reduce the conflict.

For a solo act like myself, devops is about self-reliance. There are no operations people in my world. Naturally, it is my responsibility to set up, modify, and maintain those servers myself. Devops tools like Ansible and Vagrant just happen to make that business easier for me.

Lastly, there is the wishful thinking vision of devops. It's that dream of sitting down in a quiet environment to work. You check out the project from the local Git server. You do a "vagrant up". Bang! There's a virtual version of the same server as what ops is managing in production. That's what you get to develop against. It sure would make change control discussions easier. One can dream.

Thursday, June 9, 2016

Using Python To Play With Binary Files