Steganalysis: finding hidden data in Images

The art of hiding secret messages in innocuous-looking objects, known as steganography, was introduced in a DDW article last month, which featured a tutorial on a useful tool that embeds data (usually text messages) into JPEG images without making it obvious to the naked eye.

Today I would like to discuss steganography’s counterpart, steganalysis: the art of uncovering hidden communication and exposing the use of steganography. Steganalysis should not be confused with the better-known cryptanalysis, which focuses on decrypting a message that has already been intercepted. When performing steganalysis you do not know where to look, because you cannot identify the channel of communication. It could be a stego image (an image carrying a hidden message) uploaded to Facebook, or a torrent of a Star Wars film (even a short video can carry a large amount of hidden data). The possibilities are endless.

In 2001, the well-known security researchers Niels Provos and Peter Honeyman released a study in which a distributed computing framework was used to analyse two million images uploaded to eBay and one million from USENET archives. Sadly, no hidden messages were found. However, this study was carried out 15 years ago; since then, steganography tools have become far more widely available and steganalysis methods have become a lot more sophisticated.

It is important to note that steganalysis, strictly speaking, only decides whether or not a particular object contains a payload, i.e. hidden information; recovering the message itself is a separate task. A particular steganography tool or method is known to be broken if it can be steganalysed with a success rate higher than 50%, i.e. better than random guessing. Of course, researchers are trying to push detection performance far beyond 50%! Rémi Cogranne from UTT published a research paper in 2014 claiming a detection performance for JPEG image steganography of between 80 and 93 percent, depending on the amount of data hidden in the image. Just like at an airport security screening, it is easier to detect someone trying to smuggle a machete than a pen knife.
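
To make the “better than 50%” criterion concrete, here is a minimal Python sketch (our own illustration, not taken from Cogranne’s paper) that scores a detector on a labelled test set of clean (cover) and stego images. It computes the false-alarm rate (clean images wrongly flagged), the missed-detection rate (stego images that slip through) and the balanced accuracy, which is one minus the average of the two. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def error_rates(labels: Sequence[bool], predictions: Sequence[bool]) -> Tuple[float, float]:
    """Return (false-alarm rate, missed-detection rate).

    labels[i] is True if image i really carries a payload;
    predictions[i] is True if the detector flagged image i as stego.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    cover_flags = [p for y, p in zip(labels, predictions) if not y]
    stego_flags = [p for y, p in zip(labels, predictions) if y]
    if not cover_flags or not stego_flags:
        raise ValueError("need at least one cover and one stego image")
    false_alarm = sum(cover_flags) / len(cover_flags)            # clean images wrongly flagged
    missed = sum(not p for p in stego_flags) / len(stego_flags)  # stego images not flagged
    return false_alarm, missed


def balanced_accuracy(labels: Sequence[bool], predictions: Sequence[bool]) -> float:
    """1 - (P_FA + P_MD) / 2; a detector that just guesses scores about 0.5."""
    false_alarm, missed = error_rates(labels, predictions)
    return 1 - (false_alarm + missed) / 2


def looks_broken(labels: Sequence[bool], predictions: Sequence[bool]) -> bool:
    """The article's criterion: a method is broken if detection beats 50%."""
    return balanced_accuracy(labels, predictions) > 0.5


labels = [False, False, False, False, True, True, True, True]
predictions = [False, False, False, True, True, True, True, False]
print(balanced_accuracy(labels, predictions))  # 0.75
print(balanced_accuracy(labels, [True] * 8))   # 0.5: flagging everything is no better than guessing
```

Balanced accuracy is used here rather than plain accuracy because, in the wild, almost every image is clean, so a detector that simply labels everything as clean would earn a misleadingly high plain accuracy. Bear in mind that a score only slightly above 0.5 on a small test set could also be luck.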

Rémi and Niels are far from the only researchers focusing on steganalysis, and new white papers are constantly being published on image, video and audio steganalysis alike. Unfortunately, easy-to-use steganalysis tools seem quite hard to come by or are out of date: a search for ‘steganalysis’ on GitHub.com currently returns a dire list of just 38 projects. That being said, there is no need to lose hope, because if we look at the steganography world, there are lots of high-quality, easy-to-use tools available, such as Steghide, JSteg and JPHS, to name a few that specialize in JPEG embedding alone.

Now that we have a better understanding of steganalysis, let’s look at how we might go about actually finding hidden data in images.

  1. Choosing where to look

In order to steganalyse images, we first need to get our hands on them. People using steganography will try to make it hard for outsiders to find out exactly whom they are communicating with. To achieve this, they might upload a stego image to a public social network or image-sharing site, where the recipient can download it like anyone else, without any direct link between sender and receiver. Let’s choose twitter.com as our target.

  2. Crawling the target

Because browsing Twitter and downloading random images by hand would take forever, we employ a web crawler (also known as a scraper): a computer program that browses, or ‘crawls’, through web pages systematically. Crawling is usually performed by search engines for web indexing, but anyone can create their own crawler that performs some custom task. We could write a crawler in the Ruby programming language, in combination with an HTML library such as Nokogiri, to crawl Twitter and download any JPEG image files it encounters. We could also restrict our crawler to, say, all followers of @torproject, because we think those people are more likely to use steganography. Our crawler should have no problem downloading all of these people’s image uploads, which could easily run into the millions of images.
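
The article suggests Ruby and Nokogiri; the sketch below swaps in Python’s standard library (html.parser and urllib) instead. It performs a breadth-first crawl that stays on the starting host, follows <a href> links and collects the URLs of <img> tags whose path ends in .jpg or .jpeg. It is a generic illustration only: it does not handle Twitter’s login, JavaScript-rendered pages or rate limits, and the class and function names are our own. The page-fetching function is a parameter so the crawler can be tested offline.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects absolute link targets and JPEG image URLs from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []
        self.jpegs: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            url = urljoin(self.base_url, attributes["src"])
            # Judge by the URL path only, so "photo.jpg?size=large" still counts.
            if urlparse(url).path.lower().endswith((".jpg", ".jpeg")):
                self.jpegs.append(url)


def fetch_html(url: str) -> str:
    """Download a page over HTTP(S) and decode it leniently."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url: str, max_pages: int = 100,
          fetch: Callable[[str], str] = fetch_html) -> List[str]:
    """Breadth-first crawl of start_url's host; returns unique JPEG URLs in discovery order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen_pages = {start_url}
    seen_images = set()
    images: List[str] = []
    pages_fetched = 0
    while queue and pages_fetched < max_pages:
        page_url = queue.popleft()
        pages_fetched += 1
        try:
            html = fetch(page_url)
        except OSError:
            continue  # skip pages that fail to download
        parser = PageParser(page_url)
        parser.feed(html)
        for img in parser.jpegs:
            if img not in seen_images:
                seen_images.add(img)
                images.append(img)
        for link in parser.links:
            link = urldefrag(link).url  # "page#top" and "page" are the same page
            # Stay on the same host and never queue a page twice.
            if urlparse(link).netloc == host and link not in seen_pages:
                seen_pages.add(link)
                queue.append(link)
    return images
```

Calling crawl("https://example.com/", max_pages=20) returns the JPEG URLs it found; each one could then be downloaded with urlopen and saved to disk for the steganalysis step. Images are allowed to live on other hosts (such as a CDN), while page links are not, which keeps the crawl from wandering off across the whole web.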

  3. Running steganalysis tools

Once the crawler has finished its job, it is time to find a suitable steganalysis tool. As most images on the web are in JPEG format, we are going to choose a tool that specializes in JPEGs. It is always a good idea to use multiple tools based on different steganalysis techniques and to combine their output, which gives a more reliable result than any single tool. Binghamton University’s Digital Data Embedding Laboratory has published state-of-the-art JPEG steganalysis tools; however, they need to be combined with a machine-learning framework, and the ones we have tested unfortunately only work on greyscale images.
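
Combining several tools can be as simple as taking a majority vote and averaging their scores. The Python sketch below assumes each detector is a function that returns an estimated probability that an image is a stego image; the detectors in the usage example are hypothetical stand-ins, since real tools (for example a Binghamton feature extractor plus a trained classifier) would each need their own wrapper.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a detector maps raw image bytes to P(stego) in [0, 1].
Detector = Callable[[bytes], float]


def fuse(image: bytes, detectors: Dict[str, Detector],
         threshold: float = 0.5) -> Tuple[float, bool]:
    """Return (mean score, majority verdict) over all detectors for one image."""
    scores = [detect(image) for detect in detectors.values()]
    votes = sum(score >= threshold for score in scores)
    return mean(scores), votes > len(scores) / 2


def rank_suspicious(images: Dict[str, bytes], detectors: Dict[str, Detector],
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Images flagged by a majority of detectors, highest mean score first."""
    flagged = []
    for name, data in images.items():
        score, majority = fuse(data, detectors, threshold)
        if majority:
            flagged.append((name, score))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Stand-in detectors for illustration only.
detectors = {
    "detector_a": lambda data: 0.9 if data.startswith(b"A") else 0.1,
    "detector_b": lambda data: 0.7 if data.startswith(b"A") else 0.6,
    "detector_c": lambda data: 0.2,
}
images = {"a.jpg": b"A...", "b.jpg": b"B..."}
# a.jpg is flagged (2 of 3 detectors agree); b.jpg is not (only 1 of 3).
print(rank_suspicious(images, detectors))
```

The majority vote stops a single over-eager detector from flagging images on its own, while the mean score is only used to sort the flagged images so the most suspicious ones are examined first.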

An up-to-date, ready-to-use JPEG steganalysis tool would surely be welcomed by many enthusiasts.

  4. Extract hidden messages from suspicious images

Once we have identified the images that are likely to carry hidden information (known as stego images), we will try to extract the hidden message. Because there are different mechanisms for embedding a message into an image, we need to try all of them (or at least the most popular embedding techniques). Additionally, the message we discover is likely to be encrypted, in which case it will need to be cracked, for example via a dictionary attack (a script that tries out millions of likely passwords).
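
A dictionary attack across several extraction methods boils down to two nested loops. The Python sketch below wraps the steghide command-line tool (steghide extract with its -sf, -p, -xf, -f and -q options) and assumes steghide is installed and exits with a non-zero status when the passphrase is wrong. The extractor mapping and the function names are our own; other embedding tools would each need a similar wrapper.

```python
import os
import subprocess
import tempfile
from typing import Callable, Dict, Iterable, Optional, Tuple

# An extractor takes (image path, passphrase) and returns the payload or None.
Extractor = Callable[[str, str], Optional[bytes]]


def steghide_extract(image_path: str, passphrase: str) -> Optional[bytes]:
    """Try one passphrase with steghide; return the hidden bytes, or None on failure."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "payload")
        result = subprocess.run(
            ["steghide", "extract", "-sf", image_path, "-p", passphrase,
             "-xf", out_path, "-f", "-q"],
            capture_output=True,
        )
        if result.returncode != 0 or not os.path.exists(out_path):
            return None
        with open(out_path, "rb") as handle:
            return handle.read()


def dictionary_attack(image_path: str, wordlist: Iterable[str],
                      extractors: Dict[str, Extractor]) -> Optional[Tuple[str, str, bytes]]:
    """Try every passphrase with every method; return (method, passphrase, payload) on the first hit."""
    words = list(wordlist)  # materialize once so every extractor sees the full list
    for method, extract in extractors.items():
        for word in words:
            payload = extract(image_path, word)
            if payload is not None:
                return method, word, payload
    return None


# Usage (requires steghide on the PATH):
# hit = dictionary_attack("suspect.jpg", ["", "password", "letmein"],
#                         {"steghide": steghide_extract})
```

Including the empty passphrase in the wordlist costs one extra attempt and catches payloads embedded without a password. Spawning one steghide process per guess is slow, so for a wordlist of millions of entries this loop is only a starting point.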

The art of steganography is far from mature, and most people haven’t even heard of it. However, the number of monthly downloads of Steghide (arguably the most popular tool for image steganography) more than doubled, from 3,237 to 7,479, between October and November 2016, possibly due to increased fear of surveillance following the results of the US presidential election (37% of downloads originate from the US, according to sourceforge.net). Naturally, an increase in steganography activity will spark further interest in steganalysis, so we can look forward to more developments and tools in the near future.

babysnoop – @babysn00p

3 comments

  1. I think that steghide’s algorithm can only be detected if you have the original image, which you usually don’t.

    Trying to make a tool that discovers a message is a losing battle.

    As you wrote, steghide’s messages are also encrypted, so that tool wouldn’t be very useful even if it succeeded in detecting that a message exists.

  2. Check out my steganalysis of Lee Harvey Oswald’s address book called The Oswald Code.

  3. Wow, the errors in this article and the comments, or at least the lack of opsec.

    First, with a .jpg you need to use your own image (then shred and delete the original) or change another image so completely that it can no longer be compared with its source, say in Photoshop: e.g. flip it horizontally, adjust the curves and run a filter that changes the entire composition of colors, etc. Now you have an original again.

    You can do the same with video, the way people fool YouTube’s automatic copyright AI: adjust the audio pitch, add a few seconds of another video to the start, flip parts of the video horizontally (especially the beginning, and then at random intervals), etc.

    Once you have an original image or video, you proceed to the next step: hiding the text message with whatever program you are comfortable with. You don’t even need a program if you are clever. Then change the file extension to something innocuous, like .log, and place the file in a folder with lots of other similar files and a program that uses those files.

    Next, compress the files into one giant archive, and then encrypt the result, which leaves a total mess for anyone in the world to figure out. The password can even be something you don’t consciously know: one system admin I know uses muscle-memory passwords (he does a dance on the keyboard with his fingers and knows how to repeat the gibberish, but can only do it with a keyboard in front of him, lol).

    And finally, if you are really worried about state-sponsored retrieval, invent your own secret code; that has been done in battles since time began.

    Pseudo-random works too, if you implement it properly. Take an irrational number and advance your alphabet by each successive digit, indefinitely! (Just pick a digit to start from, say the 9th digit of e.) This way you can even give someone else the code and they can decrypt your password too, without you revealing it.

    If you have Photoshop, every so many pixels of a certain hue, or even a certain level of K in CMYK, could represent a bit. Get creative and don’t depend on proprietary software to keep your secrets!
