To explore the role of Second World War propaganda posters in British culture, I think it’s really important to understand how these images are used on the internet today. To do this, I am testing ways of tracking the different web locations where the images have been published.
The most useful tool I’ve used so far is the Google Reverse Image scraper developed by the Digital Methods Initiative (DMI). This tool automates searches with Google’s search by image function. It allows me to enter a link to an image file and search Google for webpages containing visually similar images.
This is what it looks like when I search by image with ‘Women of Britain Come into the Factories’:
The DMI tool exports this data as a CSV file, which makes it easy to analyse. The data includes not only the URL of each webpage but also its title, image size, description and (for some pages) date. After some detailed testing, I have decided on the following process for conducting searches with the Google Reverse Image scraper tool:
- Create a clean research browser following DMI suggestions
- Create a version of the image free of metadata, upload this to my own website and copy its URL
- Input this URL into the Google Reverse Image scraper tool
- Set the maximum number of results to 800 (Google will never allow more than 700 results to be pulled, so this ensures that the tool simply pulls as many results as possible)
- Set Local Google domain to ‘en’ (this ensures that only English language websites will be included)
- Set date range selection to ‘return articles published anytime’ (setting a date range would exclude the many webpages for which no date is available, and I wish to include the widest range of pages possible)
- Tick ‘hide duplicate results’
- Click ‘Scrape Google Images’
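Once the scrape has finished, the CSV export can be loaded for analysis. The following is only a minimal sketch: the column names (`url`, `title`, `image_size`, `date`, `description`) and the sample rows are my own assumptions for illustration, so they should be checked against the header row of a real export.

```python
import csv
import io

# Hypothetical column names and rows; the real export's header may differ.
SAMPLE_CSV = """url,title,image_size,date,description
https://example.org/posters,Wartime posters,400x600,2012-05-01,A gallery of posters
https://example.com/shop/print,Poster print,800x1200,,Buy a reproduction print
"""


def load_results(csv_text):
    """Parse CSV text into a list of dicts, turning empty dates into None."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["date"] = row["date"] or None  # dates are only present for some pages
        rows.append(row)
    return rows


results = load_results(SAMPLE_CSV)
dated = [r for r in results if r["date"] is not None]
print(f"{len(results)} results, {len(dated)} with a date")
```

Keeping undated rows as `None` rather than dropping them mirrors the decision above to include pages even when no date is available.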
I’ve now started to categorise the webpages by type, so that I can analyse what kinds of website these images appear on. This categorisation has been labour-intensive and has presented difficulties. For example, museum shop websites could be categorised as either ‘commercial’ or ‘museum’. Similarly, blogs and educational sites that contain an e-commerce element are difficult to categorise. I decided to include a subcategory to take account of these distinctions.
Here is the distribution of categories for the ‘Women of Britain’ poster search:
Having completed the categorisation, it has become clear to me that I may need to use two different kinds of category: topic (the theme of the web page) and type (the function of the web page). So a page might be categorised as:
- Topic = War propaganda
- Type = Message board
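To make the two-category idea concrete, here is a rough sketch of how each page could be recorded with a topic, a type and an optional subcategory, and how the distribution of each could then be counted. The class name `PageCategory`, the URLs and the category labels are all hypothetical examples, not values from my data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageCategory:
    topic: str                         # theme of the web page
    page_type: str                     # function of the web page
    subcategory: Optional[str] = None  # e.g. a museum site that also sells prints


# Hypothetical pages and labels, assigned by hand.
pages = {
    "https://example.org/forum/thread-1": PageCategory("War propaganda", "Message board"),
    "https://example.org/museum/shop": PageCategory("War propaganda", "Museum", "Commercial"),
    "https://example.com/blog/post": PageCategory("Women's history", "Blog"),
}

# Count how many pages fall under each type and each topic.
type_counts = Counter(c.page_type for c in pages.values())
topic_counts = Counter(c.topic for c in pages.values())

for page_type, count in type_counts.most_common():
    print(page_type, count)
```

Storing topic and type as separate fields means a page never has to be forced into a single label, and the subcategory records borderline cases such as museum shops.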
Another form of analysis I have been trialling is network analysis. I realised from the data that some identical image files are used on multiple websites, so it seemed like a good idea to visualise the connections between image files and webpages. I edited the data to only include images that were used on more than one website and then uploaded this to Google Fusion Tables.
This is the visualisation I created. Each blue node is an image file, and each yellow connection is a page on which that exact image appears. It’s particularly interesting to see the connections between images and individual Pinterest pages, which account for the nodes with the highest number of connections.
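The filtering step behind this visualisation, keeping only images that appear on more than one website, can be sketched as follows. This is an illustration under two assumptions of mine: that an image file is identified by its URL, and that a ‘website’ means a distinct hostname. The function name `shared_image_edges` and the sample pairs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical (image URL, page URL) pairs taken from the scraped data.
pairs = [
    ("https://img.example/a.jpg", "https://site-one.example/page1"),
    ("https://img.example/a.jpg", "https://site-two.example/page2"),
    ("https://img.example/b.jpg", "https://site-one.example/page3"),
    ("https://img.example/c.jpg", "https://site-one.example/x"),
    ("https://img.example/c.jpg", "https://site-one.example/y"),
]


def shared_image_edges(pairs):
    """Keep only the pairs whose image appears on more than one hostname."""
    sites_by_image = defaultdict(set)
    for image_url, page_url in pairs:
        sites_by_image[image_url].add(urlparse(page_url).netloc)
    return [
        (image_url, page_url)
        for image_url, page_url in pairs
        if len(sites_by_image[image_url]) > 1
    ]


edges = shared_image_edges(pairs)
for image_url, page_url in edges:
    print(image_url, "->", page_url)
```

The resulting two-column edge list is the kind of table a network visualisation tool can take as input. Note that an image used twice on the same hostname is excluded under this definition; counting distinct pages instead would be a one-line change.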