Archive for ‘Data and Analytics’

September 3, 2011

Prediction API Part 2

Motivation

In my initial coverage of the Google Prediction API, I was very curious why Google would be so magnanimous as to open up this API for public use. Google’s own documentation suggests a plausible answer:

We do not describe the actual logic of the Prediction API in these documents, because that system is constantly being changed and improved. Therefore we can’t provide optimization tips that depend on specific implementations of our matching logic, which can change without notice.

An older prediction API

Based on some of the user comments in the Google group for the Prediction API, I would guess that it is one of the more difficult Google APIs to understand and use. Getting meaningful results will probably be just as challenging. A great deal more information is available in the Prediction API developer guide, which includes a worked example (with detailed instructions): an application for movie recommendations.

Requirements

Google advises that all of the following are prerequisites for using the Prediction API:

  • an active Google Storage account
  • an APIs Console project with both the Google Prediction API and the Google Storage for Developers API activated

And of course, a Google account! See getting started for further details.
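
To make the workflow in the developer guide more concrete, here is a rough Python sketch of the basic steps: upload a CSV to Google Storage (label in the first column, features in the rest), start a training run against that object, poll until training completes, then request a prediction. Treat everything below as a sketch rather than working client code: the endpoint URL, JSON payload shapes, and field names are reconstructed from memory of the v1.2 documentation, and the OAuth access token is assumed to have been obtained separately.

    # Rough sketch of the Prediction API workflow -- not a verified client.
    # Endpoint URL, payload shapes, and field names are assumptions; the
    # access token is assumed to come from a separate OAuth step.
    import time
    from urllib.parse import quote
    import requests

    ACCESS_TOKEN = "ya29.example-token"        # placeholder credential
    DATA = "mybucket/training.csv"             # CSV already in Google Storage:
                                               # first column = label, rest = features
    BASE = "https://www.googleapis.com/prediction/v1.2"   # assumed endpoint
    HEADERS = {"Authorization": "Bearer " + ACCESS_TOKEN,
               "Content-Type": "application/json"}
    OBJ = quote(DATA, safe="")                 # bucket/object, URL-encoded

    # 1. Start training against the stored CSV.
    requests.post(BASE + "/training", params={"data": DATA}, headers=HEADERS)

    # 2. Poll until training reports completion.
    while True:
        status = requests.get(BASE + "/training/" + OBJ, headers=HEADERS).json()
        if status.get("trainingStatus") == "DONE":   # field name as I recall it
            break
        time.sleep(10)

    # 3. Request a prediction for a new instance.
    body = {"input": {"csvInstance": ["some new example"]}}  # assumed request shape
    prediction = requests.post(BASE + "/training/" + OBJ + "/predict",
                               json=body, headers=HEADERS).json()
    print(prediction)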

Free but not forever

Nor is the Prediction API free of charge indefinitely. According to the initial terms, usage is free for all users for the first six months, up to the following limits per project:

  • Predictions: 100 predictions/day
  • Hosted model predictions: Hosted models have a usage limit of 100 predictions/day/user across all models
  • Training: 5MB trained/day
  • Streaming updates: 100 streaming updates/day
  • Lifetime cap: 20,000 predictions
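
(For perspective: at 100 predictions a day, a roughly 180-day introductory period works out to about 18,000 predictions, so the 20,000 lifetime cap is essentially the daily quota carried across the full six months, with a little headroom.)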

This free quota expires at the end of the six-month introductory period, which begins the day that Google Prediction is activated for a project in the Google APIs Console. Remember that charges associated with Google Storage must be included to figure total cost. Presumably this is an API that Google won’t be deprecating without replacement any time soon. However, the Prediction API has its own Terms of Service, which does give Google the right to do exactly that. I think that is standard language, though, as Google is not contractually bound to support a free, or even paid but unprofitable, service unless explicitly stated.

Summary

The Google Prediction API is probably best used as a sandbox. It may be helpful for deciding whether one wants to use machine learning for predictive purposes. If one decides to go ahead with this approach, there are probably more suitable alternatives than the Google Prediction API for an application intended for production use.

July 10, 2011

Prediction API

The recent release of the Google Prediction API Version 1.2 seemed oddly, well, magnanimous to me! Given the investment of intellectual capital and resources, I am surprised that Google would be so generous. Opening up the Prediction API means that Google is giving external users access to its in-house machine learning algorithms.

1939 Ford pick-up truck: it will not likely use the Google Prediction API, though other Ford products will

The official Google Code blog post, Every app a smart app, dated 27 April 2011, suggested many possible uses for the Prediction API. Some of the more interesting included:

The last item on the list has the potential, but not certainty, of causing serious privacy concerns. I’m guessing that customer feedback based on structured data is another potential use for the API.

I noticed that Ford Motor Company has plans for the Prediction API, specifically for commuters driving electric vehicles (EVs). Apparently there is a fair amount of “EV anxiety” due to limits on driving range, and the Prediction API could be used to mitigate those concerns. AutoBlog, an online publication for automobile enthusiasts, featured a great slide show demonstrating how Ford intends to make use of the Google Prediction API.

The Prediction API is available on Google Code. This is not its first release; I’m uncertain whether versions before 1.2 were restricted in some way. (Google often grants API access to developers initially and, after ironing out any bugs or unexpected problems, later opens the product to the public.)

Do be aware that a Google Storage account is required for access. Visit the Google APIs Console to get started.

February 1, 2011

Google DataWiki

I found yet another interesting Google Labs project for which there is just enough information available to be intriguing, yet not enough to provide a satisfying level of detail for the very curious!

A wiki for structured data

Google DataWiki offers a guestbook and a list of recent datasets:

  • Issue
  • Twitter
  • Bloog
  • daytum
  • testwiki

According to the official Google DataWiki page:

DataWiki is currently in testing; all data and current storage formats should be used for testing purposes only.

December 5, 2010

Very Basic Google Analytics

I do not like video training. It is often difficult to hear and understand what the speaker is saying. Attention wanders within fifteen minutes or less for most of us! The delivery pace doesn’t usually match mine, and I often find myself going back and replaying a section over and over until I understand it, or until I crash my browser or annoy everyone around me! That’s why I prefer slides or annotated screen shots, whether Microsoft PowerPoint, Adobe PDF, or SlideShare documents.

This presentation was posted on SlideShare. The topic is Google Analytics, with a focus on Chrome browser users. Although the time stamp is 2007, it covers Google Analytics in its current, post-Urchin format. Urchin was an earlier form of tracking offered by Google Analytics and is no longer supported.

I found it worthwhile. It consists of twenty very well annotated screen shots from a Google Analytics account, as viewed within the Chrome browser. I was able to spend as little, or as much, time as necessary on each image. I didn’t get sleepy and distracted, as I do when watching the Official Google Analytics channel on YouTube. The content is basic, and fundamental enough that it isn’t outdated, insofar as I could tell.

November 15, 2010

Open Source Data Quality Tool

I was surprised to see Google enter an important area that it had not approached before: Data quality.

Google Refine 2.0 was released last week

Google Refine is an open source data quality and data integration tool. DataQualityPro seemed impressed with Refine, which is Google’s first “consumer” product* for data quality.

Google Refine 2.0: a data quality app that runs in your browser

Refine is presented as a tool for especially messy data sets, with inconsistent content, mismatched formatting or units, and in dire need of clean-up for improved referential integrity.
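
To picture what “especially messy” means in practice, here is a small Python illustration of the kind of clean-up Refine automates through its browser interface: trimming stray whitespace, collapsing inconsistent capitalization, and converting mismatched units to a common one. This is only a sketch of the problem Refine targets, written in plain Python; it does not use Refine’s own expression language or API, and the column names and unit rules are made up.

    # Illustration only: the sort of clean-up Google Refine is built for,
    # done by hand in Python. Column names and unit conversions are hypothetical.
    raw_rows = [
        {"city": "  Chicago ", "distance": "12 km"},
        {"city": "chicago",    "distance": "7.5 miles"},
        {"city": "CHICAGO",    "distance": "12000 m"},
    ]

    def distance_in_km(text):
        """Convert a distance string with mixed units into kilometres."""
        value, unit = text.split()
        factors = {"km": 1.0, "m": 0.001, "miles": 1.609344}
        return float(value) * factors[unit.lower()]

    clean_rows = [
        {
            # Trim whitespace and normalize case so the 'Chicago' variants match.
            "city": row["city"].strip().title(),
            "distance_km": round(distance_in_km(row["distance"]), 2),
        }
        for row in raw_rows
    ]

    print(clean_rows)
    # [{'city': 'Chicago', 'distance_km': 12.0},
    #  {'city': 'Chicago', 'distance_km': 12.07},
    #  {'city': 'Chicago', 'distance_km': 12.0}]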

Remember though: this is a free web app! It isn’t SAS Enterprise Miner. The comments in the DataQualityPro post make that clear; have a look at them if you want an idea of what Refine’s benchmark performance might be. Some of the comments are funny. I suspect that later versions of Google Refine will focus on performance.

Synergies from a Google-built data quality tool

An obvious benefit will be ease of access to certain static reference data, such as latitude and longitude look-ups. Also, there should be fewer discrepancies due to inconsistently defined data formats when working with Google-maintained data sets. Compatibility with Google’s other open-source applications is interesting to contemplate, though not certain.

Google posted three pleasingly brief (under 15 minutes each) “how-to” videos for Refine users:

  • Introduction
  • Data Transformation
  • Data Augmentation

All three are available on YouTube.

If this is version 2.0, what was version 1.0?

I do not know if there was a Google Refine 1.0. Nor could I find any reference to Google deprecating an earlier version of Refine, which was somewhat odd. Perhaps version 1.0 was internal-use only.

Please leave a comment if you have any ideas!

UPDATE: June 2011

The predecessor to Google Refine 2.0, call it Google Refine 1.0 if you will, was Gridworks! Gridworks is a data quality tool that I had associated exclusively with Freebase.

Here’s some background: Freebase is a large, openly usable database designed for semantic as well as algorithmic or machine search. Gridworks was developed by Metaweb for use with Freebase, and Google acquired Metaweb Technologies in mid-2010. I found the connection between Refine and Gridworks only a few moments ago, while browsing through a Gridworks write-up on The Chicago Tribune data blog dated 17 May 2010, before Google announced any intent to purchase Metaweb.

*There are other Google data quality projects, such as Bigtable. But Bigtable is for “Big Data” and applications development, unlike Refine.