Posts Tagged ‘Google’

Inbox 0!

Thursday, March 19th, 2009

I made it to Inbox 0 in Gmail. Okay, so, it is kind of lazy. I mean, I was just wondering what the 500+ emails in my inbox were doing. Not much. It was just a pile of read email. As new email came in, I would read it. But I just left it in the inbox. So moments ago I thought, hmm, so what is the “Archive” button in Gmail for? Why, it is for this pile of read email in my inbox. The vast majority, in fact all of it at the time, had already been “processed”. Why was I keeping it? Some of it was there “just in case” I needed to refer back to it. But then what is the “All Mail” archive for? It is there for the times, “just in case”, that I need to refer back to an old email. So, then, why not get the email out of the “Inbox”? So I did. I don’t lose anything by archiving the email. It is still there. (Thanks to Google’s vast storage. I am using 4% of 7.3 GB.)

Code reviews in Google Code

Wednesday, December 3rd, 2008

So I just released a new version of “littles3”, a project hosted at Google Code. I tried out the “Issue” feature; pretty neat. But what I really found cool was the “code review” features in the source code management. For instance, the source file that changed in my latest release was FileS3ObjectDao. I was able to view a diff between r21 (the old version) and r37 (the new version). It even let me comment right within the diff, in either the old or new version.

Google can sort

Saturday, November 22nd, 2008

Google recently announced that they were able to sort 1 terabyte (TB) in 68 seconds using 1,000 computers. The previous record was 209 seconds on 910 computers. I was impressed by this because I recently read about MapReduce and have been studying some of Google’s papers about the Google File System. Google used both MapReduce and the Google File System to attain this sorting record. But, being Google, they thought that since they did 1 TB so successfully, why not try sorting 1 petabyte (PB)? (A petabyte is a thousand terabytes.) Google was able to sort 1 PB in six hours and two minutes using 4,000 computers.
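To get a feel for how a sort can be spread across many machines, here is a small single-process sketch in Python. It is not Google's implementation, and the function names are my own. It assumes the common range-partitioning approach: sample the input to pick boundary keys, send each record to the partition that covers its key range (the "map" side), then let each partition be sorted on its own (the "reduce" side). Because the partitions cover disjoint, ordered key ranges, joining the sorted partitions end to end gives a fully sorted result.

```python
from bisect import bisect_right
import random


def choose_splitters(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of the input."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def map_phase(records, splitters):
    """Route every record to the partition whose key range contains it."""
    partitions = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        partitions[bisect_right(splitters, record)].append(record)
    return partitions


def reduce_phase(partitions):
    """Each 'worker' sorts only its own partition; the results are concatenated."""
    result = []
    for partition in partitions:
        result.extend(sorted(partition))
    return result


def distributed_sort(records, num_partitions=4, sample_size=100):
    """Simulate a range-partitioned sort; each partition stands in for one machine."""
    if not records:
        return []
    sample = random.sample(records, min(sample_size, len(records)))
    splitters = choose_splitters(sample, num_partitions)
    return reduce_phase(map_phase(records, splitters))


data = [random.randint(0, 1_000_000) for _ in range(1000)]
sorted_data = distributed_sort(data)
```

In a real cluster each partition would live on a different machine and the routing step would move data over the network, which is where most of the engineering difficulty is. The sketch only shows why the pieces can be sorted independently.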

Why does Google care about sorting? One reason may be that their primary revenue source is based on advertising. And they have vast access to massive amounts of data submitted by their end users in the form of search queries. The more efficient Google is at crunching this information, the better they can target their advertising to users, resulting in more revenue. And Google can use their data for other purposes too, like predicting flu outbreaks.

I have been very impressed by what I have been reading about MapReduce and the Google File System. These sorting results help prove how efficient their infrastructure is. I particularly like how they use commodity computers to achieve these results. I know that using multiple nodes can get tricky very quickly. But their techniques seem to be designed from the ground up to use multiple nodes. And with this mindset, they can more adequately manage and utilize their collective computing resources.

What I’m reading: locks!

Friday, October 10th, 2008

I have been reading some of the papers published by the Google engineers. It started with Bigtable: A Distributed Storage System for Structured Data. I am not sure how I started. The Official Google Blog posted a link announcing their new technology roundtable series. I watched the “MapReduce” discussion, where the engineers talked about Bigtable and how it is used in MapReduce. This led me to look for more information about Bigtable, since I was already looking for information on distributed “communication” techniques to enhance the littles3 implementation. (The current littles3 architecture is very simple and only supports one node. It works, but doesn’t do any cool things like scale storage or be fault tolerant.) I had heard Bigtable discussed in different technical blog settings, but I had no idea that there was a paper from two years ago that described the Bigtable system. (I guess I don’t read the technical CS journals like I should. I may have to become more active in IEEE.)

While reading the paper (I did find it very readable. Okay, I am a computer geek. Fair warning.) I noticed that Bigtable, which is a highly scalable distributed database (not relational), used a “lock service” called Chubby. What is a “lock service”? Well, the paper The Chubby Lock Service for Loosely-Coupled Distributed Systems will tell you. I am currently reading this paper. (Again, this is from 2006! Where have I been?) Mike Burrows, the author of the Chubby paper, sprinkles humor into a computer science paper discussing Paxos, “a family of protocols for solving consensus in a network of unreliable processors”. What I found interesting is how the “lock service” is used to share information in a highly distributed system. The Bigtable implementation is a client of the “lock service” and uses it to elect a leader; the leader is the node that acquires the lock, and only one node will get it. The “lock service” can also store small amounts of information, like metadata or configuration information, that a client application can read from the “lock service”.
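To make the idea concrete for myself, here is a toy sketch in Python of leader election through a lock. It is not Chubby's API: ToyLockService, try_acquire, and the names “leader-lock” and “leader-address” are all made up for illustration, and the “nodes” are just threads in one process. It leaves out everything that makes the real thing hard, such as replication, consensus, and failure handling. What it does show is the pattern described above: every node asks for the same lock, exactly one gets it, and the winner stores a small piece of information that the others can read.

```python
import threading


class ToyLockService:
    """In-process stand-in for a lock service (hypothetical, not Chubby's API)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # lock name -> id of the client holding it
        self._files = {}   # small named values (metadata, configuration)

    def try_acquire(self, lock_name, client_id):
        """Return True only for the first client that asks for this lock."""
        with self._guard:
            if lock_name not in self._owners:
                self._owners[lock_name] = client_id
                return True
            return False

    def write(self, name, value):
        with self._guard:
            self._files[name] = value

    def read(self, name):
        with self._guard:
            return self._files.get(name)


def run_node(service, node_id, roles):
    """Each node races for the same lock; the winner advertises itself."""
    if service.try_acquire("leader-lock", node_id):
        service.write("leader-address", node_id)
        roles[node_id] = "leader"
    else:
        roles[node_id] = "follower"


def elect(node_ids):
    """Start one thread per node and return (leader id, role of every node)."""
    service = ToyLockService()
    roles = {}
    threads = [
        threading.Thread(target=run_node, args=(service, node_id, roles))
        for node_id in node_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return service.read("leader-address"), roles


leader, roles = elect(["node-a", "node-b", "node-c"])
```

Which node wins depends on thread scheduling, so the leader can differ from run to run. The only guarantee is that there is exactly one.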

Next up is the paper Paxos Made Live – An Engineering Perspective. This paper provides some details on how the Google team implemented Chubby, some of the history of the previous implementation, and some of the issues that they discovered while implementing the Paxos algorithm.

Together, these papers provide some details of how Google has implemented highly distributed systems. So far, the information about Paxos has been very enlightening. And I am impressed with the way in which a “lock service” is used to coordinate communication and direct cooperation in an automated distributed network. It seems that they have created simple building blocks that together work in sometimes unique ways to make a complex system.

Keeping data secure in Google App Engine

Thursday, April 10th, 2008

While going through the “Getting Started” documentation provided for Google App Engine, I noticed something interesting in the “Using the Datastore” section. The datastore included in the App Engine is not a relational database, but it has some similarities. When querying the datastore, you can use GQL, which is similar to SQL. For instance:

greetings = Greeting.gql("WHERE author = :1 ORDER BY date DESC", users.get_current_user())

Notice the parameter replacement where “:1” is replaced with the value of “users.get_current_user()”. The documentation states:

Unlike SQL, GQL queries may not contain value constants: Instead, GQL uses parameter binding for all values in queries.

As Wikipedia points out, using a parameterized statement like this GQL parameter binding is one way to mitigate an SQL injection attack. The injection is mitigated because the parameter value is handled by the binding itself, where it can consistently be escaped properly or kept apart from the query text, instead of being pasted into the query string by the developer. I find it very interesting that Google decided, in implementing GQL, to enforce the use of parameter binding. This must have been a conscious decision to help App Engine developers make their apps more secure. I think that this is a good decision.
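The difference is easy to see outside App Engine too. The sketch below uses Python's built-in sqlite3 module rather than GQL, so it only illustrates the general technique. The greeting table and the helper function names are invented for the example. One function pastes the value into the SQL text, and the other passes it as a bound parameter.

```python
import sqlite3


def build_db():
    """Create a throwaway in-memory database with two greetings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE greeting (author TEXT, content TEXT)")
    conn.executemany(
        "INSERT INTO greeting VALUES (?, ?)",
        [("alice", "hello"), ("bob", "hi there")],
    )
    return conn


def find_unsafe(conn, author):
    # The value is pasted into the SQL text, so the input can change the query.
    query = "SELECT content FROM greeting WHERE author = '%s'" % author
    return conn.execute(query).fetchall()


def find_bound(conn, author):
    # The value is passed separately, so it is only ever treated as data.
    return conn.execute(
        "SELECT content FROM greeting WHERE author = ?", (author,)
    ).fetchall()


conn = build_db()
malicious = "nobody' OR '1'='1"

unsafe_rows = find_unsafe(conn, malicious)  # the WHERE clause becomes always true
bound_rows = find_bound(conn, malicious)    # no author has this odd name
```

With the pasted-in version, the crafted input turns the condition into one that matches every row, so both greetings come back. With the bound version, the same input is just a strange author name that matches nothing. By not allowing value constants at all, GQL leaves only the second style available.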

Worked through the Google App Engine “Getting Started” introduction

Tuesday, April 8th, 2008

I just finished trying out the Google App Engine “Getting Started” introduction. I haven’t programmed in Python for a very long time. The introduction was pretty cool.

The one exception was a problem on Windows in the static file CSS example. I found a discussion about the issue by Googling “App Engine InvalidAppConfigError”. They have a simple work-around to get the sample to work. But it looks like there will have to be a fix in the API for the problem to be resolved.

But all in all, this is a pretty neat framework. I look forward to playing with the SDK some more.

(And being a pilot, I am a bit biased toward the App Engine logo. You can see it at the home page. It is a jet engine with wings and a vertical stabilizer. 🙂 )

What Would Google Do? Sometimes it may be worth asking.

Tuesday, February 12th, 2008

I have seen a couple of different posts pose the question, “What would Google do?” I am taking a different tack. I work for a large organization making web applications. The web applications are used by external clients, but there is a rather limited group of users, at least compared to an application that is designed for general use on the Internet.

So, sometimes there are challenges developing our applications. I try to use Google as inspiration for coming up with creative solutions to the challenges.

To start this column out, let’s begin with a common task: uploading data files. Assume that you have a feature that uploads a batch data file. How should the file be formatted? This is a situation where I might ask, “What would Google do?” There are a couple different data uploads that Google supports.

Google Base is a service that provides a data feed. Google Base provides a way to describe structured data that will be included in a Google search. Here is more information about the data feed.

Google Apps, Google’s hosted applications like email, word processing, and spreadsheets, has a data feed for provisioning users. They call this a Provisioning API.

In upcoming posts, I plan to look deeper into both of these data uploads that Google supports to help answer the question, “What would Google do?”

My Google account was suspended yesterday

Wednesday, November 14th, 2007

My Google account was suspended yesterday afternoon. The gadgets on my Google home page stopped loading. I then tried to log in to Gmail, and upon logging in I got a message that my account was suspended. I don’t know why. I sent an inquiry to Google Accounts yesterday but haven’t heard back from them yet.

In the meantime, I have opened another account.

This is very strange…