I made it to Inbox 0 in GMail. Okay, so, it is kind of lazy. I mean, I was just thinking about what the 500+ emails in my inbox were doing. Not much. It was just a pile of read email. As new email came in, I would read it, but I just left it in the inbox. So moments ago I thought, hmm, what is the “Archive” button in GMail for? Why, it is for this pile of read email in my inbox. The vast majority, in fact all of it at the time, had already been “processed”. Why was I keeping them? Some of them were there “just in case” I needed to refer back to them. But then what is the “All Mail” archive for? It is there for the times, “just in case”, that I need to refer back to an old email. So why not get the email out of the “Inbox”? So I did. I don’t lose anything by archiving the email. It is still there. (Thanks to Google’s vast storage, I am using 4% of 7.3 GB.)
Posts Tagged ‘Google’
So I just released a new version of “littles3”, a project hosted at Google Code. I tried out the “Issue” feature; pretty neat. But what I really found cool was the “code review” feature in the source code management. For instance, the source file that changed in my latest release was FileS3ObjectDao. I was able to get a diff between r21 (the old version) and r37 (the new version). It even let me comment right within the diff, in either the old or new version.
Google recently announced that they were able to sort 1 terabyte (TB) in 68 seconds using 1,000 computers. The previous record was 209 seconds on 910 computers. I was impressed by this because I recently read about MapReduce and have been studying some of Google’s papers about the Google File System. Google used both MapReduce and the Google File System to attain this sorting record. But, being Google, they thought that since they sorted 1 TB so successfully, why not try sorting 1 petabyte (PB)? (A petabyte is a thousand terabytes.) Google was able to sort 1 PB in six hours and two minutes using 4,000 computers.
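To put those numbers side by side, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and simply divides the data size by the elapsed time, so it says nothing about how much data each machine actually read or wrote along the way. The function name sort_throughput is my own.

```python
TB = 10**12  # assumption: decimal terabyte
PB = 10**15  # assumption: decimal petabyte (a thousand terabytes)


def sort_throughput(total_bytes, seconds, machines):
    """Return (aggregate bytes/sec, bytes/sec per machine) for a sort run."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / machines


# The three runs quoted above: (bytes sorted, elapsed seconds, machines).
runs = {
    "previous 1 TB record": (TB, 209, 910),
    "Google 1 TB": (TB, 68, 1000),
    "Google 1 PB": (PB, 6 * 3600 + 2 * 60, 4000),
}

results = {}
for name, (size, seconds, machines) in runs.items():
    aggregate, per_machine = sort_throughput(size, seconds, machines)
    results[name] = (aggregate, per_machine)
    print("%-22s %6.1f GB/s total, %5.1f MB/s per machine"
          % (name, aggregate / 1e9, per_machine / 1e6))
```

By this crude measure the petabyte run moved more data per second overall than the terabyte run, while each machine averaged somewhat less, which is about what I would expect from a job that runs for hours instead of a minute.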
Why does Google care about sorting? One reason may be that their primary revenue source is based on advertising. And they have vast access to massive amounts of data submitted by their end users in the form of search queries. The more efficient Google is at crunching this information, the better they can target their advertising to users, resulting in more revenue. And Google can use their data for other purposes too, like predicting flu outbreaks.
I have been very impressed by what I have been reading about MapReduce and the Google File System. These sorting results help prove how efficient their infrastructure is. I particularly like how they use commodity computers to achieve these results. I know that using multiple nodes can get tricky very quickly. But their techniques seem to be designed from the ground up to use multiple nodes. And with this mindset, they can more adequately manage and utilize their collective computing resources.
I have been reading some of the papers published by the Google engineers. It started with Bigtable: A Distributed Storage System for Structured Data. I am not sure how I started. The Official Google Blog posted a link announcing their new technology roundtable series. I watched the “MapReduce” discussion, where the engineers talked about Bigtable and how it is used in MapReduce. This led me to look for more information about Bigtable, as I was looking for information on distributed “communication” techniques to enhance the littles3 implementation. (The current littles3 architecture is very simple and only supports one node. It works, but doesn’t do any cool things like scale storage or tolerate faults.) I had heard Bigtable discussed in different technical blog settings, but I had no idea that there was a paper from 2 years ago that described the Bigtable system. (I guess I don’t read the technical CS journals like I should. I may have to become more active in IEEE.)
While reading the paper (I did find it very readable. Okay, I am a computer geek. Fair warning.) I noticed that Bigtable, which is a highly scalable distributed database (not relational), used a “lock service” called Chubby. What is a “lock service”? Well, The Chubby Lock Service for Loosely-Coupled Distributed Systems paper will tell you. I am currently reading this paper. (Again, this is from 2006! Where have I been?) Mike Burrows, the author of The Chubby Lock Service for Loosely-Coupled Distributed Systems, sprinkles humor into a computer science paper discussing Paxos, “a family of protocols for solving consensus in a network of unreliable processors”. What I found interesting is how the “lock service” is used to share information in a highly distributed system. The Bigtable implementation is a client of the “lock service” and uses it to elect a leader: the leader is the node that acquires the lock, and only one node will get the lock. The “lock service” can also store small amounts of information, like metadata or configuration information, that a client application can read from the “lock service”.
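To make the idea concrete for myself, here is a toy sketch in Python of leader election through a lock. It is not Chubby and does not use Chubby's API: ToyLockService, try_acquire, write, and read are names I made up, everything lives in one process, and threads stand in for nodes. There is no Paxos, no replication, and no lock expiry, so it only illustrates the "whoever gets the lock is the leader" idea and the small-file storage.

```python
import threading


class ToyLockService:
    """In-memory stand-in for a lock service (hypothetical, not Chubby's API)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the two dicts below
        self._owners = {}               # lock name -> id of the node holding it
        self._files = {}                # small named values, e.g. configuration

    def try_acquire(self, name, owner):
        """Give the named lock to owner if it is free; return True on success."""
        with self._guard:
            if name in self._owners:
                return False
            self._owners[name] = owner
            return True

    def write(self, name, value):
        with self._guard:
            self._files[name] = value

    def read(self, name):
        with self._guard:
            return self._files.get(name)


def run_node(service, node_id, winners):
    # Every node tries for the same lock; only the first one gets it.
    if service.try_acquire("leader-lock", node_id):
        # The winner advertises itself in a small "file" the others can read.
        service.write("leader", node_id)
        winners.append(node_id)


service = ToyLockService()
winners = []
threads = [
    threading.Thread(target=run_node, args=(service, "node-%d" % i, winners))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

leader = service.read("leader")
print("elected leader:", leader)
```

Which node wins depends on thread scheduling, but exactly one of them does, and every other node can find out who it is by reading the "leader" entry.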
Next up is the paper Paxos Made Live – An Engineering Perspective. This paper provides some details on how the Google team implemented Chubby, some of the history of the previous implementation, and some of the issues that they discovered while implementing the Paxos algorithm.
Together, these papers provide some details of how Google has implemented highly distributed systems. So far, the information about Paxos has been very enlightening. And I am impressed with the way in which a “lock service” is used to coordinate communication and direct cooperation in an automated distributed network. It seems that they have created simple building blocks that together work in sometimes unique ways to make a complex system.
While going through the “Getting Started” documentation provided for Google App Engine, I noticed something interesting in the “Using the Datastore” section. The datastore included in the App Engine is not a relational database, but it has some similarities. When querying the datastore, you can use GQL, which is similar to SQL. For instance:
greetings = Greeting.gql("WHERE author = :1 ORDER BY date DESC",
                         users.get_current_user())
Notice the parameter replacement where “:1” is replaced with the value of “users.get_current_user()”. The documentation states:
Unlike SQL, GQL queries may not contain value constants: Instead, GQL uses parameter binding for all values in queries.
As Wikipedia points out, using a parameterized statement like this GQL parameter binding is one way to mitigate an SQL injection attack. The SQL injection is mitigated because the parameter value can consistently be properly escaped within the execution of the parameter binding. I find it very interesting that Google decided, in implementing GQL, to enforce the use of parameter binding. This must have been a conscious decision to help App Engine developers to make their apps more secure. I think that this is a good decision.
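Here is a small sketch of why this matters. It uses plain SQL through Python's built-in sqlite3 module rather than GQL or the App Engine datastore, and the greeting table and both function names are made up for the illustration. The first function binds the value as a parameter; the second splices it into the query string, which is the pattern that GQL does not allow for values.

```python
import sqlite3


def find_greetings(conn, author):
    """Parameter binding: the value is passed separately from the SQL text."""
    cur = conn.execute(
        "SELECT content FROM greeting WHERE author = ? ORDER BY date DESC",
        (author,),
    )
    return [row[0] for row in cur.fetchall()]


def find_greetings_unsafe(conn, author):
    """Anti-pattern, for contrast only: the value becomes part of the SQL."""
    query = ("SELECT content FROM greeting WHERE author = '%s' "
             "ORDER BY date DESC" % author)
    return [row[0] for row in conn.execute(query).fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greeting (author TEXT, content TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO greeting VALUES (?, ?, ?)",
    [("alice", "hello", "2008-04-10"), ("bob", "secret", "2008-04-11")],
)

# A hostile "author" value that tries to rewrite the WHERE clause.
hostile = "nobody' OR '1'='1"

safe_rows = find_greetings(conn, hostile)             # treated as a literal name
unsafe_rows = find_greetings_unsafe(conn, hostile)    # matches every row

print("bound parameter:", safe_rows)
print("string splicing:", unsafe_rows)
```

With the bound parameter, the hostile string is just an author name that matches nothing. With string splicing, the same input turns the condition into one that is always true and returns everyone's greetings.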
Except for the problem with Windows in the static file CSS example. I found a discussion about the issue by Googling “App Engine InvalidAppConfigError”. They have a simple work-around to get the sample to work. But it looks like there will have to be a fix in the API for the problem to be resolved.
But all in all, this is a pretty neat framework. I look forward to playing with the SDK some more.
(And being a pilot, I am a bit biased toward the App Engine logo. You can see it at the home page. It is a jet engine with wings and a vertical stabilizer. 🙂 )
I have seen a couple of different posts pose the question, “What would Google do?” I am taking a different tack. I work for a large organization making web applications. The web applications are used by external clients, but the group of users is rather limited, at least compared to an application designed for general use on the Internet.
So, sometimes there are challenges developing our applications. I try to use Google as inspiration for coming up with creative solutions to the challenges.
To start this column out, let’s begin with a common task: uploading data files. Assume that you have a feature that uploads a batch data file. How should the file be formatted? This is a situation where I might ask, “What would Google do?” There are a couple of different data uploads that Google supports.
In upcoming posts, I plan to look deeper into both of these data uploads that Google supports to help answer the question, “What would Google do?”
My Google account was suspended yesterday afternoon. The gadgets on my Google home page stopped loading. I then tried to log in to GMail, and upon logging in I got a message that my account was suspended. I don’t know why. I sent an inquiry to Google Accounts yesterday but haven’t heard back from them yet.
In the meantime, I have opened another account.
This is very strange…