I was able to attend the Google Summer of Code (GSoC) Mentors Summit last weekend in sunny Mountain View, CA. I'd spent the previous few days working with a team to write a mentor's manual, so I was full of ideas when it came time to create the actual sessions during the unconference.
The Mentors Summit is a great opportunity to mingle with leaders from our many diverse communities. This year, student participation was capped at 1,000, and 150 participating open source projects mentored them. Most of the projects were represented at the Summit.
I attended or presented at three sessions that I'll quickly summarize:
- Casablanca: This wasn't a presentation so much as a discussion. There's one room designed to be a salon, with lots of interesting gadgets, toys, and clay. A group of about 20 of us talked about what we'd learned about mentoring that year, strategies for getting the most out of students, and recovering from student and mentor failures. Some representatives of smaller projects were in awe of the discipline and organization of the larger projects. Several useful wiki templates were shared, as were best practices such as holding scheduled weekly meetings with all mentors and students, and requiring daily blog posts and clear delivery dates for pieces of code.
- Making our communities more welcoming: We arranged a session to talk about bringing more diversity into open source projects, both gender and cultural. The list we came up with was general, but it's a good starting point for organizations new to exploring diversity issues:
  - Build a reputation of being inclusive.
  - Appreciate and recognize non-code contributions.
  - Be nice to newbies!
  - START YOUNG. Start going to middle schools and teaching computer classes.
  - Do targeted outreach to the community you are interested in attracting.
  - Talk about what open source does for the social good.
  - Don't be invisible! Advertise what women are doing.
  - Make personal contact with individuals.
  - Have pictures that reflect diversity among your users and developers (other people like me use this software!).
- Pretty Pictures: How to create non-text-based documentation. We talked about how different projects approach producing pictures, diagrams, videos, and audio documentation. Many tools were discussed and listed in the session notes. We also talked about software we wished we had, and ways of transcribing video and audio (I suggested piping it through Google Voice!). I enjoyed hearing about projects like XWiki's screencasts, and the efforts GIS and video encoder projects had underway to produce non-text documentation.
Much of the rest of the time at the conference was spent discussing individual projects, new cool things that we could be doing (PL/Parrot!), and the successes each open source project had in incorporating new people into their projects.
Photo from http://www.flickr.com/photos/warthog9/ used under Creative Commons license BY-NC-SA 2.0
I'm at Google I/O at the Moscone Center in downtown San Francisco, and today was the first day. Everything was bustling.
The opening keynote started with Google CEO Eric Schmidt, and I wondered how he would keep over an hour interesting. He spoke for only a few minutes; then Vic Gundotra, VP of Engineering, led the rest of the keynote, which featured many presenters showing off various projects, starting with 5 major HTML 5 features already supported in Chrome, Firefox, Safari, and Opera:
Matt Waddell talked about Canvas, the very nice drawing & animation API with pixel-level control. Brendan Gibson of Backcountry.com used it at SteepandCheap.com and sister sites for the cool People on Site graphs (with a workaround for Internet Explorer, which doesn't support Canvas yet). There was also a quick demo of Bespin, an IDE in the browser.
Matt Papakipos showed off O3D, 3-D in the browser with just HTML 5, JavaScript, and CSS. Also shown was the new <video> tag, which makes video as easy as <img>. Geolocation has come a long way, with cell tower and Wi-Fi ID coverage over much of the globe.
Jay Sullivan, VP of Mozilla, showed off Firefox 3.5's upcoming features: basically all of the above, plus app cache, a database (using SQLite), and web workers (background JavaScript that won't freeze the browser).
Michael Abbott, SVP of Palm, showed off Palm's webOS 1.0, which uses HTML 5.
A good summary of the 5 big features of HTML 5 is in Tim O'Reilly's blog post about it.
Kevin Gibbs & Andrew Bowers of Google gave some numbers about Google App Engine: 200K+ developers and 80K+ apps. Coming to App Engine: background processing, large object storage, database export, XMPP, and incoming email. Google Web Toolkit was also shown briefly: code written in Java compiles down to JavaScript, with per-browser tweaks handled automatically.
DeWitt Clinton, Tech Lead at Google, showed Google Web Elements, embeddable Google apps similar to the way YouTube & AdSense have always worked. Currently available: conversations, maps, and search. A blog post by Tim O'Reilly gives more details about Web Elements.
Romain Guy, Software Engineer at Google, showed off Android's coming text to speech functionality. Then all attendees were told we'll be receiving a new Google Ion (aka HTC Magic) phone, the unlocked developer edition, with a SIM card for T-Mobile giving 30 days of unlimited 3G data & domestic voice so we can play with it. That was enthusiastically received. Certain attendees such as myself were hoping there'd be a discounted way to buy one at the conference, so this surprise worked out nicely. :) Various people wrote this up in more detail. Here's mine getting unpacked:
The rest of the conference was split into various tracks, and I stuck mostly with the Google App Engine talks, which were good. The most useful was Brett Slatkin's talk on putting Datastore list properties in separate entities that exist only for their indexes. Queries can then use those indexes without serializing or deserializing the lists, which avoids a lot of CPU overhead, though it is a little tricky to set up.
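A rough sketch of that layout in Python follows. It does not use the App Engine API at all: `Message`, `MessageIndex`, and `ToyDatastore` are hypothetical names, and a dict stands in for the datastore's built-in index on a list property. What it tries to show is that a lookup by receiver is answered from the index alone, and the list itself is never loaded.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical "heavy" parent entity holding the payload.
    key: str
    body: str


@dataclass
class MessageIndex:
    # Hypothetical separate entity that exists only to carry the list
    # property; it points back at its parent Message.
    parent_key: str
    receivers: list


class ToyDatastore:
    """In-memory stand-in for a datastore that indexes list values."""

    def __init__(self):
        self.messages = {}        # key -> Message
        self.index_entities = {}  # parent key -> MessageIndex
        self.receiver_index = {}  # list value -> set of parent keys

    def put(self, message, index):
        self.messages[message.key] = message
        self.index_entities[index.parent_key] = index
        # Mimic the datastore indexing every value of a list property.
        for receiver in index.receivers:
            self.receiver_index.setdefault(receiver, set()).add(index.parent_key)

    def keys_only_query(self, receiver):
        # Answered from the index alone: no MessageIndex (and no list)
        # is loaded or deserialized here.
        return sorted(self.receiver_index.get(receiver, set()))

    def get_many(self, keys):
        # Batch-fetch only the parent entities we actually need.
        return [self.messages[k] for k in keys]


db = ToyDatastore()
db.put(Message("m1", "hello"), MessageIndex("m1", ["alice", "bob"]))
db.put(Message("m2", "lunch?"), MessageIndex("m2", ["bob"]))

keys = db.keys_only_query("bob")
print(keys)                                 # ['m1', 'm2']
print([m.body for m in db.get_many(keys)])  # ['hello', 'lunch?']
```

The tricky part in a real app is keeping the index entity and its parent in sync on every write; the toy sidesteps that by always writing both in one `put`.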
The after-hours party (dinner, music, silly video games, etc.) is now winding up, and a semi-drunk guy is walking around with a garbage can asking for laptops we want to throw away. I still need this one for a while longer, so I declined his helpful offer.
Rails optimization:
- Use eager loading (investigate the virtual attributes plugin)
- Avoid string callbacks
- Minimize view instances of the object and use template inlining. Objects passed through partials can add up and be expensive.
- Date is 16x slower than Time
- Use Date::Performance
- Avoid string += other; use string << other instead, since << appends in place rather than building a new string
- Compare like objects: comparing different types of objects is expensive.
- Use EXPLAIN ANALYZE
- Use ANY(ARRAY(...)) instead of IN(...)
- Push conditions into subselects and joins: PostgreSQL doesn't do that for you.
- Buy more memory, optimize memory, set memory limits for Mongrel (with monit)
- Competing for memory cache is expensive on a shared server (avoid leaving the database in a cold-cache state)
- Use live debugging tools such as strace, OProfile, DTrace, monit, Nagios
- Pay attention to load balancing
- Listen to YSlow
- Inherently slow JavaScript operations include eval, DOM selectors, CSS selectors, element.style changes, getElementById, getElementsByName, and style switching.
Some final tips from the presentation: get benchmarks, use a profiling tool like ruby-prof, optimize memory, pay attention to the language's garbage collection, profile memory, and measure! measure! measure!!!
Probably more important than the optimization details covered, the presentation served as a valuable reminder of the following:
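In the spirit of "measure!", here is a small Python sketch that times two ways of building a string with the standard `timeit` module. It is a Python analogue of the Ruby `+=` vs. `<<` tip above, not Ruby code: repeated `+` nominally creates a new string each time, while collecting pieces and calling `"".join` builds the result once. CPython can sometimes optimize `+` in place, which is exactly why timing it on your own interpreter matters more than the rule of thumb.

```python
import timeit


def build_by_concat(pieces):
    # Each + nominally produces a brand-new string (strings are immutable),
    # similar in spirit to Ruby's str += other.
    out = ""
    for p in pieces:
        out = out + p
    return out


def build_by_join(pieces):
    # Collect the pieces and join once, similar in spirit to appending
    # into a single buffer with Ruby's <<.
    return "".join(pieces)


pieces = [str(i) for i in range(10_000)]

# Always check that the "optimized" version gives the same answer.
assert build_by_concat(pieces) == build_by_join(pieces)

for fn in (build_by_concat, build_by_join):
    seconds = timeit.timeit(lambda: fn(pieces), number=50)
    print(f"{fn.__name__}: {seconds:.4f}s for 50 runs")
```

The timings vary by machine and interpreter version, so no expected numbers are given; the habit worth copying is verifying equivalence first and then comparing measured times.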
Pay attention to all potential areas for optimization. As I've grown as a developer I've continued to add to my "optimization checklist".
When learning a new language, don't forget to pay attention to the little details of the language. I should appreciate the specific points that distinguish a language from others, including its inherently expensive functions.
Like other developers, I sometimes produce code that meets the performance criteria without having the luxury of examining every area for optimization. I'd like to spend more time throughout a project paying attention to each of these points on my optimization checklist, and always work on doing it better the second time around.
1) History of Rails Critics
David Heinemeier Hansson's keynote touched on how interesting it is to look back at some of the initial and ongoing Rails critiques, such as "Ruby/Rails isn't scalable" and "Rails isn't enterprise-ready", and how the arguments in support of Rails have grown stronger over time as the platform has matured. I'd like to spend some more time looking into these critiques to be more aware of the issues.
2) Rails 3 Release
Anticipation is building in the Rails community for the Rails 3 release. I joined the Rails development community only in January and hadn't heard of the Rails vs. Merb debate until recently. I am interested in learning more about Merb and the background of the Rails/Merb merge.
3) PostRank
PostRank's "social media measuring" appeals to my search engine optimization background: it *essentially* applies the PageRank algorithm to the social web, bringing users the articles it measures as most credible based on engagement. The Google: High Performance talk was presented by the PostRank founder. As social engagement becomes an increasingly important area of the web, this is an interesting company and business model I'd like to watch.
4) Yehuda Katz
I've heard many mentions of Yehuda Katz. I'm going to read more about him.
5) railstips.org
One of this year's Ruby Heroes Award recipients runs railstips.org, a great site for tutorials and development information. Adaptability and learning new things are a necessity as a consultant, and I'm always interested in finding new ways to learn more Rails.
6) Compass / Sass
With the release of Spree 0.8.0 (yesterday) came the integration of Compass and Sass. I went to the Birds of a Feather session on Compass and Sass integration and came away wanting to learn more about what distinguishes Sass from CSS, and how we can use the great functionality Sass offers to benefit the Spree project. I'm going to check out this Sass screencast and hope to spend some time improving Spree's implementation of Sass.
7) Google Page Rank
I heard great things about the talk "Google: High Performance Computing in Rails", but did not attend. A coworker summarized it as "a few lines of Ruby to implement the Google PageRank algorithm". Even though I missed it, I'm excited to check out the slides here.
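The talk's slides aren't reproduced here, but as a rough illustration of why PageRank fits in a few lines, here is a minimal power-iteration sketch in Python (not the Ruby code from the talk). It assumes the commonly used damping factor of 0.85 and runs a fixed number of iterations instead of testing for convergence; pages with no outgoing links spread their rank evenly over all pages so the total stays at 1.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to. Returns a dict
    of page -> rank, with ranks summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: distribute its rank over every page.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this small graph, "c" is linked to by both other pages and ends up ranked highest, while "b", which nothing links to, keeps only the random-jump share.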
8) Active Scaffold
I had a short discussion with an employee of PostRank who works on blog development. We briefly discussed the functionality and troubles around integrating a CMS into Spree. He recommended looking into Active Scaffold.
9) Advanced Git Techniques
One of my coworkers attended a presentation on advanced Git techniques and walked away happy. He mentioned it was a lot of information, and the notes are posted here. End Point has open source projects on GitHub, and I'm always open to learning more tips to help me keep my Git repositories clean.
10) Rails Envy
Gregg Pollack is putting up videos from RailsConf at RailsEnvy. I want to make sure I catch these, whether it's during the conference or later on.
I attended the OpenSQL Camp last weekend, which ran Friday night to Sunday, November 14-16th. This was the first "unconference" I had been to, and Baron Schwartz did a great job in pulling this all together. I drove down with Bruce Momjian who said that this is the first cross-database conference of any kind since at least the year 2000.
The conference was slated to start at 6 pm, and Bruce and I arrived at our hotel a few minutes before then. Our hotel was at one end of the Charlottesville Downtown Mall, and the conference was at the other end, so we got a quick walking tour of the mall. Seems like a great place - lots of shops, people walking, temporary booths set out, outdoor seating for the restaurants. It reminded me a lot of Las Ramblas, but without the "human statue" performance artists. Having a hotel within walking distance of a conference is a big plus in my book, and I'll go out of my way to find one.
The first night was simply mingling with other people and designing the next day's sessions. There was a grid of talk slots on a wall, with large sticky notes stuck to some of them to indicate already-scheduled sessions. Next to the grid were two sections where people added sticky notes for potential lightning talks and for potential regular talks. There were probably about 20 of each type of talk by the end of the night. The idea was to put a check next to any talk you were interested in, although I don't think everyone got the message, judging by the number of checks vs. the number of people. At one point, we gathered in a circle and each gave a quick five-word introduction. Mine was "Just Another Perl Postgres Hacker." There were probably 50-60 people there, and the vast majority were from Sun/MySQL. A smaller group were non-Sun MySQL people, such as Baron and Sheeri. Coming in at a minority of two were Bruce and I, representing Postgres (although on Saturday our numbers swelled to three with the addition of Kelly McDonald). However, the smallest minority was the SQLite contingent, consisting solely of Dr. Richard Hipp (whom it was great to meet in person). Needless to say, I met a lot of MySQL people at this conference! All were very friendly and receptive to Bruce and me, and it did feel mostly like an open source database conference rather than a MySQL one. Seven of the twenty-one talks were by non-MySQL people, which means we were technically overrepresented. Or had more interesting talks! ;)
After heading back to the room and reviewing my notes before bed, I got up the next day and caught the keynote by Brian Aker about the future of open source databases. Thanks for the Skype/Postgres shout-out, Brian! :) A comment by Jim Starkey at the end of the talk led to an interesting discussion on botnets, the current kings of cloud computing.
My talk on MVCC was the first talk of the day, which of course meant lots of technical difficulties. As usual, my laptop refused to cooperate with the overhead projector. Anticipating this, I had copied the presentation as a PDF to a USB disk, and ended up giving it from someone else's Mac laptop. (I don't remember whose it was, but thank you!) I've given the talk before, but this was a major rewrite to suit the audience: much less Postgres-specific material, plus some details about how other systems implement MVCC and the advantages and disadvantages of each approach. Both Oracle and InnoDB update the actual value on disk and save the changes elsewhere, optimistically assuming that a rollback won't happen. This makes a rollback expensive, as the old diffs must be looked up and applied to the main table. Postgres is pessimistic: it simply adds an entire new row on update, so a rollback is cheap, since it only marks that new row as no longer valid. Both approaches involve some sort of cleanup of old rows, and they handle the tradeoffs in different ways. There were some interesting discussions during and after the talk, as Jim Starkey and Ann Harrison weighed in on how other systems (Falcon and Firebird) perform MVCC, and the costs and tradeoffs involved. Afterward, I had some interesting conversations with Ann about garbage collection and vacuuming in general.
The next talk was by Dr. Hipp, entitled "How SQL Database Engines Work". It was fascinating, as it gave a glimpse into the inner workings and philosophy of SQLite, whose underlying assumptions about power usage, memory, transactions, portability, and resource usage are radically different from those of most other database systems. Again, certain slides prompted some interesting discussion from the audience during the talk.
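To make the two rollback strategies concrete, here is a toy Python model. It is not how Postgres, Oracle, or InnoDB are actually implemented: the class names are hypothetical, and there is no concurrency, no visibility check against other transactions, and no VACUUM or purge. It only contrasts "append a new version and mark it dead on rollback" with "overwrite in place and replay undo records on rollback".

```python
from dataclasses import dataclass


@dataclass
class RowVersion:
    value: str
    created_by: int     # id of the transaction that wrote this version
    valid: bool = True  # flipped to False if that transaction rolls back


class AppendOnlyTable:
    """Postgres-style (toy): an update adds a whole new row version."""

    def __init__(self, value):
        self.versions = [RowVersion(value, created_by=0)]

    def update(self, xid, value):
        self.versions.append(RowVersion(value, created_by=xid))

    def rollback(self, xid):
        # Cheap: just mark this transaction's versions as invalid. The dead
        # versions stay around until a cleanup pass removes them.
        for v in self.versions:
            if v.created_by == xid:
                v.valid = False

    def current(self):
        # Toy visibility rule: newest version that is still valid.
        return [v for v in self.versions if v.valid][-1].value


class InPlaceTable:
    """Oracle/InnoDB-style (toy): overwrite the row, keep the old value as undo."""

    def __init__(self, value):
        self.value = value
        self.undo = []  # (xid, old_value) records, newest last

    def update(self, xid, value):
        self.undo.append((xid, self.value))
        self.value = value  # optimistic: assume the transaction will commit

    def rollback(self, xid):
        # More expensive: look up this transaction's undo records and apply
        # them newest-first. Assumes only one transaction writes the row at
        # a time (real systems enforce that with row locks).
        while self.undo and self.undo[-1][0] == xid:
            _, old_value = self.undo.pop()
            self.value = old_value

    def current(self):
        return self.value


for table in (AppendOnlyTable("v1"), InPlaceTable("v1")):
    table.update(xid=7, value="v2")
    table.update(xid=7, value="v3")
    table.rollback(xid=7)
    print(type(table).__name__, table.current())
```

Both tables end up showing "v1" after the rollback. The difference is where the work happens: the append-only table leaves dead versions behind for later cleanup, while the in-place table pays at rollback time by replaying its undo records.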
The competing talk for that time slot was "Libdrizzle" by Eric Day. While I missed this talk, I did get to talk to him the night before about libdrizzle, among other things. Patrick Galbraith and I tried to explain the monstrosity that is XS to Eric (as he and I maintain DBD::mysql and DBD::Pg respectively), and Eric showed us how PHP does something similar.
My DBIx::Cache talk was sabotaged by Bruce having a better session at the same time, so I attended his instead of giving mine. I'll post the slides for the DBIx::Cache talk on the OpenSQL Camp wiki soon, however. I liked Bruce's talk ("Moving Application Logic Into the Database"), mostly because he was preaching to the choir when talking about putting business logic into the database. There was an interesting discussion about Postgres borrowing LIMIT and OFFSET from MySQL, and we even helped Richard figure out that he was unknowingly supporting the broken and deprecated Postgres "comma-comma" syntax. Bruce's talk was very polished and interesting. I suspect he may have given talks before. :)
Lunch was catered in, and I talked to many people over lunch, as indeed I did throughout the conference. Apparently MySQL 5.1 is finally going to be released, this time for sure, according to first Giuseppe and then Dups. After lunch came the lightning talks, which I normally would not miss, but their overall MySQL-centricness and my interest in another session, "MySQL Unconference" by Sheeri K. Cabral, drew me away. Bruce, Sheeri, Giuseppe Maxia, and I talked about the details of such a conference. It was a very interesting perspective: MySQL has the problem of a "one company, and no community" perception, while Postgres suffers from an "all community, and no company" perception. Neither perception is accurate, of course, but there are some seeds of truth in both.
Bruce's second presentation, "Postgres Talks", turned into mostly a wide-ranging discussion between those present (myself, Bruce, Ann, Kelly, Richard, others?) about materialized views, vacuum, building query trees, and other topics.
I bailed out on my fellow Postgres talk, "Postgres Extensions" by Kelly McDonald (sorry Kelly). I had already picked his brain about it earlier, so I didn't feel too guilty attending "Atomic Commit In SQLite" by Dr. Hipp. Again, it's fascinating to see things from the SQLite perspective: not only technically, but in how their development is structured as well.
I was not feeling well, so I ran back to the hotel to drop off my backpack with super-heavy laptop inside, and thus missed my next planned talk, "Unix Command Line Productivity Tips". If anyone went and can pass on some tips in the comments below, please do so! :)
The final talk I went to was "Join-Fu" by Jay Pipes. I honestly had no idea what this talk would be about, but I actually found it very interesting (and entertaining). Jay is a great speaker, and is not shy about pointing out some of MySQL's weaknesses. The talk was basically a collection of best practices for MySQL, and I actually learned not only things about MySQL I can put to use, but things to apply to Postgres as well. He spent some time on the MySQL query cache as well, which is particularly interesting to me as I'd love to see Postgres get something similar (and until then, people can use DBIx::Cache of course!).
After the final set of presentations came more mingling, eating pizza with funky toppings, and planning for the next day's hackathon. All the proposed ideas were MySQL-specific, as was to be expected, but Bruce and I got some work done that night by looking over the pg_memcached code, prompted by Brian. I had looked it over a little a few months ago, but Bruce and I managed to fix a bug and, more importantly, found other people to continue working on it. Don't forget to take the credit when they finish their work, Bruce! :)
All in all, a great time. I would have liked to see the presentations stretched out over two days, and to have seen a greater Postgres turnout, but there's always next year. Thanks to Baron for creating a unique event!