
published by Eugenia on 2014-04-18 20:32:41 in the "Religion" category
Eugenia Loli-Queru

Great lucid dream I just had. I woke up this morning at 8:45 AM when the gardener outside started making noises, but I fell asleep again half an hour later. So I became lucid during two dreams in a row. In the second dream, I was supposed to be this old executive in NY, and had a daughter who despised me. So I got into a meeting where they started talking finances and stuff. Well, I had enough of playing along with that bullshit. I got up without saying anything to anybody, walked into the hallway of the building, and proclaimed loudly that I’m not going to participate in such a boring, shitty dream.

Next thing I know, everything goes black, or static, or something. I truly felt that I had died in my sleep. I had reached the void. I got scared (and I’m not easily scared in dreams anymore, but I felt as if I died) and started shouting for Esther, my Spirit Guide. Within a second or two, I was back at the same hallway, and Esther was there. She was not amused.

There was a conversation about some things that, at the time, I kept thinking I should not forget, but I did forget them. I did, however, remember two important questions and answers:
- Is Heva, as my Higher Self, real? The answer was “yes”.
- Are past lives real, and the past lives I experienced under hypnosis real? The answer was also “yes”.

At that point, two people, a man and a woman, appeared, with files in their hands. Esther seemed to be feeling a bit anxious. She introduced them as “tax collectors”, or at least something important in the whole life-death-afterlife circus. Their one question to her was:

- “Why is she so evolved already?”, and that had a connotation of “having evolved spiritually faster than others within a given set of past lives, or at least since I became spiritual about 10 months ago”, and also “why are you helping her by giving her answers?”.

I didn’t let Esther reply. I did my own replying. I “excused” myself for my “premature” spiritual growth by telling them that I did some spiritual work and meditation in a previous life, and that seems to have stuck (I was referring to my past life in Israel). They were satisfied with the answer; they both checked something in their files, and they went away.

Esther was adamant about getting me back to play along in the dream. I complained that the “script” was boring and ridiculous, and that I wanted to have more interesting dreams. She claimed that this dream was one of the best, if I would let it play out, and that I should really give it a try. I finally conceded.

So the dream continued from where it had left off, like nothing had happened in-between, and towards the end of it, I was a super-hero. Not my average dream, indeed. It was fun, I guess. It had its moments.

I just wish I could remember what I told myself to remember, but forgot…


published by noreply@blogger.com (Greg Sabino Mullane) on 2014-04-18 20:00:00 in the "browsers" category

Image by Flickr user Dennis Jarvis

tl;dr: avoid using onmousemove events with Google Chrome.

I recently fielded a complaint about not being able to select text with the mouse on a wiki running the MediaWiki software. After some troubleshooting and research, I narrowed the problem down to a bug in the Chrome browser regarding the onmousemove event. The solution in this case was to tweak the JavaScript to use onmouseover instead of onmousemove.

The first step in troubleshooting is to duplicate the problem. In this case, the page worked fine for me in Firefox, so I tried using the same browser as the reporter: Chrome. Sure enough, I could no longer hold down the mouse button and select text on the page. Now that the browser was implicated, it was time to see what it was about this page that caused the problem.

It seemed fairly unlikely that something like this would go unfixed if it was happening on the flagship MediaWiki site, Wikipedia. Sure enough, that site worked fine, I could select the text with no problem. Testing some other random sites showed no problems either. Some googling indicated others had similar problems with Chrome, and gave a bunch of workarounds for selecting the text. However, I wanted a fix, not a workaround.

There were hints that JavaScript was involved, so I disabled JavaScript in Chrome, reloaded the page, and suddenly everything started working again. Call that big clue number two. The next step was to see what was different between the local MediaWiki installation and Wikipedia. The local site was a few versions behind, but I was fortuitously testing an upgrade on a test server. This showed the problem still existed on the newer version, which meant that the problem was something specific to the wiki itself.

The most likely culprit was one of the many installed MediaWiki extensions, which are small pieces of code that perform certain actions on a wiki. Extensions often run their own JavaScript, which made them the prime suspect.

Then it was some basic troubleshooting. After turning JavaScript back on, I edited the LocalSettings.php file and commented out all the user-supplied extensions. This made the problem disappear again. Then I commented out half the extensions, then half again, etc., until I was able to narrow the problem down to a single extension.

The extension in question, known simply as "balloons", has actually been removed from the MediaWiki extensions site, for "prolonged security issues with the code." The extension allows creation of very nice looking pop up CSS "balloons" full of text. I'm guessing the concern is because the arguments for the balloons were not sanitized properly. In a public wiki, this would be a problem, but this was for a private intranet, so we were not worried about continuing to use the extension. As a side note, such changes would be caught anyway as this wiki sends an email to a few people on any change, including a full text diff of all the changes.

Looking inside the JavaScript used by the extension, I was able to narrow the problem down to a single line inside balloons/js/balloons.js:

  // Track the cursor every time the mouse moves
  document.onmousemove = this.setActiveCoordinates;

Sure enough, duck-duck-going through the Internet quickly found a fairly incriminating Chromium bug, indicating that onmousemove did not work very well at all. Looking over the balloon extension code, it appeared that onmouseover would probably be good enough to gather the same information and allow the extension to work while not blocking the ability for people to select text. One small replacement of "move" to "over", and everything was back to working as it should!
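The post describes the actual fix as a one-word change; spelled out against the snippet quoted above, it amounts to this (the handler name is taken from the extension's own code):

  // Before: blocks text selection in affected Chrome versions
  // document.onmousemove = this.setActiveCoordinates;

  // After: track the cursor when it moves over elements instead
  document.onmouseover = this.setActiveCoordinates;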

So in summary, if you cannot select text with the mouse in Chrome (or you see any other odd mouse-related behaviors), suspect an onmousemove issue.


published by noreply@blogger.com (Steph Skardal) on 2014-04-16 16:58:00 in the "piggybak" category

Piggybak and the gems available in the demo (piggybak_variants, piggybak_giftcerts, piggybak_coupons, piggybak_bundle_discounts, piggybak_taxonomy) have been updated to Rails 4.1.0 and Ruby 2.1.1 via Piggybak gem version 0.7.1. Interested in the technical details of the upgrade? Here are some fine points:

  • Dependencies were refactored so that the parent Rails app controls the Rails dependency only. There was a bit of redundancy in the various plugin gemspec dependencies. This has been cleaned up so that the parent Rails app is now the canonical reference for the Rails version used in the application.
  • Asset usage was modified: "//= require piggybak/piggybak-application" must now be added to the main Rails application.js. There have been several observed issues with precompiling and asset behavior, so I simplified this by requiring that directive in the parent application for now. The engine file is supposed to have a way around this, but it has not behaved as expected, specifically on unique deployment architectures (e.g. Heroku). Patches welcome to address this.
  • Tables migrated to namespaced tables, e.g. "orders" migrated to "piggybak_orders". This is how namespaced engine tables are supposed to look, and this upgrade fixes the table names with a migration and related code.
  • Handled strong parameters. This was one of the most significant jumps from Rails 3 to Rails 4. The main element of Piggybak that needed updating here was the orders controller, which receives the order parameters and must whitelist which of them to accept (see the sketch after this list). Any references to attr_accessible in the code were removed.
  • ActiveRecord "find" methods replaced with where and chaining, where applicable. The jump to Rails 4.0 deprecated the dynamic finder methods but did not remove support; the jump to Rails 4.1 removed support entirely, so the remaining uses were replaced.
  • Scope syntax update. Rails 4 handles scopes with new syntax, and all default scope and named scopes were updated to reflect this new syntax.
  • Validates syntax updated. Rails 4 has new validates syntax which accepts arguments, e.g. presence: true, uniqueness: true. Piggybak was upgraded to use the new syntax, although the old syntax is still supported.
  • Significant routes update. Rails 4 introduced a significant change in routing, and Piggybak was updated to reflect these changes.
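As a rough illustration of what several of these changes look like in practice, here is a generic Rails 3 vs. Rails 4 sketch (the model, attribute, and scope names are made up for the example, not actual Piggybak code):

  # Rails 3 style (no longer supported in Rails 4.1):
  #   attr_accessible :email, :phone
  #   scope :recent, order("created_at DESC")
  #   validates_presence_of :email

  # Rails 4 style:
  class Order < ActiveRecord::Base
    self.table_name = "piggybak_orders"               # namespaced engine table

    scope :recent, -> { order(created_at: :desc) }    # scopes now take a callable
    validates :email, presence: true, uniqueness: true
  end

  # Strong parameters, handled in the controller instead of attr_accessible:
  # (in the orders controller)
  def order_params
    params.require(:order).permit(:email, :phone, :total)
  end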

The full commits of Piggybak are available for browsing here and here.

Wishlist

There are a few things that I'd love to see adopted in Piggybak, with the help of the community. These include:

  • Consider a move to CoffeeScript. I'm still on the fence about this, but I'm seeing more projects with Node and CoffeeScript lately, so I wonder if it would be worth the overhead to move to CoffeeScript.
  • Add test coverage. Perhaps Travis CI integration would make sense since it hooks into GitHub nicely?
  • Build out more features. Things like reviews & ratings, saved cart, wishlist support, and saved address support have been on the feature list for a while. It'd be nice to see movement here.

published by noreply@blogger.com (Kent Krenrich) on 2014-04-16 13:00:00 in the "button" category

I recently discovered a discrepancy in the way Firefox treats inputs with a line-height style defined and how other browsers handle the same styling rule. Specifically, Firefox completely ignores it.

This behavior seemed odd enough to me that I did some Googling to determine if this was recently introduced, a long-standing issue, or something I was just doing wrong. I found some interesting discussions on the issue. Several of the search results used the word "bug" in the title, though it appears to be more of a deliberate (though possibly outdated) "feature" instead. Along with the discussions, I also came across a couple of suggestions for a solution.

First of all, I was able to locate a simple enough explanation of what's causing the behavior. As Rob Glazebrook explains:

"Basically, Firefox is setting the line-height to 'normal' on buttons and is enforcing this decision with an !important declaration," and, "browser-defined !important rules cannot be over-ruled by author-defined !important rules. This rule cannot be overruled by a CSS file, an inline style – anything."

The good news is that I can stop experimenting and hoping for different results.

I also located a Bugzilla ticket opened in 2011 which contains some discussion on the pros and cons of allowing designers to control the line-height of input elements. The last few comments suggest that Firefox 30 may remove the !important declaration which would open up access to the styling property. At the time of this writing, Firefox version 30 appears to be in alpha.

Due to this long-standing stance by Mozilla, Twitter Bootstrap recommends avoiding inputs with type set to button, submit, or reset. Instead, they recommend using button tags paired with the types already mentioned. Button tags are much more flexible in which styling rules can be applied, which makes it easier to get consistent rendering across the widest range of browsers.

If switching to button tags is not an option for whatever reason, another possible solution is to adjust the padding values on your input buttons. By shrinking the top padding, you can more easily fit text that needs to wrap due to a limited available width. Reducing the top padding also helps center the wrapped text, since otherwise the first line renders vertically dead center of the button and pushes the rest of the text below it.
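As a rough illustration (the selector and values here are made up, not taken from a specific project), that padding tweak might look something like this:

  /* Hypothetical example: since Firefox ignores line-height on input buttons,
     shrink the top padding so a wrapped label fits and looks centered */
  input[type="submit"].two-line-label {
      padding-top: 2px;
      padding-bottom: 2px;
  }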


published by noreply@blogger.com (Emanuele 'Lele' Calo') on 2014-04-15 15:48:00

Spam mail messages have been a plague since the Internet became popular, and they have kept growing as the number of connected devices and people has grown. Despite numerous attempts to create anti-spam tools, a fairly high number of unwanted messages is still sent every day.

Luckily it seems that lately something is changing with the adoption of three (relatively) new tools which are starting to be widely used: SPF, DKIM and DMARC. Let's have a quick look at each of these tools and what they achieve.

What are SPF, DKIM and DMARC

SPF (Sender Policy Framework) is a DNS text entry which lists the servers that should be considered allowed to send mail for a specific domain. Incidentally, the fact that SPF is a DNS entry can also be considered a way to enforce that the list is authoritative for the domain, since the owners/administrators are the only people allowed to add or change entries in that domain zone.

DKIM (DomainKeys Identified Mail) should instead be considered a method to verify that a message's content is trustworthy, meaning that it wasn't changed from the moment the message left the initial mail server. This additional layer of trust is achieved through a standard public/private key signing process. Once again, the owners of the domain add a DNS entry with the public DKIM key, which receivers use to verify that the message's DKIM signature is correct, while on the sender side the server signs the entitled mail messages with the corresponding private key.

DMARC (Domain-based Message Authentication, Reporting and Conformance) builds on SPF and DKIM by stating a clear policy about how both of the aforementioned tools should be used, and it allows setting an address to which receivers can send reports with statistics about the mail messages they have seen for the specific domain [1].
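To make this more concrete, here is roughly what these three DNS TXT entries could look like for a hypothetical example.com domain (the selector, IP address, policy, and report address are made up for illustration):

  example.com.                 IN TXT  "v=spf1 mx ip4:203.0.113.25 -all"
  mail._domainkey.example.com. IN TXT  "v=DKIM1; k=rsa; p=<base64-encoded public key>"
  _dmarc.example.com.          IN TXT  "v=DMARC1; p=quarantine; rua=mailto:postmaster@example.com"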

How do they work?

All these tools rely heavily on DNS, and luckily, once the setup phase is finished, the way they work is simple enough to be (roughly) explained below:

SPF:

  • upon receipt, the HELO message and the sender address are fetched by the receiving mail server
  • the receiving mail server runs a TXT DNS query against the claimed domain's SPF entry
  • the SPF entry data is then used to verify the sender server
  • if the check fails, a rejection message is given to the sending server



DKIM:

  • when sending an outgoing message, the last server within the domain infrastructure checks against its internal settings whether the domain used in the "From:" header is included in its "signing table"; if not, the process stops here
  • a new header, called "DKIM-Signature", is added to the mail message by using the private part of the key on the message content
  • from here on, the message's *main* content cannot be modified, otherwise the DKIM signature won't match anymore
  • upon reception, the receiving server makes a TXT DNS query to retrieve the key used in the DKIM-Signature field
  • the DKIM signature check result can then be used when deciding if a message is fraudulent or trustworthy



DMARC:

  • upon reception, the receiving mail server checks whether there is a DMARC policy published for the domain used by the SPF and/or DKIM checks
  • if *one or both* of the SPF and DKIM checks succeed while still being *aligned* with the policy set by DMARC, then the check is considered successful; otherwise it is set as failed
  • if the check fails, different actions are taken based on the action published by the DMARC policy



The bad news: limits and best practices

Unfortunately, even with a perfectly functional mail system and all the above tools enforced, you won't be 100% safe from the bad guys out there. Not all servers use all three of the tools shown above; a quick look at the comparison table on Wikipedia [2] shows why.

Furthermore there are some limits that you should always consider when dealing with SPF, DKIM and DMARC:

  • as already said above, DKIM alone doesn't guarantee in any way that the sending server is allowed to send outgoing mail for the specific domain
  • SPF is powerless against messages forged in a shared hosting scenario, since all the mail appears to come from the same IP
  • DMARC is still in its early days and unfortunately not yet used widely enough to make a huge difference
  • DMARC can (and will) break your mail flow if you don't set up both SPF and DKIM before changing DMARC policy to anything above "none".

Please work through the proper process carefully; otherwise your precious messages won't be delivered to your users, since a wrong SPF, DKIM, or DMARC setup can make them look fraudulent.

What's the message behind all this? Should I use these tools or not?

The short answer is: "Yes". The longer answer is that everybody should, and eventually will, but we're just not there yet. So even though these tools already have a lot of power, they're still not shining as brightly as they should because of poor adoption.

Hopefully things will change soon and that starts by every one of us adopting these tools as soon as possible.

[1] The lack of such a monitoring tool is considered one of the reasons why other tools (such as ADSP) have failed in the past during the adoption phase.
[2] Comparison of mail servers on Wikipedia


published by noreply@blogger.com (Jeff Boes) on 2014-04-15 13:00:00 in the "ajax" category
This is not a huge breakthrough, but it colored in some gaps in my knowledge so I thought I would share. Let's say you have a product flypage for a widget that comes in several colors. Other than some of the descriptive text, and maybe a hidden field for use in ordering one color instead of another, all the pages look the same. So your page looks like this (extremely simplified):
... a lot of boilerplate ...
... a lot more boilerplate ...
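For the sake of the discussion below, imagine the relevant slice of that boilerplate is a form along these lines (made-up markup, just to give the later selectors something to point at):

<form id="order_item" action="/app/cart/add" method="post">
  <h1>Wonder Widget (Red)</h1>
  <select name="sku">
    <option value="WDGT-001-RED" selected>Red</option>
    <option value="WDGT-001-BLU">Blue</option>
  </select>
  <input type="hidden" name="variant" value="WDGT-001-RED">
  <button type="submit">Add to cart</button>
</form>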
Probably the page is generated into a template based on a parameter or path segment:
http://.../app/product/WDGT-001-RED
What we're going to add is a quick-and-dirty way of having your page rewrite itself on the fly with just the bits that change when you select a different version (or variant) of the same product.

The old-school approach was something like:
 $('select[name=sku]').change(function(){
   document.location.href = my_url + $(this).val();
 });
I.e., we'll just send the browser to re-display the page, but with the selected SKU in the URL instead of where we are now. Slow, clunky, and boring! Instead, let's take advantage of the ability to grab the page from the server and only freshen the parts that change for the desired SKU (warning: this is a bit hand-wavy, as your specifics will change up the code below quite a bit):
// This is subtly wrong:
$('select[name=sku]').change(function(){
  $.ajax({
    async: false,
    url: my_url + $(this).val(),
    complete: function(data){
      $('form#order_item').html( $(data.responseText).find('form#order_item').html() );
    }
  });
});
Why wrong? Well, any event handlers you may have installed (such as the .change() on our selector!) will fail to fire after the content is replaced, because the contents of the form don't have those handlers. You could set them up all over again, but there's a better way:
// This is better:
$('form#order_item').on('change', 'select[name=sku]',
  function(){
    $.ajax({
      async: false,
      url: my_url + $(this).val(),
      complete: function(data){
        var doc = $(data.responseText);
        var $form = $('form#order_item');
        var $clone = $form.clone( true );
        $clone.html(doc.find('form#order_item').html());
        $form.replaceWith($clone);
      }
    });
  });
Using an "on" handler for the whole form, with a filter of just the select element we care about, works better – because when we clone the form, we copy its handler(s), too. There's room for improvement in this solution, because we're still fetching the entire product display page, even the bits that we're going to ignore, so we should look at changing the .ajax() call to reference something else – maybe a custom version of the page that only generates the form and leaves out all the boilerplate. This solution also leaves the browser's address showing the original product, not the one we selected, so a page refresh will be confusing. There are fixes for both of these, but that's for another day.

published by noreply@blogger.com (Szymon Guz) on 2014-04-11 14:59:00 in the "performance" category

The Problem

Sometimes you need to generate sample data, like random data for tests. Sometimes you need to generate it using the huge amount of code you have in your ORM mappings, just because an architect decided that all the logic needs to be stored in the ORM and the database should be just a dummy data container. The real reason is not important. The problem is: let's generate lots of rows, millions of them, for a sample table, going through the ORM mappings.

Sometimes the data is read from a file, but due to the business logic kept in the ORM, you need to load the data from the file into ORM objects and then save those millions of objects to the database.

This can be done in many different ways, but here I will concentrate on making that as fast as possible.

I will use PostgreSQL and SQLAlchemy (with psycopg2) for ORM, so all the code will be implemented in Python. I will create a couple of functions, each implementing another solution for saving the data to the database, and I will test them using 10k and 100k of generated ORM objects.

Sample Table

The table I used is quite simple, just a simplified blog post:

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  payload TEXT NOT NULL
);

SQLAlchemy Mapping

I'm using SQLAlchemy for the ORM, so I need a mapping. I will use this simple one:

from sqlalchemy import Column, Integer, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class BlogPost(Base):
    __tablename__ = "posts"

    id = Column(Integer, primary_key=True)
    title = Column(Text)
    body = Column(Text)
    payload = Column(Text)

The payload field is just to make the object bigger, to simulate real life where objects can be much more complicated, and thus slower to save to the database.

Generating Random Object

The main idea for this test is to have a randomly generated object, however what I really check is the database speed, and the whole randomness is used at the client side, so having a randomly generated object doesn’t really matter at this moment. The overhead of a fully random function is the same regardless of the method of saving the data to the database. So instead of randomly generating the object, I will use a static one, with static data, and I will use the function below:

TITLE   = "title"      * 1764
BODY    = "body"       * 1764
PAYLOAD = "dummy data" * 1764

def generate_random_post():
    "Generates a kind of random blog post"
    return BlogPost(title=TITLE, body=BODY, payload=PAYLOAD)

Solution Ideas

Generally there are two main ideas for such a bulk inserting of multiple ORM objects:

  • Insert them one-by-one with autocommit
  • Insert them one-by-one in one transaction

Save One By One

This is the simplest way. Usually we don't save just one object, but instead we save many different objects in one transaction; making a couple of related changes across multiple transactions is a great way to end up with a database full of bad data.

For generating millions of unrelated objects this shouldn't cause data inconsistency, but it is highly inefficient. I've seen this multiple times in code: create an object, save it to the database, commit, create another object, and so on. It works, but is quite slow. Sometimes it is fast enough, but for the price of a very simple change to this algorithm we can make it 10 times faster.

I’ve implemented this algorithm in the function below:

def save_objects_one_by_one(count=MAX_COUNT):
    for i in xrange(1, count + 1):
        post = generate_random_post()
        session.add(post)
        session.commit()

Save All in One Transaction

This solution is as simple as: create objects, save them to the database, commit the transaction at the end, so do everything in one huge transaction.

The implementation differs only by four spaces from the previous one, just run commit() once, after adding all objects:

def save_objects_one_transaction(count=MAX_COUNT):
    for i in xrange(1, count + 1):
        post = generate_random_post()
        session.add(post)
    session.commit()

Time difference

I ran the tests multiple times, truncating the table each time. The average results of saving 10k objects were quite predictable:

  • Multiple transactions - 268 seconds
  • One transaction - 25 seconds

The difference is not surprising: the whole table is only 4.8 MB, but after each commit the database needs to write the changes to disk, which slows the procedure down a lot.

Copy

So far, I've described the most common methods of generating and storing many ORM objects. I was wondering about another one, which may seem a little surprising at first.

PostgreSQL has a great COPY command which can copy data between a table and a file. The file format is simple: one table row per one file row, fields delimited with a defined delimiter etc. It can be a normal csv or tsv file.

My crazy idea was: how about using the COPY for loading all the generated ORM objects? To do that, I need to serialize them to a text representation, to create a text file with all of them. So I created a simple function, which does that. This function is made outside the BlogPost class, so I don't need to change the data model.

def serialize_post_to_out_stream(post, out):
    import csv
    writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
    writer.writerow([post.title, post.body, post.payload])

The function above gets two parameters:

  • post - the object to be serialized
  • out - the output stream where the row with the post object will be saved, in Python it is a file-like object, so an object with all the functions a file object has

Here I use a standard csv module, which supports reading and writing csv files. I really don’t want to write my own function for escaping all the possible forms of data I could have - this usually leads to many tricky bugs.

The only thing left is to use the COPY command. I don’t want to create a file with data and load that later; the generated data can be really huge, and creating temporary files can just slow things down. I want to keep the whole procedure in Python, and use pipes for data loading.

I will use the psql program for accessing the PostgreSQL database. When COPY ... FROM STDIN is run through psql, the data is read from psql's standard input, so loading can be done using e.g.: cat file.csv | psql database.

To use it in Python, I’m going to use the subprocess module, and create a psql process with stdin=subprocess.PIPE which will give me write access to the pipe psql reads from. The function I’ve implemented is:

def save_objects_using_copy(count=MAX_COUNT):
    import subprocess
    p = subprocess.Popen([
        'psql', 'pgtest', '-U', 'pgtest',
        '-c', 'COPY posts(title, body, payload) FROM STDIN',
        '--set=ON_ERROR_STOP=true'
        ], stdin=subprocess.PIPE
    )
    for i in xrange(1, count + 1):
        post = generate_random_post()
        serialize_post_to_out_stream(post, p.stdin)
    p.stdin.close()
    # wait for psql to finish flushing the COPY before returning
    p.wait()

Results

I've also tested this on the same database table, truncating the table before each run. After that I checked this function, and the previous one (with one transaction), on a bigger sample of 100k BlogPost objects.

The results are:

  Sample size | Multiple Transactions | One Transaction | COPY
  ------------+-----------------------+-----------------+------
  10k         | 268 s                 | 25 s            | 5 s
  100k        | (not tested)          | 262 s           | 51 s

I haven't tested the multiple transactions version on the 100k sample, as I just didn't want to wait multiple hours for it to finish (since I ran each of the tests multiple times to get more reliable results).

As you can see, the COPY version is the fastest, about 5 times faster than the full ORM version with one huge transaction. This version is also memory friendly: no matter how many objects you want to generate, it only ever needs to keep one ORM object in memory at a time, and you can destroy each one after saving it.

The Drawbacks

Of course using psql poses a couple of problems:

  • you need to have psql available; sometimes that’s not an option
  • calling psql creates another connection to the database; sometimes that could be a problem
  • you need to set up a password in the ~/.pgpass file; you cannot provide it on the command line

You could also get the psycopg2 cursor directly from the SQLAlchemy connection and then use the copy_from() function, but that method needs to have all the data already prepared in memory, as it reads from a file-like object such as StringIO. This is not a good solution for inserting millions of objects, as they can be quite huge - streaming is much better in this case.

Another solution is to write a generator wrapped in a file-like object, which copy_from() can read from directly. copy_from() calls the file's read() method, trying to read 8192 bytes per call. This can be a good idea when you don't have access to psql; however, due to the overhead of generating the 8192-byte strings, it should be slower than the psql version.
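Here is a rough, untested sketch of that idea (the PostStream class, the save_objects_with_copy_from() function, and the use of session.connection().connection are my own illustration, not code from the benchmarks above). Note that it skips proper escaping, which is only acceptable here because the sample data contains no tabs or newlines:

# A file-like object that generates tab-separated rows on demand
class PostStream(object):

    def __init__(self, count):
        self.remaining = count
        self.buffer = ""

    def read(self, size=8192):
        # copy_from() keeps calling read(size) until it gets an empty string
        while len(self.buffer) < size and self.remaining > 0:
            post = generate_random_post()
            self.buffer += "%s\t%s\t%s\n" % (post.title, post.body, post.payload)
            self.remaining -= 1
        data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data

def save_objects_with_copy_from(count=MAX_COUNT):
    # borrow the raw psycopg2 connection that SQLAlchemy is using
    raw_connection = session.connection().connection
    cursor = raw_connection.cursor()
    cursor.copy_from(PostStream(count), 'posts',
                     columns=('title', 'body', 'payload'))
    raw_connection.commit()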


published by noreply@blogger.com (Jeff Boes) on 2014-04-10 21:35:00 in the "mysql" category
Probably old news, but I hit this MySQL oddity today after a long day dealing with unrelated crazy stuff and it just made me go cross-eyed:
CREATE TABLE foo (id integer, val enum('','1'));
INSERT INTO foo VALUES (1, '');
INSERT INTO foo VALUES (2, '1');
SELECT * FROM foo WHERE val = 1;
What row do you get? I'll wait while you second- and third-guess yourself. It turns out that the "enum" datatype in MySQL just translates to a set of unique integer values. In our case, that means:
  • '' == 1
  • '1' == 2
So you get the row with (1,''). Now, if that doesn't confuse readers of your code, I don't know what will.
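If you wanted the other row, compare against the string instead of the number. A quick way to see what's going on, using MySQL's documented numeric conversion of enums:

SELECT id, val, val+0 AS enum_index FROM foo;   -- shows index 1 for '' and index 2 for '1'
SELECT * FROM foo WHERE val = '1';              -- compares the enum *value*, returns the row with id 2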

published by noreply@blogger.com (Szymon Guz) on 2014-04-08 09:06:00 in the "postgres" category

I ran into an interesting problem. There was a table with some data, including a date and an integer value. The task was to get a cumulative sum across all the dates, including dates for which we have no entries; for such dates we should reuse the last calculated sum.

Example Data

I will use an example table:

# CREATE TABLE test (d DATE, v INTEGER);

with sample data:

# INSERT INTO test(d,v)
  VALUES('2014-02-01', 10),
        ('2014-02-02', 30),
        ('2014-02-05', 10),
        ('2014-02-10', 3);

Then the data in the table looks like:

# SELECT * FROM test;
     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 30
 2014-02-05 | 10
 2014-02-10 |  3
(4 rows)

What I want is to have a cumulative sum for each day. Cumulative sum is a sum for all the earlier numbers, so for the above data I want to get:

     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-05 | 50
 2014-02-10 | 53
(4 rows)

The simple query for getting the data set like shown above is:

SELECT DISTINCT d, SUM(v) OVER (ORDER BY d) v
FROM test
ORDER BY d ASC;

Filling The Gaps

The query calculates the cumulative sum for each row. Unfortunately this way there are gaps between dates, and the request was to fill those in using the values from previous days.

What I want to get is:

     d      |  v
------------+----
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-03 | 40
 2014-02-04 | 40
 2014-02-05 | 50
 2014-02-06 | 50
 2014-02-07 | 50
 2014-02-08 | 50
 2014-02-09 | 50
 2014-02-10 | 53

My first idea was to use the generate_series() function, which can generate a series of data. What I need is a series of all dates between min and max dates. This can be done using:

# SELECT generate_series(
    '2014-02-01'::timestamp,
    '2014-02-05'::timestamp,
    '1 day')::date;
 generate_series 
-----------------
 2014-02-01
 2014-02-02
 2014-02-03
 2014-02-04
 2014-02-05

The generate_series() function arguments are (begin, end, interval). The function returns all timestamps from beginning to end with given interval. The return value is timestamp, so I had to cast it to date with '::date', which is a nice PostgreSQL shortcut for the standard syntax, CAST(generate_series(...) AS DATE).

I also want to use the first query to use the cumulative sum I calculated before. It can be simply achieved using the great WITH command which creates something like a temporary table, which can be queried:

# WITH temp AS 
(
  SELECT generate_series(1, 1000) d
) 
SELECT d
FROM temp
WHERE d < 4
ORDER BY d DESC;

 d
---
 3
 2
 1

Combining all the above queries resulted in the below one:

WITH y AS 
( 
  SELECT DISTINCT d, SUM(v) OVER (ORDER BY d) v
  FROM test
)
SELECT g.d,
  (SELECT v 
   FROM y 
   WHERE y.d <= g.d
   ORDER BY d DESC
   LIMIT 1)
FROM
  (SELECT generate_series(min(d), max(d), '1 day')::date d 
   FROM y) g
ORDER BY d ASC

After the earlier explanations, this one should be easy to understand.

  • I placed the original query calculating the cumulative sum in the WITH block.
  • SELECT creates a row set with two columns
    • The first column is the date returned from the subselect just before the final ORDER BY; it contains all dates between the min and max dates from the original data.
    • The second column is a subquery getting the calculated cumulative sum. It gets the sum for the current date (from the first column), or the most recent previously calculated one.
  • And of course we need ordering at the end. The database can reorder the data as it wants while executing the query, so we always need to declare the ordering at the end. Otherwise strange things can happen, like having the same ordering of rows for years and suddenly getting a totally different one, just because someone added a new row, deleted some other, or simply restarted the application.


published by noreply@blogger.com (Steph Skardal) on 2014-04-08 01:27:00 in the "open-source" category

While I was at the I Annotate 2014 conference last week, I spoke with a couple of developers about the challenges of working in open source. Specifically, the Annotator JavaScript library that we are using for H2O is getting a push from the community to decouple (or make more modular) some components, as well as to improve its extensibility. Similarly, Spree, an open source Ruby on Rails platform that End Point has sponsored in the past and continues to work with, made a shift from a monolithic platform to a modular (via Ruby gems) approach, and Piggybak started out as a modular and extensible ecommerce solution. I like doodling, so here's a diagram that represents the ecosystem of building out an open source tool, with a supplemental explanation below:

Here are some questions I consider on these topics:

  • What is the cost of extensibility?
    • code complexity
    • code maintenance (indirectly, as code complexity increases)
    • harder learning curve for new developers (indirectly, as code complexity increases)
    • performance implications (possibly, indirectly, as code complexity increases)
    • difficulty in testing code (possibly)
  • What is the cost of modularity?
    • same as cost of extensibility
    • challenge of determining what features to include in core (or exclude from core)
    • can be both performance implications and performance mitigation
  • What are the values of extensibility?
    • robustness of tool
    • increased community involvement (indirectly, as robustness increases)
    • further reach, increased use cases (indirectly, as robustness increases)
  • What are the values of modularity?
    • same as values of extensibility

From a cost-benefit perspective, the goal should be for the values of extensibility and modularity to outweigh the costs, allowing for a flourishing, growing community of developers and users of the tool. Extensibility and modularity are not always easy to figure out, especially in some frameworks, but I think getting these two elements right is a very important factor in the success of an open source project. I also don't think many tools "get it right" the first time around, so there's always a chance to improve and refactor as the user base builds.


published by noreply@blogger.com (Steph Skardal) on 2014-04-06 01:28:00 in the "Conference" category

H2O & Annotator

Yesterday I gave my talk on Integrating Annotator with H2O, which covered the specific implementation details of integrating the open source JavaScript based tool Annotator into H2O, including a history of annotation and highlights of some of the challenges. I'll update the link here to my slides when they are available on the I Annotate conference website.

Anyways...

Version Control

One of the interesting recurring topics of the conference was the concept of version control, version control of text and other multi-media content, and how to present a user interface for version control that makes sense to non-developers (or those not familiar with code based version control). A simple example of what I mean by the problem of version control on the web is described in the following Facebook scenario:

  • User A updates status on Facebook
  • User B comments on User A's status
  • User C comments on User A's status, with reference or comment to User B's status
  • User B edits original comment
  • User C's comment no longer is applicable given the context, and doesn't make sense to users who have not seen User B's original comment

Facebook doesn't do anything about this scenario now, other than allow the ability to delete or edit comments. They've only recently introduced the ability to edit comments, so while they are aware of this problem, I don't expect them to build out a complex solution to address this challenge. But if they were to address it, can you imagine both the technical implementation and [intuitive] user interface implementation that would be easily adopted by the masses? If it were easy, it would have already been solved and we wouldn't be talking about it now!

Apply this Facebook use case to content both off and on the web. In the context of this conference, this is:

  • ebooks, PDFs, file types exportable to offline use
  • images, video, audio: all mentioned during this conference
  • all of the text on the internet

While the above types of content may change at various levels of frequency (e.g. text on the web tends to be more easily and frequently changed than video and audio productions), recording and presenting annotations tied to one piece of content in one state (or version) is very challenging. In text, Annotator ties annotations to a specific Range of content, so if any of the markup changes, the annotation range may no longer be accurate. Hypothes.is has implemented an approach to mitigate this problem (I'm hesitant to describe it as a "solution") with fuzzy matching, and work is being done to include this work in Annotator. I'm excited to see where this goes, because I think that for this concept of annotation and social discussion around dynamic content [on the web] to work, version control has to be elegantly handled and intuitive in use.


published by Eugenia on 2014-04-05 19:41:16 in the "Religion" category
Eugenia Loli-Queru

So I had quite a few OBEs yesterday & today (out of body experience, aka astral projection). After waking up in the middle of the night and sleeping again, lucidity is easily reached. Then, you simply induce an OBE through a lucid dream (this method is called DILD, other methods include WILD and FILD). So I went to a few places, including my hometown, Preveza, Greece.

There, I met with another girl from a nearby town, and I told her my name and where I was from and where I now live. I told her to ask for me when she wakes up, to see if we actually shared the same dream. She was the only one I found in the crowd who was also lucid; the rest of the people there were dreaming, and behaved like sleep-walkers. On my way back, I found another lucid person, and we exchanged a thumbs-up, after thinking that “if we could go wherever we want to when we’re lucid, who the heck cares about travelling in real life?”

I also visited a wormhole (with some spaceships in there too) and the void, visited some people I care about, etc. I had no trouble going from one place to another, although visiting other planets seemed time consuming.

But here’s the thing. OBEs are not literal; they need interpretation. While they can be pretty lucid, the things that happen in there aren’t as “solid” as in real life. Reality there fluctuates, just as it does in dreams, deep meditation, or when taking psychedelics. PhD researcher Aardema says in his book about OBEs that when we don’t use our mechanical brain to filter stuff out, our consciousness exists in a type of reality where many different quantum possibilities are probable; that’s why things can morph or change, based on our perception. That New Age belief that “thought can change reality” is true, but not so much in our reality as in the reality that lies beyond our brain’s filtering mechanisms. The further you go into that type of consciousness, the more formless, shapeable and alien reality becomes.

So anyway, my conclusion is that it’s important to have these kinds of experiences, because they get you ready to accept death. There’s nothing to be afraid of in death, apart from your own belief system (if you believe in Hell, you’ll surely find it, because your consciousness will construct it). I’d go as far as to say that I’m probably ready to even face ego-death.

But as far as our current daily lives go, these types of experiences are often irrelevant. Entheogens might provide some insight about how to live our daily lives, and certainly my own meetings with my Higher Self and Guide have been very helpful, but I don’t see them as mandatory. They’re interesting, for sure, but an already “stable” person doesn’t need such experiences to live their life. They already know what to do and how to do it deep inside.

My point is that people who judge others for not being “spiritual” (e.g. not caring about psychonaut-related matters) are mistaken. Not everybody needs such experiences to function properly in society, or even in the world beyond. There are some well-adjusted individuals who simply don’t need OBEs, entheogens, meditation, New Age crap etc. My husband seems to be such an individual. I now know that I don’t need them either, although I did need them last year, when I reshaped my world views and found my place in the universe. I now see all that stuff as tools to live well and get prepared for the next step, not as the end-all. After you’ve used the tool to construct or fix something, you might not need it again.

As well-known pop-philosopher Alan Watts said: “If you get the message, hang up the phone“.


published by Eugenia on 2014-04-05 18:51:39 in the "General" category
Eugenia Loli-Queru

Avery sent me a free copy of Dr. Terry Wahls’ new book, “The Wahls Protocol: How I Beat Progressive MS Using Paleo Principles and Functional Medicine”, to check it out.

The book starts with Dr. Wahls’ health story, and how she got multiple sclerosis (MS). She was a vegetarian and an athlete, and yet she became very ill in the early 2000s. She tried various solutions, including taking huge amounts of vitamins and adding and removing foods, and she finally managed to almost reverse her illness after following a wholesome Paleo-like diet.

I’ve been doing various forms of Paleo for 2.5 years now, and I’m glad to see Dr. Wahls recognizing the different needs that different patients have. In the book, she suggests three different diets, each more restrictive than the last, depending on the severity of the symptoms. The first one simply removes gluten, dairy, eggs, and processed foods; the second one additionally removes most grains and legumes (Paleo-like); and the third one is a strict Paleo-ketogenic diet (minus eggs). She went through all three diets herself while trying out things, and she’s currently on the Paleo-ketogenic regimen.

Throughout the book there are testimonials of other people with MS, who have tried the Paleo/Wahls-diet and have semi-reversed their condition (aka made their lives livable). The book is very easily read, everything is laid out in plain English for everybody to understand.

My favorite parts of the book (which in my opinion needed more expansion) were the hints Dr. Wahls gives about non-native EMF radiation, infections, mold, and other environmental problems that can have as much impact on our health as eating bad food does. I also loved her suggestions on eating sea vegetables and offal.

The only part that I really disliked in the book was her insistence on removing eggs from the diet. She is deathly allergic to eggs, but she’s trying to impose this restriction on others too. In a response to me she claimed that “egg allergies are actually dramatically under-diagnosed”, but I have my reservations about this. I also have reservations about her dairy suggestions. In my experience, dairy is often a secondary intolerance, created by gluten intolerance. When gluten is taken out of the picture and the gut is healed, fermented goat/sheep dairy can often be eaten again after a few months without ill effects. But even if dairy must be taken out, given the severity of MS, I think her no-eggs suggestion is still overblown. Sure, some people will be intolerant to eggs, but I don’t expect the majority to be so.

Another addition that should be made to this book is information about FODMAPs. In my dealings with the Paleo community over the last few years, I have witnessed 5%-10% of dieters who didn’t get better on plain Paleo, but had to go Paleo+FODMAPs to finally have their gut healed.

Other than that, I think that this is one of the most important new Paleo books out there, and people with major health problems (not just MS), should have a good read of this book and follow its instructions. It’s a book that explains in very simple terms the whys and the hows, and in my own experience with my own health problems, it has worked.


published by noreply@blogger.com (Steph Skardal) on 2014-04-04 14:13:00 in the "javascript" category

I'm here in San Francisco for the second annual I Annotate conference. Today I'm presenting my work on the H2O project, but in this post I'll share a couple of focus points for the conference thus far, described below.

What do we mean by Annotations?

Annotation work is off the path of End Point's ecommerce focus, and annotation means different things to different users, so to give a bit of context: to me, an annotation is markup tied to a single piece of target content (image, text, video). There are other interpretations of annotations, such as highlighted text with no markup (i.e. flagging some target content), and cases where annotations are tied to multiple pieces of target content.

Annotations in General

One of the focuses of yesterday's talks was how to allow the powerful concept of annotations to succeed on the web. Ivan Herman of the W3C touched on why the web has succeeded, and what we can learn from that success to help web annotations: the web has been a great idea, interoperable, decentralized, and open source, and we hope that those qualities can translate to web annotations and help them be successful. Another interesting point Tom Lehman of RapGenius made was that the actual implementation of annotation doesn't matter as much as having a community in place that encourages many high quality annotations. For RapGenius, that means offering a user hierarchy that grants users roles such as moderator, editor, and contributor, layering on a point-based ranking system, and encouraging posting RapGenius-annotated content on other sites. This talk struck a chord with me, because I know how hard it is to get high quality content for a website.

Specific Use Cases of Annotator

Yesterday's talks also covered several interesting use cases of Annotator, an open source JavaScript-based tool, commonly adopted in this space, that aims to be the reference platform for web annotations; it is what we are using in H2O. Many of the people attending the conference are using Annotator and interested in its future and capabilities. Some highlights of implementations were:

  • RapGenius: Aforementioned, contains active community of annotating lyrics.
  • SocialBook: A classroom and/or "book club" application for interactive discussions and annotations of books.
  • FinancialTimes: From my understanding, annotations are the guide to how content is aggregated and presented in various facets of the BBC website.
  • annotationstudio.org: collaborative web-based annotation tools under development at MIT which has similarities to H2O.
  • AustESE: Work being done in Australia for scholarly editing, includes a Drupal plugin implemented using Annotator with several plugins layered on top, including image annotation, categorization, threaded discussions.
  • hypothes.is: Hypothes.is uses a tool built on top of Annotator, featuring several advanced features such as image annotation, a bookmarklet annotation implementation, and real time stream updates with search.

After the morning talks, we broke into two longer small group sessions, and I joined the sessions that delved into deeper issues and implementation details of Annotator, as well as the challenges and needs associated with annotating the law. I'll share my presentation and more notes from today's talks. Stay tuned!


published by noreply@blogger.com (Spencer Christensen) on 2014-03-28 02:19:00 in the "automation" category

Last week I attended the MountainWest DevOps conference held in Salt Lake City, UT. This was a one-day conference with a good set of presenters and lightning talks. There were several interesting topics presented, but I'll only review a few I wanted to highlight.

I Serve No Master!

Aaron Gibson of Adaptive Computing discussed a very common problem with Puppet (and other configuration management systems): they work well in the scenario they were designed for, but what about when the situation isn't typical? Aaron had a situation where developers and QA engineers could instantiate systems themselves via OpenStack, however the process for installing their company's software stack on those VMs was inconsistent, mostly manual, and took many hours. One of the pain points he shared, which I related to, was dealing with registering a puppet node with a puppet master: the sometimes painful back and forth of certificate issuing and signing.

His solution was to remove the puppet master completely from the process. Instead he created a bash wrapper script to execute a workflow around what was needed, still using puppet manifests on each system but running them locally. This wrapper tool, called "Builder", relies on property files to customize the config and allow the script to manage different needs. This new script allowed them to keep using puppet to manage these self-serve OpenStack servers, gaining the benefits of consistency, removing manual setup steps, and providing the ability to automate installs with Jenkins or other tools. But it freed them from having to use a puppet master for nodes that were disposable. It also helped reduce software install time from 12 hours down to 11 minutes.

His Builder tool is still an internal only tool for his company, but he discussed some next steps he would like to add, including better reporting and auditing of executions. I pinged him after the conference on twitter and mentioned that Rundeck might be a good fit to fill that gap. I used Rundeck for 2 years at my last job, integrating nicely with other automation tools and providing reporting and auditing as well as access control of arbitrary jobs.

Automating cloud factories and the internet assembly line with SaltStack

Tom Hatch of SaltStack spoke about Salt as an automation and remote execution platform. I've done quite a bit of work with Salt recently for a client, so I was already pretty familiar with it. But one thing he mentioned that I didn't know was that Salt was originally designed as a cloud management tool, not necessarily a configuration management tool. However, in the course of time configuration management became a higher priority for the Salt dev team to focus on. Tom mentioned that recently they have been working on cloud management tools again, providing integration with Rackspace, AWS, Xen, and more. I'll have to dig more into these tools and give them a try.

How I Learned to Stop Worrying and Love DevOps

Bridget Kromhout of 8thBridge spoke on the culture of DevOps and her journey from a corporate, strictly siloed environment to a small start-up that embraced DevOps. One of the first things she brought up was the difference in focus and approach to goals at each organization. In an organization where Ops teams are strictly separate from Developers, they often butt heads and have a limited vision of priorities. Each focuses on the goals of its own team or department, and has little understanding of the goals of the other departments. This leads to an adversarial relationship and a culture of not caring much about other teams or departments.

In contrast, an organization that embraces DevOps as a culture will see to it that Ops and Devs work together on whatever solution best reaches the goals of the whole organization. In doing so, barriers will have to be questioned. Any "special snowflake" servers/applications/etc. that only one person knows and can touch can't exist in this culture. Instead, any unique customizations need to be minimized through automation, documentation (sharing knowledge), and reporting/monitoring. This doesn't mean root access for all, but it means reducing barriers as much as possible. Good habits from the Ops world to keep include: monitoring, robustness, security, scaling, and alerting.

The main pillars of DevOps are: culture, automation, measurement, and sharing. Culture is important and can be supportive or rejecting of the other pillars. Without a culture in the organization that supports DevOps, it will fizzle back into siloed "us vs. them" enmity.

Thanks to all the presenters and those that put on the conference. It was a great experience and I am glad I attended.

Links