
published by noreply@blogger.com (Marco Matarazzo) on 2016-05-25 15:57:00 in the "nginx" category

While working on a complex project, we had to set up a caching reverse proxy image server with the ability to automatically resize any cached image on the fly.

Looking around on the Internet, I discovered that Nginx has a neat Image Filter module capable of resizing, cropping and rotating images. I decided to try and combine this with Nginx's well-known caching capabilities, to create an Nginx-only solution.

I'll describe here a sample setup to achieve a similar configuration.

Prerequisites


We obviously need to install Nginx.

Note that the Image Filter module is not installed by default on many Linux distributions, and we may have to install it as a separate module. If we're using Nginx's official repositories, it should just be a matter of installing the nginx-module-image-filter package and restarting the service.
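If the package ships the filter as a dynamic module (as the official nginx.org packages do), it also needs to be loaded explicitly in the main context of nginx.conf. A minimal sketch, assuming the module path used by those packages:

  # Load the dynamic image filter module (the path may vary by distribution)
  load_module modules/ngx_http_image_filter_module.so;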

What we want to achieve in this example configuration is to have a URL like:

http://www.example.com/image/<width>x<height>/<URL>

...that will retrieve the image at:

https://upload.wikimedia.org/<URL>

...then resize it on the fly, cache it and serve it.

Cache Storage configuration


First of all, we need to set up the cache in our main http section:

proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=nginx_cache:10M max_size=100M inactive=40d;

This gives us 10MB of shared memory for cache keys and 100MB of storage for the cached images, which will be removed after not being accessed for 40 days. These values can be tuned as needed.

Caching Proxy configuration


Next, we'll configure our front facing virtual host.

In our case, we needed the reverse proxy to live within an already existing site, and that's why we chose the /image/ path prefix.

  server {
      listen       80;
      server_name  www.example.com;
  
      location /image/ {
          proxy_pass http://127.0.0.1:20000;
          proxy_cache nginx_cache;
          proxy_cache_key "$proxy_host$uri$is_args$args";
          proxy_cache_valid 30d;
          proxy_cache_valid any 10s;
          proxy_cache_lock on;
          proxy_cache_use_stale error invalid_header timeout updating;
          proxy_http_version 1.1;
          expires 30d;
      }
  
      location / {
          # other locations we may need for the site.
          root /var/www/whatever;
      }
  
  }

Every URL starting with /image/ will be served from the cache if present; otherwise it will be proxied to our Resizing Server and cached for 30 days.

Resizing Server configuration


Finally, we'll configure the resizing server. Here, we use a regexp to extract the width, height and URL of the image we desire.

The server will proxy the request to https://upload.wikimedia.org/ looking for the image, resize it and then serve it back to the Caching Proxy.

  server {
      listen 127.0.0.1:20000;
      server_name localhost;
  
      resolver 8.8.8.8;
  
      location ~ ^/image/([0-9]+)x([0-9]+)/(.+) {
          image_filter_buffer 20M; # Will return 415 if image is bigger than this
          image_filter_jpeg_quality 75; # Desired JPG quality
          image_filter_interlace on; # For progressive JPG
  
          image_filter resize $1 $2;
  
          proxy_pass https://upload.wikimedia.org/$3;
      }
  
  }

We may want to tune the buffer size and JPEG quality here.

Note that the image_filter directive also offers a crop option, should we need something other than a plain resize.
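For instance, a hypothetical variation of the resizing location that crops to the captured dimensions instead of resizing would only need the directive swapped:

          # Crop to the captured width and height instead of resizing
          image_filter crop $1 $2;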

Testing the final result


You should now be able to fire up your browser and access a URL like:

http://www.example.com/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg

...and enjoy your caching, resizing, reverse proxying image server.

Optionally securing access to your image server


As a (simple) security measure to prevent abuse from unauthorized access, you can use the Secure Link module.

All we need to do is update the Resizing Server configuration, adding some lines to the location section:

  server {
      listen 127.0.0.1:20000;
      server_name localhost;
  
      resolver 8.8.8.8;
  
      location ~ ^/image/([0-9]+)x([0-9]+)/(.+) {
          secure_link $arg_auth;
          secure_link_md5 "$uri your_secret";
          if ($secure_link = "") {
              return 403;
          }
          if ($secure_link = "0") {
              return 410;
          }
  
          image_filter_buffer 20M; # Return 415 if image is bigger than this
          image_filter_jpeg_quality 75; # Desired JPG quality
          image_filter_interlace on; # For progressive JPG
  
          image_filter resize $1 $2;
  
          proxy_pass https://upload.wikimedia.org/$3;
      }
  
  }

To access your server you will now need to add an auth parameter to the request, with a secure token that can be easily calculated as an MD5 hash.

For example, to access the previous URL you can use the following bash command:

echo -n '/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg your_secret' | openssl md5 -binary | openssl base64 | tr +/ -_ | tr -d =

...and the resulting URL will be:

http://www.example.com/image/150x150/wikipedia/commons/0/01/Tiger.25.jpg?auth=TwcXg954Rhkjt1RK8IO4jA


published by Eugenia on 2016-05-25 08:21:07 in the "Entertainment" category
Eugenia Loli-Queru

A few years ago, Netflix said that by 2015 it would stop its DVD subscription service and have almost everything streaming instead. 2015 came and went, and not only does Netflix still offer DVD subscriptions, but its streaming service has become weaker, with fewer notable titles. At the same time, Netflix is being battled by the establishment, be it Hollywood studios or internet and cable TV carriers.

Sure, I’d love to have Hollywood’s latest offerings streaming for me via Netflix, but this is obviously never going to happen. Not for $10 per month anyway. If people could pay $25 (or even $50) per month, a fuller Hollywood catalog could be offered, but that’s not going to gather a lot of subscribers because the price is too high. Creating subscriber tiers (e.g. $10, $25, etc., with different catalogs for each) would anger customers too. So what could Netflix do?

I personally see only one way out of this mess, and it is twofold:

1. Adopt an iTunes & Amazon model, where most Hollywood movies and TV shows are offered for a rent price (e.g. $3 for a movie, $1 for a TV episode).

2. Produce in-house about 250 productions per year (instead of the current 50 or so — episodes are counted separately here).

Let’s run some numbers on the back of an envelope:
There are 30 million Netflix subscribers in the US today. Each pays $10 per month, which means gross sales of $3.6 billion per year. Setting aside taxes and operational costs, that should leave the company with roughly $2 billion to invest in its own productions.

What this means is that, on average, each production can cost up to $8 million, which is plenty of money to shoot amazing movies *if you employ the right talent*. Consider the recent and well-regarded sci-fi movies “Another Earth” and “I, Origins”, by the same director. The first one was shot for just $70k, and the second one for $1 million.

Also, since some TV episodes don’t need to cost more than $2 million (at least for dramas), some productions more expensive than the $8 million average can take place too. I certainly don’t see why the first season of “House of Cards” cost $100 million…

So anyway, every other day a new episode or a new movie can debut on Netflix that no other service has access to. I’m personally in favor of smaller TV seasons of 6 to 8 episodes (instead of the current 10-13), with the first 3 episodes streaming immediately together, and the rest every few days. Over time, all these new productions will accumulate, building a strong catalog.

The first couple of years might be rough while the catalog is building, but I think it can be done successfully, since a lot of their current streaming deals will still be active for a while before those titles go offline from the subscription side of Netflix (and they can still re-appear on the rental side). Plus, some of these productions (e.g. documentaries) are cheap enough to license anyway, so they can remain on the streaming side of things.


published by noreply@blogger.com (Matt Galvin) on 2016-05-17 13:00:00

I was recently working on a project that was using DimpleJS, which the docs describe as "An object-oriented API for business analytics powered by d3". I was using it to create a variety of graphs, some of which were line graphs. The client had requested that the line graph display the y-value of the line on the graph. This is easily accomplished with bar graphs in Dimple, however, not so easily done with line graphs.

I had spent some time Googling to find what others had done to add this functionality but could not find it anywhere. So, I read the documentation where they add labels to a bar graph, and "tweaked" it like so:

var s = myChart.addSeries(null, dimple.plot.line);
.
.
.
/* Add prices to line chart */
s.afterDraw = function (shape, data) {
  // Get the shape as a d3 selection
  var shapeSelection = d3.select(shape);
  var i = 0;
  _.forEach(data.points, function (point) {
    var rect = {
      x: parseFloat(point.x),
      y: parseFloat(point.y)
    };
    // Add a text label for the value
    if (data.markerData[i] !== undefined) {
      // "svg" is assumed to be the d3 selection the chart was created with
      svg.append("text")
        .attr("x", rect.x)
        .attr("y", rect.y - 10)
        // Centre align
        .style("text-anchor", "middle")
        .style("font-size", "10px")
        .style("font-family", "sans-serif")
        // Format the number
        .text(data.markerData[i].y);
    }
    i++;
  });
};

Some styling still needs to be done, but you can see that the y-values are now placed on the line graph. We are using lodash on this project, but if you do not want to use lodash, just replace the _.forEach (line 10) with a native loop and this technique should plug right in for you.
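For example, here is a minimal sketch of that swap (assuming data.points is a plain array), with the rest of the body unchanged:

  // Native replacement for the lodash call at line 10
  data.points.forEach(function (point) {
    // ...same body as in the snippet above...
  });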

If you're reading this it's likely you've run into the same or similar issue and I hope this helps you!


published by noreply@blogger.com (Kent K.) on 2016-05-12 17:33:00 in the "database" category

I was recently asked about options for displaying a random set of items from a table using Ruby on Rails. The request was complicated by the fact that the technology stack hadn't been completely decided on and one of the items still up in the air was the database. I've had an experience with a project I was working on where the decision was made to switch from MySQL to PostgreSQL. During the switch, a sizable amount of hand-constructed queries stopped functioning and had to be manually translated before they would work again. Learning from that experience, I favor avoidance of handwritten SQL in my Rails queries when possible. This precludes the option to use built-in database functions like rand() or random().
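For illustration, the database-function approach I wanted to avoid would look something like this (a sketch against the OurObject model used below; the function name is exactly the part that differs between databases):

OurObject.order('RANDOM()').limit(3) # PostgreSQL spelling; MySQL would want RAND() instead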

With the goal set in mind, I decided to look around to find out what other people were doing to solve similar requests. While perusing various suggested implementations, I noticed a lot of comments along the lines of "Don't use this approach if you have a large data set." or "this handles large data sets, but won't always give a truly random result."

These comments and the variety of solutions got me thinking about evaluating based not only on what database is in use, but what the dataset is expected to look like. I really enjoyed the mental gymnastics and thought others might as well.

Let's pretend we're working on an average project. The table we'll be pulling from has several thousand entries and we want to pull back something small like 3-5 random records. The most common solution offered based on the research I performed works perfectly for this situation.

records_desired = 3
count = [OurObject.count, 1].max
offsets = records_desired.times.inject([]) do |acc|
  acc << rand(count)
end
offsets.uniq!
# Keep drawing random offsets until we have enough unique ones (or run out of rows)
while offsets.size < records_desired && offsets.size < count
  offsets << rand(count)
  offsets.uniq!
end
offsets.collect { |offset| OurObject.offset(offset).first }

Analyzing this approach, we're looking at minimal processing time and a total of four queries. One to determine the total count and the rest to fetch each of our three objects individually. Seems perfectly reasonable.

What happens if our client needs 100 random records at a time? The processing is still probably within tolerances, but 101 queries? I say no unless our table is Dalmatians! Let's see if we can tweak things to be more large-set friendly.

records_desired = 100
count = [OurObject.count - records_desired, 1].max
offset = rand(count)
OurObject.limit(records_desired).offset(offset)

How's this look? Very minimal processing and only 2 queries. Fantastic! But is this result going to appear random to an observer? I think it's highly possible that you could end up with runs of related looking objects (created at similar times or all updated recently). When people say they want random, they often really mean they want unrelated. Is this solution close enough for most clients? I would say it probably is. But I can imagine the possibility that for some it might not be. Is there something else we can tweak to get a more desirable sampling without blowing processing time sky-high? After a little thought, this is what I came up with.

records_desired = 100
count = records_desired * 3
offset = rand([OurObject.count - count, 1].max)
ids = OurObject.limit(count).offset(offset).pluck(:id)
OurObject.find(ids.sample(records_desired))

While this approach may not truly provide more random results from a mathematical perspective, by assembling a larger subset and pulling randomly from inside it, I think you may be able to more closely achieve the feel of what people expect from randomness if the previous method seemed to return too many similar records for your needs.


published by noreply@blogger.com (Ben Witten) on 2016-05-12 15:22:00 in the "Liquid Galaxy" category

For the last few weeks, our developers have been working on syncing our Liquid Galaxy with Sketchfab. Our integration makes use of the Sketchfab API to synchronize multiple instances of Sketchfab in the immersive and panoramic environment of the Liquid Galaxy. The Liquid Galaxy already has so many amazing capabilities, and to be able to add Sketchfab to our portfolio is very exciting for us! Sketchfab, known as the "YouTube for 3D files," is the leading platform to publish and find 3D and VR content. Sketchfab integrates with all major 3D creation tools and publishing platforms, and is the 3D publishing partner of Adobe Photoshop, Facebook, Microsoft HoloLens and Intel RealSense. Given that Sketchfab can sync with almost any 3D format, we are excited about the new capabilities our integration provides.

Sketchfab content can be deployed onto the system in minutes! Users from many industries use Sketchfab, including architecture, hospitals, museums, gaming, design, and education. There is a natural overlap between the Liquid Galaxy and Sketchfab, as members of all of these industries utilize the Liquid Galaxy for its visually stunning and immersive atmosphere.

We recently had Alban Denoyel, cofounder of Sketchfab, into our office to demo Sketchfab on the Liquid Galaxy. We're happy to report that Alban loved it! He told us about new features that are going to be coming out on Sketchfab soon. These features will automatically roll out to Sketchfab content on the Liquid Galaxy system, and will serve to make the Liquid Galaxy's pull with 3D modeling even greater.

We're thrilled with how well Sketchfab works on our Liquid Galaxy as is, but we're in the process of making it even more impressive. Some Sketchfab models take a bit of time to load (on their website and on our system), so our developers are working on having models load in the background so they can be activated instantaneously on the system. We will also be extending our Sketchfab implementation to make use of some of the features already present on Sketchfab's excellent API, including displaying model annotations and animating the models.

You can view a video of Sketchfab content on the Liquid Galaxy below. If you'd like to learn more, you can call us at 212-929-6923, or contact us here.


published by noreply@blogger.com (Patrick Lewis) on 2016-05-02 13:22:00 in the "dependencies" category
The third-party gem ecosystem is one of the biggest selling points of Rails development, but the addition of a single line to your project's Gemfile can introduce literally dozens of new dependencies. A compatibility issue in any one of those gems can bring your development to a halt, and the transition to a new major version of Rails requires even more caution when managing your gem dependencies.

In this post I'll illustrate this issue by showing the steps required to get rails_admin (one of the two most popular admin interface gems for Rails) up and running even partially on a freshly-generated Rails 5 project. I'll also identify some techniques for getting unreleased and forked versions of gems installed as stopgap measures to unblock your development while the gem ecosystem catches up to the new version of Rails.

After installing the current beta3 version of Rails 5 with gem install rails --pre and creating a Rails 5 project with rails new, I decided to address the first requirement of my application, an admin interface, by installing the popular Rails Admin gem. The RubyGems page for rails_admin shows that its most recent release, 0.8.1 from mid-November 2015, lists Rails 4 as a requirement. And indeed, trying to install rails_admin 0.8.1 in a Rails 5 app via bundler fails with a dependency error:

Resolving dependencies...
Bundler could not find compatible versions for gem "rails":
In snapshot (Gemfile.lock):
rails (= 5.0.0.beta3)

In Gemfile:
rails (< 5.1, >= 5.0.0.beta3)

rails_admin (~> 0.8.1) was resolved to 0.8.1, which depends on
rails (~> 4.0)

I took a look at the GitHub page for rails_admin and noticed that recent commits make reference to Rails 5, which is an encouraging sign that its developers are working on adding compatibility with Rails 5. Looking at the gemspec in the master branch on GitHub shows that the rails_admin gem dependency has been broadened to include both Rails 4 and 5, so I updated my app's Gemfile to install rails_admin directly from the master branch on GitHub:

gem 'rails_admin', github: 'sferik/rails_admin'

This solved the above dependency of rails_admin on Rails 4 but revealed some new issues with gems that rails_admin itself depends on:

Resolving dependencies...
Bundler could not find compatible versions for gem "rack":
In snapshot (Gemfile.lock):
rack (= 2.0.0.alpha)

In Gemfile:
rails (< 5.1, >= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionmailer (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionpack (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
rack (~> 2.x)

rails_admin was resolved to 0.8.1, which depends on
rack-pjax (~> 0.7) was resolved to 0.7.0, which depends on
rack (~> 1.3)

rails (< 5.1, >= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionmailer (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
actionpack (= 5.0.0.beta3) was resolved to 5.0.0.beta3, which depends on
rack-test (~> 0.6.3) was resolved to 0.6.3, which depends on
rack (>= 1.0)

rails_admin was resolved to 0.8.1, which depends on
sass-rails (< 6, >= 4.0) was resolved to 5.0.4, which depends on
sprockets (< 4.0, >= 2.8) was resolved to 3.6.0, which depends on
rack (< 3, > 1)

This bundler output shows a conflict where Rails 5 depends on rack 2.x while rails_admin's rack-pjax dependency depends on rack 1.x. I ended up resorting to a Google search which led me to the following issue in the rails_admin repo: https://github.com/sferik/rails_admin/issues/2532

Installing rack-pjax from GitHub:

gem 'rack-pjax', github: 'afcapel/rack-pjax', branch: 'master'

resolves the rack dependency conflict, and bundle install now completes without error. Things are looking up! At least until you try to run the rails g rails_admin:install generator and are presented with this mess:

/Users/patrick/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.beta3/lib/action_dispatch/middleware/stack.rb:108:in `assert_index': No such middleware to insert after: ActionDispatch::ParamsParser (RuntimeError)
from /Users/patrick/.rbenv/versions/2.3.0/lib/ruby/gems/2.3.0/gems/actionpack-5.0.0.beta3/lib/action_dispatch/middleware/stack.rb:80:in `insert_after'

This error is more difficult to understand, especially given the fact that the culprit (the remotipart gem) is not actually mentioned anywhere in the error. Thankfully, commenters on the above-mentioned rails_admin issue #2532 were able to identify the remotipart gem as the source of this error and provide a link to a forked version of that gem which allows rails_admin:install to complete successfully (albeit with some functionality still not working).

In the end, my Gemfile looked something like this:

gem 'rails_admin', github: 'sferik/rails_admin'
# Use github rack-pjax to fix dependency versioning issue with Rails 5
# https://github.com/sferik/rails_admin/issues/2532
gem 'rack-pjax', github: 'afcapel/rack-pjax'
# Use forked remotipart until following issues are resolved
# https://github.com/JangoSteve/remotipart/issues/139
# https://github.com/sferik/rails_admin/issues/2532
gem 'remotipart', github: 'mshibuya/remotipart', ref: '3a6acb3'

A total of three unreleased versions of gems, including the forked remotipart gem that breaks some functionality, just to get rails_admin installed and up and running enough to start working with. And some technical debt in the form of comments about follow-up tasks to revisit the various gems as they have new versions released for Rails 5 compatibility.

This process has been a reminder that when working in a Rails 4 app it's easy to take for granted the ability to install gems and have them 'just work' in your application. When dealing with pre-release versions of Rails, don't be surprised when you have to do some investigative work to figure out why gems are failing to install or work as expected.

My experience has also underscored the importance of understanding all of your application's gem dependencies and having some awareness of their developers' intentions when it comes to keeping their gems current with new versions of Rails. As a developer it's in your best interest to minimize the amount of dependencies in your application, because adding just one gem (which turns out to have a dozen of its own dependencies) can greatly increase the potential for encountering incompatibilities.

published by noreply@blogger.com (Greg Sabino Mullane) on 2016-04-29 00:04:00 in the "postgres" category

Postgres has a wonderful feature called concurrent indexes. It allows you to create indexes on a table without blocking reads OR writes, which is quite a handy trick. There are a number of circumstances in which one might want to use concurrent indexes, the most common one being not blocking writes to production tables. There are a few other use cases as well, including:


Photograph by Nicholas A. Tonelli

  • Replacing a corrupted index
  • Replacing a bloated index
  • Replacing an existing index (e.g. better column list)
  • Changing index parameters
  • Restoring a production dump as quickly as possible

In this article, I will focus on that last use case, restoring a database as quickly as possible. We recently upgraded a client from a very old version of Postgres to the current version (9.5 as of this writing). The fact that use of pg_upgrade was not available should give you a clue as to just how old the "very old" version was!

Our strategy was to create a new 9.5 cluster, get it optimized for bulk loading, import the globals and schema, stop write connections to the old database, transfer the data from old to new, and bring the new one up for reading and writing.

The goal was to reduce the application downtime as much as reasonably possible. To that end, we did not want to wait until all the indexes were created before letting people back in, as testing showed that the index creations were the longest part of the process. We used the "--section" flags of pg_dump to create pre-data, data, and post-data sections. All of the index creation statements appeared in the post-data file.
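In rough terms, the dump was split something like this (the file and database names here are placeholders, not the client's actual ones):

pg_dump --section=pre-data  -f pre-data.sql  olddb
pg_dump --section=data      -f data.sql      olddb
pg_dump --section=post-data -f post-data.sql olddb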

Because the client determined that it was more important for the data to be available, and the tables writable, than it was for them to be fully indexed, we decided to try using CONCURRENT indexes. In this way, writes to the tables could happen at the same time that they were being indexed - and those writes could occur as soon as the table was populated. That was the theory anyway.

The migration went smoothly - the data was transferred over quickly, the database was restarted with a new postgresql.conf (e.g. turning fsync back on), and clients were able to connect, albeit with some queries running slower than normal. We parsed the post-data file and created a new file in which all the CREATE INDEX commands were changed to CREATE INDEX CONCURRENTLY. We kicked that off, but after a certain amount of time, it seemed to freeze up.


The frogurt is also cursed.

Looking closer showed that the CREATE INDEX CONCURRENTLY statement was waiting, and waiting, and never able to complete - because other transactions were not finishing. This is why concurrent indexing is both a blessing and a curse. The concurrent index creation is so polite that it never blocks writers, but this means processes can charge ahead and be none the wiser that the create index statement is waiting on them to finish their transaction. When you also have a misbehaving application that stays "idle in transaction", it's a recipe for confusion. (Idle in transaction is what happens when your application keeps a database connection open without doing a COMMIT or ROLLBACK). A concurrent index can only completely finish being created once any transaction that has referenced the table has completed. The problem was that because the create index did not block, the app kept chugging along, spawning new processes that all ended up in idle in transaction.

At that point, the only way to get the concurrent index creation to complete was to forcibly kill all the other idle in transaction processes, forcing them to rollback and causing a lot of distress for the application. In contrast, a regular index creation would have caused other processes to block on their first attempt to access the table, and then carried on once the creation was complete, and nothing would have to rollback.
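For reference, such stuck sessions can be spotted, and as a last resort terminated, with standard catalog queries along these lines (generic examples, not the exact commands used during this migration):

-- Show sessions sitting "idle in transaction"
SELECT pid, usename, xact_start
FROM pg_stat_activity
WHERE state = 'idle in transaction';

-- Forcibly end one of them, rolling back its open transaction (12345 is a placeholder pid)
SELECT pg_terminate_backend(12345);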

Another business decision was made - the concurrent indexes were nice, but we needed the indexes, even if some had to be created as regular indexes. Many of the indexes were able to be completed (concurrently) very quickly - and they were on not-very-busy tables - so we plowed through the index creation script, and simply canceled any concurrent index creations that were being blocked for too long. This only left a handful of uncreated indexes, so we simply dropped the "invalid" indexes (these appear when a concurrent index creation is interrupted), and reran with regular CREATE INDEX statements.
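The leftover invalid indexes are easy to spot in the system catalogs; a generic query for finding them looks like this:

-- List indexes left INVALID by interrupted CREATE INDEX CONCURRENTLY runs
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;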

The lesson here is that nothing comes without a cost. The overly polite concurrent index creation is great at letting everyone else access the table, but it also means that large complex transactions can chug along without being blocked, and have to have all of their work rolled back. In this case, things worked out as we did 99% of the indexes as CONCURRENT, and the remaining ones as regular. All in all, the use of concurrent indexes was a big win, and they are still an amazing feature of Postgres.


published by noreply@blogger.com (Elizabeth Garrett) on 2016-04-27 18:59:00 in the "community" category

We all love a good ending. I was happy to hear that one of End Point's clients, Cybergenetics, was involved in a case this week to free a falsely imprisoned man, Darryl Pinkins.

Darryl was convicted of a crime in Indiana in 1991. In 1995 Pinkins sought the help of the Innocence Project. His attorney Frances Watson and her students turned to Cybergenetics and their DNA interpretation technology called TrueAllele® Casework. The TrueAllele DNA identification results exonerated Pinkins. The Indiana Court of Appeals dropped all charges against Pinkins earlier this week and he walked out of jail a free man after fighting for 24 years to clear his name.

TrueAllele can separate out the people who contributed their DNA to a mixed DNA evidence sample. It then compares the separated out DNA identification information to other reference or evidence samples to see if there is a DNA match.

End Point has worked with Cybergenetics since 2003 and consults with them on security, database infrastructure, and website hosting. We congratulate Cybergenetics on their success in being part of the happy ending for Darryl Pinkins and his family!

More of the story is available at Cybergenetics' Newsroom or the Chicago Tribune.


published by noreply@blogger.com (Yaqi Chen) on 2016-04-27 17:49:00 in the "Liquid Galaxy" category

Nowadays, virtual reality is one of the hottest topics in tech, with VR enabling users to enter immersive environments built up by computer technology. I attended Mobile World Congress 2016 a few weeks ago, and it was interesting to see people sit next to one another and totally ignore one another while they were individually immersed in their own virtual reality worlds.

When everyone is so addicted to their little magic boxes, they tend to lose their connections with people around them. End Point has developed a new experience in which users can watch and share their virtually immersive world together. This experience is called the Liquid Galaxy.

When a user stands in front of Liquid Galaxy and is surrounded by a multitude of huge screens arranged in a semicircle, he puts not only his eyes but his whole body into an unprecedented 3D space. These screens are big enough to cover the audience's entire peripheral vision and bring great visual stimulation from all directions. When using the Liquid Galaxy system, the users become fully immersed in the system and the imagery they view.


Movie Night at End Point

This digital chamber can be considered a sort of VR movie theater, where an audience can enjoy the same content, and probably the same bucket of popcorn! While this setup makes the Liquid Galaxy a natural fit for any sort of exhibit, many End Point employees have also watched full length feature movies on the system during our monthly Movie Night at our Headquarters office in Manhattan. This sort of shared experience is not something that is possible on typical VR, because unlike VR the Liquid Galaxy is serving a larger audience and presenting stories in a more interactive way.


For most meetings, exhibitions, and other special occasions, the Liquid Galaxy helps to provide an amazing and impactful experience to the audience. Any scenario can be built for users to explore, and geospatial data sets can be presented immersively.

With the ability to serve a group of people simultaneously, Liquid Galaxy increases the impact of content presentation and brings a revolutionary visual experience to its audiences. If you'd like to learn more, you can call us at 212-929-6923, or contact us here.


published by noreply@blogger.com (Ben Witten) on 2016-04-22 21:52:00 in the "Liquid Galaxy" category

The Liquid Galaxy, an immersive and panoramic presentation tool, is the perfect fit for any time you want to grab the attention of your audience and leave a lasting impression. The system has applications in a variety of industries (which include museums and aquariums, hospitality and travel, research libraries at universities, events, and real estate, to name a few) but no industry's demand rivals the popularity seen in real estate.

The Liquid Galaxy provides an excellent tool for real estate brokerages and land use agencies to showcase their properties with multiple large screens showing 3D building models and complete Google Earth data. End Point can configure the Liquid Galaxy to highlight specific buildings, areas on the map, or any set of correlated land use data, which can then be shown in a dazzling display that forms the centerpiece of a conference room or lobby. We can program the Liquid Galaxy to show floor plans, panoramic interior photos, and even Google Street View "walking tours" around a given property.

A Liquid Galaxy in your office will provide your firm with a sophisticated and cutting edge sales tool. You will depart from the traditional ways of viewing, presenting, and even managing real estate sites by introducing your clients to multiple prime locations and properties in a wholly unique, professional and visually stunning manner. We can even highlight amenities such as mass transit, road usage, and basic demographic data for proper context.

The Liquid Galaxy allows your clients an in-depth contextual tour of multiple listings in the comfort of your office without having to travel to multiple locations. Liquid Galaxy brings properties to the client instead of taking the client to every property. This saves time and energy for both you and your prospective clients, and sets your brokerage apart as a technology leader in the market.

If you'd like to learn more about the Liquid Galaxy, you can call us at 212-929-6923, or contact us here.


published by noreply@blogger.com (Peter Hankiewicz) on 2016-04-21 23:00:00 in the "AngularJS" category

Introduction

The current state of web browser development is still problematic. We have multiple browsers, and each browser has plenty of versions. There are multiple operating systems and devices that can be used. All of this makes it impossible to be sure that our code will work on every possible browser and system (unfortunately). With proper testing we can make our product stable and good enough for production, but we can't expect that everything will go smoothly; it won't. There is always someone out there, sitting in a small office and using outdated software, Internet Explorer 6 for example. Usually you want to support as many users as possible; here, I will explain how to find them. Then you just need to decide whether it is worth fixing an issue for them.

Browser error logging

What can really help us, and is really simple to do, is browser error logging. Every time an error occurs on the client side (the browser will generate an error that the user most likely won't see), we can log it on the server side, even with a stack trace. Let's see an example:

window.onerror = function (errorMsg, url, lineNumber, column, errorObj) {
    // Send the error details to the server (assumes jQuery is loaded for $.post)
    $.post('//your.domain/client-logs', {
        errorMsg: errorMsg,
        url: url,
        lineNumber: lineNumber,
        column: column,
        errorObj: errorObj
    });
        
    // Tell browser to run its own error handler as well   
    return false;
};

What do we have here? We bind a function to the window.onerror event. Every time an error occurs, this function will be called with several arguments:

  • errorMsg - this is an error message, usually describing why an error occurred (for example: "Uncaught ReferenceError: heyyou is not defined"),
  • url - current url location,
  • lineNumber - script line number where an error happened,
  • column - the same as above, but the column number,
  • errorObj - the most important part here, an error object with a stack trace included.

What to do with this data? You will probably want to send it to a server and save it, to be able to go through this log from time to time like we do in our example:

$.post('//your.domain/client-logs', {
    errorMsg: errorMsg,
    url: url,
    lineNumber: lineNumber,
    column: column,
    errorObj: errorObj
});

It's very helpful: with proper unit and functional testing, the errors generated are usually minor, but sometimes you may find a critical issue before a larger number of clients actually discovers it. That is a big win.

JSNLog

JSNLog is a library that helps with client-side error logging. You can find it here: http://jsnlog.com/. I can fully recommend it; it also handles AJAX calls, timeouts, and much more.

Client error notification

If you want to be serious and professional, every issue should be reported to the user in some way. On the other hand, this can backfire if the user gets spammed with notifications about minor errors. It's not easy to find the best solution, because it's not easy to determine an error's priority.

From experience, if you have a system where users are logged in, you can create a simple script that sends the user an email asking about the issue. You can set a limit to avoid sending too many messages. If the user is interested, they can always reply and explain the issue. Usually users appreciate this attention.

Error logging in Angular

It's worth mentioning how we can handle error logging in the Angular framework, with useful stack traces and error descriptions. See an example below:

First we need to override the default log functions in Angular:

angular.module('logToServer', [])
  .service('$log', function () {
    this.log = function (msg) {
      JL('Angular').trace(msg);
    };
    this.debug = function (msg) {
      JL('Angular').debug(msg);
    };
    this.info = function (msg) {
      JL('Angular').info(msg);
    };
    this.warn = function (msg) {
      JL('Angular').warn(msg);
    };
    this.error = function (msg) {
      JL('Angular').error(msg);
    };
  });

Then we override the exception handler to use our logging functions:

factory('$exceptionHandler', function () {
    return function (exception, cause) {
      JL('Angular').fatalException(cause, exception);
      throw exception;
    };
  });

We also need an interceptor to handle AJAX call errors. This time we define an $http interceptor (which uses $q), like this:

factory('logToServerInterceptor', ['$q', function ($q) {
    var myInterceptor = {
      'request': function (config) {
          config.msBeforeAjaxCall = new Date().getTime();

          return config;
      },
      'response': function (response) {
        if (response.config.warningAfter) {
          var msAfterAjaxCall = new Date().getTime();
          var timeTakenInMs = msAfterAjaxCall - response.config.msBeforeAjaxCall;

          if (timeTakenInMs > response.config.warningAfter) {
            JL('Angular.Ajax').warn({ 
              timeTakenInMs: timeTakenInMs, 
              config: response.config, 
              data: response.data
            });
          }
        }

        return response;
      },
      'responseError': function (rejection) {
        var errorMessage = "timeout";
        if (rejection && rejection.status && rejection.data) {
          errorMessage = rejection.data.ExceptionMessage;
        }
        JL('Angular.Ajax').fatalException({ 
          errorMessage: errorMessage, 
          status: rejection.status, 
          config: rejection.config }, rejection.data);
        
          return $q.reject(rejection);
      }
    };

    return myInterceptor;
  }]);

Here is how it all looks together:

angular.module('logToServer', [])
  .service('$log', function () {
    this.log = function (msg) {
      JL('Angular').trace(msg);
    };
    this.debug = function (msg) {
      JL('Angular').debug(msg);
    };
    this.info = function (msg) {
      JL('Angular').info(msg);
    };
    this.warn = function (msg) {
      JL('Angular').warn(msg);
    };
    this.error = function (msg) {
      JL('Angular').error(msg);
    };
  })
  .factory('$exceptionHandler', function () {
    return function (exception, cause) {
      JL('Angular').fatalException(cause, exception);
      throw exception;
    };
  })
  .factory('logToServerInterceptor', ['$q', function ($q) {
    var myInterceptor = {
      'request': function (config) {
          config.msBeforeAjaxCall = new Date().getTime();

          return config;
      },
      'response': function (response) {
        if (response.config.warningAfter) {
          var msAfterAjaxCall = new Date().getTime();
          var timeTakenInMs = msAfterAjaxCall - response.config.msBeforeAjaxCall;

          if (timeTakenInMs > response.config.warningAfter) {
            JL('Angular.Ajax').warn({ 
              timeTakenInMs: timeTakenInMs, 
              config: response.config, 
              data: response.data
            });
          }
        }

        return response;
      },
      'responseError': function (rejection) {
        var errorMessage = "timeout";
        if (rejection && rejection.status && rejection.data) {
          errorMessage = rejection.data.ExceptionMessage;
        }
        JL('Angular.Ajax').fatalException({ 
          errorMessage: errorMessage, 
          status: rejection.status, 
          config: rejection.config }, rejection.data);
        
          return $q.reject(rejection);
      }
    };

    return myInterceptor;
  }]);

This should handle most of the errors that could happen in the Angular framework. Here I used the JSNLog library to handle sending logs to a server.
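One piece not shown above is registering the interceptor with $http; a minimal sketch (the application module name here is made up) could look like this:

angular.module('myApp', ['logToServer'])
  .config(['$httpProvider', function ($httpProvider) {
    // Register the interceptor so AJAX errors and slow calls are logged
    $httpProvider.interceptors.push('logToServerInterceptor');
  }]);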

Almost the end

There are multiple techniques for logging errors on the client side. It does not really matter which one you choose; it only matters that you do it, especially since it takes very little time to set up and yields a big payoff in the end.


published by noreply@blogger.com (Phin Jensen) on 2016-04-19 17:21:00 in the "Conference" category

Another talk from MountainWest RubyConf that I enjoyed was How to Build a Skyscraper by Ernie Miller. This talk was less technical and instead focused on teaching principles and ideas for software development by examining some of the history of skyscrapers.

Equitable Life Building

Constructed from 1868 to 1870 and considered by some to be the first skyscraper, the Equitable Life Building was, at 130 feet, the tallest building in the world at the time. An interesting problem arose when designing it: it was too tall for stairs. If a lawyer's office was on the seventh floor of the building, he wouldn't want his clients to walk up six flights of stairs to meet with him.

Elevators and hoisting systems existed at the time, but they had one fatal flaw: there were no safety systems if the rope broke or was cut. While working on converting a sawmill to a bed frame factory, a man named Elisha Otis had the idea for a system to stop an elevator if its rope is cut. He and his sons designed the system and implemented it at the factory. At the time, he didn't think much of the design, and didn't patent it or try to sell it.

Otis' invention became popular when he showcased it at the 1854 New York World's Fair with a live demo. Otis stood in front of a large crowd on a platform and ordered the rope holding it to be cut. Instead of plummeting to the ground, the platform was caught by the safety system after falling only a few inches.

Having a way to safely and easily travel up and down many stories literally flipped the value propositions of skyscrapers upside down. Where lower floors were desired more because they were easy to access, higher floors are now more coveted, since they are easy to access but get the advantages that come with height, such as better air, light, and less noise. A solution that seems unremarkable to you might just change everything for others.

When the Equitable Life Building was first constructed, it was described as fireproof. Unfortunately, it didn't work out quite that way. On January 9, 1912, the timekeeper for a cafe in the building started his day by lighting the gas in his office. Instead of disposing properly of the match, he distractedly threw it into the trashcan. Within 10 minutes, the entire office was engulfed in flame, which spread to the rest of the building, completely destroying it and killing six people.

Never underestimate the power of people to break what you build.

Home Insurance Building

The Home Insurance Building, constructed in 1884, was the first building to use a fireproof metal frame to bear the weight of the building, as opposed to using load-bearing masonry. The building was designed by William LeBaron Jenney, who was struck by inspiration when his wife placed a heavy book on top of a birdcage. From Wikipedia:

"
According to a popular story, one day he came home early and surprised his wife who was reading. She put her book down on top of a bird cage and ran to meet him. He strode across the room, lifted the book and dropped it back on the bird cage two or three times. Then, he exclaimed: "It works! It works! Don't you see? If this little cage can hold this heavy book, why can't an iron or steel cage be the framework for a whole building?"
"

With this idea, he was able to design and build the Home Insurance Building to be 10 stories and 138 feet tall while only weighing one third of what the same building in stone would weigh, because he was able to find inspiration from unexpected places.

Monadnock Building

The Monadnock Building was designed by Daniel Burnham and John Wellborn Root. Burnham preferred simple and functional designs and was known for his stinginess, while Root was more artistically inclined and known for his detailed ornamentation on building designs. Despite their philosophical differences, they were one of the world's most successful architectural firms.

One of the initial sketches (shown) for the building included Ancient Egyptian-inspired ornamentation with slight flaring at the top. Burnham didn't like the design, as illustrated in a letter he wrote to the property manager:

"
My notion is to have no projecting surfaces or indentations, but to have everything flush .... So tall and narrow a building must have some ornament in so conspicuous a situation ... [but] projections mean dirt, nor do they add strength to the building ... one great nuisance [is] the lodgment of pigeons and sparrows.
"

While Root was on vacation, Burnham worked to re-design the building to be straight up-and-down with no ornamentation. When Root returned, he initially objected to the design but eventually embraced it, declaring that the heavy lines of the Egyptian pyramids captured his imagination. We can learn a simple lesson from this: Learn to embrace constraints.

When construction was completed in 1891, the building was a total of 17 stories (including the attic) and 215 feet tall. At the time, it was the tallest commercial structure in the world. It is also the tallest load-bearing brick building constructed. In fact, to support the weight of the entire building, the walls at the bottom had to be six feet (1.8 m) wide.

Because of the soft soil of Chicago and the weight of the building, it was designed to settle 8 inches into the ground. By 1905, it had settled that much and several inches more, which led to the reconstruction of the first floor. By 1948, it had settled 20 inches, making the entrance a step down from the street. If you only focus on profitability, don't be surprised when you start sinking.

Fuller Flatiron Building

The Flatiron building, constructed in 1902, was also designed by Daniel Burnham, although Root had died of pneumonia during the construction of the Monadnock building. The Flatiron building presented an interesting problem because it was to be built on an odd triangular plot of land. In fact, the building was only 6 and a half feet wide at the tip, which obviously wouldn't work with the load-bearing masonry design of the Monadnock building.

So the building was constructed using a steel-frame structure that would keep the walls to a practical size and allow them to fully utilize the plot of land. The space you have to work with should influence how you build and you should choose the right materials for the job.

During construction of the Flatiron building, New York locals called it "Burnham's Folly" and began to place bets on how far the debris would fall when a wind storm came and knocked it over. However, an engineer named Corydon Purdy had designed a steel bracing system that would protect the building from wind four times as strong as it would ever feel. During a 60-mph windstorm, tenants of the building claimed that they couldn't feel the slightest vibration inside the building. This gives us another principle we can use: Testing makes it possible to be confident about what we build, even when others aren't.

40 Wall Street v. Chrysler Building


40 Wall Street
Photo by C R, CC BY-SA 2.0

The stories of 40 Wall Street and the Chrysler Building start with two architects, William Van Alen and H. Craig Severance. Van Alen and Severance established a partnership together in 1911 which became very successful. However, as time went on, their personal differences caused strain in the relationship and they separated on unfriendly terms in 1924. Soon after the partnership ended, they found themselves to be in competition with one another. Severance was commissioned to design 40 Wall Street while Van Alen would be designing the Chrysler Building.

The Chrysler Building was initially announced in March of 1929, planned to be built 808 feet tall. Just a month later, Severance was one-upping Van Alen by announcing his design for the building, coming in at 840 feet. By October, Van Alen announced that the steel work of the Chrysler Building was finished, putting it as the tallest building in the world, over 850 feet tall. Severance wasn't particularly worried, as he already had plans in motion to build higher. Even after reports came in that the Chrysler Building had a 60-foot flagpole at the top, Severance made more changes for 40 Wall Street to be taller than the Chrysler Building. These plans were enough for the press to announce that 40 Wall Street had won the race to build highest since construction of the Chrysler Building was too far along to be built any higher.


The Chrysler Building
Photo by Chris Parker, CC BY-ND 2.0

Unfortunately for Severance, the 60-foot flagpole wasn't a flagpole at all. Instead, it was part of a 185-foot steel spire which Van Alen had designed and had built and shipped to the construction site in secret. On October 23rd, 1929, the pieces of the spire were hoisted to the top of the building and installed in just 90 minutes. The spire was initially mistaken for a crane, and it wasn't until 4 days after it was installed that the spire was recognized as a permanent part of the building, making it the tallest in the world. When all was said and done, 40 Wall Street came in at 927 feet, with a cost of $13,000,000, while the Chrysler Building finished at 1,046 feet and cost $14,000,000.

There are two morals we can learn from this story: There is opportunity for great work in places nobody is looking and big buildings are expensive, but big egos are even more so.

Empire State Building

The Empire State Building was built in just 13 months, from March 17, 1930, to April 11, 1931. Its primary architects were Richmond Shreve and William Lamb, who were part of the team assembled by Severance to design 40 Wall Street. They were joined by Arthur Harmon to form Shreve, Lamb, & Harmon. Lamb's partnership with Shreve was not unlike that of Van Alen and Severance or Burnham and Root. Lamb was more artistic in his architecture, but he was also pragmatic, using his time and design constraints to shape the design and characteristics of the building.

Lamb completed the building drawings in just two weeks, designing from the top down, which was a very unusual method. When designing the building, Lamb made sure that even when he was making concessions, using the building would be a pleasant experience for those who mattered. Lamb was able to complete the design so quickly because he reused previous work, specifically the Reynolds Building in Winston-Salem, NC, and the Carew Tower in Cincinnati, Ohio.

In November of 1929, Al Smith, who commissioned the building as head of Empire State, Inc., announced that the company had purchased land next to the plot where the construction would start, in order to build higher. Shreve, Lamb, and Harmon were opposed to this idea since it would force tenants of the top floors to switch elevators on the way up, and they were focused on making the experience as pleasant as possible.

John Raskob, one of the main people financing the building, wanted the building to be taller. While looking at a small model of the building, he reportedly said "What this building needs is a hat!" and proposed his idea of building a 200-foot mooring tower for a zeppelin at the top of the building, despite several problems such as high winds making the idea unfeasible. But Raskob felt that he had to build the tallest building in the world, despite all of the problems and the higher cost that a taller building would introduce, because people can rationalize anything.

There are two more things we should note about the story of the Empire State Building. First, despite the fact that it was designed top-to-bottom, it wasn't built like that. No matter how something is designed, it needs to be built from the bottom up. Second, the Empire State Building was a big accomplishment in architecture and construction, but at no small cost. Five people died during the construction of the building, and that may seem like a small number considering the scale of the project, but we should remember that no matter how important speed is, it's not worth losing people over.

United Nations Headquarters

The United Nations Headquarters was constructed between 1948 and 1952. It wasn't built to be particularly tall (less than half the height of the Empire State Building), but it came with its own set of problems. As you can see in the picture, the building had a lot of windows. The wide faces of the building are almost completely covered in windows. These windows offer great lighting and views, but when the sun shines on them, they generate a lot of heat, not unlike a greenhouse. Unless you're building a greenhouse, you probably don't want that. It doesn't matter how pretty your building is if nobody wants to occupy it.

The solution to the problem was created years before by an engineer named Willis Carrier, who created an "Apparatus for Treating Air" (now called an air conditioner) to keep the paper in a printing press from being wrinkled. By creating this air conditioner, Carrier didn't just make something cool. He made something cool that everyone can use. Without it, buildings like the UNHQ could never have been built.

Willis (or Sears) Tower


Willis Tower, left

The Willis Tower was built between 1970 and 1973. Fazlur Rahman Khan was hired as the structural engineer for the Willis Tower, which needed to be very tall in order to house all of the employees of Sears. A steel frame design wouldn't work well in Chicago (also known as the Windy City) since such frames tend to bend and sway in heavy winds, which can cause discomfort for people on higher floors, even causing sea-sickness in some cases.

To solve the problem, Khan invented a "bundled tube structure", which put the structure of a building on the outside as a thin tube. Using the tube structure not only allowed Khan to build a higher tower, but it also increased floor space and cost less per unit area. But these innovations only came because Khan realized that the higher you build, the windier it gets.

Taipei 101

Taipei 101 was constructed from 1999 to 2004 near the Pacific Ring of Fire, which is the most seismically active part of the world. Earthquakes present very different problems from the wind since they affect a building at its base, instead of the top. Because of the location of the building it needed to be able to withstand both typhoon-force winds (up to 130 mph) and extreme earthquakes, which meant that it had to be designed to be both structurally strong and flexible.

To accomplish this, the building was constructed with high-performance steel, 36 columns, and 8 "mega-columns" packed with concrete connected by outrigger trusses which acted similarly to rubber bands. During the construction of the building, Taipei was hit by a 6.8-magnitude earthquake which destroyed smaller buildings around the skyscraper, and even knocked cranes off of the incomplete building, but when the building was inspected it was found to have no structural damage. By being rigid where it has to be and flexible where it can afford to be, Taipei 101 is one of the most stable buildings ever constructed.

Of course, being flexible introduces the problem of discomfort for people in higher parts of the building. To solve this problem, Taipei 101 was built with a massive 728-ton (1,456,000 lb) tuned mass damper, which helps to offset the swaying of the building in strong winds. We can learn from this damper: when the winds pick up, it's good to have someone (or something) at the top pulling for you.

Burj Khalifa

The newest and tallest building on our list, the Burj Khalifa was constructed from 2004 to 2009. With the Burj Khalifa, the design problems centered on incorporating adequate safety features. After the terrorist attacks of September 11, 2001, evacuation became a more prominent concern in the design and construction of skyscrapers. When it comes to evacuation, stairs are basically the only way to go, and going down many flights of stairs can be as difficult as going up them, especially if the building is burning around you. The Burj Khalifa is nearly twice as tall as the old World Trade Center, and in an emergency, walking down nearly half a mile of stairs simply won't work.

So how do the people in the Burj Khalifa get out in an emergency? Well, they don't. Instead, the Burj Khalifa is designed with periodic safe rooms, protected by reinforced concrete and fireproof sheeting, that will protect the people inside for up to two hours during a fire. Each room has a dedicated supply of air, delivered through fire-resistant pipes. These safe rooms are placed every 25 floors or so, because a safe room won't do any good if it can't be reached.

You may know that the most common cause of death in a fire is actually smoke inhalation, not the fire itself. To deal with this, the Burj Khalifa has a network of high-powered fans throughout the building that blow in clean air from outside and keep the stairwells leading to the safe rooms clear of smoke. A very important part of this is pushing the smoke, and its toxic elements, out of the way.

It's important to remember that these safe spaces, as useful as they may be, are not a substitute for rescue workers coming to aid the people trapped in the building. The safe rooms are only there to protect people who can't help themselves until help can come. Because, after all, what we build is only important because of the people who use it.

Thanks to Ernie Miller for a great talk! The video is also available on YouTube.


published by noreply@blogger.com (Kamil Ciemniewski) on 2016-04-12 11:16:00 in the "classifiers" category

Previous in series:

In my last article I presented an approach that simplifies computations of very complex probability models. It makes these complex models viable by shrinking the amount of needed memory and improving the speed of computing probabilities. The approach we were exploring is called the Naive Bayes model.

The context was an e-commerce feature in which a user is presented with a promotion box. The box shows the product category the user is most likely to buy.

Though the results we got were quite good, I promised to present an approach that gives much better ones. While the Naive Bayes approach may not be acceptable in some scenarios due to the gap between approximated and real values, the approach presented in this article will make this distance much, much smaller.

Naive Bayes as a simple Bayesian Network

When exploring the Naive Bayes model, we said that there is a probabilistic assumption the model makes in order to simplify the computations. In the last article I wrote:

"

The Naive Bayes assumption says that the distribution factorizes the way we did it only if the features are conditionally independent given the category.

"

Expressing variable dependencies as a graph

Let's imagine a visual representation of the relations between the random variables in the Naive Bayes model. Let's make it a directed acyclic graph, and mark the dependence of one variable on another as an edge from the parent node to its dependent node.

Because of the assumption the Naive Bayes model enforces, its structure as a graph looks like the following:

You can notice there are no edges between the "evidence" nodes. The assumption says that, knowing the category, we have all the knowledge we need about every single evidence node. This makes category the parent of all the other nodes. Intuitively, we can say that knowing the class (in this example, the category) we know everything about all the features. It's easy to notice that this assumption doesn't hold in this example.

In our fake data generator, we made it so that e.g. relationship status depends on age. We've also made the category depend on sex and age directly. This way we can't say that knowing the category we know everything about e.g. age. The random variables age and sex are not independent even if we know the value of category. It is clear that the above graph does not model the dependency relationships between these random variables.

Let's draw a graph that represents our fake data model better:

The combination of a graph like the one above and a probability distribution that follows the independencies it describes is known as a Bayesian Network.

Using the graph representation in practice - the chain rule for Bayesian Networks

The fact that our distribution is part of a Bayesian Network allows us to use a formula that simplifies the distribution itself. The formula is called the chain rule for Bayesian Networks, and for our particular example it looks like the following:

p(cat, sex, age, rel, loc) = p(sex) * p(age) * p(loc) * p(rel | age) * p(cat | sex, age)

You can notice that the equation is just a product of a number of factors, one for each random variable. The factors for variables that have no parents in the graph are expressed as p(var), while those that do are expressed as p(var | par) or p(var | par1, par2, ...).

Notice that the Naive Bayes model fits perfectly into this equation. If you were to take the first graph presented in this article (the Naive Bayes one) and use the above equation, you'd get exactly the formula we used in the last article.
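To make the chain rule a bit more concrete, here is a small, hypothetical Ruby sketch (not part of the article's code) that describes the graph as a hash mapping each node to its parents and reads the factorization off of it. Nodes with no parents contribute p(var); nodes with parents contribute p(var | parents):

# A hypothetical sketch: the graph above expressed as node => [parents].
graph = {
  sex:      [],
  age:      [],
  location: [],
  relation: [:age],
  category: [:sex, :age]
}

# The chain rule for Bayesian Networks: one factor per node,
# conditioned on that node's parents.
factors = graph.map do |node, parents|
  parents.empty? ? "p(#{node})" : "p(#{node} | #{parents.join(', ')})"
end

puts factors.join(" * ")
# => p(sex) * p(age) * p(location) * p(relation | age) * p(category | sex, age)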

Coding the updated probabilistic model

Before going further, I strongly advise you to make sure you read the previous article - about the Naive Bayes model - to fully understand the classes used in the code in this section.
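In case you don't have that article at hand, here is a minimal sketch of what the RandomVariable and Factor classes might look like, assuming only the interface used below (Factor.new with a list of variables, observe! with a full assignment, and value_for returning the relative frequency of an assignment). The real implementations from the previous article may differ in detail:

# A minimal, assumed sketch of the helper classes from the previous article.
class RandomVariable
  attr_reader :name, :values

  def initialize(name, values)
    @name   = name
    @values = values
  end
end

class Factor
  def initialize(variables)
    @variables = variables
    @counts    = Hash.new(0)
    @total     = 0
  end

  # Record one observation, e.g. factor.observe! age: :teens, sex: :male
  def observe!(assignment)
    @counts[key_for(assignment)] += 1
    @total += 1
  end

  # Relative frequency of the assignment: an estimate of its probability.
  def value_for(assignment)
    return 0.0 if @total.zero?
    @counts[key_for(assignment)] / @total.to_f
  end

  private

  def key_for(assignment)
    @variables.map { |var| assignment.fetch(var.name) }
  end
end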

Let's take our chain rule equation and simplify it:

p(cat, sex, age, rel, loc) = p(sex) * p(age) * p(loc) * p(rel | age) * p(cat | sex, age)

Again, a conditional distribution can be expressed as:

p(a | b) = p(a, b) / p(b)

This gives us:

p(cat, sex, age, rel, loc) = p(sex) * p(age) * p(loc) * (p(rel, age)/ p(age)) * (p(cat, sex, age) / p(sex, age))

The standalone p(age) factor cancels with the denominator of p(rel, age) / p(age), leaving:

p(cat, sex, age, rel, loc) = p(sex) * p(loc) * p(rel, age) * (p(cat, sex, age) / p(sex, age))

Let's define needed random variables and factors:

category = RandomVariable.new :category, [ :veggies, :snacks, :meat, :drinks, :beauty, :magazines ]
age      = RandomVariable.new :age,      [ :teens, :young_adults, :adults, :elders ]
sex      = RandomVariable.new :sex,      [ :male, :female ]
relation = RandomVariable.new :relation, [ :single, :in_relationship ]
location = RandomVariable.new :location, [ :us, :canada, :europe, :asia ]

loc_dist     = Factor.new [ location ]
sex_dist     = Factor.new [ sex ]
rel_age_dist = Factor.new [ relation, age ]
cat_age_sex_dist = Factor.new [ category, age, sex ]
age_sex_dist = Factor.new [ age, sex ]

full_dist = Factor.new [ category, age, sex, relation, location ]

The learning part is as trivial as in the Naive Bayes case. The only difference is the set of distributions involved:

Model.generate(1000).each do |user|
  user.baskets.each do |basket|
    basket.line_items.each do |item|
      loc_dist.observe! location: user.location
      sex_dist.observe! sex: user.sex
      rel_age_dist.observe! relation: user.relationship, age: user.age
      cat_age_sex_dist.observe! category: item.category, age: user.age, sex: user.sex
      age_sex_dist.observe! age: user.age, sex: user.sex
      full_dist.observe! category: item.category, age: user.age, sex: user.sex,
        relation: user.relationship, location: user.location
    end
  end
end

The inference part is also very similar to the one from the previous article. Here, too, the only difference is the set of distributions involved:

infer = -> (age, sex, rel, loc) do
  all = category.values.map do |cat|
    pl  = loc_dist.value_for location: loc
    ps  = sex_dist.value_for sex: sex
    pra = rel_age_dist.value_for relation: rel, age: age
    pcas = cat_age_sex_dist.value_for category: cat, age: age, sex: sex
    pas = age_sex_dist.value_for age: age, sex: sex
    { category: cat, value: (pl * ps * pra * pcas) / pas }
  end

  all_full = category.values.map do |cat|
    val = full_dist.value_for category: cat, age: age, sex: sex,
            relation: rel, location: loc
    { category: cat, value: val }
  end

  win      = all.max      { |a, b| a[:value] <=> b[:value] }
  win_full = all_full.max { |a, b| a[:value] <=> b[:value] }

  puts "Best match for #{[ age, sex, rel, loc ]}:"
  puts "   #{win[:category]} => #{win[:value]}"
  puts "Full pointed at:"
  puts "   #{win_full[:category]} => #{win_full[:value]}nn"
end

The results

Now let's run the inference procedure with the same set of examples as in the previous post to compare the results:

infer.call :teens, :male, :single, :us
infer.call :young_adults, :male, :single, :asia
infer.call :adults, :female, :in_relationship, :europe
infer.call :elders, :female, :in_relationship, :canada

Which yields:

Best match for [:teens, :male, :single, :us]:
   snacks => 0.020610837341908994
Full pointed at:
   snacks => 0.02103999999999992

Best match for [:young_adults, :male, :single, :asia]:
   meat => 0.001801062449999991
Full pointed at:
   meat => 0.0010700000000000121

Best match for [:adults, :female, :in_relationship, :europe]:
   beauty => 0.0007693377820183494
Full pointed at:
   beauty => 0.0008300000000000074

Best match for [:elders, :female, :in_relationship, :canada]:
   veggies => 0.0024346445741176875
Full pointed at:
   veggies => 0.0034199999999999886

Just as with the Naive Bayes model, we got the correct category in all cases. When you look closer, though, you can notice that the resulting probability values are much closer to the ones from the original, full distribution. With the approach we took here, the values differ only by a couple of parts in ten thousand. That difference could matter for the e-commerce shop from the example if it were visited by millions of customers each month.


published by noreply@blogger.com (Phin Jensen) on 2016-04-09 02:14:00 in the "Conference" category

On March 21 and 22, I had the opportunity to attend the 10th and final MountainWest RubyConf at the Rose Wagner Performing Arts Center in Salt Lake City.

One talk that I really enjoyed was Writing a Test Framework from Scratch by Ryan Davis, author of MiniTest. His goal was to teach the audience how MiniTest was created, by explaining the what, why and how of decisions made throughout the process. I learned a lot from the talk and took plenty of notes, so I'd like to share some of that.

The first thing a test framework needs is an assert function, which will simply check if some value or comparison is true. If it is, great, the test passed! If not, the test failed and an exception should be raised. Here is our first assert definition:

def assert test
  raise "Failed test" unless test
end

This function is the bare minimum you need to test an application; however, it won't be easy or enjoyable to use. The first step to improving it is to make error messages clearer. This is what the current assert function will return for an error:

path/to/microtest.rb:2:in `assert': Failed test (RuntimeError)
        from test.rb:5:in `<main>'

To make this more readable, we can change the raise statement a bit:

def assert test
  raise RuntimeError, "Failed test", caller unless test
end

A failed assert will now throw this error, which does a better job of explaining where things went wrong:

test.rb:5:in `<main>': Failed test (RuntimeError)

Now we're ready to create another assertion function, assert_equal. A test framework can have many different types of assertions, but when testing real applications, the vast majority will be tests for equality. Writing this assertion is easy:

def assert_equal a, b
  assert a == b
end

assert_equal 4, 2+2 # this will pass
assert_equal 5, 2+2 # this will raise an error

Great, right? Wrong! Unfortunately, the error messages have gone right back to being unhelpful:

path/to/microtest.rb:6:in `assert_equal': Failed test (RuntimeError)
        from test.rb:9:in `<main>'

There are a couple of things we can do to improve these error messages. First, we can filter the backtrace to make it more clear where the error is coming from. Second, we can add a parameter to assert which will take a custom message.

def assert test, msg = "Failed test"
  unless test then
    bt = caller.drop_while { |s| s =~ /#{__FILE__}/ }
    raise RuntimeError, msg, bt
  end
end

def assert_equal a, b
  assert a == b, "Failed assert_equal #{a} vs #{b}"
end

#=> test.rb:9:in `<main>': Failed assert_equal 5 vs 4 (RuntimeError)

This is much better! We're ready to move on to another assert function, assert_in_delta. Because of the way floating point numbers are represented, comparing them for equality won't work. Instead, we will check that they are within a certain range of each other. We can do this with a simple calculation: (a-b).abs <= ε, where ε is a very small number, like 0.001 (in reality, you will probably want a smaller delta than that). Here's the function in Ruby:

def assert_in_delta a, b
  assert (a-b).abs <= 0.001, "Failed assert_in_delta #{a} vs #{b}"
end

assert_in_delta 0.0001, 0.0002 # pass
assert_in_delta 0.5000, 0.6000 # raise

We now have a solid base for our test framework. We have a few assertions and the ability to easily write more. Our next logical step would be to make a way to put our assertions into separate tests. Organizing these assertions allows us to refactor more easily, reuse code more effectively, avoid problems with conflicting tests, and run multiple tests at once.

To do this, we will wrap our assertions into functions and those functions into classes, giving us two layers of compartmentalization.

class XTest
  def first_test
    a = 1
    assert_equal 1, a # passes
  end

  def second_test
    a = 1
    a += 1
    assert_equal 2, a # passes
  end

  def third_test
    a = 1
    assert_equal 1, a # passes
  end
end

That adds some structure, but how do we run the tests now? It's not pretty:

XTest.new.first_test
XTest.new.second_test
XTest.new.third_test

Each test function needs to be called specifically, by name, which will become very tedious once there are 5, or 10, or 1000 tests. This is obviously not the best way to run tests. Ideally, the tests would run themselves, and to do that we'll start by adding a method to run our tests to the class:

class XTest
  def run name
    send name
  end

  # ...test methods...
end

XTest.new.run :first_test
XTest.new.run :second_test
XTest.new.run :third_test

This is still very cumbersome, but it puts us in a better position, closer to our goal of automation. Using Class.public_instance_methods, we can find which methods are tests:

XTest.public_instance_methods
# => %w[some_method one_test two_test ...]

XTest.public_instance_methods.grep(/_test$/)
# => %w[one_test two_test red_test blue_test]

And run those automatically.

class XTest
  def self.run
    public_instance_methods.grep(/_test$/).each do |name|
      self.new.run name
    end
  end
  # def run...
  # ...test methods...
end

XTest.run # => All tests run

This is much better now, but we can still improve our code. If we try to make a new set of tests, called YTest for example, we would have to copy these run methods over. It would be better to move the run methods into a new abstract class, Test, and inherit from that.

class Test
  # ...run & assertions...
end 

class XTest < Test
  # ...test methods...
end 

XTest.run

This improves our code structure significantly. However, when we have multiple classes, we get that same tedious repetition:

XTest.run
YTest.run
ZTest.run # ...ugh

To solve this, we can have the Test class create a list of classes which inherit it. Then we can write a method in Test which will run all of those classes.

class Test
  TESTS = []

  def self.inherited x
    TESTS << x
  end 

  def self.run_all_tests
    TESTS.each do |klass|
      klass.run
    end 
  end 
  # ...self.run, run, and assertions...
end 

Test.run_all_tests # => We can use this instead of XTest.run; YTest.run; etc.

We're really making progress now. The most important feature our framework is still missing is some way of reporting test success and failure. A common way to do this is to simply print a dot when a test runs successfully.

def self.run_all_tests
  TESTS.each do |klass|
    klass.run
  end
  puts
end 

def self.run
  public_instance_methods.grep(/_test$/).each do |name|
    self.new.run name 
    print "."
  end 
end

Now, when we run the tests, it will look something like this:

% ruby test.rb
...

Indicating that we had three successful tests. But what happens if a test fails?

% ruby test.rb
.test.rb:20:in `test_assert_equal_bad': Failed assert_equal 5 vs 4 (RuntimeError)
  [...tons of blah blah...]
  from test.rb:30:in `<main>'

The very first error we come across will stop the entire test. Instead of the error being printed naturally, we can catch it and print the error message ourselves, letting other tests continue:

def self.run
  public_instance_methods.grep(/_test$/).each do |name|
    begin
      self.new.run name
      print "."
    rescue => e
      puts
      puts "Failure: #{self}##{name}: #{e.message}"
      puts "  #{e.backtrace.first}"
    end 
  end 
end

# Output

% ruby test.rb
.
Failure: Class#test_assert_equal_bad: Failed assert_equal 5 vs 4  
  test.rb:20:in `test_assert_equal'
.

That's better, but it's still ugly. We have failures interrupting the visual flow and getting in the way. We can improve on this. First, we should reexamine our code and try to organize it more sensibly.

def self.run
  public_instance_methods.grep(/_test$/).each do |name|
    begin
      self.new.run name
      print "."
    rescue => e
      puts
      puts "Failure: #{self}##{name}: #{e.message}"
      puts "  #{e.backtrace.first}"
    end
  end
end

Currently, this one function is doing 4 things:

  1. Line 2 is selecting and filtering tests.
  2. The begin clause is handling errors.
  3. `self.new.run name` runs the tests.
  4. The various puts and print statements print results.

This is too many responsibilities for one function. Test.run_all_tests should simply run classes, Test.run should run multiple tests, Test#run should run a single test, and result reporting should be done by... something else. We'll get back to that. The first thing we can do to improve this organization is to push the exception handling into the individual test running method.

class Test
  def run name
    send name
    false
  rescue => e
    e
  end

  def self.run
    public_instance_methods.grep(/_test$/).each do |name|
      e = self.new.run name
      
      unless e then
        print "."
      else
        puts
        puts "Failure: #{self}##{name}: #{e.message}"
        puts " #{e.backtrace.first}"
      end
    end
  end
end

This is a little better, but Test.run is still handling all the result reporting. To improve on that, we can move the reporting into another function, or better yet, its own class.

class Reporter
  def report e, name
    unless e then
      print "."
    else
      puts
      puts "Failure: #{self}##{name}: #{e.message}"
      puts " #{e.backtrace.first}"
    end
  end

  def done
    puts
  end
end

class Test
  def self.run_all_tests
    reporter = Reporter.new

    TESTS.each do |klass|
      klass.run reporter
    end
   
    reporter.done
  end
 
  def self.run reporter
    public_instance_methods.grep(/_test$/).each do |name|
      e = self.new.run name
      reporter.report e, name
    end
  end

  # ...
end

By creating this Reporter class, we move all IO out of the Test class. This is a big improvement, but there's a problem with this class. It takes too many arguments to get the information it needs, and it's not even getting everything it should have! See what happens when we run tests with Reporter:

.
Failure: ##test_assert_bad:
Failed test
 test.rb:9:in `test_assert_bad'
.
Failure: ##test_assert_equal_bad: Failed
assert_equal 5 vs 4
 test.rb:17:in `test_assert_equal_bad'
.
Failure: ##test_assert_in_delta_bad: Failed
assert_in_delta 0.5 vs 0.6
 test.rb:25:in `test_assert_in_delta_bad'

Instead of reporting what class has the failing test, it's saying what reporter object is running it! The quickest way to fix this would be to simply add another argument to the report function, but that just creates a more tangled architecture. It would be better to make report take a single argument that contains all the information about the error. The first step to do this is to move the error object into a Test class attribute:

class Test
  # ...
  attr_accessor :failure
  
  def initialize
    self.failure = false
  end

  def run name
    send name
    false
  rescue => e
    self.failure = e
    self
  end
end

After moving the failure, we're ready to get rid of the name parameter. We can do this by adding a name attribute to the Test class, like we did with the failure attribute:

class Test
  attr_accessor :name
  attr_accessor :failure
  def initialize name
    self.name = name
    self.failure = false
  end

  def self.run reporter
    public_instance_methods.grep(/_test$/).each do |name|
      e = self.new(name).run
      reporter.report e
    end
  end
  # ...
end

This new way of calling the Test#run method requires us to change that a little bit:

class Test
  def run
    send name
    false
  rescue => e
    self.failure = e
    self
  end
end

We can now make our Reporter class work with a single argument:

class Reporter
  def report e
    unless e then
      print "."
    else
      puts
      puts "Failure: #{e.class}##{e.name}: #{e.failure.message}"
      puts " #{e.failure.backtrace.first}"
    end
  end
end

We now have a much better Reporter class, and we can now turn our attention to a new problem in Test#run: it can return two completely different things: false for a successful test and a Test object for a failure. Tests know if they fail, so we can know when they succeed without that false value.

class Test
  # ...
  attr_accessor :failure
  alias failure? failure
  # ...
  
  def run
    send name
  rescue => e
    self.failure = e
  ensure
    return self
  end
end

class Reporter
  def report e
    unless e.failure? then
      print "."
    else
      # ...
    end
  end
end

It would now be more appropriate for the argument to Reporter#report to be named result instead of e.

class Reporter
  def report result
    unless result.failure? then
      print "."
    else
      failure = result.failure
      puts
      puts "Failure: #{result.class}##{result.name}: #{failure.message}"
      puts " #{failure.backtrace.first}"
    end
  end
end

Now, we have one more step to improve reporting. As of right now, errors will be printed with the dots. This can make it difficult to get an overview of how many tests passed or failed. To fix this, we can move failure printing and progress reporting into two different sections. One will be an overview made up of dots and "F"s, and the other a detailed summary, for example:

...F..F..F

Failure: TestClass#test_method1: failure message 1
 test.rb:1:in `test_method1'

Failure: TestClass#test_method2: failure message 2
 test.rb:5:in `test_method2'

... and so on ...

To get this kind of output, we can store failures while running tests and modify the done function to print them at the end of the tests.

class Reporter
  attr_accessor :failures
  def initialize
    self.failures = []
  end

  def report result
    unless result.failure? then
      print "."
    else
      print "F"
      failures << result
    end
  end

  def done
    puts

    failures.each do |result|
      failure = result.failure
      puts
      puts "Failure: #{result.class}##{result.name}: #{failure.message}"
      puts " #{failure.backtrace.first}"
    end
  end
end

One last bit of polishing on the reporter class. We'll rename the report method to << and the done method to summary.

class Reporter
  # ...
  def << result
    # ...
  end

  def summary
    # ...
  end
end

class Test
  def self.run_all_tests
    # ...
    reporter.summary
  end

  def self.run reporter
    public_instance_methods.grep(/_test$/).each do |name|
      reporter << self.new(name).run
    end
  end
end

We're almost done now! We've got one more step. Tests should be able to run in any order, so we want to make them run in a random order every time. This is as simple as adding `.shuffle` to our Test.run function, but we'll make it a little more readable by moving the public_instance_methods.grep statement into a new function:

class Test
  def self.test_names
    public_instance_methods.grep(/_test$/)
  end
  
  def self.run reporter
    test_names.shuffle.each do |name|
      reporter << self.new(name).run
    end
  end
end

And we're done! This may not be the most feature-rich test framework, but it's simple, small, and well written, and it gives us a base that is easy to extend and build on. The entire framework is only about 70 lines of code.
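As a quick, hypothetical usage example (the test class and method names below are made up), assuming all of the snippets above are combined into the microtest.rb file seen in the earlier backtraces, a test run could look like this:

require_relative "microtest"

# Test methods must end in "_test" so that Test.test_names picks them up.
class MathTest < Test
  def addition_test
    assert_equal 4, 2 + 2            # passes
  end

  def float_test
    assert_in_delta 0.3, 0.1 + 0.2   # passes: the difference is within 0.001
  end

  def failing_test
    assert false, "this failure shows up in the summary"
  end
end

Test.run_all_tests
# Prints something like "..F" (in a random order), followed by a summary
# of the single failure.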

Thanks to Ryan Davis for an excellent talk! Also check out the code and slides from the talk.


published by Eugenia on 2016-04-09 00:15:58 in the "General" category
Eugenia Loli-Queru

For the kind of illustration I’m interested in, the style requires some very smooth, matte, single-color backgrounds. Traditionally with watercolor people would do large washes of 2 to 3 colors (e.g. for a sky), but for the kind of illustration I do, which has a lot of details, traditional washes are not the way to go. I could not find a single article or YouTube video that shows how to paint large, non-square areas in a smooth, matte way, so after a lot of tries, I arrived at this technique:

– Get some paint on a plastic palette. About the size of a raisin for a small area.
– On a separate palette well, add three times as much water as that raisin-sized amount of paint.
– Use a size 8 “pointed-round” soft brush (Kolinsky sounds good).
– Mix the paint with some Titanium White.
– With the tip of the brush, get some paint (just a little bit, maybe about 1/6th of it), and mix it well with the water. It will create a very pale color, but it will still have a color.
– Strain away as much water as possible from the brush. It should not be full of water when you lay it on paper.
– Start laying the pale color on your paper. Use as large brush strokes as possible, and move the pools of paint towards a single direction.
– Let it dry for a minute or so.
– Add 2/6ths of the paint (basically, twice as much as before) to a bit more water than before (about 1.5 times as much).
– Mix well, strain the brush, and paint over, the same way as before.
– Let it dry for 3 minutes or so.
– Add the rest of the paint to about twice as much water as in the beginning, strain the brush, and paint over again. The consistency should be that of melted ice cream.
– Let it dry for 5 minutes before you decide whether you need yet another coat on top or can start adding details.

That’s it. Basically, you need multiple layers to get a smooth, matte finish.


My illustration “Divorce Papers”

Another way to do it with gouache is to lay gesso+medium on the paper before painting, just as if you were using acrylics. The 2-3 coats of gesso would then serve the same purpose as the multiple coats of paint. Personally, I prefer the first method.