27/05/2010

The future can be written in RPython now | Pyevolve

Following the recent article arguing why PyPy is the future of Python, I must say: PyPy is not the future of Python, it is the present. When I last tested it (PyPy-c 1.1.0) with Pyevolve on the optimization of a simple sphere function, it was at least 2x slower than Unladen Swallow Q2, but at that time PyPy could not yet JIT. Now, with this new release of PyPy and its JIT support, the scenario has changed.


The future can be written in RPython now | Pyevolve

26/05/2010

voltdb = redis + sql interface? interesting:

The Fast, Scalable Open-Source DBMS You'll Never Outgrow

Created by DBMS R&D pioneer, Mike Stonebraker, VoltDB is a next-generation open-source DBMS that scales way beyond traditional databases, without sacrificing SQL or ACID for transactional data integrity. VoltDB is for database applications that support fast-growing transactional workloads and require:

  • Orders of magnitude better performance than conventional DBMS
  • Linear scalability
  • SQL as the DBMS interface
  • ACID transactions to ensure data consistency and integrity
  • High availability 24x7x365


#igrigorik voltdb = redis + sql interface? interesting: http://bit.ly/al9XiF
Official link http://voltdb.com/

Java, JEE, JavaFx and more: A graphical counter on GAEJ (Google App Engine for Java) using Images API Service

Images Service on GAEJ provides the ability to manipulate images; in particular, you can composite multiple images into a single one. I'll use this capability to display a graphical hit counter. This tutorial is only a kind of how-to, but I'm sure you can write real programs using the instructions given in this post. For simplicity, error handling is reduced to the minimum.

The idea is pretty simple, persist a counter using Memcache or DataStore, have digits from 0 to 9 as PNG images, read images as bytes, make images and composites of these images, put all composites in a List and finally use this List to get the composed image.



Java, JEE, JavaFx and more: A graphical counter on GAEJ (Google App Engine for Java) using Images API Service

24/05/2010

Search Results: All on USTREAM, Most Views listings, All entries, page 1 of 1, 05/24/10.





Search Results: All on USTREAM, Most Views listings, All entries, page 1 of 1, 05/24/10.

First Look: H.264 and VP8 Compared - StreamingMedia.com

VP8 is now free, but if the quality is substandard, who cares? Well, it turns out that the quality isn't substandard, so that's not an issue, but neither is it twice the quality of H.264 at half the bandwidth. See for yourself, below.

To set the table, Sorenson Media was kind enough to encode these comparison files for me to both H.264 and VP8 using their Squish encoding tool. They encoded a standard SD encoding test file that I've been using for years. I'll do more testing once I have access to a VP8 encoder, but wanted to share these quick and dirty results.



First Look: H.264 and VP8 Compared - StreamingMedia.com

Alex Gaynor -- PyPy is the Future of Python

Currently the most common implementation of Python is known as CPython; it's the version of Python you get at python.org, and probably 99.9% of Python developers are using it. However, I think over the next couple of years we're going to see a move away from this towards PyPy, Python written in Python. This is going to happen because PyPy offers better speed, more flexibility, and is a better platform for Python's growth, and the most important thing is you can make this transition happen.

The first thing to consider: speed. PyPy is a lot faster than CPython for a lot of tasks, and they've got the benchmarks to prove it. There's room for improvement, but it's clear that for a lot of benchmarks PyPy screams, and it's not just number crunching (although PyPy is good at that too). Although Python performance might not be a bottleneck for a lot of us (especially us web developers who like to push performance down the stack to our database), would you say no to having your code run 2x faster?

The next factor is flexibility. By writing their interpreter in RPython, PyPy can automatically generate C code (like CPython), but also JVM and .NET versions of the interpreter. Instead of writing entirely separate Jython and IronPython implementations of Python, just automatically generate them from one shared codebase. PyPy can also have its binary generated with a stackless option, just like Stackless Python; again, no separate implementations to maintain. Lastly, PyPy's JIT is almost totally separate from the interpreter, which means changes to the language itself can be made without needing to update the JIT; contrast this with many JITs that need to statically define fast-paths for various operations…


Alex Gaynor -- PyPy is the Future of Python

21/05/2010

Hosted SQL on App Engine For Business

Later:

Hosted SQL
Dedicated, full-featured SQL servers available for your application.
Status: In Development
Estimate: Limited Release in Q3 2010

Google Roadmap link : http://code.google.com/appengine/business/roadmap.html

20/05/2010

Google and SpringSource join hands in the heavens

Google I/O Google and VMware's SpringSource arm have teamed up to offer a series of development tools for building Java apps that can be deployed across multiple web-based hosting services. That includes Google's own App Engine, VMware-happy infrastructure services, and third-party services such as Amazon's Elastic Compute Cloud.

http://www.theregister.co.uk/2010/05/19/google_teams_with_springsource/

Google Launches Business Version Of App Engine; Collaborates With VMware

It’s no secret that Google has been ramping up its enterprise offerings. The company has made a strong push for the adoption of Google Apps, launching the Apps Marketplace, allowing Apps users to add other layers to their environments from companies like Socialwok and Zoho. Today, Google is taking it one step further. At Google I/O today, the search giant has announced that Google App Engine, a platform for building and hosting web applications in the cloud, will now include a Business version, catered towards enterprises. The new premium version allows customers to build their own business apps on Google’s cloud infrastructure. Google is also announcing a collaboration with VMware for deployment and development of apps on the new cloud infrastructure.

Google Launches Business Version Of App Engine; Collaborates With VMware

Scalable Work Queues with Beanstalk

Any web application that reaches some critical mass eventually discovers that separation of services, where possible, is a great strategy for scaling the service. In fact, oftentimes a user action can be offloaded into a background task, which can be handled asynchronously while the user continues to explore the site. However, coordinating this workflow does require some infrastructure: a message queue, or a work queue. The distinction between the two is subtle and blurry, but it does carry important architectural implications. Should you pick a messaging bus such as AMQP or XMPP, roll your own database-backed system such as BJ, or go with Resque…


http://www.igvita.com/2010/05/20/scalable-work-queues-with-beanstalk/

16/05/2010

Pycon4: from Simone Deponti's talk

"Crogioli, alambicchi e beute dove mettere i vostri dati" ("Crucibles, alembics and beakers: where to put your data")

It covers the SQLAlchemy ORM and ZODB: how to manage your data
and which choices to make depending on your needs.

XtraDB / InnoDB internals in drawing

Source http://www.mysqlperformanceblog.com/ , Posted by Vadim


I did a drawing exercise and put the XtraDB / InnoDB internals into a Visio diagram:

The XtraDB differences and main parameters are marked out.

The PDF version is here: http://www.percona.com/docs/wiki/percona-xtradb

14/05/2010

I wish my company worked like a terrorist organization

An interesting idea in this time of crisis, useful (in my opinion) also for letting individuals be more creative and independent, perhaps by creating cells that self-organize
around the projects to be tackled.

Vorrei che la mia azienda funzionasse come un’organizzazione terroristica… « Meeting delle Idee - http://ow.ly/1KXOE

13/05/2010

Richard Stallman arrives in the Marche: two meetings in Ancona

Richard Stallman arrives in the Marche, two meetings in Ancona: "That's right, the darling of the worldwide Open Source movement is also coming to the Marche, and specifically to Ancona, for no fewer than two meetings:

- Thursday 13 May, 5:00 PM, at the IT department (assessorato all'informatizzazione) of the Comune di Ancona
- Friday 14 May, 10:30 AM, in room A7/8 of the Faculty of Engineering, Università Politecnica delle Marche

"

10/05/2010

Amazon Web Services sign-up tutorial slide

Pixar | American Mathematical Society

Moving Remy in Harmony: Pixar's Use of Harmonic Functions

This article will describe some new mathematical techniques being tested at Pixar for use in upcoming films...


American Mathematical Society

09/05/2010

Beyond Pycon: Pybirra!!!!!!!!

http://twitpic.com/1m6ofi

Pycon 2010 Effective EC2 talk

Pycon 2010, just to give you an idea: two days of talks all about innovation. A talk I liked:

http://www.pycon.it/conference/talks/effective-ec2
Summary: a real start-up, "http://www.adroll.com/"
Here are the numbers:
- more than 300 active instances (virtual servers), spun up in 15 minutes
- DNS caching, proxy requesting, automatic scale-up/down of instances
- automatic network remapping across hundreds of public IPs
- fault-tolerant storage over tens of terabytes
Pure science fiction in Italy :-(

08/05/2010

Comet web applications with Python, Django & Orbited

Comet web applications with Python, Django & Orbited

The temple of idle

The temple of idle

Yes, this incidentally means someone is masochistic enough to have decided to hire me.

Anyway, the company I work for (Abstract Open Solutions) is in the middle of switching to git, and I sent a mail to the whole list detailing my (brief) experience of converting from SVN while still using SVN as upstream (our git server, using gitorious, is still in the works).

Since it might be useful to someone else, I decided to post the mail I sent to the internal mailing list, without censoring it.

Pycon4: Python FUSE – Beyond the Traditional File-Systems, by Matteo Bertozzi

Nothing but praise for Matteo Bertozzi: incisive and clear, but above all complete.
What more could you ask of a 60-minute talk on FUSE?

http://www.pycon.it/conference/talks/python-fuse-beyond-the-traditional-file-systems

For the talk slides, with examples:
http://mbertozzi.develer.com/python-fuse

07/05/2010

lin win mac apps

Only 3 days left to pay what you want for the Humble Indie Bundle! http://j.mp/9WtdAm

Going global with Amazon #EC2 and DNS services: http://bit.ly/dx0Lfu #aws

Interesting read: going global with Amazon #EC2 and DNS services: http://bit.ly/dx0Lfu #aws

Cleaning the cruft away - rather luverly Dust-me add-on to Firefox http://ur1.ca/z97m - spider whole site and pinpoint redundant CSS.


06/05/2010

Google teaches: exploits and security

Google teaches: exploits and security: "Mountain View puts online a site full of bugs, with the goal of training developers and of making good, secure code a priority, without anyone getting hurt."

'Amazing Python' is now available without registration

'Amazing Python' is now available without registration: "Three weeks before going live with ThinkCode.TV, we decided to release a free screencast about solving ASCII mazes in Python. The 19 minute 'bite screencast' was made available to anyone who joined our newsletter, and a few hundred people took advantage of this opportunity.



Now, there is nothing wrong in giving away a freebie as an incentive for joining one's newsletter, and we may do so again in the future. However, we feel it's time to release Amazing Python for free, without the need to register.



We feel that this move fits well with our strategy to 'create more value than we capture'.
"

Future of RDBMS is RAM Clouds & SSD

Future of RDBMS is RAM Clouds & SSD: "

Rumors of the demise of relational database systems are greatly exaggerated. The NoSQL movement is increasingly capturing the mindshare of developers, all the while academia has been talking about the move away from 'RDBMS as one size fits all' for several years. However, while the new storage engines are exciting to see, it is also important to recognize that relational databases still have a bright future ahead - RDBMS systems are headed into main memory, which changes the playing field altogether.


Performance is only one aspect that influences the choice of a database. Tree and graph structures are not easy to model within a relational structure, which in turn leads to complicated schemas and system overhead. For that reason alone, document-stores (Tokyo, CouchDB, MongoDB), graph stores (Neo4J), and other alternative data structure databases (Redis) are finding fertile ground for adoption. However, the end of 'RDBMS as one size fits all' does not mean the end of relational systems altogether. It is too early to bury RDBMS in favor of No (or Less) SQL. We just need to reset how we think about the RDBMS.


Disks are the New Tape


The evolution of disks has been extremely uneven over the last 25 years: disk capacity has increased 1000x, data transfer speeds increased 50x, while seek and rotational delays have only gone up by a factor of 2. Hence, if we only needed to transfer several hundred kilobytes of data in the mid 80's to achieve good disk utilization, then today we need to read at least 10MB of data to amortize the costs of seeking the data - refresh your memory on seek, rotational, and transfer times of our rusty hard drives.


When the best we can hope for is 100-200 IOPS out of a modern hard drive, the trend towards significantly larger block sizes begins to make a lot more sense. Whereas your local filesystem is likely to use 4 or 8KB blocks, systems such as Google's GFS and Hadoop's HDFS are opting for 64MB+ blocks in order to amortize the cost of seeking for the data - by using much larger blocks, the cost of seeks and access time is once again brought down to single digit percent figures over the transfer time.
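To make the amortization concrete, here is a quick back-of-the-envelope sketch in Ruby (the 8 ms combined seek/rotational delay and 100 MB/s transfer rate are assumed figures for illustration, not measurements):

```ruby
# Fraction of total I/O time spent actually transferring data, for a
# given block size. Assumed figures: 8 ms seek + rotational delay,
# 100 MB/s sequential transfer rate.
SEEK_S       = 0.008
TRANSFER_BPS = 100 * 1024 * 1024

def disk_utilization(block_bytes)
  transfer_s = block_bytes.to_f / TRANSFER_BPS
  transfer_s / (SEEK_S + transfer_s)
end

puts "4 KB block:  %.1f%% transferring" % (disk_utilization(4 * 1024) * 100)
puts "64 MB block: %.1f%% transferring" % (disk_utilization(64 * 1024 * 1024) * 100)
```

With 4 KB blocks the drive spends well under 1% of its time transferring data; with 64 MB blocks the seek cost falls to a single-digit percentage of the transfer time, which is exactly the trade-off GFS and HDFS are making.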


Hence, as we generate and store more and more data, the role of the disks must inevitably become more archival. Batch processing systems such as Map-Reduce are well suited for this world and are quickly replacing the old business intelligence (BI) systems for exactly these reasons. In the meantime, the limitations imposed by the random access to disk mean that we need to reconsider the role of disk in our database systems.


OLTP is Headed Into Main Memory & Flash


An average random seek will take 5-10ms when hitting the physical disk and hundreds of microseconds for accessing data from cache. Compare that to a fixed cost of 5-10 microseconds for accessing data in RAM and the benefits of a 100-1000x speed difference can be transformative. Instead of treating memory as a cache, why not treat it as a primary data store? John Ousterhout and his co-authors outline a compelling argument for 'RAMCloud'. After all, if Facebook keeps over 80% of their data in memcached, and Google stores entire indexes of the web in memory many times over, then your average database-backed application should easily fit and be able to take advantage of the pure memory model also.


The moment all of the data is available in memory, it is an entirely new game: access time and seek times become irrelevant (no disk seeks), the value of optimizing for locality and access patterns is diminished by orders of magnitude, and in fact, entirely new and much richer query models can enable a new class of data-intensive applications. In a world where the developer's time is orders of magnitude more expensive than the hardware (a recent phenomenon), this also means faster iterations and less data-optimization overhead.


The downside to the RAMCloud is the equivalent order of magnitude increase in costs - RAM prices are dropping, but dollar for dollar, RAMCloud systems are still significantly more expensive. Flash storage is an obvious compromise for both speed and price. Theoretical access time for solid-state devices is on the order of 50 microseconds for reads, and 200 microseconds for writes. However, in reality, wrapping solid-state storage in SATA-like hardware devices brings us back to ~200 microseconds for reads, or ~5000 IOPS. Though, of course, innovation continues and devices such as FusionIO’s PCI-E flash storage controller bring us back to 80 microsecond reads at a cost of ~$11 per Gigabyte.


However, even the significantly higher hardware price point is often quickly offset once you factor in the saved developer time and adjacent benefits such as guaranteed performance independent of access patterns or data locality. Database servers with 32GB and 64GB of RAM are no longer unusual, and when combined with SSDs, such as the system deployed at SmugMug, often offer a much easier upgrade path than switching your underlying database system to a NoSQL alternative.


Database Architecture for the RAMCloud


Migrating your data into RAM or Flash yields significant improvements via the pure speedup in hardware; however, the 'it is time for a complete rewrite' argument still holds: the majority of existing database systems are built with implicit assumptions of disk-backed storage. These architectures optimize for disk-based indexing structures, and have to rely on multithreading and locking-based concurrency to hide the latency of the underlying storage.


When access time is measured in microseconds, optimistic and lock-free concurrency is fair game, which leads to much better multi-core performance and allows us to drop thousands of lines of code for multi-threaded data structures (concurrent B-Trees, etc). RethinkDB is a drop-in MySQL engine designed for SSD drives leveraging exactly these trends, and Drizzle is a larger fork of the entire MySQL codebase aimed at optimizing the relational model for 'cloud and net applications': massively distributed, lightweight kernel and extensible.


Migrating Into Main Memory


Best of all, you can start leveraging the benefits of storing your data in main memory even with the existing MySQL databases - most of them are small enough to make the memory buffers nothing but a leaky abstraction. Enable periodic flush to disk for InnoDB (innodb_flush_log_at_trx_commit=2), and create covering indexes for your data (a covering index is an index which itself contains all the required data to answer the query). Issue a couple of warm-up requests to load the data into memory and you are off to the races.
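As a sketch of those steps (the table, column, and index names here are hypothetical, chosen only for illustration; innodb_flush_log_at_trx_commit is the actual MySQL setting named above):

```sql
-- my.cnf / [mysqld]: flush the InnoDB log to disk once per second
-- instead of at every commit (trades up to ~1s of transactions on a
-- crash for much cheaper commits)
-- innodb_flush_log_at_trx_commit = 2

-- Covering index: a query that filters on last_name and reads only
-- (last_name, first_name) can be answered from the index alone.
CREATE INDEX idx_name ON users (last_name, first_name);

-- Warm-up query to pull the index pages into the buffer pool
SELECT last_name, first_name FROM users WHERE last_name LIKE 'a%';
```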


Of course, the above strategy is at best an intermediate solution, so investigate SSDs as a primary storage layer, and if you are adventurous, give RethinkDB a try. Also keep an eye on Drizzle, as the first production release is aimed for summer of 2010. Alternative data storage engines such as Redis, MongoDB and others are also worth looking into, but let us not forget: the laws of physics still apply to NoSQL. There is no magic there. Memory is fast, disks are slow. Nothing is stopping relational systems from taking advantage of main memory or SSD storage.




"

Ruby & WebSockets: TCP for the Browser

Ruby & WebSockets: TCP for the Browser: "

WebSockets are one of the most underappreciated innovations in HTML5. Unlike local storage, canvas, web workers, or even video playback, the benefits of the WebSocket API are not immediately apparent to the end user. In fact, over the course of the past decade we have invented a dozen technologies to solve the problem of asynchronous and bi-directional communication between the browser and the server: AJAX, Comet & HTTP Streaming, BOSH, ReverseHTTP, WebHooks & PubSubHubbub, and Flash sockets amongst many others. Having said that, it does not take much experience with any of the above to realize that each has a weak spot and none solve the fundamental problem: web-browsers of yesterday were not designed for bi-directional communication.


WebSockets in HTML5 change all of that as they were designed from the ground up to be data agnostic (binary or text) with support for full-duplex communication. WebSockets are TCP for the web-browser. Unlike BOSH or equivalents, they require only a single connection, which translates into much better resource utilization for both the server and the client. Likewise, WebSockets are proxy and firewall aware, can operate over SSL and leverage the HTTP channel to accomplish all of the above - your existing load balancers, proxies and routers will work just fine.


WebSockets in the Browser: Chrome, Firefox & Safari


The WebSocket API is still a draft, but the developers of our favorite browsers have already implemented much of the functionality. Chrome’s developer build (4.0.249.0) now officially supports the API and has it enabled by default. Webkit nightly builds also support WebSockets, and Firefox has an outstanding patch under review. In other words, while mainstream adoption is still on the horizon, as developers we can start thinking about much improved architectures that WebSockets enable. A minimal example with the help of jQuery:


> websocket.html


<html>
<head>
  <script src='http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js'></script>
  <script>
    $(document).ready(function() {
      function debug(str) { $("#debug").append("<p>" + str + "</p>"); };

      ws = new WebSocket("ws://yourservice.com/websocket");
      ws.onmessage = function(evt) { $("#msg").append("<p>" + evt.data + "</p>"); };
      ws.onclose = function() { debug("socket closed"); };
      ws.onopen = function() {
        debug("connected...");
        ws.send("hello server");
      };
    });
  </script>
</head>
<body>
  <div id="debug"></div>
  <div id="msg"></div>
</body>
</html>


The above example showcases the bi-directional nature of WebSockets: send pushes data to the server, and onmessage callback is invoked anytime the server pushes data to the client. No need for long-polling, HTTP header overhead, or juggling multiple connections. In fact, you could even deploy the WebSocket API today without waiting for the browser adoption by using a Flash socket as an intermediate step: web-socket-js.


Streaming Data to WebSocket Clients


WebSockets are not the same as raw TCP sockets and for a good reason. While it may seem tempting to be able to open a raw TCP connections from within the browser, the security of the browser would be immediately compromised: any website could then access the network on behalf of the user, within the same security context as the user. For example, a website could open a connection to a remote SMTP server and start delivering spam - a scary thought. Instead, WebSockets extend the HTTP protocol by defining a special handshake in order for the browser to establish a connection. In other words, it is an opt-in protocol which requires a standalone server.
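Concretely, under the early protocol drafts the handshake is ordinary HTTP with an Upgrade header; a rough sketch (field names varied between draft revisions, so treat this as illustrative):

```http
GET /websocket HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: yourservice.com
Origin: http://yourservice.com

HTTP/1.1 101 Web Socket Protocol Handshake
Upgrade: WebSocket
Connection: Upgrade
WebSocket-Origin: http://yourservice.com
WebSocket-Location: ws://yourservice.com/websocket
```

Only after this exchange does the connection switch to framed, bi-directional messaging, which is why a browser cannot be tricked into speaking raw SMTP to an arbitrary server.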



Nothing stops you from talking to an SMTP, AMQP, or any other server via the raw protocol, but you will have to introduce a WebSocket server in between to mediate the connection. Kaazing Gateway already provides adapters for STOMP and Apache ActiveMQ, and you could also implement your own JavaScript wrappers for others. And if a Java based WebSocket server is not for you, Ruby EventMachine also allows us to build a very simple event-driven WebSocket server in just a few lines of code:


> websocket.rb


require 'em-websocket'

EventMachine::WebSocket.start(:host => "0.0.0.0", :port => 8080) do |ws|
  ws.onopen    { ws.send "Hello Client!" }
  ws.onmessage { |msg| ws.send "Pong: #{msg}" }
  ws.onclose   { puts "WebSocket closed" }
end



Download


em-websocket (Ruby EventMachine WebSocket Server)




Consuming WebSocket Services


Support for WebSockets in Chrome and Safari also means that our mobile devices will soon support bi-directional push, which is both easier on the battery, and much more efficient for bandwidth consumption. However, WebSockets can also be utilized outside of the browser (ex: real-time data firehose), which means that a regular Ruby HTTP client should be able to handle WebSockets as well:


> em-http-websocket.rb


require 'eventmachine'
require 'em-http'

EventMachine.run {
  http = EventMachine::HttpRequest.new("ws://yourservice.com/websocket").get :timeout => 0

  http.errback { puts "oops" }
  http.callback {
    puts "WebSocket connected!"
    http.send("Hello client")
  }

  http.stream { |msg|
    puts "Received: #{msg}"
    http.send "Pong: #{msg}"
  }
}



Download


em-http-request (Asynchronous HTTP Client)




WebSocket support is still an experimental branch within em-http-request, but the aim is to provide a consistent and fully transparent API: simply specify a WebSocket resource and it will do the rest, just as if you were using a streaming HTTP connection! Best of all, HTTP & OAuth authentication, proxies and existing load balancers will all work and play nicely with this new delivery model.


WebHooks, PubSubHubbub, WebSockets, ...


Of course, WebSockets are not the panacea for every problem. WebHooks and PubSubHubbub are great protocols for intermittent push updates where a long-lived TCP connection may prove to be inefficient. Likewise, if you require non-trivial routing then AMQP is a powerful tool, and there is little reason to reinvent the powerful presence model built into XMPP. Right tool for the right job, but WebSockets are without a doubt a much-needed addition to every developer's toolkit.




"

Flow Analysis & Time-based Bloom Filters

Flow Analysis & Time-based Bloom Filters: "

Working with large streams of data is becoming increasingly widespread, be it for log, user behavior, or raw firehose analysis of user generated content. There is some very interesting academic literature on this type of data crunching, although much of it is focused on query or network packet analysis and is often not directly applicable to the type of data we have to deal with in the social web. For example, if you were tasked to build (a better) 'Trending Topics' algorithm for Twitter, how would you do it?


Of course, the challenge is that it has to be practical - it needs to be 'real-time' and be able to react to emerging trends in under a minute, all the while using a reasonable amount of CPU and memory. Now, we don't know how the actual system is implemented at Twitter, nor will we look at any specific solutions - I have some ideas, but I am more curious to hear how you would approach it. Instead, I want to revisit the concept of Bloom Filters, because as I am making my way through the literature, it is surprising how sparsely they are employed for these types of tasks. Specifically, a concept I have been thinking of prototyping for some time now: time-based, counting bloom filters!


Bloom Filters: What & Why


A Bloom Filter is a probabilistic data structure which can tell if an element is a member of a set. However, the reason it is interesting is because it accomplishes this task with an incredibly efficient use of memory: instead of storing a full hash map, it is simply a bit vector which guarantees that you may have some small fraction of false positives (the filter will report that a key is in the bloom filter when it is really not), but it will never report a false negative. File system and web caches frequently use bloom filters as the first query to avoid otherwise costly database or file system lookups. There is some math involved in determining the right parameters for your bloom filter, which you can read about in an earlier post.
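To make the structure concrete, here is a toy sketch (this is not the bloomfilter gem's implementation; the class and the CRC32-salting scheme are made up for illustration):

```ruby
require 'zlib'

# Toy bloom filter: an m-slot vector and k hash functions, where the k
# functions are derived by salting a single hash (CRC32) with an index.
class ToyBloom
  def initialize(bits, hashes)
    @bits, @hashes = bits, hashes
    @vector = Array.new(bits, false)
  end

  # The k positions in the vector that represent this key
  def positions(key)
    (1..@hashes).map { |i| Zlib.crc32("#{i}:#{key}") % @bits }
  end

  def insert(key)
    positions(key).each { |p| @vector[p] = true }
  end

  # May return a false positive (all positions set by other keys),
  # but never a false negative.
  def include?(key)
    positions(key).all? { |p| @vector[p] }
  end
end

bf = ToyBloom.new(1000, 4)
bf.insert("mykey")
```

The never-false-negative property follows because insertion sets every position a later lookup will check; the false-positive rate is governed by the ratio of set bits to vector size and the number of hash functions.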



Of course, as is, the Bloom Filter data structure is not very useful for analyzing continuous data streams - eventually we would fill up the filter and it would begin reporting false positives all the time. But, what if your bloom filter only remembered seen data for a fixed interval of time? Imagine adding a time-to-live (TTL) timestamp on each record. All of a sudden, if you knew the approximate number of messages for the interval of time you wanted to analyze, then a bloom filter is once again an incredibly fast and space-efficient (fixed memory footprint) data structure!


Time-based Bloom Filters


Arguably the key feature of bloom filters is their compact representation as a bit vector. By associating a timestamp with each record, the size of the filter immediately expands by an order of magnitude, but even with that, depending on the size of the time window you are analyzing, you could store the TTLs in just a few additional bits. Conversely, if counting bits is not mission critical, you could even use a backend such as Redis or Memcached to drive the filter as well. The direct benefit of such an approach is that the data can be shared by many distributed processes. On that note, I have added a prototype Redis backend to the bloomfilter gem which implements a time-based, counting Bloom Filter. Let's take a look at a simple example:


> chrono-bloom.rb


require 'bloomfilter'

options = {
  :size   => 100,       # size of bit vector
  :hashes => 4,         # number of hash functions
  :seed   => rand(100), # seed value for the filter
  :bucket => 3          # number of bits per bucket in the counting filter
}

# Regular, in-memory counting bloom filter
bf = BloomFilter.new(options)
bf.insert("mykey")
bf.include?("mykey") # => true
bf.include?("mykey1") # => false

#
# Redis-backed bloom filter, with optional time-based semantics
#
bf = BloomFilter.new(options.merge({:type => :redis, :ttl => 2, :server => {:host => 'localhost'}}))
bf.insert("mykey")
bf.include?("mykey") # => true
sleep(3)
bf.include?("mykey") # => false

# custom 5s TTL for a key
bf.insert("newkey", nil, 5)



Download


bloomfilter.git (Ruby+Redis counting Bloom Filter)




Storing data in Redis or Memcached is roughly an order of magnitude less efficient, but it gives us an easy to use, distributed, and fixed memory filter for analyzing continuous data streams. In other words, a useful tool for applications such as duplicate detection, trends analysis, and many others.


Mechanics of Time-Based Bloom Filters


So how does it work? Given the settings above, we create a fixed memory vector of 100 buckets (or bits in the raw C implementation). Then, for each key, we hash it 4 times with different key offsets and increment the counts in those buckets - a non-zero value indicates that one of the hash functions for some key has used that bucket. Then, for a lookup, we reverse the operation: generate the 4 different hash keys and look them up; if all of them are non-zero then either we have seen this key or there has been a collision (false positive). By optimizing the size of the bit vector we can control the false positive rate - you're always trading the amount of allocated memory against the collision rate. Finally, we also make use of the native expire functionality in Redis to guarantee that keys are only stored for a bounded amount of time.
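A minimal in-process sketch of these mechanics (hypothetical code, not the gem's Redis backend; the real prototype delegates expiry to Redis's native expire functionality, while here a timestamp is kept per bucket and the clock is passed in explicitly):

```ruby
require 'zlib'

# Toy time-based counting filter: each bucket holds a count and the
# latest expiry time; entries past their TTL no longer answer lookups.
# (Counts are never decremented in this sketch; expiry is handled
# purely by the timestamp check.)
class ToyChronoBloom
  Bucket = Struct.new(:count, :expires_at)

  def initialize(size, hashes, ttl)
    @size, @hashes, @ttl = size, hashes, ttl
    @buckets = Array.new(size) { Bucket.new(0, 0.0) }
  end

  def positions(key)
    (1..@hashes).map { |i| Zlib.crc32("#{i}:#{key}") % @size }
  end

  def insert(key, now = Time.now.to_f)
    positions(key).each do |p|
      b = @buckets[p]
      b.count += 1
      b.expires_at = [b.expires_at, now + @ttl].max
    end
  end

  def include?(key, now = Time.now.to_f)
    positions(key).all? { |p| b = @buckets[p]; b.count > 0 && b.expires_at > now }
  end
end

bf = ToyChronoBloom.new(100, 4, 2)  # 2 second TTL
bf.insert("mykey", 0.0)
```

Injecting the clock keeps the sketch deterministic: a key inserted at t=0 with a 2s TTL is visible at t=1 and gone by t=3.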


Time-based bloom filters have seen a few rogue mentions in the academic literature, but to the best of my knowledge have not seen wide application in the real world. However, it is an incredibly powerful data structure, and one that could benefit many modern, big-data applications. Install the bloomfilter gem and give it a try; perhaps it will help you build a better trends analysis tool. Speaking of which, what other tools, algorithms, or data structures would you use to build a 'Trending Topics' algorithm for a high-velocity stream?




"

VMware ESXi: backing up configurations

This is a simple procedure for backing up the configuration of a
VMWARE ESXI server.

The commands below are for Linux (N.B. for Windows, according to the docs, the syntax is the same).


Requirements:
To do all of this you need to download and install the "vSphere Command-Line Interface" package, which includes a set of Perl scripts for managing an ESX/ESXi server, among them the backup and restore procedures.

Backup procedure
vicfg-cfgbackup --server ip.del.server --username root --password 'password' -s filebackup

Restore procedure
vicfg-cfgbackup --server ip.del.server.nuovo --username root --password 'password' -l filebackup



Reference link
http://communities.vmware.com/community/vmtn/vsphere/automationtools/vsphere_cli

Doc
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_vcli.pdf

acangiano: Dropbox Anywhere: https://www.dropbox.com/anywhere


acangiano: Flash on Android tablet prototype: http://bit.ly/bnIFLS Can you guess what happens?


luupux: RT @igrigorik: grr, should have submitted the "gem install mysql" talk (http://bit.ly/bifjAk) to RailsConf. so much confusion about the ...


Sea of Shoes: Cat Cafes

Sea of Shoes: Cat Cafes: "Somehow while searching for restaurants to check out I stumbled upon a list of Tokyo's 'cat cafes'. Here 1000 yen will buy an hour's worth of petting with the cafe's kitties. Since 2008 more than a dozen of these cat..."

LinuxCon keynotes feature Linux insiders -- and outsiders

LinuxCon keynotes feature Linux insiders -- and outsiders: "The Linux Foundation announced keynote speakers and panels for LinuxCon, scheduled for August 10-12 in Boston. The show will feature keynote speakers including Virgin America's Ravi Simhambhatla, GNOME's Stormy Peters, the SFLC's Eben Moglen, and Forrester's Jeffrey S. Hammond, and hosts a Linux Kernel Roundtable with Ted T'so and other kernel insiders...."