30/12/2010

Nagios: a simple multi-ping check for a degraded gateway provider

This is a simple "multi ping" check for monitoring the performance of your default gateway from Nagios.

https://github.com/luupux/check_gw
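
For a rough idea of what such a check involves, here is a minimal sketch (the real implementation lives in the check_gw repository above; the gateway address and thresholds are placeholder assumptions):

import subprocess
import sys

GATEWAY = "192.168.1.1"        # assumption: your default gateway
WARN_MS, CRIT_MS = 50.0, 150.0 # assumption: example latency thresholds

# Send 5 pings and parse the average round-trip time from the summary line,
# e.g. "rtt min/avg/max/mdev = 0.042/0.050/0.066/0.009 ms"
out = subprocess.check_output(["ping", "-c", "5", "-q", GATEWAY]).decode()
avg_ms = float(out.rsplit("=", 1)[1].split("/")[1])

# Nagios plugin convention: exit 0 = OK, 1 = WARNING, 2 = CRITICAL
if avg_ms >= CRIT_MS:
    print("CRITICAL - gateway rtt %.1f ms" % avg_ms)
    sys.exit(2)
elif avg_ms >= WARN_MS:
    print("WARNING - gateway rtt %.1f ms" % avg_ms)
    sys.exit(1)
print("OK - gateway rtt %.1f ms" % avg_ms)
sys.exit(0)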



15/11/2010

Free SMSes through Google Calendar by http://www.kryogenix.org

This is a simple piece of Python code for sending SMS from Google Calendar via the GData API.

# Requires gdata.py-1.2.1 from http://code.google.com/p/gdata-python-client/
try:
    from xml.etree import ElementTree
except ImportError:
    from elementtree import ElementTree
import gdata.calendar.service
import gdata.service
import atom.service
import gdata.calendar
import atom
import base64
import time

def send_sms(message_text):
    # Log in to Google Calendar with your account credentials
    cal_client = gdata.calendar.service.CalendarService()
    cal_client.email = "YOUR GOOGLE MAIL ACCOUNT"
    cal_client.password = "YOUR GOOGLE MAIL PASSWORD"
    cal_client.source = 'calendar-sms-misuse-1.0'
    cal_client.ProgrammaticLogin()

    # The SMS message text becomes the event title
    event = gdata.calendar.CalendarEventEntry()
    event.title = atom.Title(text=message_text)
    event.content = atom.Content(text="")

    # Can't set SMS reminders for under 5 minutes, so start the event
    # 6 minutes from now
    start_time = time.strftime('%Y-%m-%dT%H:%M:%S.000Z', time.gmtime(time.time() + (6 * 60)))
    end_time = time.strftime('%Y-%m-%dT%H:%M:%S.000Z', time.gmtime(time.time() + 3600))
    when = gdata.calendar.When(start_time=start_time, end_time=end_time)
    # ...and set the SMS reminder itself to the 5-minute minimum
    reminder = gdata.calendar.Reminder(minutes=5, extension_attributes={"method": "sms"})
    when.reminder.append(reminder)
    event.when.append(when)

    cal_client.InsertEvent(event, '/calendar/feeds/default/private/full')


send_sms("Message body")

Original Post
http://www.kryogenix.org/days/2008/10/15/free-smses-through-google-calendar



20/08/2010

Multi-core, Threads & Message Passing

Multi-core, Threads & Message Passing: "

Moore's Law marches on, the transistor counts are continuing to increase at the predicted rate and will continue to do so for the foreseeable future. However, what has changed is where these transistors are going: instead of a single core, they are appearing in multi-core designs, which place a much higher premium on hardware and software parallelism. This is hardly news, I know. However, before we get back to arguing about the 'correct' parallelism & concurrency abstractions (threads, events, actors, channels, and so on) for our software and runtimes, it is helpful to step back and take a closer look at the actual hardware and where it is heading.


Single Core Architecture & Optimizations



The conceptual architecture of a single core system is deceivingly simple: single CPU, which is connected to a block of memory and a collection of other I/O devices. Turns out, simple is not practical. Even with modern architectures, the latency of a main memory reference (~100ns roundtrip) is prohibitively high, which combined with highly unpredictable control flow has led CPU manufacturers to introduce multi-level caches directly onto the chip: Level 1 (L1) cache reference: ~0.5 ns; Level 2 (L2) cache reference: ~7ns, and so on.


However, even that is not enough. To keep the CPU busy, most manufacturers have also introduced some cache prefetching and management schemes (ex: Intel's SmartCache), as well as invested billions of dollars into branch prediction, instruction pipelining, and other tricks to squeeze every ounce of performance. After all, if the CPU has a separate floating point and an integer unit, then there is no reason why two threads of execution could not simultaneously run on the same chip - see SMT. Remember Intel's Hyperthreading? As another point of reference, Sun's Niagara chips are designed to run four execution threads per core.


But wait, how did threads get in here? Turns out, threads are a way to expose the potential (and desired) hardware parallelism to the rest of the system. Put another way, threads are a low-level hardware and operating system feature, which we need to take full advantage of the underlying capabilities of our hardware.


Architecting for the Multi-core World


Since the manufacturers could no longer continue scaling the single core (power, density, communication), the designs have shifted to the next logical architecture: multiple cores on a single chip. After all, hardware parallelism existed all along, so the conceptual shift wasn't that large - shared memory, multiple cores, more concurrent threads of execution. Only one gotcha, remember those L1, L2 caches we introduced earlier? Turns out, they may well be the Achilles' heel for multi-core.



If you were to design a multi-core chip, would you allow your cores to share the L1, or L2 cache, or should they all be independent? Unfortunately, there is no one answer to this question. Shared caches can allow higher utilization, which may lead to power savings (ex: great for laptops), as well as higher hit rates in certain scenarios. However, that same shared cache can easily create resource contention if one is not careful (DMA is a known offender). Intel's Core Duo and Xeon processors use a shared L2, whereas AMD's Opteron, Athlon, and Intel's Pentium D opted for independent L1's and L2's. Even more interestingly, Intel's recent Itanium 2 gives each core an independent L1, L2, and an L3 cache! Different workloads benefit from different layouts.


As Phil Karlton once famously said: 'There are only two hard things in Computer Science: cache invalidation and naming things,' and as someone cleverly added later, 'and off by one errors'. Turns out, cache coherency is a major problem for all multi-core systems: if we prefetch the same block of data into an L1, L2, or L3 of each core, and one of the cores happens to make a modification to its cache, then we have a problem - the data is now in an inconsistent state across the different cores. We can't afford to go back to main memory to verify if the data is valid on each reference (as that would defeat the purpose of the cache), and a shared mutex is the very anti-pattern of independent caches!


To address this problem, hardware designers have iterated over a number of data invalidation and propagation schemes, but the key point is simple: the cores share a bus or an interconnect over which messages are propagated to keep all of the caches in sync (coherent), and therein lies the problem. While the numbers vary, the overall consensus is that after approximately 32 cores on a single chip, the amount of required communication to support the shared memory model leads to diminished performance. Put another way, shared memory systems have limited scalability.


Turtles all the way down: Distributed Memory


So if cache coherence puts an upper bound on the number of cores we can support within the shared memory model, then let's drop the shared memory requirement! What if, instead of a monolithic view of the memory, each core instead had its own, albeit much smaller main memory? The distributed memory model has the advantage of avoiding all of the cache coherency problems we listed above. However, it is also easy to imagine a number of workloads where the distributed memory will underperform the shared memory model.


There doesn't appear to be any consensus in the industry yet, but if one had to guess, then a hybrid model seems likely: push the shared memory model as far as you can, and then stamp it out multiple times on a chip, with a distributed memory interconnect - it is cache and interconnect turtles all the way down. In other words, while message passing may be a choice today, in the future, it may well be a requirement if we want to extract the full capabilities of the hardware.


Turtles all the way up: Web Architecture


Most interesting of all, we can find the exact same architecture patterns and their associated problems in the web world. We start with a single machine running the app server and the database (CPU and main memory), which we later split into separate instances (multiple app servers share a remote DB, aka 'multi-core'), and eventually we shard the database (distributed memory) to achieve the required throughput. The similarity of the challenges and the approaches seems hardly like a coincidence. It is turtles all the way down, and it is turtles all the way up.


Threads, Events & Message Passing


As software developers, we are all intimately familiar with the shared memory model and the good news is: it is not going anywhere. However, as the core counts continue to increase, it is also very likely that we will quickly hit diminishing returns with the existing shared memory model. So, while we may disagree on whether threads are a correct application level API (see process calculi variants), they are also not going anywhere - either the VM, the language designer, or you yourself will have to deal with them.


With that in mind, the more interesting question to explore is not which abstraction is 'correct' or 'more performant' (one can always craft an optimized workload), but rather how we make all of these paradigms work together in the context of a simple programming model. We need threads, we need events, and we need message passing - it is not a question of which is better.
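
As a toy illustration of the message-passing style (a minimal sketch using Python's standard multiprocessing module, not anything from the article itself): shared-nothing workers communicating purely over queues, with no locks or shared state to keep coherent.

from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # Run until the None sentinel arrives; communicate only via messages
    for item in iter(inbox.get, None):
        outbox.put(item * item)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(inbox, outbox)) for _ in range(4)]
    for p in procs:
        p.start()
    for i in range(100):
        inbox.put(i)
    for _ in procs:
        inbox.put(None)            # one shutdown sentinel per worker
    results = [outbox.get() for _ in range(100)]
    for p in procs:
        p.join()
    print(sum(results))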




"

30/06/2010

Winning the Big Data SPAM Challenge - Hadoop Summit 2010


Presentation from Yahoo Developer Network.

App Engine SDK 1.3.5 Released With New Task Queue, Python Precompilation, and Blob Features

Today we are happy to announce the 1.3.5 release of the App Engine SDK for both Python and Java developers.

Due to popular demand, we have increased the throughput of the Task Queue API, from 50 reqs/sec per app to 50 reqs/sec per queue. You can also now specify the amount of storage available to the taskqueue in your app, for those with very large queues with many millions of tasks. Stay tuned for even more Task Queue scalability improvements in the future.

Additionally, in this release we’ve also added support for precompilation of Python source files to match the same feature we launched for Java last year. For Python, you can now use precompilation to speed up application loading time and to reduce CPU usage for new app instances. You can enable precompilation by including the following lines in your app.yaml file:

derived_file_type:
- python_precompiled

This will start offline precompilation of Python modules used by your app when you deploy your application. Currently precompilation is off by default for Python applications, but it will be enabled by default in some future release. (Java precompilation has been enabled by default since the release of 1.3.1.)

To give you a taste of what this feature is like, we tested this on a modified version of Rietveld (which included a copy of Django 1.0.4 in the app directory, and which did not use the datastore in its base url). The latency and CPU usage results for the initial load of the application, after uploading a new version of the app and requesting the homepage, were:

Before precompilation enabled:
Test 1: 1450ms 1757cpu_ms
Test 2: 1298ms 1523cpu_ms
Test 3: 1539ms 1841cpu_ms
After precompilation enabled:
Test 1: 805ms 669cpu_ms
Test 2: 861ms 702cpu_ms
Test 3: 921ms 803cpu_ms

Of course, any individual app’s performance will vary, so we recommend that you experiment with the setting for your application. Please submit your feedback and results to the support group!

In addition to Task Queues and Python precompilation, we have made a few changes to the Blobstore in 1.3.5. First, we have added file-like interfaces for reading Blobs. In Python, this is supported through the BlobReader class. In Java, we have implemented the BlobstoreInputStream class, which gives an InputStream view of the blobs stored in Blobstore.
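
A minimal sketch of the Python side (assuming blob_key comes from a prior Blobstore upload):

from google.appengine.ext import blobstore

def first_kilobyte(blob_key):
    # BlobReader exposes a file-like view over a blob stored in Blobstore
    reader = blobstore.BlobReader(blob_key)
    return reader.read(1024)  # read the first 1 KB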

http://googleappengine.blogspot.com/2010/06/app-engine-sdk-135-released-with-new.html

28/06/2010

Weak Consistency and CAP Implications

Weak Consistency and CAP Implications: "

Migrating your web application from a single node to a distributed setup is always a deceivingly large architectural change. You may need to do it due to a resource constraint of a single machine, for better availability, to decouple components, or for a variety of other reasons. Under this new architecture, each node is on its own, and a network link is present to piece it all back together. So far so good, in fact, ideally we would also like for our new architecture to provide a few key properties: Consistency (no data conflicts), Availability (no single point of failure), and Partition tolerance (maintain availability and consistency in light of network problems).


Problem is, the CAP theorem proposed by Eric Brewer and later proved by Seth Gilbert and Nancy Lynch, shows that together, these three requirements are impossible to achieve at the same time. In other words, in a distributed system with an unreliable communications channel, it is impossible to achieve consistency and availability at the same time in the case of a network partition. Alas, such is the tradeoff.


'Pick Two' is too simple


The original CAP conjecture presented by Eric Brewer states that as architects, we can only pick two properties (CA, CP, or AP) at the same time, and many attempts have since been made to classify different distributed architectures into these three categories. Problem is, as Daniel Abadi recently pointed out (and Eric Brewer agrees), the relationships between CA, CP and AP are not nearly as clear-cut as they appear on paper. In fact, any attempt to create a hard partitioning into these buckets seems to only increase the confusion since many of the systems can arbitrarily shift their properties with just a few operational tweaks - in the real world, it is rarely an all or nothing deal.


Focus on Consistency


Following some great conversations about CAP at a recent NoSQL Summer meetup and hours of trying to reconcile all the edge cases, it is clear that the CA vs. CP vs. AP model is, in fact, a poor representation of the implications of the CAP theorem - the simplicity of the model is nice, but in reality the actual design space requires more nuance. Specifically, instead of focusing on all three properties at once, it is more productive to first focus along the continuum of “data consistency” options: none, weak, and full.


On one extreme, a system can demand no consistency. For example, a clickstream application which is used for best effort personalization can easily tolerate a few missed clicks. In fact, the data may even be partitioned by data centre, geography, or server, such that depending on where you are, a different “context” is applied - from home, your search returns one set of results, from work, another! The advantage of such a system is that it is inherently highly available (HA), as it is a shared-nothing, best-effort architecture.


On the other extreme, a system can demand full consistency across all participating nodes, which implies some communications protocol to reach a consensus. A canonical example is a “debit / credit” scenario where full agreement across all nodes is required prior to any data write or read. In this scenario, all nodes maintain the exact same version of the data, but compromise HA in the process - if one node is down, or is in disagreement, the system is down.


CAP Implies Weak Consistency


Strong consistency and high availability are both desirable properties, however the CAP theorem shows that we can’t achieve both of these over an unreliable channel at once. Hence, CAP pushes us into a “weak consistency” model where dealing with failures is a fact of life. However, the good news is that we do have a gamut of possible strategies at our disposal.



In case of a failure, your first choice could be to choose consistency over availability. In this scenario, if a quorum can be reached, then one of the network partitions can remain available, while the second goes offline. Once the link between the two networks is restored, a simple data repair can take place - the minority partition is strictly behind, hence there are no possible data conflicts. Hence we sacrifice HA, but do continue to serve some of the clients.


On the other hand, we could lean towards availability over consistency. In this case, both sides can continue to accept reads and/or writes. Both sides of the partition remain available, and mechanisms such as vector clocks can be used to assist with conflict resolution (although, some conflicts will always require application level resolution). Repeatable reads, read-your-own-writes, and quorum updates are just a few of the examples of possible consistency vs. availability strategies in this scenario.
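
To make the vector clock idea concrete, here is a small sketch (mine, not from the post) of detecting concurrent updates; each replica increments its own counter on every write:

def descends(a, b):
    # True if clock `a` has seen everything recorded in clock `b`
    return all(a.get(node, 0) >= count for node, count in b.items())

def in_conflict(a, b):
    # Neither clock descends from the other: the updates were concurrent
    return not descends(a, b) and not descends(b, a)

v1 = {"node_a": 2, "node_b": 1}   # last updated on node_a
v2 = {"node_a": 1, "node_b": 2}   # last updated on node_b
print(in_conflict(v1, v2))        # True -> application-level resolution needed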


Hence, a simple corollary to the CAP theorem: when choosing availability under the weak consistency model, multiple versions of a data object will be present, will require conflict resolution, and it is up to your application to determine what is an acceptable consistency tradeoff and a resolution strategy for each type of object.


Speed of Light: Too Slow for PNUTS!


Interestingly enough, dealing with network partitions is not the only case for adopting “weak consistency”. The PNUTS system deployed at Yahoo must deal with WAN replication of data between different continents, and unfortunately, the speed of light imposes some strict latency limits on the performance of such a system. In Yahoo’s case, the communications latency is enough of a performance barrier such that their system is configured, by default, to operate under the “choose availability, under weak consistency” model - think of latency as a pseudo-permanent network partition.


Architecting for Weak Consistency


Instead of arguing over CA vs. CP vs. AP, first determine the consistency model for your application: strong, weak, or shared-nothing / best effort. Notice that this choice has nothing to do with the underlying technology, and everything to do with the demands and the types of data processed by your application. From there, if you land in the weak-consistency model (and you most likely will, if you have a distributed architecture), start thinking about how you can deal with the inevitable data conflicts: will you lean towards consistency and some partial downtime, or will you optimize for availability and conflict resolution?


Finally, if you are working under weak consistency, it is also worth noting that it is not a matter of picking just a single strategy. Depending on the context, the application layer can choose a different set of requirements for each data object! Systems such as Voldemort, Cassandra, and Dynamo all provide mechanisms to specify a desired level of consistency for each individual read and write. So, an order processing function can fail if it fails to establish a quorum (consistency over availability), while at the same time, a new user comment can be accepted by the same data store (availability over consistency).
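
The arithmetic behind those per-request choices is simple. A hedged sketch (the names are mine, not any particular client API): with N replicas, a read quorum R and a write quorum W are guaranteed to overlap whenever R + W > N, so a quorum read always sees the latest quorum write.

N = 3  # number of replicas

def overlapping(r, w, n=N):
    # Read and write quorums must intersect for strong consistency
    return r + w > n

print(overlapping(r=2, w=2))  # True  - order processing: consistency first
print(overlapping(r=1, w=1))  # False - user comments: availability first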




"

09/06/2010

Rails Performance Needs an Overhaul

Rails Performance Needs an Overhaul: "

Browsers are getting faster; JavaScript frameworks are getting faster; MVC frameworks are getting faster; databases are getting faster. And yet, even with all of this innovation around us, it feels like there is a massive gap when it comes to the end product of delivering an effective and scalable service as a developer: the performance of most of our web stacks, when measured end to end, is poor at the best of times, and plain terrible in most.


The fact that a vanilla Rails application requires a dedicated worker with a 50MB stack to render a login page is nothing short of absurd. There is nothing new about this, nor is this exclusive to Rails or a function of Ruby as a language - whatever language or web framework you are using, chances are, you are stuck with a similar problem. But GIL or no GIL, we ought to do better than that. Node.js is a recent innovator in the space, and as a community, we can either learn from it, or ignore it at our own peril.


Measuring End-to-End Performance



A modern web-service is composed of many moving components, all of which come together to create the final experience. First, you have to model your data layer, pick the database and then ensure that it can get your data in and out in the required amount of time - lots of innovation in this space thanks to the NoSQL movement. Then, we layer our MVC frameworks on top, and fight religious wars as developers on whose DSL is more beautiful - to me, Rails 3 deserves all the hype. On the user side, we are building faster browsers with blazing-fast JavaScript interpreters and CSS engines. However, the driveshaft (the app server) which connects the two pieces (the engine: data & MVC), and the front-end (the browser + DOM & JavaScript), is often just a checkbox in the deployment diagram. The problem is, this checkbox is also the reason why the ‘scalability’ story of our web frameworks is nothing short of terrible.


It doesn't take much to construct a pathological example where a popular framework (Rails), combined with a popular database (MySQL), and a popular app server (Mongrel) produce less than stellar results. Now the finger pointing begins. MySQL is more than capable of serving thousands of concurrent requests, the app server also claims to be threaded, and the framework even allows us to configure a database pool!


Except that, the database driver locks our VM, and both the framework and the app server still have a few mutexes deep in their guts, which impose hard limits on the concurrency (read, serial processing). The problem is, this is the default behaviour! No wonder people complain about 'scalability'. The other popular choices (Passenger / Unicorn) “work around” this problem by requiring dedicated VMs per request - that's not a feature, that's a bug!


The Rails Ecosystem


To be fair, we have come a long way since the days of WEBrick. In many ways, Mongrel made Rails viable, Rack gave us the much needed interface to become app-server independent, and the guys at Phusion gave us Passenger which both simplified the deployment, and made the resource allocation story moderately better. To complete the picture, Unicorn recently rediscovered the *nix IPC worker model, and is currently in use at Twitter. Problem is, none of this is new (at best, we are iterating on the Apache 1.x to 2.x model), nor does it solve our underlying problem.


Turns out, while all the components are separate, and it's great to treat them as such, we do need to look at the entire stack as one picture when it comes to performance: the database driver needs to be smarter, the framework should take advantage of the app server's capabilities, and the app server itself can't pretend to work in isolation.


If you are looking for a great working example of this concept in action, look no further than node.js. There is nothing about node that can't be reproduced in Ruby or Python (EventMachine and Twisted), but the fact that the framework forces you to think and use the right components in place (fully async & non-blocking) is exactly why it is currently grabbing the mindshare of the early adopters. Rubyists, Pythonistas, and others can ignore this trend at their own peril. Moving forward, end-to-end performance and scalability of any framework will only become more important.
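
As a rough illustration of the fully async, non-blocking style in Python (a sketch using the modern asyncio module as a stand-in for EventMachine or Twisted; not the author's code): many in-flight requests share a single VM because I/O waits never block the event loop.

import asyncio

async def handle_request(i):
    await asyncio.sleep(0.1)   # non-blocking I/O wait (e.g. a DB call)
    return "response %d" % i

async def main():
    # Hundreds of concurrent requests: no dedicated worker per request,
    # no threads parked on blocking I/O
    results = await asyncio.gather(*(handle_request(i) for i in range(200)))
    print("%d requests served concurrently" % len(results))

asyncio.run(main())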


Fixing the 'Scalability' story in Ruby


The good news is, for every outlined problem, there is already a working solution. With a little extra work, the driver story is easily addressed (MySQL driver is just an example, the same story applies to virtually every other SQL/NoSQL driver), and the frameworks are steadily removing the bottlenecks one at a time.


After a few iterations at PostRank, we rewrote some key drivers, grabbed Thin (evented app server), and made heavy use of continuations in Ruby 1.9 to create our own API framework (Goliath) which is perfectly capable of serving hundreds of concurrent requests at a time from within a single Ruby VM. In fact, we even managed to avoid all the callback spaghetti that plagues node.js applications, which also means that the same continuation approach works just as well with a vanilla Rails application. It just baffles me that this is not a solved problem already.



The state of the art in end-to-end Rails stack performance is not good enough. We need to fix that.




"

27/05/2010

The future can be written in RPython now | Pyevolve

Following the recent article arguing why PyPy is the future of Python, I must say: PyPy is not the future of Python, it is the present. When I last tested it (PyPy-c 1.1.0) with Pyevolve on the optimization of a simple sphere function, it was at least 2x slower than Unladen Swallow Q2, but at that time PyPy was not able to JIT. Now, with this new release of PyPy and its JIT support, the scenario has changed.


The future can be written in RPython now | Pyevolve

26/05/2010

voltdb = redis + sql interface? interesting:

The Fast, Scalable Open-Source DBMS You'll Never Outgrow

Created by DBMS R&D pioneer Mike Stonebraker, VoltDB is a next-generation open-source DBMS that scales way beyond traditional databases, without sacrificing SQL or ACID for transactional data integrity. VoltDB is for database applications that support fast-growing transactional workloads and require:

  • Orders of magnitude better performance than conventional DBMS
  • Linear scalability
  • SQL as the DBMS interface
  • ACID transactions to ensure data consistency and integrity
  • High availability 24x7x365


#igrigorik voltdb = redis + sql interface? interesting: http://bit.ly/al9XiF
Official link http://voltdb.com/

Java, JEE, JavaFx and more: A graphical counter on GAEJ (Google App Engine for Java) using Images API Service

The Images Service on GAEJ provides the ability to manipulate images; thus, you can composite multiple images into a single one. I'll use this capability to display a graphical hit counter. This tutorial is only a kind of how-to, but I'm sure you can write real programs using the instructions given in this post. For simplicity, error handling is reduced to a minimum.

The idea is pretty simple, persist a counter using Memcache or DataStore, have digits from 0 to 9 as PNG images, read images as bytes, make images and composites of these images, put all composites in a List and finally use this List to get the composed image.
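
The same idea as a hedged sketch in Python (the post's code is Java, but the GAE Python Images API offers an equivalent composite() call; digit_pngs and the dimensions here are assumptions):

from google.appengine.api import images

def render_counter(count, digit_pngs, digit_w=20, digit_h=30):
    # digit_pngs maps '0'..'9' to raw PNG bytes read from the app directory
    layers = []
    for i, ch in enumerate(str(count)):
        # each layer tuple: (image_bytes, x_offset, y_offset, opacity, anchor)
        layers.append((digit_pngs[ch], i * digit_w, 0, 1.0, images.TOP_LEFT))
    width = len(str(count)) * digit_w
    # Composite all digit layers into one PNG of the full counter
    return images.composite(layers, width, digit_h, output_encoding=images.PNG)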



Java, JEE, JavaFx and more: A graphical counter on GAEJ (Google App Engine for Java) using Images API Service

24/05/2010

Search Results: All on USTREAM, Most Views listings, All entries, page 1 of 1, 05/24/10.






First Look: H.264 and VP8 Compared - StreamingMedia.com

VP8 is now free, but if the quality is substandard, who cares? Well, it turns out that the quality isn't substandard, so that's not an issue, but neither is it twice the quality of H.264 at half the bandwidth. See for yourself, below.

To set the table, Sorenson Media was kind enough to encode these comparison files for me to both H.264 and VP8 using their Squish encoding tool. They encoded a standard SD encoding test file that I've been using for years. I'll do more testing once I have access to a VP8 encoder, but wanted to share these quick and dirty results.



First Look: H.264 and VP8 Compared - StreamingMedia.com

Alex Gaynor -- PyPy is the Future of Python

Currently the most common implementation of Python is known as CPython, and it's the version of Python you get at python.org, probably 99.9% of Python developers are using it. However, I think over the next couple of years we're going to see a move away from this towards PyPy, Python written in Python. This is going to happen because PyPy offers better speed, more flexibility, and is a better platform for Python's growth, and the most important thing is you can make this transition happen.

The first thing to consider: speed. PyPy is a lot faster than CPython for a lot of tasks, and they've got the benchmarks to prove it. There's room for improvement, but it's clear that for a lot of benchmarks PyPy screams, and it's not just number crunching (although PyPy is good at that too). Although Python performance might not be a bottleneck for a lot of us (especially us web developers who like to push performance down the stack to our database), would you say no to having your code run 2x faster?

The next factor is the flexibility. By writing their interpreter in RPython PyPy can automatically generate C code (like CPython), but also JVM and .NET versions of the interpreter. Instead of writing entirely separate Jython and IronPython implementations of Python, just automatically generate them from one shared codebase. PyPy can also have its binary generated with a stackless option, just like stackless Python, again no separate implementations to maintain. Lastly, PyPy's JIT is almost totally separate from the interpreter, this means changes to the language itself can be made without needing to update the JIT, contrast this with many JITs that need to statically define fast-paths for various operations...


Alex Gaynor -- PyPy is the Future of Python

21/05/2010

Hosted SQL on App Engine For Business

Later:

Hosted SQL
Dedicated, full-featured SQL servers available for your application.
Status: In Development
Estimate: Limited Release in Q3 2010

Google Roadmap link : http://code.google.com/appengine/business/roadmap.html

20/05/2010

Google and SpringSource join hands in the heavens

Google I/O Google and VMware's SpringSource arm have teamed up to offer a series of development tools for building Java apps that can be deployed across multiple web-based hosting services. That includes Google's own App Engine, VMware-happy infrastructure services, and third-party services such as Amazon's Elastic Compute Cloud.

http://www.theregister.co.uk/2010/05/19/google_teams_with_springsource/

Google Launches Business Version Of App Engine; Collaborates With VMware

It’s no secret that Google has been ramping up its enterprise offerings. The company has made a strong push for the adoption of Google Apps, launching the Apps Marketplace, allowing Apps users to add other layers to their environments from companies like Socialwok and Zoho. Today, Google is taking it one step further. At Google I/O today, the search giant has announced that Google App Engine, a platform for building and hosting web applications in the cloud, will now include a Business version, catered towards enterprises. The new premium version allows customers to build their own business apps on Google’s cloud infrastructure. Google is also announcing a collaboration with VMware for deployment and development of apps on the new cloud infrastructure.

Google Launches Business Version Of App Engine; Collaborates With VMware

Scalable Work Queues with Beanstalk

Any web application that reaches some critical mass eventually discovers that separation of services, where possible, is a great strategy for scaling the service. In fact, oftentimes a user action can be offloaded into a background task, which can be handled asynchronously while the user continues to explore the site. However, coordinating this workflow does require some infrastructure: a message queue, or a work queue. The distinction between the two is subtle and blurry, but it does carry important architectural implications. Should you pick a messaging bus such as AMQP or XMPP, roll your own database-backed system such as BJ, or go with Resque...


http://www.igvita.com/2010/05/20/scalable-work-queues-with-beanstalk/
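
A minimal producer/worker sketch with beanstalkd (using the beanstalkc Python client; the host/port and the job payload are assumptions):

import beanstalkc

conn = beanstalkc.Connection(host="localhost", port=11300)

# Producer: offload a user action into a background job
conn.put('{"task": "send_welcome_email", "user_id": 42}')

# Worker: block until a job is ready, process it, then delete it
job = conn.reserve()
print("processing: %s" % job.body)
job.delete()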

16/05/2010

Pycon4, from Simone Deponti's talk

"Crogioli, alambicchi e beute dove mettere i vostri dati" (Crucibles, alembics, and beakers: where to put your data)

It covers the SQLAlchemy ORM and ZODB: how to manage your data
and which choices to make depending on your needs.

XtraDB / InnoDB internals in drawing

Source: http://www.mysqlperformanceblog.com/, posted by Vadim


I did a drawing exercise and put the XtraDB / InnoDB internals into a Visio diagram:

The XtraDB differences and main parameters are marked out.

A PDF version is available here: http://www.percona.com/docs/wiki/percona-xtradb

14/05/2010

I wish my company worked like a terrorist organization

An interesting idea in this time of crisis, useful (in my opinion) also as a way to let individuals be more creative and independent, perhaps by creating cells that form on their own around the projects at hand.

Vorrei che la mia azienda funzionasse come un’organizzazione terroristica… « Meeting delle Idee - http://ow.ly/1KXOE

13/05/2010

Richard Stallman comes to the Marche region: two meetings in Ancona

Richard Stallman comes to the Marche region: two meetings in Ancona: "Yes indeed, our hero of the worldwide Open Source movement is coming to the Marche as well, and more precisely to Ancona, for two meetings:

- Thursday 13 May, 5:00 PM - at the IT department (assessorato all'informatizzazione) of the Municipality of Ancona
- Friday 14 May, 10:30 AM - in Room A7/8 of the Faculty of Engineering, Università Politecnica delle Marche

"

10/05/2010

Amazon Web Services sign-up tutorial slide

Pixar at the American Mathematical Society

Moving Remy in Harmony: Pixar's Use of Harmonic Functions

This article will describe some new mathematical techniques being tested at Pixar for use in upcoming films...


American Mathematical Society

09/05/2010

Beyond Pycon: Pybirra!!!

http://twitpic.com/1m6ofi

Pycon 2010 Effective EC2 talk

Pycon 2010, just so we understand each other: two days of talks all about innovation. One talk I really liked:

http://www.pycon.it/conference/talks/effective-ec2
In short, a real start-up: "http://www.adroll.com/"
Here are the numbers:
- more than 300 active instances (virtual servers) spun up in 15 minutes
- DNS caching, proxy requesting, automatic spin-up of instances
- automatic network remapping across hundreds of public IPs
- fault-tolerant storage across tens of terabytes
Pure science fiction in Italy :-(

08/05/2010

Comet web applications with Python, Django & Orbited

Comet web applications with Python, Django & Orbited

The temple of idle

The temple of idle

Yes, this incidentally means someone is masochistic enough to have decided to hire me.

Anyway, the company I work for (Abstract Open Solutions) is in the middle of switching to git, and I sent a mail to the whole list detailing my (brief) experience of converting from SVN while still using SVN as upstream (our git server, running gitorious, is still in the works).

Since that might be useful to someone else, I decided to post the mail I sent to the internal mailing list, without censoring it.

Pycon4 Python FUSE – Beyond the Traditional File-Systems di Matteo Bertozzi

Nothing but praise for Matteo Bertozzi: incisive and clear, and above all complete.
What more could you ask of a 60-minute talk on FUSE?

http://www.pycon.it/conference/talks/python-fuse-beyond-the-traditional-file-systems

For the talk slides, with the examples:
http://mbertozzi.develer.com/python-fuse

07/05/2010

lin win mac apps

Only 3 days left to pay what you want for the Humble Indie Bundle! http://j.mp/9WtdAm

Going global with Amazon #EC2 and DNS services: http://bit.ly/dx0Lfu #aws

Interesting read: going global with Amazon #EC2 and DNS services: http://bit.ly/dx0Lfu #aws

Cleaning the cruft away - rather luverly Dust-me add-on to Firefox http://ur1.ca/z97m - spider whole site and pinpoint redundant CSS.


06/05/2010

Google teaches: exploits and security

Google teaches: exploits and security: "Mountain View has put online a site full of bugs, with the goal of training developers, and of making good, secure code a priority, without anyone getting hurt."

'Amazing Python' is now available without registration

'Amazing Python' is now available without registration: "Three weeks before going live with ThinkCode.TV, we decided to release a free screencast about solving ASCII mazes in Python. The 19 minute 'bite screencast' was made available to anyone who joined our newsletter, and a few hundred people took advantage of this opportunity.



Now, there is nothing wrong in giving away a freebie as an incentive for joining one's newsletter, and we may do so again in the future. However, we feel it's time to release Amazing Python for free, without the need to register.



We feel that this move fits well with our strategy to 'create more value than we capture'.
"

Future of RDBMS is RAM Clouds & SSD

Future of RDBMS is RAM Clouds & SSD: "

Rumors of the demise of relational database systems are greatly exaggerated. The NoSQL movement is increasingly capturing the mindshare of developers, all the while academia has been talking about the move away from 'RDBMS as one size fits all' for several years. However, while the new storage engines are exciting to see, it is also important to recognize that relational databases still have a bright future ahead - RDBMS systems are headed into main memory, which changes the playing field altogether.


Performance is only one aspect that influences the choice of a database. Tree and graph structures are not easy to model within a relational structure, which in turn leads to complicated schemas and system overhead. For that reason alone, document-stores (Tokyo, CouchDB, MongoDB), graph stores (Neo4J), and other alternative data structure databases (Redis) are finding fertile ground for adoption. However, the end of 'RDBMS as one size fits all' does not mean the end of relational systems altogether. It is too early to bury RDBMS in favor of No (or Less) SQL. We just need to reset how we think about the RDBMS.


Disks are the New Tape


The evolution of disks has been extremely uneven over the last 25 years: disk capacity has increased 1000x and data transfer speeds increased 50x, while seek and rotational delays have only improved by a factor of 2. Hence, if in the mid-80s we only needed to transfer several hundred kilobytes of data to achieve good disk utilization, today we need to read at least 10MB of data to amortize the costs of seeking the data - refresh your memory on the seek, rotational, and transfer times of our rusty hard drives.


When the best we can hope for is 100-200 IOPS out of a modern hard drive, the trend towards significantly larger block sizes begins to make a lot more sense. Whereas your local filesystem is likely to use 4 or 8kb blocks, systems such as Google's GFS and Hadoop's HDFS are opting out for 64MB+ blocks in order to amortize the cost of seeking for the data - by using much larger blocks, the cost of seeks and access time is once again brought down to single digit percent figures over the transfer time.
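
A quick back-of-the-envelope check of that amortization argument (assumed figures: 10 ms average seek, 100 MB/s sequential transfer):

seek = 0.010   # seconds per random seek (assumed)
rate = 100e6   # bytes per second of sequential transfer (assumed)

for block in (4 * 1024, 10 * 1024 ** 2, 64 * 1024 ** 2):
    transfer = block / rate
    utilization = transfer / (transfer + seek)
    print("%8d KB block -> %.1f%% of time spent transferring"
          % (block // 1024, utilization * 100))

# 4 KB -> ~0.4%, 10 MB -> ~91%, 64 MB -> ~98%: larger blocks amortize the seek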


Hence, as we generate and store more and more data, the role of the disks must inevitably become more archival. Batch processing systems such as Map-Reduce are well suited for this world and are quickly replacing the old business intelligence (BI) systems for exactly these reasons. In the meantime, the limitations imposed by the random access to disk mean that we need to reconsider the role of disk in our database systems.


OLTP is Headed Into Main Memory & Flash


An average random seek will take 5-10ms when hitting the physical disk and hundreds of microseconds for accessing data from cache. Compare that to a fixed cost of 5-10 microseconds for accessing data in RAM and the benefits of a 100-1000x speed difference can be transformative. Instead of treating memory as a cache, why not treat it as a primary data store? John Ousterhout and his co-authors outline a compelling argument for 'RAMCloud'. After all, if Facebook keeps over 80% of their data in memcached, and Google stores entire indexes of the web in memory many times over, then your average database-backed application should easily fit and be able to take advantage of the pure memory model also.


The moment all of the data is available in memory, it is an entirely new game: access time and seek times become irrelevant (no disk seeks), the value of optimizing for locality and access patterns is diminished by orders of magnitude, and in fact, entirely new and much richer query models can enable a new class of data-intensive applications. In a world where the developer's time is orders of magnitude more expensive than the hardware (a recent phenomenon), this also means faster iterations and less data-optimization overhead.


The downside to the RAMCloud is the equivalent order of magnitude increase in costs - RAM prices are dropping, but dollar for dollar, RAMCloud systems are still significantly more expensive. Flash storage is an obvious compromise for both speed and price. Theoretical access time for solid-state devices is on the order of 50 microseconds for reads, and 200 microseconds for writes. However, in reality, wrapping solid-state storage in SATA-like hardware devices brings us back to ~200 microseconds for reads, or ~5000 IOPS. Though, of course, innovation continues and devices such as FusionIO’s PCI-E flash storage controller bring us back to 80 microsecond reads at a cost of ~$11 per Gigabyte.


However, even the significantly higher hardware price point is often quickly offset once you factor in the saved developer time and adjacent benefits such as guaranteed performance independent of access patterns or data locality. Database servers with 32GB and 64GB of RAM are no longer unusual, and when combined with SSDs, such as the system deployed at SmugMug, often offer a much easier upgrade path than switching your underlying database system to a NoSQL alternative.


Database Architecture for the RAMCloud


Migrating your data into RAM or Flash yields significant improvements via the pure speedup in hardware; however, the 'it is time for a complete rewrite' argument still holds: the majority of existing database systems are built with implicit assumptions of disk-backed storage. These architectures optimize for disk-based indexing structures, and have to rely on multithreading and locking-based concurrency to hide the latency of the underlying storage.


When access time is measured in microseconds, optimistic and lock-free concurrency is fair game, which leads to much better multi-core performance and allows us to drop thousands of lines of code for multi-threaded data structures (concurrent B-Trees, etc). RethinkDB is a drop-in MySQL engine designed for SSD drives leveraging exactly these trends, and Drizzle is a larger fork of the entire MySQL codebase aimed at optimizing the relational model for 'cloud and net applications': massively distributed, lightweight kernel and extensible.


Migrating Into Main Memory


Best of all, you can start leveraging the benefits of storing your data in main memory even with the existing MySQL databases - most of them are small enough to make the memory buffers nothing but a leaky abstraction. Enable periodic flush to disk for InnoDB (innodb_flush_log_at_trx_commit=2), and create covering indexes for your data (a covering index is an index which itself contains all the required data to answer the query). Issue a couple of warm-up requests to load the data into memory and you are off to the races.
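
For example, a hedged sketch of a covering index in action (the table and column names are hypothetical, driven here through the MySQLdb client):

import MySQLdb

db = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="app")
cur = db.cursor()

# The index carries every column the query touches, so InnoDB can answer
# from the (memory-resident) index pages alone, never touching the row data
cur.execute("CREATE INDEX idx_email_name ON users (email, name)")

cur.execute("SELECT name FROM users WHERE email = %s", ("a@example.com",))
print(cur.fetchone())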


Of course, the above strategy is at best an intermediate solution, so investigate SSDs as a primary storage layer, and if you are adventurous, give RethinkDB a try. Also keep an eye on Drizzle as the first production release is aimed for summer of 2010. Alternative data storage engines such as Redis, MongoDB and others are also worth looking into, but let us not forget: the laws of physics still apply to NoSQL. There is no magic there. Memory is fast, disks are slow. Nothing is stopping relational systems from taking advantage of main memory or SSD storage.




"