I'll give a shoutout to Hy. It removes the Python syntax and replaces it with a Lisp one.
The neat part is that it's currently compatible, and bidirectional, with 2.6, 2.7, 3.2, 3.3 and PyPy.
We've even taken a few steps so we will have support for 3.4!
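Roughly, Hy reads Lisp forms and compiles them straight to Python AST, so a Hy function and its Python spelling are interchangeable. A sketch of the correspondence (the Hy form is shown in a comment; not taken verbatim from Hy's docs):

    # What Hy does, roughly: Lisp forms are compiled to ordinary Python
    # AST, so this Hy form and the Python below should be equivalent:
    #
    #   (defn hello [name] (print (+ "Hello " name)))
    def hello(name):
        print("Hello " + name)

    hello("HN")  # prints: Hello HN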
I really hope Hy catches on as the way to add DSLs to Python projects. I did this to one of my small projects and I'm very happy with the results: http://pyzmo.readthedocs.org/en/latest/#usage-hy
I have been toying with Hy for the last two months and enjoying every minute. It is a very strange feeling when you are able to use the batteries-included standard library of Python with the simple syntax of a Lisp (nothing against Python's syntax). The documentation is fairly good, and library support is surprisingly good. I'm planning on using it for a very small web app (with Flask).
> It’s fast because it compiles source code to native code
Actually, that's also why it can be slow. PyPy is in fact significantly slower than CPython for short-lived scripts.
The initial startup time can easily double or triple the total run time in some cases.
To be fair, it's only a couple of seconds extra, but for many tiny scripts this amounts to a lot of wasted time. It's not a magical cure-all for all situations.
...that said, it's so fantastically easy to drop in that in most cases it's worth trying just to see how it performs, for fun~
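If you want to see both effects on your own machine, a trivial benchmark is enough: time a hot loop inside the script, and time the whole run from the shell. A sketch (numbers vary wildly by machine and version):

    from __future__ import print_function  # so this runs on 2.x and 3.x
    import time

    start = time.time()
    total = 0
    i = 0
    while i < 10000000:   # a hot loop is where the JIT pays off
        total += i * i
        i += 1
    print(total, "in", time.time() - start, "seconds")

Run it as `python bench.py` and `pypy bench.py` to compare the in-script timing, then compare `time python bench.py` against `time pypy bench.py` to include startup cost.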
Sooo... the next logical step would be to add to PyPy the ability to produce a native executable? Or has that already been done? (Disclaimer: I know relatively little about the Python ecosystem.)
I'm a little rusty on the details, but IIUC the core of PyPy is an "abstract interpreter" for a restricted subset of Python known as RPython. Abstract interpretation is essentially a fold (in the functional programming sense) of the IR for a language, where you can parameterize the particular operation applied to each node. When the operation specified is "evaluate", you get partial evaluation, as described in the Wikipedia article.
The interesting part of PyPy happens when the RPython program to be evaluated is itself a Python interpreter, which it is. In this case, 'prog' is the Python interpreter written in RPython, and I_static is your Python program. The first Futamura projection gives you a compiled program: you specialize the RPython interpreter with your particular program, and it evaluates everything statically known at compile time (namely, your program) and produces an executable.
The second Futamura projection gives you a JIT compiler. Remember, since RPython is a subset of Python, an interpreter for Python written in RPython could be interpreted by itself. Moreover, it could be compiled by itself by the process described in the paragraph above. When you push the interpreter itself through that same specialization process, you get a program that means the same thing (i.e. it will translate a Python program into an executable) but runs faster. The PyPy JIT that everybody's excited about for running Python scripts faster is this executable.
The third Futamura projection involves running PyPy on the toolchain that generated PyPy. Remember, this whole specializer machinery is all written in RPython, which can be interpreted by itself. So when you run the specializer machinery through itself, you get an optimized toolset for building JIT compilers. That's what made programming language theorists excited about PyPy long before the Python community was. It's not just "faster Python", but a toolset for building a JIT out of anything you can build an interpreter for. Write an interpreter in RPython - for anything, Ruby, PHP, Arc, whatever - and run it through the PyPy toolchain, and it will give you a JIT compiler.
Futamura projections are what PyPy is doing, as described in this blog post. PyPy uses the first and second projections. In fact, it's likely that the author is actually familiar with the Futamura projections, but wanted to describe PyPy in more detail instead of relying on jargon.
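To make the specialization trick concrete, here's a toy partial evaluator in plain Python. All names are invented for illustration; a real specializer works on interpreter IR, not source strings:

    # A toy of the idea behind the first projection: specializing an
    # "interpreter" with a known program yields residual code with the
    # program folded away.
    def interpret(program, x):
        acc = x
        for op, n in program:
            acc = acc + n if op == "add" else acc * n
        return acc

    def specialize(program):
        # Partially evaluate: unroll the interpreter's loop at
        # specialization time, leaving straight-line residual code.
        lines = ["def residual(x):", "    acc = x"]
        for op, n in program:
            lines.append("    acc = acc %s %d" % ("+" if op == "add" else "*", n))
        lines.append("    return acc")
        ns = {}
        exec("\n".join(lines), ns)
        return ns["residual"]

    prog = [("add", 2), ("mul", 3)]
    f = specialize(prog)
    assert f(1) == interpret(prog, 1) == 9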
Well, with a habitat that reaches across Africa and Asia, and their ability to thrive in rain forests and grasslands, there are.... Oh. Ohhh. Never mind.
Funny that an article written to describe why there are so many Python implementations fails when comparing Python with C and Java.
There are quite a few interpreters available for C.
Although Sun/Oracle's implementation is JIT based only, there are other Java vendors with toolchains that support ahead of time compilation to native code.
Aye. The article also makes the well-known blunder of asking this question:
> Is Python interpreted or compiled? The question isn't really well-formed.
which is then repeated:
> C compiles to machine code, which is then run directly on your processor. Each instruction instructs your CPU to move stuff around.
No, "C" does not "compile" to machine code. The C compiler compiles C, and sometimes does so to machine code, and there are a C compilers that compile first to a form of bytecode or other symbolic representation, then to native. <language> is compiled/interpreted is a sad leftover from the heavily-diluted introductory computer books in the 1980s.
I think another source of confusion is that the term "compiler" has gained this connotation that all compilers generate low-level code from a high-level source language. However, once you understand the concepts, you realize that many things are technically compilers; source-to-source translators, for example, or ORM code generators.
Hell, I would call Python's own 2to3 tool a compiler: it generates a full parse tree using a grammar, it runs some translation rules, and it generates Python 3 code.
That's the definition of a compiler though, "A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language)."
All a compiler is is a combination of a translator and (possibly, but not necessarily) an optimiser.
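By that definition, a handful of lines of Python is already a (source-to-source) compiler: parse, transform, emit. A sketch using the stdlib ast module (the PrintToLog pass is invented for illustration; ast.unparse needs Python 3.9+):

    import ast

    class PrintToLog(ast.NodeTransformer):
        """Rewrite print(...) calls into log(...) calls."""
        def visit_Call(self, node):
            self.generic_visit(node)
            if isinstance(node.func, ast.Name) and node.func.id == "print":
                node.func.id = "log"
            return node

    tree = PrintToLog().visit(ast.parse('print("hello")'))
    print(ast.unparse(tree))  # -> log('hello')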
The current release of 2.7 is 2.7.6. New bugfix releases are planned until at least 2015 (this is from the release pep: http://www.python.org/dev/peps/pep-0373/ ).
I don't like 3.x because A) I can't unpack tuples in lambda or function arguments and B) the new "encode"/"decode" scheme sucks. They have the right idea, but it's implemented poorly. I miss being able to do .encode('hex') or .decode('hex'), and I think the reasoning against this is weak. There are a lot of other little issues like this that remove a lot of the expressivity I'm used to from 2.7. Those are just the two that bother me the most.
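For reference, the Python 3 spellings of the hex round-trip do exist, they're just wordier (which I take to be the parent's complaint). A quick sketch:

    import binascii, codecs

    # Python 2:  'abc'.encode('hex')  -> '616263'
    # Python 3 dropped the str 'hex' codec; the closest equivalents:
    print(binascii.hexlify(b"abc"))       # b'616263'
    print(codecs.encode(b"abc", "hex"))   # b'616263' (bytes-to-bytes codec)
    print(bytes.fromhex("616263"))        # b'abc' (the other direction)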
Shouldn't it be possible/feasible to fork Python 2.7 and backport bugfixes/performance improvements from the 3.x branch? I am surprised that someone like Red Hat has not done this yet.
The Python community will not like it, but if there is enough demand/contributors to make it so, surely they will not refuse patches?
Someone else mentioned future ( http://python-future.org ), but I thought I would give it another shout out.
> future is the missing compatibility layer between Python 2
> and Python 3. It allows you to use a single, clean
> Python 3.x-compatible codebase to support both Python 2
> and Python 3 with minimal overhead.
So in effect, you can run with some Python 3 isms on Python 2. All your Python is now 3-like, and you can start migrating without losing the underlying infrastructure.
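The pattern looks something like this with the `future` package installed (pip install future); the exact builtins you pull in depend on your code, so treat this as a sketch:

    from __future__ import absolute_import, division, print_function
    from builtins import range, str  # Python-3-style builtins on Python 2

    print(1 / 2)        # 0.5 on both 2.7 and 3.x (true division)
    for i in range(3):  # lazy, Python-3-style range even on Python 2
        print(str(i))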
This is basically an article for beginners that explains the difference between a language's specification and its actual implementation, using Python as its illustration.
You may be missing some specific extensions that are based on C, though. When I tried it last, the database libs I used were not functional. That said, there are pure-Python database libraries that worked just as well. The performance loss was negligible -- database IO completely drowned out any measurable loss in my use case.
Luckily I mostly use CouchDB, which communicates over HTTP, which shouldn't be a problem. I wonder whether uWSGI is compatible though. Which server implementation do you use?
The parent is referring to the performance of the database driver, which doesn't affect the performance of the rest of the application.
The interesting part is that most of the bytecode-level security problems in the Java world come from JIT compilation, where it stops being bytecode.
But yes, bytecode isn't magic pixie dust that makes things secure; it can be a tool to make things more secure, though, with effort.
Write a plugin that downloads arbitrary x86 or ARM code from the internet and jumps into it, install it on half the world's browsers, and wait a few months. You could say that that new plugin has only one security issue: it downloads code and executes it, but in practice, its security issues will dwarf those of the Java plugin. It's just very hard to build something that can do arbitrary useful stuff at good speed, yet protects you from those attempting to misuse it. Could that be done better? Nowadays, it probably could, as we can afford to run it in a separate process and add extra layers of code between the JVM and, for example, the DOM, the filesystem, etc.
Question: This, and many other posts, suggest that PyPy is the way to go (as a user). Faster and compatible - what's not to like? But elsewhere I read, 'hold on'. It actually isn't faster with things that use a lot of C optimizations (like numpy and pandas, which I use extensively). I don't mind small startup JIT penalties, but I don't want to have my core code run more slowly, or fail. Is there a simple direction? (i.e.: use standard CPython or use PyPy) Or is it, as always, 'it depends'?
Using Either in Haskell would require changing the program a little to add injections. But that exact program (modulo syntax) can be written in Typed Racket:
    #lang typed/racket

    (: choose : (All (A) (Listof A) -> A))
    (define (choose e)
      (list-ref e (random (length e))))

    (define x (choose (list 1 "foo")))
Unfortunately, people tend to assume that "type systems" and "the type system in Java/ML/Haskell" have the same level of expressiveness.
I just realized that all programs have types and type systems. Some dynamically typed programs have hairy n-dimensional fractal type systems that no human would ever figure out (usually from bad code).
The inferred languages just have this dude that watches your code and says, "yeah, I understand what is going on, your stuff is internally consistent"
Well, I don't think that's correct -- it would make the term "type system" meaningless. In particular, lots of programs have bugs that type systems can catch.
I don't think it makes the standard definition of "type system" meaningless. Maybe what I am referring to above is a "type space".
I agree with some of what you said in the Medium link. Python does have a type system: structural, dynamic, and strong. The way the plus operator works was a mistake; Lua is superior in this regard.
Take a look at the type inferencing that Shedskin, and RPython in PyPy, can do. If there weren't a type system, how would those 'static strong' programs be created?
The programmer and the program have a type system where the language may not.
As I say pretty explicitly in the linked post, Python doesn't have a type system. Also, the term "strong" type system doesn't mean anything. Also^2, "structural" type systems are about how types are compared, and to whatever degree Python compares types, it doesn't do so structurally.
RPython is a different programming language, one which is statically typed. Shedskin is similar.
RPython and Shedskin are not different languages; they are subsets. All RPython and Shedskin programs are proper Python programs with the same semantics.
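Strict subsets, though: plenty of ordinary Python is rejected by them. A rough sketch of the kind of code their translators refuse, though it runs fine under CPython (simplified; see each project's docs for the actual rules):

    # Both need to infer one static type per variable/container.
    xs = [1, "two"]     # heterogeneous list: fine in CPython,
                        # rejected by RPython's annotator
    x = 1
    x = "now a string"  # rebinding at a different type: same story
    print(xs, x)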
So when I use QuickCheck with Python or Erlang, am I creating a different language? Linguistically I don't understand how a proper Shedskin program, which is also a Python program with the same semantics, is in another language. By that definition, the runtime system affects what the language is, but this happens after it is written, so how can it do that?
No, QuickCheck is a testing tool, not a type system. It never rejects your program.
What you're saying is that Shedskin and Python have a non-empty intersection. But would the language of numbers with `+` be the "same language" as Python? It has the same relationship to Python as Shedskin does.
I think I get what you are saying: that you have to take the totality of the language in the current context.
And I would have to say yes, the language of numbers with "+" is in fact Python. I think it is also Haskell and lots of other languages. If I look at a corpus of a language through a slit, I will always see a subset of that language. But it doesn't stop being that language when looked at with a microscope. What happens when we see a code snippet like

    a = 5 + b - 10

We don't know what language it is, but it doesn't matter. It is valid Ruby, Python, Lua, JavaScript, etc.
Are we (am I?) splitting hairs? I am just a layperson; I would like to fix any discrepancies in my knowledge.
Rather than saying that Haskell solves this with the `Either` type, I believe it is more accurate to say that in this case, the return value of `random.choice` would be a "sum type." But the actual type would be dependent on the value passed to `random.choice`.
No, using the Either type wouldn't be the same at all. Yes, you can do choice [Left "foo", Right 42] (from Data.Arbitrary, used by QuickCheck) but it's not really the same thing. The two elements in the list both have the type Either String Int, unlike the Python example. Further, an Either can have only two possible kinds of values (but no-one prevents you from writing data Either3 a b c = Left a | Middle b | Right c).
You can achieve something like this with type classes, though. This is somewhat similar to requiring all the elements to fulfill an interface, kind of like List<Comparable> in Java-like languages. With existential quantification (there are other options) you can make a list where every element can be of a different type as long as they derive from the same type class (like Eq, Ord, etc). The type class is still a constraint that Python doesn't have, but it makes static typing possible.
"CPython makes it very easy to write C-extensions for your Python code because in the end it is executed by a C interpreter."
Say what? What C interpreter might that be? In the end it is executed by machine code compiled from C programs designed to implement a Python virtual machine. That's much, much different, and it makes the intended connection much harder to draw.
Now why, really, is it easy to write C extensions for your Python code?
Since the interpreter is written in C, it is easy for it to define a C API that exposes the Python runtime (Python.h). This means C extensions get access to the Python runtime.
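You can even reach that C API from Python itself via ctypes, which loads the interpreter's own symbols (CPython-only, of course; a quick sketch):

    import ctypes

    # Py_GetVersion() is part of the C API exposed by Python.h
    ctypes.pythonapi.Py_GetVersion.restype = ctypes.c_char_p
    print(ctypes.pythonapi.Py_GetVersion())  # e.g. b'3.11.4 (main, ...)'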
Awesome article, I thoroughly enjoyed it -- sent me on a tangent to learn more about tracing JIT compilers and RPython (and eventually I started looking for a Lisp interpreted in Python, haha).
Blog posts that help me understand my tools/ecosystem better (rather than just harping on some recent happening in SV, for example) are awesome.
I second this. I read the other comments about the flaws in the article, and I'll take them as valid. Even granting them, I understand a lot more about the different types of Python, the different layers in the stack, and even a bit about compilation, JIT, etc. Imperfect? Maybe. Extremely helpful to me? Definitely. Thanks!
Ok, thanks for clarifying that. I just looked it up and learnt that an interpreter-based approach is another option compared to other ways of supporting a dynamically typed language (e.g. by reflection, or base object declaration).
This article lost credibility pretty early by carelessly conflating tokenized Python with Java bytecode. That is unfair to readers who don't already understand the difference. It's not a trifle.
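For what it's worth, CPython does compile to real bytecode (not just a token stream), and the stdlib dis module shows it:

    import dis

    def add(a, b):
        return a + b

    dis.dis(add)
    # prints something like (exact opcodes vary by version):
    #   LOAD_FAST     a
    #   LOAD_FAST     b
    #   BINARY_ADD
    #   RETURN_VALUE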
Using the term "scripting language" to describe things like Python and Ruby fell out of favour quite a few years ago. Calling something a scripting language implies that you can't write a full application using it. This has been proved incorrect a thousand times over.
> Using the term "scripting language" to describe things like Python and Ruby fell out of favour quite a few years ago.
Which is unfortunate, because now we dance through phrases like "things like Python and Ruby". There is a hard-to-precisely-define category here, and it's useful to have a name for it--even if the name is flawed.
And, in my opinion, it's usually better to hang on to an old name as it becomes less and less literally true than to continually fish about for a new name. ("Dynamic language" is popular now, but what happens when someone popularizes such a language with static typing?) As an analogy, consider "touchdown" in American football. You haven't had to touch the ball down to score in my lifetime, or even in my parents' lifetimes, but we still call it a "touchdown" and not a "planebreaker".
The question I always have when someone says "scripting language" is "to what aspect of the language are you referring?" Unfortunately, since AFAICT the term never seems to have been precisely defined (unlike "touchdown"), different people use it differently and loosely.
I never know what exactly the speaker is referring to:
- the language definition, or some feature of it such as the type system?
- one or more (popular) language implementations?
- the intended use (e.g. short throwaway programs, desktop apps, kernel code) and target audience (e.g. experienced programmers, casual programmers, non-programmers)?
- the actual use and target audience?
- some opinion of the speaker about the language/implementation/etc. in question?
(Of course, this is just in my experience. YMMV)
----
tl;dr: in the body of my discussions, "scripting language" is a free variable and I can't find its definition! I wish people would pass it as a parameter!
To clarify: Python is not a scripting language because you can throw a .pyc over the wall to me, but Ruby is a scripting language because it has no intermediate bytecode representation?
For me, when most people say "scripting language" they really mean a dynamic language.
Perl, Ruby, Python, JavaScript, Lisp... These are all considered "scripting" languages and the main thing they have in common is the dynamic linking and type systems.
Statically typed languages like Java are generally not used for scripting purposes.
I disagree. It's still the language that is interpreted or compiled in that specific implementation. The author is also somewhat confused by the word "compilation". Most programmers use it exclusively to talk about generation of native code ahead of time.
I'm sorry, I don't think I understand the first 2 sentences of your response.
They seem to say that a language and an implementation of that language are the same thing. Am I correct that this is what they mean? (Apologies if something else was meant)
Say we have a language X whose definition says nothing about interpretation or compilation. We also have two implementations of language X, one which is an interpreter, and one which is a compiler.
Is X an interpreted language, a compiled language, or something else?
Sorry for the misunderstanding, and thanks for the clarification.
> Is X an interpreted language, a compiled language, or something else?
Both. You can take a real example into consideration: Scheme is interpreted in all its implementations and compiled in a significant number of them.
Languages and implementations are different things, but being interpreted, compiled, JIT-ed, translated, etc. is a property of the language implemented in the... implementation.
The real question is why there is even one Python. Perl was already doing everything Python can do, and it is pointless to write every supporting library in 20 different languages. You can write good or bad code in any language; the language is nearly irrelevant. Of course, I wouldn't propose going to Bourne shell or something.
It's much harder to write correct software in Perl because there are so many bugs it doesn't catch. Almost everything converts a string that isn't numeric to zero instead of dying. ++$a{$x} doesn't even check that %a is initialized, much less that it actually has $x as a key. By default syscalls that fail set $! (which can be ignored and usually is) and don't die, and making them die breaks a lot of library code that only "works" because it ignores errors. And the production version still makes ridiculous legacy distinctions like
which have no expressive value (if lists and hashes had been first-class values there'd be no reason for references to exist as a type) but only cause subtle mistakes.
Global symbol "%a" requires explicit package name at ./test3.pl line 4.
Global symbol "$x" requires explicit package name at ./test3.pl line 4.
So, I'm not sure if I see that point. Are you not a fan of autovivification? Because it: rules. I think the only thing one has to understand is the use of exists(), to see if there is, say, a key in a hash, rather than defined():
    #!/usr/bin/perl
    use strict;

    my %foo = (batz => undef);
    for (qw(bar batz)) {
        if (exists($foo{$_})) {
            print "$_ exists!\n";
            if (defined($foo{$_})) {
                print "and $_ is defined.\n";
            }
            else {
                print "and $_ is undefined.\n";
            }
        }
        else {
            print "$_ doesn't actually exist\n";
        }
    }
prints:

    bar doesn't actually exist
    batz exists!
    and batz is undefined.
As it happens, hashes in Perl 5 autovivify by default (rather like defaultdicts in Python). In retrospect, this can seem rather surprising (if not off-putting) to newcomers, so an argument can be made that a more conservative default behavior should have been chosen (like the more "hard-nosed" behavior of standard dicts in Python).
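(As a quick aside, the defaultdict analogue mentioned above looks like this in Python; a sketch:)

    # collections.defaultdict gives you opt-in vivification, where
    # Perl hashes vivify by default.
    from collections import defaultdict

    hist = defaultdict(int)
    for word in "the quick the lazy the".split():
        hist[word] += 1       # missing keys spring to life as 0
    print(dict(hist))         # {'the': 3, 'quick': 1, 'lazy': 1}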
OTOH Perl people tend to like the way we can, for example, create histograms (and nested structures like graphs) a bit more fluently with autovivifying hashes, e.g.:

    $a{$x}++;   # $a{$x} springs into existence as needed
Or as Perlistas would be more prone to write (if forced to use a non-vivifying hash):
    exists $a{$x} ? $a{$x}++ : ($a{$x} = 1);
Or gosh, who knows:
    $a{$x} = exists $a{$x} ? $a{$x} + 1 : 1;
Or to have sat around and waited for 10 years or so until Hash::Util was invented (after Perl 4's initial release), provided you then knew to look for it (and how), at which point you'd finally be able to say:
    use Hash::Util 'lock_keys';
But as it happens, when Perl 4 first came out, autovivification (and its cohort, promiscuous duck typing) were chosen as the default behavior for many of the language's core constructs (not just hashes and arrays). It's just the way Perl rolls. Whether it's the "right" tradeoff to make or not, from a socio-engineering perspective, is an interesting question (and partially a matter of taste). But in general Perl has been pretty consistent in its approach to tradeoffs like these.
> By default syscalls that fail set $! (which can be ignored and usually is) and don't die, and making them die breaks a lot of library code that only "works" because it ignores errors
The solution to this is to use the autodie pragma (https://metacpan.org/pod/autodie) which has been part of the Perl core since 5.10.1 (released in Aug 2009):
    use 5.016;
    use warnings;
    use autodie;

    open my $fh, '<', "somefile"; # this file doesn't exist
So the above will now produce a runtime error: Can't open 'somefile' for reading: 'No such file or directory' at...
And autodie is lexically scoped so it won't break any library code.
> And the production version still makes ridiculous legacy distinctions like...
You're ignoring context, which is a fundamental design decision in Perl. That's like criticizing C for having pointers while ignoring the design decision to expose memory as a flat space.
https://github.com/hylang/hy
It's a great example of how targeting Python isn't that different from targeting LLVM or the JVM!