“Alike” by Daniel Martínez Lara & Rafa Cano Méndez

Python sort by multiple keys

Say we have

L = ["a", "b", "zz", "zzz", "aa"]

We want to sort by length first, then alphabetically, i.e. we want to end up with ['a', 'b', 'aa', 'zz', 'zzz']. How do we do that?

The usual sorted() will give us this.

>>> sorted(L)
['a', 'aa', 'b', 'zz', 'zzz']

Very much alphabetical, not what we want.

Trying sorted() with key=len gives us this.

>>> sorted(L, key=len)
['a', 'b', 'zz', 'aa', 'zzz']

The string length is taken care of, but that’s just half of it.

Notice that zz appears before aa. That’s because zz comes before aa in the original input list: the key function len returns 2 for both, so they compare as equal, and Python’s sort is stable, which keeps equal items in their original order.

Turns out we can replace the key function with a lambda function, so that we can pack a little more custom logic into the key used for the comparison.

>>> sorted(L, key=lambda x: len(x))
['a', 'b', 'zz', 'aa', 'zzz']

This gives us the same result. But with a lambda function, we can do more than just pass in a built-in or predefined function.

Since we want to sort by length first, then by alphabetical order, how should we look at a string? What is “aa” compared to “zzz”? We can look at “aa” as (2, “aa”), which is a tuple with int 2 and the string “aa”. For “zzz” that’s (3, “zzz”).
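A quick sketch of how Python compares such tuples: element by element, left to right, so the length decides first and the string only breaks ties. The helper name sort_key is made up for illustration.

```python
def sort_key(s):
    # Hypothetical helper: pair each string with its length.
    return (len(s), s)

# Tuples compare element by element, left to right.
shorter_first = sort_key("aa") < sort_key("zzz")  # (2, "aa") vs (3, "zzz"): length decides
tie_broken = sort_key("aa") < sort_key("zz")      # (2, "aa") vs (2, "zz"): string decides

print(shorter_first, tie_broken)  # True True
```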

So now, we can do our comparison using tuples.

>>> sorted(L, key=lambda x: (len(x), x))
['a', 'b', 'aa', 'zz', 'zzz']

Now the resulting order is right. To sort in descending order on a numeric field, negate it in the key, e.g. -len(x). This trick does not work for strings, which cannot be negated.
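A small sketch of the negation trick, using the same list. Negation only works for numeric fields such as the length. For a string field, one possible alternative (not from the original post) is to sort twice and rely on the sort being stable.

```python
L = ["a", "b", "zz", "zzz", "aa"]

# Longest first, then alphabetical within the same length.
longest_first = sorted(L, key=lambda x: (-len(x), x))
print(longest_first)  # ['zzz', 'aa', 'zz', 'a', 'b']

# Shortest first, then reverse alphabetical within the same length.
# Strings cannot be negated, so sort by the secondary key first,
# then by the primary key; the stable sort preserves the first ordering.
reverse_alpha = sorted(sorted(L, reverse=True), key=len)
print(reverse_alpha)  # ['b', 'a', 'zz', 'aa', 'zzz']
```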

The first 20 hours — how to learn anything | Josh Kaufman | TEDxCSU

Just put in 20 hours, no need to set aside 10,000 hours.

Python global interpreter lock

This article explains the Python GIL pretty well.

In Python 2, a thread can perform at most 100 instructions/ticks before releasing the GIL. This setting can be changed with sys.setcheckinterval(). The OS then decides which thread picks up the GIL next, and based on whatever criteria the OS uses, that may be the same thread again. The problem here is that low-priority threads may be starved and keep waiting.

In Python 3 (3.2 and later), a thread can run for at most 5 ms before it is asked to release the GIL. Then, with some system-level signals, the same thread won’t pick up the GIL again immediately after. Sounds more fair. However, since an I/O call leads to voluntarily releasing the GIL, a thread with many I/O calls keeps losing the GIL to CPU-bound threads, and may wait up to 5 ms each time to get it back (say there is a long-running thread). So in the worst case a thread with 1000 I/O calls has an overhead of 1000 * 5 ms = 5 sec. That’s quite something, since you can save that 5 sec by running a non-threaded version. Ironic it is.
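The 5 ms figure is the interpreter's switch interval, which can be read with sys.getswitchinterval() and changed with sys.setswitchinterval(). Below is a back-of-the-envelope sketch of the worst case described above. The function name is made up for illustration, and the estimate assumes every I/O call waits one full interval to get the GIL back.

```python
import sys

def worst_case_overhead(io_calls, interval):
    # Hypothetical estimate: each I/O call gives up the GIL and waits
    # one full switch interval before the thread gets it back.
    return io_calls * interval

interval = sys.getswitchinterval()  # in seconds; the default is 0.005
print(interval)
print(worst_case_overhead(1000, 0.005))  # about 5 seconds
```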

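The root of the problem here is the GIL, or rather the fact that CPython’s memory management is not thread-safe. Jython and IronPython have no GIL, since they rely on the memory management of the JVM and the .NET runtime, but they don’t support CPython C extensions. (Transactional memory is the approach explored by the experimental PyPy-STM.)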

Hopefully someday we will have CPython without the GIL.

Airflow Web Authentication

My Airflow version is 1.10.1 and I am using Python 3.5.2.

I tried user._set_password, but after looking at the users db table the password field was null…

After looking at the source, the following works for me.

from flask_bcrypt import generate_password_hash
# `user` is assumed to be the existing password-auth user object; 12 is the bcrypt rounds argument
user._password = generate_password_hash('your_password', 12)

Be sure to use SSL.

Rolling Knee Pads

Such a good idea.