Planet SciPy

Anaconda 2019-03-20 18:24:49

Announcing Public Anaconda Package Download Data

I’m very happy to announce that starting today, we will be publishing summarized download data for all conda packages served in the Anaconda Distribution, as well as the popular conda-forge and bioconda channels.  The dataset…

The post Announcing Public Anaconda Package Download Data appeared first on Anaconda.

Matthieu Brucher's blog 2019-03-18 08:40:37

Book review: Flask: Building Python Webservices

I’m thinking of writing a Web service for a project of mine. For this purpose, I wanted to learn Flask (and a bunch of other technologies), as Flask seems well established and well documented. This is a book from Packt that agglomerates 3 previously released books. One of the main questions is the relevance of […]
While My MCMC Gently Samples 2019-03-15 14:00:00

Computational Psychiatry: Combining multiple levels of analysis to understand brain disorders - PhD thesis

I noticed that as my personal website at my former university went down that my PhD thesis could not be found anywhere, so I'm posting it here.

During my PhD I explored how machine learning and computational modeling of the brain can be used to improve our understanding, and diagnostics …

Python – Meta Rabbit 2019-03-12 12:00:51

NIXML: nix + YAML for easy reproducible environments

The rise and fall of bioconda A year ago, I remember a conversation which went basically like this: Them: So, to distribute my package, what do you think I should use? Me: You should use bioconda. Them: OK, that’s interesting, but what about …? Me: No, you should use bioconda. Them: I will definitely look … Continue reading NIXML: nix + YAML for easy reproducible environments
Anaconda 2019-03-11 16:17:46

Understanding and Improving Conda’s performance

Lately, we have been responding to issues about Conda’s speed.  We’re working on it and we wanted to explain a few of the facets that we’re looking at to solve the problem.   TL;DR: make…

The post Understanding and Improving Conda’s performance appeared first on Anaconda.

Anaconda 2019-03-11 15:58:34

End of Life (EOL) for Python 2.7 is coming. Are you ready?

End of Life (EOL) for Python 2.7 is coming. Are you ready? We all knew it was coming. Back in 2014 when Guido van Rossum, Python’s creator and principal author, made the announcement, January 1,…

The post End of Life (EOL) for Python 2.7 is coming. Are you ready? appeared first on Anaconda.

Living in an Ivory Basement 2019-03-01 23:00:00

Sustaining open source: thinking about communities of effort

Thinking about how to sustain open source.

Living in an Ivory Basement 2019-02-28 23:00:00

My recent reading re sustaining open communities

What has Titus been reading lately?

Filipe Saraiva's blog 2019-02-24 22:26:11

Reduzindo a pilha

Sou fã de quadrinhos desde criança. As primeiras revistas que ganhei foram na primeira metade dos anos 90, alguns Mickeys, Mônicas, Trapalhões e X-Men. Em 98 comecei a comprar X-Men, Fabulosos X-Men e Wolverine, até os primeiros números da famigerada X-Men Premium. Sem dinheiro, enveredei pelos mangás e histórias fechadas. Quando a Panini começa a... [Read More]
Living in an Ivory Basement 2019-02-21 23:00:00

Threat models for open online scientific engagement?

What threats are there for scientists in engaging in open online discussions?

Anaconda 2019-02-14 21:26:28

Intake released on Conda-Forge

Intake is a package for cataloging, finding and loading your data. It has been developed recently by Anaconda, Inc., and continues to gain new features. To read general information about Intake and how to use…

The post Intake released on Conda-Forge appeared first on Anaconda.

Matthieu Brucher's blog 2019-02-06 08:13:29

Announcement: Audio TK 3.1.0

ATK is updated to 3.1.0 with heavy code refactoring. Old C++ standards are now dropped and it requires now a full C++17 compliant compiler. The main difference for filter support is that explicit SIMD filters using libsimdpp have been dropped while tr2::simd becomes standard and supported by gcc, clang and Visual Studio. Download link: ATK […]
Anaconda 2019-01-30 21:55:23

RPM and Debian Repositories for Miniconda

Conda, the package manager from Anaconda, is now available as either a RedHat RPM or as a Debian package. The packages are the equivalent to the Miniconda installer which only contains Conda and its dependencies.…

The post RPM and Debian Repositories for Miniconda appeared first on Anaconda.

Anaconda 2019-01-30 21:38:14

Conda 4.6 Release

The latest set of major Conda improvements are here, with version 4.6.  This release has been stewing for a while and has the feature list to show for it.  Let’s walk through some of the…

The post Conda 4.6 Release appeared first on Anaconda.

Matthieu Brucher's blog 2019-01-29 08:29:44

Book review: Effective Amazon Machine Learning

All major cloud providers provide some support for Machine Learning algorithms. They also evolve all the time. There are not many books ont he subject, due to the evolution of these services, so let’s have a look at this one. Discussion I actually started a long, traditional review, but couldn’t really go on. The main […]
Filipe Saraiva's blog 2019-01-29 01:48:22


A voz feminina robótica (chegamos no tempo onde questão de gênero e robôs podem se confundir) soou, estranha e familiar como sempre, assim que o carro finalizou a curva para a direita: “Você entrou na Avenida Universitária; o limite de velocidade é 60 quilômetros por hora”. Meu pai sorriu e começou a falar: – Desde... [Read More]
While My MCMC Gently Samples 2019-01-21 15:00:00

My foreword to "Bayesian Analysis with Python, 2nd Edition" by Osvaldo Martin

When Osvaldo asked me to write the foreword to his new book I felt honored, excited, and a bit scared, so naturally I accepted. What follows is my best attempt to convey what makes probabilistic programming so exciting to me. Osvaldo did a great job with the book, it is …

Filipe Saraiva's blog 2019-01-21 14:39:48

Call for Answers: Survey About Task Assignment

Professor Igor Steinmacher, from Northern Arizona University, is a proeminent researcher on several social dynamics in open source communities, like support of newcomers, gender bias, open sourcing proprietary software, and more. Some of his papers can de found in his website. Currently, Prof. Igor is inviting mentors from open source communities to answer a survey... [Read More]
Living in an Ivory Basement 2019-01-15 23:00:00

Revisiting authorship, and JOSS software publications

The question du jour: how should authorship on software papers be decided?

While My MCMC Gently Samples 2019-01-14 15:00:00

Using Bayesian Decision Making to Optimize Supply Chains

(c) 2019 Thomas Wiecki & Ravin Kumar

As advocates of Bayesian statistics in data science we often have to convince business-minded colleagues or customers of the added value of such an approach. While there are many good reasons for applying Bayesian modeling to solve business problems (Sean J Taylor recently had …

Two Bit Arcade - python 2019-01-11 08:00:00

Gyroscopic 3D wireframe cube — Using a 3-axis gyro for live 3D perspective

This little project combines the previous accelerometer-gyroscope code with the 3D rotating OLED cube to produce a 3D cube which responds to gyro input, making it possible to "peek around" the cube with simulated perspective, or make it spin with a flick of the wrist.

Take a look at those …

Filipe Saraiva's blog 2019-01-08 02:21:02

Mestrado em Ciência da Computação na UFPA 2019: Inteligência Computacional para Smart Grids; Metaheurísticas

Está aberto o processo seletivo para o mestrado em ciência da computação do PPGCC-UFPA. Nesse certame, estou disponibilizando 2 vagas para alunos que desenvolverão seus trabalhos junto aos demais pesquisadores no LAAI. As vagas são voltadas para os temas de inteligência computacional aplicada a Smart Grids e estudos sobre métodos metaheurísticos de otimização. Gostaria de... [Read More]
Filipe Saraiva's blog 2019-01-05 17:43:33

LaKademy 2018

Em outubro de 2018, Florianópolis foi sede da sexta edição do LaKademy, o sprint latinoamericano do KDE. Esse momento é uma oportunidade para termos em um mesmo lugar vários desenvolvedores do KDE – tanto veteranos quanto novatos – de diferentes projetos para melhorarem os respectivos softwares em que trabalham e também planejar as ações de... [Read More]
Filipe Saraiva's blog 2019-01-05 16:59:19

LaKademy 2018

Past October 2018, Florianópolis hosted the 6th edition of LaKademy, the Latin-American KDE sprint. That moment is an opportunity to put together several KDE developers – both veterans and newcomers – from different projects in order to work for improve their respective software and plan the promotional actions of the community in the subcontinent. In... [Read More]

GPU Dask Arrays, first steps

The following code creates and manipulates 2 TB of randomly generated data.

import dask.array as da

rs = da.random.RandomState()
x = rs.normal(10, 1, size=(500000, 500000), chunks=(10000, 10000))
(x + 1)[::2, ::2].sum().compute(scheduler='threads')

On a single CPU, this computation takes two hours.

On an eight-GPU single-node system this computation takes nineteen seconds.

Combine Dask Array with CuPy

Actually this computation isn’t that impressive. It’s a simple workload, for which most of the time is spent creating and destroying random data. The computation and communication patterns are simple, reflecting the simplicity commonly found in data processing workloads.

What is impressive is that we were able to create a distributed parallel GPU array quickly by composing these three existing libraries:

  1. CuPy provides a partial implementation of Numpy on the GPU.

  2. Dask Array provides chunked algorithms on top of Numpy-like libraries like Numpy and CuPy.

    This enables us to operate on more data than we could fit in memory by operating on that data in

Two Bit Arcade - python 2019-01-01 08:00:00

3-axis Accelerometer-Gyro — Measuring acceleration and orientation with an MPU6050

Measuring acceleration and rotation has a lot of useful applications, from drone or rocket stablisation to making physically interactive handheld games.

An accelerometer measures proper acceleration, meaning the rate of change of velocity relative to it's own rest frame. This is in contrast to coordinate acceleration, which is relative to …

Leonardo Uieda 2018-12-26 12:00:00

Manage project dependencies with conda environments

TL;DR: Create a conda environment for each project, capture exact versions when possible, automate activation and updating with a bash function.

I often work on several different projects involving software: Python libraries, papers, presentations, posters, this website, etc. Each project has different dependencies and there is a non-zero chance that these dependencies might be in conflict with each other. For example, I need Python 2.7 to work on a tesseroid modeling paper with a student, while my current work on

Anaconda 2018-12-21 09:32:48

Anaconda Distribution 2018.12 Released

We are changing versioning in Anaconda Distribution from a major/minor version scheme to a year.month scheme. We made this change to differentiate between the open source Anaconda Distribution and Anaconda Enterprise, our managed data science…

The post Anaconda Distribution 2018.12 Released appeared first on Anaconda.

Matthieu Brucher's blog 2018-12-18 08:07:52

From netlist to code: strategies to implement schematics modelling, the results

Last month, I presented my latest work on Audio ToolKit at ADC 2018, namely how I turned a SPICE netlist to a filter. It is now time to present some of the results here. What is ATK Modelling? The Modelling module is responsible for generating on the fly filters from a schema description. That description […]

First Impressions of GPUs and PyData

I recently moved from Anaconda to NVIDIA within the RAPIDS team, which is building a PyData-friendly GPU-enabled data science stack. For my first week I explored some of the current challenges of working with GPUs in the PyData ecosystem. This post shares my first impressions and also outlines plans for near-term work.

First, lets start with the value proposition of GPUs, significant speed increases over traditional CPUs.

GPU Performance

Like many PyData developers, I’m loosely aware that GPUs are sometimes fast, but don’t deal with them often enough to have strong feeling about them.

To get a more visceral feel for the performance differences, I logged into a GPU machine, opened up CuPy (a Numpy-like GPU library developed mostly by Chainer in Japan) and cuDF (a Pandas-like library in development at NVIDIA) and did a couple of small speed comparisons:

Compare Numpy and Cupy
>>> import numpy, cupy

>>> x = numpy.random.random((10000, 10000))
>>> y = cupy.random.random((10000, 10000))

>>> %timeit bool((numpy.sin(x) ** 2 + numpy.cos(x) ** 2 == 1).all())
446 ms ± 53.1 ms per
Anaconda 2018-12-14 09:31:16

Python Data Visualization 2018: Where Do We Go From Here?

This post is the third in a three-part series on the current state of Python data visualization and the trends that emerged from SciPy 2018.   By James A. Bednar As we saw in Part…

The post Python Data Visualization 2018: Where Do We Go From Here? appeared first on Anaconda.

Matthieu Brucher's blog 2018-12-11 08:29:30

Fun review: the Lego Rough Terrain Crane

I started my Lego adult path with the Mk2 crane, and now Lego has a new crane. This one is bigger, meaner, in some aspects, but hopefully better as well. Bigger wheels, but half of them, red instead of yellow, broader, and double crane boon instead of a triple one, so a different set of […]
Living in an Ivory Basement 2018-12-07 23:00:00

A quick read of _The genomic and proteomic landscape of the rumen microbiome_

Using short and long reads to assemble genomes from metagenomes!

Anaconda 2018-12-06 09:22:36

Intake for Cataloging Spark

By: Martin Durant Intake is an open source project for providing easy pythonic access to a wide variety of data formats, and a simple cataloging system for these data sources. Intake is a new project,…

The post Intake for Cataloging Spark appeared first on Anaconda.

Anaconda 2018-12-04 09:21:07

Using Pip in a Conda Environment

Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back-to-back multiple times, establishing a state that can be hard to reproduce. Most of…

The post Using Pip in a Conda Environment appeared first on Anaconda.

Matthieu Brucher's blog 2018-12-04 08:25:10

Book review: Introduction to Electrical Circuit Analysis

A few weeks ago, I presented my work on automatic code generation from an electronic schema. I have many things to talk about this subject, one of them is this book. When you start analyzing a circuit, it is important to learn how to analyze a circuit. There are lots of books on electronics, but […]
Anaconda 2018-12-03 09:19:16

Python Data Visualization 2018: Moving Toward Convergence

This post is the second in a three-part series on the current state of Python data visualization and the trends that emerged from SciPy 2018. In my previous post, I provided an overview of the…

The post Python Data Visualization 2018: Moving Toward Convergence appeared first on Anaconda.

Anaconda 2018-11-28 08:59:06

Understanding Conda and Pip

Conda and pip are often considered as being nearly identical. Although some of the functionality of these two tools overlap, they were designed and should be used for different purposes. Pip is the Python Packaging…

The post Understanding Conda and Pip appeared first on Anaconda.

Support Python 2 with Cython


Many popular Python packages are dropping support for Python 2 next month. This will be painful for several large institutions. Cython can provide a temporary fix by letting us compile a Python 3 codebase into something usable by Python 2 in many cases.

It’s not clear if we should do this, but it’s an interesting and little known feature of Cython.

Background: Dropping Python 2 Might be Harder than we Expect

Many major numeric Python packages are dropping support for Python 2 at the end of this year. This includes packages like Numpy, Pandas, and Scikit-Learn. Jupyter already dropped Python 2 earlier this year.

For most developers in the ecosystem this isn’t a problem. Most of our packages are Python-3 compatible and we’ve learned how to switch libraries. However, for larger companies or government organizations it’s often far harder to switch. The PyCon 2017 keynote by Lisa Guo and Hui Ding from Instagram gives a good look into why this can be challenging for large production codebases and also gives a good

Matthieu Brucher's blog 2018-11-27 08:58:04

Monitoring embedded space quality for classification

A few weeks ago, on StackOverflow, a user asked for an accuracy measure on the embedded space for an autoencoder. This was with Keras, but I thought it would be a nice exercise for Tensorflow as well. The idea in this case is to add a few layers to the embedded space to create a […]

Anatomy of an OSS Institutional Visit

I recently visited the UK Meteorology Office, a moderately large organization that serves the weather and climate forecasting needs of the UK (and several other nations). I was there with other open source colleagues including Joe Hamman and Ryan May from open source projects like Dask, Xarray, JupyterHub, MetPy, Cartopy, and the broader Pangeo community.

This visit was like many other visits I’ve had over the years that are centered around showing open source tooling to large institutions, so I thought I’d write about it in hopes that it helps other people in this situation in the future.

My goals for these visits are the following:

  1. Teach the institution about software projects and approaches that may help them to have a more positive impact on the world
  2. Engage them in those software projects and hopefully spread around the maintenance and feature development burden a bit
Step 1: Meet allies on the ground

We were invited by early adopters within the institution, both within the UK Met Office’s Informatics Lab

Matthieu Brucher's blog 2018-11-20 08:45:13

From netlist to code: strategies to implement schematics modelling

Today, I’m presenting at the ADC my work on analog modelling for the past year. I will make a more detailed post later this year, but I’d like to put some teasers here. SPICE net lists are an efficient way of representing electronics circuits and there are several very good free and paying simulators. Unfortunately, […] 2018-11-16 23:00:00

Notes on the Frank-Wolfe Algorithm, Part II: A Primal-dual Analysis

This blog post extends the convergence theory from the first part of my notes on the Frank-Wolfe (FW) algorithm with convergence guarantees on the primal-dual gap which generalize and strengthen the convergence guarantees obtained in the first part.

MathJax.Hub.Config({ extensions: ["tex2jax.js"], jax: ["input/TeX", "output/HTML-CSS"], tex2jax …
Matthieu Brucher's blog 2018-11-12 23:31:43

Fun review: the Lego Bugatti Chiron

This year, Lego published a set based on the Bugatti Chiron, one of the craziest cars, and built near my home town. It’s the second set in the Technic car collection series, and contrary to the Porsche, the color is inspired by a gorgeous real life car (I don’t think that the real Porsche exists…). […]
Living in an Ivory Basement 2018-11-11 23:00:00

Creating a welcoming teaching/learning environment in workshops

It takes constant work to make a welcoming teaching/learning environment!

Living in an Ivory Basement 2018-11-08 23:00:00

Repeatability in Practice (2018 version)

How we do repeatability in the DIB Lab

Stéfan van der Walt - python 2018-10-31 07:00:00

Linking to emails in org-mode (using neomutt)

Where we store links to emails in org-mode, and open them using neomutt.

Filipe Saraiva's blog 2018-10-29 15:40:06

Ode ao ódio

Ontem, acompanhando a apuração para presidente no 2º turno, chorei. Chorei de raiva. Chorei de ódio. Ódio porque aquele que levou o pleito representa uma total afronta ao mínimo do que chamamos civilidade. Ele defendeu a ditadura e a tortura, reiteradamente. Prometeu prender ou exilar opositores. Prometeu perseguir professores, artistas, a intelectualidade. Disse que irá... [Read More]
Filipe Saraiva's blog 2018-10-28 14:08:15

Eleições 2018: Minha carta para a família

Família, essa é minha última manifestação política aqui no grupo antes do resultado. Vocês me conhecem, sou professor de ciência da computação na UFPA, sou um dos responsáveis pela formação dos próximos engenheiros de software e matemáticos computacionais da nossa região. Oriento alunos na graduação, no mestrado e também no doutorado, mesmo com todas as... [Read More]
Filipe Saraiva's blog 2018-10-17 16:19:00

A arquitetura de compartilhamentos do Telegram para mitigar as fake news no WhatsApp

Fake News já se tornaram o tipo de problema que teremos que enfrentar de alguma maneira o quanto antes, ou veremos democracias sendo destruídas uma a uma. Se o caso Trump nos chamava atenção mas ainda parecia distante, as eleições brasileiras de 2018 vieram pra mostrar que o tiozão gente boa pode se converter no... [Read More]
Matthieu Brucher's blog 2018-10-16 07:09:00

Audio ToolKit: Moving to C++17

Audio ToolKit started with only C++11 a long time ago, and now with version 3.1, it’s going to be full C++17. Let’s start with the problem. In Audio ToolKit, I’m using a set of meta programming functions to enable automatic conversions between types. This enables the user to connect a float input to a double […]

So you want to contribute to open source

Welcome new open source contributor!

I appreciated receiving the e-mail where you said you were excited about getting into open source and were particularly interested in working on a project that I maintain. This post has a few thoughts on the topic.

First, please forgive me for sending you to this post rather than responding with a personal e-mail. Your situation is common today, so I thought I’d write up thoughts in a public place, rather than respond personally.

This post has two parts:

  1. Some pragmatic steps on how to get started
  2. A personal recommendation to think twice about where you focus your time
Look for good first issues on Github

Most open source software (OSS) projects have a “Good first issue” label on their Github issue tracker. Here is a screenshot of how to find the “good first issue” label on the Pandas project:

(note that this may be named something else like “Easy to fix”)

This contains a list of issues that are important, but also

Filipe Saraiva's blog 2018-09-28 04:53:50

Ciro em frente!

Faltando poucos dias para o 1º turno das eleições, aproveito o momento para declarar meu voto em Ciro Gomes e convido amigos e amigas a ponderarem e também votarem no candidato. Em um conceito bastante generoso de partidos políticos, tratam-se de organizações estruturadas em torno de uma ideia de ordenamento social e que tentam, através... [Read More]
Two Bit Arcade - python 2018-09-23 07:00:00

3D wireframe cube with MicroPython — Basic 3D model rotation and projection

An ESP2866 is never going to compete with an actual graphics card. But it has more than enough oomph to explore the fundamentals of 3D graphics. In this short tutorial we'll go through the basics of creating a 3D scene and displaying it on an OLED screen using MicroPython.

This …

Spyder Blog 2018-09-21 00:00:00

QtConsole 4.4 Released!

We're excited to announce a significant update to QtConsole—the package that powers Spyder's IPython Console interface—which the Spyder team maintains in collaboration with Project Jupyter. Two of the biggest changes—user-selectable syntax highlighting themes, and enhanced external editor/IDE integration—are already built right into Spyder, so they'll likely be of more interest if you use QtConsole standalone or with another editor/IDE. However, most of the other changes should prove quite useful within Spyder as well, and many were in fact suggested and even implemented by users of our IDE. Particular highlights include a block indent/unindent feature, Select-All (Ctrl-Shift-A) being made cell-specific, Ctrl-Backspace and Ctrl-Delete behaving more intelligently across whitespace and line boundaries, Ctrl-D allowing you to easily exit ipdb, input() and the like, and numerous smaller enhancements and bug fixes. If you'd like to learn more about what's new, please check out our article over on the Jupyter blog, where we go over the major changes in more detail, with plenty


Dask Development Log

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Since the last update in the 0.19.0 release blogpost two weeks ago we’ve seen activity in the following areas:

  1. Update Dask examples to use JupyterLab on Binder
  2. Render Dask examples into static HTML pages for easier viewing
  3. Consolidate and unify disparate documentation
  4. Retire the hdfs3 library in favor of the solution in Apache Arrow.
  5. Continue work on hyper-parameter selection for incrementally trained models
  6. Publish two small bugfix releases
  7. Blogpost from the Pangeo community about combining Binder with Dask
  8. Skein/Yarn Update
1: Update Dask Examples to use JupyterLab extension

The new dask-labextension embeds Dask’s dashboard plots into a JupyterLab session so that you can get easy access to information

Gaël Varoquaux - programming 2018-09-16 22:00:00

A foundation for scikit-learn at Inria

We have just announced that a foundation will be supporting scikit-learn at Inria [1]:

Growth and sustainability

This is an exciting turn for us, because it enables us to receive private funding. As a result, we will be able to have secure employment for some existing core …

Leonardo Uieda 2018-09-14 12:00:00

Introducing Verde

Verde is a Python library for processing spatial data (bathymetry, geophysics surveys, etc) and interpolating it on regular grids (i.e., gridding).

It implements Green's functions based interpolation methods and other data processing routines. The type of gridding implemented in Verde is essentially fitting various linear models to spatial data and using them to predict new data on regular grids, which is what a lot of machine learning is all about. So Verde's gridder API is inspired on scikit-learn, the state-of-the-art for machine learning in Python. The Green's functions that make up the Jacobian matrix (aka sensitivity or feature matrix) of the linear models generally come from elastic deformation theory. For example, the bi-harmonic spline (Sandwell, 1987) implemented in verde.Spline comes from the deformation of a thin elastic plate.

I submitted a

Pythonic Perambulations 2018-09-13 17:00:00

The Waiting Time Paradox, or, Why Is My Bus Always Late?

Image Source: Wikipedia License CC-BY-SA 3.0

If you, like me, frequently commute via public transit, you may be familiar with the following situation:

You arrive at the bus stop, ready to catch your bus: a line that advertises arrivals every 10 minutes. You glance at your watch and note the time... and when the bus finally comes 11 minutes later, you wonder why you always seem to be so unlucky.

Naïvely, you might expect that if buses are coming every 10 minutes and you arrive at a random time, your average wait would be something like 5 minutes. In reality, though, buses do not arrive exactly on schedule, and so you might wait longer. It turns out that under some reasonable assumptions, you can reach a startling conclusion:

When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes.

This is what is sometimes known as the waiting time paradox.

I've encountered this idea before, and always wondered

(continued...) 2018-09-05 22:00:00

Three Operator Splitting

I discuss a recently proposed optimization algorithm: the Davis-Yin three operator splitting.

Dask Release 0.19.0

This work is supported by Anaconda Inc.

I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd. This blogpost outlines notable changes since the last release blogpost for 0.18.0 on June 14th.

You can conda install Dask:

conda install dask

or pip install from PyPI:

pip install dask[complete] --upgrade

Full changelogs are available here:

Notable Changes

A ton of work has happened over the past two months, but most of the changes are small and diffuse. Stability, feature parity with upstream libraries (like Numpy and Pandas), and performance have all significantly improved, but in ways that are difficult to condense into blogpost form.

That being said, here are a few of the more exciting changes in the new release.

Python Versions

We’ve dropped official support for Python 3.4 and added official support for Python 3.7.

Deploy on Hadoop Clusters

Over the past few months Jim Crist has bulit a suite of

Planet SciPy – I Love Symposia! 2018-08-30 04:48:05

Summer school announcement: 2nd Advanced Scientific Programming in Python (ASPP) Asia Pacific!

The Advanced Scientific Programming in Python (ASPP) summer school has had 10 successful iterations in Europe and one iteration here in Melbourne earlier this year. Another European iteration is starting next week in Camerino, Italy. Now, thanks to the generous sponsorship of CSIRO, and the efforts of Benjamin Schwessinger and Genevieve Buckley, two alumni from … Continue reading Summer school announcement: 2nd Advanced Scientific Programming in Python (ASPP) Asia Pacific!
Living in an Ivory Basement 2018-08-28 22:00:00

Abstract for SIAM: Supporting and Sustaining Open Source Software Development: the Commons Perspective

How do we support and sustain open source software development?

High level performance of Pandas, Dask, Spark, and Arrow

This work is supported by Anaconda Inc


How does Dask dataframe performance compare to Pandas? Also, what about Spark dataframes and what about Arrow? How do they compare?

I get this question every few weeks. This post is to avoid repetition.

  1. This answer is likely to change over time. I’m writing this in August 2018
  2. This question and answer are very high level. More technical answers are possible, but not contained here.
Answers Pandas

If you’re coming from Python and have smallish datasets then Pandas is the right choice. It’s usable, widely understood, efficient, and well maintained.

Benefits of Parallelism

The performance benefit (or drawback) of using a parallel dataframe like Dask dataframes or Spark dataframes over Pandas will differ based on the kinds of computations you do:

  1. If you’re doing small computations then Pandas is always the right choice. The administrative costs of parallelizing will outweigh any benefit. You should not parallelize if your computations are taking less

Two Bit Arcade - python 2018-08-27 16:00:00

Displaying images on OLED screens — Using 1-bpp images in MicroPython

We've previously covered the basics of driving OLED I2C displays from MicroPython, including simple graphics commands and text. Here we look at displaying monochrome 1 bit-per-pixel images and animations using MicroPython on a Wemos D1.

Processing the images and correct choice of image-formats is important to get the most detail …

Two Bit Arcade - python 2018-08-25 08:00:00

Driving I2C OLED displays with MicroPython — I2C monochrome displays with SSD1306

These mini monochrome OLED screens make great displays for projects — perfect for data readout, simple UIs or monochrome games.

Wemos D1 v2.2+ or good imitations. Buy
0.91in OLED Screen 128x32 pixels, I2c interface. Buy
Breadboard Any size will do. Buy
Wires Loose ends, or jumper leads.
Setting …
Two Bit Arcade - python 2018-08-23 19:00:00

Raindar — Desktop daily weather, forecast app in PyQt

The Raindar UI was created using Qt Designer, and saved as .ui file, which is available for download. This was converted to an importable Python file using pyuic5.

API key

Before running the application you need to obtain a API key from This key is unique to you …

Public Institutions and Open Source Software

As general purpose open source software displaces domain-specific all-in-one solutions, many institutions are re-assessing how they build and maintain software to support their users. This is true across for-profit enterprises, government agencies, universities, and home-grown communities.

While this shift brings opportunities for growth and efficiency, it also raises questions and challenges about how these institutions should best serve their communities as they grow increasingly dependent on software developed and controlled outside of their organization.

  • How do they ensure that this software will persist for many years?
  • How do they influence this software to better serve the needs of their users?
  • How do they transition users from previous all-in-one solutions to a new open source platform?
  • How do they continue to employ their existing employees who have historically maintained software in this field?
  • If they have a mandate to support this field, what is the best role for them to play, and how can they justify their efforts to the groups that control their budget?

This blogpost

Living in an Ivory Basement 2018-08-17 22:00:00

Can bits be the basis for a digital commons? (No.)

Bits cannot be the basis for a digital commons, because they are not rivalrous.

Spyder Blog 2018-08-14 00:00:00

Spyder 3.3.0 and 3.3.1 released!

We're pleased to release the next significant update in the stable Spyder 3 line, 3.3.0, along with its follow-on bugfix point release, 3.3.1, which is now live on PyPI and conda. As always, you can update with conda update spyder in the Anaconda Prompt/Terminal/command line (on Windows/macOS/Linux, respectively) if on Anaconda (recommended), or pip update spyder otherwise. If you run into any trouble, please carefully read our new installation documentation and consult our Troubleshooting Guide, which contains straightforward solutions to the vast majority of install-related issues users have reported.

As a new minor version (3.3), it makes several substantial changes to Spyder's underpinnings that deserve some explanation, particularly the newly modular and portable console system that's now separated into its own spyder-kernels package, opening up several new options for users running Spyder in different environments. There's also a brand-new error reporting process, new options in the IPython console, usability and performance improvements for the Variable Explorer, multiple new and changed dependency requirements

While My MCMC Gently Samples 2018-08-13 14:00:00

Hierarchical Bayesian Neural Networks with Informative Priors

(c) 2018 by Thomas Wiecki

Imagine you have a machine learning (ML) problem but only small data (gasp, yes, this does exist). This often happens when your data set is nested -- you might have many data points, but only few per category. For example, in ad-tech you may want predict …

Spyder Blog 2018-08-13 00:00:00

Spyder featured on Episode 1 of Open Source Directions web show

Quansight, the company recently founded by NumPy, SciPy and Anaconda creator Travis Oliphant to help connect companies with open source communities built around data science and machine learning, just released Episode 1 of its live webcast series, and it was all about Spyder! Spyder maintainer Carlos Córdoba, recently hired by Quansight and funded part-time to work on Spyder development as we announced a few weeks ago, was the featured guest on the show.

Carlos first shared his perspective on some of the key moments in Spyder's nearly 10-year development history, from its original creation by Pierre Raybaut and Carlos' initial involvement in the project to its more recent challenges and successes. He also demonstrated basic usage of Spyder, as well as some of its standout features, in a live on-screen demo. Carlos then went on to outline the current roadmap for Spyder 4 in the near future, and explained some of the key new features planned for it. Finally, he took

Neural Ensemble News 2018-08-10 17:48:00

NeuroML2/LEMS is moving into Neural Mass Models and whole brain networks

In the last months, as part of the Google Summer of Code 2018, I have been working on a project that aimed to implement neuronal models which represent averaged population activity on NeuroML2/LEMS. The project was supported by the INCF organisation and my mentor, Padraig Gleeson, and I had 3 months to shape and bring to life all the ideas that we had in our heads. This blog post summarises the core motivation of the project, the technical challenges, what I have done, and future steps.

NeuroML version 2 and LEMS were introduced in order to standardise the description of neuroscience computational models and facilitate the shareability of results among different research groups1. However, so far, NeuroML2/LEMS have focused on modelling spiking neurons and how information is exchanged between them in networks. With the introduction of neural mass models, NeuroML2/LEMS can be extended to study interactions between large-scale systems such as cortical regions and indeed whole brain dynamics. To achieve this,
Gaël Varoquaux - programming 2018-07-31 22:00:00

Sprint on scikit-learn, in Paris and Austin

Two weeks ago, we held a scikit-learn sprint in Austin and Paris. Here is a brief report, on progresses and challenges.

Several sprints

We actually held two sprint in Austin: one open sprint, at the scipy conference sprints, which was open to new contributors, and one core sprint, for more …

Leonardo Uieda 2018-07-26 12:00:00

Websites for Earth Scientists on the academic job hunt

This is a list of the websites I use to search for academic jobs in the Earth Sciences (geophysics, geology, oceanography, meteorology, etc). They've been very useful to me (I found my current position through the CIG mailing list) and I hope that this post can help others who are looking to take the next step in their academic careers.

These sites list everything from Masters and PhD scholarships to postdoc positions and tenure-track professorships. Note that they are biased toward the US, Canada, Oceania, and Europe.

Mailing lists

Sign up for these and get email updates when new opportunities are posted (most are updated daily):

  • ES_JOBS_NET: I get around 10 emails from this list a day. Lately, I'm seeing a lot of
Spyder Blog 2018-07-23 00:00:00

State of the Spyder, Part 2: Looking up

After sharing some major milestones, development progress, and other tidbits from the past six months in Part 1 of this series (check that one out first if you haven't already), we now have some amazing news to share with you all here in Part 2, along with other status updates. That's not all, though—Part 3 will look ahead toward Spyder 4 and beyond, unveiling and explaining our full roadmap and going over the future possibilities even further afield.

Spyder Wins NumFOCUS Development Grant

First up, we're thrilled to announce a major part of what's making that plan possible (along with your support, of course!). This May, Spyder was awarded a $3000 development grant from NumFOCUS, an organization promoting better science through open code, to help with finishing Spyder 4! NumFOCUS is a nonprofit dedicated to supporting key scientific computing projects; promoting sustainability in the open source ecosystem; educating the next generation of scientists, engineers, developers and data analysts through their flagship

Leonardo Uieda 2018-07-20 12:00:00

Introducing Pooch

A friend to fetch your sample data files.

Pooch is a Python package that manages downloading data files over HTTP and storing them in a local directory. It is meant to be used by other Python libraries that ship sample data files for use in documentation, workshops, demos, etc.

For example, your package could define a module that has functions to load sample data (like scikit-learn does). If you want the data to live on the web (like in the Github repo) instead of shipping it with your package, Pooch can keep track of it and download it to the user's computer only when it's needed.

This is what a module would look like using Pooch:

Module mypackage/
import pooch

# Get the version string from your project. You have one of these, right?
Planet SciPy – I Love Symposia! 2018-07-12 18:58:35

The road to scikit-image 1.0

This is the first in a series of posts about the joint scikit-image, scikit-learn, and dask sprint that took place at the Berkeley Insitute of Data Science, May 28-Jun 1, 2018. In addition to the dask and scikit-learn teams, the sprint brought together three core developers of scikit-image (Emmanuelle Gouillart, Stéfan van der Walt, and … Continue reading The road to scikit-image 1.0
python – Dr. Randal S. Olson 2018-07-04 20:41:32

Does batting order matter in Major League Baseball? A simulation approach

If you’ve ever watched Major League Baseball, one of the feature points of the sport is the batting line-up that each team decides upon before each game. Traditional baseball logic tells us that speedy, reliable hitters like Trea Turner should
Planet SciPy – I Love Symposia! 2018-06-20 11:19:46

What do scientists know about open source?

A friend recently pointed out this great talk by Matt Bernius, What students know and don’t know about open source. If you have even a minor interest in open source it’s worth a watch, but the gist is: in the US alone, there are about 200,000 students enrolled in a computer science major. Open source … Continue reading What do scientists know about open source?
Paul Ivanov’s Journal 2018-06-12 07:00:00

Get in it

Two weeks ago, Project Jupyter had our only planned team meeting for 2018. There was too much stuff going on for me to write a poem during the event as I had in previous years (2016, and 2017), so I ended up reading one of the pieces I wrote during my evening introvert breaks in Cleveland at PyCon a few weeks earlier.

Once again, Fernando and Matthias had their gadgets ready to record (thank you both!). The video below was taken by Fernando.

Get in it
Time suspended
Gellatinous reality - the haze
submerged in murky drops summed
in swamp pond of life

believe and strive, expand the mind
A state sublime, when in your prime you came to
me and we were free to flow and fling our
cares, our dreams, our in-betweens, our
rêves perdues, our residue -- the lime of light
the black of sight -- all these converge and
merge the forks of friction filled with fright
and more -- the float of logs that plunges deep
beyond the fray, beyond the
Two Bit Arcade - python 2018-06-11 06:00:00

7Pez — Desktop unzip application with custom window decoration

This is a functionally terrible unzip application, saved only by the fact that you get to look at a cat while using it.

The original idea reflected in the name 7Pez was actually worse — to rig it up so you had to push on the head to unzip each file …

Two Bit Arcade - python 2018-06-03 16:30:00

Failamp — Multimedia playlist & player in Python, using PyQt

Failamp is a simple audio & video mediaplayer implemented in Python, using the built-in Qt playlist and media handling features. It is modelled, very loosely on the original Winamp, although nowhere near as complete (hence the fail).

The main window

The main window UI was built using Qt Designer. The screenshot …

Two Bit Arcade - python 2018-05-25 19:00:00

Brown Note — Desktop notes app using SQLAlchemy & PyQt

Relieve your creative blockages with these interactive desktop reminders.

Brown Note is a desktop notes application written in Python, using PyQt. The notes are implemented as decoration-less windows, which can be dragged around the desktop and edited. Details in the notes, and their position on the desktop, is stored in …

Python – Meta Rabbit 2018-05-25 11:58:38

Quick followups: NGLess benchmark & Notebooks as papers

A quick follow-up on two earlier posts: We finalized the benchmark for ngless that I had discussed earlier: As you can see, NGLess performs much better than either MOCAT or htseq-count. We tried to use featureCounts too, but that completely failed to produce results for some of the samples (we gave it a whopping 1TB … Continue reading Quick followups: NGLess benchmark & Notebooks as papers
Two Bit Arcade - python 2018-05-14 06:00:00

Lucky Cat Spinning-arm Display — Python-powered Maneki-neko persistence of vision scroller

This build started as something simple: a lucky cat which would turn on and off automatically in response to some event. Since lucky cats are associated with good fortune the idea was to make one do this every time I got paid. This was working pretty well but unfortunately, after …

Two Bit Arcade - python 2018-05-07 20:00:00

NSAViewer — Webcam viewer & photo booth in Python, using PyQt

This app isn't actually a direct line from your webcam to the NSA, it's a demo of using the webcam/camera support in Qt. The name is a nod to the paranoia (or is it...) of being watched through your webcam by government spooks.

I did consider making the app …