Planet SciPy

Quansight Labs 2019-06-12 05:00:00

Labs update and May highlights

Time flies when you're having fun. Here is an update of some of the highlights of my second month at Quansight Labs.

The making of a black hole image & GitHub Sponsors

Travis and I were both invited by GitHub to attend GitHub Satellite in Berlin. The main reason was that Nat Friedman (GitHub CEO) decided to spend the first 20 minutes of his keynote highlighting the Event Horizon Telescope's black hole image and the open source software that made that imaging possible. The scientific Python stack featured very prominently: NumPy, Matplotlib, Python, Cython, SciPy, AstroPy and other projects were highlighted. At the same time, Nat introduced new GitHub features like "used by", a triaging role and new dependency graph features, and illustrated how those worked for NumPy. These features will be very welcome news to maintainers of almost any project.

The single most visible feature introduced was GitHub Sponsors:

I really enjoyed meeting Devon Zuegel, Product Manager of the Open Source Economy Team at GitHub, in person after previously having had

ListenData 2019-06-09 21:07:00

String Functions in Python with Examples

This tutorial outlines various string (character) functions used in Python. To manipulate strings and character values, Python has several built-in functions. This means you don't need to import or depend on any external package to deal with the string data type in Python. It's one of the advantages of using Python over other data science tools. Dealing with string values is very common in real-world work. Suppose you have customers' full names and your manager asks you to extract the first and last name of each customer. Or you want to fetch information on all the products whose code starts with 'QT'.
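As a minimal sketch of the two tasks mentioned above, the snippet below uses only built-in string methods (`split` and `startswith`). The sample names and product codes are invented for illustration and do not come from the tutorial.

```python
# Hypothetical sample data, invented for illustration.
full_names = ["John Smith", "Julia Roberts"]
product_codes = ["QT100", "AB200", "QT305"]

# Split each full name on the first space into [first name, last name].
first_last = [name.split(" ", 1) for name in full_names]

# Keep only the product codes that start with 'QT'.
qt_codes = [code for code in product_codes if code.startswith("QT")]

print(first_last)
print(qt_codes)
```

No import is needed here, which is the point the paragraph makes about built-in string handling.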

List of frequently used string functions

The table below shows many common string functions along with a description and the equivalent function in MS Excel. We all use MS Excel in our workplace and are familiar with its functions. The comparison of string functions in MS Excel and Python would help you to learn

Anaconda 2019-06-03 16:28:01

Anaconda Recognized as a May 2019 Gartner Peer Insights Customers’ Choice for Data Science and Machine Learning Platforms

The Anaconda team is excited to announce that we have been recognized as a May 2019 Gartner Peer Insights Customers’ Choice for Data Science and Machine Learning Platforms. According to Gartner, “The Gartner Peer Insights…

Quansight Labs 2019-06-02 05:00:00

TDK-Micronas partners with Quansight to sponsor Spyder

TDK-Micronas is sponsoring Spyder development efforts through Quansight Labs. This will enable the development of some features that have been requested by our users, as well as new features that will help TDK develop custom Spyder plugins to complement their Automatic Test Equipment (ATE) in the development of their Application Specific Integrated Circuits (ASICs).

At this point it may be useful to clarify the role of Quansight Labs in Spyder's development and its relationship with TDK. To quote Ralf Gommers (director of Quansight Labs):

"We're an R&D lab for open source development of core technologies around data science and scientific computing in Python. And focused on growing communities around those technologies. That's how I see it for Spyder as well: Quansight Labs enables developers to be employed to work on Spyder, and helps with connecting them to developers of other projects in similar situations. Labs should be an enabler to let the Spyder project, its community and individual developers grow. And Labs provides mechanisms to attract and coordinate funding. Of course

Quansight Labs 2019-05-31 05:00:00

metadsl: A Framework for Domain Specific Languages in Python

Hello, my name is Saul Shanabrook and for the past year or so I have been at Quansight exploring the array computing ecosystem. This started with working on the xnd project, a set of low level primitives to help build cross platform NumPy-like APIs, and continued with exploring Lenore Mullin's work on a mathematics of arrays. After spending quite a bit of time working on an integrated solution built on these concepts, I decided to step back to try to generalize and simplify the core concepts. The trickiest part was not actually compiling mathematical descriptions of array operations in Python, but figuring out how to make it useful to existing users. To do this, we need to meet users where they are, which is with the APIs they are already familiar with, like NumPy. The goal of metadsl is to make it easier to tackle parts

Quansight Labs 2019-05-29 05:00:00

Community-driven open source and funded development

Quansight Labs is an experiment for us in a way. One of our main aims is to channel more resources into community-driven PyData projects, to keep them healthy and accelerate their development. And do so in a way that projects themselves stay in charge.

This post explains one method we're starting to use for this. I'm writing it to be transparent with projects, the wider community and potential funders about what we're starting to do. As well as to explicitly solicit feedback on this method.

Community work orders

If you talk to someone about supporting an open source project, in particular a well-known one that they rely on (e.g. NumPy, Jupyter, Pandas), they're often willing to listen and help. What you quickly learn though is that they want to know in some detail what will be done with the funds provided. This is true not only for companies, but also for individuals. In addition, companies will likely want a written agreement and some form of reporting about the progress of the work. To meet this

Quansight Labs 2019-05-27 05:00:00

Measuring API usage for popular numerical and scientific libraries

Developers of open source software often have a difficult time understanding how others use their libraries. Having better data on when and how functions are being used has many benefits. Some of these are:

  • better API design
  • determining whether a feature can be deprecated or removed
  • more instructive tutorials
  • understanding the adoption of new features

Python Namespace Inspection

We wrote a general tool, python-api-inspect, to analyze any function/attribute call within a given set of namespaces in a repository. This work was heavily inspired by a blog post on inspecting method usage with Google BigQuery for pandas, NumPy, and SciPy. That work used regular expressions to search for method usage. The primary issue with this approach is that it cannot handle import numpy.random as rand; rand.random(...) unless additional regular expressions are constructed for each case, and it will still produce false positives. Additionally, BigQuery is not a free resource. Thus, this approach is not general enough and does not scale well with the number of libraries whose function and attribute usage we would like to inspect.
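To make the aliasing problem concrete, here is a minimal sketch of how a syntax-tree walk can resolve import aliases where a regular expression cannot. It uses only the standard-library `ast` module. The function name `find_calls` is hypothetical, and this is not the python-api-inspect implementation; it also ignores scoping and name reassignment.

```python
import ast


def find_calls(source, namespace="numpy"):
    """Return the fully qualified names of calls that resolve into `namespace`."""
    tree = ast.parse(source)

    # Map each locally bound name to the module path it refers to.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name
                else:
                    # "import numpy.random" binds only the top-level name.
                    top = alias.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"

    # Rebuild the dotted name of every call and expand the leading alias.
    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            parts = []
            func = node.func
            while isinstance(func, ast.Attribute):
                parts.append(func.attr)
                func = func.value
            if isinstance(func, ast.Name) and func.id in aliases:
                parts.append(aliases[func.id])
                full_name = ".".join(reversed(parts))
                if full_name == namespace or full_name.startswith(namespace + "."):
                    calls.append(full_name)
    return calls


example = (
    "import numpy.random as rand\n"
    "import numpy as np\n"
    "rand.random(3)\n"
    "np.linalg.norm([1, 2])\n"
    "print('np.array')\n"  # a string mention, not a call into numpy
)
print(find_calls(example))
```

Because the string literal `'np.array'` is never a call node, it is not counted, which is the kind of false positive a pattern match on raw text would produce.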

A more robust

Anaconda 2019-05-24 20:19:58

Intake: Discovering and Exploring Data in a Graphical Interface

Motivation

Do you have data that you’d like people to be able to explore on their own? Are you always passing around snippets of code to load specific data files? These are problems that people…

Quansight Labs 2019-05-21 20:02:50

Spyder 4.0 takes a big step closer with the release of Beta 2!

It has been almost two months since I joined Quansight in April to start working on Spyder maintenance and development. So far, it has been a very exciting and rewarding journey under the guidance of longtime Spyder maintainer Carlos Córdoba. This is the first of a series of blog posts we will be writing to showcase updates on the development of Spyder, new planned features and news on the road to Spyder 4.0 and beyond.

First off, I would like to give a warm welcome to Edgar Margffoy, who recently joined Quansight and will be working with the Spyder team to take its development even further. Edgar has been a core Spyder developer for more than two years now, and we are very excited to have his (almost) full-time commitment to the project.

Spyder 4.0 Beta 2 released!

Since August 2018, when the first beta of the 4.x series was released, the Spyder development team has been working hard on our next release. Over the past year, we've

Spyder Blog 2019-05-20 00:00:00

Spyder 4.0 takes a big step closer with the release of Beta 2!

This blogpost was originally published on the Quansight Labs website

The Role of a Maintainer

What are the expectations and best practices for maintainers of open source software libraries? How can we do this better?

This post frames the discussion and then follows with best practices based on my personal experience and opinions. I make no claim that these are correct.

Let us Assume External Responsibility

First, the most common answer to this question is the following:

  • Q: What are expectations on OSS maintainers?
  • A: Nothing at all. They’re volunteers.

However, let’s assume for a moment that these maintainers are paid to maintain the project for some modest amount of time, like 10 hours a week.

How can they best spend this time?

What is a Maintainer?

Next, let’s disambiguate the roles of developer, reviewer, and maintainer:

  1. Developers fix bugs and create features. They write code and docs and generally are agents of change in a software project. There are often many more developers than reviewers or maintainers.

  2. Reviewers are known

Living in an Ivory Basement 2019-05-14 22:00:00

Using GitHub for janky project reporting - some code

We scripted GitHub for lightweight project reporting

Paul Ivanov’s Journal 2019-05-13 07:00:00

My first DNF (Ft Bragg 600k)

It's been six years since my first ride with The San Francisco Randonneurs and four years since my first 200k. I've ridden 18 rides that are at least that distance since then (3x 300k, 2x 400k, 1x 600k), completing my first Super Randonneur Series (2-, 3-, 4-, and 600k in one year) last year after not riding much the year before that. And this weekend I had my first DNF result on the Fort Bragg 600k. I Did Not Finish.

The best response to my choice of abandoning the ride to enjoy the campground came from Peter Curley, who said "That was a very mature decision." A clear departure from typical randonneuring stubbornness and refusal to give up, I celebrated my decision to quit as a victory when I arrived at the campground and made my announcement to the volunteers. I think I was so energetic about it that they did not believe me. I was being kind to myself, to my body, and at peace with the decision by


Should I Resign from My Full Professor Job to Work Fulltime on Cocalc?

Nearly 3 years ago, I gave a talk at a Harvard mathematics conference announcing that “I am leaving academia to build a company”. What I really did is go on unpaid leave for three years from my tenured Full Professor position. No further extension of that leave is possible, so I finally have to decide whether to go back to academia or resign.

How did I get here?

Nearly two decades ago, as a recently minted Berkeley math Ph.D., I was hired as a non-tenure-track faculty member in the mathematics department at Harvard. I spent five years at Harvard, then I applied for jobs, and accepted a tenured Associate Professor position in the mathematics department at UC San Diego. The mathematics community was very supportive of my number theory research; I skipped tenure track, and landed a tier-1 tenured position by the time I was 30 years old. In 2006, I moved from UCSD to a tenured Associate Professor position at the University

Anaconda 2019-05-06 22:34:36

Updated Statement About Our Relationship with DataCamp

We apologize for our poor communications about our response to the DataCamp sexual misconduct incident. We support the victims and we understand this has been a painful and ongoing struggle for them. We also recognize…

Paul Ivanov’s Journal 2019-05-03 07:00:00

PyCon2019 poem

I'm back in Cleveland for another PyCon. Yesterday was my first full day here. Along with Matt Seale, I was a helper at Matthias Bussonnier's tutorial ("IPython and Jupyter in Depth: High productivity, interactive Python"). The sticky system is efficient at signaling when someone in a classroom needs help, and a lot of folks don't know that this practice was popularized by Software Carpentry workshops and continues to be used at The Carpentries.

I stepped out for a coffee refill and bumped into a large contingent of Bloomberg folks I'd never met (Princeton office). I guess we have something like 90 people at the conference this year, and I made the usual and true remark about how I go to conferences to meet the other people who work at our company. Then after his tutorial concluded, Matthias and I bumped into Tracy Teal, exchanged some stickers, and chatted about The Carpentries, Jupyter, organizing conferences, governance and sponsorship models, and a bunch of other stuff.

Matthias was a

Quansight Labs 2019-05-03 05:00:00

Labs update and April highlights

It has been an exciting first month for me at Quansight Labs. It's a good time for a summary of what we worked on in April and what is coming next.

Progress on array computing libraries

Our first bucket of activities I'd call "innovation". The most prominent projects in this bucket are XND, uarray, metadsl, python-moa, Remote Backend Compiler and arrayviews. XND is an umbrella name for a set of related array computing libraries: xnd, ndtypes, gumath, and xndtools.

Hameer Abbasi made some major steps forward with uarray: the backend and coercion semantics are now largely worked out, there is good documentation, and the unumpy package (which currently has numpy, XND and PyTorch backends) is progressing well. This blog post gives a good overview of the motivation for uarray and its main concepts.

Saul Shanabrook and Chris Ostrouchov worked out how best to put metadsl and python-moa together: metadsl can be used to create the API for python-moa to simplify the code base of the latter a lot. Chris also wrote an

Anaconda 2019-05-02 17:58:48

Anaconda’s Response to DataCamp’s CEO and Board of Directors

DataCamp has been a business partner of our company for almost two years. So we were shocked and saddened by the recent allegations of inappropriate sexual behavior and retaliatory firings made against DataCamp’s CEO and…

Quansight Labs 2019-05-02 05:00:00

What's New in SymPy 1.4

Since November 2018, I have been working at Quansight, under the heading of Quansight Labs. Quansight Labs is a public-benefit division of Quansight. It provides a home for a "PyData Core Team" which consists of developers, community managers, designers, and documentation writers who build open-source technology and grow open-source communities around all aspects of the AI and Data Science workflow. As a part of this, I am able to spend a fraction of my time working on SymPy. SymPy, for those who do not know, is a symbolic mathematics library written in pure Python. I am the lead maintainer of SymPy.

SymPy 1.4 was released on April 9, 2019. In this post, I'd like to go over some of the highlights of this release. The full release notes can be found on the SymPy wiki.

To update to SymPy 1.4, use

conda install sympy

or if you prefer to use pip

pip install -U sympy

The SymPy 1.4 release contains over 500 changes from 38 different submodules, so I will not be

Anaconda 2019-04-30 17:01:47

Reflections on AnacondaCON 2019 with NVIDIA’s Josh Patterson

I love this month. April ’19 brings back Game of Thrones, Avengers: Endgame (that Thanos snap though), and of course AnacondaCON. I’ve been to every AnacondaCON, which makes this my third show. Of all data…

Quansight Labs 2019-04-30 05:04:40

uarray: A Generic Override Framework for Methods

uarray is an override framework for methods in Python. In the scientific Python ecosystem, and in other similar places, there has been one recurring problem: similar tools for a job exist, but they don't conform to a single, well-defined API. uarray tries to solve this problem in general, but also for the scientific Python ecosystem in particular, by defining APIs independent of their implementations.

Array Libraries in the Scientific Python Ecosystem

When SciPy was created, and Numeric and Numarray unified into NumPy, it jump-started Python's data science community. The ecosystem grew quickly: Academics started moving to SciPy, and the Scikits that popped up made the transition all the more smooth.

However, the scientific Python community also shifted during that time: GPUs and distributed computing emerged. Also, there were old ideas that couldn't really be used with NumPy's API, such as sparse arrays. To solve these problems, various libraries emerged:

  • Dask, for distributed NumPy
  • CuPy, for

ListenData 2019-04-27 13:52:00

Python Lambda Function with Examples

This article gives a detailed explanation of Python's lambda function. You will learn how to use it in real-world data scenarios with examples.

Introduction : Lambda Function

In non-technical language, lambda is an alternative way of defining a function. You can define a function inline using lambda. It means you can apply a function to some data in one line of Python code and then join the result. It is called an anonymous function because it can be defined without a name.

Syntax of Lambda Function

lambda arguments: expression

A lambda function can take more than one argument but can contain only one expression. The expression is evaluated and returned.

Example

addition = lambda x,y: x + y
addition(2,3) returns 5

In the above Python code, x,y are the arguments and x + y is the expression that gets evaluated and returned.

Difference between Lambda and Def Function

By using both lambda and def, you can create your own user-defined function
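As a small illustration of the comparison, the sketch below defines the same addition with lambda and with def, then passes a lambda inline as a sort key. The name `addition_def` and the sample `pairs` list are invented for this example.

```python
# The lambda from the example above.
addition = lambda x, y: x + y


# An equivalent function written with def.
def addition_def(x, y):
    return x + y


# Lambdas are often passed inline, for example as a sort key.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_number = sorted(pairs, key=lambda pair: pair[1])

print(addition(2, 3), addition_def(2, 3))
print(by_number)
```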
Anaconda 2019-04-25 20:54:52

TensorFlow CPU optimizations in Anaconda

By Stan Seibert, Anaconda, Inc. & Nathan Greeneltch, Intel Corporation

TensorFlow is one of the most commonly used frameworks for large-scale machine learning, especially deep learning (we’ll call it “DL” for short). This popular framework has…

ListenData 2019-04-20 21:01:00

Loops in Python explained with examples

This tutorial covers various ways to execute loops in Python with several practical examples. After reading this tutorial, you will be familiar with the concept of a loop and will be able to apply loops in real-world data wrangling tasks.

What is Loop?

A loop is an important programming concept and exists in almost every programming language (Python, C, R, Visual Basic etc.). It is used to repeat one or more operations several times until a specific condition is met. It is mainly used to automate repetitive tasks.

Real World Examples of Loop
  1. Software of the ATM machine is in a loop to process transaction after transaction until you acknowledge that you have no more to do.
  2. Software program in a mobile device allows user to unlock the mobile with 5 password attempts. After that it resets mobile device.
  3. You put your favorite song on a repeat mode. It is also a loop.
  4. You want to run a particular analysis on each column of your data
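Two of the examples above can be sketched in plain Python: a for loop that repeats an analysis on each column, and a while loop that stops after a limited number of password attempts. The column values, the password and the attempt list are invented for illustration.

```python
# for loop: run the same analysis (a mean) on every column of a small table.
columns = {"sales": [25, 30, 35], "returns": [1, 0, 2]}  # hypothetical data
column_means = {}
for name, values in columns.items():
    column_means[name] = sum(values) / len(values)

# while loop: allow at most 5 password attempts, stopping early on success.
attempts = ["1234", "0000", "letmein"]  # hypothetical user inputs
correct_password = "letmein"
max_attempts = 5
tries = 0
unlocked = False
while tries < max_attempts and tries < len(attempts) and not unlocked:
    unlocked = attempts[tries] == correct_password
    tries += 1

print(column_means)
print(unlocked, tries)
```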
Quansight Labs 2019-04-17 05:00:00

MOA: a theory for composable and verifiable tensor computations

Python-moa (mathematics of arrays) is an approach to a high level tensor compiler that is based on the work of Lenore Mullin and her dissertation. A high level compiler is necessary because there are many optimizations that a low level compiler such as gcc will miss. It is trying to solve many of the same problems as other technologies such as the taco compiler and the xla compiler. However, it takes a much different approach than others guided by the following principles.

  1. What is the shape? Everything has a shape: scalars, vectors, arrays, operations, and functions.
  2. What are the given indices and operations required to produce a given index in the result?
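A toy illustration of these two questions, in plain Python rather than python-moa: for the expression transpose(a) + b, the shape is checked first, and each result element is then computed directly from the input indices, so the transposed array is never built. The helper names `shape` and `transpose_add_at` are hypothetical.

```python
def shape(array):
    """Shape of a rectangular nested list as (rows, columns)."""
    return (len(array), len(array[0]))


def transpose_add_at(a, b, i, j):
    """Element (i, j) of transpose(a) + b, read straight from the inputs."""
    return a[j][i] + b[i][j]


a = [[1, 2, 3], [4, 5, 6]]           # shape (2, 3)
b = [[10, 20], [30, 40], [50, 60]]   # shape (3, 2)

# Principle 1: the shapes must agree before any element is computed.
rows, cols = shape(b)
assert shape(a) == (cols, rows)

# Principle 2: every result index maps to known input indices.
result = [[transpose_add_at(a, b, i, j) for j in range(cols)] for i in range(rows)]
print(result)
```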

Having a compiler that is guided by these principles allows for high level reductions that other compilers will miss and allows for optimization of algorithms as a whole. Keep in mind that MOA is NOT a compiler. It is a theory that guides compiler development. Since python-moa is based on theory we get unique properties that other compilers cannot guarantee:

Living in an Ivory Basement 2019-04-15 22:00:00

Some questions and thoughts on journal peer review.

What's up with current peer review practice?

ListenData 2019-04-14 15:31:00

Create Dummy Data in Python

This article explains various ways to create dummy or random data in Python for practice. As in R, we can create dummy data frames using the pandas and numpy packages. Most analysts prepare data in MS Excel and later import it into Python to hone their data wrangling skills. This is not an efficient approach. The efficient approach is to prepare random data in Python and use it later for data manipulation.

1. Enter Data Manually in Editor Window

The first step is to load the pandas package and use the DataFrame function:
import pandas as pd
data = pd.DataFrame({"A" : ["John","Deep","Julia","Kate","Sandy"],
                     "MonthSales" : [25,30,35,40,45]})
print(data)
       A  MonthSales
0   John          25
1   Deep          30

Anaconda 2019-04-11 16:24:17

The Human Element in AI

The over 45 speakers at AnacondaCON 2019 delved into how machine learning, artificial intelligence, enterprise, and open source communities are accomplishing great things with data — from optimizing urban farming to identifying the elements in…

Living in an Ivory Basement 2019-04-10 22:00:00

Things to think about when developing shotgun metagenome classifiers

Thoughts on goals and tradeoffs in classifying shotgun metagenome data.

ListenData 2019-04-09 18:47:00


The most common issue when installing a Python package on a company network is failure to verify the SSL certificate. Sometimes a company blocks certain websites on its network so that employees can't access them. Whenever they try to visit these websites, they see "Access Denied because of company's policy". This causes a connection error when reaching the main Python website.

Error looks like this :

Could not fetch URL connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)

PIP SSL Certification Issue

Solution :

Run the following command, replacing <package_name> with the name of the package:
pip install --trusted-host --trusted-host <package_name> -vvv
To install the pandas package, for example, run the following command:
pip install --trusted-host --trusted-host pandas -vvv

The --trusted-host option marks the host as trusted, even though it does not have valid or any HTTPS

Anaconda 2019-04-09 16:41:40

AnacondaCON 2019 Day 3 Recap: The Need for Speed, “Delightful UX” in Dev Tools, LOTR Jokes and More.

Everyone at Anaconda is still feeling the love from AnacondaCON 2019. Day 3 wrapped up last Friday with one more day of talks and sessions, highlighted by some powerhouse keynotes. Let’s get right to the good…

ListenData 2019-04-09 15:56:00

Install Python Package

Python is one of the most popular programming languages for data science and analytics. It is widely used for a variety of tasks in startups and many multi-national organizations. The beauty of this programming language is that it is open-source, which means it is available for free and has a very active community of developers across the world. Python developers share their solutions with other Python users in the form of packages or modules. This tutorial explains various ways to install a Python package.

Ways to Install Python Package

Method 1 : If Anaconda is already installed on your System

Anaconda is a data science platform that comes with popular Python packages pre-installed and a powerful IDE (Spyder) whose user-friendly interface eases the writing of Python scripts.

If Anaconda is installed on your system (laptop), open Anaconda Prompt.

To install a Python package or module, enter the command below in Anaconda Prompt:
pip install package-name

Living in an Ivory Basement 2019-04-08 22:00:00

News from the NIH Data Commons Pilot Phase Consortium

The NIH Data Commons Pilot Phase Consortium is dead! (Long live the NIH Data Commons!)

Living in an Ivory Basement 2019-04-07 22:00:00

Critically assessing open science - the CAOS meeting.

A summary of the CAOS open science meeting

Anaconda 2019-04-05 18:25:01

Anaconda 2019.03 Release

Windows is the most popular operating system in the world and consistently has 75% or more of the worldwide desktop market. According to the JetBrains Python Developers Survey, 49% of Python developers use Windows as…

Anaconda 2019-04-05 16:12:42

AnacondaCON 2019 Day 2 Recap: AI in Medicine, Cataloging the Contents of Stars, and More!

What You Missed at AnacondaCON Day 2

We’re back with a recap of Day 2 of our annual AnacondaCON. (In case you missed it, you can read our Day 1 recap here). Things started off…

Anaconda 2019-04-04 17:38:11

AnacondaCON 2019 Day 1 Recap: Big-Time Learning

AnacondaCON 2019 is off to a great start. As in past years, we programmed Day 1 with product- and package-specific tutorials for those looking to get hands-on learning with Anaconda Enterprise tools. Spots in these…

Anaconda 2019-03-29 02:33:32

3 Ways to Upskill in Python with DataCamp and Anaconda

DataCamp is proud to partner with Anaconda to offer eight courses on Conda and Python—in addition to the more than 70 total Python courses in DataCamp’s ever-expanding data science and analytics curriculum. Not sure where…

While My MCMC Gently Samples 2019-03-15 14:00:00

Computational Psychiatry: Combining multiple levels of analysis to understand brain disorders - PhD thesis

I noticed that when my personal website at my former university went down, my PhD thesis could no longer be found anywhere, so I'm posting it here.

During my PhD I explored how machine learning and computational modeling of the brain can be used to improve our understanding, and diagnostics …

Python – Meta Rabbit 2019-03-12 12:00:51

NIXML: nix + YAML for easy reproducible environments

The rise and fall of bioconda

A year ago, I remember a conversation which went basically like this:

Them: So, to distribute my package, what do you think I should use?
Me: You should use bioconda.
Them: OK, that’s interesting, but what about …?
Me: No, you should use bioconda.
Them: I will definitely look …

Living in an Ivory Basement 2019-03-01 23:00:00

Sustaining open source: thinking about communities of effort

Thinking about how to sustain open source.

Living in an Ivory Basement 2019-02-28 23:00:00

My recent reading re sustaining open communities

What has Titus been reading lately?

Filipe Saraiva's blog 2019-02-24 22:26:11

Reducing the pile

I have been a comics fan since I was a child. The first comic books I got were in the first half of the 90s: some Mickey, Mônica, Trapalhões and X-Men issues. In '98 I started buying X-Men, Fabulosos X-Men and Wolverine, up to the first issues of the infamous X-Men Premium. Out of money, I turned to manga and self-contained stories. When Panini starts to...

Living in an Ivory Basement 2019-02-21 23:00:00

Threat models for open online scientific engagement?

What threats are there for scientists in engaging in open online discussions?

Martin Fitzpatrick - python 2019-02-20 15:00:00

Packaging PyQt5 apps with fbs — Distribute cross-platform GUI applications with the fman Build System

fbs is a cross-platform PyQt5 packaging system which supports building desktop applications for Windows, Mac and Linux (Ubuntu, Fedora and Arch). Built on top of PyInstaller it wraps some of the rough edges and defines a standard project structure which allows the build process to be entirely automated. The included …

Announcement: Audio TK 3.1.0

ATK is updated to 3.1.0 with heavy code refactoring. Old C++ standards are now dropped, and it now requires a fully C++17-compliant compiler. The main difference for filter support is that explicit SIMD filters using libsimdpp have been dropped, while tr2::simd becomes standard and is supported by gcc, clang and Visual Studio. Download link: ATK […]

Filipe Saraiva's blog 2019-01-29 01:48:22


The robotic female voice (we have reached the time when questions of gender and robots can get mixed up) sounded, strange and familiar as always, as soon as the car finished the right turn: “You have entered Avenida Universitária; the speed limit is 60 kilometers per hour”. My father smiled and began to speak: “Since...

While My MCMC Gently Samples 2019-01-21 15:00:00

My foreword to "Bayesian Analysis with Python, 2nd Edition" by Osvaldo Martin

When Osvaldo asked me to write the foreword to his new book I felt honored, excited, and a bit scared, so naturally I accepted. What follows is my best attempt to convey what makes probabilistic programming so exciting to me. Osvaldo did a great job with the book, it is …

Filipe Saraiva's blog 2019-01-21 14:39:48

Call for Answers: Survey About Task Assignment

Professor Igor Steinmacher, from Northern Arizona University, is a prominent researcher on several social dynamics in open source communities, like support of newcomers, gender bias, open sourcing proprietary software, and more. Some of his papers can be found on his website. Currently, Prof. Igor is inviting mentors from open source communities to answer a survey...

Living in an Ivory Basement 2019-01-15 23:00:00

Revisiting authorship, and JOSS software publications

The question du jour: how should authorship on software papers be decided?

While My MCMC Gently Samples 2019-01-14 15:00:00

Using Bayesian Decision Making to Optimize Supply Chains

(c) 2019 Thomas Wiecki & Ravin Kumar

As advocates of Bayesian statistics in data science we often have to convince business-minded colleagues or customers of the added value of such an approach. While there are many good reasons for applying Bayesian modeling to solve business problems (Sean J Taylor recently had …

Filipe Saraiva's blog 2019-01-08 02:21:02

Master's in Computer Science at UFPA 2019: Computational Intelligence for Smart Grids; Metaheuristics

The selection process for the master's program in computer science at PPGCC-UFPA is open. In this round, I am offering 2 positions for students who will carry out their work alongside the other researchers at LAAI. The positions are aimed at the topics of computational intelligence applied to Smart Grids and studies on metaheuristic optimization methods. I would like to... [Read More]

Filipe Saraiva's blog 2019-01-05 17:43:33

LaKademy 2018

In October 2018, Florianópolis hosted the sixth edition of LaKademy, the Latin American KDE sprint. The event is an opportunity to bring together in one place several KDE developers, both veterans and newcomers, from different projects, to improve the software they work on and also to plan the actions of... [Read More]

Filipe Saraiva's blog 2019-01-05 16:59:19

LaKademy 2018

In October 2018, Florianópolis hosted the 6th edition of LaKademy, the Latin-American KDE sprint. The event is an opportunity to bring together several KDE developers – both veterans and newcomers – from different projects to improve their respective software and plan the promotional actions of the community in the subcontinent. In... [Read More]

GPU Dask Arrays, first steps

The following code creates and manipulates 2 TB of randomly generated data.

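import dask.array as da

# Lazily define a 500,000 x 500,000 array of normal samples (mean 10, std 1),
# split into 10,000 x 10,000 chunks; nothing is computed at this point.
rs = da.random.RandomState()
x = rs.normal(10, 1, size=(500000, 500000), chunks=(10000, 10000))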
(x + 1)[::2, ::2].sum().compute(scheduler='threads')
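The 2 TB figure can be checked with some quick arithmetic. The sketch below assumes 8-byte float64 elements and decimal units (1 TB = 10^12 bytes); the variable names are illustrative only.

```python
# Back-of-the-envelope check of the array size, assuming float64 elements.
shape = (500_000, 500_000)
chunks = (10_000, 10_000)
itemsize = 8  # bytes per float64 element (assumption)

total_bytes = shape[0] * shape[1] * itemsize
chunk_bytes = chunks[0] * chunks[1] * itemsize
n_chunks = (shape[0] // chunks[0]) * (shape[1] // chunks[1])

print(f"total: {total_bytes / 1e12:.1f} TB")
print(f"per chunk: {chunk_bytes / 1e6:.0f} MB")
print(f"number of chunks: {n_chunks}")
```

Under these assumptions the array splits into 2,500 chunks of 800 MB each, so no single piece has to be anywhere near the size of the full array.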

On a single CPU, this computation takes two hours.

On an eight-GPU single-node system this computation takes nineteen seconds.

Combine Dask Array with CuPy

Actually this computation isn’t that impressive. It’s a simple workload, for which most of the time is spent creating and destroying random data. The computation and communication patterns are simple, reflecting the simplicity commonly found in data processing workloads.

What is impressive is that we were able to create a distributed parallel GPU array quickly by composing these three existing libraries:

  1. CuPy provides a partial implementation of Numpy on the GPU.

  2. Dask Array provides chunked algorithms on top of Numpy-like libraries like Numpy and CuPy.

    This enables us to operate on more data than we could fit in memory by operating on that data in

Leonardo Uieda 2018-12-26 12:00:00

Manage project dependencies with conda environments

TL;DR: Create a conda environment for each project, capture exact versions when possible, automate activation and updating with a bash function.

I often work on several different projects involving software: Python libraries, papers, presentations, posters, this website, etc. Each project has different dependencies and there is a non-zero chance that these dependencies might be in conflict with each other. For example, I need Python 2.7 to work on a tesseroid modeling paper with a student, while my current work on


First Impressions of GPUs and PyData

I recently moved from Anaconda to NVIDIA within the RAPIDS team, which is building a PyData-friendly GPU-enabled data science stack. For my first week I explored some of the current challenges of working with GPUs in the PyData ecosystem. This post shares my first impressions and also outlines plans for near-term work.

First, let's start with the value proposition of GPUs: significant speed increases over traditional CPUs.

GPU Performance

Like many PyData developers, I’m loosely aware that GPUs are sometimes fast, but don’t deal with them often enough to have strong feelings about them.

To get a more visceral feel for the performance differences, I logged into a GPU machine, opened up CuPy (a Numpy-like GPU library developed mostly by Chainer in Japan) and cuDF (a Pandas-like library in development at NVIDIA) and did a couple of small speed comparisons:

Compare Numpy and Cupy
>>> import numpy, cupy

>>> x = numpy.random.random((10000, 10000))
>>> y = cupy.random.random((10000, 10000))

>>> %timeit bool((numpy.sin(x) ** 2 + numpy.cos(x) ** 2 == 1).all())
446 ms ± 53.1 ms per
Living in an Ivory Basement 2018-12-07 23:00:00

A quick read of _The genomic and proteomic landscape of the rumen microbiome_

Using short and long reads to assemble genomes from metagenomes!

Support Python 2 with Cython


Many popular Python packages are dropping support for Python 2 next month. This will be painful for several large institutions. Cython can provide a temporary fix by letting us compile a Python 3 codebase into something usable by Python 2 in many cases.

It’s not clear if we should do this, but it’s an interesting and little-known feature of Cython.

Background: Dropping Python 2 Might be Harder than we Expect

Many major numeric Python packages are dropping support for Python 2 at the end of this year. This includes packages like Numpy, Pandas, and Scikit-Learn. Jupyter already dropped Python 2 earlier this year.

For most developers in the ecosystem this isn’t a problem. Most of our packages are Python-3 compatible and we’ve learned how to switch libraries. However, for larger companies or government organizations it’s often far harder to switch. The PyCon 2017 keynote by Lisa Guo and Hui Ding from Instagram gives a good look into why this can be challenging for large production codebases and also gives a good


Anatomy of an OSS Institutional Visit

I recently visited the UK Meteorology Office, a moderately large organization that serves the weather and climate forecasting needs of the UK (and several other nations). I was there with other open source colleagues including Joe Hamman and Ryan May from open source projects like Dask, Xarray, JupyterHub, MetPy, Cartopy, and the broader Pangeo community.

This visit was like many other visits I’ve had over the years that are centered around showing open source tooling to large institutions, so I thought I’d write about it in hopes that it helps other people in this situation in the future.

My goals for these visits are the following:

  1. Teach the institution about software projects and approaches that may help them to have a more positive impact on the world
  2. Engage them in those software projects and hopefully spread around the maintenance and feature development burden a bit

Step 1: Meet allies on the ground

We were invited by early adopters within the institution, both within the UK Met Office’s Informatics Lab

(continued...) 2018-11-16 23:00:00

Notes on the Frank-Wolfe Algorithm, Part II: A Primal-dual Analysis

This blog post extends the convergence theory from the first part of my notes on the Frank-Wolfe (FW) algorithm with convergence guarantees on the primal-dual gap which generalize and strengthen the convergence guarantees obtained in the first part.

Living in an Ivory Basement 2018-11-11 23:00:00

Creating a welcoming teaching/learning environment in workshops

It takes constant work to make a welcoming teaching/learning environment!

Living in an Ivory Basement 2018-11-08 23:00:00

Repeatability in Practice (2018 version)

How we do repeatability in the DIB Lab

Stéfan van der Walt - python 2018-10-31 07:00:00

Linking to emails in org-mode (using neomutt)

Where we store links to emails in org-mode, and open them using neomutt.

Filipe Saraiva's blog 2018-10-29 15:40:06

Ode to Hatred

Yesterday, following the vote count for president in the runoff, I cried. I cried out of anger. I cried out of hatred. Hatred because the one who won the election represents a total affront to the bare minimum of what we call civility. He defended the dictatorship and torture, repeatedly. He promised to imprison or exile opponents. He promised to persecute teachers, artists, the intelligentsia. He said he will... [Read More]

Filipe Saraiva's blog 2018-10-28 14:08:15

2018 Elections: My Letter to the Family

Family, this is my last political statement here in the group before the result. You know me: I am a computer science professor at UFPA, and I am one of those responsible for training the next software engineers and computational mathematicians of our region. I advise undergraduate, master's and also doctoral students, even with all the... [Read More]

Filipe Saraiva's blog 2018-10-17 16:19:00

Telegram's sharing architecture as a way to mitigate fake news on WhatsApp

Fake news has already become the kind of problem we will have to face somehow as soon as possible, or we will watch democracies being destroyed one by one. If the Trump case caught our attention but still seemed distant, the 2018 Brazilian elections came to show that the friendly, easygoing uncle can turn into the... [Read More]

So you want to contribute to open source

Welcome new open source contributor!

I appreciated receiving the e-mail where you said you were excited about getting into open source and were particularly interested in working on a project that I maintain. This post has a few thoughts on the topic.

First, please forgive me for sending you to this post rather than responding with a personal e-mail. Your situation is common today, so I thought I’d write up thoughts in a public place, rather than respond personally.

This post has two parts:

  1. Some pragmatic steps on how to get started
  2. A personal recommendation to think twice about where you focus your time

Look for good first issues on Github

Most open source software (OSS) projects have a “Good first issue” label on their Github issue tracker. Here is a screenshot of how to find the “good first issue” label on the Pandas project:

(note that this may be named something else like “Easy to fix”)

This contains a list of issues that are important, but also

Martin Fitzpatrick - python 2018-09-30 06:00:00

Dictionary Views & Set Operations — Working with dictionary view objects

The keys, values and items from a dictionary can be accessed using the .keys(), .values() and .items() methods. These methods return view objects which provide a view on the source dictionary.

The view objects dict_keys and dict_items support set-like operations (the latter only when all values are hashable) which …
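A short illustration of both points, using made-up dictionaries (the names and values are purely illustrative): views track later changes to the dictionary, and key and item views accept set operators.

```python
inventory = {"apples": 3, "pears": 5}
wishlist = {"pears": 1, "plums": 2}

keys = inventory.keys()      # a live view on the dictionary, not a copy
inventory["figs"] = 7        # added after the view was created
has_figs = "figs" in keys    # the view reflects the change

# dict_keys supports set operators such as & (intersection) and - (difference).
in_both = inventory.keys() & wishlist.keys()
only_inventory = inventory.keys() - wishlist.keys()

# dict_items is set-like too, as long as all values are hashable.
same_pairs = inventory.items() & {("apples", 3), ("pears", 99)}

print(has_figs, in_both, only_inventory, same_pairs)
```

The results of the set operations are ordinary set objects, so they no longer track the dictionaries they came from.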

Filipe Saraiva's blog 2018-09-28 04:53:50

Ciro, onward!

With only a few days left before the first round of the elections, I am taking the opportunity to declare my vote for Ciro Gomes, and I invite friends to reflect and also vote for the candidate. Under a rather generous conception of political parties, they are organizations structured around an idea of social order that try, through... [Read More]

Announcement: Audio TK 3.0.0

ATK is updated to 3.0.0 with a major ABI break and code quality improvement (see here). Bugs in different areas were fixed. Development for additional modules was also simplified (the modelling lite is such a project based on Audio Toolkit). Download link: ATK 3.0.0 Changelog: 3.0.0 * Change size for gsl::index everywhere (change of ABI) […]
Spyder Blog 2018-09-21 00:00:00

QtConsole 4.4 Released!

We're excited to announce a significant update to QtConsole—the package that powers Spyder's IPython Console interface—which the Spyder team maintains in collaboration with Project Jupyter. Two of the biggest changes—user-selectable syntax highlighting themes, and enhanced external editor/IDE integration—are already built right into Spyder, so they'll likely be of more interest if you use QtConsole standalone or with another editor/IDE. However, most of the other changes should prove quite useful within Spyder as well, and many were in fact suggested and even implemented by users of our IDE. Particular highlights include a block indent/unindent feature, Select-All (Ctrl-Shift-A) being made cell-specific, Ctrl-Backspace and Ctrl-Delete behaving more intelligently across whitespace and line boundaries, Ctrl-D allowing you to easily exit ipdb, input() and the like, and numerous smaller enhancements and bug fixes. If you'd like to learn more about what's new, please check out our article over on the Jupyter blog, where we go over the major changes in more detail, with plenty


Dask Development Log

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Since the last update in the 0.19.0 release blogpost two weeks ago we’ve seen activity in the following areas:

  1. Update Dask examples to use JupyterLab on Binder
  2. Render Dask examples into static HTML pages for easier viewing
  3. Consolidate and unify disparate documentation
  4. Retire the hdfs3 library in favor of the solution in Apache Arrow.
  5. Continue work on hyper-parameter selection for incrementally trained models
  6. Publish two small bugfix releases
  7. Blogpost from the Pangeo community about combining Binder with Dask
  8. Skein/Yarn Update

1: Update Dask Examples to use JupyterLab extension

The new dask-labextension embeds Dask’s dashboard plots into a JupyterLab session so that you can get easy access to information

Gaël Varoquaux - programming 2018-09-16 22:00:00

A foundation for scikit-learn at Inria

We have just announced that a foundation will be supporting scikit-learn at Inria [1]:

Growth and sustainability

This is an exciting turn for us, because it enables us to receive private funding. As a result, we will be able to have secure employment for some existing core …

Leonardo Uieda 2018-09-14 12:00:00

Introducing Verde

Verde is a Python library for processing spatial data (bathymetry, geophysics surveys, etc) and interpolating it on regular grids (i.e., gridding).

It implements Green's functions based interpolation methods and other data processing routines. The type of gridding implemented in Verde is essentially fitting various linear models to spatial data and using them to predict new data on regular grids, which is what a lot of machine learning is all about. So Verde's gridder API is inspired by scikit-learn, the state-of-the-art for machine learning in Python. The Green's functions that make up the Jacobian matrix (aka sensitivity or feature matrix) of the linear models generally come from elastic deformation theory. For example, the bi-harmonic spline (Sandwell, 1987) implemented in verde.Spline comes from the deformation of a thin elastic plate.
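The idea of gridding as linear-model fitting can be sketched with plain NumPy. This is not Verde's implementation or API; it is a minimal illustration that assumes the 2D bi-harmonic Green's function g(r) = r^2 (log r - 1), places one point force at each data location, and uses synthetic data. All function and variable names are hypothetical.

```python
import numpy as np

def greens_func(r):
    """2D bi-harmonic (thin-plate) Green's function: r**2 * (log(r) - 1)."""
    r = np.maximum(r, 1e-10)  # avoid log(0) where a point sits on its own force
    return r**2 * (np.log(r) - 1)

def jacobian(points, forces):
    """Matrix of Green's functions between every point and every force."""
    diff = points[:, np.newaxis, :] - forces[np.newaxis, :, :]
    return greens_func(np.sqrt((diff**2).sum(axis=-1)))

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(30, 2))          # synthetic (x, y) positions
data = np.sin(coords[:, 0]) + 0.1 * coords[:, 1]   # synthetic observations

# Fit: solve the linear model  data = A @ weights  by least squares.
A = jacobian(coords, coords)
weights, *_ = np.linalg.lstsq(A, data, rcond=None)

# Predict: evaluate the same linear model on a regular grid.
gx, gy = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 21))
grid_points = np.column_stack([gx.ravel(), gy.ravel()])
grid = (jacobian(grid_points, coords) @ weights).reshape(gx.shape)
```

The fit/predict split mirrors the scikit-learn pattern mentioned above: the Jacobian plays the role of the feature matrix, and the force weights are the model coefficients.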

I submitted a

Pythonic Perambulations 2018-09-13 17:00:00

The Waiting Time Paradox, or, Why Is My Bus Always Late?

Image Source: Wikipedia License CC-BY-SA 3.0

If you, like me, frequently commute via public transit, you may be familiar with the following situation:

You arrive at the bus stop, ready to catch your bus: a line that advertises arrivals every 10 minutes. You glance at your watch and note the time... and when the bus finally comes 11 minutes later, you wonder why you always seem to be so unlucky.

Naïvely, you might expect that if buses are coming every 10 minutes and you arrive at a random time, your average wait would be something like 5 minutes. In reality, though, buses do not arrive exactly on schedule, and so you might wait longer. It turns out that under some reasonable assumptions, you can reach a startling conclusion:

When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes.

This is what is sometimes known as the waiting time paradox.
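One way to build intuition for this claim is a quick simulation. The sketch below assumes, for illustration, that buses arrive as a Poisson process (independent, exponentially distributed gaps averaging 10 minutes) and that passengers show up at uniformly random times; the function name and its parameters are hypothetical.

```python
import bisect
import random
from itertools import accumulate

def simulate_mean_wait(n_buses=200_000, mean_gap=10.0,
                       n_passengers=100_000, seed=0):
    """Average wait for the next bus under exponentially distributed gaps."""
    rng = random.Random(seed)
    gaps = [rng.expovariate(1.0 / mean_gap) for _ in range(n_buses)]
    arrivals = list(accumulate(gaps))  # sorted bus arrival times
    horizon = arrivals[-1]

    total_wait = 0.0
    for _ in range(n_passengers):
        t = rng.uniform(0.0, horizon)                        # passenger shows up
        next_bus = arrivals[bisect.bisect_left(arrivals, t)]  # first bus at or after t
        total_wait += next_bus - t
    return total_wait / n_passengers

mean_wait = simulate_mean_wait()
print(f"average wait: {mean_wait:.2f} minutes")
```

Under these assumptions the simulated average comes out close to 10 minutes rather than 5, because a randomly arriving passenger is more likely to land in one of the long gaps between buses than in one of the short ones.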

I've encountered this idea before, and always wondered

(continued...) 2018-09-05 22:00:00

Three Operator Splitting

I discuss a recently proposed optimization algorithm: the Davis-Yin three operator splitting.

Dask Release 0.19.0

This work is supported by Anaconda Inc.

I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd. This blogpost outlines notable changes since the last release blogpost for 0.18.0 on June 14th.

You can conda install Dask:

conda install dask

or pip install from PyPI:

pip install dask[complete] --upgrade

Full changelogs are available here:

Notable Changes

A ton of work has happened over the past two months, but most of the changes are small and diffuse. Stability, feature parity with upstream libraries (like Numpy and Pandas), and performance have all significantly improved, but in ways that are difficult to condense into blogpost form.

That being said, here are a few of the more exciting changes in the new release.

Python Versions

We’ve dropped official support for Python 3.4 and added official support for Python 3.7.

Deploy on Hadoop Clusters

Over the past few months Jim Crist has built a suite of


Book: Building Machine Learning Systems with Python – third edition

A few years ago, Packt Publishing contacted me to be a technical reviewer for the first edition of Building Machine Learning Systems with Python, and I was impressed by the writing of Luis Pedro Coelho and Willi Richert. For the second edition, I was again a technical reviewer. Writing is not easy, especially when it’s not […]

Living in an Ivory Basement 2018-08-28 22:00:00

Abstract for SIAM: Supporting and Sustaining Open Source Software Development: the Commons Perspective

How do we support and sustain open source software development?

Analog modelling: The Moog ladder filter emulation in Python

After my previous post on SPICE modelling in Python, I need a good supporting example to move on to on-the-fly compilation in C++. This schema will also require some changes to support more than simple nodal analysis, so this now becomes Modified Nodal Analysis with state equations. The simple model I […]

Martin Fitzpatrick - python 2018-08-26 12:00:00

Dictionaries — A rather long guide to Python's key:value store

Dictionaries are key-value stores, meaning they store, and allow retrieval of, data (or values) through a unique key. This is analogous to a real dictionary where you look up definitions (data) using a given key — the word. Unlike a language dictionary however, keys in Python dictionaries are not alphabetically sorted …
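A small sketch of the lookup and ordering behaviour, using made-up entries. It relies on the guarantee, in place since Python 3.7, that dictionaries iterate in insertion order; alphabetical order has to be requested explicitly.

```python
definitions = {}
definitions["zebra"] = "a striped African equine"
definitions["apple"] = "a round fruit"
definitions["mango"] = "a tropical stone fruit"

meaning = definitions["apple"]               # look up a value by its key
fallback = definitions.get("kiwi", "n/a")    # missing key, with a default

insertion_order = list(definitions)          # keys in the order they were added
alphabetical = sorted(definitions)           # keys sorted on request

print(insertion_order)
print(alphabetical)
```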

Living in an Ivory Basement 2018-08-17 22:00:00

Can bits be the basis for a digital commons? (No.)

Bits cannot be the basis for a digital commons, because they are not rivalrous.

Spyder Blog 2018-08-14 00:00:00

Spyder 3.3.0 and 3.3.1 released!

We're pleased to release the next significant update in the stable Spyder 3 line, 3.3.0, along with its follow-on bugfix point release, 3.3.1, which is now live on PyPI and conda. As always, you can update with conda update spyder in the Anaconda Prompt/Terminal/command line (on Windows/macOS/Linux, respectively) if on Anaconda (recommended), or pip install --upgrade spyder otherwise. If you run into any trouble, please carefully read our new installation documentation and consult our Troubleshooting Guide, which contains straightforward solutions to the vast majority of install-related issues users have reported.

As a new minor version (3.3), it makes several substantial changes to Spyder's underpinnings that deserve some explanation, particularly the newly modular and portable console system that's now separated into its own spyder-kernels package, opening up several new options for users running Spyder in different environments. There's also a brand-new error reporting process, new options in the IPython console, usability and performance improvements for the Variable Explorer, multiple new and changed dependency requirements

While My MCMC Gently Samples 2018-08-13 14:00:00

Hierarchical Bayesian Neural Networks with Informative Priors

(c) 2018 by Thomas Wiecki

Imagine you have a machine learning (ML) problem but only small data (gasp, yes, this does exist). This often happens when your data set is nested -- you might have many data points, but only a few per category. For example, in ad-tech you may want to predict …

Spyder Blog 2018-08-13 00:00:00

Spyder featured on Episode 1 of Open Source Directions web show

Quansight, the company recently founded by NumPy, SciPy and Anaconda creator Travis Oliphant to help connect companies with open source communities built around data science and machine learning, just released Episode 1 of its live webcast series, and it was all about Spyder! Spyder maintainer Carlos Córdoba, recently hired by Quansight and funded part-time to work on Spyder development as we announced a few weeks ago, was the featured guest on the show.

Carlos first shared his perspective on some of the key moments in Spyder's nearly 10-year development history, from its original creation by Pierre Raybaut and Carlos' initial involvement in the project to its more recent challenges and successes. He also demonstrated basic usage of Spyder, as well as some of its standout features, in a live on-screen demo. Carlos then went on to outline the current roadmap for Spyder 4 in the near future, and explained some of the key new features planned for it. Finally, he took