Planet SciPy

Quansight Labs 2019-11-14 20:00:00

A new grant for NumPy and OpenBLAS!

I'm very pleased to announce that NumPy and OpenBLAS just received a $195,000 grant from the Chan Zuckerberg Initiative, through its Essential Open Source Software for Science (EOSS) program! This is good news for both projects, and I'm particularly excited about the types of activities we'll be undertaking, what this will mean in terms of growing the community, and about being part of the first round of funded projects of this visionary program.

The program

The press release gives a high level overview of the program, and the grantee website lists the 32 successful applications. Other projects that got funded include SciPy and Matplotlib (it's the very first significant funding for both projects!), Pandas, Zarr, scikit-image, JupyterHub, and Bioconda - we're in good company!

Nicholas Sofroniew and Dario Taraborelli, two of the people driving the EOSS program, wrote a blog post, The Invisible Foundations of Biomedicine, that is well worth reading. It covers their motivations for starting this program and the 42 projects that applied and got funded.

Anaconda 2019-11-14 01:46:02

Essential Open-Source Library pandas Awarded CZI Grant to Further Development

We’re pleased to announce that pandas, the open-source library providing high-performance data structures for tabular data analysis, has received grant funding from the Chan Zuckerberg Initiative (CZI) as part of their Essential Open Source Software…

The post Essential Open-Source Library pandas Awarded CZI Grant to Further Development appeared first on Anaconda.

Quansight Labs 2019-11-12 17:00:00

File management improvements in Spyder 4

Version 4.0 of Spyder—a powerful Python IDE designed for scientists, engineers and data analysts—is almost ready! It has been in the making for well over two years, and it contains lots of interesting new features. We will focus on the Files pane in this post, where we've made several improvements to the interface and file management tools.

Simplified interface

In order to simplify the Files pane's interface, the columns corresponding to size and kind are hidden by default. To change which columns are shown, use the top-right pane menu or right-click the header directly.

Spyder Blog 2019-11-12 00:00:00

File management improvements in Spyder 4

This blogpost was originally published on the Quansight Labs website.

Version 4.0 of Spyder is almost ready! It has been in the making for well over two years, and it contains lots of interesting new features. We will focus on the Files pane in this post, where we've made several improvements to the interface and file management tools.

Simplified interface

In order to simplify the Files pane's interface, the columns corresponding to size and kind are hidden by default. To change which columns are shown, use the top-right pane menu or right-click the header directly.

Custom file associations

First, we added the ability to associate different external applications with specific file extensions they can open. Under the File associations tab of the Files preferences pane, you can add file types and set the external program used to open each of them by default.

Once you've set this up, files will automatically launch in the associated application when opened from the Files pane in Spyder.

Quansight Labs 2019-11-10 05:28:00

uarray: Attempting to move the ecosystem forward

There comes a time in every project when most technological hurdles have been overcome, and its adoption becomes a social problem. I believe uarray and unumpy reached such a state a month ago.

Along with Ralf Gommers and Peter Bell, I then wrote NumPy Enhancement Proposal 31 (NEP-31). It generated a lot of excellent feedback on the structure and nuances of the proposal, which you can read both on the pull request and in the mailing list discussion. That feedback led to a lot of restructuring of the NEP's content and structure, but very little change to the actual proposal. I take full responsibility for this: I have a bad tendency to assume everyone knows what I'm thinking. Thankfully, I'm not alone in this: it's a known psychological phenomenon.

Anaconda 2019-11-04 15:16:11

The Austin American-Statesman Names Anaconda a Winner of the Austin Top Workplaces 2019 Award

Anaconda is thrilled to have been awarded a Top Workplaces 2019 honor by The Austin American-Statesman. The list of winners is based solely on employee feedback gathered through a third-party survey administered by research partner Energage,…

The post The Austin American-Statesman Names Anaconda a Winner of the Austin Top Workplaces 2019 Award appeared first on Anaconda.

ListenData 2019-10-28 15:48:00

Loan Amortisation Schedule using R and Python

In this post, we will explain how to calculate your monthly loan instalments the way banks do, using R and Python. In the financial world, analysts generally use MS Excel to calculate the principal and interest portions of an instalment with the PPMT and IPMT functions. As data science grows in popularity, it is useful to know how to do the same in popular data science languages such as R and Python.

When you take a loan from a bank at an annual interest rate of x% for N years, the bank calculates monthly (or quarterly) instalments based on the following factors:

  • Loan Amount
  • Annual Interest Rate
  • Number of payments per year
  • Number of years for loan to be repaid in instalments
Loan Amortisation Schedule

It refers to a table of periodic loan payments showing the breakup of principal and interest in each instalment/EMI until the loan is repaid at the end of its stipulated term. Monthly instalments are generally the same every month.

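
The schedule described above can be sketched in Python with the standard annuity formula, EMI = P·r·(1+r)^n / ((1+r)^n - 1), where r is the periodic interest rate and n the total number of payments. This is a minimal sketch, not the post's own code: the function name `amortisation_schedule` and its parameters are illustrative, and it returns positive amounts, whereas Excel's PMT, PPMT and IPMT report payments as negatives.

```python
def amortisation_schedule(loan_amount, annual_rate, payments_per_year, years):
    """Return (emi, rows); each row is (period, interest, principal, balance)."""
    r = annual_rate / payments_per_year               # periodic interest rate
    n = int(round(payments_per_year * years))         # total number of instalments
    if r == 0:
        emi = loan_amount / n                         # zero-interest loan: equal slices
    else:
        growth = (1 + r) ** n
        emi = loan_amount * r * growth / (growth - 1)  # annuity (PMT-style) formula

    rows = []
    balance = loan_amount
    for period in range(1, n + 1):
        interest = balance * r                        # IPMT-style interest portion
        principal = emi - interest                    # PPMT-style principal portion
        balance -= principal                          # outstanding amount after payment
        rows.append((period, interest, principal, balance))
    return emi, rows


emi, rows = amortisation_schedule(loan_amount=100_000, annual_rate=0.10,
                                  payments_per_year=12, years=5)
print(f"EMI: {emi:.2f}")
for period, interest, principal, balance in rows[:3]:
    print(f"{period:>3} interest={interest:9.2f} principal={principal:9.2f} "
          f"balance={balance:11.2f}")
```

Because every instalment is the same size, the interest portion shrinks and the principal portion grows as the balance falls.
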
I Love Symposia! 2019-10-24 13:59:54

Introducing napari: a fast n-dimensional image viewer in Python

I'm really excited to finally, officially, share a new(ish) project called napari with the world. We have been developing napari in the open from the very first commit, but we didn't want to make any premature fanfare about it… Until now. It's still alpha software, but for months now, both the core napari team and a few collaborators/early adopters have been using napari in our daily work. I've found it life-changing.

The background

I've been looking for a great nD volume viewer in Python for the better part of a decade. In 2009, I joined Mitya Chklovskii's lab and the FlyEM team at the Janelia [Farm] Research Campus to work on the segmentation of 3D electron microscopy (EM) volumes. I started out in Matlab, but moved to Python pretty quickly and it was a very smooth transition (highly recommended! ;). Looking at my data was always annoying though. I was either looking at single 2D slices using matplotlib.pyplot.imshow, or saving the volumes in VTK format and loading them into ITK-SNAP — which worked ok

Anaconda 2019-10-23 18:50:22

Introducing Remote Content Caching with FSSpec

Fsspec is a library which acts as a common pythonic interface to many file system-like storage backends, such as remote (e.g., SSH, HDFS) and cloud (e.g., GCS, S3) services. In this article, we will present…
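
The "common pythonic interface" idea can be illustrated with fsspec's in-memory and local backends, so the snippet runs without cloud credentials. This is a minimal sketch assuming fsspec is installed. It does not show the caching layer the article introduces, and the helper `roundtrip` and the paths are hypothetical. The point is that the same calls (`open`, `cat`, `exists`) work unchanged whichever backend `fsspec.filesystem` returns.

```python
import tempfile

import fsspec


def roundtrip(protocol, root):
    """Write, then read back, a small file using the backend named by `protocol`."""
    fs = fsspec.filesystem(protocol)          # e.g. "memory" or "file"; cloud ones need extras
    fs.makedirs(root, exist_ok=True)
    path = f"{root}/hello.txt"
    with fs.open(path, "wb") as f:            # identical call for every backend
        f.write(b"hello, fsspec")
    return fs.cat(path), fs.exists(path)


print(roundtrip("memory", "/demo"))
with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip("file", tmp))
```
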

The post Introducing Remote Content Caching with FSSpec appeared first on Anaconda.

Anaconda 2019-10-16 13:30:16

What Can AI Teach Us about Bias and Fairness?

By: Peter Wang & Natalie Parra-Novosad

As researchers, journalists, and many others have discovered, machine learning algorithms can deliver biased results. One notorious example is ProPublica’s discovery of bias in software called COMPAS used…

The post What Can AI Teach Us about Bias and Fairness? appeared first on Anaconda.

Anaconda 2019-10-14 20:56:20

Announcing Anaconda Distribution 2019.10

We are pleased to announce the release of Anaconda Distribution 2019.10! As there were some significant changes in the previous Anaconda Distribution 2019.07 installers, this release focuses on polishing up rough edges in that release…

The post Announcing Anaconda Distribution 2019.10 appeared first on Anaconda.

Filipe Saraiva's blog 2019-10-11 16:29:56

SERPRO and the validation of digital documents

In the draft of my previous post about digital documents in Brazil, I ended up writing a lot about SERPRO's role in this process, so much so that I decided to split it into a post of its own. With the launch of the e-Título, the TSE needed to create a way to validate the digital document to prevent fraud. The technology adopted was…

Anaconda 2019-10-10 20:30:15

How to Restore Anaconda after Update to MacOS Catalina

MacOS Catalina was released on October 7, 2019, and has been causing quite a stir for Anaconda users.  Apple has decided that Anaconda’s default install location in the root folder is not allowed. It moves…

The post How to Restore Anaconda after Update to MacOS Catalina appeared first on Anaconda.

Anaconda 2019-10-09 15:54:53

Anaconda Enters a New Chapter

Today I am excited to announce that I am stepping into the role of CEO at Anaconda. Although I am a founder of the company and have previously served as president, this marks the first…

The post Anaconda Enters a New Chapter appeared first on Anaconda.

Quansight Labs 2019-10-07 05:00:00

Quansight Labs Work Update for September, 2019

Since November 2018, I have been working at Quansight. Quansight, a new startup founded by the same people who started Anaconda, aims to connect companies and open source communities, and offers consulting, training, support and mentoring services. I work under the heading of Quansight Labs, a public-benefit division of Quansight. It provides a home for a "PyData Core Team" of developers, community managers, designers and documentation writers who build open-source technology and grow open-source communities around all aspects of the AI and Data Science workflow.

My work at Quansight is split between doing open source consulting for various companies, and working on SymPy. SymPy, for those who do not know, is a symbolic mathematics library written in pure Python. I am the lead maintainer of SymPy.

In this post, I will detail some of the open source work that I have done recently, both as part of my open source consulting, and as part of my work on SymPy for Quansight Labs.

Bounds Checking in Numba

As part

Filipe Saraiva's blog 2019-10-07 00:29:16

Brazil's digital documents (plural)

For some time now, Brazil has been going through a process of digitizing the official documents used by individuals. However, what was once announced as a possible convergence of all the different documents into a single document that would serve for everything turned instead into the digitization of each specific document through the development of…

2019-09-26 22:00:00

How to Evaluate the Logistic Loss and not NaN trying

A naive implementation of the logistic regression loss can result in numerical indeterminacy even for moderate values. This post takes a closer look at the source of these instabilities and discusses more robust Python implementations.
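
A minimal sketch of the problem and one common remedy, assuming NumPy and labels y in {0, 1}. Writing the loss through the sigmoid probability hits `log(0)` once the sigmoid rounds to exactly 1.0 in float64 (roughly z above 37), producing `inf` or `0 * -inf = nan`. Algebraically the same loss equals log(1 + exp(z)) - y·z, which `np.logaddexp(0, z)` evaluates without overflow. This is not necessarily the implementation the post arrives at, and the function names are illustrative.

```python
import numpy as np


def logistic_loss_naive(z, y):
    """Mean log-loss via the sigmoid probability; breaks down for large |z|."""
    p = 1.0 / (1.0 + np.exp(-z))
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))


def logistic_loss_stable(z, y):
    """Same loss rewritten as log(1 + exp(z)) - y * z.

    np.logaddexp(0, z) computes log(exp(0) + exp(z)) without overflowing.
    """
    return np.mean(np.logaddexp(0.0, z) - y * z)


z = np.array([2.0, 1000.0])    # decision-function values
y = np.array([1.0, 1.0])       # true labels
with np.errstate(all="ignore"):            # silence the log(0) and 0 * inf warnings
    print(logistic_loss_naive(z, y))       # nan
print(logistic_loss_stable(z, y))
```

For moderate inputs both functions agree; only the rewritten form stays finite at the extremes.
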

Anaconda 2019-09-23 13:30:28

Anaconda Enterprise Receives Honors in Fourth Annual Datanami Readers’ and Editors’ Choice Awards

SAN DIEGO, Sept. 23, 2019 — Anaconda’s enterprise data science platform has been recognized in the fourth annual Datanami Readers’ and Editors’ Choice Awards, presented during the Strata Data Conference.  The list of winners was…

The post Anaconda Enterprise Receives Honors in Fourth Annual Datanami Readers’ and Editors’ Choice Awards appeared first on Anaconda.

Paul Ivanov’s Journal 2019-09-17 07:00:00

Uvas Gold 200

My poem about a rainy 200k was published in the Fall 2019 issue of American Randonneur (a quarterly magazine published by Randonneurs USA).

I've been doing samizdat poetry for as long as I've had a web presence (since 1999), but I am now officially a published poet! (I am deliberately not counting the embarrassing hackjob that was published in a youth anthology when I was in 8th grade.)

You can find "Uvas Gold 200" on page 26, either directly in this skeuomorphic leafing viewer or in the PDF, but I'm also republishing both the exposition blurb and the poem below. If you prefer to listen, I recorded a reading of it that you can download in different flavors: local audio only, a local video, or the embedded video version below.

Uvas Gold 200k starts and ends in Fremont, CA and was held on Saturday, December 1st, 2018. The ride frontloads the climbing by going nearly half-way up Mount

Filipe Saraiva's blog 2019-09-16 13:24:54

Study Group of the Laboratório Amazônico de Estudos Sociotécnicos – UFPA

Professor Leonardo Cruz of the Faculty of Social Sciences and I are working together on developing the Laboratório Amazônico de Estudos Sociotécnicos (Amazonian Laboratory of Sociotechnical Studies) at UFPA. Our proposal is to hold critical readings and debates on the sociology of technology, to produce theoretical and empirical research in the Amazon region on the relationship between technology and society, and…

Quansight Labs 2019-09-15 05:32:00

Ruby wrappers for the XND project


Lack of stable and reliable scientific computing software has been a persistent problem for the Ruby community, making it hard for enthusiastic Ruby developers to use Ruby in everything from their web applications to their data analysis projects. One of the most important components of any successful scientific software stack is a well maintained and flexible array computation library that can act as a fast and simple way of storing in-memory data and interfacing it with various fast and battle-tested libraries like LAPACK and BLAS.

Various projects have attempted to make such libraries in the past (and some are still thriving and maintained). Some of the notable ones are numo, nmatrix, and more recently, numruby. These projects attempt

Anaconda 2019-09-05 16:30:56

Machine Learning in Healthcare: 5 Use Cases that Improve Patient Outcomes

Machine learning is accelerating the pace of scientific discovery across fields, and medicine is no exception. From language processing tools that accelerate research to predictive algorithms that alert medical staff of an impending heart attack,…

The post Machine Learning in Healthcare: 5 Use Cases that Improve Patient Outcomes appeared first on Anaconda.

Anaconda 2019-08-29 20:33:56

Canaries Can Tweet: Preview New Features with Conda Canary

Conda-canary is the pre-defaults-release channel for conda — it has the most recent version of conda. On occasion it will also have the latest pre-defaults-release of conda-build and other conda dependencies such as ruamel.yaml. Normally,…

The post Canaries Can Tweet: Preview New Features with Conda Canary appeared first on Anaconda.

Quansight Labs 2019-08-27 05:00:00

Quansight Labs Dask Update

This post provides an update on some recent Dask-related activities the Quansight Labs team has been working on.

Dask community work order

Through a community work order (CWO) with the D. E. Shaw group, the Quansight Labs team has been able to dedicate developer time towards bug fixes and feature requests for Dask. This work has touched on several portions of the Dask codebase, but has generally centered on using Dask Arrays with the distributed scheduler.

Anaconda 2019-08-23 14:00:19

How to Build a Custom Anaconda Installer for R

A frequent question on the Anaconda Community mailing list is how to package R with conda for distribution. Depending on the use case, one option may be to use conda to move environments. This requires…

The post How to Build a Custom Anaconda Installer for R appeared first on Anaconda.

Filipe Saraiva's blog 2019-08-22 14:58:40

through the streets of Belém

sometimes through the streets of Belém I don't know if it's me or my mother who is there

Anaconda 2019-08-20 11:00:37

Enterprises Need to Think Differently about Data Science. Here’s How.

Companies that are data science literate make and communicate decisions on the basis of real data models, and not merely instinct or tradition. They welcome new data science technologies as opportunities for potential innovation, rather…

The post Enterprises Need to Think Differently about Data Science. Here’s How. appeared first on Anaconda.

Quansight Labs 2019-08-16 19:19:13

Spyder 4.0 beta4: Kite integration is here

Kite is sponsoring the work discussed in this blog post, and in addition supports Spyder 4.0 development through a Quansight Labs Community Work Order.

As part of our next release, we are proud to announce an additional completion client for Spyder, Kite. Kite is a novel completion client that uses Machine Learning techniques to find and predict the best autocompletion for a given text. Additionally, it collects improved documentation for compiled packages, e.g. Matplotlib, NumPy and SciPy, that cannot be obtained easily by using traditional code analysis packages such as Jedi.

Spyder Blog 2019-08-16 00:00:00

Spyder 4.0: Kite integration is here

This blogpost was originally published on the Quansight Labs website.

Note: Kite is sponsoring the work discussed in this blog post, and in addition supports Spyder 4.0 development through a Quansight Labs Community Work Order.

As part of our next release, we are proud to announce an additional completion client for Spyder, Kite. Kite is a novel completion client that uses Machine Learning techniques to find and predict the best autocompletion for a given text. Additionally, it collects improved documentation for compiled packages, e.g. Matplotlib, NumPy and SciPy, that cannot be obtained easily by using traditional code analysis packages such as Jedi. Although Kite is not open source like Spyder, you can download it without charge at the Kite website.

By incorporating Kite into Spyder, we aim to provide a much better autocompletion and signature retrieval experience for most of the scientific Python stack and beyond. For instance, let’s take a look at the following PyTorch completion. While

ListenData 2019-08-10 21:54:00

Object Oriented Programming in Python : Learn by Examples

This tutorial outlines object-oriented programming (OOP) in Python with examples. It is a step-by-step guide designed for people who have no programming experience. Object-oriented programming is popular and available in many programming languages besides Python, such as Java, C++ and PHP.

What is Object Oriented Programming?

In object-oriented programming (OOP), you have the flexibility to represent real-world objects like a car, animal, person or ATM in your code. In simple words, an object is something that possesses some characteristics and can perform certain functions. For example, a car is an object and can perform functions like start, stop, drive and brake. These are the functions of a car. Its characteristics are the car's color, mileage, maximum speed, model year, etc.
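
The car example can be written as a small Python class. This is an illustrative sketch rather than the tutorial's own code: the attribute names and the rule that `drive` caps speed at `max_speed` are assumptions. Attributes (`color`, `mileage`, `max_speed`, `model_year`) hold the object's state, while methods (`start`, `drive`, `brake`, `stop`) are the functions it can perform.

```python
class Car:
    """A car with attributes (state) and methods (behaviour)."""

    def __init__(self, color, mileage, max_speed, model_year):
        # Attributes: characteristics describing this particular car.
        self.color = color
        self.mileage = mileage
        self.max_speed = max_speed
        self.model_year = model_year
        self.running = False
        self.speed = 0

    def start(self):
        self.running = True

    def drive(self, speed):
        if not self.running:
            raise RuntimeError("start the car before driving")
        self.speed = min(speed, self.max_speed)   # never exceed the car's top speed

    def brake(self):
        self.speed = 0

    def stop(self):
        self.brake()
        self.running = False


my_car = Car(color="red", mileage=18.5, max_speed=180, model_year=2019)
my_car.start()
my_car.drive(200)
print(my_car.color, my_car.speed)   # red 180
```
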

In the above example, the car is an object. Functions are called methods in the OOP world, and characteristics are called attributes (properties). Technically, attributes are variables or values related to the state of the object, whereas methods

ListenData 2019-07-29 20:20:00

Precision Recall Curve Simplified

This article outlines the precision-recall curve and how it is used in real-world data science applications. It explains how it differs from the ROC curve, highlights a limitation of the ROC curve, and shows how the area under the precision-recall curve addresses it. It also covers implementing the area under the precision-recall curve in Python, R and SAS.
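
A minimal sketch of the Python side, assuming scikit-learn is installed; the labels and scores are made-up toy data, and this is not necessarily the code the article uses. `precision_recall_curve` returns precision and recall at each score threshold, `auc` integrates that curve with the trapezoidal rule, and `average_precision_score` gives a step-wise summary of the same curve.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Toy data: 1 = customer churned (event), 0 = stayed; scores are model probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90])

precision, recall, thresholds = precision_recall_curve(y_true, scores)
pr_auc = auc(recall, precision)                 # trapezoidal area under the PR curve
ap = average_precision_score(y_true, scores)    # step-wise summary of the same curve

print(f"PR AUC: {pr_auc:.3f}  average precision: {ap:.3f}")
```
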

What is Precision Recall Curve?

Before getting into technical details, we first need to understand the terms precision and recall in layman's terms. It is essential to understand the concepts in simple words so that you can recall them later when required. Both precision and recall are important metrics for checking the performance of a binary classification model.

Precision

Precision is also called Positive Predictive Value. Suppose you are building a customer attrition model whose objective is to identify customers who are likely to close their relationship with the company. The use of this model is to

Living in an Ivory Basement 2019-07-22 22:00:00

Comparing two genome binnings quickly with sourmash

Comparing two sets of MAGs, for fun and profit!

ListenData 2019-07-22 09:20:00

Calculate KS Statistic with Python

The Kolmogorov-Smirnov (KS) statistic is one of the most important metrics used for validating predictive models. It is widely used in the BFSI (banking, financial services and insurance) domain. If you are part of a risk or marketing analytics team working on a banking project, you have probably heard of this metric.

What is KS Statistics?

It stands for Kolmogorov–Smirnov, named after Andrey Kolmogorov and Nikolai Smirnov. It compares two cumulative distributions and returns the maximum difference between them. It is a non-parametric test, which means it does not assume any particular distribution for the data. In the KS test, the null hypothesis states that both cumulative distributions are the same. Rejecting the null hypothesis means the cumulative distributions are different.
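
A minimal NumPy sketch of the event vs. non-event comparison: build the empirical cumulative distribution of model scores for each group and take the largest gap between them. This is the same quantity as the two-sample statistic from `scipy.stats.ks_2samp`, but it is not necessarily the decile-table method the post uses; the function name and toy data are illustrative.

```python
import numpy as np


def ks_statistic(y_true, scores):
    """Max gap between the score CDFs of events (label 1) and non-events (label 0)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    events = np.sort(scores[y_true == 1])
    non_events = np.sort(scores[y_true == 0])
    thresholds = np.unique(scores)                    # every observed score, sorted
    # Empirical CDFs: share of each group with score <= threshold.
    cdf_events = np.searchsorted(events, thresholds, side="right") / events.size
    cdf_non_events = np.searchsorted(non_events, thresholds, side="right") / non_events.size
    return float(np.max(np.abs(cdf_events - cdf_non_events)))


y = [0, 0, 1, 1, 0, 1, 0, 1]
p = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]
print(ks_statistic(y, p))
```

A value near 1 means the model separates events from non-events almost perfectly; a value near 0 means the two score distributions are nearly identical.
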

In data science, it compares the cumulative distributions of events and non-events, and KS is the maximum difference between the two distributions. In simple words, it helps us understand how well our predictive model is able to discriminate between events and

ListenData 2019-07-20 16:22:00

A Complete Guide to Python DateTime Functions

In this tutorial, we will cover the Python datetime module and how it is used to handle date, time and datetime formatted columns (variables). It includes various practical examples which will help you gain confidence in dealing with dates and times using Python functions. In general, date-type columns are not easy to manipulate, as they come with many challenges such as leap years, different numbers of days in a month, different date and time formats, or date values stored in string (character) format.

Introduction : datetime module

It is a Python module which provides several functions for dealing with dates and times. It has four main classes, listed below; the later part of this article explains how they work.
  1. datetime
  2. date
  3. time
  4. timedelta
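
The four classes listed above can be seen working together in a few lines of standard-library code. This is an illustrative example rather than the tutorial's own, and the dates are arbitrary.

```python
from datetime import date, datetime, time, timedelta

d = date(2019, 7, 20)                        # calendar date only
t = time(16, 22)                             # time of day only
dt = datetime.combine(d, t)                  # date and time together
later = dt + timedelta(days=12)              # timedelta: a duration used for arithmetic

# Parse a string into a datetime, then format a datetime back into a string.
parsed = datetime.strptime("20/07/2019 16:22", "%d/%m/%Y %H:%M")
print(later.strftime("%Y-%m-%d %H:%M"))      # 2019-08-01 16:22
print(date(2020, 3, 1) - date(2020, 2, 1))   # 29 days, 0:00:00 (2020 is a leap year)
```
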

People who have no experience working with real-world datasets might not have encountered date columns. They might be under the impression that working with dates is rarely needed and not so

ListenData 2019-07-17 17:32:00

What are *args and **kwargs and How to use them

This article explains the concepts of *args and **kwargs and how and when to use them in a Python program. Seasoned Python developers embrace the flexibility they provide when creating functions. If you are a beginner in Python, you might not have heard of them before. After completing this tutorial, you will be confident using them in your live projects.

Introduction : *args

args is short for arguments. With *args, a user-defined function accepts any number of positional arguments, and Python packs them into a tuple named args. In other words, *args means zero or more arguments, stored in a tuple named args.

When you define a function without *args, it has a fixed number of inputs, which means it cannot accept more (or fewer) arguments than you defined in the function.
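
A short illustration of the difference, written as a sketch with hypothetical function names: a fixed signature rejects extra arguments, `*args` gathers any number of positional arguments into a tuple, and `**kwargs` gathers keyword arguments into a dict.

```python
def add(a, b):
    """Fixed signature: exactly two positional arguments."""
    return a + b


def add_all(*args):
    """*args collects any number of positional arguments into a tuple."""
    return sum(args)


def describe(*args, **kwargs):
    """**kwargs collects any keyword arguments into a dict."""
    return args, kwargs


print(add(1, 2))                                 # 3
print(add_all())                                 # 0
print(add_all(1, 2, 3, 4))                       # 10
print(describe(1, 2, unit="kg", rounded=True))   # ((1, 2), {'unit': 'kg', 'rounded': True})
try:
    add(1, 2, 3)                                 # one argument too many
except TypeError as err:
    print("TypeError:", err)
```
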

In the example code below, we are creating a very basic function which adds two numbers. At the same time, we created a

Quansight Labs 2019-07-15 05:00:00

Quansight presence at SciPy'19

Yesterday the SciPy'19 conference ended. It was a lot of fun, and very productive. You can really feel that there's a lot of energy in the community, and that it's growing and maturing. This post is just a quick update to summarize Quansight's presence and contributions, as well as some of the more interesting things I noticed.

A few highlights

The "Open Source Communities" track, which had a strong emphasis on topics like burnout, diversity and sustainability, as well as the keynotes by Stuart Geiger ("The Invisible Work of Maintaining and Sustaining Open-Source Software") and Carol Willing ("Jupyter: Always Open for Learning and Discovery") showed that many more people and projects are paying more attention to and evolving their thinking on the human and organizational aspects of open source.

I did not go to many technical talks, but did make sure to catch Matt Rocklin's talk "Refactoring the SciPy Ecosystem for Heterogeneous Computing". Matt clearly explained some key issues and opportunities around

ListenData 2019-07-12 21:42:00

Python : 10 Ways to Filter Pandas DataFrame

In this article, we will cover various methods to filter a pandas DataFrame in Python. Data filtering is one of the most frequent data manipulation operations. It is similar to the WHERE clause in SQL, or to the filter you have probably used in MS Excel to select specific rows based on some conditions. In terms of speed, Python has an efficient way to perform filtering and aggregation: an excellent package called pandas for data wrangling tasks. Pandas is built on top of the NumPy package, whose core is written in C, a low-level language. Hence, using pandas for data manipulation is a fast and smart way to handle large datasets.
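
A few of the common filtering styles side by side, as a minimal sketch on a made-up customer table (the column names and values are hypothetical). Boolean masks play the role of SQL's WHERE clause, `query` accepts the condition as a string, `isin` matches against a list, and `loc` filters rows and selects columns in one step.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "status": ["active", "closed", "active", "active"],
    "open_year": [2017, 2018, 2019, 2019],
    "balance": [2500.0, 0.0, 1200.5, 300.0],
})

# Boolean mask: combine conditions with & (and) / | (or), each in parentheses.
mask = (df["status"] == "active") & (df["open_year"] >= 2019)
by_mask = df[mask]

# Same filter written as a query string.
by_query = df.query("status == 'active' and open_year >= 2019")

# Rows whose value is in a list.
by_isin = df[df["customer_id"].isin([101, 104])]

# Filter rows and pick columns at once.
by_loc = df.loc[df["balance"] > 1000, ["customer_id", "balance"]]

print(by_mask)
```
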
Examples of Data Filtering
It is one of the first steps of data preparation for predictive modeling or any reporting project. It is also called 'Subsetting Data'. See some examples of data filtering below.
  • Select all the active customers whose accounts were opened

Quansight Labs 2019-07-09 03:30:00

Ibis: Python data analysis productivity framework

Ibis is a library for data analysis tasks that provides a pandas-like API for operations such as filtering, adding columns and applying math operations. It works lazily: operations are only registered in memory, not executed. When you want the result of an expression you built, Ibis compiles it and sends a request to the remote server (remote storage and execution systems such as Hadoop components or SQL databases). Its goal is to simplify analytical workflows and make you more productive.
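
To make the lazy "register now, compile and execute later" idea concrete, here is a deliberately tiny toy built on the standard library's `sqlite3`. It is NOT the Ibis API: the `LazyTable` class and its methods are hypothetical, and it concatenates raw SQL strings, so it is unsafe for untrusted input. It only illustrates the deferred-execution pattern the paragraph describes.

```python
import sqlite3


class LazyTable:
    """Hypothetical toy illustrating deferred execution; not the Ibis API."""

    def __init__(self, name, filters=(), columns=("*",)):
        self.name = name
        self.filters = tuple(filters)
        self.columns = tuple(columns)

    def filter(self, condition):
        # Record the predicate and return a new expression; nothing runs yet.
        return LazyTable(self.name, self.filters + (condition,), self.columns)

    def select(self, *columns):
        return LazyTable(self.name, self.filters, columns)

    def compile(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.name}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.filters)
        return sql

    def execute(self, conn):
        # Only at this point is the compiled query sent to the database.
        return conn.execute(self.compile()).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, distance REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("NYC", 2.5), ("SF", 10.0), ("NYC", 7.0)])

expr = LazyTable("trips").filter("city = 'NYC'").filter("distance > 5").select("city", "distance")
print(expr.compile())
print(expr.execute(conn))
```
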

Ibis was created by Wes McKinney and is mainly maintained by Phillip Cloud and Krisztián Szűcs. Recently, I was also invited to become a maintainer of the Ibis repository!

Maybe you are thinking: "Why should I use Ibis?" Well, if you have any of the following issues, you should probably consider using Ibis in your analytical workflow!

  • if you need to get data from a SQL database but you don't

ListenData 2019-07-04 19:51:00

Python Dictionary Comprehension with Examples

In this tutorial, we will cover how dictionary comprehension works in Python. It includes various examples which would help you to learn the concept of dictionary comprehension and how it is used in real-world scenarios.
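
As a quick preview, a dictionary comprehension follows the pattern `{key_expression: value_expression for item in iterable}`, optionally with an `if` clause. The examples below are a sketch with made-up data, not taken from the tutorial.

```python
prices = {"apple": 1.2, "banana": 0.5, "cherry": 3.0}

# Build a dictionary from a range.
squares = {n: n ** 2 for n in range(1, 5)}                  # {1: 1, 2: 4, 3: 9, 4: 16}

# Transform values while keeping the keys.
discounted = {item: round(p * 0.9, 2) for item, p in prices.items()}

# Keep only some entries with an if clause.
cheap = {item: p for item, p in prices.items() if p < 2}   # {'apple': 1.2, 'banana': 0.5}

# Swap keys and values (the values must be unique and hashable).
by_price = {p: item for item, p in prices.items()}
```
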
What is Dictionary?
A dictionary is a data structure in Python which stores data such that each value is connected to its related key. Roughly, it works similarly to SQL tables or data stored in statistical software. It has two main components:
  1. Keys : Think of them as the columns of a table. Keys must be unique (just as column names cannot be duplicated).
  2. Values : These are similar to the rows of a table. Values can be duplicated.
It is defined in curly braces { }. Each key is followed by a colon (:) and then its value.
Syntax of Dictionary

d = {'a': [1,2], 'b': [3,4], 'c': [5,6]}
To extract the keys, values and structure of a dictionary, you can run the following commands.

d.keys()    # dict_keys(['a', 'b', 'c'])
d.values()  # dict_values([[1, 2], [3, 4], [5, 6]])

HTML outputs in Jupyter


User interaction in data science projects can be improved by adding a small amount of visual design.

To motivate effort around visual design we show several simple-yet-useful examples. The code behind these examples is small and accessible to most Python developers, even if they don’t have much HTML experience.

This post in particular focuses on Jupyter’s ability to add HTML output to any object. This can be either full-fledged interactive widgets or just rich static outputs like tables or diagrams. We hope that the examples here will inspire some thoughts in other projects.
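
The hook behind this is IPython's rich display protocol: if an object defines a `_repr_html_` method returning an HTML string, Jupyter renders that HTML instead of the plain `repr`. A minimal sketch follows; the `SimpleTable` class is hypothetical and not from any of the projects mentioned.

```python
import html


class SimpleTable:
    """Minimal object that Jupyter renders as an HTML table via _repr_html_."""

    def __init__(self, header, rows):
        self.header = list(header)
        self.rows = [list(r) for r in rows]

    def _repr_html_(self):
        # Escape every cell so arbitrary strings cannot inject markup.
        head = "".join(f"<th>{html.escape(str(h))}</th>" for h in self.header)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(c))}</td>" for c in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

    def __repr__(self):
        # Plain-text fallback for terminals and non-HTML frontends.
        return f"SimpleTable({len(self.rows)} rows x {len(self.header)} columns)"


table = SimpleTable(["name", "value"], [["alpha", 1], ["a<b", 2]])
table   # as the last expression in a notebook cell, this displays as an HTML table
```
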

This post was supported by replies to this tweet. The rest of this post is just examples.


I originally decided to write this post after reading another blog post from the UK Met Office, where they included the HTML output of their library Iris.

(work by Peter Killick, post by Theo McCaie)

The fact that the output provided by an interactive session is the same output that you would provide in a published result helps everyone. The interactive

ListenData 2019-07-03 15:01:00

Python list comprehension with Examples

This tutorial covers how list comprehension works in Python. It includes many examples which will help you familiarize yourself with the concept, and you should be able to implement it in your live projects by the end of this lesson.

What is list comprehension?

Python is an object-oriented programming language. Almost everything in it is treated consistently as an object. Python also supports functional programming, which is very similar to the mathematical way of approaching a problem: you pass inputs to a function and always get the same output for the same input. Given a function f(x) = x², f(x) will always return the same result for the same value of x. The function has no "side effect", which means the operation has no effect on a variable/object outside its intended usage. "Side effect" refers to leaks in your code which can modify a mutable data structure or variable.
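
Applying f(x) = x² to a list shows how a list comprehension compresses the usual loop into a single expression with no side effects on outside variables. The example is an illustrative sketch, with `map` shown as the functional-style alternative.

```python
numbers = [1, 2, 3, 4, 5]

# Explicit loop: builds the list step by step.
squares_loop = []
for x in numbers:
    squares_loop.append(x ** 2)

# List comprehension: [expression for item in iterable if condition]
squares = [x ** 2 for x in numbers]                    # [1, 4, 9, 16, 25]
even_squares = [x ** 2 for x in numbers if x % 2 == 0] # [4, 16]

# Functional-style equivalent with map and a lambda.
squares_map = list(map(lambda x: x ** 2, numbers))
```
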

Functional programming is also good for parallel computing as there is no

Quansight Labs 2019-07-03 11:36:54

uarray update: API changes, overhead and comparison to __array_function__

uarray is a generic override framework for objects and methods in Python. Since my last uarray blogpost, there have been plenty of developments, changes to the API and improvements to the overhead of the protocol. Let’s begin with a walk-through of the current feature set and API, and then move on to current developments and how it compares to __array_function__. For further details on the API and latest developments, please see the API page for uarray. The examples there are doctested, so they will always be current.
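
For readers unfamiliar with `__array_function__` (NEP 18), the protocol uarray is compared against, here is a minimal sketch of how a non-NumPy array type can intercept a NumPy function. It follows the registration pattern shown in NumPy's own documentation and needs NumPy 1.17 or later; `LoggedArray` and `implements` are illustrative names. This is not uarray code, which uses its own backend mechanism described on its API page.

```python
import numpy as np

HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register an override of a NumPy function for LoggedArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class LoggedArray:
    """Toy duck array that records which NumPy functions were called on it."""

    calls = []

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented          # NumPy then raises TypeError
        LoggedArray.calls.append(func.__name__)
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr, axis=None):
    return LoggedArray(np.sum(arr.data, axis=axis))


result = np.sum(LoggedArray([[1, 2], [3, 4]]), axis=0)
print(result.data, LoggedArray.calls)     # [4 6] ['sum']
```
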

Motivation

Other array objects

NumPy is a simple, rectangular, dense, and in-memory data store. This is great for some applications but isn't complete on its own. It doesn't encompass every single use-case. The following are examples of array objects available today that have different features and cater to a different kind of audience.

  • Dask is one of the most popular ones. It allows distributed and chunked computation.
  • CuPy is another popular one, and

Peekaboo 2019-07-02 16:11:00

Don't cite the No Free Lunch Theorem

TL;DR: You probably shouldn’t be citing the "No Free Lunch" Theorem by Wolpert. If you’ve cited it somewhere, you might have used it to support the wrong conclusion. What it actually (vaguely) says is “You can’t learn from data without making assumptions”.

The paper on the “No Free Lunch Theorem”, actually called "The Lack of A Priori Distinctions Between Learning Algorithms", is one of those papers that are often cited and rarely read, and I hear many people in the ML community refer to it when supporting the claim that “one model can’t be the best at everything” or “one model won’t always be better than another model”. The point of this post is to convince you that this is not what the paper or theorem says (at least not the one by Wolpert that is usually cited), and that you should not cite this theorem in this context; and also that commonly cited versions of the "No Free Lunch" Theorem (continued...)

ListenData 2019-06-28 22:46:00

15 ways to read CSV file with pandas

This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without read_csv, importing a CSV file into a Python object takes considerably more work. pandas is a powerful Python package for data manipulation and supports various functions to load and import data from many formats. Here we cover how to deal with common issues when importing CSV files.

Install and Load Pandas Package
Make sure the pandas package is already installed on your system. If you set up Python using Anaconda, pandas comes with it, so you don't need to install it again. Otherwise, you can install it with the command pip install pandas. The next step is to load the package by running the following command. pd is an alias for the pandas package; we will use it instead of the full name "pandas".
import pandas as pd
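As an illustration of handling a few common import issues with read_csv, here is a small sketch. It assumes a hypothetical semicolon-delimited file that has a comment line above the header and uses the word "missing" for empty values. `io.StringIO` stands in for a real file path so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical sample file content; with a real file you would pass a path such as "data.csv"
raw = io.StringIO(
    "# exported report\n"
    "ID;Name;Score\n"
    "1;John;25\n"
    "2;Deep;missing\n"
    "3;Julia;35\n"
)

df = pd.read_csv(
    raw,
    sep=";",                  # the delimiter is a semicolon, not a comma
    skiprows=1,               # skip the comment line above the header
    usecols=["ID", "Score"],  # load only the columns we need
    na_values=["missing"],    # treat the string "missing" as a missing value
)
print(df)
```

Because one value is missing, the Score column is read as floating point, with NaN in the second row.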
Create Sample Data for Import
The program below creates a sample
ListenData 2019-06-25 11:31:00

Matplotlib Tutorial : Learn with Examples in 3 hours

This tutorial outlines how to perform plotting and data visualization in Python using the Matplotlib library. The objective of this post is to get you familiar with the library's basic and advanced plotting functions. It contains several examples that will give you hands-on experience in generating plots in Python.

What is Matplotlib?
Matplotlib is a powerful Python library for creating graphics and charts. It handles both basic and advanced plotting requirements in Python. It took inspiration from the MATLAB programming language and provides a similar MATLAB-like interface for graphics. The beauty of this library is that it integrates well with the pandas package, which is used for data manipulation. By combining these two libraries, you can easily perform data wrangling along with visualization and get valuable insights from your data. Matplotlib is the most widely used charting library in Python and plays a role similar to that of the ggplot2 library in R.
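Here is a minimal sketch of both points: Matplotlib's object-oriented interface and its pandas integration. The sales figures are hypothetical. The non-interactive "Agg" backend is chosen so the example can run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly sales data
df = pd.DataFrame({"month": [1, 2, 3, 4, 5], "sales": [25, 30, 35, 40, 45]})

# Object-oriented Matplotlib API: create a figure and axes, then draw on the axes
fig, ax = plt.subplots()
ax.plot(df["month"], df["sales"], marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales")
fig.savefig("sales_line.png")

# pandas integration: DataFrame.plot draws onto a Matplotlib Axes
fig_bar, ax_bar = plt.subplots()
df.plot(x="month", y="sales", kind="bar", ax=ax_bar, legend=False)
fig_bar.savefig("sales_bar.png")
```

With `plt.plot(...)`, the MATLAB-style pyplot interface would draw the same line chart. The explicit `fig, ax` style is easier to follow when a script produces several figures.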

Write Short Blogposts

I encourage my colleagues to write blogposts more frequently. This is for a few reasons:

  1. It informs your broader community what you’re up to, and allows that community to communicate back to you quickly.

    Communicating with your community fosters a sense of collaboration, openness, and trust. You gain collaborators, build momentum behind your work, and curate a body of knowledge that early adopters can consume to become experts quickly.

    Getting feedback from your community helps you to course-correct early in your work, and stops you from wasting time in inefficient courses of action.

    You can only work for a long time without communicating if you are either entirely confident in what you’re doing, or reckless, or both.

  2. It increases your visibility, and so is good for your career.

    I have a great job. I find my work to be both

ListenData 2019-06-19 13:20:00

How to drop one or multiple columns in Pandas Dataframe

In this tutorial, we will cover how to drop or remove one or multiple columns from pandas dataframe.
What is pandas in Python?
pandas is a Python package for data manipulation. It has several functions for the following data tasks:
  1. Drop or Keep rows and columns
  2. Aggregate data by one or more columns
  3. Sort or reorder data
  4. Merge or append multiple dataframes
  5. String Functions to handle text data
  6. DateTime Functions to handle date or time format columns
Import or Load Pandas library
To use any Python library, we first need to load it with the import statement.
import pandas as pd
import numpy as np
Let's create a fake dataframe for illustration
The code below creates 4 columns named A through D.
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
          A         B         C         D
0 -1.236438 -1.656038
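A short sketch of the usual ways to drop one or more columns from a DataFrame like the one above. `DataFrame.drop` returns a new DataFrame and leaves the original unchanged unless `inplace=True` is passed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))

# Drop a single column
df_no_a = df.drop(columns="A")

# Drop multiple columns at once
df_bd = df.drop(columns=["A", "C"])

# Equivalent call using the axis argument (axis=1 means columns)
df_bd_axis = df.drop(["A", "C"], axis=1)

# The opposite operation: keep only the columns you name
df_keep = df[["B", "D"]]
```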
ListenData 2019-06-09 21:07:00

String Functions in Python with Examples

This tutorial outlines various string (character) functions used in Python. To manipulate strings and character values, Python has several built-in functions. This means you don't need to import or depend on any external package to work with the string data type in Python. It's one of the advantages of using Python over other data science tools. Dealing with string values is very common in the real world. Suppose you have customers' full names and your manager asks you to extract each customer's first and last name. Or you want to fetch information on all the products whose code starts with 'QT'.
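The two tasks just described can be sketched with built-in string methods alone, with no imports needed. The names and product codes are hypothetical sample values.

```python
# Extract first and last name from a full name (hypothetical customer)
full_name = "Sandy Smith"
first_name, last_name = full_name.split(" ", 1)  # split on the first space only

# Roughly the equivalent of Excel's LEFT(A1, 3): slice the first three characters
prefix = full_name[:3]

# Keep product codes that start with 'QT' (hypothetical codes)
codes = ["QT101", "AB202", "QT303", "qt404"]
qt_codes = [code for code in codes if code.startswith("QT")]  # case-sensitive
qt_codes_any_case = [code for code in codes if code.upper().startswith("QT")]
```

With `split(" ", 1)`, a three-part name such as "Mary Ann Lee" yields "Ann Lee" as the last name. Real name data usually needs a rule for such cases.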

List of frequently used string functions
The table below shows many common string functions along with a description and the equivalent MS Excel function. Most of us use MS Excel at work and are familiar with its functions. Comparing string functions in MS Excel and Python would help you to learn
Ralf Gommers | Reflections 2019-06-05 00:00:00

The cost of an open source contribution

Open source is massively successful. Some say it’s eating the world, although to my ears that phrasing doesn’t sound entirely like a good thing. Open source maintainers are always in need of help, and over the past years I’ve seen a lot of focus on ways open source projects can grow their communities and gain new contributors. Guidance on how to go about finding new contributors is easily found. E.
Spyder Blog 2019-06-02 00:00:00

TDK-Micronas partners with Quansight to sponsor Spyder

This blogpost was originally published on the Quansight Labs website

TDK-Micronas is sponsoring Spyder development efforts through Quansight Labs. This will enable the development of some features that have been requested by our users, as well as new features that will help TDK develop custom Spyder plugins to complement their Automatic Test Equipment (ATEs) in the development of their Application Specific Integrated Circuits (ASICs).

At this point it may be useful to clarify the role of Quansight Labs in Spyder's development and its relationship with TDK. To quote Ralf Gommers (director of Quansight Labs):

"We're an R&D lab for open source development of core technologies around data science and scientific computing in Python. And focused on growing communities around those technologies. That's how I see it for Spyder as well: Quansight Labs enables developers to be employed to work on Spyder, and helps with connecting them to developers of other projects in similar situations. Labs should be an enabler to let the Spyder project, its community and individual developers grow.

I Love Symposia! 2019-05-28 08:41:54

Why citations are not enough for open source software

A few weeks ago I wrote about why you should cite open source tools. Although I think citations are important, there are major problems in relying on them alone to support open source work.

The biggest problem is that papers describing a software library can only give credit to the contributors at the time that the paper was written. The preferred citation for the SciPy library is “Eric Jones, Travis Oliphant, Pearu Peterson, et al”, 2001. The “et al” is not an abbreviation here, but a fixed shorthand for all other contributors. Needless to say, many, many people have contributed to the SciPy library since 2001 (GitHub counts 716 contributors as of this writing), and they are unable to get credit within the academic system for those contributions. (As an aside, Google counts about 1,200 citations to SciPy, which is a breathtaking undercounting of its value and influence, and reinforces my earlier point: cite open source software! Definitely don't use this post as an excuse not to cite it!!!)

Not surprisingly, we have had

Spyder Blog 2019-05-20 00:00:00

Spyder 4.0 takes a big step closer with the release of Beta 2!

This blogpost was originally published on the Quansight Labs website

It has been almost two months since I joined Quansight in April, to start working on Spyder maintenance and development. So far, it has been a very exciting and rewarding journey under the guidance of long time Spyder maintainer Carlos Córdoba. This is the first of a series of blog posts we will be writing to showcase updates on the development of Spyder, new planned features and news on the road to Spyder 4.0 and beyond.

First off, I would like to give a warm welcome to Edgar Margffoy, who recently joined Quansight and will be working with the Spyder team to take its development even further. Edgar has been a core Spyder developer for more than two years now, and we are very excited to have his (almost) full-time commitment to the project.

Spyder 4.0 Beta 2 released!

Since August 2018, when the first beta of the 4.x series was released, the Spyder development team has been


The Role of a Maintainer

What are the expectations and best practices for maintainers of open source software libraries? How can we do this better?

This post frames the discussion and then follows with best practices based on my personal experience and opinions. I make no claim that these are correct.

Let us Assume External Responsibility

First, the most common answer to this question is the following:

  • Q: What are expectations on OSS maintainers?
  • A: Nothing at all. They’re volunteers.

However, let’s assume for a moment that these maintainers are paid to maintain the project some modest amount, like 10 hours a week.

How can they best spend this time?

What is a Maintainer?

Next, let’s disambiguate the roles of developer, reviewer, and maintainer.

  1. Developers fix bugs and create features. They write code and docs and generally are agents of change in a software project. There are often many more developers than reviewers or maintainers.

  2. Reviewers are known

Living in an Ivory Basement 2019-05-14 22:00:00

Using GitHub for janky project reporting - some code

We scripted GitHub for lightweight project reporting

Paul Ivanov’s Journal 2019-05-13 07:00:00

My first DNF (Ft Bragg 600k)

It's been six years since my first ride with The San Francisco Randonneurs and four years since my first 200k. I've ridden 18 rides that are at least that distance since then (3x 300k, 2x 400k, 1x 600k), completing my first Super Randonneur Series (2-, 3-, 4-, and 600k in one year) last year after not riding much the year before that. And this weekend I had my first DNF result on the Fort Bragg 600k. I Did Not Finish.

The best response to my choice of abandoning the ride to enjoy the campground came from Peter Curley, who said "That was a very mature decision." In a clear departure from typical randonneuring stubbornness and refusal to give up, I celebrated my decision to quit as a victory when I arrived at the campground and made my announcement to the volunteers. I think I was so energetic about it that they did not believe me. I was being kind to myself, to my body, and at peace with the decision by


Should I Resign from My Full Professor Job to Work Fulltime on Cocalc?

Nearly 3 years ago, I gave a talk at a Harvard mathematics conference announcing that “I am leaving academia to build a company”. What I really did is go on unpaid leave for three years from my tenured Full Professor position. No further extensions of that leave are possible, so I finally have to decide whether to go back to academia or resign.
How did I get here?
Nearly two decades ago, as a recently minted Berkeley math Ph.D., I was hired as a non-tenure-track faculty member in the mathematics department at Harvard. I spent five years at Harvard, then I applied for jobs, and accepted a tenured Associate Professor position in the mathematics department at UC San Diego. The mathematics community was very supportive of my number theory research; I skipped tenure track, and landed a tier-1 tenured position by the time I was 30 years old. In 2006, I moved from UCSD to a tenured Associate Professor position at the University
Paul Ivanov’s Journal 2019-05-03 07:00:00

PyCon2019 poem

I'm back in Cleveland for another PyCon. Yesterday was my first full day here. Along with Matt Seale, I was a helper at Matthias Bussonnier's tutorial ("IPython and Jupyter in Depth: High productivity, interactive Python"). The sticky-note system is efficient at signaling when someone in a classroom needs help, and a lot of folks don't know that this practice was popularized by Software Carpentry workshops and continues to be used at The Carpentries.

I stepped out for a coffee refill and bumped into a large contingent of Bloomberg folks I'd never met (Princeton office). I guess we have something like 90 people at the conference this year, and I made the usual and true remark about how I go to conferences to meet the other people who work at our company. Then after his tutorial concluded, Matthias and I bumped into Tracy Teal, exchanged some stickers, and chatted about The Carpentries, Jupyter, organizing conferences, governance and sponsorship models, and a bunch of other stuff.

Matthias was a

I Love Symposia! 2019-05-02 02:31:30

Why you should cite open source tools

Every now and then, a moment or a sentence in a conversation sticks out at you, and lodges itself in the back of your brain for months or even years. In this case, the sentence is a tweet, and I fear that the only way to dislodge it is to talk about it publicly.

Last year, I complained on Twitter that a very prominent paper that was getting lots of attention used scikit-image, but failed to cite our paper. (Or the papers corresponding to many other open source packages.) I continued that scientists developing open source software depend on these citations to continue their work. (More on this in another post...) One response was that surely the developers of the open source scientific Python stack were not scientists per se, and that citations were not a priority for them.

I still sigh internally when I think of it.

That tweet manifests a pervasive perception that open source scientific software is written by God-like figures. These massively experienced software developers have easy access to funds

ListenData 2019-04-27 13:52:00

Python Lambda Function with Examples

This article gives a detailed explanation of Python's lambda functions. You will learn how to use them in real-world data scenarios, with examples.

Introduction: Lambda Function
In non-technical language, a lambda is an alternative way of defining a function. You can define a function inline with lambda, which means you can apply a function to some data in a single line of Python code. It is called an anonymous function because it can be defined without a name. Lambdas are part of the functional programming style, which focuses on readable code and avoids changing mutable data.
Syntax of Lambda Function
lambda arguments: expression
A lambda function can take any number of arguments, but its body must be a single expression. That expression is evaluated and returned. Example:
addition = lambda x, y: x + y
addition(2, 3)  # returns 5
In the above Python code, x and y are the arguments, and x + y is the expression that gets evaluated and returned.
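In practice, lambdas are most useful as short, throwaway arguments to other functions. A small sketch follows: sorting with a key, mapping over a list, and applying a rule to a pandas column. The data and the "high"/"low" threshold are hypothetical.

```python
import pandas as pd

addition = lambda x, y: x + y

# Case-insensitive sort: the lambda computes the sort key for each name
names = ["Kate", "deep", "Julia"]
sorted_names = sorted(names, key=lambda s: s.lower())

# Square every number in a list
squares = list(map(lambda n: n ** 2, [1, 2, 3]))

# Label each row of a column (hypothetical threshold of 30)
df = pd.DataFrame({"MonthSales": [25, 30, 35]})
df["Band"] = df["MonthSales"].apply(lambda v: "high" if v >= 30 else "low")
```

If a lambda gets long or needs a name for reuse, a regular `def` function is usually clearer.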
ListenData 2019-04-20 21:01:00

Loops in Python explained with examples

This tutorial covers various ways to execute loops in python with several practical examples. After reading this tutorial, you will be familiar with the concept of loop and will be able to apply loops in real world data wrangling tasks.


What is a Loop?
A loop is an important programming concept that exists in almost every programming language (Python, C, R, Visual Basic etc.). It is used to repeat one or more operations until a specific condition is met, and it is mainly used to automate repetitive tasks.

Real World Examples of Loop
  1. ATM software runs in a loop, processing transaction after transaction until you indicate that you have nothing more to do.
  2. The software on a mobile phone allows the user up to 5 password attempts to unlock it. After that, it resets the device.
  3. You put your favorite song on a repeat mode. It is also a loop.
  4. You want to run a particular analysis on each column of your data
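Two of these examples can be sketched in Python. A `for` loop runs the same analysis (here, a mean) on each column. A `while` loop repeats until a condition is met, modeled on the 5-attempt phone lock. The data and the sequence of entered PINs are hypothetical.

```python
import pandas as pd

# for loop: run the same analysis on every column of the data
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
means = {}
for col in df.columns:
    means[col] = df[col].mean()

# while loop: keep trying until the PIN matches or 5 attempts are used up
attempts = ["1111", "2222", "1234"]  # hypothetical PINs entered by the user
correct_pin = "1234"
max_attempts = 5
tries = 0
unlocked = False
while tries < max_attempts and tries < len(attempts):
    pin = attempts[tries]
    tries += 1
    if pin == correct_pin:
        unlocked = True
        break  # condition met: stop looping
```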
Living in an Ivory Basement 2019-04-15 22:00:00

Some questions and thoughts on journal peer review.

What's up with current peer review practice?

ListenData 2019-04-14 15:31:00

Create Dummy Data in Python

This article explains various ways to create dummy or random data in Python for practice. As in R, we can create dummy data frames using the pandas and numpy packages. Many analysts prepare data in MS Excel and later import it into Python to hone their data wrangling skills. This is not an efficient approach. A more efficient approach is to generate random data in Python itself and then use it for data manipulation.


1. Enter Data Manually in Editor Window
The first step is to load the pandas package and use the DataFrame function:
import pandas as pd
data = pd.DataFrame({"A" : ["John","Deep","Julia","Kate","Sandy"],
"MonthSales" : [25,30,35,40,45]})
       A  MonthSales
0 John 25
1 Deep 30
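As an alternative to typing values by hand, the data can be generated randomly with NumPy. Here is a minimal sketch. The column names, value ranges, and seed are arbitrary choices. `default_rng` requires NumPy 1.17 or later.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # seeded so the dummy data is reproducible

n = 5
data = pd.DataFrame({
    "Name": rng.choice(["John", "Deep", "Julia", "Kate", "Sandy"], size=n),
    "MonthSales": rng.integers(20, 50, size=n),             # integers in [20, 50)
    "Score": rng.normal(loc=70, scale=10, size=n).round(1),  # roughly bell-shaped
    "Date": pd.date_range("2019-01-01", periods=n, freq="D"),
})
print(data)
```

Changing `n` scales the data frame to any size, which is hard to do quickly in a spreadsheet.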
Living in an Ivory Basement 2019-04-10 22:00:00

Things to think about when developing shotgun metagenome classifiers

Thoughts on goals and tradeoffs in classifying shotgun metagenome data.

ListenData 2019-04-09 18:47:00


The most common issue when installing Python packages on a company network is a failed SSL certificate verification. Companies sometimes block websites on their network so employees can't access them; visiting such a site shows a message like "Access Denied because of company's policy". This can cause a connection error when pip tries to reach the Python package servers.

Error looks like this :

Could not fetch URL connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)


Solution :

Run the following command, replacing <package_name> with the name of the package you want to install:
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org <package_name> -vvv
For example, to install the pandas package, run:
pip install --trusted-host pypi.org --trusted-host files.pythonhosted.org pandas -vvv

The --trusted-host option marks the host as trusted, even though it does not have valid or any HTTPS
ListenData 2019-04-09 15:56:00

Install Python Package

Python is one of the most popular programming languages for data science and analytics. It is widely used for a variety of tasks in startups and many multinational organizations. The beauty of this programming language is that it is open source, which means it is available for free and has a very active community of developers across the world. Python developers share their solutions with other Python users in the form of packages or modules. This tutorial explains various ways to install a Python package.

Ways to Install Python Package

Method 1: If Anaconda is already installed on your system

Anaconda is a data science platform that comes with popular Python packages pre-installed and a powerful IDE (Spyder) with a user-friendly interface that makes writing Python scripts easier.

If Anaconda is installed on your system (laptop), click on Anaconda Prompt as shown in the image below.

Anaconda Prompt

To install a Python package or module, enter the command below in Anaconda Prompt:
pip install package-name
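After installation, you can check from Python whether a package is available by querying the import system. Here is a minimal sketch using only the standard library (`importlib.util.find_spec`, and `importlib.metadata.version`, which needs Python 3.8+). The helper names `is_installed` and `installed_version` are hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def is_installed(module_name):
    """Return True if the import system can find a module with this name."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed distribution's version string, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(is_installed("pandas"), installed_version("pandas"))
```

The import name and the pip distribution name can differ. For example, you install `scikit-learn` but import `sklearn`. Pass the import name to `is_installed` and the distribution name to `installed_version`.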
Living in an Ivory Basement 2019-04-08 22:00:00

News from the NIH Data Commons Pilot Phase Consortium

The NIH Data Commons Pilot Phase Consortium is dead! (Long live the NIH Data Commons!)

Living in an Ivory Basement 2019-04-07 22:00:00

Critically assessing open science - the CAOS meeting.

A summary of the CAOS open science meeting

While My MCMC Gently Samples 2019-03-15 14:00:00

Computational Psychiatry: Combining multiple levels of analysis to understand brain disorders - PhD thesis

I noticed that, since my personal website at my former university went down, my PhD thesis could not be found anywhere, so I'm posting it here.

During my PhD I explored how machine learning and computational modeling of the brain can be used to improve our understanding, and diagnostics …

Python – Meta Rabbit 2019-03-12 12:00:51

NIXML: nix + YAML for easy reproducible environments

The rise and fall of bioconda

A year ago, I remember a conversation which went basically like this:

Them: So, to distribute my package, what do you think I should use?
Me: You should use bioconda.
Them: OK, that’s interesting, but what about …?
Me: No, you should use bioconda.
Them: I will definitely look …
Living in an Ivory Basement 2019-03-01 23:00:00

Sustaining open source: thinking about communities of effort

Thinking about how to sustain open source.

Living in an Ivory Basement 2019-02-28 23:00:00

My recent reading re sustaining open communities

What has Titus been reading lately?

Filipe Saraiva's blog 2019-02-24 22:26:11

Reducing the pile

I have been a comics fan since I was a child. The first issues I was given came in the first half of the 90s: some Mickeys, Mônicas, Trapalhões and X-Men. In '98 I started buying X-Men, Fabulosos X-Men and Wolverine, up to the first issues of the infamous X-Men Premium. Short on money, I turned to manga and self-contained stories. When Panini starts to…
Living in an Ivory Basement 2019-02-21 23:00:00

Threat models for open online scientific engagement?

What threats are there for scientists in engaging in open online discussions?

Martin Fitzpatrick - python 2019-02-20 15:00:00

Packaging PyQt5 apps with fbs — Distribute cross-platform GUI applications with the fman Build System

fbs is a cross-platform PyQt5 packaging system which supports building desktop applications for Windows, Mac and Linux (Ubuntu, Fedora and Arch). Built on top of PyInstaller, it smooths over some of PyInstaller's rough edges and defines a standard project structure which allows the build process to be entirely automated. The included …

Announcement: Audio TK 3.1.0

ATK has been updated to 3.1.0 with heavy code refactoring. Support for older C++ standards has been dropped, and a fully C++17-compliant compiler is now required. The main change for filter support is that the explicit SIMD filters using libsimdpp have been dropped, while tr2::simd is becoming standard and is supported by gcc, clang and Visual Studio. Download link: ATK […]
Filipe Saraiva's blog 2019-01-29 01:48:22


Telemulta

The robotic female voice (we have reached a time when questions of gender and robots can get mixed up) sounded, strange and familiar as always, as soon as the car finished the right turn: "You have entered Avenida Universitária; the speed limit is 60 kilometers per hour." My father smiled and began to speak: "Since…
While My MCMC Gently Samples 2019-01-21 15:00:00

My foreword to "Bayesian Analysis with Python, 2nd Edition" by Osvaldo Martin

When Osvaldo asked me to write the foreword to his new book I felt honored, excited, and a bit scared, so naturally I accepted. What follows is my best attempt to convey what makes probabilistic programming so exciting to me. Osvaldo did a great job with the book, it is …

Filipe Saraiva's blog 2019-01-21 14:39:48

Call for Answers: Survey About Task Assignment

Professor Igor Steinmacher, from Northern Arizona University, is a prominent researcher on several social dynamics in open source communities, such as support for newcomers, gender bias, open-sourcing proprietary software, and more. Some of his papers can be found on his website. Currently, Prof. Igor is inviting mentors from open source communities to answer a survey…
Living in an Ivory Basement 2019-01-15 23:00:00

Revisiting authorship, and JOSS software publications

The question du jour: how should authorship on software papers be decided?

While My MCMC Gently Samples 2019-01-14 15:00:00

Using Bayesian Decision Making to Optimize Supply Chains

(c) 2019 Thomas Wiecki & Ravin Kumar

As advocates of Bayesian statistics in data science we often have to convince business-minded colleagues or customers of the added value of such an approach. While there are many good reasons for applying Bayesian modeling to solve business problems (Sean J Taylor recently had …

Filipe Saraiva's blog 2019-01-08 02:21:02

Master's in Computer Science at UFPA 2019: Computational Intelligence for Smart Grids; Metaheuristics

The selection process for the master's program in computer science at PPGCC-UFPA is open. In this round, I am offering 2 positions for students who will carry out their work alongside the other researchers at LAAI. The positions focus on computational intelligence applied to Smart Grids and on metaheuristic optimization methods. I would like to…
Filipe Saraiva's blog 2019-01-05 17:43:33

LaKademy 2018

In October 2018, Florianópolis hosted the sixth edition of LaKademy, the Latin American KDE sprint. The event is an opportunity to bring together, in one place, several KDE developers, both veterans and newcomers, from different projects, to improve the software they work on and also to plan the actions of…
Filipe Saraiva's blog 2019-01-05 16:59:19

LaKademy 2018

In October 2018, Florianópolis hosted the 6th edition of LaKademy, the Latin American KDE sprint. The event is an opportunity to bring together several KDE developers, both veterans and newcomers, from different projects to improve their respective software and to plan the community's promotional activities in the subcontinent. In…

GPU Dask Arrays, first steps

The following code creates and manipulates 2 TB of randomly generated data.

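import dask.array as da

# Lazily define a 500,000 x 500,000 array of normal random values (mean 10, std 1),
# split into 10,000 x 10,000 chunks; nothing is computed yet
rs = da.random.RandomState()
x = rs.normal(10, 1, size=(500000, 500000), chunks=(10000, 10000))
# Add one, take every other row and column, sum, and execute on the threaded scheduler
(x + 1)[::2, ::2].sum().compute(scheduler='threads')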

On a single CPU, this computation takes two hours.

On an eight-GPU single-node system this computation takes nineteen seconds.
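These timings come from Dask splitting the array into chunks and reducing each chunk independently. The following is a simplified, serial, NumPy-only sketch of that idea for `(x + 1)[::2, ::2].sum()` on a small stand-in array. `chunked_sum` is a hypothetical helper. Real Dask builds a task graph, runs the per-chunk work in parallel, and combines partial sums in a tree rather than in a single loop.

```python
import numpy as np


def chunked_sum(arr, chunk):
    """Sum a 2-D array block by block: reduce each block, then combine partial sums."""
    total = 0.0
    n_rows, n_cols = arr.shape
    for i in range(0, n_rows, chunk):
        for j in range(0, n_cols, chunk):
            block = arr[i:i + chunk, j:j + chunk]
            total += block.sum()  # partial result for this block only
    return total


rng = np.random.default_rng(0)
x = rng.normal(10, 1, size=(1000, 1000))  # small stand-in for the 2 TB array
y = (x + 1)[::2, ::2]
result = chunked_sum(y, chunk=100)
print(result, y.sum())  # equal up to floating-point rounding
```

Because each block only needs its own data, the blocks can live on different cores, GPUs, or machines. This is what lets the same expression scale from a laptop to the eight-GPU system.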

Combine Dask Array with CuPy

Actually, this computation isn’t that impressive. It’s a simple workload, for which most of the time is spent creating and destroying random data. The computation and communication patterns are simple, reflecting the simplicity commonly found in data processing workloads.

What is impressive is that we were able to create a distributed parallel GPU array quickly by composing these three existing libraries:

  1. CuPy provides a partial implementation of NumPy on the GPU.

  2. Dask Array provides chunked algorithms on top of NumPy-like libraries such as NumPy and CuPy.

    This enables us to operate on more data than we could fit in memory by operating on that data in