Planet SciPy

ListenData 2019-07-22 09:20:00

Calculate KS Statistic with Python

The Kolmogorov-Smirnov (KS) statistic is one of the most important metrics used for validating predictive models. It is widely used in the BFSI (banking, financial services and insurance) domain. If you are part of a risk or marketing analytics team working on a project in banking, you must have heard of this metric.
What is KS Statistics?
It stands for Kolmogorov–Smirnov, named after Andrey Kolmogorov and Nikolai Smirnov. It compares two cumulative distributions and returns the maximum difference between them. It is a non-parametric test, which means you don't need to test any assumption related to the distribution of the data. In the KS test, the null hypothesis states that both cumulative distributions are similar. Rejecting the null hypothesis means the cumulative distributions are different.
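As a rough illustration of "the maximum difference between two cumulative distributions", here is a minimal plain-Python sketch. The function name ks_statistic and the sample scores are made up for this example and are not taken from the original tutorial.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Return the largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of b that is <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical model scores for events and non-events.
event_scores = [0.9, 0.8, 0.7, 0.6]
non_event_scores = [0.4, 0.3, 0.2, 0.1]

ks_separated = ks_statistic(event_scores, non_event_scores)  # fully separated
ks_identical = ks_statistic(event_scores, event_scores)      # same distribution
```

A value close to 1 suggests the two groups are well separated, while a value close to 0 suggests their distributions are nearly the same.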

In data science, it compares the cumulative distributions of events and non-events, and KS is the point where the difference between the two distributions is at its maximum. In simple words, it helps us to understand how well our predictive model is able to discriminate between events and

ListenData 2019-07-20 16:22:00

Python : Complete Guide to Date and Time Functions

In this tutorial, we will cover the Python datetime module and how it is used to handle date, time and datetime formatted columns (variables). It includes various practical examples which will help you gain confidence in dealing with dates and times using Python functions. In general, date columns are not easy to manipulate, as they come with challenges such as leap years, the varying number of days in a month, different date and time formats, or date values stored in string (character) format.
Table of Contents

Introduction : datetime module
It is a Python module which provides several functions for dealing with dates and times. It has four classes, listed below; how they work is explained in the latter part of this article.
  1. datetime
  2. date
  3. time
  4. timedelta
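
A small sketch of how the four classes fit together, using only the standard library. The variable names and the dates are invented for illustration.

```python
from datetime import date, datetime, time, timedelta

# date and time hold the calendar part and the clock part separately.
opened_on = date(2019, 7, 20)
opened_at = time(16, 22)

# datetime combines both parts into a single value.
opened = datetime.combine(opened_on, opened_at)

# timedelta represents a duration and can be added to a datetime.
follow_up = opened + timedelta(days=14)

# Subtracting two dates gives a timedelta; leap years are handled for us.
leap_gap = date(2020, 3, 1) - date(2020, 2, 28)

# Dates stored as strings can be parsed with an explicit format.
parsed = datetime.strptime("2019-07-20 16:22", "%Y-%m-%d %H:%M")
```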

People who have no experience of working with real-world datasets might not have encountered date columns. They might be under the impression that working with dates is rarely used and

ListenData 2019-07-17 17:32:00

What are *args and **kwargs and How to use them

This article explains the concepts of *args and **kwargs and how and when we use them in a Python program. Seasoned Python developers embrace the flexibility they provide when creating functions. If you are a beginner in Python, you might not have heard of them before. After completing this tutorial, you will have the confidence to use them in your live project.
Table of Contents

Introduction : *args
args is a short form of arguments. With *args, Python accepts any number of positional arguments in a user-defined function and packs them into a tuple named args. In other words, *args means zero or more arguments which are stored in a tuple named args.

When you define a function without *args, it has a fixed number of inputs, which means it cannot accept more (or fewer) arguments than you defined in the function.
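
The contrast between a fixed number of inputs and *args can be sketched as follows. The function names here are hypothetical and are not the ones used in the original tutorial.

```python
def add_two(x, y):
    """Fixed signature: exactly two arguments are accepted."""
    return x + y


def add_all(*args):
    """Variable signature: zero or more arguments, collected in a tuple."""
    return sum(args)


def describe(*args):
    """Show that args really is a tuple, and how many values it holds."""
    return type(args).__name__, len(args)


# Calling the fixed-signature function with an extra argument fails.
try:
    add_two(1, 2, 3)
    fixed_arity_error = None
except TypeError as exc:
    fixed_arity_error = exc
```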

In the example code below, we are creating a very basic function which adds two numbers. At the same time, we created a

Quansight Labs 2019-07-15 05:00:00

Quansight presence at SciPy'19

Yesterday the SciPy'19 conference ended. It was a lot of fun, and very productive. You can really feel that there's a lot of energy in the community, and that it's growing and maturing. This post is just a quick update to summarize Quansight's presence and contributions, as well as some of the more interesting things I noticed.

A few highlights

The "Open Source Communities" track, which had a strong emphasis on topics like burnout, diversity and sustainability, as well as the keynotes by Stuart Geiger ("The Invisible Work of Maintaining and Sustaining Open-Source Software") and Carol Willing ("Jupyter: Always Open for Learning and Discovery") showed that many more people and projects are paying more attention to and evolving their thinking on the human and organizational aspects of open source.

I did not go to many technical talks, but did make sure to catch Matt Rocklin's talk "Refactoring the SciPy Ecosystem for Heterogeneous Computing". Matt clearly explained some key issues and opportunities around

ListenData 2019-07-12 21:42:00

Python : 10 Ways to Filter Pandas DataFrame

In this article, we will cover various methods to filter a pandas dataframe in Python. Data filtering is one of the most frequent data manipulation operations. It is similar to the WHERE clause in SQL, or to the filter in MS Excel for selecting specific rows based on some conditions. In terms of speed, Python has an efficient way to perform filtering and aggregation: it has an excellent package called pandas for data wrangling tasks. Pandas is built on top of the numpy package, which is written in C, a low-level language. Hence data manipulation using pandas is a fast and smart way to handle large datasets.
Examples of Data Filtering
It is one of the first steps of data preparation for predictive modeling or any reporting project. It is also called 'Subsetting Data'. See some examples of data filtering below.
  • Select all the active customers whose accounts were opened
Quansight Labs 2019-07-09 03:30:00

Ibis: Python data analysis productivity framework

Ibis is a library for data analysis tasks that provides a pandas-like API. Operations such as filtering, adding columns and applying math operations are performed lazily: they are only registered in memory, not executed. When you ask for the result of the expression you created, Ibis compiles it and sends a request to the remote server (remote storage and execution systems like Hadoop components or SQL databases). Its goal is to simplify analytical workflows and make you more productive.

Ibis was created by Wes McKinney and is mainly maintained by Phillip Cloud and Krisztián Szűcs. Also, recently, I was invited to become a maintainer of the Ibis repository!

Maybe you are thinking: "why should I use Ibis?". Well, if you have any of the following issues, probably you should consider using Ibis in your analytical workflow!

  • if you need to get data from a SQL database but you don't
ListenData 2019-07-04 19:51:00

Python Dictionary Comprehension with Examples

In this tutorial, we will cover how dictionary comprehension works in Python. It includes various examples which would help you to learn the concept of dictionary comprehension and how it is used in real-world scenarios.
What is Dictionary?
A dictionary is a data structure in Python which stores data such that values are connected to their related keys. Roughly, it works much like SQL tables or data stored in statistical software. It has two main components -
  1. Keys : Think of columns in tables. They must be unique (just as column names cannot be duplicated).
  2. Values : They are similar to rows in tables. They can be duplicated.
It is defined in curly braces { }. Each key is followed by a colon (:) and then values.
Syntax of Dictionary

d = {'a': [1,2], 'b': [3,4], 'c': [5,6]}
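
As a brief sketch of the points above (unique keys, repeatable values) and of the comprehension syntax this tutorial is about, consider the following. The names prices, squares and column_totals are illustrative only.

```python
# Keys must be unique: a repeated key keeps only the last value assigned.
# Values may repeat freely (both fruits can cost 3).
prices = {"apple": 3, "pear": 3, "apple": 5}

# A dictionary comprehension builds a dictionary from an iterable.
squares = {n: n * n for n in range(1, 4)}

# Comprehensions can also transform an existing dictionary.
d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6]}
column_totals = {key: sum(values) for key, values in d.items()}
```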
To extract keys, values and structure of dictionary, you can submit the following commands.

d.keys() # 'a', 'b', 'c'
d.values() # [1, 2], [3, 4], [5,

HTML outputs in Jupyter


User interaction in data science projects can be improved by adding a small amount of visual design.

To motivate effort around visual design we show several simple-yet-useful examples. The code behind these examples is small and accessible to most Python developers, even if they don’t have much HTML experience.

This post in particular focuses on Jupyter’s ability to add HTML output to any object. This can either be full-fledged interactive widgets, or just rich static outputs like tables or diagrams. We hope that by showing examples here we will inspire some thoughts in other projects.
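
For readers who have not seen the mechanism, a minimal sketch follows. Jupyter looks for a _repr_html_ method on an object and, if present, renders the returned string as HTML. The Table class here is a made-up example and is not one of the libraries discussed in this post.

```python
from html import escape


class Table:
    """Tiny illustrative container with a rich HTML representation."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]

    def __repr__(self):
        # Plain-text fallback used by consoles without HTML support.
        return f"Table({len(self.rows)} rows x {len(self.columns)} columns)"

    def _repr_html_(self):
        # Jupyter calls this method and displays the returned HTML string.
        head = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = Table(["name", "value"], [["a", 1], ["<b>", 2]])
html_output = table._repr_html_()
```

In a notebook, evaluating table as the last expression of a cell would show the rendered table instead of the plain-text repr.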

This post was supported by replies to this tweet. The rest of this post is just examples.


I originally decided to write this post after reading another blogpost from the UK Met office, where they included the HTML output of their library Iris in a blogpost

(work by Peter Killick, post by Theo McCaie)

The fact that the output provided by an interactive session is the same output that you would provide in a published result helps everyone. The interactive

Anaconda 2019-07-03 19:31:38

Why We Removed the “Free” Channel in Conda 4.7

One of the changes we made in Conda 4.7 was the removal of a software collection called “free” from the default channel configuration. The “free” channel is our collection of packages prior to the switch…

The post Why We Removed the “Free” Channel in Conda 4.7 appeared first on Anaconda.

ListenData 2019-07-03 15:01:00

Python list comprehension with Examples

This tutorial covers how list comprehension works in Python. It includes many examples which will help you familiarize yourself with the concept, and you should be able to implement it in your live project by the end of this lesson.
Table of Contents

What is list comprehension?
Python is an object-oriented programming language. Almost everything in it is treated consistently as an object. Python also features functional programming, which is very similar to the mathematical way of approaching a problem: you pass inputs to a function and get the same output for the same input value. Given a function f(x) = x², f(x) will always return the same result for the same x value. The function has no "side-effect", which means the operation has no effect on a variable/object outside its intended usage. "Side-effect" refers to leaks in your code which can modify a mutable data structure or variable.
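
A short sketch of these ideas: a pure function, a function with a side effect, and list comprehensions built on the pure one. All names are illustrative.

```python
def square(x):
    """Pure function: the result depends only on x, nothing else changes."""
    return x ** 2


# List comprehension: apply the function to every item of an iterable.
squares = [square(x) for x in range(5)]

# A condition at the end filters the items before they are transformed.
even_squares = [square(x) for x in range(5) if x % 2 == 0]

call_log = []


def square_and_log(x):
    """Impure function: it also modifies a list defined outside of it."""
    call_log.append(x)  # this is the side effect
    return x ** 2


logged_result = square_and_log(3)
```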

Functional programming is also good for parallel computing as there is no

Quansight Labs 2019-07-03 11:36:54

uarray update: API changes, overhead and comparison to __array_function__

uarray is a generic override framework for objects and methods in Python. Since my last uarray blogpost, there have been plenty of developments, changes to the API and improvements to the overhead of the protocol. Let’s begin with a walk-through of the current feature set and API, and then move on to current developments and how it compares to __array_function__. For further details on the API and latest developments, please see the API page for uarray. The examples there are doctested, so they will always be current.

Motivation

Other array objects

NumPy is a simple, rectangular, dense, and in-memory data store. This is great for some applications but isn't complete on its own. It doesn't encompass every single use-case. The following are examples of array objects available today that have different features and cater to a different kind of audience.

  • Dask is one of the most popular ones. It allows distributed and chunked computation.
  • CuPy is another popular one, and
Peekaboo 2019-07-02 16:11:00

Don't cite the No Free Lunch Theorem

Tldr; You probably shouldn’t be citing the "No Free Lunch" Theorem by Wolpert. If you’ve cited it somewhere, you might have used it to support the wrong conclusion. What it actually (vaguely) says is “You can’t learn from data without making assumptions”.

The paper on the “No Free Lunch Theorem”, actually called "The Lack of A Priori Distinctions Between Learning Algorithms" is one of these papers that are often cited and rarely read, and I hear many people in the ML community refer to it when supporting the claim that “one model can’t be the best at everything” or “one model won’t always be better than another model”. The point of this post is to convince you that this is not what the paper or theorem says (at least not the one usually cited by Wolpert), and you should not cite this theorem in this context; and also that common versions cited of the "No Free Lunch" Theorem (continued...)
ListenData 2019-06-28 22:46:00

15 ways to read CSV file with pandas

This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without the read_csv function, it is not straightforward to import a CSV file in Python. Pandas is a powerful Python package for data manipulation and supports various functions to load and import data from various formats. Here we cover how to deal with common issues in importing CSV files.
Table of Contents

Install and Load Pandas Package
Make sure you have pandas package already installed on your system. If you set up python using Anaconda, it comes with pandas package so you don't need to install it again. Otherwise you can install it by using command pip install pandas. Next step is to load the package by running the following command. pd is an alias of pandas package. We will use it instead of full name "pandas".
import pandas as pd
Create Sample Data for Import
The program below creates a sample pandas
Anaconda 2019-06-25 20:54:52

TensorFlow CPU optimizations in Anaconda

By Stan Seibert, Anaconda, Inc. & Nathan Greeneltch, Intel Corporation TensorFlow is one of the most commonly used frameworks for large-scale machine learning, especially deep learning (we’ll call it “DL” for short). This popular framework…

The post TensorFlow CPU optimizations in Anaconda appeared first on Anaconda.

Anaconda 2019-06-25 16:56:09

How We Made Conda Faster in 4.7

We’ve witnessed a lot of community grumbling about Conda’s speed, and we’ve experienced it ourselves. Thanks to a contract from NASA via the SBIR program, we’ve been able to dedicate a lot of time recently…

The post How We Made Conda Faster in 4.7 appeared first on Anaconda.

ListenData 2019-06-25 11:31:00

Matplotlib Tutorial – Learn Plotting in Python in 3 hours

This tutorial outlines how to perform plotting and data visualization in python using Matplotlib library. The objective of this post is to get you familiar with the basics and advanced plotting functions of the library. It contains several examples which will give you hands-on experience in generating plots in python.
Table of Contents

What is Matplotlib?
It is a powerful Python library for creating graphics or charts. It takes care of all of your basic and advanced plotting requirements in Python. It took inspiration from the MATLAB programming language and provides a similar MATLAB-like interface for graphics. The beauty of this library is that it integrates well with the pandas package, which is used for data manipulation. With the combination of these two libraries, you can easily perform data wrangling along with visualization and get valuable insights out of data. Like the ggplot2 library in R, matplotlib is the most used library for charts in Python.

Write Short Blogposts

I encourage my colleagues to write blogposts more frequently. This is for a few reasons:

  1. It informs your broader community what you’re up to, and allows that community to communicate back to you quickly.

    You communicating to the community fosters a sense of collaboration, openness, and trust. You gain collaborators, build momentum behind your work, and curate a body of knowledge that early adopters can consume to become experts quickly.

    Getting feedback from your community helps you to course-correct early in your work, and stops you from wasting time in inefficient courses of action.

    You can only work for a long time without communicating if you are either entirely confident in what you’re doing, or reckless, or both.

  2. It increases your visibility, and so is good for your career.

    I have a great job. I find my work to be both

ListenData 2019-06-19 13:20:00

Drop one or more columns in Pandas Dataframe

In this tutorial, we will cover how to drop or remove one or multiple columns from pandas dataframe.
What is pandas in Python?
pandas is a python package for data manipulation. It has several functions for the following data tasks:
  1. Drop or Keep rows and columns
  2. Aggregate data by one or more columns
  3. Sort or reorder data
  4. Merge or append multiple dataframes
  5. String Functions to handle text data
  6. DateTime Functions to handle date or time format columns
Import or Load Pandas library
To make use of any Python library, we first need to load it using the import command.
import pandas as pd
import numpy as np
Let's create a fake dataframe for illustration
The code below creates 4 columns named A through D.
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
          A         B         C         D
0 -1.236438 -1.656038
Quansight Labs 2019-06-12 05:00:00

Labs update and May highlights

Time flies when you're having fun. Here is an update of some of the highlights of my second month at Quansight Labs.

The making of a black hole image & GitHub Sponsors

Both Travis and I were invited by GitHub to attend GitHub Satellite in Berlin. The main reason was that Nat Friedman (GitHub CEO) decided to spend the first 20 minutes of his keynote highlighting the Event Horizon Telescope's black hole image and the open source software that made that imaging possible. This featured the scientific Python ecosystem very prominently: NumPy, Matplotlib, Python, Cython, SciPy, AstroPy and other projects were highlighted. At the same time, Nat introduced new GitHub features like "used by", a triaging role and new dependency graph features, and illustrated how those worked for NumPy. These features will be very welcome news to maintainers of almost any project.

The single most visible feature introduced was GitHub Sponsors:

I really enjoyed meeting Devon Zuegel, Product Manager of the Open Source Economy Team at GitHub, in person after previously having had

ListenData 2019-06-09 21:07:00

String Functions in Python with Examples

This tutorial outlines various string (character) functions used in Python. To manipulate strings and character values, Python has several built-in functions. This means you don't need to import or depend on any external package to deal with the string data type in Python. It's one of the advantages of using Python over other data science tools. Dealing with string values is very common in the real world. Suppose you have customers' full names and your manager asks you to extract the first and last name of each customer. Or you want to fetch information on all the products that have a code starting with 'QT'.
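
The two scenarios mentioned above can be sketched with built-in string methods only. The customer name and product codes are invented sample data.

```python
# Hypothetical customer record: split once on the first space.
full_name = "Priya Sharma"
first_name, last_name = full_name.split(" ", 1)

# Hypothetical product codes: keep those starting with 'QT'.
product_codes = ["QT-100", "AB-200", "QT-300"]
qt_codes = [code for code in product_codes if code.startswith("QT")]

# A few other common built-in string methods.
shouted = full_name.upper()
initials = first_name[0] + last_name[0]
```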
Table of Contents

List of frequently used string functions
The table below shows many common string functions along with a description and the equivalent function in MS Excel. We all use MS Excel in our workplace and are familiar with its functions. The comparison of string functions in MS Excel and Python would help you to learn
Anaconda 2019-06-06 22:34:36

Updated Statement About Our Relationship with DataCamp

We apologize for our poor communications about our response to the DataCamp sexual misconduct incident. We support the victims and we understand this has been a painful and ongoing struggle for them. We also recognize…

The post Updated Statement About Our Relationship with DataCamp appeared first on Anaconda.

Ralf Gommers | Reflections 2019-06-05 00:00:00

The cost of an open source contribution

Open source is massively successful. Some say it’s eating the world, although to my ears that phrasing doesn’t sound entirely like a good thing. Open source maintainers are always in need of help, and over the past years I’ve seen a lot of focus on ways open source projects can grow their communities and gain new contributors. Guidance on how to go about finding new contributors is easily found. E.
Anaconda 2019-06-03 16:28:01

Anaconda Recognized as a May 2019 Gartner Peer Insights Customers’ Choice for Data Science and Machine Learning Platforms

The Anaconda team is excited to announce that we have been recognized as a May 2019 Gartner Peer Insights Customers’ Choice for Data Science and Machine Learning Platforms. According to Gartner, “The Gartner Peer Insights…

The post Anaconda Recognized as a May 2019 Gartner Peer Insights Customers’ Choice for Data Science and Machine Learning Platforms appeared first on Anaconda.

Quansight Labs 2019-06-02 05:00:00

TDK-Micronas partners with Quansight to sponsor Spyder

TDK-Micronas is sponsoring Spyder development efforts through Quansight Labs. This will enable the development of some features that have been requested by our users, as well as new features that will help TDK develop custom Spyder plugins in order to complement their Automatic Test Equipment (ATE) in the development of their Application Specific Integrated Circuits (ASICs).

At this point it may be useful to clarify the role of Quansight Labs in Spyder's development and the relationship with TDK. To quote Ralf Gommers (director of Quansight Labs):

"We're an R&D lab for open source development of core technologies around data science and scientific computing in Python. And focused on growing communities around those technologies. That's how I see it for Spyder as well: Quansight Labs enables developers to be employed to work on Spyder, and helps with connecting them to developers of other projects in similar situations. Labs should be an enabler to let the Spyder project, its community and individual developers grow. And Labs provides mechanisms to attract and coordinate funding. Of course

Quansight Labs 2019-05-31 05:00:00

metadsl: A Framework for Domain Specific Languages in Python

Hello, my name is Saul Shanabrook and for the past year or so I have been at Quansight exploring the array computing ecosystem. This started with working on the xnd project, a set of low level primitives to help build cross platform NumPy-like APIs, and then started exploring Lenore Mullin's work on a mathematics of arrays. After spending quite a bit of time working on an integrated solution built on these concepts, I decided to step back to try to generalize and simplify the core concepts. The trickiest part was not actually compiling mathematical descriptions of array operations in Python, but figuring out how to make it useful to existing users. To do this, we need to meet users where they are at, which is with the APIs they are already familiar with, like numpy. The goal of metadsl is to make it easier to tackle parts

Quansight Labs 2019-05-29 05:00:00

Community-driven open source and funded development

Quansight Labs is an experiment for us in a way. One of our main aims is to channel more resources into community-driven PyData projects, to keep them healthy and accelerate their development. And do so in a way that projects themselves stay in charge.

This post explains one method we're starting to use for this. I'm writing it to be transparent with projects, the wider community and potential funders about what we're starting to do. As well as to explicitly solicit feedback on this method.

Community work orders

If you talk to someone about supporting an open source project, in particular a well-known one that they rely on (e.g. NumPy, Jupyter, Pandas), they're often willing to listen and help. What you quickly learn though is that they want to know in some detail what will be done with the funds provided. This is true not only for companies, but also for individuals. In addition, companies will likely want a written agreement and some form of reporting about the progress of the work. To meet this

I Love Symposia! 2019-05-28 08:41:54

Why citations are not enough for open source software

A few weeks ago I wrote about why you should cite open source tools. Although I think citations are important, there are major problems in relying on them alone to support open source work.

The biggest problem is that papers describing a software library can only give credit to the contributors at the time that the paper was written. The preferred citation for the SciPy library is “Eric Jones, Travis Oliphant, Pearu Peterson, et al”, 2001. The “et al” is not an abbreviation here, but a fixed shorthand for all other contributors. Needless to say many, many people have contributed to the SciPy library since 2001 (GitHub counts 716 contributors as of this writing), and they are unable to get credit within the academic system for those contributions. (As an aside, Google counts about 1,200 citations to SciPy, which is a breathtaking undercounting of its value and influence, and reinforces my earlier point: cite open source software! Definitely don't use this post as an excuse not to cite it!!!)

Not surprisingly, we have had

Quansight Labs 2019-05-27 05:00:00

Measuring API usage for popular numerical and scientific libraries

Developers of open source software often have a difficult time understanding how others utilize their libraries. Having better data of when and how functions are being used has many benefits. Some of these are:

  • better API design
  • determining whether or not a feature can be deprecated or removed.
  • more instructive tutorials
  • understanding the adoption of new features
Python Namespace Inspection

We wrote a general tool python-api-inspect to analyze any function/attribute call within a given set of namespaces in a repository. This work was heavily inspired by a blog post on inspecting method usage with Google BigQuery for pandas, NumPy, and SciPy. The previously mentioned work used regular expressions to search for method usage. The primary issue with this approach is that it cannot handle import numpy.random as rand; rand.random(...) unless additional regular expressions are constructed for each case, and it can produce false positives. Additionally, BigQuery is not a free resource. Thus, this approach is not general enough and does not scale well with the number of libraries whose function and attribute usage we would like to inspect.
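
To make the aliasing problem concrete, here is a minimal sketch that resolves import aliases with the standard library ast module instead of regular expressions. This is an assumption about one possible direction, not a description of how python-api-inspect works; the function name resolve_calls is hypothetical.

```python
import ast

SOURCE = """
import numpy.random as rand
import numpy
rand.random(3)
numpy.zeros(2)
"""


def resolve_calls(source):
    """Return fully qualified names for calls made through import aliases.

    Only handles `import x` and `import x.y as z`, and only calls of the
    form `alias.attr(...)`; anything more complex is ignored in this sketch.
    """
    tree = ast.parse(source)

    # Map each locally bound name to the module it refers to.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name

    # Collect calls whose base is one of the known aliases.
    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            base = node.func.value
            if isinstance(base, ast.Name) and base.id in aliases:
                calls.append(f"{aliases[base.id]}.{node.func.attr}")
    return sorted(calls)


resolved = resolve_calls(SOURCE)
```

Because the alias is resolved from the parsed import statement, rand.random is attributed to numpy.random without writing a separate pattern for each alias.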

A more robust

Anaconda 2019-05-24 20:19:58

Intake: Discovering and Exploring Data in a Graphical Interface

Motivation Do you have data that you’d like people to be able to explore on their own? Are you always passing around snippets of code to load specific data files? These are problems that people…

The post Intake: Discovering and Exploring Data in a Graphical Interface appeared first on Anaconda.

Quansight Labs 2019-05-21 20:02:50

Spyder 4.0 takes a big step closer with the release of Beta 2!

It has been almost two months since I joined Quansight in April, to start working on Spyder maintenance and development. So far, it has been a very exciting and rewarding journey under the guidance of long time Spyder maintainer Carlos Córdoba. This is the first of a series of blog posts we will be writing to showcase updates on the development of Spyder, new planned features and news on the road to Spyder 4.0 and beyond.

First off, I would like to give a warm welcome to Edgar Margffoy, who recently joined Quansight and will be working with the Spyder team to take its development even further. Edgar has been a core Spyder developer for more than two years now, and we are very excited to have his (almost) full-time commitment to the project.

Spyder 4.0 Beta 2 released!

Since August 2018, when the first beta of the 4.x series was released, the Spyder development team has been working hard on our next release. Over the past year, we've

Spyder Blog 2019-05-20 00:00:00

Spyder 4.0 takes a big step closer with the release of Beta 2!

This blogpost was originally published on the Quansight Labs website

It has been almost two months since I joined Quansight in April, to start working on Spyder maintenance and development. So far, it has been a very exciting and rewarding journey under the guidance of long time Spyder maintainer Carlos Córdoba. This is the first of a series of blog posts we will be writing to showcase updates on the development of Spyder, new planned features and news on the road to Spyder 4.0 and beyond.

First off, I would like to give a warm welcome to Edgar Margffoy, who recently joined Quansight and will be working with the Spyder team to take its development even further. Edgar has been a core Spyder developer for more than two years now, and we are very excited to have his (almost) full-time commitment to the project.

Spyder 4.0 Beta 2 released!

Since August 2018, when the first beta of the 4.x series was released, the Spyder development team has been


The Role of a Maintainer

What are the expectations and best practices for maintainers of open source software libraries? How can we do this better?

This post frames the discussion and then follows with best practices based on my personal experience and opinions. I make no claim that these are correct.

Let us Assume External Responsibility

First, the most common answer to this question is the following:

  • Q: What are expectations on OSS maintainers?
  • A: Nothing at all. They’re volunteers.

However, let’s assume for a moment that these maintainers are paid to maintain the project some modest amount, like 10 hours a week.

How can they best spend this time?

What is a Maintainer?

Next, let’s disambiguate the roles of developer, reviewer, and maintainer:

  1. Developers fix bugs and create features. They write code and docs and generally are agents of change in a software project. There are often many more developers than reviewers or maintainers.

  2. Reviewers are known

Living in an Ivory Basement 2019-05-14 22:00:00

Using GitHub for janky project reporting - some code

We scripted GitHub for lightweight project reporting

Paul Ivanov’s Journal 2019-05-13 07:00:00

My first DNF (Ft Bragg 600k)

It's been six years since my first ride with The San Francisco Randonneurs and four years since my first 200k. I've ridden 18 rides that are at least that distance since then (3x 300k, 2x 400k, 1x 600k), completing my first Super Randonneur Series (2-, 3-, 4-, and 600k in one year) last year after not riding much the year before that. And this weekend I had my first DNF result on the Fort Bragg 600k. I Did Not Finish.

The best response to my choice of abandoning the ride to enjoy the campground came from Peter Curley, who said "That was a very mature decision." A clear departure from typical randonneuring stubbornness and refusal to give up, I celebrated my decision to quit as a victory when I arrived at the campground and made my announcement to the volunteers. I think I was so energetic about it that they did not believe me. I was being kind to myself, to my body, and at peace with the decision by


Should I Resign from My Full Professor Job to Work Fulltime on Cocalc?

Nearly 3 years ago, I gave a talk at a Harvard mathematics conference announcing that “I am leaving academia to build a company”. What I really did is go on unpaid leave for three years from my tenured Full Professor position. No further extensions of that leave are possible, so I finally have to decide whether to go back to academia or resign.
How did I get here?
Nearly two decades ago, as a recently minted Berkeley math Ph.D., I was hired as a non-tenure-track faculty member in the mathematics department at Harvard. I spent five years at Harvard, then I applied for jobs, and accepted a tenured Associate Professor position in the mathematics department at UC San Diego. The mathematics community was very supportive of my number theory research; I skipped tenure track, and landed a tier-1 tenured position by the time I was 30 years old. In 2006, I moved from UCSD to a tenured Associate Professor position at the University
Paul Ivanov’s Journal 2019-05-03 07:00:00

PyCon2019 poem

I'm back in Cleveland for another Pycon. Yesterday was my first full day here. Along with Matt Seale, I was a helper at Matthias Bussonnier's tutorial ("IPython and Jupyter in Depth: High productivity, interactive Python"). The sticky system is efficient at signaling when someone in a classroom needs help, and a lot of folks don't know that this practice was popularized by Software Carpentry workshops and continues to be used at The Carpentries.

I stepped out for a coffee refill and bumped into a large contingent of Bloomberg folks I'd never met (Princeton office). I guess we have something like 90 people at the conference this year, and I made the usual and true remark about how I go to conferences to meet the other people who work at our company. Then after his tutorial concluded, Matthias and I bumped into Tracy Teal, exchanged some stickers, and chatted about The Carpentries, Jupyter, organizing conferences, governance and sponsorship models, and a bunch of other stuff.

Matthias was a

Quansight Labs 2019-05-03 05:00:00

Labs update and April highlights

It has been an exciting first month for me at Quansight Labs. It's a good time for a summary of what we worked on in April and what is coming next.

Progress on array computing libraries

Our first bucket of activities I'd call "innovation". The most prominent projects in this bucket are XND, uarray, metadsl, python-moa, Remote Backend Compiler and arrayviews. XND is an umbrella name for a set of related array computing libraries: xnd, ndtypes, gumath, and xndtools.

Hameer Abbasi made some major steps forward with uarray: the backend and coercion semantics are now largely worked out, there is good documentation, and the unumpy package (which currently has numpy, XND and PyTorch backends) is progressing well. This blog post gives a good overview of the motivation for uarray and its main concepts.

Saul Shanabrook and Chris Ostrouchov worked out how best to put metadsl and python-moa together: metadsl can be used to create the API for python-moa to simplify the code base of the latter a lot. Chris also wrote an

Anaconda 2019-05-02 17:58:48

Anaconda’s Response to DataCamp’s CEO and Board of Directors

DataCamp has been a business partner of our company for almost two years. So we were shocked and saddened by the recent allegations of inappropriate sexual behavior and retaliatory firings made against DataCamp’s CEO and…

The post Anaconda’s Response to DataCamp’s CEO and Board of Directors appeared first on Anaconda.

I Love Symposia! 2019-05-02 02:31:30

Why you should cite open source tools

Every now and then, a moment or a sentence in a conversation sticks out at you, and lodges itself in the back of your brain for months or even years. In this case, the sentence is a tweet, and I fear that the only way to dislodge it is to talk about it publicly.

Last year, I complained on Twitter that a very prominent paper that was getting lots of attention used scikit-image, but failed to cite our paper. (Or the papers corresponding to many other open source packages.) I continued that scientists developing open source software depend on these citations to continue their work. (More on this in another post...) One response was that surely the developers of the open source scientific Python stack were not scientists per se, and that citations were not a priority for them.

I still sigh internally when I think of it.

That tweet manifests a pervasive perception that open source scientific software is written by God-like figures. These massively experienced software developers have easy access to funds

Anaconda 2019-04-30 17:01:47

Reflections on AnacondaCON 2019 with NVIDIA’s Josh Patterson

I love this month. April ’19 brings back Game of Thrones, Avengers: Endgame (that Thanos snap though), and of course AnacondaCON. I’ve been to every AnacondaCON, which makes this my third show. Of all data…

The post Reflections on AnacondaCON 2019 with NVIDIA’s Josh Patterson appeared first on Anaconda.

ListenData 2019-04-27 13:52:00

Python Lambda Function with Examples

This article gives a detailed explanation of Python's lambda function. You will learn how to use it in real-world data scenarios with examples.
Table of Contents

Introduction : Lambda Function
In non-technical language, lambda is an alternative way of defining a function. You can define a function inline using lambda, which means you can apply a function to some data in a single line of Python code. It is called an anonymous function because it can be defined without a name. Lambda functions are part of the functional programming style, which focuses on readability of code and avoids changing mutable data.
Syntax of Lambda Function
lambda arguments: expression
A lambda function can take more than one argument but can contain only one expression. The expression is evaluated and returned.
Example
addition = lambda x, y: x + y
addition(2, 3)  # returns 5
In the above Python code, x and y are the arguments and x + y is the expression that gets evaluated and returned.
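
Lambdas are most often passed inline to another function rather than given a name. The sketch below reuses the addition example and then hands lambdas to sorted() and map(); the sales list is hypothetical data made up for illustration.

```python
# Named lambda from the article: two arguments, one expression.
addition = lambda x, y: x + y

# Hypothetical sales records, used only for illustration.
sales = [("John", 25), ("Deep", 30), ("Julia", 35)]

# A lambda passed inline as the sort key (highest sales first).
by_sales = sorted(sales, key=lambda row: row[1], reverse=True)

# A lambda applied to every record with map(): add 5 to each sales figure.
with_bonus = list(map(lambda row: (row[0], row[1] + 5), sales))

print(addition(2, 3))
print(by_sales[0])
print(with_bonus)
```

In both calls the lambda exists only for the duration of the call, which is the usual reason to prefer it over a separate def.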

ListenData 2019-04-20 21:01:00

Loops in Python explained with examples

This tutorial covers various ways to execute loops in Python with several practical examples. After reading this tutorial, you will be familiar with the concept of loops and will be able to apply them in real-world data wrangling tasks.

Table of Contents

What is Loop?
A loop is an important programming concept that exists in almost every programming language (Python, C, R, Visual Basic etc.). It is used to repeat one or more operations until a specific condition is met. It is mainly used to automate repetitive tasks.

Real World Examples of Loop
  1. Software of the ATM machine is in a loop to process transaction after transaction until you acknowledge that you have no more to do.
  2. Software program in a mobile device allows user to unlock the mobile with 5 password attempts. After that it resets mobile device.
  3. You put your favorite song on a repeat mode. It is also a loop.
  4. You want to run a particular analysis on each column of your data
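
Two of the examples above can be sketched in Python: a for loop that runs the same analysis on each column, and a while loop that stops after a fixed number of password attempts. The data, the password and the list of entered attempts are all hypothetical values chosen for illustration.

```python
# Hypothetical data: each key is a column name, each value its entries.
data = {"sales": [25, 30, 35, 40, 45], "visits": [3, 5, 2, 8, 7]}

# for loop: run the same analysis (the mean) on each column.
column_means = {}
for name, values in data.items():
    column_means[name] = sum(values) / len(values)

# while loop: allow at most 5 password attempts, then stop.
attempts_entered = ["1234", "0000", "pass", "4321"]  # hypothetical inputs
correct_password = "4321"
max_attempts = 5
tries = 0
unlocked = False
while tries < max_attempts and not unlocked:
    if tries >= len(attempts_entered):
        break  # no more input to check
    unlocked = attempts_entered[tries] == correct_password
    tries += 1

print(column_means)
print(unlocked, tries)
```

The for loop repeats once per item, while the while loop repeats until its condition becomes false, which matches the "until a specific condition is met" description.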
Living in an Ivory Basement 2019-04-15 22:00:00

Some questions and thoughts on journal peer review.

What's up with current peer review practice?

ListenData 2019-04-14 15:31:00

Create Dummy Data in Python

This article explains various ways to create dummy or random data in Python for practice. As in R, we can create dummy data frames using the pandas and numpy packages. Most analysts prepare data in MS Excel and later import it into Python to hone their data wrangling skills. This is not an efficient approach. The efficient approach is to prepare random data in Python and use it later for data manipulation.

Table of Contents

1. Enter Data Manually in Editor Window
The first step is to load the pandas package and use the DataFrame function.
import pandas as pd
data = pd.DataFrame({"A" : ["John","Deep","Julia","Kate","Sandy"],
                     "MonthSales" : [25,30,35,40,45]})
       A  MonthSales
0   John          25
1   Deep          30
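
Random rather than hand-typed values can be produced in the same column-oriented layout. This is a minimal sketch using only the standard library random module, not the article's own method; the seed, row count and value range are assumptions chosen for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the dummy data is reproducible

names = ["John", "Deep", "Julia", "Kate", "Sandy"]
n_rows = 10  # assumed number of rows

# Column-oriented dict: one key per column, one list entry per row.
dummy = {
    "A": [rng.choice(names) for _ in range(n_rows)],
    "MonthSales": [rng.randint(20, 50) for _ in range(n_rows)],  # inclusive range
}

for name, sales in zip(dummy["A"], dummy["MonthSales"]):
    print(name, sales)
```

A dict of equal-length lists like this can be passed to pd.DataFrame in the same way as the manually entered example.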
Anaconda 2019-06-11 16:24:17

The Human Element in AI

The over 45 speakers at AnacondaCON 2019 delved into how machine learning, artificial intelligence, enterprise, and open source communities are accomplishing great things with data — from optimizing urban farming to identifying the elements in…

The post The Human Element in AI appeared first on Anaconda.

Living in an Ivory Basement 2019-04-10 22:00:00

Things to think about when developing shotgun metagenome classifiers

Thoughts on goals and tradeoffs in classifying shotgun metagenome data.

ListenData 2019-04-09 18:47:00


The most common issue when installing a Python package on a company network is a failure to verify the SSL certificate. Companies sometimes block certain websites on their network so that employees can't access them. Whenever employees try to visit these websites, they see "Access Denied because of company's policy". This causes a connection error when reaching the main Python website.

Error looks like this :

Could not fetch URL connection error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:598)

PIP SSL Certification Issue

Solution :

Run the following command, replacing <package_name> with the name of the package.
pip install --trusted-host --trusted-host <package_name> -vvv
Suppose you want to install the pandas package. You should run the following command:
pip install --trusted-host --trusted-host pandas -vvv

The --trusted-host option marks the host as trusted, even though it does not have valid or any HTTPS
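
Each --trusted-host flag takes a host name as its value. The sketch below assembles such a command from Python without running it. The two host names are an assumption (the usual PyPI index and file hosts) because the commands above do not show them, and pip_install_command is a hypothetical helper written for this example.

```python
import sys


def pip_install_command(package, trusted_hosts):
    """Build (but do not run) a pip install command with trusted hosts."""
    cmd = [sys.executable, "-m", "pip", "install"]
    for host in trusted_hosts:
        cmd += ["--trusted-host", host]  # one flag per host
    cmd += [package, "-vvv"]  # -vvv asks pip for verbose output
    return cmd


# Assumed hosts, not taken from the article; adjust for your network.
hosts = ["pypi.org", "files.pythonhosted.org"]
command = pip_install_command("pandas", hosts)
print(" ".join(command))
```

Passing the resulting list to subprocess.run(command, check=True) would execute it. Building the list first makes it easy to inspect exactly which hosts are being trusted.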

About Author:

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 8 years of experience in

Anaconda 2019-04-09 16:41:40

AnacondaCON 2019 Day 3 Recap: The Need for Speed, “Delightful UX” in Dev Tools, LOTR Jokes and More.

Everyone at Anaconda is still feeling the love from AnacondaCON 2019. Day 3 wrapped up last Friday with one more day of talks and sessions, highlighted by some powerhouse keynotes. Let’s get right to the good…

The post AnacondaCON 2019 Day 3 Recap: The Need for Speed, “Delightful UX” in Dev Tools, LOTR Jokes and More. appeared first on Anaconda.

ListenData 2019-04-09 15:56:00

Install Python Package

Python is one of the most popular programming languages for data science and analytics. It is widely used for a variety of tasks in startups and many multinational organizations. The beauty of this programming language is that it is open source, which means it is available for free and has a very active community of developers across the world. Python developers share their solutions with other Python users in the form of packages or modules. This tutorial explains various ways to install a Python package.

Ways to Install Python Package

Method 1 : If Anaconda is already installed on your System

Anaconda is a data science platform that comes with popular Python packages pre-installed and a powerful IDE (Spyder) whose user-friendly interface eases the writing of Python scripts.

If Anaconda is installed on your system (laptop), click on Anaconda Prompt as shown in the image below.

Anaconda Prompt

To install a python package or module, enter the code below in Anaconda Prompt -
pip install package-name
Living in an Ivory Basement 2019-04-08 22:00:00

News from the NIH Data Commons Pilot Phase Consortium

The NIH Data Commons Pilot Phase Consortium is dead! (Long live the NIH Data Commons!)

Living in an Ivory Basement 2019-04-07 22:00:00

Critically assessing open science - the CAOS meeting.

A summary of the CAOS open science meeting

Anaconda 2019-04-05 18:25:01

Anaconda 2019.03 Release

Windows is the most popular operating system in the world and consistently has 75% or more of the worldwide desktop market. According to the JetBrains Python Developers Survey, 49% of Python developers use Windows as…

The post Anaconda 2019.03 Release appeared first on Anaconda.

Anaconda 2019-04-05 16:12:42

AnacondaCON 2019 Day 2 Recap: AI in Medicine, Cataloging the Contents of Stars, and More!

What You Missed at AnacondaCON Day 2 We’re back with a recap of Day 2 of our annual AnacondaCON. (In case you missed it, you can read our Day 1 recap here). Things started off…

The post AnacondaCON 2019 Day 2 Recap: AI in Medicine, Cataloging the Contents of Stars, and More! appeared first on Anaconda.

While My MCMC Gently Samples 2019-03-15 14:00:00

Computational Psychiatry: Combining multiple levels of analysis to understand brain disorders - PhD thesis

I noticed that, since my personal website at my former university went down, my PhD thesis could not be found anywhere, so I'm posting it here.

During my PhD I explored how machine learning and computational modeling of the brain can be used to improve our understanding, and diagnostics …

Python – Meta Rabbit 2019-03-12 12:00:51

NIXML: nix + YAML for easy reproducible environments

The rise and fall of bioconda
A year ago, I remember a conversation which went basically like this:
Them: So, to distribute my package, what do you think I should use?
Me: You should use bioconda.
Them: OK, that’s interesting, but what about …?
Me: No, you should use bioconda.
Them: I will definitely look … Continue reading NIXML: nix + YAML for easy reproducible environments
Living in an Ivory Basement 2019-03-01 23:00:00

Sustaining open source: thinking about communities of effort

Thinking about how to sustain open source.

Living in an Ivory Basement 2019-02-28 23:00:00

My recent reading re sustaining open communities

What has Titus been reading lately?

Filipe Saraiva's blog 2019-02-24 22:26:11

Reducing the pile

I have been a comics fan since childhood. The first comic books I was given came in the first half of the 90s: some Mickey, Mônica, Trapalhões and X-Men issues. In '98 I started buying X-Men, Fabulosos X-Men and Wolverine, up to the first issues of the infamous X-Men Premium. Out of money, I drifted toward manga and self-contained stories. When Panini starts to... [Read More]
Living in an Ivory Basement 2019-02-21 23:00:00

Threat models for open online scientific engagement?

What threats are there for scientists in engaging in open online discussions?

Martin Fitzpatrick - python 2019-02-20 15:00:00

Packaging PyQt5 apps with fbs — Distribute cross-platform GUI applications with the fman Build System

fbs is a cross-platform PyQt5 packaging system which supports building desktop applications for Windows, Mac and Linux (Ubuntu, Fedora and Arch). Built on top of PyInstaller it wraps some of the rough edges and defines a standard project structure which allows the build process to be entirely automated. The included …

Announcement: Audio TK 3.1.0

ATK is updated to 3.1.0 with heavy code refactoring. Old C++ standards are now dropped and it now requires a fully C++17-compliant compiler. The main difference for filter support is that explicit SIMD filters using libsimdpp have been dropped while tr2::simd becomes standard and supported by gcc, clang and Visual Studio. Download link: ATK […]
Filipe Saraiva's blog 2019-01-29 01:48:22


The robotic female voice (we have reached the time when questions of gender and robots can blur together) sounded, strange and familiar as always, as soon as the car finished the right turn: “You have entered Avenida Universitária; the speed limit is 60 kilometers per hour”. My father smiled and began to speak: – Since... [Read More]
While My MCMC Gently Samples 2019-01-21 15:00:00

My foreword to "Bayesian Analysis with Python, 2nd Edition" by Osvaldo Martin

When Osvaldo asked me to write the foreword to his new book I felt honored, excited, and a bit scared, so naturally I accepted. What follows is my best attempt to convey what makes probabilistic programming so exciting to me. Osvaldo did a great job with the book, it is …

Filipe Saraiva's blog 2019-01-21 14:39:48

Call for Answers: Survey About Task Assignment

Professor Igor Steinmacher, from Northern Arizona University, is a prominent researcher on several social dynamics in open source communities, like support of newcomers, gender bias, open sourcing proprietary software, and more. Some of his papers can be found on his website. Currently, Prof. Igor is inviting mentors from open source communities to answer a survey... [Read More]
Living in an Ivory Basement 2019-01-15 23:00:00

Revisiting authorship, and JOSS software publications

The question du jour: how should authorship on software papers be decided?

While My MCMC Gently Samples 2019-01-14 15:00:00

Using Bayesian Decision Making to Optimize Supply Chains

(c) 2019 Thomas Wiecki & Ravin Kumar

As advocates of Bayesian statistics in data science we often have to convince business-minded colleagues or customers of the added value of such an approach. While there are many good reasons for applying Bayesian modeling to solve business problems (Sean J Taylor recently had …

Filipe Saraiva's blog 2019-01-08 02:21:02

Master's in Computer Science at UFPA 2019: Computational Intelligence for Smart Grids; Metaheuristics

The selection process for the master's program in computer science at PPGCC-UFPA is open. In this round, I am offering 2 positions for students who will carry out their work alongside the other researchers at LAAI. The positions are aimed at the topics of computational intelligence applied to Smart Grids and studies on metaheuristic optimization methods. I would like to... [Read More]
Filipe Saraiva's blog 2019-01-05 17:43:33

LaKademy 2018

In October 2018, Florianópolis hosted the sixth edition of LaKademy, the Latin American KDE sprint. This event is an opportunity to bring together in one place several KDE developers – both veterans and newcomers – from different projects to improve the software they work on and also to plan the actions of... [Read More]
Filipe Saraiva's blog 2019-01-05 16:59:19

LaKademy 2018

In October 2018, Florianópolis hosted the 6th edition of LaKademy, the Latin-American KDE sprint. The event is an opportunity to bring together several KDE developers – both veterans and newcomers – from different projects to work on improving their respective software and plan the promotional actions of the community in the subcontinent. In... [Read More]

GPU Dask Arrays, first steps

The following code creates and manipulates 2 TB of randomly generated data.

import dask.array as da

rs = da.random.RandomState()
x = rs.normal(10, 1, size=(500000, 500000), chunks=(10000, 10000))
(x + 1)[::2, ::2].sum().compute(scheduler='threads')

On a single CPU, this computation takes two hours.

On an eight-GPU single-node system this computation takes nineteen seconds.
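
The 2 TB figure can be checked with a little arithmetic. This sketch assumes each element is an 8-byte float64 value (NumPy's default float type), which the snippet above does not state explicitly.

```python
# Back-of-the-envelope check of the "2 TB" figure.
# Assumption: elements are 8-byte float64 values.
shape = (500_000, 500_000)
chunks = (10_000, 10_000)
bytes_per_element = 8

total_bytes = shape[0] * shape[1] * bytes_per_element
chunk_bytes = chunks[0] * chunks[1] * bytes_per_element
n_chunks = (shape[0] // chunks[0]) * (shape[1] // chunks[1])

print(total_bytes / 1e12, "TB in total")
print(chunk_bytes / 1e6, "MB per chunk")
print(n_chunks, "chunks")
```

Under that assumption the array splits into 2,500 chunks of 800 MB each, so no single chunk has to hold anything close to the full 2 TB.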

Combine Dask Array with CuPy

Actually this computation isn’t that impressive. It’s a simple workload, for which most of the time is spent creating and destroying random data. The computation and communication patterns are simple, reflecting the simplicity commonly found in data processing workloads.

What is impressive is that we were able to create a distributed parallel GPU array quickly by composing these three existing libraries:

  1. CuPy provides a partial implementation of Numpy on the GPU.

  2. Dask Array provides chunked algorithms on top of Numpy-like libraries like Numpy and CuPy.

    This enables us to operate on more data than we could fit in memory by operating on that data in

Leonardo Uieda 2018-12-26 12:00:00

Manage project dependencies with conda environments

TL;DR: Create a conda environment for each project, capture exact versions when possible, automate activation and updating with a bash function.

I often work on several different projects involving software: Python libraries, papers, presentations, posters, this website, etc. Each project has different dependencies and there is a non-zero chance that these dependencies might be in conflict with each other. For example, I need Python 2.7 to work on a tesseroid modeling paper with a student, while my current work on


First Impressions of GPUs and PyData

I recently moved from Anaconda to NVIDIA within the RAPIDS team, which is building a PyData-friendly GPU-enabled data science stack. For my first week I explored some of the current challenges of working with GPUs in the PyData ecosystem. This post shares my first impressions and also outlines plans for near-term work.

First, let's start with the value proposition of GPUs: significant speed increases over traditional CPUs.

GPU Performance

Like many PyData developers, I’m loosely aware that GPUs are sometimes fast, but don’t deal with them often enough to have strong feelings about them.

To get a more visceral feel for the performance differences, I logged into a GPU machine, opened up CuPy (a Numpy-like GPU library developed mostly by Chainer in Japan) and cuDF (a Pandas-like library in development at NVIDIA) and did a couple of small speed comparisons:

Compare Numpy and Cupy
>>> import numpy, cupy

>>> x = numpy.random.random((10000, 10000))
>>> y = cupy.random.random((10000, 10000))

>>> %timeit bool((numpy.sin(x) ** 2 + numpy.cos(x) ** 2 == 1).all())
446 ms ± 53.1 ms per
Living in an Ivory Basement 2018-12-07 23:00:00

A quick read of _The genomic and proteomic landscape of the rumen microbiome_

Using short and long reads to assemble genomes from metagenomes!

Support Python 2 with Cython


Many popular Python packages are dropping support for Python 2 next month. This will be painful for several large institutions. Cython can provide a temporary fix by letting us compile a Python 3 codebase into something usable by Python 2 in many cases.

It’s not clear if we should do this, but it’s an interesting and little known feature of Cython.

Background: Dropping Python 2 Might be Harder than we Expect

Many major numeric Python packages are dropping support for Python 2 at the end of this year. This includes packages like Numpy, Pandas, and Scikit-Learn. Jupyter already dropped Python 2 earlier this year.

For most developers in the ecosystem this isn’t a problem. Most of our packages are Python-3 compatible and we’ve learned how to switch libraries. However, for larger companies or government organizations it’s often far harder to switch. The PyCon 2017 keynote by Lisa Guo and Hui Ding from Instagram gives a good look into why this can be challenging for large production codebases and also gives a good


Anatomy of an OSS Institutional Visit

I recently visited the UK Meteorology Office, a moderately large organization that serves the weather and climate forecasting needs of the UK (and several other nations). I was there with other open source colleagues including Joe Hamman and Ryan May from open source projects like Dask, Xarray, JupyterHub, MetPy, Cartopy, and the broader Pangeo community.

This visit was like many other visits I’ve had over the years that are centered around showing open source tooling to large institutions, so I thought I’d write about it in hopes that it helps other people in this situation in the future.

My goals for these visits are the following:

  1. Teach the institution about software projects and approaches that may help them to have a more positive impact on the world
  2. Engage them in those software projects and hopefully spread around the maintenance and feature development burden a bit
Step 1: Meet allies on the ground

We were invited by early adopters within the institution, both within the UK Met Office’s Informatics Lab

(continued...) 2018-11-16 23:00:00

Notes on the Frank-Wolfe Algorithm, Part II: A Primal-dual Analysis

This blog post extends the convergence theory from the first part of my notes on the Frank-Wolfe (FW) algorithm with convergence guarantees on the primal-dual gap which generalize and strengthen the convergence guarantees obtained in the first part.

Living in an Ivory Basement 2018-11-11 23:00:00

Creating a welcoming teaching/learning environment in workshops

It takes constant work to make a welcoming teaching/learning environment!

Living in an Ivory Basement 2018-11-08 23:00:00

Repeatability in Practice (2018 version)

How we do repeatability in the DIB Lab

Stéfan van der Walt - python 2018-10-31 07:00:00

Linking to emails in org-mode (using neomutt)

Where we store links to emails in org-mode, and open them using neomutt.

Filipe Saraiva's blog 2018-10-29 15:40:06

Ode to hatred

Yesterday, following the vote count for president in the runoff, I cried. I cried out of anger. I cried out of hatred. Hatred because the one who won the election represents a total affront to the bare minimum of what we call civility. He defended the dictatorship and torture, repeatedly. He promised to jail or exile opponents. He promised to persecute teachers, artists, the intelligentsia. He said he will... [Read More]
Filipe Saraiva's blog 2018-10-28 14:08:15

2018 Elections: My letter to the family

Family, this is my last political statement here in the group before the result. You know me: I am a professor of computer science at UFPA, and I am one of those responsible for training the next software engineers and computational mathematicians of our region. I advise undergraduate, master's and also doctoral students, even with all the... [Read More]
Ralf Gommers | Reflections 2018-10-20 00:00:00

The making of the NumPy Roadmap

NumPy now has a roadmap - long overdue and a major step forward for the project. We’re not done yet (see my previous post on this topic) and updating the technical roadmap with new ideas and priorities should happen regularly. Despite everything having been done in the open, via minutes of in-person meetings and shared roadmap drafts on the numpy-discussion mailing list, it turns out that it’s not completely clear to the community and even some maintainers how we got to this point.
Filipe Saraiva's blog 2018-10-17 16:19:00

Telegram's sharing architecture to mitigate fake news on WhatsApp

Fake news has already become the kind of problem we will have to confront somehow as soon as possible, or we will watch democracies being destroyed one by one. If the Trump case caught our attention but still seemed distant, the 2018 Brazilian elections came to show that the good-natured uncle can turn into the... [Read More]
Ralf Gommers | Reflections 2018-10-13 00:00:00

2018 NumFOCUS Summit - a summary

During 22-25 Sep 2018 I attended the NumFOCUS Summit. It consisted of two 2-day parts: the Sustainability Workshop (attended by 1-3 maintainers of almost all sponsored projects), and the Project Forum (attended by a subset of those maintainers plus a number of key industry stakeholders). I attended wearing two hats: as a member of the NumFOCUS Board of Directors, and as a maintainer of NumPy. Besides me, NumPy was represented by Allan Haldane, whom I had the pleasure to meet in person for the first time.

So you want to contribute to open source

Welcome new open source contributor!

I appreciated receiving the e-mail where you said you were excited about getting into open source and were particularly interested in working on a project that I maintain. This post has a few thoughts on the topic.

First, please forgive me for sending you to this post rather than responding with a personal e-mail. Your situation is common today, so I thought I’d write up thoughts in a public place, rather than respond personally.

This post has two parts:

  1. Some pragmatic steps on how to get started
  2. A personal recommendation to think twice about where you focus your time
Look for good first issues on Github

Most open source software (OSS) projects have a “Good first issue” label on their Github issue tracker. Here is a screenshot of how to find the “good first issue” label on the Pandas project:

(note that this may be named something else like “Easy to fix”)

This contains a list of issues that are important, but also