Planet SciPy

Matthieu Brucher's blog 2018-09-18 07:52:31

Compiling C++ code in memory with clang

I have tried to find the proper recipes to compile C++ code on the fly with clang and LLVM. It’s actually not that easy to achieve if you are not targeting LLVM Intermediate Representation, and unfortunately, the code here, working for LLVM 7, may not work for LLVM 8. Or 6. The pipeline There are […]

Dask Development Log

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Since the last update in the 0.19.0 release blogpost two weeks ago we’ve seen activity in the following areas:

  1. Update Dask examples to use JupyterLab on Binder
  2. Render Dask examples into static HTML pages for easier viewing
  3. Consolidate and unify disparate documentation
  4. Retire the hdfs3 library in favor of the solution in Apache Arrow.
  5. Continue work on hyper-parameter selection for incrementally trained models
  6. Publish two small bugfix releases
  7. Blogpost from the Pangeo community about combining Binder with Dask
  8. Skein/Yarn Update
1: Update Dask Examples to use JupyterLab extension

The new dask-labextension embeds Dask’s dashboard plots into a JupyterLab session so that you can get easy access to information

(continued...)
Gaël Varoquaux - programming 2018-09-16 22:00:00

A foundation for scikit-learn at Inria

We have just announced that a foundation will be supporting scikit-learn at Inria [1]: scikit-learn.fondation-inria.fr

Growth and sustainability

This is an exciting turn for us, because it enables us to receive private funding. As a result, we will be able to have secure employment for some existing core …

Anaconda 2018-09-14 21:08:09

Key Trends and Takeaways from Strata New York 2018

By Elizabeth Winkler Another Strata conference has come and gone. We had an incredible time meeting with a huge number of Anaconda users who came by our booth to chat! We also noticed some really interesting trends when it comes to the future of data science, machine learning, and AI. The future of ML/AI is containerized. …
Read more →

The post Key Trends and Takeaways from Strata New York 2018 appeared first on Anaconda.

Leonardo Uieda 2018-09-14 12:00:00

Introducing Verde

Verde is a Python library for processing spatial data (bathymetry, geophysics surveys, etc) and interpolating it on regular grids (i.e., gridding).

It implements Green's functions based interpolation methods and other data processing routines. The type of gridding implemented in Verde essentially fits various linear models to spatial data and uses them to predict new data on regular grids, which is what a lot of machine learning is all about. So Verde's gridder API is inspired by scikit-learn, the state of the art for machine learning in Python. The Green's functions that make up the Jacobian matrix (aka sensitivity or feature matrix) of the linear models generally come from elastic deformation theory. For example, the bi-harmonic spline (Sandwell, 1987) implemented in verde.Spline comes from the deformation of a thin elastic plate.
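To make the "gridding is fitting a linear model" idea concrete, here is a minimal NumPy sketch. It is not Verde's implementation or API: the function names, the synthetic data, and the choice to place one point force at every data point are assumptions made for illustration. The Green's function is the 2D bi-harmonic one, r^2 (log r - 1).

```python
import numpy as np


def greens_thin_plate(distance):
    """2D bi-harmonic Green's function r^2 (log r - 1)."""
    # Clip tiny distances to avoid log(0); the product still tends to 0 there.
    r = np.maximum(distance, 1e-10)
    return r**2 * (np.log(r) - 1.0)


def pairwise_distance(points_a, points_b):
    """Euclidean distance between every point in a and every point in b."""
    diff = points_a[:, None, :] - points_b[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


# Synthetic scattered data (purely illustrative).
rng = np.random.default_rng(0)
data_xy = rng.uniform(0, 1, size=(20, 2))
data_values = np.sin(4 * data_xy[:, 0]) + data_xy[:, 1]

# "Fit": the Jacobian (feature matrix) holds the Green's function evaluated
# between each data point and each force location (here, the data points).
jacobian = greens_thin_plate(pairwise_distance(data_xy, data_xy))
weights, *_ = np.linalg.lstsq(jacobian, data_values, rcond=None)

# "Predict": evaluate the same linear model on a regular grid.
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
grid_values = greens_thin_plate(pairwise_distance(grid_xy, data_xy)) @ weights
grid_values = grid_values.reshape(gx.shape)
```

The fit/predict split is the same shape as a scikit-learn estimator, which is the analogy the paragraph draws.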

I submitted a

(continued...)
Pythonic Perambulations 2018-09-13 17:00:00

The Waiting Time Paradox, or, Why Is My Bus Always Late?

Image Source: Wikipedia License CC-BY-SA 3.0

If you, like me, frequently commute via public transit, you may be familiar with the following situation:

You arrive at the bus stop, ready to catch your bus: a line that advertises arrivals every 10 minutes. You glance at your watch and note the time... and when the bus finally comes 11 minutes later, you wonder why you always seem to be so unlucky.

Naïvely, you might expect that if buses are coming every 10 minutes and you arrive at a random time, your average wait would be something like 5 minutes. In reality, though, buses do not arrive exactly on schedule, and so you might wait longer. It turns out that under some reasonable assumptions, you can reach a startling conclusion:

When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes.

This is what is sometimes known as the waiting time paradox.
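A quick simulation makes the claim easier to believe. This is a rough sketch using only the standard library; the function name and sample sizes are made up for illustration, and the "reasonable assumption" it encodes is that buses arrive as a Poisson process (exponentially distributed gaps between buses).

```python
import bisect
import random


def mean_wait(draw_gap, n_buses=100_000, n_riders=100_000, seed=0):
    """Average wait of riders who show up at uniformly random times."""
    rng = random.Random(seed)

    # Build the bus arrival times by accumulating the gaps between buses.
    arrivals = []
    t = 0.0
    for _ in range(n_buses):
        t += draw_gap(rng)
        arrivals.append(t)

    # Each rider waits until the first bus at or after their arrival time.
    total = 0.0
    for _ in range(n_riders):
        rider = rng.random() * arrivals[-1]
        next_bus = arrivals[bisect.bisect_left(arrivals, rider)]
        total += next_bus - rider
    return total / n_riders


# Buses exactly every 10 minutes versus buses every 10 minutes on average.
on_schedule = mean_wait(lambda rng: 10.0)
poisson = mean_wait(lambda rng: rng.expovariate(1 / 10.0))
print(f"exact schedule: {on_schedule:.2f} min, Poisson arrivals: {poisson:.2f} min")
```

With a perfectly regular schedule the average wait comes out near 5 minutes; with Poisson arrivals it comes out near 10, because a random rider is more likely to land in one of the long gaps.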

I've encountered this idea before, and always wondered

(continued...)
Anaconda 2018-09-10 16:37:37

Intake: Caching Data on First Read Makes Future Analysis Faster

By Mike McCarty Intake provides easy access to data sources from remote/cloud storage. However, for large files, the cost of downloading files every time data is read can be extremely high. To overcome this obstacle, we have developed a “download once, read many times” caching strategy to store and manage data sources on the local file system. …
Read more →

The post Intake: Caching Data on First Read Makes Future Analysis Faster appeared first on Anaconda.
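The "download once, read many times" idea can be sketched in a few lines of plain Python. This is not Intake's API or its cache layout: `cached_read`, `fake_fetch`, the URL, and the hashed file name are all hypothetical stand-ins chosen for illustration.

```python
import hashlib
import os
import tempfile


def cached_read(url, fetch, cache_dir):
    """Return the bytes behind `url`, calling `fetch` only on the first read."""
    os.makedirs(cache_dir, exist_ok=True)
    # Derive a stable local file name from the URL.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    local_path = os.path.join(cache_dir, name)
    if not os.path.exists(local_path):
        with open(local_path, "wb") as handle:
            handle.write(fetch(url))
    with open(local_path, "rb") as handle:
        return handle.read()


downloads = []  # records every simulated download


def fake_fetch(url):
    """Stand-in for an expensive remote download."""
    downloads.append(url)
    return b"col_a,col_b\n1,2\n"


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_read("https://example.com/big.csv", fake_fetch, cache_dir)
    second = cached_read("https://example.com/big.csv", fake_fetch, cache_dir)
```

The second read is served from the local file system, so `fake_fetch` runs only once. A real cache would also need expiry and invalidation, which this sketch leaves out.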

Anaconda 2018-09-10 12:00:28

AI Enablement Platform for Teams at Scale—Accelerate Your AI/ML Productivity with Anaconda Enterprise and Cisco UCS

By Daniel Rodriguez Anaconda Enterprise is a software platform for developing, governing, and automating data science and machine learning pipelines from laptop to production. It is the de-facto standard for data science and machine learning, with over 6 million data scientists using its open source solution locally to develop and score Machine Learning models. Anaconda …
Read more →

The post AI Enablement Platform for Teams at Scale—Accelerate Your AI/ML Productivity with Anaconda Enterprise and Cisco UCS appeared first on Anaconda.

Filipe Saraiva's blog 2018-09-09 15:17:30

Akademy 2018

Look for your favorite KDE contributor at Akademy 2018 Group Photo This year I was in Vienna to attend Akademy 2018, the annual KDE world summit. It was my fourth Akademy after Berlin’2012 (in fact, Desktop Summit), Brno’2014, and Berlin’2016 (together with QtCon). Interestingly, I go to Akademy every 2 years – let’s try... [Read More]
Anaconda 2018-09-07 16:24:20

TensorFlow in Anaconda

By Jonathan Helmus TensorFlow is a Python library for high-performance numerical calculations that allows users to create sophisticated deep learning and machine learning applications. Released as open source software in 2015, TensorFlow has seen tremendous growth and popularity in the data science community. There are a number of methods that can be used to install …
Read more →

The post TensorFlow in Anaconda appeared first on Anaconda.

fa.bianp.net 2018-09-05 22:00:00

Three Operator Splitting

I discuss a recently proposed optimization algorithm: the Davis-Yin three operator splitting.

Dask Release 0.19.0

This work is supported by Anaconda Inc.

I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd. This blogpost outlines notable changes since the last release blogpost for 0.18.0 on June 14th.

You can conda install Dask:

conda install dask

or pip install from PyPI:

pip install dask[complete] --upgrade

Full changelogs are available here:

Notable Changes

A ton of work has happened over the past two months, but most of the changes are small and diffuse. Stability, feature parity with upstream libraries (like Numpy and Pandas), and performance have all significantly improved, but in ways that are difficult to condense into blogpost form.

That being said, here are a few of the more exciting changes in the new release.

Python Versions

We’ve dropped official support for Python 3.4 and added official support for Python 3.7.

Deploy on Hadoop Clusters

Over the past few months Jim Crist has built a suite of

(continued...)
Anaconda 2018-09-04 19:18:41

Python 3.7 Package Build Out & Miniconda Release

By Ray Donnelly & Crystal Soja We are pleased to announce that Python 3.7 packages for all supported platforms and packages of the Anaconda Distribution Repository (repo.anaconda.com) are now available. There are 865 packages built for Linux, 864 packages built for macOS, and 779 packages built for Windows. Python 3.7, released June 27, 2018, represents …
Read more →

The post Python 3.7 Package Build Out & Miniconda Release appeared first on Anaconda.

NumFOCUS 2018-09-04 19:00:10

NumFOCUS Sustainer Weeks

The post NumFOCUS Sustainer Weeks appeared first on NumFOCUS.

Anaconda 2018-09-04 14:33:57

Anaconda Welcomes Maggie Key as SVP of Customer Success

Former VP of Accruent joins executive team to build out and embed customer success program within Anaconda

AUSTIN, Texas – September 4, 2018 – Anaconda, Inc., the most popular Python data science platform provider with 2.5 million downloads per month, today announced the addition of Maggie Key to its executive team as SVP of Customer Success. …
Read more →

The post Anaconda Welcomes Maggie Key as SVP of Customer Success appeared first on Anaconda.

Matthieu Brucher's blog 2018-09-04 07:36:41

Book: Building Machine Learning Systems with Python – third edition

A few years ago, Packt Publishing contacted me to be a technical reviewer for the first edition of Building Machine Learning Systems with Python, and I was impressed by the writing of Luis Pedro Coelho and Willi Richert. For the second edition, I was again a technical reviewer. Writing is not easy, especially when it’s not […]
Anaconda 2018-08-31 18:10:30

Distributed Auto-ML with TPOT with Dask

By Tom Augspurger This work is supported by Anaconda, Inc. This post describes a recent improvement made to TPOT. TPOT is an automated machine learning library for Python. It does some feature engineering and hyper-parameter optimization for you. TPOT uses genetic algorithms to evaluate which models are performing well and how to choose new models to try out in the next …
Read more →

The post Distributed Auto-ML with TPOT with Dask appeared first on Anaconda.

Planet SciPy – I Love Symposia! 2018-08-30 04:48:05

Summer school announcement: 2nd Advanced Scientific Programming in Python (ASPP) Asia Pacific!

The Advanced Scientific Programming in Python (ASPP) summer school has had 10 successful iterations in Europe and one iteration here in Melbourne earlier this year. Another European iteration is starting next week in Camerino, Italy. Now, thanks to the generous sponsorship of CSIRO, and the efforts of Benjamin Schwessinger and Genevieve Buckley, two alumni from … Continue reading Summer school announcement: 2nd Advanced Scientific Programming in Python (ASPP) Asia Pacific!
Anaconda 2018-08-29 18:20:44

How PNC Financial Services Leveraged Anaconda to Enable Data Science and Machine Learning Capabilities Across the Company

As an AI software company passionate about the real-world practice of data science, machine learning, and predictive analytics, we take great pleasure in hearing about the inspiring and innovative ways our customers use our products to drive their businesses forward and change the worlds around them. Earlier this year, we hosted our second annual AnacondaCON …
Read more →

The post How PNC Financial Services Leveraged Anaconda to Enable Data Science and Machine Learning Capabilities Across the Company appeared first on Anaconda.

Living in an Ivory Basement 2018-08-28 22:00:00

Abstract for SIAM: Supporting and Sustaining Open Source Software Development: the Commons Perspective

How do we support and sustain open source software development?

Matthieu Brucher's blog 2018-08-28 07:31:27

Analog modelling: The Moog ladder filter emulation in Python

After my previous post on SPICE modelling in Python, I need a good supporting example to work up to on-the-fly compilation in C++. This schema will also require some changes to support more than simple nodal analysis, so this now becomes Modified Nodal Analysis with state equations. The simple model I […]

High level performance of Pandas, Dask, Spark, and Arrow

This work is supported by Anaconda Inc

Question

How does Dask dataframe performance compare to Pandas? Also, what about Spark dataframes and what about Arrow? How do they compare?

I get this question every few weeks. This post is to avoid repetition.

Caveats
  1. This answer is likely to change over time. I’m writing this in August 2018
  2. This question and answer are very high level. More technical answers are possible, but not contained here.
Answers

Pandas

If you’re coming from Python and have smallish datasets then Pandas is the right choice. It’s usable, widely understood, efficient, and well maintained.

Benefits of Parallelism

The performance benefit (or drawback) of using a parallel dataframe like Dask dataframes or Spark dataframes over Pandas will differ based on the kinds of computations you do:

  1. If you’re doing small computations then Pandas is always the right choice. The administrative costs of parallelizing will outweigh any benefit. You should not parallelize if your computations are taking less

(continued...)
.pyMadeThis 2018-08-27 16:00:00

Displaying images on OLED screens — Using 1-bpp images in MicroPython

We've previously covered the basics of driving OLED I2C displays from MicroPython, including simple graphics commands and text. Here we look at displaying monochrome 1 bit-per-pixel images and animations using MicroPython on a Wemos D1.

Processing the images and correct choice of image-formats is important to get the most ...
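As a rough illustration of what "1 bit-per-pixel" means in memory, the sketch below packs rows of 0/1 pixels into bytes, eight pixels per byte with the leftmost pixel in the most significant bit. `pack_1bpp` is a hypothetical helper, not part of MicroPython; to my understanding this layout matches what MicroPython's framebuf module calls MONO_HLSB, but check that against your display driver before relying on it.

```python
def pack_1bpp(pixels, width, height):
    """Pack rows of 0/1 pixels into bytes, 8 pixels per byte, MSB first."""
    row_bytes = (width + 7) // 8  # each row is padded to a whole byte
    buffer = bytearray(row_bytes * height)
    for y in range(height):
        for x in range(width):
            if pixels[y][x]:
                buffer[y * row_bytes + x // 8] |= 0x80 >> (x % 8)
    return buffer


# A tiny 10x2 test image: the first row lights both ends, the second is blank.
image = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
packed = pack_1bpp(image, width=10, height=2)
```

A 128x32 screen packed this way needs 128 * 32 / 8 = 512 bytes per frame, which is why the format suits small microcontrollers.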

.pyMadeThis 2018-08-26 12:00:00

Dictionaries — An almost complete guide to Python's key:value store

Dictionaries are key-value stores, meaning they store, and allow retrieval of data (or values) through a unique key. This is analogous to a real dictionary where you look up definitions (data) using a given key — the word. Unlike a language dictionary, however, keys in Python dictionaries are not alphabetically sorted ...
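A short example of the points above; the glossary contents are made up for illustration. Note that since Python 3.7 dictionaries keep keys in insertion order, which is still not alphabetical order.

```python
# Keys map to values; each key is unique.
glossary = {
    "zebra": "a striped African equine",
    "apple": "a round fruit",
    "mango": "a tropical stone fruit",
}

# Store and retrieve values through their key.
glossary["kiwi"] = "a flightless bird"
apple_definition = glossary["apple"]

# Looking up a missing key with .get() returns a default instead of raising.
missing = glossary.get("banana", "no entry")

# Keys come back in insertion order (Python 3.7+), not sorted.
keys_as_stored = list(glossary)
keys_sorted = sorted(glossary)
```

If you need alphabetical order, sort the keys explicitly as in the last line.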

.pyMadeThis 2018-08-25 08:00:00

Driving I2C OLED displays with MicroPython — I2C monochrome displays with SSD1306

These mini monochrome OLED screens make great displays for projects — perfect for data readout, simple UIs or monochrome games.

Requirements
Wemos D1 v2.2+ or good imitations. Buy
0.91in OLED Screen 128x32 pixels, I2c interface. Buy
Breadboard Any size will do. Buy
Wires Loose ends, or jumper leads.
Setting ...
.pyMadeThis 2018-08-23 19:00:00

Raindar — Desktop daily weather, forecast app in PyQt

The Raindar UI was created using Qt Designer, and saved as a .ui file, which is available for download. This was converted to an importable Python file using pyuic5.

API key

Before running the application you need to obtain an API key from OpenWeatherMap.org. This key is unique to you ...

Public Institutions and Open Source Software

As general purpose open source software displaces domain-specific all-in-one solutions, many institutions are re-assessing how they build and maintain software to support their users. This is true across for-profit enterprises, government agencies, universities, and home-grown communities.

While this shift brings opportunities for growth and efficiency, it also raises questions and challenges about how these institutions should best serve their communities as they grow increasingly dependent on software developed and controlled outside of their organization.

  • How do they ensure that this software will persist for many years?
  • How do they influence this software to better serve the needs of their users?
  • How do they transition users from previous all-in-one solutions to a new open source platform?
  • How do they continue to employ their existing employees who have historically maintained software in this field?
  • If they have a mandate to support this field, what is the best role for them to play, and how can they justify their efforts to the groups that control their budget?

This blogpost

(continued...)
Anaconda 2018-08-20 13:31:43

Anaconda Funded by Citi Ventures

Scott Collison, CEO Today, we’re incredibly happy to announce funding from Citi Ventures and welcome them as a new investor and partner. Following its initial investment in Anaconda and led by a belief in our products and the success we’ve had, Citi also became an Anaconda customer to take advantage of our leading platform for …
Read more →

The post Anaconda Funded by Citi Ventures appeared first on Anaconda.

Cloud Lock-in and Open Standards

This post is from conversations with Peter Wang, Yuvi Panda, and several others. Yuvi expresses his own views on this topic on his blog.

Summary

When moving to the cloud we should be mindful to avoid vendor lock-in by adopting open standards.

Adoption of cloud computing

Cloud computing is taking over both for-profit enterprises and public/scientific institutions. The Cloud is cheap, flexible, requires little up-front investment, and enables greater collaboration. Cloud vendors like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure compete to create stable, easy to use platforms to serve the needs of a variety of institutions, both big and small. This presents a great opportunity for society, but also a risk of future lock-in at a large scale.

Cloud vendors build services to lock in users

Some of the competition between cloud vendors is about providing lower costs, higher availability, improved scaling, and so on, that are strictly a benefit for consumers. This is great.

However some of the competition is in the form of

(continued...)
Living in an Ivory Basement 2018-08-17 22:00:00

Can bits be the basis for a digital commons? (No.)

Bits cannot be the basis for a digital commons, because they are not rivalrous.

Anaconda 2018-08-15 19:15:21

Introducing Skein: Deploy Python on Apache YARN the Easy Way

By Jim Crist *This post is reprinted with permission from Jim Crist’s blog. The original post can be found here. In this post, I introduce Skein, a new tool and library for deploying applications on Apache YARN. I provide background on why this work was necessary, and demonstrate deploying a simple Python application on a YARN cluster. Introduction …
Read more →

The post Introducing Skein: Deploy Python on Apache YARN the Easy Way appeared first on Anaconda.

Spyder Blog 2018-08-14 00:00:00

Spyder 3.3.0 and 3.3.1 released!

We're pleased to release the next significant update in the stable Spyder 3 line, 3.3.0, along with its follow-on bugfix point release, 3.3.1, which is now live on PyPI and conda. As always, you can update with conda update spyder in the Anaconda Prompt/Terminal/command line (on Windows/macOS/Linux, respectively) if on Anaconda (recommended), or pip install --upgrade spyder otherwise. If you run into any trouble, please carefully read our new installation documentation and consult our Troubleshooting Guide, which contains straightforward solutions to the vast majority of install-related issues users have reported.

As a new minor version (3.3), it makes several substantial changes to Spyder's underpinnings that deserve some explanation, particularly the newly modular and portable console system that's now separated into its own spyder-kernels package, opening up several new options for users running Spyder in different environments. There's also a brand-new error reporting process, new options in the IPython console, usability and performance improvements for the Variable Explorer, multiple new and changed dependency requirements

(continued...)
While My MCMC Gently Samples 2018-08-13 14:00:00

Hierarchical Bayesian Neural Networks with Informative Priors

(c) 2018 by Thomas Wiecki

Imagine you have a machine learning (ML) problem but only small data (gasp, yes, this does exist). This often happens when your data set is nested -- you might have many data points, but only few per category. For example, in ad-tech you may want to predict …

Spyder Blog 2018-08-13 00:00:00

Spyder featured on Episode 1 of Open Source Directions web show

Quansight, the company recently founded by NumPy, SciPy and Anaconda creator Travis Oliphant to help connect companies with open source communities built around data science and machine learning, just released Episode 1 of its live webcast series, and it was all about Spyder! Spyder maintainer Carlos Córdoba, recently hired by Quansight and funded part-time to work on Spyder development as we announced a few weeks ago, was the featured guest on the show.

Carlos first shared his perspective on some of the key moments in Spyder's nearly 10-year development history, from its original creation by Pierre Raybaut and Carlos' initial involvement in the project to its more recent challenges and successes. He also demonstrated basic usage of Spyder, as well as some of its standout features, in a live on-screen demo. Carlos then went on to outline the current roadmap for Spyder 4 in the near future, and explained some of the key new features planned for it. Finally, he took

(continued...)
Neural Ensemble News 2018-08-10 17:48:00

NeuroML2/LEMS is moving into Neural Mass Models and whole brain networks

In the last months, as part of the Google Summer of Code 2018, I have been working on a project that aimed to implement neuronal models which represent averaged population activity on NeuroML2/LEMS. The project was supported by the INCF organisation and my mentor, Padraig Gleeson, and I had 3 months to shape and bring to life all the ideas that we had in our heads. This blog post summarises the core motivation of the project, the technical challenges, what I have done, and future steps.

Background
NeuroML version 2 and LEMS were introduced in order to standardise the description of neuroscience computational models and facilitate the shareability of results among different research groups [1]. However, so far, NeuroML2/LEMS have focused on modelling spiking neurons and how information is exchanged between them in networks. With the introduction of neural mass models, NeuroML2/LEMS can be extended to study interactions between large-scale systems such as cortical regions and indeed whole brain dynamics. To achieve this,
(continued...)
Living in an Ivory Basement 2018-08-09 22:00:00

"Labor" and "Engaged effort"

Are "effort" and "labor" the same?

NumFOCUS 2018-08-09 15:12:14

Announcing Julia 1.0

The post Announcing Julia 1.0 appeared first on NumFOCUS.

Building SAGA optimization for Dask arrays

This work is supported by ETH Zurich, Anaconda Inc, and the Berkeley Institute for Data Science

At a recent Scikit-learn/Scikit-image/Dask sprint at BIDS, Fabian Pedregosa (a machine learning researcher and Scikit-learn developer) and Matthew Rocklin (Dask core developer) sat down together to develop an implementation of the incremental optimization algorithm SAGA on parallel Dask datasets. The result is a sequential algorithm that can be run on any dask array, and so allows the data to be stored on disk or even distributed among different machines.

It was interesting both to see how the algorithm performed and also to see the ease and challenges of running a research algorithm on a Dask distributed dataset.

Start

We started with an initial implementation that Fabian had written for Numpy arrays using Numba. The following code solves an optimization problem of the form

\min_x \sum_{i=1}^n f(a_i^T x, b_i)

import numpy as np
from numba import njit
from sklearn.linear_model.sag import get_auto_step_size
from sklearn.utils.extmath import row_norms

@njit
def deriv_logistic(p, y):
    # derivative of logistic loss
 
(continued...)
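The excerpt is cut off before the solver itself. As an independent illustration of the SAGA update for this kind of problem (not Fabian's Numba implementation), here is a plain NumPy sketch for the logistic loss with labels in {-1, +1}. The objective is normalised by n, which does not change the minimiser; the 1/(3L) step size, the function names other than `deriv_logistic`'s role, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def logistic_deriv(p, y):
    """Derivative of log(1 + exp(-y * p)) with respect to p, for y in {-1, +1}."""
    return -y / (1.0 + np.exp(y * p))


def saga(A, b, n_epochs=30, seed=0):
    """Minimise (1/n) * sum_i f(a_i^T x, b_i) with the SAGA update."""
    n_samples, n_features = A.shape
    rng = np.random.default_rng(seed)
    # Largest per-sample Lipschitz constant of the logistic gradient.
    lipschitz = 0.25 * np.max(np.sum(A**2, axis=1))
    step = 1.0 / (3.0 * lipschitz)

    x = np.zeros(n_features)
    memory = np.zeros(n_samples)     # last scalar derivative seen for each sample
    grad_avg = np.zeros(n_features)  # running mean of the stored gradients
    for _ in range(n_epochs * n_samples):
        i = rng.integers(n_samples)
        new = logistic_deriv(A[i] @ x, b[i])
        delta = (new - memory[i]) * A[i]
        # SAGA step: fresh gradient minus stored gradient plus stored average.
        x -= step * (delta + grad_avg)
        grad_avg += delta / n_samples
        memory[i] = new
    return x


def mean_logistic_loss(A, b, x):
    return np.mean(np.log1p(np.exp(-b * (A @ x))))


# Small synthetic classification problem (illustrative only).
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
scores = A @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5 * rng.normal(size=200)
b = np.where(scores > 0, 1.0, -1.0)

x_hat = saga(A, b)
loss_start = mean_logistic_loss(A, b, np.zeros(5))
loss_end = mean_logistic_loss(A, b, x_hat)
```

Because each step touches a single row of A plus two small vectors, the same loop structure can in principle be fed row by row from chunked storage, which is what makes the algorithm interesting for Dask arrays.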

Dask Development Log

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Over the last two weeks we’ve seen activity in the following areas:

  1. An experimental Actor solution for stateful processing
  2. Machine learning experiments with hyper-parameter selection and parameter servers.
  3. Development of more preprocessing transformers
  4. Statistical profiling of the distributed scheduler’s internal event loop thread and internal optimizations
  5. A new release of dask-yarn
  6. A new narrative on dask-stories about modelling mobile networks
  7. Support for LSF clusters in dask-jobqueue
  8. Test suite cleanup for intermittent failures
Stateful processing with Actors

Some advanced workloads want to directly manage and mutate state on workers. A task-based framework like Dask can be forced into this kind of workload using long-running-tasks, but it’s an uncomfortable experience. To address this we’ve been

(continued...)
Gaël Varoquaux - programming 2018-07-31 22:00:00

Sprint on scikit-learn, in Paris and Austin

Two weeks ago, we held a scikit-learn sprint in Austin and Paris. Here is a brief report on progress and challenges.

Several sprints

We actually held two sprints in Austin: one open sprint, at the scipy conference sprints, which was open to new contributors, and one core sprint, for more …

Leonardo Uieda 2018-07-26 12:00:00

Websites for Earth Scientists on the academic job hunt

This is a list of the websites I use to search for academic jobs in the Earth Sciences (geophysics, geology, oceanography, meteorology, etc). They've been very useful to me (I found my current position through the CIG mailing list) and I hope that this post can help others who are looking to take the next step in their academic careers.

These sites list everything from Masters and PhD scholarships to postdoc positions and tenure-track professorships. Note that they are biased toward the US, Canada, Oceania, and Europe.

Mailing lists

Sign up for these and get email updates when new opportunities are posted (most are updated daily):

  • ES_JOBS_NET: I get around 10 emails from this list a day. Lately, I'm seeing a lot of
(continued...)

Pickle isn't slow, it's a protocol

This work is supported by Anaconda Inc

tl;dr: Pickle isn’t slow, it’s a protocol. Protocols are important for ecosystems.

A recent Dask issue showed that using Dask with PyTorch was slow because sending PyTorch models between Dask workers took a long time (Dask GitHub issue).

This turned out to be because serializing PyTorch models with pickle was very slow (1 MB/s for GPU based models, 50 MB/s for CPU based models). There is no architectural reason why this needs to be this slow. Every part of the hardware pipeline is much faster than this.

We could have fixed this in Dask by special-casing PyTorch models (Dask has its own optional serialization system for performance), but being good ecosystem citizens, we decided to raise the performance problem in an issue upstream (PyTorch GitHub issue). This resulted in a five-line fix to PyTorch that turned a 1-50 MB/s serialization bandwidth into a 1 GB/s bandwidth, which is more than fast enough for many use cases (PR to PyTorch).
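To show what "protocol" means here: any class can tell pickle how it should be serialized by implementing hooks such as `__reduce__`, so a library can fix slow serialization once, upstream, for every consumer. The sketch below uses a hypothetical `Model` class; it is not the PyTorch fix, just the mechanism that makes such a fix possible.

```python
import pickle


class Model:
    """Toy model with real state (weights) and a derived scratch buffer."""

    def __init__(self, weights):
        self.weights = weights
        # Derived working memory: cheap to rebuild, wasteful to send.
        self.scratch = [0.0] * len(weights)

    def __reduce__(self):
        # Tell pickle how to rebuild the object: call Model(weights).
        # Only the weights travel; the scratch buffer is recreated on load.
        return (Model, (self.weights,))


original = Model([0.1, 0.2, 0.3])
original.scratch[0] = 42.0

payload = pickle.dumps(original, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
```

Anything that moves objects with pickle, Dask included, benefits from the class-level hook without needing a special case of its own.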

    
(continued...)
Spyder Blog 2018-07-23 00:00:00

State of the Spyder, Part 2: Looking up

After sharing some major milestones, development progress, and other tidbits from the past six months in Part 1 of this series (check that one out first if you haven't already), we now have some amazing news to share with you all here in Part 2, along with other status updates. That's not all, though—Part 3 will look ahead toward Spyder 4 and beyond, unveiling and explaining our full roadmap and going over the future possibilities even further afield.

Spyder Wins NumFOCUS Development Grant

First up, we're thrilled to announce a major part of what's making that plan possible (along with your support, of course!). This May, Spyder was awarded a $3000 development grant from NumFOCUS, an organization promoting better science through open code, to help with finishing Spyder 4! NumFOCUS is a nonprofit dedicated to supporting key scientific computing projects; promoting sustainability in the open source ecosystem; educating the next generation of scientists, engineers, developers and data analysts through their flagship

(continued...)
Leonardo Uieda 2018-07-20 12:00:00

Introducing Pooch

A friend to fetch your sample data files.

Pooch is a Python package that manages downloading data files over HTTP and storing them in a local directory. It is meant to be used by other Python libraries that ship sample data files for use in documentation, workshops, demos, etc.

For example, your package could define a datasets.py module that has functions to load sample data (like scikit-learn does). If you want the data to live on the web (like in the GitHub repo) instead of shipping it with your package, Pooch can keep track of it and download it to the user's computer only when it's needed.

This is what a datasets.py module would look like using Pooch:

"""
Module mypackage/datasets.py
"""
import pooch

# Get the version string from your project. You have one of these, right?
from
(continued...)

Dask Development Log, Scipy 2018

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Last week many Dask developers gathered for the annual SciPy 2018 conference. As a result, very little work was completed, but many projects were started or discussed. To reflect this change in activity this blogpost will highlight possible changes and opportunities for readers to further engage in development.

Dask on HPC Machines

The dask-jobqueue project was a hit at the conference. Dask-jobqueue helps people launch Dask on traditional job schedulers like PBS, SGE, SLURM, Torque, LSF, and others that are commonly found on high performance computers. These are very common among scientific, research, and high performance machine learning groups but commonly a bit hard to use with anything other than MPI.

This project came up in the Pangeo talk, lightning talks, and the Dask

(continued...)

Who uses Dask?

This work is supported by Anaconda Inc

People often ask general questions like “Who uses Dask?” or more specific questions like the following:

  1. For what applications do people use Dask dataframe?
  2. How many machines do people often use with Dask?
  3. How far does Dask scale?
  4. Does dask get used on imaging data?
  5. Does anyone use Dask with Kubernetes/Yarn/SGE/Mesos/… ?
  6. Does anyone in the insurance industry use Dask?

This yields interesting and productive conversations where new users can dive into historical use cases, which inform their choices about whether and how to use the project in the future.

New users can learn a lot from existing users.

To further enable this conversation we’ve made a new tiny project, dask-stories. This is a small documentation page where people can submit how they use Dask and have that published for others to see.

To seed this site six generous users have written down how their group uses Dask. You can read about them here:

(continued...)
Planet SciPy – I Love Symposia! 2018-07-12 18:58:35

The road to scikit-image 1.0

This is the first in a series of posts about the joint scikit-image, scikit-learn, and dask sprint that took place at the Berkeley Institute for Data Science, May 28-Jun 1, 2018. In addition to the dask and scikit-learn teams, the sprint brought together three core developers of scikit-image (Emmanuelle Gouillart, Stéfan van der Walt, and … Continue reading The road to scikit-image 1.0
Living in an Ivory Basement 2018-07-08 22:00:00

The Open Source Anti-Sisyphean League

We need an Open Source Anti-Sisyphean League!

python – Dr. Randal S. Olson 2018-07-04 20:41:32

Does batting order matter in Major League Baseball? A simulation approach

If you’ve ever watched Major League Baseball, one of the feature points of the sport is the batting line-up that each team decides upon before each game. Traditional baseball logic tells us that speedy, reliable hitters like Trea Turner should
Living in an Ivory Basement 2018-07-01 22:00:00

A framework for thinking about Open Source Sustainability?

Can we apply Common Pool Resource work to open online projects?

Living in an Ivory Basement 2018-06-25 22:00:00

How open is too open?

How open is too open?

Planet SciPy – I Love Symposia! 2018-06-20 11:19:46

What do scientists know about open source?

A friend recently pointed out this great talk by Matt Bernius, What students know and don’t know about open source. If you have even a minor interest in open source it’s worth a watch, but the gist is: in the US alone, there are about 200,000 students enrolled in a computer science major. Open source … Continue reading What do scientists know about open source?
Filipe Saraiva's blog 2018-06-17 13:19:41

The time I talked about science fiction with my psychologist

“So, Amanda, you know, sometimes I think that one of the things that made me this way was the sheer amount of science fiction I read in childhood and adolescence… but not just any fiction, I mean only the kind about time travel and alternate realities. My favorite film is Back to the Future, the stories I most... [Read More]
Paul Ivanov’s Journal 2018-06-12 07:00:00

Get in it

Two weeks ago, Project Jupyter had our only planned team meeting for 2018. There was too much stuff going on for me to write a poem during the event as I had in previous years (2016 and 2017), so I ended up reading one of the pieces I wrote during my evening introvert breaks in Cleveland at PyCon a few weeks earlier.

Once again, Fernando and Matthias had their gadgets ready to record (thank you both!). The video below was taken by Fernando.

Get in it
Time suspended
Gelatinous reality - the haze
submerged in murky drops summed
in swamp pond of life

believe and strive, expand the mind
A state sublime, when in your prime you came to
me and we were free to flow and fling our
cares, our dreams, our in-betweens, our
rêves perdues, our residue -- the lime of light
the black of sight -- all these converge and
merge the forks of friction filled with fright
and more -- the float of logs that plunges deep
beyond the fray, beyond the
(continued...)
.pyMadeThis 2018-06-11 06:00:00

7Pez — Desktop unzip application with custom window decoration

This is a functionally terrible unzip application, saved only by the fact that you get to look at a cat while using it.

The original idea reflected in the name 7Pez was actually worse — to rig it up so you had to push on the head to unzip each file ...

Living in an Ivory Basement 2018-06-09 22:00:00

How long does it take to produce scientific software?

How long does it take to produce scientific software?

Filipe Saraiva's blog 2018-06-03 22:04:47

The strike over biometric timekeeping

A few days ago, Belém came out of a bus workers' strike that brought the city to its knees. For 5 days Belém had no buses at all, with the union defying the labor court's order to keep 80% of the fleet on the streets. Facing heavy fines for this yet still standing firm, this situation showed how... [Read More]
.pyMadeThis 2018-06-03 16:30:00

Failamp — Multimedia playlist & player in Python, using PyQt

Failamp is a simple audio & video media player implemented in Python, using the built-in Qt playlist and media handling features. It is modelled, very loosely, on the original Winamp, although nowhere near as complete (hence the fail).

The main window

The main window UI was built using Qt Designer. The screenshot ...

.pyMadeThis 2018-05-31 06:00:00

Creating a window with PyQt5 — The first step in creating your GUI application

The first step in creating desktop applications with PyQt is getting a window to show up on your desktop. Thankfully, with PyQt that is pretty simple.

Below are a few short examples of creating PyQt apps and getting a window on the screen. If this works you know you have ...

Living in an Ivory Basement 2018-05-30 22:00:00

Communicating outside of big consortia is tough! (but important!)

It's hard enough to keep people inside informed...

Living in an Ivory Basement 2018-05-28 22:00:00

Open-source style community engagement for the Data Commons Pilot Phase Consortium

Keeping the Data Commons community coordinated and engaged

.pyMadeThis 2018-05-27 19:00:00

QtWebEngineWidgets, the new browser API in PyQt 5.6 — Simplified page model and asynchronous methods

With the release of Qt 5.5 the Qt WebKit API was deprecated and replaced with the new QtWebEngine API, based on Chromium. The WebKit API was subsequently removed from Qt entirely with the release of Qt 5.6 in mid-2016.

The change to use Chromium for web widgets within ...

.pyMadeThis 2018-05-25 19:00:00

Brown Note — Desktop notes app using SQLAlchemy & PyQt

Relieve your creative blockages with these interactive desktop reminders.

Brown Note is a desktop notes application written in Python, using PyQt. The notes are implemented as decoration-less windows, which can be dragged around the desktop and edited. Details in the notes, and their position on the desktop, are stored in ...

Python – Meta Rabbit 2018-05-25 11:58:38

Quick followups: NGLess benchmark & Notebooks as papers

A quick follow-up on two earlier posts: We finalized the benchmark for ngless that I had discussed earlier: As you can see, NGLess performs much better than either MOCAT or htseq-count. We tried to use featureCounts too, but that completely failed to produce results for some of the samples (we gave it a whopping 1TB … Continue reading Quick followups: NGLess benchmark & Notebooks as papers
.pyMadeThis 2018-05-14 06:00:00

Lucky Cat Spinning-arm Display — Python-powered Maneki-neko persistence of vision scroller

This build started as something simple: a lucky cat which would turn on and off automatically in response to some event. Since lucky cats are associated with good fortune the idea was to make one do this every time I got paid. This was working pretty well but unfortunately, after ...

Matthieu Brucher's blog 2018-05-08 07:52:42

Address Sanitizer: alternative to valgrind

Recently, at work, I encountered a strange bug with GCC 7.2 and clang 6 (I didn’t test it with Visual Studio 2017 for different reasons). The bug was not visible on “old” compilers like gcc 4, Visual Studio 2013 or even Intel Compiler 2017. In debug mode, everything was fine, but in release mode, the […]
Spyder Blog 2018-05-06 00:00:00

State of the Spyder, Part 1: Looking back

As we approach some major development milestones, now is as good a time as ever to share with you some perspective on where we've been, what's happening now, and where we're going in the world of Spyder. In this post, part one of a three-part series, we'll take a look back over the past six months at some of the key events, accomplishments and challenges for Spyder and its community, and how that all leads up to where we are now.

Stay tuned right here, since part two will share several exciting announcements that affect the project (in a good way, we promise!) and its immediate future. Even better, part three will formally announce the next Spyder 3 release and—what I'm sure you are all looking forward to—the plan for the first official Spyder 4 beta, plus our schedule and feature roadmap for Spyder 4 and beyond!

A Call Answered

Starting off, as we announced back in mid-November, our funding from Anaconda, Inc was

(continued...)
While My MCMC Gently Samples 2018-05-03 14:00:00

An intuitive, visual guide to copulas

(c) 2018 by Thomas Wiecki

People seemed to enjoy my intuitive and visual explanation of Markov chain Monte Carlo so I thought it would be fun to do another one, this time focused on copulas.

If you ask a statistician what a copula is they might say "a copula is …

Matthieu Brucher's blog 2018-05-01 07:00:34

Analog modelling: A prototype generic modeller in Python

A few months ago, mystran published on KVR a small SPICE simulator for real-time processing. I liked the idea, the drawback being that the code is generic and not tailored like a static version of the optimizer. So I wondered if it was doable. But for this, I have to start from the basics and […]
Matthieu Brucher's blog 2018-04-24 07:24:55

Announcement: ATKSideChainCompressor 3.0.0

I’m happy to announce the update of ATK Side-Chain Compressor based on the Audio Toolkit and JUCE. It is available on Windows (AVX compatible processors) and OS X (min. 10.9, SSE4.2) in different formats. This update changes storage format and allows linked channels to be steered by a mix of power coming from each channel, […]
Matthieu Brucher's blog 2018-04-17 07:07:48

Book review: C++17 Quick Syntax Reference: A Pocket Guide to the Language, APIs and Library

I work on a day-to-day basis on a big project that has many developers at different C++ levels. Scott Meyers wrote a wonderful book on modern C++ (that I still need to review one day, especially since there is a new Effective Modern C++), but it is not for beginners. So I’m looking for that […]
python – Dr. Randal S. Olson 2018-04-12 00:08:59

Traveling salesman portrait in Python

Last week, Antonio S. Chinchón made an interesting post showing how to create a traveling salesman portrait in R. Essentially, the idea is to sample a bunch of dark pixels in an image, solve the well-known traveling salesman problem for
Boom! Michael Droettboom's blog 2018-04-11 04:00:00

Profiling WebAssembly

Tips for profiling WebAssembly

Matthieu Brucher's blog 2018-04-10 07:04:08

Book review: LLVM Cookbook

After the book on LLVM core libraries, I want to have a look at the cookbook. Discussion The idea was that once I had a broad view of LLVM, I could try to apply some recipes for what I wanted to do. Let’s just say that I was deeply mistaken. First, the two authors have […]
Boom! Michael Droettboom's blog 2018-04-04 04:00:00

Scientific Python in the Browser

An early report on getting the scientific Python stack compiled to WebAssembly.

Matthieu Brucher's blog 2018-04-03 07:13:38

Book review: Getting Started with LLVM Core Libraries

LLVM has always intrigued me. Actually, I always thought about one day writing a compiler. But it was more a challenge than a requirement for any of my works, private or professional, so I never dived into it. The design of LLVM was also very well thought out, and probably close to something I would have had […]
Leonardo Uieda 2018-03-25 12:00:00

The future of Fatiando a Terra

I started developing the Fatiando a Terra Python library in 2010. Since then, many other open-source Python libraries for geophysics have appeared, each with unique capabilities. In this post, I'll explore where I think Fatiando fits in this larger ecosystem and how we can better fill our niche.

What is Fatiando a Terra?

Fatiando is a Python library for modeling and inversion in geophysics. It's composed of different subpackages:

  • fatiando.gridder: functions for dealing with spatial data. It's mostly used to generate point scatters or coordinate arrays for regular grids. Both are required as inputs for modeling or creating synthetic datasets.
  • fatiando.mesher: classes that represent geometric objects (polygons, prisms, spheres, etc) and regular meshes. These classes are used to define the geometry and physical properties of our models. They
(continued...)
fa.bianp.net 2018-03-20 23:00:00

Notes on the Frank-Wolfe Algorithm, Part I

This blog post is the first in a series discussing different theoretical and practical aspects of the Frank-Wolfe algorithm.

Matthieu Brucher's blog 2018-03-20 08:19:41

Writing custom checks for clang-tidy

I started taking a heavier interest in clang-tidy a few months ago, as I was looking at static analyzers. I found at the time that it was quite complicated to work on clang internal AST. It is a wonderful tool, but it is also a very complex one. Thankfully, the cfe-dev mailing list is full […]
Leonardo Uieda 2018-03-15 12:00:00

A template for reproducible papers

At the PINGA lab, we have been experimenting with ways to increase the reproducibility of our research by publishing the git repositories that accompany our papers. You can find them on our GitHub organization. I've synthesized the experience of the last 4 years into a template in the pinga-lab/paper-template repository.

The template reflects the tools we've been using and the type of research that we do:

  • Most papers propose a new methodology rather than the analysis of a dataset.
  • There is always an application to a dataset to show the method works. We can't always publish the data but we include it in the repository whenever we can.
  • All papers include an implementation of the proposed method.
  • Our code is usually written in Python and executed in Jupyter notebooks.
  • The focus
(continued...)
Filipe Saraiva's blog 2018-03-11 16:04:48

Looking for recommendations of “KDE distros”

I have been a Mageia user and packager since the fork was launched, and don't get me wrong, for me it remains a distribution of excellent quality for its purpose: community-driven, open to the most varied contributions, and with an emphasis on stability. Mageia is one of the few distros with support for more than 8 desktop environments... [Read More]
Leonardo Uieda 2018-03-09 12:00:00

Podcasts in my playlist (2018 edition)

Last year, I posted my podcast playlist in response to a similar post by John Leeman (of Don't Panic Geocast fame). In a recent episode (maybe episode 158), John asked listeners for an updated list of recommendations. Here are mine.

I'll start with the new additions since last year, then the ones that stayed with me throughout 2017, and finally the ones that I'm looking to get started this year.

New additions:

  • Gastropod: A podcast that "looks at food through the lens of science and history". In each episode, the hosts dive deep into the science behind a type of food/process/ingredient and how it became what it is today. One of my favorite episodes is about koji, the fungus behind sake,
(continued...)
Technical Discovery 2018-03-06 06:37:00

Reflections on Anaconda as I start a new chapter with Quansight


Leaving the company you founded is always a tough decision and a tough process that involves many people. It requires a series of potentially emotional "crucial-conversations."  It is actually not that uncommon in venture-backed companies for one or more of the original founders to leave at some point.  There is a decent article on the topic here:  https://hbswk.hbs.edu/item/the-founding-ceos-dilemma-stay-or-go.



Still it is extremely difficult to let go. You live and breathe the company you start.  Years of working to connect as many people as possible to the dream gives you a feeling of "ownership" and connection that no stock certificate can replace. Starting a company is a lot of work.  It takes a lot of effort. There are many decisions to make and many voices to incorporate. Hiring, firing, raising money, engaging customers, engaging employees, planning projects, organizing events, and aligning a pastiche of personalities while staying relevant in a rapidly evolving technology jungle is difficult.

As a founder over 40
(continued...)
Filipe Saraiva's blog 2018-03-03 16:46:31

How many metaheuristics fit in a Beetle?

Certain optimization problems are impossible to solve in reasonable computational time; as long as it is not possible to prove that P = NP, there will be no exact algorithm that solves them. On the other hand, these problems are of great importance because they model real-world situations faced by organizations in general. So, what to do? One of the... [Read More]
Filipe Saraiva's blog 2018-02-26 13:55:23

Papo Livre on KDE

Papo Livre is a podcast that has been shaking up the free software scene in the country. Run by friends Antonio Terceiro, Paulo Santana and Thiago Mendonça, the project is already almost 1 year old and has brought listeners a wealth of information and interviews with Brazilians who created or take part in a wide range of free software projects. Weeks ago... [Read More]
Spyder Blog 2018-02-23 00:00:00

Introducing the unittest plugin

Automatic testing can increase the quality of your code. This is especially true of dynamic languages like Python, where a typo may only be noticed when that particular code path is executed. The new Spyder unittest plugin lets you run tests and view the results, all within the IDE. Here, I'll demonstrate what it can do by way of a real-world example.

There are numerous unit testing frameworks available for Python, of which the plugin supports several of the most prominent. However, I'm using my favorite here, pytest. I prefer to write the tests in a separate file from the code, so that's what I'll do here.
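
As a rough illustration of the "tests in a separate file" layout, here is a minimal sketch. The function `mean` and the file names `stats.py` and `test_stats.py` are hypothetical and are not taken from the original post; pytest discovers functions whose names start with `test_` and reports each plain `assert` that fails.

```python
# Hypothetical module under test (would live in, e.g., stats.py).
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


# Hypothetical tests (would live in a separate file, e.g., test_stats.py,
# which would begin with `from stats import mean`).
def test_mean_of_integers():
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_rejects_empty_input():
    # Written without pytest helpers so the sketch has no dependencies.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

Both files are shown in one block only to keep the sketch short; in a real project they would be separate files, which is the layout the test runner would then scan.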

Installing the plugin

If you use the Anaconda distribution (as we recommend), then you can install the Spyder unittest plugin with the command

conda install -c spyder-ide spyder-unittest

This will also grab all its mandatory dependencies (including Spyder itself if necessary). The -c option instructs conda to use the custom channel spyder-ide run by the Spyder

(continued...)
Python – Meta Rabbit 2018-02-05 10:05:03

Python’s Weak Performance Matters

Here is an argument I used to make, but now disagree with: Just to add another perspective, I find many “performance” problems in the real world can often be attributed to factors other than the raw speed of the CPython interpreter. Yes, I’d love it if the interpreter were faster, but in my experience a … Continue reading Python’s Weak Performance Matters
Prabhu Ramachandran 2018-01-31 10:21:00

VTK-8.1.0 wheels for all platforms on pypi!


I cannot believe it has been 6 years since my last blog post!  Anyway, I have some good news to announce here.

In the Python community, VTK has always been somewhat difficult to install (in comparison to pure Python packages). One has been required to either use a specific package management tool or resort to source builds. This has been a major problem when trying to install tools that rely on VTK, like Mayavi.

During the SciPy 2017 conference held in Austin last year, a few of the Kitware developers, notably Jean-Christophe Fillion-Robin (JC for short) and some of the VTK developers got together with some of us from the SciPy community and decided to try and put together wheels for VTK.

JC did the hard work of figuring this out and setting up a nice VTKPythonPackage during the sprints to make this process easy. As of last week (Jan 27, 2018) Mac OS X wheels were not supported. Last weekend,
(continued...)
Geology and Python 2018-01-24 19:00:00

Fast and Reliable Top of Atmosphere (TOA) calculations of Landsat-8 data in Python

How to efficiently extract reflectance information from Landsat-8 Level-1 Data Product images.
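
The linked post is not reproduced here, so the following is only a sketch of the standard USGS conversion from Level-1 digital numbers to TOA reflectance, not necessarily the author's implementation. It assumes the per-band rescaling factors (`REFLECTANCE_MULT_BAND_x`, `REFLECTANCE_ADD_BAND_x`) and `SUN_ELEVATION` have already been read from the scene's MTL metadata file. The function name and the sample values are illustrative.

```python
import math


def toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Convert a Landsat-8 Level-1 digital number to TOA reflectance.

    dn                -- quantized, calibrated pixel value (Qcal)
    mult, add         -- band rescaling factors from the MTL metadata
    sun_elevation_deg -- scene-center sun elevation in degrees
    """
    # Reflectance without the sun-angle correction: M * Qcal + A
    uncorrected = mult * dn + add
    # Correct for the solar elevation angle.
    return uncorrected / math.sin(math.radians(sun_elevation_deg))


# Illustrative values only: 2.0e-5 and -0.1 are typical rescaling factors,
# and the digital number and sun elevation are made up.
reflectance = toa_reflectance(10000, 2.0e-5, -0.1, 30.0)
```

Because the function uses only arithmetic on `dn`, passing a NumPy array of digital numbers instead of a scalar applies the same conversion elementwise, which is the usual way to process a whole band at once.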