SciPy

Planet SciPy

Matthieu Brucher's blog 2018-10-16 07:09:00

Audio ToolKit: Moving to C++17

Audio ToolKit started with only C++11 a long time ago, and now with version 3.1, it’s going to be full C++17. Let’s start with the problem. In Audio ToolKit, I’m using a set of meta programming functions to enable automatic conversions between types. This enables the user to connect a float input to a double […]

So you want to contribute to open source

Welcome new open source contributor!

I appreciated receiving the e-mail where you said you were excited about getting into open source and were particularly interested in working on a project that I maintain. This post has a few thoughts on the topic.

First, please forgive me for sending you to this post rather than responding with a personal e-mail. Your situation is common today, so I thought I’d write up thoughts in a public place, rather than respond personally.

This post has two parts:

  1. Some pragmatic steps on how to get started
  2. A personal recommendation to think twice about where you focus your time

Look for good first issues on GitHub

Most open source software (OSS) projects have a “Good first issue” label on their GitHub issue tracker. Here is a screenshot of how to find the “good first issue” label on the Pandas project:

(note that this may be named something else like “Easy to fix”)

This contains a list of issues that are important, but also

(continued...)
Anaconda 2018-10-10 15:00:44

Anaconda Enterprise 5.2.2: Now With Apache Zeppelin and GPU improvements

Anaconda Enterprise 5.2 introduced exciting features such as GPU-acceleration, scalable machine learning, and cloud-native model management in July. Today we’re releasing Anaconda Enterprise 5.2.2 with a number of enhancements in IDEs (Integrated Development Environments), GPU resource management, source code control, and (of course) bug fixes. One of the biggest new benefits is the addition of Apache Zeppelin …

The post Anaconda Enterprise 5.2.2: Now With Apache Zeppelin and GPU improvements appeared first on Anaconda.

Anaconda 2018-10-10 10:00:57

Bringing Dataframe Acceleration to the GPU with RAPIDS Open-Source Software from NVIDIA

Today we are excited to talk about the RAPIDS GPU dataframe release along with our partners in this effort: NVIDIA, BlazingDB, and Quansight. RAPIDS is the culmination of 18 months of open source development to address a common need in data science: fast, scalable processing of tabular data for extract-transform-load (ETL) operations. ETL tasks typically …

The post Bringing Dataframe Acceleration to the GPU with RAPIDS Open-Source Software from NVIDIA appeared first on Anaconda.

Anaconda 2018-10-05 15:32:34

Intake: Parsing Data from Filenames and Paths

By Julia Signell. Motivation: Do you have data in collections of files, where information is encoded both in the contents and the file/directory names? Perhaps something like '{year}/{month}/{day}/{site}/measurement.csv'? This is a very common problem for which people build custom code all the time. Intake provides a systematic way to declare that information in a concise spec. …

The post Intake: Parsing Data from Filenames and Paths appeared first on Anaconda.
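
To make the idea in the Intake excerpt above concrete, here is a minimal standard-library sketch of pulling fields out of a path that follows a template such as '{year}/{month}/{day}/{site}/measurement.csv'. It does not use Intake and is not Intake's API; the helper names `pattern_to_regex` and `parse_path` are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Compile a '{field}'-style path template into a regex with named groups."""
    # re.split with one capture group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)          # literal text such as "/" or ".csv"
        else:
            regex += f"(?P<{part}>[^/]+)"     # a field matches one path component
    return re.compile(regex)


def parse_path(pattern, path):
    """Return a dict of field values, or None if the path does not fit the template."""
    match = pattern_to_regex(pattern).fullmatch(path)
    return match.groupdict() if match else None


TEMPLATE = "{year}/{month}/{day}/{site}/measurement.csv"
fields = parse_path(TEMPLATE, "2018/10/05/siteA/measurement.csv")
```

All extracted values are strings; converting them to integers or dates is left to the caller in this sketch.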

Anaconda 2018-10-04 14:30:59

Preparing Your Organization for Implementing an AI Platform

By Victor Ghadban. You already know that implementing an enterprise-ready AI enablement platform is key to executing your organization’s AI and machine learning initiatives. But can software so complex really be easy to implement? How can you avoid disruptions to your business? What can you do to prepare? After spending the last 20 years working …

The post Preparing Your Organization for Implementing an AI Platform appeared first on Anaconda.

Anaconda 2018-10-02 20:31:09

Anaconda Distribution 5.3.0 Released

We’re excited to announce the release of Anaconda Distribution 5.3.0! Anaconda Distribution is the world’s most popular and easiest way to learn and perform data science and machine learning. Here’s a rundown of new features. In addition to our Python 2.7 Anaconda installers, as well as Python 3.6 Anaconda metapackages, Anaconda Distribution 5.3 is compiled …

The post Anaconda Distribution 5.3.0 Released appeared first on Anaconda.

.pyMadeThis 2018-09-30 06:00:00

Dictionary Views & Set Operations — Working with dictionary view objects

The keys, values and items from a dictionary can be accessed using the .keys(), .values() and .items() methods. These methods return view objects which provide a view on the source dictionary.

The view objects dict_keys and dict_items support set-like operations (the latter only when all values are hashable) which ...
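
A short illustration of the set-like behaviour described above, using two made-up price dictionaries (the data is purely illustrative):

```python
prices_a = {"apple": 1.0, "pear": 2.5, "kiwi": 0.8}
prices_b = {"pear": 2.5, "kiwi": 0.9, "plum": 3.0}

# dict_keys supports set operators directly; each result is a plain set.
common_keys = prices_a.keys() & prices_b.keys()   # keys present in both
only_in_a = prices_a.keys() - prices_b.keys()     # keys present only in prices_a
all_keys = prices_a.keys() | prices_b.keys()      # union of keys

# dict_items works the same way when all values are hashable:
# an item is shared only if both the key and the value agree.
same_items = prices_a.items() & prices_b.items()

# Views are live: they reflect later changes to the source dictionary.
keys_view = prices_a.keys()
prices_a["fig"] = 4.0
fig_visible = "fig" in keys_view
```

Note that `.values()` views do not support these operators, since values need not be unique or hashable.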

Filipe Saraiva's blog 2018-09-28 04:53:50

Onward with Ciro!

With only a few days left before the first round of the elections, I am taking the opportunity to declare my vote for Ciro Gomes and to invite friends to reflect and also vote for him. In a rather generous conception of political parties, they are organizations structured around an idea of social order that try, through... [Read More]
Matthieu Brucher's blog 2018-09-25 07:19:33

Announcement: Audio TK 3.0.0

ATK is updated to 3.0.0 with a major ABI break and code quality improvement (see here). Bugs in different areas were fixed. Development for additional modules was also simplified (the modelling lite is such a project based on Audio Toolkit). Download link: ATK 3.0.0 Changelog: 3.0.0 * Change size for gsl::index everywhere (change of ABI) […]
.pyMadeThis 2018-09-23 07:00:00

3D wireframe cube with MicroPython — Basic 3D model rotation and projection

An ESP8266 is never going to compete with an actual graphics card. It certainly won't produce anything approaching modern games. But it still makes a nice platform to explore the basics of 3D graphics. In this short tutorial we'll go through the basics of creating a 3D scene ...
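
As a rough sketch of the two steps named in the title, rotation and projection, the following plain Python rotates the corners of a cube about the Y axis and applies a simple perspective divide. It is not the tutorial's code: the viewer distance, scale factor and screen centre are assumed values, and the display's line-drawing call is left out.

```python
import math

# Eight corners of a cube centred on the origin.
VERTICES = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]

# An edge joins two corners that differ in exactly one coordinate.
EDGES = [
    (i, j)
    for i in range(8)
    for j in range(i + 1, 8)
    if sum(a != b for a, b in zip(VERTICES[i], VERTICES[j])) == 1
]


def rotate_y(point, angle):
    """Rotate a 3D point about the Y axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (x * c + z * s, y, -x * s + z * c)


def project(point, distance=4.0, scale=20.0, cx=64, cy=32):
    """Perspective-project a 3D point to integer screen coordinates.

    distance, scale, cx and cy are assumptions chosen for a small display.
    """
    x, y, z = point
    factor = scale / (z + distance)   # points further away shrink towards the centre
    return (round(cx + x * factor), round(cy - y * factor))


# One frame: rotate every corner, then project it to 2D.
frame = [project(rotate_y(v, math.radians(30))) for v in VERTICES]
```

To draw the wireframe, each pair `(a, b)` in `EDGES` would become a line from `frame[a]` to `frame[b]` using whatever line primitive the display driver offers.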

Filipe Saraiva's blog 2018-09-22 21:13:08

Akademy 2018

Look for your favorite KDE contributor in the official Akademy 2018 group photo. I was in Vienna to attend Akademy 2018, the annual KDE meeting. This was my fourth Akademy, preceded by Berlin’2012 (actually, Desktop Summit), Brno’2014, and Berlin’2016 (together with QtCon). Interestingly, I go to Akademy every... [Read More]
Anaconda 2018-09-21 16:24:24

AI Opportunities for Financial Services Companies

By Michael Grant. AI is undeniably a hot topic right now, and financial services companies are not immune to the hype. And in truth, they shouldn’t be: the applications of advanced AI within financial services are numerous, and the potential for cost savings and new value generation is high. At the same time, the financial services …

The post AI Opportunities for Financial Services Companies appeared first on Anaconda.

Spyder Blog 2018-09-21 00:00:00

QtConsole 4.4 Released!

We're excited to announce a significant update to QtConsole—the package that powers Spyder's IPython Console interface—which the Spyder team maintains in collaboration with Project Jupyter. Two of the biggest changes—user-selectable syntax highlighting themes, and enhanced external editor/IDE integration—are already built right into Spyder, so they'll likely be of more interest if you use QtConsole standalone or with another editor/IDE. However, most of the other changes should prove quite useful within Spyder as well, and many were in fact suggested and even implemented by users of our IDE. Particular highlights include a block indent/unindent feature, Select-All (Ctrl-Shift-A) being made cell-specific, Ctrl-Backspace and Ctrl-Delete behaving more intelligently across whitespace and line boundaries, Ctrl-D allowing you to easily exit ipdb, input() and the like, and numerous smaller enhancements and bug fixes. If you'd like to learn more about what's new, please check out our article over on the Jupyter blog, where we go over the major changes in more detail, with plenty

(continued...)
Matthieu Brucher's blog 2018-09-20 07:28:44

Book review: Continuous Delivery With Docker And Jenkins

A decade ago, the objective was to have a build farm and do continuous integration (on each commit, build the application and run unit tests). Now, the objective is continuous delivery. This means that the new build is directly put into production. All the major applications are doing this, from Chrome to Spotify. You may […]
Anaconda 2018-09-18 14:34:55

Anaconda and Kx Systems Partner to Deliver kdb+ Database System and Related Machine Learning Libraries

Anaconda, Inc., the most popular Python data science platform provider with 2.5 million downloads per month, is pleased to announce an exciting new partnership with Kx Systems, a provider of fast, efficient, and flexible tools for processing real-time and historical data. As part of our partnership, Anaconda has added the kdb+ database system, and related …

The post Anaconda and Kx Systems Partner to Deliver kdb+ Database System and Related Machine Learning Libraries appeared first on Anaconda.

Matthieu Brucher's blog 2018-09-18 07:52:31

Compiling C++ code in memory with clang

I have tried to find the proper recipes to compile C++ code on the fly with clang and LLVM. It’s actually not that easy to achieve if you are not targeting LLVM Intermediate Representation, and unfortunately, the code here, working for LLVM 7, may not work for LLVM 8. Or 6. The pipeline There are […]

Dask Development Log

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Since the last update in the 0.19.0 release blogpost two weeks ago we’ve seen activity in the following areas:

  1. Update Dask examples to use JupyterLab on Binder
  2. Render Dask examples into static HTML pages for easier viewing
  3. Consolidate and unify disparate documentation
  4. Retire the hdfs3 library in favor of the solution in Apache Arrow.
  5. Continue work on hyper-parameter selection for incrementally trained models
  6. Publish two small bugfix releases
  7. Blogpost from the Pangeo community about combining Binder with Dask
  8. Skein/Yarn Update

1: Update Dask Examples to use JupyterLab extension

The new dask-labextension embeds Dask’s dashboard plots into a JupyterLab session so that you can get easy access to information

(continued...)
Gaël Varoquaux - programming 2018-09-16 22:00:00

A foundation for scikit-learn at Inria

We have just announced that a foundation will be supporting scikit-learn at Inria [1]: scikit-learn.fondation-inria.fr

Growth and sustainability

This is an exciting turn for us, because it enables us to receive private funding. As a result, we will be able to have secure employment for some existing core …

Anaconda 2018-09-14 21:08:09

Key Trends and Takeaways from Strata New York 2018

By Elizabeth Winkler. Another Strata conference has come and gone. We had an incredible time meeting with a huge number of Anaconda users who came by our booth to chat! We also noticed some really interesting trends when it comes to the future of data science, machine learning, and AI. The future of ML/AI is containerized. …

The post Key Trends and Takeaways from Strata New York 2018 appeared first on Anaconda.

Leonardo Uieda 2018-09-14 12:00:00

Introducing Verde

Verde is a Python library for processing spatial data (bathymetry, geophysics surveys, etc.) and interpolating it on regular grids (i.e., gridding).

It implements Green's-function-based interpolation methods and other data processing routines. The type of gridding implemented in Verde essentially fits various linear models to spatial data and uses them to predict new data on regular grids, which is what a lot of machine learning is all about. So Verde's gridder API is inspired by scikit-learn, the state of the art for machine learning in Python. The Green's functions that make up the Jacobian matrix (aka sensitivity or feature matrix) of the linear models generally come from elastic deformation theory. For example, the bi-harmonic spline (Sandwell, 1987) implemented in verde.Spline comes from the deformation of a thin elastic plate.
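
To illustrate the "linear model built from Green's functions" idea, here is a minimal NumPy sketch of a fit/predict pair. It is not Verde's implementation or API: the function names are hypothetical, one point force is placed at every data point (an assumption), and the 2D bi-harmonic Green's function is taken to be g(r) = r²(log r − 1).

```python
import numpy as np


def greens_biharmonic(r):
    """Assumed 2D bi-harmonic Green's function g(r) = r^2 (log r - 1), with g(0) = 0."""
    r = np.asarray(r, dtype=float)
    safe_r = np.where(r > 0, r, 1.0)   # avoid log(0); those entries are masked below
    return np.where(r > 0, safe_r**2 * (np.log(safe_r) - 1.0), 0.0)


def jacobian(x, y, x_force, y_force):
    """Jacobian matrix: one row per observation point, one column per point force."""
    r = np.hypot(x[:, None] - x_force[None, :], y[:, None] - y_force[None, :])
    return greens_biharmonic(r)


def fit(x, y, data):
    """Least-squares estimate of the force amplitudes, with forces at the data points."""
    forces, *_ = np.linalg.lstsq(jacobian(x, y, x, y), data, rcond=None)
    return forces


def predict(x_new, y_new, x_force, y_force, forces):
    """Evaluate the fitted linear model at new coordinates."""
    return jacobian(x_new, y_new, x_force, y_force) @ forces
```

Predicting on a regular grid then amounts to calling `predict` with the flattened output of `np.meshgrid` and reshaping the result.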

I submitted a

(continued...)
Pythonic Perambulations 2018-09-13 17:00:00

The Waiting Time Paradox, or, Why Is My Bus Always Late?

Image Source: Wikipedia License CC-BY-SA 3.0

If you, like me, frequently commute via public transit, you may be familiar with the following situation:

You arrive at the bus stop, ready to catch your bus: a line that advertises arrivals every 10 minutes. You glance at your watch and note the time... and when the bus finally comes 11 minutes later, you wonder why you always seem to be so unlucky.

Naïvely, you might expect that if buses are coming every 10 minutes and you arrive at a random time, your average wait would be something like 5 minutes. In reality, though, buses do not arrive exactly on schedule, and so you might wait longer. It turns out that under some reasonable assumptions, you can reach a startling conclusion:

When waiting for a bus that comes on average every 10 minutes, your average waiting time will be 10 minutes.

This is what is sometimes known as the waiting time paradox.
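
A quick way to check the claim numerically is a small simulation. The sketch below assumes bus arrivals form a Poisson process (exponentially distributed gaps averaging 10 minutes), which is one set of assumptions under which the result holds; the function name and parameters are illustrative.

```python
import bisect
import itertools
import random


def simulate_mean_wait(n_buses=100_000, n_passengers=100_000, mean_gap=10.0, seed=0):
    """Average wait of passengers arriving at uniformly random times."""
    rng = random.Random(seed)
    # Exponential gaps between buses make the arrival times a Poisson process.
    gaps = (rng.expovariate(1.0 / mean_gap) for _ in range(n_buses))
    bus_times = list(itertools.accumulate(gaps))

    total_wait = 0.0
    for _ in range(n_passengers):
        arrival = rng.uniform(0.0, bus_times[-1])
        # First bus at or after the passenger's arrival time.
        next_bus = bus_times[bisect.bisect_left(bus_times, arrival)]
        total_wait += next_bus - arrival
    return total_wait / n_passengers


mean_wait = simulate_mean_wait()
```

Under these assumptions the simulated mean wait comes out close to the full 10-minute average gap rather than half of it, because a randomly timed passenger is more likely to land in a long gap than a short one.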

I've encountered this idea before, and always wondered

(continued...)
Anaconda 2018-09-10 16:37:37

Intake: Caching Data on First Read Makes Future Analysis Faster

By Mike McCarty. Intake provides easy access to data sources from remote/cloud storage. However, for large files, the cost of downloading files every time data is read can be extremely high. To overcome this obstacle, we have developed a “download once, read many times” caching strategy to store and manage data sources on the local file system. …

The post Intake: Caching Data on First Read Makes Future Analysis Faster appeared first on Anaconda.

Anaconda 2018-09-10 12:00:28

AI Enablement Platform for Teams at Scale—Accelerate Your AI/ML Productivity with Anaconda Enterprise and Cisco UCS

By Daniel Rodriguez. Anaconda Enterprise is a software platform for developing, governing, and automating data science and machine learning pipelines from laptop to production. It is the de-facto standard for data science and machine learning, with over 6 million data scientists using its open source solution locally to develop and score Machine Learning models. Anaconda …

The post AI Enablement Platform for Teams at Scale—Accelerate Your AI/ML Productivity with Anaconda Enterprise and Cisco UCS appeared first on Anaconda.

Filipe Saraiva's blog 2018-09-09 15:17:30

Akademy 2018

Look for your favorite KDE contributor in the Akademy 2018 group photo. This year I was in Vienna to attend Akademy 2018, the annual KDE world summit. It was my fourth Akademy after Berlin’2012 (in fact, Desktop Summit), Brno’2014, and Berlin’2016 (together with QtCon). Interestingly, I go to Akademy every 2 years – let’s try... [Read More]
fa.bianp.net 2018-09-05 22:00:00

Three Operator Splitting

I discuss a recently proposed optimization algorithm: the Davis-Yin three operator splitting.

Dask Release 0.19.0

This work is supported by Anaconda Inc.

I’m pleased to announce the release of Dask version 0.19.0. This is a major release with bug fixes and new features. The last release was 0.18.2 on July 23rd. This blogpost outlines notable changes since the last release blogpost for 0.18.0 on June 14th.

You can conda install Dask:

conda install dask

or pip install from PyPI:

pip install dask[complete] --upgrade

Full changelogs are available here:

Notable Changes

A ton of work has happened over the past two months, but most of the changes are small and diffuse. Stability, feature parity with upstream libraries (like Numpy and Pandas), and performance have all significantly improved, but in ways that are difficult to condense into blogpost form.

That being said, here are a few of the more exciting changes in the new release.

Python Versions

We’ve dropped official support for Python 3.4 and added official support for Python 3.7.

Deploy on Hadoop Clusters

Over the past few months Jim Crist has built a suite of

(continued...)
NumFOCUS 2018-09-04 19:00:10

NumFOCUS Sustainer Weeks

The post NumFOCUS Sustainer Weeks appeared first on NumFOCUS.

Matthieu Brucher's blog 2018-09-04 07:36:41

Book: Building Machine Learning Systems with Python – third edition

A few years ago, Packt Publishing contacted me to be a technical reviewer for the first edition of Building Machine Learning Systems with Python, and I was impressed by the writing of Luis Pedro Coelho and Willi Richert. For the second edition, I was again a technical reviewer. Writing is not easy, especially when it’s not […]
Planet SciPy – I Love Symposia! 2018-08-30 04:48:05

Summer school announcement: 2nd Advanced Scientific Programming in Python (ASPP) Asia Pacific!

The Advanced Scientific Programming in Python (ASPP) summer school has had 10 successful iterations in Europe and one iteration here in Melbourne earlier this year. Another European iteration is starting next week in Camerino, Italy. Now, thanks to the generous sponsorship of CSIRO, and the efforts of Benjamin Schwessinger and Genevieve Buckley, two alumni from …
Living in an Ivory Basement 2018-08-28 22:00:00

Abstract for SIAM: Supporting and Sustaining Open Source Software Development: the Commons Perspective

How do we support and sustain open source software development?

Matthieu Brucher's blog 2018-08-28 07:31:27

Analog modelling: The Moog ladder filter emulation in Python

After my previous post on SPICE modelling in Python, I need a good supporting example to work up to on-the-fly compilation in C++. This schema will also require some changes to support more than simple nodal analysis, so this now becomes Modified Nodal Analysis with state equations. The simple model I […]

High level performance of Pandas, Dask, Spark, and Arrow

This work is supported by Anaconda Inc

Question

How does Dask dataframe performance compare to Pandas? Also, what about Spark dataframes and what about Arrow? How do they compare?

I get this question every few weeks. This post is to avoid repetition.

Caveats
  1. This answer is likely to change over time. I’m writing this in August 2018
  2. This question and answer are very high level. More technical answers are possible, but not contained here.

Answers

Pandas

If you’re coming from Python and have smallish datasets then Pandas is the right choice. It’s usable, widely understood, efficient, and well maintained.

Benefits of Parallelism

The performance benefit (or drawback) of using a parallel dataframe like Dask dataframes or Spark dataframes over Pandas will differ based on the kinds of computations you do:

  1. If you’re doing small computations then Pandas is always the right choice. The administrative costs of parallelizing will outweigh any benefit. You should not parallelize if your computations are taking less

(continued...)
.pyMadeThis 2018-08-27 16:00:00

Displaying images on OLED screens — Using 1-bpp images in MicroPython

We've previously covered the basics of driving OLED I2C displays from MicroPython, including simple graphics commands and text. Here we look at displaying monochrome 1 bit-per-pixel images and animations using MicroPython on a Wemos D1.
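
For readers unfamiliar with 1 bit-per-pixel data, the sketch below packs rows of 0/1 pixels into bytes, eight pixels per byte. The horizontal, most-significant-bit-first layout is an assumption made for illustration; a given display or framebuffer format may expect a different bit or byte order, and `pack_1bpp` is a hypothetical helper, not part of MicroPython.

```python
def pack_1bpp(rows):
    """Pack rows of 0/1 pixels into bytes: 8 pixels per byte, leftmost pixel in bit 7.

    Rows whose width is not a multiple of 8 are padded with zero bits on the right.
    """
    packed = bytearray()
    for row in rows:
        for start in range(0, len(row), 8):
            byte = 0
            for bit, pixel in enumerate(row[start:start + 8]):
                if pixel:
                    byte |= 0x80 >> bit   # set the bit for this pixel
            packed.append(byte)
    return bytes(packed)


# A 8x2 test image: two lit end pixels on the first row, a solid second row.
image = [
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
data = pack_1bpp(image)
```

Packed this way, a 128x32 image occupies 128 * 32 / 8 = 512 bytes, which is why 1-bpp formats suit memory-constrained boards.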

Processing the images and correct choice of image-formats is important to get the most ...

.pyMadeThis 2018-08-26 12:00:00

Dictionaries — An almost complete guide to Python's key:value store

Dictionaries are key-value stores, meaning they store, and allow retrieval of data (or values) through a unique key. This is analogous with a real dictionary where you look up definitions (data) using a given key — the word. Unlike a language dictionary however, keys in Python dictionaries are not alphabetically sorted ...

.pyMadeThis 2018-08-25 08:00:00

Driving I2C OLED displays with MicroPython — I2C monochrome displays with SSD1306

These mini monochrome OLED screens make great displays for projects — perfect for data readout, simple UIs or monochrome games.

Requirements

  • Wemos D1: v2.2+ or good imitations.
  • 0.91in OLED Screen: 128x32 pixels, I2C interface.
  • Breadboard: any size will do.
  • Wires: loose ends, or jumper leads.

Setting ...
.pyMadeThis 2018-08-23 19:00:00

Raindar — Desktop daily weather, forecast app in PyQt

The Raindar UI was created using Qt Designer and saved as a .ui file, which is available for download. This was converted to an importable Python file using pyuic5.

API key

Before running the application you need to obtain an API key from OpenWeatherMap.org. This key is unique to you ...

Public Institutions and Open Source Software

As general purpose open source software displaces domain-specific all-in-one solutions, many institutions are re-assessing how they build and maintain software to support their users. This is true across for-profit enterprises, government agencies, universities, and home-grown communities.

While this shift brings opportunities for growth and efficiency, it also raises questions and challenges about how these institutions should best serve their communities as they grow increasingly dependent on software developed and controlled outside of their organization.

  • How do they ensure that this software will persist for many years?
  • How do they influence this software to better serve the needs of their users?
  • How do they transition users from previous all-in-one solutions to a new open source platform?
  • How do they continue to employ their existing employees who have historically maintained software in this field?
  • If they have a mandate to support this field, what is the best role for them to play, and how can they justify their efforts to the groups that control their budget?

This blogpost

(continued...)

Cloud Lock-in and Open Standards

This post is from conversations with Peter Wang, Yuvi Panda, and several others. Yuvi expresses his own views on this topic on his blog.

Summary

When moving to the cloud we should be mindful to avoid vendor lock-in by adopting open standards.

Adoption of cloud computing

Cloud computing is taking over both for-profit enterprises and public/scientific institutions. The Cloud is cheap, flexible, requires little up-front investment, and enables greater collaboration. Cloud vendors like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure compete to create stable, easy to use platforms to serve the needs of a variety of institutions, both big and small. This presents a great opportunity for society, but also a risk of future lock-in at a large scale.

Cloud vendors build services to lock in users

Some of the competition between cloud vendors is about providing lower costs, higher availability, improved scaling, and so on, that are strictly a benefit for consumers. This is great.

However some of the competition is in the form of

(continued...)
Living in an Ivory Basement 2018-08-17 22:00:00

Can bits be the basis for a digital commons? (No.)

Bits cannot be the basis for a digital commons, because they are not rivalrous.

Spyder Blog 2018-08-14 00:00:00

Spyder 3.3.0 and 3.3.1 released!

We're pleased to release the next significant update in the stable Spyder 3 line, 3.3.0, along with its follow-on bugfix point release, 3.3.1, which is now live on PyPI and conda. As always, you can update with conda update spyder in the Anaconda Prompt/Terminal/command line (on Windows/macOS/Linux, respectively) if on Anaconda (recommended), or pip install --upgrade spyder otherwise. If you run into any trouble, please carefully read our new installation documentation and consult our Troubleshooting Guide, which contains straightforward solutions to the vast majority of install-related issues users have reported.

As a new minor version (3.3), it makes several substantial changes to Spyder's underpinnings that deserve some explanation, particularly the newly modular and portable console system that's now separated into its own spyder-kernels package, opening up several new options for users running Spyder in different environments. There's also a brand-new error reporting process, new options in the IPython console, usability and performance improvements for the Variable Explorer, multiple new and changed dependency requirements

(continued...)
While My MCMC Gently Samples 2018-08-13 14:00:00

Hierarchical Bayesian Neural Networks with Informative Priors

(c) 2018 by Thomas Wiecki

Imagine you have a machine learning (ML) problem but only small data (gasp, yes, this does exist). This often happens when your data set is nested -- you might have many data points, but only few per category. For example, in ad-tech you may want to predict …

Spyder Blog 2018-08-13 00:00:00

Spyder featured on Episode 1 of Open Source Directions web show

Quansight, the company recently founded by NumPy, SciPy and Anaconda creator Travis Oliphant to help connect companies with open source communities built around data science and machine learning, just released Episode 1 of its live webcast series, and it was all about Spyder! Spyder maintainer Carlos Córdoba, recently hired by Quansight and funded part-time to work on Spyder development as we announced a few weeks ago, was the featured guest on the show.

Carlos first shared his perspective on some of the key moments in Spyder's nearly 10-year development history, from its original creation by Pierre Raybaut and Carlos' initial involvement in the project to its more recent challenges and successes. He also demonstrated basic usage of Spyder, as well as some of its standout features, in a live on-screen demo. Carlos then went on to outline the current roadmap for Spyder 4 in the near future, and explained some of the key new features planned for it. Finally, he took

(continued...)
Neural Ensemble News 2018-08-10 17:48:00

NeuroML2/LEMS is moving into Neural Mass Models and whole brain networks

In the last months, as part of the Google Summer of Code 2018, I have been working on a project that aimed to implement neuronal models which represent averaged population activity on NeuroML2/LEMS. The project was supported by the INCF organisation and my mentor, Padraig Gleeson, and I had 3 months to shape and bring to life all the ideas that we had in our heads. This blog post summarises the core motivation of the project, the technical challenges, what I have done, and future steps.

Background
NeuroML version 2 and LEMS were introduced in order to standardise the description of neuroscience computational models and facilitate the shareability of results among different research groups. However, so far, NeuroML2/LEMS have focused on modelling spiking neurons and how information is exchanged between them in networks. With the introduction of neural mass models, NeuroML2/LEMS can be extended to study interactions between large-scale systems such as cortical regions and indeed whole brain dynamics. To achieve this,
(continued...)
Living in an Ivory Basement 2018-08-09 22:00:00

"Labor" and "Engaged effort"

Are "effort" and "labor" the same?

Building SAGA optimization for Dask arrays

This work is supported by ETH Zurich, Anaconda Inc, and the Berkeley Institute for Data Science

At a recent Scikit-learn/Scikit-image/Dask sprint at BIDS, Fabian Pedregosa (a machine learning researcher and Scikit-learn developer) and Matthew Rocklin (Dask core developer) sat down together to develop an implementation of the incremental optimization algorithm SAGA on parallel Dask datasets. The result is a sequential algorithm that can be run on any dask array, and so allows the data to be stored on disk or even distributed among different machines.

It was interesting both to see how the algorithm performed and to see the ease and challenges of running a research algorithm on a Dask distributed dataset.

Start

We started with an initial implementation that Fabian had written for Numpy arrays using Numba. The following code solves an optimization problem of the form

min_x \sum_{i=1}^n f(a_i^t x, b_i)

import numpy as np
from numba import njit
from sklearn.linear_model.sag import get_auto_step_size
from sklearn.utils.extmath import row_norms

@njit
def deriv_logistic(p, y):
    # derivative of logistic loss
 
(continued...)

Dask Development Log

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Over the last two weeks we’ve seen activity in the following areas:

  1. An experimental Actor solution for stateful processing
  2. Machine learning experiments with hyper-parameter selection and parameter servers.
  3. Development of more preprocessing transformers
  4. Statistical profiling of the distributed scheduler’s internal event loop thread and internal optimizations
  5. A new release of dask-yarn
  6. A new narrative on dask-stories about modelling mobile networks
  7. Support for LSF clusters in dask-jobqueue
  8. Test suite cleanup for intermittent failures

Stateful processing with Actors

Some advanced workloads want to directly manage and mutate state on workers. A task-based framework like Dask can be forced into this kind of workload using long-running-tasks, but it’s an uncomfortable experience. To address this we’ve been

(continued...)
Gaël Varoquaux - programming 2018-07-31 22:00:00

Sprint on scikit-learn, in Paris and Austin

Two weeks ago, we held a scikit-learn sprint in Austin and Paris. Here is a brief report on progress and challenges.

Several sprints

We actually held two sprints in Austin: one open sprint, at the SciPy conference sprints, which was open to new contributors, and one core sprint, for more …

Leonardo Uieda 2018-07-26 12:00:00

Websites for Earth Scientists on the academic job hunt

This is a list of the websites I use to search for academic jobs in the Earth Sciences (geophysics, geology, oceanography, meteorology, etc). They've been very useful to me (I found my current position through the CIG mailing list) and I hope that this post can help others who are looking to take the next step in their academic careers.

These sites list everything from Masters and PhD scholarships to postdoc positions and tenure-track professorships. Note that they are biased toward the US, Canada, Oceania, and Europe.

Mailing lists

Sign up for these and get email updates when new opportunities are posted (most are updated daily):

  • ES_JOBS_NET: I get around 10 emails from this list a day. Lately, I'm seeing a lot of
(continued...)
Spyder Blog 2018-07-23 00:00:00

State of the Spyder, Part 2: Looking up

After sharing some major milestones, development progress, and other tidbits from the past six months in Part 1 of this series (check that one out first if you haven't already), we now have some amazing news to share with you all here in Part 2, along with other status updates. That's not all, though—Part 3 will look ahead toward Spyder 4 and beyond, unveiling and explaining our full roadmap and going over the future possibilities even further afield.

Spyder Wins NumFOCUS Development Grant

First up, we're thrilled to announce a major part of what's making that plan possible (along with your support, of course!). This May, Spyder was awarded a $3000 development grant from NumFOCUS, an organization promoting better science through open code, to help with finishing Spyder 4! NumFOCUS is a nonprofit dedicated to supporting key scientific computing projects; promoting sustainability in the open source ecosystem; educating the next generation of scientists, engineers, developers and data analysts through their flagship

(continued...)

Pickle isn't slow, it's a protocol

This work is supported by Anaconda Inc

tl;dr: Pickle isn’t slow, it’s a protocol. Protocols are important for ecosystems.

A recent Dask issue showed that using Dask with PyTorch was slow because sending PyTorch models between Dask workers took a long time (Dask GitHub issue).

This turned out to be because serializing PyTorch models with pickle was very slow (1 MB/s for GPU based models, 50 MB/s for CPU based models). There is no architectural reason why this needs to be this slow. Every part of the hardware pipeline is much faster than this.

We could have fixed this in Dask by special-casing PyTorch models (Dask has its own optional serialization system for performance), but being good ecosystem citizens, we decided to raise the performance problem in an issue upstream (PyTorch GitHub issue). This resulted in a five-line fix to PyTorch that turned a 1-50 MB/s serialization bandwidth into a 1 GB/s bandwidth, which is more than fast enough for many use cases (PR to PyTorch).
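To make the "protocol" point concrete, here is a minimal sketch of how a library can tell pickle how to serialize its own objects by defining `__reduce__`. The `Tensor` class and `_rebuild_tensor` helper are hypothetical stand-ins invented for illustration; this is not PyTorch's code and not the actual upstream fix.

```python
import pickle


def _rebuild_tensor(data, shape):
    # Hypothetical helper: pickle calls this at load time to rebuild the object.
    return Tensor(data, shape)


class Tensor:
    """Hypothetical stand-in for a large array-like object."""

    def __init__(self, data, shape):
        self.data = data    # raw bytes of the buffer
        self.shape = shape  # tuple describing the layout

    def __reduce__(self):
        # Tell pickle how to serialize this object: a callable plus the
        # arguments to call it with. The buffer is handed over as a single
        # bytes object rather than element by element.
        return (_rebuild_tensor, (self.data, self.shape))


original = Tensor(bytes(16), (4, 4))
restored = pickle.loads(pickle.dumps(original))
```

Because the hook lives on the class, any tool that calls `pickle.dumps` picks up the library's own serialization logic without special-casing that library.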

    
(continued...)
Leonardo Uieda 2018-07-20 12:00:00

Introducing Pooch

A friend to fetch your sample data files.

Pooch is a Python package that manages downloading data files over HTTP and storing them in a local directory. It is meant to be used by other Python libraries that ship sample data files for use in documentation, workshops, demos, etc.

For example, your package could define a datasets.py module that has functions to load sample data (like scikit-learn does). If you want the data to live on the web (like in the GitHub repo) instead of shipping it with your package, Pooch can keep track of it and download it to the user's computer only when it's needed.
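The download-only-when-needed idea can be sketched with the standard library alone. The code below is not Pooch's API; the `fetch` function, the `registry` layout (file name mapped to a SHA-256 digest), and the `download` parameter are all assumptions made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def fetch(name, registry, cache_dir, base_url,
          download=urllib.request.urlretrieve):
    """Return a local path for `name`, downloading it only if needed.

    `registry` maps file names to expected SHA-256 digests (hypothetical
    layout). `download(url, destination)` is injectable so the function
    can be exercised without network access.
    """
    if name not in registry:
        raise ValueError(f"{name!r} is not in the registry")
    target = Path(cache_dir) / name
    # Download when the file is missing or its contents do not match.
    if not target.exists() or sha256_of(target) != registry[name]:
        target.parent.mkdir(parents=True, exist_ok=True)
        download(base_url + name, str(target))
        if sha256_of(target) != registry[name]:
            raise IOError(f"hash mismatch for {name!r}")
    return target
```

A second call for the same file finds the cached copy with a matching hash and returns its path without downloading again.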

This is what a datasets.py module would look like using Pooch:

"""
Module mypackage/datasets.py
"""
import pooch

# Get the version string from your project. You have one of these, right?
from
(continued...)

Dask Development Log, Scipy 2018

This work is supported by Anaconda Inc

To increase transparency I’m trying to blog more often about the current work going on around Dask and related projects. Nothing here is ready for production. This blogpost is written in haste, so refined polish should not be expected.

Last week many Dask developers gathered for the annual SciPy 2018 conference. As a result, very little work was completed, but many projects were started or discussed. To reflect this change in activity this blogpost will highlight possible changes and opportunities for readers to further engage in development.

Dask on HPC Machines

The dask-jobqueue project was a hit at the conference. Dask-jobqueue helps people launch Dask on traditional job schedulers like PBS, SGE, SLURM, Torque, LSF, and others that are commonly found on high performance computers. These schedulers are very common among scientific, research, and high performance machine learning groups, but they are often a bit hard to use with anything other than MPI.

This project came up in the Pangeo talk, lightning talks, and the Dask

(continued...)
Planet SciPy – I Love Symposia! 2018-07-12 18:58:35

The road to scikit-image 1.0

This is the first in a series of posts about the joint scikit-image, scikit-learn, and dask sprint that took place at the Berkeley Institute of Data Science, May 28-Jun 1, 2018. In addition to the dask and scikit-learn teams, the sprint brought together three core developers of scikit-image (Emmanuelle Gouillart, Stéfan van der Walt, and … Continue reading The road to scikit-image 1.0
Living in an Ivory Basement 2018-07-08 22:00:00

The Open Source Anti-Sisyphean League

We need an Open Source Anti-Sisyphean League!

python – Dr. Randal S. Olson 2018-07-04 20:41:32

Does batting order matter in Major League Baseball? A simulation approach

If you’ve ever watched Major League Baseball, one of the feature points of the sport is the batting line-up that each team decides upon before each game. Traditional baseball logic tells us that speedy, reliable hitters like Trea Turner should
Living in an Ivory Basement 2018-07-01 22:00:00

A framework for thinking about Open Source Sustainability?

Can we apply Common Pool Resource work to open online projects?

Living in an Ivory Basement 2018-06-25 22:00:00

How open is too open?

How open is too open?

Planet SciPy – I Love Symposia! 2018-06-20 11:19:46

What do scientists know about open source?

A friend recently pointed out this great talk by Matt Bernius, What students know and don’t know about open source. If you have even a minor interest in open source it’s worth a watch, but the gist is: in the US alone, there are about 200,000 students enrolled in a computer science major. Open source … Continue reading What do scientists know about open source?
Filipe Saraiva's blog 2018-06-17 13:19:41

The time I talked about science fiction with my psychologist

“So, Amanda, you know, sometimes I think that one of the things that made me this way was the sheer amount of science fiction I read as a child and as a teenager… but not just any fiction, I mean only the kind about time travel and alternate realities. My favorite movie is Back to the Future, the stories I most... [Read More]
Paul Ivanov’s Journal 2018-06-12 07:00:00

Get in it

Two weeks ago, Project Jupyter had our only planned team meeting for 2018. There was too much stuff going on for me to write a poem during the event as I had in previous years (2016 and 2017), so I ended up reading one of the pieces I wrote during my evening introvert breaks in Cleveland at PyCon a few weeks earlier.

Once again, Fernando and Matthias had their gadgets ready to record (thank you both!). The video below was taken by Fernando.

Get in it
Time suspended
Gellatinous reality - the haze
submerged in murky drops summed
in swamp pond of life

believe and strive, expand the mind
A state sublime, when in your prime you came to
me and we were free to flow and fling our
cares, our dreams, our in-betweens, our
rêves perdues, our residue -- the lime of light
the black of sight -- all these converge and
merge the forks of friction filled with fright
and more -- the float of logs that plunges deep
beyond the fray, beyond the
(continued...)
.pyMadeThis 2018-06-11 06:00:00

7Pez — Desktop unzip application with custom window decoration

This is a functionally terrible unzip application, saved only by the fact that you get to look at a cat while using it.

The original idea reflected in the name 7Pez was actually worse — to rig it up so you had to push on the head to unzip each file ...

Living in an Ivory Basement 2018-06-09 22:00:00

How long does it take to produce scientific software?

How long does it take to produce scientific software?

Filipe Saraiva's blog 2018-06-03 22:04:47

The strike over the biometric time clock

A few days ago, Belém came out of a bus workers' strike that brought the city to its knees. For 5 days Belém had no buses at all, with the union defying the labor court's order to keep 80% of the fleet on the streets. Facing heavy fines because of this but standing firm all the same, this situation showed how... [Read More]
.pyMadeThis 2018-06-03 16:30:00

Failamp — Multimedia playlist & player in Python, using PyQt

Failamp is a simple audio & video media player implemented in Python, using the built-in Qt playlist and media handling features. It is modelled, very loosely, on the original Winamp, although nowhere near as complete (hence the fail).

The main window

The main window UI was built using Qt Designer. The screenshot ...

.pyMadeThis 2018-05-31 06:00:00

Creating a window with PyQt5 — The first step in creating your GUI application

The first step in creating desktop applications with PyQt is getting a window to show up on your desktop. Thankfully, with PyQt that is pretty simple.

Below are a few short examples of creating PyQt apps and getting a window on the screen. If this works you know you have ...

Living in an Ivory Basement 2018-05-30 22:00:00

Communicating outside of big consortia is tough! (but important!)

It's hard enough to keep people inside informed...

Living in an Ivory Basement 2018-05-28 22:00:00

Open-source style community engagement for the Data Commons Pilot Phase Consortium

Keeping the Data Commons community coordinated and engaged

.pyMadeThis 2018-05-27 19:00:00

QtWebEngineWidgets, the new browser API in PyQt 5.6 — Simplified page model and asynchronous methods

With the release of Qt 5.5 the Qt WebKit API was deprecated and replaced with the new QtWebEngine API, based on Chromium. The WebKit API was subsequently removed from Qt entirely with the release of Qt 5.6 in mid-2016.

The change to use Chromium for web widgets within ...

.pyMadeThis 2018-05-25 19:00:00

Brown Note — Desktop notes app using SQLAlchemy & PyQt

Relieve your creative blockages with these interactive desktop reminders.

Brown Note is a desktop notes application written in Python, using PyQt. The notes are implemented as decoration-less windows, which can be dragged around the desktop and edited. Details in the notes, and their position on the desktop, are stored in ...

Python – Meta Rabbit 2018-05-25 11:58:38

Quick followups: NGLess benchmark & Notebooks as papers

A quick follow-up on two earlier posts: We finalized the benchmark for ngless that I had discussed earlier: As you can see, NGLess performs much better than either MOCAT or htseq-count. We tried to use featureCounts too, but that completely failed to produce results for some of the samples (we gave it a whopping 1TB … Continue reading Quick followups: NGLess benchmark & Notebooks as papers
.pyMadeThis 2018-05-14 06:00:00

Lucky Cat Spinning-arm Display — Python-powered Maneki-neko persistence of vision scroller

This build started as something simple: a lucky cat which would turn on and off automatically in response to some event. Since lucky cats are associated with good fortune the idea was to make one do this every time I got paid. This was working pretty well but unfortunately, after ...

Matthieu Brucher's blog 2018-05-08 07:52:42

Address Sanitizer: alternative to valgrind

Recently, at work, I encountered a strange bug with GCC 7.2 and clang 6 (I didn’t test it with Visual Studio 2017 for different reasons). The bug was not visible on “old” compilers like gcc 4, Visual Studio 2013 or even Intel Compiler 2017. In debug mode, everything was fine, but in release mode, the […]
.pyMadeThis 2018-05-07 20:00:00

NSAViewer — Webcam viewer & photo booth in Python, using PyQt

This app isn't actually a direct line from your webcam to the NSA, it's a demo of using the webcam/camera support in Qt. The name is a nod to the paranoia (or is it...) of being watched through your webcam by government spooks.

I did consider making ...

Spyder Blog 2018-05-06 00:00:00

State of the Spyder, Part 1: Looking back

As we approach some major development milestones, now is as good a time as ever to share with you some perspective on where we've been, what's happening now, and where we're going in the world of Spyder. In this post, part one of a three part series, we'll take a look back over the past six months at some of the key events, accomplishments and challenges for Spyder and its community, and how that all leads up to where we are now.

Stay tuned right here, since part two will share several exciting announcements that affect the project (in a good way, we promise!) and its immediate future. Even better, part three will formally announce the next Spyder 3 release and—what I'm sure you are all looking forward to—the plan for the first official Spyder 4 beta, plus our schedule and feature roadmap for Spyder 4 and beyond!

A Call Answered

Starting off, as we announced back in mid-November, our funding from Anaconda, Inc was

(continued...)
While My MCMC Gently Samples 2018-05-03 14:00:00

An intuitive, visual guide to copulas

(c) 2018 by Thomas Wiecki

People seemed to enjoy my intuitive and visual explanation of Markov chain Monte Carlo so I thought it would be fun to do another one, this time focused on copulas.

If you ask a statistician what a copula is they might say "a copula is …

Matthieu Brucher's blog 2018-05-01 07:00:34

Analog modelling: A prototype generic modeller in Python

A few months ago, mystran published on KVR a small SPICE simulator for real-time processing. I liked the idea, the drawback being that the code is generic and not tailored like a static version of the optimizer. So I wondered if it was doable. But for this, I have to start from the basics and […]
.pyMadeThis 2018-04-29 17:00:00

Megasolid Idiom — Simple rich text editor in Python

Megasolid Idiom is a rich text word processor implemented in Python and Qt. You can use it to open, edit and save HTML-formatted files, with a WYSIWYG (what you see is what you get) format view. Only basic formatting, headings, lists and images are supported.

Megasolid Idiom is based on ...

Matthieu Brucher's blog 2018-04-24 07:24:55

Announcement: ATKSideChainCompressor 3.0.0

I’m happy to announce the update of ATK Side-Chain Compressor based on the Audio Toolkit and JUCE. It is available on Windows (AVX compatible processors) and OS X (min. 10.9, SSE4.2) in different formats. This update changes storage format and allows linked channels to be steered by a mix of power coming from each channel, […]
.pyMadeThis 2018-04-23 06:00:00

Calculon — Writing a simple desktop calculator in Python

Calculators are one of the simplest desktop applications, found by default on every window system. Over time these have been extended to support scientific and programmer modes, but fundamentally they all work the same.

In this short write up we implement a working standard desktop calculator using PyQt. This implementation ...

Matthieu Brucher's blog 2018-04-17 07:07:48

Book review: C++17 Quick Syntax Reference: A Pocket Guide to the Language, APIs and Library

I work on a day-to-day basis on a big project that has many developers with different C++ levels. Scott Meyers wrote a wonderful book on modern C++ (that I still need to review one day, especially since there is a new Effective Modern C++), but it is not for beginners. So I’m looking for that […]
.pyMadeThis 2018-04-16 06:00:00

No2Pads — Basic Notepad editor in Python, using PyQt

Notepad doesn't need much introduction. It's a plaintext editor that's been part of Windows since the beginning, and similar applications exist in every GUI desktop ever created.

Here we reimplement Notepad in Python using PyQt, a task that is made particularly easy by Qt providing a text ...

python – Dr. Randal S. Olson 2018-04-12 00:08:59

Traveling salesman portrait in Python

Last week, Antonio S. Chinchón made an interesting post showing how to create a traveling salesman portrait in R. Essentially, the idea is to sample a bunch of dark pixels in an image, solve the well-known traveling salesman problem for
Boom! Michael Droettboom's blog 2018-04-11 04:00:00

Profiling WebAssembly

Tips for profiling WebAssembly

.pyMadeThis 2018-04-09 06:00:00

Moonsweeper — Minesweeper clone in Python, using PyQt

Explore the mysterious moon of Q'tee without getting too close to the alien natives!

Moonsweeper is a single-player puzzle video game. The objective of the game is to explore the area around your landed space rocket, without coming too close to the deadly B'ug aliens. Your trusty tricounter ...

.pyMadeThis 2018-04-08 21:00:00

Mozarella Ashbadger — A tabbed web-browser in Python, using PyQt

Mozarella Ashbadger is the latest revolution in web browsing! Go back and forward! Print! Save files! Get help! (you'll need it). Any similarity to other browsers is entirely coincidental.

This is an updated version of the basic PyQt-based browser MooseAche which adds support for tabbed browsing. If you want ...

.pyMadeThis 2018-04-08 12:00:00

MooseAche — Simple web-browser in Python, using PyQt

MooseAche is the latest revolution in web browsing! Go back and forward! Save files! Get help! (you'll need it). Any similarity to other browsers is entirely coincidental.

The full source code for MooseAche is available in the 15 minute apps repository. You can download/clone to get a working ...