Planet SciPy 2021-01-21 08:26:00

How to Make your TensorBoard Projects Easy to Share and Collaborate on

We all know that teamwork is an essential part of every machine learning project. Although each engineer has their own piece of...

The post How to Make your TensorBoard Projects Easy to Share and Collaborate on appeared first on

Martin Fitzpatrick - python 2021-01-21 07:00:00

micro:bit Space Invaders — Playable retro game in just 25 pixels

How much game can you fit into 25 pixels? Quite a bit it turns out.

This is a mini clone of arcade classic Space Invaders for the BBC micro:bit microcomputer. Using the accelerometer and two buttons for input, you can beat off wave after wave of aliens that advance … 2021-01-20 11:12:51

Best Machine Learning Model Management Tools That You Need to Know

Developing your model is an essential part of working on ML projects. And it’s usually a tough challenge.  Every data scientist has...

The post Best Machine Learning Model Management Tools That You Need to Know appeared first on 2021-01-19 07:41:00

How to Keep Track of Experiments in PyTorch Using Neptune

Machine Learning development seems a lot like conventional software development since both of them require us to write a lot of code....

The post How to Keep Track of Experiments in PyTorch Using Neptune appeared first on 2021-01-18 10:49:00

How to Organize Your LightGBM ML Model Development Process – Examples of Best Practices

LightGBM is a distributed and efficient gradient boosting framework that uses tree-based learning. It’s known for its fast training, accuracy, and efficient...

The post How to Organize Your LightGBM ML Model Development Process – Examples of Best Practices appeared first on

jbencook 2021-01-15 11:00:00

Normalizing Images in PyTorch

You can use the torchvision Normalize() transform to subtract the mean and divide by the standard deviation for image tensors in PyTorch. But it's important to understand how the transform works and how to reverse it. 2021-01-15 07:58:00

How to Organize Deep Learning Projects – Examples of Best Practices

For a successful deep learning project, you need a lot of iterations, a lot of time, and a lot of effort. To...

The post How to Organize Deep Learning Projects – Examples of Best Practices appeared first on 2021-01-14 09:26:00

MLOps: What It Is, Why it Matters, and How To Implement it (from a Data Scientist Perspective)

What is this MLOps thing?  It was the question I had on my mind, but until recently, I had only heard about...

The post MLOps: What It Is, Why it Matters, and How To Implement it (from a Data Scientist Perspective) appeared first on

jbencook 2021-01-14 06:00:00

Dropping columns and rows in Pandas

There are a few ways to drop columns and rows in Pandas. This post describes the easiest way to do it and provides a few alternatives that can sometimes be useful. 2021-01-13 08:34:00

MLflow vs. TensorBoard vs. Neptune – What Are the Differences?

You see endless columns and rows, random colors, and don’t know where to find any values? Ahh, the beautifully chaotic spreadsheet of...

The post MLflow vs. TensorBoard vs. Neptune – What Are the Differences? appeared first on 2021-01-12 09:25:59

This Week in Machine Learning: Language & Robotics, 10 Underappreciated Python Packages, Avocado Armchair, and More

The new year brings new opportunities, news, and discoveries. Today, we’re bringing you the first weekly roundup in the new year 2021....

The post This Week in Machine Learning: Language & Robotics, 10 Underappreciated Python Packages, Avocado Armchair, and More appeared first on 2021-01-12 09:17:00

Best 8 Machine Learning Model Deployment Tools That You Need to Know

Machine learning is nothing new in the tech world. It brought about a revolutionary change for many industries, with the ability to...

The post Best 8 Machine Learning Model Deployment Tools That You Need to Know appeared first on

jbencook 2021-01-12 06:00:00

The easiest way to rename a column in Pandas

Two easy recipes for renaming column(s) in a Pandas DataFrame. 2021-01-11 08:10:00

How to Organize Your XGBoost Machine Learning (ML) Model Development Process – Best Practices

XGBoost is a top gradient boosting library that is available in Python, Java, C++, R, and Julia.  The library offers support for GPU...

The post How to Organize Your XGBoost Machine Learning (ML) Model Development Process – Best Practices appeared first on

jbencook 2021-01-11 06:00:00

Reshaping arrays: How the NumPy reshape operation works

This post explains how the NumPy reshape operation works, how to use it and gotchas to watch out for.
jbencook 2021-01-08 06:00:00

Calculating the norm of an array in NumPy: all about np.linalg.norm()

You can calculate the L1 and L2 norms of a vector or the Frobenius norm of a matrix in NumPy with np.linalg.norm(). This post explains the API and gives a few concrete usage examples.
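
For reference, the three quantities named above can be written out by hand. The sketch below uses plain Python lists rather than NumPy, and the helper names are hypothetical and exist only for illustration; it shows what each norm computes, not how np.linalg.norm() is implemented.

```python
import math


def l1_norm(v):
    """L1 norm: sum of the absolute values of the vector entries."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """L2 norm: square root of the sum of squared vector entries."""
    return math.sqrt(sum(x * x for x in v))


def frobenius_norm(m):
    """Frobenius norm: square root of the sum of squared matrix entries."""
    return math.sqrt(sum(x * x for row in m for x in row))


v = [3.0, -4.0]
m = [[1.0, 2.0], [3.0, 4.0]]

print(l1_norm(v))         # 7.0
print(l2_norm(v))         # 5.0
print(frobenius_norm(m))  # square root of 30
```

With NumPy installed, the same values should come from np.linalg.norm(v, ord=1), np.linalg.norm(v) and np.linalg.norm(m) on the corresponding arrays.
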
ListenData 2021-01-06 10:35:00

Run SAS in Python without Installation

In the past few years Python has gained huge popularity as a programming language in the data science world. Many banks and pharma organisations have started using Python, and some of them are in a transition stage, migrating their SAS syntax library to Python. Many big organisations have been using SAS since the early 2000s and have developed hundreds of SAS programs for tasks ranging from data extraction to model building and validation. Hence it's a marathon task to migrate SAS code to any other programming language. Migration can only be done in phases, so that day-to-day tasks are not hit by the development and testing of Python code. Since Python is open source, it sometimes becomes difficult to maintain the existing code. Some SAS procedures are very robust and powerful, and their alternatives in Python are still not implemented; they might be doable, but not in a straightforward way for the average developer or analyst.

Do you wish

Quansight Labs 2021-01-04 08:00:00

Welcoming Tania Allard as Quansight Labs co-director

Today I'm incredibly excited to welcome Tania Allard to Quansight as Co-Director of Quansight Labs. Tania (GitHub, Twitter, personal site) is a well-known and prolific PyData community member. In the past few years she has been involved as a conference organizer (JupyterCon, SciPy, PyJamas, PyCon UK, PyCon LatAm, JuliaCon and more), as a community builder (PyLadies, NumFOCUS, RForwards), as a contributor to Matplotlib and Jupyter, and as a regular speaker and mentor. She also brings relevant experience in both industry and academia - she joins us from Microsoft where she was a senior developer advocate, and has a PhD in computational modelling.

Read more… (4 min remaining to read)

jbencook 2021-01-04 06:00:00

Scientific notation in Python and NumPy

Using and suppressing scientific notation in Python and NumPy.
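
As a small illustration of the idea, the sketch below uses only built-in float formatting; the variable names are arbitrary, and NumPy's own print settings are not covered here.

```python
x = 0.00001234

# Very small floats are shown in scientific notation by default.
default_repr = repr(x)
print(default_repr)  # 1.234e-05

# The "e" format type forces scientific notation with a chosen precision.
forced = f"{1234.5:.2e}"
print(forced)  # 1.23e+03

# The "f" format type suppresses it and prints fixed-point digits instead.
suppressed = f"{x:.8f}"
print(suppressed)  # 0.00001234

# Strings in scientific notation parse back to ordinary floats.
parsed = float("1.5e3")
print(parsed)  # 1500.0
```
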
jbencook 2021-01-02 06:00:00

NumPy tile

The unofficial guide to np.tile() with examples
jbencook 2021-01-01 06:00:00

The NumPy square operation

You can get the element-wise square of an input with np.square(). This is not exactly the same as x**2.
jbencook 2021-01-01 06:00:00

NumPy random uniform number generator

The unofficial guide to np.random.uniform()
jbencook 2020-12-31 06:00:00

Sorting a list of tuples in Python

The best way to sort a list of tuples in Python.
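
One common approach, shown here as a minimal sketch and not necessarily the one the post recommends, is the built-in sorted() with a key function. The sample data is made up for illustration.

```python
from operator import itemgetter

# Hypothetical (name, count) pairs.
pairs = [("pear", 3), ("apple", 7), ("fig", 3), ("kiwi", 1)]

# Without a key, tuples compare element by element, so this sorts by name.
by_name = sorted(pairs)

# Sort by the second element; sorted() is stable, so ties keep input order.
by_count = sorted(pairs, key=itemgetter(1))

# Sort by count descending, then by name ascending to break ties.
by_count_desc = sorted(pairs, key=lambda p: (-p[1], p[0]))

print(by_name)
print(by_count)
print(by_count_desc)
```
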
jbencook 2020-12-31 06:00:00

The Keras Dense layer

The Keras dense layer can be a little confusing. This post will give you everything you need to start using it.
Filipe Saraiva's blog 2020-12-30 12:43:56

Disnatia X/Potências de X

No team of heroes is as dear to me as the X-Men. Around the end of the 90s I started collecting them and kept at it for a few years, but then came the fateful price increase with the Super-Heróis Premium line, which ended up discouraging me from buying. Since then I have followed along sporadically, reading news about them, buying one issue or another… Continue reading »Disnatia X/Potências de X
jbencook 2020-12-30 06:00:00

The NumPy square root operation

Everything you ever wanted to know about the numpy square root operation...
jbencook 2020-12-28 06:00:00

Combinations in Python

There are lots of ways to generate combinations in Python. This post will show you all of them.
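
The standard library already covers the most common cases. The sketch below is one illustration using itertools, not a summary of every method the post describes; the sample list is made up.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

items = ["a", "b", "c", "d"]

# All unordered pairs without repetition.
pairs = list(combinations(items, 2))
print(pairs)

# Unordered pairs where an element may be chosen more than once.
with_repeats = list(combinations_with_replacement(["a", "b"], 2))
print(with_repeats)  # [('a', 'a'), ('a', 'b'), ('b', 'b')]

# The number of pairs matches the binomial coefficient "4 choose 2".
print(len(pairs), comb(4, 2))  # 6 6
```
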
jbencook 2020-12-26 06:00:00

The Keras custom layer explained

Sometimes you need to define your own Keras custom layer. This tutorial explains how custom layers work for tensorflow>=1.7.0 (up to at least 2.4.0) which includes a fairly stable version of the Keras API.
jbencook 2020-12-24 06:00:00

Maximum likelihood estimate for the uniform distribution

If you have a random sample drawn from a continuous uniform(a, b) distribution stored in an array x, the maximum likelihood estimate (MLE) of a is...
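
As a standard textbook result (stated here independently of the post), the likelihood of a uniform(a, b) sample is (b - a)^(-n) whenever a <= min(x) and b >= max(x), and zero otherwise, so it is maximized by the tightest interval that contains the data. A minimal sketch, using a plain Python list instead of an array and a hypothetical helper name:

```python
import random


def uniform_mle(sample):
    """Return the maximum likelihood estimates (a_hat, b_hat) for uniform(a, b).

    The tightest interval containing the data maximizes the likelihood,
    so the estimates are simply the sample minimum and maximum.
    """
    return min(sample), max(sample)


# Simulated data with made-up true parameters a=2, b=5.
random.seed(0)
sample = [random.uniform(2.0, 5.0) for _ in range(1000)]

a_hat, b_hat = uniform_mle(sample)
print(a_hat, b_hat)
```

Because the estimates are the sample extremes, a_hat can never fall below the true a and b_hat can never exceed the true b.
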
Quansight Labs 2020-12-22 09:00:00

Develop a JupyterLab Winter Theme

JupyterLab 3.0 is about to be released and provides many improvements to the extension system. Theming is a way to extend JupyterLab and benefits from those improvements.

While theming is often disregarded as a purely cosmetic endeavour, it can greatly improve software. Theming can be a great help for accessibility, and the Jupyter team pays attention to making the default appearance accessibility-aware by using sufficient contrast. Users with high visual acuity may also choose to increase the information density.

Theming can also be a great way to improve communication by increasing or decreasing emphasis of the user interface, which can be of use for teaching or presenting. Theming may also help with security, for example, by having a clear distinction between staging and production.

Finally, theming can be a great way to express oneself, for example, by using a branded version of software that fits well into a context, or expressing one's artistic preferences or opinions.

In the following blog post, we will show you step-by-step how you

ListenData 2020-12-21 14:50:00

Wish Christmas with Python and R

This post is dedicated to all the Python and R programming lovers. Flaunt your knowledge in your peer group with the following programs. As a data science professional, you want your wish to be special on the eve of Christmas. If you study the code, you may also learn one or two tricks that you can use later in your daily tasks.

Method 1 : Run the following program and see what I mean

R Code

paste(rep(intToUtf8(acos(exp(0)/2)*180/pi+2^4+3*2),2), collapse = intToUtf8(0)),
LETTERS[5^(3-1)], intToUtf8(atan(1/sqrt(3))*180/pi+2),
sep = intToUtf8(0)

Python Code

import math
import datetime

# Assumption: the year passed to datetime.date() is arbitrary (2020 here);
# only the month name returned by strftime('%B') is used.
(chr(int(math.acos(math.log(1))*180/math.pi-13)) \
+ datetime.date(2020, 2, 1).strftime('%B')[1] \
+ 2 * datetime.date(2020, 2, 1).strftime('%B')[3] \
+ datetime.date(2020, 2, 1).strftime('%B')[7] \
+ chr(int(math.atan(1/math.sqrt(3))*180/math.pi+2)) \
+ datetime.date(2020, 10, 1).strftime('%B')[1] \
+ chr(int(math.acos(math.log(1))*180/math.pi-18)) \
+ datetime.date(2020, 4, 1).strftime('%B')[2:4] \
+ chr(int(math.acos(math.exp(0)/2)*180/math.pi+2**4+3*2+1)) \
+ chr(int(math.acos(math.exp(0)/2)*180/math.pi+2**4+2*4)) \
+ chr(int(math.acos(math.log(1))*180/math.pi-13)) \
+ "{:c}".format(97) \
+ chr(int(math.atan(1/math.sqrt(3))*180/math.pi*3-7))).upper()
Method 2 : Audio Wish for Christmas

Turn on computer speakers before running the code.

R Code

christmas_file <- tempfile()
download.file("", christmas_file, mode = "wb")
(continued...) 2020-12-20 23:00:00

On the Link Between Optimization and Polynomials, Part 2

An analysis of momentum can be tightened using a combination of Chebyshev polynomials of the first and second kind. Through this connection we'll derive one of the most iconic methods in optimization: Polyak momentum.
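
For readers unfamiliar with the method, Polyak (heavy-ball) momentum adds a multiple of the previous displacement to each gradient step. The sketch below is a plain-Python illustration on a made-up diagonal quadratic; the function names are hypothetical, and the step size and momentum are the classical choices for quadratics with curvature between mu and L, taken here as an assumption and not as the post's derivation.

```python
import math


def polyak_momentum(grad, x0, step, momentum, n_iter):
    """Iterate x_next = x - step * grad(x) + momentum * (x - x_prev)."""
    x_prev = list(x0)
    x = list(x0)
    for _ in range(n_iter):
        g = grad(x)
        x_next = [
            xi - step * gi + momentum * (xi - xpi)
            for xi, gi, xpi in zip(x, g, x_prev)
        ]
        x_prev, x = x, x_next
    return x


# Made-up objective: f(x) = 0.5 * sum(h_i * x_i**2), minimized at the origin.
curvatures = [1.0, 10.0]


def grad(x):
    return [h * xi for h, xi in zip(curvatures, x)]


# Classical parameter choice (assumption) from the smallest and largest curvature.
mu, L = min(curvatures), max(curvatures)
step = (2.0 / (math.sqrt(L) + math.sqrt(mu))) ** 2
momentum = ((math.sqrt(L) - math.sqrt(mu)) / (math.sqrt(L) + math.sqrt(mu))) ** 2

x_final = polyak_momentum(grad, [1.0, 1.0], step, momentum, n_iter=100)
print(x_final)  # both coordinates should be very close to 0
```
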

ListenData 2020-12-19 15:59:00

How to use variable in a query in pandas

Suppose you want to reference a variable in a query in the pandas package in Python. This seems like a straightforward task, but it sometimes becomes daunting. Let's discuss it with examples in the article below.

Let's create a sample dataframe with 3 columns and 4 rows. This dataframe is used for demonstration purposes.

import pandas as pd
df = pd.DataFrame({"col1" : range(1,5),
                   "col2" : ['A A','B B','A A','B B'],
                   "col3" : ['A A','A A','B B','B B']})
Filter a value A A in column col2
To reference a variable in a query, prefix its name with @.
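
A minimal sketch of that idea, rebuilding the sample dataframe so the snippet runs on its own; the variable name target is hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": range(1, 5),
    "col2": ["A A", "B B", "A A", "B B"],
    "col3": ["A A", "A A", "B B", "B B"],
})

# Hypothetical Python variable holding the value to filter on.
target = "A A"

# Inside the query string, @target refers to the Python variable above.
filtered = df.query("col2 == @target")
print(filtered)
```

Without the @ prefix, pandas would look for a column named target instead of the variable.
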
NumFOCUS 2020-12-18 21:21:54

NumFOCUS hires Open Source Developer Advocate!

  NumFOCUS is pleased to announce that Arliss Collins has been hired as our organization’s first Open Source Developer Advocate. Founded in 2012, NumFOCUS has finally grown beyond just providing non-technical needs for our 40+ sponsored projects! As our first technical hire, Arliss will work to help understand our projects from a technical perspective and […]

The post NumFOCUS hires Open Source Developer Advocate! appeared first on NumFOCUS.

NumFOCUS 2020-12-11 19:37:25

A Pivotal Time in NumFOCUS’s Project Aimed DEI Efforts

NumFOCUS is pleased to announce the launch of our Contributor Diversification & Retention Research Project funded by a grant from the Gordon and Betty Moore Foundation.  “We were eager to support NumFOCUS’s diversity initiative because it aims to get to the heart of what is preventing greater participation in data science. We are hopeful that […]

The post A Pivotal Time in NumFOCUS’s Project Aimed DEI Efforts appeared first on NumFOCUS.

NumFOCUS 2020-11-23 14:44:42

Anaconda Announces Multi-Year Partnership with NumFOCUS

A key stakeholder in the open source scientific computing ecosystem has further formalized their long-standing partnership with NumFOCUS. Anaconda, the Austin, Texas-based software development and consulting company which provides global distribution of Python and R software packages, last month introduced their Anaconda Dividend Program. Through this initiative, Anaconda plans to direct a portion of their […]

The post Anaconda Announces Multi-Year Partnership with NumFOCUS appeared first on NumFOCUS.

Pierre de Buyl's homepage - scipy 2020-11-23 10:00:00

What's in a model

During the coronavirus epidemic, the Belgian federal group of scientific experts came up regularly in the official communication of the government. How can scientists understand the spread of an epidemic? By using a model: a mathematical description of a phenomenon. By varying the parameters of the model, one can test …

Quansight Labs 2020-11-19 17:29:55

A second CZI grant for NumPy and OpenBLAS

I am happy to announce that NumPy and OpenBLAS have once again been awarded a grant from the Chan Zuckerberg Initiative through Cycle 3 of the Essential Open Source Software for Science (EOSS) program. This new grant totaling $140,000 will fund part of our efforts to improve usability and sustainability in both projects and is excellent news for the scientific computing community, which will certainly benefit from this work downstream.

Read more… (4 min remaining to read)

NumFOCUS 2020-11-18 18:36:55

NumFOCUS Receives Support from Heising-Simons

NumFOCUS is grateful to announce that we received a grant award of $50,000 in October from the Heising-Simons Foundation. This generous grant funding will provide general support resources to NumFOCUS and will benefit all of our Sponsored and Affiliated Projects as well as our organization’s several programs and initiatives. “This grant award from Heising-Simons will […]

The post NumFOCUS Receives Support from Heising-Simons appeared first on NumFOCUS.

Quansight Labs 2020-11-18 05:00:30

Introduction to Design in Open Source

This blog post is a conversation. Portions led by Tim George are marked with TG, and those led by Isabela Presedo-Floyd are marked with IPF.

TG: When I speak with other designers, one common theme I see concerning why they chose this career path is they want to make a difference in the world. We design because we imagine a better world and we want to help make it real. Part of the reason we design as a career is we're unable to go through life without designing; we're always thinking about how things are and how they could be better. This ethos also exists in many open-source communities. It seems like it ought to be an ideal match.

So what's the disconnect? I'm still exploring that myself, but after a few years in open source I want to share my observations, experiences, and hope for a stronger collaboration between design and development. I don't think I have a complete solution, and some days I'm not even sure I grasp the entire

Quansight Labs 2020-11-13 06:00:00

Querying multiple backends with Ibis

In our recent Ibis post, we discussed querying & retrieving data using a familiar Pandas-like interface. That discussion focused on the fluent API that Ibis provides to query structure from a SQLite database—in particular, using a single specific backend. In this post, we'll explore Ibis's ability to answer questions about data using two different Ibis backends.

import ibis.omniscidb, dask, intake, sqlalchemy, pandas, pyarrow as arrow, altair, h5py as hdf5
Ibis in the scientific Python ecosystem

Before we delve into the technical details of using Ibis, we'll consider Ibis in the greater historical context of the scientific Python ecosystem. It was started by Wes McKinney, the creator of Pandas, as a way to query information on the Hadoop distributed file system and PySpark. More backends were added later as Ibis became a general tool for data queries.

Throughout the rest of this post, we'll highlight the ability of Ibis to generically prescribe query expressions across different data storage systems.

Read more… (3 min remaining to

Quansight Labs 2020-11-12 19:00:06

Manylinux1 is obsolete, manylinux2010 is almost EOL, what is next?

The basic installation format for users who install packages via pip is the wheel format. Wheel names are composed of four parts: a package-name-and-version tag (which can be further broken down), a Python tag, an ABI tag, and a platform tag. More information on the tags can be found in PEP 425. So a package like NumPy will be available on PyPI as numpy-1.19.2-cp36-cp36m-win_amd64.whl for 64-bit Windows and numpy-1.19.2-cp36-cp36m-macosx_10_9_x86_64.whl for macOS. Note that only the platform tag win_amd64 or macosx_10_9_x86_64 differs.
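
To make the naming scheme concrete, the two filenames above can be split into their tags with a few lines of Python. This is a minimal sketch with a hypothetical helper name; it assumes the dash-separated layout name-version[-build]-python-abi-platform.whl and is not a replacement for a real packaging library.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components.

    Hypothetical helper for illustration only; it assumes the layout
    name-version[-build]-python-abi-platform.whl.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename!r}")
    return {
        "name": parts[0],
        "version": parts[1],
        # The build tag is optional and only present in the 6-part form.
        "build": parts[2] if len(parts) == 6 else None,
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


windows = parse_wheel_filename("numpy-1.19.2-cp36-cp36m-win_amd64.whl")
macos = parse_wheel_filename("numpy-1.19.2-cp36-cp36m-macosx_10_9_x86_64.whl")

# Only the platform tag differs between the two wheels.
differing = [key for key in windows if windows[key] != macos[key]]
print(differing)  # ['platform']
```
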

But what about Linux? There is no single, vendor controlled, "Linux platform" e.g., Ubuntu, RedHat, Fedora, Debian, FreeBSD all package software at slightly different versions. What most Linux distributions do have in common is the glibc runtime library, and a smattering of various additional system libraries. So it is possible to define a least common denominator (LCD) of software expected to be on a Linux platform (exceptions apply, e.g. non-glibc distributions).

The decision to converge on an LCD common platform gave birth to the manylinux1 standard. Going back to our example, numpy

Filipe Saraiva's blog 2020-11-05 14:50:03

Chat with Vivi Reis about technology and politics

Tonight (November 5) at 8 pm I will talk with Vivi Reis, candidate for city councillor for PSOL in Belém. In the chat we will focus heavily on topics that intertwine technology and politics. Among the points, we will have the Data Office (Escritório de Dados), data and public policy, free software in public administration, connectivity in Belém, digital inclusion, citizen apps,… Continue reading »Chat with Vivi Reis about technology and politics
Spyder Blog 2020-11-05 00:00:00

New features in Spyder 4's new debugger!

IPython is a great improvement over the standard Python interpreter, bringing many enhancements such as autocompletion and "magic" commands. When debugging, however, many of these features become inaccessible. With Spyder, we aim to bring back these capabilities and more for a truly premium debugging experience! (And believe me, I use this debugger a lot, and not only because I write code that might contain bugs :p).

In this post, I will describe the debugger improvements we've already made in Spyder 4, as well as those that are already implemented or under review for Spyder 4.2 and beyond.

Make the debugger more like IPython

IPython improves on the stock Python interpreter by adding syntax highlighting, completion, and history. We have done the same for the debugger!

The output is prettier (and easier to read) than plain black text, as it was in Spyder 3!

Code completion and history for the debugger use the same functionality as the IPython console, so you should not notice any difference in behaviour. Just press

NumFOCUS 2020-11-04 00:10:51

JupyterCon 2020: Code of Conduct Reports

Following the reports to the NumFOCUS Code-of-Conduct committee on Jeremy Howard’s keynote at JupyterCon 2020, and the controversy that followed, the NumFOCUS Code of Conduct Committee issued a public apology to Jeremy Howard and escalated the case to the board of directors.

The context

In his keynote at JupyterCon 2020, Jeremy Howard gave a point-by-point rebuttal of […]

The post JupyterCon 2020: Code of Conduct Reports appeared first on NumFOCUS.

NumFOCUS 2020-10-30 18:51:02

Public Apology to Jeremy Howard

We, the NumFOCUS Code of Conduct Enforcement Committee, issue a public apology to Jeremy Howard for our handling of the JupyterCon 2020 reports. We should have done better. We thank you for sharing your experience and we will use it to improve our policies going forward. We acknowledge that it was an extremely stressful experience, […]

The post Public Apology to Jeremy Howard appeared first on NumFOCUS.

Paul Ivanov’s Journal 2020-10-29 07:00:00

Money and California Propositions (2020)

Ten years ago, I made some plots for how much money was contributed to and spent by the various proposition campaigns in California.

I decided to update these for this election, and here's the result:

Just in case you didn't get the full picture, here is the same data plotted on a common scale:

So, whereas 10 years ago we had a total of ~$58 million on the election, the overwhelming amount of it in support, this time we had ~$662 million, an 11-fold increase!

The Cal-Access Campaign Finance Activity: Propositions & Ballot Measures source I used last time was still there, but there are way more propositions this time (12 vs 5), and the money details are broken out by committee, with some propositions having a dozen committees. Another wrinkle is that the website is now protected by some fancy scraping protection. I could browse it just fine in Firefox, even with JavaScript turned off, but couldn't download it using wget, curl,

NumFOCUS 2020-10-26 18:13:17

TARDIS Joins NumFOCUS as a Sponsored Project

NumFOCUS is pleased to announce the newest addition to our fiscally sponsored projects: TARDIS. TARDIS is an open-source, Monte Carlo based radiation transport simulator for supernovae ejecta. TARDIS simulates photons traveling through the outer layers of an exploded star including relevant physics like atomic interactions between the photons and the expanding gas. The TARDIS collaboration […]

The post TARDIS Joins NumFOCUS as a Sponsored Project appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-10-26 13:51:04

For a Data Office for Public Policy in Belém

Data have always been decisive for the design and implementation of public policies across the most varied spheres of government. Tracking indicators on the economy, health, violence, urban mobility, the spatial distribution of the population, and the coverage of leisure areas, among others, are just some of the data that can underpin the design of policies… Continue reading »For a Data Office for Public Policy in Belém
ListenData 2020-10-23 16:03:00

Translating Web Page while Scraping

Suppose you need to scrape data from a website after translating the web page in R and Python. Google Chrome has an option to translate pages written in a foreign language. If you are an English speaker who doesn't know other languages, and you want to extract data from a website that has no option to switch its language to English, this article shows how to translate a web page while scraping it.
What is Selenium?
You may not be familiar with Selenium, so it is important to understand the background. Selenium is an open-source tool that is very popular in the testing domain and is used for automating web browsers. It allows you to write test scripts in several programming languages. Selenium is available in both R and Python.
Translate Page in Web Scraping in R and Python
In R there is a package named RSelenium, whereas in Python Selenium can be installed via the selenium package. (continued...)
NumFOCUS 2020-10-23 15:25:08

NumFOCUS Earns Transparency Recognition from GuideStar

Earlier this week, NumFOCUS earned our first-ever Silver Seal of Transparency from GuideStar, an independent organization which classifies nonprofit organizations based on multiple metrics pertaining to transparency and accountability. Fewer than 5% of US-based nonprofits have received this type of recognition. “This respected acknowledgment comes as we prepare to enter our year-end fundraising season,” said […]

The post NumFOCUS Earns Transparency Recognition from GuideStar appeared first on NumFOCUS.

ListenData 2020-10-11 14:45:00

Learn Python for Data Science

This tutorial will help you learn Data Science with Python by example. It is designed for beginners who want to get started with Data Science in Python. Python is an open source language and is widely used as a high-level language for general-purpose programming. It has gained high popularity in the data science world. In the PYPL Popularity of Programming Language index, Python ranked second with a 14 percent share. In the advanced analytics and predictive analytics market, it is ranked among the top 3 programming languages.
Data Science with Python Tutorial
Python is widely used and very popular for a variety of software engineering tasks such as website development, cloud architecture, back-end development, etc. It is equally popular in the data science world. In the advanced analytics world, there have been several debates on R vs. Python. There are some areas, such as the number of libraries for statistical analysis, where R wins over Python, but Python is catching up
Paul Ivanov’s Journal 2020-10-08 07:00:00

aka: also known as

I was chatting with Anthony Scopatz last week, and one of the things we covered was how it'd be cool to have a subcommand launcher, kind of like git, where the subcommands were swappable. If you're not familiar, git automatically calls out to git-something (note the dash) whenever you run

$ git something

and something is not one of the builtin git commands. For me, ~/bin is in my PATH, so

$ git lost
git: 'lost' is not a git command. See 'git --help'.
$ echo "echo how rude!" > ~/bin/git-lost; chmod +x ~/bin/git-lost
$ git lost
how rude!

And so what Anthony was talking about was having two commands that are supposed to do the same thing, and being able to switch between them. For example: maybe we have git-away and git-gone and both of them perform a similar function, and we wish to call our preferred one when we run git lost.
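
The lookup that git performs for git-something can be imitated in a few lines. The following is a minimal sketch of a generic launcher, not git's actual implementation and not the tool this post goes on to describe; the function names are hypothetical.

```python
import shutil
import subprocess
import sys


def resolve_subcommand(prefix, name, path=None):
    """Return the full path of the executable `<prefix>-<name>`, or None.

    `path` is an optional search path string; by default PATH is used.
    """
    return shutil.which(f"{prefix}-{name}", path=path)


def dispatch(prefix, argv, path=None):
    """Run `<prefix>-<argv[0]>` with the remaining arguments.

    Returns the exit code of the subcommand, or 1 if it cannot be found.
    """
    if not argv:
        print(f"usage: {prefix} <subcommand> [args...]", file=sys.stderr)
        return 1
    executable = resolve_subcommand(prefix, argv[0], path=path)
    if executable is None:
        print(f"{prefix}: '{argv[0]}' is not a {prefix} command.", file=sys.stderr)
        return 1
    return subprocess.call([executable, *argv[1:]])


# Prints the path of a git-lost executable if one is on PATH, else None.
print(resolve_subcommand("git", "lost"))
```

Switching between git-away and git-gone then amounts to deciding which executable the name git-lost resolves to.
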

One way to do this would be to copy or symlink our chosen version as git-lost, and replace that file whenever

Quansight Labs 2020-09-29 16:00:00

Design of the Versioned HDF5 Library

In a previous post, we introduced the Versioned HDF5 library and described some of its features. In this post, we'll go into detail on how the underlying design of the library works on a technical level.

Read more… (6 min remaining to read)

ListenData 2020-09-20 08:18:00

How to rename columns in Pandas Dataframe

In this tutorial, we will cover various methods to rename columns in a pandas dataframe in Python. Renaming columns is one of the most common data wrangling tasks. If you do not come from a programming background and have worked only in Excel spreadsheets in the past, you might find this harder in Python, since in MS Excel you can rename a column simply by typing the new name in the cell. If you come from a database background, it is similar to ALIAS in SQL. In Python there is a popular data manipulation package called pandas which simplifies these kinds of data operations.
2 Methods to rename columns in Pandas
In Pandas there are two simple methods to rename columns.

The first step is to install the pandas package if it is not already installed. You can check if the package is installed on your machine by running

Quansight Labs 2020-09-11 11:00:00

Performance of the Versioned HDF5 Library

In several industry and science applications, a filesystem-like storage model such as HDF5 is the more appropriate solution for manipulating large amounts of data. However, suppose that data changes over time. In that case, it's not obvious how to track those different versions, since HDF5 is a binary format and is not well suited for traditional version control systems and tools.

In a previous post, we introduced the Versioned HDF5 library, which implements a mechanism for storing binary data sets in a versioned way that feels natural to users of other version control systems, and described some of its features. In this post, we'll show some of the performance analysis we did while developing the library, hopefully making the case that reading and writing versioned HDF5 files can be done with a nice, intuitive API while being as efficient as possible. The tests presented here show that using the Versioned HDF5 library results in reduced disk space usage,

Quansight Labs 2020-09-10 05:00:00

PyTorch-Ignite: training and evaluating neural networks flexibly and transparently

Authors: Victor Fomin (Quansight), Sylvain Desroziers (IFPEN, France)
This post is a general introduction to PyTorch-Ignite. It intends to give a brief but illustrative overview of what PyTorch-Ignite can offer Deep Learning enthusiasts, professionals and researchers. Following the same philosophy as PyTorch, PyTorch-Ignite aims to keep it simple, flexible and extensible but performant and scalable.

Read more… (28 min remaining to read)

Quansight Labs 2020-08-30 09:00:00

Traitlets - an introduction & use in Jupyter configuration management

You have probably seen Traitlets in applications, you likely even use it. The package has nearly 5 million downloads on conda-forge alone.

But what is Traitlets?

In this post we'll answer this question along with where Traitlets came from, its applications, and a bit of history.

Read more… (8 min remaining to read)

Filipe Saraiva's blog 2020-08-29 18:48:00

A Seqtember of free virtual events about Qt and KDE

(OK, the "seqtembro" joke works better in the English version, seqtember, but let's go.) By a great coincidence, a work of fate, or none of that, we will have a September 2020 full of high-quality free virtual events about Qt and KDE. Running from the 4th to the 11th of that month we will have Akademy 2020, the… Continue reading »A Seqtember of free virtual events about Qt and KDE
Neural Ensemble News 2020-08-08 19:27:00

CARLsim5 Released!


CARLsim5 is an efficient, easy-to-use, GPU-accelerated library for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail. It allows execution of networks of Izhikevich spiking neurons with realistic synaptic dynamics using multiple off-the-shelf GPUs and x86 CPUs. The simulator provides a PyNN-like programming interface in C/C++, which allows for details and parameters to be specified at the synapse, neuron, and network level.

The present release, CARLsim 5, builds on the efficiency and scalability of earlier releases (Nageswaran et al., 2009; Richert et al., 2011, and Beyeler et al., 2015; Chou et al., 2018). The functionality of the simulator has been greatly expanded by the addition of a number of features that enable and simplify the creation, tuning, and simulation of complex networks with spatial structure.

New Features

1. PyNN Compatibility

pyCARL is an interface between the simulator-independent language PyNN and a CARLsim5-based back-end. In other words, you can write the code for an SNN model once, using the

Filipe Saraiva's blog 2020-08-04 23:27:02

What will become of the Lev with the “end” of Saraiva?

Disclaimer: despite the surname, I have no relationship with Saraiva. I also have no answers to the question in the title. As a Lev user, I have been following Saraiva's agony with interest. The bookstore chain, one of the largest in Brazil, has for years been caught in a legal imbroglio, owing money to several publishers, in a process that…

NumFOCUS 2020-07-31 17:52:20

Dask Life Sciences Fellow [Open Job]

Dask is an open-source library for parallel computing in Python that interoperates with existing Python data science libraries like Numpy, Pandas, Scikit-Learn, and Jupyter. Dask is used today across many different scientific domains. Recently, we’ve observed an increase in use in a few life sciences applications: large-scale imaging in microscopy, single-cell analysis, and genomics […]

Spyder Blog 2020-07-25 10:00:00

STX Next, Python development company, uses Spyder to improve their workflow

STX Next, one of Europe's largest Python development companies, has shared with us how Spyder has been a powerful tool for them when performing data analysis. It is a pleasure for us on the Spyder team to work every day to improve the workflow of developers, scientists, engineers and data analysts. We are very glad to receive and share a STX Next testimonial about Spyder, along with an interview with one of their developers, Michael Wiśniewski, who has found Spyder very useful in his job.

What Michael Wiśniewski says about Spyder

In an era of a continuously growing demand for analysis of vast amounts of data, we are facing increasingly complex tasks to perform. Sure, we are not alone—there are many great tools designed for scientists and data analysts. We have NumPy, SciPy, Matplotlib, Pandas, and others. But, wouldn't it be nice to have one extra tool that could combine all the required packages into one compact working environment? Asking this question is precisely how

NumFOCUS 2020-07-24 16:31:53

NumFOCUS Introduces New Supporter Program

Today NumFOCUS is pleased to introduce a new program for our individual supporters, called Open Science Champions. Each year, our community members generously support NumFOCUS and our Projects in several ways; this program is intended to connect these various forms of support so that we can engage with our community most effectively and offer our […]

Filipe Saraiva's blog 2020-07-24 14:49:05

Educação Vigiada (Surveilled Education)

This pandemic period has been one of output on many fronts, which unfortunately means less time to publicize that work here on the blog. In this post I want to make up for that by talking about one of the projects I consider among the most important I have contributed to recently, Educação Vigiada. A few months ago, the project Educação…

NumFOCUS 2020-07-14 20:36:34

Open Source Developer Advocate

Position Overview: The primary role of the Open Source Developer Advocate is to represent and support developers of NumFOCUS open source projects by serving as a link to internal and external stakeholders as well as the global user community. You will generate attention and support by applying your technical knowledge, passion for open source data […]

Filipe Saraiva's blog 2020-07-10 23:09:48

Engrenagem Ep. 04: Favorite KDE applications of Brazilian KDErs

This Saturday, July 11, at 10 a.m., KDE Brasil will return with episodes of Engrenagem, the Brazilian community's videocast (which has gone 4 years without new episodes 🙂). To get things going again, the episode will feature 6 Brazilian contributors (Ângela, Aracele, Caio, Filipe (me), Fred and Tomaz) talking about their favorite KDE applications…

Spyder Blog 2020-07-08 10:00:00

Writing docs is not just writing docs

This blogpost was originally published on the Quansight Labs website.

I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”

Improving Spyder’s documentation started as part of a NumFOCUS Small Development Grant awarded at the end of last year. The goal of the project was not only to update the documentation for Spyder 4, but also to make it more user-friendly, so users can understand Spyder’s key concepts and get started with it more

Filipe Saraiva's blog 2020-06-25 13:15:22

On the book “Uma História de Desigualdade”

I finished reading Pedro de Souza's award-winning book, “Uma História de Desigualdade – A Concentração de Renda entre os Ricos no Brasil 1926 – 2013” (A History of Inequality: The Concentration of Income among the Rich in Brazil, 1926 to 2013), based on the thesis he defended in the sociology program at UnB. It is a substantial book that lives up to all the praise it has received since the…

Spyder Blog 2020-06-12 18:00:00

Thanking the people behind Spyder 4

This blogpost was originally published on the Quansight Labs website.

After more than three years in development and more than 5000 commits from 60 authors around the world, Spyder 4 finally saw the light on December 5, 2019! I decided to wait until now to write a blogpost about it because shortly after the initial release, we found several critical performance issues and some regressions with respect to Spyder 3, most of which are fixed now in version 4.1.3, released on May 8th 2020.

This new release comes with a lengthy list of user-requested features aimed at providing an enhanced development experience at the level of top general-purpose editors and IDEs, while strengthening Spyder's specialized focus on scientific programming in Python. The interested reader can take a look at some of them in previous blog posts, and in detail in our Changelog. However, this post is not meant to describe those improvements, but to acknowledge all people that contributed

Gaël Varoquaux - programming 2020-05-27 22:00:00

Technical discussions are hard; a few tips


This post discusses the difficulties of communicating while developing open-source projects and tries to give some simple advice.

A large software project is above all a social exercise in which technical experts try to reach good decisions together, for instance on github pull requests. But communication is difficult, in …

Pierre de Buyl's homepage - scipy 2020-05-19 09:00:00

Tidynamics, what use?

In 2018 I published a small Python library, tidynamics. The scope was deliberately limited: compute the typical correlation functions for stochastic and molecular dynamics, namely the autocorrelation and the mean-square displacement. Two years later, I wonder about its usage.
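To make the two quantities named above concrete, here is a minimal plain-Python sketch of a time-averaged autocorrelation and mean-square displacement for a one-dimensional series. The function names and the naive O(N²) loops are illustrative assumptions and are not taken from tidynamics itself, whose API and algorithms may differ.

```python
def autocorrelation(x):
    """Naive O(N^2) autocorrelation: C(k) is the average of x[t] * x[t + k] over t."""
    n = len(x)
    return [
        sum(x[t] * x[t + k] for t in range(n - k)) / (n - k)
        for k in range(n)
    ]


def mean_square_displacement(x):
    """Naive O(N^2) MSD: M(k) is the average of (x[t + k] - x[t])**2 over t."""
    n = len(x)
    return [
        sum((x[t + k] - x[t]) ** 2 for t in range(n - k)) / (n - k)
        for k in range(n)
    ]


# Hypothetical test data: a particle moving at constant unit velocity.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
msd = mean_square_displacement(positions)

# Hypothetical test data: a signal that flips sign at every step.
alternating = [1.0, -1.0, 1.0, -1.0]
acf = autocorrelation(alternating)
```

For the constant-velocity trajectory the mean-square displacement grows as the square of the lag, and for the alternating signal the autocorrelation flips sign with every lag, which is a quick sanity check for either function.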

NumFOCUS 2020-05-18 19:48:24

Moderna, IMC Renew NumFOCUS Corporate Sponsorships

Monday, May 18th, 2020: Two NumFOCUS corporate supporters recently made fresh commitments to our open source mission. Trading firm IMC and biotechnology company Moderna Therapeutics each renewed their corporate sponsorships earlier this month. Both companies have supported NumFOCUS since 2018 at our Silver and Bronze sponsorship levels, respectively. Asked about his company’s decision to partner […]

Paul Ivanov’s Journal 2020-05-17 07:00:00

Lazy River of Curious Content 0

This is the first post of what I'm calling a Lazy River of Curious Content. This is a way to review stuff that I've been doing, dealing with, or finding interesting during the week. (This was originally written two weeks ago, May 3rd; my shoddy internet connectivity kept me from posting it.) I'm loosely following the format that Justin Sherrill uses with great effect over at

Learn NixOS by turning a Raspberry Pi into a Wireless Router: Friend of the show, Anthony Scopatz, tried NixOS for the first time and provides a detailed report:

"While I had read the NixOS pamphlets, and listened politely when the faithful came knocking on my door at inconvenient times, I had never walked the path of functional Linux enlightenment myself"

Reading through that made me file away a todo of writing up how I use propellor (and why). But those todos sometimes just pile up for a while...

An interview of one of my long time nerd-crushes, Rob Pike. The questions focus on the Go programming

Living in an Ivory Basement 2020-05-06 22:00:00

sourmash databases as zip files, in sourmash v3.3.0

Use compressed databases directly!

Filipe Saraiva's blog 2020-05-05 18:29:16

LaKademy 2019

Last November, Latin American KDE contributors landed in Salvador, Brazil, to take part in another edition of LaKademy, the Latin American Akademy. It was the seventh edition of the event (or the eighth, if you count Akademy-BR as the first LaKademy) and the second with Salvador as the host city. No problems…

Filipe Saraiva's blog 2020-05-04 21:20:54

Akademy 2019

In September 2019 the Italian city of Milan hosted the main worldwide gathering of KDE contributors, Akademy, where members from different areas such as translators, developers, artists, promo people and more get together for a few days to think about and build the future of KDE's projects and community(ies). Before arriving…

Spyder Blog 2020-04-22 17:00:00

Creating the ultimate terminal experience in Spyder 4 with Spyder-Terminal

This blogpost was originally published on the Quansight Labs website.

The Spyder-Terminal project is revitalized! The new 0.3.0 version adds numerous features that improve the user experience, and enhances compatibility with the latest Spyder 4 release, in part thanks to the improvements made in the xterm.js project.

Upgrade to ES6/JSX syntax

First, we were able to update all the old JavaScript files, along with the tests for the client terminal, to use ES6/JSX syntax. This change simplified the code base and its maintenance, and allows us to easily extend the project with new functionality that the xterm.js API offers. In order to compile this code and run it inside Spyder, we migrated our deployment to Webpack.

Multiple shells per operating system

In the new release, you now have the ability to configure which shell to use in the terminal. On Linux and UNIX systems, bash, sh, ksh, zsh, csh, pwsh, tcsh, screen, tmux, dash and rbash are supported, while cmd and powershell are the

Living in an Ivory Basement 2020-04-19 22:00:00

Software and workflow development practices (April 2020 update)

How we develop software and workflows in the DIB Lab, in 2020.

Martin Fitzpatrick - python 2020-04-13 11:01:00

Is it getting better yet? — An optimistic visual guide to the Coronavirus pandemic

As the apocalypse rumbles on, I found myself wondering "Is it getting any better?"

Daily updates of spiralling case numbers (and worse, deaths) do little to give a sense of whether we're getting to, or already past, the worst of it.

To answer that question for myself and you, I …

Living in an Ivory Basement 2020-04-12 22:00:00

How to give a bad online talk

A bad example...

2020-04-06 22:00:00

On the Link Between Polynomials and Optimization, Part 1

There's a fascinating link between minimization of quadratic functions and polynomials. A link that goes deep and allows us to phrase optimization problems in the language of polynomials and vice versa. Using this connection, we can tap into centuries of research in the theory of polynomials and shed new light on …
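One way to see such a link in the simplest possible setting is sketched below. It assumes plain gradient descent with a fixed step size on the scalar quadratic f(x) = 0.5 * lam * x**2. This is an illustrative choice, not necessarily the setting of the post, and all names are hypothetical. Each step multiplies the iterate by (1 - step * lam), so after t steps the error equals a degree-t polynomial in lam, evaluated at the curvature and multiplied by the starting point.

```python
def gradient_descent(lam, x0, step, n_iter):
    """Run gradient descent on f(x) = 0.5 * lam * x**2, whose gradient is lam * x."""
    x = x0
    for _ in range(n_iter):
        x = x - step * lam * x
    return x


def residual_polynomial(lam, step, n_iter):
    """P_t(lam) = (1 - step * lam)**t: a degree-t polynomial in lam with P_t(0) = 1."""
    return (1.0 - step * lam) ** n_iter


# Hypothetical values chosen only for illustration.
lam, x0, step, n_iter = 2.0, 5.0, 0.25, 10
iterate = gradient_descent(lam, x0, step, n_iter)
predicted = residual_polynomial(lam, step, n_iter) * x0
```

The iterate produced by the loop and the value predicted by the polynomial agree, so in this toy case questions about how fast the method converges become questions about how small the polynomial is at lam.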

Paul Ivanov’s Journal 2020-04-03 07:00:00

pheriday 3: infrastructure

paul's habitual errant ramblings (on Fr)idays

pheridays: 3

2020-04-10: A week ago, I recorded a 5 minute audio segment of some stuff I've been thinking about, but when I started to write it up I stumbled into and kept dropping down a deep technostalgic hole.

fall down along with me:

The recording is just shy of five minutes long. You can also download it in different formats, depending on your needs, if the audio tag above doesn't suit you: (2.9 Mb) (4.5 Mb) (6.3 Mb)


Stuff I mentioned in the audio:

Propellor - "configuration management system using Haskell and Git" by Joey Hess

OpenWRT - specifically - reducing Bufferbloat

Mumble - "a free, open source, low latency, high quality voice chat application." - "the hacker's forge" also known as by Drew DeVault

Jitsi - "Multi-platform open-source video conferencing"

OpenFire - "real time collaboration (RTC) server licensed under the

Living in an Ivory Basement 2020-02-16 23:00:00

Two talks at JGI in May: sourmash, spacegraphcats, and disease associations in the human microbiome.

Using k-mers and taxonomy to find contamination in metagenomes

Leonardo Uieda 2020-01-23 12:00:00

Advancing research software in the UK through an SSI fellowship

I have been selected as part of the 2020 cohort of Fellows of the Software Sustainability Institute!

The Institute cultivates world-class research with software. It's based at the universities of Edinburgh, Manchester, Southampton, and Oxford in the UK. Their motto says it all:

The SSI has a yearly fellowship program to fund the organization of communities around scientific software (creation of local user groups, workshops, hackathons, etc). Even more importantly, they organize several events to get current and past fellows in the same place doing awesome stuff. I'm really looking forward to this year's Collaborations Workshop (registration is open to all, not just fellows). I applied at the end of last year and was selected to join the 2020 cohort of fellows along with some truly amazing people.

My plan for the fellowship is to

Peekaboo 2020-01-07 17:26:00

Don't fund Software that doesn't exist

I’ve been happy to see an increase in funding for open source software across research areas and across funding bodies. However, I observed that a majority of funding from, say, the NSF, goes to projects that do not exist yet, and where the funding is supposed to create a new project, or to extend projects that are developed and used within a single research lab. I think this top-down approach to creating software comes from a misunderstanding of the existing open source software that is used in science. This post collects thoughts on the effectiveness of current grant-based funding and how to improve it from the perspective of the grant-makers.

Instead of the current approach of funding new projects, I would recommend funding existing open source software, ideally software that is widely used, and underfunded. The story of the underfunded but critically important open source software (which I’ll refer to as infrastructure software) should be an old tale by now.

Living in an Ivory Basement 2020-01-01 23:00:00

sourmash-oddify: a workflow for exploring contamination in metagenome-assembled genomes

Using k-mers and taxonomy to find contamination in metagenomes

Leonardo Uieda 2019-12-08 12:00:00

Two PhD studentships at the University of Liverpool

I have two open positions for funded studentships at the University of Liverpool. Applications are open until 10 January 2020.

Project descriptions

Follow the links for more detailed versions.

Bringing machine learning techniques to geophysical data processing

The goal of this project is to investigate the use of existing machine learning techniques to process gravity and magnetics data using the Equivalent Layer Method. The methods and software developed during this project can be applied to process large amounts of gravity and magnetics data, including airborne and satellite surveys, and produce data products that can enable further scientific investigations. Examples of such data products include global gravity gradient grids from GOCE satellite measurements, regional magnetic grids for the UK, gravity grids for the Moon and Mars, etc.

Large-scale mapping of the thickness of the

Gaël Varoquaux - programming 2019-12-01 05:00:00

Getting a big scientific prize for open-source software


An important acknowledgement for a different view of doing science: open, collaborative, and more than a proof of concept.

A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand Thirion, and I received the “Académie des Sciences Inria prize for transfer”, for our contributions to the scikit-learn project …

Spyder Blog 2019-11-28 20:00:00

Variable Explorer improvements in Spyder 4

This blogpost was originally published on the Quansight Labs website.

Spyder 4 will be released very soon with lots of interesting new features that you'll want to check out, reflecting years of effort by the team to improve the user experience. In this post, we will be talking about the improvements made to the Variable Explorer.

These include the brand new Object Explorer for inspecting arbitrary Python variables, full support for MultiIndex dataframes with multiple dimensions, the ability to filter and search for variables by name and type, and much more.

It is important to mention that several of the above improvements were made possible through integrating the work of two other projects. Code from gtabview was used to implement the multi-dimensional Pandas indexes, while objbrowser was the foundation of the new Object Explorer.

New viewer for arbitrary Python objects

For Spyder 4 we added a long-requested feature: full support for inspecting any kind of Python object through the Variable

Spyder Blog 2019-11-12 00:00:00

File management improvements in Spyder 4

This blogpost was originally published on the Quansight Labs website.

Version 4.0 of Spyder is almost ready! It has been in the making for well over two years, and it contains lots of interesting new features. We will focus on the Files pane in this post, where we've made several improvements to the interface and file management tools.

Simplified interface

In order to simplify the Files pane's interface, the columns corresponding to size and kind are hidden by default. To change which columns are shown, use the top-right pane menu or right-click the header directly.

Custom file associations

First, we added the ability to associate different external applications with specific file extensions they can open. Under the File associations tab of the Files preferences pane, you can add file types and set the external program used to open each of them by default.

Once you've set this up, files will automatically launch in the associated application when opened from the Files pane in Spyder.

ListenData 2019-10-28 15:48:00

Loan Amortisation Schedule using R and Python

In this post, we will explain how you can calculate your monthly loan instalments the way a bank calculates them, using R and Python. In the financial world, analysts generally use MS Excel to calculate the principal and interest portions of an instalment using the PPMT and IPMT functions. As data science is growing and trending these days, it is important to know how you can do the same using popular data science programming languages such as R and Python.

When you take a loan from a bank at x% annual interest rate for N years, the bank calculates monthly (or quarterly) instalments based on the following factors:

  • Loan Amount
  • Annual Interest Rate
  • Number of payments per year
  • Number of years for loan to be repaid in instalments
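As a rough illustration of how the four factors above combine, the sketch below uses the standard annuity formula for a fully amortising loan with a fixed rate and equal instalments. The function names and the example loan are hypothetical and are not taken from the post. The interest and principal parts of each row play the role of Excel's IPMT and PPMT values, ignoring Excel's sign convention.

```python
def instalment(principal, annual_rate, payments_per_year, years):
    """Fixed instalment of a fully amortising loan (standard annuity formula)."""
    r = annual_rate / payments_per_year   # periodic interest rate
    n = payments_per_year * years         # total number of instalments
    if r == 0:
        return principal / n              # interest-free loan: equal principal slices
    return principal * r / (1 - (1 + r) ** -n)


def amortisation_schedule(principal, annual_rate, payments_per_year, years):
    """Yield (period, interest, principal_part, balance) for every instalment."""
    r = annual_rate / payments_per_year
    payment = instalment(principal, annual_rate, payments_per_year, years)
    balance = principal
    for period in range(1, payments_per_year * years + 1):
        interest = balance * r                 # interest accrued on the open balance
        principal_part = payment - interest    # remainder of the instalment repays principal
        balance -= principal_part
        yield period, interest, principal_part, balance


# Hypothetical loan: 100,000 at 10% a year, repaid monthly over 5 years.
payment = instalment(100_000, 0.10, 12, 5)
schedule = list(amortisation_schedule(100_000, 0.10, 12, 5))
```

Early rows of the schedule are dominated by interest and later rows by principal, while the instalment itself stays constant and the final balance ends at zero up to floating-point rounding.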
Loan Amortisation Schedule

It refers to a table of periodic loan payments showing the breakup of principal and interest in each instalment/EMI until the loan is repaid at the end of its stipulated term. Monthly instalments are generally the same every month

I Love Symposia! 2019-10-24 13:59:54

Introducing napari: a fast n-dimensional image viewer in Python

I'm really excited to finally, officially, share a new(ish) project called napari with the world. We have been developing napari in the open from the very first commit, but we didn't want to make any premature fanfare about it… Until now. It's still alpha software, but for months now, both the core napari team and a few collaborators/early adopters have been using napari in our daily work. I've found it life-changing.

The background

I've been looking for a great nD volume viewer in Python for the better part of a decade. In 2009, I joined Mitya Chklovskii's lab and the FlyEM team at the Janelia [Farm] Research Campus to work on the segmentation of 3D electron microscopy (EM) volumes. I started out in Matlab, but moved to Python pretty quickly and it was a very smooth transition (highly recommended! ;). Looking at my data was always annoying though. I was either looking at single 2D slices using matplotlib.pyplot.imshow, or saving the volumes in VTK format and loading them into ITK-SNAP — which worked ok

(continued...)

2019-09-26 22:00:00

How to Evaluate the Logistic Loss and not NaN trying

A naive implementation of the logistic regression loss can result in numerical indeterminacy even for moderate values. This post takes a closer look into the source of these instabilities and discusses more robust Python implementations.
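A minimal sketch of one common way to stabilize the computation follows. It assumes labels y in {0, 1} and a raw score z, and is not necessarily the formulation the post recommends. The loss can be written as log(1 + exp(z)) - y * z, and the log(1 + exp(z)) term is rearranged so that only non-positive numbers are ever exponentiated. The function names are hypothetical.

```python
import math


def log1pexp(z):
    """Stable log(1 + exp(z)): rewritten as max(z, 0) + log1p(exp(-|z|))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logistic_loss(z, y):
    """Logistic loss for a raw score z and a label y in {0, 1}."""
    return log1pexp(z) - y * z


# Extreme scores that would overflow exp() or hit log(0) in a naive formula.
confident_and_right = logistic_loss(1000.0, 1)
confident_and_wrong = logistic_loss(1000.0, 0)
undecided = logistic_loss(0.0, 1)
```

Because the argument of `math.exp` is never positive, the worst case is a silent underflow to zero rather than an overflow, so the result stays finite for scores of any magnitude.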