SciPy

Planet SciPy

neptune.ai 2020-11-24 08:11:00

Pandas Plot: Deep Dive Into Plotting Directly with Pandas

Data Visualisation is an essential step in any data science pipeline. Exploring your data visually opens your mind to a lot of...

The post Pandas Plot: Deep Dive Into Plotting Directly with Pandas appeared first on neptune.ai.

Pierre de Buyl's homepage - scipy 2020-11-23 10:00:00

What's in a model

During the coronavirus epidemic, the belgian federal group of scientific experts came up regularly in the official communication of the government. How can scientists understand the spread of an epidemic? By using a model: a mathematical description of a phenomenon. By varying the parameters of the model, one can test …

neptune.ai 2020-11-23 07:33:00

Machine Learning Model Management in 2020 and Beyond – Everything That You Need to Know

Model management is a relatively new issue when it comes to Machine Learning. Since the technique is widely used in business, the...

The post Machine Learning Model Management in 2020 and Beyond – Everything That You Need to Know appeared first on neptune.ai.

neptune.ai 2020-11-20 12:57:00

Dive Into Football Analytics With TensorFlow Object Detection API

When it comes to football, it is surprising to see how a team is able to win a football match against a...

The post Dive Into Football Analytics With TensorFlow Object Detection API appeared first on neptune.ai.

Quansight Labs 2020-11-19 17:29:55

A second CZI grant for NumPy and OpenBLAS

I am happy to announce that NumPy and OpenBLAS have once again been awarded a grant from the Chan Zuckerberg Initiative through Cycle 3 of the Essential Open Source Software for Science (EOSS) program. This new grant totaling $140,000 will fund part of our efforts to improve usability and sustainability in both projects and is excellent news for the scientific computing community, which will certainly benefit from this work downstream.

Read more… (4 min remaining to read)

neptune.ai 2020-11-19 07:40:00

Image Classification: Tips and Tricks From 13 Kaggle Competitions (+ Tons of References)

Success in any field can be distilled into a set of small rules and fundamentals that produce great results when coupled together. ...

The post Image Classification: Tips and Tricks From 13 Kaggle Competitions (+ Tons of References) appeared first on neptune.ai.

neptune.ai 2020-11-18 09:02:00

TensorBoard vs Neptune: How Are They ACTUALLY Different

ML model development typically involves a tedious workflow of managing data, feature engineering, model training and evaluation.  A data scientist could easily...

The post TensorBoard vs Neptune: How Are They ACTUALLY Different appeared first on neptune.ai.

Quansight Labs 2020-11-18 05:00:30

Introduction to Design in Open Source

This blog post is a conversation. Portions lead by Tim George are marked with TG, and those lead by Isabela Presedo-Floyd are marked with IPF.

TG: When I speak with other designers, one common theme I see concerning why they chose this career path is they want to make a difference in the world. We design because we imagine a better world and we want to help make it real. Part of the reason we design as a career is we're unable to go through life without designing; we're always thinking about how things are and how they could be better. This ethos also exists in many open-source communities. It seems like it ought to be an ideal match.

So what's the disconnect? I'm still exploring that myself, but after a few years in open source I want to share my observations, experiences, and hope for a stronger collaboration between design and development. I don't think I have a complete solution, and some days I'm not even sure I grasp the entire

(continued...)
neptune.ai 2020-11-17 08:49:00

The Best Tools for Reinforcement Learning in Python You Actually Want to Try

Nowadays, Deep Reinforcement Learning (RL) is one of the hottest topics in the Data Science community. The fast development of RL has...

The post The Best Tools for Reinforcement Learning in Python You Actually Want to Try appeared first on neptune.ai.

neptune.ai 2020-11-16 18:19:17

This Week in Machine Learning: Why ML Models Fail, AI & Plastic Waste, Vatican Library, and Bears

Machine learning is fascinating. New things happen every second while we’re busy performing our daily tasks. If you want to know what...

The post This Week in Machine Learning: Why ML Models Fail, AI & Plastic Waste, Vatican Library, and Bears appeared first on neptune.ai.

neptune.ai 2020-11-16 11:07:00

pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know

Have you ever wanted to classify news, papers, or tweets based on their topics? Knowing how to do this can help you...

The post pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know appeared first on neptune.ai.

neptune.ai 2020-11-13 09:48:00

Best Machine Learning as a Service Platforms (MLaaS) That You Want to Check as a Data Scientist

The availability of tremendous computing power in the cloud was one of the factors behind the machine learning revolution. Thus, it is...

The post Best Machine Learning as a Service Platforms (MLaaS) That You Want to Check as a Data Scientist appeared first on neptune.ai.

Quansight Labs 2020-11-13 06:00:00

Querying multiple backends with Ibis

In our recent Ibis post, we discussed querying & retrieving data using a familiar Pandas-like interface. That discussion focused on the fluent API that Ibis provides to query structure from a SQLite database—in particular, using a single specific backend. In this post, we'll explore Ibis's ability to answer questions about data using two different Ibis backends.

import ibis.omniscidb, dask, intake, sqlalchemy, pandas, pyarrow as arrow, altair, h5py as hdf5
Ibis in the scientific Python ecosystem

Before we delve into the technical details of using Ibis, we'll consider Ibis in the greater historical context of the scientific Python ecosystem. It was started by Wes McKinney, the creator of Pandas, as way to query information on the Hadoop distributed file system and PySpark. More backends were added later as Ibis became a general tool for data queries.

Throughout the rest of this post, we'll highlight the ability of Ibis to generically prescribe query expressions across different data storage systems.

Read more… (3 min remaining to

(continued...)
Quansight Labs 2020-11-12 19:00:06

Manylinux1 is obsolete, manylinux2010 is almost EOL, what is next?

The basic installation format for users who install packages via pip is the wheel format. Wheel names are composed of four parts: a package-name-and-version tag (which can be further broken down), a Python tag, an ABI tag, and a platform tag. More information on the tags can be found in PEP 425. So a package like NumPy will be available on PyPI as numpy-1.19.2-cp36-cp36m-win_amd64.whl for 64-bit windows and numpy-1.19.2-cp36-cp36m-macosx_10_9_x86_64.whl for macOS. Note that only the plaform tag win_amd64 or macosx_10_9_x86_64 differs.

But what about Linux? There is no single, vendor controlled, "Linux platform" e.g., Ubuntu, RedHat, Fedora, Debian, FreeBSD all package software at slightly different versions. What most Linux distributions do have in common is the glibc runtime library, and a smattering of various additional system libraries. So it is possible to define a least common denominator (LCD) of software expected to be on a Linux platform (exceptions apply, e.g. non-glibc distributions).

The decision to converge on a LCD common platform gave birth to the manylinux1 standard. Going back to our example, numpy

(continued...)
neptune.ai 2020-11-12 10:07:00

What Image Processing Techniques Are Actually Used in the ML Industry?

Processing can be used to improve the quality of your image, or to help you extract useful information from it. It’s useful...

The post What Image Processing Techniques Are Actually Used in the ML Industry? appeared first on neptune.ai.

Filipe Saraiva's blog 2020-11-05 14:50:03

Bate-papo com Vivi Reis sobre tecnologia e política

Hoje à noite (5 de novembro) às 20h conversarei com Vivi Reis, candidata a vereadora pelo PSOL em Belém. No bate-papo vamos focar bastante sobre temas que entrelaçam tecnologia e política. Entre os pontos, teremos o Escritório de Dados, dados e políticas públicas, software livre na administração pública, conectividade em Belém, inclusão digital, aplicativos cidadãos,… Continue a ler »Bate-papo com Vivi Reis sobre tecnologia e política
Spyder Blog 2020-11-05 00:00:00

New features in Spyder 4's new debugger!

IPython is a great improvement over the standard Python interpreter, bringing many enhancements such as autocompletion and "magic" commands. When debugging, however, many of these features become inaccessible. With Spyder, we aim to bring back these capabilities and more for a truly premium debugging experience! (And believe me, I use this debugger a lot, and not only because I write code that might contain bugs :p).

In this post, I will describe the debugger improvements we've already made in Spyder 4, as well as those that are already implemented or under review for Spyder 4.2 and beyond.

Make the debugger more like IPython

IPython improves on the stock Python interpreter by adding syntax highlighting, completion, and history. We have done the same for the debugger!

The output is prettier (and easier to read) than plain black text, as it was in Spyder 3!

Code completion and history for the debugger use the same functionality as the IPython console, so you should not notice any difference in behaviour. Just press

(continued...)
NumFOCUS 2020-11-04 00:10:51

JupyterCon 2020: Code of Conduct Reports

Following the reports to the NumFOCUS Code-of-Conduct committee on Jeremy Howard’s keynote at JupyterCon 2020, and the controversy that followed, the NumFOCUS Code of Conduct Committee issued a public apology to Jeremy Howard and escalated the case to the board of directors. The context In his keynote at JupyterCon 2020, Jeremy Howard gave a point-by-point rebuttal of […]

The post JupyterCon 2020: Code of Conduct Reports appeared first on NumFOCUS.

NumFOCUS 2020-10-30 18:51:02

Public Apology to Jeremy Howard

We, the NumFOCUS Code of Conduct Enforcement Committee, issue a public apology to Jeremy Howard for our handling of the JupyterCon 2020 reports. We should have done better. We thank you for sharing your experience and we will use it to improve our policies going forward. We acknowledge that it was an extremely stressful experience, […]

The post Public Apology to Jeremy Howard appeared first on NumFOCUS.

Paul Ivanov’s Journal 2020-10-29 07:00:00

Money and California Propositions (2020)

Ten years ago, I made some plots for how much money was contributed to and spent by the various proposition campaigns in California.

I decided to update these for this election, and here's the result:

Just in case you didn't get the full picture, here is the same data plotted on a common scale:

So, whereas 10 years ago, we had a total of ~$58 million on the election, the overwhelming amount of in support, this time, we had ~$662 million, an 11 fold increase!

The Cal-Access Campaign Finance Activity: Propositions & Ballot Measures source I used last time was still there, but there are way more propositions this time (12 vs 5), and the money details are broken out by committee, with some propositions have a dozen committees. Another wrinkle is that website has protected by some fancy scraping protection. I could browse it just fine in Firefox, even with Javascript turned off, but couldn't download it using wget, curl,

(continued...)
NumFOCUS 2020-10-26 18:13:17

TARDIS Joins NumFOCUS as a Sponsored Project

NumFOCUS is pleased to announce the newest addition to our fiscally sponsored projects: TARDIS TARDIS is an open-source, Monte Carlo based radiation transport simulator for supernovae ejecta. TARDIS simulates photons traveling through the outer layers of an exploded star including relevant physics like atomic interactions between the photons and the expanding gas. The TARDIS collaboration […]

The post TARDIS Joins NumFOCUS as a Sponsored Project appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-10-26 13:51:04

Por um Escritório de Dados para Políticas Públicas em Belém

Dados sempre foram determinantes para a concepção e implementação de políticas públicas nas mais diferentes esferas governamentais. Acompanhamentos de indicadores econômicos, de saúde, de violência, de deslocamentos urbanos, de distribuição espacial da população, de áreas de cobertura de locais de lazer, entre outros, são apenas alguns dos dados que podem embasar o desenho de políticas… Continue a ler »Por um Escritório de Dados para Políticas Públicas em Belém
ListenData 2020-10-23 16:03:00

Translating Web Page while Scraping

Suppose you need to scrape data from a website after translating the web page in R and Python. In google chrome, there is an option (or functionality) to translate any foreign language. If you are an english speaker and don't know any other foreign language and you want to extract data from the website which does not have option to convert language to English, this article would help you how to perform translation of a webpage.
What is Selenium?You may not familiar with Selenium so it is important to understand the background. Selenium is an open-source tool which is very popular in testing domain and used for automating web browsers. It allows you to write test scripts in several programming languages. Selenium is available in both R and Python. Translate Page in Web Scraping in R and PythonIn R there is a package named RSelenium whereas Selenium can be installed by installing selenium package in Python. (continued...)
NumFOCUS 2020-10-23 15:25:08

NumFOCUS Earns Transparency Recognition from GuideStar

Earlier this week, NumFOCUS earned our first-ever Silver Seal of Transparency from GuideStar, an independent organization which classifies nonprofit organizations based on multiple metrics pertaining to transparency and accountability. Fewer than 5% of US-based nonprofits have received this type of recognition. “This respected acknowledgment comes as we prepare to enter our year-end fundraising season,” said […]

The post NumFOCUS Earns Transparency Recognition from GuideStar appeared first on NumFOCUS.

ListenData 2020-10-11 14:45:00

Learn Python for Data Science

This tutorial would help you to learn Data Science with Python by examples. It is designed for beginners who want to get started with Data Science in Python. Python is an open source language and it is widely used as a high-level programming language for general-purpose programming. It has gained high popularity in data science world. In the PyPL Popularity of Programming language index, Python scored second rank with a 14 percent share. In advanced analytics and predictive analytics market, it is ranked among top 3 programming languages for advanced analytics.
Data Science with Python Tutorial

Table of Contents

Python 2 vs. 3Google yields thousands of articles on this topic. Some bloggers opposed and some in favor of 2.7. If you filter your search criteria and look for only recent articles, you would find Python 2 is no longer supported by the Python Software Foundation. Hence it does not make any sense to learn 2.7 if you start learning
(continued...)
Paul Ivanov’s Journal 2020-10-08 07:00:00

aka: also known as

I was chatting with Anthony Scopatz last week, and one of the things we covered was how it'd be cool to have a subcommand launcher, kind of like git, where the subcommands were swappable. If you're not familiar, git automatically calls out to git-something (note the dash) whenever you run

$ git something

and something is not one of the builtin git commands. For me, ~/bin is in my PATH, so

$ git lost
git: 'lost' is not a git command. See 'git --help'.
$ echo "echo how rude!" > ~/bin/git-lost; chmod +x ~/bin/git-lost
$ git lost
how rude!

And so what Anthony was talking about was having two commands that are supposed to do the same thing, and being able to switch between them. For example: maybe we have git-away and git-gone and both of them perform a similar function, and we wish call our preferred one when we run git lost.

One way to do this would be to copy or symlink our chosen version as git-lost, and replace that file whenever

(continued...)
Quansight Labs 2020-09-29 16:00:00

Design of the Versioned HDF5 Library

In a previous post, we introduced the Versioned HDF5 library and described some of its features. In this post, we'll go into detail on how the underlying design of the library works on a technical level.

Read more… (6 min remaining to read)

ListenData 2020-09-20 08:18:00

How to rename columns in Pandas Dataframe

In this tutorial, we will cover various methods to rename columns in pandas dataframe in Python. Renaming or changing the names of columns is one of the most common data wrangling task. If you are not from programming background and worked only in Excel Spreadsheets in the past you might feel it not so easy doing this in Python as you can easily rename columns in MS Excel by just typing in the cell what you want to have. If you are from database background it is similar to ALIAS in SQL. In Python there is a popular data manipulation package called pandas which simplifies doing these kind of data operations.
2 Methods to rename columns in Pandas
In Pandas there are two simple methods to rename name of columns.

First step is to install pandas package if it is not already installed. You can check if the package is installed on your machine by running

(continued...)
Quansight Labs 2020-09-11 11:00:00

Performance of the Versioned HDF5 Library

In several industry and science applications, a filesystem-like storage model such as HDF5 is the more appropriate solution for manipulating large amounts of data. However, suppose that data changes over time. In that case, it's not obvious how to track those different versions, since HDF5 is a binary format and is not well suited for traditional version control systems and tools.

In a previous post, we introduced the Versioned HDF5 library, which implements a mechanism for storing binary data sets in a versioned way that feels natural to users of other version control systems, and described some of its features. In this post, we'll show some of the performance analysis we did while developing the library, hopefully making the case that reading and writing versioned HDF5 files can be done with a nice, intuitive API while being as efficient as possible. The tests presented here show that using the Versioned HDF5 library results in reduced disk space usage,

(continued...)
Quansight Labs 2020-09-10 05:00:00

PyTorch-Ignite: training and evaluating neural networks flexibly and transparently

Authors: Victor Fomin (Quansight), Sylvain Desroziers (IFPEN, France)
This post is a general introduction of PyTorch-Ignite. It intends to give a brief but illustrative overview of what PyTorch-Ignite can offer for Deep Learning enthusiasts, professionals and researchers. Following the same philosophy as PyTorch, PyTorch-Ignite aims to keep it simple, flexible and extensible but performant and scalable.

Read more… (28 min remaining to read)

Quansight Labs 2020-08-30 09:00:00

Traitlets - an introduction & use in Jupyter configuration management

You have probably seen Traitlets in applications, you likely even use it. The package has nearly 5 million downloads on conda-forge alone.

But, what is Traitlets ?

In this post we'll answer this question along with where Traitlets came from, its applications, and a bit of history.

Read more… (8 min remaining to read)

Filipe Saraiva's blog 2020-08-29 18:48:00

Seqtembro de eventos virtuais e gratuitos sobre Qt e KDE

(Ok a piada com seqtembro funciona melhor na versão em inglês, seqtember, mas simbora) Por uma grande coincidência, obra do destino, ou nada disso, teremos um Setembro de 2020 repleto de eventos virtuais e gratuitos de alta qualidade sobre Qt e KDE. Começando de 4 à 11 do referido mês teremos o Akademy 2020, o… Continue a ler »Seqtembro de eventos virtuais e gratuitos sobre Qt e KDE
Quansight Labs 2020-08-24 12:00:00

IPython reproducible builds

Starting with IPython 7.16.1 (released in June 2020), you should be able to recreate the sdist (.tar.gz) and wheel (.whl), and get byte for byte identical result to the wheels published on PyPI. This is a critical step toward being able to trust your computing platforms, and a key component to improve efficiency of build and packaging platforms. It also potentially impacts fast conda environment creation for users. The following goes into some reasons for why you should care.

Read more… (5 min remaining to read)

Quansight Labs 2020-08-21 13:00:00

Introducing Versioned HDF5

The problem of storing and manipulating large amounts of data is a challenge in many scientific computing and industry applications. One of the standard data models for this is HDF5, an open technology that implements a hierarchical structure (similar to a file-system structure) for storing large amounts of possibly heterogeneous data within a single file. Data in an HDF5 file is organized into groups and datasets; you can think about these as the folders and files in your local file system, respectively. You can also optionally store metadata associated with each item in a file, which makes this a self-describing and powerful data storage model.

Read more… (3 min remaining to read)

Neural Ensemble News 2020-08-08 19:27:00

CARLsim5 Released!

Introduction

CARLsim5 is an efficient, easy-to-use, GPU-accelerated library for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail. It allows execution of networks of Izhikevich spiking neurons with realistic synaptic dynamics using multiple off-the-shelf GPUs and x86 CPUs. The simulator provides a PyNN-like programming interface in C/C++, which allows for details and parameters to be specified at the synapse, neuron, and network level.


The present release, CARLsim 5, builds on the efficiency and scalability of earlier releases (Nageswaran et al., 2009; Richert et al., 2011, and Beyeler et al., 2015; Chou et al., 2018). The functionality of the simulator has been greatly expanded by the addition of a number of features that enable and simplify the creation, tuning, and simulation of complex networks with spatial structure.


New Features

1. PyNN Compatibility

pyCARL is a interface between the simulator-independent language PyNN and a CARLsim5 based back-end. In other words, you can write the code for a SNN model once, using the

(continued...)
Filipe Saraiva's blog 2020-08-04 23:27:02

O que será do Lev com o “fim” da Saraiva?

Disclaimer: apesar do sobrenome, não tenho qualquer relação com a Saraiva. E também não tenho respostas para a pergunta do título. Como usuário do Lev acompanho com interesse a agonia da Saraiva. A rede de livrarias, uma das maiores do Brasil, está há anos em um imbróglio judicial devendo diversas editoras, em um processo que… Continue a ler »O que será do Lev com o “fim” da Saraiva?
NumFOCUS 2020-07-31 17:52:20

Dask Life Sciences Fellow [Open Job]

Dask is an open-source library for parallel computing in Python that interoperates with existing Python data science libraries like Numpy, Pandas, Scikit-Learn, and Jupyter.  Dask is used today across many different scientific domains. Recently, we’ve observed an increase in use in a few life sciences applications: Large scale imaging in microscopy Single cell analysis Genomics […]

The post Dask Life Sciences Fellow [Open Job] appeared first on NumFOCUS.

Spyder Blog 2020-07-25 10:00:00

STX Next, Python development company, uses Spyder to improve their workflow

STX Next, one of Europe's largest Python development companies, has shared with us how Spyder has been a powerful tool for them when performing data analysis. It is a pleasure for us on the Spyder team to work every day to improve the workflow of developers, scientists, engineers and data analysts. We are very glad to receive and share a STX Next testimonial about Spyder, along with an interview with one of their developers, Michael Wiśniewski, who has found Spyder very useful in his job.

What Michael Wiśniewski says about Spyder

In an era of a continuously growing demand for analysis of vast amounts of data, we are facing increasingly complex tasks to perform. Sure, we are not alone—there are many great tools designed for scientists and data analysts. We have NumPy, SciPy, Matplotlib, Pandas, and others. But, wouldn't it be nice to have one extra tool that could combine all the required packages into one compact working environment? Asking this question is precisely how

(continued...)
NumFOCUS 2020-07-24 16:31:53

NumFOCUS Introduces New Supporter Program

Today NumFOCUS is pleased to introduce a new program for our individual supporters, called Open Science Champions. Each year, our community members generously support NumFOCUS and our Projects in several ways; this program is intended to connect these various forms of support so that we can engage with our community most effectively and offer our […]

The post NumFOCUS Introduces New Supporter Program appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-07-24 14:49:05

Educação Vigiada

Essa época de pandemia tem sido de produção em muitas frentes, o que infelizmente implica na redução de tempo para divulgação das mesmas aqui no blog. Nesse post quero me redimir dessa falta falando de um dos projetos que acho dos mais importantes que contribui recentemente, o Educação Vigiada. Há alguns meses o projeto Educação… Continue a ler »Educação Vigiada
NumFOCUS 2020-07-14 20:36:34

Open Source Developer Advocate

Position Overview The primary role of the Open Source Developer Advocate is to represent and support developers of NumFOCUS open source projects by serving as a link to internal and external stakeholders as well as the global user community. You will generate attention and support by applying your technical knowledge, passion for open source data […]

The post Open Source Developer Advocate appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-07-10 23:09:48

Engrenagem Ep. 04 – Aplicações KDE favoritas dos KDErs brasileiros

Nesse sábado dia 11/07 às 10h o KDE Brasil vai voltar com episódios do Engrenagem, o videocast da comunidade brasileira (que está há 4 anos sem episódios inéditos 🙂 ). Para retomar os trabalhos, o episódio trará 6 colaboradores brasileiros (Ângela, Aracele, Caio, Filipe (eu), Fred e Tomaz) falando de suas aplicações KDE favoritas –… Continue a ler »Engrenagem Ep. 04 – Aplicações KDE favoritas dos KDErs brasileiros
Spyder Blog 2020-07-08 10:00:00

Writing docs is not just writing docs

This blogpost was originally published on the Quansight Labs website.

I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”

Improving Spyder’s documentation started as part of a NumFOCUS Small Development Grant awarded at the end of last year. The goal of the project was not only to update the documentation for Spyder 4, but also to make it more user-friendly, so users can understand Spyder’s key concepts and get started with it more

(continued...)
Filipe Saraiva's blog 2020-06-25 13:15:22

Sobre o livro “Uma História de Desigualdade”

Finalizei a leitura do premiado livro do Pedro de Souza, “Uma História de Desigualdade – A Concentração de Renda entre os Ricos no Brasil 1926 – 2013“, baseado na tese que defendeu no programa de sociologia da UnB. É um livro de fôlego e que faz jus a todos os elogios que recebeu desde o… Continue a ler »Sobre o livro “Uma História de Desigualdade”
Spyder Blog 2020-06-12 18:00:00

Thanking the people behind Spyder 4

This blogpost was originally published on the Quansight Labs website.

After more than three years in development and more than 5000 commits from 60 authors around the world, Spyder 4 finally saw the light on December 5, 2019! I decided to wait until now to write a blogpost about it because shortly after the initial release, we found several critical performance issues and some regressions with respect to Spyder 3, most of which are fixed now in version 4.1.3, released on May 8th 2020.

This new release comes with a lengthy list of user-requested features aimed at providing an enhanced development experience at the level of top general-purpose editors and IDEs, while strengthening Spyder's specialized focus on scientific programming in Python. The interested reader can take a look at some of them in previous blog posts, and in detail in our Changelog. However, this post is not meant to describe those improvements, but to acknowledge all people that contributed

(continued...)
Gaël Varoquaux - programming 2020-05-27 22:00:00

Technical discussions are hard; a few tips

Note

This post discuss the difficulties of communicating while developing open-source projects and tries to gives some simple advice.

A large software project is above all a social exercise in which technical experts try to reach good decisions together, for instance on github pull requests. But communication is difficult, in …

Pierre de Buyl's homepage - scipy 2020-05-19 09:00:00

Tidynamics, what use?

In 2018 I published small Python library, tidynamics. The scope was deliberately limited: compute the typical correlation functions for stochastic and molecular dynamics: the autocorrelation and the mean-square displacement. Two years later, I wonder about its usage.

NumFOCUS 2020-05-18 19:48:24

Moderna, IMC Renew NumFOCUS Corporate Sponsorships

Monday, May 18th, 2020 Two NumFOCUS corporate supporters recently made fresh commitments to our open source mission. Trading firm IMC and biotechnology company Moderna Therapeutics each renewed their corporate sponsorships earlier this month. Both companies have supported NumFOCUS since 2018 at our Silver and Bronze sponsorship levels, respectively. Asked about his company’s decision to partner […]

The post Moderna, IMC Renew NumFOCUS Corporate Sponsorships appeared first on NumFOCUS.

NumFOCUS 2020-05-18 14:58:09

NumFOCUS Projects helping combat the COVID-19 pandemic

Open source tools are uniquely positioned to help combat the ongoing COVID-19 pandemic through their adaptable and collaborative nature. NumFOCUS sponsored and affiliated projects are being used on a global scale to meet the needs of researchers and data scientists. Our projects are being used in groundbreaking scientific efforts to create response models, visualize and […]

The post NumFOCUS Projects helping combat the COVID-19 pandemic appeared first on NumFOCUS.

Paul Ivanov’s Journal 2020-05-17 07:00:00

Lazy River of Curious Content 0

This is the first post of what I'm calling a Lazy River of Curious Content. This is a way to review stuff that I've been doing, dealing with, or find interesting during the week recently (This was originally written two weeks ago, May 3rd, my shoddy internet connectivity kept me from posting it.). I'm loosely following the format that Justin Sherrill uses with great effect over at https://dragonflydigest.com

Learn NixOS by turning a Raspberry Pi into a Wireless Router Friend of the show, Anthony Scopatz, tried NixOS for the first time and provides a detailed report:

"While I had read the NixOS pamphlets, and listened politely when the faithful came knocking on my door at inconvenient times, I had never walked the path of functional Linux enlightenment myself"

Reading through that made me file away a todo of writing up how I use propellor (and why). But those todo sometimes just pile up for a while...

An interview of one of my long time nerd-crushes, Rob Pike. The questions focus on the Go programming

(continued...)
Living in an Ivory Basement 2020-05-06 22:00:00

sourmash databases as zip files, in sourmash v3.3.0

Use compressed databases directly!

Filipe Saraiva's blog 2020-05-05 18:29:16

LaKademy 2019

Em novembro passado, colaboradores latinoamericanos do KDE desembarcaram em Salvador/Brasil para participarem de mais uma edição do LaKademy – o Latin American Akademy. Aquela foi a sétima edição do evento (ou oitava, se você contar o Akademy-BR como o primeiro LaKademy) e a segunda com Salvador como a cidade que hospedou o evento. Sem problemas… Continue a ler »LaKademy 2019
Filipe Saraiva's blog 2020-05-04 21:20:54

Akademy 2019

Em setembro de 2019 a cidade italiana de Milão sediou o principal encontro mundial dos colaboradores do KDE – o Akademy, onde membros de diferentes áreas como tradutores, desenvolvedores, artistas, pessoal de promo e mais se reúnem por alguns dias para pensar e construir o futuro dos projetos e comunidade(s) do KDE Antes de chegar… Continue a ler »Akademy 2019
NumFOCUS 2020-05-01 16:32:10

Yellowbrick Update – April 2020

Yellowbrick released Version 1.1 on February 25, 2020.  If you haven’t yet upgraded simply type pip install yellowbrick -U or conda install -c districtdatalabs yellow-brick into your terminal/command prompt to get it.  The major improvement in v1.1 is introducing quick methods or one-liners to generate your favorite ML plots more quickly with Yellowbrick.  Dr.  Rebecca […]

The post Yellowbrick Update – April 2020 appeared first on NumFOCUS.

NumFOCUS 2020-04-29 18:34:15

2020 PyData Conferences Update [COVID-19]

We wanted to give an update to our community regarding the upcoming 2020 PyData conferences. We have been closely monitoring the situation and to help ensure the safety of our community given the threat of the COVID-19 virus, the following in-person events have been postponed to 2021: PyData Miami PyData Amsterdam PyData LA PyData London PyData […]

The post 2020 PyData Conferences Update [COVID-19] appeared first on NumFOCUS.

NumFOCUS 2020-04-28 18:16:15

Scientific Software Developer- Contract Basis [SunPy Project]

Scientific Software Developer- Contract Basis NumFOCUS is seeking a Scientific Software Developer to support the SunPy project. SunPy is a Python-based open source scientific software package supporting solar physics data analysis. This is a 1 year contract.     The successful applicant will work to improve SunPy’s functionality. There are four main tasks:   Report on […]

The post Scientific Software Developer- Contract Basis [SunPy Project] appeared first on NumFOCUS.

Spyder Blog 2020-04-22 17:00:00

Creating the ultimate terminal experience in Spyder 4 with Spyder-Terminal

This blogpost was originally published on the Quansight Labs website.

The Spyder-Terminal project is revitalized! The new 0.3.0 version adds numerous features that improve the user experience, and enhances compatibility with the latest Spyder 4 release, in part thanks to the improvements made in the xterm.js project.

Upgrade to ES6/JSX syntax

First, we were able to update all the old JavaScript files to use ES6/JSX syntax and the tests for the client terminal. This change simplified the code base and maintenance and allows us to easily extend the project to new functionalities that the xterm.js API offers. In order to compile this code and run it inside Spyder, we migrated our deployment to Webpack.

Multiple shells per operating system

In the new release, you now have the ability to configure which shell to use in the terminal. On Linux and UNIX systems, bash, sh, ksh, zsh, csh, pwsh, tcsh, screen, tmux, dash and rbash are supported, while cmd and powershell are the

(continued...)
Living in an Ivory Basement 2020-04-19 22:00:00

Software and workflow development practices (April 2020 update)

How we develop software and workflows in the DIB Lab, in 2020.

Filipe Saraiva's blog 2020-04-16 15:12:29

LaKademy 2019

Past November 2019 KDE fellows from Latin-America arrived in Salvador – Brazil to attend an one more edition of LaKademy – the Latin American Akademy. That was the 7th edition of the event (or the 8th, if you count Akademy-BR as the first LaKademy) and the second one with Salvador as host city. No problem… Continue a ler »LaKademy 2019
Martin Fitzpatrick - python 2020-04-13 11:01:00

Is it getting better yet? An optimistic visual guide to the Coronavirus pandemic

As the apocalypse rumbles on, I found myself wondering "Is it getting any better?"

Daily updates of spiralling case numbers (and worse, deaths) does little to give a sense of whether we're getting to, or already past, the worst of it.

To answer that question for myself and you, I …

Living in an Ivory Basement 2020-04-12 22:00:00

How to give a bad online talk

A bad example...

fa.bianp.net 2020-04-06 22:00:00

On the Link Between Polynomials and Optimization

There's a fascinating link between minimization of quadratic functions and polynomials. A link that goes deep and allows to phrase optimization problems in the language of polynomials and vice versa. Using this connection, we can tap into centuries of research in the theory of polynomials and shed new light on …

Paul Ivanov’s Journal 2020-04-03 07:00:00

pheriday 3: infrastructure

Looks like we can't inline audio for your browser. That's cool, just find the direct file links below.

paul's habitual errant ramblings (on Fr)idays

pheridays: 3

2020-04-10: A week ago, I recorded a 5 minute audio segment of some stuff I've been thinking about, but when I started to write it up I stumbled into and kept dropping down a deep technostalgic hole.

fall down along with me:

https://pirsquared.org/blog/pheriday-infrastructure.html

The recording is just shy of five minutes long, you can also download it in different formats, depending on your needs, if the audio tag above doesn't suit you:

https://pirsquared.org/pheridays/2020-04-03.ogg (2.9 Mb)
https://pirsquared.org/pheridays/2020-04-03.mp3 (4.5 Mb)
https://pirsquared.org/pheridays/2020-04-03.m4a (6.3 Mb)

--

Stuff I mentioned in the audio:

Propellor - "configuration management system using Haskell and Git" by Joey Hess

OpenWRT - specifically - reducing Bufferbloat

Mumble - "a free, open source, low latency, high quality voice chat application."

sourcehut.org - "the hacker's forge" also know as sr.ht by Drew DeVault

Jitsi - "Multi-platform open-source video conferencing"

OpenFire - "real time collaboration (RTC) server licensed under the

(continued...)
Living in an Ivory Basement 2020-02-16 23:00:00

Two talks at JGI in May: sourmash, spacegraphcats, and disease associations in the human microbiome.

Using k-mers and taxonomy to find contamination in metagenomes

Leonardo Uieda 2020-01-23 12:00:00

Advancing research software in the UK through an SSI fellowship

I have been selected as part of the 2020 cohort of Fellows of the Software Sustainability Institute!

The Institute cultivates world-class research with software. It's based at the universities of Edinburgh, Manchester, Southampton, and Oxford in the UK. Their motto says it all:

The SSI has a yearly fellowship program to fund the organization of communities around scientific software (creating of local user groups, workshops, hackathons, etc). Even more importantly, they organize several events to get current and past fellows in the same place doing awesome stuff. I'm really looking forward to this year's Collaborations Workshop (registration is open to all, not just fellows). I applied at the end of last year and was selected to join the 2020 cohort of fellows along with some truly amazing people.

My plan for the fellowship is to

(continued...)
Peekaboo 2020-01-07 17:26:00

Don't fund Software that doesn't exist

I’ve been happy to see an increase in funding for open source software across research areas and across funding bodies. However, I observed that a majority of funding from, say, the NSF, goes to projects that do not exist yet, and where the funding is supposed to create a new project, or to extend projects that are developed and used within a single research lab. I think this top-down approach to creating software comes from a misunderstanding of the existing open source software that is used in science. This post collects thoughts on the effectiveness of current grant-based funding and how to improve it from the perspective of the grant-makers.
Instead of the current approach of funding new projects, I would recommend funding existing open source software, ideally software that is widely used, and underfunded. The story of the underfunded but critically important open source software (which I’ll refer to as infrastructure software) should be an old tale by now.
(continued...)
Living in an Ivory Basement 2020-01-01 23:00:00

sourmash-oddify: a workflow for exploring contamination in metagenome-assembled genomes

Using k-mers and taxonomy to find contamination in metagenomes

Leonardo Uieda 2019-12-08 12:00:00

Two PhD studentships at the University of Liverpool

I have two open positions for funded studentships at the University of Liverpool. Applications are open until 10 January 2020.

Project descriptions

Follow the links for more detailed versions.

Bringing machine learning techniques to geophysical data processing

The goal of this project is to investigate the use of existing machine learning techniques to process gravity and magnetics data using the Equivalent Layer Method. The methods and software developed during this project can be applied to process large amounts of gravity and magnetics data, including airborne and satellite surveys, and produce data products that can enable further scientific investigations. Examples of such data products include global gravity gradient grids from GOCE satellite measurements, regional magnetic grids for the UK, gravity grids for the Moon and Mars, etc.

Large-scale mapping of the thickness of the

(continued...)
Gaël Varoquaux - programming 2019-12-01 05:00:00

Getting a big scientific prize for open-source software

Note

An important acknowledgement for a different view of doing science: open, collaborative, and more than a proof of concept.

A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand Thirion, and myself received the “Académie des Sciences Inria prize for transfer”, for our contributions to the scikit-learn project …

Spyder Blog 2019-11-28 20:00:00

Variable Explorer improvements in Spyder 4

This blogpost was originally published on the Quansight Labs website.

Spyder 4 will be released very soon with lots of interesting new features that you'll want to check out, reflecting years of effort by the team to improve the user experience. In this post, we will be talking about the improvements made to the Variable Explorer.

These include the brand new Object Explorer for inspecting arbitrary Python variables, full support for MultiIndex dataframes with multiple dimensions, and the ability to filter and search for variables by name and type, and much more.

It is important to mention that several of the above improvements were made possible through integrating the work of two other projects. Code from gtabview was used to implement the multi-dimensional Pandas indexes, while objbrowser was the foundation of the new Object Explorer.

New viewer for arbitrary Python objects

For Spyder 4 we added a long-requested feature: full support for inspecting any kind of Python object through the Variable

(continued...)
Spyder Blog 2019-11-12 00:00:00

File management improvements in Spyder 4

This blogpost was originally published on the Quansight Labs website.

Version 4.0 of Spyder is almost ready! It has been in the making for well over two years, and it contains lots of interesting new features. We will focus on the Files pane in this post, where we've made several improvements to the interface and file management tools.

Simplified interface

In order to simplify the Files pane's interface, the columns corresponding to size and kind are hidden by default. To change which columns are shown, use the top-right pane menu or right-click the header directly.

Custom file associations

First, we added the ability to associate different external applications with specific file extensions they can open. Under the File associations tab of the Files preferences pane, you can add file types and set the external program used to open each of them by default.

Once you've set this up, files will automatically launch in the associated application when opened from the Files pane in Spyder.

(continued...)
ListenData 2019-10-28 15:48:00

Loan Amortisation Schedule using R and Python

In this post, we will explain how you can calculate your monthly loan instalments the way bank calculates using R and Python. In financial world, analysts generally use MS Excel software for calculating principal and interest portion of instalment using PPMT, IPMT functions. As data science is growing and trending these days, it is important to know how you can do the same using popular data science programming languages such as R and Python.

When you take a loan from bank at x% annual interest rate for N number of years. Bank calculates monthly (or quarterly) instalments based on the following factors :

  • Loan Amount
  • Annual Interest Rate
  • Number of payments per year
  • Number of years for loan to be repaid in instalments
Loan Amortisation ScheduleIt refers to table of periodic loan payments explaining the breakup of principal and interest in each instalment/EMI until the loan is repaid at the end of its stipulated term. Monthly instalments are generally same every month
(continued...)
I Love Symposia! 2019-10-24 13:59:54

Introducing napari: a fast n-dimensional image viewer in Python

I'm really excited to finally, officially, share a new(ish) project called napari with the world. We have been developing napari in the open from the very first commit, but we didn't want to make any premature fanfare about it… Until now. It's still alpha software, but for months now, both the core napari team and a few collaborators/early adopters have been using napari in our daily work. I've found it life-changing.

The background

I've been looking for a great nD volume viewer in Python for the better part of a decade. In 2009, I joined Mitya Chklovskii's lab and the FlyEM team at the Janelia [Farm] Research Campus to work on the segmentation of 3D electron microscopy (EM) volumes. I started out in Matlab, but moved to Python pretty quickly and it was a very smooth transition (highly recommended! ;). Looking at my data was always annoying though. I was either looking at single 2D slices using matplotlib.pyplot.imshow, or saving the volumes in VTK format and loading them into ITK-SNAP — which worked ok

(continued...)
fa.bianp.net 2019-09-26 22:00:00

How to Evaluate the Logistic Loss and not NaN trying

A naive implementation of the logistic regression loss can results in numerical indeterminacy even for moderate values. This post takes a closer look into the source of these instabilities and discusses more robust Python implementations.

Paul Ivanov’s Journal 2019-09-17 07:00:00

Uvas Gold 200

My poem about a rainy 200k was published in the Fall 2019 issue of American Randonneur (a quarterly magazine published by Randonneurs USA)

I've been doing samizdat poetry for as long as I've had a web presence (since 1999), but I am now officially a published poet! (I am deliberately not counting the embarrasing hackjob that was published in a youth anthology when I was in 8th grade.)

You can find "Uvas Gold 200" on page 26 - either directly on this skeuomorphic leafing viewer or the PDF, but I'm republishing both the exposition blurb and the poem below. If you prefer to listen, I recorded a reading of it that you can download in different flavors: a local audio only, a local video, or the embeded video version below.

Uvas Gold 200k starts and ends in Fremont, CA and was held on Saturday, December 1st, 2018. The ride frontloads the climbing by going nearly half-way up Mount

(continued...)
Spyder Blog 2019-08-16 00:00:00

Spyder 4.0: Kite integration is here

This blogpost was originally published on the Quansight Labs website.

Note: Kite is sponsoring the work discussed in this blog post, and in addition supports Spyder 4.0 development through a Quansight Labs Community Work Order.

As part of our next release, we are proud to announce an additional completion client for Spyder, Kite. Kite is a novel completion client that uses Machine Learning techniques to find and predict the best autocompletion for a given text. Additionally, it collects improved documentation for compiled packages, e.g. Matplotlib, NumPy and SciPy, that cannot be obtained easily by using traditional code analysis packages such as Jedi. Although Kite is not open source like Spyder, you can download it without charge at the Kite website.

By incorporating Kite into Spyder, we will improve and provide the ultimate autocompletion and signature retrieval experience for most of the scientific Python stack and beyond. For instance, let’s take a look at the following PyTorch completion. While

(continued...)
ListenData 2019-08-10 21:54:00

Object Oriented Programming in Python : Learn by Examples

This tutorial outlines object oriented programming (OOP) in Python with examples. It is a step by step guide which was designed for people who have no programming experience. Object Oriented Programming is popular and available in other programming languages besides Python which are Java, C++, PHP.
Table of Contents

What is Object Oriented Programming?In object-oriented programming (OOP), you have the flexibility to represent real-world objects like car, animal, person, ATM etc. in your code. In simple words, an object is something that possess some characteristics and can perform certain functions. For example, car is an object and can perform functions like start, stop, drive and brake. These are the function of a car. And the characteristics are color of car, mileage, maximum speed, model year etc.

In the above example, car is an object. Functions are called methods in OOP world. Characteristics are attributes (properties). Technically attributes are variables or values related to the state of the object whereas methods

(continued...)
ListenData 2019-07-29 20:20:00

Precision Recall Curve Simplified

This article outlines precision recall curve and how it is used in real-world data science application. It includes explanation of how it is different from ROC curve. It also highlights limitation of ROC curve and how it can be solved via area under precision-recall curve. This article also covers implementation of area under precision recall curve in Python, R and SAS.
Table of Contents

What is Precision Recall Curve?Before getting into technical details, we first need to understand precision and recall terms in layman's term. It is essential to understand the concepts in simple words so that you can recall it for future work when it is required. Both Precision and Recall are important metrics to check the performance of binary classification model. PrecisionPrecision is also called Positive Predictive Value. Suppose you are building a customer attrition model which has objective to identify customers who are likely to close relationship with the company. The use of this model is to
(continued...)
Living in an Ivory Basement 2019-07-22 22:00:00

Comparing two genome binnings quickly with sourmash

Comparing two sets of MAGs, for fun and profit!

ListenData 2019-07-22 09:20:00

Calculate KS Statistic with Python

Kolmogorov-Smirnov (KS) Statistics is one of the most important metrics used for validating predictive models. It is widely used in BFSI domain. If you are a part of risk or marketing analytics team working on project in banking, you must have heard of this metrics. What is KS Statistics?It stands for Kolmogorov–Smirnov which is named after Andrey Kolmogorov and Nikolai Smirnov. It compares the two cumulative distributions and returns the maximum difference between them. It is a non-parametric test which means you don't need to test any assumption related to the distribution of data. In KS Test, Null hypothesis states null both cumulative distributions are similar. Rejecting the null hypothesis means cumulative distributions are different.

In data science, it compares the cumulative distribution of events and non-events and KS is where there is a maximum difference between the two distributions. In simple words, it helps us to understand how well our predictive model is able to discriminate between events and

(continued...)
ListenData 2019-07-20 16:22:00

A Complete Guide to Python DateTime Functions

In this tutorial, we will cover python datetime module and how it is used to handle date, time and datetime formatted columns (variables). It includes various practical examples which would help you to gain confidence in dealing dates and times with python functions. In general, Date types columns are not easy to manipulate as it comes with a lot of challenges like dealing with leap years, different number of days in a month, different date and time formats or if date values are stored in string (character) format etc.
Table of Contents

Introduction : datetime moduleIt is a python module which provides several functions for dealing with dates and time. It has four classes as follows which are explained in the latter part of this article how these classes work.
  1. datetime
  2. date
  3. time
  4. timedelta

People who have no experience of working with real-world datasets might have not encountered date columns. They might be under impression that working with dates is rarely used and not so

(continued...)
ListenData 2019-07-17 17:32:00

What are *args and **kwargs and How to use them

This article explains the concepts of *args and **kwargs and how and when we use them in python program. Seasoned python developers embrace the flexibility it provides when creating functions. If you are beginner in python, you might not have heard it before. After completion of this tutorial, you will have confidence to use them in your live project.
Table of Contents

Introduction : *argsargs is a short form of arguments. With the use of *args python takes any number of arguments in user-defined function and converts user inputs to a tuple named args. In other words, *args means zero or more arguments which are stored in a tuple named args.

When you define function without *args, it has a fixed number of inputs which means it cannot accept more (or less) arguments than you defined in the function.

In the example code below, we are creating a very basic function which adds two numbers. At the same time, we created a

(continued...)
ListenData 2019-07-12 21:42:00

Python : 10 Ways to Filter Pandas DataFrame

In this article, we will cover various methods to filter pandas dataframe in Python. Data Filtering is one of the most frequent data manipulation operation. It is similar to WHERE clause in SQL or you must have used filter in MS Excel for selecting specific rows based on some conditions. In terms of speed, python has an efficient way to perform filtering and aggregation. It has an excellent package called pandas for data wrangling tasks. Pandas has been built on top of numpy package which was written in C language which is a low level language. Hence data manipulation using pandas package is fast and smart way to handle big sized datasets.
Examples of Data Filtering
It is one of the most initial step of data preparation for predictive modeling or any reporting project. It is also called 'Subsetting Data'. See some of the examples of data filtering below.
  • Select all the active customers whose accounts were opened
(continued...)
ListenData 2019-07-04 19:51:00

Python Dictionary Comprehension with Examples

In this tutorial, we will cover how dictionary comprehension works in Python. It includes various examples which would help you to learn the concept of dictionary comprehension and how it is used in real-world scenarios.
What is Dictionary?
Dictionary is a data structure in python which is used to store data such that values are connected to their related key. Roughly it works very similar to SQL tables or data stored in statistical softwares. It has two main components -
  1. Keys : Think about columns in tables. It must be unique (like column names cannot be duplicate)
  2. Values : It is similar to rows in tables. It can be duplicate.
It is defined in curly braces { }. Each key is followed by a colon (:) and then values.
Syntax of Dictionary

d = {'a': [1,2], 'b': [3,4], 'c': [5,6]}
To extract keys, values and structure of dictionary, you can submit the following commands.

d.keys() # 'a', 'b', 'c'
d.values() # [1, 2], [3, 4], [5,
(continued...)

HTML outputs in Jupyter

Summary

User interaction in data science projects can be improved by adding a small amount of visual deisgn.

To motivate effort around visual design we show several simple-yet-useful examples. The code behind these examples is small and accessible to most Python developers, even if they don’t have much HTML experience.

This post in particular focuses on Jupyter’s ability to add HTML output to any object. This can either be full-fledged interactive widgets, or just rich static outputs like tables or diagrams. We hope that by showing examples here we will inspire some throughts in other projects.

This post was supported by replies to this tweet. The rest of this post is just examples.

Iris

I originally decided to write this post after reading another blogpost from the UK Met office, where they included the HTML output of their library Iris in a a blogpost

(work by Peter Killick, post by Theo McCaie)

The fact that the output provided by an interactive session is the same output that you would provide in a published result helps everyone. The interactive

(continued...)
ListenData 2019-07-03 15:01:00

Python list comprehension : Learn by Examples

This tutorial covers how list comprehension works in Python. It includes many examples which would help you to familiarize the concept and you should be able to implement it in your live project at the end of this lesson.
Table of Contents

What is list comprehension?Python is an object oriented programming language. Almost everything in them is treated consistently as an object. Python also features functional programming which is very similar to mathematical way of approaching problem where you assign inputs in a function and you get the same output with same input value. Given a function f(x) = x2, f(x) will always return the same result with the same x value. The function has no "side-effect" which means an operation has no effect on a variable/object that is outside the intended usage. "Side-effect" refers to leaks in your code which can modify a mutable data structure or variable.

Functional programming is also good for parallel computing as there is no

(continued...)
Peekaboo 2019-07-02 16:11:00

Don't cite the No Free Lunch Theorem

Tldr; You probably shouldn’t be citing the "No Free Lunch" Theorem by Wolpert. If you’ve cited it somewhere, you might have used it to support the wrong conclusion. What it actually (vaguely) says is “You can’t learn from data without making assumptions”.

The paper on the “No Free Lunch Theorem”, actually called "The Lack of A Priori Distinctions Between Learning Algorithms" is one of these papers that are often cited and rarely read, and I hear many people in the ML community refer to it when supporting the claim that “one model can’t be the best at everything” or “one model won’t always be better than another model”. The point of this post is to convince you that this is not what the paper or theorem says (at least not the one usually cited by Wolpert), and you should not cite this theorem in this context; and also that common versions cited of the "No Free Lunch" Theorem (continued...)
ListenData 2019-06-28 22:46:00

15 ways to read CSV file with pandas

This tutorial explains how to read a CSV file in python using read_csv function of pandas package. Without use of read_csv function, it is not straightforward to import CSV file with python object-oriented programming. Pandas is an awesome powerful python package for data manipulation and supports various functions to load and import data from various formats. Here we are covering how to deal with common issues in importing CSV file.
Table of Contents

Install and Load Pandas Package
Make sure you have pandas package already installed on your system. If you set up python using Anaconda, it comes with pandas package so you don't need to install it again. Otherwise you can install it by using command pip install pandas. Next step is to load the package by running the following command. pd is an alias of pandas package. We will use it instead of full name "pandas".
import pandas as pd
Create Sample Data for Import
The program below creates a sample
(continued...)
ListenData 2019-06-25 11:31:00

Matplotlib Tutorial : Learn by Examples

This tutorial outlines how to perform plotting and data visualization in python using Matplotlib library. The objective of this post is to get you familiar with the basics and advanced plotting functions of the library. It contains several examples which will give you hands-on experience in generating plots in python.
Table of Contents

What is Matplotlib?It is a powerful python library for creating graphics or charts. It takes care of all of your basic and advanced plotting requirements in Python. It took inspiration from MATLAB programming language and provides a similar MATLAB like interface for graphics. The beauty of this library is that it integrates well with pandas package which is used for data manipulation. With the combination of these two libraries, you can easily perform data wrangling along with visualization and get valuable insights out of data. Like ggplot2 library in R, matplotlib library is the grammar of graphics in Python and most used library for charts in Python.
Basics
(continued...)

Write Short Blogposts

I encourage my colleagues to write blogposts more frequently. This is for a few reasons:

  1. It informs your broader community what you’re up to, and allows that community to communicate back to you quickly.

    You communicating to the community fosters a sense of collaboration, openness, and trust. You gain collaborators, build momentum behind your work, and curate a body of knowledge that early adopters can consume to become experts quickly.

    Getting feedback from your community helps you to course-correct early in your work, and stops you from wasting time in inefficient courses of action.

    You can only work for a long time without communicating if you are either entirely confident in what you’re doing, or reckless, or both.

  2. It increases your visibility, and so is good for your career.

    I have a great job. I find my work to be both

(continued...)
ListenData 2019-06-19 13:20:00

How to drop one or multiple columns from Pandas Dataframe

In this tutorial, we will cover how to drop or remove one or multiple columns from pandas dataframe.
What is pandas in Python?
pandas is a python package for data manipulation. It has several functions for the following data tasks:
  1. Drop or Keep rows and columns
  2. Aggregate data by one or more columns
  3. Sort or reorder data
  4. Merge or append multiple dataframes
  5. String Functions to handle text data
  6. DateTime Functions to handle date or time format columns
Import or Load Pandas library
To make use of any python library, we first need to load them up by using import command.
import pandas as pd
import numpy as np
Let's create a fake dataframe for illustration
The code below creates 4 columns named A through D.
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
          A         B         C         D
0 -1.236438 -1.656038
(continued...)
ListenData 2019-06-09 21:07:00

String Functions in Python with Examples

This tutorial outlines various string (character) functions used in Python. To manipulate strings and character values, python has several in-built functions. It means you don't need to import or have dependency on any external package to deal with string data type in Python. It's one of the advantage of using Python over other data science tools. Dealing with string values is very common in real-world. Suppose you have customers' full name and you were asked by your manager to extract first and last name of customer. Or you want to fetch information of all the products that have code starting with 'QT'.
Table of Contents

List of frequently used string functions The table below shows many common string functions along with description and its equivalent function in MS Excel. We all use MS Excel in our workplace and familiar with the functions used in MS Excel. The comparison of string functions in MS EXCEL and Python would help you to learn
(continued...)