SciPy

Planet SciPy

ListenData 2020-10-23 16:03:00

Translating Web Page while Scraping

Suppose you need to translate a web page while scraping data from a website in R or Python. Google Chrome has a built-in option to translate pages written in a foreign language. If you are an English speaker who does not know other languages and you want to extract data from a website that offers no English version, this article shows how to translate a webpage while scraping.
What is Selenium?
You may not be familiar with Selenium, so it is important to understand the background. Selenium is an open-source tool for automating web browsers that is very popular in the testing domain. It allows you to write test scripts in several programming languages, and it is available in both R and Python.
Translate Page in Web Scraping in R and Python
In R there is a package named RSelenium, whereas in Python Selenium can be installed via the selenium package. Following (continued...)
NumFOCUS 2020-10-23 15:25:08

NumFOCUS Earns Transparency Recognition from GuideStar

Earlier this week, NumFOCUS earned our first-ever Silver Seal of Transparency from GuideStar, an independent organization which classifies nonprofit organizations based on multiple metrics pertaining to transparency and accountability. Fewer than 5% of US-based nonprofits have received this type of recognition. “This respected acknowledgment comes as we prepare to enter our year-end fundraising season,” said […]

The post NumFOCUS Earns Transparency Recognition from GuideStar appeared first on NumFOCUS.

neptune.ai 2020-10-23 07:11:26

Understanding Gradient Clipping (and How It Can Fix Exploding Gradients Problem)

Introduction By the end of this article you’ll know: What is Gradient Clipping and how does it occur? Types of Clipping techniques...
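The excerpt above stops before describing any technique, so the following is only a minimal sketch of the two clipping styles most commonly meant by the term: clipping by norm and clipping by value. The function names and the plain-list representation of gradients are assumptions made for illustration, not code from the linked article.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm == 0.0 or norm <= max_norm:
        # Already small enough: return an unchanged copy.
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def clip_by_value(grads, limit):
    """Clamp every gradient component into the interval [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


# A gradient with norm 5 is shrunk to norm 1 while keeping its direction.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
# Component-wise clamping changes the direction but bounds each entry.
clamped = clip_by_value([3.0, -4.0, 0.5], limit=1.0)
```

Norm clipping preserves the direction of the update and only limits its size, which is why it is often preferred when exploding gradients are the concern.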

The post Understanding Gradient Clipping (and How It Can Fix Exploding Gradients Problem) appeared first on neptune.ai.

neptune.ai 2020-10-22 07:06:19

Understanding GAN Loss Functions

Ian Goodfellow introduced Generative Adversarial Networks (GAN) in 2014. It was one of the most beautiful, yet straightforward implementations of Neural Networks,...

The post Understanding GAN Loss Functions appeared first on neptune.ai.

neptune.ai 2020-10-21 07:40:01

Brier Score: Understanding Model Calibration

Do you ever encounter a storm when the probability of rain in your weather app is below 10%? Well, this shows perfectly...
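For readers who want the definition behind the title: the Brier score is the mean squared difference between forecast probabilities and the observed 0/1 outcomes, so lower is better. The sketch below uses made-up forecast data and a hypothetical function name; it is not taken from the linked article.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)


# Illustrative data: forecast chance of rain and whether it actually rained.
forecasts = [0.1, 0.1, 0.9, 0.8]
rained = [1, 0, 1, 0]
score = brier_score(forecasts, rained)
```

A forecaster who always says 10% and gets caught in a storm contributes (0.1 - 1)^2 = 0.81 for that day, which is how the score penalizes confident misses.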

The post Brier Score: Understanding Model Calibration appeared first on neptune.ai.

neptune.ai 2020-10-20 07:34:41

Data Augmentation in Python: Everything You Need to Know

In machine learning (ML), the situation when the model does not generalize well from the training data to unseen data is...

The post Data Augmentation in Python: Everything You Need to Know appeared first on neptune.ai.

neptune.ai 2020-10-19 07:37:13

Deep Dive into ML Models in Production Using Tensorflow Extended (TFX) and Kubeflow

If I had a dollar for every machine learning model wasting in Jupyter Notebooks across the industry, I’d be a millionaire. — Me It...

The post Deep Dive into ML Models in Production Using Tensorflow Extended (TFX) and Kubeflow appeared first on neptune.ai.

neptune.ai 2020-10-16 12:12:49

PyTorch Loss Functions: The Ultimate Guide

The way you configure your loss functions can make or break the performance of your algorithm. By correctly configuring the loss function,...

The post PyTorch Loss Functions: The Ultimate Guide appeared first on neptune.ai.

neptune.ai 2020-10-15 08:26:16

Image Processing Techniques That You Can Use in Machine Learning Projects

Image processing is a method to perform operations on an image to extract information from it or enhance it. Digital image processing...

The post Image Processing Techniques That You Can Use in Machine Learning Projects appeared first on neptune.ai.

neptune.ai 2020-10-14 07:41:48

This Week in Machine Learning: Quantum Chemistry, Synthetic Biology, GPT-3 Bot on Reddit, and Relationships

Machine learning has applications in so many different fields that sometimes it may be hard to keep track of all the new...

The post This Week in Machine Learning: Quantum Chemistry, Synthetic Biology, GPT-3 Bot on Reddit, and Relationships appeared first on neptune.ai.

neptune.ai 2020-10-14 07:38:03

Computer Vision in Machine Learning Industry – Top 12 Best Resources and How to Use Them to Follow Current Trends

To stay on top of the latest trends in machine learning, you need to be fast. Things change quickly and round the...

The post Computer Vision in Machine Learning Industry – Top 12 Best Resources and How to Use Them to Follow Current Trends appeared first on neptune.ai.

neptune.ai 2020-10-13 07:46:47

Essential Pil (Pillow) Image Tutorial (for Machine Learning People)

PIL stands for Python Imaging Library. In this article, we will look at its fork: Pillow. PIL has not been updated since...

The post Essential Pil (Pillow) Image Tutorial (for Machine Learning People) appeared first on neptune.ai.

ListenData 2020-10-11 14:45:00

Learn Python for Data Science

This tutorial helps you learn data science with Python through examples. It is designed for beginners who want to get started with data science in Python. Python is an open-source, high-level language that is widely used for general-purpose programming, and it has become very popular in the data science world. In the PYPL (PopularitY of Programming Language) index, Python ranked second with a 14 percent share. In the advanced analytics and predictive analytics market, it ranks among the top 3 programming languages.
Data Science with Python Tutorial

Table of Contents

Python 2 vs. 3
Google yields thousands of articles on this topic. Some bloggers oppose 2.7 and some favor it. If you filter your search and look only at recent articles, you will find that Python 2 is no longer supported by the Python Software Foundation. Hence it does not make any sense to learn 2.7 if you start learning
(continued...)
Paul Ivanov’s Journal 2020-10-08 07:00:00

aka: also known as

I was chatting with Anthony Scopatz last week, and one of the things we covered was how it'd be cool to have a subcommand launcher, kind of like git, where the subcommands were swappable. If you're not familiar, git automatically calls out to git-something (note the dash) whenever you run

$ git something

and something is not one of the builtin git commands. For me, ~/bin is in my PATH, so

$ git lost
git: 'lost' is not a git command. See 'git --help'.
$ echo "echo how rude!" > ~/bin/git-lost; chmod +x ~/bin/git-lost
$ git lost
how rude!
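The lookup that git performs here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the launcher discussed in the post: `mytool`, `find_subcommand`, and `dispatch` are hypothetical names, and only the standard-library calls `shutil.which` and `subprocess.call` are relied on.

```python
import shutil
import subprocess
import sys


def find_subcommand(prog, name, path=None):
    """Return the full path of the executable `prog-name`, or None.

    `path` is an optional search path string; by default PATH is used.
    """
    return shutil.which(f"{prog}-{name}", path=path)


def dispatch(prog, argv, path=None):
    """Run `prog-<argv[0]>` with the remaining arguments, git-style."""
    if not argv:
        print(f"usage: {prog} <subcommand> [args...]", file=sys.stderr)
        return 2
    name, rest = argv[0], argv[1:]
    exe = find_subcommand(prog, name, path=path)
    if exe is None:
        # Mirror the wording git uses for unknown subcommands.
        print(f"{prog}: '{name}' is not a {prog} command.", file=sys.stderr)
        return 1
    # Hand over to the external executable and return its exit status.
    return subprocess.call([exe, *rest])
```

With an executable named `mytool-lost` somewhere on PATH, `dispatch("mytool", ["lost"])` would run it, much as `git lost` runs `git-lost`. Note that a script launched this way needs a shebang line (or must be a binary), since it is executed directly rather than through a shell.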

And so what Anthony was talking about was having two commands that are supposed to do the same thing, and being able to switch between them. For example: maybe we have git-away and git-gone and both of them perform a similar function, and we wish to call our preferred one when we run git lost.

One way to do this would be to copy or symlink our chosen version as git-lost, and replace that file whenever

(continued...)
Quansight Labs 2020-09-29 16:00:00

Design of the Versioned HDF5 Library

In a previous post, we introduced the Versioned HDF5 library and described some of its features. In this post, we'll go into detail on how the underlying design of the library works on a technical level.

Read more… (6 min remaining to read)

ListenData 2020-09-20 08:18:00

How to rename columns in Pandas Dataframe

In this tutorial, we will cover various methods to rename columns in a pandas DataFrame in Python. Renaming columns is one of the most common data wrangling tasks. If you do not come from a programming background and have worked only in Excel spreadsheets, this may feel harder in Python, since in MS Excel you can rename a column just by typing the new name in the cell. If you come from a database background, it is similar to an ALIAS in SQL. In Python there is a popular data manipulation package called pandas which simplifies these kinds of data operations.
2 Methods to rename columns in Pandas
In pandas there are two simple methods to rename columns.
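The excerpt is cut off before it names the two methods, so the sketch below shows two common approaches as an assumption about what is meant: `DataFrame.rename` with a column mapping, and assigning a full list to `DataFrame.columns`. The frame and column names are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Approach 1: rename() with a mapping. Only the listed columns change,
# and a new DataFrame is returned; df itself is left untouched.
renamed = df.rename(columns={"a": "alpha"})

# Approach 2: assign a complete list of names. The list must contain
# exactly one name per existing column, in order.
relabeled = df.copy()
relabeled.columns = ["x", "y"]
```

The mapping form is usually safer for wide tables, because it does not depend on column order and ignores columns you do not mention.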

The first step is to install the pandas package if it is not already installed. You can check if the package is installed on your machine by running

(continued...)
Quansight Labs 2020-09-11 11:00:00

Performance of the Versioned HDF5 Library

In several industry and science applications, a filesystem-like storage model such as HDF5 is the more appropriate solution for manipulating large amounts of data. However, suppose that data changes over time. In that case, it's not obvious how to track those different versions, since HDF5 is a binary format and is not well suited for traditional version control systems and tools.

In a previous post, we introduced the Versioned HDF5 library, which implements a mechanism for storing binary data sets in a versioned way that feels natural to users of other version control systems, and described some of its features. In this post, we'll show some of the performance analysis we did while developing the library, hopefully making the case that reading and writing versioned HDF5 files can be done with a nice, intuitive API while being as efficient as possible. The tests presented here show that using the Versioned HDF5 library results in reduced disk space usage,

(continued...)
Quansight Labs 2020-09-10 05:00:00

PyTorch-Ignite: training and evaluating neural networks flexibly and transparently

Authors: Victor Fomin (Quansight), Sylvain Desroziers (IFPEN, France)
This post is a general introduction to PyTorch-Ignite. It intends to give a brief but illustrative overview of what PyTorch-Ignite can offer Deep Learning enthusiasts, professionals and researchers. Following the same philosophy as PyTorch, PyTorch-Ignite aims to keep it simple, flexible and extensible but performant and scalable.

Read more… (28 min remaining to read)

Quansight Labs 2020-08-30 09:00:00

Traitlets - an introduction & use in Jupyter configuration management

You have probably seen Traitlets in applications; you likely even use it. The package has nearly 5 million downloads on conda-forge alone.

But what is Traitlets?

In this post we'll answer this question along with where Traitlets came from, its applications, and a bit of history.

Read more… (8 min remaining to read)

Filipe Saraiva's blog 2020-08-29 18:48:00

A Seqtember of free virtual events about Qt and KDE

(OK, the joke with "seqtembro" works better in the English version, seqtember, but let's go.) By a great coincidence, an act of fate, or none of the above, we will have a September 2020 full of high-quality, free virtual events about Qt and KDE. From the 4th to the 11th of that month we will have Akademy 2020, the… Continue reading » A Seqtember of free virtual events about Qt and KDE
Quansight Labs 2020-08-24 12:00:00

IPython reproducible builds

Starting with IPython 7.16.1 (released in June 2020), you should be able to recreate the sdist (.tar.gz) and wheel (.whl), and get a byte-for-byte identical result to the files published on PyPI. This is a critical step toward being able to trust your computing platforms, and a key component in improving the efficiency of build and packaging platforms. It also potentially impacts fast conda environment creation for users. The following goes into some reasons why you should care.

Read more… (5 min remaining to read)

Quansight Labs 2020-08-21 13:00:00

Introducing Versioned HDF5

The problem of storing and manipulating large amounts of data is a challenge in many scientific computing and industry applications. One of the standard data models for this is HDF5, an open technology that implements a hierarchical structure (similar to a file-system structure) for storing large amounts of possibly heterogeneous data within a single file. Data in an HDF5 file is organized into groups and datasets; you can think about these as the folders and files in your local file system, respectively. You can also optionally store metadata associated with each item in a file, which makes this a self-describing and powerful data storage model.
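The group/dataset/metadata model described above can be tried with the h5py package. The sketch below is illustrative only: the file, group, dataset, and attribute names are invented, and the in-memory `core` driver is used so that nothing is written to disk.

```python
import h5py
import numpy as np

# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    # A group plays the role of a folder ...
    grp = f.create_group("experiment_1")
    # ... and a dataset plays the role of a file holding array data.
    dset = grp.create_dataset("temperature", data=np.arange(5, dtype="f8"))
    # Metadata is attached to an item through its attrs mapping.
    dset.attrs["units"] = "kelvin"

    # Items are addressed with slash-separated, path-like names.
    names = list(f["experiment_1"].keys())
    units = f["experiment_1/temperature"].attrs["units"]
    values = f["experiment_1/temperature"][:]
```

Because the metadata travels with the dataset, a reader of the file can discover both the layout and the units without any external documentation, which is what "self-describing" refers to.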

Read more… (3 min remaining to read)

Neural Ensemble News 2020-08-08 19:27:00

CARLsim5 Released!

Introduction

CARLsim5 is an efficient, easy-to-use, GPU-accelerated library for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail. It allows execution of networks of Izhikevich spiking neurons with realistic synaptic dynamics using multiple off-the-shelf GPUs and x86 CPUs. The simulator provides a PyNN-like programming interface in C/C++, which allows for details and parameters to be specified at the synapse, neuron, and network level.


The present release, CARLsim 5, builds on the efficiency and scalability of earlier releases (Nageswaran et al., 2009; Richert et al., 2011; Beyeler et al., 2015; Chou et al., 2018). The functionality of the simulator has been greatly expanded by the addition of a number of features that enable and simplify the creation, tuning, and simulation of complex networks with spatial structure.


New Features

1. PyNN Compatibility

pyCARL is an interface between the simulator-independent language PyNN and a CARLsim5 based back-end. In other words, you can write the code for an SNN model once, using the

(continued...)
Quansight Labs 2020-08-05 20:55:42

Designing with and for developers

Open source is notorious for lack of design presence, enough so that my search to prove this fact has turned up nearly nothing. There are many ways that such a gap in community might manifest, but one that I never anticipated was working with developers who had never interacted with a designer before.

A quick note for context: I’m writing this as a UX/UI designer working with open source projects for a little over a year. Because there are so many ways design processes can happen (enough to warrant its own blog post), this post is not intended to discuss design process deeply. My goal here is to pass on some of what I’ve learned that helps me design in this unusual space in hopes that it can help someone else. This post might seem most relevant for designers, but I think this experience could be helpful for developers as well.

Read more… (5 min remaining to read)

Filipe Saraiva's blog 2020-08-04 23:27:02

What will become of the Lev with the “end” of Saraiva?

Disclaimer: despite the surname, I have no relationship with Saraiva. I also have no answers to the question in the title. As a Lev user, I have been following Saraiva's agony with interest. The bookstore chain, one of the largest in Brazil, has for years been caught in a legal imbroglio, owing money to several publishers, in a process that… Continue reading » What will become of the Lev with the “end” of Saraiva?
NumFOCUS 2020-07-31 17:52:20

Dask Life Sciences Fellow [Open Job]

Dask is an open-source library for parallel computing in Python that interoperates with existing Python data science libraries like NumPy, pandas, scikit-learn, and Jupyter. Dask is used today across many different scientific domains. Recently, we’ve observed an increase in use in a few life sciences applications: large scale imaging in microscopy, single cell analysis, genomics […]

The post Dask Life Sciences Fellow [Open Job] appeared first on NumFOCUS.

Spyder Blog 2020-07-25 10:00:00

STX Next, Python development company, uses Spyder to improve their workflow

STX Next, one of Europe's largest Python development companies, has shared with us how Spyder has been a powerful tool for them when performing data analysis. It is a pleasure for us on the Spyder team to work every day to improve the workflow of developers, scientists, engineers and data analysts. We are very glad to receive and share an STX Next testimonial about Spyder, along with an interview with one of their developers, Michael Wiśniewski, who has found Spyder very useful in his job.

What Michael Wiśniewski says about Spyder

In an era of a continuously growing demand for analysis of vast amounts of data, we are facing increasingly complex tasks to perform. Sure, we are not alone—there are many great tools designed for scientists and data analysts. We have NumPy, SciPy, Matplotlib, Pandas, and others. But, wouldn't it be nice to have one extra tool that could combine all the required packages into one compact working environment? Asking this question is precisely how

(continued...)
NumFOCUS 2020-07-24 16:31:53

NumFOCUS Introduces New Supporter Program

Today NumFOCUS is pleased to introduce a new program for our individual supporters, called Open Science Champions. Each year, our community members generously support NumFOCUS and our Projects in several ways; this program is intended to connect these various forms of support so that we can engage with our community most effectively and offer our […]

The post NumFOCUS Introduces New Supporter Program appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-07-24 14:49:05

Educação Vigiada

This pandemic period has been one of production on many fronts, which unfortunately means less time to publicize that work here on the blog. In this post I want to make up for that by talking about one of the projects I consider among the most important I have contributed to recently, Educação Vigiada. A few months ago the Educação… Continue reading » Educação Vigiada
Quansight Labs 2020-07-21 06:00:00

Quansight Labs: what I learned in my first 3 months

I joined Quansight at the beginning of April, splitting my time between PyTorch (as part of a larger Quansight team) and contributing to Quansight Labs supported community-driven projects in the Python scientific and data science software stack, primarily to NumPy. I have found my next home; the people, the projects, and the atmosphere are an all around win-win for me and (I hope) for the projects to which I contribute.

Read more… (2 min remaining to read)

NumFOCUS 2020-07-14 20:36:34

Open Source Developer Advocate

Position Overview The primary role of the Open Source Developer Advocate is to represent and support developers of NumFOCUS open source projects by serving as a link to internal and external stakeholders as well as the global user community. You will generate attention and support by applying your technical knowledge, passion for open source data […]

The post Open Source Developer Advocate appeared first on NumFOCUS.

Quansight Labs 2020-07-11 05:39:56

Learn NixOS by turning a Raspberry Pi into a Wireless Router

I recently moved, and my new place has a relatively small footprint. (Yes, I moved during the COVID-19 pandemic. And yes, it was crazy.) I quickly realized that I was going to need a wireless router of some sort, or more formally, a wireless access point (WAP). Using my Ubuntu laptop's "wireless hotspot" capability was a nice temporary solution, but it had a few serious drawbacks.

Read more… (14 min remaining to read)

Filipe Saraiva's blog 2020-07-10 23:09:48

Engrenagem Ep. 04 – Favorite KDE applications of Brazilian KDErs

This Saturday, July 11, at 10 a.m., KDE Brasil will return with episodes of Engrenagem, the videocast of the Brazilian community (which has gone 4 years without new episodes 🙂). To get things going again, the episode will feature 6 Brazilian contributors (Ângela, Aracele, Caio, Filipe (me), Fred and Tomaz) talking about their favorite KDE applications –… Continue reading » Engrenagem Ep. 04 – Favorite KDE applications of Brazilian KDErs
Spyder Blog 2020-07-08 10:00:00

Writing docs is not just writing docs

This blogpost was originally published on the Quansight Labs website.

I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”

Improving Spyder’s documentation started as part of a NumFOCUS Small Development Grant awarded at the end of last year. The goal of the project was not only to update the documentation for Spyder 4, but also to make it more user-friendly, so users can understand Spyder’s key concepts and get started with it more

(continued...)
Quansight Labs 2020-07-07 22:00:00

Writing docs is not just writing docs

I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”

Read more… (3 min remaining to read)

Filipe Saraiva's blog 2020-06-25 13:15:22

On the book “Uma História de Desigualdade”

I finished reading Pedro de Souza's award-winning book, “Uma História de Desigualdade – A Concentração de Renda entre os Ricos no Brasil 1926 – 2013”, based on the thesis he defended in the sociology program at UnB. It is a substantial book that lives up to all the praise it has received since… Continue reading » On the book “Uma História de Desigualdade”
Spyder Blog 2020-06-12 18:00:00

Thanking the people behind Spyder 4

This blogpost was originally published on the Quansight Labs website.

After more than three years in development and more than 5000 commits from 60 authors around the world, Spyder 4 finally saw the light on December 5, 2019! I decided to wait until now to write a blogpost about it because shortly after the initial release, we found several critical performance issues and some regressions with respect to Spyder 3, most of which are fixed now in version 4.1.3, released on May 8th 2020.

This new release comes with a lengthy list of user-requested features aimed at providing an enhanced development experience at the level of top general-purpose editors and IDEs, while strengthening Spyder's specialized focus on scientific programming in Python. The interested reader can take a look at some of them in previous blog posts, and in detail in our Changelog. However, this post is not meant to describe those improvements, but to acknowledge all people that contributed

(continued...)
Gaël Varoquaux - programming 2020-05-27 22:00:00

Technical discussions are hard; a few tips

Note

This post discusses the difficulties of communicating while developing open-source projects and tries to give some simple advice.

A large software project is above all a social exercise in which technical experts try to reach good decisions together, for instance on GitHub pull requests. But communication is difficult, in …

Pierre de Buyl's homepage - scipy 2020-05-19 09:00:00

Tidynamics, what use?

In 2018 I published a small Python library, tidynamics. The scope was deliberately limited: compute the typical correlation functions for stochastic and molecular dynamics: the autocorrelation and the mean-square displacement. Two years later, I wonder about its usage.
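To make the two quantities concrete, here is a direct, pure-Python evaluation of their definitions for a one-dimensional time series, averaged over all available time origins. This is a sketch for illustration only: the function names are hypothetical, it does not use the tidynamics API, and the quadratic-time loops are far slower than a production implementation would be.

```python
def mean_square_displacement(positions):
    """MSD(lag) = average over t of (x[t + lag] - x[t])**2."""
    n = len(positions)
    result = []
    for lag in range(n):
        squares = [(positions[t + lag] - positions[t]) ** 2
                   for t in range(n - lag)]
        result.append(sum(squares) / len(squares))
    return result


def autocorrelation(values):
    """ACF(lag) = average over t of v[t] * v[t + lag] (not normalized)."""
    n = len(values)
    result = []
    for lag in range(n):
        products = [values[t] * values[t + lag] for t in range(n - lag)]
        result.append(sum(products) / len(products))
    return result


# A particle moving at constant unit velocity: the MSD grows as lag**2.
msd = mean_square_displacement([0.0, 1.0, 2.0, 3.0])
# A constant velocity signal stays perfectly correlated at every lag.
acf = autocorrelation([1.0, 1.0, 1.0, 1.0])
```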

NumFOCUS 2020-05-18 19:48:24

Moderna, IMC Renew NumFOCUS Corporate Sponsorships

Monday, May 18th, 2020 Two NumFOCUS corporate supporters recently made fresh commitments to our open source mission. Trading firm IMC and biotechnology company Moderna Therapeutics each renewed their corporate sponsorships earlier this month. Both companies have supported NumFOCUS since 2018 at our Silver and Bronze sponsorship levels, respectively. Asked about his company’s decision to partner […]

The post Moderna, IMC Renew NumFOCUS Corporate Sponsorships appeared first on NumFOCUS.

NumFOCUS 2020-05-18 14:58:09

NumFOCUS Projects helping combat the COVID-19 pandemic

Open source tools are uniquely positioned to help combat the ongoing COVID-19 pandemic through their adaptable and collaborative nature. NumFOCUS sponsored and affiliated projects are being used on a global scale to meet the needs of researchers and data scientists. Our projects are being used in groundbreaking scientific efforts to create response models, visualize and […]

The post NumFOCUS Projects helping combat the COVID-19 pandemic appeared first on NumFOCUS.

Paul Ivanov’s Journal 2020-05-17 07:00:00

Lazy River of Curious Content 0

This is the first post of what I'm calling a Lazy River of Curious Content. This is a way to review stuff that I've been doing, dealing with, or finding interesting during the week (this was originally written two weeks ago, May 3rd; my shoddy internet connectivity kept me from posting it). I'm loosely following the format that Justin Sherrill uses with great effect over at https://dragonflydigest.com

Learn NixOS by turning a Raspberry Pi into a Wireless Router Friend of the show, Anthony Scopatz, tried NixOS for the first time and provides a detailed report:

"While I had read the NixOS pamphlets, and listened politely when the faithful came knocking on my door at inconvenient times, I had never walked the path of functional Linux enlightenment myself"

Reading through that made me file away a todo of writing up how I use propellor (and why). But those todos sometimes just pile up for a while...

An interview of one of my long time nerd-crushes, Rob Pike. The questions focus on the Go programming

(continued...)
Living in an Ivory Basement 2020-05-06 22:00:00

sourmash databases as zip files, in sourmash v3.3.0

Use compressed databases directly!

Filipe Saraiva's blog 2020-05-05 18:29:16

LaKademy 2019

Last November, Latin American KDE contributors landed in Salvador, Brazil to take part in another edition of LaKademy – the Latin American Akademy. That was the seventh edition of the event (or the eighth, if you count Akademy-BR as the first LaKademy) and the second with Salvador as the host city. No problem… Continue reading » LaKademy 2019
Filipe Saraiva's blog 2020-05-04 21:20:54

Akademy 2019

In September 2019 the Italian city of Milan hosted the main worldwide meeting of KDE contributors – Akademy, where members from different areas such as translators, developers, artists, promo people and more gather for a few days to think about and build the future of KDE's projects and community(ies). Before arriving… Continue reading » Akademy 2019
NumFOCUS 2020-05-01 16:32:10

Yellowbrick Update – April 2020

Yellowbrick released Version 1.1 on February 25, 2020. If you haven’t yet upgraded, simply type pip install yellowbrick -U or conda install -c districtdatalabs yellowbrick into your terminal/command prompt to get it. The major improvement in v1.1 is introducing quick methods or one-liners to generate your favorite ML plots more quickly with Yellowbrick. Dr. Rebecca […]

The post Yellowbrick Update – April 2020 appeared first on NumFOCUS.

NumFOCUS 2020-04-29 18:34:15

2020 PyData Conferences Update [COVID-19]

We wanted to give an update to our community regarding the upcoming 2020 PyData conferences. We have been closely monitoring the situation and, to help ensure the safety of our community given the threat of the COVID-19 virus, the following in-person events have been postponed to 2021: PyData Miami, PyData Amsterdam, PyData LA, PyData London, PyData […]

The post 2020 PyData Conferences Update [COVID-19] appeared first on NumFOCUS.

NumFOCUS 2020-04-28 18:16:15

Scientific Software Developer- Contract Basis [SunPy Project]

NumFOCUS is seeking a Scientific Software Developer to support the SunPy project. SunPy is a Python-based open source scientific software package supporting solar physics data analysis. This is a 1 year contract. The successful applicant will work to improve SunPy’s functionality. There are four main tasks: Report on […]

The post Scientific Software Developer- Contract Basis [SunPy Project] appeared first on NumFOCUS.

Spyder Blog 2020-04-22 17:00:00

Creating the ultimate terminal experience in Spyder 4 with Spyder-Terminal

This blogpost was originally published on the Quansight Labs website.

The Spyder-Terminal project is revitalized! The new 0.3.0 version adds numerous features that improve the user experience, and enhances compatibility with the latest Spyder 4 release, in part thanks to the improvements made in the xterm.js project.

Upgrade to ES6/JSX syntax

First, we were able to update all the old JavaScript files to use ES6/JSX syntax and the tests for the client terminal. This change simplified the code base and maintenance and allows us to easily extend the project to new functionalities that the xterm.js API offers. In order to compile this code and run it inside Spyder, we migrated our deployment to Webpack.

Multiple shells per operating system

In the new release, you now have the ability to configure which shell to use in the terminal. On Linux and UNIX systems, bash, sh, ksh, zsh, csh, pwsh, tcsh, screen, tmux, dash and rbash are supported, while cmd and powershell are the

(continued...)
Living in an Ivory Basement 2020-04-19 22:00:00

Software and workflow development practices (April 2020 update)

How we develop software and workflows in the DIB Lab, in 2020.

Filipe Saraiva's blog 2020-04-16 15:12:29

LaKademy 2019

Last November, KDE fellows from Latin America arrived in Salvador, Brazil to attend one more edition of LaKademy – the Latin American Akademy. That was the 7th edition of the event (or the 8th, if you count Akademy-BR as the first LaKademy) and the second one with Salvador as host city. No problem… Continue reading » LaKademy 2019
Martin Fitzpatrick - python 2020-04-13 11:01:00

Is it getting better yet? An optimistic visual guide to the Coronavirus pandemic

As the apocalypse rumbles on, I found myself wondering "Is it getting any better?"

Daily updates of spiralling case numbers (and worse, deaths) do little to give a sense of whether we're getting to, or already past, the worst of it.

To answer that question for myself and you, I …

Living in an Ivory Basement 2020-04-12 22:00:00

How to give a bad online talk

A bad example...

fa.bianp.net 2020-04-06 22:00:00

On the Link Between Polynomials and Optimization

There's a fascinating link between minimization of quadratic functions and polynomials. A link that goes deep and allows us to phrase optimization problems in the language of polynomials and vice versa. Using this connection, we can tap into centuries of research in the theory of polynomials and shed new light on …
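One standard way to see this link, sketched here for plain gradient descent, is to write the error after t steps as a polynomial in the Hessian. The symbols H, b, the step size \gamma, and the minimizer x^\star are notation assumed for this illustration; the excerpt itself does not fix any.

```latex
% Quadratic objective with symmetric positive definite H and minimizer x^\star:
f(x) = \tfrac{1}{2}\, x^\top H x - b^\top x, \qquad H x^\star = b .

% Gradient descent with step size \gamma:
x_{t+1} = x_t - \gamma \nabla f(x_t) = x_t - \gamma (H x_t - b) .

% Subtracting x^\star and using b = H x^\star gives a linear recursion for the error:
x_{t+1} - x^\star = (I - \gamma H)(x_t - x^\star) .

% Unrolling the recursion, the error is a degree-t polynomial in H applied to the initial error:
x_t - x^\star = P_t(H)\,(x_0 - x^\star), \qquad P_t(\lambda) = (1 - \gamma \lambda)^t, \qquad P_t(0) = 1 .
```

How fast the method converges is then a question about how small the polynomial P_t can be on the eigenvalues of H, which is where results from polynomial theory can be brought in.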

Paul Ivanov’s Journal 2020-04-03 07:00:00

pheriday 3: infrastructure

Looks like we can't inline audio for your browser. That's cool, just find the direct file links below.

paul's habitual errant ramblings (on Fr)idays

pheridays: 3

2020-04-10: A week ago, I recorded a 5 minute audio segment of some stuff I've been thinking about, but when I started to write it up I stumbled into and kept dropping down a deep technostalgic hole.

fall down along with me:

https://pirsquared.org/blog/pheriday-infrastructure.html

The recording is just shy of five minutes long, you can also download it in different formats, depending on your needs, if the audio tag above doesn't suit you:

https://pirsquared.org/pheridays/2020-04-03.ogg (2.9 Mb)
https://pirsquared.org/pheridays/2020-04-03.mp3 (4.5 Mb)
https://pirsquared.org/pheridays/2020-04-03.m4a (6.3 Mb)

--

Stuff I mentioned in the audio:

Propellor - "configuration management system using Haskell and Git" by Joey Hess

OpenWRT - specifically - reducing Bufferbloat

Mumble - "a free, open source, low latency, high quality voice chat application."

sourcehut.org - "the hacker's forge" also known as sr.ht by Drew DeVault

Jitsi - "Multi-platform open-source video conferencing"

OpenFire - "real time collaboration (RTC) server licensed under the

(continued...)
NumFOCUS 2020-03-13 15:02:49

PyData COVID-19 Response

The safety and well-being of our community are extremely important to us. We have therefore decided to postpone all PyData conferences scheduled to take place until the end of June: PyData Miami, PyData London, and PyData Amsterdam. We have been closely monitoring the situation and believe this is the best action to take based on the […]

The post PyData COVID-19 Response appeared first on NumFOCUS.

NumFOCUS 2020-03-10 19:13:20

Statement on Coronavirus

As you are aware, the Coronavirus (COVID-19) is a topic of frequent and ongoing discussions. We would like to provide an update on our status and policies as well as provide resources for additional information. As of today, our event schedule remains as posted on event sites. Any changes or updates will be immediately shared. […]

The post Statement on Coronavirus appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-02-24 16:22:14

Akademy 2019

Last September the Italian city of Milan hosted the KDE contributors' meeting called Akademy, the main KDE conference, where contributors from different areas such as translators, developers, artists, promoters and more get together for some days to think about and build the future of KDE projects and community(ies). Before Akademy I departed from Brazil to Portugal to… Continue reading »Akademy 2019
NumFOCUS 2020-02-20 18:08:46

Announcing JupyterCon 2020

NumFOCUS is excited to be a part of JupyterCon 2020. JupyterCon will be held August 10 – 14 in Berlin, Germany at the Berlin Conference Center. We invite you to participate in this exciting community event! Read the full announcement here. JupyterCon 2020 is an event brought to you in partnership by Project Jupyter and NumFOCUS.

The post Announcing JupyterCon 2020 appeared first on NumFOCUS.

Living in an Ivory Basement 2020-02-16 23:00:00

Two talks at JGI in May: sourmash, spacegraphcats, and disease associations in the human microbiome.

Using k-mers and taxonomy to find contamination in metagenomes

Leonardo Uieda 2020-01-23 12:00:00

Advancing research software in the UK through an SSI fellowship

I have been selected as part of the 2020 cohort of Fellows of the Software Sustainability Institute!

The Institute cultivates world-class research with software. It's based at the universities of Edinburgh, Manchester, Southampton, and Oxford in the UK. Their motto says it all:

The SSI has a yearly fellowship program to fund the organization of communities around scientific software (creation of local user groups, workshops, hackathons, etc). Even more importantly, they organize several events to get current and past fellows in the same place doing awesome stuff. I'm really looking forward to this year's Collaborations Workshop (registration is open to all, not just fellows). I applied at the end of last year and was selected to join the 2020 cohort of fellows along with some truly amazing people.

My plan for the fellowship is to

(continued...)
Peekaboo 2020-01-07 17:26:00

Don't fund Software that doesn't exist

I’ve been happy to see an increase in funding for open source software across research areas and across funding bodies. However, I observed that a majority of funding from, say, the NSF, goes to projects that do not exist yet, and where the funding is supposed to create a new project, or to extend projects that are developed and used within a single research lab. I think this top-down approach to creating software comes from a misunderstanding of the existing open source software that is used in science. This post collects thoughts on the effectiveness of current grant-based funding and how to improve it from the perspective of the grant-makers.
Instead of the current approach of funding new projects, I would recommend funding existing open source software, ideally software that is widely used, and underfunded. The story of the underfunded but critically important open source software (which I’ll refer to as infrastructure software) should be an old tale by now.
(continued...)
Living in an Ivory Basement 2020-01-01 23:00:00

sourmash-oddify: a workflow for exploring contamination in metagenome-assembled genomes

Using k-mers and taxonomy to find contamination in metagenomes

Leonardo Uieda 2019-12-08 12:00:00

Two PhD studentships at the University of Liverpool

I have two open positions for funded studentships at the University of Liverpool. Applications are open until 10 January 2020.

Project descriptions

Follow the links for more detailed versions.

Bringing machine learning techniques to geophysical data processing

The goal of this project is to investigate the use of existing machine learning techniques to process gravity and magnetics data using the Equivalent Layer Method. The methods and software developed during this project can be applied to process large amounts of gravity and magnetics data, including airborne and satellite surveys, and produce data products that can enable further scientific investigations. Examples of such data products include global gravity gradient grids from GOCE satellite measurements, regional magnetic grids for the UK, gravity grids for the Moon and Mars, etc.

Large-scale mapping of the thickness of the

(continued...)
Gaël Varoquaux - programming 2019-12-01 05:00:00

Getting a big scientific prize for open-source software

Note

An important acknowledgement for a different view of doing science: open, collaborative, and more than a proof of concept.

A few days ago, Loïc Estève, Alexandre Gramfort, Olivier Grisel, Bertrand Thirion, and myself received the “Académie des Sciences Inria prize for transfer”, for our contributions to the scikit-learn project …

Spyder Blog 2019-11-28 20:00:00

Variable Explorer improvements in Spyder 4

This blogpost was originally published on the Quansight Labs website.

Spyder 4 will be released very soon with lots of interesting new features that you'll want to check out, reflecting years of effort by the team to improve the user experience. In this post, we will be talking about the improvements made to the Variable Explorer.

These include the brand new Object Explorer for inspecting arbitrary Python variables, full support for MultiIndex dataframes with multiple dimensions, and the ability to filter and search for variables by name and type, and much more.

It is important to mention that several of the above improvements were made possible through integrating the work of two other projects. Code from gtabview was used to implement the multi-dimensional Pandas indexes, while objbrowser was the foundation of the new Object Explorer.

New viewer for arbitrary Python objects

For Spyder 4 we added a long-requested feature: full support for inspecting any kind of Python object through the Variable

(continued...)
Spyder Blog 2019-11-12 00:00:00

File management improvements in Spyder 4

This blogpost was originally published on the Quansight Labs website.

Version 4.0 of Spyder is almost ready! It has been in the making for well over two years, and it contains lots of interesting new features. We will focus on the Files pane in this post, where we've made several improvements to the interface and file management tools.

Simplified interface

In order to simplify the Files pane's interface, the columns corresponding to size and kind are hidden by default. To change which columns are shown, use the top-right pane menu or right-click the header directly.

Custom file associations

First, we added the ability to associate different external applications with specific file extensions they can open. Under the File associations tab of the Files preferences pane, you can add file types and set the external program used to open each of them by default.

Once you've set this up, files will automatically launch in the associated application when opened from the Files pane in Spyder.

(continued...)
ListenData 2019-10-28 15:48:00

Loan Amortisation Schedule using R and Python

In this post, we will explain how you can calculate your monthly loan instalments the way a bank does, using R and Python. In the financial world, analysts generally use MS Excel to calculate the principal and interest portions of an instalment with the PPMT and IPMT functions. As data science is growing and trending these days, it is important to know how you can do the same using popular data science programming languages such as R and Python.

When you take a loan from a bank at x% annual interest rate for N years, the bank calculates monthly (or quarterly) instalments based on the following factors:

  • Loan Amount
  • Annual Interest Rate
  • Number of payments per year
  • Number of years for loan to be repaid in instalments
Loan Amortisation Schedule
It refers to a table of periodic loan payments showing the breakup of principal and interest in each instalment/EMI until the loan is repaid at the end of its stipulated term. Monthly instalments are generally the same every month
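As a rough illustration of the four factors listed above, here is a minimal Python sketch of an amortisation schedule. It assumes a fixed interest rate and equal instalments computed with the standard annuity formula. The function name, its parameters, and the sample loan are hypothetical and are not taken from the original post.

```python
def amortisation_schedule(principal, annual_rate, years, payments_per_year=12):
    """Return (instalment, rows) for a fixed-rate loan repaid in equal instalments.

    Each row is (period, interest_part, principal_part, remaining_balance).
    """
    n = years * payments_per_year          # total number of instalments
    r = annual_rate / payments_per_year    # interest rate per period
    if r == 0:
        instalment = principal / n
    else:
        # Standard annuity formula for a level payment
        instalment = principal * r / (1 - (1 + r) ** -n)

    rows = []
    balance = principal
    for period in range(1, n + 1):
        interest_part = balance * r                    # interest portion of the instalment
        principal_part = instalment - interest_part    # principal portion of the instalment
        balance -= principal_part
        rows.append((period, interest_part, principal_part, balance))
    return instalment, rows


# Hypothetical loan: 100,000 at 12% a year, repaid monthly over 1 year
instalment, rows = amortisation_schedule(100_000, 0.12, 1)
print(f"instalment: {instalment:.2f}")
for period, interest_part, principal_part, balance in rows[:3]:
    print(period, f"{interest_part:.2f}", f"{principal_part:.2f}", f"{balance:.2f}")
```

The interest and principal parts play the same role as Excel's IPMT and PPMT results, although Excel reports them with its own sign convention.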
(continued...)
I Love Symposia! 2019-10-24 13:59:54

Introducing napari: a fast n-dimensional image viewer in Python

I'm really excited to finally, officially, share a new(ish) project called napari with the world. We have been developing napari in the open from the very first commit, but we didn't want to make any premature fanfare about it… Until now. It's still alpha software, but for months now, both the core napari team and a few collaborators/early adopters have been using napari in our daily work. I've found it life-changing.

The background

I've been looking for a great nD volume viewer in Python for the better part of a decade. In 2009, I joined Mitya Chklovskii's lab and the FlyEM team at the Janelia [Farm] Research Campus to work on the segmentation of 3D electron microscopy (EM) volumes. I started out in Matlab, but moved to Python pretty quickly and it was a very smooth transition (highly recommended! ;). Looking at my data was always annoying though. I was either looking at single 2D slices using matplotlib.pyplot.imshow, or saving the volumes in VTK format and loading them into ITK-SNAP — which worked ok

(continued...)
Filipe Saraiva's blog 2019-10-11 16:29:56

SERPRO and the validation of digital documents

In the draft of the previous post about digital documents in Brazil, I ended up writing quite a lot about the role of SERPRO in this process, so much that I decided to split it into its own post. With the launch of e-Título, the TSE needed to create a way to validate the digital document in order to prevent fraud. The technology adopted was… Continue reading »SERPRO and the validation of digital documents
fa.bianp.net 2019-09-26 22:00:00

How to Evaluate the Logistic Loss and not NaN trying

A naive implementation of the logistic regression loss can result in numerical indeterminacy even for moderate values. This post takes a closer look into the source of these instabilities and discusses more robust Python implementations.
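To make the problem concrete, here is a small sketch using only the standard library. It shows one common reformulation of log(1 + exp(-z)); it is an illustrative assumption and not necessarily the implementation the post discusses. The function names are hypothetical.

```python
import math


def naive_logistic_loss(z):
    # log(1 + exp(-z)): exp() overflows when z is very negative
    return math.log(1.0 + math.exp(-z))


def stable_logistic_loss(z):
    # Same quantity, rewritten so that exp() only receives non-positive arguments
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


for z in (2.0, -2.0, -800.0):
    print(z, stable_logistic_loss(z))
```

With pure Python floats the naive version raises an OverflowError for an input such as -800.0, while the rewritten version stays finite.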

Paul Ivanov’s Journal 2019-09-17 07:00:00

Uvas Gold 200

My poem about a rainy 200k was published in the Fall 2019 issue of American Randonneur (a quarterly magazine published by Randonneurs USA)

I've been doing samizdat poetry for as long as I've had a web presence (since 1999), but I am now officially a published poet! (I am deliberately not counting the embarrassing hackjob that was published in a youth anthology when I was in 8th grade.)

You can find "Uvas Gold 200" on page 26 - either directly on this skeuomorphic leafing viewer or the PDF, but I'm republishing both the exposition blurb and the poem below. If you prefer to listen, I recorded a reading of it that you can download in different flavors: a local audio only, a local video, or the embedded video version below.

Uvas Gold 200k starts and ends in Fremont, CA and was held on Saturday, December 1st, 2018. The ride frontloads the climbing by going nearly half-way up Mount

(continued...)
Spyder Blog 2019-08-16 00:00:00

Spyder 4.0: Kite integration is here

This blogpost was originally published on the Quansight Labs website.

Note: Kite is sponsoring the work discussed in this blog post, and in addition supports Spyder 4.0 development through a Quansight Labs Community Work Order.

As part of our next release, we are proud to announce an additional completion client for Spyder, Kite. Kite is a novel completion client that uses Machine Learning techniques to find and predict the best autocompletion for a given text. Additionally, it collects improved documentation for compiled packages, e.g. Matplotlib, NumPy and SciPy, that cannot be obtained easily by using traditional code analysis packages such as Jedi. Although Kite is not open source like Spyder, you can download it without charge at the Kite website.

By incorporating Kite into Spyder, we will improve and provide the ultimate autocompletion and signature retrieval experience for most of the scientific Python stack and beyond. For instance, let’s take a look at the following PyTorch completion. While

(continued...)
ListenData 2019-08-10 21:54:00

Object Oriented Programming in Python : Learn by Examples

This tutorial outlines object oriented programming (OOP) in Python with examples. It is a step-by-step guide designed for people who have no programming experience. Object oriented programming is popular and is also available in other programming languages besides Python, such as Java, C++ and PHP.
What is Object Oriented Programming?
In object-oriented programming (OOP), you have the flexibility to represent real-world objects like a car, animal, person, ATM etc. in your code. In simple words, an object is something that possesses some characteristics and can perform certain functions. For example, a car is an object and can perform functions like start, stop, drive and brake. These are the functions of a car. Its characteristics are the color of the car, mileage, maximum speed, model year etc.

In the above example, car is an object. Functions are called methods in the OOP world. Characteristics are attributes (properties). Technically attributes are variables or values related to the state of the object whereas methods

(continued...)
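The car example from the excerpt above can be sketched as a small class. The class name, attribute names and values below are illustrative assumptions and are not taken from the tutorial.

```python
class Car:
    """A car with a few attributes (state) and methods (behaviour)."""

    def __init__(self, color, model_year, max_speed):
        # Attributes: values describing the state of this particular car
        self.color = color
        self.model_year = model_year
        self.max_speed = max_speed
        self.running = False

    # Methods: functions that the object can perform
    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def describe(self):
        state = "running" if self.running else "stopped"
        return f"{self.color} car ({self.model_year}), {state}"


car = Car("red", 2019, 180)
car.start()
print(car.describe())
```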
ListenData 2019-07-29 20:20:00

Precision Recall Curve Simplified

This article outlines the precision-recall curve and how it is used in real-world data science applications. It explains how it differs from the ROC curve, highlights a limitation of the ROC curve, and shows how that limitation can be addressed via the area under the precision-recall curve. The article also covers the implementation of the area under the precision-recall curve in Python, R and SAS.
What is Precision Recall Curve?
Before getting into technical details, we first need to understand the terms precision and recall in layman's terms. It is essential to understand the concepts in simple words so that you can recall them for future work when required. Both precision and recall are important metrics for checking the performance of a binary classification model.
Precision
Precision is also called Positive Predictive Value. Suppose you are building a customer attrition model whose objective is to identify customers who are likely to close their relationship with the company. The use of this model is to
(continued...)
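As a minimal sketch of the two metrics named in the excerpt above, the following computes precision and recall from binary labels. The function name and the attrition labels are made up for illustration.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical attrition data: 1 = customer left, 0 = customer stayed
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Precision answers "of the customers flagged as leaving, how many really left?", while recall answers "of the customers who left, how many were flagged?".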
Living in an Ivory Basement 2019-07-22 22:00:00

Comparing two genome binnings quickly with sourmash

Comparing two sets of MAGs, for fun and profit!

ListenData 2019-07-22 09:20:00

Calculate KS Statistic with Python

The Kolmogorov-Smirnov (KS) statistic is one of the most important metrics used for validating predictive models. It is widely used in the BFSI domain. If you are part of a risk or marketing analytics team working on a project in banking, you must have heard of this metric.
What is KS Statistics?
It stands for Kolmogorov–Smirnov, which is named after Andrey Kolmogorov and Nikolai Smirnov. It compares two cumulative distributions and returns the maximum difference between them. It is a non-parametric test, which means you don't need to test any assumption related to the distribution of the data. In the KS test, the null hypothesis states that both cumulative distributions are similar. Rejecting the null hypothesis means the cumulative distributions are different.

In data science, it compares the cumulative distribution of events and non-events and KS is where there is a maximum difference between the two distributions. In simple words, it helps us to understand how well our predictive model is able to discriminate between events and

(continued...)
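The idea of "maximum difference between two cumulative distributions" can be sketched in plain Python as follows. This is a simple, unoptimised illustration with made-up scores, not the procedure used in the tutorial.

```python
def ks_statistic(event_scores, non_event_scores):
    """Maximum gap between the empirical CDFs of two samples of scores."""
    thresholds = sorted(set(event_scores) | set(non_event_scores))
    max_gap = 0.0
    for t in thresholds:
        # Share of each sample with a score at or below the threshold
        cdf_events = sum(s <= t for s in event_scores) / len(event_scores)
        cdf_non_events = sum(s <= t for s in non_event_scores) / len(non_event_scores)
        max_gap = max(max_gap, abs(cdf_events - cdf_non_events))
    return max_gap


# Hypothetical model scores for events and non-events
events = [0.9, 0.8, 0.7, 0.6]
non_events = [0.5, 0.4, 0.3, 0.65]
ks = ks_statistic(events, non_events)
print(ks)
```

A value near 1 means the scores separate events from non-events almost perfectly, and a value near 0 means the two score distributions are nearly identical.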
ListenData 2019-07-20 16:22:00

A Complete Guide to Python DateTime Functions

In this tutorial, we will cover the Python datetime module and how it is used to handle date, time and datetime formatted columns (variables). It includes various practical examples which will help you gain confidence in dealing with dates and times using Python functions. In general, date columns are not easy to manipulate, as they come with challenges such as leap years, different numbers of days in a month, different date and time formats, or date values stored in string (character) format.
Introduction : datetime module
It is a Python module which provides several functions for dealing with dates and times. It has four main classes, listed below, which are explained in the latter part of this article.
  1. datetime
  2. date
  3. time
  4. timedelta
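A short sketch of the four classes listed above, using only the standard library. The sample dates and the string format are arbitrary choices for illustration.

```python
from datetime import date, datetime, time, timedelta

d = date(2020, 2, 28)                 # calendar date only
t = time(14, 30)                      # time of day only
dt = datetime.combine(d, t)           # date and time together
next_day = d + timedelta(days=1)      # timedelta is a duration; 2020 is a leap year

# Parse a date stored as a string, then write it back in another format
parsed = datetime.strptime("2019-07-20 16:22:00", "%Y-%m-%d %H:%M:%S")
formatted = parsed.strftime("%d/%m/%Y")

print(dt, next_day, formatted)
```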

People who have no experience of working with real-world datasets might not have encountered date columns. They might be under the impression that working with dates is rarely used and not so

(continued...)
ListenData 2019-07-17 17:32:00

What are *args and **kwargs and How to use them

This article explains the concepts of *args and **kwargs and how and when to use them in a Python program. Seasoned Python developers embrace the flexibility they provide when creating functions. If you are a beginner in Python, you might not have heard of them before. After completing this tutorial, you will have the confidence to use them in your live project.
Introduction : *args
args is a short form of arguments. With *args, Python accepts any number of arguments in a user-defined function and collects them into a tuple named args. In other words, *args means zero or more arguments, which are stored in a tuple named args.

When you define a function without *args, it has a fixed number of inputs, which means it cannot accept more (or fewer) arguments than you defined in the function.
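The difference between a fixed signature and *args can be sketched like this. The function names are hypothetical and chosen only for illustration; the last function shows the matching behaviour of **kwargs.

```python
def add_two(x, y):
    # Fixed signature: exactly two positional arguments are accepted
    return x + y


def add_all(*args):
    # args is a tuple holding every positional argument that was passed
    return sum(args)


def describe(**kwargs):
    # kwargs is a dict holding every keyword argument that was passed
    return ", ".join(f"{key}={value}" for key, value in kwargs.items())


print(add_two(1, 2))
print(add_all())            # zero arguments is fine
print(add_all(1, 2, 3, 4))  # so is any larger number
print(describe(a=1, b=2))
```

Calling add_two(1, 2, 3) raises a TypeError, because the extra argument has nowhere to go.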

In the example code below, we are creating a very basic function which adds two numbers. At the same time, we created a

(continued...)
ListenData 2019-07-12 21:42:00

Python : 10 Ways to Filter Pandas DataFrame

In this article, we will cover various methods to filter a pandas dataframe in Python. Data filtering is one of the most frequent data manipulation operations. It is similar to the WHERE clause in SQL, or to the filter in MS Excel for selecting specific rows based on some conditions. In terms of speed, Python has an efficient way to perform filtering and aggregation: the pandas package for data wrangling tasks. Pandas is built on top of the numpy package, which is largely written in C, a low-level language. Hence data manipulation with pandas is a fast and smart way to handle large datasets.
Examples of Data Filtering
It is one of the first steps of data preparation for predictive modeling or any reporting project. It is also called 'Subsetting Data'. See some examples of data filtering below.
  • Select all the active customers whose accounts were opened
(continued...)
ListenData 2019-07-04 19:51:00

Python Dictionary Comprehension with Examples

In this tutorial, we will cover how dictionary comprehension works in Python. It includes various examples which will help you learn the concept of dictionary comprehension and how it is used in real-world scenarios.
What is Dictionary?
A dictionary is a data structure in Python used to store data such that each value is connected to its key. Roughly, it works much like SQL tables or data stored in statistical software. It has two main components -
  1. Keys : Think of columns in tables. Keys must be unique (just as column names cannot be duplicated).
  2. Values : These are similar to rows in tables. Values can be duplicated.
It is defined in curly braces { }. Each key is followed by a colon (:) and then its value.
Syntax of Dictionary

d = {'a': [1,2], 'b': [3,4], 'c': [5,6]}
To extract keys, values and structure of dictionary, you can submit the following commands.

d.keys() # 'a', 'b', 'c'
d.values() # [1, 2], [3, 4], [5,
(continued...)
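As a brief sketch of the dictionary comprehension that the tutorial's title refers to, the following reuses the dictionary d from the excerpt above. The derived variable names are illustrative assumptions.

```python
d = {'a': [1, 2], 'b': [3, 4], 'c': [5, 6]}

# General shape: {key_expression: value_expression for item in iterable if condition}
totals = {key: sum(values) for key, values in d.items()}
large = {key: values for key, values in d.items() if sum(values) > 3}
squares = {n: n ** 2 for n in range(1, 4)}

print(totals)
print(large)
print(squares)
```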

HTML outputs in Jupyter

Summary

User interaction in data science projects can be improved by adding a small amount of visual design.

To motivate effort around visual design we show several simple-yet-useful examples. The code behind these examples is small and accessible to most Python developers, even if they don’t have much HTML experience.

This post in particular focuses on Jupyter’s ability to add HTML output to any object. This can either be full-fledged interactive widgets, or just rich static outputs like tables or diagrams. We hope that by showing examples here we will inspire some thoughts in other projects.
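For readers who have not seen the mechanism, here is a minimal sketch. Jupyter looks for a _repr_html_ method on an object and, if present, displays the HTML string it returns. The Table class below is a hypothetical example and is not taken from any of the projects mentioned in this post.

```python
from html import escape


class Table:
    """Hypothetical container that renders as an HTML table in Jupyter."""

    def __init__(self, columns, rows):
        self.columns = columns
        self.rows = rows

    def _repr_html_(self):
        # Jupyter calls this method, when it exists, to get a rich HTML display
        head = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"


table = Table(["name", "value"], [["x", 1], ["y", 2]])
html_output = table._repr_html_()
print(html_output)
```

In a notebook, leaving table as the last expression of a cell would show the rendered table instead of the default text repr.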

This post was supported by replies to this tweet. The rest of this post is just examples.

Iris

I originally decided to write this post after reading another blogpost from the UK Met office, where they included the HTML output of their library Iris in a blogpost

(work by Peter Killick, post by Theo McCaie)

The fact that the output provided by an interactive session is the same output that you would provide in a published result helps everyone. The interactive

(continued...)
ListenData 2019-07-03 15:01:00

Python list comprehension : Learn by Examples

This tutorial covers how list comprehension works in Python. It includes many examples which will help you become familiar with the concept, and you should be able to implement it in your live project by the end of this lesson.
What is list comprehension?
Python is an object oriented programming language. Almost everything in it is treated consistently as an object. Python also features functional programming, which is very similar to the mathematical way of approaching a problem: you pass inputs to a function and get the same output for the same input value. Given a function f(x) = x², f(x) will always return the same result for the same x value. The function has no "side-effect", which means an operation has no effect on a variable/object outside its intended usage. "Side-effect" refers to leaks in your code which can modify a mutable data structure or variable.
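A small sketch of these ideas, using f(x) = x² as in the paragraph above. The variable names are chosen for illustration only.

```python
def square(x):
    # Pure function: the same input always gives the same output,
    # and nothing outside the function is modified
    return x * x


numbers = [1, 2, 3, 4, 5]

# Loop version: builds the result by mutating a list step by step
squares_loop = []
for n in numbers:
    squares_loop.append(square(n))

# List comprehension: the same result as a single expression
squares = [square(n) for n in numbers]
even_squares = [square(n) for n in numbers if n % 2 == 0]

print(squares, even_squares)
```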

Functional programming is also good for parallel computing as there is no

(continued...)
Peekaboo 2019-07-02 16:11:00

Don't cite the No Free Lunch Theorem

Tldr; You probably shouldn’t be citing the "No Free Lunch" Theorem by Wolpert. If you’ve cited it somewhere, you might have used it to support the wrong conclusion. What it actually (vaguely) says is “You can’t learn from data without making assumptions”.

The paper on the “No Free Lunch Theorem”, actually called "The Lack of A Priori Distinctions Between Learning Algorithms" is one of these papers that are often cited and rarely read, and I hear many people in the ML community refer to it when supporting the claim that “one model can’t be the best at everything” or “one model won’t always be better than another model”. The point of this post is to convince you that this is not what the paper or theorem says (at least not the one usually cited by Wolpert), and you should not cite this theorem in this context; and also that common versions cited of the "No Free Lunch" Theorem (continued...)
ListenData 2019-06-28 22:46:00

15 ways to read CSV file with pandas

This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without the read_csv function, it is not straightforward to import a CSV file in Python. Pandas is a powerful Python package for data manipulation and supports various functions to load and import data from various formats. Here we cover how to deal with common issues in importing CSV files.
Install and Load Pandas Package
Make sure you have the pandas package installed on your system. If you set up Python using Anaconda, it comes with pandas, so you don't need to install it again. Otherwise you can install it with the command pip install pandas. The next step is to load the package by running the following command. pd is an alias for the pandas package. We will use it instead of the full name "pandas".
import pandas as pd
Create Sample Data for Import
The program below creates a sample
(continued...)
ListenData 2019-06-25 11:31:00

Matplotlib Tutorial : Learn by Examples

This tutorial outlines how to perform plotting and data visualization in Python using the Matplotlib library. The objective of this post is to get you familiar with the basic and advanced plotting functions of the library. It contains several examples which will give you hands-on experience in generating plots in Python.
What is Matplotlib?
It is a powerful Python library for creating graphics or charts. It takes care of all of your basic and advanced plotting requirements in Python. It took inspiration from the MATLAB programming language and provides a similar MATLAB-like interface for graphics. The beauty of this library is that it integrates well with the pandas package, which is used for data manipulation. With the combination of these two libraries, you can easily perform data wrangling along with visualization and get valuable insights out of data. Like the ggplot2 library in R, matplotlib is the most used library for charts in Python.
Basics
(continued...)

Write Short Blogposts

I encourage my colleagues to write blogposts more frequently. This is for a few reasons:

  1. It informs your broader community what you’re up to, and allows that community to communicate back to you quickly.

    You communicating to the community fosters a sense of collaboration, openness, and trust. You gain collaborators, build momentum behind your work, and curate a body of knowledge that early adopters can consume to become experts quickly.

    Getting feedback from your community helps you to course-correct early in your work, and stops you from wasting time in inefficient courses of action.

    You can only work for a long time without communicating if you are either entirely confident in what you’re doing, or reckless, or both.

  2. It increases your visibility, and so is good for your career.

    I have a great job. I find my work to be both

(continued...)
ListenData 2019-06-19 13:20:00

How to drop one or multiple columns from Pandas Dataframe

In this tutorial, we will cover how to drop or remove one or multiple columns from pandas dataframe.
What is pandas in Python?
pandas is a python package for data manipulation. It has several functions for the following data tasks:
  1. Drop or Keep rows and columns
  2. Aggregate data by one or more columns
  3. Sort or reorder data
  4. Merge or append multiple dataframes
  5. String Functions to handle text data
  6. DateTime Functions to handle date or time format columns
Import or Load Pandas library
To make use of any Python library, we first need to load it using the import command.
import pandas as pd
import numpy as np
Let's create a fake dataframe for illustration
The code below creates 4 columns named A through D.
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
          A         B         C         D
0 -1.236438 -1.656038
(continued...)
ListenData 2019-06-09 21:07:00

String Functions in Python with Examples

This tutorial outlines various string (character) functions used in Python. To manipulate strings and character values, Python has several built-in functions. This means you don't need to import or depend on any external package to deal with the string data type in Python. It's one of the advantages of using Python over other data science tools. Dealing with string values is very common in the real world. Suppose you have customers' full names and your manager asks you to extract the first and last name of each customer. Or you want to fetch information on all the products whose code starts with 'QT'.
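The two tasks mentioned above can be sketched with built-in string methods alone. The customer name and product codes are made-up sample data.

```python
full_name = "Jane Smith"                       # hypothetical customer name
first_name, last_name = full_name.split(" ", 1)

product_codes = ["QT101", "AB202", "QT303"]    # hypothetical product codes
qt_codes = [code for code in product_codes if code.startswith("QT")]

# A few other built-in string methods, no import needed
cleaned = "  data science  ".strip().title()

print(first_name, last_name, qt_codes, cleaned)
```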
List of frequently used string functions
The table below shows many common string functions along with a description and the equivalent function in MS Excel. We all use MS Excel in our workplace and are familiar with its functions. The comparison of string functions in MS Excel and Python will help you to learn
(continued...)
Ralf Gommers | Reflections 2019-06-05 00:00:00

The cost of an open source contribution

Open source is massively successful. Some say it’s eating the world, although to my ears that phrasing doesn’t sound entirely like a good thing. Open source maintainers are always in need of help, and over the past years I’ve seen a lot of focus on ways open source projects can grow their communities and gain new contributors. Guidance on how to go about finding new contributors is easily found. E.
Spyder Blog 2019-06-02 00:00:00

TDK-Micronas partners with Quansight to sponsor Spyder

This blogpost was originally published on the Quansight Labs website

TDK-Micronas is sponsoring Spyder development efforts through Quansight Labs. This will enable the development of some features that have been requested by our users, as well as new features that will help TDK develop custom Spyder plugins in order to complement their Automatic Test Equipment (ATE’s) in the development of their Application Specific Integrated Circuits (ASIC’s).

At this point it may be useful to clarify the role of Quansight Labs in Spyder's development and its relationship with TDK. To quote Ralf Gommers (director of Quansight Labs):

"We're an R&D lab for open source development of core technologies around data science and scientific computing in Python. And focused on growing communities around those technologies. That's how I see it for Spyder as well: Quansight Labs enables developers to be employed to work on Spyder, and helps with connecting them to developers of other projects in similar situations. Labs should be an enabler to let the Spyder project, its community and individual developers grow.

(continued...)
I Love Symposia! 2019-05-28 08:41:54

Why citations are not enough for open source software

A few weeks ago I wrote about why you should cite open source tools. Although I think citations are important, there are major problems in relying on them alone to support open source work.

The biggest problem is that papers describing a software library can only give credit to the contributors at the time that the paper was written. The preferred citation for the SciPy library is “Eric Jones, Travis Oliphant, Pearu Peterson, et al”, 2001. The “et al” is not an abbreviation here, but a fixed shorthand for all other contributors. Needless to say many, many people have contributed to the SciPy library since 2001 (GitHub counts 716 contributors as of this writing), and they are unable to get credit within the academic system for those contributions. (As an aside, Google counts about 1,200 citations to SciPy, which is a breathtaking undercounting of its value and influence, and reinforces my earlier point: cite open source software! Definitely don't use this post as an excuse not to cite it!!!)

Not surprisingly, we have had

(continued...)