Planet SciPy 2021-04-13 08:14:00

Building Machine Learning Chatbots – Choose the Right Platform and Applications

As someone who does machine learning, you’ve probably been asked to build a chatbot for a business, or you’ve come across a chatbot project before.  When I started my ML journey, a friend asked me to build a chatbot for her business. Lots of failed attempts later, someone told me to check ML platforms with […]

The post Building Machine Learning Chatbots – Choose the Right Platform and Applications appeared first on 2021-04-12 22:00:00

On the Link Between Optimization and Polynomials, Part 4

While the most common accelerated methods, like Polyak and Nesterov, incorporate a momentum term, a little-known fact is that simple gradient descent (no momentum) can achieve the same rate using only a well-chosen sequence of step-sizes. In this post we'll derive this method and through simulations discuss its practical … 2021-04-12 10:41:00
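As a rough illustration of the idea in this teaser, here is a minimal sketch of gradient descent with the classical Chebyshev step-size schedule on a quadratic. The schedule, the diagonal test problem and all function names are assumptions made for illustration and may differ from what the post itself derives.

```python
import numpy as np

def chebyshev_step_sizes(mu, L, n_steps):
    """Inverse roots of the degree-n_steps Chebyshev polynomial shifted to [mu, L]."""
    k = np.arange(1, n_steps + 1)
    roots = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * k - 1) * np.pi / (2 * n_steps))
    return 1.0 / roots

def gradient_descent(grad, x0, step_sizes):
    """Plain gradient descent (no momentum) with one step size per iteration."""
    x = np.array(x0, dtype=float)
    for step in step_sizes:
        x = x - step * grad(x)
    return x

# Hypothetical test problem: f(x) = 0.5 * sum(eigs * x**2), minimized at x = 0,
# with curvatures (eigenvalues) spread over [mu, L].
mu, L, n_steps = 1.0, 10.0, 10
eigs = np.linspace(mu, L, 20)
grad = lambda x: eigs * x
x0 = np.ones_like(eigs)

x_cheb = gradient_descent(grad, x0, chebyshev_step_sizes(mu, L, n_steps))
x_const = gradient_descent(grad, x0, np.full(n_steps, 2.0 / (L + mu)))

err_cheb = np.linalg.norm(x_cheb)    # distance to the minimizer, Chebyshev steps
err_const = np.linalg.norm(x_const)  # distance to the minimizer, constant step
print(err_cheb, err_const)
```

On this small problem the Chebyshev schedule ends much closer to the minimizer than the constant step 2 / (L + mu) after the same number of iterations. The schedule needs the number of steps fixed in advance, and for long runs the order of the steps can matter numerically.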

How to Organize Your ML Development in an Efficient Way

One major issue that every data scientist and ML practitioner will eventually encounter is workflow management. Testing different scenarios and use cases, logging information and details, sharing and comparing results from a particular set of samples, visualizing the data, keeping track of insights. These are key components of data science workflow management. They help business […]

The post How to Organize Your ML Development in an Efficient Way appeared first on

Quansight Labs 2021-04-11 14:00:00

A step towards educating with Spyder

As a community manager in the Spyder team, I have been looking for ways of involving more users in the community and making Spyder useful for a larger number of people. With this, a new idea came: Education.

For the past few months, the team and I have been wondering whether Spyder could also serve as a teaching-learning platform, especially in this era where remote instruction has become necessary. We submitted a proposal to the Essential Open Source Software for Science (EOSS) program of the Chan Zuckerberg Initiative, during its third cycle, with the idea of providing a simple way inside Spyder to create and share interactive tutorials on topics relevant to scientific research. Unfortunately, we didn’t get this funding, but we didn’t let this great idea die.

We submitted a second proposal to the Python Software Foundation from which we were awarded $4000. For me, this is the perfect opportunity for us to take the first step towards using Spyder for education.

Read more… (2 min remaining to read) 2021-04-11 08:40:00

Deep Learning Guide: Choosing Your Data Annotation Tool

We all know what data annotation is. It’s a part of any supervised deep learning project, including computer vision. Common computer vision tasks, like image classification, object detection, and segmentation, require annotations for each and every image fed into the model training algorithm. You simply must get a good tool for image annotation. In […]

The post Deep Learning Guide: Choosing Your Data Annotation Tool appeared first on 2021-04-10 13:08:00

Gradient Boosted Decision Trees [Guide] – a Conceptual Explanation

Gradient boosted decision trees have proven to outperform other models. It’s because boosting involves implementing several models and aggregating their results. Gradient boosted models have recently become popular thanks to their performance in machine learning competitions on Kaggle.  In this article, we’ll see what gradient boosted decision trees are all about.  Gradient boosting In gradient […]

The post Gradient Boosted Decision Trees [Guide] – a Conceptual Explanation appeared first on

NumFOCUS 2021-04-09 18:02:05

NumFOCUS Welcomes Tesco Technology to Corporate Sponsors

NumFOCUS is pleased to announce our new partnership with Tesco Technology. A long-time PyData event sponsor, Tesco Technology joined NumFOCUS as a Silver Corporate Sponsor in December 2020. “We are very excited to formalize our partnership with Tesco Technology,” said Leah Silen, NumFOCUS Executive Director. “Tesco Technology has partnered with NumFOCUS for the past several […]

The post NumFOCUS Welcomes Tesco Technology to Corporate Sponsors appeared first on NumFOCUS.

Quansight Labs 2021-04-09 14:00:00

PyTorch TensorIterator Internals - 2021 Update

For contributors to the PyTorch codebase, one of the most commonly encountered C++ classes is TensorIterator. TensorIterator offers a standardized way to iterate over elements of a tensor, automatically parallelizing operations, while abstracting device and data type details.

In April 2020, Sameer Deshmukh wrote a blog article discussing PyTorch TensorIterator Internals. Recently, however, the interface has changed significantly. This post describes how to use the current interface as of April 2021. Much of the information from the previous article is directly copied here, but with updated API calls and some extra details.

Read more… (8 min remaining to read) 2021-04-09 07:10:00

Overfitting vs Underfitting in Machine Learning – Everything You Need to Know

We live in a world where data dictates a lot of our activity. Some say data is the new fuel. Data doesn’t only tell us about the past. If we model it carefully, with accurate methods, then we can find patterns and correlations to predict stock markets, generate protein sequences, explore biological structures like viruses, […]

The post Overfitting vs Underfitting in Machine Learning – Everything You Need to Know appeared first on

NumFOCUS 2021-04-08 21:14:55

Job Posting | Communications and Marketing Manager

Job Title: Communications and Marketing Manager Position Overview The primary role of the Communications & Marketing Manager is to manage the NumFOCUS brand by overseeing all outgoing communications between NumFOCUS and our stakeholders. You will serve the project communities by playing a key role in their event marketing management and assist with project promotional and […]

The post Job Posting | Communications and Marketing Manager appeared first on NumFOCUS.

Anaconda Blog 2021-04-08 14:00:00

There Is No Data – Only Frozen Models

For a deeper look at this topic, check out this episode from The a16z Podcast, featuring a conversation between Peter and Martin Casado. 2021-04-08 10:06:00

Best Metadata Store Solutions: Kubeflow Metadata vs TensorFlow Extended (TFX) ML Metadata (MLMD) vs Mlflow vs Neptune

How do you get the most precise machine learning model? Through experiments, of course! Whether you’re testing which algorithm to use, changing variable values, or choosing features to include, ML experiments help you decide.  But, there’s a downside. They produce massive amounts of artifacts. The output could be a trained model, a model checkpoint, or […]

The post Best Metadata Store Solutions: Kubeflow Metadata vs TensorFlow Extended (TFX) ML Metadata (MLMD) vs Mlflow vs Neptune appeared first on 2021-04-07 09:34:00

Binarized Neural Network (BNN) and Its Implementation in Machine Learning

Binarized Neural Network (BNN) comes from a paper by Courbariaux, Hubara, Soudry, El-Yaniv and Bengio from 2016. It introduced a new method to train neural networks, where weights and activations are binarized at train time, and then used to compute the gradients.  This way, memory size is reduced, and bitwise operations improve the power efficiency. […]
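To make the binarization step concrete, here is a small NumPy sketch of deterministic sign binarization together with a straight-through gradient estimate. The function names and sample values are hypothetical, and this is only an illustration of the general idea, not the article's or the paper's code.

```python
import numpy as np

def binarize(x):
    """Deterministic binarization: map each value to +1 or -1 by its sign."""
    return np.where(x >= 0, 1.0, -1.0)

def straight_through_grad(x, grad_output):
    """Pass the incoming gradient through where |x| <= 1 and zero it elsewhere."""
    return grad_output * (np.abs(x) <= 1.0)

# Hypothetical real-valued weights kept during training.
w = np.array([-1.7, -0.3, 0.0, 0.4, 2.5])
w_bin = binarize(w)                               # used in the forward pass
g = straight_through_grad(w, np.ones_like(w))     # gradient w.r.t. the real weights
print(w_bin, g)
```

The real-valued weights are kept and updated, while only their binarized copies take part in the forward computation.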

The post Binarized Neural Network (BNN) and Its Implementation in Machine Learning appeared first on 2021-04-06 09:20:00

Randomly Wired Neural Networks – What You Actually Need to Know

Innovative wiring of neural networks is a big part of the success of neural network architectures like ResNets and DenseNets. With Neural architecture search (NAS), researchers explore the joint optimization of wiring and operation types. In a paper “Exploring Randomly Wired Neural Networks for Image Recognition” from Facebook AI Research, authors investigate connectivity patterns through […]

The post Randomly Wired Neural Networks – What You Actually Need to Know appeared first on

Share Your R and Python Notebooks 2021-04-05 09:08:27.873805

Data Analysis With Pyspark Dataframe

Install Pyspark

!pip install pyspark

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
import pyspark
from pyspark.rdd import RDD
from pyspark.sql import Row
from pyspark.sql import DataFrame
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql import functions
from pyspark.sql.functions import (
    lit, desc, col, size, array_contains,
    isnan, udf, hour, array_min, array_max, countDistinct,
)
from pyspark.sql.types import *

from  import Pipeline     
from pyspark.sql.functions import mean, col, split, regexp_extract, when, lit

Pyspark Example

For this exercise, I will use the purchase data. Let us take a look at this data using the Unix head command. We can run Unix commands in a Jupyter notebook by putting ! in front of each command.

In [3]:
!head -1 purchases.csv
12-29	11:06	Fort Wayne	Sporting Goods	199.82	Cash

First, we need to create a Spark session by calling SparkSession. This step is necessary before doing anything else.

In [4]:
from pyspark.sql import SparkSession
from pyspark.sql.types import *

#create session in
(continued...) 2021-04-05 08:46:00

Why You Should Use Continuous Integration and Continuous Deployment in Your Machine Learning Projects

Continuous integration (CI), continuous delivery (CD) and continuous testing (CT) are at the core of Machine Learning Operations (MLOps) principles. If you’re a data scientist or machine learning engineer who knows DevOps principles, in this article I’ll show you how to apply them to ML workflows. It might also be useful if you’re an IT […]

The post Why You Should Use Continuous Integration and Continuous Deployment in Your Machine Learning Projects appeared first on 2021-04-04 07:57:00

Fighting Overfitting with L1 or L2 Regularization – Which One Is Better?

Poor performance in machine learning models comes from either overfitting or underfitting, and we’ll take a close look at the first one. Overfitting happens when the learned hypothesis is fitting the training data so well that it hurts the model’s performance on unseen data. The model generalizes poorly to new instances that aren’t a part […]

The post Fighting Overfitting with L1 or L2 Regularization – Which One Is Better? appeared first on

Anaconda Blog 2021-03-25 17:11:00

Why Organizations Should Invest in a Chief Data Officer

Not every company may be ready for a dedicated Chief Data Scientist; some may need to roll these particular responsibilities into a broader CDO position, while others might need to set aside a percentage of a CIO or CTO’s time to devote to data issues. But every company should have at least one senior leader who is accountable for ensuring the organization gets strategic value from its data and stewards it ethically and legally. Data science and ML can unlock vast benefits for organizations, but only if these initiatives are handled responsibly and thoughtfully. This is why the trend toward an increasing number of C-level data roles will continue.
Quansight Labs 2021-03-25 08:00:00

Accessibility: Who's Responsible?

JupyterLab Accessibility Journey Part 1

For the past few months, I've been part of a group of people in the JupyterLab community who've committed to start chipping away at the many accessibility failings of JupyterLab. I find this work is critical, fascinating, and a learning experience for everyone involved. So I'm going to document my personal experience and lessons I've learned in a series of blog posts. Welcome!

Read more… (6 min remaining to read)

jbencook 2021-03-22 23:54:00

PyTorch Tensor to NumPy Array and Back

You can easily convert a NumPy array to a PyTorch tensor and a PyTorch tensor to a NumPy array. This post explains how it works.

The post PyTorch Tensor to NumPy Array and Back appeared first on jbencook.

jbencook 2021-03-20 03:15:00

TorchVision Transforms: Image Preprocessing in PyTorch

TorchVision, a PyTorch computer vision package, has a great API for image pre-processing in its torchvision.transforms module. This post gives some basic usage examples, describes the API and shows you how to create and use custom image transforms.

The post TorchVision Transforms: Image Preprocessing in PyTorch appeared first on jbencook.

Anaconda Blog 2021-03-12 19:16:00

Q&A With Anaconda Experts: How Do You Become a Data Scientist?

Want to have a career in data science? Good news: many roads lead to your goal.
Blog – Enthought 2021-03-09 18:37:16

Giving Visibility to Renewable Energy

The EnergizAIR Infrastructure framework and key interfaces, with the Enthought responsibility on the project shown in the central, grey box. The ultimate project goal was to raise individual awareness of the contribution of renewable energy sources, and ultimately change behaviors. Now ten years later, with orders of magnitude more data, AI/machine learning, cloud, and smartphones …
Continue Reading
Anaconda Blog 2021-03-08 15:15:00

Recognizing International Women’s Day and Diversity in Tech

This year has provided some exciting and positive milestones for women. Bumble's founder, and fellow Austin woman in tech, Whitney Wolfe Herd, made history for being the 22nd female founder and the youngest to take a company public. Chloe Zhao recently won the Golden Globe for Best Director for the movie "Nomadland," being the second woman ever to win this award and the first Asian woman. And the first female vice president Kamala Harris is the highest-ranking female official in U.S. history and represents the first Black and first Asian-American in the position.
Anaconda Blog 2021-03-03 22:00:00

Why Data Preparation Should Never Be Fully Automated

Want more tips for efficient data preparation? Check out this guide that shares tools and tricks to make each step of the process more effective.
jbencook 2021-03-03 17:10:00

NumPy Where: Understanding np.where()

The NumPy where function is like a vectorized switch that you can use to combine two arrays.
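A minimal sketch of that "vectorized switch" behaviour, using made-up arrays: where the condition is true the result takes the element from the first array, otherwise from the second.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])
condition = np.array([True, False, True, False])

# Element-wise choice: a[i] where condition[i] is True, else b[i].
combined = np.where(condition, a, b)
print(combined)  # [ 1 20  3 40]
```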

The post NumPy Where: Understanding np.where() appeared first on jbencook.

jbencook 2021-03-02 14:05:00

Finding the Mode of an Empirical Continuous Distribution

You can find the mode of an empirical continuous distribution by plotting the histogram and looking for the maximum bin.
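The histogram approach can be sketched as follows, assuming synthetic normal samples and an arbitrary choice of 50 bins. Both are illustrative assumptions, and the estimate depends on the bin width.

```python
import numpy as np

# Synthetic data for illustration: samples from a normal distribution centred at 3.
rng = np.random.default_rng(0)
samples = rng.normal(loc=3.0, scale=1.0, size=100_000)

# Bin the samples, then take the centre of the fullest bin as the mode estimate.
counts, edges = np.histogram(samples, bins=50)
i = np.argmax(counts)
mode_estimate = 0.5 * (edges[i] + edges[i + 1])
print(mode_estimate)
```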

The post Finding the Mode of an Empirical Continuous Distribution appeared first on jbencook. 2021-03-01 23:00:00

On the Link Between Optimization and Polynomials, Part 3

I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.

A momentum optimizer *

Anaconda Blog 2021-02-25 19:04:00

How Can Higher Education Better Prepare Students to Enter the Data Science Field?

The skills and experience needed to succeed as a data scientist today will not be the same as those required in five or ten years. The worlds of education and business must collaborate to best prepare students to enter the data science workforce.
jbencook 2021-02-25 14:03:00

NumPy All: Understanding np.all()

The np.all() function tests whether all elements in a NumPy array evaluate to true.
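A short sketch with a made-up array, showing the whole-array test, the per-row test via the axis argument, and the common pattern of testing a boolean condition:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 0, 6]])

all_nonzero = np.all(x)            # False, because 0 evaluates to False
rows_nonzero = np.all(x, axis=1)   # one result per row: [ True False]
all_non_negative = np.all(x >= 0)  # True, every element satisfies the condition
print(all_nonzero, rows_nonzero, all_non_negative)
```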

The post NumPy All: Understanding np.all() appeared first on jbencook.

Quansight Labs 2021-02-25 08:00:00

Enhancements to Numba's guvectorize decorator

Starting from Numba 0.53, Numba will ship with an enhanced version of the @guvectorize decorator. Similar to the @vectorize decorator, @guvectorize now has two modes of operation:

  • Eager, or decoration-time compilation and
  • Lazy, or call-time compilation

Previously, only the eager approach was supported. In this mode, users must provide a list of concrete supported types up front as the decorator's first argument. Now this list can be omitted, and each time the function is called with a previously unsupported type, Numba dynamically generates a new kernel for it.

Read more… (3 min remaining to read)

While My MCMC Gently Samples 2021-02-23 15:00:00

Introducing PyMC Labs: Saving the World with Bayesian Modeling

After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models.

Usually, I don't hear how people are using PyMC3 -- they mostly show up on GitHub or Discourse when something isn't working right. So, hearing about all these …

jbencook 2021-02-22 13:57:00

Binary Cross Entropy Explained

A simple NumPy implementation of the binary cross entropy loss function and some intuition about why it works.
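For reference, one possible NumPy version of the loss looks like the sketch below. The function name, the clipping constant eps and the sample values are assumptions for illustration, not the post's code.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    return float(-np.mean(losses))

# Confident, correct predictions give a small loss; confident, wrong ones a large loss.
good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
bad = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
print(good, bad)
```

The loss only looks at the probability assigned to the true class, so it grows without bound as a prediction becomes confidently wrong.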

The post Binary Cross Entropy Explained appeared first on jbencook.

Martin Fitzpatrick - python 2021-02-22 08:00:00

Using MicroPython and uploading libraries on Raspberry Pi Pico — Using rshell to upload custom code

MicroPython is an implementation of the Python 3 programming language, optimized to run on microcontrollers. It's one of the options available for programming your Raspberry Pi Pico and a nice friendly way to get started with microcontrollers.

MicroPython can be installed easily on your Pico, by following the instructions on the …

jbencook 2021-02-19 13:56:00

Filtering DataFrames with the .query() Method in Pandas

Pandas provides a .query() method on DataFrames with a convenient string syntax for filtering rows. This post describes the method and gives simple usage examples.
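A small sketch of the string syntax on a made-up DataFrame, next to the equivalent boolean-mask filter for comparison:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1, 5, 3, 8],
    "y": [4, 2, 6, 1],
})

# Column names are used directly inside the query string.
result = df.query("x > 2 and y < 5")

# The same filter written with boolean masks.
same = df[(df["x"] > 2) & (df["y"] < 5)]
print(result)
```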

The post Filtering DataFrames with the .query() Method in Pandas appeared first on jbencook.

Blog – Enthought 2021-02-16 20:16:34

SciPy 2021

As in 2020, this year’s SciPy Conference will be virtual, offering increased opportunities for attendance. 2020 set an attendance record of over 1,500, almost double the 2019 Austin, Texas conference. The event brings together attendees from industry, academia, national labs and more – showcasing projects, sharing knowledge and collaborating on code development.   Author: Kristen Leiser, …
Continue Reading
jbencook 2021-02-15 13:53:00

Linear Interpolation in Python: An np.interp() Example

It's easy to linearly interpolate a 1-dimensional set of points in Python using the np.interp() function from NumPy.
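A minimal sketch with made-up sample points. Note that np.interp expects the known x-coordinates to be increasing and, by default, clamps queries outside their range to the end values.

```python
import numpy as np

xp = np.array([0.0, 1.0, 2.0])    # known x-coordinates (must be increasing)
fp = np.array([0.0, 10.0, 40.0])  # known values at those coordinates

x_new = np.array([0.5, 1.5, 3.0])
y_new = np.interp(x_new, xp, fp)
print(y_new)  # [ 5. 25. 40.]; 3.0 lies outside the range and is clamped to fp[-1]
```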

The post Linear Interpolation in Python: An np.interp() Example appeared first on jbencook.

NumFOCUS 2021-02-10 19:54:10

Job Posting | Events and Digital Marketing Coordinator

Job Title: Events and Digital Marketing Coordinator Position Overview The primary role of the Events and Digital Marketing Coordinator is to support and assist the Events Manager and the Community Communications and Marketing Manager to advance one of NumFOCUS’s primary missions of educating and building the community of users and developers of open source scientific […]

The post Job Posting | Events and Digital Marketing Coordinator appeared first on NumFOCUS.

jbencook 2021-02-09 13:47:00

NumPy Meshgrid: Understanding np.meshgrid()

You can create multi-dimensional coordinate arrays using the np.meshgrid() function, which is also available in PyTorch and TensorFlow. But watch out! PyTorch uses different indexing by default so the results might not be the same.
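The indexing difference can be seen within NumPy itself, which supports both conventions through the indexing argument. In the sketch below the input arrays are made up, and the matrix-style "ij" variant is shown as the alternative convention that a different default would give.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

# NumPy default, indexing="xy": outputs have shape (len(y), len(x)).
xx, yy = np.meshgrid(x, y)

# Matrix-style indexing: outputs have shape (len(x), len(y)).
ii, jj = np.meshgrid(x, y, indexing="ij")

print(xx.shape, ii.shape)  # (2, 3) (3, 2)
```

The two results are transposes of each other, so code ported between libraries should state the indexing explicitly rather than rely on the default.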

The post NumPy Meshgrid: Understanding np.meshgrid() appeared first on jbencook.

Blog – Enthought 2021-02-08 17:11:33

Strategy, Digitalization and Global Trends – C Suite Reflections

Key points are presented from the first of a series of LinkedIn articles where JSR Board Chairman Mitsunobu Koshiba (‘Nobu’) provides thought provoking insights on business strategy in the context of trends in three time horizons. The short term is dominated by an increased acceptance of Modern Monetary Theory. The mid-term is a shift in …
Continue Reading
jbencook 2021-02-08 13:42:00

SageMaker Studio Quick Start

A step-by-step quick start guide for SageMaker Studio. Start a Studio session, launch a notebook on a GPU instance and run object detection inference with a detectron2 pre-trained model.

The post SageMaker Studio Quick Start appeared first on jbencook.

Living in an Ivory Basement 2021-02-01 23:00:00

Transition your Python project to use pyproject.toml and setup.cfg! (An example.)

Updating old Python packages, in this year of the PSF 2021!

Martin Fitzpatrick - python 2021-01-28 14:00:00

SAM Coupé SCREEN$ Converter — Interrupt optimizing image converter

The SAM Coupé was a British 8 bit home computer that was pitched as a successor to the ZX Spectrum, featuring improved graphics and sound and higher processor speed.

The SAM Coupé's high-color MODE4 could manage 256x192 resolution graphics, with 16 colors from a choice of 128. Each pixel …

Anaconda Blog 2021-01-26 18:30:00

New Year’s resolutions for data scientists in 2021

The start of a new year is a popular time to recalibrate habits and set goals for improvement, both personally and professionally. First, it’s helpful to take stock of where things stand, in order to hone in on potential areas for betterment. In data science, the past year represented another step forward in the maturation of the discipline, especially with the onset of the COVID-19 pandemic. We saw researchers come together to harness the power of data and open-source software for public health, and concepts of statistical modeling become mainstream as we looked to curb the disease’s spread. At the same time, we had a glimpse into a future of division if we don’t pay heed to issues of bias in data and explainability in algorithms.
Living in an Ivory Basement 2021-01-24 23:00:00

A snakemake hack for checkpoints

snakemake checkpoints r awesome

Quansight Labs 2021-01-24 04:00:00

Python packaging in 2021 - pain points and bright spots

At Quansight we have a weekly "Q-share" session on Fridays where everyone can share/demo things they have worked on, recently learned, or that simply seem interesting to share with their colleagues. This can be about anything, from new utilities to low-level performance, from building inclusive communities to how to write better documentation, from UX design to what legal & accounting does to support the business. This week I decided to try something different: hold a brainstorm on the state of Python packaging today.

The ~30 participants were mostly from the PyData world, but not exclusively - it included people with backgrounds and preferences ranging from C, C++ and Fortran to JavaScript, R and DevOps - and with experience as end-users, packagers, library authors, and educators. This blog post contains the raw output of the 30-minute brainstorm (only cleaned up for textual issues) and my annotations on it (in italics) which capture some of the discussion during the session and links and context that may be helpful. I think it sketches a decent picture of

Martin Fitzpatrick - python 2021-01-22 14:00:00

SAM Coupé Reader — Preserving FRED retro disk magazine text, by decoding the Entropy Reader

FRED was the most popular disk magazine for the SAM Coupé 8 bit home computer. Published by Colin MacDonald out of sunny Monifieth, Scotland, the magazine ran from its first issue in 1990 through to its last (82) in 1998.

For the SAM networking project I was hoping there might …

Quansight Labs 2021-01-22 14:00:00

Making SciPy's Image Interpolation Consistent and Well Documented

SciPy n-dimensional Image Processing

SciPy's ndimage module provides a powerful set of general, n-dimensional image processing operations, categorized into areas such as filtering, interpolation and morphology. Traditional image processing deals with 2D arrays of pixels, possibly with an additional array dimension of size 3 or 4 to represent color channel and transparency information. However, there are many scientific applications where we may want to work with more general arrays such as the 3D volumetric images produced by medical imaging methods like computed tomography (CT) or magnetic resonance imaging (MRI) or biological imaging approaches such as light sheet microscopy. Aside from spatial axes, such data may have additional axes representing other quantities such as time, color, spectral frequency or different contrasts. Functions in ndimage have been implemented in a general n-dimensional manner so that they can be applied across 2D, 3D or more dimensions. A more detailed overview of the module is available in the SciPy ndimage tutorial. SciPy's image functions are

Martin Fitzpatrick - python 2021-01-21 07:00:00

micro:bit Space Invaders — MicroPython retro game in just 25 pixels

How much game can you fit into 25 pixels? Quite a bit it turns out.

This is a mini clone of arcade classic Space Invaders for the BBC micro:bit microcomputer. Using the accelerometer and two buttons for input, you can beat off wave after wave of aliens that advance …

Anaconda Blog 2021-01-08 16:35:00

What’s to come in 2021: 5 predictions for the future of data science and AI/ML

Since our founding in 2012, we’ve set out to create a movement that brings together data science practitioners, enterprises, and the open-source community. Data science has gone from a “nice-to-have” to a requisite for most businesses, and we’re proud to have witnessed its growth and expansion in recent years. But we also know there’s still much more to come.
ListenData 2021-01-06 10:35:00

Run SAS in Python without Installation

In the past few years, Python has gained huge popularity as a programming language in the data science world. Many banks and pharma organisations have started using Python, and some of them are in a transition stage, migrating their SAS code libraries to Python. Many big organisations have been using SAS since the early 2000s and have developed hundreds of SAS programs for tasks ranging from data extraction to model building and validation. Hence it's a marathon task to migrate SAS code to any other programming language. Migration can only be done in phases, so that day-to-day tasks are not disrupted by the development and testing of Python code. Since Python is open source, maintaining existing code can sometimes be difficult. Some SAS procedures are very robust and powerful, and their alternatives in Python are not yet implemented; they might be doable, but not in a straightforward way for the average developer or analyst.

Do you wish

Quansight Labs 2021-01-04 08:00:00

Welcoming Tania Allard as Quansight Labs co-director

Today I'm incredibly excited to welcome Tania Allard to Quansight as Co-Director of Quansight Labs. Tania (GitHub, Twitter, personal site) is a well-known and prolific PyData community member. In the past few years she has been involved as a conference organizer (JupyterCon, SciPy, PyJamas, PyCon UK, PyCon LatAm, JuliaCon and more), as a community builder (PyLadies, NumFOCUS, RForwards), as a contributor to Matplotlib and Jupyter, and as a regular speaker and mentor. She also brings relevant experience in both industry and academia - she joins us from Microsoft where she was a senior developer advocate, and has a PhD in computational modelling.

Read more… (4 min remaining to read)

Filipe Saraiva's blog 2020-12-30 12:43:56

Disnatia X/Potências de X

No team of heroes is as dear to me as the X-Men. Around the end of the 90s I started collecting and kept at it for a few years, but then came the fateful price increase with the Super-Heróis Premium line, which ended up discouraging me from buying. Since then I have followed along sporadically, reading news about them, buying an issue here and there… Continue reading »Disnatia X/Potências de X
Quansight Labs 2020-12-22 09:00:00

Develop a JupyterLab Winter Theme

JupyterLab 3.0 is about to be released and provides many improvements to the extension system. Theming is a way to extend JupyterLab and benefits from those improvements.

While theming is often disregarded as a purely cosmetic endeavour, it can greatly improve software. Theming can be a great help for accessibility, and the Jupyter team pays attention to making the default appearance accessibility-aware by using sufficient contrast. Users with high visual acuity may also choose to increase the information density.

Theming can also be a great way to improve communication by increasing or decreasing emphasis of the user interface, which can be of use for teaching or presenting. Theming may also help with security, for example, by having a clear distinction between staging and production.

Finally, theming can be a great way to express oneself, for example, by using a branded version of software that fits well into a context, or expressing one's artistic preferences or opinions.

In the following blog post, we will show you step-by-step how you

ListenData 2020-12-21 14:50:00

Wish Christmas with Python and R

This post is dedicated to all Python and R programming lovers. Flaunt your knowledge in your peer group with the following programs. As a data science professional, you want your wish to be special on the eve of Christmas. If you study the code, you may also learn a trick or two that you can use later in your daily tasks.

Method 1 : Run the following program and see what I mean

R Code

paste(rep(intToUtf8(acos(exp(0)/2)*180/pi+2^4+3*2),2), collapse = intToUtf8(0)),
LETTERS[5^(3-1)], intToUtf8(atan(1/sqrt(3))*180/pi+2),
sep = intToUtf8(0)

Python Code

import math
import datetime

(chr(int(math.acos(math.log(1))*180/math.pi-13)) \
+, 2, 1).strftime('%B')[1] \
+ 2 *, 2, 1).strftime('%B')[3] \
+, 2, 1).strftime('%B')[7] \
+ chr(int(math.atan(1/math.sqrt(3))*180/math.pi+2)) \
+, 10, 1).strftime('%B')[1] \
+ chr(int(math.acos(math.log(1))*180/math.pi-18)) \
+, 4, 1).strftime('%B')[2:4] \
+ chr(int(math.acos(math.exp(0)/2)*180/math.pi+2**4+3*2+1)) \
+ chr(int(math.acos(math.exp(0)/2)*180/math.pi+2**4+2*4)) \
+ chr(int(math.acos(math.log(1))*180/math.pi-13)) \
+ "{:c}".format(97) \
+ chr(int(math.atan(1/math.sqrt(3))*180/math.pi*3-7))).upper()
Method 2 : Audio Wish for Christmas

Turn on computer speakers before running the code.

R Code

christmas_file <- tempfile()
download.file("", christmas_file, mode = "wb")
(continued...) 2020-12-20 23:00:00

On the Link Between Optimization and Polynomials, Part 2

An analysis of momentum can be tightened using a combination of Chebyshev polynomials of the first and second kind. Through this connection we'll derive one of the most iconic methods in optimization: Polyak momentum.
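For readers who have not met the method, here is a minimal sketch of the Polyak momentum (heavy-ball) update on a quadratic. The step size and momentum are the standard textbook choices for curvature bounds mu and L; the diagonal test problem and all names are assumptions for illustration, not the post's code.

```python
import numpy as np

def polyak_momentum(grad, x0, step, momentum, n_iters):
    """Heavy ball: x_next = x - step * grad(x) + momentum * (x - x_prev)."""
    x_prev = np.array(x0, dtype=float)
    x = x_prev.copy()  # the first iteration has no momentum contribution
    for _ in range(n_iters):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Hypothetical test problem: f(x) = 0.5 * sum(eigs * x**2), minimized at x = 0.
mu, L = 1.0, 10.0
eigs = np.linspace(mu, L, 20)
grad = lambda x: eigs * x

# Standard parameter choices for a quadratic with eigenvalues in [mu, L].
step = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
momentum = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_final = polyak_momentum(grad, np.ones_like(eigs), step, momentum, n_iters=50)
err = np.linalg.norm(x_final)  # distance to the minimizer
print(err)
```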

ListenData 2020-12-19 15:59:00

How to use variable in a query in pandas

Suppose you want to reference a variable in a query in the pandas package in Python. This seems to be a straightforward task, but it can sometimes become daunting. Let's discuss it with examples in the article below.

Let's create a sample dataframe with 3 columns and 4 rows. This dataframe is used for demonstration purposes.

import pandas as pd
df = pd.DataFrame({"col1" : range(1,5),
                   "col2" : ['A A','B B','A A','B B'],
                   "col3" : ['A A','A A','B B','B B']})

Filter a value A A in column col2

To reference a variable in a query, prefix its name with @.
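A minimal sketch of that @ syntax, rebuilding the sample dataframe so the snippet runs on its own. The variable name myval is a hypothetical choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": range(1, 5),
    "col2": ["A A", "B B", "A A", "B B"],
    "col3": ["A A", "A A", "B B", "B B"],
})

# A local Python variable holding the value to filter on (hypothetical name).
myval = "A A"

# Inside the query string, @myval refers to the Python variable, not a column.
filtered = df.query("col2 == @myval")
print(filtered)
```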
NumFOCUS 2020-12-18 21:21:54

NumFOCUS hires Open Source Developer Advocate!

  NumFOCUS is pleased to announce that Arliss Collins has been hired as our organization’s first Open Source Developer Advocate. Founded in 2012, NumFOCUS has finally grown beyond just providing non-technical needs for our 40+ sponsored projects! As our first technical hire, Arliss will work to help understand our projects from a technical perspective and […]

The post NumFOCUS hires Open Source Developer Advocate! appeared first on NumFOCUS.

Blog – Enthought 2020-12-17 15:46:42

Digital Transformation in Practice

Taken from the webinar: the frequent strategy of digitalization (layering digital tools and technology onto existing processes) provides incremental value, which soon flattens out. At the other end, companies born with digital DNA (for example an Amazon) are capable of incredible innovation and adaptation. The reality is, most companies must transform to develop digital DNA. Applied digital …
NumFOCUS 2020-12-11 19:37:25

A Pivotal Time in NumFOCUS’s Project Aimed DEI Efforts

NumFOCUS is pleased to announce the launch of our Contributor Diversification & Retention Research Project funded by a grant from the Gordon and Betty Moore Foundation.  “We were eager to support NumFOCUS’s diversity initiative because it aims to get to the heart of what is preventing greater participation in data science. We are hopeful that […]

The post A Pivotal Time in NumFOCUS’s Project Aimed DEI Efforts appeared first on NumFOCUS.

Anaconda Blog 2020-12-10 17:21:00

Data literacy is for everyone - not just data scientists

Today more than ever before, businesses around the world are embracing data and data-driven decisions as key aspects of operating a modern, successful organization. Yet, despite the surface-level recognition of the importance of data and the insights it can provide, the process of harnessing data and mining those insights is often perceived as abstract and mysterious—a domain that solely belongs to experts with PhDs.
Blog – Enthought 2020-12-08 17:38:17

Up the ‘Digital Level’ of Your R+D Lab

Image: A key role of materials and chemistry R&D researchers is to invert the primary function of their labs – that of creating materials from chemical structures, formulations and processes – to one of determining the inputs that will produce materials with the desired properties with minimal iteration. This process can be significantly accelerated by …
Anaconda Blog 2020-12-01 21:30:00

Six must-have soft skills for every data scientist

Today, it’s hard to imagine a world without data science. Over the last few decades, it has become ingrained in society. Particularly during the COVID-19 pandemic, data is front and center in headlines every day. That being said, it’s important to remember that data science is still a new field, and one not without its challenges to overcome.
Blog – Enthought 2020-12-01 17:26:45

The Challenges of Scaling Digital Advances in Life Sciences

An illustration by Michelle Macroni evokes the complex web of ingredients to possibilities in life sciences through applied digital innovation in R&D.
Author: Robyn Cardwell, Ph.D., Director, Life Sciences Solutions
Incremental Advances Versus Created Possibilities
Many of the improvements in life sciences R&D labs today come through introducing digital technologies to existing processes, with an …
Blog – Enthought 2020-11-24 11:19:46

Enthought at the 2020 Materials Research Society Conference

Machine learning classification model learns complex printability window for inkjet printed polymer films using data from automated formulation and printing system.
Authors: Michael Heiber, Ph.D., Applications Engineer and Frank Longford, Ph.D., Scientific Software Developer
The Materials Research Society (MRS) is a global community of materials researchers, built to promote the advancement of interdisciplinary materials research and …
NumFOCUS 2020-11-23 14:44:42

Anaconda Announces Multi-Year Partnership with NumFOCUS

A key stakeholder in the open source scientific computing ecosystem has further formalized their long-standing partnership with NumFOCUS. Anaconda, the Austin, Texas-based software development and consulting company which provides global distribution of Python and R software packages, last month introduced their Anaconda Dividend Program. Through this initiative, Anaconda plans to direct a portion of their […]

The post Anaconda Announces Multi-Year Partnership with NumFOCUS appeared first on NumFOCUS.

Pierre de Buyl's homepage - scipy 2020-11-23 10:00:00

What's in a model

During the coronavirus epidemic, the Belgian federal group of scientific experts came up regularly in the official communication of the government. How can scientists understand the spread of an epidemic? By using a model: a mathematical description of a phenomenon. By varying the parameters of the model, one can test …

Quansight Labs 2020-11-19 17:29:55

A second CZI grant for NumPy and OpenBLAS

I am happy to announce that NumPy and OpenBLAS have once again been awarded a grant from the Chan Zuckerberg Initiative through Cycle 3 of the Essential Open Source Software for Science (EOSS) program. This new grant totaling $140,000 will fund part of our efforts to improve usability and sustainability in both projects and is excellent news for the scientific computing community, which will certainly benefit from this work downstream.


NumFOCUS 2020-11-18 18:36:55

NumFOCUS Receives Support from Heising-Simons

NumFOCUS is grateful to announce that we received a grant award of $50,000 in October from the Heising-Simons Foundation. This generous grant funding will provide general support resources to NumFOCUS and will benefit all of our Sponsored and Affiliated Projects as well as our organization’s several programs and initiatives. “This grant award from Heising-Simons will […]

The post NumFOCUS Receives Support from Heising-Simons appeared first on NumFOCUS.

Quansight Labs 2020-11-18 05:00:30

Introduction to Design in Open Source

This blog post is a conversation. Portions led by Tim George are marked with TG, and those led by Isabela Presedo-Floyd are marked with IPF.

TG: When I speak with other designers, one common theme I see concerning why they chose this career path is they want to make a difference in the world. We design because we imagine a better world and we want to help make it real. Part of the reason we design as a career is we're unable to go through life without designing; we're always thinking about how things are and how they could be better. This ethos also exists in many open-source communities. It seems like it ought to be an ideal match.

So what's the disconnect? I'm still exploring that myself, but after a few years in open source I want to share my observations, experiences, and hope for a stronger collaboration between design and development. I don't think I have a complete solution, and some days I'm not even sure I grasp the entire

Blog – Enthought 2020-11-11 15:49:00

Digital-centric R+D Laboratories

To have a transformative impact, labs must reinvent workflows through digital technologies and skills, adopting a strong data culture. A figure from the white paper captures Level 5, where innovation through digital-centric systems confidently produces new materials that meet customer specifications orders of magnitude faster than before, enabling broader business transformation.  Authors: Chris Farrow, Ph.D., …
Filipe Saraiva's blog 2020-11-05 14:50:03

A chat with Vivi Reis about technology and politics

Tonight (November 5) at 8 pm I will be talking with Vivi Reis, PSOL candidate for city council in Belém. In the chat we will focus heavily on topics where technology and politics intertwine. Among the points we will cover are the Data Office, data and public policy, free software in public administration, connectivity in Belém, digital inclusion, citizen apps,…
Spyder Blog 2020-11-05 00:00:00

New features in Spyder 4's new debugger!

IPython is a great improvement over the standard Python interpreter, bringing many enhancements such as autocompletion and "magic" commands. When debugging, however, many of these features become inaccessible. With Spyder, we aim to bring back these capabilities and more for a truly premium debugging experience! (And believe me, I use this debugger a lot, and not only because I write code that might contain bugs :p).

In this post, I will describe the debugger improvements we've already made in Spyder 4, as well as those that are already implemented or under review for Spyder 4.2 and beyond.

Make the debugger more like IPython

IPython improves on the stock Python interpreter by adding syntax highlighting, completion, and history. We have done the same for the debugger!

The output is prettier (and easier to read) than the plain black text of Spyder 3!

Code completion and history for the debugger use the same functionality as the IPython console, so you should not notice any difference in behaviour. Just press

NumFOCUS 2020-11-04 00:10:51

JupyterCon 2020: Code of Conduct Reports

Following the reports to the NumFOCUS Code-of-Conduct committee on Jeremy Howard’s keynote at JupyterCon 2020, and the controversy that followed, the NumFOCUS Code of Conduct Committee issued a public apology to Jeremy Howard and escalated the case to the board of directors. The context In his keynote at JupyterCon 2020, Jeremy Howard gave a point-by-point rebuttal of […]

The post JupyterCon 2020: Code of Conduct Reports appeared first on NumFOCUS.

NumFOCUS 2020-10-30 18:51:02

Public Apology to Jeremy Howard

We, the NumFOCUS Code of Conduct Enforcement Committee, issue a public apology to Jeremy Howard for our handling of the JupyterCon 2020 reports. We should have done better. We thank you for sharing your experience and we will use it to improve our policies going forward. We acknowledge that it was an extremely stressful experience, […]

The post Public Apology to Jeremy Howard appeared first on NumFOCUS.

Paul Ivanov’s Journal 2020-10-29 07:00:00

Money and California Propositions (2020)

Ten years ago, I made some plots for how much money was contributed to and spent by the various proposition campaigns in California.

I decided to update these for this election, and here's the result:

Just in case you didn't get the full picture, here is the same data plotted on a common scale:

So, whereas 10 years ago we had a total of ~$58 million spent on the election, the overwhelming majority of it in support, this time we had ~$662 million, an 11-fold increase!

The Cal-Access Campaign Finance Activity: Propositions & Ballot Measures source I used last time was still there, but there are way more propositions this time (12 vs 5), and the money details are broken out by committee, with some propositions having a dozen committees. Another wrinkle is that the website is now protected by some fancy scraping protection. I could browse it just fine in Firefox, even with JavaScript turned off, but couldn't download it using wget, curl,

NumFOCUS 2020-10-26 18:13:17

TARDIS Joins NumFOCUS as a Sponsored Project

NumFOCUS is pleased to announce the newest addition to our fiscally sponsored projects: TARDIS. TARDIS is an open-source, Monte Carlo based radiation transport simulator for supernovae ejecta. TARDIS simulates photons traveling through the outer layers of an exploded star including relevant physics like atomic interactions between the photons and the expanding gas. The TARDIS collaboration […]

The post TARDIS Joins NumFOCUS as a Sponsored Project appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-10-26 13:51:04

For a Data Office for Public Policy in Belém

Data have always been decisive for the design and implementation of public policies at every level of government. Tracking indicators on the economy, health, violence, urban travel, the spatial distribution of the population, and the coverage areas of leisure facilities, among others, are just some of the data that can inform the design of policies…
ListenData 2020-10-23 16:03:00

Translating Web Page while Scraping

Suppose you need to scrape data from a website after translating the web page, in R or Python. Google Chrome has an option to translate a page written in a foreign language. If you are an English speaker who doesn't know other languages and you want to extract data from a website that offers no English version, this article will show you how to translate a webpage.
What is Selenium?
You may not be familiar with Selenium, so it is important to understand the background. Selenium is an open-source tool that is very popular in the testing domain and is used for automating web browsers. It allows you to write test scripts in several programming languages. Selenium is available in both R and Python.
Translate Page in Web Scraping in R and Python
In R there is a package named RSelenium, whereas in Python Selenium can be installed via the selenium package. (continued...)
NumFOCUS 2020-10-23 15:25:08

NumFOCUS Earns Transparency Recognition from GuideStar

Earlier this week, NumFOCUS earned our first-ever Silver Seal of Transparency from GuideStar, an independent organization which classifies nonprofit organizations based on multiple metrics pertaining to transparency and accountability. Fewer than 5% of US-based nonprofits have received this type of recognition. “This respected acknowledgment comes as we prepare to enter our year-end fundraising season,” said […]

The post NumFOCUS Earns Transparency Recognition from GuideStar appeared first on NumFOCUS.

Blog – Enthought 2020-10-15 17:04:12

SEG 2020 Attendees Asked. We Answered.

In an example away from seismic, this shows a thin section, where machine learning techniques can be applied across multiple images, ones previously unused due to the significant demands of expert time, and difficulties in organizing and sharing data. See a demo at:
Author: Brendon Hall, Ph.D., Director, Energy Solutions
The SEG 2020 …
Blog – Enthought 2020-10-14 12:12:37

Deep Learning Can Now Interpret Seismic the Way Experts Do

The SubsurfaceAI custom deep learning application for seismic allows experts to annotate data, identify sequences and, in this example, define a fault complex. This forms the basis of a workflow that allows a seismic expert to apply deep learning to ‘interpret the way experts do,’ creating bespoke models for seismic interpretation.
Author: Ben Lasscock, Ph.D., …
ListenData 2020-10-11 14:45:00

Learn Python for Data Science

This tutorial will help you learn Data Science with Python through examples. It is designed for beginners who want to get started with Data Science in Python. Python is an open source, high-level language that is widely used for general-purpose programming. It has become highly popular in the data science world. In the PYPL Popularity of Programming Language index, Python ranked second with a 14 percent share. In the advanced analytics and predictive analytics market, it is ranked among the top 3 programming languages.
Data Science with Python Tutorial
Python is widely used and very popular for a variety of software engineering tasks such as website development, cloud architecture, and back-end work. It is equally popular in the data science world. In the advanced analytics world, there have been several debates on R vs. Python. There are some areas, such as the number of libraries for statistical analysis, where R wins over Python, but Python is catching up
Paul Ivanov’s Journal 2020-10-08 07:00:00

aka: also known as

I was chatting with Anthony Scopatz last week, and one of the things we covered was how it'd be cool to have a subcommand launcher, kind of like git, where the subcommands were swappable. If you're not familiar, git automatically calls out to git-something (note the dash) whenever you run

$ git something

and something is not one of the builtin git commands. For me, ~/bin is in my PATH, so

$ git lost
git: 'lost' is not a git command. See 'git --help'.
$ echo "echo how rude!" > ~/bin/git-lost; chmod +x ~/bin/git-lost
$ git lost
how rude!
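
The lookup shown in the transcript above can be sketched in a few lines of Python. This is only an illustration of the general "find `prog-sub` on the PATH and run it" idea, assuming a POSIX system; the function names `find_subcommand` and `dispatch` are hypothetical and this is not how git itself is implemented.

```python
import shutil
import subprocess
import sys


def find_subcommand(prog, sub, path=None):
    """Return the full path of an executable named `prog-sub`, or None."""
    return shutil.which(f"{prog}-{sub}", path=path)


def dispatch(prog, argv, path=None):
    """Run `prog-<argv[0]>` with the remaining arguments; return an exit code."""
    if not argv:
        print(f"usage: {prog} <subcommand> [args...]", file=sys.stderr)
        return 2
    sub, rest = argv[0], argv[1:]
    exe = find_subcommand(prog, sub, path=path)
    if exe is None:
        print(f"{prog}: '{sub}' is not a {prog} command.", file=sys.stderr)
        return 1
    # Hand the remaining arguments to the external executable.
    return subprocess.call([exe, *rest])
```

With this shape, dropping a new executable called `prog-something` into a directory on the PATH is enough to add a subcommand, which is the property the transcript demonstrates.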

And so what Anthony was talking about was having two commands that are supposed to do the same thing, and being able to switch between them. For example: maybe we have git-away and git-gone and both of them perform a similar function, and we wish to call our preferred one when we run git lost.

One way to do this would be to copy or symlink our chosen version as git-lost, and replace that file whenever

ListenData 2020-09-20 08:18:00

How to rename columns in Pandas Dataframe

In this tutorial, we will cover various methods to rename columns in a pandas dataframe in Python. Renaming columns is one of the most common data wrangling tasks. If you are not from a programming background and have worked only in Excel spreadsheets, you might find this less easy in Python, since in MS Excel you can rename a column just by typing the new name in the cell. If you are from a database background, it is similar to ALIAS in SQL. In Python there is a popular data manipulation package called pandas which simplifies these kinds of data operations.
2 Methods to rename columns in Pandas
In pandas there are two simple methods to rename columns.
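
The excerpt is cut off before it names the two methods, so the sketch below is an assumption: it shows two approaches that pandas does provide, `DataFrame.rename` and assignment to `.columns`, which may or may not be the pair the tutorial has in mind. The sample data and the new column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Approach 1: rename() with an old-name -> new-name mapping.
# It returns a new DataFrame and leaves df unchanged.
renamed = df.rename(columns={"a": "alpha", "b": "beta"})

# Approach 2: assign a full list of names to .columns (done on a copy here).
# The list must have exactly one name per existing column.
relabeled = df.copy()
relabeled.columns = ["alpha", "beta"]
```

The mapping form is handy when only some columns change; the assignment form replaces every name at once.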

The first step is to install the pandas package if it is not already installed. You can check if the package is installed on your machine by running

Filipe Saraiva's blog 2020-08-29 18:48:00

A Seqtember of free virtual events about Qt and KDE

(OK, the "seqtembro" joke works better in the English version, "seqtember", but let's go.) By a great coincidence, a twist of fate, or none of the above, September 2020 will be full of high-quality, free virtual events about Qt and KDE. Starting from the 4th to the 11th of that month we will have Akademy 2020, the…
Neural Ensemble News 2020-08-08 19:27:00

CARLsim5 Released!


CARLsim5 is an efficient, easy-to-use, GPU-accelerated library for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail. It allows execution of networks of Izhikevich spiking neurons with realistic synaptic dynamics using multiple off-the-shelf GPUs and x86 CPUs. The simulator provides a PyNN-like programming interface in C/C++, which allows for details and parameters to be specified at the synapse, neuron, and network level.

The present release, CARLsim 5, builds on the efficiency and scalability of earlier releases (Nageswaran et al., 2009; Richert et al., 2011, and Beyeler et al., 2015; Chou et al., 2018). The functionality of the simulator has been greatly expanded by the addition of a number of features that enable and simplify the creation, tuning, and simulation of complex networks with spatial structure.

New Features

1. PyNN Compatibility

pyCARL is an interface between the simulator-independent language PyNN and a CARLsim5-based back-end. In other words, you can write the code for a SNN model once, using the

Filipe Saraiva's blog 2020-08-04 23:27:02

What will become of the Lev with the “end” of Saraiva?

Disclaimer: despite the surname, I have no relationship with Saraiva. And I also have no answers to the question in the title. As a Lev user, I have been following Saraiva's agony with interest. The bookstore chain, one of the largest in Brazil, has been caught for years in a legal imbroglio, owing money to several publishers, in a process that…
NumFOCUS 2020-07-31 17:52:20

Dask Life Sciences Fellow [Open Job]

Dask is an open-source library for parallel computing in Python that interoperates with existing Python data science libraries like NumPy, pandas, scikit-learn, and Jupyter. Dask is used today across many different scientific domains. Recently, we’ve observed an increase in use in a few life sciences applications: large-scale imaging in microscopy, single-cell analysis, genomics […]

The post Dask Life Sciences Fellow [Open Job] appeared first on NumFOCUS.

Spyder Blog 2020-07-25 10:00:00

STX Next, Python development company, uses Spyder to improve their workflow

STX Next, one of Europe's largest Python development companies, has shared with us how Spyder has been a powerful tool for them when performing data analysis. It is a pleasure for us on the Spyder team to work every day to improve the workflow of developers, scientists, engineers and data analysts. We are very glad to receive and share an STX Next testimonial about Spyder, along with an interview with one of their developers, Michael Wiśniewski, who has found Spyder very useful in his job.

What Michael Wiśniewski says about Spyder

In an era of a continuously growing demand for analysis of vast amounts of data, we are facing increasingly complex tasks to perform. Sure, we are not alone—there are many great tools designed for scientists and data analysts. We have NumPy, SciPy, Matplotlib, Pandas, and others. But, wouldn't it be nice to have one extra tool that could combine all the required packages into one compact working environment? Asking this question is precisely how

Filipe Saraiva's blog 2020-07-24 14:49:05

Educação Vigiada

This pandemic period has been a productive one on many fronts, which unfortunately means less time to publicize that work here on the blog. In this post I want to make up for that by talking about one of the projects I consider among the most important I have contributed to recently, Educação Vigiada. A few months ago the Educação… project
Filipe Saraiva's blog 2020-07-10 23:09:48

Engrenagem Ep. 04 – Favorite KDE applications of Brazilian KDErs

This Saturday, July 11, at 10 am, KDE Brasil will be back with episodes of Engrenagem, the Brazilian community's videocast (which has gone 4 years without new episodes 🙂 ). To get things going again, the episode will feature 6 Brazilian contributors (Ângela, Aracele, Caio, Filipe (me), Fred and Tomaz) talking about their favorite KDE applications –…
Spyder Blog 2020-07-08 10:00:00

Writing docs is not just writing docs

This blogpost was originally published on the Quansight Labs website.

I joined the Spyder team almost two years ago, and I never thought I was going to end up working on docs. Six months ago I started a project with CAM Gerlach and Carlos Cordoba to improve Spyder’s documentation. At first, I didn’t actually understand how important docs are for software, especially for open source projects. However, during all this time I’ve learned how documentation has a huge impact on the open-source community and I’ve been thankful to have been able to do this. But, from the beginning, I asked myself “why am I the ‘right person’ for this?”

Improving Spyder’s documentation started as part of a NumFOCUS Small Development Grant awarded at the end of last year. The goal of the project was not only to update the documentation for Spyder 4, but also to make it more user-friendly, so users can understand Spyder’s key concepts and get started with it more

Filipe Saraiva's blog 2020-06-25 13:15:22

On the book “Uma História de Desigualdade”

I have finished reading Pedro de Souza's award-winning book, “Uma História de Desigualdade – A Concentração de Renda entre os Ricos no Brasil 1926 – 2013”, based on the thesis he defended in the sociology program at UnB. It is a substantial book that lives up to all the praise it has received since its…
Spyder Blog 2020-06-12 18:00:00

Thanking the people behind Spyder 4

This blogpost was originally published on the Quansight Labs website.

After more than three years in development and more than 5000 commits from 60 authors around the world, Spyder 4 finally saw the light on December 5, 2019! I decided to wait until now to write a blogpost about it because shortly after the initial release, we found several critical performance issues and some regressions with respect to Spyder 3, most of which are fixed now in version 4.1.3, released on May 8th 2020.

This new release comes with a lengthy list of user-requested features aimed at providing an enhanced development experience at the level of top general-purpose editors and IDEs, while strengthening Spyder's specialized focus on scientific programming in Python. The interested reader can take a look at some of them in previous blog posts, and in detail in our Changelog. However, this post is not meant to describe those improvements, but to acknowledge all people that contributed

Gaël Varoquaux - programming 2020-05-27 22:00:00

Technical discussions are hard; a few tips


This post discusses the difficulties of communicating while developing open-source projects and tries to give some simple advice.

A large software project is above all a social exercise in which technical experts try to reach good decisions together, for instance on GitHub pull requests. But communication is difficult, in …

Pierre de Buyl's homepage - scipy 2020-05-19 09:00:00

Tidynamics, what use?

In 2018 I published a small Python library, tidynamics. The scope was deliberately limited: compute the typical correlation functions for stochastic and molecular dynamics: the autocorrelation and the mean-square displacement. Two years later, I wonder about its usage.
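
To make the two quantities concrete, here is a naive NumPy sketch of the autocorrelation and the mean-square displacement, written directly from their textbook definitions. The function names `acf_naive` and `msd_naive` are hypothetical, and this is not tidynamics's API or implementation; it is a quadratic-time illustration only.

```python
import numpy as np


def acf_naive(x):
    """Autocorrelation <x(t0) * x(t0 + lag)>, averaged over all start times t0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.mean(x[: n - lag] * x[lag:]) for lag in range(n)])


def msd_naive(pos):
    """Mean-square displacement <|r(t0 + lag) - r(t0)|^2> for every lag."""
    pos = np.asarray(pos, dtype=float)
    if pos.ndim == 1:
        pos = pos[:, None]  # treat a 1-D trajectory as shape (time, 1)
    n = len(pos)
    out = np.zeros(n)  # the displacement at lag 0 is zero by definition
    for lag in range(1, n):
        disp = pos[lag:] - pos[:-lag]
        out[lag] = np.mean(np.sum(disp ** 2, axis=1))
    return out
```

For a trajectory moving at constant unit speed, `msd_naive` grows as the square of the lag, which is a quick sanity check on any implementation.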