Planet SciPy 2022-01-12 19:04:51

5 Ways Machine Learning Teams Use CI/CD in Production

One of the core concepts in DevOps that is now making its way to machine learning operations (MLOps) is CI/CD—Continuous Integration and Continuous Delivery or Continuous Deployment. CI/CD as a core DevOps practice embraces tools and methods to deliver software applications reliably by streamlining the building, testing, and deployment of your applications to production. Let’s […]

The post 5 Ways Machine Learning Teams Use CI/CD in Production appeared first on

Quansight Labs 2022-01-12 13:00:00

IPython 8.0, Lessons learned maintaining software

This is a companion post to the official release of IPython 8.0 that describes what we learned from this large new major release. We hope it will help you apply best practices and have an easier time maintaining your projects or helping others. We'll focus on the patterns that made it easier for us to make IPython 8.0 what it is with minimal time involved.

Read more… (8 min remaining to read)

Anaconda Blog 2022-01-11 14:00:00

A New Way To Connect with Other Anaconda Users

Back in December 2021, we launched the Anaconda Community, our first-ever space for users to get insights into the newest developments in the world of data, get "unstuck" when they run into a problem, and reach out for technical help. It also lets individuals engage with other professionals and ask questions of the broader data community.

2022-01-11 09:13:07

When to Choose CatBoost Over XGBoost or LightGBM [Practical Guide]

Boosting algorithms have become some of the most powerful methods for training on structured (tabular) data. The three best-known boosting implementations, which have supplied many recipes for winning ML competitions, are XGBoost, LightGBM, and CatBoost. In this article, we will primarily focus on CatBoost, how it fares against the other algorithms, and when you should choose it over […]

The post When to Choose CatBoost Over XGBoost or LightGBM [Practical Guide] appeared first on

2022-01-09 23:00:00

Optimization Nuggets: Implicit Bias of Gradient-based Methods

When an optimization problem has multiple global minima, different algorithms can find different solutions, a phenomenon often referred to as the implicit bias of optimization algorithms. In this post we'll characterize the implicit bias of gradient-based methods on a class of regression problems that includes linear least squares and Huber …

2022-01-04 17:18:54

ARIMA vs Prophet vs LSTM for Time Series Prediction

If we subscribe to a linear understanding of time and causality, as Dr. Sheldon Cooper says, then representing historical events as a series of values and features observed over time provides the foundation for learning from the past. However, time series are somewhat different from other datasets, including other sequential data such as text or DNA sequences. […]

The post ARIMA vs Prophet vs LSTM for Time Series Prediction appeared first on

2021-12-30 15:46:15

Data-Centric Approach vs Model-Centric Approach in Machine Learning

Code and data are the foundations of any AI system. Both play an important role in developing a robust model, but which one should you focus on more? In this article, we'll go through the data-centric and model-centric approaches, see which one is better, and also talk about […]

The post Data-Centric Approach vs Model-Centric Approach in Machine Learning appeared first on

2021-12-29 15:16:48

Model Deployment Challenges: 6 Lessons From 6 ML Engineers

Deploying machine learning models is hard! If you don’t believe me, ask any ML engineer or data team that has been asked to put their models into production. To further back up this claim, Algorithmia’s “2021 State of Enterprise ML” reports that the time required for organizations to deploy a machine learning model is increasing, […]

The post Model Deployment Challenges: 6 Lessons From 6 ML Engineers appeared first on

Anaconda Blog 2021-12-28 13:00:00

2021: A Year in Review

Partnerships and Collaborations

2021-12-23 09:23:31

Best Practices When Working With Docker for Machine Learning

Docker is a tool for creating, deploying, and running application containers. A container is just a packaged bundle of application code together with the libraries and other dependencies it needs to run. When executed, a Docker image becomes a container that holds all the components required to run the application. However, what’s the […]

The post Best Practices When Working With Docker for Machine Learning appeared first on

Share Your R and Python Notebooks 2021-12-19 09:05:50.983615

Stock Charts Detection Using Image Classification Model ResNet

This tutorial explores image classification in PyTorch using state-of-the-art computer vision models. The dataset used in this tutorial has 3 highly imbalanced classes, so we explore augmentation as a solution to the imbalance problem.

Data used in this notebook can be found at

  1. Data loading
    • Loading labels
    • Train-test splitting
    • Augmentation
    • Creating Datasets
    • Random Weighted Sampling and DataLoaders
  2. CNN building and fine-tuning ResNet
    • CNN
    • ResNet
  3. Setup and training
  4. Evaluation
  5. Testing
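The outline above lists random weighted sampling as part of the answer to class imbalance. The sketch below shows one common way to build the per-sample weights (inverse class frequency) in plain Python so it runs without PyTorch; the class names and counts are hypothetical, and `random.choices` only stands in for PyTorch's sampler. In the notebook itself, a list like `weights` would typically be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`, which is then handed to the `DataLoader` through its `sampler` argument.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each sample a weight of 1 / (number of samples in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

# Hypothetical, heavily imbalanced labels for a 3-class problem.
labels = ["up"] * 80 + ["down"] * 15 + ["flat"] * 5
weights = inverse_frequency_weights(labels)

# Every class now carries the same total weight (1.0), so sampling with
# replacement draws each class roughly equally often.
rng = random.Random(0)
drawn = rng.choices(labels, weights=weights, k=3000)
print(Counter(drawn))
```

Because minority samples are drawn repeatedly, this pairs naturally with augmentation: each repeated draw can be transformed differently.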
Data Loading
import os
import random
import numpy as np
import pandas as pd
from PIL import Image
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from torchvision import datasets, models
from torchvision import transforms
import matplotlib.pyplot as plt

Setting the device to use the GPU when one is available (falling back to the CPU otherwise).

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Identifying the data paths.

data_dir = "images/"
labels_file = "images_labeled.csv"
Loading Labels

Since the labels are in a CSV file, we use

(continued...)

2021-12-15 15:43:18

7 Cross-Validation Mistakes That Can Cost You a Lot [Best Practices in ML]

We all want our models to generalize well so that they keep performing at their peak on data they have not seen. To meet this goal, we often rely on cross-validation, a resampling procedure used to evaluate machine learning models on a limited data sample. It could be a nightmare to realize […]
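A minimal k-fold cross-validation sketch in NumPy, assuming a plain least-squares fit as a placeholder for whatever model is being evaluated; the data is synthetic and the helper names are illustrative. Each sample lands in the held-out fold exactly once, and the model is refit from scratch on the remaining folds every time, so any fitting step (preprocessing included) should only ever see the training folds.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices once and split them into k disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate_mse(X, y, k=5):
    """Fit least squares on k-1 folds and score MSE on the held-out fold."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        residuals = X[test_idx] @ coef - y[test_idx]
        scores.append(float(np.mean(residuals ** 2)))
    return scores

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
scores = cross_validate_mse(X, y, k=5)
print(np.mean(scores), np.std(scores))   # report the spread, not just the mean
```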

The post 7 Cross-Validation Mistakes That Can Cost You a Lot [Best Practices in ML] appeared first on

2021-12-14 23:00:00

Optimization Nuggets: Exponential Convergence of SGD

This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution.

2021-12-14 11:22:56

How to Select a Model For Your Time Series Prediction Task [Guide]

Working with time series data? Here’s a guide for you. In this article, you will learn how to compare and select time series models based on predictive performance. In the first part, you will be introduced to numerous models for time series. This part is itself divided into three sections: classical time series models, supervised models, […]
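To illustrate comparing models on predictive performance, here is a sketch of rolling-origin (walk-forward) evaluation with two simple baselines, a naive forecast and a seasonal naive forecast. The synthetic series, the season length of 12, and the starting point are all assumptions for the example; the property that matters is that each forecast uses only observations from before the time step it predicts.

```python
import math

def naive_forecast(history):
    """Predict the next value as the last observed value."""
    return history[-1]

def seasonal_naive_forecast(history, season=12):
    """Predict the next value as the value observed one season earlier."""
    return history[-season]

def rolling_origin_mae(series, forecaster, start):
    """Walk forward one step at a time, forecasting only from the past."""
    errors = []
    for t in range(start, len(series)):
        prediction = forecaster(series[:t])   # only data before time t
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)

# Hypothetical monthly series: slow trend plus a 12-step seasonal cycle.
series = [10 + 0.1 * t + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]

candidates = {"naive": naive_forecast, "seasonal_naive": seasonal_naive_forecast}
scores = {name: rolling_origin_mae(series, f, start=24) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

On this series the seasonal baseline should win because the seasonal swing dominates the step-to-step change. Any other candidate (an ARIMA model, a supervised regressor on lag features, and so on) can be compared the same way by wrapping it as a `forecaster(history)` function.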

The post How to Select a Model For Your Time Series Prediction Task [Guide] appeared first on

Anaconda Blog 2021-12-13 21:56:00

An Update on the Apache Log4j Vulnerability

In response to the reported vulnerability CVE-2021-44228 in the Apache Log4j2 Java library, Anaconda is conducting a thorough review of its products, repositories, packages, and internal systems to determine any potential impact on our services or our customers. Our findings detailed below indicate that Anaconda products and services are not affected by CVE-2021-44228. We will continue to monitor the situation and update this article with additional information.
Anaconda Blog 2021-12-13 14:00:00

5 Expert Predictions for AI/ML and Data Science in 2022

From work on mitigating bias in AI models to the dominant rise of Python, it’s been a year of growth for the data community. We’re excited to see further progress in the field and more widespread enterprise adoption across thousands of different use cases in the new year. There truly is something for everyone in data science, and we’re proud to be a part of that! Join our webinar on Dec. 15 at 2 PM EST/11 AM PST to hear more predictions from industry leaders.
Quansight Labs 2021-12-10 06:00:00

A year of Jupyter community calls

One framing of open source is that software and code are kernels of community. The code, and its abstractions, unite developers and their patrons; a struggle for growing and evolving open communities is making sure these groups remain connected. A lot of us showed up for the code but hung around for the community. The rest of this post discusses the monthly Jupyter community calls and how they help all jovyans (Project Jupyter's pet name for its developers and users) stay connected.

Read more… (2 min remaining to read)

Anaconda Blog 2021-12-09 15:30:00

What Does It Mean to Be a Data Scientist?

After soliciting information for our 2021 State of Data Science report, the ubiquitous nature of the title “Data Scientist” was immediately apparent. We had over 4,000 respondents, and only 11% of them actually identified themselves as Data Scientists. Another 11% identified as Business Analysts, and the rest of the respondents fell into a multitude of other categories, including Developers, DevOps, MLOps, and more. There's a lot of crossover among these titles, which means they all encompass aspects of what it means to be a Data Scientist.

2021-12-09 09:58:54

ML Model Registry: What It Is, Why It Matters, How to Implement It

Why should you know more about model registries? If you were once the only data scientist on your team, you can probably relate to this: you start working on a machine learning project and perform a series of experiments that produce various models (and artifacts) that you “track” through non-standard naming conventions. Since […]
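To make the idea concrete, here is a toy in-memory registry: every registered model gets an incrementing version number, its metrics, and a lifecycle stage, and promoting a version demotes whichever version held that stage before. The classes, stage names, the model name `churn-model`, and the `s3://` paths are all hypothetical; a real registry would persist this metadata and the model artifacts rather than keep them in memory.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_uri: str        # where the serialized model lives (hypothetical path)
    metrics: dict
    stage: str = "none"      # e.g. "none", "staging", "production", "archived"

@dataclass
class ModelRegistry:
    """Toy registry: versioned models with lifecycle stages."""
    _models: dict = field(default_factory=dict)

    def register(self, name, artifact_uri, metrics):
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, artifact_uri, dict(metrics))
        versions.append(mv)
        return mv

    def promote(self, name, version, stage="production"):
        # Only one version per model may hold a given stage; archive the old one.
        for mv in self._models[name]:
            if mv.stage == stage:
                mv.stage = "archived"
        target = self._models[name][version - 1]
        target.stage = stage
        return target

    def get(self, name, stage="production"):
        for mv in self._models.get(name, []):
            if mv.stage == stage:
                return mv
        return None

registry = ModelRegistry()
registry.register("churn-model", "s3://models/churn/1", {"auc": 0.81})
registry.register("churn-model", "s3://models/churn/2", {"auc": 0.84})
registry.promote("churn-model", 1)
registry.promote("churn-model", 2)
print(registry.get("churn-model"))
```

Serving code then asks for "the production version of `churn-model`" instead of hard-coding a file name, which is the core benefit over ad hoc naming conventions.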

The post ML Model Registry: What It Is, Why It Matters, How to Implement It appeared first on

Anaconda Blog 2021-12-06 18:00:00

Paving the way for Community Innovation with Security Features for Signed Packages

At Anaconda, we believe that open-source software (OSS) is a gateway for our customers and users to unlock and leverage innovation from the community and benefit from the latest and greatest in software development. However, with a recent uptick of cyberattacks, the "new normal" includes increased security measures to counteract the risks of malicious actors in the open-source ecosystem.
Anaconda Blog 2021-12-02 14:00:00

Debunking the Biggest Myths in Data Science

As data scientists continually seek to integrate more effectively with other business units in their organizations, it’s essential to take the time to dispel common myths like these, where feasible. Raising awareness for how data scientists work can help improve everything from the accuracy of model predictions to the quality of candidates recruited to fill open positions.

2021-12-01 16:49:54

Pix2pix: Key Model Architecture Decisions

Generative Adversarial Networks (GANs) are a type of neural network that belongs to the class of unsupervised learning. They are used for deep generative modeling, in which deep neural networks learn a probability distribution over a given set of data points and generate similar data points. Since it […]
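For reference, the objective from the original pix2pix paper (Isola et al.) pairs a conditional GAN loss with an L1 reconstruction term. Here x is the input image, y the target image, z a noise input, and λ weights the L1 term; this is the published formulation, not necessarily the exact variant the post discusses.

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)
\end{aligned}
```

The discriminator sees the input x alongside either the real or the generated output, which is what makes the adversarial term conditional; the L1 term keeps the output close to the target at low frequencies.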

The post Pix2pix: Key Model Architecture Decisions appeared first on

Anaconda Blog 2021-11-22 15:00:00

How Heavily-Regulated Industries Can Accelerate Open-Source Innovation

IBM® Anaconda Repository for IBM Cloud Pak® for Data can be installed in air-gapped environments to provide organizations access to curated, open-source packages without connecting to the internet. Anaconda Repository allows enterprises to centralize their data science projects and confidently manage the security of their open-source packages and libraries used for AI.
Anaconda Blog 2021-11-19 14:00:00

Anaconda Individual Edition 2021.11

Beyond this Anaconda Individual Edition release, we’d like to mention that there is an initial macOS Apple M1 Miniconda installer for Python 3.8 available in the Miniconda Repository. The Miniconda installer and other available packages are built and tested on Apple M1 machines. More information on Miniconda with the latest installer links can be found in the Miniconda - Conda documentation. Although there is not an Anaconda Individual Edition 2021.11 installer for macOS Apple M1, there is a comprehensive list of packages available to be installed for macOS Apple M1 with Conda available here. We will share more information about this when it is available.
Quansight Labs 2021-11-17 10:00:00

A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond

Over the years, array computing in Python has evolved to support distributed arrays, GPU arrays, and other various kinds of arrays that work with specialized hardware, or carry additional metadata, or use different internal memory representations. The foundational library for array computing in the PyData ecosystem is NumPy. But NumPy alone is a CPU-only library - and a single-threaded one at that - and in a world where it's possible to get a GPU or a CPU with a large core count in the cloud cheaply or even for free in a matter of seconds, that may not seem enough. For the past couple of years, a lot of thought and effort has been spent on devising mechanisms to tackle this problem, and evolve the ecosystem in a gradual way towards a state where PyData libraries can run on a GPU, as well as in distributed mode across multiple GPUs.

We feel like a shared vision has emerged, in bits and pieces. In this post, we aim to articulate that vision and

Anaconda Blog 2021-11-11 16:00:00

How Data Visualization Improves Decision-Making

There are many good tools for data visualization, ranging from the built-in graphing of Microsoft Excel to business-intelligence plotting and dashboarding tools like Tableau, Looker, and Microsoft’s Power BI. What about when you want to integrate plotting with your Python data analytics workflows? Luckily, there are many solid Python plotting options as well (all of them are listed and compared at …). Matplotlib, Plotly, and Bokeh (along with tools built on top of them) are the most popular and are all solid choices. Any of these tools can help you make sure you draw the right conclusions at every step of your analysis and can help you build presentations driven directly from Python.
Quansight Labs 2021-11-03 17:23:40

NumPy Benchmarking

In this blog post, I'll talk about my journey at Quansight. I want to share everything I was involved in and accomplished, the issues I faced, and, most importantly, the awesome life hacks I learned during this period.

First of all, I'd like to express my gratitude to the whole team for allowing me to be a part of it. My work was mainly focused on providing performance benchmarks for NumPy in realistic situations. The goal was to show the world that NumPy is efficient in handling quasi-real-life situations too.

The primary technical outcome of my work is available in the NumPy documentation.

Read more… (6 min remaining to read)

Gaël Varoquaux - programming 2021-10-28 22:00:00

Hiring an engineer and post-doc to simplify data science on dirty data


Join us to work on reinventing data-science practices and tools to produce robust analysis with less data curation.

It is well known that data cleaning and preparation are a heavy burden to the data scientist.

Dirty data research

In the dirty data project, we have been conducting machine-learning research …

Blog – Enthought 2021-10-28 21:17:32

Enthought’s Takeaways from SEMI SMC 2021

At this year’s SEMI Strategic Materials Conference, leaders in the semiconductor industry across the supply chain came together to discuss the big challenges and opportunities that are likely to emerge over the next 5 years.

Our Takeaways

Authors: Michael Heiber, Application Engineer, Materials Science Solutions Group; Tim Diller, Director of Digital Transformation Services, Materials Science …
Continue Reading
Sparrow Computing 2021-10-22 21:27:39

TorchVision Datasets: Getting Started

The TorchVision datasets subpackage is a convenient utility for accessing well-known public image and video datasets. You can use these tools to start training new computer vision models very quickly.

TorchVision Datasets Example

To get started, all you have to do is import one of the Dataset classes. Then, instantiate ... Read More

The post TorchVision Datasets: Getting Started appeared first on Sparrow Computing.

Sparrow Computing 2021-10-21 14:19:21

NumPy Any: Understanding np.any()

The np.any() function tests whether any element in a NumPy array evaluates to True. The input can have any shape, and the data type does not have to be boolean: each element is judged by its truthiness, so any nonzero number counts as true. If none of the elements evaluate to true, the function returns False. Passing in a ... Read More
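A few illustrative calls; the arrays are made up, since the post's own examples are not reproduced here.

```python
import numpy as np

a = np.array([[0, 0], [0, 3]])    # integer input: any nonzero value counts as true
print(np.any(a))                   # True
print(np.any(np.zeros((2, 3))))    # False: every element is 0.0
print(np.any(a, axis=0))           # [False  True], reduced column by column
print(np.any([]))                  # False: an empty input has no true element
```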

The post NumPy Any: Understanding np.any() appeared first on Sparrow Computing.

Quansight Labs 2021-10-21 09:00:00

Dataframe interchange protocol: cuDF implementation

This is Ismaël Koné from Côte d'Ivoire (Ivory Coast). I am a fan of open source software. In what follows, I'll try to capture my experience at Quansight Labs as an intern working on the cuDF implementation of the dataframe interchange protocol.

I'll start by motivating this project with some background on cuDF and the dataframe interchange protocol.

Read more… (9 min remaining to read)

Quansight Labs 2021-10-19 14:00:00

An efficient method of calling C++ functions from numba using clang++/ctypes/rbc

The aim of this post is to explore a method of calling C++ library functions from Numba-compiled functions, that is, Python functions decorated with numba.jit(nopython=True).

While there are ways to wrap C++ code for use from Python (see Appendix below), calling these wrappers from Numba-compiled functions is often not as straightforward or efficient as one would hope.

Read more… (5 min remaining to read)

Quansight Labs 2021-10-13 10:04:54

Array Libraries Interoperability

In this blog post I talk about the work that I was able to accomplish during my internship at Quansight Labs and the efforts being made towards making array libraries more interoperable.

From here on, I'll assume a basic understanding of array and tensor libraries and their use in the Python scientific and data science software stack.

Master NumPy leading the young Tensor Turtles

Read more… (15 min remaining to read)

Quansight Labs 2021-10-11 07:51:58

Re-Engineering CI/CD pipelines for SciPy

In this blog post I talk about the projects and my work during my internship at Quansight Labs. My efforts were geared towards re-engineering CI/CD pipelines for SciPy to make them more efficient to use with GitHub Actions. I also talk about the milestones that I achieved, along with the associated learnings and improvements that I made.

This blog post assumes a basic understanding of CI/CD and GitHub Actions, as well as of Python and the SciPy ecosystem.

Read more… (14 min remaining to read)

Sparrow Computing 2021-10-07 20:52:03

PyTorch DataLoader Quick Start

PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility and that makes data loading in PyTorch a fairly advanced topic. One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you ... Read More

The post PyTorch DataLoader Quick Start appeared first on Sparrow Computing.

Sparrow Computing 2021-10-06 16:53:32

How the NumPy append operation works

Understanding the np.append() operation and when you might want to use it.
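A short sketch of np.append's behavior; the arrays are illustrative. np.append always returns a new array instead of modifying its input, and without an `axis` argument it flattens both inputs first.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, [4, 5])               # returns a NEW array; `a` is unchanged
print(a, b)                            # [1 2 3] [1 2 3 4 5]

m = np.array([[1, 2], [3, 4]])
print(np.append(m, [[5, 6]], axis=0))  # adds a row: result has shape (3, 2)
print(np.append(m, [5, 6]))            # no axis: flattened -> [1 2 3 4 5 6]

# Each call copies the whole array, so appending inside a loop costs O(n^2)
# overall; collecting into a Python list and converting once is cheaper.
values = []
for i in range(5):
    values.append(i * i)
squares = np.array(values)
```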

The post How the NumPy append operation works appeared first on Sparrow Computing.

Quansight Labs 2021-10-06 12:00:00

Using Hypothesis to test array-consuming libraries

Over the summer, I've been interning at Quansight Labs to develop testing tools for the developers and users of the upcoming Array API standard. Specifically, I contributed "strategies" to the testing library Hypothesis, which I'm excited to announce are now available in hypothesis.extra.array_api. Check out the primary pull request I made for more background.

This blog post is for anyone developing array-consuming methods (think SciPy and scikit-learn) and is new to property-based testing. I demonstrate a typical workflow of testing with Hypothesis whilst writing an array-consuming function that works for all libraries adopting the Array API, catching bugs before your users do.

Read more… (12 min remaining to read)

Blog – Enthought 2021-10-06 03:41:00

Webinar Q&A: Accelerating Product Reformulation with Machine Learning

In our recent C&EN Webinar: Accelerating Consumer Products Reformulation with Machine Learning, we demonstrated how to leverage digital tools and technology to bring new products to market faster. The webinar was well attended by scientists, engineers, and business leaders across the product development spectrum eager to learn how these concepts can be applied to their …
Continue Reading
Quansight Labs 2021-10-04 15:30:58

Dataframe interchange protocol and Vaex

In this blog post, I briefly describe the work I did during my three-month Quansight Labs internship: implementing the dataframe interchange protocol in Vaex.

Connection between dataframe libraries with dataframe protocol

About | What is all that?

Today there are quite a number of different dataframe libraries available in Python, and quite a number of libraries that consume dataframes, plotting libraries for example. In most cases, those libraries accept only pandas DataFrames, so users often have to convert between dataframe types to use a specific plotting library. It would be extremely cool to be able to use plotting libraries with any kind of dataframe, would it not?
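A sketch of what the protocol enables, written against the interchange support that pandas later shipped (`DataFrame.__dataframe__` and `pandas.api.interchange.from_dataframe`, available from pandas 1.5). `ForeignFrame` is a hypothetical stand-in for a non-pandas library such as Vaex; it delegates to pandas internally only so the example is self-contained, and `to_pandas` is an illustrative helper, not any library's API.

```python
import pandas as pd

class ForeignFrame:
    """Hypothetical non-pandas dataframe that only exposes __dataframe__."""

    def __init__(self, data):
        self._data = pd.DataFrame(data)   # internal storage, for the sketch only

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # nan_as_null is accepted for protocol compatibility but not forwarded.
        return self._data.__dataframe__(allow_copy=allow_copy)

def to_pandas(df_like):
    """What a plotting library could do: accept any protocol-compliant dataframe."""
    if isinstance(df_like, pd.DataFrame):
        return df_like
    if not hasattr(df_like, "__dataframe__"):
        raise TypeError("object does not implement the dataframe interchange protocol")
    return pd.api.interchange.from_dataframe(df_like)

foreign = ForeignFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})
proto = foreign.__dataframe__()
print(proto.num_columns(), list(proto.column_names()))   # metadata via the protocol
print(to_pandas(foreign))
```

The consumer never imports or type-checks the producer's library; it only relies on the `__dataframe__` method, which is what lets a plotting library accept Vaex, cuDF, or pandas objects alike.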

Read more… (13 min remaining to read)

Gaël Varoquaux - programming 2021-09-13 22:00:00

Hiring someone to develop scikit-learn community and industry partners


With the growth of scikit-learn and the wider PyData ecosystem, we want to recruit in the Inria scikit-learn team for a new role. Departing from our usual focus on excellence in algorithms, statistics, or code, we want to add to the team someone with some technical understanding, but an …

Blog – Enthought 2021-09-01 21:45:02

Introducing Enthought Edge: Unlocking the Value of R&D Data

While the value of R&D data is clear, finding a way to sort through it can be daunting given the special handling required to extract its value. In fact, 75 percent of surveyed R&D executives believe advanced analytics techniques would play a pivotal role in their future R&D activities, but only 25 percent state that …
Continue Reading
Blog – Enthought 2021-09-01 21:44:06

Introducing Enthought Edge

Introducing Enthought Edge: A New DataOps Solution Designed to Unlock the Value in R&D Data

Designed for scientists, by scientists, Edge centralizes and standardizes data in easily accessible, analysis-ready form. Early Access Program now available. Austin, TX – September 1, 2021 – Enthought, the leading provider of services and technology powering digital transformation for science, …
Continue Reading
Pierre de Buyl's homepage - scipy 2021-08-24 13:00:00

A paper on the Lees-Edwards method

A few years ago¹, Sebastian contacted me to help with simulations. Great, I like simulation studies, so we started discussing the details. The idea: use an established method, the Lees-Edwards boundary condition, to study colloids under shear.

Blog – Enthought 2021-08-10 15:44:41

Machine Learning in Materials Science

The process of materials discovery is complex and iterative, requiring a level of expertise to be done effectively. Materials workflows that require human judgement present a specific challenge to the discovery process, which can be leveraged as an opportunity to introduce digital technologies.  In the lab, many tasks require manual data collection and judgement. And …
Continue Reading
Blog – Enthought 2021-07-23 13:25:20

FORGE-ing Ahead: Charting the Future of Geothermal Energy

A microseismic event loaded from the Frontier Observatory for Research in Geothermal Energy (FORGE) distributed acoustic sensing (DAS) data into a Jupyter notebook showing energy from a microseismic event arriving at about 7.5 seconds. These microseisms bring information about the process of stimulation. However, in the data set there are relatively few and they are …
Continue Reading
Living in an Ivory Basement 2021-07-19 22:00:00

A biotech career panel in the DIB Lab

Careers outside of universities!

Sparrow Computing 2021-07-08 16:09:47

Poetry for Package Management in Machine Learning Projects

When you’re building a production machine learning system, reproducibility is a proxy for the effectiveness of your development process. But without locking all your Python dependencies, your builds are not actually repeatable. If you work in a Python project without locking long enough, you will eventually get a broken build ... Read More

The post Poetry for Package Management in Machine Learning Projects appeared first on Sparrow Computing.

Sparrow Computing 2021-06-29 20:38:29

Development containers in VS Code: a quick start guide

If you’re building production ML systems, dev containers are the killer feature of VS Code. Dev containers give you full VS Code functionality inside a Docker container. This lets you unify your dev and production environments if production is a Docker container. But even if you’re not targeting a Docker ... Read More

The post Development containers in VS Code: a quick start guide appeared first on Sparrow Computing.

Living in an Ivory Basement 2021-06-28 22:00:00

New sourmash databases are available!

Databases are now available for GTDB!

Filipe Saraiva's blog 2021-06-25 12:06:45

Writing a Column for O Estado do Piauí

O Estado do Piauí is a new newspaper that recently appeared in those parts. With a greater focus on long, in-depth reporting that mixes investigative and literary journalism, the project aims to discuss the topics of interest to the state in depth, uncover unique stories from Piauí, draw attention to problematic situations, point out alternatives, and much more. It is not… Continue reading »Writing a Column for O Estado do Piauí
Blog – Enthought 2021-06-23 16:27:51

Lessons for Geoscientists from the book Real World AI: A Practical Guide for Responsible Machine Learning

In this blog article Enthought Energy Solutions vice president Mason Dykstra looks at the recently published book titled “Real World AI: A Practical Guide for Responsible Machine Learning” in the context of both the technical challenges faced by geoscientists and how to scale. Author: Mason Dykstra, Ph.D., Vice President, Energy Solutions  In the newly released …
Continue Reading
Blog – Enthought 2021-06-22 13:27:21

Leveraging AI in Cell Culture Analysis

Mammalian cell culture is a fundamental tool for many discoveries, innovations and products in the life sciences. Currently, cells are the smallest unit of sustainable life outside the body, thereby providing an essential platform for testing hypotheses and mimicking biological processes. The applications of cell culture, while not limitless, are plentiful.  Every cell type, downstream …
Continue Reading
Filipe Saraiva's blog 2021-06-21 21:51:57

Interview Series on Research at the PPGCC at UFPA – Computational Intelligence

The Faculty of Computing and the Graduate Program in Computer Science (PPGCC) at UFPA are developing a project with two goals: first, to better communicate what we produce in our research to the public outside the university; second, to better communicate the same thing INTERNALLY – what we develop… Continue reading »Interview Series on Research at the PPGCC at UFPA – Computational Intelligence
Blog – Enthought 2021-06-15 19:54:08

Enthought Announces Formation of Digital Transformation, Materials Science Advisory Boards

Austin, TX – June 15, 2021 – Enthought, the leading provider of technologies and services that deliver digital innovation to science-driven companies, is experiencing rapid growth as companies look to accelerate their adoption of new technologies, such as artificial intelligence and machine learning, in response to COVID-19. In support of Enthought’s growth, strategic vision and …
Continue Reading
AI Pool Articles 2021-06-08 18:19:38

Visualization with Seaborn

This article shows you how to use the seaborn Python package to visualize structured data with seaborn bar charts, scatter plots, histograms, line plots, and distplot.
Living in an Ivory Basement 2021-06-07 22:00:00

Searching all public metagenomes with sourmash

Searching all the things!

AI Pool Articles 2021-05-29 13:40:17

Introduction of Fast Fourier Transformation (FFT)

This article covers an introduction to the Fourier series, Fourier analysis, and the Fourier transform, why we use them, an explanation of the FFT algorithm, and its implementation.
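A compact sketch contrasting the direct O(N²) DFT with a recursive radix-2 Cooley-Tukey FFT, both checked against NumPy's `np.fft.fft`. It assumes the input length is a power of two; this is the textbook variant, not necessarily the implementation the article uses.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / len(x)) @ x

def fft_radix2(x):
    """Recursive Cooley-Tukey FFT, O(N log N); len(x) must be a power of 2."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    even = fft_radix2(x[0::2])   # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # DFT of the odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd
    # X[k] = E[k] + W^k O[k] and X[k + N/2] = E[k] - W^k O[k]
    return np.concatenate([even + twiddle, even - twiddle])

signal = np.random.default_rng(0).normal(size=64)
print(np.allclose(dft(signal), np.fft.fft(signal)))
print(np.allclose(fft_radix2(signal), np.fft.fft(signal)))
```

The speedup comes from reusing the two half-size transforms for both halves of the output, which halves the work at every level of the recursion.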
AI Pool Articles 2021-05-24 16:10:20

Understanding of Probability Distribution and Normal Distribution

An introduction to probability distributions and their types, with intuition for the normal (Gaussian) distribution, the standard normal distribution, the normal curve, and the normal distribution formula.
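A small sketch of the normal density formula in plain Python, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), with a midpoint-rule integral to check two familiar facts: the total area is 1, and about 68.27% of the mass lies within one standard deviation of the mean. The helper names are illustrative.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma                 # standardize to the standard normal
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, n=10_000):
    """Midpoint-rule numerical integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(normal_pdf(0.0))                                 # peak of the standard normal, about 0.3989
print(integrate(normal_pdf, -10, 10))                  # total probability, about 1.0
print(integrate(normal_pdf, -1, 1))                    # about 0.6827 within one sigma
print(integrate(lambda x: normal_pdf(x, 5, 2), 3, 7))  # same share within one sigma of mu = 5
```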
Pierre de Buyl's homepage - scipy 2021-05-21 13:00:00

Is your software ready for the Journal of Open Source Software?

For the unaware reader, the Journal of Open Source Software (JOSS) is an open-access scientific journal founded in 2016 and aimed at publishing scientific software. A JOSS article itself is short, and its publication helps the work on the software get recognized. I share here my point of view on what makes some software tools more ready to be published in JOSS. I do not comment on the size of the software or its relevance to research, which are both documented on JOSS' website.

Living in an Ivory Basement 2021-05-16 22:00:00

sourmash 4.1.0 released!!

sourmash v4.1.0 is here!

AI Pool Articles 2021-05-15 12:19:22

Using Autoencoder to generate digits with Keras

This article walks through a hands-on implementation of an autoencoder, which we train and evaluate on MNIST, a very well-known public benchmark dataset.
AI Pool Articles 2021-05-15 10:22:56

Understanding of Support Vector Machine (SVM)

An explanation of the support vector machine algorithm: its types, how it works, and its implementation in Python with the scikit-learn (sklearn) machine learning package.
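A minimal scikit-learn sketch, assuming synthetic two-cluster data from `make_blobs` rather than the article's dataset. It fits a linear and an RBF-kernel `SVC`, with feature scaling inside a pipeline because SVMs are sensitive to feature scale.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data with well-separated cluster centers.
X, y = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Scaling is part of the pipeline, so it is fit on the training split only.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

for name, model in [("linear", linear_svm), ("rbf", rbf_svm)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))   # test-set accuracy

# Support vectors are the training points that define the margin.
print(linear_svm.named_steps["svc"].support_vectors_.shape)
```

`C` trades margin width against training errors, and the kernel choice decides whether the boundary is a hyperplane (`"linear"`) or a curved surface (`"rbf"`).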
Sparrow Computing 2021-05-14 20:11:16

Basic Counting in Python

I love fancy machine learning algorithms as much as anyone. But sometimes, you just need to count things. And Python’s built-in data structures make this really easy. Let’s say we have a list of strings: With a list like this, you might care about a few different counts. What’s the ... Read More
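A sketch of the built-in tools for this, using a hypothetical list of strings since the post's example list is not reproduced here:

```python
from collections import Counter

# Hypothetical list of strings.
words = ["cat", "dog", "cat", "bird", "dog", "cat"]

counts = Counter(words)
print(counts["cat"])            # 3
print(counts["fish"])           # 0: missing keys count as zero, no KeyError
print(counts.most_common(2))    # [('cat', 3), ('dog', 2)]
print(len(counts))              # 3 distinct strings

# The same tally with a plain dict, for comparison.
manual = {}
for w in words:
    manual[w] = manual.get(w, 0) + 1
print(manual == dict(counts))   # True
```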

The post Basic Counting in Python appeared first on Sparrow Computing.

AI Pool Articles 2021-05-14 16:19:07

Confidence Interval Understanding

An explanation of confidence intervals and how to calculate them for different scenarios, along with the equation that defines a confidence interval and the parameters involved in it.
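A sketch of the usual normal-approximation interval for a mean, mean ± z·s/√n, using only the standard library. The sample values are hypothetical, and for small samples a t critical value with n − 1 degrees of freedom is the more common choice than z.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation CI for the mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)   # s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)      # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err

# Hypothetical measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
low, high = mean_confidence_interval(sample)
print(round(low, 3), round(high, 3))
```

Raising the confidence level widens the interval (a larger z), while collecting more data narrows it through the √n in the standard error.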
AI Pool Articles 2021-05-14 16:15:32

Decision Trees

Intuition and implementation of the first tree-based algorithm in machine learning
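One common way to choose splits, used by CART-style trees, is to minimize the weighted Gini impurity of the two child nodes. The sketch below uses hypothetical helpers, not the article's implementation, and finds the best threshold on a single numeric feature; a full tree would apply this search recursively to each side.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs, ys):
    """Return (weighted_impurity, threshold) of the best split x <= threshold."""
    best = (float("inf"), None)
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:  # a split must send samples both ways
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[0]:
            best = (score, threshold)
    return best


# Hypothetical toy data: small x values are class 0, large ones class 1.
print(best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))  # (0.0, 3)
```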
AI Pool Articles 2021-05-14 16:01:47

Dimensionality Reduction, PCA Intro

We will be covering a dimensionality reduction algorithm called PCA (Principal Components Analysis) and will show how it helps to understand the data you have.
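A compact sketch of PCA via the singular value decomposition, assuming NumPy is available; the article may compute it differently, for example through an eigendecomposition of the covariance matrix, which yields the same axes. The synthetic data are hypothetical and lie mostly along one diagonal direction, so a single component should explain most of the variance.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]            # principal axes (unit vectors)
    explained_variance = S**2 / (len(X) - 1)  # variance along each axis
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ components.T, components, ratio[:n_components]


# Hypothetical 2-D data stretched along the direction (1, 1), plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t]) + 0.1 * rng.normal(size=(200, 2))

projected, components, ratio = pca(X, n_components=1)
print(components, ratio)  # axis close to +/-(0.707, 0.707), ratio close to 1
```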
AI Pool Articles 2021-05-13 18:17:40

Understanding Autoencoders - An Unsupervised Learning approach

This article covers autoencoders: what they are, the architecture of an autoencoder, and the intuition behind how autoencoders are trained.
Sparrow Computing 2021-05-13 18:11:11

How to Use the PyTorch Sigmoid Operation

The PyTorch sigmoid function is an element-wise operation that squishes any real number into a range between 0 and 1. This is a very common activation function to use as the last layer of binary classifiers (including logistic regression) because it lets you treat model predictions like probabilities that their ... Read More

The post How to Use the PyTorch Sigmoid Operation appeared first on Sparrow Computing.
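In PyTorch the operation is applied element-wise with `torch.sigmoid(tensor)`. To show the underlying math without requiring PyTorch, the sketch below implements σ(x) = 1 / (1 + e^(-x)) in plain Python, using a branch that avoids overflow for large negative inputs, and maps a few hypothetical logits to probabilities.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # for x < 0 this form avoids overflow in exp(-x)
    return e / (1.0 + e)


# Applied element by element, as torch.sigmoid does for every entry of a tensor.
logits = [-1000.0, -2.0, 0.0, 2.0, 1000.0]
probs = [sigmoid(v) for v in logits]
predictions = [p > 0.5 for p in probs]  # a binary classifier's decision rule
print(probs)
print(predictions)  # [False, False, False, True, True]
```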

AI Pool Articles 2021-05-13 16:07:08

Optimization Methods, Gradient Descent

This article gives a clear explanation and a simple example of vanilla gradient descent, stochastic gradient descent, the momentum optimizer, and the Adam optimizer, with RMSProp explained along the way.
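To make the update rules concrete, here is a minimal sketch, not the article's code, of vanilla gradient descent, momentum, and Adam on the one-dimensional toy objective f(x) = (x - 3)². The learning rates and step counts are illustrative assumptions; Adam's second-moment estimate is the RMSProp ingredient.

```python
import math


def grad(x):
    # Gradient of the toy objective f(x) = (x - 3)^2, minimized at x = 3.
    return 2.0 * (x - 3.0)


def vanilla_gd(x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * grad(x)  # step against the gradient
    return x


def momentum_gd(x, lr=0.1, beta=0.9, steps=300):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)  # running "velocity" accumulates past gradients
        x -= lr * v
    return x


def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment (momentum-like average)
        v = beta2 * v + (1 - beta2) * g * g  # second moment (the RMSProp part)
        m_hat = m / (1 - beta1**t)           # bias corrections for zero init
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


print(vanilla_gd(0.0), momentum_gd(0.0), adam(0.0))  # all close to 3
```

Stochastic gradient descent uses the same update as `vanilla_gd`, except that `grad` is estimated from a random mini-batch instead of the full dataset.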
AI Pool Articles 2021-05-11 17:24:10

Understanding of Regularization in Neural Networks

This article covers different regularization techniques, such as data augmentation, L1 and L2 regularization, dropout, and early stopping.
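A small sketch, assuming the weights are a plain list of floats, of how L1 and L2 penalties enter the loss and its gradient, plus a patience-based early-stopping rule. The function names and defaults are hypothetical, not from the article; data augmentation and dropout are not shown here.

```python
def penalized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Data loss plus L1 (l1 * sum |w|) and L2 (l2 * sum w^2) penalties."""
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return data_loss + l1_term + l2_term


def penalty_gradient(w, l1=0.0, l2=0.0):
    """Extra gradient the penalties add for one weight.

    L2 shrinks w in proportion to its size (weight decay); L1 pushes it toward
    zero by a constant amount (using the subgradient 0 at w == 0).
    """
    sign = (w > 0) - (w < 0)
    return l1 * sign + 2.0 * l2 * w


def early_stopping_epoch(val_losses, patience=3):
    """Epoch with the best validation loss, stopping after `patience` epochs without improvement."""
    best_epoch, best = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best_epoch, best = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


print(penalized_loss(1.0, [1.0, -2.0], l1=0.1, l2=0.01))  # 1.0 + 0.3 + 0.05
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.76, 0.74, 0.5]))  # 2
```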
AI Pool Articles 2021-05-10 18:04:00

Diving into Object Detection Basics

A guide to basic object detection concepts: what object detection is and how it works, the concept of anchor boxes, why a loss function is necessary, some free datasets, and finally an implementation of SSD.
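Anchor-box detectors such as SSD match anchors to ground-truth boxes by their overlap, usually measured as intersection over union (IoU). The sketch below is an illustration under stated assumptions, not the article's implementation: boxes are `(x_min, y_min, x_max, y_max)` tuples, at least one ground-truth box exists, and 0.5 is used as a common matching threshold.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_anchors(anchors, gt_boxes, threshold=0.5):
    """Per anchor: index of the best-overlapping ground-truth box, or -1 for background."""
    matches = []
    for anchor in anchors:
        overlaps = [iou(anchor, gt) for gt in gt_boxes]
        best = max(range(len(gt_boxes)), key=overlaps.__getitem__)
        matches.append(best if overlaps[best] >= threshold else -1)
    return matches


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7: overlap 1, union 4 + 4 - 1
print(match_anchors([(0, 0, 2, 2), (5, 5, 6, 6)], [(0, 0, 2, 2.5)]))  # [0, -1]
```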
AI Pool Articles 2021-05-10 18:03:29

Normalization in Deep learning

Different types of normalization in deep learning, a very useful technique for avoiding overfitting and helping your model generalize better.
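As one example of the normalization family, here is a minimal sketch of batch normalization's training-time computation for a single feature: subtract the mini-batch mean, divide by the standard deviation (with a small `eps` for stability), then apply the learnable scale `gamma` and shift `beta`. The running statistics used at inference time are omitted, and the function is hypothetical, not taken from the article.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mu = sum(batch) / n
    var = sum((x - mu) ** 2 for x in batch) / n  # biased (1/n) batch variance
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print(out)  # roughly zero mean and unit variance
```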
AI Pool Articles 2021-05-10 18:03:08

Dropout in Deep Learning

Understanding dropout in deep learning and how it reduces overfitting
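A minimal sketch of "inverted" dropout, the variant most frameworks use: during training each activation is zeroed with probability `p` and the survivors are scaled by 1 / (1 - p), so nothing needs rescaling at inference time. The function and its parameters are illustrative (it assumes 0 <= p < 1), not the article's code.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(activations)  # at inference time dropout does nothing
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


out = dropout([1.0] * 1000, p=0.5, rng=random.Random(0))
print(sum(out) / len(out))  # close to 1.0: the expected activation is preserved
```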
AI Pool Articles 2021-05-10 18:02:37

Yolov3 and Yolov4 in Object Detection

An explanation of object detection, its use cases, and its algorithms; specifically, how the YOLOv3 and YOLOv4 architectures are structured and how they perform object detection.
AI Pool Articles 2021-05-10 18:02:03

End-To-End PyTorch Example of Image Classification with Convolutional Neural Networks

Image classification in PyTorch with popular models such as ResNet and its variants, including an end-to-end solution for the CIFAR-10/100 and ImageNet datasets.
AI Pool Articles 2021-05-10 18:00:28

Supervised learning with Scikit-Learn Library

How to build supervised learning models, such as linear and logistic regression, with the scikit-learn Python library
AI Pool Articles 2021-05-10 18:00:13

Linear and Logistic Regression

The intuition behind, and implementation of, the foundational algorithms of supervised machine learning
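A minimal plain-Python sketch of both models, assuming a single input feature: linear regression fitted in closed form by least squares, and logistic regression fitted by gradient descent on the mean binary cross-entropy. The data and hyperparameters are hypothetical, not from the article.

```python
import math


def fit_line(xs, ys):
    """Least squares for y ~ slope * x + intercept (one feature)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    return slope, y_mean - slope * x_mean


def predict_proba(w, b, x):
    """Logistic model: probability of class 1 is sigmoid(w * x + b)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean binary cross-entropy: mean of (p - y) * x and (p - y).
        gw = sum((predict_proba(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(predict_proba(w, b, x) - y for x, y in zip(xs, ys)) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # points on y = 2x + 1
w, b = fit_logistic([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0], [0, 0, 0, 1, 1, 1])
print(slope, intercept)  # 2.0 1.0
print(predict_proba(w, b, -2.0), predict_proba(w, b, 2.0))
```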
AI Pool Articles 2021-05-10 17:59:02

Random Forests Understanding

The intuition behind, and implementation of, a key algorithm for reducing overfitting in tree-based models
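A deliberately simplified sketch of the two ideas that distinguish a random forest from a single tree: each tree is trained on a bootstrap sample, each considers only a random subset of features, and predictions are combined by majority vote. To keep it short, every "tree" here is a one-split stump and features are sampled once per tree rather than at every split, so this is an illustration, not a faithful random forest; all names and data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Best single split (feature, threshold, left_label, right_label) by misclassification count."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: fall back to predicting the majority label
        label = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]          # bootstrap: sample with replacement
        features = rng.sample(range(n_features), max_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def predict(forest, row):
    votes = [(ll if row[f] <= t else rl) for f, t, ll, rl in forest]
    return Counter(votes).most_common(1)[0][0]  # majority vote


rows = [(0, 0), (1, 1), (2, 1), (8, 9), (9, 8), (10, 10)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict(forest, (-1, -1)), predict(forest, (20, 20)))  # 0 1
```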
AI Pool Articles 2021-05-10 17:57:58

Activation Functions for Neural Networks

This article explains various activation functions, including linear, ELU, ReLU, sigmoid, and tanh.
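For quick reference, here are plain-Python versions of the activations listed above. These are standard textbook definitions, not code from the article; the ELU `alpha` default of 1.0 is an assumption.

```python
import math


def linear(x):
    return x


def relu(x):
    return max(0.0, x)  # passes positives, zeroes negatives


def elu(x, alpha=1.0):
    # Identity for x > 0, smooth exponential saturation toward -alpha for x <= 0.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def sigmoid(x):
    # Equivalent to 1 / (1 + exp(-x)), written via tanh so large |x| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def tanh(x):
    return math.tanh(x)  # squashes into (-1, 1); tanh(x) = 2 * sigmoid(2x) - 1


for f in (linear, relu, elu, sigmoid, tanh):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```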
Blog – Enthought 2021-05-06 12:12:46

AI Needs the ‘Applied Sciences’ Treatment

As industries rapidly advance in AI/machine learning, a key to unlocking the power of these approaches for companies is an enabling environment. Domain experts need to be able to use artificial intelligence on data relevant to their work, but they should not have to know computer or data science techniques to solve their problems. An …
2021-04-12 22:00:00

On the Link Between Optimization and Polynomials, Part 4

While the most common accelerated methods like Polyak and Nesterov incorporate a momentum term, a little-known fact is that plain gradient descent (no momentum) can achieve the same rate with nothing more than a well-chosen sequence of step sizes. In this post we'll derive this method and, through simulations, discuss its practical …
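A minimal sketch of the idea, under assumptions not taken from the post: the objective is a diagonal quadratic whose Hessian eigenvalues lie in [μ, L], and the step sizes are the reciprocals of the roots of the degree-k Chebyshev polynomial mapped to that interval (the classic Chebyshev choice for Richardson iteration). The script compares the error after k steps with that of the best constant step size, 2 / (L + μ).

```python
import math


def chebyshev_step_sizes(mu, L, k):
    # Roots of the degree-k Chebyshev polynomial, mapped from [-1, 1] to [mu, L].
    roots = [
        (L + mu) / 2 + (L - mu) / 2 * math.cos((2 * i - 1) * math.pi / (2 * k))
        for i in range(1, k + 1)
    ]
    # With steps h_i = 1 / r_i, the residual polynomial prod_i (1 - lam / r_i)
    # is the rescaled Chebyshev polynomial, which stays small on all of [mu, L].
    return [1.0 / r for r in roots]


def gradient_descent(eigs, x0, step_sizes):
    """Plain GD on f(x) = 0.5 * sum(lam_j * x_j^2); the gradient is lam_j * x_j."""
    x = list(x0)
    for h in step_sizes:
        x = [xj - h * lam * xj for xj, lam in zip(x, eigs)]
    return x


def norm(v):
    return math.sqrt(sum(t * t for t in v))


mu, L, k = 1.0, 100.0, 20
eigs = [mu + (L - mu) * j / 49 for j in range(50)]  # spectrum spread over [mu, L]
x0 = [1.0] * len(eigs)

constant = gradient_descent(eigs, x0, [2.0 / (L + mu)] * k)  # best fixed step
cheb = gradient_descent(eigs, x0, chebyshev_step_sizes(mu, L, k))
ratio_constant = norm(constant) / norm(x0)
ratio_cheb = norm(cheb) / norm(x0)
print(ratio_constant, ratio_cheb)
```

The constant step contracts the worst-case error like ((κ - 1)/(κ + 1))^k with κ = L/μ, whereas the Chebyshev schedule is bounded by 1/T_k((L + μ)/(L - μ)), which decays like ((√κ - 1)/(√κ + 1))^k, the accelerated rate. For much larger k the order in which these steps are applied matters for numerical stability in floating point; this toy keeps k small.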

NumFOCUS 2021-04-09 18:02:05

NumFOCUS Welcomes Tesco Technology to Corporate Sponsors

NumFOCUS is pleased to announce our new partnership with Tesco Technology. A long-time PyData event sponsor, Tesco Technology joined NumFOCUS as a Silver Corporate Sponsor in December 2020. “We are very excited to formalize our partnership with Tesco Technology,” said Leah Silen, NumFOCUS Executive Director. “Tesco Technology has partnered with NumFOCUS for the past several […]

The post NumFOCUS Welcomes Tesco Technology to Corporate Sponsors appeared first on NumFOCUS.

NumFOCUS 2021-04-08 21:14:55

Job Posting | Communications and Marketing Manager

Job Title: Communications and Marketing Manager

Position Overview: The primary role of the Communications & Marketing Manager is to manage the NumFOCUS brand by overseeing all outgoing communications between NumFOCUS and our stakeholders. You will serve the project communities by playing a key role in their event marketing management and by assisting with project promotional and […]

The post Job Posting | Communications and Marketing Manager appeared first on NumFOCUS.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 1

This is the first in a series of three blog posts about the basic use of Acoular. It explains some fundamental concepts and walks through a simple example. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 2

This is the second in a series of three blog posts about the basic use of Acoular. It assumes that you have already read the first post and continues by explaining some more concepts and additional methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 3

This is the third and final post in a series of three about the basic use of Acoular. It assumes that you have already read the first two posts and continues by explaining additional concepts to be used with time-domain methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources. To continue, we use the same setup as in Part 1. However, as we are setting out to do some signal processing in the time domain, we define only TimeSamples, MicGeom, RectGrid and SteeringVector objects, but no PowerSpectra or BeamformerBase.

import acoular
ts = acoular.TimeSamples(name="three_sources.h5")
mg = acoular.MicGeom(from_file="array_64.xml")
rg = acoular.RectGrid(x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2, z=0.3, increment=0.01)
st = acoular.SteeringVector(grid=rg, mics=mg (continued...)
Sparrow Computing 2021-03-22 23:54:00

PyTorch Tensor to NumPy Array and Back

You can easily convert a NumPy array to a PyTorch tensor and a PyTorch tensor to a NumPy array. This post explains how it works.

The post PyTorch Tensor to NumPy Array and Back appeared first on Sparrow Computing.

Sparrow Computing 2021-03-20 03:15:00

TorchVision Transforms: Image Preprocessing in PyTorch

TorchVision, a PyTorch computer vision package, has a great API for image pre-processing in its torchvision.transforms module. This post gives some basic usage examples, describes the API and shows you how to create and use custom image transforms.

The post TorchVision Transforms: Image Preprocessing in PyTorch appeared first on Sparrow Computing.

2021-03-01 23:00:00

On the Link Between Optimization and Polynomials, Part 3

I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.

A momentum optimizer *

While My MCMC Gently Samples 2021-02-23 15:00:00

Introducing PyMC Labs: Saving the World with Bayesian Modeling

After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models.

Usually, I don't hear how people are using PyMC3 -- they mostly show up on GitHub or Discourse when something isn't working right. So, hearing about all these …

Martin Fitzpatrick - python 2021-02-22 08:00:00

Using MicroPython and uploading libraries on Raspberry Pi Pico — Using rshell to upload custom code

MicroPython is an implementation of the Python 3 programming language, optimized to run on microcontrollers. It's one of the options available for programming your Raspberry Pi Pico and a nice friendly way to get started with microcontrollers.

MicroPython can be installed easily on your Pico, by following the instructions on the …

NumFOCUS 2021-02-10 19:54:10

Job Posting | Events and Digital Marketing Coordinator

Job Title: Events and Digital Marketing Coordinator

Position Overview: The primary role of the Events and Digital Marketing Coordinator is to support and assist the Events Manager and the Community Communications and Marketing Manager to advance one of NumFOCUS’s primary missions of educating and building the community of users and developers of open source scientific […]

The post Job Posting | Events and Digital Marketing Coordinator appeared first on NumFOCUS.

Living in an Ivory Basement 2021-02-01 23:00:00

Transition your Python project to use pyproject.toml and setup.cfg! (An example.)

Updating old Python packages, in this year of the PSF 2021!