Planet SciPy

Sparrow Computing 2021-10-22 21:27:39

TorchVision Datasets: Getting Started

The TorchVision datasets subpackage is a convenient utility for accessing well-known public image and video datasets. You can use these tools to start training new computer vision models very quickly. TorchVision Datasets Example To get started, all you have to do is import one of the Dataset classes. Then, instantiate ... Read More

The post TorchVision Datasets: Getting Started appeared first on Sparrow Computing.

2021-10-22 09:07:24

Azure ML (AML) Alternatives for MLOps

Azure Machine Learning (AML) is a cloud-based machine learning service for data scientists and ML engineers. You can use AML to manage the machine learning lifecycle—train, develop, and test models, but also run MLOps processes with speed, efficiency, and quality. For organizations that want to scale ML operations and unlock the potential of AI, tools […]

The post Azure ML (AML) Alternatives for MLOps appeared first on

Sparrow Computing 2021-10-21 14:19:21

NumPy Any: Understanding np.any()

The np.any() function tests whether any element in a NumPy array evaluates to true: The input can have any shape and the data type does not have to be boolean (as long as it’s truthy). If none of the elements evaluate to true, the function returns false: Passing in a ... Read More
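
The behaviour described in this excerpt can be sketched as follows. The arrays are made-up illustrations rather than the examples from the original post, and the `axis` call is an addition of mine showing one commonly used argument.

```python
import numpy as np

# A non-boolean array: any non-zero element counts as true.
mixed = np.array([[0, 0], [0, 3]])
has_truthy = bool(np.any(mixed))          # True, because of the 3

# All elements are zero, so nothing evaluates to true.
zeros = np.zeros((2, 2))
none_truthy = bool(np.any(zeros))         # False

# Passing an axis reduces along that axis only (here: one result per row).
per_row = np.any(mixed, axis=1).tolist()  # [False, True]

print(has_truthy, none_truthy, per_row)
```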

The post NumPy Any: Understanding np.any() appeared first on Sparrow Computing.

2021-10-20 12:13:34

ML Engineer vs Data Scientist

In 2010, DJ Patil and Thomas Davenport famously proclaimed Data Scientist (DS) to be the “Sexiest Job of the 21st century” [1]. The progress in data science and machine learning over the last decade has been monumental. Data science has successfully empowered global businesses and organizations with predictive intelligence and data-driven decision-making to the extent […]

The post ML Engineer vs Data Scientist appeared first on

Quansight Labs 2021-10-19 14:00:00

An efficient method of calling C++ functions from numba using clang++/ctypes/rbc

The aim of this document is to explore a method of calling C++ library functions from Numba compiled functions --- Python functions that are decorated with numba.jit(nopython=True).

While there exist ways to wrap C++ codes to Python, calling these wrappers from Numba compiled functions is often not as straightforward and efficient as one would hope.

Read more… (5 min remaining to read)

Anaconda Blog 2021-10-14 12:54:00

Why Data Visualization is One of the Hardest but Most Important Tasks

As data scientists, we have the power to help shape business decisions, public policy, medical research, and other essential areas of daily life. It’s incumbent on us to practice our craft responsibly and ethically, and that includes the data visualization process. To the best of our ability, we need to ensure our visualizations make clear any assumptions or biases that might be baked into our results, and that they support viewers in asking further questions, rather than serving as a “period” on any discussion. Whether exploratory or narrative in purpose, data visualizations will fundamentally anchor the way the data and topic are viewed, so if it’s worth making a chart in the first place, it’s worth taking the time to do it right.

2021-10-14 10:11:30

Continuous Control With Deep Reinforcement Learning

This time I want to explore how deep reinforcement learning can be used for tasks such as making a humanoid model walk. This kind of task is a continuous control task. A solution to such a task differs from the one you might know and use to play Atari games, like Pong, with e.g. Deep Q-Network (DQN). I’ll […]

The post Continuous Control With Deep Reinforcement Learning appeared first on

2021-10-13 14:29:54

Tips and Tricks to Train State-Of-The-Art NLP Models

This is the era of state-of-the-art transformer-based NLP models. With the introduction of packages like transformers by huggingface, it is very convenient to train NLP models for any given task. But how do you get an extra edge when everyone is doing the same? How to get that extra performance out of the model which […]

The post Tips and Tricks to Train State-Of-The-Art NLP Models appeared first on

Quansight Labs 2021-10-13 10:04:54

Array Libraries Interoperability

In this blog post I talk about the work that I was able to accomplish during my internship at Quansight Labs and the efforts being made towards making array libraries more interoperable.

Going ahead, I'll assume basic understanding of array and tensor libraries with their usage in the Python Scientific and Data Science software stack.

Master NumPy leading the young Tensor Turtles

Read more… (15 min remaining to read)

2021-10-12 11:08:27

Version Control for Machine Learning and Data Science

Version control tracks and manages changes in a collection of related entities. It records changes and modifications over time, so you can recall, revert, compare, reference, and restore anything you want.  Version control is also known as source control or revision control. Each version is associated with a timestamp, and the ID of the person […]

The post Version Control for Machine Learning and Data Science appeared first on

2021-10-11 10:33:20

Transformer Models for Textual Data Prediction

Transformer models such as Google’s BERT and Open AI’s GPT3 continue to change how we think about Machine Learning (ML) and Natural Language Processing (NLP). Look no further than GitHub’s recent launch of a predictive programming support tool called Copilot. It’s trained on billions of lines of code, and claims to understand “the context you’ve […]

The post Transformer Models for Textual Data Prediction appeared first on

Quansight Labs 2021-10-11 07:51:58

Re-Engineering CI/CD pipelines for SciPy

In this blog post I talk about the projects and my work during my internship at Quansight Labs. My efforts were geared towards re-engineering CI/CD pipelines for SciPy to make them more efficient to use with GitHub Actions. I also talk about the milestones that I achieved, along with the associated learnings and improvements that I made.

This blog post would assume a basic understanding of CI/CD and GitHub Actions. I will also assume a basic understanding of Python and the SciPy ecosystem.

Read more… (14 min remaining to read)

2021-10-08 11:22:26

Tips for MLOps Setup—Things We Learned From 7 ML Experts

The term ‘MLOps’ has gained much more traction now compared to just two years ago when it was mostly considered a “buzzword”. Today, machine learning (ML) developers usually have a clear idea of the term instead of a vague interpretation by comparing it with the concept of DevOps.  This development can be credited to the […]

The post Tips for MLOps Setup—Things We Learned From 7 ML Experts appeared first on

Sparrow Computing 2021-10-07 20:52:03

PyTorch DataLoader Quick Start

PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility and that makes data loading in PyTorch a fairly advanced topic. One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you ... Read More

The post PyTorch DataLoader Quick Start appeared first on Sparrow Computing.

Anaconda Blog 2021-10-07 15:29:00

Celebrating Hispanic Heritage Month: Hispanic and Latino(a)(x) Innovators Who Inspire Us at Anaconda

National Hispanic Heritage Month, celebrated from September 15th to October 15th, recognizes the histories, cultures, and contributions of Hispanic and Latinx Americans. The term Hispanic refers to someone who is from, or is a descendant of someone from, a Spanish-speaking country. Latino(a)(x) refers to someone who comes from Latin America, or is a descendant of someone from any Latin American country. While it’s called “Hispanic Heritage Month,” this month is meant to recognize a broader group of Americans whose ancestors came from Spain, Mexico, the Caribbean, and Central and South America.

2021-10-07 14:12:59

9 Steps of Debugging Deep Learning Model Training

The first computer bug was literally a bug. A moth entered one of the computing machines at Harvard University in 1947 and caused a disruption in the computations. When engineers opened the computer box, they quickly detected the bug that was causing the problems. Nowadays, it is very unlikely that a bug will crawl into […]

The post 9 Steps of Debugging Deep Learning Model Training appeared first on

Sparrow Computing 2021-10-06 16:53:32

How the NumPy append operation works

Understanding the np.append() operation and when you might want to use it.
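
As a rough illustration of what `np.append()` does, here is a minimal sketch. The arrays are invented for this example and are not taken from the original post.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append returns a NEW array; the input is left unchanged.
b = np.append(a, [4, 5])

# Without an axis, multi-dimensional inputs are flattened first.
m = np.array([[1, 2], [3, 4]])
flat = np.append(m, [[5, 6]])

# With an axis, shapes must match along the other dimensions.
rows = np.append(m, [[5, 6]], axis=0)

print(b.tolist(), a.tolist(), flat.tolist(), rows.tolist())
```

Because every call copies the data, appending inside a long loop tends to be slow; collecting values in a Python list and converting once is a common alternative.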

The post How the NumPy append operation works appeared first on Sparrow Computing.

2021-10-06 14:52:50

Knowledge Distillation: Principles, Algorithms, Applications

Large-scale machine learning and deep learning models are increasingly common. For instance, GPT-3 is trained on 570 GB of text and consists of 175 billion parameters. However, whilst training large models helps improve state-of-the-art performance, deploying such cumbersome models especially on edge devices is not straightforward. Additionally, the majority of data science modeling work focuses […]

The post Knowledge Distillation: Principles, Algorithms, Applications appeared first on

Quansight Labs 2021-10-06 12:00:00

Using Hypothesis to test array-consuming libraries

Over the summer, I've been interning at Quansight Labs to develop testing tools for the developers and users of the upcoming Array API standard. Specifically, I contributed "strategies" to the testing library Hypothesis, which I'm excited to announce are now available in hypothesis.extra.array_api. Check out the primary pull request I made for more background.

This blog post is for anyone developing array-consuming methods (think SciPy and scikit-learn) and is new to property-based testing. I demonstrate a typical workflow of testing with Hypothesis whilst writing an array-consuming function that works for all libraries adopting the Array API, catching bugs before your users do.

Read more… (12 min remaining to read)

2021-10-05 14:51:55

The Best Open-Source MLOps Tools You Should Know

You don’t need to spend a lot on MLOps tools to bring the magic of DevOps to your machine learning projects. But be careful: open-source tools aren’t always 100% free all of the time. For example, Kubeflow has client and server components, and both are open. However, some tools might open-source only one of these components. […]

The post The Best Open-Source MLOps Tools You Should Know appeared first on

Quansight Labs 2021-10-04 15:30:58

Dataframe interchange protocol and Vaex

The work I briefly describe in this blog post is the implementation of the dataframe interchange protocol into Vaex which I was working on through the three month period as a Quansight Labs Intern.

Connection between dataframe libraries with dataframe protocol

About | What is all that?

Today there are quite a number of different dataframe libraries available in Python. Also, there are quite a number of, for example, plotting libraries. In most cases they accept only the general Pandas dataframe and so the user is quite often made to convert between dataframes in order to be able to use the functionalities of a specific plotting library. It would be extremely cool to be able to use plotting libraries on any kind of dataframe, would it not?

Read more… (13 min remaining to read)

Quansight Labs 2021-09-28 18:00:00

Low-code contributions through GitHub

Healthy, inclusive communities are critical to impactful open source projects. A challenge for established projects is that the history and implicit technical debt increase the barrier to contribute to significant portions of code base. The literacy of large code bases happens over time through incremental contributions, and we'll discuss a format that can help people begin this journey.

At Quansight Labs, we are motivated to provide opportunities for new contributors to experience open source community work regardless of their software literacy. Community workshops are a common format for onboarding, but sometimes the outcome can be less than satisfactory for participants and organizers. In these workshops, there are implicit challenges that need to be overcome to contribute to projects' revision history like Git or setting up development environments.

Our goal with the following low-code workshop is to offer a way for folks to join a project's contributors list without the technical overhead. To achieve this we'll discuss a format that relies solely on the GitHub web interface.

Read more… (5 min remaining to

Anaconda Blog 2021-09-28 15:20:00

Securing the Open-Source Pipeline with Anaconda CVE Curation

Take advantage of Anaconda Team Edition to secure your open-source pipeline so your team can spend more time building models, analyzing data, and making data-driven decisions.

Anaconda Blog 2021-09-23 14:25:00

What’s new with fastparquet?

In addition to what is detailed above, other external changes were happening in the background. While some required fixes, others offered opportunities to continue improving the performance and features of fastparquet. I'm excited about the advancements we've made today and look forward to sharing a future update about the continued progress with this project.

Share Your R and Python Notebooks 2021-09-15 09:05:45.210770

PyTorch Tutorial: A Complete Use-case Example

This tutorial shows a full use-case of PyTorch in order to explain several concepts by example. The application will be hand-written number detection using MNIST. MNIST is a popular (perhaps the most popular) educational computer vision dataset. It is composed of 70K images of hand-written digits (0-9) split into 60K-10K training and test sets respectively. The images are tiny (28x28), which makes them easy to work with.

  1. Data loading
    • Loading for tables
    • Loading for text (NLP)
    • Loading for images (CV)
  2. Neural Network building
    • Skeleton
    • Layers
    • Activation functions
  3. ML components
    • Loss functions
    • Optimizer
  4. Training loop
  5. Testing
  6. Saving/loading models

PyTorch Data Loading

When using PyTorch, there are many ways to load your data. It depends mainly on the type of data (tables, images, text, audio, etc.) and the size. Many text datasets are small enough to load into memory in full. Some image datasets (such as MNIST) can also be loaded into memory in full due to the small image size. However, in most real-life applications,

Quansight Labs 2021-09-14 18:00:00

Not a checklist: different accessibility needs in JupyterLab

JupyterLab Accessibility Journey Part 3

In a pandemic, the template joke-starter “x and y walk into a bar” seems like a stretch from my reality. So let’s try this remote version:

Two community members with accessibility knowledge enter a virtual meeting room to talk about JupyterLab. They’ve both updated themselves on GitHub issues ahead of time. They’ve both identified major problems with the interface. They both get ready to express to the rest of the community what is indisputably, one hundred percent for-sure the biggest accessibility blocker in JupyterLab for users. Here it is, the moment of truth!

And they each say totally different things.

Read more… (5 min remaining to read)

Gaël Varoquaux - programming 2021-09-13 22:00:00

Hiring someone to develop scikit-learn community and industry partners


With the growth of scikit-learn and the wider PyData ecosystem, we want to recruit in the Inria scikit-learn team for a new role. Departing from our usual focus on excellence in algorithms, statistics, or code, we want to add to the team someone with some technical understanding, but an …

Anaconda Blog 2021-09-09 14:00:00

Why Anaconda Created a Company Policy to Give More Time Off

As we continue to develop as a company and see changes in how we work, live, and spend our free time, we will continue to grow our policies to best match the needs of our team. We’re excited to see how our company culture continues to evolve and the impact of Snake Days on our team’s morale and wellbeing over time.

Quansight Labs 2021-09-06 17:39:34

Making Numpy Accessible: Guidelines and Tools

Header illustration by author, Mars Lee

Numpy is now foundational to Python scientific computing. Our efforts reach millions of developers each month. As our user base grows, we recognize that we are neglecting the disabled community by not having our website and documentation up to modern accessibility standards.

Read more… (7 min remaining to read)

Blog – Enthought 2021-09-01 21:45:02

Introducing Enthought Edge: Unlocking the Value of R&D Data

While the value of R&D data is clear, finding a way to sort through it can be daunting given the special handling required to extract its value. In fact, 75 percent of surveyed R&D executives believe advanced analytics techniques would play a pivotal role in their future R&D activities, but only 25 percent state that …
Continue Reading

Blog – Enthought 2021-09-01 21:44:06

Introducing Enthought Edge

Introducing Enthought Edge: A New DataOps Solution Designed to Unlock the Value in R&D Data  Designed for scientists, by scientists, Edge centralizes and standardizes data in easily accessible, analysis-ready form. Early Access Program now available. Austin, TX – September 1, 2021 – Enthought, the leading provider of services and technology powering digital transformation for science, …
Continue Reading

Quansight Labs 2021-08-31 17:01:00

CZI EOSS4 Grants at Quansight Labs

Here, at Quansight Labs, our goal is to work on sustaining the future of Open Source. We make sure we can live up to that goal by spending a significant amount of time working on impactful and critical infrastructure and projects within the Scientific Ecosystem.

As such, our goals align with those of the Chan Zuckerberg Initiative and, in particular, the Essential Open Source Software for Science (EOSS) program that supports tools essential to biomedical research via funds for software maintenance, growth, development, and community engagement.

CZI’s Essential Open Source Software for Science program supports software maintenance, growth, development, and community engagement for open source tools critical to science. And the Chan Zuckerberg Initiative was founded in 2015 to help solve some of society’s toughest challenges — from eradicating disease and improving education, to addressing the needs of our local communities. Their mission is to build a more inclusive, just, and healthy future for everyone.

Today, we are thrilled to announce

Anaconda Blog 2021-08-30 13:00:00

Pyston Team Joins Anaconda to Expand Open-Source Project Development

We’re optimistic about the potential for Pyston to improve the Python experience for all users and reduce the costs of deploying Python applications at scale. Keep your eyes on this space for future announcements about the Pyston roadmap and other Anaconda initiatives to advance scalable computing in Python.

Pierre de Buyl's homepage - scipy 2021-08-24 13:00:00

A paper on the Lees-Edwards method

A few years ago, Sebastian contacted me to help with simulations. Great, I like simulation studies, so we start discussing the details. The idea: use an established method, the Lees-Edwards boundary condition, to study colloids under shear.

Quansight Labs 2021-08-18 00:01:00

Is GitHub Actions suitable for running benchmarks?

Benchmarking software is a tricky business. For robust results, you need dedicated hardware that only runs the benchmarking suite under controlled conditions. No other processes! No OS updates! Nothing else! Even then, you might find out that CPU throttling, thermal regulation and other issues can introduce noise in your measurements.

So, how are we even trying to do it on a CI provider like GitHub Actions? Every job runs in a separate VM instance with frequent updates and shared resources. It looks like it would just be a very expensive random number generator.

Well, it turns out that there is a sensible way to do it: relative benchmarking. And we know it works because we have been collecting stability data points for several weeks.
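
One way to read "relative benchmarking" is that two versions of the code are timed in the same job, so that noise from the shared machine affects both, and only their ratio is reported. The sketch below illustrates that idea with the standard `timeit` module; the two functions and the `relative_timing` helper are hypothetical and are not the tooling used in the post.

```python
import timeit

def baseline(n=2000):
    # Hypothetical "old" implementation: build a list with an explicit loop.
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def candidate(n=2000):
    # Hypothetical "new" implementation: same result via a comprehension.
    return [i * i for i in range(n)]

def relative_timing(old, new, repeat=5, number=200):
    # Interleave the two measurements so both see similar machine conditions,
    # and keep the best run of each to damp one-off spikes.
    old_best = float("inf")
    new_best = float("inf")
    for _ in range(repeat):
        old_best = min(old_best, timeit.timeit(old, number=number))
        new_best = min(new_best, timeit.timeit(new, number=number))
    return new_best / old_best

ratio = relative_timing(baseline, candidate)
print(f"candidate / baseline = {ratio:.2f}")
```

A ratio well below or above 1 is more meaningful on a noisy runner than either absolute time taken alone.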

Read more… (13 min remaining to read)

Anaconda Blog 2021-08-17 13:00:00

How OpenEye Scientific Leverages Anaconda to Power its Cloud-Native Molecular Design Platform

Anaconda provides OpenEye Scientific with a reliable solution for Python Packaging Management in its Orion platform. Our tools enable Orion to utilize Python environments and provide essential features, including computation, storage, analysis, and more. Having easy access to scientific libraries is a powerful benefit for scientific developers working with Orion, and without Anaconda, this would not be feasible. Additionally, our seamless user experience plays an invaluable role in contributing to Orion’s Cloud-Native Molecular Design Platform. At Anaconda, we’re happy to provide a tool that supports growth in the scientific community.

Blog – Enthought 2021-08-10 15:44:41

Machine Learning in Materials Science

The process of materials discovery is complex and iterative, requiring a level of expertise to be done effectively. Materials workflows that require human judgement present a specific challenge to the discovery process, which can be leveraged as an opportunity to introduce digital technologies.  In the lab, many tasks require manual data collection and judgement. And …
Continue Reading

Anaconda Blog 2021-08-05 18:15:00

From Agriculture to Art, Four Unexpected Ways Data Science is Improving our World

The application of data science to a broad set of fields can improve innovation across the whole realm of human experience. Highlighting just a few ways data science has already contributed to these advances makes us even more optimistic for the eventual impact that ongoing breakthroughs in the data profession will have. At Anaconda, we’re excited to continue our support of this innovation. We can’t wait to see the unique and compelling ways data science will continue to improve our world.

Anaconda Blog 2021-07-29 14:30:00

State of Data Science 2021: Becoming “Essential,” Though Untapped Potential Remains

Among the many new concepts introduced to us throughout 2020 was the idea of the “essential worker.” What jobs or roles are so integral to the functioning of a business that the company wouldn’t be able to operate without them? The idea of “essentialness” provides a helpful framework to consider how effectively data science has integrated with an organization’s most important operations.

Blog – Enthought 2021-07-23 13:25:20

FORGE-ing Ahead: Charting the Future of Geothermal Energy

A microseismic event loaded from the Frontier Observatory for Research in Geothermal Energy (FORGE) distributed acoustic sensing (DAS) data into a Jupyter notebook showing energy from a microseismic event arriving at about 7.5 seconds. These microseisms bring information about the process of stimulation. However, in the data set there are relatively few and they are …
Continue Reading

Living in an Ivory Basement 2021-07-19 22:00:00

A biotech career panel in the DIB Lab

Careers outside of universities!

Anaconda Blog 2021-07-15 15:25:00

A Python Data Scientist’s Guide to the Apple Silicon Transition

The M1 Macs are an exciting opportunity to see what laptop/desktop-class ARM64 CPUs can achieve. For general usage, the performance is excellent, but these systems are not aimed at the data science and scientific computing user yet. If you want an M1 for other reasons, and intend to do some light data science, they are perfectly adequate. For more intense usage, you’ll want to stick with Intel Macs for now, but keep an eye on both software development as compatibility improves and future ARM64 Mac hardware, which likely will remove some of the constraints we see today.

Sparrow Computing 2021-07-08 16:09:47

Poetry for Package Management in Machine Learning Projects

When you’re building a production machine learning system, reproducibility is a proxy for the effectiveness of your development process. But without locking all your Python dependencies, your builds are not actually repeatable. If you work in a Python project without locking long enough, you will eventually get a broken build ... Read More

The post Poetry for Package Management in Machine Learning Projects appeared first on Sparrow Computing.

Sparrow Computing 2021-06-29 20:38:29

Development containers in VS Code: a quick start guide

If you’re building production ML systems, dev containers are the killer feature of VS Code. Dev containers give you full VS Code functionality inside a Docker container. This lets you unify your dev and production environments if production is a Docker container. But even if you’re not targeting a Docker ... Read More

The post Development containers in VS Code: a quick start guide appeared first on Sparrow Computing.

Living in an Ivory Basement 2021-06-28 22:00:00

New sourmash databases are available!

Databases are now available for GTDB!

Filipe Saraiva's blog 2021-06-25 12:06:45

Writing a Column in O Estado do Piauí

O Estado do Piauí is a new newspaper that recently appeared over there. With a greater focus on long, dense reporting that blends investigative and literary journalism, the project aims to discuss in depth the topics of interest to the state, uncover unique stories from Piauí, draw attention to problematic situations, point out alternatives, and much more. Do not… Continue reading » Writing a Column in O Estado do Piauí

Blog – Enthought 2021-06-23 16:27:51

Lessons for Geoscientists from the book Real World AI: A Practical Guide for Responsible Machine Learning

In this blog article Enthought Energy Solutions vice president Mason Dykstra looks at the recently published book titled “Real World AI: A Practical Guide for Responsible Machine Learning” in the context of both the technical challenges faced by geoscientists and how to scale. Author: Mason Dykstra, Ph.D., Vice President, Energy Solutions  In the newly released …
Continue Reading

Blog – Enthought 2021-06-22 13:27:21

Leveraging AI in Cell Culture Analysis

Mammalian cell culture is a fundamental tool for many discoveries, innovations and products in the life sciences. Currently, cells are the smallest unit of sustainable life outside the body, thereby providing an essential platform for testing hypotheses and mimicking biological processes. The applications of cell culture, while not limitless, are plentiful.  Every cell type, downstream …
Continue Reading

Filipe Saraiva's blog 2021-06-21 21:51:57

Interview Series on Research at UFPA's PPGCC – Computational Intelligence

The Faculty of Computing and the Graduate Program in Computer Science (PPGCC) at UFPA are developing a project with two goals: first, to better communicate to the public outside the university what we produce in our research; second, to better communicate the same thing INTERNALLY – what we develop… Continue reading » Interview Series on Research at UFPA's PPGCC – Computational Intelligence

Blog – Enthought 2021-06-15 19:54:08

Enthought Announces Formation of Digital Transformation, Materials Science Advisory Boards

Austin, TX – June 15, 2021 – Enthought, the leading provider of technologies and services that deliver digital innovation to science-driven companies, is experiencing rapid growth as companies look to accelerate their adoption of new technologies, such as artificial intelligence and machine learning, in response to COVID-19. In support of Enthought’s growth, strategic vision and …
Continue Reading

AI Pool Articles 2021-06-08 18:19:38

Visualization with Seaborn

This article will enable you to use the seaborn Python package to visualize your structured data with bar charts, scatter plots, histograms, line plots, and distplots.

Living in an Ivory Basement 2021-06-07 22:00:00

Searching all public metagenomes with sourmash

Searching all the things!

AI Pool Articles 2021-05-24 16:10:20

Understanding of Probability Distribution and Normal Distribution

An introduction to probability distributions and their types. Here you can find the intuition behind the normal (Gaussian) distribution, the standard normal distribution, the normal curve, and the normal distribution formula.

Pierre de Buyl's homepage - scipy 2021-05-21 13:00:00

Is your software ready for the Journal of Open Source Software?

For the unaware reader, the Journal of Open Source Software (JOSS) is an open-access scientific journal founded in 2016 and aimed at publishing scientific software. A JOSS article in itself is short, and its publication helps the work on the software gain recognition. I share here my point of view on what makes some software tools more ready to be published in JOSS. I do not comment on the size or the relevance for research, which are both documented on JOSS' website.

AI Pool Articles 2021-05-29 13:40:17

Introduction of Fast Fourier Transformation (FFT)

This article comprises an introduction to the Fourier series, Fourier analysis, and the Fourier transform, why we use it, an explanation of the FFT algorithm, and its implementation.
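
To make the idea of the FFT concrete, here is a minimal sketch of the classic recursive radix-2 (Cooley-Tukey) scheme next to a direct DFT for comparison. It is an illustration written for this summary, not the implementation from the article, and it assumes the input length is a power of two.

```python
import cmath

def fft(x):
    # Recursive radix-2 FFT; the length must be a power of two.
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dft(x):
    # Direct O(n^2) definition, used here only as a reference.
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]

signal = [1, 2, 3, 4, 0, 0, 0, 0]
max_error = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
print(f"largest difference between FFT and DFT: {max_error:.2e}")
```

Splitting the input into even and odd halves is what brings the cost down from O(n^2) for the direct sum to O(n log n).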

Living in an Ivory Basement 2021-05-16 22:00:00

sourmash 4.1.0 released!!

sourmash v4.1.0 is here!

AI Pool Articles 2021-05-15 12:19:22

Using Autoencoder to generate digits with Keras

This article contains a real-time implementation of an autoencoder, which we will train and evaluate using MNIST, a well-known public benchmark dataset.

AI Pool Articles 2021-05-15 10:22:56

Understanding of Support Vector Machine (SVM)

An explanation of the support vector machine algorithm, its types, how it works, and its implementation in Python with the sklearn machine learning package.

Sparrow Computing 2021-05-14 20:11:16

Basic Counting in Python

I love fancy machine learning algorithms as much as anyone. But sometimes, you just need to count things. And Python’s built-in data structures make this really easy. Let’s say we have a list of strings: With a list like this, you might care about a few different counts. What’s the ... Read More
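
The kinds of counts mentioned here can be sketched with built-ins and `collections.Counter`. The list below is hypothetical; the post's own example is not included in this excerpt.

```python
from collections import Counter

# Hypothetical list of strings, for illustration only.
animals = ["cat", "dog", "cat", "bird", "dog", "cat"]

total = len(animals)           # how many items overall
unique = len(set(animals))     # how many distinct values
counts = Counter(animals)      # occurrences of each value
top = counts.most_common(1)    # most frequent value with its count

print(total, unique, counts["cat"], top)
```

A `Counter` returns 0 for values it has never seen, so lookups do not need a guard.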

The post Basic Counting in Python appeared first on Sparrow Computing.

AI Pool Articles 2021-05-14 16:19:07

Confidence Interval Understanding

An explanation of confidence intervals and how to calculate them for different scenarios, along with the equation that defines the confidence interval and the parameters involved.
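
As one concrete scenario, a confidence interval for a mean can be sketched with the normal approximation, mean ± z · s / √n. The helper name and the data below are made up for illustration, and the article itself may cover other cases.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    # Normal-approximation interval: mean ± z * s / sqrt(n).
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)                               # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * s / sqrt(n)
    return m - margin, m + margin

# Hypothetical measurements, for illustration only.
data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]
low, high = mean_confidence_interval(data)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

For a sample this small, a Student's t critical value would normally replace z; the normal version is kept here because it needs only the standard library.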

AI Pool Articles 2021-05-14 16:15:32

Decision Trees

Intuition and implementation of the first tree-based algorithm in machine learning

AI Pool Articles 2021-05-14 16:01:47

Dimensionality Reduction, PCA Intro

We will be covering a dimensionality reduction algorithm called PCA (Principal Components Analysis) and will show how it helps to understand the data you have.

AI Pool Articles 2021-05-13 18:17:40

Understanding Autoencoders - An Unsupervised Learning approach

This article covers the concept of autoencoders: what autoencoders are, the architecture of an autoencoder, and the intuition behind training them.

Sparrow Computing 2021-05-13 18:11:11

How to Use the PyTorch Sigmoid Operation

The PyTorch sigmoid function is an element-wise operation that squishes any real number into a range between 0 and 1. This is a very common activation function to use as the last layer of binary classifiers (including logistic regression) because it lets you treat model predictions like probabilities that their ... Read More
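
The function itself is 1 / (1 + e^(-x)). The sketch below reproduces it in plain Python so it runs without PyTorch; it is not `torch.sigmoid` itself, and the helper names are my own. In PyTorch the equivalent call would be `torch.sigmoid` on a tensor.

```python
import math

def sigmoid(x):
    # Numerically stable form: never exponentiates a large positive number.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_elementwise(values):
    # Apply the function independently to every element.
    return [sigmoid(v) for v in values]

logits = [-1000.0, -2.0, 0.0, 2.0, 1000.0]
probs = sigmoid_elementwise(logits)
print(probs)
```

An input of 0 maps to exactly 0.5, which is why 0.5 is the usual decision threshold for a binary classifier.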

The post How to Use the PyTorch Sigmoid Operation appeared first on Sparrow Computing.

AI Pool Articles 2021-05-13 16:07:08

Optimization Methods, Gradient Descent

This article gives a clear explanation and a simple example of the vanilla Gradient Descent algorithm, Stochastic Gradient Descent, the Momentum optimizer, and the Adam optimizer, with RMSProp explained along the way.
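
Two of the methods listed can be sketched on a toy problem. The objective f(x) = (x - 3)^2, the learning rate, and the function names are assumptions chosen for this illustration; the stochastic variant, RMSProp, and Adam are left out to keep it short.

```python
def grad(x):
    # Gradient of the toy objective f(x) = (x - 3)**2, whose minimum is at x = 3.
    return 2.0 * (x - 3.0)

def gradient_descent(x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)                     # step against the gradient
    return x

def momentum_descent(x0, lr=0.1, beta=0.9, steps=500):
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(x)  # accumulate past gradients
        x -= lr * velocity
    return x

print(gradient_descent(0.0), momentum_descent(0.0))
```

Both runs approach x = 3; the momentum version overshoots and oscillates on the way, which is the behaviour that the velocity term introduces.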
AI Pool Articles 2021-05-11 17:24:10

Understanding of Regularization in Neural Networks

This article covers regularization techniques such as Data Augmentation, L1, L2, Dropout, and Early Stopping.
AI Pool Articles 2021-05-10 18:04:00

Diving into Object Detection Basics

A guide to the basic concepts of Object Detection: what Object Detection is and how it works, the concept of Anchor Boxes, why a loss function is necessary, some free datasets, and finally an implementation of SSD.
AI Pool Articles 2021-05-10 18:03:29

Normalization in Deep learning

Different types of Normalization in Deep Learning. A very useful technique to avoid overfitting and generalize your model better.
AI Pool Articles 2021-05-10 18:03:08

Dropout in Deep Learning

Understanding Dropouts in Deep Learning to reduce overfitting
AI Pool Articles 2021-05-10 18:02:37

Yolov3 and Yolov4 in Object Detection

Explanation of object detection with various use cases and algorithms. Specifically, how the yolov3 and yolov4 architectures are structured, and how they perform object detection
AI Pool Articles 2021-05-10 18:02:03

End-To-End PyTorch Example of Image Classification with Convolutional Neural Networks

Image classification solutions in PyTorch with popular models like ResNet and its variations. End-To-End solution for CIFAR10/100 and ImageNet datasets.
AI Pool Articles 2021-05-10 18:00:28

Supervised learning with Scikit-Learn Library

How to create supervised learning models, such as linear and logistic regression, with the scikit-learn Python library
AI Pool Articles 2021-05-10 18:00:13

Linear and Logistic Regression

Intuition and implementation behind the base algorithms for supervised machine learning
AI Pool Articles 2021-05-10 17:59:02

Random Forests Understanding

Intuition and implementation of a key algorithm for reducing overfitting in tree-based algorithms
AI Pool Articles 2021-05-10 17:57:58

Activation Functions for Neural Networks

This article explains various activation functions, such as Linear, ELU, ReLU, Sigmoid, and tanh.
Blog – Enthought 2021-05-06 12:12:46

AI Needs the ‘Applied Sciences’ Treatment

As industries rapidly advance in AI/machine learning, a key to unlocking the power of these approaches for companies is an enabling environment. Domain experts need to be able to use artificial intelligence on data relevant to their work, but they should not have to know computer or data science techniques to solve their problems. An …
2021-04-12 22:00:00

On the Link Between Optimization and Polynomials, Part 4

While the most common accelerated methods like Polyak and Nesterov incorporate a momentum term, a little known fact is that simple gradient descent –no momentum– can achieve the same rate through only a well-chosen sequence of step-sizes. In this post we'll derive this method and through simulations discuss its practical …

NumFOCUS 2021-04-09 18:02:05

NumFOCUS Welcomes Tesco Technology to Corporate Sponsors

NumFOCUS is pleased to announce our new partnership with Tesco Technology. A long-time PyData event sponsor, Tesco Technology joined NumFOCUS as a Silver Corporate Sponsor in December 2020. “We are very excited to formalize our partnership with Tesco Technology,” said Leah Silen, NumFOCUS Executive Director. “Tesco Technology has partnered with NumFOCUS for the past several […]

The post NumFOCUS Welcomes Tesco Technology to Corporate Sponsors appeared first on NumFOCUS.

NumFOCUS 2021-04-08 21:14:55

Job Posting | Communications and Marketing Manager

Job Title: Communications and Marketing Manager Position Overview The primary role of the Communications & Marketing Manager is to manage the NumFOCUS brand by overseeing all outgoing communications between NumFOCUS and our stakeholders. You will serve the project communities by playing a key role in their event marketing management and assist with project promotional and […]

The post Job Posting | Communications and Marketing Manager appeared first on NumFOCUS.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 1

This is the first in a series of three blog posts about the basic use of Acoular. It explains some fundamental concepts and walks through a simple example. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 2

This is the second in a series of three blog posts about the basic use of Acoular. It assumes that you already have read the first post and continues by explaining some more concepts and additional methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 3

This is the third and final in a series of three blog posts about the basic use of Acoular. It assumes that you already have read the first two posts and continues by explaining additional concepts to be used with time domain methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources. To continue, we do the same set up as in Part 1. However, as we are setting out to do some signal processing in time domain, we define only TimeSamples, MicGeom, RectGrid and SteeringVector objects but no PowerSpectra or BeamformerBase.

import acoular
ts = acoular.TimeSamples( name="three_sources.h5" )
mg = acoular.MicGeom( from_file="array_64.xml" )
rg = acoular.RectGrid( x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2, z=0.3, increment=0.01 )
st = acoular.SteeringVector( grid=rg, mics=mg (continued...)
Blog – Enthought 2021-03-24 18:55:46

Geophysics in the Cloud Competition

Join the 2021 GSH Geophysics in the cloud competition. Build a novel seismic inversion app and access all the data on demand with serverless cloud storage. Example notebooks show how to access this data and use AWS SageMaker to build your ML models. With prizes. Author: Ben Lasscock, Ph.D., Manager, Strategic Technologies, Energy Solutions   …
Sparrow Computing 2021-03-22 23:54:00

PyTorch Tensor to NumPy Array and Back

You can easily convert a NumPy array to a PyTorch tensor and a PyTorch tensor to a NumPy array. This post explains how it works.

The post PyTorch Tensor to NumPy Array and Back appeared first on Sparrow Computing.

Sparrow Computing 2021-03-20 03:15:00

TorchVision Transforms: Image Preprocessing in PyTorch

TorchVision, a PyTorch computer vision package, has a great API for image pre-processing in its torchvision.transforms module. This post gives some basic usage examples, describes the API and shows you how to create and use custom image transforms.

The post TorchVision Transforms: Image Preprocessing in PyTorch appeared first on Sparrow Computing.

Blog – Enthought 2021-03-09 18:37:16

Giving Visibility to Renewable Energy

The EnergizAIR Infrastructure framework and key interfaces, with the Enthought responsibility on the project shown in the central, grey box. The ultimate project goal was to raise individual awareness of the contribution of renewable energy sources, and ultimately change behaviors. Now ten years later, with orders of magnitude more data, AI/machine learning, cloud, and smartphones …
2021-03-01 23:00:00

On the Link Between Optimization and Polynomials, Part 3

I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.

A momentum optimizer *

While My MCMC Gently Samples 2021-02-23 15:00:00

Introducing PyMC Labs: Saving the World with Bayesian Modeling

After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models.

Usually, I don't hear how people are using PyMC3 -- they mostly show up on GitHub or Discourse when something isn't working right. So, hearing about all these …

Martin Fitzpatrick - python 2021-02-22 08:00:00

Using MicroPython and uploading libraries on Raspberry Pi Pico — Using rshell to upload custom code

MicroPython is an implementation of the Python 3 programming language, optimized to run on microcontrollers. It's one of the options available for programming your Raspberry Pi Pico and a nice friendly way to get started with microcontrollers.

MicroPython can be installed easily on your Pico, by following the instructions on the …

NumFOCUS 2021-02-10 19:54:10

Job Posting | Events and Digital Marketing Coordinator

Job Title: Events and Digital Marketing Coordinator Position Overview The primary role of the Events and Digital Marketing Coordinator is to support and assist the Events Manager and the Community Communications and Marketing Manager to advance one of NumFOCUS’s primary missions of educating and building the community of users and developers of open source scientific […]

The post Job Posting | Events and Digital Marketing Coordinator appeared first on NumFOCUS.

Living in an Ivory Basement 2021-02-01 23:00:00

Transition your Python project to use pyproject.toml and setup.cfg! (An example.)

Updating old Python packages, in this year of the PSF 2021!

Martin Fitzpatrick - python 2021-01-28 14:00:00

SAM Coupé SCREEN$ Converter — Interrupt optimizing image converter

The SAM Coupé was a British 8 bit home computer that was pitched as a successor to the ZX Spectrum, featuring improved graphics and sound and higher processor speed.

The SAM Coupé's high-color MODE4 could manage 256x192 resolution graphics, with 16 colors from a choice of 128. Each pixel …

Living in an Ivory Basement 2021-01-24 23:00:00

A snakemake hack for checkpoints

snakemake checkpoints r awesome

Martin Fitzpatrick - python 2021-01-21 07:00:00

micro:bit Space Invaders — MicroPython retro game in just 25 pixels

How much game can you fit into 25 pixels? Quite a bit it turns out.

This is a mini clone of arcade classic Space Invaders for the BBC micro:bit microcomputer. Using the accelerometer and two buttons for input, you can beat off wave after wave of aliens that advance …

ListenData 2021-01-06 10:35:00

Run SAS in Python without Installation

In the past few years, Python has gained huge popularity as a programming language in the data science world. Many banks and pharma organisations have started using Python, and some of them are in a transition stage, migrating their SAS code libraries to Python. Many big organisations have been using SAS since the early 2000s and have developed hundreds of SAS programs for tasks ranging from data extraction to model building and validation. Hence it's a marathon task to migrate SAS code to any other programming language. Migration can only be done in phases, so that day-to-day tasks are not disrupted by the development and testing of Python code. Since Python is open source, maintaining existing code can sometimes be difficult. Some SAS procedures are very robust and powerful, and their Python alternatives are still not implemented; replicating them may be doable, but it is not straightforward for the average developer or analyst.

Do you wish