SciPy

Planet SciPy

Sparrow Computing 2023-03-31 17:28:38

How to Label Data for Machine Learning

Machine learning has revolutionized the world of technology, playing a crucial role in various applications, from self-driving cars and facial recognition systems to language translation and sentiment analysis. The success of machine learning models largely depends on the quality and quantity of data they are trained on. In particular, labeled ... Read more

The post How to Label Data for Machine Learning appeared first on Sparrow Computing.

neptune.ai 2023-03-31 08:02:17

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

In this second installment of the series “Real-world MLOps Examples,” Paweł Pęczek, Machine Learning Engineer at Brainly, will walk you through the end-to-end Machine Learning Operations (MLOps) process in the Visual Search team at Brainly. And because it takes more than technologies and processes to succeed with MLOps, he will also share details on:  Enjoy…
ListenData 2023-03-30 08:01:00

Open Source ChatGPT Models: A Step-by-Step Guide

In this article we will explain how open-source ChatGPT-style models work and how you can run them. We will cover two different open-source models, namely Alpaca and GPT4All. By the end of this article you should have a good understanding of these models and be able to run them in Python. Since these models are open source, they are available for free and you don't need to use the paid OpenAI API to access them.

Introduction: Alpaca

A team of researchers from Stanford University developed an open-source language model called Alpaca. It is based on Meta's large-scale language model LLaMA. The team used OpenAI's GPT API (text-davinci-003) to generate the data used to fine-tune the 7-billion-parameter (7B) LLaMA model. The goal of the team is to make AI available to everyone for free so that academics can do further research without worrying about the expensive hardware needed to run these memory-intensive algorithms. Although these open

(continued...)
Sparrow Computing 2023-03-29 20:52:44

Understanding the Data Science Process for Entrepreneurs

As an entrepreneur looking to harness the power of machine learning (ML) in your business, understanding the data science process is crucial. This process can be broken down into three main steps: The goal is to move through these stages as quickly as possible so that you can gather feedback ... Read more

The post Understanding the Data Science Process for Entrepreneurs appeared first on Sparrow Computing.

Anaconda Blog 2023-03-28 13:14:00

Anaconda’s Q1 2023 Open-Source Roundup

About the Author

Martin Durant is a former astrophysicist with several years of scientific research experience. He has also worked in medical imaging, building AI/ML pipelines and a research platform. After a brief stint as a data scientist in ad-tech, Martin moved to Anaconda to work on PyData education. He now leads a number of open-source PyData projects, focussing on data access, formats, and parallel processing.
Anaconda Blog 2023-03-23 12:39:00

5 Capabilities Your AI Platform Should Have

AI technology has come a long way in recent years, and it’s become an invaluable tool for businesses looking to stay ahead of the competition. But there’s no such thing as a one-size-fits-all AI platform. Every organization has its own objectives and requirements and its own deployment strategies, so it’s important to select an AI platform that meets your needs.
neptune.ai 2023-03-23 09:24:59

Deploying Large NLP Models: Infrastructure Cost Optimization

NLP models in commercial applications such as text generation systems have attracted great interest among users. These models have achieved groundbreaking results in many NLP tasks such as question answering, summarization, language translation, classification, and paraphrasing.  Models such as ChatGPT, Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) are predominantly very…
Anaconda Blog 2023-03-22 14:22:00

Anaconda’s Response to the CircleCI Security Breach

You may have heard about the recent security breach that affected CircleCI, a container-based continuous integration service that conda-forge uses to build Linux packages and, sometimes, OSX packages.
Anaconda Blog 2023-03-21 21:08:00

New Release: Anaconda Distribution 2023.03

We are pleased to announce the release of Anaconda Distribution 2023.03! Find the relevant release notes here, and download the installer here.
neptune.ai 2023-03-21 11:29:52

Building a Machine Learning Platform [Definitive Guide]

Moving across the typical machine learning lifecycle can be a nightmare. From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot.  As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers…
neptune.ai 2023-03-20 14:07:01

Managing Dataset Versions in Long-Term ML Projects

A long-term ML project involves developing and sustaining applications or systems that leverage machine learning models, algorithms, and techniques. Because of the life span of these apps and systems, the associated ML models need to be constantly updated, redeployed, and maintained, which means that they require proper dataset version management.  An example of a…
neptune.ai 2023-03-15 18:38:39

How to Build a CI/CD MLOps Pipeline [Case Study]

Based on the McKinsey survey, 56% of orgs today are using machine learning in at least one business function. It’s clear that the need for efficient and effective MLOps and CI/CD practices is becoming increasingly vital.  This article is a real-life study of building a CI/CD MLOps pipeline. We’ll delve into the MLOps practices and strategies…
neptune.ai 2023-03-15 15:55:33

Comparing Tools For Data Processing Pipelines

If you ask data professionals what the most challenging part of their day-to-day work is, you will likely discover that their concerns revolve around managing different aspects of data before they get to graduate to the data modeling stage.  Data professionals spend most of their time managing data in various forms – be…
neptune.ai 2023-03-14 13:50:19

How Did We Get to ML Model Reproducibility

When working on real-world ML projects, you come face-to-face with a series of obstacles. The ML model reproducibility problem is one of them. This article is going to take you through the experience-based, step-by-step approach my ML team took to solve the ML reproducibility challenge while working on a fraud detection system for the insurance…
ListenData 2023-03-12 07:26:00

Complete Guide to Visual ChatGPT

In this post, we will talk about how to run Visual ChatGPT in Python with Google Colab. ChatGPT has garnered huge popularity recently due to its human-like responses. As of now, it only provides responses in text format, which means it cannot process, generate or edit images. Microsoft recently released a solution that handles images. Now you can ask ChatGPT to generate or edit an image for you.

Demo of Visual ChatGPT

In the image below, you can see the final output of Visual ChatGPT and what it looks like.

READ MORE »
This post appeared first on ListenData
Anaconda Blog 2023-03-08 15:12:00

ChatGPT: AI-Assisted Writing Pros, Cons, and Tips to 3x Content Production

Prompt I gave ChatGPT and Copymatic: Write a blog article with tips for enterprise teams that want to add AI writers to their content production processes. Include the pros and cons of using AI writers for content generation, then provide 5 good recommendations for how teams can use AI writers in the content strategy and copywriting processes.
Living in an Ivory Basement 2023-03-02 23:00:00

snakemake for doing bioinformatics - using wildcards to generalize your rules

Slithering your way into bioinformatics with snakemake, wildcard version

Anaconda Blog 2023-03-02 18:16:00

ChatGPT: Is Adding AI Writers To Your Content Production Process Worth It?

And this is only the beginning, as the GPT-3 model already has new capabilities and is considered to be in version GPT-3.5 today. Already, search-engine marketers are buzzing about GPT-4, OpenAI’s highly-anticipated next version of the language model, expected later this year.
Sparrow Computing 2023-03-02 16:32:08

Saving Utility Companies Years with Computer Vision

How do utility companies monitor thousands of miles of electrical wire to find small imperfections that threaten the entire system? For the entire history of electrical infrastructure, the only answer has been ‘very slowly.’ Now, Sparrow’s computer vision capabilities, combined with Fast Forward’s thermal imaging system, can accomplish what used ... Read more

The post Saving Utility Companies Years with Computer Vision appeared first on Sparrow Computing.

neptune.ai 2023-02-28 08:18:09

Distributed Training: Errors to Avoid

In this era of large language models (LLMs), monolithic foundation models, and increasingly enormous datasets, distributed training is a must, as both data and model weights very rarely fit on a single machine. However, distributed training in ML is complex and error-prone, with many hidden pitfalls that can cause huge issues in the model training…
neptune.ai 2023-02-27 13:13:24

Managing Computer Vision Projects with Michał Tadeusiak

This article was originally an episode of MLOps Live, an interactive Q&A session where ML practitioners answer questions from other ML practitioners.  Every episode is focused on one specific ML topic, and during this one, we talked to Michał Tadeusiak about managing computer vision projects. You can watch it on YouTube: Or listen to…
Anaconda Blog 2023-02-22 19:10:00

Upcoming Release(s): Anaconda Distribution 2023.03 and Beyond

Anaconda Distribution 2023.03 Installer

We are pleased to announce the upcoming release of the Anaconda Distribution 2023.03 installer, scheduled for March 2023. The Anaconda Distribution 2023.03 installer comes with support for Python 3.10 and an updated Anaconda Navigator 2.3.2.
Anaconda Blog 2023-02-14 14:30:00

Four Open-Source Projects and the Anaconda Maintainers Who Love Them

To share the love, we are offering one month of free access to Anaconda Starter tier with code "LOVEDATA." For a limited time, this code will grant you full access to these features:
Anaconda Blog 2023-02-07 15:11:00

Code in the Cloud With Anaconda—for Free!

Last Fall, we introduced Anaconda's fully-loaded and ready-to-code cloud notebook as part of Anaconda’s paid subscription plans. Today, we are thrilled to deliver on our mission to empower people with data literacy and announce the free availability of Anaconda’s cloud notebook. Now, truly anyone can break into the world of data science and start coding immediately.
neptune.ai 2023-02-06 07:53:27

Training Models on Streaming Data [Practical Guide]

What comes to mind when you hear "streaming data"? Maybe data generated through video streaming platforms like YouTube, but that is not the only thing that qualifies as streaming data. There are many platforms and sources that generate this kind of data. In this article: What is streaming data? “Streaming data is a…
Anaconda Blog 2023-02-02 22:00:00

Take Our OSS Security Survey!

OSS Sparks and Accelerates Innovation

Open-source software (OSS) reflects a comprehensive and quickly evolving ecosystem of innovators who collaborate on a global scale. OSS offers individuals and organizations flexibility, control, and a cost-effective way to harness the power of this community. As such, usage of OSS has become extensive; in fact, a 2022 report by Synopsys reveals that 97% of audited codebases use OSS, with OSS comprising 78% of the code in said codebases. OSS is one of the main drivers contributing to the rise and widespread adoption of machine learning and artificial intelligence. The ubiquity of OSS is reflected in everything from searching the web to ordering a product on a smartphone.
neptune.ai 2023-01-25 15:48:47

Building a Sentiment Classification System With BERT Embeddings: Lessons Learned

Sentiment analysis, commonly referred to as opinion mining/sentiment classification, is the technique of identifying and extracting subjective information from source materials using computational linguistics, text analysis, and natural language processing. It is frequently used to assess a speaker or writer’s perspective on a subject or the overall contextual polarity of a piece of writing. The…
neptune.ai 2023-01-23 17:09:35

MLOps Is an Extension of DevOps. Not a Fork — My Thoughts on THE MLOPS Paper as an MLOps Startup CEO

By now, everyone must have seen THE MLOps paper. “Machine Learning Operations (MLOps): Overview, Definition, and Architecture” By Dominik Kreuzberger, Niklas Kühl, Sebastian Hirschl Great stuff. If you haven’t read it yet, definitely do so. The authors give a solid overview of: They tackle the ugly problem in the canonical MLOps movement: How do all…
Living in an Ivory Basement 2023-01-22 23:00:00

snakemake for doing bioinformatics - a beginner's guide (part 2)

Slithering your way into bioinformatics with snakemake, round 2.

Living in an Ivory Basement 2023-01-13 23:00:00

snakemake for doing bioinformatics - a beginner's guide (part 1)

Slithering your way into bioinformatics with snakemake

Living in an Ivory Basement 2023-01-07 23:00:00

sourmash has a plugin interface!

Enabling plugins in sourmash, for less directed & more incoherent progress!

Filipe Saraiva's blog 2022-12-15 01:13:41

Human Obsolescence in the Telenovela

I spent the day at work playing with ChatGPT, the conversational artificial intelligence. We had surreal and outlandish dialogues: I asked it what Latin America would be like had it been colonized by England, and also what the relationship is between The Lord of the Rings and Game of Thrones. In another, I asked it to write a fictional dialogue between… Continue reading » Human Obsolescence in the Telenovela
Sparrow Computing 2022-12-14 17:55:08

Speed Trap

Overview

This post is going to showcase the development of a vehicle speed detector using Sparrow Computing’s open-source libraries and PyTorch Lightning. The exciting news here is that we could make this speed detector for any traffic feed without prior knowledge about the site (no calibration required), or specialized imaging ... Read more

The post Speed Trap appeared first on Sparrow Computing.

ListenData 2022-12-09 08:31:00

ChatGPT-4 Is a Smart Analyst, Unlike GPT-3.5

ChatGPT has been trending on social media platforms. It crossed one million users in just a week. For those who haven't heard of ChatGPT, it's a large language model trained by OpenAI. In simple words, it's a chatbot that answers your questions, and its responses may sound human-like. It's an impressive machine learning solution. With the release of GPT-4, we can rely on it over Google search for learning about any topic.

Update: I updated this article with reviews on GPT-4.
Why ChatGPT-3.5 Isn't Smart Enough, but GPT-4 Is

You can't trust ChatGPT-3.5 when preparing for a certification or exam. It's a big NO if you think you can rely on ChatGPT-3.5 to answer questions in a telephone interview round. Yes, I know it's cheating even to use Google for this, but I wanted to give a WARNING, as many people do it and many social media influencers have posted on how to leverage ChatGPT-3.5 for cracking

(continued...)
Spyder Blog 2022-11-30 00:00:00

Improvements to the Spyder IDE installation experience

Juan Sebastian Bautista, C.A.M. Gerlach and Carlos Cordoba also contributed to this post.

Spyder 5.4.0 was released recently, featuring some major enhancements to its Windows and macOS standalone installers. You'll now get more detailed feedback when new versions are available, and you can download and start the update to them from right within Spyder, instead of having to install them manually. In this post, we'll go over how these new update features work and how you can start using them!

Before proceeding, we want to acknowledge that this work was made possible by a Small Development Grant awarded to Spyder by NumFOCUS, which has enabled us to hire a new developer (Juan Sebastian Bautista Rojas) to be in charge of all the implementation details.

Before these improvements, Spyder already had a mechanism to detect more recent versions, but that functionality was very simple. There was a pop-up dialog warning that a new version was available, but users had to

(continued...)
scikit-learn Blog 2022-11-30 00:00:00

Interview with Meekail Zain, scikit-learn Team Member

Author: Reshama Shaikh, Meekail Zain
Spyder Blog 2022-11-18 12:00:00

Introducing the Spyder-Watchlist plugin

Spyder's Variable Explorer is a great tool which aids the development and debugging of Python code by displaying all variables from the current scope. One thing the Variable Explorer is missing is the ability to display the value of arbitrary, user-definable expressions while debugging. For example, it might be useful to see the value of a specific attribute of an object, or the value of an array at some index. Such a feature is known as a "watchlist" or "watches" in other Integrated Development Environments (IDEs). This blog post introduces the Watchlist plugin developed for Spyder.

Features

The watchlist consists of a user-definable list of expressions. They are evaluated after each debugger step, and the result of the evaluation is displayed as a string. This means that value = str(eval(expression)) is performed behind the scenes, and the result is shown in the plugin. The watchlist is a very powerful tool, but this comes at a cost: Any side effect of an expression will affect the execution environment.
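
The evaluation model described above can be sketched in plain Python. This is a minimal illustration and not the plugin's actual code: the `evaluate_watchlist` helper and the sample namespace are hypothetical, and only the `str(eval(expression))` step comes from the post.

```python
def evaluate_watchlist(expressions, namespace):
    """Evaluate each watch expression in `namespace` and return its string form."""
    results = {}
    for expression in expressions:
        try:
            # Same idea as value = str(eval(expression)) in the post.
            results[expression] = str(eval(expression, {}, namespace))
        except Exception as exc:  # a failing watch should not stop the others
            results[expression] = f"<error: {exc!r}>"
    return results


# Hypothetical variables as they might exist at a debugger step.
namespace = {"data": [3, 1, 2], "config": {"rate": 0.5}}

watches = evaluate_watchlist(
    ["data[0]", "config['rate'] * 2", "data.pop()", "missing_name"],
    namespace,
)
```

Here `data.pop()` is a watch with a side effect: after evaluation, `namespace["data"]` has lost its last element, which is exactly the cost the post warns about.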

Expressions can be

(continued...)
Filipe Saraiva's blog 2022-11-15 02:42:48

Why Did We Abandon Blogs?

(Image: Twitter's compose interface) These days we are watching Elon Musk destroy Twitter. The expectation is that, through this dynamic, the social network will lose users and relevance over time, if it does not blow up all at once, since its new owner is even talking about bankruptcy. This is not the first time that a… Continue reading » Why Did We Abandon Blogs?
scikit-learn Blog 2022-11-08 00:00:00

Pandas DataFrame Output for sklearn Transformers

Author: Sangam SwadiK
fa.bianp.net 2022-10-14 22:00:00

The Russian Roulette: An Unbiased Estimator of the Limit

The idea for what was later called Monte Carlo method occurred to me when I was playing solitaire during my illness.

Stanislaw Ulam, Adventures of a Mathematician

The Russian Roulette offers a simple way to construct an unbiased estimator for the limit of a sequence. It allows for example to …
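
One common form of such an estimator can be sketched as follows. This is an illustration under stated assumptions, not the post's code: the toy sequence `x`, the continuation probability `q`, and the function name are all hypothetical. The idea is to write the limit as a telescoping sum, stop at a random index, and reweight each kept term by the probability of reaching it.

```python
import random


def russian_roulette(x, q, rng):
    """Single-sample unbiased estimate of lim x(k); continue each step with probability q."""
    estimate = x(0)
    k = 1
    survival = q  # probability that term k is reached
    while rng.random() < q:
        # Reweight the increment by the probability of having survived this far.
        estimate += (x(k) - x(k - 1)) / survival
        k += 1
        survival *= q
    return estimate


def x(k):
    """Toy sequence whose limit is 1."""
    return 1.0 - 0.5**k


rng = random.Random(0)
samples = [russian_roulette(x, q=0.75, rng=rng) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
```

Each sample evaluates only finitely many terms of the sequence, yet the average of many samples approaches the limit. The choice of `q` trades off cost against variance; for this toy sequence the reweighted terms shrink geometrically, so the variance stays bounded.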

scikit-learn Blog 2022-10-13 00:00:00

scikit-learn and Hugging Face join forces

Author: Lysandre Debut, François Goupil
fa.bianp.net 2022-08-25 22:00:00

Notes on the Frank-Wolfe Algorithm, Part III: backtracking line-search

Backtracking step-size strategies (also known as adaptive step-size or approximate line-search) that set the step-size based on a sufficient decrease condition are the standard way to set the step-size on gradient descent and quasi-Newton methods. However, these techniques are much less common for Frank-Wolfe-like algorithms. In this blog post I …

scikit-learn Blog 2022-09-29 00:00:00

scikit-learn Sprint in Salta, Argentina

Author: Juan Martín Loyola
Spyder Blog 2022-07-25 12:00:00

New 2022 roadmap and grant funding

For the last couple of months, the Spyder team has been working on defining a new roadmap and submitting grant proposals to fund more features and improvements. We are pleased to announce our roadmap for the rest of 2022, and that two proposals were funded!

The roadmap

Considering the importance of sharing a clear perspective of where the Spyder project is going and where we will be focusing our efforts over the coming months, the team has created an initial roadmap for the rest of 2022. We prioritized the highlighted features and enhancements based on input from issues, face-to-face and virtual discussions, Stack Overflow, social media and other feedback, to try to best capture the interests of our users and community.

The proposals

To help make our roadmap achievable, we wrote and submitted proposals to several different venues and organizations in the last couple of months. While we have yet to hear back from some of them, two have already been funded!

The first was for the

(continued...)
ListenData 2022-07-11 16:05:00

Pollution in India : Real-time AQI Data

Air pollution has become a serious problem across the world in recent years. Its effects are devastating, and they are not limited to humans but extend to animals and plants as well. It also contributes to global warming, which is essentially the increase in air and ocean temperatures around the world.

Indian cities have been topping the list of polluted cities. To solve the problem of air pollution, the most important first step is to track it in real time, so that people can be alerted to avoid outdoor activities when pollution is high. This post explains how you can fetch the real-time Air Quality Index (AQI) of Indian cities using Python and R code, so both Python and R programmers can pull pollution data.

You can download the dataset which contains static information about Indian states, cities and AQI stations. Variables stored in this dataset will be used further to fetch real-time data.


     
(continued...)
Gaël Varoquaux - programming 2022-07-09 22:00:00

My Mayavi story: discovering open source communities

The Mayavi Python software, and my personal history: A thread on Python and scipy ecosystems, building open source codebase, and meeting really cool and friendly people

I am writing today as a goodbye to the project: I used to be one of the core contributors and maintainers but have been …

ListenData 2022-06-30 14:04:00

Pointwise mutual information (PMI) in NLP

Natural Language Processing (NLP) has gained wide acceptance recently: there are many live projects running, and it is no longer limited to academia. Use cases of NLP can be seen across industries, such as understanding customers' issues, predicting the next word a user is about to type on the keyboard, and automatic text summarization. Many researchers across the world have trained NLP models in several human languages, such as English, Spanish, French, and Mandarin, so that the benefits of NLP can reach every society. In this post we will talk about one of the most useful NLP metrics, Pointwise mutual information (PMI), which identifies words that go together, along with its implementation in Python and R.

Table of Contents

What is Pointwise mutual information?

PMI helps us find related words. In other words, it quantifies how much more likely two words are to co-occur than we would expect by chance. For example the word "Data Science" has a specific meaning when these

(continued...)
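
As a rough illustration of the idea, PMI for a word pair is commonly defined as log2(p(x, y) / (p(x) p(y))). The sketch below estimates these probabilities from adjacent-word counts in a tiny made-up corpus; the corpus and the `pmi` helper are assumptions for illustration, not the post's implementation.

```python
import math
from collections import Counter

# Tiny made-up corpus; real use would need far more text.
tokens = "data science is fun and data science is useful and science is hard".split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n_unigrams = sum(unigrams.values())
n_bigrams = sum(bigrams.values())


def pmi(x, y):
    """PMI of the adjacent pair (x, y), in bits; the pair must occur in the corpus."""
    p_xy = bigrams[(x, y)] / n_bigrams
    p_x = unigrams[x] / n_unigrams
    p_y = unigrams[y] / n_unigrams
    return math.log2(p_xy / (p_x * p_y))
```

In this corpus `pmi("data", "science")` comes out higher than `pmi("and", "science")`, because "data" is almost always followed by "science" while "and" is not. A positive value means the pair co-occurs more often than independence would predict.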
Acoular 2022-06-24 05:00:00

How to import your data into Acoular

Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array which is stored in an HDF5 file. This blog post explains how to convert data available in other formats into this file format. As examples for other file formats we will use both .csv (comma separated text files) and .mat (Matlab files).
fa.bianp.net 2022-05-26 22:00:00

On the Link Between Optimization and Polynomials, Part 5


Six: All of this has happened before.
Baltar: But the question remains, does all of this have to happen again?
Six: This time I bet no.
Baltar: You know, I've never known you to play the optimist. Why the change of heart?
Six: Mathematics. Law of averages. Let a complex …

scikit-learn Blog 2022-05-22 00:00:00

Interview with Norbert Preining, scikit-learn Team Member

Author: Reshama Shaikh, Norbert Preining
ListenData 2022-05-06 11:06:00

Only size-1 arrays can be converted to Python scalars

NumPy is one of the most used modules in Python, and it is used in a variety of tasks ranging from creating arrays to mathematical and statistical calculations. NumPy also brings efficiency to Python programming. While using NumPy you may encounter this error: TypeError: only size-1 arrays can be converted to Python scalars. It is a frequently appearing error, and sometimes it becomes a daunting challenge to solve.

Meaning: Only Size 1 Arrays Can Be Converted To Python Scalars Error

This error generally appears when Python expects a single value but you passed an array that consists of multiple values. For example, you want to calculate the exponential of an array, but the function for the exponential was designed for a scalar variable (which means a single value). When you pass a NumPy array to the function, it raises this error. The error stops your code from proceeding and avoids unexpected output from the

(continued...)
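
The situation described above can be reproduced with a short sketch. It assumes `math.exp` as the scalar-only function and `np.exp` as the vectorized replacement; the excerpt does not say which functions the post uses, so treat both as illustrative choices.

```python
import math
import numpy as np

values = np.array([0.0, 1.0, 2.0])

message = None
try:
    # math.exp expects a single number, so a multi-element array is rejected.
    math.exp(values)
except TypeError as exc:
    message = str(exc)  # exact wording can vary between NumPy versions

# np.exp is vectorized and works element-wise on the whole array.
result = np.exp(values)
```

The usual fix is the one shown on the last line: replace the scalar function with its NumPy counterpart, or apply the scalar function to one element at a time.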
scikit-learn Blog 2022-05-04 00:00:00

Interview with Lucy Liu, scikit-learn Team Member

Author: Reshama Shaikh, Lucy Liu
Living in an Ivory Basement 2022-04-21 22:00:00

Storing 64-bit unsigned integers in SQLite databases, for fun and profit

Storing unsigned longs in SQLite is possible, and can be fast.
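
SQLite's INTEGER type is a signed 64-bit value, so an unsigned 64-bit number above 2**63 - 1 cannot be stored directly. One possible workaround, sketched below and not necessarily the approach taken in the post, is to reinterpret the value as its two's-complement signed counterpart on the way in and reverse that on the way out. The table name and helper functions are hypothetical.

```python
import sqlite3

MAX_SIGNED = 2**63 - 1


def to_signed(value):
    """Map an unsigned 64-bit integer onto SQLite's signed 64-bit range."""
    return value - 2**64 if value > MAX_SIGNED else value


def to_unsigned(value):
    """Invert to_signed after reading the value back."""
    return value + 2**64 if value < 0 else value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (hashval INTEGER)")

original = [0, 42, 2**63, 2**64 - 1]
conn.executemany(
    "INSERT INTO hashes (hashval) VALUES (?)",
    [(to_signed(v),) for v in original],
)

stored = [to_unsigned(row[0]) for row in conn.execute("SELECT hashval FROM hashes")]
```

Equality lookups work as long as the query value goes through `to_signed` as well. Ordering does not survive the mapping, since the largest unsigned values are stored as negative numbers, so range queries would need extra care.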

scikit-learn Blog 2022-03-21 00:00:00

Behind the Scenes of Data Umbrella scikit-learn Open Source Sprints

Author: Reshama Shaikh, Angela Okune
Living in an Ivory Basement 2022-03-04 23:00:00

The First Common Fund Data Ecosystem Hackathon

We ran a successful pilot hackathon, and we will run a second one soon!

Filipe Saraiva's blog 2022-02-06 14:31:39

Master's in Computer Science 2022: Metaheuristics

We still have some openings for the Master's program in Computer Science at UFPA, Belém. Those interested should see the submission instructions on the program's selection page. Since joining the program I have been advising students on various research projects on computational intelligence applied to smart grid problems. We have already had work on multi-agent systems… Continue reading » Master's in Computer Science 2022: Metaheuristics
Martin Fitzpatrick - python 2022-01-26 11:00:00

DiffCast: Hands-free Python Screencast Creator — Create reproducible programming screencasts without typos or edits

Programming screencasts are a popular way to teach programming and demo tools. Typically people will open up their favorite editor and record themselves tapping away. But this has a few problems. A good setup for coding isn't necessarily a good setup for video -- with text too small, a window too …

fa.bianp.net 2022-01-09 23:00:00

Optimization Nuggets: Implicit Bias of Gradient-based Methods

When an optimization problem has multiple global minima, different algorithms can find different solutions, a phenomenon often referred to as the implicit bias of optimization algorithms. In this post we'll characterize the implicit bias of gradient-based methods on a class of regression problems that includes linear least squares and Huber …

fa.bianp.net 2021-12-14 23:00:00

Optimization Nuggets: Exponential Convergence of SGD

This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution.
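
The claim can be observed on a toy problem. The sketch below is an illustration and not the post's proof or setup: the one-dimensional least-squares objective, the step size, and the data are all assumptions chosen to keep the example small.

```python
import random

rng = random.Random(0)

# Toy least-squares problem: minimize the average of 0.5 * (x - b_i)**2.
targets = [3.0 + rng.uniform(-0.4, 0.4) for _ in range(1000)]
solution = sum(targets) / len(targets)  # the exact minimizer is the mean

x = 13.0
step_size = 0.1
errors = [abs(x - solution)]
for _ in range(200):
    b = rng.choice(targets)
    x -= step_size * (x - b)  # stochastic gradient of one sampled term
    errors.append(abs(x - solution))
```

With a constant step size, the recorded errors shrink geometrically at first, then stop improving and fluctuate around a noise floor. On this problem, a smaller `step_size` lowers that floor at the cost of a slower initial decay.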

Gaël Varoquaux - programming 2021-10-28 22:00:00

Hiring an engineer and post-doc to simplify data science on dirty data

Note

Join us to work on reinventing data-science practices and tools to produce robust analysis with less data curation.

It is well known that data cleaning and preparation are a heavy burden to the data scientist.

Dirty data research

In the dirty data project, we have been conducting machine-learning research …

Sparrow Computing 2021-10-22 21:27:39

TorchVision Datasets: Getting Started

The TorchVision datasets subpackage is a convenient utility for accessing well-known public image and video datasets. You can use these tools to start training new computer vision models very quickly.

TorchVision Datasets Example

To get started, all you have to do is import one of the Dataset classes. Then, instantiate ... Read more

The post TorchVision Datasets: Getting Started appeared first on Sparrow Computing.

Sparrow Computing 2021-10-21 14:19:21

NumPy Any: Understanding np.any()

The np.any() function tests whether any element in a NumPy array evaluates to true: The input can have any shape and the data type does not have to be boolean (as long as it’s truthy). If none of the elements evaluate to true, the function returns false: Passing in a ... Read more
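
The examples elided from the excerpt above might look like the following sketch. The arrays are made up, and the `axis` argument, while part of NumPy's API, is an addition here because the excerpt is cut off before saying what it covers next.

```python
import numpy as np

grid = np.array([[0, 0], [0, 3]])

has_true = np.any(grid)              # one nonzero element is enough
all_zero = np.any(np.zeros((2, 2)))  # no truthy elements
per_column = np.any(grid, axis=0)    # reduce along rows, one result per column
```

Without `axis` the result is a single boolean for the whole array; with `axis` it is an array of booleans, one per remaining position.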

The post NumPy Any: Understanding np.any() appeared first on Sparrow Computing.

Sparrow Computing 2021-10-07 20:52:03

PyTorch DataLoader Quick Start

PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility and that makes data loading in PyTorch a fairly advanced topic. One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you ... Read more

The post PyTorch DataLoader Quick Start appeared first on Sparrow Computing.

Sparrow Computing 2021-10-06 16:53:32

How the NumPy append operation works

Understanding the np.append() operation and when you might want to use it.
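
A minimal sketch of the operation, using made-up arrays rather than the post's own examples:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

flat = np.append(a, [5, 6])            # no axis: both inputs are flattened first
rows = np.append(a, [[5, 6]], axis=0)  # with axis: shapes must match on the other axes
```

`np.append` returns a new array and leaves `a` unchanged. Because every call copies the data, appending inside a loop gets expensive; collecting values in a Python list and converting once at the end is usually the better choice.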

The post How the NumPy append operation works appeared first on Sparrow Computing.

Gaël Varoquaux - programming 2021-09-13 22:00:00

Hiring someone to develop scikit-learn community and industry partners

Note

With the growth of scikit-learn and the wider PyData ecosystem, we want to recruit in the Inria scikit-learn team for a new role. Departing from our usual focus on excellence in algorithms, statistics, or code, we want to add to the team someone with some technical understanding, but an …

Pierre de Buyl's homepage - scipy 2021-08-24 13:00:00

A paper on the Lees-Edwards method

A few years ago, Sebastian contacted me to help with simulations. Great, I like simulation studies, so we start discussing the details. The idea: use an established method, the Lees-Edwards boundary condition, to study colloids under shear.

Living in an Ivory Basement 2021-07-19 22:00:00

A biotech career panel in the DIB Lab

Careers outside of universities!

Sparrow Computing 2021-07-08 16:09:47

Poetry for Package Management in Machine Learning Projects

When you’re building a production machine learning system, reproducibility is a proxy for the effectiveness of your development process. But without locking all your Python dependencies, your builds are not actually repeatable. If you work in a Python project without locking long enough, you will eventually get a broken build ... Read more

The post Poetry for Package Management in Machine Learning Projects appeared first on Sparrow Computing.

Sparrow Computing 2021-06-29 20:38:29

Development containers in VS Code: a quick start guide

If you’re building production ML systems, dev containers are the killer feature of VS Code. Dev containers give you full VS Code functionality inside a Docker container. This lets you unify your dev and production environments if production is a Docker container. But even if you’re not targeting a Docker ... Read more

The post Development containers in VS Code: a quick start guide appeared first on Sparrow Computing.

Living in an Ivory Basement 2021-06-28 22:00:00

New sourmash databases are available!

Databases are now available for GTDB!

Filipe Saraiva's blog 2021-06-25 12:06:45

Writing a Column for O Estado do Piauí

O Estado do Piauí is a new newspaper that recently emerged in that part of the country. With a greater focus on long, dense reporting that mixes investigative and literary journalism, the project aims to discuss in depth the topics of interest to the state, uncover unique stories from Piauí, draw attention to problematic situations, point out alternatives, and much more. Do not… Continue reading » Writing a Column for O Estado do Piauí
Filipe Saraiva's blog 2021-06-21 21:51:57

Interview Series on Research at UFPA's PPGCC – Computational Intelligence

The Faculty of Computing and the Graduate Program in Computer Science (PPGCC) at UFPA are developing a project with two goals: first, to better communicate to the public outside the university what we produce in our research; second, to better communicate the same thing INTERNALLY – what we develop… Continue reading » Interview Series on Research at UFPA's PPGCC – Computational Intelligence

Living in an Ivory Basement 2021-06-07 22:00:00

Searching all public metagenomes with sourmash

Searching all the things!

Pierre de Buyl's homepage - scipy 2021-05-21 13:00:00

Is your software ready for the Journal of Open Source Software?

For the unaware reader, the Journal of Open Source Software (JOSS) is an open-access scientific journal founded in 2016 and aimed at publishing scientific software. A JOSS article is itself short, and its publication helps recognize the work that went into the software. I share here my point of view on what makes some software tools more ready than others to be published in JOSS. I do not comment on size or relevance for research, both of which are documented on JOSS' website.

Living in an Ivory Basement 2021-05-16 22:00:00

sourmash 4.1.0 released!!

sourmash v4.1.0 is here!

fa.bianp.net 2021-04-12 22:00:00

On the Link Between Optimization and Polynomials, Part 4

While the most common accelerated methods like Polyak and Nesterov incorporate a momentum term, a little-known fact is that simple gradient descent (no momentum) can achieve the same rate through only a well-chosen sequence of step-sizes. In this post we'll derive this method and through simulations discuss its practical …
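
To make the claim concrete, here is a minimal NumPy sketch, not the post's own code. It assumes one classical choice of step-sizes: the inverses of the roots of the Chebyshev polynomial shifted to the interval [mu, L] that contains the Hessian eigenvalues. The diagonal quadratic, the function names, and the constants are hypothetical and chosen only for illustration. After n steps the error is bounded by 1 / T_n((L + mu) / (L - mu)) times the initial error, where T_n is the degree-n Chebyshev polynomial.

```python
import numpy as np


def chebyshev_step_sizes(mu, L, n):
    """Inverse roots of the degree-n Chebyshev polynomial shifted to [mu, L]."""
    k = np.arange(n)
    roots = (L + mu) / 2 + (L - mu) / 2 * np.cos(np.pi * (2 * k + 1) / (2 * n))
    return 1.0 / roots


def gradient_descent(grad, x0, step_sizes):
    """Plain gradient descent (no momentum) with one step-size per iteration."""
    x = x0.copy()
    for h in step_sizes:
        x = x - h * grad(x)
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(eigs * x**2) - b @ x,
# a diagonal quadratic whose Hessian eigenvalues lie in [mu, L].
mu, L, n = 1.0, 100.0, 16
eigs = np.linspace(mu, L, 20)
x_star = np.ones_like(eigs)  # minimizer, chosen by construction
b = eigs * x_star


def grad(x):
    return eigs * x - b


x0 = np.zeros_like(eigs)
x_cheb = gradient_descent(grad, x0, chebyshev_step_sizes(mu, L, n))
x_const = gradient_descent(grad, x0, np.full(n, 2.0 / (L + mu)))

err0 = np.linalg.norm(x0 - x_star)
err_cheb = np.linalg.norm(x_cheb - x_star)
err_const = np.linalg.norm(x_const - x_star)
# Worst-case contraction after n steps: 1 / T_n((L + mu) / (L - mu)).
bound = 1.0 / np.cosh(n * np.arccosh((L + mu) / (L - mu)))
print(f"constant step: {err_const / err0:.3f}  Chebyshev steps: {err_cheb / err0:.3f}")
```

The guarantee applies only after all n steps, and the iterates in between need not decrease monotonically. For large n, the order in which the step-sizes are applied also affects numerical stability, which this sketch ignores.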

NumFOCUS 2021-04-09 18:02:05

NumFOCUS Welcomes Tesco Technology to Corporate Sponsors

NumFOCUS is pleased to announce our new partnership with Tesco Technology. A long-time PyData event sponsor, Tesco Technology joined NumFOCUS as a Silver Corporate Sponsor in December 2020. “We are very excited to formalize our partnership with Tesco Technology,” said Leah Silen, NumFOCUS Executive Director. “Tesco Technology has partnered with NumFOCUS for the past several […]

The post NumFOCUS Welcomes Tesco Technology to Corporate Sponsors appeared first on NumFOCUS.

NumFOCUS 2021-04-08 21:14:55

Job Posting | Communications and Marketing Manager

Job Title: Communications and Marketing Manager. Position Overview: The primary role of the Communications & Marketing Manager is to manage the NumFOCUS brand by overseeing all outgoing communications between NumFOCUS and our stakeholders. You will serve the project communities by playing a key role in their event marketing management and assist with project promotional and […]

The post Job Posting | Communications and Marketing Manager appeared first on NumFOCUS.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 2

This is the second in a series of three blog posts about the basic use of Acoular. It assumes that you have already read the first post and continues by explaining some more concepts and additional methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 3

This is the third and final in a series of three blog posts about the basic use of Acoular. It assumes that you have already read the first two posts and continues by explaining additional concepts to be used with time domain methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources. To continue, we do the same setup as in Part 1. However, as we are setting out to do some signal processing in the time domain, we define only TimeSamples, MicGeom, RectGrid and SteeringVector objects but no PowerSpectra or BeamformerBase.

    import acoular
    ts = acoular.TimeSamples( name="three_sources.h5" )
    mg = acoular.MicGeom( from_file="array_64.xml" )
    rg = acoular.RectGrid( x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2, z=0.3, increment=0.01 )
    st = acoular.SteeringVector( grid=rg, mics=mg (continued...)

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 1

This is the first in a series of three blog posts about the basic use of Acoular. It explains some fundamental concepts and walks through a simple example. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.

fa.bianp.net 2021-03-01 23:00:00

On the Link Between Optimization and Polynomials, Part 3

I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.

A momentum optimizer *

While My MCMC Gently Samples 2021-02-23 15:00:00

Introducing PyMC Labs: Saving the World with Bayesian Modeling

After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models.

Usually, I don't hear how people are using PyMC3 -- they mostly show up on GitHub or Discourse when something isn't working right. So, hearing about all these …

Martin Fitzpatrick - python 2021-02-22 08:00:00

Using MicroPython and uploading libraries on Raspberry Pi Pico — Using rshell to upload custom code

MicroPython is an implementation of the Python 3 programming language, optimized to run on microcontrollers. It's one of the options available for programming your Raspberry Pi Pico and a nice, friendly way to get started with microcontrollers.

MicroPython can be installed easily on your Pico, by following the instructions on the …