Python MapReduce library written in Cython.

Related tags

Miscellaneoushadoopy
Overview
Brandyn White 
  
Andrew Miller 
  
   

Source  
   https://github.com/bwhite/hadoopy/
Issues  
   https://github.com/bwhite/hadoopy/issues
Docs    
   http://bwhite.github.com/hadoopy/

IRC: #hadoopy @ freenode.net

Requirements
python development headers (python-dev), build tools (build-essential)

Optional
cython (>=.13) (without this it falls back to the pregenerated .c files)

Features
- oozie support
- Automated job parallelization 'auto-oozie' available in the hadoopy_flow project (maintained out of branch)
- typedbytes support (very fast)
- Local execution of unmodified MapReduce job with launch_local
- Read/write sequence files of TypedBytes directly to HDFS from python (readtb, writetb)
- Works on OS X
- Allows printing to stdout and stderr in Hadoop tasks without causing problems (uses the 'pipe hopping' technique, both are available in the task's stderr)
- critical path is in Cython
- works on clusters without any extra installation, Python, or any Python libraries (uses Pyinstaller that is included in this source tree)
- Simple HDFS access (readtb and ls) inside Python, even inside running jobs
- Unit test interface
- Reporting using status and counters (and print statements! no need to be scared of them in Hadoopy)
- Supports design patterns in the Lin/Dyer book (
   http://www.umiacs.umd.edu/~jimmylin/book.html)

Limitations
- Hadoop Local currently unsupported due to a bug in Hadoop's handling of the distributed cache in this mode.  Use psuedo-distributed instead for now.  (
   https://github.com/bwhite/hadoopy/issues/40)

Used in
- A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords (to appear in WWW'11)
- Web-Scale Computer Vision using MapReduce for Multimedia Data Mining (at KDD'10)
- Vitrieve: Visual Search engine
- Picarus: Hadoop computer vision toolbox

Ubuntu Install (others are similar)
sudo apt-get install python-dev build-essential
sudo python setup.py install
  
 
Owner
Brandyn White
Brandyn White
An example module hooking system, will be used in PySAMP.

An example module hooking system, will be used in PySAMP.

2 May 01, 2022
Explore-bikeshare-data - GitHub project as part of the Programming for Data Science with Python Nanodegree from Udacity

Date created February 10, 2022 Project Title Explore US Bikeshare Data Descripti

Thárcyla 1 Feb 14, 2022
Python Cheat Sheet

Introduction Pysheeet was created with intention of collecting python code snippets for reducing coding hours and making life easier and faster. Any c

CHANG-NING TSAI 7.5k Dec 30, 2022
Python - Aprendendo Python na ByLearn

PYTHON Identação Escopo Pai Escopo filho Escopo neto Variaveis

Italo Rafael 3 May 31, 2022
A professional version for LBS

呐 Yuki Pro~ 懒兵服御用版本,yuki小姐觉得没必要单独造一个仓库,但懒兵觉得有必要并强制执行 将na-yuki框架抽象为模块,功能拆分为独立脚本,使用脚本注释器使其作为py运行 文件结构: na_yuki_pro_example.py 是一个说明脚本,用来直观展示na,yuki! Pro

1 Dec 21, 2021
Advanced python code - For students in my advanced python class

advanced_python_code For students in my advanced python class Week Topic Recordi

Ariel Avshalom 3 May 27, 2022
Never see escaped bytes in output.

Uniout It makes Python print the object representation in readable chars instead of the escaped string. Example from pprint import pprint lang

Mosky Liu 156 Oct 21, 2022
A Gura parser implementation for Python

Gura parser This repository contains the implementation of a Gura format parser in Python. Installation pip install gura-parser Usage import gura gur

JWare Solutions 19 Jan 25, 2022
Import Apex legends mprt files exported from Legion

Apex-mprt-importer-for-Blender Import Apex legends mprt files exported from Legion. REQUIRES CAST IMPORTER Usage: Use a VPK extracter to extract the m

15 Dec 18, 2022
The purpose of this script is to bypass disablefund, provide some useful information, and dig the hook function of PHP extension.

The purpose of this script is to bypass disablefund, provide some useful information, and dig the hook function of PHP extension.

Firebasky 14 Aug 02, 2021
Howell County, Missouri, COVID-19 data and (unofficial) estimates

COVID-19 in Howell County, Missouri This repository contains the daily data files used to generate my COVID-19 dashboard for Howell County, Missouri,

Jonathan Thornton 0 Jun 18, 2022
Repo with data from local elections in Portugal from 2009 to 2013

autarquicas - local elections in Portugal Repo with data from local elections in Portugal from 2009 to 2013 Objective To provide, to all, raw data fro

Jorge Gomes 6 Apr 06, 2022
Runtime Type Checking in Python 3

typo This package intends to provide run-time type checking for functions annotated with argument type hints (standard library typing module in Python

Ivan Smirnov 26 Dec 13, 2022
AIST++ API This repo contains starter code for using the AIST++ dataset.

Explainability for Vision Transformers (in PyTorch) This repository implements methods for explainability in Vision Transformers

Google 260 Dec 30, 2022
Some Python scripts that fx(hash) users might find useful.

fx_hash_utils Some Python scripts that fx(hash) users might find useful. get_images This script downloads all the static images of the tokens generate

30 Oct 05, 2022
We'll be using HTML, CSS and JavaScript for the frontend

We'll be using HTML, CSS and JavaScript for the frontend. Nothing to install in specific. Open your text-editor and start coding a beautiful front-end.

Mugada sai tilak 1 Dec 15, 2021
Manjaro CN Repository

Manjaro CN Repository Automatically built packages based on archlinuxcn/repo and manjarocn/docker. Install Add manjarocn to /etc/pacman.conf: Please m

Manjaro CN 28 Jun 26, 2022
A tool for study using pomodoro methodology, while study mode spotify or any other .exe app is opened and while resting is closed.

Pomodoro-Timer-With-Spotify-Connection A tool for study using pomodoro methodology, while study mode spotify or any other .exe app is opened and while

2 Oct 23, 2022
management tool for systemd-nspawn containers

nspctl nspctl, management tool for systemd-nspawn containers. Why nspctl? There are different tools for systemd-nspawn containers. You can use native

Emre Eryilmaz 5 Nov 27, 2022
Islam - This is a simple python script.In this script I have written all the suras of Al Quran. As a result, by using this script, you can know the number of any sura at the moment.

Introduction: If you want to know sura number of al quran by just typing the name of sura than you can use this script. Usage in termux: $ pkg install

Fazle Rabbi 1 Jan 02, 2022