Port of dplyr and other related R packages in python, using pipda.

Last update: Dec 21, 2022

Overview

datar

Documentation | Reference Maps | Notebook Examples | API | Blog

Unlike other similar Python packages that merely mimic the piping syntax, datar follows the API designs of the original packages as closely as possible and is tested thoroughly against the cases from those packages, so that those who are familiar with the R packages need minimal effort to transition to Python.

Installation

pip install -U datar
# to make sure the dependencies are up to date
# pip install -U varname pipda datar

datar requires Python 3.7.1+ and is backed by pandas (1.2+).

Example usage

Comments

Any way to stop the re package being overwritten?
re is needed for regular expressions, but it then has to be imported after datar:

from datar.all import *
import re

I always import re first, and then it doesn't work after being overwritten. Sometimes I use re inside a function like this:

def test(x, y):
    re.sub(..)
    re.replace(..)
    return ...
question
opened by antonio-yu 27
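
One way to sidestep the clash described in the `re` question above, sketched under the assumption that the star import rebinds the name `re` as the report says: import the standard-library module under an alias, so that later imports cannot shadow it. The alias `stdlib_re` and the function `collapse_spaces` are illustrative names, and the assignment to `re` below only simulates the shadowing so that the snippet runs without datar.

```python
import re as stdlib_re  # the alias is unaffected by anything later bound to `re`

# Stand-in for `from datar.all import *` rebinding `re` (simulated here,
# so the example runs without datar installed).
re = "shadowed by a star import"


def collapse_spaces(text):
    """Collapse runs of whitespace using the aliased regex module."""
    return stdlib_re.sub(r"\s+", " ", text).strip()


print(collapse_spaces("a   b \n c"))
```

Because the function body refers to `stdlib_re`, the import order no longer matters.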

f.duplicated() not working in filter

Sometimes I want to keep all the duplicated rows. In pandas this is done with mtcars[mtcars.duplicated(keep=False)], but in datar it does not work:

from datar.all import *
from datar.datasets import mtcars

mtcars >> select('cyl', 'hp', 'gear', 'disp') >> filter(f.duplicated(keep=False))

But it works in the following two ways.

# 1  f.series

mtcars >> select('cyl', 'hp', 'gear', 'disp') >> filter(f.cyl.duplicated(keep=False))

# 2 select all the columns

mtcars >> select('cyl', 'hp', 'gear', 'disp') >> filter(f['cyl'].duplicated(keep=False))

It seems that only a Series can be passed to filter.

optim

opened by antonio-yu 27

feature request: have simple way to create functions for use inside verbs that are unaware of groupedness
Discussed in https://github.com/pwwang/datar/discussions/136

Originally posted by ftobin on August 26, 2022:

Older versions of datar used pipda with a fairly easy-to-use way of creating vectorized functions:

@register_func(None, context=Context.EVAL)
def weighted_mean(
    x: NumericOrIter,
    w: NumericOrIter = None,
    na_rm: bool = False,
)

From what I gather from the current methodology (see the current weighted_mean()), multiple functions are needed to handle the grouped and non-grouped versions of the dataframe.

It would be great if there were an approach that allowed me to be "dplyr-ish" and just have a vectorized function that is unaware of groupedness, similar to how functions used inside dplyr functions can work over grouped and non-grouped tibbles.
enhancement
opened by pwwang 24
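
The idea in the request above can be pictured with plain Python: write the computation once, with no knowledge of groups, and let a small helper apply it per group. This is only an illustrative sketch, not datar's or pipda's API; `weighted_mean`, `apply_by_group`, and the list-of-dicts data layout are assumptions made for the example.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Group-unaware weighted mean of two equal-length sequences."""
    total_weight = sum(weights)
    if total_weight == 0:
        return float("nan")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def apply_by_group(rows, key, func, *columns):
    """Split rows by `key`, then call `func` on the named columns of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        name: func(*([row[col] for row in members] for col in columns))
        for name, members in groups.items()
    }


rows = [
    {"cyl": 4, "mpg": 30.0, "wt": 1.0},
    {"cyl": 4, "mpg": 20.0, "wt": 3.0},
    {"cyl": 6, "mpg": 21.0, "wt": 2.0},
]
result = apply_by_group(rows, "cyl", weighted_mean, "mpg", "wt")
print(result)
```

The same `weighted_mean` can also be called directly on whole columns, which is the "unaware of groupedness" property the request asks for.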

piping to verbs is not passing the dataframe argument

Piping into verbs like mutate seems to be broken in 0.9.0. Passing in the dataframe directly works, but still generates a warning.

>>> import datar.all
>>> import datar.datasets
>>> datar.datasets.mtcars >> datar.all.mutate(x = 3)
/home/bizdev/.pyenv/versions/3.9.13/lib/python3.9/site-packages/pipda/utils.py:68: VerbCallingCheckWarning: Failed to detect AST node calling `mutate`, assuming a normal call.
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bizdev/.pyenv/versions/3.9.13/lib/python3.9/site-packages/pipda/verb.py", line 216, in __call__
    raise TypeError(f"{self.__name__}() missing at least one argument.")
TypeError: mutate() missing at least one argument.
>>> datar.__version__
'0.9.0'
>>> import pipda
>>> pipda.__version__
'0.7.6'

>>> datar.all.mutate(datar.datasets.mtcars, x = 3)
/home/bizdev/.pyenv/versions/3.9.13/lib/python3.9/site-packages/pipda/utils.py:68: VerbCallingCheckWarning: Failed to detect AST node calling `mutate`, assuming a normal call.
  warnings.warn(
                          mpg     cyl      disp      hp      drat        wt      qsec      vs      am    gear    carb       x
                    <float64> <int64> <float64> <int64> <float64> <float64> <float64> <int64> <int64> <int64> <int64> <int64>
Mazda RX4                21.0       6     160.0     110      3.90     2.620     16.46       0       1       4       4       3
Mazda RX4 Wag            21.0       6     160.0     110      3.90     2.875     17.02       0       1       4       4       3

opened by ftobin 15
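
The warning in the transcript above says pipda could not detect the AST node calling `mutate`, which suggests that it inspects the calling code to decide whether a verb is being piped into. The sketch below shows a deliberately simpler mechanism, not pipda's implementation: every verb call returns a placeholder that waits for its data, and Python's reflected operator `__rrshift__` supplies it. `PipedCall`, `verb`, and the list-of-dicts `mutate` are hypothetical names used for illustration only.

```python
import functools


class PipedCall:
    """A verb call that is still waiting for its data argument."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # `data >> PipedCall(...)` ends up here when `data` does not
        # implement `>>` itself (a plain list does not).
        return self.func(data, *self.args, **self.kwargs)


def verb(func):
    """Turn `func(data, ...)` into a verb usable as `data >> func(...)`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return PipedCall(func, args, kwargs)

    return wrapper


@verb
def mutate(rows, **new_columns):
    """Add constant columns to every row (rows are plain dicts here)."""
    return [{**row, **new_columns} for row in rows]


rows = [{"mpg": 21.0}, {"mpg": 22.8}]
result = rows >> mutate(x=3)
print(result)
```

In this simplified version a direct call such as `mutate(rows, x=3)` returns a placeholder rather than a result. Supporting both call styles needs some way to tell them apart, which is presumably why pipda looks at the call site.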

`datar` not working on RStudio notebooks

I am trying to run code with datar in RStudio R notebooks, but the code does not run. I would like to use R notebooks to highlight parts of the datar code and run them separately.

@pwwang, what is the issue with running datar on RStudio?

doc needs more info

opened by rleyvasal 14

Lubridate commands

Hi @pwwang , do you have plans to add lubridate commands to datar?

I am trying to convert the Date column on stock time series data to date time with datar mutate.

Data from yahoo finance

import pandas as pd
from datar.all import *

aapl = pd.read_csv("AAPL.csv")

aapl.Date = pd.to_datetime(aapl.Date.astype('str')) # with pandas this works to change the data type to datetime

aapl = aapl >> mutate(Date = as_datetime(f.Date))  # this does not work and shows error message

aapl = aapl >> mutate(Date = as_date(f.Date))  #this does not work and does not show error message

doc enhancement

opened by rleyvasal 14

ImportError: cannot import name 'VarnameException'

Issue

I just upgraded datar from 0.4.3 to 0.4.4 with pip install -U datar and got the error ImportError: cannot import name 'VarnameException' when importing datar with from datar.all import *.

Below is the error message:

ImportError: cannot import name 'VarnameException' from 'varname' (C:\Anaconda3\lib\site-packages\varname\__init__.py)
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-268be173473d> in <module>
----> 1 from datar.all import *
      2 from datar.datasets import mtcars
      3 mtcars  >> mutate(mpl = f.mpg/4)

C:\Anaconda3\lib\site-packages\datar\all.py in <module>
      7 from .base import *
      8 from .base import _warn as _
----> 9 from .datar import *
     10 from .dplyr import _no_warn as _
     11 from .dplyr import _builtin_names as _dplyr_builtin_names

C:\Anaconda3\lib\site-packages\datar\datar\__init__.py in <module>
      1 """Specific verbs/funcs from this package"""
      2 
----> 3 from .verbs import get, flatten, drop_index
      4 from .funcs import itemgetter

C:\Anaconda3\lib\site-packages\datar\datar\verbs.py in <module>
      9 from ..core.contexts import Context
     10 from ..core.grouped import DataFrameGroupBy
---> 11 from ..dplyr import select, slice_
     12 
     13 

C:\Anaconda3\lib\site-packages\datar\dplyr\__init__.py in <module>
      4 from .across import across, c_across, if_all, if_any
      5 from .arrange import arrange
----> 6 from .bind import bind_cols, bind_rows
      7 from .context import (
      8     cur_column,

C:\Anaconda3\lib\site-packages\datar\dplyr\bind.py in <module>
     15 from ..core.names import repair_names
     16 from ..core.grouped import DataFrameGroupBy
---> 17 from ..tibble import tibble
     18 
     19 

C:\Anaconda3\lib\site-packages\datar\tibble\__init__.py in <module>
      1 """APIs for R-tibble"""
----> 2 from .tibble import tibble, tibble_row, tribble, zibble
      3 from .verbs import (
      4     enframe,
      5     deframe,

C:\Anaconda3\lib\site-packages\datar\tibble\tibble.py in <module>
      4 
      5 from pandas import DataFrame
----> 6 from varname import argname, varname, VarnameException
      7 
      8 import pipda

ImportError: cannot import name 'VarnameException' from 'varname' (C:\Anaconda3\lib\site-packages\varname\__init__.py)

Expected

Expect datar to work without issue after upgrading to new versions.

opened by rleyvasal 13
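
An import error like the one above may point to mismatched versions of datar and its dependencies; the installation notes suggest upgrading `varname` and `pipda` together with datar. A minimal sketch for listing what is actually installed, using only the standard library (`installed_versions` is a hypothetical helper name, and the default package list is an assumption based on the packages named in this report):

```python
from importlib import metadata


def installed_versions(packages=("datar", "pipda", "varname")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output in a bug report makes it easier to see whether one package was upgraded without the others.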

Operator `&` losing index
When case_when is used, the output is not as expected.

Code to replicate

mtcars >> mutate(
    gas_milage=case_when(
        f.mpg > 21 and f.mpg <= 22, "ok",
        f.mpg > 22, "best",
        True, "other"
    )
)

Issue: The last row in the output does not meet the condition f.mpg > 21 and f.mpg <= 22, yet it is still given the "ok" label.

Expected result

Only rows meeting the f.mpg > 21 and f.mpg <= 22 condition should be labeled "ok"; all other rows not meeting any condition should be labeled "other".
bug
opened by rleyvasal 11
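
If the condition is written with the keyword `and`, as in the snippet in this report, Python itself may explain part of the surprise: `and` cannot be overloaded, so `a and b` evaluates to `b` whenever `a` is truthy, while `&` can be overloaded to combine both sides. This is a general property of Python rather than anything specific to datar. In the sketch below, `Expr` is a toy stand-in for a lazy column expression such as `f.mpg`, not datar's class.

```python
class Expr:
    """Toy lazy expression that records the operations applied to it."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        return Expr(f"({self.text} > {other})")

    def __le__(self, other):
        return Expr(f"({self.text} <= {other})")

    def __and__(self, other):
        return Expr(f"({self.text} & {other.text})")


mpg = Expr("mpg")

# `and` returns its second operand when the first is truthy, and an
# ordinary object is truthy by default, so the `> 21` part is dropped.
with_and = mpg > 21 and mpg <= 22

# `&` calls Expr.__and__, so both comparisons are kept. The parentheses
# are needed because `&` binds more tightly than the comparisons.
with_amp = (mpg > 21) & (mpg <= 22)

print(with_and.text)
print(with_amp.text)
```

Under this reading, a condition meant as "greater than 21 and at most 22" would need the `(...) & (...)` form.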
Function to show the translation to `pandas`
Hey @pwwang , does datar have a function to display the translation to pandas commands?

I believe it would be a very useful addition for many reasons, one being that it would make it much easier for datar and pandas users to work together on a project.

In R, an analogous function would be dplyr::show_query(), which shows the translation to SQL, like in the example below:

df <- dbplyr::lazy_frame(mtcars)
df |> dplyr::select(mpg) |> dplyr::show_query()
#> <SQL>
#> SELECT `mpg`
#> FROM `df`

Thank you
opened by GitHunter0 9
pipe operator doesn't work in plain python prompt
It seems that the pipe operator doesn't work when using datar in Anaconda virtual environments. Here's a snippet from running the example code in the Anaconda prompt:

from datar.all import f, mutate, filter, if_else, tibble

[2021-08-03 19:57:12][datar][WARNING] Builtin name "filter" has been overriden by datar.

df = tibble( x=range(4), y=['zero', 'one', 'two', 'three'] )

C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\utils.py:159: UserWarning: Failed to fetch the node calling the function, call it with the original function.
  warnings.warn(

df >> mutate(z=f.x)

C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\utils.py:159: UserWarning: Failed to fetch the node calling the function, call it with the original function.
  warnings.warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\register.py", line 396, in wrapper
    return calling_rule(generic, args, kwargs, envdata)
  File "C:\ProgramData\Anaconda3\envs\conda_start\lib\site-packages\pipda\_calling.py", line 93, in verb_calling_rule3
    return generic(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\conda_start\lib\functools.py", line 872, in wrapper
    raise TypeError(f'{funcname} requires at least '
TypeError: _not_implemented requires at least 1 positional argument

mutate(df, z=f.x)

        x        y       z
  <int64> <object> <int64>
0       0     zero       0
1       1      one       1
2       2      two       2
3       3    three       3

pandas 1.2.3 python 3.8.1

ps thanks a lot for the package, hopefully the issue can be closed soon :)
opened by thegiordano 9

Piping syntax not running in raw python REPL

@GitHunter0

I'm just having an issue with multi-line execution of datar code in VScode.

If I run this line by line, it works smoothly.

from datar.all import (f, mutate, tibble, fct_infreq, fct_inorder, pull)
df = tibble(var=['b','b','b','c','a','a'])
df = df >> mutate(fct_var = f['var'].astype("category"))

However, if I select all the lines and execute them, it returns:

C:\Users\user_name\miniconda3\envs\py38\lib\site-packages\pipda\utils.py:161: UserWarning: Failed to fetch the node calling the function, call it with the original function.

>>> df = df >> mutate(fct_var = f['var'].astype("category"))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\user_name\miniconda3\envs\py38\lib\site-packages\pipda\register.py", line 396, in wrapper
    return calling_rule(generic, args, kwargs, envdata)
  File "C:\Users\user_name\miniconda3\envs\py38\lib\site-packages\pipda\_calling.py", line 93, in verb_calling_rule3
    return generic(*args, **kwargs)
  File "C:\Users\user_name\miniconda3\envs\py38\lib\functools.py", line 872, in wrapper
    raise TypeError(f'{funcname} requires at least '
TypeError: _not_implemented requires at least 1 positional argument

Originally posted by @GitHunter0 in https://github.com/pwwang/datar/discussions/48#discussioncomment-1286583

doc raw python repl

opened by pwwang 8

[BUG] relocate() with any_of() does not order column names correctly
datar version checks

[X] I have checked that this issue has not already been reported.

[X] I have confirmed this bug exists on the latest version of datar and its backends.

Issue Description

When a column (in this case 'x2') is not present, relocate() with any_of() does not respect the order passed.

import datar.all as d
from datar import f
import pandas as pd

df = pd.DataFrame({'x1': [1], 'x4': [4], 'x3': [3]})
df >> d.relocate(d.any_of(['x1', 'x2', 'x3', 'x4']))
#>        x1      x4      x3
#>   <int64> <int64> <int64>
#> 0       1       4       3

Expected Behavior

The new order should be 'x1', 'x3', 'x4'

Installed Versions

python      : 3.10.8 | packaged by conda-forge | (main, Nov 24 2022, 14:07:00) [MSC v.1916 64 bit (AMD64)]
datar       : 0.11.1
simplug     : 0.2.2
executing   : 1.2.0
pipda       : 0.11.0
datar-numpy : 0.1.0
numpy       : 1.23.5
datar-pandas: 0.2.0
pandas      : 1.5.2

bug
opened by GitHunter0 0
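
The ordering this report expects can be written down in a few lines of plain Python. This sketch describes the expected behavior only (selected columns first, in the requested order, with missing names skipped); it is not datar's implementation, and `expected_relocate_order` is a hypothetical helper name.

```python
def expected_relocate_order(columns, requested):
    """Requested columns that exist come first, in the requested order;
    the remaining columns keep their original relative order."""
    present = set(columns)
    front = [name for name in requested if name in present]
    moved = set(front)
    rest = [name for name in columns if name not in moved]
    return front + rest


print(expected_relocate_order(["x1", "x4", "x3"], ["x1", "x2", "x3", "x4"]))
```

For the data frame in the report this yields x1, x3, x4, with the absent x2 simply ignored.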
`TibbleGrouped` object is not expandable in VSCode jupyter data viewer
When I create grouped data with datar's group_by(), I get a DataFrameGroupBy object instead of a DataFrame. This is undesirable in VSCode because the object cannot be clicked in the Variables window to see the entire dataframe, whereas mtcars can be clicked to expose the full dataset because it is a DataFrame.

The code below creates grouped data in datar and in pandas; however, datar creates a DataFrameGroupBy instead of a DataFrame.

from datar.all import *
from datar.datasets import mtcars

datar_group = mtcars >> group_by(f.hp) >> count()
pandas_group = mtcars.groupby('hp').size().reset_index().rename(columns={0: "n"})

3rd-party
opened by rleyvasal 9

Releases(0.11.1)

0.11.1(Dec 15, 2022)
🐛 Fix get_versions() not showing plugin versions

🐛 Fix plugins not loaded when loading datasets

🚸 Add github issue templates

What's Changed

0.11.1 by @pwwang in https://github.com/pwwang/datar/pull/164

Full Changelog: https://github.com/pwwang/datar/compare/0.11.0...0.11.1
0.11.0(Dec 15, 2022)
📝 Add testimonials and backend badges in README.md

🐛 Load entrypoint plugins only when APIs are called (#162)

💥 Rename other module to misc

What's Changed

0.11.0 by @pwwang in https://github.com/pwwang/datar/pull/163

Full Changelog: https://github.com/pwwang/datar/compare/0.10.3...0.11.0
0.10.3(Dec 9, 2022)
⬆️ Bump simplug to 0.2.2

✨ Add apis.other.array_ufunc to support numpy ufuncs

💥 Change hook data_api to load_dataset

✨ Allow backend for c[]

✨ Add DatarOperator.with_backend() to select backend for operators

✅ Add tests

📝 Update docs for backend supports

What's Changed

0.10.3 by @pwwang in https://github.com/pwwang/datar/pull/160

Full Changelog: https://github.com/pwwang/datar/compare/0.10.2...0.10.3
0.10.2(Dec 7, 2022)
🚑 Fix false warning when importing from all

What's Changed

0.10.2 by @pwwang in https://github.com/pwwang/datar/pull/159

Full Changelog: https://github.com/pwwang/datar/compare/0.10.1...0.10.2
0.10.1(Dec 5, 2022)
Bump simplug to 0.2

What's Changed

0.10.1 by @pwwang in https://github.com/pwwang/datar/pull/158

Full Changelog: https://github.com/pwwang/datar/compare/0.10.0...0.10.1
0.10.0(Dec 2, 2022)
Detach backend support, so that more backends can be supported more easily in the future

numpy backend: https://github.com/pwwang/datar-numpy

pandas backend: https://github.com/pwwang/datar-pandas

Adopt pipda 0.10 so that functions can be pipeable (#148)

Support pandas 1.5+ (#144), but v1.5.0 excluded (see pandas-dev/pandas#48645)

What's Changed

0.10.0 by @pwwang in https://github.com/pwwang/datar/pull/157

Full Changelog: https://github.com/pwwang/datar/compare/0.9.1...0.10.0
0.9.1(Oct 13, 2022)
Bump pipda to 0.8.0 (fixes #149)

0.9.0(Sep 14, 2022)
Fixes

Fix weighted_mean not handling group variables with NaN values (#137)

Fix weighted_mean on NA raising error instead of returning NA (#139)

Fix pandas .groupby() used internally not inheriting sort, dropna and observed (#138, #142)

Fix mutate/summarise not counting references inside function as used for _keep "used"/"unused"

Fix metadata _datar of nested TibbleGrouped not frozen

Breaking changes

Refactor core.factory.func_factory() (#140)

Use base.c[...] for range short cut, instead of f[...]

Use tibble.fibble() when constructing Tibble inside a verb, instead of tibble.tibble()

Make n a keyword-only argument for base.ntile

Deprecation

Deprecate verb_factory, use register_verb from pipda instead

Deprecate base.data_context

Dependencies

Adopt pipda v0.7.1

Remove varname dependency

Install pdtypes by default

What's Changed

0.9.0 by @pwwang in https://github.com/pwwang/datar/pull/143

Full Changelog: https://github.com/pwwang/datar/compare/0.8.6...0.9.0
0.8.6(Aug 25, 2022)
🐛 Fix weighted_mean not working for grouped data (#133)

✅ Add tests for weighted_mean on grouped data

⚡️ Optimize distinct on existing columns (#128)

What's Changed

🔖 0.8.6 by @pwwang in https://github.com/pwwang/datar/pull/134

Full Changelog: https://github.com/pwwang/datar/compare/0.8.5...0.8.6
0.8.5(May 23, 2022)
What's Changed

🔖 0.8.5 by @pwwang in https://github.com/pwwang/datar/pull/125

🐛 Fix columns missing after Join by same columns using mapping (https://github.com/pwwang/datar/issues/122)

Full Changelog: https://github.com/pwwang/datar/compare/0.8.4...0.8.5
0.8.4(May 14, 2022)
What's Changed

0.8.4 by @pwwang in https://github.com/pwwang/datar/pull/120

➖ Add optional deps to extras so they aren't installed by default

🎨 Give better message when optional packages not installed

Full Changelog: https://github.com/pwwang/datar/compare/0.8.3...0.8.4
0.8.3(May 13, 2022)
⬆️ Upgrade pipda to v0.6

⬆️ Upgrade python-simpleconf to 5.5

0.8.2(May 10, 2022)
♻️ Move glimpse to dplyr (as glimpse is a tidyverse-dplyr API)

🐛 Fix glimpse() output not rendering in qtconsole (#117)

🐛 Fix base.match() for pandas 1.3.0

🐛 Allow base.match() to work with grouping data (#115)

📌 Use rtoml (python-simpleconf) instead of toml (See https://github.com/pwwang/toml-bench)

📌 Update dependencies

0.8.1(Apr 19, 2022)
🐛 Fix month_abb and month_name being truncated (#112)

🐛 Fix unite() not keeping other columns (#111)

0.8.0(Apr 12, 2022)
✨ Support base.glimpse() (#107, machow/siuba#409)

🐛 Register base.factor() and accept grouped data (#108)

✨ Allow configuration file to save default options

💥 Replace option warn_builtin_names with import_names_conflict (#73)

🩹 Attach original __module__ to func_factory registered functions

⬆️ Bump pipda to 0.5.9

0.7.2(Apr 7, 2022)
✨ Allow tidyr.unite() to unite multiple columns into a list, instead of joining them (#105)

🩹 Fix typos in argument names of tidyr.pivot_longer() (#104)

🐛 Fix base.sprintf() not working with Series (#102)

0.7.1(Mar 28, 2022)
🐛 Fix SettingWithCopyWarning in tidyr.pivot_wider()

📌 Pin deps for docs

💚 Don't upload coverage in PR

📝 Fix typos in docs (#99, #100) (Thanks to @pdwaggoner)

0.7.0(Mar 24, 2022)
✨ Support modin as backend 😘

✨ Add _return argument for datar.options()

🐛 Fix tidyr.expand() when nesting(f.name) as argument

0.6.4(Mar 23, 2022)
Breaking changes

🩹 Make base.ntile() labels 1-based (#92)

Fixes

🐛 Fix order_by argument for dplyr.lead-lag

Enhancements

🚑 Allow base.paste/paste0() to work with grouped data

🩹 Change dtypes of base.letters/LETTERS/month_abb/month_name

Housekeeping

📝 Update and fix reference maps

📝 Add environment.yml for binder to work

📝 Update styles for docs

📝 Update styles for API doc in notebooks

📝 Update README for new description about the project and add examples from StackOverflow

0.6.3(Mar 16, 2022)
✨ Allow base.c() to handle groupby data

🚑 Allow base.diff() to work with groupby data

✨ Allow forcats.fct_inorder() to work with groupby data

✨ Allow base.rep()'s arguments length and each to work with grouped data

✨ Allow base.c() to work with grouped data

✨ Allow base.paste()/base.paste0() to work with grouped data

🐛 Force &/| operators to return boolean data

🚑 Fix base.diff() not keeping empty groups

🐛 Fix recycling non-ordered grouped data

🩹 Fix dplyr.count()/tally()'s warning about the new name

🚑 Make dplyr.n() return grouped data

🐛 Make dplyr.slice() work better with rows/indices from grouped data

🩹 Make dplyr.ntile() labels 1-based

✨ Add datar.attrgetter(), datar.pd_str(), datar.pd_cat() and datar.pd_dt()

0.6.2(Mar 12, 2022)
🚑 Fix #87 boolean operator losing index

🚑 Fix false alarm from rename()/relocate() for missing grouping variables (#89)

✨ Add base.diff()

📝 [doc] Update/Fix doc for case_when (#87)

📝 [doc] Fix links in reference map

📝 [doc] Update docs for dplyr.base

0.6.1(Mar 9, 2022)
🐛 Fix rep(df, n) producing a nested df

🐛 Fix TibbleGrouped.__getitem__() not keeping grouping structures

0.6.0(Mar 7, 2022)
Adopt pipda 0.5.7

Reimplement the split-apply-combine rule to solve all performance issues

Drop support for pandas v1.2, require pandas v1.3+

Remove all base0_ options and all indices are now 0-based, except base.seq(), ranks and their variants

Remove messy type annotations for now, will add them back in the future

Move implementation of data type display for frames in terminal and notebook to pdtypes package

Rename all arguments ending with "_" so that they start with "_" instead, to avoid confusion

Move module datar.stats to datar.base.stats

Default all na_rm arguments to True

Rename all ptype arguments for tidyr verbs into dtypes

See more changes: https://pwwang.github.io/datar/CHANGELOG/#060
0.5.6(Feb 3, 2022)
🐛 Hotfix for types registered for base.proportions (#77)

👽️ Fix for pandas 1.4

0.5.5(Dec 28, 2021)
Fix #71: semi_join returns duplicated rows

0.5.4(Oct 21, 2021)
Fix filter() restructures group_data incorrectly (#69)

0.5.3(Oct 5, 2021)
⚡️ Optimize dplyr.arrange when data are series from the df itself

🐛 Fix sub-df order of apply for grouped df (#63)

📝 Update doc for argument by for join functions (#62)

🐛 Fix mean() with option na_rm=False does not work (#65)

0.5.2(Sep 22, 2021)
More of a maintenance release.

🔧 Add metadata for datasets

🔊 Send logs to stderr, instead of stdout

📌 Pin dependency versions

🚨 Switch linter to flake8

📝 Update some docs to fit datar-cli

0.5.1(Sep 16, 2021)
Add documentation about "blind" environment (#45, #54, #55)

Change base.as_date() to return pandas datetime types instead of python datetime types (#56)

Add base.as_pd_date() to be an alias of pandas.to_datetime() (#56)

Expose trimws to datar.all (#58)

0.5.0(Sep 3, 2021)
Added:

Added forcats (#51)

Added base.is_ordered(), base.nlevels(), base.ordered(), base.rank(), base.order(), base.sort(), base.tabulate(), base.append(), base.prop_table() and base.proportions()

Added gss_cat dataset

Fixed:

Fixed an issue when Collection dealing with numpy.int_

Enhanced:

Added base0_ argument for datar.get()

Passed __calling_env to registered functions/verbs when used internally (this makes the library robust in different environments)


Owner

GitHub Repository https://pwwang.github.io/datar/
