
Same here. But have you tried DuckDB? You can run SQL directly on pandas DataFrames, and it is fast af.

https://duckdb.org/2021/05/14/sql-on-pandas.html




  import duckdb
  import pandas as pd
  mydf = pd.DataFrame({'a' : [1, 2, 3]})
  print(duckdb.query("SELECT SUM(a) FROM mydf").to_df())
I can see the appeal, but if you're working in Python, having to write variable names out as strings doesn't sit right with me. E.g., if I want to refactor the code, my LSP or parser won't pick up those references.

> The SQL table name mydf is interpreted as the local Python variable mydf [...] Not only is this process painless, it is highly efficient.

It might be painless and convenient at first, but I feel like this could get you in trouble down the line. Is there a way to avoid this?



DuckDB is sick. You can also query Parquet files directly, among other formats.



