Open-source · github.com/YuZh98/latex2arxiv
Submit to arXiv without the headache. One command cleans your project, catches rejection-causing errors, and walks you through the upload.
| |
Your original project is never modified. All output goes to a new _arxiv.zip file.
Try the built-in demo:
    latex2arxiv --demo --compile --guide
This processes a bundled self-documenting paper, opens the cleaned PDF, and writes a step-by-step arXiv upload guide with copy-paste-ready metadata. The cleaned demo’s PDF is attached to every GitHub Release as demo_project_arxiv.pdf.
Before / After
On a real statistics paper (arXiv:2504.11630): 934 → 40 files, 80.6 MB → 3.1 MB.

| Before (Overleaf export) | After (latex2arxiv output) |
|---|---|
| Images/ | Images/ |
| JASA_main.tex | JASA_main.tex |
| JASA_main_backup.tex | ref.bib |
| main_bak_svm.tex | Supplementary_Materials.tex |
| cover_letter.md | |
| response.tex | |
| ref.bib | |
| JASA_main.aux/.log/.bbl/.pdf | |
| jasa_comments/, jasa_revision/ | |
| … (and ~930 more) | |
| 934 files, 80.6 MB | 40 files, 3.1 MB |
Who is this for?
You’ve never submitted to arXiv before. Your project compiles locally. arXiv might still reject it for reasons nobody warned you about. latex2arxiv paper.zip --compile --guide flags the rejection-causing issues and writes you a copy-paste-ready upload walkthrough.
You wrote it in Overleaf. Overleaf exported hundreds of files and messy .tex sources, and you need to tidy everything up safely. See the Overleaf → arXiv quickstart.
You’re CI-gating a paper repo. latex2arxiv paper.zip --dry-run exits non-zero on rejection-causing errors. Drop it into your build matrix.
Your paper has revision tracking. \added{}, \deleted{}, \textcolor{red}{}: all gone, with no manual cleanup. See Custom removal rules.
What it does
| Feature | What it does |
|---|---|
| One command, any input | Accepts a .zip, directory, or git URL; outputs an arXiv-ready .zip; optionally compiles and opens the PDF for review |
| Prunes your project to submission-ready | Keeps only files reachable from your main .tex; removes build artifacts, editor files, cover letters, unused figures |
| Cleans your .tex | Strips comments, removes \todo{} / \hl{} / draft packages, handles nested braces correctly (\deleted{see \cite{x}} works) |
| Catches submission blockers before you upload | [error] for shell-escape packages that will fail on arXiv (minted, pythontex); [warn] for biblatex without .bbl, missing index files, oversized output, undefined citations, problematic filenames (full list in docs/pre-flight.md) |
| Guides you through upload | --guide extracts title, authors, abstract, page/figure/table counts and writes a step-by-step arXiv upload walkthrough |
Also: --flatten (single-file output, docs), --json (CI integration, schema), --resize (image downscaling), --dry-run (preview without writing), BibTeX normalization, \pdfoutput=1 injection.
Dependency tracking respects \input, \include, \subfile, \includegraphics, \graphicspath, and \bibliography. Commented-out commands are ignored.
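The pruning rule above can be pictured as a graph walk from the main .tex file. The following is a minimal sketch under stated assumptions, not latex2arxiv's actual implementation: the function names, the in-memory `project` dictionary, and the list of image extensions are all hypothetical, \graphicspath is applied per file for simplicity, and the comment regex does not handle a `\\%` sequence.

```python
import re

# Commands whose argument names another project file (the list given above).
REF = re.compile(
    r"\\(input|include|subfile|includegraphics|bibliography)"
    r"\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}"
)
GRAPHICSPATH = re.compile(r"\\graphicspath\s*\{((?:\s*\{[^}]*\})+)\s*\}")
COMMENT = re.compile(r"(?<!\\)%.*")  # an unescaped % starts a comment
IMAGE_EXTS = ("", ".pdf", ".png", ".jpg", ".jpeg", ".eps")  # assumed list


def strip_comments(tex):
    """Drop everything after an unescaped % on each line."""
    return "\n".join(COMMENT.sub("", line) for line in tex.splitlines())


def candidates(cmd, arg, image_dirs):
    """Yield every path a single reference could point at."""
    arg = arg.strip()
    if cmd == "bibliography":
        for name in arg.split(","):
            name = name.strip()
            yield name if name.endswith(".bib") else name + ".bib"
    elif cmd == "includegraphics":
        for folder in image_dirs:
            for ext in IMAGE_EXTS:
                yield folder + arg + ext
    else:  # input, include, subfile
        yield arg if arg.endswith(".tex") else arg + ".tex"


def reachable(files, main):
    """Return the paths in `files` that are reachable from `main`.

    `files` maps project-relative paths to text ("" for binary assets).
    """
    keep, queue = set(), [main]
    while queue:
        path = queue.pop()
        if path in keep or path not in files:
            continue
        keep.add(path)
        if not path.endswith(".tex"):
            continue
        tex = strip_comments(files[path])
        image_dirs = [""]
        for group in GRAPHICSPATH.findall(tex):
            image_dirs += re.findall(r"\{([^}]*)\}", group)
        for cmd, arg in REF.findall(tex):
            queue.extend(candidates(cmd, arg, image_dirs))
    return keep


# Hypothetical project used only for illustration.
project = {
    "main.tex": r"""\documentclass{article}
\graphicspath{{Images/}}
\input{sec/intro}
% \input{old_section}
\includegraphics[width=\linewidth]{fig1}
\bibliography{ref}
""",
    "sec/intro.tex": r"Done 100\% \includegraphics{fig2.png}",
    "old_section.tex": "stale text",
    "Images/fig1.pdf": "",
    "Images/unused.png": "",
    "fig2.png": "",
    "ref.bib": "",
    "cover_letter.md": "",
}
print(sorted(reachable(project, "main.tex")))
```

In this sketch, old_section.tex is dropped because its \input is commented out, and Images/unused.png and cover_letter.md are dropped because nothing references them.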
Upload guide (--guide)
Pass --guide and latex2arxiv writes a plain-text file alongside your output zip with everything you need for the arXiv upload form:
| |
No more guessing what goes where.
Works everywhere
Terminal (one command, full pipeline):
| |
CI (gate your paper repo on arXiv compliance):
| |
AI agents (Claude, Cursor, or Copilot validate and fix issues in conversation):
| |
| |
Installation
    pip install latex2arxiv
If you get an externally-managed-environment error from pip, use pipx:
    pipx install latex2arxiv
On macOS, install via Homebrew (no Python toolchain required):
    brew tap YuZh98/latex2arxiv && brew install latex2arxiv
The first brew install builds Pillow from source, which can run silently for 5+ minutes. Add --verbose to monitor installation progress.
Or from source:
| |
pdflatex is required only for --compile (install via TeX Live or MacTeX).
Usage
| |
input can be a .zip file, a directory of LaTeX sources, or a git URL (https or ssh). Directories are zipped internally; git URLs are cloned with --depth 1.
| Flag | Description |
|---|---|
| --main FILENAME | Specify the main .tex file (e.g. JASA_main.tex). Auto-detected via \documentclass if omitted. |
| --resize PX | Resize images so the longest side is ≤ PX pixels (e.g. --resize 1600). Requires Pillow. |
| --config FILE | YAML config file for custom removal rules (see below). |
| --compile | Run pdflatex on the output and open the resulting PDF. |
| --guide | Write a detailed arXiv upload guide (metadata + step-by-step instructions) to a text file alongside the output. |
| --dry-run | Preview what would be removed/processed without writing any output. |
| --flatten | Inline every \input / \include / \subfile into the main .tex for single-file output. Details. |
| --json | Emit a machine-readable JSON summary on stdout; route progress to stderr. Schema. |
| --demo | Run the built-in demo project (no input file needed). |
| --version | Print version and exit. |
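The --resize rule in the table (longest side ≤ PX) comes down to a small piece of arithmetic. The sketch below shows one way to compute the target size. It assumes the aspect ratio is preserved and that images already within the limit are left alone; neither is confirmed by the text above. The function name is hypothetical, and the Pillow call that would perform the actual resampling is left out.

```python
def resized_dimensions(width, height, max_side):
    """Scale (width, height) so that the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # assumption: never upscale small images
    scale = max_side / longest
    # Keep at least one pixel on the short side of very thin images.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(resized_dimensions(3200, 2400, 1600))  # (1600, 1200)
```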
Examples
| |
Pre-flight checks
Before producing the output zip, latex2arxiv validates the project against arXiv’s LaTeX submission guide. [error] lines block submission (the tool exits non-zero, useful for CI gating); [warn] lines are advisory and do not affect the exit code.
| |
Either [error] line would have caused arXiv to reject the submission after upload. The exit code is non-zero on errors, so a CI step like latex2arxiv paper.zip --dry-run fails the build before the bad submission ever leaves the repo.
See docs/pre-flight.md for the full list of checks and silent fixes.
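The severity rule described in this section ([error] fails the run, [warn] does not) can be sketched as a tiny function. This is an illustration only: the function name is hypothetical, the report lines are placeholders and not real tool output, and the specific value 1 is an assumption, since the text only says the exit code is non-zero.

```python
def exit_code(report_lines):
    """Return 1 if any line is tagged [error], otherwise 0."""
    has_error = any(
        line.lstrip().startswith("[error]") for line in report_lines
    )
    return 1 if has_error else 0


# Placeholder report lines; the wording is illustrative only.
report = [
    "[warn] placeholder advisory message",
    "[error] placeholder blocking message",
]
print(exit_code(report))
```

A warnings-only report yields 0 under this rule, which is why advisory findings never break a CI step.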
Custom removal rules (--config)
For revision markup and other project-specific cleanup, create a YAML config file. A template is in arxiv_config.yaml.
If your project root contains arxiv_config.yaml, it is applied automatically, with no need to pass --config.
| |
The brace-balanced matcher correctly handles nested commands like \deleted{see \cite{x}}. Unknown top-level keys trigger a warning, so typos like command_to_delete (singular) no longer silently no-op.
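To see why brace balancing matters, consider what a naive regex such as `\\deleted\{.*?\}` would do with \deleted{see \cite{x}}: it stops at the first closing brace and leaves a stray `}` behind. The sketch below counts brace depth instead. It is a minimal illustration of the technique, not the tool's own matcher; the function name is hypothetical, and it only covers the "delete command and contents" case.

```python
def remove_command(tex, name):
    r"""Delete every \name{...} from tex, matching braces by depth."""
    marker = "\\" + name + "{"
    out, i = [], 0
    while True:
        start = tex.find(marker, i)
        if start == -1:
            out.append(tex[i:])
            return "".join(out)
        out.append(tex[i:start])
        depth, j = 1, start + len(marker)
        while j < len(tex) and depth:
            if tex[j] == "\\":
                j += 2  # skip escaped characters such as \{ and \}
                continue
            depth += {"{": 1, "}": -1}.get(tex[j], 0)
            j += 1
        if depth:
            # Unbalanced braces: leave the remainder untouched.
            out.append(tex[start:])
            return "".join(out)
        i = j


print(remove_command(r"Keep \deleted{see \cite{x}} this.", "deleted"))
```

The nested \cite{x} raises the depth to 2 and lowers it back to 1, so only the final brace closes the \deleted argument.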
latex2arxiv vs. arxiv_latex_cleaner
arxiv_latex_cleaner is the incumbent: Google-backed, mature, and good at cleaning. Here’s how the two compare on the things that change your workflow.
What only latex2arxiv does
| | latex2arxiv | arxiv_latex_cleaner |
|---|---|---|
| Pre-flight [error] / [warn] (details) | ✓ | ✗ |
| Upload walkthrough (--guide) | ✓ | ✗ |
| Non-zero exit on errors (CI-gateable) | ✓ | ✗ |
| Outputs the .zip you upload | ✓ | ✗ |
| MCP server (Claude / Cursor / Copilot) | ✓ | ✗ |
| GitHub Action + pre-commit hook | ✓ | ✗ |
| VS Code extension | ✓ | ✗ |
| Multiple input forms (.zip / directory / git URL) | ✓ | ✗ |
| --compile preview | ✓ | ✗ |
| --dry-run | ✓ | ✗ |
| --demo | ✓ | ✗ |
| Auto-detect main .tex | ✓ | ✗ |
| Brace-balanced config | ✓ | ✗ |
What only arxiv_latex_cleaner does
| | latex2arxiv | arxiv_latex_cleaner |
|---|---|---|
| PDF compression (Ghostscript) | ✗ | ✓ |
| PNG → JPG conversion | ✗ | ✓ |
If you need image transcoding for size, run arxiv_latex_cleaner first, or use latex2arxiv --resize PX.
Both do
BibTeX normalization · image resizing (Pillow).
Maturity
| latex2arxiv | arxiv_latex_cleaner |
|---|---|
| v1.0 production-stable · 380 tests · Python 3.10–3.13 matrix · live pdflatex+biber end-to-end CI · 10 regression fixtures | ~5k★, years in production |
Integrations
| Surface | Status | Details |
|---|---|---|
| CLI | ✓ | pip install latex2arxiv |
| GitHub Action | ✓ | action.yml |
| pre-commit hook | ✓ | latex2arxiv-dryrun |
| MCP server (AI agents) | ✓ | pip install "latex2arxiv[mcp]"; see setup |
| VS Code extension | ✓ | Marketplace: ext install YuZh98.latex2arxiv |
| Homebrew formula | ✓ | brew tap YuZh98/latex2arxiv && brew install latex2arxiv |
Known limitations
Dynamically constructed filenames: \includegraphics{\figpath/fig1} cannot be resolved statically, so the image will be deleted. Expand path macros before running.
\subfile vs \input path resolution: \input/\include paths resolve relative to the project root; \subfile paths resolve relative to the subfile’s own directory. Unusual nested setups may cause images to be incorrectly pruned; use --compile to verify.
--compile is a local sanity check: a successful local compile doesn’t guarantee that arXiv will compile the project, because arXiv pins specific TeX Live versions. Always check the arXiv submission preview after uploading.
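For the first limitation, "expand path macros" means replacing a macro such as \figpath with its literal value inside \includegraphics arguments before running the tool. The sketch below is one hypothetical way to do that as a preprocessing step. It is not part of latex2arxiv, the function name is made up, and it only understands argument-free definitions written as \newcommand{\name}{value} or \renewcommand{\name}{value}.

```python
import re

# Matches \newcommand{\name}{value} with no arguments and no nested braces.
DEFINITION = re.compile(
    r"\\(?:newcommand|renewcommand)\s*\{\\([A-Za-z]+)\}\s*\{([^{}]*)\}"
)
INCLUDE = re.compile(r"(\\includegraphics(?:\[[^\]]*\])?\{)([^}]*)\}")


def expand_path_macros(tex):
    """Substitute simple macros inside \\includegraphics arguments."""
    macros = dict(DEFINITION.findall(tex))

    def expand(match):
        arg = match.group(2)
        for name, value in macros.items():
            # (?![A-Za-z]) keeps \fig from matching inside \figpath.
            pattern = r"\\" + name + r"(?![A-Za-z])"
            arg = re.sub(pattern, lambda _m, v=value: v, arg)
        return match.group(1) + arg + "}"

    return INCLUDE.sub(expand, tex)


source = r"""\newcommand{\figpath}{figures}
\includegraphics[width=\linewidth]{\figpath/fig1}
"""
print(expand_path_macros(source))
```

After this step the reference reads \includegraphics[width=\linewidth]{figures/fig1}, which a static scan can resolve.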
FAQ
1. arXiv rejected my submission even though latex2arxiv said it was clean.
Pre-flight catches the documented submission-blocking patterns. arXiv pins specific TeX Live versions and occasionally surfaces new edge cases, so always run the arXiv submission preview after upload. If you hit a reproducible miss, file an issue with your project zip.
2. What’s the difference between [error] and [warn]?
Errors block submission and make the tool exit non-zero; use them to gate CI. Warnings are advisory: the build will likely succeed on arXiv, but a human should look. Example: a missing .bbl is a [warn] (arXiv will run BibTeX); \usepackage{minted} is an [error] (shell-escape isn’t allowed).
3. My main .tex isn’t being auto-detected correctly.
Auto-detection ranks files containing \documentclass by \input reference count. For ambiguous projects (response letters next to the paper, multiple \documentclass files), pass --main paper.tex explicitly.
4. Will this modify my original files?
No. All output goes to a new _arxiv.zip (or whatever path you pass). The source project is read-only.
5. My CI step keeps failing on what I thought were just warnings.
Warnings don’t fail CI. If your build is failing, it’s an [error], so read the message. Use --json for a machine-readable summary.
6. Why does brew install hang for 5+ minutes?
Homebrew compiles Pillow’s C extensions from source and suppresses progress output. Add --verbose to see what’s happening.
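The auto-detection rule from FAQ 3 can be sketched as a ranking over candidate files. The sketch reads "\input reference count" as the number of \input, \include, and \subfile commands a file contains. That reading is an assumption, as are the function name and the tie-breaking (first candidate wins); comments are not stripped here.

```python
import re

# Counts \input{...}, \include{...} and \subfile{...}, not \includegraphics.
PULLS_IN = re.compile(r"\\(?:input|include|subfile)\s*\{")


def detect_main(files):
    r"""Pick the \documentclass file that pulls in the most other files."""
    candidates = [p for p, tex in files.items() if "\\documentclass" in tex]
    if not candidates:
        return None
    return max(candidates, key=lambda p: len(PULLS_IN.findall(files[p])))


# Hypothetical project: a response letter sitting next to the paper.
project = {
    "response.tex": r"\documentclass{letter} \input{reply}",
    "paper.tex": r"\documentclass{article} \input{intro} \input{method} \include{appendix}",
    "intro.tex": "Introduction text",
}
print(detect_main(project))
```

Here paper.tex wins with three references against one. When two candidates tie, a heuristic like this has no principled answer, which is the situation where passing --main explicitly is the safer choice.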
Found this useful? Star the project on GitHub; it helps others find the tool.
Issues or feature requests: github.com/YuZh98/latex2arxiv/issues
Install: pip install latex2arxiv · brew install latex2arxiv (after brew tap YuZh98/latex2arxiv)
Try the demo: latex2arxiv --demo --compile --guide
Made by Hugh Zheng · MIT License