Karim shoair committed on
Commit ·
35e5856
1
Parent(s): 5ec834e
docs: Updating the contribution guide
- .github/PULL_REQUEST_TEMPLATE.md +5 -7
- CONTRIBUTING.md +95 -28
- docs/contributing.md +0 -102
- mkdocs.yml +1 -1
.github/PULL_REQUEST_TEMPLATE.md
CHANGED
|
@@ -5,10 +5,8 @@
|
|
| 5 |
|
| 6 |
## Proposed change
|
| 7 |
<!--
|
| 8 |
-
Describe the big picture of your changes here to communicate to the
|
| 9 |
-
|
| 10 |
-
or resolves a feature request, be sure to link to that issue in the
|
| 11 |
-
additional information section.
|
| 12 |
-->
|
| 13 |
|
| 14 |
|
|
@@ -34,12 +32,12 @@
|
|
| 34 |
|
| 35 |
### Additional information
|
| 36 |
<!--
|
| 37 |
-
Details are important
|
| 38 |
Please be sure to fill out additional details, if applicable.
|
| 39 |
-->
|
| 40 |
|
| 41 |
-
- This PR fixes or closes issue: fixes #
|
| 42 |
-
- This PR is related to issue:
|
| 43 |
- Link to documentation pull request: **
|
| 44 |
|
| 45 |
### Checklist:
|
| 5 |
|
| 6 |
## Proposed change
|
| 7 |
<!--
|
| 8 |
+
Describe the big picture of your changes here to communicate to the maintainers why we should accept this pull request.
|
| 9 |
+
If it fixes a bug or resolves a feature request, be sure to link to that issue in the additional information section.
|
| 10 |
-->
|
| 11 |
|
| 12 |
|
| 32 |
|
| 33 |
### Additional information
|
| 34 |
<!--
|
| 35 |
+
Details are important and help maintainers process your PR.
|
| 36 |
Please be sure to fill out additional details, if applicable.
|
| 37 |
-->
|
| 38 |
|
| 39 |
+
- This PR fixes or closes an issue: fixes #
|
| 40 |
+
- This PR is related to an issue: #
|
| 41 |
- Link to documentation pull request: **
|
| 42 |
|
| 43 |
### Checklist:
|
CONTRIBUTING.md
CHANGED
|
@@ -1,39 +1,106 @@
|
|
| 1 |
# Contributing to Scrapling
|
| 2 |
-
Everybody is invited and welcome to contribute to Scrapling. Smaller changes have a better chance to get included in a timely manner. Adding unit tests for new features or test cases for bugs you've fixed help us to ensure that the Pull Request (PR) is fine.
|
| 3 |
|
| 4 |
-
|
| 5 |
-
- If you are not a developer perhaps you would like to help with the [documentation](https://github.com/D4Vinci/Scrapling/tree/docs)?
|
| 6 |
-
- If you are a developer, most of the features I'm planning to add in the future are moved to [roadmap file](https://github.com/D4Vinci/Scrapling/blob/main/ROADMAP.md) so consider reading it.
|
| 7 |
|
| 8 |
-
|
| 9 |
-
```bash
|
| 10 |
-
$ pytest
|
| 11 |
-
=============================== test session starts ===============================
|
| 12 |
-
platform darwin -- Python 3.12.7, pytest-8.3.3, pluggy-1.5.0
|
| 13 |
-
rootdir: /<some_where>/Scrapling
|
| 14 |
-
configfile: pytest.ini
|
| 15 |
-
plugins: cov-5.0.0, anyio-4.6.0
|
| 16 |
-
collected 16 items
|
| 17 |
|
| 18 |
-
|
| 19 |
|
| 20 |
-
=============================== 16 passed in 0.22s ================================
|
| 21 |
-
```
|
| 22 |
-
Also, consider setting the scrapling logging level to `debug` so it's easier to know what's happening in the background.
|
| 23 |
-
```python
|
| 24 |
-
>>> import logging
|
| 25 |
-
>>> logging.getLogger("scrapling").setLevel(logging.DEBUG)
|
| 26 |
-
```
|
| 27 |
|
| 28 |
-
##
|
| 29 |
|
| 30 |
-
|
| 31 |
-
- Fork Scrapling [git repository](https://github.com/D4Vinci/Scrapling).
|
| 32 |
-
- Make your changes.
|
| 33 |
-
- Ensure tests work.
|
| 34 |
-
- Create a Pull Request against the [**dev**](https://github.com/D4Vinci/Scrapling/tree/dev) branch of Scrapling.
|
| 35 |
|
| 36 |
-
|
| 37 |
```commandline
|
| 38 |
pip3 install git+https://github.com/D4Vinci/Scrapling.git@dev
|
| 39 |
```
|
| 1 |
# Contributing to Scrapling
|
| 2 |
|
| 3 |
+
Thank you for your interest in contributing to Scrapling!
|
| 4 |
|
| 5 |
+
Everybody is invited and welcome to contribute to Scrapling.
|
| 6 |
|
| 7 |
+
Minor changes have a better chance of being included promptly. Adding unit tests for new features or test cases for bugs you've fixed helps us ensure that the Pull Request (PR) is acceptable.
|
| 8 |
+
|
| 9 |
+
There are many ways to contribute to Scrapling. Here are some of them:
|
| 10 |
+
|
| 11 |
+
- Report bugs and request features using the [GitHub issues](https://github.com/D4Vinci/Scrapling/issues). Please follow the issue template to help us resolve your issue quickly.
|
| 12 |
+
- Blog about Scrapling. Tell the world how you’re using Scrapling. This will help newcomers with more examples and increase the Scrapling project's visibility.
|
| 13 |
+
- Join the [Discord community](https://discord.gg/EMgGbDceNQ) and share your ideas on how to improve Scrapling. We’re always open to suggestions.
|
| 14 |
+
- If you are not a developer, perhaps you would like to help with translating the [documentation](https://github.com/D4Vinci/Scrapling/tree/docs)?
|
| 15 |
|
| 16 |
|
| 17 |
+
## Finding work
|
| 18 |
|
| 19 |
+
If you have decided to make a contribution to Scrapling, but you do not know what to contribute, here are some ways to find pending work:
|
| 20 |
|
| 21 |
+
- Check out the [contribution](https://github.com/D4Vinci/Scrapling/contribute) GitHub page, which lists open issues tagged as good first issue. These issues provide a good starting point.
|
| 22 |
+
- There are also the [help wanted](https://github.com/D4Vinci/Scrapling/issues?q=is%3Aissue%20label%3A%22help%20wanted%22%20state%3Aopen) issues, but know that some may require familiarity with the Scrapling code base first. You can also target any other issue, provided it is not tagged as `invalid`, `wontfix`, or similar tags.
|
| 23 |
+
- If you enjoy writing automated tests, you can work on increasing our test coverage. Currently, the test coverage is around 90–92%.
|
| 24 |
+
- Join the [Discord community](https://discord.gg/EMgGbDceNQ) and ask questions in the `#help` channel.
|
| 25 |
+
|
| 26 |
+
## Coding style
|
| 27 |
+
Please follow these coding conventions as we do when writing code for Scrapling:
|
| 28 |
+
- We use [pre-commit](https://pre-commit.com/) to automatically address simple code issues before every commit, so please install it and run `pre-commit install` to set it up. This will install hooks to run [ruff](https://docs.astral.sh/ruff/), [bandit](https://github.com/PyCQA/bandit), and [vermin](https://github.com/netromdk/vermin) on every commit. We are currently using a workflow to automatically run these tools on every PR, so if your code doesn't pass these checks, the PR will be rejected.
|
| 29 |
+
- We use type hints for better code clarity and [pyright](https://github.com/microsoft/pyright) for static type checking, which depends on the type hints, of course.
|
| 30 |
+
- We follow the Conventional Commits message format described [here](https://gist.github.com/qoomon/5dfcdf8eec66a051ecd85625518cfd13#types). For example, we use the following prefixes for commit messages:
|
| 31 |
+
|
| 32 |
+
| Prefix | When to use it |
|
| 33 |
+
|-------------|--------------------------|
|
| 34 |
+
| `feat:` | New feature added |
|
| 35 |
+
| `fix:` | Bug fix |
|
| 36 |
+
| `docs:` | Documentation change/add |
|
| 37 |
+
| `test:` | Tests |
|
| 38 |
+
| `refactor:` | Code refactoring |
|
| 39 |
+
| `chore:` | Maintenance tasks |
|
| 40 |
+
|
| 41 |
+
Then include the details of the change in the body/description of the commit message.
|
| 42 |
+
|
| 43 |
+
Example:
|
| 44 |
+
```
|
| 45 |
+
feat: add `adaptive` for similar elements
|
| 46 |
+
|
| 47 |
+
- Added find_similar() method
|
| 48 |
+
- Implemented pattern matching
|
| 49 |
+
- Added tests and documentation
|
| 50 |
+
```
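As a rough illustration of the prefix convention above, the following sketch checks whether the first line of a commit message starts with one of the listed prefixes. The function name, the regular expression, and the decision to ignore optional scopes such as `feat(parser):` are assumptions made for this example; Scrapling's own tooling is not known to include such a check.

```python
import re

# Prefixes taken from the table above; the regex itself is an illustrative assumption.
ALLOWED_PREFIXES = ("feat", "fix", "docs", "test", "refactor", "chore")
SUBJECT_PATTERN = re.compile(r"^(?:%s): \S.*$" % "|".join(ALLOWED_PREFIXES))


def has_valid_prefix(message: str) -> bool:
    """Return True if the first line of a commit message starts with an allowed prefix."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return SUBJECT_PATTERN.match(subject) is not None


example = """feat: add `adaptive` for similar elements

- Added find_similar() method
- Implemented pattern matching
- Added tests and documentation
"""

print(has_valid_prefix(example))            # True
print(has_valid_prefix("added new stuff"))  # False
```

Only the subject line is inspected; the body is free-form, as in the example commit message.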
|
| 51 |
+
|
| 52 |
+
> Please don’t put your name in the code you contribute; git provides enough metadata to identify the author of the code.
|
| 53 |
+
|
| 54 |
+
## Development
|
| 55 |
+
Setting the scrapling logging level to `debug` makes it easier to know what's happening in the background.
|
| 56 |
+
```python
|
| 57 |
+
import logging
|
| 58 |
+
logging.getLogger("scrapling").setLevel(logging.DEBUG)
|
| 59 |
+
```
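Setting the level only controls which records pass the logger's filter; a handler is still needed for the records to be written somewhere. The sketch below attaches one explicitly. The handler, the format string, and the in-memory buffer are assumptions chosen so the result can be inspected; whether Scrapling installs its own handlers is not stated here, and in practice you would likely write to `sys.stderr` instead.

```python
import io
import logging

# "scrapling" is the logger name used in the snippet above; the handler and
# format string are illustrative assumptions, not Scrapling's configuration.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("scrapling")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Emit a record ourselves to confirm that DEBUG messages now pass the level check.
logger.debug("debug logging is enabled")
print(stream.getvalue().strip())
```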
|
| 60 |
+
Bonus: You can install the beta of the upcoming release from the dev branch as follows:
|
| 61 |
```commandline
|
| 62 |
pip3 install git+https://github.com/D4Vinci/Scrapling.git@dev
|
| 63 |
```
|
| 64 |
+
|
| 65 |
+
## Building Documentation
|
| 66 |
+
Documentation is built using [MkDocs](https://www.mkdocs.org/). You can build it locally using the following commands:
|
| 67 |
+
```bash
|
| 68 |
+
pip install mkdocs-material
|
| 69 |
+
mkdocs serve # Local preview
|
| 70 |
+
mkdocs build # Build the static site
|
| 71 |
+
```
|
| 72 |
+
|
| 73 |
+
## Tests
|
| 74 |
+
Scrapling includes a comprehensive test suite that can be executed with pytest. However, first, you need to install all libraries and `pytest-plugins` listed in `tests/requirements.txt`. Then, running the tests will result in an output like this:
|
| 75 |
+
```bash
|
| 76 |
+
$ pytest tests -n auto
|
| 77 |
+
=============================== test session starts ===============================
|
| 78 |
+
platform darwin -- Python 3.13.8, pytest-8.4.2, pluggy-1.6.0 -- /Users/<redacted>/.venv/bin/python3.13
|
| 79 |
+
cachedir: .pytest_cache
|
| 80 |
+
rootdir: /Users/<redacted>/scrapling
|
| 81 |
+
configfile: pytest.ini
|
| 82 |
+
plugins: asyncio-1.2.0, anyio-4.11.0, xdist-3.8.0, httpbin-2.1.0, cov-7.0.0
|
| 83 |
+
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
|
| 84 |
+
10 workers [271 items]
|
| 85 |
+
scheduling tests via LoadScheduling
|
| 86 |
+
|
| 87 |
+
...<shortened>...
|
| 88 |
+
|
| 89 |
+
=============================== 271 passed in 52.68s ==============================
|
| 90 |
+
```
|
| 91 |
+
We used `-n auto` in the command above so that `pytest-xdist` distributes the tests across parallel worker processes, which speeds up the run.
|
| 92 |
+
|
| 93 |
+
Bonus: You can also see the test coverage with the `pytest-cov` plugin, as shown below:
|
| 94 |
+
```bash
|
| 95 |
+
pytest --cov=scrapling tests/
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
## Making a Pull Request
|
| 99 |
+
To ensure that your PR gets accepted, please make sure that your PR is based on the latest changes from the dev branch and that it satisfies the following requirements:
|
| 100 |
+
|
| 101 |
+
- The PR should be made against the [**dev**](https://github.com/D4Vinci/Scrapling/tree/dev) branch of Scrapling. Any PR made against the main branch will be rejected.
|
| 102 |
+
- The code should pass all available tests. We use tox with GitHub's CI to run the current tests on all supported Python versions with every commit.
|
| 103 |
+
- The code should pass all the code quality checks mentioned above. We use GitHub's CI to enforce the code style checks performed by pre-commit. If you use the pre-commit hooks discussed above, you should not see any issues when committing your changes.
|
| 104 |
+
- Make your changes, keep the code clean with an explanation of any part that might be vague, and remember to create a separate virtual environment for this project.
|
| 105 |
+
- If you are adding a new feature, please add tests for it.
|
| 106 |
+
- If you are fixing a bug, please add code with the PR that reproduces the bug.
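The last two requirements can be met with a small pytest-style regression test. The sketch below is only an illustration: `clean_text` is a hypothetical helper standing in for the code being fixed and is not part of Scrapling, and the failing input is invented for the example.

```python
# Hypothetical helper standing in for the code under test; it is not part of Scrapling.
def clean_text(value: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(value.split())


def test_clean_text_collapses_newlines_and_tabs():
    # Input that reproduces the (hypothetical) reported bug.
    raw = "  Hello,\n\tworld  "
    assert clean_text(raw) == "Hello, world"


def test_clean_text_handles_empty_input():
    assert clean_text("") == ""
```

A test like this should fail before the fix and pass after it, which lets reviewers confirm both the bug and its resolution.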
|
docs/contributing.md
DELETED
|
@@ -1,102 +0,0 @@
|
|
| 1 |
-
Thank you for your interest in contributing to Scrapling!
|
| 2 |
-
|
| 3 |
-
Everybody is invited and welcome to contribute to Scrapling.
|
| 4 |
-
|
| 5 |
-
Smaller changes have a better chance of getting included in a timely manner. Adding unit tests for new features or test cases for bugs you've fixed helps us to ensure that the Pull Request (PR) is acceptable.
|
| 6 |
-
|
| 7 |
-
There is a lot to do...
|
| 8 |
-
|
| 9 |
-
- If you are not a developer, you can help us improve the documentation.
|
| 10 |
-
- If you are a developer, most of the features I'm planning to add in the future are moved to [roadmap file](https://github.com/D4Vinci/Scrapling/blob/main/ROADMAP.md), so consider reading it.
|
| 11 |
-
|
| 12 |
-
## Running tests
|
| 13 |
-
Scrapling includes a comprehensive test suite that can be executed with pytest, but first, you need to install all libraries and `pytest-plugins` inside `tests/requirements.txt`. Then, running the tests will result in an output like this:
|
| 14 |
-
```bash
|
| 15 |
-
$ pytest tests
|
| 16 |
-
=============================== test session starts ===============================
|
| 17 |
-
platform darwin -- Python 3.12.8, pytest-8.3.3, pluggy-1.5.0 -- /Users/<redacted>/.venv/bin/python3.12
|
| 18 |
-
cachedir: .pytest_cache
|
| 19 |
-
rootdir: /Users/<redacted>/scrapling
|
| 20 |
-
configfile: pytest.ini
|
| 21 |
-
plugins: cov-5.0.0, asyncio-0.25.0, base-url-2.1.0, httpbin-2.1.0, playwright-0.5.2, anyio-4.6.2.post1, xdist-3.6.1, typeguard-4.3.0
|
| 22 |
-
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=function
|
| 23 |
-
collected 83 items
|
| 24 |
-
|
| 25 |
-
...<shortened>...
|
| 26 |
-
|
| 27 |
-
=============================== 83 passed in 157.52s (0:02:37) =====================
|
| 28 |
-
```
|
| 29 |
-
Hence, you can add `-n auto` to the command above to run tests in threads to increase speed.
|
| 30 |
-
|
| 31 |
-
Bonus: You can also see the test coverage with the pytest plugin below
|
| 32 |
-
```bash
|
| 33 |
-
pytest --cov=scrapling tests/
|
| 34 |
-
```
|
| 35 |
-
|
| 36 |
-
## Installing the latest unstable version from the dev branch
|
| 37 |
-
```bash
|
| 38 |
-
pip3 install git+https://github.com/D4Vinci/Scrapling.git@dev
|
| 39 |
-
```
|
| 40 |
-
|
| 41 |
-
## Development
|
| 42 |
-
Setting the scrapling logging level to `debug` makes it easier to know what's happening in the background.
|
| 43 |
-
```python
|
| 44 |
-
>>> import logging
|
| 45 |
-
>>> logging.getLogger("scrapling").setLevel(logging.DEBUG)
|
| 46 |
-
```
|
| 47 |
-
### Code Style
|
| 48 |
-
|
| 49 |
-
We use:
|
| 50 |
-
|
| 51 |
-
1. Type hints for better code clarity
|
| 52 |
-
2. Flake8, bandit, isort, and other hooks through `pre-commit`. <br/>Please install the hooks before committing with:
|
| 53 |
-
```bash
|
| 54 |
-
pip install pre-commit
|
| 55 |
-
pre-commit install
|
| 56 |
-
```
|
| 57 |
-
It will run automatically on the code you push with each commit.
|
| 58 |
-
3. Conventional commit messages format. We use the below format for commit messages
|
| 59 |
-
|
| 60 |
-
| Prefix | When to use it |
|
| 61 |
-
|-------------|--------------------------|
|
| 62 |
-
| `feat:` | New feature added |
|
| 63 |
-
| `fix:` | Bug fix |
|
| 64 |
-
| `docs:` | Documentation change/add |
|
| 65 |
-
| `test:` | Tests |
|
| 66 |
-
| `refactor:` | Code refactoring |
|
| 67 |
-
| `chore:` | Maintenance tasks |
|
| 68 |
-
|
| 69 |
-
Example:
|
| 70 |
-
```
|
| 71 |
-
feat: add `adaptive` for similar elements
|
| 72 |
-
|
| 73 |
-
- Added find_similar() method
|
| 74 |
-
- Implemented pattern matching
|
| 75 |
-
- Added tests and documentation
|
| 76 |
-
```
|
| 77 |
-
|
| 78 |
-
### Push changes to the library
|
| 79 |
-
|
| 80 |
-
Then, the process is straightforward.
|
| 81 |
-
|
| 82 |
-
- Read [How to get faster PR reviews](https://github.com/kubernetes/community/blob/master/contributors/guide/pull-requests.md#best-practices-for-faster-reviews) by Kubernetes (but skip step 0 and 1)
|
| 83 |
-
- Fork Scrapling [Git repository](https://github.com/D4Vinci/Scrapling.git).
|
| 84 |
-
- Make your changes, and don't forget to create a separate virtual environment for this project.
|
| 85 |
-
- Ensure all tests are passing.
|
| 86 |
-
- Create a Pull Request against the [**dev**](https://github.com/D4Vinci/Scrapling/tree/dev) branch of Scrapling.
|
| 87 |
-
|
| 88 |
-
A bonus: if you have more than one version of Python installed, you can use tox to run tests on each version with:
|
| 89 |
-
```bash
|
| 90 |
-
pip install tox
|
| 91 |
-
tox
|
| 92 |
-
```
|
| 93 |
-
|
| 94 |
-
> Note: All tests are automatically run with each push on Github on all supported Python versions using tox, so ensure all tests pass, or your PR will not be accepted.
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
## Building Documentation
|
| 98 |
-
```bash
|
| 99 |
-
pip install mkdocs-material
|
| 100 |
-
mkdocs serve # Local preview
|
| 101 |
-
mkdocs build # Build the static site
|
| 102 |
-
```
|
mkdocs.yml
CHANGED
|
@@ -79,7 +79,7 @@ nav:
|
|
| 79 |
- Writing your retrieval system: development/adaptive_storage_system.md
|
| 80 |
- Using Scrapling's custom types: development/scrapling_custom_types.md
|
| 81 |
- Support and Advertisement: donate.md
|
| 82 |
-
- Contributing:
|
| 83 |
- Changelog: 'https://github.com/D4Vinci/Scrapling/releases'
|
| 84 |
|
| 85 |
markdown_extensions:
|
| 79 |
- Writing your retrieval system: development/adaptive_storage_system.md
|
| 80 |
- Using Scrapling's custom types: development/scrapling_custom_types.md
|
| 81 |
- Support and Advertisement: donate.md
|
| 82 |
+
- Contributing: 'https://github.com/D4Vinci/Scrapling/blob/main/CONTRIBUTING.md'
|
| 83 |
- Changelog: 'https://github.com/D4Vinci/Scrapling/releases'
|
| 84 |
|
| 85 |
markdown_extensions:
|