What
This is a tool that lets you build a personal index of web pages and then search it with keywords. It is composed of a web app and a Firefox extension: the web app lets you add pages to your index by URL, while the extension lets you upload the page you are currently viewing. The tool is currently in beta, so please be patient if you encounter any bugs. If you have any suggestions, please let me know on GitHub.
How
To use
- Create an account: You can use a temporary email service like Temp-Mail or a tool like Firefox Relay. It's fine if you choose either of those options; I just needed a way to differentiate users and their indexes.
- Log in: Enter your credentials to access your account.
- Add URLs to scrape: Provide the software with a URL to scrape. I understand that it might be considered unethical to take content from others, but I couldn't think of a better approach. The scraped content will be added to your index.
- Repeat the process: You can repeat step 3 multiple times to add more content to your index. However, please note that this functionality is currently limited due to the risk of IP banning or blacklisting.
- Alternatively, use the Firefox extension: I have created a Firefox extension that lets you upload your current page directly to your index. This extension is useful because it has no upload limit, and it can capture personal data displayed on dynamic pages. However, please be cautious: I am still a random person on the internet.
- Manage your pages: You can add categories to your pages in the "/me/pages" section.
- Search for pages: Visit the website and use the search function with your keywords (the query is sent directly to MeiliSearch). You will receive a list of pages from your index that match the query.
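Since the query goes straight to MeiliSearch, the search step above boils down to one POST against its search endpoint. Here is a minimal sketch of what that request looks like; the host, index name, and API key are placeholders, not the real ones used by the service.

```python
# Build (but don't send) the POST request MeiliSearch expects for a search.
# Host, index name, and API key below are made up for illustration.
import json
from urllib import request

def build_search_request(query, host="https://search.example.com",
                         index="pages", api_key="YOUR_KEY"):
    """Return a urllib Request for MeiliSearch's search endpoint."""
    url = f"{host}/indexes/{index}/search"
    body = json.dumps({"q": query}).encode()
    return request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_search_request("rust async tutorial")
print(req.full_url)
```

Sending `req` with `urllib.request.urlopen` would return a JSON body whose `hits` array contains the matching pages from your index.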
Limitations
Right now, the main limitation is on the number of pages that can be scraped: I have set a limit of 5 pages per month, because I don't want to get my IP address blacklisted. As a stopgap, I am currently routing the requests through a proxy, but I am still working on a proper solution. Suggestions are welcome. Also, by "scraping" I mean a simple GET request to the page (so just its text), so if you need to log in to see the content, you can't scrape it (but you can use the extension).
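To make the "simple GET request" concrete, here is a rough sketch of the whole scraping step under those assumptions: one GET, optionally routed through a proxy, then the HTML reduced to the plain text that gets indexed. The proxy address is a placeholder, and this is not the service's actual code.

```python
# Sketch of "scraping": one GET (optionally via a proxy), then strip the
# HTML down to visible text for indexing. The proxy URL is hypothetical.
from html.parser import HTMLParser
from urllib import request

class TextExtractor(HTMLParser):
    """Collect the visible text chunks of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    """Return the plain text of an HTML string, whitespace-joined."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

def scrape(url, proxy=None):
    """GET the page (through `proxy` if given) and return its plain text."""
    handlers = []
    if proxy:  # e.g. proxy="http://my-proxy.example.com:8080"
        handlers.append(request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = request.build_opener(*handlers)
    html = opener.open(url, timeout=10).read().decode("utf-8", "replace")
    return extract_text(html)

print(extract_text("<html><body><h1>Hello</h1><p>indexed text</p></body></html>"))
```

Because this is a plain GET with no cookies or session, anything behind a login comes back as the login page, which is exactly why the extension exists.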
Privacy
Well, of course, if I want to index every uploaded web page, I see no alternative to storing the clear/plain text (note that this also applies to personal data indexed through the extension). If there is a way to protect the indexed text as well, please let me know. I ask only for an email and save only 2 cookies (that I know of): a JWT and an id. There are no analytics. The extension is 1.6 kB of JavaScript; you can look it up yourself (just rename the file from .xpi to .zip).
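One thing worth knowing about the JWT cookie: its payload is just base64url-encoded JSON, readable by anyone without the signing key, so you can inspect exactly what it carries. A sketch, using a made-up token (the real one is signed by the server and its payload fields are not shown here):

```python
# A JWT is three base64url segments: header.payload.signature. The payload
# can be read without verification. The token built below is a fake example.
import base64
import json

def jwt_payload(token):
    """Return the (unverified) payload of a JWT as a dict."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build a fake token just to demonstrate; the claim names are hypothetical.
header = base64.urlsafe_b64encode(b'{"alg":"HS256","typ":"JWT"}').rstrip(b"=")
payload = base64.urlsafe_b64encode(b'{"sub":"user-42"}').rstrip(b"=")
token = b".".join([header, payload, b"sig"]).decode()

print(jwt_payload(token))
```

Reading the payload tells you what the server stores about you in the cookie; verifying the signature (which requires the server's key) is a separate matter.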