irc-rss-feed-bot
irc-rss-feed-bot is a dockerized Python 3.11 IRC bot that posts entries from RSS/Atom feeds and from scraped HTML, JSON, and CSV sources. It posts the entries of feeds in IRC channels, one entry per message. More specifically, it posts the titles and shortened URLs of entries.
Features
- Multiple channels on an IRC server are supported, with each channel having its own set of feeds. For use with multiple servers, a separate instance of the bot process can be run for each server.
- Entries are posted only if the channel has not had any conversation for a certain minimum amount of time, thereby avoiding the interruption of any preexisting conversations. This amount of time is 15 minutes for any feed which has a polling period greater than 12 minutes. There is however no delay for any feed which has a polling period less than or equal to 12 minutes as such a feed is considered urgent.
- A SQLite database file records hashes of the entries that have been posted, thereby preventing them from being reposted.
- Posted URLs are shortened using the da.gd service.
- The hext, jmespath, and pandas DSLs are supported for flexibly parsing arbitrary HTML, JSON, and CSV content respectively. These parsers also support configurable recursive crawling.
- Entry titles are formatted for neatness. Any HTML tags and excessive whitespace are stripped, all-caps are replaced, and excessively long titles are sanely truncated.
- A TTL and ETag based compressed disk cache of URL content is used for preventing unnecessary URL reads. Any websites with a mismatched strong ETag are probabilistically detected, and this caching is then disabled for them for the duration of the process. Note that this detection is skipped for a weak ETag.
- Encoded Google News and FeedBurner URLs are decoded.
For several more features, see the customizable global and feed-specific settings, and commands.
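The conversation-avoidance rule in the feature list can be sketched as below. This is a minimal illustration and not the bot's actual code: the function names `required_quiet_minutes` and `may_post` are hypothetical, and only the 12-minute and 15-minute thresholds come from the text above.

```python
URGENT_PERIOD_MINUTES = 12  # feeds polled at least this often are considered urgent
QUIET_MINUTES = 15          # required channel silence for all other feeds


def required_quiet_minutes(period_hours: float) -> int:
    """Return how long a channel must have been idle before a feed may post."""
    # Rounding guards against floating point noise, e.g. for 0.2 hours.
    period_minutes = round(period_hours * 60, 6)
    return QUIET_MINUTES if period_minutes > URGENT_PERIOD_MINUTES else 0


def may_post(period_hours: float, minutes_since_last_message: float) -> bool:
    """Return True if a feed with this polling period may post right now."""
    return minutes_since_last_message >= required_quiet_minutes(period_hours)
```

Under these assumptions, a feed with the default period of 1 hour waits for 15 quiet minutes, whereas a feed with a period of 0.2 hours posts immediately.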
Links
Caption | Link |
---|---|
Repo | https://github.com/impredicative/irc-rss-feed-bot |
Changelog | https://github.com/impredicative/irc-rss-feed-bot/releases |
Image | https://hub.docker.com/r/ascensive/irc-rss-feed-bot |
Examples
<FeedBot> [ArXiv:cs.AI] Concurrent Meta Reinforcement Learning → https://arxiv.org/abs/1903.02710v1
<FeedBot> [ArXiv:cs.AI] Attack Graph Obfuscation → https://arxiv.org/abs/1903.02601v1
<FeedBot> [InfoWorld] What is a devops engineer? And how do you become one? → https://da.gd/dvXh9
<FeedBot> [InfoWorld] What is Jupyter Notebook? Data analysis made easier → https://da.gd/yrCi
<FeedBot> [AWS:OpenData] COVID-19 Open Research Dataset (CORD-19): Full-text and metadata dataset of
COVID-19 research articles. → https://registry.opendata.aws/cord-19
Development
For software development purposes only, the project can be set up on Ubuntu as below.
make setup-ppa
make install-py
make setup-venv
make shell
make install
make test
make build
Usage
Configuration: secret
Prepare a private secrets.env environment file using the sample below.
IRC_PASSWORD=YourActualPassword
GITHUB_TOKEN=c81a62ca23caa140715bbfc175997c02d0fdd768
GITHUB_TOKEN
This is optional. Refer to the publish.github feature.
Configuration: non-secret
Prepare a version-controlled config.yaml file using the sample below. A full-fledged real-world example is also available.
host: irc.libera.chat
ssl_port: 6697
#ssl_verify: true
nick: MyFeedBot
admin: mynick!myident@myhost
alerts_channel: '#mybot-alerts'
mode:
#mirror: '#mybot-mirror'
#publish:
#  github: MyGithubServiceAccountUsername/IrcServerName-MyBotName-live
#defaults:
#  new: all
feeds:
  "##mybot-alerts":
    irc-rss-feed-bot:
      url: https://github.com/impredicative/irc-rss-feed-bot/releases.atom
      period: 12
      shorten: false
  "#some_chan1":
    AWS:OpenData:
      url: https://registry.opendata.aws/rss.xml
      message:
        summary: true
    CDC:FoodSafety:
      url: https://tools.cdc.gov/api/v2/resources/media/316422.rss
      redirect: true
    j:AJCN:
      url: https://academic.oup.com/rss/site_6122/3981.xml
      mirror: false
      period: 12
      blacklist:
        title:
          - ^Calendar\ of\ Events$
    LitCovid:
      url: https://www.ncbi.nlm.nih.gov/research/coronavirus-api/export
      pandas: |-
        read_csv(file, comment="#", sep="\t") \
        .assign(link=lambda r: "https://pubmed.ncbi.nlm.nih.gov/" + r["pmid"].astype("str")) \
        .convert_dtypes()
    MedicalXpress:nutrition:
      url: https://medicalxpress.com/rss-feed/search/?search=nutrition
    r/FoodNerds:
      url: https://www.reddit.com/r/FoodNerds/new/.rss
      shorten: false
      sub:
        url:
          pattern: ^https://www\.reddit\.com/r/.+?/comments/(?P<id>.+?)/.+$
          repl: https://redd.it/\g<id>
  "##some_chan2":
    ArXiv:cs.AI: &ArXiv
      url: http://export.arxiv.org/rss/cs.AI
      period: 1.5
      https: true
      shorten: false
      group: ArXiv:cs
      alerts:
        empty: false
      format:
        re:
          title: '^(?P<name>.+?)\.?\ \(arXiv:.+(?P<ver>v\d+)\ '
        str:
          title: '{name}'
          url: '{url}{ver}'
    ArXiv:cs.NE:
      <<: *ArXiv
      url: http://export.arxiv.org/rss/cs.NE
    ArXiv:stat.ML:
      <<: *ArXiv
      url: http://export.arxiv.org/rss/stat.ML
      group: null
    AWS:status:
      url: https://status.aws.amazon.com/rss/all.rss
      period: .2
      https: true
      new: none
      sub:
        title:
          pattern: ^(?:Informational\ message|Service\ is\ operating\ normally):\ \[RESOLVED\]
          repl: '[RESOLVED]'
      format:
        re:
          id: /\#(?P<service>[^_]+)
        str:
          title: '[{service}] {title} | {summary}'
          url: '{id}'
    Fb:Research:
      url: https://research.fb.com/publications/
      hext: |-
        <div>
          <a href:link><h3 @text:title/></a>
          <div class="areas-wrapper"><a href @text:category/></div>
        </div>
        <div><form class="download-form" action/></div>
      whitelist:
        category:
          - ^(?:Facebook\ AI\ Research|Machine\ Learning|Natural\ Language\ Processing\ \&\ Speech)$
    InfoWorld:
      url: https://www.infoworld.com/index.rss
      order: reverse
    j:MDPI:N:  # https://www.mdpi.com/journal/nutrients (open access)
      url: https://www.mdpi.com/rss/journal/nutrients
      www: false
    KDnuggets:
      url: https://us-east1-ml-feeds.cloudfunctions.net/kdnuggets
      new: some
    libraries.io/pypi/scikit-learn:
      url: https://libraries.io/pypi/scikit-learn/versions.atom
      new: none
      period: 8
      shorten: false
    MedRxiv:
      url:
        - https://connect.medrxiv.org/medrxiv_xml.php?subject=Health_Informatics
        - https://connect.medrxiv.org/medrxiv_xml.php?subject=Nutrition
      alerts:
        read: false
      https: true
    r/MachineLearning:100+:
      url: https://www.reddit.com/r/MachineLearning/hot/.json?limit=50
      jmespath: 'data.children[*].data | [?score >= `100`].{title: title, link: join(``, [`https://redd.it/`, id])}'
      shorten: false
    r/wallstreetbets:50+:
      url: https://www.reddit.com/r/wallstreetbets/hot/.json?limit=98
      jmespath: 'data.children[*].data | [?(not_null(link_flair_text) && score >= `50`)].{title: join(``, [`[`, link_flair_text, `] `, title]), link: join(``, [`https://redd.it/`, id]), category: link_flair_text}'
      emoji: false
      shorten: false
      blacklist:
        category:
          - ^(?:Daily\ Discussion|Gain|Loss|Meme|Weekend\ Discussion|YOLO)$
    PwC:Latest:
      url: https://us-east1-ml-feeds.cloudfunctions.net/pwc/latest
      period: 0.5
      dedup: channel
    PwC:Trending:
      url: https://us-east1-ml-feeds.cloudfunctions.net/pwc/trending
      period: 0.5
      dedup: channel
    SeekingAlpha:
      period: 0.2
      sub:
        url:
          pattern: ^(?P<main_url>https://seekingalpha\.com/[a-z]+/[0-9]+).*$
          repl: \g<main_url>
      shorten: false
      topic:
        "Daily calendar": \b(?i:economic\ calendar)\b
        "Daily prep": '^Wall\ Street\ Breakfast:\ '
        "Hourly status": ^On\ the\ hour$
      url:
        - https://seekingalpha.com/market_currents.xml
        - https://seekingalpha.com/feed.xml
        - https://seekingalpha.com/tag/etf-portfolio-strategy.xml
        - https://seekingalpha.com/tag/wall-st-breakfast.xml
    SSRN:
      url: https://papers.ssrn.com/sol3/Jeljour_results.cfm?form_name=journalBrowse&journal_id=3526423&Network=no&lim=false&npage=1
      hext:
        select: <a href:link href^="https://ssrn.com/abstract=" @text:title />
        follow: <a class="jeljour_pagination_number" @text:prepend("https://papers.ssrn.com/sol3/Jeljour_results.cfm?form_name=journalBrowse&journal_id=3526423&Network=no&lim=false&npage="):url/>
      period: 6
    TalkRL:
      url: https://www.talkrl.com/feed
      period: 8
      message:
        title: false
        summary: true
    YT:3Blue1Brown: &YT
      url: https://www.youtube.com/feeds/videos.xml?channel_id=UCYO_jab_esuFRV4b17AJtAw
      period: 12
      shorten: false
      style:
        name:
          bg: red
          fg: white
          bold: true
      sub:
        url:
          pattern: ^https://www\.youtube\.com/watch\?v=(?P<id>.+?)$
          repl: https://youtu.be/\g<id>
    YT:AGI:
      url: https://www.youtube.com/results?search_query=%22artificial+general+intelligence%22&sp=CAISBBABGAI%253D
      hext: <a href:filter("/watch\?v=(.+)"):prepend("https://youtu.be/"):link href^="/watch?v=" title:title/>
      period: 12
      shorten: false
      alerts:
        emptied: true
      blacklist:
        title:
          - \bWikipedia\ audio\ article\b
    YT:LexFridman:
      <<: *YT
      url: https://www.youtube.com/feeds/videos.xml?channel_id=UCSHZKyawb77ixDdsGog4iWA
      whitelist:
        title:
          - \bAGI\b
Global settings
Mandatory
- host: IRC server address.
- ssl_port: IRC server SSL port.
- ssl_verify: If false, the TLS/SSL certificate is not verified. Its default is true.
- nick: This is a registered IRC nick. If the nick is in use, it will be regained. Ensure that the email verification of the registered nick, as applicable to many IRC servers, is complete. Without this email verification being completed, the bot can fail to receive the required event 900 and therefore fail to function.
Recommended
- admin: Administrative commands by this user pattern are accepted and executed. Its format is nick!ident@host. An example is JDoe11!sid654321@gateway/web/irccloud.com/x-*. A case-insensitive pattern match is tested for using fnmatch.
- alerts_channel: Some but not all warning and error alerts are sent to this channel. Its default value is ##{nick}-alerts. The key {nick}, if present in the value, is formatted with the actual nick. For example, if the nick is MyFeedBot, alerts will by default be sent to ##MyFeedBot-alerts. Since a channel name starts with #, the name if provided must be quoted. It is recommended that the alerts channel be registered and monitored.
- mode: This can for example be +igR for Libera and +igpR for Rizon.
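The admin pattern match and the alerts channel default described above can be sketched as follows. This is only an illustration: the function names `is_admin` and `format_alerts_channel` are hypothetical, and lowercasing both sides before calling `fnmatchcase` is an assumed way of making the match case-insensitive on every platform, not necessarily what the bot does.

```python
from fnmatch import fnmatchcase


def is_admin(hostmask: str, admin_pattern: str) -> bool:
    """Case-insensitively match a nick!ident@host string against a glob pattern."""
    # fnmatchcase never normalizes case, so both sides are lowercased explicitly.
    return fnmatchcase(hostmask.lower(), admin_pattern.lower())


def format_alerts_channel(template: str, nick: str) -> str:
    """Fill the {nick} key, if present, in the alerts channel name."""
    return template.format(nick=nick)
```

For example, `is_admin("jdoe11!sid654321@gateway/web/irccloud.com/x-abcdef", "JDoe11!sid654321@gateway/web/irccloud.com/x-*")` matches despite the difference in case.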
Optional
- mirror: If specified as a channel name, all posts across all channels are mirrored to this channel. This however doubles the time between consecutive posts in any given channel. Mirroring can be disabled individually for a feed by setting <feed>.mirror to false.
- publish.github: This is the username and repo name of a GitHub repo, e.g. feedarchive/libera-feedbot-live. All posts are published to the repo, thereby providing a basic option to archive them. A new CSV file is written to the repo for each posted feed having one or more new posts. The following requirements apply:
  - The repo must exist; it is not created by the bot. It is recommended that an empty new repo is used. If the repo is of public interest, it can be requested to be moved into the feedarchive organization by filing an issue.
  - The GitHub user must have access to write to the repo. It is recommended that a dedicated new service account be used, not your primary user account.
  - A GitHub personal access token is required with access to the entire repo scope. The repo scope is used for making commits. The token is provisioned for the bot via the GITHUB_TOKEN secret environment variable.
Developer
- log.irc: If true, low level IRC events are logged by miniirc. These are quite noisy. Its default is false.
- once: If true, each feed is queued only once. It is for testing purposes. Its default is false.
- tracemalloc: If true, memory allocation tracing is enabled. The top usage and positive-diff statistics are then logged hourly. It is for diagnostic purposes. Its default is false.
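As a rough illustration of what "top usage and positive-diff statistics" can mean, the sketch below uses Python's standard tracemalloc module. The helper `top_stats` is hypothetical, the hourly scheduling is omitted, and the bot's actual logging may differ.

```python
import tracemalloc


def top_stats(snapshot, previous=None, limit=3):
    """Return (top usage, positive diffs) grouped by source line."""
    top_usage = snapshot.statistics("lineno")[:limit]
    if previous is None:
        return top_usage, []
    diffs = snapshot.compare_to(previous, "lineno")
    # Keep only the allocations that grew since the previous snapshot.
    positive = [diff for diff in diffs if diff.size_diff > 0][:limit]
    return top_usage, positive


tracemalloc.start()
before = tracemalloc.take_snapshot()
data = [bytes(1000) for _ in range(1000)]  # allocate roughly 1 MB for the demo
after = tracemalloc.take_snapshot()
usage, growth = top_stats(after, before)
tracemalloc.stop()
```

Each returned item can be printed or logged directly, since the statistics objects have a readable string form.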
Feed-specific settings
A feed is defined under a channel as in the sample configuration. The feed's key represents its name.
The order of execution of the interacting operations is: redirect, blacklist, whitelist, https, www, emoji, sub, format, shorten. Refer to the sample configuration for usage examples.
YAML anchors and references can be used to reuse nodes. Examples of this are in the sample.
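To make the order of execution concrete, here is a minimal sketch of a subset of it, namely blacklist, whitelist, https, www, and sub, applied to an entry's title and URL. The functions `leaf_patterns` and `process_entry` are hypothetical, the redirect, emoji, format, and shorten steps are left out, and the use of `re.sub` for the sub step is an assumption rather than the bot's confirmed behavior.

```python
import re


def leaf_patterns(node):
    """Yield every leaf pattern from an arbitrarily nested dict or list."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_patterns(value)
    elif isinstance(node, (list, tuple)):
        for value in node:
            yield from leaf_patterns(value)
    elif node is not None:
        yield node


def process_entry(entry, config):
    """Return the processed entry, or None if the entry is filtered out."""
    title, url = entry["title"], entry["url"]
    # blacklist: skip the entry if any pattern is found in the title.
    blacklist = leaf_patterns(config.get("blacklist", {}).get("title"))
    if any(re.search(pattern, title) for pattern in blacklist):
        return None
    # whitelist: skip the entry unless some pattern is found in the title.
    whitelist = list(leaf_patterns(config.get("whitelist", {}).get("title")))
    if whitelist and not any(re.search(pattern, title) for pattern in whitelist):
        return None
    # https: upgrade the scheme.
    if config.get("https") and url.startswith("http://"):
        url = "https://" + url[len("http://"):]
    # www: remove the prefix only if explicitly set to false.
    if config.get("www") is False:
        url = re.sub(r"^(https?://)www\.", r"\1", url)
    # sub: substitute the URL if the pattern is found (assumed to be re.sub).
    sub = config.get("sub", {}).get("url")
    if sub:
        url = re.sub(sub["pattern"], sub["repl"], url)
    return {"title": title, "url": url}
```

With the whitelist and sub settings of the YT:LexFridman feed in the sample, an entry titled "Path to AGI" at https://www.youtube.com/watch?v=abc123 would come out with the URL https://youtu.be/abc123, while an entry without "AGI" in its title would be skipped.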
Mandatory
- <feed>.url: This is either a single URL or a list of URLs of the feed. If a list, the URLs are read in sequence with an interval of one second between them.
Optional
These are optional and are independent of each other:
- <feed>.alerts.empty: If true, an alert is sent if any source URL of the feed has no entries before their validation. If false, such an alert is not sent. Its default value is true.
- <feed>.alerts.emptied: If true, an alert is sent if the feed has entries before but not after their validation. If false, such an alert is not sent. Its default value is false.
- <feed>.alerts.read: If true, an alert is sent if an error occurs three or more consecutive times when reading or processing the feed, but no more than once every 15 minutes. If false, such an alert is not sent. Its default value is true.
- <feed>.blacklist.category: This is an arbitrarily nested dictionary or list or their mix of regular expression patterns that result in an entry being skipped if a search finds any of the patterns in any of the categories of the entry. The nesting permits lists to be creatively reused between feeds via YAML anchors and references.
- <feed>.blacklist.title: This is an arbitrarily nested dictionary or list or their mix of regular expression patterns that result in an entry being skipped if a search finds any of the patterns in the title. The nesting permits lists to be creatively reused between feeds via YAML anchors and references.
- <feed>.blacklist.url: Similar to <feed>.blacklist.title.
- <feed>.dedup: This indicates how to deduplicate posts for the feed, thereby preventing them from being reposted. The default value is feed (per-feed per-channel), and an alternate possible value is channel (per-channel).
- <feed>.emoji: If false, emojis in entry titles are removed. Its default value is null.
- <feed>.group: If a string, this delays the processing of a feed that has just been read until all other feeds having the same group are also read. This encourages multiple feeds having the same group to be posted in succession, except if interrupted by conversation. It is however possible that unrelated feeds of any channel get posted between ones having the same group. To explicitly specify the absence of a group when using a YAML reference, the value can be specified as null. It is recommended that feeds in the same group have the same period.
- <feed>.https: If true, entry links that start with http:// are changed to start with https:// instead. Its default value is false.
- <feed>.message.summary: If true, the entry summary (description) is included in its message. The entry title, if included, is then formatted bold. This is applied using IRC formatting if a style is defined for the feed, otherwise using unicode formatting. The default value is false.
- <feed>.message.title: If false, the entry title is not included in its message. Its default value is true.
- <feed>.mirror: If false, mirroring is disabled for this feed. Its default value is true, subject to the global setting for mirroring.
- <feed>.new: This indicates up to how many entries of a new feed to post. A new feed is defined as one with no prior posts in its channel. The default value is some which is interpreted as 3. The default is intended to limit flooding a channel when one or more new feeds are added. A string value of none is interpreted as 0 and will skip all entries for a new feed. A value of all will skip no entries for a new feed; it is not recommended and should be used sparingly if at all. In any case, future entries in the feed are not affected by this option on subsequent reads, and they are all forwarded without a limit.
- <feed>.order: If reverse, the order of the entries is reversed.
- <feed>.period: This indicates how frequently to read the feed in hours on average. Its default value is 1. Conservative polling is recommended. Any value below 0.2 is changed to a minimum of 0.2. Note that 0.2 hours is equal to 12 minutes. To make service restarts safer by preventing excessive reads, the first read is delayed by half the period. To better distribute the load of reading multiple feeds, a uniformly distributed random ±5% is applied to the period for each read.
- <feed>.redirect: This indicates whether to substitute each entry URL with its redirect target. The default value is false.
- <feed>.shorten: This indicates whether to post shortened URLs for the feed. The default value is true. The alternative value false is recommended if the URL is naturally small, or if sub or format can be used to make it small. If a "Blacklisted long URL" error is experienced for a reasonable website which should not be blacklisted, it can be reported here, using this issue as an example.
- <feed>.style.name.bg: This is a string representing the name of a background color applied to the feed's name. It can be one of: white, black, blue, green, red, brown, purple, orange, yellow, lime, teal, aqua, royal, pink, grey, silver. The channel modes must allow formatting for this option to be effective.
- <feed>.style.name.bold: If true, bold formatting is applied to the feed's name. Its default value is false. The channel modes must allow formatting for this option to be effective.
- <feed>.style.name.fg: Foreground color similar to <feed>.style.name.bg.
- <feed>.topic: This updates the channel topic with the short URL of a matching entry. It requires auto-op (+O) to allow the topic to be updated. The topic is divided into logical sections separated by | (<space><pipe><space>). For any matching entry, only its matching section in the topic is updated. Its value can be a dictionary in which each key is a section name and each value is a regular expression pattern. If a regular expression search matches an entry's title, the section in the topic is updated with the entry's short URL. The topic's length is not checked.
- <feed>.whitelist.category: This is an arbitrarily nested dictionary or list or their mix of regular expression patterns that result in an entry being skipped unless a search finds any of the patterns in any of the categories of the entry. The nesting permits lists to be creatively reused between feeds via YAML anchors and references.
- <feed>.whitelist.explain: This applies only to <feed>.whitelist.title. It can be useful for understanding which portion of a post's title matched the whitelist. If true, the first match of each posted title is italicized. This is applied using IRC formatting if a style is defined for the feed, otherwise using unicode formatting. For example, "This is a matching sample title". The default value is false.
- <feed>.whitelist.title: This is an arbitrarily nested dictionary or list from which all leaf values are used. The leaf values are regular expression patterns. These result in an entry being skipped unless a search finds any of the patterns in the title. The nesting permits lists to be creatively reused between feeds via YAML anchors and references.
- <feed>.whitelist.url: Similar to <feed>.whitelist.title.
- <feed>.www: If false, entry links that contain the www. prefix are changed to remove this prefix. Its default value is null.
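The timing rules of <feed>.period can be worked through with a short sketch. The function names are hypothetical, and it is an assumption that the ±5% jitter applies to reads after the first one only; the numbers themselves (minimum of 0.2 hours, half-period initial delay, uniform ±5%) are taken from the description above.

```python
import random

MIN_PERIOD_HOURS = 0.2  # equal to 12 minutes


def effective_period(period_hours: float = 1.0) -> float:
    """Clamp the configured period to the documented minimum."""
    return max(period_hours, MIN_PERIOD_HOURS)


def first_read_delay(period_hours: float = 1.0) -> float:
    """Delay the first read by half the period to make restarts safer."""
    return effective_period(period_hours) / 2


def next_read_delay(period_hours: float = 1.0, rng=random) -> float:
    """Apply a uniformly distributed random ±5% to the period."""
    return effective_period(period_hours) * rng.uniform(0.95, 1.05)
```

For a feed with a period of 8, the first read would happen after 4 hours, and each later read after between 7.6 and 8.4 hours. A configured period of 0.05 would be treated as 0.2.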
Parser
For a non-XML feed, one of the following non-default parsers can be used. Multiple parsers cannot be used for a feed. The parsers are searched for in the alphabetical order listed below, and the first to be found is used. Each parsed entry must at a minimum return a title, a link, an optional summary (description), and zero or more values for category. The title can be a string or a list of strings.
- <feed>.hext: This is a string representing the hext DSL for parsing a list of entry dictionaries from an HTML web page. Before using, it can be tested in the form here. Note that max_searches is set to 100_000 to protect against resource exhaustion.
- <feed>.jmespath: This is a string representing the jmespath DSL for parsing a list of entry dictionaries from JSON. Before using, it can be tested in the form here.
- <feed>.pandas: This is a string command evaluated using pandas for parsing a dataframe of entries. The raw content is made available to the parser as a file-like object named file. This parser uses eval which is unsafe, and so its use must be confirmed to be safe. The provisioned packages are json, numpy (as np), and pandas (as pd). The value requires compatibility with the versions of pandas and numpy defined in requirements.txt, noting that these version requirements are expected to be routinely updated.
For recursive crawling, the value of a parser can alternatively be:
- <feed>.<parser>.select: This is the string which was hitherto documented as the value for <feed>.<parser>. The parser uses it to return the entries to post.
- <feed>.<parser>.follow: This is an optional string which the parser uses to return zero or more additional URLs to read. The returned URLs can be a list of strings or a list of dictionaries with the key url. Crawling applies recursively to each returned URL. Each unique URL is read once. There is an interval of at least one second between the end of a read and the start of the next read. Care should nevertheless be taken to avoid crawling a large number of URLs.
Some sites require a custom user agent or other custom headers for successful scraping; such a customization can be requested by creating an issue.
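The select and follow behavior for recursive crawling can be sketched independently of any particular parser. In the sketch below, `crawl`, `read`, `select`, `follow`, and `max_urls` are all hypothetical names: `read` stands in for fetching a URL, `select` and `follow` stand in for the parser expressions, and the `max_urls` cap is an added safeguard that the bot is not stated to have. The breadth-first order is also an assumption, and the pause of at least one second between reads is omitted.

```python
from collections import deque


def crawl(start_url, read, select, follow=None, max_urls=50):
    """Collect entries from start_url and from every URL returned by follow."""
    queue, seen, entries = deque([start_url]), {start_url}, []
    while queue:
        url = queue.popleft()
        content = read(url)
        entries.extend(select(content))
        for item in (follow(content) if follow else []):
            # A followed URL can be a string or a dictionary with the key "url".
            next_url = item["url"] if isinstance(item, dict) else item
            if next_url not in seen and len(seen) < max_urls:
                seen.add(next_url)  # each unique URL is read once
                queue.append(next_url)
    return entries


# A tiny in-memory "site" used in place of real HTTP reads.
pages = {
    "p1": {"entries": ["a", "b"], "next": ["p2", {"url": "p3"}]},
    "p2": {"entries": ["c"], "next": ["p1", "p3"]},
    "p3": {"entries": ["d"], "next": []},
}
result = crawl(
    "p1",
    read=pages.__getitem__,
    select=lambda content: content["entries"],
    follow=lambda content: content["next"],
)
```

Here `result` is `["a", "b", "c", "d"]`: the pages link to each other in a cycle, yet each one is read only once.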
Conditional
The sample configuration above contains examples of these:
- <feed>.format.re.title: This is a single regular expression pattern that is searched for in the title. It is used to collect named key-value pairs from the match if there is one.
- <feed>.format.re.url: Similar to <feed>.format.re.title.
- <feed>.format.str.title: The key-value pairs collected using <feed>.format.re.title and <feed>.format.re.url, both of which are optional, are combined along with the default additions of title, url, categories, and feed.url as keys. Any additional keys returned by the parser are also available. The key-value pairs are used to format the provided quoted title string. If the title formatting fails for any reason, a warning is logged, and the title remains unchanged. The default value is {title}.
- <feed>.format.str.url: Similar to <feed>.format.str.title. The default value is {url}. If this is specified, it can sometimes be relevant to set shorten to false for the feed.
- <feed>.sub.summary.pattern: This is a single regular expression pattern that if found results in the entry summary being substituted.
- <feed>.sub.summary.repl: If <feed>.sub.summary.pattern is found, the entry summary is replaced with this replacement, otherwise it is forwarded unchanged.
- <feed>.sub.title.pattern: Similar to <feed>.sub.summary.pattern.
- <feed>.sub.title.repl: Similar to <feed>.sub.summary.repl.
- <feed>.sub.url.pattern: Similar to <feed>.sub.summary.pattern. If a pattern is specified, it can sometimes be relevant to set shorten to false for the feed.
- <feed>.sub.url.repl: Similar to <feed>.sub.summary.repl.
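The interplay of format.re and format.str can be illustrated with the ArXiv settings from the sample configuration. The function `format_entry` below is a hypothetical sketch: it omits the feed.url and categories keys and the warning log, and it simply leaves a field unchanged if formatting fails.

```python
import re


def format_entry(entry, re_patterns=None, str_formats=None):
    """Collect named groups via format.re, then apply the format.str templates."""
    params = dict(entry)  # default keys such as title and url
    for field, pattern in (re_patterns or {}).items():
        match = re.search(pattern, str(entry.get(field, "")))
        if match:
            params.update(match.groupdict())
    result = dict(entry)
    for field, template in (str_formats or {}).items():
        try:
            result[field] = template.format_map(params)
        except (KeyError, IndexError, ValueError):
            pass  # on failure the field remains unchanged
    return result


entry = {
    "title": "Attack Graph Obfuscation. (arXiv:1903.02601v1 [cs.CR])",
    "url": "https://arxiv.org/abs/1903.02601",
}
formatted = format_entry(
    entry,
    re_patterns={"title": r"^(?P<name>.+?)\.?\ \(arXiv:.+(?P<ver>v\d+)\ "},
    str_formats={"title": "{name}", "url": "{url}{ver}"},
)
```

The formatted title is "Attack Graph Obfuscation" and the formatted URL is https://arxiv.org/abs/1903.02601v1, which corresponds to one of the posted examples near the top of this document. The sample title string itself is made up for the illustration.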
Feed default settings
A global default value can optionally be set under defaults for some feed-specific settings, namely new and shorten. This value overrides its internal default. It facilitates not having to set the same value individually for many feeds.
Refer to "Feed-specific settings" for the possible values and internal defaults of these settings. Refer to the embedded sample configuration for a usage example.
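The precedence between a feed-specific value, a global default, and the internal default can be sketched as follows. The names `resolve_setting` and `max_new_entries` are hypothetical; the internal defaults and the interpretation of none, some, and all are those documented under "Feed-specific settings".

```python
INTERNAL_DEFAULTS = {"new": "some", "shorten": True}
NEW_LIMITS = {"none": 0, "some": 3, "all": None}  # None means no limit


def resolve_setting(name, feed_config, global_defaults=None):
    """Prefer the feed's own value, then the global default, then the internal one."""
    if name in feed_config:
        return feed_config[name]
    if global_defaults and name in global_defaults:
        return global_defaults[name]
    return INTERNAL_DEFAULTS[name]


def max_new_entries(feed_config, global_defaults=None):
    """Return how many entries of a new feed may be posted, or None for all."""
    return NEW_LIMITS[resolve_setting("new", feed_config, global_defaults)]
```

For instance, with `defaults: {new: all}` set globally, a feed that sets `new: none` still posts no entries when it is new, whereas a feed that does not set `new` posts all of them.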
Commands
Commands can be sent to the bot either as a private message or as a directed public message. Private messages may however be prohibited for security purposes using the mode configuration. Public messages to the bot must be directed as MyBotNick: my_command.
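A minimal sketch of how a command might be extracted from a private or a directed public message is shown below. The function `extract_command` is hypothetical, and matching the nick case-insensitively is an assumption.

```python
def extract_command(message: str, bot_nick: str, is_private: bool):
    """Return the command text, or None if the message is not a command."""
    text = message.strip()
    if is_private:
        return text or None
    prefix = f"{bot_nick}:"
    # A public message counts only if it is directed at the bot.
    if text.lower().startswith(prefix.lower()):
        return text[len(prefix):].strip() or None
    return None
```

For example, the public message "MyBotNick: exit" yields the command "exit", whereas an undirected public message yields None.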
Administrative
Administrative commands are accepted from the configured admin. If admin is not configured, the commands are not processed. It is expected but not required that administrative commands to the bot will typically be sent in the alerts_channel. The supported commands are:
- exit: Gracefully exit with code 0. The exit is delayed until any feeds that are currently being posted finish posting and being written to the database. If running the bot as a Docker Compose service, using this command with restart: on-failure will (due to code 0) prevent the bot from automatically restarting. Note that a repeated invocation of this command has no effect.
- fail: Similar to exit but with code 1. If running the bot as a Docker Compose service, using this command with restart: on-failure will (due to a nonzero code) cause the bot to automatically be restarted.
- quit: Alias of exit.
Deployment
- As a reminder, it is recommended that the alerts channel be registered and monitored.
- It is recommended that the bot be auto-voiced (+V) in each channel. Failing this, messages from the bot risk being silently dropped by the server. This is despite the bot-enforced limit of two seconds per message across the server.
- It is recommended that the bot be run as a Docker container using Docker ≥18.09.2, possibly with Docker Compose ≥1.24.0. To run the bot using Docker Compose, create or add to a version-controlled docker-compose.yml file such as:
version: '3.7'
services:
  irc-rss-feed-bot:
    container_name: irc-rss-feed-bot
    image: ascensive/irc-rss-feed-bot:<VERSION>
    # network_mode: host  # If having DNS name resolution issues.
    restart: on-failure
    # restart: always
    logging:
      options:
        max-size: 2m
        max-file: "5"
    volumes:
      - ./irc-rss-feed-bot:/config
    env_file:
      - ./irc-rss-feed-bot/secrets.env
    environment:
      TZ: America/New_York  # Select TZ database name from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
- In the above service definition in docker-compose.yml:
  - image: Use a specific versioned tag, e.g. 0.12.0.
  - volumes: Customize the relative path to the previously created config.yaml file, e.g. ./irc-rss-feed-bot. This volume source directory must be writable by the container using the UID defined in the Dockerfile; it is 999. A simple way to ensure it is writable is to run a command such as chmod -R a+w ./irc-rss-feed-bot once on the host.
  - env_file: Customize the relative path to secrets.env.
  - environment: Optionally customize the environment variable TZ to the preferred time zone as represented by a TZ database name. Note that the date and time are prefixed in each log message.
- From the directory containing docker-compose.yml, run docker-compose up -d irc-rss-feed-bot. Use docker logs -f irc-rss-feed-bot to see and follow informational logs.
Maintenance
Service
It is recommended that the supported administrative commands be used together with Docker Compose or a comparable container service manager to shut down or restart the service.
Config
- If config.yaml is updated, the container must be restarted to use the updated file.
- If secrets.env or the service definition in docker-compose.yml is updated, the container must be recreated (and not merely restarted) to use the updated file.
Database
- A posts.v2.db database file is written by the bot in the same directory as config.yaml. This database file must be preserved with routine backups. After restoring a backup, before starting the container, ensure the database file is writable by running a command such as chmod a+w ./irc-rss-feed-bot/posts.v2.db.
- The database file grows as new posts are made. For the most part this indefinite growth can be ignored. Currently, the standard approach for handling this, if necessary, is to stop the bot and delete the database file if it has grown unacceptably large. Restarting the bot after deleting the database will then create a new database file, and all configured feeds will be handled as new. This deletion is however discouraged as a routine measure.
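To illustrate how a SQLite file of hashes can prevent reposts, and how the feed and channel values of <feed>.dedup could differ, here is a hedged sketch using only the standard library. The table schema, the `PostLog` class, and the choice of hashing the entry URL with SHA-256 are all assumptions made for illustration; they do not describe the actual layout of posts.v2.db.

```python
import hashlib
import sqlite3


def entry_hash(url: str) -> str:
    """Hash an entry URL (the hashed field and algorithm are assumed)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class PostLog:
    """Record posted entries and check for duplicates."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS posts (channel TEXT, feed TEXT, hash TEXT, "
            "PRIMARY KEY (channel, feed, hash))"
        )

    def record(self, channel, feed, url):
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR IGNORE INTO posts VALUES (?, ?, ?)",
                (channel, feed, entry_hash(url)),
            )

    def is_posted(self, channel, feed, url, dedup="feed"):
        query = "SELECT 1 FROM posts WHERE channel = ? AND hash = ?"
        params = [channel, entry_hash(url)]
        if dedup == "feed":  # per-feed per-channel; "channel" ignores the feed name
            query += " AND feed = ?"
            params.append(feed)
        return self.conn.execute(query, params).fetchone() is not None
```

With dedup set to channel, an entry already posted to a channel by one feed is treated as posted for every other feed in that channel, which suits overlapping feeds such as PwC:Latest and PwC:Trending in the sample.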
Disk cache
- An ephemeral directory /app/.ircrssfeedbot_cache is written by the bot in the container. It contains one or more independent disk caches. The size of each independent disk cache in this directory is limited to approximately 2 GiB. If needed, this directory can optionally be mounted as an external volume.