In Darcs, any command can have a post hook (and a pre hook), and the hook
command can be set using a command-line option to the darcs command that you
run. So, in the Vervis SSH server, if we add a --posthook option when running
`darcs apply` to apply remotely received patches, we get a chance to process
the patch data much like in the git post-receive hook.
The setup this patch creates is similar to the git one: It writes a
_darcs/prefs/defaults file to all Darcs repos, and that defaults file sets the
posthook line for `darcs apply`. The posthook line simply executes the actual
hook program written in Haskell.
The current hook program is a one-liner that prints a line to stdout, so every
time you `darcs push` you can tell the hook got executed. The next step is to
implement the actual hook logic, by reading patch data from the environment
variable in which Darcs puts it.
This patch contains migrations that require that there are no follow records.
If you have any, the migration will (hopefully) fail and you'll need to
manually delete any follow records you have. In the next patch I'll try to add
automatic following on the pseudo-client side by running both e.g. createNoteC
and followC in the same POST request handler.
Here's how it works:
- When Vervis starts, it writes a config file and it writes post-receive hooks
into all the repos it manages
- When a git push is accepted, git runs the post-receive hook, which is a
trivial shell script that executes the actual Haskell program implementing
the hook logic
- The Haskell hook program generates a Push JSON object and HTTP POSTs it to
Vervis running on localhost
- Vervis currently responds with an error, the next step is to implement the
actual publishing of ForgeFed Push activities
Currently it's a paged Collection where the items are merely URIs. This could
be changed to have actual Commit objects as items; for that we need to examine
the whole thing with the LogEntry type and the Patch type and have an
AP-friendly log item representation, but without commit diffs.
FedURIs, until now, have been requiring HTTPS, and no port number, and DNS
internet domain names. This works just fine on the forge fediverse, but it
makes local dev builds much less useful.
This patch introduces URI types that have a type tag specifying one of 2 modes:
- `Dev`: Works with URIs like `http://localhost:3000/s/fr33`
- `Fed`: Works with URIs like `https://dev.community/s/fr33`
This should allow even to run multiple federating instances for development,
without needing TLS or reverse proxies or editing the hosts files or anything
like that.
This patch also disables the ability to specify deps when creating a ticket,
because those deps won't be in the ticket object anymore. Instead of coding a
workaround and getting complications later, I just disabled that thing. It
wasn't really being used by anyone anyway.
aeson-pretty implements by formatting using a text Builder, and the ByteString
is encoded from that. So instead of decoding the ByteString to produce Text or
Builder, use the Builder as the starting point, to match how aeson-pretty works
and save computation and weird backwards-decoding stuff.
highligher2 doesn't have a JSON syntax and the JS lexer seems to be failing,
not sure exactly why yet. To have an alternative, I'm adding a Skylighting
option.
This patch doesn't just add the handler code, it also does lots of refactoring
and moves around pieces of code that are used in multiple places. There is
still lots of refactoring to make though. In this patch I tried to make minimal
changes to the existing Note handler to avoid breaking it. In later patches
I'll do some more serious refactoring, hopefully resulting with less mess in
the code.
`updateGet` isn't atomic by default. In PostgreSQL the default isolation level
if committed read, and an `update` followed by a `get` doesn't guarantee you
get the same value you sent. However I'm making a patch for `persistent` to
make `updateGet` atomic for PostgreSQL.
- The data returned from activity authentication has nicer types now, and no
mess of big tuples.
- Activity authentication code has its own module now, Vervis.Federation.Auth.
- The sharer inbox handler can now handle and store activities by a local
project actor, forwarded from a remote actor. This isn't in use right now,
but once projects start publishing Accept activities, or other things, it may
be needed.
CRITICAL: Due to the requirement that each new ticket points to its Offer
activity, ticket creation has been disabled! The next patches should implement
C2S submission of Offer Ticket, and then ticket creation will work again. Sorry
for that.
Since MonadSite now requires MonadIO, and not MonadUnliftIO, to allow for more
instances, the MonadUnliftIO constraint may need to be added manually
sometimes.
This allows to browse via e.g. localhost:3000 even if the instance host is
something else and the rendered URLs don't have a port number. It still makes
many things impossible or inconvenient, but at least you can launch Vervis
locally for development and see pages. Right now even CSS doesn't work because
of the URLs not matching the actual localhost:3000 access. Maybe gradually I'll
figure it out.
- Fork and async are no longer class methods, which simplifies things a lot and
allows for many more trivial instances, much like with MonadHandler. Fork and
async are still available, but instead of unnecessarily being class methods,
they are now provided as follows: You can fork and async a worker (no more
fork/async for handler, because I never actually need that, and not sure
there's ever a need for that in general), and you can do that from any
MonadSite. So, you can fork or async a worker from a Handler, from a Worker,
from a ReaderT on top of them e.g. inside runDB, and so on.
- Following the simplification, new MonadSite instances are provided, so far
just the ones in actual use in the code. ReaderT, ExceptT and lazy RWST. More
can be added easily. Oh, and WidgetFor got an instance too.
In particular, this change means there's no usage of `forkHandler` anymore, at
all. I wonder if it ever makes a difference to `forkWorker` versus
`forkHandler`. Like, does it cause memory leaks or anything. I guess could
check why `forkResource` etc. is good for in `forkHandler` implementation. I
suppose if needed, I could fix possible memory leaks in `forkWorker`.
* No more full URIs, all terms are used as short non-prefixed names
* Some terms support parsing full URI form for compatibility with objects in DB
* No more @context checking when parsing
* Use the new ForgeFed context URI specified in the spec draft
* Use an extension context URI for all custom properties not specific to forges
* Rename "events" property to "history", thanks cjslep for suggesting this name
* Have a project team collection, content is the same as ticket team (but
potentially ticket team allows people to opt out of updates on specific
tickets, while project team isn't tied to any specific ticket or other child
object)
* Have a project followers collection, and address it in ticket comments in
addition to the already used recipients (project, ticket team, ticket
followers)
This allows the inbox system to be separate from Person, allowing other kinds
of objects to have inboxes too. Much like there's FollowerSet which works
separately from Tickets, and will allow to have follower sets for projects,
users, etc. too.
Inboxes are made independent from Person users because I'm going to give
Projects inboxes too.
There used to be project roles and repo roles, and they were separate. A while
ago I merged them, and there has been a single role system, used with both
repos and projects. However the table names were still "ProjectRole" and things
like that. This patch renames some tables to just refer to a "Role" because
there's only one kind of role system.
A thing still missing there is that it sets empty audience for comments on
remote tickets, but that's fine because dev.angeley.es doesn't have such
comments in the database.
I added a migration that creates an ugly fake OutboxItem for messages that
don't have one. I'll try to turn it into a real one. And then very possibly
remove the whole ugly migration, replacing it with addFielfRefRequiredEmpty,
which should work for empty instances.
- Allow client to specify recipients that don't need to be delivered to
- When fetching recipient, recognize collections and don't try to deliver to
them
- Remember collections in DB, and use that to skip HTTP delivery
Worker is enough and seems much simpler. forkHandler does stuff with
forkResourceT and more stuff that I don't exactly understand and which may
involve more resource allocation. I guess forkWorker would generally be the
preferred approach, and there are bugs with delivery leading to sudden
CPU/memory peaks forcing me to kill the process. Maybe not related, just
mentioning it ^_^
In the new inbox forwarding scheme, we use an additional special HTTP signature
to indicate that we allow or expect forwarding, and to allow that forwarding to
later be verified. When delivering a comment on a remote ticket, we'd like the
project to do inbox forwarding. Based on the URI alone, it's impossible to tell
which recipient is the project, and I guess there are various tricks we could
use here, but for now a very simple solution is used: Enable forwarding for all
remote recipients whose host is the same as the ticket's host.
Until now, there were some simple host checks when verifying the HTTP sig,
meant to forbid hosts that are IP addresses, local hosts, and maybe other weird
cases. These checks moved to Network.FedURI, so now FedURIs in general aren't
allowed to have such hosts. The host type is still `Text` though, for now.
This patch does a small simple change, however at the cost of the request body
not being available for display in the latest activity list, unless processing
succeeds. I'll fix this situation in a separate patch.
It runs checks against all the relevant tables, but ultimately just inserts the
activity into the recipient's inbox and nothing more, leaving the RemoteMessage
creation and inbox forwarding to the project inbox handler.
Inbox post is disabled but in the next patches I'll code and integrate a fixed
complete one, hopefully finally getting ticket comment federation ready for
testing.
I'm making this change because if an actor receives an activity due to being
addressed in bto, ot bcc, or being listed in some remote collection, the server
doesn't have a way to tell which actor(s) are the intended recipients, without
having an individual inbox URL for each actor. I could use a different hack for
this, but it wouldn't be compatible with other AP servers (unless the whole
fediverse agrees on a method).
I wasn't using sharedInbox anyway, and it's an optimization either way.
Each ticket has a single discussion ID, and each ticket has a unique one, so,
given an inner join of tickets and discussions, I think there should be exactly
1 way select a (ticket, discussion) pair given any of these.
But for some reason, PostgreSQL started complaining. Not sure what changed.
Anyway, for now, I switched the groupBy from discussion.id to ticket.id, which
is essentially the same, but for some reason makes PostgreSQL happy. It can't
tell that given a discussion ID, there's exactly 1 way to choose the ticket. Or
something like that. I wonder if I messed up something in DB migrations.
Before this patch, the shared fetch used plain insert, because it relied on
being the only place in the codebase where new RemoteActors get inserted. I was
hoping for that to be the case, but while I tweak things and handle fetching
URIs that can be an actor or a public key (for which ActorFetchShare isn't
sufficient without some smart modification), I'd like concurrent insertions to
be safe, without getting in the way of ActorFetchShare.
With this patch, it now uses insertBy', which doesn't mind concurrent
insertions.
I wrote a function handleOutboxNote that's supposed to do the whole outbox POST
handler process. There's an outbox item table in the DB now, I adapted things
in various source files. Ticket comment federation work is still in progress.
The custom module provides a parametric wrapper, allowing any specific
FromJSON/ToJSON instance to be used. It's a standalone module though, and not a
wrapper of persistent-postgresql, because persistent-postgresql uses aeson
Value and it prevents using toEncoding to get from the value directly to a
string.
* Adapt DB related code to return the InstanceId and RemoteSharerId
* Previously, when fetching a known shared key, we were running a DB
check/update for the shared usage record. I noticed - and hopefully I
correctly noticed - that this check already runs when we discover the keyId
points to a shared key we already know. So, after successful sig
verification, there's no need to run the check again. So I removed it.
- Exclude hosts without periods, so things like localhost and IPv6 are rejected
- Exclude hosts without letters, so things like IPv4 are rejected
- Exclude the instance's own host, just in case somehow some fake activity
slips in and gets approved, maybe even accidentally when delivered by another
server
Before, things worked like this:
* Only signatures of Ed25519 keys could be verified
* Key encoding placed the plain binary Ed25519 key in the PEM, instead of the
key's ASN1 encoding
With this patch it now works like this:
* Ed25519 signatures are supported as before
* RSA keys are now supported too, assuming RSA-SHA256 signatures
* Both Ed25519 and RSA keys are encoded and decoded using actual PEM with ASN1
When we verify an HTTP signature,
* If we know the key, check in the DB whether we know the actor lists it. If it
doesn't, and there's room left for keys, HTTP GET the actor and update the DB
accordingly.
* If we know the key but had to update it, do the same, check usage in DB and
update DB if needed
* If we don't know the key, record usage in DB
However,
* If we're GETing a key and discovering it's a shared key, we GET the actor to
verify it lists the key. When we don't know the key at all yet, that's fine
(can be further optimized but it's marginal), but if it's a key we do know,
it means we already know the actor and for now it's enough for us to rely
only on the DB to test usage.
Previously, when verifying an HTTP signature and we fetched the key and
discovered it's shared, we'd fetch the actor and make sure it lists the key URI
in the `publicKey` field. But if we already knew the key, had it cached in our
DB, we wouldn't check the actor at all, despite not knowing whether it lists
the key.
With this patch, we now always GET the actor when the key is shared,
determining the actor URI from the `ActivityPub-Actor` request header, and we
verify that the actor lists the key URI. We do that regardless of whether or
not we have the key in the DB, although these two cases and handled in
different parts of the code right now (for a new key, it's in Web.ActivityPub
fetchKey; for a known key, it's in Vervis.Foundation httpVerifySig).
Previously, when verifying an HTTP signature and we find out we have the
provided keyid in the DB, and this key is a personal key, we would just grab
the key owner from the DB and ignore the ActivityPub-Actor header.
This patch adds a check: If we find the key in the DB and it's a personal key,
do grab the owner from that DB row, but also check the actor header: If it's
provided, it has to be identical to the key owner ID URI.
If the key we fetched is a shared key, the only way to determine the actor to
which the signature applies is to read the HTTP header ActivityPub-Actor. But
if it's a personal key, we can detect the actor by checking the key's owner
field. Still, if that actor header is provided, we now compare it to the key
owner and make sure they're identical.
When fetching a key that is embedded in the actor document, we were already
comparing the actor ID with the actor header, so that part didn't require
changes.
When a local user wants to publish an activity, we were always GETing the
recipient actor, so that we could determine their inbox and POST the activity
to it. But now, instead, whenever we GET an actor (whether it's for the key sig
verification or for determining inbox URI), we keep their inbox URI in the
database, and we don't need to GET it again next time.
Using a dedicated type allows to record in the type the guarantees that we
provide, such as scheme being HTTPS and authority being present. Allows to
replace ugly `fromJust` and such with direct field access.
Before, there was a single key used as a personal key for all actors. Now,
things work like this:
- There are 2 keys, each time one is rotated, this way the old key remains
valid and we can freely rotate without a risk of race conditions on other
servers and end up with our posts being rejected
- The keys are explicitly instance-scope keys, all actors refer to them
- We add the ActivityPub-Actor header to all activity POSTs we send, to declare
for which specific actor our signature applies. Activities and otherwise
different payloads may have varying ways to specify attribution; using this
header will be a standard uniform way to specify the actor, regardless of
payload format. Of course, servers should make sure the actual activity is
attributed to the same actor we specified in the header. (This is important
with instance-scope keys; for personal keys it's not critical)
Allow keys to specify expiration time using w3c security vocabulary. If a key
has expired, we treat it like sig validation failure and re-fetch the key from
the other server. And we never accept a sig, even a valid sig, if the key has
expired.
Since servers keep actors and keys in the DB, expiration can be a nice way to
ask that keys aren't used more than we want them to. The security vocab spec
also recommends to set expiration time on keys, so it's nice to support this
feature.
It's now possible for activities we be attributed to actors that have more than
one key. We allow up to 2 keys. We also store in the DB. Scaling to support any
number of keys is trivial, but I'm limiting to 2 to avoid potential trouble and
because 2 is the actual number we need.
By having 2 keys, and replacing only one of them in each rotation, we avoid
race conditions. With 1 key, the following can happen:
1. We send an activity to another server
2. We rotate our key
3. The server reaches the activity in its processing queue, tries to verify our
request signature, but fails because it can't fetch the key. It's the old
key and we discarded it already, replaced it with the new one
When we use 2 keys, the previous key remains available and other servers have
time to finish processing our requests signed with that key. We can safely
rotate, without worrying about whether the user sent anything right before the
rotation time.
Caveat: With this feature, we allow OTHER servers to rotate freely. It's safe
because it's optional, but it's just Vervis right now. Once Vervis itself
starts using 2 keys, it will be able to rotate freely without race condition
risk, but probably Mastodon etc. won't accept its signatures because of the use
of 2 keys and because they're server-scope keys.
Maybe I can get these features adopted by the fediverse?
Shared key means the key is used for multiple actors. I'm not sure explicitly
specifying this will be necessary, but I prefer to have it in place to help
with debugging in case something unexpected comes from other servers, or my
format overlaps with stuff used in other software and encodes a different
meaning.
Each public key can specify whether it's shared or personal, and this patch
checks for that when verifying a request signature. It rejects shared keys,
accepting valid sigs only from personal keys.
Very soon I'll add shared key support.
* Repo collab now supports basic default roles developer/user/guest like
project collab does
* User/Anon collab for repos and projects are now stored as fields instead of
in dedicated tables, there was never a need for dedicated tables but I didn't
see that before
* Repo push op is now part of `ProjectOperation`
* `RepoRole` and related code has been entirely removed, only project roles
remain and they're used for both repos and projects
* This is the first not-totally-trivial DB migration in Vervis, it's automatic
but please be careful and report errors
* When adding collaborators, you don't need a custom role. If you don't choose
one, a basic default "developer" role will be used
* If you don't assign a `ProjectCollabUser` role, a default "user" role is
assumed for logged in users, otherwise a "guest" role
* The "guest" role currently has no access at all
* Theoretically there may also be a "maintainer" role allowing project
sharers/maintainers to give maintainer-level access to more people, but right
now maintainer role would be the same as developer so I haven't added it yet
It already had one, but it didn't have a public key and it was using the old
mess of the Vervis.ActivityStreams module, which I'll possibly remove soon.
It's hopefully more elegant now.
This patch includes some ugliness and commented out code. Sorry for that. I'll
clean it up soon.
Basically there's a TVar holding a Vector of at most 10 AP activities. You can
freely POST stuff to /inbox, and then GET /inbox and see what you posted, or an
error description saying why your activity was rejected.
The actor key will be used for all actors on the server. It's held in a `TVar`
so that it can always be safely updated and safely retrieved (technically there
is a single writer so IORef and MVar could work, but they require extra care
while TVar is by design suited for this sort of thing).
In Haskell by default if a thread has an exception, the main thread isn't
notified at all. This patch changes service thread launching to re-throw their
exceptions in the main thread, so that their failure is noticed.
I suppose there's no performance difference in using one, but it requires
`http-conduit` as a build dependency, so potentially we may be reducing build
time by removing unnecessary deps.
Git pull uses a POST request, which is treated as a write request and the CSRF
token is checked. However, no modification to the server is made by git pulls,
as far as I know (actually I'm not sure why it uses a POST). The entire
response is handled by the git command, and the client side is usually the git
command running in the terminal, there's no session and no cookies (as far as I
know). So I'm just disabling CSRF token checking for this route.
The sharer and repo were being taken and used as is to check push permissions,
which is how it's supposed to be, *but* they were also being used as is to
build the repo path! So sharer and repo names that aren't all lowercase were
getting "No such repository" errors when trying to push.
I changed `RepoSpec` to hold `ShrIdent` and `RpIdent` instead of plain `Text`,
to avoid confusions like that and be clear and explicit about the
representation, and failures to find a repo after verifying it against the DB
are now logged as errors to help with debugging.
I hope this fixes the problem.
We have gained:
* Haskell-side validation of schema changes before their execution
* Report of results of migration process
* Handling of old deployments
However:
* The validation code hasn't been tested yet at all
* Most of the migration list hasn't been applied at all yet
* Adding lists of entities from a model file is NOT VALIDATED!!! It's totally
possible to implement, just need to catch all the small details right
Until now the list of DB migration actions was incomplete, containing only
changes made since I added the migration system itself. It now contains the
2016-08-04 model, and then every change made since then.
IMPORTANT: The 2016-08-04 instance doesn't have a schema version entity at all,
so it is assigned version 0, while the actual version of its schema is 1. I'm
going to patch persistent-migration to allow it to be 1, making the migration
path smooth.
A workflow is a new entity in Vervis. It defines the workflow of a
projects' ticket system. That includes the possible ticket states,
custom ticket fields, various filters and so on. All ticket system
customization is currently planned to be managed using workflows.
Currently workflows are private and per sharer, but the plan is to
support public workflows that can be shared and cloned.
If `darcs init` isn't given a `--repodir`, even if you do specify the
new repository's path, it complains that it can't run inside a
repository, because it's running from a darcs clone of Vervis itself. If
the repo dir is specified using `--repodir` instead, Darcs doesn't
complain.
That's at least the situation with 2.8.5, didn't check other versions.
At least in PostgreSQL, at most one reference is allowed. My undirected
recursion code used a UNION of two recursive steps, one for each
direction. That is invalid, so instead I define a CTE that's a union of
the edges and their reverse, and do a single recursion step on that CTE
instead of on the edge table itself.
I thought SQL arrays were common and PersistList corresponded to SQL
array values. But that isn't the case. PersistList seems to be
serialized as a JSON list, and `filterClause` uses IN, not ANY. So I'm
doing the same thing here and using IN.
Note that I'm building the list myself using Text concatenation, not
using `filterClause`, because the latter takes a filter on an existing
`PersistEntity` while my filters often apply to temporary tables.
My implementation in Haskell does work, but ref discovery also includes
capabilities. Since I'm going to use the git binary for the next steps,
I need the git binary to specify here which capabilities it supports.
The transitive reduction query works by removing all the edges which
aren't the only paths between their nodes, i.e. longer paths exist. The
first step is to pick all the paths which include 2 or more edges.
The initial code did that appending in-edges to all paths, which results
with unnecessary duplicates and an INNER JOIN. Now, instead, just pick
all the paths with length of more than 3 nodes. This is hopefully not
just simpler, but also faster.
I used this chance to make some name changes, add some utils, tweak some
imports, remove more `setTitle`s and so on. I also made person, repo,
key and project creation forms verify CI-uniqueness.
I decided to add some safety to routes:
- Use dedicated newtypes
- Use CI for the CI-unique DB fields
Since such a change requires so many changes in many source files, this
is also a chance to do other such breaking changes. I'm recording the
change gradually. It won't build until I finish, so for now don't waste
time trying to build the app.
Darcs does export most of its module tree, but there's a problem: Darcs
relies on the current directory. It changes the current directory of the
process to the repo, and then proceeds using paths relative to the repo
dir. This is bad for my case here. If some other thread uses a relative
path (e.g. currently any repo path is relative by default) in parallel,
it will fail.
For now, the quick path around this problem is to use the `darcs`
program.
At the beginning the rendering was invalid because it parsed the entire
content as a single line. For some reason, when I read the ticket
description from the DB, all newlines are returned as CRLF. I don't know
why yet or whether it can or should be changed, but as a quick fix, I
made the handler function filter out the CRs from the text. Then the
rendering is correct.
This matches the documentation of Pandoc, which mentions the readers
assume newlines are encoded as LF.
HTML forms support only GET and POST methods. One way to bypass that is
to send the form using JS. But I don't want that. Another is to send a
POST with a hidden form field which specifies the read method. This is
what 'postTicketR' does.