Consider the following:
"MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))".
I want to transform this into:
I use functions such as replace() to format this, but I end up with some dirty code that is probably not the most efficient.
Because I'm doing this on a huge dataset, I'm looking for an efficient way to do it.
If you're looking for clean code that doesn't do too much, I'd recommend a two step process involving the re module:
- split your string into smaller chunks on comma using str.split
- for each chunk, extract coordinates with the pattern's findall
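To make the two steps concrete, here is a minimal sketch that spells them out as an explicit loop. The variable names (chunks, tokens, coords) are only illustrative, and the input is the sample string from the question.

```python
import re

mstring = "MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))"

# Step 1: split on commas; each chunk holds one "x y" pair plus stray WKT syntax,
# e.g. chunks[0] is 'MULTILINESTRING((10 10' and chunks[1] is '10 40)'.
chunks = mstring.split(',')

# Step 2: pull the numeric tokens out of each chunk and convert them to int.
coords = []
for chunk in chunks:
    tokens = re.findall(r'\d+(?:\.\d+)?', chunk)  # e.g. ['10', '10']
    coords.append([int(t) for t in tokens])

print(coords)
# [[10, 10], [10, 40], [40, 40], [30, 30], [40, 20], [30, 10]]
```

Note that splitting on commas alone discards the grouping into separate linestrings: all six points end up in one flat list of pairs.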
For performance, I'd recommend pre-compiling a regex-pattern using
re.compile, since we'll be calling it repeatedly inside a loop.
    >>> import re
    >>> p = re.compile(r'\d+(?:\.\d+)?')
    >>> [list(map(int, p.findall(x))) for x in mstring.split(',')]
    [[10, 10], [10, 40], [40, 40], [30, 30], [40, 20], [30, 10]]
mstring is your string data.
    \d+      # match one or more digits
    (?:      # specify non-capturing group
        \.   # literal period/decimal
        \d+
    )?       # optional
Semantically, this regex will match integers OR floats (Ajax1234's solution currently only accounts for integers, and is guaranteed to finish searching in fewer cycles).
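One caveat if your data really does contain decimals: the pattern will match a token like 10.5, but int('10.5') raises a ValueError, so the conversion has to be float instead. A minimal sketch, assuming a hypothetical helper name parse_coords and a made-up input string with decimal coordinates:

```python
import re

# Same pattern as above: digits, optionally followed by a decimal part.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def parse_coords(wkt):
    """Return one [x, y] list per comma-separated chunk, as floats."""
    return [[float(n) for n in NUMBER.findall(chunk)]
            for chunk in wkt.split(',')]

# Hypothetical input, not from the question, just to exercise the float case.
sample = "MULTILINESTRING((10.5 10,10 40.25),(40 40,30 30))"
print(parse_coords(sample))
# [[10.5, 10.0], [10.0, 40.25], [40.0, 40.0], [30.0, 30.0]]
```

The pattern has no provision for a leading minus sign, so negative coordinates would lose their sign; you'd need to extend it (for example with an optional `-?` prefix) if your dataset has them.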