-
|
I wasted a whole day researching this, but there are literally fewer than 5 related results on the whole internet:
and maybe some unanswered Stack Exchange questions.

Following the very poorly written Better BibTeX migration guide from DOCX to Markdown, I tried:

    pandoc -f docx+citations -t markdown -i "Thesis Proposal Real.docx" -o proposal.md

and

    pandoc -f docx+citations -t latex -i "Thesis Proposal Real.docx" -o proposal.tex

This is what I got for Markdown:

    Nishida and Nakayama (2020)[@2018] adapt unsupervised syntactic parsing
    (Viterbi EM) to discourse by hypothesizing that discourse and syntax
    share similar constituent regularities.

and for LaTeX:

    Nishida and Nakayama (2020){[}51{]} adapt unsupervised syntactic parsing
    (Viterbi EM) to discourse by hypothesizing that discourse and syntax
    share similar constituent regularities.

Notice how the Markdown citation tag differs from the LaTeX one, and from the original DOCX rendered in Microsoft Word:

I also tried the CSL format and it only made it worse:

    Nishida and Nakayama (2020){[}{[}CSL STYLE ERROR: reference with no
    printed form.{]}{]} adapt unsupervised syntactic parsing (Viterbi EM) to
    discourse by hypothesizing that discourse and syntax share similar
    constituent regularities.

To reply to the original comment by @iandol: when I tried to export the library to Quick Look, all I get is a file with a pair of

To clarify, I want to convert a DOCX with Zotero citations to LaTeX. The LaTeX file should have citations readily available in the standard format of

I have the .bib exported from Zotero with good, usable tags in it, like this:

@article{yang_xlnet_nodate,
title = {{XLNet}: {Generalized} {Autoregressive} {Pretraining} for {Language} {Understanding}},
abstract = {With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, under comparable experiment setting, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.1.},
language = {en},
author = {Yang, Zhilin and Dai, Zihang and Yang, Yiming and Carbonell, Jaime and Salakhutdinov, Russ R and Le, Quoc V},
file = {PDF:/Users/tommyvct/Zotero/storage/E3TSZ7YX/Yang et al. - XLNet Generalized Autoregressive Pretraining for Language Understanding.pdf:application/pdf},
}

while if I use Better BibTeX, this is what I get:

@article{
title = {{{DiMLex}}: {{A Lexicon}} of {{Discourse Markers}} for {{Text Generation}} and {{Understanding}}},
author = {Stede, Manfred and Umbach, Carla},
abstract = {Discourse markers ('cue words') are lexical items that signal the kind of coherence relation holding between adjacent text spans; for example, because, since, and for this reason are different markers for causal relations. Discourse markers are a syntactically quite heterogeneous group of words, many of which are traditionally treated as function words belonging to the realm of grammar rather than to the lexicon. But for a single discourse relation there is often a set of similar markers, allowing for a range of paraphrases for expressing the relation. To capture the similarities and differences between these, and to represent them adequately, we are developing DiMLex, a lexicon of discourse markers. After describing our methodology and the kind of information to be represented in DiMLex, we briefly discuss its potential applications in both text generation and understanding.},
langid = {english},
file = {/Users/tommyvct/Zotero/storage/NB5A5FNW/Stede and Umbach - DiMLex A Lexicon of Discourse Markers for Text Generation and Understanding.pdf;/Users/tommyvct/Zotero/storage/NPN3NEW8/Stede and Umbach - DiMLex A lexicon of discourse markers for text generation and understanding.pdf}
}

Please help. Thanks. |
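As pasted above, the Better BibTeX entry begins with `@article{` and shows no citation key, while the plain Zotero export has `yang_xlnet_nodate`. That may only be a copy-paste artifact, but it is cheap to check the real .bib file before blaming the converter. The following is a minimal sketch, not part of pandoc, Zotero, or Better BibTeX; the function names `entry_keys` and `entries_without_key` are hypothetical, and the regex is a rough heuristic rather than a full BibTeX parser.

```python
import re

# Heuristic: "@type{" followed by whatever precedes the first comma or newline.
# Assumption: one entry header per "@type{"; this is not a full BibTeX parser.
ENTRY_START = re.compile(r"@(\w+)\s*\{([^,\n]*)")


def entry_keys(bib_text):
    """Return (entry_type, key) pairs; key is '' when no key is found."""
    pairs = []
    for match in ENTRY_START.finditer(bib_text):
        key = match.group(2).strip()
        # A header like "@article{ title = {...}" has a field, not a key.
        if "=" in key:
            key = ""
        pairs.append((match.group(1).lower(), key))
    return pairs


def entries_without_key(bib_text):
    """Return the 0-based positions of entries that have no citation key."""
    return [i for i, (_, key) in enumerate(entry_keys(bib_text)) if not key]


if __name__ == "__main__":
    # Tiny stand-in for the two entries quoted in the post.
    sample = (
        "@article{yang_xlnet_nodate,\n  title = {A},\n}\n"
        "@article{\n  title = {B},\n}\n"
    )
    print(entry_keys(sample))
    print(entries_without_key(sample))
```

If the second list is non-empty for the actual export, the missing keys would be worth fixing first, since citations cannot be matched to entries without them.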
Replies: 3 comments 17 replies
-
|
Why don't you upload the docx, or a reduced version of it that suffices as an example? |
-
|
One problem is the following:
This is a relevant issue: #10550. I created a Lua filter to handle this case; the filter assumes the YAML references are present in the metadata: https://github.com/iandol/dotpandoc/blob/master/filters/citation-key.lua

@jgm, Pandoc could check for ID and cite-key and prefer the cite-key natively, I assume, and that would make this workflow simpler?

My test docx:

The Zotero ref in that docx exported as CSL JSON:

[
  {
    "id": "lamme2018",
    "type": "article-journal",
    "abstract": "Significant progress…",
    "citation-key": "lamme2018",
    "container-title": "Philosophical Transactions of the Royal Society B: Biological Sciences",
    "DOI": "10.1098/rstb.2017.0344",
    "issue": "1755",
    "page": "20170344",
    "PMID": "30061458",
    "title": "Challenges for theories of consciousness: seeing or knowing, the missing ingredient and how to deal with panpsychism.",
    "volume": "373",
    "author": [
      {
        "family": "Lamme",
        "given": "VAF"
      }
    ],
    "issued": {
      "date-parts": [
        [
          "2018"
        ]
      ]
    }
  }
] |
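The suggestion to prefer the cite-key over the ID can be illustrated outside pandoc. The sketch below is only an illustration of that lookup order, assuming a CSL JSON export shaped like the one above; it is neither pandoc's current behaviour nor the linked Lua filter, and the names `preferred_key` and `index_by_key` are hypothetical.

```python
import json


def preferred_key(item):
    """Return the item's 'citation-key' if present and non-empty, else its 'id'."""
    key = item.get("citation-key")
    return str(key) if key else str(item["id"])


def index_by_key(csl_json_text):
    """Map each CSL JSON item to its preferred key (assumes a JSON array of items)."""
    items = json.loads(csl_json_text)
    return {preferred_key(item): item for item in items}


if __name__ == "__main__":
    # Reduced version of the export above; only the fields used here are kept.
    sample = '[{"id": "lamme2018", "citation-key": "lamme2018", "type": "article-journal"}]'
    print(list(index_by_key(sample)))
```

In the export above both fields hold the same value, so the choice only matters when the embedded `id` is something other than the human-readable key.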
-
You are getting a properly formatted final document across multiple pandoc outputs, i.e. a document you can submit. There are some cases where BibLaTeX may offer edge-case features that citeproc doesn't, but the point is to get a formatted bibliography, surely?
A single command gets your IEEE refs correct in the PDF. Why are you worried that the LaTeX source is formatted? That is by design:
If you have a reason why citeproc formatting is not suitable, you can still use the TWO S…